| ====================================== |
| Syntax of AMDGPU Instruction Modifiers |
| ====================================== |
| |
| .. contents:: |
| :local: |
| |
| Conventions |
| =========== |
| |
| The following notation is used throughout this document: |
| |
| =================== ============================================================= |
| Notation Description |
| =================== ============================================================= |
| {0..N} Any integer value in the range from 0 to N (inclusive). |
| <x> Syntax and meaning of *x* is explained elsewhere. |
| =================== ============================================================= |
| |
| .. _amdgpu_syn_modifiers: |
| |
| Modifiers |
| ========= |
| |
| DS Modifiers |
| ------------ |
| |
| .. _amdgpu_synid_ds_offset8: |
| |
| offset8 |
| ~~~~~~~ |
| |
| Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0. |
| |
| Used with DS instructions which have 2 addresses. |
| |
| =================== ===================================================== |
| Syntax Description |
| =================== ===================================================== |
| offset:{0..0xFF} Specifies an unsigned 8-bit offset as a positive |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| =================== ===================================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| offset:255 |
| offset:0xff |
| |
| .. _amdgpu_synid_ds_offset16: |
| |
| offset16 |
| ~~~~~~~~ |
| |
| Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0. |
| |
| Used with DS instructions which have 1 address. |
| |
| ==================== ====================================================== |
| Syntax Description |
| ==================== ====================================================== |
| offset:{0..0xFFFF} Specifies an unsigned 16-bit offset as a positive |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| ==================== ====================================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| offset:65535 |
| offset:0xffff |
| |
| .. _amdgpu_synid_sw_offset16: |
| |
| pattern |
| ~~~~~~~ |
| |
| This is a special modifier which may be used with the *ds_swizzle_b32* instruction only. |
| It specifies a swizzle pattern in numeric or symbolic form. The default value is 0. |
| |
| See AMD documentation for more information. |
| |
| ======================================================= =========================================================== |
| Syntax Description |
| ======================================================= =========================================================== |
| offset:{0..0xFFFF} Specifies a 16-bit swizzle pattern. |
| offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3}) Specifies a quad permute mode pattern. |
| |
| Each number is a lane *id*. |
| offset:swizzle(BITMASK_PERM, "<mask>") Specifies a bitmask permute mode pattern. |
| |
| The pattern converts a 5-bit lane *id* to another |
| lane *id* with which the lane interacts. |
| |
| *mask* is a 5 character sequence which |
| specifies how to transform the bits of the |
| lane *id*. |
| |
| The following characters are allowed: |
| |
| * "0" - set bit to 0. |
| |
| * "1" - set bit to 1. |
| |
| * "p" - preserve bit. |
| |
| * "i" - invert bit. |
| |
| offset:swizzle(BROADCAST,{2..32},{0..N}) Specifies a broadcast mode. |
| |
| Broadcasts the value of any particular lane to |
| all lanes in its group. |
| |
| The first numeric parameter is a group |
| size and must be equal to 2, 4, 8, 16 or 32. |
| |
| The second numeric parameter is the index of the |
| lane being broadcast. |
| |
| The index must be less than the group size. |
| offset:swizzle(SWAP,{1..16}) Specifies a swap mode. |
| |
| Swaps the neighboring groups of |
| 1, 2, 4, 8 or 16 lanes. |
| offset:swizzle(REVERSE,{2..32}) Specifies a reverse mode. |
| |
| Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes. |
| ======================================================= =========================================================== |
| |
| Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or |
| :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| offset:255 |
| offset:0xffff |
| offset:swizzle(QUAD_PERM, 0, 1, 2, 3) |
| offset:swizzle(BITMASK_PERM, "01pi0") |
| offset:swizzle(BROADCAST, 2, 0) |
| offset:swizzle(SWAP, 8) |
| offset:swizzle(REVERSE, 30 + 2) |
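The symbolic swizzle modes above can be modeled as simple functions on lane ids. The following Python sketch is illustrative only. All function names are hypothetical. It assumes that the leftmost character of a BITMASK_PERM mask controls the most significant bit (bit 4) of the lane id, and that the other modes return the lane from which a given lane reads its data. Consult AMD documentation for the authoritative semantics.

```python
def apply_bitmask_perm(mask: str, lane_id: int) -> int:
    """Transform a 5-bit lane id with a BITMASK_PERM mask such as "01pi0".

    Assumption: the leftmost mask character controls bit 4 (the most
    significant bit) of the lane id.
    """
    if len(mask) != 5:
        raise ValueError("mask must be exactly 5 characters long")
    if not 0 <= lane_id < 32:
        raise ValueError("lane id must fit in 5 bits")
    result = 0
    for pos, ch in enumerate(mask):
        bit = 4 - pos
        old = (lane_id >> bit) & 1
        if ch == "0":      # set bit to 0
            new = 0
        elif ch == "1":    # set bit to 1
            new = 1
        elif ch == "p":    # preserve bit
            new = old
        elif ch == "i":    # invert bit
            new = old ^ 1
        else:
            raise ValueError(f"invalid mask character: {ch!r}")
        result |= new << bit
    return result


def broadcast_source(lane_id: int, group_size: int, index: int) -> int:
    """BROADCAST: every lane of a group reads the lane at 'index' in that group."""
    if index >= group_size:
        raise ValueError("index must be less than the group size")
    return lane_id - lane_id % group_size + index


def swap_source(lane_id: int, group_size: int) -> int:
    """SWAP: neighboring groups of 'group_size' lanes exchange their values."""
    return lane_id ^ group_size


def reverse_source(lane_id: int, group_size: int) -> int:
    """REVERSE: lane order is reversed within each group of 'group_size' lanes."""
    return lane_id ^ (group_size - 1)
```

Under these assumptions, the mask ``"01pi0"`` maps lane 6 (``0b00110``) to lane 12 (``0b01100``).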
| |
| .. _amdgpu_synid_gds: |
| |
| gds |
| ~~~ |
| |
| Specifies whether to use GDS or LDS memory (LDS is the default). |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| gds Use GDS memory. |
| ======================================== ================================================ |
| |
| |
| EXP Modifiers |
| ------------- |
| |
| .. _amdgpu_synid_done: |
| |
| done |
| ~~~~ |
| |
| Specifies if this is the last export from the shader to the target. By default, the current |
| instruction does not finish an export sequence. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| done Indicates the last export operation. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_compr: |
| |
| compr |
| ~~~~~ |
| |
| Indicates if the data are compressed (data are not compressed by default). |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| compr Data are compressed. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_vm: |
| |
| vm |
| ~~ |
| |
| Specifies valid mask flag state (off by default). |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| vm Set valid mask flag. |
| ======================================== ================================================ |
| |
| FLAT Modifiers |
| -------------- |
| |
| .. _amdgpu_synid_flat_offset12: |
| |
| offset12 |
| ~~~~~~~~ |
| |
| Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0. |
| |
| Cannot be used with *global/scratch* opcodes. GFX9 only. |
| |
| ================= ====================================================== |
| Syntax Description |
| ================= ====================================================== |
| offset:{0..4095} Specifies a 12-bit unsigned offset as a positive |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| ================= ====================================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| offset:4095 |
| offset:0xff |
| |
| .. _amdgpu_synid_flat_offset13s: |
| |
| offset13s |
| ~~~~~~~~~ |
| |
| Specifies an immediate signed 13-bit offset, in bytes. The default value is 0. |
| |
| Can be used with *global/scratch* opcodes only. GFX9 only. |
| |
| ============================ ======================================================= |
| Syntax Description |
| ============================ ======================================================= |
| offset:{-4096..4095} Specifies a 13-bit signed offset as an |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| ============================ ======================================================= |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| offset:-4000 |
| offset:0x10 |
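The offset ranges listed for *offset12* and *offset13s* follow directly from the field width and signedness. The helper names in this Python sketch are hypothetical and are not part of the assembler. The sketch only illustrates how the documented ranges are derived.

```python
def fits_unsigned(value: int, bits: int) -> bool:
    """True if 'value' is representable as an unsigned field of 'bits' bits."""
    return 0 <= value < (1 << bits)


def fits_signed(value: int, bits: int) -> bool:
    """True if 'value' is representable as a two's complement field of 'bits' bits."""
    half = 1 << (bits - 1)
    return -half <= value < half


# offset12 accepts 0..4095, offset13s accepts -4096..4095.
print(fits_unsigned(4095, 12), fits_signed(-4096, 13), fits_signed(4096, 13))
```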
| |
| glc |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_glc>`. |
| |
| slc |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_slc>`. |
| |
| tfe |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_tfe>`. |
| |
| nv |
| ~~ |
| |
| See a description :ref:`here<amdgpu_synid_nv>`. |
| |
| MIMG Modifiers |
| -------------- |
| |
| .. _amdgpu_synid_dmask: |
| |
| dmask |
| ~~~~~ |
| |
| Specifies which channels (image components) are used by the operation. By default, no channels |
| are used. |
| |
| =============== ===================================================== |
| Syntax Description |
| =============== ===================================================== |
| dmask:{0..15} Specifies image channels as a positive |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| |
| Each bit corresponds to one of 4 image |
| components (RGBA). |
| |
| If the specified bit value |
| is 0, the component is not used, value 1 means |
| that the component is used. |
| =============== ===================================================== |
| |
| This modifier has some limitations depending on instruction kind: |
| |
| =================================================== ======================== |
| Instruction Kind Valid dmask Values |
| =================================================== ======================== |
| 32-bit atomic *cmpswap* 0x3 |
| 32-bit atomic instructions except for *cmpswap* 0x1 |
| 64-bit atomic *cmpswap* 0xF |
| 64-bit atomic instructions except for *cmpswap* 0x3 |
| *gather4* 0x1, 0x2, 0x4, 0x8 |
| Other instructions any value |
| =================================================== ======================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| dmask:0xf |
| dmask:0b1111 |
| dmask:3 |
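A *dmask* value can be built by setting one bit per image component. The Python sketch below is illustrative. The function names are hypothetical, and the bit order (bit 0 for R, bit 1 for G, bit 2 for B, bit 3 for A) is an assumption that is not stated in this document.

```python
# Assumed bit positions of the image components (not stated in this document).
CHANNEL_BITS = {"r": 0, "g": 1, "b": 2, "a": 3}


def dmask_from_channels(channels: str) -> int:
    """Build a dmask value from component letters such as "rg" or "rgba"."""
    mask = 0
    for ch in channels.lower():
        mask |= 1 << CHANNEL_BITS[ch]
    return mask


def is_valid_gather4_dmask(dmask: int) -> bool:
    """gather4 accepts exactly one enabled component (0x1, 0x2, 0x4 or 0x8)."""
    return dmask in (0x1, 0x2, 0x4, 0x8)
```

With this bit order, ``dmask_from_channels("rgba")`` yields ``0xf`` and ``dmask_from_channels("rg")`` yields ``3``, matching the examples above.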
| |
| .. _amdgpu_synid_unorm: |
| |
| unorm |
| ~~~~~ |
| |
| Specifies whether the address is normalized or not (the address is normalized by default). |
| |
| ======================== ======================================== |
| Syntax Description |
| ======================== ======================================== |
| unorm Force the address to be unnormalized. |
| ======================== ======================================== |
| |
| glc |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_glc>`. |
| |
| slc |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_slc>`. |
| |
| .. _amdgpu_synid_r128: |
| |
| r128 |
| ~~~~ |
| |
| Specifies texture resource size. The default size is 256 bits. |
| |
| GFX7 and GFX8 only. |
| |
| =================== ================================================ |
| Syntax Description |
| =================== ================================================ |
| r128 Specifies a 128-bit texture resource size. |
| =================== ================================================ |
| |
| .. WARNING:: Using this modifier should decrease *rsrc* operand size from 8 to 4 dwords, but the assembler does not currently support this feature. |
| |
| tfe |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_tfe>`. |
| |
| .. _amdgpu_synid_lwe: |
| |
| lwe |
| ~~~ |
| |
| Specifies LOD warning status (LOD warning is disabled by default). |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| lwe Enables LOD warning. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_da: |
| |
| da |
| ~~ |
| |
| Specifies if an array index must be sent to TA. By default, the array index is not sent. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| da Send an array-index to TA. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_d16: |
| |
| d16 |
| ~~~ |
| |
| Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| d16 Enables 16-bit data mode. |
| |
| On loads, convert data in memory to 16-bit |
| format before storing it in VGPRs. |
| |
| For stores, convert 16-bit data in VGPRs to |
| 32 bits before going to memory. |
| |
| Note that GFX8.0 does not support data packing. |
| Each 16-bit data element occupies 1 VGPR. |
| |
| GFX8.1 and GFX9 support data packing. |
| Each pair of 16-bit data elements |
| occupies 1 VGPR. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_a16: |
| |
| a16 |
| ~~~ |
| |
| Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| a16 Enables 16-bit image address components. |
| ======================================== ================================================ |
| |
| Miscellaneous Modifiers |
| ----------------------- |
| |
| .. _amdgpu_synid_glc: |
| |
| glc |
| ~~~ |
| |
| This modifier has a different meaning for loads, stores, and atomic operations. |
| The default value is off (0). |
| |
| See AMD documentation for details. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| glc Set glc bit to 1. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_slc: |
| |
| slc |
| ~~~ |
| |
| Specifies cache policy. The default value is off (0). |
| |
| See AMD documentation for details. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| slc Set slc bit to 1. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_tfe: |
| |
| tfe |
| ~~~ |
| |
| Controls access to partially resident textures. The default value is off (0). |
| |
| See AMD documentation for details. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| tfe Set tfe bit to 1. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_nv: |
| |
| nv |
| ~~ |
| |
| Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile. |
| |
| GFX9 only. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| nv Indicates that the instruction operates on |
| non-volatile memory. |
| ======================================== ================================================ |
| |
| MUBUF/MTBUF Modifiers |
| --------------------- |
| |
| .. _amdgpu_synid_idxen: |
| |
| idxen |
| ~~~~~ |
| |
| Specifies whether address components include an index. By default, no components are used. |
| |
| Can be used together with :ref:`offen<amdgpu_synid_offen>`. |
| |
| Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| idxen Address components include an index. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_offen: |
| |
| offen |
| ~~~~~ |
| |
| Specifies whether address components include an offset. By default, no components are used. |
| |
| Can be used together with :ref:`idxen<amdgpu_synid_idxen>`. |
| |
| Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| offen Address components include an offset. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_addr64: |
| |
| addr64 |
| ~~~~~~ |
| |
| Specifies whether a 64-bit address is used. By default, no address is used. |
| |
| GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and |
| :ref:`idxen<amdgpu_synid_idxen>` modifiers. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| addr64 A 64-bit address is used. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_buf_offset12: |
| |
| offset12 |
| ~~~~~~~~ |
| |
| Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0. |
| |
| =============================== ====================================================== |
| Syntax Description |
| =============================== ====================================================== |
| offset:{0..0xFFF} Specifies a 12-bit unsigned offset as a positive |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| =============================== ====================================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| offset:0 |
| offset:0x10 |
| |
| glc |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_glc>`. |
| |
| slc |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_slc>`. |
| |
| .. _amdgpu_synid_lds: |
| |
| lds |
| ~~~ |
| |
| Specifies where to store the result: VGPRs or LDS (VGPRs by default). |
| |
| ======================================== =========================== |
| Syntax Description |
| ======================================== =========================== |
| lds Store result in LDS. |
| ======================================== =========================== |
| |
| tfe |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_tfe>`. |
| |
| .. _amdgpu_synid_dfmt: |
| |
| dfmt |
| ~~~~ |
| |
| TBD |
| |
| .. _amdgpu_synid_nfmt: |
| |
| nfmt |
| ~~~~ |
| |
| TBD |
| |
| SMRD/SMEM Modifiers |
| ------------------- |
| |
| glc |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_glc>`. |
| |
| nv |
| ~~ |
| |
| See a description :ref:`here<amdgpu_synid_nv>`. |
| |
| VINTRP Modifiers |
| ---------------- |
| |
| .. _amdgpu_synid_high: |
| |
| high |
| ~~~~ |
| |
| Specifies which half of the LDS word to use. The low half of the LDS word is used by default. |
| GFX9 only. |
| |
| ======================================== ================================ |
| Syntax Description |
| ======================================== ================================ |
| high Use high half of LDS word. |
| ======================================== ================================ |
| |
| VOP1/VOP2 DPP Modifiers |
| ----------------------- |
| |
| GFX8 and GFX9 only. |
| |
| .. _amdgpu_synid_dpp_ctrl: |
| |
| dpp_ctrl |
| ~~~~~~~~ |
| |
| Specifies how data are shared between threads. This is a mandatory modifier. |
| There is no default value. |
| |
| Note. The lanes of a wavefront are organized in four banks and four rows. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| quad_perm:[{0..3},{0..3},{0..3},{0..3}] Full permute of 4 threads. |
| row_mirror Mirror threads within row. |
| row_half_mirror Mirror threads within 1/2 row (8 threads). |
| row_bcast:15 Broadcast 15th thread of each row to next row. |
| row_bcast:31 Broadcast thread 31 to rows 2 and 3. |
| wave_shl:1 Wavefront left shift by 1 thread. |
| wave_rol:1 Wavefront left rotate by 1 thread. |
| wave_shr:1 Wavefront right shift by 1 thread. |
| wave_ror:1 Wavefront right rotate by 1 thread. |
| row_shl:{1..15} Row shift left by 1-15 threads. |
| row_shr:{1..15} Row shift right by 1-15 threads. |
| row_ror:{1..15} Row rotate right by 1-15 threads. |
| ======================================== ================================================ |
| |
| Note: Numeric parameters may be specified as either |
| :ref:`integer numbers<amdgpu_synid_integer_number>` or |
| :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| quad_perm:[0, 1, 2, 3] |
| row_shl:3 |
| |
| .. _amdgpu_synid_row_mask: |
| |
| row_mask |
| ~~~~~~~~ |
| |
| Controls which rows are enabled for data sharing. By default, all rows are enabled. |
| |
| Note. The lanes of a wavefront are organized in four banks and four rows. |
| |
| ======================================== ===================================================== |
| Syntax Description |
| ======================================== ===================================================== |
| row_mask:{0..15} Specifies a *row mask* as a positive |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| |
| Each of 4 bits in the mask controls one |
| row (0 - disabled, 1 - enabled). |
| ======================================== ===================================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| row_mask:0xf |
| row_mask:0b1010 |
| row_mask:0b1111 |
| |
| .. _amdgpu_synid_bank_mask: |
| |
| bank_mask |
| ~~~~~~~~~ |
| |
| Controls which banks are enabled for data sharing. By default, all banks are enabled. |
| |
| Note. The lanes of a wavefront are organized in four banks and four rows. |
| |
| ======================================== ======================================================= |
| Syntax Description |
| ======================================== ======================================================= |
| bank_mask:{0..15} Specifies a *bank mask* as a positive |
| :ref:`integer number <amdgpu_synid_integer_number>`. |
| |
| Each of 4 bits in the mask controls one |
| bank (0 - disabled, 1 - enabled). |
| ======================================== ======================================================= |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| bank_mask:0x3 |
| bank_mask:0b0011 |
| bank_mask:0b1111 |
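The combined effect of *row_mask* and *bank_mask* can be pictured as a filter over the lanes of a wavefront. The Python sketch below is illustrative and rests on several assumptions that are not stated in this document: a 64-lane wavefront, a row made of 16 consecutive lanes, a bank made of 4 consecutive lanes within a row, and bit *i* of each mask controlling row or bank *i*. The function name is hypothetical.

```python
def dpp_enabled_lanes(row_mask: int = 0xF, bank_mask: int = 0xF,
                      wave_size: int = 64) -> list:
    """Return the lanes enabled by the given row and bank masks.

    Assumed layout: row = lane // 16, bank = (lane % 16) // 4.
    """
    lanes = []
    for lane in range(wave_size):
        row = lane // 16
        bank = (lane % 16) // 4
        if (row_mask >> row) & 1 and (bank_mask >> bank) & 1:
            lanes.append(lane)
    return lanes


# Under the assumed layout, row 0 combined with bank 0 selects lanes 0..3.
print(dpp_enabled_lanes(row_mask=0b0001, bank_mask=0b0001))
```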
| |
| .. _amdgpu_synid_bound_ctrl: |
| |
| bound_ctrl |
| ~~~~~~~~~~ |
| |
| Controls data sharing when accessing an invalid lane. By default, data sharing with |
| invalid lanes is disabled. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| bound_ctrl:0 Enables data sharing with invalid lanes. |
| |
| Accessing data from an invalid lane will |
| return zero. |
| ======================================== ================================================ |
| |
| VOP1/VOP2/VOPC SDWA Modifiers |
| ----------------------------- |
| |
| GFX8 and GFX9 only. |
| |
| clamp |
| ~~~~~ |
| |
| See a description :ref:`here<amdgpu_synid_clamp>`. |
| |
| omod |
| ~~~~ |
| |
| See a description :ref:`here<amdgpu_synid_omod>`. |
| |
| GFX9 only. |
| |
| .. _amdgpu_synid_dst_sel: |
| |
| dst_sel |
| ~~~~~~~ |
| |
| Selects which bits in the destination are affected. By default, all bits are affected. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| dst_sel:DWORD Use bits 31:0. |
| dst_sel:BYTE_0 Use bits 7:0. |
| dst_sel:BYTE_1 Use bits 15:8. |
| dst_sel:BYTE_2 Use bits 23:16. |
| dst_sel:BYTE_3 Use bits 31:24. |
| dst_sel:WORD_0 Use bits 15:0. |
| dst_sel:WORD_1 Use bits 31:16. |
| ======================================== ================================================ |
| |
| |
| .. _amdgpu_synid_dst_unused: |
| |
| dst_unused |
| ~~~~~~~~~~ |
| |
| Controls what to do with the bits in the destination which are not selected |
| by :ref:`dst_sel<amdgpu_synid_dst_sel>`. |
| By default, unused bits are preserved. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| dst_unused:UNUSED_PAD Pad with zeros. |
| dst_unused:UNUSED_SEXT Sign-extend upper bits, zero lower bits. |
| dst_unused:UNUSED_PRESERVE Preserve bits. |
| ======================================== ================================================ |
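The interaction between *dst_sel* and *dst_unused* can be modeled as a masked write into a 32-bit destination. The Python sketch below is a simplified model based on the tables above, not a description of the hardware implementation. The function and table names are hypothetical.

```python
# Selector name -> (lowest bit, width in bits), taken from the dst_sel table.
SEL_RANGES = {
    "DWORD": (0, 32),
    "BYTE_0": (0, 8), "BYTE_1": (8, 8), "BYTE_2": (16, 8), "BYTE_3": (24, 8),
    "WORD_0": (0, 16), "WORD_1": (16, 16),
}


def sdwa_write(old_dst: int, result: int, dst_sel: str = "DWORD",
               dst_unused: str = "UNUSED_PRESERVE") -> int:
    """Write 'result' into the bits of a 32-bit destination chosen by dst_sel."""
    low, width = SEL_RANGES[dst_sel]
    field_mask = ((1 << width) - 1) << low
    value = (result << low) & field_mask
    if dst_unused == "UNUSED_PRESERVE":
        # Keep all destination bits outside the selected field.
        return (old_dst & ~field_mask & 0xFFFFFFFF) | value
    if dst_unused == "UNUSED_PAD":
        # Zero all bits outside the selected field.
        return value
    if dst_unused == "UNUSED_SEXT":
        # Sign-extend into the upper bits, zero the lower bits.
        sign = (value >> (low + width - 1)) & 1
        upper_mask = 0xFFFFFFFF & ~((1 << (low + width)) - 1)
        return value | (upper_mask if sign else 0)
    raise ValueError(f"unknown dst_unused value: {dst_unused}")
```

For example, writing ``0x80`` to ``BYTE_1`` of a destination holding ``0xAABBCCDD`` gives ``0xAABB80DD`` with ``UNUSED_PRESERVE``, ``0x00008000`` with ``UNUSED_PAD`` and ``0xFFFF8000`` with ``UNUSED_SEXT`` in this model.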
| |
| .. _amdgpu_synid_src0_sel: |
| |
| src0_sel |
| ~~~~~~~~ |
| |
| Controls which bits in the src0 are used. By default, all bits are used. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| src0_sel:DWORD Use bits 31:0. |
| src0_sel:BYTE_0 Use bits 7:0. |
| src0_sel:BYTE_1 Use bits 15:8. |
| src0_sel:BYTE_2 Use bits 23:16. |
| src0_sel:BYTE_3 Use bits 31:24. |
| src0_sel:WORD_0 Use bits 15:0. |
| src0_sel:WORD_1 Use bits 31:16. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_src1_sel: |
| |
| src1_sel |
| ~~~~~~~~ |
| |
| Controls which bits in the src1 are used. By default, all bits are used. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| src1_sel:DWORD Use bits 31:0. |
| src1_sel:BYTE_0 Use bits 7:0. |
| src1_sel:BYTE_1 Use bits 15:8. |
| src1_sel:BYTE_2 Use bits 23:16. |
| src1_sel:BYTE_3 Use bits 31:24. |
| src1_sel:WORD_0 Use bits 15:0. |
| src1_sel:WORD_1 Use bits 31:16. |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_sdwa_operand_modifiers: |
| |
| VOP1/VOP2/VOPC SDWA Operand Modifiers |
| ------------------------------------- |
| |
| Operand modifiers are not used separately. They are applied to source operands. |
| |
| GFX8 and GFX9 only. |
| |
| abs |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_abs>`. |
| |
| neg |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_neg>`. |
| |
| .. _amdgpu_synid_sext: |
| |
| sext |
| ~~~~ |
| |
| Sign-extends value of a (sub-dword) operand to fill all 32 bits. |
| Has no effect for 32-bit operands. |
| |
| Valid for integer operands only. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| sext(<operand>) Sign-extend operand value. |
| ======================================== ================================================ |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| sext(v4) |
| sext(v255) |
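The effect of *sext* on a sub-dword source operand can be sketched as follows. This Python model is illustrative. The function name is hypothetical, and zero extension of the selected bits when *sext* is absent is an assumption of the sketch.

```python
# Selector name -> (lowest bit, width in bits), as in the src0_sel/src1_sel tables.
SEL_RANGES = {
    "DWORD": (0, 32),
    "BYTE_0": (0, 8), "BYTE_1": (8, 8), "BYTE_2": (16, 8), "BYTE_3": (24, 8),
    "WORD_0": (0, 16), "WORD_1": (16, 16),
}


def select_operand(value: int, sel: str = "DWORD", sext: bool = False) -> int:
    """Extract the selected bits of a 32-bit source and optionally sign-extend."""
    low, width = SEL_RANGES[sel]
    field = (value >> low) & ((1 << width) - 1)
    if sext and field >> (width - 1):
        # Replicate the sign bit into all upper bits (no effect for DWORD).
        field |= 0xFFFFFFFF & ~((1 << width) - 1)
    return field
```

In this model, ``select_operand(0x12345680, "BYTE_0", sext=True)`` returns ``0xFFFFFF80``, while a ``DWORD`` operand is returned unchanged.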
| |
| VOP3 Modifiers |
| -------------- |
| |
| .. _amdgpu_synid_vop3_op_sel: |
| |
| op_sel |
| ~~~~~~ |
| |
| Selects the low [15:0] or high [31:16] operand bits for source and destination operands. |
| By default, low bits are used for all operands. |
| |
| The number of values specified with the op_sel modifier must match the number of instruction |
| operands (both source and destination). First value controls src0, second value controls src1 |
| and so on, except that the last value controls destination. |
| The value 0 selects the low bits, while 1 selects the high bits. |
| |
| Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified |
| by op_sel must be 0. |
| |
| GFX9 only. |
| |
| ======================================== ============================================================ |
| Syntax Description |
| ======================================== ============================================================ |
| op_sel:[{0..1},{0..1}] Select operand bits for instructions with 1 source operand. |
| op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 2 source operands. |
| op_sel:[{0..1},{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands. |
| ======================================== ============================================================ |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| op_sel:[0,0] |
| op_sel:[0,1] |
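The rules above can be summarized in a short sketch: each *op_sel* value picks one 16-bit half of a 32-bit register, and a VOP3 instruction needs one value per source operand plus one for the destination. The Python helper names are hypothetical.

```python
def select_half(value32: int, sel: int) -> int:
    """Return bits [15:0] when sel is 0, or bits [31:16] when sel is 1."""
    if sel not in (0, 1):
        raise ValueError("op_sel values must be 0 or 1")
    return (value32 >> (16 * sel)) & 0xFFFF


def vop3_op_sel_count_ok(op_sel: list, num_sources: int) -> bool:
    """VOP3 op_sel needs one value per source operand plus one for the destination."""
    return len(op_sel) == num_sources + 1


print(hex(select_half(0x12345678, 0)), hex(select_half(0x12345678, 1)))
```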
| |
| .. _amdgpu_synid_clamp: |
| |
| clamp |
| ~~~~~ |
| |
| The meaning of clamp depends on the instruction. |
| |
| For *v_cmp* instructions, clamp modifier indicates that the compare signals |
| if a floating point exception occurs. By default, signaling is disabled. |
| Not supported by GFX7. |
| |
| For integer operations, clamp modifier indicates that the result must be clamped |
| to the largest and smallest representable value. By default, there is no clamping. |
| Integer clamping is not supported by GFX7. |
| |
| For floating point operations, clamp modifier indicates that the result must be clamped |
| to the range [0.0, 1.0]. By default, there is no clamping. |
| |
| Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any). |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| clamp Enables clamping (or signaling). |
| ======================================== ================================================ |
| |
| .. _amdgpu_synid_omod: |
| |
| omod |
| ~~~~ |
| |
| Specifies if an output modifier must be applied to the result. |
| By default, no output modifiers are applied. |
| |
| Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any). |
| |
| Output modifiers are valid for f32 and f64 floating point results only. |
| They must not be used with f16. |
| |
| Note. *v_cvt_f16_f32* is an exception. This instruction produces f16 result |
| but accepts output modifiers. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| mul:2 Multiply the result by 2. |
| mul:4 Multiply the result by 4. |
| div:2 Multiply the result by 0.5. |
| ======================================== ================================================ |
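The ordering of output modifiers and clamping matters when both are present. The Python sketch below models a floating point result with the output modifier applied first and the clamp to [0.0, 1.0] applied second, as described above. The names are hypothetical, and the model ignores rounding and special values.

```python
# Scale factors taken from the omod syntax table; None means no output modifier.
OMOD_FACTORS = {None: 1.0, "mul:2": 2.0, "mul:4": 4.0, "div:2": 0.5}


def apply_output_modifiers(result: float, omod=None, clamp: bool = False) -> float:
    """Apply an output modifier first, then floating point clamping (if any)."""
    value = result * OMOD_FACTORS[omod]
    if clamp:
        value = min(max(value, 0.0), 1.0)
    return value
```

In this model, a result of 0.75 with ``mul:2`` and ``clamp`` becomes 1.0: the product 1.5 is computed first and then clamped.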
| |
| .. _amdgpu_synid_vop3_operand_modifiers: |
| |
| VOP3 Operand Modifiers |
| ---------------------- |
| |
| Operand modifiers are not used separately. They are applied to source operands. |
| |
| .. _amdgpu_synid_abs: |
| |
| abs |
| ~~~ |
| |
| Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any). |
| Valid for floating point operands only. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| abs(<operand>) Get absolute value of operand. |
| \|<operand>| The same as above. |
| ======================================== ================================================ |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| abs(v36) |
| \|v36| |
| |
| .. _amdgpu_synid_neg: |
| |
| neg |
| ~~~ |
| |
| Computes the negated value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any). |
| Valid for floating-point operands only. |
| |
| ======================================== ================================================ |
| Syntax Description |
| ======================================== ================================================ |
| neg(<operand>) Get negative value of operand. |
| -<operand> The same as above. |
| ======================================== ================================================ |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| neg(v[0]) |
| -v4 |
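| |
| Because *abs* is applied before *neg*, combining both modifiers on one operand |
| always yields a non-positive value. A minimal Python sketch of that ordering |
| follows; the function name ``apply_source_modifiers`` and its parameters are |
| hypothetical and exist only for illustration. |
| |
| .. code-block:: python |
| |
|     def apply_source_modifiers(value, use_abs=False, use_neg=False): |
|         """Model the abs-then-neg ordering for one source operand.""" |
|         if use_abs: |
|             value = abs(value)   # abs is applied first |
|         if use_neg: |
|             value = -value       # neg is applied to the result of abs |
|         return value |
| |
|     print(apply_source_modifiers(3.0, use_abs=True, use_neg=True))   # -3.0 |
|     print(apply_source_modifiers(-3.0, use_abs=True, use_neg=True))  # -3.0 |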
| |
| VOP3P Modifiers |
| --------------- |
| |
| This section describes modifiers of *regular* VOP3P instructions. |
| |
| *v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* |
| instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`. |
| |
| GFX9 only. |
| |
| .. _amdgpu_synid_op_sel: |
| |
| op_sel |
| ~~~~~~ |
| |
| Selects the low [15:0] or high [31:16] operand bits as input to the operation |
| which results in the lower-half of the destination. |
| By default, low bits are used for all operands. |
| |
| The number of values specified by the *op_sel* modifier must match the number of source |
| operands. First value controls src0, second value controls src1 and so on. |
| |
| The value 0 selects the low bits, while 1 selects the high bits. |
| |
| ================================= ============================================================= |
| Syntax Description |
| ================================= ============================================================= |
| op_sel:[{0..1}] Select operand bits for instructions with 1 source operand. |
| op_sel:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands. |
| op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands. |
| ================================= ============================================================= |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| op_sel:[0,0] |
| op_sel:[0,1,0] |
| |
| .. _amdgpu_synid_op_sel_hi: |
| |
| op_sel_hi |
| ~~~~~~~~~ |
| |
| Selects the low [15:0] or high [31:16] operand bits as input to the operation |
| which results in the upper-half of the destination. |
| By default, high bits are used for all operands. |
| |
| The number of values specified by the *op_sel_hi* modifier must match the number of source |
| operands. First value controls src0, second value controls src1 and so on. |
| |
| The value 0 selects the low bits, while 1 selects the high bits. |
| |
| =================================== ============================================================= |
| Syntax Description |
| =================================== ============================================================= |
| op_sel_hi:[{0..1}] Select operand bits for instructions with 1 source operand. |
| op_sel_hi:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands. |
| op_sel_hi:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands. |
| =================================== ============================================================= |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| op_sel_hi:[0,0] |
| op_sel_hi:[0,0,1] |
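| |
| The way *op_sel* and *op_sel_hi* route operand halves to the two halves of the |
| destination can be sketched in Python. This is a simplified model: the helper |
| names ``half`` and ``packed_op`` are hypothetical, and 16-bit integer addition |
| stands in for whatever operation the instruction actually performs. |
| |
| .. code-block:: python |
| |
|     def half(reg, sel): |
|         """Return the low [15:0] (sel=0) or high [31:16] (sel=1) bits.""" |
|         return (reg >> 16) & 0xFFFF if sel else reg & 0xFFFF |
| |
|     def packed_op(op, srcs, op_sel=None, op_sel_hi=None): |
|         n = len(srcs) |
|         # Defaults from the text: low bits for op_sel, high bits for op_sel_hi. |
|         op_sel = [0] * n if op_sel is None else op_sel |
|         op_sel_hi = [1] * n if op_sel_hi is None else op_sel_hi |
|         # One selector value is required per source operand. |
|         assert len(op_sel) == n and len(op_sel_hi) == n |
|         lo = op(*(half(s, k) for s, k in zip(srcs, op_sel))) & 0xFFFF |
|         hi = op(*(half(s, k) for s, k in zip(srcs, op_sel_hi))) & 0xFFFF |
|         return (hi << 16) | lo |
| |
|     def add(a, b): |
|         return a + b |
| |
|     a, b = 0x00020001, 0x00400030 |
|     print(hex(packed_op(add, [a, b])))                                   # 0x420031 |
|     print(hex(packed_op(add, [a, b], op_sel=[1, 1], op_sel_hi=[0, 0])))  # 0x310042 |
| |
| The second call swaps the selectors, so the sum of the high halves lands in the |
| lower half of the destination and the sum of the low halves in the upper half. |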
| |
| .. _amdgpu_synid_neg_lo: |
| |
| neg_lo |
| ~~~~~~ |
| |
| Specifies whether to change the sign of operand values selected by |
| :ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used |
| as input to the operation which results in the lower-half of the destination. |
| |
| The number of values specified by this modifier must match the number of source |
| operands. First value controls src0, second value controls src1 and so on. |
| |
| The value 0 indicates that the corresponding operand value is used unmodified; |
| the value 1 indicates that the negated value of the operand is used. |
| |
| By default, operand values are used unmodified. |
| |
| This modifier is valid for floating point operands only. |
| |
| ================================ ================================================================== |
| Syntax Description |
| ================================ ================================================================== |
| neg_lo:[{0..1}] Select affected operands for instructions with 1 source operand. |
| neg_lo:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands. |
| neg_lo:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands. |
| ================================ ================================================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| neg_lo:[0] |
| neg_lo:[0,1] |
| |
| .. _amdgpu_synid_neg_hi: |
| |
| neg_hi |
| ~~~~~~ |
| |
| Specifies whether to change sign of operand values selected by |
| :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used |
| as input to the operation which results in the upper-half of the destination. |
| |
| The number of values specified by this modifier must match the number of source |
| operands. First value controls src0, second value controls src1 and so on. |
| |
| The value 0 indicates that the corresponding operand value is used unmodified; |
| the value 1 indicates that the negated value of the operand is used. |
| |
| By default, operand values are used unmodified. |
| |
| This modifier is valid for floating point operands only. |
| |
| =============================== ================================================================== |
| Syntax Description |
| =============================== ================================================================== |
| neg_hi:[{0..1}] Select affected operands for instructions with 1 source operand. |
| neg_hi:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands. |
| neg_hi:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands. |
| =============================== ================================================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| neg_hi:[1,0] |
| neg_hi:[0,1,1] |
| |
| clamp |
| ~~~~~ |
| |
| See a description :ref:`here<amdgpu_synid_clamp>`. |
| |
| .. _amdgpu_synid_mad_mix: |
| |
| VOP3P V_MAD_MIX Modifiers |
| ------------------------- |
| |
| *v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions |
| use *op_sel* and *op_sel_hi* modifiers |
| in a manner different from *regular* VOP3P instructions. |
| |
| See a description below. |
| |
| GFX9 only. |
| |
| .. _amdgpu_synid_mad_mix_op_sel: |
| |
| m_op_sel |
| ~~~~~~~~ |
| |
| This modifier has meaning only for 16-bit source operands as indicated by |
| :ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`. |
| It selects either the low [15:0] or high [31:16] operand bits |
| as input to the operation. |
| |
| The number of values specified by the *op_sel* modifier must match the number of source |
| operands. First value controls src0, second value controls src1 and so on. |
| |
| The value 0 selects the low bits, while 1 selects the high bits. |
| |
| By default, low bits are used for all operands. |
| |
| =============================== ================================================ |
| Syntax Description |
| =============================== ================================================ |
| op_sel:[{0..1},{0..1},{0..1}] Select location of each 16-bit source operand. |
| =============================== ================================================ |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| op_sel:[0,1,0] |
| |
| .. _amdgpu_synid_mad_mix_op_sel_hi: |
| |
| m_op_sel_hi |
| ~~~~~~~~~~~ |
| |
| Selects the size of source operands: either 32 bits or 16 bits. |
| By default, 32 bits are used for all source operands. |
| |
| The number of values specified by the *op_sel_hi* modifier must match the number of source |
| operands. First value controls src0, second value controls src1 and so on. |
| |
| The value 0 indicates 32 bits; the value 1 indicates 16 bits. |
| |
| The location of 16 bits in the operand may be specified by |
| :ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`. |
| |
| ======================================== ==================================== |
| Syntax Description |
| ======================================== ==================================== |
| op_sel_hi:[{0..1},{0..1},{0..1}] Select size of each source operand. |
| ======================================== ==================================== |
| |
| Examples: |
| |
| .. parsed-literal:: |
| |
| op_sel_hi:[1,1,1] |
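| |
| The interaction of the two modifiers for a single source operand can be sketched |
| in Python: *op_sel_hi* chooses the operand size, and *op_sel* matters only when |
| the operand is 16 bits wide. The function names below are hypothetical, and the |
| sketch assumes IEEE 754 binary16 and binary32 encodings for the operand bits. |
| |
| .. code-block:: python |
| |
|     import struct |
| |
|     def f32_from_bits(bits): |
|         """Reinterpret 32 bits as an IEEE 754 single-precision value.""" |
|         return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0] |
| |
|     def f16_from_bits(bits): |
|         """Reinterpret 16 bits as an IEEE 754 half-precision value.""" |
|         return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0] |
| |
|     def mad_mix_source(reg, op_sel=0, op_sel_hi=0): |
|         if op_sel_hi == 0: |
|             # 32-bit operand: op_sel has no meaning here. |
|             return f32_from_bits(reg) |
|         # 16-bit operand: op_sel picks the low [15:0] or high [31:16] bits. |
|         shift = 16 if op_sel else 0 |
|         return f16_from_bits(reg >> shift) |
| |
|     reg = 0x40003C00  # high half encodes f16 2.0, low half encodes f16 1.0 |
|     print(mad_mix_source(reg, op_sel=0, op_sel_hi=1))  # 1.0 |
|     print(mad_mix_source(reg, op_sel=1, op_sel_hi=1))  # 2.0 |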
| |
| abs |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_abs>`. |
| |
| neg |
| ~~~ |
| |
| See a description :ref:`here<amdgpu_synid_neg>`. |
| |
| clamp |
| ~~~~~ |
| |
| See a description :ref:`here<amdgpu_synid_clamp>`. |