| ===================================== | 
 | Syntax of AMDGPU Instruction Operands | 
 | ===================================== | 
 |  | 
 | .. contents:: | 
 |    :local: | 
 |  | 
 | Conventions | 
 | =========== | 
 |  | 
 | The following notation is used throughout this document: | 
 |  | 
 |     =================== ============================================================================= | 
 |     Notation            Description | 
 |     =================== ============================================================================= | 
 |     {0..N}              Any integer value in the range from 0 to N (inclusive). | 
 |     <x>                 Syntax and meaning of *x* are explained elsewhere. | 
 |     =================== ============================================================================= | 
 |  | 
 | .. _amdgpu_syn_operands: | 
 |  | 
 | Operands | 
 | ======== | 
 |  | 
 | .. _amdgpu_synid_v: | 
 |  | 
 | v (32-bit) | 
 | ---------- | 
 |  | 
 | Vector registers. There are 256 32-bit vector registers. | 
 |  | 
 | A sequence of *vector* registers may be used to operate with more than 32 bits of data. | 
 |  | 
 | Assembler currently supports tuples with 1 to 12, 16 and 32 *vector* registers. | 
 |  | 
 |     =================================================== ==================================================================== | 
 |     Syntax                                              Description | 
 |     =================================================== ==================================================================== | 
 |     **v**\<N>                                           A single 32-bit *vector* register. | 
 |  | 
 |                                                         *N* must be a decimal | 
 |                                                         :ref:`integer number<amdgpu_synid_integer_number>`. | 
 |     **v[**\ <N>\ **]**                                  A single 32-bit *vector* register. | 
 |  | 
 |                                                         *N* may be specified as an | 
 |                                                         :ref:`integer number<amdgpu_synid_integer_number>` | 
 |                                                         or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 |     **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers. | 
 |  | 
 |                                                         *N* and *K* may be specified as | 
 |                                                         :ref:`integer numbers<amdgpu_synid_integer_number>` | 
 |                                                         or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. | 
 |     **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers. | 
 |  | 
 |                                                         Register indices must be specified as decimal | 
 |                                                         :ref:`integer numbers<amdgpu_synid_integer_number>`. | 
 |     =================================================== ==================================================================== | 
 |  | 
 | Note: *N* and *K* must satisfy the following conditions: | 
 |  | 
 | * *N* <= *K*. | 
 | * 0 <= *N* <= 255. | 
 | * 0 <= *K* <= 255. | 
 | * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32. | 
 |  | 
 | GFX90A and GFX942 have an additional alignment requirement: | 
 | pairs of *vector* registers must be even-aligned | 
 | (first register must be even). | 
 |  | 
 | Examples: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   v255 | 
 |   v[0] | 
 |   v[0:1] | 
 |   v[1:1] | 
 |   v[0:3] | 
 |   v[2*2] | 
 |   v[1-1:2-1] | 
 |   [v252] | 
 |   [v252,v253,v254,v255] | 
 |  | 
 | .. _amdgpu_synid_nsa: | 
 |  | 
 | **Non-Sequential Address (NSA) Syntax** | 
 |  | 
 | GFX10+ *image* instructions may use special *NSA* (Non-Sequential Address) | 
 | syntax for *image addresses*: | 
 |  | 
 |     ===================================== ================================================= | 
 |     Syntax                                Description | 
 |     ===================================== ================================================= | 
 |     **[Vm**, \ **Vn**, ... **Vk**\ **]**  A sequence of 32-bit *vector* registers. | 
 |                                           Each register may be specified using the syntax | 
 |                                           defined :ref:`above<amdgpu_synid_v>`. | 
 |  | 
 |                                           In contrast with the standard syntax, registers | 
 |                                           in *NSA* sequence are not required to have | 
 |                                           consecutive indices. Moreover, the same register | 
 |                                           may appear in the sequence more than once. | 
 |  | 
 |                                           GFX11+ has an additional limitation: if address | 
 |                                           size occupies more than 5 dwords, registers | 
 |                                           starting from the 5th element must be contiguous. | 
 |     ===================================== ================================================= | 
 |  | 
 | Examples: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   [v32,v1,v[2]] | 
 |   [v[32],v[1:1],[v2]] | 
 |   [v4,v4,v4,v4] | 
 |  | 
 | .. _amdgpu_synid_v16: | 
 |  | 
 | v (16-bit) | 
 | ---------- | 
 |  | 
 | 16-bit vector registers. Each :ref:`32-bit vector register<amdgpu_synid_v>` is divided into two 16-bit low and high registers, so there are 512 16-bit vector registers. | 
 |  | 
 | Only VOP3, VOP3P and VINTERP instructions may access all 512 registers (using :ref:`op_sel<amdgpu_synid_op_sel>` modifier). | 
 | VOP1, VOP2 and VOPC instructions may currently access only 128 low 16-bit registers using the syntax described below. | 
 |  | 
 | .. WARNING:: This section is incomplete. The support of 16-bit registers in the assembler is still WIP. | 
 |  | 
 | \ | 
 |     =================================================== ==================================================================== | 
 |     Syntax                                              Description | 
 |     =================================================== ==================================================================== | 
 |     **v**\<N>                                           A single 16-bit *vector* register (low half). | 
 |     =================================================== ==================================================================== | 
 |  | 
 | Note: *N* must satisfy the following conditions: | 
 |  | 
 | * 0 <= *N* <= 127. | 
 |  | 
 | Examples: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   v127 | 
 |  | 
 | .. _amdgpu_synid_a: | 
 |  | 
 | a | 
 | - | 
 |  | 
 | Accumulator registers. There are 256 32-bit accumulator registers. | 
 |  | 
 | A sequence of *accumulator* registers may be used to operate with more than 32 bits of data. | 
 |  | 
 | Assembler currently supports tuples with 1 to 12, 16 and 32 *accumulator* registers. | 
 |  | 
 |     =================================================== ========================================================= ==================================================================== | 
 |     Syntax                                              Alternative Syntax (SP3)                                  Description | 
 |     =================================================== ========================================================= ==================================================================== | 
 |     **a**\<N>                                           **acc**\<N>                                               A single 32-bit *accumulator* register. | 
 |  | 
 |                                                                                                                   *N* must be a decimal | 
 |                                                                                                                   :ref:`integer number<amdgpu_synid_integer_number>`. | 
 |     **a[**\ <N>\ **]**                                  **acc[**\ <N>\ **]**                                      A single 32-bit *accumulator* register. | 
 |  | 
 |                                                                                                                   *N* may be specified as an | 
 |                                                                                                                   :ref:`integer number<amdgpu_synid_integer_number>` | 
 |                                                                                                                   or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 |     **a[**\ <N>:<K>\ **]**                              **acc[**\ <N>:<K>\ **]**                                  A sequence of (\ *K-N+1*\ ) *accumulator* registers. | 
 |  | 
 |                                                                                                                   *N* and *K* may be specified as | 
 |                                                                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>` | 
 |                                                                                                                   or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. | 
 |     **[a**\ <N>, \ **a**\ <N+1>, ... **a**\ <K>\ **]**  **[acc**\ <N>, \ **acc**\ <N+1>, ... **acc**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *accumulator* registers. | 
 |  | 
 |                                                                                                                   Register indices must be specified as decimal | 
 |                                                                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`. | 
 |     =================================================== ========================================================= ==================================================================== | 
 |  | 
 | Note: *N* and *K* must satisfy the following conditions: | 
 |  | 
 | * *N* <= *K*. | 
 | * 0 <= *N* <= 255. | 
 | * 0 <= *K* <= 255. | 
 | * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32. | 
 |  | 
 | GFX90A and GFX942 have an additional alignment requirement: | 
 | pairs of *accumulator* registers must be even-aligned | 
 | (first register must be even). | 
 |  | 
 | Examples: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   a255 | 
 |   a[0] | 
 |   a[0:1] | 
 |   a[1:1] | 
 |   a[0:3] | 
 |   a[2*2] | 
 |   a[1-1:2-1] | 
 |   [a252] | 
 |   [a252,a253,a254,a255] | 
 |  | 
 |   acc0 | 
 |   acc[1] | 
 |   [acc250] | 
 |   [acc2,acc3] | 
 |  | 
 | .. _amdgpu_synid_s: | 
 |  | 
 | s | 
 | - | 
 |  | 
 | Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU: | 
 |  | 
 |     ======= ============================ | 
 |     GPU     Number of *scalar* registers | 
 |     ======= ============================ | 
 |     GFX7    104 | 
 |     GFX8    102 | 
 |     GFX9    102 | 
 |     GFX10+  106 | 
 |     ======= ============================ | 
 |  | 
 | A sequence of *scalar* registers may be used to operate with more than 32 bits of data. | 
 | Assembler currently supports tuples with 1 to 12, 16 and 32 *scalar* registers. | 
 |  | 
 | Pairs of *scalar* registers must be even-aligned (first register must be even). | 
 | Sequences of 4 and more *scalar* registers must be quad-aligned. | 
 |  | 
 |     ======================================================== ==================================================================== | 
 |     Syntax                                                   Description | 
 |     ======================================================== ==================================================================== | 
 |     **s**\ <N>                                               A single 32-bit *scalar* register. | 
 |  | 
 |                                                              *N* must be a decimal | 
 |                                                              :ref:`integer number<amdgpu_synid_integer_number>`. | 
 |  | 
 |     **s[**\ <N>\ **]**                                       A single 32-bit *scalar* register. | 
 |  | 
 |                                                              *N* may be specified as an | 
 |                                                              :ref:`integer number<amdgpu_synid_integer_number>` | 
 |                                                              or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 |     **s[**\ <N>:<K>\ **]**                                   A sequence of (\ *K-N+1*\ ) *scalar* registers. | 
 |  | 
 |                                                              *N* and *K* may be specified as | 
 |                                                              :ref:`integer numbers<amdgpu_synid_integer_number>` | 
 |                                                              or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. | 
 |  | 
 |     **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**       A sequence of (\ *K-N+1*\ ) *scalar* registers. | 
 |  | 
 |                                                              Register indices must be specified as decimal | 
 |                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`. | 
 |     ======================================================== ==================================================================== | 
 |  | 
 | Note: *N* and *K* must satisfy the following conditions: | 
 |  | 
 | * *N* must be properly aligned based on the sequence size. | 
 | * *N* <= *K*. | 
 | * 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers. | 
 | * 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers. | 
 | * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32. | 
 |  | 
 | Examples: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   s0 | 
 |   s[0] | 
 |   s[0:1] | 
 |   s[1:1] | 
 |   s[0:3] | 
 |   s[2*2] | 
 |   s[1-1:2-1] | 
 |   [s4] | 
 |   [s4,s5,s6,s7] | 
 |  | 
 | Examples of *scalar* registers with an invalid alignment: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   s[1:2] | 
 |   s[2:5] | 
 |  | 
 | .. _amdgpu_synid_trap: | 
 |  | 
 | trap | 
 | ---- | 
 |  | 
 | A set of trap handler registers: | 
 |  | 
 | * :ref:`ttmp<amdgpu_synid_ttmp>` | 
 | * :ref:`tba<amdgpu_synid_tba>` | 
 | * :ref:`tma<amdgpu_synid_tma>` | 
 |  | 
 | .. _amdgpu_synid_ttmp: | 
 |  | 
 | ttmp | 
 | ---- | 
 |  | 
 | Trap handler temporary scalar registers, 32-bits wide. | 
 | The number of available *ttmp* registers depends on the GPU: | 
 |  | 
 |     ======= =========================== | 
 |     GPU     Number of *ttmp* registers | 
 |     ======= =========================== | 
 |     GFX7    12 | 
 |     GFX8    12 | 
 |     GFX9    16 | 
 |     GFX10+  16 | 
 |     ======= =========================== | 
 |  | 
 | A sequence of *ttmp* registers may be used to operate with more than 32 bits of data. | 
 | Assembler currently supports tuples with 1 to 12 and 16 *ttmp* registers. | 
 |  | 
 | Pairs of *ttmp* registers must be even-aligned (first register must be even). | 
 | Sequences of 4 and more *ttmp* registers must be quad-aligned. | 
 |  | 
 |     ============================================================= ==================================================================== | 
 |     Syntax                                                        Description | 
 |     ============================================================= ==================================================================== | 
 |     **ttmp**\ <N>                                                 A single 32-bit *ttmp* register. | 
 |  | 
 |                                                                   *N* must be a decimal | 
 |                                                                   :ref:`integer number<amdgpu_synid_integer_number>`. | 
 |     **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register. | 
 |  | 
 |                                                                   *N* may be specified as an | 
 |                                                                   :ref:`integer number<amdgpu_synid_integer_number>` | 
 |                                                                   or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 |     **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers. | 
 |  | 
 |                                                                   *N* and *K* may be specified as | 
 |                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>` | 
 |                                                                   or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. | 
 |     **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers. | 
 |  | 
 |                                                                   Register indices must be specified as decimal | 
 |                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`. | 
 |     ============================================================= ==================================================================== | 
 |  | 
 | Note: *N* and *K* must satisfy the following conditions: | 
 |  | 
 | * *N* must be properly aligned based on the sequence size. | 
 | * *N* <= *K*. | 
 | * 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers. | 
 | * 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers. | 
 | * *K-N+1* must be in the range from 1 to 12 or equal to 16. | 
 |  | 
 | Examples: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   ttmp0 | 
 |   ttmp[0] | 
 |   ttmp[0:1] | 
 |   ttmp[1:1] | 
 |   ttmp[0:3] | 
 |   ttmp[2*2] | 
 |   ttmp[1-1:2-1] | 
 |   [ttmp4] | 
 |   [ttmp4,ttmp5,ttmp6,ttmp7] | 
 |  | 
 | Examples of *ttmp* registers with an invalid alignment: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |   ttmp[1:2] | 
 |   ttmp[2:5] | 
 |  | 
 | .. _amdgpu_synid_tba: | 
 |  | 
 | tba | 
 | --- | 
 |  | 
 | Trap base address, 64-bits wide. Holds the pointer to the current | 
 | trap handler program. | 
 |  | 
 |     ================== ======================================================================= ============= | 
 |     Syntax             Description                                                             Availability | 
 |     ================== ======================================================================= ============= | 
 |     tba                64-bit *trap base address* register.                                    GFX7, GFX8 | 
 |     [tba]              64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8 | 
 |     [tba_lo,tba_hi]    64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8 | 
 |     ================== ======================================================================= ============= | 
 |  | 
 | High and low 32 bits of *trap base address* may be accessed as separate registers: | 
 |  | 
 |     ================== ======================================================================= ============= | 
 |     Syntax             Description                                                             Availability | 
 |     ================== ======================================================================= ============= | 
 |     tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8 | 
 |     tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8 | 
 |     [tba_lo]           Low 32 bits of *trap base address* register (an SP3 syntax).            GFX7, GFX8 | 
 |     [tba_hi]           High 32 bits of *trap base address* register (an SP3 syntax).           GFX7, GFX8 | 
 |     ================== ======================================================================= ============= | 
 |  | 
 | .. _amdgpu_synid_tma: | 
 |  | 
 | tma | 
 | --- | 
 |  | 
 | Trap memory address, 64-bits wide. | 
 |  | 
 |     ================= ======================================================================= ================== | 
 |     Syntax            Description                                                             Availability | 
 |     ================= ======================================================================= ================== | 
 |     tma               64-bit *trap memory address* register.                                  GFX7, GFX8 | 
 |     [tma]             64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8 | 
 |     [tma_lo,tma_hi]   64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8 | 
 |     ================= ======================================================================= ================== | 
 |  | 
 | High and low 32 bits of *trap memory address* may be accessed as separate registers: | 
 |  | 
 |     ================= ======================================================================= ================== | 
 |     Syntax            Description                                                             Availability | 
 |     ================= ======================================================================= ================== | 
 |     tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8 | 
 |     tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8 | 
 |     [tma_lo]          Low 32 bits of *trap memory address* register (an SP3 syntax).          GFX7, GFX8 | 
 |     [tma_hi]          High 32 bits of *trap memory address* register (an SP3 syntax).         GFX7, GFX8 | 
 |     ================= ======================================================================= ================== | 
 |  | 
 | .. _amdgpu_synid_flat_scratch: | 
 |  | 
 | flat_scratch | 
 | ------------ | 
 |  | 
 | Flat scratch address, 64-bits wide. Holds the base address of scratch memory. | 
 |  | 
 |     ================================== ================================================================ | 
 |     Syntax                             Description | 
 |     ================================== ================================================================ | 
 |     flat_scratch                       64-bit *flat scratch* address register. | 
 |     [flat_scratch]                     64-bit *flat scratch* address register (an SP3 syntax). | 
 |     [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an SP3 syntax). | 
 |     ================================== ================================================================ | 
 |  | 
 | High and low 32 bits of *flat scratch* address may be accessed as separate registers: | 
 |  | 
 |     ========================= ========================================================================= | 
 |     Syntax                    Description | 
 |     ========================= ========================================================================= | 
 |     flat_scratch_lo           Low 32 bits of *flat scratch* address register. | 
 |     flat_scratch_hi           High 32 bits of *flat scratch* address register. | 
 |     [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an SP3 syntax). | 
 |     [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an SP3 syntax). | 
 |     ========================= ========================================================================= | 
 |  | 
 | .. _amdgpu_synid_xnack: | 
 | .. _amdgpu_synid_xnack_mask: | 
 |  | 
 | xnack_mask | 
 | ---------- | 
 |  | 
 | Xnack mask, 64-bits wide. Holds a 64-bit mask of which threads | 
 | received an *XNACK* due to a vector memory operation. | 
 |  | 
 | For availability of *xnack* feature, refer to :ref:`this table<amdgpu-processors>`. | 
 |  | 
 |     ============================== ===================================================== | 
 |     Syntax                         Description | 
 |     ============================== ===================================================== | 
 |     xnack_mask                     64-bit *xnack mask* register. | 
 |     [xnack_mask]                   64-bit *xnack mask* register (an SP3 syntax). | 
 |     [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an SP3 syntax). | 
 |     ============================== ===================================================== | 
 |  | 
 | High and low 32 bits of *xnack mask* may be accessed as separate registers: | 
 |  | 
 |     ===================== ============================================================== | 
 |     Syntax                Description | 
 |     ===================== ============================================================== | 
 |     xnack_mask_lo         Low 32 bits of *xnack mask* register. | 
 |     xnack_mask_hi         High 32 bits of *xnack mask* register. | 
 |     [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an SP3 syntax). | 
 |     [xnack_mask_hi]       High 32 bits of *xnack mask* register (an SP3 syntax). | 
 |     ===================== ============================================================== | 
 |  | 
 | .. _amdgpu_synid_vcc: | 
 | .. _amdgpu_synid_vcc_lo: | 
 |  | 
 | vcc | 
 | --- | 
 |  | 
 | Vector condition code, 64-bits wide. A bit mask with one bit per thread; | 
 | it holds the result of a vector compare operation. | 
 |  | 
 | Note that GFX10+ H/W does not use high 32 bits of *vcc* in *wave32* mode. | 
 |  | 
 |     ================ ========================================================================= | 
 |     Syntax           Description | 
 |     ================ ========================================================================= | 
 |     vcc              64-bit *vector condition code* register. | 
 |     [vcc]            64-bit *vector condition code* register (an SP3 syntax). | 
 |     [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an SP3 syntax). | 
 |     ================ ========================================================================= | 
 |  | 
 | High and low 32 bits of *vector condition code* may be accessed as separate registers: | 
 |  | 
 |     ================ ========================================================================= | 
 |     Syntax           Description | 
 |     ================ ========================================================================= | 
 |     vcc_lo           Low 32 bits of *vector condition code* register. | 
 |     vcc_hi           High 32 bits of *vector condition code* register. | 
 |     [vcc_lo]         Low 32 bits of *vector condition code* register (an SP3 syntax). | 
 |     [vcc_hi]         High 32 bits of *vector condition code* register (an SP3 syntax). | 
 |     ================ ========================================================================= | 
 |  | 
 | .. _amdgpu_synid_m0: | 
 |  | 
 | m0 | 
 | -- | 
 |  | 
 | A 32-bit memory register. It has various uses, | 
 | including register indexing and bounds checking. | 
 |  | 
 |     =========== =================================================== | 
 |     Syntax      Description | 
 |     =========== =================================================== | 
 |     m0          A 32-bit *memory* register. | 
 |     [m0]        A 32-bit *memory* register (an SP3 syntax). | 
 |     =========== =================================================== | 
 |  | 
 | .. _amdgpu_synid_exec: | 
 |  | 
 | exec | 
 | ---- | 
 |  | 
 | Execute mask, 64-bits wide. A bit mask with one bit per thread, | 
 | which is applied to vector instructions and controls which threads execute | 
 | and which ignore the instruction. | 
 |  | 
 | Note that GFX10+ H/W does not use high 32 bits of *exec* in *wave32* mode. | 
 |  | 
 |     ===================== ================================================================= | 
 |     Syntax                Description | 
 |     ===================== ================================================================= | 
 |     exec                  64-bit *execute mask* register. | 
 |     [exec]                64-bit *execute mask* register (an SP3 syntax). | 
 |     [exec_lo,exec_hi]     64-bit *execute mask* register (an SP3 syntax). | 
 |     ===================== ================================================================= | 
 |  | 
 | High and low 32 bits of *execute mask* may be accessed as separate registers: | 
 |  | 
 |     ===================== ================================================================= | 
 |     Syntax                Description | 
 |     ===================== ================================================================= | 
 |     exec_lo               Low 32 bits of *execute mask* register. | 
 |     exec_hi               High 32 bits of *execute mask* register. | 
 |     [exec_lo]             Low 32 bits of *execute mask* register (an SP3 syntax). | 
 |     [exec_hi]             High 32 bits of *execute mask* register (an SP3 syntax). | 
 |     ===================== ================================================================= | 
 |  | 
 | .. _amdgpu_synid_vccz: | 
 |  | 
 | vccz | 
 | ---- | 
 |  | 
 | A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` | 
 | is all zeros. | 
 |  | 
 | Note: when GFX10+ operates in *wave32* mode, this register reflects | 
 | the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`. | 
 |  | 
 | .. _amdgpu_synid_execz: | 
 |  | 
 | execz | 
 | ----- | 
 |  | 
 | A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` | 
 | is all zeros. | 
 |  | 
 | Note: when GFX10+ operates in *wave32* mode, this register reflects | 
 | the state of :ref:`exec_lo<amdgpu_synid_exec>`. | 
 |  | 
 | .. _amdgpu_synid_scc: | 
 |  | 
 | scc | 
 | --- | 
 |  | 
 | A single bit flag indicating the result of a scalar compare operation. | 
 |  | 
 | .. _amdgpu_synid_lds_direct: | 
 |  | 
 | lds_direct | 
 | ---------- | 
 |  | 
 | A special operand which supplies a 32-bit value | 
 | fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address. | 
 |  | 
 | .. _amdgpu_synid_null: | 
 |  | 
 | null | 
 | ---- | 
 |  | 
 | This is a special operand that may be used as a source or a destination. | 
 |  | 
 | When used as a destination, the result of the operation is discarded. | 
 |  | 
 | When used as a source, it supplies zero value. | 
 |  | 
 | .. _amdgpu_synid_constant: | 
 |  | 
 | inline constant | 
 | --------------- | 
 |  | 
 | An *inline constant* is an integer or a floating-point value | 
 | encoded as a part of an instruction. Compare *inline constants* | 
 | with :ref:`literals<amdgpu_synid_literal>`. | 
 |  | 
 | Inline constants include: | 
 |  | 
 | * :ref:`Integer inline constants<amdgpu_synid_iconst>`; | 
 | * :ref:`Floating-point inline constants<amdgpu_synid_fconst>`; | 
 | * :ref:`Inline values<amdgpu_synid_ival>`. | 
 |  | 
 | If a number may be encoded as either | 
 | a :ref:`literal<amdgpu_synid_literal>` or | 
 | a :ref:`constant<amdgpu_synid_constant>`, | 
 | the assembler selects the latter encoding as more efficient. | 
 |  | 
 | .. _amdgpu_synid_iconst: | 
 |  | 
 | iconst | 
 | ~~~~~~ | 
 |  | 
 | An :ref:`integer number<amdgpu_synid_integer_number>` or | 
 | an :ref:`absolute expression<amdgpu_synid_absolute_expression>` | 
 | encoded as an *inline constant*. | 
 |  | 
 | Only a small fraction of integer numbers may be encoded as *inline constants*. | 
 | They are enumerated in the table below. | 
 | Other integer numbers are encoded as :ref:`literals<amdgpu_synid_literal>`. | 
 |  | 
 |     ================================== ==================================== | 
 |     Value                              Note | 
 |     ================================== ==================================== | 
 |     {0..64}                            Positive integer inline constants. | 
 |     {-16..-1}                          Negative integer inline constants. | 
 |     ================================== ==================================== | 
 |  | 
 | .. _amdgpu_synid_fconst: | 
 |  | 
 | fconst | 
 | ~~~~~~ | 
 |  | 
 | A :ref:`floating-point number<amdgpu_synid_floating-point_number>` | 
 | encoded as an *inline constant*. | 
 |  | 
 | Only a small fraction of floating-point numbers may be encoded | 
 | as *inline constants*. They are enumerated in the table below. | 
 | Other floating-point numbers are encoded as | 
 | :ref:`literals<amdgpu_synid_literal>`. | 
 |  | 
 |     ===================== ===================================================== ================== | 
 |     Value                 Note                                                  Availability | 
 |     ===================== ===================================================== ================== | 
 |     0.0                   The same as integer constant 0.                       All GPUs | 
 |     0.5                   Floating-point constant 0.5                           All GPUs | 
 |     1.0                   Floating-point constant 1.0                           All GPUs | 
 |     2.0                   Floating-point constant 2.0                           All GPUs | 
 |     4.0                   Floating-point constant 4.0                           All GPUs | 
 |     -0.5                  Floating-point constant -0.5                          All GPUs | 
 |     -1.0                  Floating-point constant -1.0                          All GPUs | 
 |     -2.0                  Floating-point constant -2.0                          All GPUs | 
 |     -4.0                  Floating-point constant -4.0                          All GPUs | 
 |     0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8+ | 
 |     0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8+ | 
 |     0.15915494309189532   1.0/(2.0*pi).                                         GFX8+ | 
 |     ===================== ===================================================== ================== | 
 |  | 
 | .. WARNING:: Floating-point inline constants cannot be used with *16-bit integer* operands. \ | 
 |              Assembler encodes these values as literals. | 
 |  | 
 | .. _amdgpu_synid_ival: | 
 |  | 
 | ival | 
 | ~~~~ | 
 |  | 
 | A symbolic operand encoded as an *inline constant*. | 
 | These operands provide read-only access to H/W registers. | 
 |  | 
 |     ===================== ========================= ================================================ ============= | 
 |     Syntax                Alternative Syntax (SP3)  Note                                             Availability | 
 |     ===================== ========================= ================================================ ============= | 
 |     shared_base           src_shared_base           Base address of shared memory region.            GFX9+ | 
 |     shared_limit          src_shared_limit          Address of the end of shared memory region.      GFX9+ | 
 |     private_base          src_private_base          Base address of private memory region.           GFX9+ | 
 |     private_limit         src_private_limit         Address of the end of private memory region.     GFX9+ | 
 |     pops_exiting_wave_id  src_pops_exiting_wave_id  A dedicated counter for POPS.                    GFX9, GFX10 | 
 |     ===================== ========================= ================================================ ============= | 
 |  | 
 | .. _amdgpu_synid_literal: | 
 |  | 
 | literal | 
 | ------- | 
 |  | 
 | A *literal* is a 64-bit value encoded as a separate | 
 | 32-bit dword in the instruction stream. Compare *literals* | 
 | with :ref:`inline constants<amdgpu_synid_constant>`. | 
 |  | 
 | If a number may be encoded as either | 
 | a :ref:`literal<amdgpu_synid_literal>` or | 
 | an :ref:`inline constant<amdgpu_synid_constant>`, | 
 | assembler selects the latter encoding as more efficient. | 
 |  | 
 | Literals may be specified as | 
 | :ref:`integer numbers<amdgpu_synid_integer_number>`, | 
 | :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`, | 
 | :ref:`absolute expressions<amdgpu_synid_absolute_expression>` or | 
 | :ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`. | 
 |  | 
 | An instruction may use only one literal, | 
 | but several operands may refer to the same literal. | 
 |  | 
 | .. _amdgpu_synid_uimm8: | 
 |  | 
 | uimm8 | 
 | ----- | 
 |  | 
 | An 8-bit :ref:`integer number<amdgpu_synid_integer_number>` | 
 | or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 | The value must be in the range 0..0xFF. | 
 |  | 
 | .. _amdgpu_synid_uimm32: | 
 |  | 
 | uimm32 | 
 | ------ | 
 |  | 
 | A 32-bit :ref:`integer number<amdgpu_synid_integer_number>` | 
 | or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 | The value must be in the range 0..0xFFFFFFFF. | 
 |  | 
 | .. _amdgpu_synid_uimm20: | 
 |  | 
 | uimm20 | 
 | ------ | 
 |  | 
 | A 20-bit :ref:`integer number<amdgpu_synid_integer_number>` | 
 | or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 |  | 
 | The value must be in the range 0..0xFFFFF. | 
 |  | 
 | .. _amdgpu_synid_simm21: | 
 |  | 
 | simm21 | 
 | ------ | 
 |  | 
 | A 21-bit :ref:`integer number<amdgpu_synid_integer_number>` | 
 | or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. | 
 |  | 
 | The value must be in the range -0x100000..0x0FFFFF. | 
 |  | 
 | .. _amdgpu_synid_off: | 
 |  | 
 | off | 
 | --- | 
 |  | 
 | A special entity which indicates that the value of this operand is not used. | 
 |  | 
 |     ================================== =================================================== | 
 |     Syntax                             Description | 
 |     ================================== =================================================== | 
 |     off                                Indicates an unused operand. | 
 |     ================================== =================================================== | 
 |  | 
 |  | 
 | .. _amdgpu_synid_number: | 
 |  | 
 | Numbers | 
 | ======= | 
 |  | 
 | .. _amdgpu_synid_integer_number: | 
 |  | 
 | Integer Numbers | 
 | --------------- | 
 |  | 
 | Integer numbers are 64 bits wide. | 
 | They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>` | 
 | as described :ref:`here<amdgpu_synid_int_conv>`. | 
 |  | 
 | Integer numbers may be specified in binary, octal, | 
 | hexadecimal and decimal formats: | 
 |  | 
 |     ============ =============================== ======== | 
 |     Format       Syntax                          Example | 
 |     ============ =============================== ======== | 
 |     Decimal      [-]?[1-9][0-9]*                 -1234 | 
 |     Binary       [-]?0b[01]+                     0b1010 | 
 |     Octal        [-]?0[0-7]+                     010 | 
 |     Hexadecimal  [-]?0x[0-9a-fA-F]+              0xff | 
 |     \            [-]?[0x]?[0-9][0-9a-fA-F]*[hH]  0ffh | 
 |     ============ =============================== ======== | 
 |  | 
 | .. _amdgpu_synid_floating-point_number: | 
 |  | 
 | Floating-Point Numbers | 
 | ---------------------- | 
 |  | 
 | All floating-point numbers are handled as double (64 bits wide). | 
 | They are converted to | 
 | :ref:`expected operand type<amdgpu_syn_instruction_type>` | 
 | as described :ref:`here<amdgpu_synid_fp_conv>`. | 
 |  | 
 | Floating-point numbers may be specified in hexadecimal and decimal formats: | 
 |  | 
 |     ============ ======================================================== ====================== ==================== | 
 |     Format       Syntax                                                   Examples               Note | 
 |     ============ ======================================================== ====================== ==================== | 
 |     Decimal      [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)?                    -1.234, 234e2          Must include either | 
 |                                                                                                  a decimal separator | 
 |                                                                                                  or an exponent. | 
 |     Hexadecimal  [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+   -0x1afp-10, 0x.1afp10 | 
 |     ============ ======================================================== ====================== ==================== | 
 |  | 
 | .. _amdgpu_synid_expression: | 
 |  | 
 | Expressions | 
 | =========== | 
 |  | 
 | An expression is evaluated to a 64-bit integer. | 
 | Note that floating-point expressions are not supported. | 
 |  | 
 | There are two kinds of expressions: | 
 |  | 
 | * :ref:`Absolute<amdgpu_synid_absolute_expression>`. | 
 | * :ref:`Relocatable<amdgpu_synid_relocatable_expression>`. | 
 |  | 
 | .. _amdgpu_synid_absolute_expression: | 
 |  | 
 | Absolute Expressions | 
 | -------------------- | 
 |  | 
 | The value of an absolute expression does not change after program relocation. | 
 | Absolute expressions must not include unassigned and relocatable values | 
 | such as labels. | 
 |  | 
 | Absolute expressions are evaluated to 64-bit integer values and converted to | 
 | :ref:`expected operand type<amdgpu_syn_instruction_type>` | 
 | as described :ref:`here<amdgpu_synid_int_conv>`. | 
 |  | 
 | Examples: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     x = -1 | 
 |     y = x + 10 | 
 |  | 
 | .. _amdgpu_synid_relocatable_expression: | 
 |  | 
 | Relocatable Expressions | 
 | ----------------------- | 
 |  | 
 | The value of a relocatable expression depends on program relocation. | 
 |  | 
 | Note that use of relocatable expressions is limited to branch targets | 
 | and 32-bit integer operands. | 
 |  | 
 | A relocatable expression is evaluated to a 64-bit integer value, | 
 | which depends on operand kind and | 
 | :ref:`relocation type<amdgpu-relocation-records>` of symbol(s) | 
 | used in the expression. For example, if an instruction refers to a label, | 
 | this reference is evaluated to an offset from the address after | 
 | the instruction to the label address: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     label: | 
 |     v_add_co_u32_e32 v0, vcc, label, v1  // 'label' operand is evaluated to -4 | 
 |  | 
 | Note that values of relocatable expressions are usually unknown | 
 | at assembly time; they are resolved later by a linker and converted to | 
 | :ref:`expected operand type<amdgpu_syn_instruction_type>` | 
 | as described :ref:`here<amdgpu_synid_rl_conv>`. | 
 |  | 
 | Operands and Operations | 
 | ----------------------- | 
 |  | 
 | Expressions are composed of 64-bit integer operands and operations. | 
 | Operands include :ref:`integer numbers<amdgpu_synid_integer_number>` | 
 | and :ref:`symbols<amdgpu_synid_symbol>`. | 
 |  | 
 | Expressions may also use "." which is a reference | 
 | to the current PC (program counter). | 
 |  | 
 | :ref:`Unary<amdgpu_synid_expression_un_op>` and | 
 | :ref:`binary<amdgpu_synid_expression_bin_op>` | 
 | operations produce 64-bit integer results. | 
 |  | 
 | Syntax of Expressions | 
 | --------------------- | 
 |  | 
 | Syntax of expressions is shown below:: | 
 |  | 
 |     expr ::= expr binop expr | primaryexpr ; | 
 |  | 
 |     primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ; | 
 |  | 
 |     binop ::= '&&' | 
 |             | '||' | 
 |             | '|' | 
 |             | '^' | 
 |             | '&' | 
 |             | '!' | 
 |             | '==' | 
 |             | '!=' | 
 |             | '<>' | 
 |             | '<' | 
 |             | '<=' | 
 |             | '>' | 
 |             | '>=' | 
 |             | '<<' | 
 |             | '>>' | 
 |             | '+' | 
 |             | '-' | 
 |             | '*' | 
 |             | '/' | 
 |             | '%' ; | 
 |  | 
 |     unop ::= '~' | 
 |            | '+' | 
 |            | '-' | 
 |            | '!' ; | 
 |  | 
 | .. _amdgpu_synid_expression_bin_op: | 
 |  | 
 | Binary Operators | 
 | ---------------- | 
 |  | 
 | Binary operators are described in the following table. | 
 | They operate on and produce 64-bit integers. | 
 | Operators with higher priority are performed first. | 
 |  | 
 |     ========== ========= =============================================== | 
 |     Operator   Priority  Meaning | 
 |     ========== ========= =============================================== | 
 |        \*         5      Integer multiplication. | 
 |        /          5      Integer division. | 
 |        %          5      Integer signed remainder. | 
 |        \+         4      Integer addition. | 
 |        \-         4      Integer subtraction. | 
 |        <<         3      Integer shift left. | 
 |        >>         3      Logical shift right. | 
 |        ==         2      Equality comparison. | 
 |        !=         2      Inequality comparison. | 
 |        <>         2      Inequality comparison. | 
 |        <          2      Signed less than comparison. | 
 |        <=         2      Signed less than or equal comparison. | 
 |        >          2      Signed greater than comparison. | 
 |        >=         2      Signed greater than or equal comparison. | 
 |       \|          1      Bitwise or. | 
 |        ^          1      Bitwise xor. | 
 |        &          1      Bitwise and. | 
 |        &&         0      Logical and. | 
 |        ||         0      Logical or. | 
 |     ========== ========= =============================================== | 
 |  | 
 | .. _amdgpu_synid_expression_un_op: | 
 |  | 
 | Unary Operators | 
 | --------------- | 
 |  | 
 | Unary operators are described in the following table. | 
 | They operate on and produce 64-bit integers. | 
 |  | 
 |     ========== =============================================== | 
 |     Operator   Meaning | 
 |     ========== =============================================== | 
 |        !       Logical negation. | 
 |        ~       Bitwise negation. | 
 |        \+      Integer unary plus. | 
 |        \-      Integer unary minus. | 
 |     ========== =============================================== | 
 |  | 
 | .. _amdgpu_synid_symbol: | 
 |  | 
 | Symbols | 
 | ------- | 
 |  | 
 | A symbol is a named 64-bit integer value, representing a relocatable | 
 | address or an absolute (non-relocatable) number. | 
 |  | 
 | Symbol names have the following syntax: | 
 |     ``[a-zA-Z_.][a-zA-Z0-9_$.@]*`` | 
 |  | 
 | The table below provides several examples of syntax used for symbol definition. | 
 |  | 
 |     ================ ========================================================== | 
 |     Syntax           Meaning | 
 |     ================ ========================================================== | 
 |     .globl <S>       Declares a global symbol S without assigning it a value. | 
 |     .set <S>, <E>    Assigns the value of an expression E to a symbol S. | 
 |     <S> = <E>        Assigns the value of an expression E to a symbol S. | 
 |     <S>:             Declares a label S and assigns it the current PC value. | 
 |     ================ ========================================================== | 
 |  | 
 | A symbol may be used before it is declared or assigned; | 
 | unassigned symbols are assumed to be PC-relative. | 
 |  | 
 | Additional information about symbols may be found :ref:`here<amdgpu-symbols>`. | 
 |  | 
 | .. _amdgpu_synid_conv: | 
 |  | 
 | Type and Size Conversion | 
 | ======================== | 
 |  | 
 | This section describes what happens when a 64-bit | 
 | :ref:`integer number<amdgpu_synid_integer_number>`, a | 
 | :ref:`floating-point number<amdgpu_synid_floating-point_number>` or an | 
 | :ref:`expression<amdgpu_synid_expression>` | 
 | is used for an operand which has a different type or size. | 
 |  | 
 | .. _amdgpu_synid_int_conv: | 
 |  | 
 | Conversion of Integer Values | 
 | ---------------------------- | 
 |  | 
 | Instruction operands may be specified as 64-bit | 
 | :ref:`integer numbers<amdgpu_synid_integer_number>` or | 
 | :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. | 
 | These values are converted to the | 
 | :ref:`expected operand type<amdgpu_syn_instruction_type>` | 
 | using the following steps: | 
 |  | 
 | 1. *Validation*. Assembler checks if the input value may be truncated | 
 | without loss to the required *truncation width* (see the table below). | 
 | There are two cases when this operation is enabled: | 
 |  | 
 |     * The truncated bits are all 0. | 
 |     * The truncated bits are all 1 and the value after truncation has its MSB bit set. | 
 |  | 
 | In all other cases, the assembler triggers an error. | 
 |  | 
 | 2. *Conversion*. The input value is converted to the expected type | 
 | as described in the table below. Depending on operand kind, this conversion | 
 | is performed by either assembler or AMDGPU H/W (or both). | 
 |  | 
 |     ============== ================= =============== ==================================================================== | 
 |     Expected type  Truncation Width  Conversion      Description | 
 |     ============== ================= =============== ==================================================================== | 
 |     i16, u16, b16  16                num.u16         Truncate to 16 bits. | 
 |     i32, u32, b32  32                num.u32         Truncate to 32 bits. | 
 |     i64            32                {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits. | 
 |     u64, b64       32                {0,num.u32}     Truncate to 32 bits and then zero-extend the result to 64 bits. | 
 |     f16            16                num.u16         Use low 16 bits as an f16 value. | 
 |     f32            32                num.u32         Use low 32 bits as an f32 value. | 
 |     f64            32                {num.u32,0}     Use low 32 bits of the number as high 32 bits | 
 |                                                      of the result; low 32 bits of the result are zeroed. | 
 |     ============== ================= =============== ==================================================================== | 
 |  | 
 | Examples of enabled conversions: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     // GFX9 | 
 |  | 
 |     v_add_u16 v0, -1, 0                   // src0 = 0xFFFF | 
 |     v_add_f16 v0, -1, 0                   // src0 = 0xFFFF (NaN) | 
 |                                           // | 
 |     v_add_u32 v0, -1, 0                   // src0 = 0xFFFFFFFF | 
 |     v_add_f32 v0, -1, 0                   // src0 = 0xFFFFFFFF (NaN) | 
 |                                           // | 
 |     v_add_u16 v0, 0xff00, v0              // src0 = 0xff00 | 
 |     v_add_u16 v0, 0xffffffffffffff00, v0  // src0 = 0xff00 | 
 |     v_add_u16 v0, -256, v0                // src0 = 0xff00 | 
 |                                           // | 
 |     s_bfe_i64 s[0:1], 0xffefffff, s3      // src0 = 0xffffffffffefffff | 
 |     s_bfe_u64 s[0:1], 0xffefffff, s3      // src0 = 0x00000000ffefffff | 
 |     v_ceil_f64_e32 v[0:1], 0xffefffff     // src0 = 0xffefffff00000000 (-1.7976922776554302e308) | 
 |                                           // | 
 |     x = 0xffefffff                        // | 
 |     s_bfe_i64 s[0:1], x, s3               // src0 = 0xffffffffffefffff | 
 |     s_bfe_u64 s[0:1], x, s3               // src0 = 0x00000000ffefffff | 
 |     v_ceil_f64_e32 v[0:1], x              // src0 = 0xffefffff00000000 (-1.7976922776554302e308) | 
 |  | 
 | Examples of disabled conversions: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     // GFX9 | 
 |  | 
 |     v_add_u16 v0, 0x1ff00, v0               // truncated bits are not all 0 or 1 | 
 |     v_add_u16 v0, 0xffffffffffff00ff, v0    // truncated bits do not match MSB of the result | 
 |  | 
 | .. _amdgpu_synid_fp_conv: | 
 |  | 
 | Conversion of Floating-Point Values | 
 | ----------------------------------- | 
 |  | 
 | Instruction operands may be specified as 64-bit | 
 | :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`. | 
 | These values are converted to the | 
 | :ref:`expected operand type<amdgpu_syn_instruction_type>` | 
 | using the following steps: | 
 |  | 
 | 1. *Validation*. Assembler checks if the input f64 number can be converted | 
 | to the *required floating-point type* (see the table below) without overflow | 
 | or underflow. Precision lost is allowed. If this conversion is not possible, | 
 | the assembler triggers an error. | 
 |  | 
 | 2. *Conversion*. The input value is converted to the expected type | 
 | as described in the table below. Depending on operand kind, this is | 
 | performed by either assembler or AMDGPU H/W (or both). | 
 |  | 
 |     ============== ================ ================= ================================================================= | 
 |     Expected type  Required FP Type Conversion        Description | 
 |     ============== ================ ================= ================================================================= | 
 |     i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value. | 
 |                                                       The value has to be encoded as a literal, or an error occurs. | 
 |                                                       Note that the value cannot be encoded as an inline constant. | 
 |     i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value. | 
 |     i64, u64, b64  \-               \-                Conversion disabled. | 
 |     f16            f16              f16(num)          Convert to f16. | 
 |     f32            f32              f32(num)          Convert to f32. | 
 |     f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result; | 
 |                                                       zero-fill low 32 bits of the result. | 
 |  | 
 |                                                       Note that the result may differ from the original number. | 
 |     ============== ================ ================= ================================================================= | 
 |  | 
 | Examples of enabled conversions: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     // GFX9 | 
 |  | 
 |     v_add_f16 v0, 1.0, 0        // src0 = 0x3C00 (1.0) | 
 |     v_add_u16 v0, 1.0, 0        // src0 = 0x3C00 | 
 |                                 // | 
 |     v_add_f32 v0, 1.0, 0        // src0 = 0x3F800000 (1.0) | 
 |     v_add_u32 v0, 1.0, 0        // src0 = 0x3F800000 | 
 |  | 
 |                                 // src0 before conversion: | 
 |                                 //   1.7976931348623157e308 = 0x7fefffffffffffff | 
 |                                 // src0 after conversion: | 
 |                                 //   1.7976922776554302e308 = 0x7fefffff00000000 | 
 |     v_ceil_f64 v[0:1], 1.7976931348623157e308 | 
 |  | 
 |     v_add_f16 v1, 65500.0, v2   // ok for f16. | 
 |     v_add_f32 v1, 65600.0, v2   // ok for f32, but would result in overflow for f16. | 
 |  | 
 | Examples of disabled conversions: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     // GFX9 | 
 |  | 
 |     v_add_f16 v1, 65600.0, v2    // overflow | 
 |  | 
 | .. _amdgpu_synid_rl_conv: | 
 |  | 
 | Conversion of Relocatable Values | 
 | -------------------------------- | 
 |  | 
 | :ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>` | 
 | may be used with 32-bit integer operands and jump targets. | 
 |  | 
 | When the value of a relocatable expression is resolved by a linker, it is | 
 | converted as needed and truncated to the operand size. The conversion depends | 
 | on :ref:`relocation type<amdgpu-relocation-records>` and operand kind. | 
 |  | 
 | For example, when a 32-bit operand of an instruction refers | 
 | to a relocatable expression *expr*, this reference is evaluated | 
 | to a 64-bit offset from the address after the | 
 | instruction to the address being referenced, *counted in bytes*. | 
 | Then the value is truncated to 32 bits and encoded as a literal: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     expr = . | 
 |     v_add_co_u32_e32 v0, vcc, expr, v1  // 'expr' operand is evaluated to -4 | 
 |                                         // and then truncated to 0xFFFFFFFC | 
 |  | 
 | As another example, when a branch instruction refers to a label, | 
 | this reference is evaluated to an offset from the address after the | 
 | instruction to the label address, *counted in dwords*. | 
 | Then the value is truncated to 16 bits: | 
 |  | 
 | .. parsed-literal:: | 
 |  | 
 |     label: | 
 |     s_branch label  // 'label' operand is evaluated to -1 and truncated to 0xFFFF |