| ============================== |
| User Guide for AMDGPU Back-end |
| ============================== |
| |
| Introduction |
| ============ |
| |
| The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600 |
| family through the current Volcanic Islands (GCN Gen 3). |
| |
| |
| Assembler |
| ========= |
| |
| The assembler is currently considered experimental. |
| |
| For syntax examples, see test/MC/AMDGPU. |
| |
| Some of the currently supported features are listed below (modulo bugs). They |
| all apply to the Southern Islands ISA. Sea Islands and Volcanic Islands are |
| also supported, but may be missing some instructions and have more bugs. |
| |
| DS Instructions |
| --------------- |
| All DS instructions are supported. |
| |
| FLAT Instructions |
| ----------------- |
| These instructions are only present in the Sea Islands and Volcanic Islands |
| instruction sets. All FLAT instructions are supported for these architectures. |
| |
| MUBUF Instructions |
| ------------------ |
| All non-atomic MUBUF instructions are supported. |
| |
| SMRD Instructions |
| ----------------- |
| Only the s_load_dword* SMRD instructions are supported. |
| |
| SOP1 Instructions |
| ----------------- |
| All SOP1 instructions are supported. |
| |
| SOP2 Instructions |
| ----------------- |
| All SOP2 instructions are supported. |
| |
| SOPC Instructions |
| ----------------- |
| All SOPC instructions are supported. |
| |
| SOPP Instructions |
| ----------------- |
| |
| Unless otherwise mentioned, all SOPP instructions that have one or more |
| operands accept integer operands only. No verification is performed |
| on the operands, so it is up to the programmer to be familiar with the |
| range of acceptable values. |
| |
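| As a brief illustration of the integer-operand form, the lines below are a |
| sketch rather than an exhaustive list. The operand values are arbitrary and |
| chosen only for illustration. Because the assembler does not check them, the |
| valid range for each instruction has to be taken from the ISA documentation. |
| |
| .. code-block:: nasm |
| |
| // SOPP instructions that take a single integer operand. |
| s_nop 0 |
| s_sleep 1 |
| |
| // SOPP instructions that take no operands. |
| s_barrier |
| s_endpgm |
| |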
| s_waitcnt |
| ^^^^^^^^^ |
| |
| s_waitcnt accepts named arguments to specify which memory counter(s) to |
| wait for. |
| |
| .. code-block:: nasm |
| |
| // Wait for all counters to be 0 |
| s_waitcnt 0 |
| |
| // Equivalent to s_waitcnt 0. Counter names can also be delimited by |
| // '&' or ','. |
| s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) |
| |
| // Wait for vmcnt counter to be 1. |
| s_waitcnt vmcnt(1) |
| |
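| The alternative delimiters mentioned in the comments above can be written out |
| as follows. This is a sketch that assumes each line requests the same wait as |
| s_waitcnt 0. |
| |
| .. code-block:: nasm |
| |
| // Counter names delimited by '&'. |
| s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) |
| |
| // Counter names delimited by ','. |
| s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0) |
| |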
| VOP1, VOP2, VOP3, VOPC Instructions |
| ----------------------------------- |
| |
| All 32-bit and 64-bit encodings should work. |
| |
| The assembler will automatically detect which encoding size to use for |
| VOP1, VOP2, and VOPC instructions based on the operands. If you want to force |
| a specific encoding size, you can add an _e32 (for 32-bit encoding) or |
| _e64 (for 64-bit encoding) suffix to the instruction. Most, but not all, |
| instructions support an explicit suffix. These are all valid assembly |
| strings: |
| |
| .. code-block:: nasm |
| |
| v_mul_i32_i24 v1, v2, v3 |
| v_mul_i32_i24_e32 v1, v2, v3 |
| v_mul_i32_i24_e64 v1, v2, v3 |
| |
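| Operand-based selection can be illustrated by placing a scalar register in the |
| second source position. This is a sketch that assumes the usual ISA operand |
| rule that the 32-bit VOP2 encoding only accepts a VGPR as its second source, |
| so the assembler is expected to pick the 64-bit encoding for such a line. The |
| register numbers are arbitrary. |
| |
| .. code-block:: nasm |
| |
| // Both sources are VGPRs, which fits the 32-bit encoding. |
| v_mul_i32_i24 v1, v2, v3 |
| |
| // An SGPR as the second source does not fit the 32-bit encoding, so this |
| // line is expected to be assembled with the 64-bit encoding. |
| v_mul_i32_i24 v1, v2, s3 |
| |
| // The same instruction with the 64-bit encoding requested explicitly. |
| v_mul_i32_i24_e64 v1, v2, s3 |
| |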
| Assembler Directives |
| -------------------- |
| |
| .hsa_code_object_version major, minor |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| *major* and *minor* are integers that specify the version of the HSA code |
| object that will be generated by the assembler. This value will be stored |
| in an entry of the .note section. |
| |
| .hsa_code_object_isa [major, minor, stepping, vendor, arch] |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| *major*, *minor*, and *stepping* are all integers that describe the instruction |
| set architecture (ISA) version of the assembly program. |
| |
| *vendor* and *arch* are quoted strings. *vendor* should always be equal to |
| "AMD" and *arch* should always be equal to "AMDGPU". |
| |
| If no arguments are specified, then the assembler will derive the ISA version, |
| *vendor*, and *arch* from the value of the -mcpu option that is passed to the |
| assembler. |
| |
| ISA version, *vendor*, and *arch* will all be stored in a single entry of the |
| .note section. |
| |
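| For comparison with the argument-free form, the directive can also be written |
| with all five arguments. The version numbers 7, 0, 0 below are illustrative |
| only and have to match the actual target. *vendor* and *arch* use the fixed |
| strings described above. |
| |
| .. code-block:: nasm |
| |
| // Explicit ISA version (major, minor, stepping), vendor and arch. |
| .hsa_code_object_isa 7,0,0,"AMD","AMDGPU" |
| |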
| .amd_kernel_code_t |
| ^^^^^^^^^^^^^^^^^^ |
| |
| This directive marks the beginning of a list of key / value pairs that are used |
| to specify the amd_kernel_code_t object that will be emitted by the assembler. |
| The list must be terminated by the *.end_amd_kernel_code_t* directive. Any |
| amd_kernel_code_t value that is left unspecified takes a default value. The |
| default value for all keys is 0, with the following exceptions: |
| |
| - *kernel_code_version_major* defaults to 1. |
| - *machine_kind* defaults to 1. |
| - *machine_version_major*, *machine_version_minor*, and |
| *machine_version_stepping* are derived from the value of the -mcpu option |
| that is passed to the assembler. |
| - *kernel_code_entry_byte_offset* defaults to 256. |
| - *wavefront_size* defaults to 6. |
| - *kernarg_segment_alignment*, *group_segment_alignment*, and |
| *private_segment_alignment* default to 4. Note that alignments are specified |
| as a power of two, so a value of **n** means an alignment of 2^ **n**. |
| |
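| To make the defaults concrete, the fragment below spells several of them out |
| explicitly, using the key = value syntax of this directive. It is a sketch: |
| the label name example_kernel is hypothetical, and the values simply repeat |
| the defaults listed above, so the block is expected to describe the same |
| object as one that leaves these keys unspecified. |
| |
| .. code-block:: nasm |
| |
| // Alignments are exponents: a value of 4 means an alignment of 2^4 = 16. |
| example_kernel: |
| |
| .amd_kernel_code_t |
| kernel_code_version_major = 1 |
| machine_kind = 1 |
| kernel_code_entry_byte_offset = 256 |
| wavefront_size = 6 |
| kernarg_segment_alignment = 4 |
| group_segment_alignment = 4 |
| private_segment_alignment = 4 |
| .end_amd_kernel_code_t |
| |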
| The *.amd_kernel_code_t* directive must be placed immediately after the |
| function label and before any instructions. |
| |
| For a full list of amd_kernel_code_t keys, see the examples in |
| test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different |
| keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h. |
| |
| Here is an example of a minimal amd_kernel_code_t specification: |
| |
| .. code-block:: nasm |
| |
| .hsa_code_object_version 1,0 |
| .hsa_code_object_isa |
| |
| .text |
| |
| hello_world: |
| |
| .amd_kernel_code_t |
| enable_sgpr_kernarg_segment_ptr = 1 |
| is_ptr64 = 1 |
| compute_pgm_rsrc1_vgprs = 0 |
| compute_pgm_rsrc1_sgprs = 0 |
| compute_pgm_rsrc2_user_sgpr = 2 |
| kernarg_segment_byte_size = 8 |
| wavefront_sgpr_count = 2 |
| workitem_vgpr_count = 3 |
| .end_amd_kernel_code_t |
| |
| s_load_dwordx2 s[0:1], s[0:1], 0x0 |
| v_mov_b32 v0, 3.14159 |
| s_waitcnt lgkmcnt(0) |
| v_mov_b32 v1, s0 |
| v_mov_b32 v2, s1 |
| flat_store_dword v0, v[1:2] |
| s_endpgm |