docs/AMDGPU/DeveloperGuideline.rst - llvm-project/llvm - Git at Google

 ===============================
 Developer Guideline for AMDGPU
 ===============================

 .. contents::
    :local:

 Introduction
 ============

 This document highlights coding conventions, test policies, and other
 development guidelines that apply to all AMDGPU-related code across the LLVM
 project (the backend in ``llvm/lib/Target/AMDGPU``, Clang AMDGPU support, LLD,
 associated tests, etc.).  It is **not** a replacement for or summary of the
 `LLVM Coding Standards <https://llvm.org/docs/CodingStandards.html>`_ or the
 `LLVM Testing Guide <https://llvm.org/docs/TestingGuide.html>`_; contributors
 are expected to be familiar with those documents as well.

 The topics covered here are those that come up frequently during AMDGPU code
 reviews.  Some overlap with existing upstream rules and are restated here for
 easy reference.

 Coding Standards
 ================

 AMDGPU-related code follows the
 `LLVM Coding Standards <https://llvm.org/docs/CodingStandards.html>`_ with the
 refinements listed below.

 Use of ``auto``
 ---------------

 The LLVM Coding Standards describe the policy for ``auto`` in
 `Use auto Type Deduction to Make Code More Readable <https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable>`_.
 Below are more concrete examples of how that policy applies in AMDGPU code.

 Do **not** use ``auto`` except in the following cases:

 * Lambda expressions:

   .. code-block:: c++

      auto Pred = [](unsigned Val) { return Val > 0; };

 * Casts where the target type is already spelled out on the right-hand side
   (``cast``, ``dyn_cast``, ``static_cast``, etc.):

   .. code-block:: c++

      auto *Inst = cast<CallInst>(V);
      auto *MD = dyn_cast<MDNode>(Op);
      auto Width = static_cast<unsigned>(Val);

 * Iterators:

   .. code-block:: c++

      auto It = Container.begin();

 * Structured bindings:

   .. code-block:: c++

      auto [Key, Value] = Pair;

 In all other cases, write the type explicitly.

 .. code-block:: c++

    // Avoid - the type is not obvious from the right-hand side.
    auto Reg = MI.getOperand(0).getReg();
    auto Size = DL.getTypeAllocSize(Ty);

    // Preferred - spell out the type.
    Register Reg = MI.getOperand(0).getReg();
    TypeSize Size = DL.getTypeAllocSize(Ty);

 Use of Braces
 -------------

 The LLVM Coding Standards discuss brace usage in
 `Don't Use Braces on Simple Single-Statement Bodies of if/else/loop Statements <https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements>`_.
 In AMDGPU code, braces may be omitted **only when the single statement
 fits on one line**.  If the statement spans more than one line (e.g. because of
 a long argument list that wraps), keep the braces.

 For ``if``/``else`` chains, if **either** branch requires braces, add braces to
 **all** branches to keep them symmetric.

 .. code-block:: c++

    // OK - single statement on one line, braces omitted.
    for (unsigned I = 0; I < N; ++I)
      doSomething(I);

    // Required - single statement, but spans multiple lines.
    if (Cond) {
      doSomethingElse(LongArgument1,
                      LongArgument2);
    }

    // Required - the else branch needs braces, so the if branch gets them too.
    if (Cond) {
      doSomething();
    } else {
      doSomethingElse(LongArgument1,
                      LongArgument2);
    }

    // Avoid - asymmetric braces.
    if (Cond)
      doSomething();
    else {
      doSomethingElse(LongArgument1,
                      LongArgument2);
    }

 Instruction Naming
 ------------------

 Instruction names in TableGen definitions (e.g. in ``VOP3PInstructions.td``,
 ``VOP2Instructions.td``, etc.) should follow the terminology of the ISA
 documentation.  Use **all-caps** names that match the documentation's name for
 at least one target.  When the compiler needs a variant of a documented
 instruction (e.g. a version with additional property flags or extra register
 uses), append a **lowercase** suffix to distinguish it from the canonical name.

 .. code-block:: text

    // Good - matches the ISA documentation exactly.
    defm V_ADD_F32 : VOP2Inst_VOPD <"v_add_f32", ...>;

    // Good - lowercase suffix for a compiler-invented variant.
    defm V_ADD_F32_e64 : ...;

    // Avoid - deviates from the documented name without reason.
    defm V_Add_F32 : ...;

    // Avoid - all-caps suffix for a compiler-invented variant;
    // use a lowercase suffix instead.
    defm V_ADD_F32_E64 : ...;

 Error and Diagnostic Messages
 -----------------------------

 Messages passed to ``assert``, ``llvm_unreachable``, ``report_fatal_error``,
 diagnostic handlers, and similar should **not** start with an uppercase letter.
 Use lowercase as if the message were a continuation of a sentence, not the
 beginning of one.  Do not end the message with a period.

 .. code-block:: c++

    // Good
    report_fatal_error("malformed block");

    // Avoid
    report_fatal_error("Malformed block");

 Design Practices
 =================

 Prefer Feature Checks over Generation Checks
 ---------------------------------------------

 Avoid conditioning logic on a specific GPU generation (e.g. ``isGFX11()``).
 Generation checks are fragile: when a new generation ships, every such check
 must be audited to decide whether it should include the new generation too.

 Instead, query the specific capability that the code actually depends on via a
 feature predicate (e.g. ``hasFeatureX()``).  Feature predicates are
 self-documenting, compose better across generations, and do not require updates
 when a new generation is added.

 .. code-block:: c++

    // Avoid - ties the logic to a specific generation.
    if (ST.isGFX11())
      handleNewBehaviour();

    // Preferred - checks the actual capability.
    if (ST.hasPackedFP32Ops())
      handleNewBehaviour();

 Prefer Separate Opcodes over Subtarget Checks
 -----------------------------------------------

 When an instruction has different properties on different subtargets (e.g.
 different implicit register uses, different scheduling info, or different
 encoding constraints), define **separate opcodes** for each variant rather than
 using a single opcode and scattering ``if`` checks on the subtarget throughout
 the code.  Distinct opcodes keep TableGen definitions self-contained, make
 scheduling and register allocation more accurate, and avoid a class of bugs
 where a subtarget check is accidentally omitted.

 Document New Builtins
 ---------------------

 All new AMDGPU builtins must have documentation added in
 ``clang/include/clang/Basic/BuiltinsAMDGPUDocs.td``.  The documentation entry
 should be included in the same patch that introduces the builtin.

 Pull Requests
 =============

 Keep Changes Focused
 --------------------

 Each pull request should contain **one logical change**.  Unrelated
 modifications - formatting fixes, variable renames, whitespace cleanup, etc. -
 should be submitted as separate PRs, even if they touch the same files.  Mixing
 unrelated changes into a functional PR makes review harder, obscures the intent
 of the change in ``git log``, and complicates reverts if something goes wrong.

 Test Policy
 ===========

 Well-written tests are essential for a healthy codebase.  The guidelines below
 apply to all AMDGPU regression tests (``llvm/test/CodeGen/AMDGPU``,
 ``llvm/test/MC/AMDGPU``, etc.).  See also the general
 `Best practices for regression tests <https://llvm.org/docs/TestingGuide.html#best-practices-for-regression-tests>`_
 and the
 `Precommit workflow for tests <https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests>`_
 in the LLVM Testing Guide.

 Use Minimal, Reduced Tests
 --------------------------

 Every test should be the **smallest input that exercises the behaviour under
 test**.  Avoid copying a full function from a real workload and pasting it into
 a test file.  Instead, reduce the input so that it contains only the
 instructions and control flow needed to trigger the relevant code path.  A
 minimal test is easier to understand, faster to run, and less likely to break
 for unrelated reasons.

 Avoid Undefined Behavior
 ------------------------

 Tests should not rely on undefined behavior (UB).  As the
 `best practices section of the Testing Guide <https://llvm.org/docs/TestingGuide.html#best-practices-for-regression-tests>`_
 notes, avoid ``undef`` and ``poison`` values unless they are the point of the
 test - patterns like ``br i1 undef`` are likely to break as future
 optimizations evolve.

 In addition, avoid loads from or stores to ``null`` unless the test targets an
 address space where address zero is a valid memory location rather than a null
 pointer.  For example, on AMDGPU, address space 0 (generic/flat) treats zero as
 a null pointer, but address space 3 (LDS) does not, so a load from
 ``ptr addrspace(3) null`` can be valid.

 .. code-block:: llvm

    ; Avoid - null is a null pointer in addrspace(0).
    define void @example_bad() {
      %val = load i32, ptr null
      ret void
    }

    ; OK - addrspace(3) has no null pointer, address zero is valid.
    define void @example_ok() {
      %val = load i32, ptr addrspace(3) null
      ret void
    }

 Use Named Values
 ----------------

 Prefer descriptive, named IR values over anonymous numbered values.  Names
 serve as lightweight documentation and make it much easier to understand a
 test's intent at a glance.

 .. code-block:: llvm

    ; Preferred - names describe what each value represents.
    define float @fma_example(float %x, float %y, float %z) {
      %fma = call float @llvm.fma.f32(float %x, float %y, float %z)
      ret float %fma
    }

    ; Avoid - anonymous numbers reveal nothing about intent.
    define float @fma_example(float %0, float %1, float %2) {
      %4 = call float @llvm.fma.f32(float %0, float %1, float %2)
      ret float %4
    }

 Use Compact Virtual Register Numbers in MIR Tests
 --------------------------------------------------

 In MIR tests, virtual register numbers should be compact and start from
 ``%0``.  Avoid leaving gaps or starting at arbitrary high numbers (e.g.
 ``%128``, ``%256``).  Sparse numbering makes tests harder to follow and
 suggests the test was extracted from a larger function without proper
 reduction.

 .. code-block:: none

    # Preferred - compact, sequential numbering.
    %0:vgpr_32 = COPY $vgpr0
    %1:vgpr_32 = COPY $vgpr1
    %2:vgpr_32 = V_ADD_U32_e32 %0, %1, implicit $exec

    # Avoid - sparse numbering with gaps.
    %128:vgpr_32 = COPY $vgpr0
    %130:vgpr_32 = COPY $vgpr1
    %255:vgpr_32 = V_ADD_U32_e32 %128, %130, implicit $exec

 Trim Unnecessary Attributes and Metadata
 -----------------------------------------

 Strip attributes, metadata, and other annotations that are **not relevant to
 the behaviour being tested**.  Extra noise makes it harder to see what a test
 actually depends on and can cause spurious failures when defaults change.

 For example, unless a test specifically exercises a particular function
 attribute or metadata node, remove them:

 .. code-block:: llvm

    ; Preferred - only the essential attributes remain.
    define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) {
      store i32 %val, ptr addrspace(1) %ptr
      ret void
    }

    ; Avoid - unrelated attributes and metadata obscure the test's purpose.
    define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) #0 !dbg !5 {
      store i32 %val, ptr addrspace(1) %ptr, align 4, !tbaa !11
      ret void
    }

    attributes #0 = { nounwind "frame-pointer"="all" }

 Include Negative Tests
 ----------------------

 Changes that introduce new restrictions, validations, or user-facing constructs
 should include **negative tests** that verify the correct diagnostic or
 rejection.  Cases that require negative tests include, but are not limited to:

 * **New builtins** — verify that wrong argument types, wrong argument counts,
   and unsupported target features produce the expected Sema errors
   (e.g. ``clang/test/SemaOpenCL/builtins-amdgcn-error.cl``).

 * **New backend instructions** — verify that the assembler rejects invalid
   operands, illegal modifiers, and unsupported subtargets
   (e.g. ``llvm/test/MC/AMDGPU/gfx950_err.s``).

 * **New target types or features** — verify that incompatible target IDs,
   missing features, and invalid subtarget combinations are diagnosed
   (e.g. ``clang/test/Driver/invalid-target-id.cl``).

 Cover All Code Paths
 --------------------

 Tests for a PR should ideally cover all of the code changes introduced by that
 PR.  When adding a new instruction, for example, this means testing all
 supported combinations of operand kinds (VGPR, SGPR, immediate, inline
 constant, literal constant) as well as applicable modifiers (``opsel``,
 ``neg_lo``, ``neg_hi``, ``clamp``, etc.).  The goal is to ensure that every
 encoding and selection path exercised by the new code is verified, so that
 regressions in any variant are caught immediately.

 Pipe Input via ``stdin``
 ------------------------

 Where feasible, feed the test file through ``stdin`` using ``< %s`` rather than
 passing it as a positional argument.  This is the conventional style in AMDGPU
 tests and avoids the need for an explicit ``-o -``.

 .. code-block:: bash

    ; llc — preferred.
    ; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s

    ; opt — preferred.
    ; RUN: opt -S -mtriple=amdgcn -passes=instcombine < %s | FileCheck %s

    ; llvm-mc — preferred.
    ; RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -show-encoding < %s | FileCheck %s

 This is not always possible.  For example, ``llc`` infers the input format from
 the file extension; when reading from ``stdin`` it defaults to IR, so MIR tests
 need to pass the file as a positional argument:

 .. code-block:: bash

    ; MIR — pass the file directly so llc sees the .mir extension.
    ; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass=... %s -o - | FileCheck %s

 Use ``-filetype=null`` When Output Is Irrelevant
 -------------------------------------------------

 When a test only needs to verify diagnostics, error messages, or the absence of
 a crash - and does not care about the actual code-generation output - pass
 ``-filetype=null`` to ``llc``.  This skips object or assembly emission entirely,
 making the test faster and avoiding fragile ``CHECK`` lines tied to unrelated
 output.

 .. code-block:: bash

    ; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 | FileCheck %s

 Auto-Generate Check Lines
 -------------------------

 When possible, use the UTC (Update Test Checks) scripts to generate ``CHECK``
 lines rather than writing them by hand.  Auto-generated checks are
 comprehensive, consistent, and easy to update when output changes.

 The most commonly used scripts for AMDGPU are:

 * ``llvm/utils/update_llc_test_checks.py`` - for ``llc`` CodeGen tests.
 * ``llvm/utils/update_mir_test_checks.py`` - for MIR tests.
 * ``llvm/utils/update_mc_test_checks.py`` - for MC (assembly/disassembly) tests.

 A typical workflow looks like:

 .. code-block:: bash

    # Write the test with a RUN line but no CHECK lines, then auto-generate:
    $ llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/AMDGPU/my-test.ll

    # After a code change that intentionally alters output, re-generate:
    $ llvm/utils/update_llc_test_checks.py --update-only llvm/test/CodeGen/AMDGPU/my-test.ll

 Each script embeds a ``UTC_ARGS:`` comment in the test file so that subsequent
 runs of the script use the same options.  Consult the ``--help`` output of each
 script for the full set of available flags.

 When writing check lines by hand, prefer ``CHECK-NEXT`` over ``CHECK-NOT``.
 Negative pattern matches fail silently when the output changes - the
 ``CHECK-NOT`` pattern may no longer appear for entirely unrelated reasons, and
 the test will still pass without actually verifying the intended behaviour.
 ``CHECK-NEXT`` ties the assertion to a specific position in the output, so any
 unexpected change causes a visible failure.

 .. note::

    Hand-written ``CHECK`` lines are still appropriate when a test needs to
    verify only a narrow slice of the output (e.g. a single instruction) or
    when the auto-generated output would be excessively verbose and obscure the
    intent.  In such cases, keep the hand-written checks focused and document
    why auto-generation was not used.
	===============================
	Developer Guideline for AMDGPU
	===============================

	.. contents::
	:local:

	Introduction
	============

	This document highlights coding conventions, test policies, and other
	development guidelines that apply to all AMDGPU-related code across the LLVM
	project (the backend in ``llvm/lib/Target/AMDGPU``, Clang AMDGPU support, LLD,
	associated tests, etc.). It is not a replacement for or summary of the
	`LLVM Coding Standards <https://llvm.org/docs/CodingStandards.html>`_ or the
	`LLVM Testing Guide <https://llvm.org/docs/TestingGuide.html>`_; contributors
	are expected to be familiar with those documents as well.

	The topics covered here are those that come up frequently during AMDGPU code
	reviews. Some overlap with existing upstream rules and are restated here for
	easy reference.

	Coding Standards
	================

	AMDGPU-related code follows the
	`LLVM Coding Standards <https://llvm.org/docs/CodingStandards.html>`_ with the
	refinements listed below.

	Use of ``auto``
	---------------

	The LLVM Coding Standards describe the policy for ``auto`` in
	`Use auto Type Deduction to Make Code More Readable <https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable>`_.
	Below are more concrete examples of how that policy applies in AMDGPU code.

	Do not use ``auto`` except in the following cases:

	* Lambda expressions:

	.. code-block:: c++

	auto Pred = [](unsigned Val) { return Val > 0; };

	* Casts where the target type is already spelled out on the right-hand side
	(``cast``, ``dyn_cast``, ``static_cast``, etc.):

	.. code-block:: c++

	auto *Inst = cast<CallInst>(V);
	auto *MD = dyn_cast<MDNode>(Op);
	auto Width = static_cast<unsigned>(Val);

	* Iterators:

	.. code-block:: c++

	auto It = Container.begin();

	* Structured bindings:

	.. code-block:: c++

	auto [Key, Value] = Pair;

	In all other cases, write the type explicitly.

	.. code-block:: c++

	// Avoid - the type is not obvious from the right-hand side.
	auto Reg = MI.getOperand(0).getReg();
	auto Size = DL.getTypeAllocSize(Ty);

	// Preferred - spell out the type.
	Register Reg = MI.getOperand(0).getReg();
	TypeSize Size = DL.getTypeAllocSize(Ty);

	Use of Braces
	-------------

	The LLVM Coding Standards discuss brace usage in
	`Don't Use Braces on Simple Single-Statement Bodies of if/else/loop Statements <https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements>`_.
	In AMDGPU code, braces may be omitted **only when the single statement
	fits on one line**. If the statement spans more than one line (e.g. because of
	a long argument list that wraps), keep the braces.

	For ``if``/``else`` chains, if either branch requires braces, add braces to
	all branches to keep them symmetric.

	.. code-block:: c++

	// OK - single statement on one line, braces omitted.
	for (unsigned I = 0; I < N; ++I)
	doSomething(I);

	// Required - single statement, but spans multiple lines.
	if (Cond) {
	doSomethingElse(LongArgument1,
	LongArgument2);
	}

	// Required - the else branch needs braces, so the if branch gets them too.
	if (Cond) {
	doSomething();
	} else {
	doSomethingElse(LongArgument1,
	LongArgument2);
	}

	// Avoid - asymmetric braces.
	if (Cond)
	doSomething();
	else {
	doSomethingElse(LongArgument1,
	LongArgument2);
	}

	Instruction Naming
	------------------

	Instruction names in TableGen definitions (e.g. in ``VOP3PInstructions.td``,
	``VOP2Instructions.td``, etc.) should follow the terminology of the ISA
	documentation. Use all-caps names that match the documentation's name for
	at least one target. When the compiler needs a variant of a documented
	instruction (e.g. a version with additional property flags or extra register
	uses), append a lowercase suffix to distinguish it from the canonical name.

	.. code-block:: text

	// Good - matches the ISA documentation exactly.
	defm V_ADD_F32 : VOP2Inst_VOPD <"v_add_f32", ...>;

	// Good - lowercase suffix for a compiler-invented variant.
	defm V_ADD_F32_e64 : ...;

	// Avoid - deviates from the documented name without reason.
	defm V_Add_F32 : ...;

	// Avoid - all-caps suffix for a compiler-invented variant;
	// use a lowercase suffix instead.
	defm V_ADD_F32_E64 : ...;

	Error and Diagnostic Messages
	-----------------------------

	Messages passed to ``assert``, ``llvm_unreachable``, ``report_fatal_error``,
	diagnostic handlers, and similar should not start with an uppercase letter.
	Use lowercase as if the message were a continuation of a sentence, not the
	beginning of one. Do not end the message with a period.

	.. code-block:: c++

	// Good
	report_fatal_error("malformed block");

	// Avoid
	report_fatal_error("Malformed block");

	Design Practices
	=================

	Prefer Feature Checks over Generation Checks
	---------------------------------------------

	Avoid conditioning logic on a specific GPU generation (e.g. ``isGFX11()``).
	Generation checks are fragile: when a new generation ships, every such check
	must be audited to decide whether it should include the new generation too.

	Instead, query the specific capability that the code actually depends on via a
	feature predicate (e.g. ``hasFeatureX()``). Feature predicates are
	self-documenting, compose better across generations, and do not require updates
	when a new generation is added.

	.. code-block:: c++

	// Avoid - ties the logic to a specific generation.
	if (ST.isGFX11())
	handleNewBehaviour();

	// Preferred - checks the actual capability.
	if (ST.hasPackedFP32Ops())
	handleNewBehaviour();

	Prefer Separate Opcodes over Subtarget Checks
	-----------------------------------------------

	When an instruction has different properties on different subtargets (e.g.
	different implicit register uses, different scheduling info, or different
	encoding constraints), define separate opcodes for each variant rather than
	using a single opcode and scattering ``if`` checks on the subtarget throughout
	the code. Distinct opcodes keep TableGen definitions self-contained, make
	scheduling and register allocation more accurate, and avoid a class of bugs
	where a subtarget check is accidentally omitted.

	Document New Builtins
	---------------------

	All new AMDGPU builtins must have documentation added in
	``clang/include/clang/Basic/BuiltinsAMDGPUDocs.td``. The documentation entry
	should be included in the same patch that introduces the builtin.

	Pull Requests
	=============

	Keep Changes Focused
	--------------------

	Each pull request should contain one logical change. Unrelated
	modifications - formatting fixes, variable renames, whitespace cleanup, etc. -
	should be submitted as separate PRs, even if they touch the same files. Mixing
	unrelated changes into a functional PR makes review harder, obscures the intent
	of the change in ``git log``, and complicates reverts if something goes wrong.

	Test Policy
	===========

	Well-written tests are essential for a healthy codebase. The guidelines below
	apply to all AMDGPU regression tests (``llvm/test/CodeGen/AMDGPU``,
	``llvm/test/MC/AMDGPU``, etc.). See also the general
	`Best practices for regression tests <https://llvm.org/docs/TestingGuide.html#best-practices-for-regression-tests>`_
	and the
	`Precommit workflow for tests <https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests>`_
	in the LLVM Testing Guide.

	Use Minimal, Reduced Tests
	--------------------------

	Every test should be the **smallest input that exercises the behaviour under
	test**. Avoid copying a full function from a real workload and pasting it into
	a test file. Instead, reduce the input so that it contains only the
	instructions and control flow needed to trigger the relevant code path. A
	minimal test is easier to understand, faster to run, and less likely to break
	for unrelated reasons.

	Avoid Undefined Behavior
	------------------------

	Tests should not rely on undefined behavior (UB). As the
	`best practices section of the Testing Guide <https://llvm.org/docs/TestingGuide.html#best-practices-for-regression-tests>`_
	notes, avoid ``undef`` and ``poison`` values unless they are the point of the
	test - patterns like ``br i1 undef`` are likely to break as future
	optimizations evolve.

	In addition, avoid loads from or stores to ``null`` unless the test targets an
	address space where address zero is a valid memory location rather than a null
	pointer. For example, on AMDGPU, address space 0 (generic/flat) treats zero as
	a null pointer, but address space 3 (LDS) does not, so a load from
	``ptr addrspace(3) null`` can be valid.

	.. code-block:: llvm

	; Avoid - null is a null pointer in addrspace(0).
	define void @example_bad() {
	%val = load i32, ptr null
	ret void
	}

	; OK - addrspace(3) has no null pointer, address zero is valid.
	define void @example_ok() {
	%val = load i32, ptr addrspace(3) null
	ret void
	}

	Use Named Values
	----------------

	Prefer descriptive, named IR values over anonymous numbered values. Names
	serve as lightweight documentation and make it much easier to understand a
	test's intent at a glance.

	.. code-block:: llvm

	; Preferred - names describe what each value represents.
	define float @fma_example(float %x, float %y, float %z) {
	%fma = call float @llvm.fma.f32(float %x, float %y, float %z)
	ret float %fma
	}

	; Avoid - anonymous numbers reveal nothing about intent.
	define float @fma_example(float %0, float %1, float %2) {
	%4 = call float @llvm.fma.f32(float %0, float %1, float %2)
	ret float %4
	}

	Use Compact Virtual Register Numbers in MIR Tests
	--------------------------------------------------

	In MIR tests, virtual register numbers should be compact and start from
	``%0``. Avoid leaving gaps or starting at arbitrary high numbers (e.g.
	``%128``, ``%256``). Sparse numbering makes tests harder to follow and
	suggests the test was extracted from a larger function without proper
	reduction.

	.. code-block:: none

	# Preferred - compact, sequential numbering.
	%0:vgpr_32 = COPY $vgpr0
	%1:vgpr_32 = COPY $vgpr1
	%2:vgpr_32 = V_ADD_U32_e32 %0, %1, implicit $exec

	# Avoid - sparse numbering with gaps.
	%128:vgpr_32 = COPY $vgpr0
	%130:vgpr_32 = COPY $vgpr1
	%255:vgpr_32 = V_ADD_U32_e32 %128, %130, implicit $exec

	Trim Unnecessary Attributes and Metadata
	-----------------------------------------

	Strip attributes, metadata, and other annotations that are **not relevant to
	the behaviour being tested**. Extra noise makes it harder to see what a test
	actually depends on and can cause spurious failures when defaults change.

	For example, unless a test specifically exercises a particular function
	attribute or metadata node, remove them:

	.. code-block:: llvm

	; Preferred - only the essential attributes remain.
	define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) {
	store i32 %val, ptr addrspace(1) %ptr
	ret void
	}

	; Avoid - unrelated attributes and metadata obscure the test's purpose.
	define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) #0 !dbg !5 {
	store i32 %val, ptr addrspace(1) %ptr, align 4, !tbaa !11
	ret void
	}

	attributes #0 = { nounwind "frame-pointer"="all" }

	Include Negative Tests
	----------------------

	Changes that introduce new restrictions, validations, or user-facing constructs
	should include negative tests that verify the correct diagnostic or
	rejection. Cases that require negative tests include, but are not limited to:

	* New builtins — verify that wrong argument types, wrong argument counts,
	and unsupported target features produce the expected Sema errors
	(e.g. ``clang/test/SemaOpenCL/builtins-amdgcn-error.cl``).

	* New backend instructions — verify that the assembler rejects invalid
	operands, illegal modifiers, and unsupported subtargets
	(e.g. ``llvm/test/MC/AMDGPU/gfx950_err.s``).

	* New target types or features — verify that incompatible target IDs,
	missing features, and invalid subtarget combinations are diagnosed
	(e.g. ``clang/test/Driver/invalid-target-id.cl``).

	Cover All Code Paths
	--------------------

	Tests for a PR should ideally cover all of the code changes introduced by that
	PR. When adding a new instruction, for example, this means testing all
	supported combinations of operand kinds (VGPR, SGPR, immediate, inline
	constant, literal constant) as well as applicable modifiers (``opsel``,
	``neg_lo``, ``neg_hi``, ``clamp``, etc.). The goal is to ensure that every
	encoding and selection path exercised by the new code is verified, so that
	regressions in any variant are caught immediately.

	Pipe Input via ``stdin``
	------------------------

	Where feasible, feed the test file through ``stdin`` using ``< %s`` rather than
	passing it as a positional argument. This is the conventional style in AMDGPU
	tests and avoids the need for an explicit ``-o -``.

	.. code-block:: bash

	; llc — preferred.
	; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s \| FileCheck %s

	; opt — preferred.
	; RUN: opt -S -mtriple=amdgcn -passes=instcombine < %s \| FileCheck %s

	; llvm-mc — preferred.
	; RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -show-encoding < %s \| FileCheck %s

	This is not always possible. For example, ``llc`` infers the input format from
	the file extension; when reading from ``stdin`` it defaults to IR, so MIR tests
	need to pass the file as a positional argument:

	.. code-block:: bash

	; MIR — pass the file directly so llc sees the .mir extension.
	; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass=... %s -o - \| FileCheck %s

	Use ``-filetype=null`` When Output Is Irrelevant
	-------------------------------------------------

	When a test only needs to verify diagnostics, error messages, or the absence of
	a crash - and does not care about the actual code-generation output - pass
	``-filetype=null`` to ``llc``. This skips object or assembly emission entirely,
	making the test faster and avoiding fragile ``CHECK`` lines tied to unrelated
	output.

	.. code-block:: bash

	; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 \| FileCheck %s

	Auto-Generate Check Lines
	-------------------------

	When possible, use the UTC (Update Test Checks) scripts to generate ``CHECK``
	lines rather than writing them by hand. Auto-generated checks are
	comprehensive, consistent, and easy to update when output changes.

	The most commonly used scripts for AMDGPU are:

	* ``llvm/utils/update_llc_test_checks.py`` - for ``llc`` CodeGen tests.
	* ``llvm/utils/update_mir_test_checks.py`` - for MIR tests.
	* ``llvm/utils/update_mc_test_checks.py`` - for MC (assembly/disassembly) tests.

	A typical workflow looks like:

	.. code-block:: bash

	# Write the test with a RUN line but no CHECK lines, then auto-generate:
	$ llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/AMDGPU/my-test.ll

	# After a code change that intentionally alters output, re-generate:
	$ llvm/utils/update_llc_test_checks.py --update-only llvm/test/CodeGen/AMDGPU/my-test.ll

	Each script embeds a ``UTC_ARGS:`` comment in the test file so that subsequent
	runs of the script use the same options. Consult the ``--help`` output of each
	script for the full set of available flags.

	When writing check lines by hand, prefer ``CHECK-NEXT`` over ``CHECK-NOT``.
	Negative pattern matches fail silently when the output changes - the
	``CHECK-NOT`` pattern may no longer appear for entirely unrelated reasons, and
	the test will still pass without actually verifying the intended behaviour.
	``CHECK-NEXT`` ties the assertion to a specific position in the output, so any
	unexpected change causes a visible failure.

	.. note::

	Hand-written ``CHECK`` lines are still appropriate when a test needs to
	verify only a narrow slice of the output (e.g. a single instruction) or
	when the auto-generated output would be excessively verbose and obscure the
	intent. In such cases, keep the hand-written checks focused and document
	why auto-generation was not used.