| .. _amdgpu-memmodel: |
| |
| ===================== |
| AMDGPU Memory Model |
| ===================== |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| |
| The :ref:`LLVM memory model<memmodel>` provides broad guarantees that are |
| sufficient to implement inter-thread communication via memory. But in most |
| communication patterns, not all memory accesses performed by a thread need to be |
| exposed to other threads. Even when they do need to be exposed, not all threads |
| may need to observe these memory accesses. This document describes the *AMDGPU |
| memory model*, which allows the user to control how the side-effects of memory |
| accesses are propagated across threads. The programmer expresses this using |
| **new intrinsics and metadata** as described below, and the implementation can |
| then choose a more efficient mechanism to complete these accesses, such as the |
| cache policy bits in an AMDGPU device. |
| |
| The AMDGPU memory model allows executions that are not allowed by the LLVM |
| memory model. At the same time, a simple mapping can be used to implement these |
| new intrinsics and metadata using operations defined in the default LLVM memory |
| model. Thus, **there exists a safe-by-default implementation** that produces |
| executions that are valid in both models. |
| |
| Terminology |
| =========== |
| |
| Memory Accesses |
| Operations that read or write locations in memory are termed *memory |
| accesses*. Typical examples are ``load``, ``store`` and atomic instructions, |
| as well as many intrinsics. |
| |
| Synchronization Operations |
| Synchronization operations control how the side-effects of memory accesses are |
| propagated in the system. Typical examples are atomic operations (including |
| fences) with at least ``release`` or ``acquire`` ordering. |
| |
| .. _amdgpu-scopes: |
| |
| Scopes |
| ====== |
| |
| A *scope* is an abstract description of sets of memory accesses and |
| synchronization operations in a multi-threaded execution environment. Each such |
| set is called an *instance* of that scope, or a *scope instance* for short. |
| |
| - Each memory access or synchronization operation belongs to at most one |
| instance of every scope defined by the target. |
| - When an operation ``X`` specifies a scope ``S``, it indicates the instance of |
| ``S`` that contains ``X``. This scope instance is also termed *X's instance |
| of scope S*, or just *X's scope instance* when ``S`` is implied by the |
| context. |
| - When an operation does not specify a scope, it indicates the *system* |
| scope defined below. |
| |
| LLVM scopes |
| ----------- |
| |
| The LLVM Language Reference defines the following :ref:`scopes<syncscope>`: |
| |
| *system scope* (empty string "") |
| There exists a single instance of this scope that contains the memory accesses |
| and synchronization operations performed by all threads. |
| |
| "singlethread" scope |
| Each thread corresponds to a "singlethread" scope instance that contains the |
| memory accesses and synchronization operations performed by that thread. |
| |
| AMDGPU scopes |
| ------------- |
| |
| The AMDGPU backend further refines the LLVM scopes with the following |
| target-defined scopes and constraints: |
| |
| - *system scope* (same as LLVM) |
| - "agent" scope |
| - "cluster" scope |
| - "workgroup" scope |
| - "wavefront" scope |
| - "singlethread" scope (same as LLVM) |
| |
| These are arranged from largest scope (*system scope*) to smallest scope |
| ("singlethread"). |
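| |
| As a rough illustration, the scopes listed above appear in LLVM IR as |
| ``syncscope`` strings on atomic instructions and fences. The function name |
| ``@scope_spelling`` and the choice of ``seq_cst`` fences are assumptions made |
| only for this sketch; an operation with no ``syncscope`` uses the *system |
| scope*. |
| |
| .. code-block:: llvm |
| |
| ; Hypothetical function: one fence per scope, ordered from largest to smallest. |
| define void @scope_spelling() { |
|   fence seq_cst                            ; system scope (no syncscope) |
|   fence syncscope("agent") seq_cst |
|   fence syncscope("cluster") seq_cst |
|   fence syncscope("workgroup") seq_cst |
|   fence syncscope("wavefront") seq_cst |
|   fence syncscope("singlethread") seq_cst |
|   ret void |
| } |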
| |
| - Every instance ``X`` of some scope ``S1`` other than "singlethread" scope is |
| partitioned by the scope ``S2`` one level below it. Each subset defined by this |
| partition is an instance of ``S2`` and is called a *subscope instance* of ``X``. |
| - It follows that if two scope instances ``X`` and ``Y`` intersect, then their |
| intersection is the smaller of ``X`` and ``Y``. |
| - A scope ``S1`` is a *subscope* of a scope ``S2`` if every instance of ``S1`` |
| is a subscope instance of some instance of ``S2``. |
| |
| **Inclusive Scopes**: Two operations ``X`` and ``Y`` are said to have *inclusive |
| scopes* if the scope instance of each operation contains the other operation. In |
| that case, the *common scope instance* ``S'`` of ``X`` and ``Y`` is the |
| intersection of their scope instances. The scope corresponding to ``S'`` is also |
| termed the *common scope* of ``X`` and ``Y``. |
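| |
| The definition of inclusive scopes can be sketched with two atomic operations |
| on the same flag. The function names, the ``i32`` flag, and the assignment of |
| each function to a particular thread are hypothetical. |
| |
| .. code-block:: llvm |
| |
| ; X: executed by a hypothetical thread A with "workgroup" scope. |
| define void @thread_a(ptr %flag) { |
|   store atomic i32 1, ptr %flag syncscope("workgroup") release, align 4 |
|   ret void |
| } |
| |
| ; Y: executed by a hypothetical thread B with "agent" scope. |
| define i32 @thread_b(ptr %flag) { |
|   %v = load atomic i32, ptr %flag syncscope("agent") acquire, align 4 |
|   ret i32 %v |
| } |
| |
| If A and B belong to the same workgroup, the "workgroup" instance of ``X`` |
| contains ``Y`` and the "agent" instance of ``Y`` contains ``X``, so the two |
| operations have inclusive scopes and their common scope is "workgroup". If B |
| runs in a different workgroup on the same agent, the scope instance of ``X`` |
| does not contain ``Y``, and the scopes are not inclusive. |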
| |
| Availability and Visibility |
| =========================== |
| |
| The AMDGPU memory model is built on top of the :ref:`happens-before<memmodel>` |
| order defined by the LLVM memory model. But when one of the new intrinsics or |
| metadata is used, **happens-before by itself is not sufficient** to describe its |
| observable effects. Instead, the AMDGPU model uses *availability* and |
| *visibility* to describe how the side-effects of these operations propagate to |
| other threads. |
| |
| Availability determines how *far* the side-effects of a write have been |
| forwarded in the system relative to that write. Visibility determines how |
| *close* the side-effects of the same write have come to an observer operation |
| (typically a read). |
| |
| The AMDGPU memory model *does not change the structure of happens-before*, but |
| changes the rules that determine how operations may observe the side-effects of |
| other operations that *happen-before* them. |
| |
| Consider a write ``W`` that happens-before a read ``R`` to the same address: |
| |
| - ``R`` can potentially observe the side-effects of ``W`` **only if W is |
| visible** to ``R``. |
| - ``W`` can potentially be visible to ``R`` **only if W is first made |
| available** to ``R``. |
| |
| The instructions used in the default LLVM memory model automatically satisfy |
| these necessary conditions, and hence they can be explained using the rules from |
| either memory model. But the new intrinsics and metadata *opt out* of the LLVM |
| memory model, and can only be explained using the AMDGPU memory model. |
| |
| .. _amdgpu-store-available: |
| |
| store-available |
| --------------- |
| |
| .. code-block:: llvm |
| |
| @llvm.amdgcn.av.global.store.b128(ptr, value, scope) |
| store atomic [syncscope("<target-scope>")] |
| atomicrmw [syncscope("<target-scope>")] |
| cmpxchg [syncscope("<target-scope>")] |
| |
| The ``@llvm.amdgcn.av.global.store.b128`` intrinsic performs a non-atomic |
| *store-available* operation on ``ptr`` with scope ``scope``. |
| |
| An atomic operation that results in a store operation is a *store-available* |
| operation with scope ``syncscope``. |
| |
| .. _amdgpu-load-visible: |
| |
| load-visible |
| ------------ |
| |
| .. code-block:: llvm |
| |
| @llvm.amdgcn.av.global.load.b128(ptr, scope) |
| load atomic [syncscope("<target-scope>")] |
| atomicrmw [syncscope("<target-scope>")] |
| cmpxchg [syncscope("<target-scope>")] |
| |
| The ``@llvm.amdgcn.av.global.load.b128`` intrinsic performs a non-atomic |
| *load-visible* operation on ``ptr`` with scope ``scope``. |
| |
| An atomic operation that results in a read operation is a *load-visible* |
| operation with scope ``syncscope``. |
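| |
| As a sketch of the atomic forms listed above, the following uses ``monotonic`` |
| atomics, which carry a scope but no ``release`` or ``acquire`` ordering. The |
| function names and the ``i32`` element type are assumptions for illustration; |
| the intrinsics are left out because their exact IR signatures are not spelled |
| out here. |
| |
| .. code-block:: llvm |
| |
| ; Store-available operation at "workgroup" scope on %p. |
| define void @publish(ptr %p, i32 %v) { |
|   store atomic i32 %v, ptr %p syncscope("workgroup") monotonic, align 4 |
|   ret void |
| } |
| |
| ; Load-visible operation at "workgroup" scope on %p. |
| define i32 @observe(ptr %p) { |
|   %v = load atomic i32, ptr %p syncscope("workgroup") monotonic, align 4 |
|   ret i32 %v |
| } |
| |
| ; A read-modify-write both reads and writes %p, so it is both a |
| ; load-visible and a store-available operation at "agent" scope. |
| define i32 @bump(ptr %p) { |
|   %old = atomicrmw add ptr %p, i32 1 syncscope("agent") monotonic, align 4 |
|   ret i32 %old |
| } |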
| |
| .. note:: |
| |
| Metadata cannot be used to model this using ordinary load/store operations, |
| because the scope is necessary for correctness. Consider a hypothetical |
| operation like this: |
| |
| .. code-block:: llvm |
| |
| store ptr, data, !mmra !{!"amdgcn-av", !"workgroup"} |
| |
| If the metadata is dropped or ignored, there is no guarantee that the store |
| will become available at the intended scope. In implementation terms, the |
| store may be completed at a nearer cache than the one required for that |
| scope. A corresponding *load-visible* that does not access the same near |
| cache will fail to observe this store. |
| |
| MakeAvailable and MakeVisible |
| ----------------------------- |
| |
| .. code-block:: llvm |
| |
| store atomic [syncscope("<target-scope>")] <ordering> [, !mmra !{!"amdgcn-av", !"none"}] |
| load atomic [syncscope("<target-scope>")] <ordering> [, !mmra !{!"amdgcn-av", !"none"}] |
| atomicrmw [syncscope("<target-scope>")] <ordering> [, !mmra !{!"amdgcn-av", !"none"}] |
| cmpxchg [syncscope("<target-scope>")] <ordering> [, !mmra !{!"amdgcn-av", !"none"}] |
| fence [syncscope("<target-scope>")] <ordering> [, !mmra !{!"amdgcn-av", !"none"}] |
| |
| A synchronization operation with at least ``release`` ordering is a |
| ``MakeAvailable`` operation with scope ``syncscope``, if it is not marked as |
| ``!{!"amdgcn-av", !"none"}``. |
| |
| A synchronization operation with at least ``acquire`` ordering is a |
| ``MakeVisible`` operation with scope ``syncscope``, if it is not marked as |
| ``!{!"amdgcn-av", !"none"}``. |
| |
| By default, these synchronization operations act as ``MakeVisible`` and |
| ``MakeAvailable`` operations. The presence of this metadata removes that |
| behavior and essentially creates *non-av* ordering operations, i.e., ordering |
| operations that do not establish availability or visibility. |
| |
| For an atomic operation which itself accesses memory (e.g., ``store atomic`` |
| or ``load atomic``), the metadata does not affect the availability or the |
| visibility of the access performed by the operation itself. It only affects |
| the ordering of other memory accesses. |
| |
| .. code-block:: llvm |
| |
| ; This includes the following operations: |
| ; - The atomic store at "agent" scope, |
| ; - A store-available operation at "agent" scope on `%ptr`, |
| ; - A `MakeAvailable` operation at "agent" scope that affects previous memory accesses. |
| store atomic i32 %val, ptr %ptr syncscope("agent") release, align 4 |
| |
| ; This includes the following operations: |
| ; - The atomic store at "agent" scope, |
| ; - A store-available operation at "agent" scope on `%ptr`. |
| ; Notably, it does not include a `MakeAvailable` operation on other memory accesses. |
| store atomic i32 %val, ptr %ptr syncscope("agent") release, align 4, !mmra !{!"amdgcn-av", !"none"} |
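| |
| The same reasoning extends to the other operations listed above. The sketch |
| below applies it to an ``acquire`` load and a ``release`` fence; the function |
| names and the named metadata node ``!0`` are assumptions for illustration. |
| |
| .. code-block:: llvm |
| |
| define i32 @acquire_variants(ptr %flag) { |
|   ; Load-visible operation at "agent" scope on %flag, plus a MakeVisible |
|   ; operation at "agent" scope. |
|   %a = load atomic i32, ptr %flag syncscope("agent") acquire, align 4 |
| |
|   ; Load-visible operation at "agent" scope on %flag only; no MakeVisible. |
|   %b = load atomic i32, ptr %flag syncscope("agent") acquire, align 4, !mmra !0 |
| |
|   %sum = add i32 %a, %b |
|   ret i32 %sum |
| } |
| |
| define void @fence_variants() { |
|   ; MakeAvailable operation at "workgroup" scope. |
|   fence syncscope("workgroup") release |
| |
|   ; Non-av ordering operation: ordering only, no MakeAvailable. |
|   fence syncscope("workgroup") release, !mmra !0 |
|   ret void |
| } |
| |
| !0 = !{!"amdgcn-av", !"none"} |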
| |
| Ordering |
| ======== |
| |
| .. note:: |
| |
| **TODO:** These ordering operations affect all address spaces. We need to |
| eventually make that a parameter similar to the storage class parameter on |
| operations and orders in Vulkan. |
| |
| Availability Operation |
| ---------------------- |
| |
| An operation ``X`` is an *availability operation* on a write ``W`` if one of the |
| following holds: |
| |
| - ``X`` is ``W`` itself, and ``W`` is a *store-available* operation, or, |
| - ``X`` is a ``MakeAvailable`` operation that follows ``W`` in program order, |
| or, |
| - ``X`` is a ``MakeAvailable`` operation whose scope instance includes ``W``, |
| and there is an availability operation ``Z`` on ``W`` such that: |
| |
| - ``Z`` happens-before ``X``, and, |
| - ``Z``'s scope instance includes ``X``. |
| |
| Then ``X`` makes ``W`` available in its own scope instance ``S`` and every |
| subscope instance of ``S`` that also includes ``W``. |
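| |
| The third case above describes a chain of availability operations. A minimal |
| sketch, assuming hypothetical threads A and B in the same workgroup and the |
| hypothetical names ``@thread_a``, ``@thread_b``, ``%flag_wg`` and |
| ``%flag_agent``: |
| |
| .. code-block:: llvm |
| |
| ; Thread A: W is the plain store; Z is the release store that follows it. |
| define void @thread_a(ptr %data, ptr %flag_wg) { |
|   store i32 42, ptr %data, align 4                                          ; W |
|   store atomic i32 1, ptr %flag_wg syncscope("workgroup") release, align 4  ; Z |
|   ret void |
| } |
| |
| ; Thread B (same workgroup as A): waits for Z, then performs X. |
| define void @thread_b(ptr %flag_wg, ptr %flag_agent) { |
| entry: |
|   br label %spin |
| spin: |
|   %f = load atomic i32, ptr %flag_wg syncscope("workgroup") acquire, align 4 |
|   %ready = icmp eq i32 %f, 1 |
|   br i1 %ready, label %publish, label %spin |
| publish: |
|   store atomic i32 1, ptr %flag_agent syncscope("agent") release, align 4   ; X |
|   ret void |
| } |
| |
| Under these assumptions, ``Z`` is a ``MakeAvailable`` operation that follows |
| ``W`` in program order, so it makes ``W`` available in the "workgroup" instance |
| of thread A. Once the ``acquire`` load in thread B reads the value stored by |
| ``Z``, ``Z`` happens-before ``X``. The scope instance of ``Z`` includes ``X``, |
| and the "agent" instance of ``X`` includes ``W``, so ``X`` is also an |
| availability operation on ``W`` and makes it available at "agent" scope. |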
| |
| Visibility Operation |
| -------------------- |
| |
| An operation ``Y`` is a *visibility operation* on a write ``W`` if ``Y`` is a |
| *load-visible* operation to the same address, or a ``MakeVisible`` operation, |
| and one of the following holds: |
| |
| - There exists an *availability* operation ``X`` on write ``W`` such that: |
| |
| - ``X`` happens-before ``Y``, and, |
| - ``X`` and ``Y`` specify inclusive scopes. |
| |
| Then ``Y`` makes ``W`` visible in the common scope instance ``S`` of ``X`` and |
| ``Y``, and every subscope instance of ``S`` that includes ``Y``. |
| |
| - There exists a *visibility* operation ``X`` on write ``W`` such that: |
| |
| - ``X`` happens-before ``Y``, and, |
| - ``X`` makes ``W`` visible in a scope instance ``S1`` that includes ``Y``, and, |
| - ``X`` is included in the scope instance ``S2`` of ``Y``. |
| |
| Then ``Y`` makes ``W`` visible in the intersection ``S`` of ``S1`` and ``S2``, |
| and every subscope instance of ``S`` that includes ``Y``. |
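| |
| The first case above can be sketched as a release/acquire pair with inclusive |
| scopes. The names ``@producer`` and ``@consumer`` and the single-flag protocol |
| are assumptions for illustration. |
| |
| .. code-block:: llvm |
| |
| define void @producer(ptr %data, ptr %flag) { |
|   store i32 42, ptr %data, align 4                                   ; W |
|   ; X: MakeAvailable that follows W, hence an availability operation on W. |
|   store atomic i32 1, ptr %flag syncscope("agent") release, align 4 |
|   ret void |
| } |
| |
| define i32 @consumer(ptr %data, ptr %flag) { |
| entry: |
|   br label %spin |
| spin: |
|   ; Y: MakeVisible operation at "agent" scope. |
|   %f = load atomic i32, ptr %flag syncscope("agent") acquire, align 4 |
|   %ready = icmp eq i32 %f, 1 |
|   br i1 %ready, label %done, label %spin |
| done: |
|   ; R: a plain load that follows Y in program order. |
|   %v = load i32, ptr %data, align 4 |
|   ret i32 %v |
| } |
| |
| If both functions run on the same agent, ``X`` happens-before the final ``Y`` |
| and the two operations specify inclusive scopes. ``Y`` is then a visibility |
| operation on ``W``, and makes ``W`` visible in the common "agent" scope |
| instance and in every subscope instance of it that includes ``Y``. |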
| |
| Location Order |
| -------------- |
| |
| A write ``W`` is *location-ordered* before an access ``Y`` to the same address |
| if ``W`` is program-ordered before ``Y``. |
| |
| A write ``W`` is *location-ordered* before a write ``W1`` to the same address if |
| there exists an availability operation ``Z`` on ``W`` such that: |
| |
| - ``Z`` happens-before ``W1``, and, |
| - ``W1`` is included in ``Z``'s scope instance. |
| |
| A write ``W`` is *location-ordered* before a read ``R`` to the same address if |
| there exists a visibility operation ``Z`` on write ``W`` such that: |
| |
| - ``Z`` is ``R`` itself, or, |
| - ``Z`` precedes ``R`` in program order. |
| |
| The AMDGPU memory model overrides the rules that define the value of each byte |
| read in the :ref:`LLVM memory model<memmodel>`, as follows. |
| |
| Every (defined) read operation ``R`` reads a series of bytes written by |
| (defined) write operations. Each initialized global is assumed to have an |
| initial *system scoped* atomic write operation that is *location-ordered* before |
| any other read or write to that same location. |
| |
| For each byte of a read ``R``, ``R`` may see any write to the same byte, except: |
| |
| - If a write ``W1`` is *location-ordered* before a write ``W2``, and ``W2`` is |
| *location-ordered* before a read ``R``, then ``R`` may not see ``W1``. |
| - If a read ``R`` happens-before a write ``W3``, then ``R`` may not see ``W3``. |
| |
| The value returned by ``R`` is then defined as follows: |
| |
| - If no write is *location-ordered* before a read ``R``, then ``R`` returns |
| ``undef``. |
| - Otherwise if the set consisting of ``R`` and all writes that ``R`` may see |
| contains only atomic operations with inclusive scopes, then ``R`` returns the |
| value written by one of those writes. |
| - Otherwise, if ``R`` may see some write that is not *location-ordered* before |
| ``R``, then ``R`` returns ``undef``. |
| - Otherwise, if ``R`` may see exactly one write ``W``, then ``R`` returns the |
| value written by ``W``. |
| - Otherwise, ``R`` returns ``undef``. |
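| |
| To illustrate how a missing visibility operation can affect the value read, the |
| following sketch marks the consumer's ``acquire`` load as non-av. The function |
| names and the flag protocol are hypothetical, and the conclusion drawn below is |
| one reading of the rules above rather than a statement about any particular |
| implementation. |
| |
| .. code-block:: llvm |
| |
| define void @producer(ptr %data, ptr %flag) { |
|   store i32 42, ptr %data, align 4                                   ; W |
|   ; MakeAvailable operation at "agent" scope that follows W. |
|   store atomic i32 1, ptr %flag syncscope("agent") release, align 4 |
|   ret void |
| } |
| |
| define i32 @consumer_non_av(ptr %data, ptr %flag) { |
| entry: |
|   br label %spin |
| spin: |
|   ; Acquire ordering, but no MakeVisible operation because of the metadata. |
|   %f = load atomic i32, ptr %flag syncscope("agent") acquire, align 4, !mmra !0 |
|   %ready = icmp eq i32 %f, 1 |
|   br i1 %ready, label %done, label %spin |
| done: |
|   %v = load i32, ptr %data, align 4                                  ; R |
|   ret i32 %v |
| } |
| |
| !0 = !{!"amdgcn-av", !"none"} |
| |
| Here ``W`` happens-before ``R``, but no visibility operation on ``W`` is ``R`` |
| itself or precedes ``R`` in program order, so ``W`` is not *location-ordered* |
| before ``R``. Since ``R`` may still see ``W``, the rules above give ``undef``. |
| Removing the metadata from the ``acquire`` load restores the ``MakeVisible`` |
| operation, which then acts as a visibility operation on ``W`` that precedes |
| ``R`` in program order. |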
| |
| Properties |
| ========== |
| |
| .. tip:: |
| |
| This section is informational. |
| |
| The following properties follow from the definitions above: |
| |
| 1. **Happens-before is necessary for location-order.** A write ``W`` is |
| *location-ordered* before a read ``R`` only if ``W`` happens-before ``R``. |
| This follows from the definition of availability and visibility operations, |
| which always require a happens-before link with the preceding operation in |
| the chain. |
| |
| 2. **A write cannot be made available in a scope that does not contain it.** The |
| definition of an availability operation ``X`` requires that ``X``'s scope |
| instance includes ``W`` as a precondition. Since every scope instance that |
| includes ``X`` also includes ``W``, availability cannot reach a scope |
| instance that excludes ``W``. In other words, availability can only "expand |
| outwards" into progressively larger scopes. |
| |
| 3. **Visibility is bounded by availability.** When a write is available in a |
| scope instance, it can be made visible in that scope instance by a visibility |
| operation with the corresponding scope. Subsequent ``MakeVisible`` operations |
| make that write visible into narrower scope instances towards the observer. |
| |
| 4. **A write can be made visible in a scope instance that does not contain it.** |
| The definition of a *visibility operation* anchors scope instances to the |
| observer (``Y``), not to the original write. The only precondition is that the |
| write must already be visible or available in the scope instance of the |
| visibility operation. |
| |
| 5. **Availability and visibility chains.** For a write ``W`` to be visible to a |
| read ``R`` anywhere in the system, a sufficient condition is a chain of |
| happens-before edges that includes availability and visibility operations with |
| inclusive scopes. It is not necessary that ``W`` and ``R`` themselves have |
| inclusive scopes. Each link in the availability and visibility definitions |
| only checks the immediate predecessor, so intermediate operations can bridge |
| scope gaps that the endpoints cannot satisfy directly. Such a chain passes |
| through at least one availability operation and at least one visibility |
| operation with inclusive scopes, such that their common scope includes both |
| ``W`` and ``R``. |
| |
| .. _amdgcn-av-vulkan: |
| |
| The Vulkan Memory Model |
| ======================= |
| |
| The AMDGPU memory model draws heavily on the Vulkan memory model. In |
| particular, the following instructions are equivalent. |
| |
| .. csv-table:: |
| :header: "LLVM", "SPIR-V", "Available/Visible Semantics" |
| :widths: 20, 20, 60 |
| |
| "``load``", "``OpLoad NonPrivatePointer``", "\-" |
| "``load-visible``", "``OpLoad NonPrivatePointer``", "``MakePointerVisible``" |
| "``store``", "``OpStore NonPrivatePointer``", "\-" |
| "``store-available``", "``OpStore NonPrivatePointer``", "``MakePointerAvailable``" |
| "``load atomic``", "``OpAtomicLoad``", "``MakePointerVisible``. Also ``MakeVisible`` when order is at least ``acquire``." |
| "``load atomic !{!""amdgcn-av"", !""none""}``", "``OpAtomicLoad``", "``MakePointerVisible``" |
| "``store atomic``", "``OpAtomicStore``", "``MakePointerAvailable``. Also ``MakeAvailable`` when order is at least ``release``." |
| "``store atomic !{!""amdgcn-av"", !""none""}``", "``OpAtomicStore``", "``MakePointerAvailable``" |
| "``fence``", "``OpMemoryBarrier``", "``MakeAvailable`` when order is at least ``release``, and ``MakeVisible`` when order is at least ``acquire``." |
| "``fence !{!""amdgcn-av"", !""none""}``", "``OpMemoryBarrier``", "\-" |
| |
| .. note:: |
| |
| The above table is representative only, and does not aim to be exhaustive. In |
| particular, it does not list composite atomic operations like ``atomicrmw`` |
| and ``cmpxchg``. The ordering and semantics of these operations can be |
| determined by combining suitable rules such as: |
| |
| - "``MakeAvailable`` if the order is at least ``release``, and the operation |
| results in a store", |
| - "Only if it is not marked as ``!{!"amdgcn-av", !"none"}``", etc. |
| |
| The AMDGPU memory model is a special case of the Vulkan memory model: |
| |
| a. LLVM fence/atomic ordering operations have ``MakeAvailable`` / |
| ``MakeVisible`` semantics by default, thus satisfying the availability and |
| visibility chains required in Vulkan. Hence the LLVM memory model is a |
| "strong" subset of the Vulkan memory model. |
| b. The AMDGPU memory model described here makes it possible to opt out of the |
| default ``MakeAvailable`` and ``MakeVisible`` semantics, and instead specify |
| them at selected places, including the new *load-visible* and |
| *store-available* operations. This expands the subset of the Vulkan memory |
| model that can now be expressed in LLVM IR. |