 ===================================
 Memory Model Relaxation Annotations
 ===================================

 .. contents::
    :local:

 Introduction
 ============

 Memory Model Relaxation Annotations (MMRAs) are target-defined properties
 on instructions that can be used to selectively relax constraints placed
 by the memory model. For example:

 * The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
   memory operations to be reordered across ``acquire`` or ``release``
   operations.
 * OpenCL APIs expose primitives to only fence a specific set of address
   spaces. Carrying that information to the backend can enable the
   use of faster synchronization instructions, rather than fencing all
   address spaces every time.

 MMRAs offer an opt-in system for targets to relax the default LLVM
 memory model.
 As such, they are attached to an operation using LLVM metadata which
 can always be dropped without affecting correctness.

 Definitions
 ===========

 memory operation
     A load, a store, an atomic, or a function call that is marked as
     accessing memory.

 synchronizing operation
     An instruction that synchronizes memory with other threads (e.g.
     an atomic or a fence).

 tag
     Metadata attached to a memory or synchronizing operation
     that represents some target-defined property regarding memory
     synchronization.

     An operation may have multiple tags that each represent a different
     property.

     A tag is composed of a pair of metadata strings: a *prefix* and a *suffix*.

     In LLVM IR, the pair is represented using a metadata tuple.
     In other cases (comments, documentation, etc.), we may use the
     ``prefix:suffix`` notation.
     For example:

     .. code-block::
       :caption: Example: Tags in Metadata

       !0 = !{!"scope", !"workgroup"}  # scope:workgroup
       !1 = !{!"scope", !"device"}     # scope:device
       !2 = !{!"scope", !"system"}     # scope:system

     .. note::

       The only semantics relevant to the optimizer is the
       "compatibility" relation defined below. All other
       semantics are target defined.

     Tags can also be organised in lists to allow operations
     to specify all of the tags they belong to. Such a list
     is referred to as a "set of tags".

     .. code-block::
       :caption: Example: Set of Tags in Metadata

       !0 = !{!"scope", !"workgroup"}
       !1 = !{!"sync-as", !"private"}
       !2 = !{!0, !1}

     .. note::

       If an operation does not have MMRA metadata, it's treated as if
       it has an empty list (``!{}``) of tags.

     Note that it is not an error if a tag is not recognized by the
     instruction it is applied to, or by the current target.
     Such tags are simply ignored.

     Both synchronizing operations and memory operations can have
     zero or more tags attached to them using the ``!mmra`` syntax.

     For the sake of readability in examples below,
     we use a (non-functional) short syntax to represent MMRA metadata:

     .. code-block::
       :caption: Short Syntax Example

       store %ptr1 # foo:bar
       store %ptr1 !mmra !{!"foo", !"bar"}

     These two notations can be used in this document and are strictly
     equivalent. However, only the second version is functional.

 compatibility
     Two sets of tags are said to be *compatible* iff, for every unique
     tag prefix P present in at least one set:

     - the other set contains no tag with prefix P, or
     - at least one tag with prefix P is common to both sets.

     The above definition implies that an empty set is always compatible
     with any other set. This is an important property as it ensures that
     if a transform drops the metadata on an operation, it can never affect
     correctness. In other words, the memory model cannot be relaxed further
     by deleting metadata from instructions.
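
 The compatibility relation can be sketched in a few lines of Python. This is
 only an illustrative model, not LLVM's implementation: tags are modeled as
 ``(prefix, suffix)`` pairs, a set of tags as a Python ``set``, and the helper
 names ``suffixes_by_prefix`` and ``compatible`` are made up for this sketch.

```python
from collections import defaultdict


def suffixes_by_prefix(tags):
    """Group a set of (prefix, suffix) tags into {prefix: {suffixes}}."""
    grouped = defaultdict(set)
    for prefix, suffix in tags:
        grouped[prefix].add(suffix)
    return grouped


def compatible(a, b):
    """Return True iff the tag sets a and b are compatible.

    A prefix that appears in only one set satisfies the first bullet of the
    definition, so only prefixes present in both sets need to be checked.
    """
    grouped_a, grouped_b = suffixes_by_prefix(a), suffixes_by_prefix(b)
    for prefix in grouped_a.keys() & grouped_b.keys():
        # The second bullet requires at least one common tag for this prefix.
        if not (grouped_a[prefix] & grouped_b[prefix]):
            return False
    return True


workgroup = {("scope", "workgroup")}
device = {("scope", "device")}
print(compatible(workgroup, device))  # False: same prefix, no common tag
print(compatible(workgroup, set()))   # True: the empty set is always compatible
```

 Because the loop only visits prefixes that occur in both sets, removing tags
 from either set can only turn an incompatible pair into a compatible one,
 which matches the observation above about dropping metadata.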

 .. _HappensBefore:

 The *happens-before* Relation
 ==============================

 Compatibility checks can be used to opt out of the *happens-before* relation
 established between two instructions.

 Ordering
     When two instructions' metadata are not compatible, any program order
     between them is not in *happens-before*.

     For example, consider two tags ``foo:bar`` and
     ``foo:baz`` exposed by a target:

     .. code-block::

        A: store %ptr1                 # foo:bar
        B: store %ptr2                 # foo:baz
        X: store atomic release %ptr3  # foo:bar

     In the above figure, ``A`` is compatible with ``X``, and hence ``A``
     happens-before ``X``. But ``B`` is not compatible with
     ``X``, and hence it is not happens-before ``X``.

 Synchronization
     If a synchronizing operation has one or more tags, then whether it
     synchronizes-with and participates in the ``seq_cst`` order with
     other operations is target dependent.

     Whether the following example synchronizes with another sequence depends
     on the target-defined semantics of ``foo:bar`` and ``foo:bux``.

     .. code-block::

        fence release               # foo:bar
        store atomic %ptr1          # foo:bux

 Examples
 --------

 Example 1:
     .. code-block::

       A: store ptr addrspace(1) %ptr2                  # sync-as:1 vulkan:nonprivate
       B: store atomic release ptr addrspace(1) %ptr3   # sync-as:0 vulkan:nonprivate

     A and B are not ordered relative to each other
     (no *happens-before*) because their sets of tags are not compatible.

     Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
     For instance, in Example 1, a store-release to a location in ``addrspace(1)``
     is meant to synchronize only with operations happening in ``addrspace(0)``.

 Example 2:
     .. code-block::

       A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
       B: store atomic release ptr addrspace(1) %ptr3  # sync-as:1 vulkan:nonprivate

     The ordering of A and B is unaffected because their sets of tags are
     compatible.

     Note that A and B may or may not be in *happens-before* due to other reasons.

 Example 3:
     .. code-block::

       A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
       B: store atomic release ptr addrspace(1) %ptr3  # vulkan:nonprivate

     The ordering of A and B is unaffected because their sets of tags are
     compatible.

 Example 4:
     .. code-block::

       A: store ptr addrspace(1) %ptr2                 # sync-as:1
       B: store atomic release ptr addrspace(1) %ptr3  # sync-as:2

     A and B do not have to be ordered relative to each other
     (no *happens-before*) because their sets of tags are not compatible.
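
 The four examples above can be checked mechanically. The sketch below is
 illustrative only: ``parse_tags`` is a hypothetical helper that reads the
 non-functional short syntax used in this document, and ``compatible`` is a
 compact restatement of the compatibility definition.

```python
def parse_tags(short):
    """Parse 'prefix:suffix prefix:suffix ...' into a set of (prefix, suffix)."""
    return {tuple(tag.split(":", 1)) for tag in short.replace(",", " ").split()}


def compatible(a, b):
    """True iff every prefix present in both sets has a tag common to both."""
    shared_prefixes = {p for p, _ in a} & {p for p, _ in b}
    return all(
        any(tag in b for tag in a if tag[0] == prefix)
        for prefix in shared_prefixes
    )


# (tags of A, tags of B) for Examples 1 to 4, copied from the text above.
examples = {
    1: ("sync-as:1 vulkan:nonprivate", "sync-as:0 vulkan:nonprivate"),
    2: ("sync-as:1 vulkan:nonprivate", "sync-as:1 vulkan:nonprivate"),
    3: ("sync-as:1 vulkan:nonprivate", "vulkan:nonprivate"),
    4: ("sync-as:1", "sync-as:2"),
}

results = {
    number: compatible(parse_tags(a), parse_tags(b))
    for number, (a, b) in examples.items()
}
print(results)  # {1: False, 2: True, 3: True, 4: False}
```

 A result of ``False`` corresponds to "no *happens-before*" in the text, while
 ``True`` only means that MMRAs leave the ordering unaffected.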

 Use-cases
 =========

 SPIRV ``NonPrivatePointer``
 ---------------------------

 MMRAs can support the SPIRV capability
 ``VulkanMemoryModel``, where synchronizing operations only affect
 memory operations that specify ``NonPrivatePointer`` semantics.

 The example below is generated from a SPIRV program using the
 following recipe:

 - Add ``vulkan:nonprivate`` to every synchronizing operation.
 - Add ``vulkan:nonprivate`` to every non-atomic memory operation
   that is marked ``NonPrivatePointer``.
 - Add ``vulkan:private`` to tags of every non-atomic memory operation
   that is not marked ``NonPrivatePointer``.

 .. code-block::

    Thread T1:
     A: store %ptr1                 # vulkan:nonprivate
     B: store %ptr2                 # vulkan:private
     X: store atomic release %ptr3  # vulkan:nonprivate

    Thread T2:
     Y: load atomic acquire %ptr3   # vulkan:nonprivate
     C: load %ptr2                  # vulkan:private
     D: load %ptr1                  # vulkan:nonprivate

 Compatibility ensures that operation ``A`` is ordered
 relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
 If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
 No such relation can be inferred about operations ``B`` and ``C``.
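
 The tagging recipe above can be sketched as a small function. This is a
 hypothetical model rather than a real SPIRV translator: the function name
 ``vulkan_tag`` and its two boolean parameters are assumptions made for
 illustration.

```python
def vulkan_tag(is_synchronizing, non_private_pointer):
    """Pick the vulkan: tag for one operation, following the recipe above."""
    if is_synchronizing or non_private_pointer:
        return ("vulkan", "nonprivate")
    return ("vulkan", "private")


# Thread T1 from the example: (name, is_synchronizing, non_private_pointer).
thread_t1 = [
    ("A", False, True),   # non-atomic store marked NonPrivatePointer
    ("B", False, False),  # non-atomic store without NonPrivatePointer
    ("X", True, False),   # atomic release store
]
tags = {name: vulkan_tag(sync, npp) for name, sync, npp in thread_t1}

# Every operation carries exactly one tag and all tags share the "vulkan"
# prefix, so two operations are compatible exactly when their tags are equal.
ordered_with_x = [name for name in ("A", "B") if tags[name] == tags["X"]]
print(ordered_with_x)  # ['A']
```

 Only ``A`` ends up compatible with ``X``, which is why ``B`` is left out of
 the ordering in the example.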

 .. note::
    The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
    considers all atomic operations non-private.

    Whether ``vulkan:nonprivate`` would be specified on atomic operations is
    an implementation detail, as an atomic operation is always ``nonprivate``.
    The implementation may choose to be explicit and emit IR with
    ``vulkan:nonprivate`` on every atomic operation, or it could choose to
    only emit ``vulkan:private`` and assume ``vulkan:nonprivate``
    by default.

 Operations marked with ``vulkan:private`` effectively opt out of the
 happens-before order in a SPIRV program since they are incompatible
 with every synchronizing operation. Note that SPIRV operations that
 are not marked ``NonPrivatePointer`` are not entirely private to the
 thread --- they are implicitly synchronized at the start or end of a
 thread by the Vulkan *system-synchronizes-with* relationship. This
 example assumes that the target-defined semantics of
 ``vulkan:private`` correctly implements this property.

 This scheme is general enough to express the interoperability of SPIRV
 programs with other environments.

 .. code-block::

    Thread T1:
    A: store %ptr1                 # vulkan:nonprivate
    X: store atomic release %ptr2  # vulkan:nonprivate

    Thread T2:
    Y: load atomic acquire %ptr2   # foo:bar
    B: load %ptr1

 In the above example, thread ``T1`` originates from a SPIRV program
 while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
 can synchronize with ``Y`` is target defined.  If ``X`` synchronizes
 with ``Y``, then ``A`` happens-before ``B`` (because A/X and
 Y/B are compatible).

 Implementation Example
 ~~~~~~~~~~~~~~~~~~~~~~

 Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
 where all memory operations are cached, and the entire cache is
 flushed or invalidated at a ``release`` or ``acquire`` respectively. A
 possible scheme is that when translating a SPIRV program, memory
 operations marked ``NonPrivatePointer`` should not be cached, and the
 cache contents should not be touched during an ``acquire`` and
 ``release`` operation.

 This could be implemented using the tags that share the ``vulkan:`` prefix,
 as follows:

 - For memory operations:

   - Operations with ``vulkan:nonprivate`` should bypass the cache.
   - Operations with ``vulkan:private`` should be cached.
   - Operations that specify neither or both should conservatively
     bypass the cache to ensure correctness.

 - For synchronizing operations:

   - Operations with ``vulkan:nonprivate`` should not flush or
     invalidate the cache.
   - Operations with ``vulkan:private`` should flush or invalidate the cache.
   - Operations that specify neither or both should conservatively
     flush or invalidate the cache to ensure correctness.

 .. note::
    In such an implementation, dropping the metadata on an operation, while
    not affecting correctness, may have big performance implications,
    e.g. an operation bypasses the cache when it does not need to.
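
 The decision rules listed above can be sketched as two predicates. This is a
 minimal illustration under the assumptions of this section, not code from any
 real backend; the names ``bypasses_cache`` and ``flushes_or_invalidates`` are
 hypothetical.

```python
NONPRIVATE = ("vulkan", "nonprivate")
PRIVATE = ("vulkan", "private")


def bypasses_cache(tags):
    """Memory operations: use the cache only when tagged private and not nonprivate."""
    only_private = PRIVATE in tags and NONPRIVATE not in tags
    return not only_private


def flushes_or_invalidates(tags):
    """Synchronizing operations: skip the flush/invalidate only when tagged
    nonprivate and not private."""
    only_nonprivate = NONPRIVATE in tags and PRIVATE not in tags
    return not only_nonprivate


for tags in ({NONPRIVATE}, {PRIVATE}, set(), {NONPRIVATE, PRIVATE}):
    print(sorted(tags), bypasses_cache(tags), flushes_or_invalidates(tags))
```

 Both predicates fall back to the conservative answer when an operation has
 neither tag or both tags, so dropped metadata costs performance but not
 correctness.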

 Memory Types
 ------------

 MMRAs may express the selective synchronization of
 different memory types.

 As an example, a target may expose a ``sync-as:<N>`` tag to
 pass information about which address spaces are synchronized by the
 execution of a synchronizing operation.

 .. note::
   Address spaces are used here as a common example, but this concept
   can apply for other "memory types". What "memory types" means here is
   up to the target.

 .. code-block::

    # let 1 = global address space
    # let 3 = local address space

    Thread T1:
    A: store %ptr1                                  # sync-as:1
    B: store %ptr2                                  # sync-as:3
    X: store atomic release ptr addrspace(1) %ptr3  # sync-as:3

    Thread T2:
    Y: load atomic acquire ptr addrspace(1) %ptr3   # sync-as:3
    C: load %ptr2                                   # sync-as:3
    D: load %ptr1                                   # sync-as:1

 In the above figure, ``X`` and ``Y`` are atomic operations on a
 location in the ``global`` address space. If ``X`` synchronizes with
 ``Y``, then ``B`` happens-before ``C`` in the ``local`` address
 space. But no such statement can be made about operations ``A`` and
 ``D``, although they are performed on a location in the ``global``
 address space.

 Implementation Example: Adding Address Space Information to Fences
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Languages such as OpenCL C provide fence operations such as
 ``atomic_work_item_fence`` that can take an explicit address
 space to fence.

 By default, LLVM has no means to carry that information in the IR, so
 the information is lost during lowering to LLVM IR. This means that
 targets such as AMDGPU have to conservatively emit instructions to
 fence all address spaces in all cases, which can have a noticeable
 performance impact in high-performance applications.

 MMRAs may be used to preserve that information at the IR level, all the
 way through code generation. For example, a fence that only affects the
 global address space ``addrspace(1)`` may be lowered as

 .. code-block::

     fence release # sync-as:1

 and the target may use the presence of ``sync-as:1`` to infer that it
 only needs to emit instructions that fence the global address space.

 Note that as MMRAs are opt-in, a fence that does not have MMRA metadata
 could still be lowered conservatively, so this optimization would only
 apply if the front-end emits the MMRA metadata on the fence instructions.
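
 How a target might act on such a fence can be sketched as follows. This is a
 hypothetical model: the address-space numbering, the
 ``KNOWN_ADDRESS_SPACES`` set, and the ``address_spaces_to_fence`` helper are
 assumptions of this sketch and do not describe the AMDGPU backend.

```python
# Hypothetical numbering for this sketch: "1" = global, "3" = local.
KNOWN_ADDRESS_SPACES = frozenset({"1", "3"})


def address_spaces_to_fence(tags):
    """Return the address spaces a fence carrying these tags has to cover."""
    requested = {suffix for prefix, suffix in tags if prefix == "sync-as"}
    # No sync-as tag, or a suffix this sketch does not understand: stay
    # conservative and fence everything.
    if not requested or not requested <= KNOWN_ADDRESS_SPACES:
        return KNOWN_ADDRESS_SPACES
    return frozenset(requested)


print(sorted(address_spaces_to_fence({("sync-as", "1")})))  # ['1']
print(sorted(address_spaces_to_fence(set())))               # ['1', '3']
```

 Falling back to all address spaces when the metadata is missing mirrors the
 conservative lowering described above.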

 Additional Topics
 =================

 .. note::

   The following sections are informational.

 Performance Impact
 ------------------

 MMRAs are a way to capture optimization opportunities in the program.
 But when an operation mentions no tags or conflicting tags,
 the target may need to produce conservative code to ensure correctness
 at the cost of performance. This can happen in the following situations:

 1. When a target first introduces MMRAs, the
    frontend might not have been updated to emit them.
 2. An optimization may drop MMRA metadata.
 3. An optimization may add arbitrary tags to an operation.

 Note that targets can always choose to ignore (or even drop) MMRAs
 and revert to the default behavior/codegen heuristics without
 affecting correctness.

 Consequences of the Absence of *happens-before*
 -----------------------------------------------

 In the :ref:`happens-before<HappensBefore>` section, we defined how a
 *happens-before* relation between two instructions can be broken
 by leveraging compatibility between MMRAs. When the instructions
 are incompatible and there is no *happens-before* relation, we say
 that the instructions "do not have to be ordered relative to each
 other".

 "Ordering" in this context is a very broad term which covers both
 static and runtime aspects.

 When there is no ordering constraint, we *could* statically reorder
 the instructions in an optimizer transform if the reordering does
 not break other constraints such as single-location coherence.
 Static reordering is one consequence of breaking *happens-before*,
 but is not the most interesting one.

 Run-time consequences are more interesting. When there is a
 *happens-before* relation between instructions, the target has to emit
 synchronization code to ensure other threads will observe the effects of
 the instructions in the right order.

 For instance, the target may have to wait for previous loads & stores to
 finish before starting a fence-release, or there may be a need to flush a
 memory cache before executing the next instruction.
 In the absence of *happens-before*, there is no such requirement and
 no waiting or flushing is required. This may noticeably speed up
 execution in some cases.

 Combining Operations
 --------------------

 If a pass can combine multiple memory or synchronizing operations
 into one, it needs to be able to combine MMRAs. One possible way to
 achieve this is by doing a prefix-wise union of the tag sets.

 Let A and B be two tag sets, and U be the prefix-wise union of A and B.
 For every unique tag prefix P present in A or B:

 * If either A or B has no tags with prefix P, no tags with prefix
   P are added to U.
 * If both A and B have at least one tag with prefix P, all tags with prefix
   P from both sets are added to U.

 Passes should avoid aggressively combining MMRAs, as this can result
 in significant losses of information. While this cannot affect
 correctness, it may affect performance.

 As a general rule of thumb, common passes such as SimplifyCFG that
 aggressively combine/reorder operations should only combine
 instructions that have identical sets of tags.
 Passes that combine less frequently, or that are well aware of the cost
 of combining the MMRAs can use the prefix-wise union described above.
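
 The prefix-wise union defined by the two bullets above can be sketched in
 Python. This is an illustrative model only: tags are ``(prefix, suffix)``
 pairs, and the ``prefix_wise_union`` name and the sample tags are assumptions
 of this sketch.

```python
def prefix_wise_union(a, b):
    """Combine two tag sets following the two bullets above."""
    # Only prefixes that occur in both sets contribute tags to the result.
    shared_prefixes = {p for p, _ in a} & {p for p, _ in b}
    return {tag for tag in a | b if tag[0] in shared_prefixes}


a = {("scope", "workgroup"), ("sync-as", "1")}
b = {("scope", "device")}

# "scope" occurs in both sets, so both scope tags are kept.
# "sync-as" occurs only in a, so no sync-as tag is kept.
print(sorted(prefix_wise_union(a, b)))
# [('scope', 'device'), ('scope', 'workgroup')]

# Combining with an operation that has no tags yields no tags.
print(prefix_wise_union(a, set()))  # set()
```

 The result drops ``sync-as:1`` entirely, which illustrates the loss of
 information mentioned above.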

 Examples:

 .. code-block::

     A: store release %ptr1  # foo:x, foo:y, bar:x
     B: store release %ptr2  # foo:x, bar:y

     # Unique prefixes P = [foo, bar]
     # Both A and B have tags with prefix "foo", so all "foo" tags are added to U.
     # Both A and B have tags with prefix "bar", so all "bar" tags are added to U.
     U: store release %ptr3  # foo:x, foo:y, bar:x, bar:y

 .. code-block::

     A: store release %ptr1  # foo:x, foo:y
     B: store release %ptr2  # foo:x, bux:y

     # Unique prefixes P = [foo, bux]
     # Both A and B have tags with prefix "foo", so all "foo" tags are added to U.
     # No tags have the prefix "bux" in A, so no "bux" tags are added to U.
     U: store release %ptr3  # foo:x, foo:y

 .. code-block::

     A: store release %ptr1
     B: store release %ptr2  # foo:x, bar:y

     # Unique prefixes P = [foo, bar]
     # No tags with "foo" or "bar" in A, so no tags added.
     U: store release %ptr3