|  | .. _convergence-and-uniformity: | 
|  |  | 
|  | ========================== | 
|  | Convergence And Uniformity | 
|  | ========================== | 
|  |  | 
|  | .. contents:: | 
|  | :local: | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | In some environments, groups of threads execute the same program in parallel, | 
|  | where efficient communication within a group is established using special | 
|  | primitives called :ref:`convergent operations<convergent_operations>`. The | 
|  | outcome of a convergent operation is sensitive to the set of threads that | 
|  | participate in it. | 
|  |  | 
|  | The intuitive picture of *convergence* is built around threads executing in | 
|  | "lock step" --- a set of threads is thought of as *converged* if they are all | 
|  | executing "the same sequence of instructions together". Such threads may | 
|  | *diverge* at a *divergent branch*, and they may later *reconverge* at some | 
|  | common program point. | 
|  |  | 
|  | In this intuitive picture, when converged threads execute an instruction, the | 
|  | resulting value is said to be *uniform* if it is the same in those threads, and | 
|  | *divergent* otherwise. Correspondingly, a branch is said to be a uniform branch | 
|  | if its condition is uniform, and it is a divergent branch otherwise. | 
|  |  | 
|  | But the assumption of lock-step execution is not necessary for describing | 
|  | communication at convergent operations. It also constrains the implementation | 
|  | (compiler as well as hardware) by overspecifying how threads execute in such a | 
|  | parallel environment. To eliminate this assumption: | 
|  |  | 
|  | - We define convergence as a relation between the execution of each instruction | 
|  | by different threads and not as a relation between the threads themselves. | 
|  | This definition is reasonable for known targets and is compatible with the | 
|  | semantics of :ref:`convergent operations<convergent_operations>` in LLVM IR. | 
|  | - We also define uniformity in terms of this convergence. The output of an | 
|  | instruction can be examined for uniformity across multiple threads only if the | 
|  | corresponding executions of that instruction are converged. | 
|  |  | 
|  | This document describes a static analysis for determining convergence at each | 
|  | instruction in a function. The analysis extends previous work on divergence | 
|  | analysis [DivergenceSPMD]_ to cover irreducible control-flow. The described | 
|  | analysis is used in LLVM to implement a UniformityAnalysis that determines the | 
|  | uniformity of value(s) computed at each instruction in an LLVM IR or MIR | 
|  | function. | 
|  |  | 
|  | .. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian | 
|  | Hack. 2021. An Abstract Interpretation for SPMD Divergence on | 
|  | Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL, | 
|  | Article 31 (January 2021), 35 pages. | 
|  | https://doi.org/10.1145/3434312 | 
|  |  | 
|  | Motivation | 
|  | ========== | 
|  |  | 
|  | Divergent branches constrain | 
|  | program transforms such as changing the CFG or moving a convergent | 
|  | operation to a different point of the CFG. Performing these | 
|  | transformations across a divergent branch can change the sets of | 
|  | threads that execute convergent operations convergently. While these | 
|  | constraints are out of scope for this document, | 
|  | uniformity analysis allows these transformations to identify | 
|  | uniform branches where these constraints do not hold. | 
|  |  | 
|  | Uniformity is also useful by itself on targets that execute threads in | 
|  | groups with shared execution resources (e.g. waves, warps, or | 
|  | subgroups): | 
|  |  | 
|  | - Uniform outputs can potentially be computed or stored on shared | 
|  | resources. | 
|  | - These targets must "linearize" a divergent branch to ensure that | 
|  | each side of the branch is followed by the corresponding threads in | 
|  | the same group. But linearization is unnecessary at uniform | 
|  | branches, since the whole group of threads follows either one side | 
|  | of the branch or the other. | 
|  |  | 
|  | Terminology | 
|  | =========== | 
|  |  | 
|  | Cycles | 
|  | Described in :ref:`cycle-terminology`. | 
|  |  | 
|  | Closed path | 
|  | Described in :ref:`cycle-closed-path`. | 
|  |  | 
|  | Disjoint paths | 
|  | Two paths in a CFG are said to be disjoint if the only nodes common | 
|  | to both are the start node or the end node, or both. | 
|  |  | 
|  | Join node | 
|  | A join node of a branch is a node reachable along disjoint paths | 
|  | starting from that branch. | 
|  |  | 
|  | Diverged path | 
|  | A diverged path is a path that starts from a divergent branch and | 
|  | either reaches a join node of the branch or reaches the end of the | 
|  | function without passing through any join node of the branch. | 
|  |  | 
|  | .. _convergence-dynamic-instances: | 
|  |  | 
|  | Threads and Dynamic Instances | 
|  | ============================= | 
|  |  | 
|  | Each occurrence of an instruction in the program source is called a | 
|  | *static instance*. When a thread executes a program, each execution of | 
|  | a static instance produces a distinct *dynamic instance* of that | 
|  | instruction. | 
|  |  | 
|  | Each thread produces a unique sequence of dynamic instances: | 
|  |  | 
|  | - The sequence is generated along branch decisions and loop | 
|  | traversals. | 
|  | - Starts with a dynamic instance of a "first" instruction. | 
|  | - Continues with dynamic instances of successive "next" | 
|  | instructions. | 
|  |  | 
|  | Threads are independent; some targets may choose to execute them in | 
|  | groups in order to share resources when possible. | 
|  |  | 
|  | .. figure:: convergence-natural-loop.png | 
|  | :name: convergence-natural-loop | 
|  |  | 
|  | .. table:: | 
|  | :name: convergence-thread-example | 
|  | :align: left | 
|  |  | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      | | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit | | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread 2 | Entry1 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit | | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | In the above table, each row is a different thread, listing the | 
|  | dynamic instances produced by that thread from left to right. Each | 
|  | thread executes the same program that starts with an ``Entry`` node | 
|  | and ends with an ``Exit`` node, but different threads may take | 
|  | different paths through the control flow of the program. The columns | 
|  | are numbered merely for convenience, and empty cells have no special | 
|  | meaning. Dynamic instances listed in the same column are converged. | 
|  |  | 
|  | .. _convergence-definition: | 
|  |  | 
|  | Convergence | 
|  | =========== | 
|  |  | 
|  | *Convergence-before* is a strict partial order over dynamic instances | 
|  | that is defined as the transitive closure of: | 
|  |  | 
|  | 1. If dynamic instance ``P`` is executed strictly before ``Q`` in the | 
|  | same thread, then ``P`` is *convergence-before* ``Q``. | 
|  | 2. If dynamic instance ``P`` is executed strictly before ``Q1`` in the | 
|  | same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is | 
|  | *convergence-before* ``Q2``. | 
|  | 3. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2`` | 
|  | is executed strictly before ``Q`` in the same thread, then ``P1`` | 
|  | is *convergence-before* ``Q``. | 
|  |  | 
|  | .. table:: | 
|  | :name: convergence-order-example | 
|  | :align: left | 
|  |  | 
|  | +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |          | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9    | | 
|  | +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread 1 | Entry | ... |     |     |     | S2  | T   | ... | Exit | | 
|  | +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread 2 | Entry | ... |     | Q2  | R   | S1  |     | ... | Exit | | 
|  | +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread 3 | Entry | ... | P   | Q1  |     |     |     | ... |      | | 
|  | +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | The above table shows partial sequences of dynamic instances from | 
|  | different threads. Dynamic instances in the same column are assumed | 
|  | to be converged (i.e., related to each other in the converged-with | 
|  | relation). The resulting convergence order includes the edges ``P -> | 
|  | Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc. | 
|  |  | 
|  | *Converged-with* is a transitive symmetric relation over dynamic instances | 
|  | produced by *different threads* for the *same static instance*. | 
|  |  | 
|  | It is impractical to provide any one definition for the *converged-with* | 
|  | relation, since different environments may wish to relate dynamic instances in | 
|  | different ways. The fact that *convergence-before* is a strict partial order is | 
|  | a constraint on the *converged-with* relation. It is trivially satisfied if | 
|  | different dynamic instances are never converged. Below, we provide a relation | 
|  | called :ref:`maximal converged-with<convergence-maximal>`, which satisifies | 
|  | *convergence-before* and is suitable for known targets. | 
|  |  | 
|  | .. _convergence-note-convergence: | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  | 1. The convergence-before relation is not | 
|  | directly observable. Program transforms are in general free to | 
|  | change the order of instructions, even though that obviously | 
|  | changes the convergence-before relation. | 
|  |  | 
|  | 2. Converged dynamic instances need not be executed at the same | 
|  | time or even on the same resource. Converged dynamic instances | 
|  | of a convergent operation may appear to do so but that is an | 
|  | implementation detail. | 
|  |  | 
|  | 3. The fact that ``P`` is convergence-before | 
|  | ``Q`` does not automatically imply that ``P`` happens-before | 
|  | ``Q`` in a memory model sense. | 
|  |  | 
|  | .. _convergence-maximal: | 
|  |  | 
|  | Maximal Convergence | 
|  | ------------------- | 
|  |  | 
|  | This section defines a constraint that may be used to | 
|  | produce a *maximal converged-with* relation without violating the | 
|  | strict *convergence-before* order. This maximal converged-with | 
|  | relation is reasonable for real targets and is compatible with | 
|  | convergent operations. | 
|  |  | 
|  | The maximal converged-with relation is defined in terms of cycle | 
|  | headers, with the assumption that threads converge at the header on every | 
|  | "iteration" of the cycle. Informally, two threads execute the same iteration of | 
|  | a cycle if they both previously executed the cycle header the same number of | 
|  | times after they entered that cycle. In general, this needs to account for the | 
|  | iterations of parent cycles as well. | 
|  |  | 
|  | **Maximal converged-with:** | 
|  |  | 
|  | Dynamic instances ``X1`` and ``X2`` produced by different threads | 
|  | for the same static instance ``X`` are converged in the maximal | 
|  | converged-with relation if and only if: | 
|  |  | 
|  | - ``X`` is not contained in any cycle, or, | 
|  | - For every cycle ``C`` with header ``H`` that contains ``X``: | 
|  |  | 
|  | - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in | 
|  | the respective thread is convergence-before ``X2``, and, | 
|  | - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in | 
|  | the respective thread is convergence-before ``X1``, | 
|  | - without assuming that ``X1`` is converged with ``X2``. | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  | Cycle headers may not be unique to a given CFG if it is irreducible. Each | 
|  | cycle hierarchy for the same CFG results in a different maximal | 
|  | converged-with relation. | 
|  |  | 
|  | For brevity, the rest of the document restricts the term | 
|  | *converged* to mean "related under the maximal converged-with | 
|  | relation for the given cycle hierarchy". | 
|  |  | 
|  | Maximal convergence can now be demonstrated in the earlier example as follows: | 
|  |  | 
|  | .. table:: | 
|  | :align: left | 
|  |  | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      | | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit | | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread 2 | Entry2 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit | | 
|  | +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | - ``Entry1`` and ``Entry2`` are converged. | 
|  | - ``H1`` and ``H2`` are converged. | 
|  | - ``B1`` and ``B2`` are not converged due to ``H4`` which is not | 
|  | convergence-before ``B1``. | 
|  | - ``H3`` and ``H4`` are converged. | 
|  | - ``H3`` is not converged with ``H5`` due to ``H4`` which is not | 
|  | convergence-before ``H3``. | 
|  | - ``L1`` and ``L2`` are converged. | 
|  | - ``L3`` and ``L4`` are converged. | 
|  | - ``L3`` is not converged with ``L5`` due to ``H5`` which is not | 
|  | convergence-before ``L3``. | 
|  |  | 
|  | .. _convergence-cycle-headers: | 
|  |  | 
|  | Dependence on Cycles Headers | 
|  | ---------------------------- | 
|  |  | 
|  | Contradictions in *convergence-before* are possible only between two | 
|  | nodes that are inside some cycle. The dynamic instances of such nodes | 
|  | may be interleaved in the same thread, and this interleaving may be | 
|  | different for different threads. Cycle headers serve as implicit | 
|  | *points of convergence* in the maximal converged-with relation. | 
|  | When a thread executes a node ``X`` once and then executes it again, | 
|  | it must have followed a closed path in the CFG that includes ``X``. | 
|  | Such a path must pass through the header of at least one cycle --- the | 
|  | smallest cycle that includes the entire closed path. In a given | 
|  | thread, two dynamic instances of ``X`` are either separated by the | 
|  | execution of at least one cycle header, or ``X`` itself is a cycle | 
|  | header. | 
|  |  | 
|  | Consider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such | 
|  | that ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle, | 
|  | with headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread | 
|  | enters the cycle ``Ck``, any of the following is possible: | 
|  |  | 
|  | 1. The thread directly entered cycle ``Ck`` without having executed | 
|  | any of the headers ``H1`` to ``Hk``. | 
|  |  | 
|  | 2. The thread executed some or all of the nested headers one or more | 
|  | times. | 
|  |  | 
|  | The maximal converged-with relation captures the following intuition | 
|  | about cycles: | 
|  |  | 
|  | 1. When two threads enter a top-level cycle ``C1``, they execute | 
|  | converged dynamic instances of every node that is a :ref:`child | 
|  | <cycle-parent-block>` of ``C1``. | 
|  |  | 
|  | 2. When two threads enter a nested cycle ``Ck``, they execute | 
|  | converged dynamic instances of every node that is a child of | 
|  | ``Ck``, until either thread exits ``Ck``, if and only if they | 
|  | executed converged dynamic instances of the last nested header that | 
|  | either thread encountered. | 
|  |  | 
|  | Note that when a thread exits a nested cycle ``Ck``, it must follow | 
|  | a closed path outside ``Ck`` to reenter it. This requires executing | 
|  | the header of some outer cycle, as described earlier. | 
|  |  | 
|  | Consider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1`` | 
|  | and ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``. | 
|  | Maximal convergence relates ``X1`` and ``X2`` as follows: | 
|  |  | 
|  | 1. If neither thread executed any header from ``H1`` to ``Hk``, then | 
|  | ``X1`` and ``X2`` are converged. | 
|  |  | 
|  | 2. Otherwise, if there are no converged dynamic instances ``Q1`` and | 
|  | ``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is | 
|  | possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and | 
|  | ``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and | 
|  | ``X2`` are not converged. | 
|  |  | 
|  | 3. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic | 
|  | instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most | 
|  | recently before ``X1`` and ``X2`` in the respective threads. Then | 
|  | ``X1`` and ``X2`` are converged if and only if there is no dynamic | 
|  | instance of any header from ``H1`` to ``Hk`` that occurs between | 
|  | ``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in | 
|  | thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last | 
|  | point of convergence, with no other header being executed before | 
|  | executing ``X``. | 
|  |  | 
|  | **Example:** | 
|  |  | 
|  | .. figure:: convergence-both-diverged-nested.png | 
|  | :name: convergence-both-diverged-nested | 
|  |  | 
|  | The above figure shows two nested irreducible cycles with headers | 
|  | ``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent | 
|  | branches. The table below shows the convergence between three threads | 
|  | taking different paths through the CFG. Dynamic instances listed in | 
|  | the same column are converged. | 
|  |  | 
|  | .. table:: | 
|  | :align: left | 
|  |  | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 10   | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread1 | Entry | P1  | Q1  | S1  | P3  | Q3  | R1  | S2  | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread2 | Entry | P2  | Q2  |     |     |     | R2  | S3  | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread3 | Entry |     |     |     |     |     | R3  | S4  | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | - ``P2`` and ``P3`` are not converged due to ``S1`` | 
|  | - ``Q2`` and ``Q3`` are not converged due to ``S1`` | 
|  | - ``S1`` and ``S3`` are not converged due to ``R2`` | 
|  | - ``S1`` and ``S4`` are not converged due to ``R3`` | 
|  |  | 
|  | Informally, ``T1`` and ``T2`` execute the inner cycle a different | 
|  | number of times, without executing the header of the outer cycle. All | 
|  | threads converge in the outer cycle when they first execute the header | 
|  | of the outer cycle. | 
|  |  | 
|  | .. _convergence-uniformity: | 
|  |  | 
|  | Uniformity | 
|  | ========== | 
|  |  | 
|  | 1. The output of two converged dynamic instances is uniform if and | 
|  | only if it compares equal for those two dynamic instances. | 
|  | 2. The output of a static instance ``X`` is uniform *for a given set | 
|  | of threads* if and only if it is uniform for every pair of | 
|  | converged dynamic instances of ``X`` produced by those threads. | 
|  |  | 
|  | A non-uniform value is said to be *divergent*. | 
|  |  | 
|  | For a set ``S`` of threads, the uniformity of each output of a static | 
|  | instance is determined as follows: | 
|  |  | 
|  | 1. The semantics of the instruction may specify the output to be | 
|  | uniform. | 
|  | 2. Otherwise, the output is divergent if the static instance is not | 
|  | :ref:`m-converged <convergence-m-converged>`. | 
|  | 3. Otherwise, if the static instance is m-converged: | 
|  |  | 
|  | 1. If it is a PHI node, its output is uniform if and only | 
|  | if for every pair of converged dynamic instances produced by all | 
|  | threads in ``S``: | 
|  |  | 
|  | a. Both instances choose the same output from converged | 
|  | dynamic instances, and, | 
|  | b. That output is uniform for all threads in ``S``. | 
|  | 2. Otherwise, the output is uniform if and only if the input | 
|  | operands are uniform for all threads in ``S``. | 
|  |  | 
|  | Divergent Cycle Exits | 
|  | --------------------- | 
|  |  | 
|  | When a divergent branch occurs inside a cycle, it is possible that a | 
|  | diverged path continues to an exit of the cycle. This is called a | 
|  | divergent cycle exit. If the cycle is irreducible, the diverged path | 
|  | may re-enter and eventually reach a join within the cycle. Such a join | 
|  | should be examined for the :ref:`diverged entry | 
|  | <convergence-diverged-entry>` criterion. | 
|  |  | 
|  | Nodes along the diverged path that lie outside the cycle experience | 
|  | *temporal divergence*, when two threads executing convergently inside | 
|  | the cycle produce uniform values, but exit the cycle along the same | 
|  | divergent path after executing the header a different number of times | 
|  | (informally, on different iterations of the cycle). For a node ``N`` | 
|  | inside the cycle the outputs may be uniform for the two threads, but | 
|  | any use ``U`` outside the cycle receives a value from non-converged | 
|  | dynamic instances of ``N``. An output of ``U`` may be divergent, | 
|  | depending on the semantics of the instruction. | 
|  |  | 
|  | .. _uniformity-analysis: | 
|  |  | 
|  | Static Uniformity Analysis | 
|  | ========================== | 
|  |  | 
|  | Irreducible control flow results in different cycle hierarchies | 
|  | depending on the choice of headers during depth-first traversal. As a | 
|  | result, a static analysis cannot always determine the convergence of | 
|  | nodes in irreducible cycles, and any uniformity analysis is limited to | 
|  | those static instances whose convergence is independent of the cycle | 
|  | hierarchy: | 
|  |  | 
|  | .. _convergence-m-converged: | 
|  |  | 
|  | **m-converged static instances:** | 
|  |  | 
|  | A static instance ``X`` is *m-converged* for a given CFG if and only | 
|  | if the maximal converged-with relation for its dynamic instances is | 
|  | the same in every cycle hierarchy that can be constructed for that CFG. | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  | In other words, two dynamic instances ``X1`` and ``X2`` of an | 
|  | m-converged static instance ``X`` are converged in some cycle | 
|  | hierarchy if and only if they are also converged in every other | 
|  | cycle hierarchy for the same CFG. | 
|  |  | 
|  | As noted earlier, for brevity, we restrict the term *converged* to | 
|  | mean "related under the maximal converged-with relation for a given | 
|  | cycle hierarchy". | 
|  |  | 
|  |  | 
|  | Each node ``X`` in a given CFG is reported to be m-converged if and | 
|  | only if every cycle that contains ``X`` satisfies the following necessary | 
|  | conditions: | 
|  |  | 
|  | 1. Every divergent branch inside the cycle satisfies the | 
|  | :ref:`diverged entry criterion<convergence-diverged-entry>`, and, | 
|  | 2. There are no :ref:`diverged paths reaching the | 
|  | cycle<convergence-diverged-outside>` from a divergent branch | 
|  | outside it. | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  | A reducible cycle :ref:`trivially satisfies | 
|  | <convergence-reducible-cycle>` the above conditions. In particular, | 
|  | if the whole CFG is reducible, then all nodes in the CFG are | 
|  | m-converged. | 
|  |  | 
|  | The uniformity of each output of a static instance | 
|  | is determined using the criteria | 
|  | :ref:`described earlier <convergence-uniformity>`. The discovery of | 
|  | divergent outputs may cause their uses (including branches) to also | 
|  | become divergent. The analysis propagates this divergence until a | 
|  | fixed point is reached. | 
|  |  | 
|  | The convergence inferred using these criteria is a safe subset of the | 
|  | maximal converged-with relation for any cycle hierarchy. In | 
|  | particular, it is sufficient to determine if a static instance is | 
|  | m-converged for a given cycle hierarchy ``T``, even if that fact is | 
|  | not detected when examining some other cycle hierarchy ``T'``. | 
|  |  | 
|  | This property allows compiler transforms to use the uniformity | 
|  | analysis without being affected by DFS choices made in the underlying | 
|  | cycle analysis. When two transforms use different instances of the | 
|  | uniformity analysis for the same CFG, a "divergent value" result in | 
|  | one analysis instance cannot contradict a "uniform value" result in | 
|  | the other. | 
|  |  | 
|  | Generic transforms such as SimplifyCFG, CSE, and loop transforms | 
|  | commonly change the program in ways that change the maximal | 
|  | converged-with relations. This also means that a value that was | 
|  | previously uniform can become divergent after such a transform. | 
|  | Uniformity has to be recomputed after such transforms. | 
|  |  | 
|  | Divergent Branch inside a Cycle | 
|  | ------------------------------- | 
|  |  | 
|  | .. figure:: convergence-divergent-inside.png | 
|  | :name: convergence-divergent-inside | 
|  |  | 
|  | The above figure shows a divergent branch ``Q`` inside an irreducible | 
|  | cyclic region. When two threads diverge at ``Q``, the convergence of | 
|  | dynamic instances within the cyclic region depends on the cycle | 
|  | hierarchy chosen: | 
|  |  | 
|  | 1. In an implementation that detects a single cycle ``C`` with header | 
|  | ``P``, convergence inside the cycle is determined by ``P``. | 
|  |  | 
|  | 2. In an implementation that detects two nested cycles with headers | 
|  | ``R`` and ``S``, convergence inside those cycles is determined by | 
|  | their respective headers. | 
|  |  | 
|  | .. _convergence-diverged-entry: | 
|  |  | 
|  | A conservative approach would be to simply report all nodes inside | 
|  | irreducible cycles as having divergent outputs. But it is desirable to | 
|  | recognize m-converged nodes in the CFG in order to maximize | 
|  | uniformity. This section describes one such pattern of nodes derived | 
|  | from *closed paths*, which are a property of the CFG and do not depend | 
|  | on the cycle hierarchy. | 
|  |  | 
|  | **Diverged Entry Criterion:** | 
|  |  | 
|  | The dynamic instances of all the nodes in a closed path ``P`` are | 
|  | m-converged only if for every divergent branch ``B`` and its | 
|  | join node ``J`` that lie on ``P``, there is no entry to ``P`` which | 
|  | lies on a diverged path from ``B`` to ``J``. | 
|  |  | 
|  | .. figure:: convergence-closed-path.png | 
|  | :name: convergence-closed-path | 
|  |  | 
|  | Consider the closed path ``P -> Q -> R -> S`` in the above figure. | 
|  | ``P`` and ``R`` are :ref:`entries to the closed | 
|  | path<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a | 
|  | join for that branch, with diverged paths ``Q -> R -> S`` and ``Q -> | 
|  | S``. | 
|  |  | 
|  | - If a diverged entry ``R`` exists, then in some cycle hierarchy, | 
|  | ``R`` is the header of the smallest cycle ``C`` containing the | 
|  | closed path and a :ref:`child cycle<cycle-definition>` ``C'`` | 
|  | exists in the set ``C - R``, containing both branch ``Q`` and join | 
|  | ``S``. When threads diverge at ``Q``, one subset ``M`` continues | 
|  | inside cycle ``C'``, while the complement ``N`` exits ``C'`` and | 
|  | reaches ``R``. Dynamic instances of ``S`` executed by threads in set | 
|  | ``M`` are not converged with those executed in set ``N`` due to the | 
|  | presence of ``R``. Informally, threads that diverge at ``Q`` | 
|  | reconverge in the same iteration of the outer cycle ``C``, but they | 
|  | may have executed the inner cycle ``C'`` differently. | 
|  |  | 
|  | .. table:: | 
|  | :align: left | 
|  |  | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11   | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread1 | Entry | P1  | Q1  |     |     |     | R1  | S1  | P3  | ... | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread2 | Entry | P2  | Q2  | S2  | P4  | Q4  | R2  | S4  |     |     | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | In the table above, ``S2`` is not converged with ``S1`` due to ``R1``. | 
|  |  | 
|  | | | 
|  |  | 
|  | - If ``R`` does not exist, or if any node other than ``R`` is the | 
|  | header of ``C``, then no such child cycle ``C'`` is detected. | 
|  | Threads that diverge at ``Q`` execute converged dynamic instances of | 
|  | ``S`` since they do not encounter the cycle header on any path from | 
|  | ``Q`` to ``S``. Informally, threads that diverge at ``Q`` | 
|  | reconverge at ``S`` in the same iteration of ``C``. | 
|  |  | 
|  | .. table:: | 
|  | :align: left | 
|  |  | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10   | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread1 | Entry | P1  | Q1  | R1  | S1  | P3  | Q3  | R3  | S3  | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread2 | Entry | P2  | Q2  |     | S2  | P4  | Q4  | R2  | S4  | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | | | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  | In general, the cycle ``C`` in the above statements is not | 
|  | expected to be the same cycle for different headers. Cycles and | 
|  | their headers are tightly coupled; for different headers in the | 
|  | same outermost cycle, the child cycles detected may be different. | 
|  | The property relevant to the above examples is that for every | 
|  | closed path, there is a cycle ``C`` that contains the path and | 
|  | whose header is on that path. | 
|  |  | 
|  | The diverged entry criterion must be checked for every closed path | 
|  | passing through a divergent branch ``B`` and its join ``J``. Since | 
|  | :ref:`every closed path passes through the header of some | 
|  | cycle<cycle-closed-path-header>`, this amounts to checking every cycle | 
|  | ``C`` that contains ``B`` and ``J``. When the header of ``C`` | 
|  | dominates the join ``J``, there can be no entry to any path from the | 
|  | header to ``J``, which includes any diverged path from ``B`` to ``J``. | 
|  | This is also true for any closed paths passing through the header of | 
|  | an outer cycle that contains ``C``. | 
|  |  | 
|  | Thus, the diverged entry criterion can be conservatively simplified | 
|  | as follows: | 
|  |  | 
|  | For a divergent branch ``B`` and its join node ``J``, the nodes in a | 
|  | cycle ``C`` that contains both ``B`` and ``J`` are m-converged only | 
|  | if: | 
|  |  | 
|  | - ``B`` strictly dominates ``J``, or, | 
|  | - The header ``H`` of ``C`` strictly dominates ``J``, or, | 
|  | - Recursively, there is cycle ``C'`` inside ``C`` that satisfies the | 
|  | same condition. | 
|  |  | 
|  | When ``J`` is the same as ``H`` or ``B``, the trivial dominance is | 
|  | insufficient to make any statement about entries to diverged paths. | 
|  |  | 
|  | .. _convergence-diverged-outside: | 
|  |  | 
|  | Diverged Paths reaching a Cycle | 
|  | ------------------------------- | 
|  |  | 
|  | .. figure:: convergence-divergent-outside.png | 
|  | :name: convergence-divergent-outside | 
|  |  | 
|  | The figure shows two cycle hierarchies with a divergent branch in | 
|  | ``Entry`` instead of ``Q``. For two threads that enter the closed path | 
|  | ``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence | 
|  | of dynamic instances generated along the path depends on whether ``P`` | 
|  | or ``R`` is the header. | 
|  |  | 
|  | -  Convergence when ``P`` is the header. | 
|  |  | 
|  | .. table:: | 
|  | :align: left | 
|  |  | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13   | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread1 | Entry |     |     |     | P1  | Q1  | R1  | S1  | P3  | Q3  |     | S3  | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread2 | Entry |     | R2  | S2  | P2  | Q2  |     | S2  | P4  | Q4  | R3  | S4  | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | | | 
|  |  | 
|  | -  Convergence when ``R`` is the header. | 
|  |  | 
|  | .. table:: | 
|  | :align: left | 
|  |  | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12   | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread1 | Entry |     | P1  | Q1  | R1  | S1  | P3  | Q3  | S3  |     |     | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  | | Thread2 | Entry |     |     |     | R2  | S2  | P2  | Q2  | S2  | P4  | ... | Exit | | 
|  | +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ | 
|  |  | 
|  | | | 
|  |  | 
|  | Thus, when diverged paths reach different entries of an irreducible | 
|  | cycle from outside the cycle, the static analysis conservatively | 
|  | reports every node in the cycle as not m-converged. | 
|  |  | 
|  | .. _convergence-reducible-cycle: | 
|  |  | 
|  | Reducible Cycle | 
|  | --------------- | 
|  |  | 
|  | If ``C`` is a reducible cycle with header ``H``, then in any DFS, | 
|  | ``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>` | 
|  | ``C'`` that contains ``C``. Independent of the DFS, there is no entry | 
|  | to the subgraph ``C`` other than ``H`` itself. Thus, we have the | 
|  | following: | 
|  |  | 
|  | 1. The diverged entry criterion is trivially satisfied for a divergent | 
|  | branch and its join, where both are inside subgraph ``C``. | 
|  | 2. When diverged paths reach the subgraph ``C`` from outside, their | 
|  | convergence is always determined by the same header ``H``. | 
|  |  | 
|  | Clearly, this can be determined only in a cycle hierarchy ``T`` where | 
|  | ``C`` is detected as a reducible cycle. No such conclusion can be made | 
|  | in a different cycle hierarchy ``T'`` where ``C`` is part of a larger | 
|  | cycle ``C'`` with the same header, but this does not contradict the | 
|  | conclusion in ``T``. | 
|  |  | 
|  | Controlled Convergence | 
|  | ====================== | 
|  |  | 
|  | :ref:`Convergence control tokens <dynamic_instances_and_convergence_tokens>` | 
|  | provide an explicit semantics for determining which threads are converged at a | 
|  | given point in the program. The impact of this is incorporated in a | 
|  | :ref:`controlled maximal converged-with <controlled_maximal_converged_with>` | 
|  | relation over dynamic instances and a :ref:`controlled m-converged | 
|  | <controlled_m_converged>` property of static instances. The :ref:`uniformity | 
|  | analysis <uniformity-analysis>` implemented in LLVM includes this for targets | 
|  | that support convergence control tokens. |