| .. _transformation-metadata: |
| |
| ============================ |
| Code Transformation Metadata |
| ============================ |
| |
| .. contents:: |
| :local: |
| |
| Overview |
| ======== |
| |
| LLVM transformation passes can be controlled by attaching metadata to |
| the code to transform. By default, transformation passes use heuristics |
| to determine whether or not to perform transformations, and when doing |
| so, other details of how the transformations are applied (e.g., which |
| vectorization factor to select). |
| Unless the optimizer is otherwise directed, transformations are applied |
| conservatively. This conservatism generally allows the optimizer to |
| avoid unprofitable transformations, but in practice it also means that |
| the optimizer may skip transformations that would be highly profitable. |
| |
| Frontends can give additional hints to LLVM passes on which |
| transformations they should apply. This can be additional knowledge that |
| cannot be derived from the emitted IR, or directives passed from the |
| user/programmer. OpenMP pragmas are an example of the latter. |
| |
| If any such metadata is dropped from the program, the code's semantics |
| must not change. |
| |
| Metadata on Loops |
| ================= |
| |
| Attributes can be attached to loops as described in :ref:`llvm.loop`. |
| Attributes can describe properties of the loop, disable transformations, |
| force specific transformations and set transformation options. |
| |
| Because metadata nodes are immutable (with the exception of |
| ``MDNode::replaceOperandWith``, which is dangerous to use on uniqued |
| metadata), adding or removing a loop attribute requires creating a new |
| ``MDNode`` and assigning it as the new ``llvm.loop`` metadata. Any |
| connection between the old ``MDNode`` and the loop is lost. The |
| ``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e. |
| the loop effectively gets a new identifier. For instance, |
| ``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if |
| the parallel access property is to be preserved after adding/removing |
| loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be |
| updated to the new LoopID. |
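
As a rough illustration of the update described above, the sketch below
shows one loop before and after an attribute is added. The block names,
value names and metadata numbering are hypothetical, opaque pointers
(``ptr``) are assumed, and the two fragments are alternatives rather than
parts of a single module.

```llvm
; Before: the memory access and the latch branch both name LoopID !0.
for.body:
  ...
  store i32 %val, ptr %arrayidx, !llvm.mem.parallel_loop_access !0
  ...
  br i1 %exitcond, label %for.exit, label %for.body, !llvm.loop !0
...
!0 = distinct !{!0}

; After adding an unroll option: the new node !1 becomes the LoopID, so
; the access is re-pointed to !1 to keep the parallel-access property.
for.body:
  ...
  store i32 %val, ptr %arrayidx, !llvm.mem.parallel_loop_access !1
  ...
  br i1 %exitcond, label %for.exit, label %for.body, !llvm.loop !1
...
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.unroll.count", i32 4}
```

If the store kept referring to ``!0``, it would no longer match the
loop's identifier and the parallel-access information would be lost.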
| |
| Transformation Metadata Structure |
| ================================= |
| |
| Some attributes describe code transformations (unrolling, vectorizing, |
| loop distribution, etc.). They can be a hint to the optimizer that a |
| transformation might be beneficial, an instruction to use a specific |
| option, or a specific request from the user (such as |
| ``#pragma clang loop`` or ``#pragma omp simd``). |
| |
| If a transformation is forced but cannot be carried out for any reason, |
| an optimization-missed warning must be emitted. Semantic information |
| such as a transformation being safe (e.g. |
| ``llvm.mem.parallel_loop_access``) can be left unused by the optimizer |
| without generating a warning. |
| |
| Unless explicitly disabled, any optimization pass may heuristically |
| determine whether a transformation is beneficial and apply it. If |
| metadata for another transformation was specified, applying a different |
| transformation first may be unintended: the specified transformation |
| would then be applied to a different loop, or the loop might no longer |
| exist. To avoid having to explicitly disable an unknown number of |
| passes, the attribute ``llvm.loop.disable_nonforced`` disables all |
| optional, high-level, restructuring transformations. |
| |
| The following example prevents the loop from being altered (for |
| instance, unrolled) before it is vectorized. |
| |
| .. code-block:: llvm |
| |
| br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0 |
| ... |
| !0 = distinct !{!0, !1, !2} |
| !1 = !{!"llvm.loop.vectorize.enable", i1 true} |
| !2 = !{!"llvm.loop.disable_nonforced"} |
| |
| After a transformation is applied, follow-up attributes are set on the |
| transformed and/or new loop(s). This allows additional attributes |
| including followup-transformations to be specified. Specifying multiple |
| transformations in the same metadata node is possible for compatibility |
| reasons, but their execution order is undefined. For instance, when |
| ``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are |
| specified at the same time, unrolling may occur either before or after |
| vectorization. |
| |
| As an example, the following instructs a loop to be vectorized and only |
| then unrolled. |
| |
| .. code-block:: llvm |
| |
| !0 = distinct !{!0, !1, !2, !3} |
| !1 = !{!"llvm.loop.vectorize.enable", i1 true} |
| !2 = !{!"llvm.loop.disable_nonforced"} |
| !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.unroll.enable"}} |
| |
| If, and only if, no followup is specified, the pass may add attributes itself. |
| For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and |
| all attributes from the original loop excluding its loop vectorizer |
| attributes. To avoid this, an empty followup attribute can be used, e.g. |
| |
| .. code-block:: llvm |
| |
| !3 = !{!"llvm.loop.vectorize.followup_vectorized"} |
| |
| The followup attributes of a transformation that cannot be applied will |
| never be added to a loop and are therefore effectively ignored. This means |
| that any followup-transformation in such attributes requires that its |
| prior transformations are applied before the followup-transformation. |
| The user should receive a warning about the first transformation in the |
| transformation chain that could not be applied if it is a forced |
| transformation. All following transformations are skipped. |
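
To make the chaining concrete, here is a minimal sketch of a three-step
chain: distribute, then vectorize the coincident loops, then unroll the
vectorized loops. The metadata numbering and the unroll count are
illustrative assumptions. Since a followup replaces the automatically
derived attributes, ``llvm.loop.disable_nonforced`` is repeated at each
level in this sketch.

```llvm
!0 = distinct !{!0, !1, !2, !3}
!1 = !{!"llvm.loop.disable_nonforced"}
!2 = !{!"llvm.loop.distribute.enable", i1 true}
; Attributes given to each coincident loop once distribution has happened.
!3 = !{!"llvm.loop.distribute.followup_coincident", !1, !4, !5}
!4 = !{!"llvm.loop.vectorize.enable", i1 true}
; Attributes given to the vectorized loop once vectorization has happened.
!5 = !{!"llvm.loop.vectorize.followup_vectorized", !1, !6}
!6 = !{!"llvm.loop.unroll.count", i32 2}
```

If distribution cannot be applied, ``!3`` is never attached to any loop,
so neither the vectorization nor the unrolling request takes effect.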
| |
| Pass-Specific Transformation Metadata |
| ===================================== |
| |
| Transformation options are specific to each transformation. In the |
| following, we present the model for each LLVM loop optimization pass and |
| the metadata to influence them. |
| |
| Loop Vectorization and Interleaving |
| ----------------------------------- |
| |
| Loop vectorization and interleaving are interpreted as a single |
| transformation. It is interpreted as forced if |
| ``!{!"llvm.loop.vectorize.enable", i1 true}`` is set. |
| |
| Assuming the pre-vectorization loop is |
| |
| .. code-block:: c |
| |
| for (int i = 0; i < n; i+=1) // original loop |
| Stmt(i); |
| |
| then the code after vectorization will be approximately (assuming a |
| SIMD width of 4): |
| |
| .. code-block:: c |
| |
| int i = 0; |
| if (rtc) { |
| for (; i + 3 < n; i+=4) // vectorized/interleaved loop |
| Stmt(i:i+3); |
| } |
| for (; i < n; i+=1) // epilogue loop |
| Stmt(i); |
| |
| where ``rtc`` is a generated runtime check. |
| |
| ``llvm.loop.vectorize.followup_vectorized`` will set the attributes for |
| the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is |
| combined with the original loop's attributes to avoid it being |
| vectorized multiple times. |
| |
| ``llvm.loop.vectorize.followup_epilogue`` will set the attributes for |
| the remainder loop. If not specified, it will have the original loop's |
| attributes combined with ``llvm.loop.isvectorized`` and |
| ``llvm.loop.unroll.runtime.disable`` (unless the original loop already |
| has unroll metadata). |
| |
| The attributes specified by ``llvm.loop.vectorize.followup_all`` are |
| added to both loops. |
| |
| A follow-up attribute, when used, replaces any automatically deduced |
| attributes for the generated loop in question. It is therefore |
| recommended to add ``llvm.loop.isvectorized`` to |
| ``llvm.loop.vectorize.followup_all``, which prevents the loop |
| vectorizer from trying to optimize the loops again. |
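
The recommendation above might look as follows. This is only a sketch:
the metadata numbering, the unroll count, and the choice of attributes
for the individual loops are assumptions, and the ``i32 1`` operand on
``llvm.loop.isvectorized`` is the form assumed here.

```llvm
!0 = distinct !{!0, !1, !2, !3, !4}
!1 = !{!"llvm.loop.vectorize.enable", i1 true}
; Added to both the vectorized loop and the epilogue loop.
!2 = !{!"llvm.loop.vectorize.followup_all", !5}
; Added to the vectorized loop only.
!3 = !{!"llvm.loop.vectorize.followup_vectorized", !6}
; Added to the epilogue loop only.
!4 = !{!"llvm.loop.vectorize.followup_epilogue", !7}
!5 = !{!"llvm.loop.isvectorized", i32 1}
!6 = !{!"llvm.loop.unroll.count", i32 2}
!7 = !{!"llvm.loop.unroll.disable"}
```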
| |
| Loop Unrolling |
| -------------- |
| |
| Unrolling is interpreted as forced if any |
| ``!{!"llvm.loop.unroll.enable"}`` metadata or option |
| (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``) is present. |
| Unrolling can be full unrolling, partial unrolling of a loop with |
| constant trip count, or runtime unrolling of a loop with a trip count |
| unknown at compile time. |
| |
| If the loop has been unrolled fully, there is no followup-loop. For |
| partial/runtime unrolling, the original loop of |
| |
| .. code-block:: c |
| |
| for (int i = 0; i < n; i+=1) // original loop |
| Stmt(i); |
| |
| is transformed into (using an unroll factor of 4): |
| |
| .. code-block:: c |
| |
| int i = 0; |
| for (; i + 3 < n; i+=4) { // unrolled loop |
| Stmt(i); |
| Stmt(i+1); |
| Stmt(i+2); |
| Stmt(i+3); |
| } |
| for (; i < n; i+=1) // remainder loop |
| Stmt(i); |
| |
| ``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of |
| the unrolled loop. If not specified, the attributes of the original loop |
| without the ``llvm.loop.unroll.*`` attributes are copied and |
| ``llvm.loop.unroll.disable`` added to it. |
| |
| ``llvm.loop.unroll.followup_remainder`` defines the attributes of the |
| remainder loop. If not specified, the remainder loop will have no |
| attributes. The remainder loop might not be present because it was |
| fully unrolled, in which case this attribute has no effect. |
| |
| Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the |
| unrolled and remainder loops. |
| |
| To prevent the partially unrolled loop from being unrolled again, it is |
| recommended to add ``llvm.loop.unroll.disable`` to |
| ``llvm.loop.unroll.followup_all``. If no follow-up attribute is |
| specified for a generated loop, it is added automatically. |
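
A minimal sketch of these follow-up attributes is shown below. The
metadata numbering, the unroll factor, and the attribute chosen for the
remainder loop are illustrative assumptions.

```llvm
!0 = distinct !{!0, !1, !2, !3}
!1 = !{!"llvm.loop.unroll.count", i32 4}
; Added to both the unrolled loop and the remainder loop.
!2 = !{!"llvm.loop.unroll.followup_all", !4}
; Added to the remainder loop only.
!3 = !{!"llvm.loop.unroll.followup_remainder", !5}
!4 = !{!"llvm.loop.unroll.disable"}
!5 = !{!"llvm.loop.vectorize.enable", i1 false}
```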
| |
| Unroll-And-Jam |
| -------------- |
| |
| Unroll-and-jam uses the following transformation model (here with an |
| unroll factor of 2). Currently, it does not support a fallback version |
| when the transformation is unsafe. |
| |
| .. code-block:: c |
| |
| for (int i = 0; i < n; i+=1) { // original outer loop |
| Fore(i); |
| for (int j = 0; j < m; j+=1) // original inner loop |
| SubLoop(i, j); |
| Aft(i); |
| } |
| |
| .. code-block:: c |
| |
| int i = 0; |
| for (; i + 1 < n; i+=2) { // unrolled outer loop |
| Fore(i); |
| Fore(i+1); |
| for (int j = 0; j < m; j+=1) { // unrolled inner loop |
| SubLoop(i, j); |
| SubLoop(i+1, j); |
| } |
| Aft(i); |
| Aft(i+1); |
| } |
| for (; i < n; i+=1) { // remainder outer loop |
| Fore(i); |
| for (int j = 0; j < m; j+=1) // remainder inner loop |
| SubLoop(i, j); |
| Aft(i); |
| } |
| |
| ``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes |
| of the unrolled outer loop. If not specified, the attributes of the |
| original outer loop without the ``llvm.loop.unroll.*`` attributes are |
| copied and ``llvm.loop.unroll.disable`` added to it. |
| |
| ``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes |
| of the unrolled inner loop. If not specified, the attributes of the |
| original inner loop are used unchanged. |
| |
| ``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop |
| attributes of the outer remainder loop. If not specified, it will not |
| have any attributes. The remainder loop might not be present because it |
| was fully unrolled. |
| |
| ``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop |
| attributes of the inner remainder loop. If not specified, it will have |
| the attributes of the original inner loop. If the outer remainder loop |
| is unrolled, the inner remainder loop might be present multiple times. |
| |
| Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are |
| added to all of the aforementioned output loops. |
| |
| To prevent the unrolled loop from being unrolled again, it is |
| recommended to add ``llvm.loop.unroll.disable`` to |
| ``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam |
| as well as additional unrolling of the inner loop. If no follow-up |
| attribute is specified for a generated loop, it is added automatically. |
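
The following sketch attaches such follow-up attributes to the outer
loop of a nest. The metadata numbering and the unroll-and-jam factor are
illustrative assumptions.

```llvm
!0 = distinct !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.count", i32 2}
; Added to the unrolled outer and inner loops and to both remainder loops.
!2 = !{!"llvm.loop.unroll_and_jam.followup_all", !3}
!3 = !{!"llvm.loop.unroll.disable"}
```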
| |
| Loop Distribution |
| ----------------- |
| |
| The LoopDistribution pass tries to separate vectorizable parts of a loop |
| from the non-vectorizable part (which otherwise would make the entire |
| loop non-vectorizable). Conceptually, it transforms a loop such as |
| |
| .. code-block:: c |
| |
| for (int i = 1; i < n; i+=1) { // original loop |
| A[i] = i; |
| B[i] = 2 + B[i]; |
| C[i] = 3 + C[i - 1]; |
| } |
| |
| into the following code: |
| |
| .. code-block:: c |
| |
| if (rtc) { |
| for (int i = 1; i < n; i+=1) // coincident loop |
| A[i] = i; |
| for (int i = 1; i < n; i+=1) // coincident loop |
| B[i] = 2 + B[i]; |
| for (int i = 1; i < n; i+=1) // sequential loop |
| C[i] = 3 + C[i - 1]; |
| } else { |
| for (int i = 1; i < n; i+=1) { // fallback loop |
| A[i] = i; |
| B[i] = 2 + B[i]; |
| C[i] = 3 + C[i - 1]; |
| } |
| } |
| |
| where ``rtc`` is a generated runtime check. |
| |
| ``llvm.loop.distribute.followup_coincident`` sets the loop attributes of |
| all loops without loop-carried dependencies (i.e. vectorizable loops). |
| There might be more than one such loop. If not defined, the loops will |
| inherit the original loop's attributes. |
| |
| ``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the |
| loop with potentially unsafe dependencies. There should be at most one |
| such loop. If not defined, the loop will inherit the original loop's |
| attributes. |
| |
| ``llvm.loop.distribute.followup_fallback`` defines the loop attributes |
| for the fallback loop, which is a copy of the original loop for when |
| loop versioning is required. If undefined, the fallback loop inherits |
| all attributes from the original loop. |
| |
| Attributes defined in ``llvm.loop.distribute.followup_all`` are added to |
| all of the aforementioned output loops. |
| |
| It is recommended to add ``llvm.loop.disable_nonforced`` to |
| ``llvm.loop.distribute.followup_fallback``. This prevents the fallback |
| version (which is likely never executed) from being optimized further, |
| which would increase the code size. |
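
A sketch of this recommendation, combined with a vectorization request
for the coincident loops, could look like this. The metadata numbering
and the attributes chosen for the coincident loops are assumptions.

```llvm
!0 = distinct !{!0, !1, !2, !3}
!1 = !{!"llvm.loop.distribute.enable", i1 true}
; Added to every loop without loop-carried dependencies.
!2 = !{!"llvm.loop.distribute.followup_coincident", !4}
; Added to the fallback copy of the original loop.
!3 = !{!"llvm.loop.distribute.followup_fallback", !5}
!4 = !{!"llvm.loop.vectorize.enable", i1 true}
!5 = !{!"llvm.loop.disable_nonforced"}
```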
| |
| Versioning LICM |
| --------------- |
| |
| The pass hoists code out of loops that are only loop-invariant when |
| dynamic conditions apply. For instance, it transforms the loop |
| |
| .. code-block:: c |
| |
| for (int i = 0; i < n; i+=1) // original loop |
| A[i] = B[0]; |
| |
| into: |
| |
| .. code-block:: c |
| |
| if (rtc) { |
| auto b = B[0]; |
| for (int i = 0; i < n; i+=1) // versioned loop |
| A[i] = b; |
| } else { |
| for (int i = 0; i < n; i+=1) // unversioned loop |
| A[i] = B[0]; |
| } |
| |
| The runtime condition (``rtc``) checks that the array ``A`` and the |
| element ``B[0]`` do not alias. |
| |
| Currently, this transformation does not support followup-attributes. |
| |
| Loop Interchange |
| ---------------- |
| |
| Currently, the ``LoopInterchange`` pass does not use any metadata. |
| |
| Ambiguous Transformation Order |
| ============================== |
| |
| If there are multiple transformations defined, the order in which they |
| are executed depends on the order in LLVM's pass pipeline, which is |
| subject to change. The default optimization pipeline (anything higher |
| than ``-O0``) has the following order. |
| |
| When using the legacy pass manager: |
| |
| - LoopInterchange (if enabled) |
| - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) |
| - VersioningLICM (if enabled) |
| - LoopDistribute |
| - LoopVectorizer |
| - LoopUnrollAndJam (if enabled) |
| - LoopUnroll (partial and runtime unrolling) |
| |
| When using the legacy pass manager with LTO: |
| |
| - LoopInterchange (if enabled) |
| - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) |
| - LoopVectorizer |
| - LoopUnroll (partial and runtime unrolling) |
| |
| When using the new pass manager: |
| |
| - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) |
| - LoopDistribute |
| - LoopVectorizer |
| - LoopUnrollAndJam (if enabled) |
| - LoopUnroll (partial and runtime unrolling) |
| |
| Leftover Transformations |
| ======================== |
| |
| Forced transformations that have not been applied after the last |
| transformation pass should be reported to the user. The transformation |
| passes themselves cannot be responsible for this reporting because they |
| might not be in the pipeline, there might be multiple passes able to |
| apply a transformation (e.g. ``LoopInterchange`` and Polly), or a |
| transformation attribute may be 'hidden' inside another pass's followup |
| attribute. |
| |
| The pass ``-transform-warning`` (``WarnMissedTransformationsPass``) |
| emits such warnings. It should be placed after the last transformation |
| pass. |
| |
| The current pass pipeline has a fixed order in which transformation |
| passes are executed. A transformation can be in the followup of a pass |
| that is executed later and thus be left over. For instance, a loop nest |
| cannot be distributed and then interchanged with the current pass |
| pipeline. The loop distribution will execute, but no loop interchange |
| pass follows it, so any loop interchange metadata will be ignored. The |
| ``-transform-warning`` pass should emit a warning in this case. |
| |
| Future versions of LLVM may fix this by executing transformations using |
| a dynamic ordering. |