| =========================== |
| LLVM Branch Weight Metadata |
| =========================== |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| |
| Branch Weight Metadata represents branch weights as its likeliness to be taken |
| (see :doc:`BlockFrequencyTerminology`). Metadata is assigned to an |
| ``Instruction`` that is a terminator as a ``MDNode`` of the ``MD_prof`` kind. |
| The first operator is always a ``MDString`` node with the string |
| "branch_weights". Number of operators depends on the terminator type. |
| |
| Branch weights might be fetch from the profiling file, or generated based on |
| `__builtin_expect`_ and `__builtin_expect_with_probability`_ instruction. |
| |
| All weights are represented as an unsigned 32-bit values, where higher value |
| indicates greater chance to be taken. |
| |
| Supported Instructions |
| ====================== |
| |
| ``BranchInst`` |
| ^^^^^^^^^^^^^^ |
| |
| Metadata is only assigned to the conditional branches. There are two extra |
| operands for the true and the false branch. |
| |
| .. code-block:: none |
| |
| !0 = !{ |
| !"branch_weights", |
| i32 <TRUE_BRANCH_WEIGHT>, |
| i32 <FALSE_BRANCH_WEIGHT> |
| } |
| |
| ``SwitchInst`` |
| ^^^^^^^^^^^^^^ |
| |
| Branch weights are assigned to every case (including the ``default`` case which |
| is always case #0). |
| |
| .. code-block:: none |
| |
| !0 = !{ |
| !"branch_weights", |
| i32 <DEFAULT_BRANCH_WEIGHT> |
| [ , i32 <CASE_BRANCH_WEIGHT> ... ] |
| } |
| |
| ``IndirectBrInst`` |
| ^^^^^^^^^^^^^^^^^^ |
| |
| Branch weights are assigned to every destination. |
| |
| .. code-block:: none |
| |
| !0 = !{ |
| !"branch_weights", |
| i32 <LABEL_BRANCH_WEIGHT> |
| [ , i32 <LABEL_BRANCH_WEIGHT> ... ] |
| } |
| |
| ``CallInst`` |
| ^^^^^^^^^^^^^^^^^^ |
| |
| Calls may have branch weight metadata, containing the execution count of |
| the call. It is currently used in SamplePGO mode only, to augment the |
| block and entry counts which may not be accurate with sampling. |
| |
| .. code-block:: none |
| |
| !0 = !{ |
| !"branch_weights", |
| i32 <CALL_BRANCH_WEIGHT> |
| } |
| |
| ``InvokeInst`` |
| ^^^^^^^^^^^^^^^^^^ |
| |
| Invoke instruction may have branch weight metadata with one or two weights. |
| The second weight is optional and corresponds to the unwind branch. |
| If only one weight is set then it contains the execution count of the call |
| and used in SamplePGO mode only as described for the call instruction. If both |
| weights are specified then the second weight contains count of unwind branch |
| taken and the first weights contains the execution count of the call minus |
| the count of unwind branch taken. Both weights specified are used to calculate |
| BranchProbability as for BranchInst and for SamplePGO the sum of both weights |
| is used. |
| |
| .. code-block:: none |
| |
| !0 = !{ |
| !"branch_weights", |
| i32 <INVOKE_NORMAL_WEIGHT> |
| [ , i32 <INVOKE_UNWIND_WEIGHT> ] |
| } |
| |
| Other |
| ^^^^^ |
| |
| Other terminator instructions are not allowed to contain Branch Weight Metadata. |
| |
| .. _\__builtin_expect: |
| |
| Built-in ``expect`` Instructions |
| ================================ |
| |
| ``__builtin_expect(long exp, long c)`` instruction provides branch prediction |
| information. The return value is the value of ``exp``. |
| |
| It is especially useful in conditional statements. Currently Clang supports two |
| conditional statements: |
| |
| ``if`` statement |
| ^^^^^^^^^^^^^^^^ |
| |
| The ``exp`` parameter is the condition. The ``c`` parameter is the expected |
| comparison value. If it is equal to 1 (true), the condition is likely to be |
| true, in other case condition is likely to be false. For example: |
| |
| .. code-block:: c++ |
| |
| if (__builtin_expect(x > 0, 1)) { |
| // This block is likely to be taken. |
| } |
| |
| ``switch`` statement |
| ^^^^^^^^^^^^^^^^^^^^ |
| |
| The ``exp`` parameter is the value. The ``c`` parameter is the expected |
| value. If the expected value doesn't show on the cases list, the ``default`` |
| case is assumed to be likely taken. |
| |
| .. code-block:: c++ |
| |
| switch (__builtin_expect(x, 5)) { |
| default: break; |
| case 0: // ... |
| case 3: // ... |
| case 5: // This case is likely to be taken. |
| } |
| |
| .. _\__builtin_expect_with_probability: |
| |
| Built-in ``expect.with.probability`` Instruction |
| ================================================ |
| |
| ``__builtin_expect_with_probability(long exp, long c, double probability)`` has |
| the same semantics as ``__builtin_expect``, but the caller provides the |
| probability that ``exp == c``. The last argument ``probability`` must be |
| constant floating-point expression and be in the range [0.0, 1.0] inclusive. |
| The usage is also similar as ``__builtin_expect``, for example: |
| |
| ``if`` statement |
| ^^^^^^^^^^^^^^^^ |
| |
| If the expect comparison value ``c`` is equal to 1(true), and probability |
| value ``probability`` is set to 0.8, that means the probability of condition |
| to be true is 80% while that of false is 20%. |
| |
| .. code-block:: c++ |
| |
| if (__builtin_expect_with_probability(x > 0, 1, 0.8)) { |
| // This block is likely to be taken with probability 80%. |
| } |
| |
| ``switch`` statement |
| ^^^^^^^^^^^^^^^^^^^^ |
| |
| This is basically the same as ``switch`` statement in ``__builtin_expect``. |
| The probability that ``exp`` is equal to the expect value is given in |
| the third argument ``probability``, while the probability of other value is |
| the average of remaining probability(``1.0 - probability``). For example: |
| |
| .. code-block:: c++ |
| |
| switch (__builtin_expect_with_probability(x, 5, 0.7)) { |
| default: break; // Take this case with probability 10% |
| case 0: break; // Take this case with probability 10% |
| case 3: break; // Take this case with probability 10% |
| case 5: break; // This case is likely to be taken with probability 70% |
| } |
| |
| CFG Modifications |
| ================= |
| |
| Branch Weight Metatada is not proof against CFG changes. If terminator operands' |
| are changed some action should be taken. In other case some misoptimizations may |
| occur due to incorrect branch prediction information. |
| |
| Function Entry Counts |
| ===================== |
| |
| To allow comparing different functions during inter-procedural analysis and |
| optimization, ``MD_prof`` nodes can also be assigned to a function definition. |
| The first operand is a string indicating the name of the associated counter. |
| |
| Currently, one counter is supported: "function_entry_count". The second operand |
| is a 64-bit counter that indicates the number of times that this function was |
| invoked (in the case of instrumentation-based profiles). In the case of |
| sampling-based profiles, this operand is an approximation of how many times |
| the function was invoked. |
| |
| For example, in the code below, the instrumentation for function foo() |
| indicates that it was called 2,590 times at runtime. |
| |
| .. code-block:: llvm |
| |
| define i32 @foo() !prof !1 { |
| ret i32 0 |
| } |
| !1 = !{!"function_entry_count", i64 2590} |
| |
| If "function_entry_count" has more than 2 operands, the later operands are |
| the GUID of the functions that needs to be imported by ThinLTO. This is only |
| set by sampling based profile. It is needed because the sampling based profile |
| was collected on a binary that had already imported and inlined these functions, |
| and we need to ensure the IR matches in the ThinLTO backends for profile |
| annotation. The reason why we cannot annotate this on the callsite is that it |
| can only goes down 1 level in the call chain. For the cases where |
| foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we will need to go down 2 levels |
| in the call chain to import both bar_in_b_cc and baz_in_c_cc. |