| # Debug Info Assignment Tracking |
| |
| Assignment Tracking is an alternative technique for tracking variable location |
| debug info through optimisations in LLVM. It provides accurate variable |
| locations for assignments where a local variable (or a field of one) is the |
| LHS. In rare and complicated circumstances indirect assignments might be |
| optimized away without being tracked, but otherwise we make our best effort to |
| track all variable locations. |
| |
| The core idea is to track more information about source assignments in order |
| and preserve enough information to be able to defer decisions about whether to |
| use non-memory locations (register, constant) or memory locations until after |
| middle end optimisations have run. This is in opposition to using |
| `llvm.dbg.declare` and `llvm.dbg.value`, which is to make the decision for most |
| variables early on, which can result in suboptimal variable locations that may |
| be either incorrect or incomplete. |
| |
| A secondary goal of assignment tracking is to cause minimal additional work for |
| LLVM pass writers, and minimal disruption to LLVM in general. |
| |
| ## Status and usage |
| |
| **Status**: Experimental work in progress. Enabling is strongly advised against |
| except for development and testing. |
| |
| **Enable in Clang**: `-Xclang -fexperimental-assignment-tracking` |
| |
| That causes Clang to get LLVM to run the pass `declare-to-assign`. The pass |
| converts conventional debug intrinsics to assignment tracking metadata and sets |
| the module flag `debug-info-assignment-tracking` to the value `i1 true`. To |
| check whether assignment tracking is enabled for a module call |
| `isAssignmentTrackingEnabled(const Module &M)` (from `llvm/IR/DebugInfo.h`). |
| |
| ## Design and implementation |
| |
| ### Assignment markers: `llvm.dbg.assign` |
| |
| `llvm.dbg.value`, a conventional debug intrinsic, marks out a position in the |
| IR where a variable takes a particular value. Similarly, Assignment Tracking |
| marks out the position of assignments with a new intrinsic called |
| `llvm.dbg.assign`. |
| |
| In order to know where in IR it is appropriate to use a memory location for a |
| variable, each assignment marker must in some way refer to the store, if any |
| (or multiple!), that performs the assignment. That way, the position of the |
| store and marker can be considered together when making that choice. Another |
| important benefit of referring to the store is that we can then build a two-way |
| mapping of stores<->markers that can be used to find markers that need to be |
| updated when stores are modified. |
| |
| An `llvm.dbg.assign` marker that is not linked to any instruction signals that |
| the store that performed the assignment has been optimised out, and therefore |
| the memory location will not be valid for at least some part of the program. |
| |
| Here's the `llvm.dbg.assign` signature. Each parameter is wrapped in |
| `MetadataAsValue`, and `Value *` type parameters are first wrapped in |
| `ValueAsMetadata`: |
| |
| ``` |
| void @llvm.dbg.assign(Value *Value, |
| DIExpression *ValueExpression, |
| DILocalVariable *Variable, |
| DIAssignID *ID, |
| Value *Address, |
| DIExpression *AddressExpression) |
| ``` |
| |
| The first three parameters look and behave like an `llvm.dbg.value`. `ID` is a |
| reference to a store (see next section). `Address` is the destination address |
| of the store and it is modified by `AddressExpression`. An empty/undef/poison |
| address means the address component has been killed (the memory address is no |
| longer a valid location). LLVM currently encodes variable fragment information |
| in `DIExpression`s, so as an implementation quirk the `FragmentInfo` for |
| `Variable` is contained within `ValueExpression` only. |
| |
| The formal LLVM-IR signature is: |
| ``` |
| void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata) |
| ``` |
| |
| ### Instruction link: `DIAssignID` |
| |
| `DIAssignID` metadata is the mechanism that is currently used to encode the |
| store<->marker link. The metadata node has no operands and all instances are |
| `distinct`; equality is checked for by comparing addresses. |
| |
| `llvm.dbg.assign` intrinsics use a `DIAssignID` metadata node instance as an |
| operand. This way it refers to any store-like instruction that has the same |
| `DIAssignID` attachment. E.g. For this test.cpp, |
| |
| ``` |
| int fun(int a) { |
| return a; |
| } |
| ``` |
| compiled without optimisations: |
| ``` |
| $ clang++ test.cpp -o test.ll -emit-llvm -S -g -O0 -Xclang -fexperimental-assignment-tracking |
| ``` |
| we get: |
| ``` |
| define dso_local noundef i32 @_Z3funi(i32 noundef %a) #0 !dbg !8 { |
| entry: |
| %a.addr = alloca i32, align 4, !DIAssignID !13 |
| call void @llvm.dbg.assign(metadata i1 undef, metadata !14, metadata !DIExpression(), metadata !13, metadata i32* %a.addr, metadata !DIExpression()), !dbg !15 |
| store i32 %a, i32* %a.addr, align 4, !DIAssignID !16 |
| call void @llvm.dbg.assign(metadata i32 %a, metadata !14, metadata !DIExpression(), metadata !16, metadata i32* %a.addr, metadata !DIExpression()), !dbg !15 |
| %0 = load i32, i32* %a.addr, align 4, !dbg !17 |
| ret i32 %0, !dbg !18 |
| } |
| |
| ... |
| !13 = distinct !DIAssignID() |
| !14 = !DILocalVariable(name: "a", ...) |
| ... |
| !16 = distinct !DIAssignID() |
| ``` |
| |
| The first `llvm.dbg.assign` refers to the `alloca` through `!DIAssignID !13`, |
| and the second refers to the `store` through `!DIAssignID !16`. |
| |
| ### Store-like instructions |
| |
| In the absence of a linked `llvm.dbg.assign`, a store to an address that is |
| known to be the backing storage for a variable is considered to represent an |
| assignment to that variable. |
| |
| This gives us a safe fall-back in cases where `llvm.dbg.assign` intrinsics have |
| been deleted, the `DIAssignID` attachment on the store has been dropped, or the |
| optimiser has made a once-indirect store (not tracked with Assignment Tracking) |
| direct. |
| |
| ### Middle-end: Considerations for pass-writers |
| |
| #### Non-debug instruction updates |
| |
| **Cloning** an instruction: nothing new to do. Cloning automatically clones a |
| `DIAssignID` attachment. Multiple instructions may have the same `DIAssignID` |
| instruction. In this case, the assignment is considered to take place in |
| multiple positions in the program. |
| |
| **Moving** a non-debug instruction: nothing new to do. Instructions linked to an |
| `llvm.dbg.assign` have their initial IR position marked by the position of the |
| `llvm.dbg.assign`. |
| |
| **Deleting** a non-debug instruction: nothing new to do. Simple DSE does not |
| require any change; it’s safe to delete an instruction with a `DIAssignID` |
| attachment. An `llvm.dbg.assign` that uses a `DIAssignID` that is not attached |
| to any instruction indicates that the memory location isn’t valid. |
| |
| **Merging** stores: In many cases no change is required as `DIAssignID` |
| attachments are automatically merged if `combineMetadata` is called. One way or |
| another, the `DIAssignID` attachments must be merged such that new store |
| becomes linked to all the `llvm.dbg.assign` intrinsics that the merged stores |
| were linked to. This can be achieved simply by calling a helper function |
| `Instruction::mergeDIAssignID`. |
| |
| **Inlining** stores: As stores are inlined we generate `llvm.dbg.assign` |
| intrinsics and `DIAssignID` attachments as if the stores represent source |
| assignments, just like the in frontend. This isn’t perfect, as stores may have |
| been moved, modified or deleted before inlining, but it does at least keep the |
| information about the variable correct within the non-inlined scope. |
| |
| **Splitting** stores: SROA and passes that split stores treat `llvm.dbg.assign` |
| intrinsics similarly to `llvm.dbg.declare` intrinsics. Clone the |
| `llvm.dbg.assign` intrinsics linked to the store, update the FragmentInfo in |
| the `ValueExpression`, and give the split stores (and cloned intrinsics) new |
| `DIAssignID` attachments each. In other words, treat the split stores as |
| separate assignments. For partial DSE (e.g. shortening a memset), we do the |
| same except that `llvm.dbg.assign` for the dead fragment gets an `Undef` |
| `Address`. |
| |
| **Promoting** allocas and store/loads: `llvm.dbg.assign` intrinsics implicitly |
| describe joined values in memory locations at CFG joins, but this is not |
| necessarily the case after promoting (or partially promoting) the |
| variable. Passes that promote variables are responsible for inserting |
| `llvm.dbg.assign` intrinsics after the resultant PHIs generated during |
| promotion. `mem2reg` already has to do this (with `llvm.dbg.value`) for |
| `llvm.dbg.declare`s. Where a store has no linked intrinsic, the store is |
| assumed to represent an assignment for variables stored at the destination |
| address. |
| |
| #### Debug intrinsic updates |
| |
| **Moving** a debug intrinsic: avoid moving `llvm.dbg.assign` intrinsics where |
| possible, as they represent a source-level assignment, whose position in the |
| program should not be affected by optimization passes. |
| |
| **Deleting** a debug intrinsic: Nothing new to do. Just like for conventional |
| debug intrinsics, unless it is unreachable, it’s almost always incorrect to |
| delete a `llvm.dbg.assign` intrinsic. |
| |
| ### Lowering `llvm.dbg.assign` to MIR |
| |
| To begin with only SelectionDAG ISel will be supported. `llvm.dbg.assign` |
| intrinsics are lowered to MIR `DBG_INSTR_REF` instructions. Before this happens |
| we need to decide where it is appropriate to use memory locations and where we |
| must use a non-memory location (or no location) for each variable. In order to |
| make those decisions we run a standard fixed-point dataflow analysis that makes |
| the choice at each instruction, iteratively joining the results for each block. |
| |
| ### TODO list |
| |
| As this is an experimental work in progress so there are some items we still need |
| to tackle: |
| |
| * As mentioned in test llvm/test/DebugInfo/assignment-tracking/X86/diamond-3.ll, |
| the analysis should treat escaping calls like untagged stores. |
| |
| * The system expects locals to be backed by a local alloca. This isn't always |
| the case - sometimes a pointer to storage is passed into a function |
| (e.g. sret, byval). We need to be able to handle those cases. See |
| llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and |
| clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for examples. |
| |
| * `trackAssignments` doesn't yet work for variables that have their |
| `llvm.dbg.declare` location modified by a `DIExpression`, e.g. when the |
| address of the variable is itself stored in an `alloca` with the |
| `llvm.dbg.declare` using `DIExpression(DW_OP_deref)`. See `indirectReturn` in |
| llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and in |
| clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for an |
| example. |
| |
| * In order to solve the first bullet-point we need to be able to specify that a |
| memory location is available without using a `DIAssignID`. This is because |
| the storage address is not computed by an instruction (it's an argument |
| value) and therefore we have nowhere to put the metadata attachment. To solve |
| this we probably need another marker intrinsic to denote "the variable's |
| stack home is X address" - similar to `llvm.dbg.declare` except that it needs |
| to compose with `llvm.dbg.assign` intrinsics such that the stack home address |
| is only selected as a location for the variable when the `llvm.dbg.assign` |
| intrinsics agree it should be. |
| |
| * Given the above (a special "the stack home is X" intrinsic), and the fact |
| that we can only track assignments with fixed offsets and sizes, I think we |
| can probably get rid of the address and address-expression part, since it |
| will always be computable with the info we have. |