| # Machine IR (MIR) Format Reference Manual |
| |
| ```{contents} |
| :local: |
| ``` |
| |
| ```{warning} |
| This is a work in progress. |
| ``` |
| |
| ## Introduction |
| |
| This document is a reference manual for the Machine IR (MIR) serialization |
| format. MIR is a human-readable serialization format that is used to represent |
| LLVM's {ref}`machine specific intermediate representation <machine code representation>`. |
| |
| The MIR serialization format is designed to be used for testing the code |
| generation passes in LLVM. |
| |
| ## Overview |
| |
| The MIR serialization format uses a YAML container. YAML is a standard |
| data serialization language, and the full YAML language spec can be read at |
| [yaml.org](http://www.yaml.org/spec/1.2/spec.html#Introduction). |
| |
| A MIR file is split into a series of [YAML documents]. The first document |
| can contain an optional embedded LLVM IR module, and the rest of the documents |
| contain the serialized machine functions. |
| |
| [YAML documents]: http://www.yaml.org/spec/1.2/spec.html#id2800132 |
| |
| ## MIR Testing Guide |
| |
| You can use the MIR format for testing in two different ways: |
| |
| - You can write MIR tests that invoke a single code generation pass using the |
| `-run-pass` option in llc. |
| |
| - You can use llc's `-stop-after` option with existing or new LLVM assembly |
| tests and check the MIR output of a specific code generation pass. |
| |
| ### Testing Individual Code Generation Passes |
| |
| The `-run-pass` option in llc allows you to create MIR tests that invoke just |
| a single code generation pass. When this option is used, llc will parse an |
| input MIR file, run the specified code generation pass(es), and output the |
| resulting MIR code. |
| |
| You can generate an input MIR file for the test by using the `-stop-after` or |
| `-stop-before` option in llc. For example, if you would like to write a test |
| for the post register allocation pseudo instruction expansion pass, you can |
| specify the machine copy propagation pass in the `-stop-after` option, as it |
| runs just before the pass that we are trying to test: |
| |
| `llc -stop-after=machine-cp bug-trigger.ll -o test.mir` |
| |
| If the same pass is run multiple times, a run index can be included |
| after the name with a comma. |
| |
| `llc -stop-after=dead-mi-elimination,1 bug-trigger.ll -o test.mir` |
| |
| After generating the input MIR file, you'll have to add a `RUN` line to it that |
| uses the `-run-pass` option. To test the post register allocation pseudo |
| instruction expansion pass on X86-64, you can use a `RUN` line like the one |
| shown below: |
| |
| `# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s` |
| |
| The MIR files are target dependent, so they have to be placed in the |
| target-specific test directories (`test/CodeGen/TARGETNAME`). They also need to |
| specify a target triple or a target architecture either in the `RUN` line or in |
| the embedded LLVM IR module. |
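The two steps above can also be scripted. The following Python sketch builds the llc command line and the test text, but it does not run llc. The helper names `stop_after_command` and `add_run_line` are hypothetical and are not part of LLVM's tooling.

```python
def stop_after_command(pass_name, input_ll, output_mir, run_index=None):
    """Build an llc command line that stops after `pass_name` and writes MIR."""
    flag = f"-stop-after={pass_name}"
    if run_index is not None:
        # Select a specific run of a pass that is scheduled multiple times.
        flag += f",{run_index}"
    return ["llc", flag, input_ll, "-o", output_mir]


def add_run_line(mir_text, triple, pass_name):
    """Prepend a RUN line that re-runs a single pass on the MIR file."""
    run_line = (
        f"# RUN: llc -o - %s -mtriple={triple} "
        f"-run-pass={pass_name} | FileCheck %s\n"
    )
    return run_line + mir_text
```

If llc is on the `PATH`, you can pass the list returned by `stop_after_command` to `subprocess.run`.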
| |
| #### Simplifying MIR files |
| |
| The MIR code coming out of `-stop-after`/`-stop-before` is very verbose. |
| Tests are more accessible and future-proof when simplified: |
| |
| - Use the `-simplify-mir` option with llc. |
| |
| - Machine function attributes often have default values or the test works just |
| as well with default values. Typical candidates for this are: `alignment:`, |
| `exposesReturnsTwice`, `legalized`, `regBankSelected`, `selected`. |
| The whole `frameInfo` section is often unnecessary if there is no special |
| frame usage in the function. `tracksRegLiveness` on the other hand is often |
| necessary for some passes that care about block livein lists. |
| |
| - The (global) `liveins:` list is typically only interesting for early |
| instruction selection passes and can be removed when testing later passes. |
| The per-block `liveins:` on the other hand are necessary if |
| `tracksRegLiveness` is true. |
| |
| - Branch probability data in block `successors:` lists can be dropped if the |
| test doesn't depend on it. Example: |
| `successors: %bb.1(0x40000000), %bb.2(0x40000000)` can be replaced with |
| `successors: %bb.1, %bb.2`. |
| |
| - The MIR code produced by `-stop-after`/`-stop-before` contains the whole IR |
| module. This is necessary because MIR has no equivalents for global variables, |
| references to external functions, function attributes, metadata, and debug |
| info; instead, some MIR data references these IR constructs. You can often |
| remove them if the test doesn't depend on them. |
| |
| - Alias analysis is performed on IR values. These are referenced by memory |
| operands in MIR. Example: `:: (load 8 from %ir.foobar, !alias.scope !9)`. |
| If the test doesn't depend on (good) alias analysis, the references can be |
| dropped: `:: (load 8)`. |
| |
| - MIR blocks can reference IR blocks for debug printing, profile information, |
| or debug locations. Example: `bb.42.myblock` in MIR references the IR block |
| `myblock`. It is usually possible to drop the `.myblock` reference and simply |
| use `bb.42`. |
| |
| - If there are no memory operands or blocks referencing the IR, then the |
| IR function can be replaced by a parameterless dummy function like |
| `define void @func() { ret void }`. |
| |
| - It is possible to drop the whole IR section of the MIR file if it only |
| contains dummy functions (see above). The `.mir` loader will create the |
| IR functions automatically in this case. |
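Some of these simplifications are mechanical. The following Python sketch implements two of them as regular-expression rewrites. It assumes well-formed MIR text, and the helper names are hypothetical. It does not replace `-simplify-mir`, and you should still check that the simplified test exercises the intended behaviour.

```python
import re

# A branch probability such as "(0x40000000)" following a block reference
# like "%bb.1" or "%bb.1.then".
_SUCCESSOR_PROB = re.compile(r"(%bb\.\d+(?:\.[-\w$.]+)?)\(0x[0-9a-fA-F]+\)")

# A block ID followed by the name of the IR block it refers to,
# e.g. "bb.42.myblock" or "%bb.4.d3".
_BLOCK_NAME = re.compile(r"\b(bb\.\d+)\.[-\w$.]+")


def drop_successor_probabilities(mir):
    """Remove branch probabilities, touching only `successors:` lines."""
    return "".join(
        _SUCCESSOR_PROB.sub(r"\1", line)
        if line.lstrip().startswith("successors:") else line
        for line in mir.splitlines(keepends=True)
    )


def drop_ir_block_names(mir):
    """Turn "bb.42.myblock" into "bb.42" and "%bb.4.d3" into "%bb.4"."""
    return _BLOCK_NAME.sub(r"\1", mir)
```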
| |
| (limitations)= |
| |
| ### Limitations |
| |
| Currently, the MIR format has several limitations in terms of which state it |
| can serialize: |
| |
| - The target-specific state in the target-specific `MachineFunctionInfo` |
| subclasses isn't serialized at the moment. |
| |
| - The target-specific `MachineConstantPoolValue` subclasses (in the ARM and |
| SystemZ backends) aren't serialized at the moment. |
| |
| - The `MCSymbol` machine operands don't support temporary or local symbols. |
| |
| - A lot of the state in `MachineModuleInfo` isn't serialized; only the CFI |
| instructions and the variable debug information from MMI are serialized right |
| now. |
| |
| These limitations impose restrictions on what you can test with the MIR format. |
| For now, tests of behaviour that depends on the state of temporary or local |
| `MCSymbol` operands, or on the exception handling state in MMI, can't use the |
| MIR format. Likewise, tests of behaviour that depends on the state of the |
| target-specific `MachineFunctionInfo` or `MachineConstantPoolValue` subclasses |
| can't use the MIR format at the moment. |
| |
| ## High Level Structure |
| |
| (embedded-module)= |
| |
| ### Embedded Module |
| |
| When the first YAML document contains a [YAML block literal string], the MIR |
| parser will treat this string as an LLVM assembly language string that |
| represents an embedded LLVM IR module. |
| Here is an example of a YAML document that contains an LLVM module: |
| |
| ```llvm |
| --- | |
|   define i32 @inc(ptr %x) { |
|   entry: |
|     %0 = load i32, ptr %x |
|     %1 = add i32 %0, 1 |
|     store i32 %1, ptr %x |
|     ret i32 %1 |
|   } |
| ... |
| ``` |
| |
| [YAML block literal string]: http://www.yaml.org/spec/1.2/spec.html#id2795688 |
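To illustrate this layout, the following Python sketch splits a MIR file on the YAML `---` and `...` markers. It returns the embedded module when the first document is a block literal (`--- |`). This is a simplification, not a YAML parser: it assumes that markers start at column 0 and it ignores all other YAML features. The function names are hypothetical.

```python
import textwrap


def split_yaml_documents(text):
    """Split MIR text into (start_marker, body) pairs.

    Only the '---' start marker and the '...' end marker are recognized.
    """
    docs, marker, body = [], None, []
    for line in text.splitlines():
        if line.startswith("---"):
            if marker is not None:  # a new document implicitly ends the last
                docs.append((marker, "\n".join(body)))
            marker, body = line, []
        elif line.rstrip() == "..." and marker is not None:
            docs.append((marker, "\n".join(body)))
            marker, body = None, []
        elif marker is not None:
            body.append(line)
    if marker is not None:
        docs.append((marker, "\n".join(body)))
    return docs


def embedded_ir_module(text):
    """Return the embedded IR if the first document is a block literal."""
    docs = split_yaml_documents(text)
    if docs and docs[0][0].rstrip() == "--- |":
        return textwrap.dedent(docs[0][1])
    return None
```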
| |
| ### Machine Functions |
| |
| The remaining YAML documents contain the machine functions. This is an example |
| of such a YAML document: |
| |
| ```text |
| --- |
| name: inc |
| tracksRegLiveness: true |
| liveins: |
|   - { reg: '$rdi' } |
| callSites: |
|   - { bb: 0, offset: 3, fwdArgRegs: |
|       - { arg: 0, reg: '$edi' } } |
| body: | |
|   bb.0.entry: |
|     liveins: $rdi |
| |
|     $eax = MOV32rm $rdi, 1, _, 0, _ |
|     $eax = INC32r killed $eax, implicit-def dead $eflags |
|     MOV32mr killed $rdi, 1, _, 0, _, $eax |
|     CALL64pcrel32 @foo <regmask...> |
|     RETQ $eax |
| ... |
| ``` |
| |
| The document above consists of attributes that represent the various |
| properties and data structures in a machine function. |
| |
| The attribute `name` is required, and its value should be identical to the |
| name of a function that this machine function is based on. |
| |
| The attribute `body` is a [YAML block literal string]. Its value represents |
| the function's machine basic blocks and their machine instructions. |
| |
| The attribute `callSites` is a representation of call site information which |
| keeps track of call instructions and registers used to transfer call arguments. |
| |
| ## Machine Instructions Format Reference |
| |
| The machine basic blocks and their instructions are represented using a custom, |
| human-readable serialization language. This language is used in the |
| [YAML block literal string] that corresponds to the machine function's body. |
| |
| A source string that uses this language contains a list of machine basic |
| blocks, which are described in the section below. |
| |
| ### Machine Basic Blocks |
| |
| A machine basic block is defined in a single block definition source construct |
| that contains the block's ID. |
| The example below defines two blocks that have an ID of zero and one: |
| |
| ```text |
| bb.0: |
|   <instructions> |
| bb.1: |
|   <instructions> |
| ``` |
| |
| A machine basic block can also have a name. It should be specified after the ID |
| in the block's definition: |
| |
| ```text |
| bb.0.entry: ; This block's name is "entry" |
|   <instructions> |
| ``` |
| |
| The block's name should be identical to the name of the IR block that this |
| machine block is based on. |
| |
| (block-references)= |
| |
| #### Block References |
| |
| The machine basic blocks are identified by their ID numbers. Individual |
| blocks are referenced using the following syntax: |
| |
| ```text |
| %bb.<id> |
| ``` |
| |
| Example: |
| |
| ```llvm |
| %bb.0 |
| ``` |
| |
| The following syntax is also supported, but the former syntax is preferred for |
| block references: |
| |
| ```text |
| %bb.<id>[.<name>] |
| ``` |
| |
| Example: |
| |
| ```llvm |
| %bb.1.then |
| ``` |
| |
| #### Successors |
| |
| The machine basic block's successors must be specified before any of the |
| instructions: |
| |
| ```text |
| bb.0.entry: |
|   successors: %bb.1.then, %bb.2.else |
|   <instructions> |
| bb.1.then: |
|   <instructions> |
| bb.2.else: |
|   <instructions> |
| ``` |
| |
| The branch weights can be specified in parentheses after the successor blocks. |
| The example below defines a block that has two successors with branch weights |
| of 32 and 16: |
| |
| ```text |
| bb.0.entry: |
|   successors: %bb.1.then(32), %bb.2.else(16) |
| ``` |
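Here is a minimal Python sketch of parsing a `successors:` line. It assumes the syntax shown above: a block reference, optionally followed by a weight in parentheses. The function name is hypothetical. The sketch accepts weights in decimal or in `0x`-prefixed hexadecimal, as in the `%bb.1(0x40000000)` example earlier in this document.

```python
import re

# "%bb.<id>[.<name>]" optionally followed by "(<weight>)".
_SUCCESSOR = re.compile(r"%bb\.(\d+)(?:\.([-\w$.]+))?(?:\((\w+)\))?")


def parse_successors(line):
    """Parse a 'successors:' line into (id, name, weight) tuples.

    `name` and `weight` are None when they are omitted.
    """
    _, _, rest = line.partition("successors:")
    result = []
    for item in filter(None, (s.strip() for s in rest.split(","))):
        m = _SUCCESSOR.fullmatch(item)
        if m is None:
            raise ValueError(f"malformed successor: {item!r}")
        block_id, name, weight = m.groups()
        # int(x, 0) accepts both "32" and "0x40000000".
        result.append((int(block_id), name, int(weight, 0) if weight else None))
    return result
```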
| |
| (bb-liveins)= |
| |
| #### Live In Registers |
| |
| The machine basic block's live in registers have to be specified before any of |
| its instructions: |
| |
| ```text |
| bb.0.entry: |
|   liveins: $edi, $esi |
| ``` |
| |
| The list of live in registers and successors can be empty. The language also |
| allows multiple live in register and successor lists; they are combined into |
| one list by the parser. |
| |
| #### Miscellaneous Attributes |
| |
| The attributes `IsAddressTaken`, `IsLandingPad`, |
| `IsInlineAsmBrIndirectTarget` and `Alignment` can be specified in parentheses |
| after the block's definition: |
| |
| ```text |
| bb.0.entry (address-taken): |
|   <instructions> |
| bb.2.else (align 4): |
|   <instructions> |
| bb.3 (landing-pad, align 4): |
|   <instructions> |
| bb.4 (inlineasm-br-indirect-target): |
|   <instructions> |
| ``` |
| |
| <!-- TODO: Describe the way the reference to an unnamed LLVM IR block can be |
| preserved. --> |
| |
| `Alignment` is specified in bytes, and must be a power of two. |
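The following Python sketch parses a block definition, including the attribute list and the power-of-two rule for `Alignment`. It is hypothetical and recognizes only the forms shown above. As in the `bb.0.entry` example, it treats `;` as the start of a trailing comment.

```python
import re

# bb.<id>[.<name>] [(<attr>, ...)]:
_BLOCK_DEF = re.compile(
    r"bb\.(?P<id>\d+)(?:\.(?P<name>[-\w$.]+))?\s*(?:\((?P<attrs>[^)]*)\))?:"
)


def parse_block_header(line):
    """Parse a machine basic block definition line into its parts."""
    header = line.split(";", 1)[0].strip()  # drop a trailing ';' comment
    m = _BLOCK_DEF.fullmatch(header)
    if m is None:
        raise ValueError(f"not a block definition: {line!r}")
    attrs = [a.strip() for a in (m["attrs"] or "").split(",") if a.strip()]
    align = None
    for attr in attrs:
        if attr.startswith("align "):
            align = int(attr.split()[1])
            # Alignment is given in bytes and must be a power of two.
            if align <= 0 or align & (align - 1):
                raise ValueError(f"alignment {align} is not a power of two")
    return {"id": int(m["id"]), "name": m["name"],
            "attributes": attrs, "align": align}
```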
| |
| (mir-instructions)= |
| |
| ### Machine Instructions |
| |
| A machine instruction is composed of a name, |
| {ref}`machine operands <machine-operands>`, |
| {ref}`instruction flags <instruction-flags>`, and machine memory operands. |
| |
| The instruction's name is usually specified before the operands. The example |
| below shows an instance of the X86 `RETQ` instruction with a single machine |
| operand: |
| |
| ```text |
| RETQ $eax |
| ``` |
| |
| However, if the machine instruction has one or more explicitly defined register |
| operands, the instruction's name has to be specified after them. The example |
| below shows an instance of the AArch64 `LDPXpost` instruction with three |
| defined register operands: |
| |
| ```text |
| $sp, $fp, $lr = LDPXpost $sp, 2 |
| ``` |
| |
| The instruction names are serialized using the exact definitions from the |
| target's `*InstrInfo.td` files, and they are case sensitive. This means that |
| similar instruction names like `TSTri` and `tSTRi` represent different |
| machine instructions. |
| |
| (instruction-flags)= |
| |
| #### Instruction Flags |
| |
| The flag `frame-setup` or `frame-destroy` can be specified before the |
| instruction's name: |
| |
| ```text |
| $fp = frame-setup ADDXri $sp, 0, 0 |
| ``` |
| |
| ```text |
| $x21, $x20 = frame-destroy LDPXi $sp |
| ``` |
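The following Python sketch splits an instruction line into its explicitly defined registers, instruction flags, name, and operands, following the ordering rules above. It makes several simplifying assumptions:

- It splits on the first ` = `.
- It treats only `frame-setup` and `frame-destroy` as flags.
- It does not distinguish trailing attachments such as `debug-location` from operands.

The function names are hypothetical.

```python
def split_operands(text):
    """Split on commas that are not nested inside (), [] or {}."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


INSTRUCTION_FLAGS = {"frame-setup", "frame-destroy"}


def parse_instruction(line):
    """Split '[<defs> =] [<flags>] <name> [<operands>]' into its parts."""
    defs = []
    lhs, sep, rhs = line.partition(" = ")
    if sep:
        defs, line = split_operands(lhs), rhs
    words = line.split(None, 1)
    flags = []
    while words and words[0] in INSTRUCTION_FLAGS:
        flags.append(words[0])
        words = words[1].split(None, 1) if len(words) > 1 else []
    if not words:
        raise ValueError("missing instruction name")
    operands = split_operands(words[1]) if len(words) > 1 else []
    return {"defs": defs, "flags": flags, "name": words[0], "operands": operands}
```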
| |
| (registers)= |
| |
| #### Bundled Instructions |
| |
| The syntax for bundled instructions is the following: |
| |
| ```text |
| BUNDLE implicit-def $r0, implicit-def $r1, implicit $r2 { |
|   $r0 = SOME_OP $r2 |
|   $r1 = ANOTHER_OP internal $r0 |
| } |
| ``` |
| |
| The first instruction is often a bundle header. The instructions between `{` |
| and `}` are bundled with the first instruction. |
| |
| (mir-registers)= |
| |
| ### Registers |
| |
| Registers are one of the key primitives in the machine instruction |
| serialization language. They are primarily used in the |
| {ref}`register machine operands <register-operands>`, |
| but they can also be used in a number of other places, like the |
| {ref}`basic block's live in list <bb-liveins>`. |
| |
| The physical registers are identified by their name and by the '$' prefix sigil. |
| They use the following syntax: |
| |
| ```text |
| $<name> |
| ``` |
| |
| The example below shows three X86 physical registers: |
| |
| ```text |
| $eax |
| $r15 |
| $eflags |
| ``` |
| |
| The virtual registers are identified by their ID number and by the '%' sigil. |
| They use the following syntax: |
| |
| ```text |
| %<id> |
| ``` |
| |
| Example: |
| |
| ```text |
| %0 |
| ``` |
| |
| The null registers are represented using an underscore ('`_`'). They can also be |
| represented using a '`$noreg`' named register, although the former syntax |
| is preferred. |
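Here is a minimal Python sketch that classifies a register token by its sigil. It follows only the syntax described in this section and does not try to handle every form the MIR parser accepts. The function name is hypothetical.

```python
import re


def classify_register(token):
    """Classify a MIR register token as 'null', 'physical' or 'virtual'."""
    if token in ("_", "$noreg"):
        return "null"
    if re.fullmatch(r"\$\w+", token):
        return "physical"  # e.g. $eax, $r15, $eflags
    if re.fullmatch(r"%\d+", token):
        return "virtual"   # e.g. %0
    raise ValueError(f"not a register: {token!r}")
```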
| |
| (machine-operands)= |
| |
| ### Machine Operands |
| |
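| There are many different kinds of machine operands, enumerated by |
| `MachineOperand::MachineOperandType`, and all of them can be serialized, subject |
| to the {ref}`limitations <limitations>` described above. |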
| |
| #### Immediate Operands |
| |
| The immediate machine operands are untyped, 64-bit signed integers. The |
| example below shows an instance of the X86 `MOV32ri` instruction that has an |
| immediate machine operand `-42`: |
| |
| ```text |
| $eax = MOV32ri -42 |
| ``` |
| |
| An immediate operand is also used to represent a subregister index when the |
| machine instruction has one of the following opcodes: |
| |
| - `EXTRACT_SUBREG` |
| |
| - `INSERT_SUBREG` |
| |
| - `REG_SEQUENCE` |
| |
| - `SUBREG_TO_REG` |
| |
| In that case, the operand is printed using the target's name for the subregister index rather than as a plain integer. |
| |
| For example: |
| |
| In `AArch64RegisterInfo.td`: |
| |
| ```text |
| def sub_32 : SubRegIndex<32>; |
| ``` |
| |
| If the second operand is an immediate with the value `15` (a target-dependent |
| value), based on the instruction's opcode and the operand's index the operand |
| will be printed as `%subreg.sub_32`: |
| |
| ```text |
| %1:gpr64 = SUBREG_TO_REG %0, %subreg.sub_32 |
| ``` |
| |
| For integers larger than 64 bits, we use a special machine operand, `MO_CImmediate`, |
| which stores the immediate in a `ConstantInt` using an `APInt` (LLVM's |
| arbitrary-precision integers). |
| |
| <!-- TODO: Describe the FPIMM immediate operands. --> |
| |
| (register-operands)= |
| |
| #### Register Operands |
| |
| The {ref}`register <mir-registers>` primitive is used to represent the register |
| machine operands. The register operands can also have optional |
| {ref}`register flags <register-flags>`, |
| {ref}`a subregister index <subregister-indices>`, |
| and a reference to the tied register operand. |
| The full syntax of a register operand is shown below: |
| |
| ```text |
| [<flags>] <register> [ .<subregister-idx-name> ] [ :<register-class> ] [ (tied-def <tied-op>) ] [ (<type>) ] |
| ``` |
| |
| This example shows an instance of the X86 `XOR32rr` instruction that has |
| 5 register operands with different register flags: |
| |
| ```text |
| dead $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $al |
| ``` |
| |
| Note that subregister-index, register-class and type cannot be specified for |
| physical registers. Additionally, tied-def can only be specified for a use. |
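A regular expression can approximate the operand grammar above. This Python sketch is hypothetical. It handles only the syntax shown in this section, accepts flags in any order, and enforces the rule that physical registers take no subregister index, register class, or type. It does not check the tied-def rule, because whether an operand is a use depends on its position in the instruction.

```python
import re

REGISTER_FLAGS = ["implicit-def", "implicit", "def", "killed", "dead", "undef",
                  "early-clobber", "debug-use", "internal", "renamable"]
_FLAG_ALTERNATION = "|".join(re.escape(flag) for flag in REGISTER_FLAGS)

_REG_OPERAND = re.compile(
    rf"(?P<flags>(?:(?:{_FLAG_ALTERNATION})\s+)*)"
    r"(?P<reg>\$\w+|%\d+|_)"                   # physical, virtual or null
    r"(?:\.(?P<subreg>\w+))?"                  # .<subregister-idx-name>
    r"(?::(?P<regclass>\w+))?"                 # :<register-class>
    r"(?:\s*\(tied-def (?P<tied>\d+)\))?"      # (tied-def <tied-op>)
    r"(?:\s*\((?P<type>[^)]*)\))?"             # (<type>)
)


def parse_register_operand(text):
    """Split a register operand into flags, register and optional parts."""
    m = _REG_OPERAND.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"malformed register operand: {text!r}")
    reg = m["reg"]
    if reg.startswith("$") and (m["subreg"] or m["regclass"] or m["type"]):
        raise ValueError("physical registers take no subregister, class or type")
    return {"flags": m["flags"].split(), "reg": reg, "subreg": m["subreg"],
            "regclass": m["regclass"],
            "tied": int(m["tied"]) if m["tied"] is not None else None,
            "type": m["type"]}
```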
| |
| (register-flags)= |
| |
| ##### Register Flags |
| |
| The table below shows all of the possible register flags along with the |
| corresponding internal `llvm::RegState` representation: |
| |
| <!-- Keep this in sync with MachineInstrBuilder.h --> |
| |
| | Flag | Internal Value | Meaning | |
| | --- | --- | --- | |
| | `def` | `RegState::Define` | Register definition. | |
| | `implicit` | `RegState::Implicit` | Not emitted register (e.g., carry, or temporary result). | |
| | `killed` | `RegState::Kill` | The last use of a register. | |
| | `dead` | `RegState::Dead` | Unused definition. | |
| | `undef` | `RegState::Undef` | Value of the register doesn't matter. | |
| | `early-clobber` | `RegState::EarlyClobber` | Register definition happens before uses. | |
| | `debug-use` | `RegState::Debug` | Register 'use' is for debugging purpose. | |
| | `internal` | `RegState::InternalRead` | Register reads a value that is defined inside the same instruction or bundle. | |
| | `renamable` | `RegState::Renamable` | Register that may be renamed. | |
| | `implicit-def` | `RegState::ImplicitDefine` | `implicit` and `def` | |
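The following Python sketch models the table with `enum.IntFlag`, to show how the composite `implicit-def` flag relates to the others. The bit values are illustrative assumptions and are not the values of LLVM's `RegState` enumeration.

```python
from enum import IntFlag


class RegState(IntFlag):
    """Models the flag table; bit values are illustrative, not LLVM's."""
    DEFINE = 1 << 0
    IMPLICIT = 1 << 1
    KILL = 1 << 2
    DEAD = 1 << 3
    UNDEF = 1 << 4
    EARLY_CLOBBER = 1 << 5
    DEBUG = 1 << 6
    INTERNAL_READ = 1 << 7
    RENAMABLE = 1 << 8
    IMPLICIT_DEFINE = IMPLICIT | DEFINE  # "implicit-def"


MIR_FLAG_TO_STATE = {
    "def": RegState.DEFINE,
    "implicit": RegState.IMPLICIT,
    "implicit-def": RegState.IMPLICIT_DEFINE,
    "killed": RegState.KILL,
    "dead": RegState.DEAD,
    "undef": RegState.UNDEF,
    "early-clobber": RegState.EARLY_CLOBBER,
    "debug-use": RegState.DEBUG,
    "internal": RegState.INTERNAL_READ,
    "renamable": RegState.RENAMABLE,
}


def flags_to_state(flags):
    """Combine MIR register flag spellings into one RegState value."""
    state = RegState(0)
    for flag in flags:
        state |= MIR_FLAG_TO_STATE[flag]
    return state
```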
| |
| (subregister-indices)= |
| |
| ##### Subregister Indices |
| |
| The register machine operands can reference a portion of a register by using |
| the subregister indices. The example below shows an instance of the `COPY` |
| pseudo instruction that uses the X86 `sub_8bit` subregister index to copy the |
| lower 8 bits of the 32-bit virtual register 0 to the 8-bit virtual register 1: |
| |
| ```text |
| %1 = COPY %0.sub_8bit |
| ``` |
| |
| The names of the subregister indices are target specific, and are typically |
| defined in the target's `*RegisterInfo.td` file. |
| |
| #### Constant Pool Indices |
| |
| A constant pool index (CPI) operand is printed using its index in the |
| function's `MachineConstantPool` and an offset. |
| |
| For example, a CPI with the index 1 and offset 8: |
| |
| ```text |
| %1:gr64 = MOV64ri %const.1 + 8 |
| ``` |
| |
| For a CPI with the index 0 and offset -12: |
| |
| ```text |
| %1:gr64 = MOV64ri %const.0 - 12 |
| ``` |
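Here is a minimal Python sketch of printing and parsing constant pool index operands in the form shown above. It assumes that a zero offset is omitted entirely. The helper names are hypothetical. Target index operands use the same `+ <offset>` or `- <offset>` suffix.

```python
import re


def format_offset(base, offset):
    """Append a signed offset as ' + 8' or ' - 12'; omit a zero offset."""
    if offset == 0:
        return base
    return f"{base} - {-offset}" if offset < 0 else f"{base} + {offset}"


def format_cpi(index, offset=0):
    return format_offset(f"%const.{index}", offset)


_CPI = re.compile(r"%const\.(\d+)(?:\s*([+-])\s*(\d+))?")


def parse_cpi(text):
    """Return (index, signed offset) for a '%const.<index> [+|- <n>]' operand."""
    m = _CPI.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a constant pool index: {text!r}")
    index, sign, amount = m.groups()
    offset = int(amount) if amount else 0
    return int(index), -offset if sign == "-" else offset
```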
| |
| A constant pool entry is bound to an LLVM IR `Constant` or a target-specific |
| `MachineConstantPoolValue`. When serializing all the function's constants, the |
| following format is used: |
| |
| ```text |
| constants: |
|   - id:               <index> |
|     value:            <value> |
|     alignment:        <alignment> |
|     isTargetSpecific: <target-specific> |
| ``` |
| |
| where: |
| - `<index>` is a 32-bit unsigned integer; |
| - `<value>` is an [LLVM IR Constant](https://www.llvm.org/docs/LangRef.html#constants); |
| - `<alignment>` is a 32-bit unsigned integer specified in bytes, and must be |
| a power of two; |
| - `<target-specific>` is either true or false. |
| |
| Example: |
| |
| ```text |
| constants: |
|   - id:               0 |
|     value:            'double 3.250000e+00' |
|     alignment:        8 |
|   - id:               1 |
|     value:            'g-(LPC0+8)' |
|     alignment:        4 |
|     isTargetSpecific: true |
| ``` |
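A few lines of Python can check the constraints listed above. This sketch is hypothetical. It operates on an entry that has already been parsed into a `dict` (for example, by a YAML loader), requires `id` and `alignment`, and treats a missing `isTargetSpecific` as false, as in the first example entry.

```python
def validate_constant(entry):
    """Check one `constants:` entry against the listed constraints."""
    index = entry["id"]
    if not 0 <= index < 2**32:
        raise ValueError(f"id {index} is not a 32-bit unsigned integer")
    align = entry["alignment"]
    # Must fit in 32 bits and be a power of two (exactly one bit set).
    if not 0 < align < 2**32 or align & (align - 1):
        raise ValueError(f"alignment {align} is not a power of two")
    if not isinstance(entry.get("isTargetSpecific", False), bool):
        raise ValueError("isTargetSpecific must be true or false")
    return True
```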
| |
| #### Global Value Operands |
| |
| The global value machine operands reference the global values from the |
| {ref}`embedded LLVM IR module <embedded-module>`. |
| The example below shows an instance of the X86 `MOV64rm` instruction that has |
| a global value operand named `G`: |
| |
| ```text |
| $rax = MOV64rm $rip, 1, _, @G, _ |
| ``` |
| |
| The named global values are represented using an identifier with the `@` prefix. |
| If the identifier doesn't match the regular expression |
| `[-a-zA-Z$._][-a-zA-Z$._0-9]*`, then this identifier must be quoted. |
| |
| The unnamed global values are represented using an unsigned numeric value with |
| the `@` prefix, as in the following examples: `@0`, `@989`. |
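The following Python sketch applies the naming rule above. Quoting follows the regular expression given above. The escaping of `"`, `\` and non-printable bytes as `\XX` hexadecimal pairs is an assumption modeled on LLVM IR string escapes and may not match the MIR printer exactly. The function name is hypothetical.

```python
import re

_IDENT = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


def format_global(name):
    """Format a global value operand: '@0' (unnamed), '@G' or '@"..."'."""
    if isinstance(name, int):
        return f"@{name}"  # unnamed global value
    if _IDENT.fullmatch(name):
        return f"@{name}"
    # Assumed escaping: '"', '\' and non-printable bytes become '\XX'.
    escaped = "".join(
        f"\\{b:02X}" if b < 0x20 or b >= 0x7F or b in b'"\\' else chr(b)
        for b in name.encode("utf-8")
    )
    return f'@"{escaped}"'
```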
| |
| #### Target-dependent Index Operands |
| |
| A target index operand is a target-specific index and an offset. The |
| target-specific index is printed using target-specific names and a positive or |
| negative offset. |
| |
| For example, the `amdgpu-constdata-start` is associated with the index `0` |
| in the AMDGPU backend. So if we have a target index operand with the index 0 |
| and the offset 8: |
| |
| ```text |
| $sgpr2 = S_ADD_U32 _, target-index(amdgpu-constdata-start) + 8, implicit-def _, implicit-def _ |
| ``` |
| |
| #### Jump-table Index Operands |
| |
| A jump-table index operand with the index 0 is printed as follows: |
| |
| ```text |
| tBR_JTr killed $r0, %jump-table.0 |
| ``` |
| |
| A machine jump-table entry contains a list of `MachineBasicBlocks`. When serializing all the function's jump-table entries, the following format is used: |
| |
| ```text |
| jumpTable: |
|   kind:            <kind> |
|   entries: |
|     - id:              <index> |
|       blocks:          [ <bbreference>, <bbreference>, ... ] |
| ``` |
| |
| where `<kind>` describes how the jump table is represented and emitted (plain address, relocations, PIC, etc.), each `<index>` is a 32-bit unsigned integer, and `blocks` contains a list of {ref}`machine basic block references <block-references>`. |
| |
| Example: |
| |
| ```text |
| jumpTable: |
|   kind:            inline |
|   entries: |
|     - id:              0 |
|       blocks:          [ '%bb.3', '%bb.9', '%bb.4.d3' ] |
|     - id:              1 |
|       blocks:          [ '%bb.7', '%bb.7', '%bb.4.d3', '%bb.5' ] |
| ``` |
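Here is a Python sketch that renders the `jumpTable:` mapping from a list of `(id, blocks)` pairs and rejects malformed block references. The function name is hypothetical. Column alignment inside the mapping is cosmetic in YAML, so this sketch uses single spaces.

```python
import re

_BLOCK_REF = re.compile(r"%bb\.\d+(?:\.[-\w$.]+)?")


def render_jump_table(kind, entries):
    """Render a `jumpTable:` mapping from (id, [block references]) pairs."""
    lines = ["jumpTable:", f"  kind: {kind}", "  entries:"]
    for index, blocks in entries:
        for block in blocks:
            if not _BLOCK_REF.fullmatch(block):
                raise ValueError(f"not a block reference: {block!r}")
        refs = ", ".join(f"'{block}'" for block in blocks)
        lines.append(f"    - id: {index}")
        lines.append(f"      blocks: [ {refs} ]")
    return "\n".join(lines)
```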
| |
| #### External Symbol Operands |
| |
| An external symbol operand is represented using an identifier with the `&` |
| prefix. The identifier is surrounded with double quotes (`"`) and escaped if it |
| contains any special non-printable characters. |
| |
| Example: |
| |
| ```text |
| CALL64pcrel32 &__stack_chk_fail, csr_64, implicit $rsp, implicit-def $rsp |
| ``` |
| |
| #### MCSymbol Operands |
| |
| An `MCSymbol` operand holds a pointer to an `MCSymbol`. For the limitations |
| of this operand in MIR, see {ref}`limitations <limitations>`. |
| |
| The syntax is: |
| |
| ```text |
| EH_LABEL <mcsymbol Ltmp1> |
| ``` |
| |
| #### Debug Instruction Reference Operands |
| |
| A debug instruction reference operand is a pair of indices, referring to an |
| instruction and an operand within that instruction, respectively; see |
| {ref}`Instruction referencing locations <instruction-referencing-locations>`. |
| |
| The example below uses a reference to Instruction 1, Operand 0: |
| |
| ```text |
| DBG_INSTR_REF !123, !DIExpression(DW_OP_LLVM_arg, 0), dbg-instr-ref(1, 0), debug-location !456 |
| ``` |
| |
| #### CFIIndex Operands |
| |
| A CFI Index operand holds an index into a per-function side-table, |
| `MachineFunction::getFrameInstructions()`, which references all the frame |
| instructions in a `MachineFunction`. A `CFI_INSTRUCTION` may look like it |
| contains multiple operands, but the only operand it contains is the CFI Index. |
| The other operands are tracked by the `MCCFIInstruction` object. |
| |
| The syntax is: |
| |
| ```text |
| CFI_INSTRUCTION offset $w30, -16 |
| ``` |
| |
| which may be emitted later in the MC layer as: |
| |
| ```text |
| .cfi_offset w30, -16 |
| ``` |
| |
| #### IntrinsicID Operands |
| |
| An Intrinsic ID operand contains a generic intrinsic ID or a target-specific ID. |
| |
| The syntax for the `returnaddress` intrinsic is: |
| |
| ```text |
| $x0 = COPY intrinsic(@llvm.returnaddress) |
| ``` |
| |
| #### Predicate Operands |
| |
| A Predicate operand contains an IR predicate from `CmpInst::Predicate`, like |
| `ICMP_EQ`, etc. |
| |
| For an int eq predicate `ICMP_EQ`, the syntax is: |
| |
| ```text |
| %2:gpr(s32) = G_ICMP intpred(eq), %0, %1 |
| ``` |
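Here is a minimal Python sketch that maps `intpred(...)` spellings to the corresponding `CmpInst::Predicate` enumerator names. It covers only the integer predicates; floating-point predicates, written with `floatpred`, are not modeled. The function name is hypothetical.

```python
import re

# intpred(...) spellings and the CmpInst::Predicate enumerators they name.
INT_PREDICATES = {
    "eq": "ICMP_EQ", "ne": "ICMP_NE",
    "ugt": "ICMP_UGT", "uge": "ICMP_UGE", "ult": "ICMP_ULT", "ule": "ICMP_ULE",
    "sgt": "ICMP_SGT", "sge": "ICMP_SGE", "slt": "ICMP_SLT", "sle": "ICMP_SLE",
}


def parse_intpred(operand):
    """Return the CmpInst::Predicate name for an 'intpred(<name>)' operand."""
    m = re.fullmatch(r"intpred\((\w+)\)", operand.strip())
    if m is None or m[1] not in INT_PREDICATES:
        raise ValueError(f"not an integer predicate operand: {operand!r}")
    return INT_PREDICATES[m[1]]
```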
| |
| #### LaneMask Operands |
| |
| A LaneMask operand contains a LaneBitmask struct representing the covering of a |
| register with sub-registers. Instructions typically associate a LaneMask operand |
| with one or more Register operands, and use it to represent sub-register |
| granularity information like liveness for those associated Register operands. |
| |
| For example, the COPY_LANEMASK instruction uses this operand to copy only active |
| lanes (of the source register) in the mask. The syntax for it would look like: |
| |
| ```text |
| $vgpr1 = COPY_LANEMASK $vgpr0, lanemask(0x00000000000000C0) |
| ``` |
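The following Python sketch decodes the mask in the example above into lane bit positions. The target defines which sub-register each bit stands for, and the sketch does not model that. The 16-digit formatting mirrors the example and is an assumption. The function names are hypothetical.

```python
import re


def parse_lanemask(operand):
    """Return the integer value of a 'lanemask(0x...)' operand."""
    m = re.fullmatch(r"lanemask\((0x[0-9a-fA-F]+)\)", operand.strip())
    if m is None:
        raise ValueError(f"not a lanemask operand: {operand!r}")
    return int(m[1], 16)


def active_lanes(mask):
    """Bit positions set in a lane mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]


def format_lanemask(mask):
    # 16 uppercase hex digits, as in the example above.
    return f"lanemask(0x{mask:016X})"
```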
| |
| <!-- TODO: Describe the parsers default behaviour when optional YAML attributes |
| are missing. |
| TODO: Describe the syntax for virtual register YAML definitions. |
| TODO: Describe the machine function's YAML flag attributes. |
| TODO: Describe the syntax for the register mask machine operands. |
| TODO: Describe the frame information YAML mapping. |
| TODO: Describe the syntax of the stack object machine operands and their |
| YAML definitions. |
| TODO: Describe the syntax of the block address machine operands. |
| TODO: Describe the syntax of the metadata machine operands, and the |
| instructions debug location attribute. |
| TODO: Describe the syntax of the register live out machine operands. |
| TODO: Describe the syntax of the machine memory operands. --> |
| |
| #### Comments |
| |
| Machine operands can have C/C++ style comments, which are annotations enclosed |
| between `/*` and `*/` to improve readability of e.g., immediate operands. |
| In the example below, the immediate operands `14` and `0` of the ARM |
| instructions `tEOR` and `t2Bcc` have been annotated with their condition code |
| (CC) definitions, i.e. the `al` (always) and `eq` condition codes: |
| |
| ```text |
| dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::al */, $noreg |
| t2Bcc %bb.4, 0 /* CC::eq */, killed $cpsr |
| ``` |
| |
| As these annotations are comments, they are ignored by the MI parser. |
| Comments can be added or customized by overriding the `TargetInstrInfo` hook |
| `createMIROperandComment()`. |
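Because the parser ignores these annotations, a test-processing script could drop them before comparing instructions. Here is a minimal Python sketch that assumes comments never appear inside quoted names. The function name is hypothetical.

```python
import re

# A /* ... */ annotation plus any whitespace in front of it (non-greedy body).
_OPERAND_COMMENT = re.compile(r"\s*/\*.*?\*/")


def strip_operand_comments(line):
    """Drop /* ... */ operand annotations from one MIR instruction line."""
    return _OPERAND_COMMENT.sub("", line)
```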
| |
| ### Debug-Info constructs |
| |
| Most of the debugging information in a MIR file is found in the metadata |
| of the embedded module. Within a machine function, that metadata is referred to |
| by various constructs to describe source locations and variable locations. |
| |
| #### Source locations |
| |
| Every MIR instruction may optionally have a trailing reference to a |
| `DILocation` metadata node, after all operands and symbols, but before |
| memory operands: |
| |
| ```text |
| $rbp = MOV64rr $rdi, debug-location !12 |
| ``` |
| |
| The source location attachment is synonymous with the `!dbg` metadata |
| attachment in LLVM-IR. The absence of a source location attachment will be |
| represented by an empty `DebugLoc` object in the machine instruction. |
| |
| #### Fixed variable locations |
| |
| There are several ways of specifying variable locations. The simplest is |
| describing a variable that is permanently located on the stack. In the stack |
| or fixedStack attribute of the machine function, the variable, scope, and |
| any qualifying location modifier are provided: |
| |
| ```text |
| - { id: 0, name: offset.addr, offset: -24, size: 8, alignment: 8, stack-id: default, |
|     debug-info-variable: '!1', debug-info-expression: '!DIExpression()', |
|     debug-info-location: '!2' } |
| ``` |
| |
| Where: |
| |
| - `debug-info-variable` identifies a DILocalVariable metadata node, |
| |
| - `debug-info-expression` adds qualifiers to the variable location, |
| |
| - `debug-info-location` identifies a DILocation metadata node. |
| |
| These metadata attributes correspond to the operands of a `#dbg_declare` |
| IR debug record, see the {ref}`source level debugging <debug_records>` documentation. |
| |
| #### Varying variable locations |
| |
| Variables that are not always on the stack or change location are specified |
| with the `DBG_VALUE` meta machine instruction. It is synonymous with the |
| `#dbg_value` IR record, and is written: |
| |
| ```text |
| DBG_VALUE $rax, $noreg, !123, !DIExpression(), debug-location !456 |
| ``` |
| |
| The operands are, in order: |
| |
| 1. A machine location such as a register, immediate, or frame index. |
| |
| 2. Either `$noreg`, or the immediate value zero if an extra level of indirection is to be added to the first operand. |
| |
| 3. A reference to a `DILocalVariable` metadata node. |
| |
| 4. An expression qualifying the variable location, either inline or as a metadata node reference. |
| |
| The trailing source location identifies the `DILocation` for the scope of the |
| variable. The second operand (`IsIndirect`) is deprecated and slated for deletion. |
| All additional qualifiers for the variable location should be made through the |
| expression metadata. |
| |
| (instruction-referencing-locations)= |
| |
| #### Instruction referencing locations |
| |
| This experimental feature aims to separate the specification of variable |
| *values* from the program point where a variable takes on that value. Changes |
| in a variable value occur in the same manner as `DBG_VALUE` meta instructions |
| but using `DBG_INSTR_REF`. Variable values are identified by a pair of |
| instruction number and operand number. Consider the example below: |
| |
| ```text |
| $rbp = MOV64ri 0, debug-instr-number 1, debug-location !12 |
| DBG_INSTR_REF !123, !DIExpression(DW_OP_LLVM_arg, 0), dbg-instr-ref(1, 0), debug-location !456 |
| ``` |
| |
| Instruction numbers are directly attached to machine instructions with an |
| optional `debug-instr-number` attachment, before the optional |
| `debug-location` attachment. The value defined in `$rbp` in the code |
| above would be identified by the pair `<1, 0>`. |
| |
| The 3rd operand of the `DBG_INSTR_REF` above records the instruction |
| and operand number `<1, 0>`, identifying the value defined by the `MOV64ri`. |
| The first two operands to `DBG_INSTR_REF` are identical to those of `DBG_VALUE_LIST`, |
| and the position of the `DBG_INSTR_REF` records where the variable takes on the |
| designated value in the same way. |
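The following Python sketch illustrates the pairing between `debug-instr-number` and `dbg-instr-ref`. It simplifies by resolving operand number N to the N-th register written before ` = `. This matches the example because explicit definitions come first in an instruction's operand list, but the sketch does not model implicit definitions. The function name is hypothetical.

```python
import re

_INSTR_NUM = re.compile(r"debug-instr-number (\d+)")
_INSTR_REF = re.compile(r"dbg-instr-ref\((\d+), (\d+)\)")


def resolve_instr_refs(lines):
    """Map each DBG_INSTR_REF line index to the register it refers to."""
    defs_by_number = {}
    for line in lines:
        m = _INSTR_NUM.search(line)
        if m and " = " in line:
            lhs = line.split(" = ", 1)[0]
            # Keep only the register name, dropping flags such as 'dead'.
            defs_by_number[int(m[1])] = [d.split()[-1] for d in lhs.split(",")]
    resolved = {}
    for i, line in enumerate(lines):
        m = _INSTR_REF.search(line)
        if m and line.lstrip().startswith("DBG_INSTR_REF"):
            instr, op = int(m[1]), int(m[2])
            resolved[i] = defs_by_number[instr][op]
    return resolved
```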
| |
| More information about how these constructs are used is available in |
| {doc}`InstrRefDebugInfo`. The related documents {doc}`SourceLevelDebugging` and |
| {doc}`HowToUpdateDebugInfo` may be useful as well. |