.. Blame: Marco Elver, c70f6e1, 2022-09-06 15:48:23 +0200

=========================
LLVM PC Sections Metadata
=========================

.. contents::
   :local:

Introduction
============

PC Sections Metadata can be attached to instructions and functions whose
addresses, viz. program counters (PCs), are to be emitted in specially encoded
binary sections. Metadata is assigned as an ``MDNode`` of the ``MD_pcsections``
(``!pcsections``) kind; the following section describes the metadata format.

Metadata Format
===============

An arbitrary number of interleaved ``MDString`` and constant operands can be
added, where a new ``MDString`` always denotes a section name, followed by an
arbitrary number of auxiliary constants that are encoded along with the PC of
the instruction or function. The first operand must be an ``MDString`` denoting
the first section.

.. code-block:: none

    !0 = !{
      !"<section#1>"
      [ , !1 ... ]
      [ , !"<section#2>"
        [ , !2 ... ]
        ... ]
    }
    !1 = !{ iXX <aux-consts#1>, ... }
    !2 = !{ iXX <aux-consts#2>, ... }
    ...

The occurrence of ``section#1``, ``section#2``, ..., ``section#N`` in the
metadata causes the backend to emit the PC for the associated instruction or
function to all named sections. For each emitted PC in a section #N, the
constants ``aux-consts#N`` in the tuple ``!N`` will be emitted after the PC.
Multiple tuples with constant data may be provided after a section name string
(e.g. ``!0 = !{!"s1", !1, !2}``), and a single constant tuple may be reused for
different sections (e.g. ``!0 = !{!"s1", !1, !"s2", !1}``).
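
The reading of the interleaved operand list described above can be sketched in
Python. This is only an illustration: the helper ``group_pcsections`` is
hypothetical and not part of LLVM, Python strings stand in for ``MDString``
operands, lists of integers stand in for constant tuples, and the sketch
assumes that the constants of several tuples following one section name are
emitted in operand order.

.. code-block:: python

    def group_pcsections(operands):
        """Group a flat !pcsections operand list by section name.

        Strings model MDString operands (section names); lists of ints model
        constant tuples such as ``!1 = !{ i64 1, i64 2 }``.
        """
        if not operands or not isinstance(operands[0], str):
            raise ValueError("the first operand must be a section name")
        sections = []
        for op in operands:
            if isinstance(op, str):
                # A new MDString always starts a new section.
                sections.append((op, []))
            else:
                # Assumption: constants are appended in operand order.
                sections[-1][1].extend(op)
        return sections


    # Models !0 = !{!"s1", !1, !2, !"s2", !1} with !1 = {1, 2} and !2 = {3}.
    aux1, aux2 = [1, 2], [3]
    grouped = group_pcsections(["s1", aux1, aux2, "s2", aux1])
    print(grouped)

Under these assumptions, ``s1`` receives the constants of both tuples and
``s2`` reuses the first tuple.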

Binary Encoding
===============

*Instructions* result in emitting a single PC, and *functions* result in
emission of the start of the function and a 32-bit size. This is followed by
the auxiliary constants that followed the respective section name in the
``MD_pcsections`` metadata.

To avoid relocations in the final binary, each PC stored at address ``entry``
is encoded as an offset relative to the entry itself, computed as
``pc - entry``. To decode, a user has to compute ``entry + *entry``.

The size of each entry depends on the code model. With the large and medium
code models, the entry size matches the pointer size. For any smaller code
model, the entry size is 32 bits.
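
The decoding rule above can be sketched as follows. The helper ``decode_pc``
and the addresses are made up for illustration; the sketch further assumes a
little-endian target and that the stored offset is read as a signed integer,
since ``pc - entry`` may be negative.

.. code-block:: python

    import struct


    def decode_pc(section_addr, data, offset, entry_size=4):
        """Recover the absolute PC from the entry at ``offset`` in ``data``.

        section_addr: run-time address at which the section is loaded.
        entry_size:   4 for the smaller code models, pointer size otherwise.
        """
        # Assumption: little-endian, signed relative offset.
        fmt = {4: "<i", 8: "<q"}[entry_size]
        (rel,) = struct.unpack_from(fmt, data, offset)
        entry = section_addr + offset  # address of the entry itself
        return entry + rel             # entry + *entry


    # Round trip with made-up addresses: the stored value is pc - entry.
    section_addr, offset, pc = 0x500000, 8, 0x401000
    data = bytes(offset) + struct.pack("<i", pc - (section_addr + offset))
    assert decode_pc(section_addr, data, offset) == pc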

.. Blame: Marco Elver, bf9814b7, 2023-02-08 12:25:01 +0100

Encoding Options
----------------

Optional encoding options can be passed in the first ``MDString`` operand:
``<section>!<options>``. The following options are available:

* ``C`` -- Compress constant integers of size 2-8 bytes as ULEB128; this
  includes the function size (but excludes the PC entry).

For example, ``foo!C`` will emit into section ``foo`` with all constants
encoded as ULEB128.
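
For reference, ULEB128 stores an unsigned integer in groups of 7 bits, least
significant group first, with the high bit of each byte set when more bytes
follow. The helpers below are a minimal sketch of that encoding for readers
writing a decoder; the names ``encode_uleb128`` and ``decode_uleb128`` are
illustrative and are not taken from LLVM.

.. code-block:: python

    def encode_uleb128(value):
        """Encode a non-negative integer as ULEB128 bytes."""
        if value < 0:
            raise ValueError("ULEB128 encodes unsigned values only")
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)  # more bytes follow
            else:
                out.append(byte)         # last byte, high bit clear
                return bytes(out)


    def decode_uleb128(data, offset=0):
        """Decode one ULEB128 value; return (value, offset after it)."""
        result = 0
        shift = 0
        while True:
            byte = data[offset]
            offset += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, offset
            shift += 7


    encoded = encode_uleb128(624485)
    assert encoded == bytes([0xE5, 0x8E, 0x26])
    assert decode_uleb128(encoded) == (624485, 3)

Since the PC entry itself is excluded from compression, a decoder for a
section emitted with ``C`` would read the fixed-size PC entry first and then
the ULEB128-encoded constants.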

Guarantees on Code Generation
=============================

Attaching ``!pcsections`` metadata to LLVM IR instructions *shall not* affect
optimizations or code generation outside the requested PC sections.

While relying on LLVM IR metadata to request PC sections makes the above
guarantee relatively trivial, propagation of metadata through the optimization
and code generation pipeline has the following guarantees.

Metadata Propagation
--------------------

In general, LLVM *does not make any guarantees* about preserving IR metadata
(attached to an ``Instruction``) through IR transformations. When using PC
sections metadata, this is unchanged, and ``!pcsections`` metadata remains
*optional* until lowering to machine IR (MIR).

Note for Code Generation
------------------------

As with other LLVM IR metadata, there are no requirements for LLVM IR
transformation passes to preserve ``!pcsections`` metadata, with the following
exceptions:

* The ``AtomicExpandPass`` shall preserve ``!pcsections`` metadata
  according to rules 1-4 below.

When translating LLVM IR to MIR, the ``!pcsections`` metadata shall be copied
from the source ``Instruction`` to the target ``MachineInstr`` (set with
``MachineInstr::setPCSections()``). The instruction selectors and MIR
optimization passes shall preserve PC sections metadata as follows:

1. Replacements will preserve PC sections metadata of the replaced
   instruction.

2. Duplications will preserve PC sections metadata of the copied
   instruction.

3. Merging will preserve PC sections metadata of one of the two
   instructions (no guarantee on which instruction's metadata is used).

4. Deletions will lose PC sections metadata.

This is similar to debug info, and the ``BuildMI()`` helper provides a
convenient way to propagate debug info and ``!pcsections`` metadata in the
``MIMetadata`` bundle.

Note for Metadata Users
-----------------------

Use cases for ``!pcsections`` metadata should either be fully tolerant of
missing metadata, or the passes inserting ``!pcsections`` metadata should run
*after* all LLVM IR optimization passes, so that the metadata is preserved
until it is translated to MIR.