Linker Script implementation notes and policy
=============================================

LLD implements a large subset of the GNU ld linker script notation. The LLD
implementation policy is to implement linker script features as they are
documented in the ld `manual <https://sourceware.org/binutils/docs/ld/Scripts.html>`_.
We consider it a bug if the LLD implementation does not agree with the manual
and the difference is not mentioned in the exceptions below.

The ld manual is not a complete specification, and is not sufficient to build
an implementation. In particular some features are only defined by the
implementation and have changed over time.

The LLD implementation policy for properties of linker scripts that are not
defined by the documentation is to follow the GNU ld implementation wherever
possible. We reserve the right to make different implementation choices where
it is appropriate for LLD. Intentional deviations will be documented in this
file.

Symbol assignment
~~~~~~~~~~~~~~~~~

A symbol assignment looks like:

::

  symbol = expression;
  symbol += expression;

The first form defines ``symbol``. If ``symbol`` is already defined, it will be
overridden. The other form requires ``symbol`` to be already defined.

For a simple assignment like ``alias = aliasee;``, the ``st_type`` field is
copied from the original symbol. Any arithmetic operation (e.g. ``+ 0``) will
reset ``st_type`` to ``STT_NOTYPE``.

The ``st_size`` field is set to 0.
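
As a small sketch of these rules, assume a hypothetical function symbol
``func`` defined in an input object. The alias names are also made up for
illustration:

::

  /* func is assumed to be an STT_FUNC symbol from an input file. */
  alias_func = func;       /* simple alias: st_type is copied from func */
  alias_addr = func + 0;   /* arithmetic: st_type is reset to STT_NOTYPE */
  alias_addr += 4;         /* valid only because alias_addr is already defined */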

SECTIONS command
~~~~~~~~~~~~~~~~

A ``SECTIONS`` command looks like:

::

  SECTIONS {
    section-command
    section-command
    ...
  } [INSERT [AFTER|BEFORE] anchor_section;]

Each section-command can be a symbol assignment, an output section description,
or an overlay description.

When the ``INSERT`` keyword is present, the ``SECTIONS`` command describes some
output sections which should be inserted after or before the specified anchor
section. The insertion occurs after input sections have been mapped to output
sections but before orphan sections have been processed.
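
For illustration, a fragment that inserts a hypothetical ``.mydata`` output
section after ``.data`` might look like this (the section name is an
assumption, not something LLD defines):

::

  SECTIONS {
    .mydata : { *(.mydata) }
  } INSERT AFTER .data;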

When no linker script has been provided, or every ``SECTIONS`` command is
followed by ``INSERT``, LLD applies built-in rules which are similar to GNU
ld's internal linker scripts:

- Align the first section in a ``PT_LOAD`` segment according to
  ``-z noseparate-code``, ``-z separate-code``, or
  ``-z separate-loadable-segments``.
- Define ``__bss_start``, ``end``, ``_end``, ``etext``, ``_etext``, ``edata``,
  ``_edata``.
- Sort ``.ctors.*``/``.dtors.*``/``.init_array.*``/``.fini_array.*`` and
  PowerPC64 specific ``.toc``.
- Place input ``.text.*`` into output ``.text``, and handle certain variants
  (``.text.hot.``, ``.text.unknown.``, ``.text.unlikely.``, etc.) in the
  presence of ``-z keep-text-section-prefix``.

Output section description
~~~~~~~~~~~~~~~~~~~~~~~~~~

The description of an output section looks like:

::

  section [address] [(type)] : [AT(lma)] [ALIGN(section_align)] [SUBALIGN(subsection_align)] {
    output-section-command
    ...
  } [>region] [AT>lma_region] [:phdr ...] [=fillexp] [,]
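
A concrete instance of this grammar might look like the sketch below. The
addresses, alignments, and fill value are arbitrary values chosen for
illustration:

::

  SECTIONS {
    .text 0x10000 : { *(.text .text.*) }
    /* address = 0x20000, lma = 0x18000, section_align = 16,
       subsection_align = 8, fillexp = 0xff */
    .data 0x20000 : AT(0x18000) ALIGN(16) SUBALIGN(8) {
      *(.data .data.*)
    } =0xff
  }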

Output section address
----------------------

When an *OutputSection* *S* has ``address``, LLD will set ``sh_addr`` to
``address``.

The ELF specification says:

  The value of sh_addr must be congruent to 0, modulo the value of sh_addralign.

The presence of ``address`` can cause this condition to be violated, in which
case LLD will warn. GNU ld from Binutils 2.35 onwards will reduce
``sh_addralign`` so that ``sh_addr`` is congruent to 0 modulo ``sh_addralign``.
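
For example, assuming the matched input sections require 16-byte alignment (a
hypothetical value), the following address is not a multiple of the alignment:

::

  SECTIONS {
    /* 0x10004 % 16 != 0: LLD keeps sh_addr = 0x10004 and warns, whereas
       GNU ld 2.35 and later lowers sh_addralign instead. */
    .text 0x10004 : { *(.text .text.*) }
  }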

When an output section has no input section, GNU ld will eliminate it if it
only contains symbol assignments (e.g. ``.foo : { symbol = 42; }``). LLD will
retain such sections unless all the symbol assignments are unreferenced
``PROVIDE`` assignments.
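
A sketch of the difference, using hypothetical section and symbol names and
assuming no input section matches either description:

::

  SECTIONS {
    /* Plain assignment: GNU ld eliminates .foo, LLD retains it. */
    .foo : { foo_marker = 42; }
    /* Only a PROVIDE: LLD also drops .bar if nothing references bar_marker. */
    .bar : { PROVIDE(bar_marker = 42); }
  }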

When an output section has no input section but advances the location counter,
GNU ld sets the ``SHF_WRITE`` flag. LLD sets the ``SHF_WRITE`` flag only if the
preceding output section with non-empty input sections also has the
``SHF_WRITE`` flag.
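
A typical case is a section that only reserves space, sketched here with a
hypothetical ``.stack`` section and an arbitrary size:

::

  SECTIONS {
    .data : { *(.data .data.*) }
    /* No input sections, but the location counter advances by 0x1000.
       Under the rule above, LLD marks .stack SHF_WRITE only if .data
       received input sections and is itself SHF_WRITE. */
    .stack : { . += 0x1000; }
  }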

Output section type
-------------------

When an *OutputSection* *S* has ``(type)``, LLD will set ``sh_type`` or
``sh_flags`` of *S*. ``type`` is one of:

- ``NOLOAD``: set ``sh_type`` to ``SHT_NOBITS``.
- ``COPY``, ``INFO``, ``OVERLAY``: clear the ``SHF_ALLOC`` bit in ``sh_flags``.
- ``TYPE=<value>``: set ``sh_type`` to the specified value. ``<value>`` must be
  an integer or one of ``SHT_PROGBITS``, ``SHT_NOTE``, ``SHT_NOBITS``,
  ``SHT_INIT_ARRAY``, ``SHT_FINI_ARRAY``, ``SHT_PREINIT_ARRAY``.

When ``sh_type`` is specified, it is an error if an input section in *S* has a
different type.
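
The three forms can be sketched as follows. The ``.noinit`` and
``.build_info`` names are hypothetical, and the last description assumes the
matched input sections already have type ``SHT_INIT_ARRAY``:

::

  SECTIONS {
    /* NOLOAD: sh_type becomes SHT_NOBITS. */
    .noinit (NOLOAD) : { . += 0x100; }
    /* INFO: the SHF_ALLOC bit is cleared. */
    .build_info (INFO) : { *(.build_info) }
    /* TYPE=<value>: sh_type is set explicitly. */
    .init_array (TYPE=SHT_INIT_ARRAY) : { *(.init_array) }
  }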

Output section alignment
------------------------

``sh_addralign`` of an *OutputSection* *S* is the maximum of
``ALIGN(section_align)`` and the maximum alignment of the input sections in
*S*.

When an *OutputSection* *S* has both ``address`` and ``ALIGN(section_align)``,
GNU ld will set ``sh_addralign`` to ``ALIGN(section_align)``.
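
For example, assuming the largest alignment among the matched input sections
is 16 in both cases (a hypothetical value), the first rule gives:

::

  SECTIONS {
    /* sh_addralign = max(64, 16) = 64 */
    .data : ALIGN(64) { *(.data .data.*) }
    /* sh_addralign = max(4, 16) = 16: ALIGN does not lower the alignment. */
    .rodata : ALIGN(4) { *(.rodata .rodata.*) }
  }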

Output section LMA
------------------

A load address (LMA) can be specified by ``AT(lma)`` or ``AT>lma_region``.

- ``AT(lma)`` specifies the exact load address. If the linker script does not
  have a PHDRS command, then a new loadable segment will be generated.
- ``AT>lma_region`` specifies the LMA region. The lack of ``AT>lma_region``
  means the default region is used. Note, GNU ld propagates the previous LMA
  memory region when ``address`` is not specified. The LMA is set to the
  current location of the memory region aligned to the section alignment.
  If the linker script does not have a PHDRS command, a new loadable segment
  will be generated when ``lma_region`` differs from the ``lma_region`` of
  the previous OutputSection.

The two keywords cannot be specified at the same time.

If neither ``AT(lma)`` nor ``AT>lma_region`` is specified:

- If the previous section is also in the default LMA region, and the two
  sections have the same memory regions, the difference between the LMA and the
  VMA is computed to be the same as the previous difference.
- Otherwise, the LMA is set to the VMA.
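
The rules above can be sketched with hypothetical ``ROM`` and ``RAM`` memory
regions. The origins and lengths are arbitrary, and the comments describe what
the rules as stated would imply rather than verified linker output:

::

  MEMORY {
    ROM (rx) : ORIGIN = 0x08000000, LENGTH = 256K
    RAM (rw) : ORIGIN = 0x20000000, LENGTH = 64K
  }

  SECTIONS {
    /* No AT and no previous section: LMA = VMA. */
    .text : { *(.text .text.*) } >ROM
    /* AT>ROM: LMA = current location of ROM, aligned to the section
       alignment. */
    .data : { *(.data .data.*) } >RAM AT>ROM
    /* No AT, and the previous section is not in the default LMA region:
       LMA = VMA. */
    .bss : { *(.bss .bss.*) } >RAM
  }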

Overwrite sections
~~~~~~~~~~~~~~~~~~

An ``OVERWRITE_SECTIONS`` command looks like:

::

  OVERWRITE_SECTIONS {
    output-section-description
    output-section-description
    ...
  }

Unlike a ``SECTIONS`` command, ``OVERWRITE_SECTIONS`` does not specify a
section order or suppress the built-in rules.

If an output section described in ``OVERWRITE_SECTIONS`` also appears in a
``SECTIONS`` command, the ``OVERWRITE_SECTIONS`` description wins; otherwise,
the output section will be added somewhere following the usual orphan section
placement rules.

If an output section described in ``OVERWRITE_SECTIONS`` also appears in an
``INSERT [AFTER|BEFORE]`` command, the description is taken from the
``OVERWRITE_SECTIONS`` command while the insert command still applies
(possibly after orphan section placement). It is recommended to leave the
braces empty (i.e. ``section : {}``) for the insert command, because its
description will be ignored anyway.
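
A sketch of both uses together, with a hypothetical ``.foo.data`` section and
hypothetical symbol names:

::

  OVERWRITE_SECTIONS {
    .foo.data : { FOO_DATA_BEGIN = .; *(.foo.data) FOO_DATA_END = .; }
  }

  /* Optional placement. The braces are left empty because the description
     in OVERWRITE_SECTIONS is the one that is used. */
  SECTIONS { .foo.data : {} } INSERT BEFORE .data;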

Built-in functions
~~~~~~~~~~~~~~~~~~

``DATA_SEGMENT_RELRO_END(offset, exp)`` defines the end of the ``PT_GNU_RELRO``
segment when ``-z relro`` (default) is in effect. Sections between
``DATA_SEGMENT_ALIGN`` and ``DATA_SEGMENT_RELRO_END`` are considered RELRO.

The typical use case is ``. = DATA_SEGMENT_RELRO_END(0, .);`` followed by
writable but non-RELRO sections. LLD ignores ``offset`` and ``exp`` and aligns
the current location to a max-page-size boundary, ensuring that the next
``PT_LOAD`` segment will not overlap with the ``PT_GNU_RELRO`` segment.

LLD will insert ``.relro_padding`` immediately before the symbol assignment
using ``DATA_SEGMENT_RELRO_END``.
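
A minimal sketch of the typical layout follows. The choice of sections is an
assumption made for illustration, and a real script would also describe the
text and read-only data sections:

::

  SECTIONS {
    . = DATA_SEGMENT_ALIGN(CONSTANT(MAXPAGESIZE), CONSTANT(COMMONPAGESIZE));
    /* Sections from here up to DATA_SEGMENT_RELRO_END are RELRO. */
    .data.rel.ro : { *(.data.rel.ro .data.rel.ro.*) }
    .got : { *(.got) }
    /* LLD ignores both arguments and aligns to a max-page-size boundary. */
    . = DATA_SEGMENT_RELRO_END(0, .);
    /* Writable but non-RELRO sections. */
    .data : { *(.data .data.*) }
    .bss : { *(.bss .bss.*) }
    . = DATA_SEGMENT_END(.);
  }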

Section Classes
~~~~~~~~~~~~~~~

The ``CLASS`` keyword inside a ``SECTIONS`` command defines classes of input
sections:

::

  SECTIONS {
    CLASS(class_name) {
      input-section-description
      input-section-description
      ...
    }
  }

Input section descriptions refer to a class using ``CLASS(class_name)``
instead of the usual filename and section name patterns. For example:

::

  SECTIONS {
    CLASS(c) { *(.rodata.earlier) }
    .rodata { *(.rodata) CLASS(c) *(.rodata.later) }
  }

Input sections that are assigned to a class are not matched by later patterns,
just as if they had been assigned to an earlier output section. If a class is
referenced in multiple output sections, when a memory region would overflow,
the linker spills input sections from a reference to later references rather
than failing the link.

Classes cannot reference other classes; an input section is assigned to at most
one class.

Sections cannot be specified to possibly spill into or out of
``INSERT [AFTER|BEFORE]``, ``OVERWRITE_SECTIONS``, or ``/DISCARD/``.
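
Spilling can be sketched with a class that is referenced from two output
sections in different memory regions. The ``FAST`` and ``SLOW`` regions, their
sizes, and the class name ``hot`` are assumptions made for illustration:

::

  MEMORY {
    FAST (rx) : ORIGIN = 0x10000000, LENGTH = 16K
    SLOW (rx) : ORIGIN = 0x08000000, LENGTH = 256K
  }

  SECTIONS {
    CLASS(hot) { *(.text.hot .text.hot.*) }
    /* First reference: the preferred placement. */
    .text_fast : { CLASS(hot) } >FAST
    /* Later reference: receives sections that would overflow FAST. */
    .text_slow : { CLASS(hot) *(.text .text.*) } >SLOW
  }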

Non-contiguous regions
~~~~~~~~~~~~~~~~~~~~~~

The flag ``--enable-non-contiguous-regions`` provides a version of the above
spilling functionality that is more compatible with GNU ld. It allows input
sections to spill to later pattern matches. (This globally changes the behavior
of patterns.) Unlike GNU ld, ``/DISCARD/`` only matches previously-unmatched
sections (i.e., the flag does not affect it). Also, if a section fails to fit
at any of its matches, the link fails instead of discarding the section.
Accordingly, the GNU flag ``--enable-non-contiguous-regions-warnings`` is not
implemented, as it exists to warn about such occurrences.
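
With the flag, the same pattern can be repeated in several output sections.
The sketch below again assumes hypothetical ``FAST`` and ``SLOW`` regions and
a script passed to the linker with ``-T``:

::

  /* Link with --enable-non-contiguous-regions. */
  MEMORY {
    FAST (rx) : ORIGIN = 0x10000000, LENGTH = 16K
    SLOW (rx) : ORIGIN = 0x08000000, LENGTH = 256K
  }

  SECTIONS {
    .text_fast : { *(.text .text.*) } >FAST
    /* Sections that do not fit in FAST spill to this later match. If they
       do not fit here either, the link fails. */
    .text_slow : { *(.text .text.*) } >SLOW
  }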