commit: a6b204b82745764e1460aa1dc26e69ff73195c60
[log]
author: Simon Tatham <simon.tatham@arm.com>
Fri Jul 19 09:01:25 2024 +0100
committer: GitHub <noreply@github.com>
Fri Jul 19 09:01:25 2024 +0100
tree: 26cce2997613f23fd6c51fd818b1001be1b39693
parent: ae2e66b03b7a793642d6291ccd8de2252637416a [diff]

[lld][AArch64] Fix handling of SHT_REL relocation addends. (#98291)

Normally, AArch64 ELF objects use the SHT_RELA type of relocation
section, with addends stored in each relocation. But some legacy AArch64
object producers still use SHT_REL in some situations, storing the
addend in the initial value of the data item or instruction immediate
field that the relocation will modify. LLD was mishandling relocations
of this type in multiple ways.

Firstly, many of the cases in the `getImplicitAddend` switch statement
were apparently based on a misunderstanding. The relocation types that
operate on instructions should be expecting to find an instruction of
the appropriate type, and should extract its immediate field. But many
of them were instead behaving as if they expected to find a raw 64-, 32-
or 16-bit value, and wanted to extract the right range of bits. For
example, the relocation for R_AARCH64_ADD_ABS_LO12_NC read a 16-bit word
and extracted its bottom 12 bits, presumably on the thinking that the
relocation writes the low 12 bits of the value it computes. But the
input addend for SHT_REL purposes occupies the immediate field of an
AArch64 ADD instruction, which meant it should have been reading a
32-bit AArch64 instruction encoding, and extracting bits 10-21 where the
immediate field lives. Worse, the R_AARCH64_MOVW_UABS_G2 relocation was
reading 64 bits from the input section, and since it's only relocating a
32-bit instruction, the second half of those bits would have been
completely unrelated!

Adding to that confusion, most of the values being read were first
sign-extended, and _then_ had a range of bits extracted, which doesn't
make much sense. They should have first extracted some bits from the
instruction encoding, and then sign-extended that 12-, 19-, or 21-bit
result (or whatever else) to a full 64-bit value.

Secondly, after the relocated value was computed, in most cases it was
being written into the target instruction field via a bitwise OR
operation. This meant that if the instruction field didn't initially
contain all zeroes, the wrong result would end up in it. That's not even
a 100% reliable strategy for SHT_RELA, which in some situations is used
for its repeatability (in the sense that applying the relocation twice
should cause the second answer to overwrite the first, so you can
relocate an image in advance to its most likely address, and then do it
again at load time if that turns out not to be available). But for
SHT_REL, when you expect nonzero immediate fields in normal use, it
couldn't possibly work. You could see the effect of this in the existing
test, which had a lot of FFFFFF in the expected output which there
wasn't any plausible justification for.

Finally, one relocation type was actually missing: there was no support
for R_AARCH64_ADR_PREL_LO21 at all.

So I've rewritten most of the cases in `getImplicitAddend`; replaced the
bitwise ORs with overwrites; and replaced the previous test with a much
more thorough one, obtained by writing an input assembly file with
explicitly specified relocations on instructions that also have
carefully selected immediate fields, and then doing some yaml2obj
seddery to turn the RELA relocation section into a REL one.

2 files changed

tree: 26cce2997613f23fd6c51fd818b1001be1b39693

README.md

The LLVM Compiler Infrastructure

Welcome to the LLVM project!

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.

C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

Consult the Getting Started with LLVM page for information on building and running LLVM.

For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting in touch

Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.

The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.