[lldb][MachO] Local structs for larger VA offsets (#159849)

The Mach-O file format has several load commands which specify the
location of data in the file in UInt32 offsets. lldb uses these same
structures to track the offsets of the binary in virtual address space
when it is running. Normally a binary is loaded in memory contiguously,
so this is fine, but on Darwin systems there is a "system shared cache"
where all system libraries are combined into one region of memory and
pre-linked. The shared cache has the TEXT segments for every binary
loaded contiguously, then the DATA segments, and finally a shared common
LINKEDIT segment for all binaries. The virtual address offset from the
TEXT segment for a libray to the LINKEDIT may exceed 4GB of virtual
address space depending on the structure of the shared cache, so this
use of a UInt32 offset will not work.

There was an initial instance of this issue that I fixed last November
in https://github.com/llvm/llvm-project/pull/117832 where I fixed this
issue for the LC_SYMTAB / `symtab_command` structure. But we have the
same issue now with three additional structures;
`linkedit_data_command`, `dyld_info_command`, and `dysymtab_command`.
For all of these we can see the pattern of `dyld_info.export_off +=
linkedit_slide` applied to the offset fields in ObjectFileMachO.

This defines local structures that mirror the Mach-O structures, except
that it uses UInt64 offset fields so we can reuse the same field for a
large virtual address offset at runtime. I defined ctor's from the
genuine structures, as well as operator= methods so the structures can
be read from the Mach-O binary into the standard object, then copied
into our local expanded versions of them. These structures are ABI in
Mach-O and cannot change their layout.

The alternative is to create local variables alongside these Mach-O load
command objects for the offsets that we care about, adjust those by the
correct VA offsets, and only use those local variables instead of the
fields in the objects. I took the approach of the local enhanced
structure in November and I think it is the cleaner approach.

rdar://160384968
2 files changed
tree: 87d47436ce27a7523af0d5dd5aa1cbe36cda31f2
  1. .ci/
  2. .github/
  3. bolt/
  4. clang/
  5. clang-tools-extra/
  6. cmake/
  7. compiler-rt/
  8. cross-project-tests/
  9. flang/
  10. flang-rt/
  11. libc/
  12. libclc/
  13. libcxx/
  14. libcxxabi/
  15. libsycl/
  16. libunwind/
  17. lld/
  18. lldb/
  19. llvm/
  20. llvm-libgcc/
  21. mlir/
  22. offload/
  23. openmp/
  24. orc-rt/
  25. polly/
  26. runtimes/
  27. third-party/
  28. utils/
  29. .clang-format
  30. .clang-format-ignore
  31. .clang-tidy
  32. .git-blame-ignore-revs
  33. .gitattributes
  34. .gitignore
  35. .mailmap
  36. CODE_OF_CONDUCT.md
  37. CONTRIBUTING.md
  38. LICENSE.TXT
  39. pyproject.toml
  40. README.md
  41. SECURITY.md
README.md

The LLVM Compiler Infrastructure

OpenSSF Scorecard OpenSSF Best Practices libc++

Welcome to the LLVM project!

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.

C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

Consult the Getting Started with LLVM page for information on building and running LLVM.

For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting in touch

Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.

The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.