[lldb][Darwin] Fetch detailed binary info in chunks (#190720)

When binaries have been loaded into a process on Darwin, lldb sends a
jGetLoadedDynamicLibrariesInfos packet to get the filepath, uuid, load
address, and detailed information from the mach header/load commands.
For a large UI app, the number of binaries that can be loaded (through
various dependencies) can exceed a thousand these days, and requesting
detailed information on all of those can result in debugserver
allocating too much memory when running in constrained environments, and
being killed.

In 2023 I laid the groundwork to fetch detailed information in chunks,
instead of one large request. The main challenge with this is when we
first attach to a process that is running, we send a "tell me about all
binaries loaded", and that prevents lldb from chunking the reply; the
packet design for jGetLoadedDynamicLibrariesInfos assumes the entire
reply is sent in one packet, instead of the typical gdb remote serial
protocol trick of a response with partial data starting with 'm' and a
response with a complete reply starting with 'l'. The 2023 change is to
add a new key to this packet, `report_load_commands` and when that is
set to `false`, only the load address of the binaries is reported.

lldb then uses the array of load addresses of all the binaries to fetch
detailed information about them in smaller groupings.

This PR implements the lldb side of that work.

Process::GetLoadedDynamicLibrariesInfos now takes a `bool
include_mh_and_load_commands`, ProcessGDBRemote sends that as an
argument in the jGetLoadedDynamicLibrariesInfos packet.

DynamicLoaderMacOS::DoInitialImageFetch is changed to only get the load
addresses on initial attach. If the reply includes the full binary
information (not just load addresses) -- when talking to an old
debugserver -- we will use that information instead of re-fetching it.
On a newer debugserver that only sent the load addresses, we'll send
this list of addresses to the standard method we use when dyld has told
us to load binaries at addresses already.

DynamicLoaderMacOS::AddBinaries, which takes a list of addresses and
fetches detailed information about them, is updated to request only 600
binaries at a time. A typical UI app will be in the 700-1000 binary
range these days, so this will turn one large fetch into two, in most
cases. There are some system UI processes that have many dependencies
that could require three fetches. I picked this number so most debug
sessions will be handled by two requests.

In debugserver MachProcess::FormatDynamicLibrariesIntoJSON, I removed
the obsolete-for-three-years-now `mod_date` field. I was sending back
the binary filepaths for this "don't send the detailed information"
version of the packet - I don't need that, and it just increases the
size, so I stopped sending filepaths in this mode.

I also added a new field for when we ARE sending detailed information,
`sizeof_mh_and_loadcmds`. I don't use this in lldb yet, but when we are
told about a binary and need to read it from memory today, we have an
initial read to get the mach header, which tells us the size of the load
commands. Then we have a second read of the mach header plus load
commands, before we can start binary processing in earnest. This is an
extra read packet and very unnecessary, given that debugserver knows how
large the mach header + load commands are. So I'm returning it here, and
at some point I'll find a way to pipe that into a new memory object file
creation method in lldb. It's one of those "I should really find a way
to remove that extra read some day" cleanups, and while I was in this
area, I'd add this first piece of that.

I don't have a test for this. I've been thinking about an API test that
creates 700 dylibs with empty functions in each, runs it, and confirms
all of the dylibs were loaded. I'd have to grab a packet log to be
completely sure we didn't read the full binary list in one go. But I
worry that compiling and linking even 700 do-nothing dylibs might be too
much. Maybe I should add a setting in DynamicLoaderMacOS::AddBinaries to
reduce the maximum number of binaries that can be read at once, and have
a small nubmer of dylibs. When by-hand testing this, I had a maximum of
5 binaries being queried in one packet.

rdar://109428337
13 files changed
tree: 1a3d74128da6b2ada91720f82befb5be1dd9d8cc
  1. .ci/
  2. .github/
  3. bolt/
  4. clang/
  5. clang-tools-extra/
  6. cmake/
  7. compiler-rt/
  8. cross-project-tests/
  9. flang/
  10. flang-rt/
  11. libc/
  12. libclc/
  13. libcxx/
  14. libcxxabi/
  15. libsycl/
  16. libunwind/
  17. lld/
  18. lldb/
  19. llvm/
  20. llvm-libgcc/
  21. mlir/
  22. offload/
  23. openmp/
  24. orc-rt/
  25. polly/
  26. runtimes/
  27. third-party/
  28. utils/
  29. .clang-format
  30. .clang-format-ignore
  31. .clang-tidy
  32. .git-blame-ignore-revs
  33. .gitattributes
  34. .gitignore
  35. .mailmap
  36. CODE_OF_CONDUCT.md
  37. CONTRIBUTING.md
  38. LICENSE.TXT
  39. pyproject.toml
  40. README.md
  41. SECURITY.md
README.md

The LLVM Compiler Infrastructure

OpenSSF Scorecard OpenSSF Best Practices libc++

Welcome to the LLVM project!

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.

C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

Consult the Getting Started with LLVM page for information on building and running LLVM.

For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting in touch

Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.

The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.