[StaticDataLayout][PGO] Add profile format for static data layout, and the classes to operate on the profiles. (#138170)

Context: For https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744#p-336543-background-3, we propose to profile memory loads and stores via hardware events, symbolize the addresses of binary static data sections, and feed the profile back into the compiler for data partitioning.

This change adds the profile format for static data layout, and the classes to operate on it.

The profile and its format

1. Conceptually, a piece of data (call it a symbol) is represented by its symbol name or its content hash. The former applies to the majority of data, whose mangled names remain relatively stable over binary releases, and the latter applies to string literals (with name patterns like `.str.<N>[.llvm.<hash>]`).
   - The symbols with samples are hot data. The number of hot symbols is small relative to all symbols. The profile tracks their sampled counts and locations. Sampled counts come from hardware events, and locations come from debug information in the profiled binary. The symbols without samples are cold data. The number of such cold symbols is large. The profile tracks their representation (the name or content hash).
   - Based on a preliminary study, debug information coverage for data symbols is partial and best-effort. In the LLVM IR, global variables with source code correspondence may or may not have debug information. Therefore the location information is optional in the profiles.
2. The profile-and-compile cycle is similar to SamplePGO. Profiles are sampled from production binaries and used in the next binary releases. Known cold symbols and new hot symbols can both have zero sampled counts, so the profile records known cold symbols to tell the two apart in the next compile.

In the profile's serialization format, strings are concatenated together and compressed. Individual records store the index.
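The data model and serialization described above can be sketched in C++ roughly as follows. This is a minimal, hypothetical illustration under stated assumptions, not the classes this change adds: `SymbolKey`, `HotRecord`, `StringTable`, `encode`, and `classify` are invented names, and compression of the concatenated strings is omitted. The sketch shows a symbol keyed by either its name or a content hash, optional debug-info locations, records that store string-table indices rather than strings, and why known cold symbols have to be recorded explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

// HYPOTHETICAL sketch; these names are not LLVM's API. A symbol is keyed by
// its mangled name, or (for string literals) by a hash of its content.
using SymbolKey = std::variant<std::string, uint64_t>;

// Location from debug info; coverage is partial, so a record may have none.
struct DataLocation {
  std::string FileName;
  uint32_t Line = 0;
};

// A hot (sampled) symbol: its key, sampled access count, and locations.
struct HotRecord {
  SymbolKey Key;
  uint64_t AccessCount = 0;
  std::vector<DataLocation> Locations;
};

// Deduplicating string table; records refer to strings by index.
class StringTable {
public:
  uint32_t intern(const std::string &S) {
    auto [It, Inserted] =
        Index.try_emplace(S, static_cast<uint32_t>(Strings.size()));
    if (Inserted)
      Strings.push_back(S);
    return It->second;
  }
  const std::string &get(uint32_t I) const {
    assert(I < Strings.size() && "string index out of range");
    return Strings[I];
  }
  std::size_t size() const { return Strings.size(); }

private:
  std::vector<std::string> Strings;
  std::unordered_map<std::string, uint32_t> Index;
};

// A key as stored: a string-table index, or the content hash itself.
struct EncodedKey {
  bool IsHash = false;
  uint64_t Value = 0;
};

struct EncodedHotRecord {
  EncodedKey Key;
  uint64_t AccessCount = 0;
  std::vector<std::pair<uint32_t, uint32_t>> Locations; // (file index, line)
};

struct EncodedProfile {
  StringTable Strings;
  std::vector<EncodedHotRecord> Hot;
  std::vector<EncodedKey> Cold; // known cold symbols: representation only
};

EncodedKey encodeKey(const SymbolKey &K, StringTable &ST) {
  if (const auto *Name = std::get_if<std::string>(&K))
    return {false, ST.intern(*Name)};
  return {true, std::get<uint64_t>(K)};
}

EncodedProfile encode(const std::vector<HotRecord> &Hot,
                      const std::vector<SymbolKey> &Cold) {
  EncodedProfile Out;
  for (const HotRecord &R : Hot) {
    EncodedHotRecord E;
    E.Key = encodeKey(R.Key, Out.Strings);
    E.AccessCount = R.AccessCount;
    for (const DataLocation &L : R.Locations)
      E.Locations.emplace_back(Out.Strings.intern(L.FileName), L.Line);
    Out.Hot.push_back(std::move(E));
  }
  for (const SymbolKey &K : Cold)
    Out.Cold.push_back(encodeKey(K, Out.Strings));
  return Out;
}

enum class Hotness { Hot, KnownCold, Unknown };

// Zero samples alone is ambiguous: a symbol absent from the profile may be
// new in this release, so only explicitly recorded symbols count as cold.
Hotness classify(const SymbolKey &K, const std::vector<HotRecord> &Hot,
                 const std::vector<SymbolKey> &Cold) {
  for (const HotRecord &R : Hot)
    if (R.Key == K)
      return Hotness::Hot;
  for (const SymbolKey &C : Cold)
    if (C == K)
      return Hotness::KnownCold;
  return Hotness::Unknown;
}
```

Interning names and file paths into one table means a source file shared by many hot symbols is stored once, and the concatenated table is then a natural unit to compress, in the spirit of the format described above.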
A separate PR will connect this class to InstrProfReader/Writer via MemProfReader/Writer.

---------

Co-authored-by: Kazu Hirata <kazu@google.com>

tree: c1bc613b057f352b9026a13e65367544ef356978

README.md

The LLVM Compiler Infrastructure

Welcome to the LLVM project!

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.

C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

Consult the Getting Started with LLVM page for information on building and running LLVM.

For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting in touch

Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.

The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.