# Instrumentation Profile Format

```{contents}
:local:
```

## Overview

Clang supports two types of profiling via instrumentation [^1]: frontend-based
and IR-based, and both can support a variety of use cases [^2].
This document describes two binary serialization formats (raw and indexed) for
storing instrumented profiles, with an emphasis on the IRPGO use case: when a
header field or payload section is interpreted differently across use cases,
this document describes the IRPGO interpretation.

```{note}
Frontend-generated profiles are used together with coverage mapping for
[source-based code coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html).
The [coverage mapping format](https://llvm.org/docs/CoverageMappingFormat.html)
is different from the profile format.
```

## Raw Profile Format

The raw profile is generated by running the instrumented binary. The raw profile
data from an executable or a shared library [^3] consists of a header and
multiple sections, with each section as a memory dump. The raw profile data needs
to be reasonably compact and fast to generate.

There are no backward or forward version compatibility guarantees for the raw profile
format. That is, compilers and tools [require](https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558) a specific raw profile version
to parse the profiles.

To feed profiles back into compilers for an optimized build (e.g., via
`-fprofile-use` for IR instrumentation), a raw profile must be converted into
the indexed format.

### General Storage Layout

The storage layout of the raw profile data format is illustrated below. When the
raw profile is read into a memory buffer, the byte offset of each section is
inferred from the section's position in the layout and the sizes of all the
sections preceding it.

```
+----+-----------------------+
|    |         Magic         |
|    +-----------------------+
|    |        Version        |
|    +-----------------------+
| H  |     Size Info for     |
| E  |       Section 1       |
| A  +-----------------------+
| D  |     Size Info for     |
| E  |       Section 2       |
| R  +-----------------------+
|    |          ...          |
|    +-----------------------+
|    |     Size Info for     |
|    |       Section N       |
+----+-----------------------+
| P  |       Section 1       |
| A  +-----------------------+
| Y  |       Section 2       |
| L  +-----------------------+
| O  |          ...          |
| A  +-----------------------+
| D  |       Section N       |
+----+-----------------------+
```

```{note}
Sections might be padded to meet specific alignment requirements. For
simplicity, header fields and data sections solely for padding purposes are
omitted in the data layout graph above and the rest of this document.
```
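
The offset inference described above amounts to a running sum over the section sizes. The following is a minimal, hypothetical illustration, not the reader's actual code. `computeSectionOffsets` is a made-up name. Padding is approximated by rounding every section up to a single `Align` boundary, whereas the real format pads sections individually.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the header size and the byte size of each payload
// section (in layout order), return each section's offset from the buffer start.
// Padding is approximated by rounding every section up to `Align` bytes.
std::vector<uint64_t> computeSectionOffsets(uint64_t HeaderSize,
                                            const std::vector<uint64_t> &SectionSizes,
                                            uint64_t Align = 8) {
  assert(Align != 0 && "alignment must be non-zero");
  std::vector<uint64_t> Offsets;
  uint64_t Cur = HeaderSize;
  for (uint64_t Size : SectionSizes) {
    Offsets.push_back(Cur);
    // The next section begins after this one, rounded up to the alignment.
    Cur += (Size + Align - 1) / Align * Align;
  }
  return Offsets;
}
```

For example, with a 16-byte header and sections of 10, 8, and 3 bytes, the sketch yields offsets 16, 32, and 40.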

### Header

**`Magic`**
The magic number encodes the profile format (raw, indexed, or text). For the raw
format, the magic number also encodes the endianness (big or little) and C pointer
size (4 or 8 bytes) of the platform on which the profile is generated.

A factory method reads the magic number to construct the appropriate reader and
returns an error for an unrecognized format. In particular, the factory method
and the raw profile reader implementation ensure that a raw profile file can be
read back on a platform with the opposite endianness and/or a different C
pointer size.
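
A reader can detect a profile written with the opposite endianness by comparing the first 8 bytes against each known magic value and against its byte-swapped form. The sketch below illustrates this check. The constants are modeled on `INSTR_PROF_RAW_MAGIC_64` and `INSTR_PROF_RAW_MAGIC_32` in `InstrProfData.inc` as we recall them, so verify them against the source. `RawFormat` and `classifyRawMagic` are hypothetical names.

```c++
#include <cstdint>
#include <optional>

// Assumed values, modeled on INSTR_PROF_RAW_MAGIC_64 / INSTR_PROF_RAW_MAGIC_32;
// verify against compiler-rt/include/profile/InstrProfData.inc.
constexpr uint64_t kRawMagic64 =
    (uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 |
    (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 |
    (uint64_t)'r' << 8 | (uint64_t)129;
constexpr uint64_t kRawMagic32 =
    (uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 |
    (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 |
    (uint64_t)'R' << 8 | (uint64_t)129;

// Reverse the byte order of a 64-bit value.
constexpr uint64_t byteSwap64(uint64_t V) {
  uint64_t R = 0;
  for (int I = 0; I < 8; ++I) {
    R = (R << 8) | (V & 0xff);
    V >>= 8;
  }
  return R;
}

struct RawFormat {
  bool Is64Bit;       // producer used 8-byte pointers
  bool NeedsByteSwap; // producer's endianness differs from the reader's
};

// Hypothetical helper: classify the first 8 bytes of a file, loaded as a
// host-endian integer. Returns std::nullopt for an unrecognized format.
std::optional<RawFormat> classifyRawMagic(uint64_t FirstWord) {
  if (FirstWord == kRawMagic64) return RawFormat{true, false};
  if (FirstWord == kRawMagic32) return RawFormat{false, false};
  if (FirstWord == byteSwap64(kRawMagic64)) return RawFormat{true, true};
  if (FirstWord == byteSwap64(kRawMagic32)) return RawFormat{false, true};
  return std::nullopt;
}
```

A byte-swapped match tells the reader to swap every multi-byte field it reads afterwards. The 32- versus 64-bit match determines how wide the pointer-sized fields are.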

**`Version`**
The lower 32 bits specify the actual version, and the most significant 32 bits
specify the variant types of the profile. IR-based instrumentation PGO and
context-sensitive IR-based instrumentation PGO are two examples of variant types.
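
Extracting the two halves is a matter of masking. In this hypothetical sketch, the low 32 bits give the version, as described above. The bit positions used for the IR and context-sensitive IR variants (bits 56 and 57) are our assumption, modeled on the `VARIANT_MASK_*` macros in `InstrProfData.inc`, and should be verified there.

```c++
#include <cstdint>

// Low 32 bits: version number; high 32 bits: variant flags.
constexpr uint64_t kVariantMaskAll = 0xffffffff00000000ULL;
// Assumed bit positions (modeled on VARIANT_MASK_IR_PROF and
// VARIANT_MASK_CSIR_PROF); verify against InstrProfData.inc.
constexpr uint64_t kVariantIRProf = 1ULL << 56;
constexpr uint64_t kVariantCSIRProf = 1ULL << 57;

constexpr uint64_t getVersion(uint64_t V) { return V & ~kVariantMaskAll; }
constexpr uint64_t getVariantFlags(uint64_t V) { return V & kVariantMaskAll; }
constexpr bool isIRProfile(uint64_t V) { return (V & kVariantIRProf) != 0; }
constexpr bool isCSIRProfile(uint64_t V) { return (V & kVariantCSIRProf) != 0; }
```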

**`BinaryIdsSize`**
The byte size of the [binary id] section.

**`NumData`**
The number of profile metadata records. The byte size of the [profile metadata](#profile-metadata)
section can be computed from this field.

**`NumCounter`**
The number of entries in the profile counter section. The byte size of the [counter](#counter)
section can be computed from this field.

**`NumBitmapBytes`**
The number of bytes in the profile [bitmap](#bitmap) section.

**`NamesSize`**
The number of bytes in the name section.

(CountersDelta)=

**`CountersDelta`**
This field records the in-memory address difference between the [profile metadata](#profile-metadata)
section and the counter section in the instrumented binary, i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.

It's used jointly with the [CounterPtr](#CounterPtr) field to compute the counter offset
relative to `start(__llvm_prf_cnts)`. See [calculation-of-counter-offset](#calculation-of-counter-offset)
for an illustrated explanation.

```{note}
The `__llvm_prf_data` object file section might not be loaded into memory
when the instrumented binary runs, or might not be generated in the instrumented
binary in the first place. In those cases, `CountersDelta` is not used and
other mechanisms are used to match counters with instrumented code. See
[lightweight instrumentation] and [binary profile correlation] for examples.
```

**`BitmapDelta`**
This field records the in-memory address difference between the [profile metadata](#profile-metadata)
section and the bitmap section in the instrumented binary, i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`.

It's used jointly with the [BitmapPtr](#BitmapPtr) field to find the bitmap of a profile data
record, in a similar way to how counters are referenced, as explained in
[calculation-of-counter-offset](#calculation-of-counter-offset).

As with the [CountersDelta](#CountersDelta) field, this field may not be used in non-PGO variants
of profiles.

**`NamesDelta`**
Records the in-memory address of the name section. It is used only for error
checking in the raw profile reader.

**`NumVTables`**
Records the number of instrumented vtable entries in the binary. Used for
[type profiling].

**`VNamesSize`**
Records the byte size of the virtual table names section. Used for [type profiling].

**`ValueKindLast`**
Records the number of value kinds. The [VALUE_PROF_KIND](https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186) macro defines the value
kinds, each with a description.
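
As a rough illustration of how the count and size fields above determine section sizes, the sketch below models the header as a plain struct. This is a simplification that rests on several assumptions:

* The field order follows this document's listing.
* Padding-only fields are omitted, so the struct is not byte-compatible with real raw profiles.
* `RawHeaderSketch`, `computeSizes`, and the `DataRecordSize` parameter are hypothetical.
* The default `CounterSize` of 8 assumes 64-bit counters as used by IRPGO.

```c++
#include <cstdint>

// Simplified raw header: fields in the order listed in this document, with the
// padding-only fields omitted. Not byte-compatible with real raw profiles.
struct RawHeaderSketch {
  uint64_t Magic;
  uint64_t Version;
  uint64_t BinaryIdsSize;
  uint64_t NumData;
  uint64_t NumCounter;
  uint64_t NumBitmapBytes;
  uint64_t NamesSize;
  uint64_t CountersDelta;
  uint64_t BitmapDelta;
  uint64_t NamesDelta;
  uint64_t NumVTables;
  uint64_t VNamesSize;
  uint64_t ValueKindLast;
};

struct SectionSizes {
  uint64_t Data, Counters, Bitmap, Names;
};

// DataRecordSize is a parameter because the size of one profile metadata
// record depends on the producer (e.g., its pointer width).
SectionSizes computeSizes(const RawHeaderSketch &H, uint64_t DataRecordSize,
                          uint64_t CounterSize = 8) {
  return {H.NumData * DataRecordSize, H.NumCounter * CounterSize,
          H.NumBitmapBytes, H.NamesSize};
}
```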

### Payload Sections

#### Binary Ids
Stores the binary ids of the instrumented binaries to associate binaries with
profiles for source code coverage. See the [binary id] RFC for the design.

(profile-metadata)=

#### Profile Metadata

This section stores the metadata to map counters and value profiles back to
instrumented code regions (e.g., LLVM IR for IRPGO).

The in-memory representation of the metadata is [__llvm_profile_data](https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95).
Some fields are used to reference data from other sections in the profile.
The fields are documented as follows:

**`NameRef`**
The MD5 of the function's PGO name. The PGO name has the format
`[<filepath><delimiter>]<mangled-name>`, where `<filepath>` and
`<delimiter>` are provided for local-linkage functions to distinguish
functions that would otherwise share the same name.
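
The name format above can be sketched as a small helper that builds the string later hashed into `NameRef`. The helper is hypothetical. The `';'` default delimiter is an assumption, so check the LLVM source for the delimiter used by your version.

```c++
#include <string>

// Hypothetical helper: build `[<filepath><delimiter>]<mangled-name>`.
// Only local-linkage functions get the file path prefix; the ';' default
// delimiter is an assumption.
std::string makePGONameSketch(const std::string &MangledName, bool IsLocalLinkage,
                              const std::string &FilePath, char Delimiter = ';') {
  if (!IsLocalLinkage)
    return MangledName;
  return FilePath + Delimiter + MangledName;
}
```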

(FuncHash)=

**`FuncHash`**
A checksum of the function's IR, taking the control flow graph and the instrumented
value sites into account. See [computeCFGHash](https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685) for details.

(CounterPtr)=

**`CounterPtr`**
The in-memory address difference between the profile data record and the start of its
corresponding counters. The counter position is stored this way (as a link-time constant)
to reduce instrumented binary size compared with storing symbol addresses directly.
See [commit a1532ed](https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844) for further information.

```{note}
`CounterPtr` might represent a different value for non-IRPGO use cases. For
example, for [binary profile correlation], it represents the absolute address of the counters.
When in doubt, check the source code.
```

(BitmapPtr)=

**`BitmapPtr`**
The in-memory address difference between the profile data record and the start address of
its corresponding bitmap.

```{note}
Similar to [CounterPtr](#CounterPtr), this field may represent a different value for non-IRPGO use cases.
```

**`FunctionPointer`**
Records the function's address when the instrumented binary runs. This is used to
map the profiled callee addresses of indirect calls to `NameRef`s during
conversion from raw to indexed profiles.
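
One way to picture the address-to-`NameRef` mapping is a sorted table keyed by `FunctionPointer` and queried with binary search. `AddrToNameRef` is a hypothetical name for this sketch. Returning 0 for an unknown address is a convention chosen here, not part of the format.

```c++
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical table built from the FunctionPointer and NameRef fields of
// each profile metadata record.
class AddrToNameRef {
public:
  void add(uint64_t FunctionPointer, uint64_t NameRef) {
    Entries.emplace_back(FunctionPointer, NameRef);
  }
  // Sort once after all records are added so lookups can binary-search.
  void finalize() { std::sort(Entries.begin(), Entries.end()); }
  // Returns the NameRef for an exact address match, or 0 if unknown.
  uint64_t lookup(uint64_t Addr) const {
    auto It = std::lower_bound(Entries.begin(), Entries.end(),
                               std::make_pair(Addr, uint64_t(0)));
    if (It != Entries.end() && It->first == Addr)
      return It->second;
    return 0;
  }

private:
  std::vector<std::pair<uint64_t, uint64_t>> Entries;
};
```

Sorting once and binary-searching keeps each lookup logarithmic, which helps when many indirect-call targets are converted.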

**`Values`**
Represents value profiles in a two-dimensional array. The number of elements
in the first dimension is the number of instrumented value sites across all
kinds. Each element in the first dimension is the head of a linked list, and
each element in the second dimension is a linked-list node carrying
`<profiled-value, count>` as its payload. This is used by the compiler runtime
when writing out value profiles.

```{note}
Value profiling is supported by frontend and IR PGO instrumentation,
but it's not supported in all cases (e.g., [lightweight instrumentation]).
```
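
The two-dimensional `Values` layout can be sketched with one list head per value site, where each list holds `<profiled-value, count>` nodes. This is a hypothetical, single-threaded model that uses `std::unique_ptr` for ownership. The compiler runtime's real data structures and update logic differ.

```c++
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// One node of a per-site list: a <profiled-value, count> pair.
struct ValueNode {
  uint64_t Value = 0; // e.g., an indirect-call target address
  uint64_t Count = 0; // times this value was observed at the site
  std::unique_ptr<ValueNode> Next;
};

// Hypothetical model of `Values`: the first dimension is one list head per
// value site; the second dimension is the list hanging off each head.
class ValueSites {
public:
  explicit ValueSites(std::size_t NumSites) : Heads(NumSites) {}

  // Bump the count for Value at Site, appending a node the first time.
  void record(std::size_t Site, uint64_t Value) {
    std::unique_ptr<ValueNode> *Link = &Heads.at(Site);
    while (*Link) {
      if ((*Link)->Value == Value) {
        ++(*Link)->Count;
        return;
      }
      Link = &(*Link)->Next;
    }
    auto Node = std::make_unique<ValueNode>();
    Node->Value = Value;
    Node->Count = 1;
    *Link = std::move(Node);
  }

  uint64_t count(std::size_t Site, uint64_t Value) const {
    for (const ValueNode *N = Heads.at(Site).get(); N; N = N->Next.get())
      if (N->Value == Value)
        return N->Count;
    return 0;
  }

private:
  std::vector<std::unique_ptr<ValueNode>> Heads;
};
```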

**`NumCounters`**
The number of counters for the instrumented function.

**`NumValueSites`**
An array with one element per value kind; each element is the number of
instrumented value sites of that kind in the function.

**`NumBitmapBytes`**
The number of bitmap bytes for the function.

(counter)=

#### Profile Counters

For PGO [^4], the counters within an instrumented function of a specific [FuncHash](#FuncHash)
are stored contiguously, in an order consistent with the selection of instrumentation points.

(calculation-of-counter-offset)=

As mentioned above, the recorded counter offset is relative to the profile metadata.
So how are function counters located in the raw profile data?

The profile reader iterates over the profile metadata records (from the [profile metadata](#profile-metadata)
section) and uses the recorded relative distances, as illustrated below.

```
        + --> start(__llvm_prf_data) --> +---------------------+ ------------+
        |                                |       Data 1        |             |
        |                                +---------------------+ =====||     |
        |                                |       Data 2        |      ||     |
        |                                +---------------------+      ||     |
        |                                |         ...         |      ||     |
Counter |                                +---------------------+      ||     |
Delta   |                                |       Data N        |      ||     |
        |                                +---------------------+      ||     |   CounterPtr1
        |                                                             ||     |
        |                                            CounterPtr2      ||     |
        |                                                             ||     |
        |                                                             ||     |
        + --> start(__llvm_prf_cnts) --> +---------------------+      ||     |
                                         |         ...         |      ||     |
                                         +---------------------+ -----||-----+
                                         |     Counter for     |      ||
                                         |       Data 1        |      ||
                                         +---------------------+      ||
                                         |         ...         |      ||
                                         +---------------------+ =====||
                                         |     Counter for     |
                                         |       Data 2        |
                                         +---------------------+
                                         |         ...         |
                                         +---------------------+
                                         |     Counter for     |
                                         |       Data N        |
                                         +---------------------+
```

In the graph,

* The profile header records `CountersDelta` with the value `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
  We will call it `CounterDeltaInitVal` below for convenience.
* For each profile data record `ProfileDataN`, `CounterPtr` is recorded as
  `start(CounterN) - start(ProfileDataN)`, where `ProfileDataN` is the N-th
  entry in `__llvm_prf_data`, and `CounterN` represents the corresponding
  profile counters.

Each time the reader advances to the next data record, it [updates](https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444) `CountersDelta`
by subtracting the size of one `ProfileData` record.

For the counters corresponding to the first data record, the byte offset
relative to the start of the counter section is calculated as `CounterPtr1 - CounterDeltaInitVal`.
When the profile reader advances to the second data record, `CountersDelta`
has been updated to `CounterDeltaInitVal - sizeof(ProfileData)`, because
`start(ProfileData2)` is `sizeof(ProfileData)` bytes past `start(__llvm_prf_data)`.
Thus the byte offset relative to the start of the counter section is calculated
as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
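
The calculation above can be written as a loop over the data records. In this hypothetical sketch, `counterOffsets` receives three inputs: the `CounterPtr` values in section order, the header's `CountersDelta`, and `sizeof(ProfileData)`. Signed arithmetic is used because both inputs are address differences that could be negative.

```c++
#include <cstdint>
#include <vector>

// Hypothetical helper: for each data record (in section order), compute the
// byte offset of its counters relative to start(__llvm_prf_cnts).
std::vector<int64_t> counterOffsets(const std::vector<int64_t> &CounterPtrs,
                                    int64_t CountersDelta,
                                    int64_t DataRecordSize) {
  std::vector<int64_t> Offsets;
  for (int64_t CounterPtr : CounterPtrs) {
    Offsets.push_back(CounterPtr - CountersDelta);
    // The next record starts DataRecordSize bytes later, so its distance to
    // start(__llvm_prf_cnts) is DataRecordSize bytes smaller.
    CountersDelta -= DataRecordSize;
  }
  return Offsets;
}
```

As a worked example with illustrative numbers, suppose the data section starts at 0x1000 with 64-byte records, the counter section starts at 0x2000, and the first function has two 8-byte counters. Then `CountersDelta` is 4096, `CounterPtr1` is 0x2000 - 0x1000 = 4096, and `CounterPtr2` is 0x2010 - 0x1040 = 4048. The loop yields offsets 0 and 16.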

(bitmap)=

#### Bitmap
This section is used for source-based code coverage with [Modified Condition/Decision Coverage](https://en.wikipedia.org/wiki/Modified_condition/decision_coverage) (MC/DC).
See the [Bitmap RFC](https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244) for the design.

(function-names)=

#### Names

This section contains the concatenated PGO names of functions, possibly
compressed. If compressed, the zlib library is used.

Function names serve as keys in the PGO data hash table when raw profiles are
converted into indexed profiles. They are also crucial for `llvm-profdata` to
show the profiles in a human-readable way.
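
The following is a hedged sketch of parsing one uncompressed name blob. The layout it assumes is our reading of the LLVM source rather than something this document specifies:

* a ULEB128 uncompressed size,
* then a ULEB128 compressed size, which is 0 when the data is not compressed,
* then the names joined by `'\x01'`.

The section may also hold several such blobs back to back. Compressed (zlib) blobs are rejected here rather than decompressed.

```c++
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Decode one unsigned LEB128 value starting at Pos, advancing Pos.
std::optional<uint64_t> readULEB128(const std::string &Buf, std::size_t &Pos) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  while (Pos < Buf.size() && Shift < 64) {
    uint8_t Byte = static_cast<uint8_t>(Buf[Pos++]);
    Result |= uint64_t(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Result;
    Shift += 7;
  }
  return std::nullopt; // truncated or overlong encoding
}

// Assumed blob layout: ULEB128 uncompressed size, ULEB128 compressed size
// (0 means not compressed), then the names joined by '\x01'.
// Compressed blobs are rejected rather than inflated.
std::optional<std::vector<std::string>> readUncompressedNames(const std::string &Buf) {
  std::size_t Pos = 0;
  auto UncompressedSize = readULEB128(Buf, Pos);
  auto CompressedSize = readULEB128(Buf, Pos);
  if (!UncompressedSize || !CompressedSize || *CompressedSize != 0 ||
      Buf.size() - Pos < *UncompressedSize)
    return std::nullopt;
  std::string Blob = Buf.substr(Pos, *UncompressedSize);
  std::vector<std::string> Names;
  if (Blob.empty())
    return Names;
  std::size_t Start = 0;
  while (Start <= Blob.size()) {
    std::size_t End = Blob.find('\x01', Start);
    if (End == std::string::npos)
      End = Blob.size();
    Names.push_back(Blob.substr(Start, End - Start));
    Start = End + 1;
  }
  return Names;
}
```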

#### Virtual Table Profile Data

This section is used for [type profiling]. Each entry corresponds to one virtual
table and is defined by the following C++ struct:

```c++
struct VTableProfData {
  // The start address of the vtable, collected at runtime.
  uint64_t StartAddress;
  // The byte size of the vtable. `StartAddress` and `ByteSize` specify an address range to look up.
  uint32_t ByteSize;
  // The hash of the vtable's (PGO) name.
  uint64_t MD5HashOfName;
};
```

At profile use time, the compiler looks up a profiled address in the sorted vtable
address ranges and maps the address to a specific vtable through its hashed name.
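
The lookup described above can be sketched as a binary search over entries sorted by `StartAddress`. The struct repeats the definition above so that the block is self-contained. `lookupVTable` is a hypothetical helper, and it assumes the address ranges do not overlap.

```c++
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct VTableProfData {
  uint64_t StartAddress;
  uint32_t ByteSize;
  uint64_t MD5HashOfName;
};

// Hypothetical helper: return the name hash of the vtable whose
// [StartAddress, StartAddress + ByteSize) range contains Addr.
// `Sorted` must be sorted by StartAddress with non-overlapping ranges.
std::optional<uint64_t> lookupVTable(const std::vector<VTableProfData> &Sorted,
                                     uint64_t Addr) {
  // First entry that starts strictly after Addr.
  auto It = std::upper_bound(
      Sorted.begin(), Sorted.end(), Addr,
      [](uint64_t A, const VTableProfData &V) { return A < V.StartAddress; });
  if (It == Sorted.begin())
    return std::nullopt;
  --It; // last entry starting at or before Addr
  if (Addr - It->StartAddress < It->ByteSize)
    return It->MD5HashOfName;
  return std::nullopt;
}
```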

#### Virtual Table Names

This section is similar to the [function names](#function-names) section above, except that it
contains the PGO names of profiled virtual tables. It's a standalone section so that raw
profile readers can find each set of names directly by accessing the corresponding
section.

This section is stored in raw profiles so that `llvm-profdata` can show the
profiles in a human-readable way.

#### Value Profile Data

This section contains the profile data for value profiling.

The value profiles corresponding to one profile metadata record are serialized
contiguously as one record, and value profile records are stored in the same order
as the respective profile data records. This way, a raw profile reader [advances](https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457)
the pointer to profile data and the pointer to value profile records simultaneously [^5]
to find the value profiles for each per-function, per-[FuncHash](#FuncHash) profile data record.
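
The lockstep walk can be sketched as below. Two details are assumptions taken from our reading of the raw profile reader, not from this document:

* A function has a value profile record only if at least one of its `NumValueSites` entries is non-zero.
* Each record starts with its total byte size as a 32-bit integer, which is why the step size depends on how many values were collected.

`DataRecordSketch` and `locateValueRecords` are hypothetical names.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, trimmed-down view of one profile metadata record.
struct DataRecordSketch {
  uint32_t NumValueKindsWithSites; // kinds whose NumValueSites entry is non-zero
};

// Walk profile data records and value profile records in lockstep. Returns,
// for each data record, the byte offset of its value profile record within
// the value section, or -1 if it has none.
std::vector<int64_t> locateValueRecords(const std::vector<DataRecordSketch> &Data,
                                        const std::vector<uint8_t> &ValueSection) {
  std::vector<int64_t> Offsets;
  std::size_t Cursor = 0;
  for (const DataRecordSketch &D : Data) {
    if (D.NumValueKindsWithSites == 0) {
      Offsets.push_back(-1); // no record was written for this function
      continue;
    }
    uint32_t TotalSize = 0;
    if (Cursor + sizeof(TotalSize) > ValueSection.size())
      break; // truncated input
    // Assumed: each record begins with its total byte size (host-endian).
    std::memcpy(&TotalSize, ValueSection.data() + Cursor, sizeof(TotalSize));
    Offsets.push_back(static_cast<int64_t>(Cursor));
    Cursor += TotalSize; // variable step: depends on how many values were collected
  }
  return Offsets;
}
```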

## Indexed Profile Format

Indexed profiles are generated by `llvm-profdata`. In indexed profiles,
function data is organized as an on-disk hash table so that compilers can
look up profile data for functions in an IR module.

Compilers and tools must retain backward compatibility with indexed profiles.
That is, a tool or compiler built from a newer version of the code must understand
profiles generated by older tools or compilers.

### General Storage Layout

The ASCII art below depicts the general storage layout of indexed profiles.
Specifically, the indexed profile header records the byte offset of each
payload section.

```
          +-----------------------+---+
          |         Magic         |   |
          +-----------------------+   |
          |        Version        |   |
          +-----------------------+   |
          |       HashType        |   H
          +-----------------------+   E
          |      Byte Offset      |   A
       +--|     of section A      |   D
       |  +-----------------------+   E
       |  |      Byte Offset      |   R
    +-----|     of section B      |   |
    |  |  +-----------------------+   |
    |  |  |          ...          |   |
    |  |  +-----------------------+   |
    |  |  |      Byte Offset      |   |
 +--------|     of section Z      |   |
 |  |  |  +-----------------------+---+
 |  |  |  |    Profile Summary    |   |
 |  |  |  +-----------------------+   P
 |  |  +->|       Section A       |   A
 |  |     +-----------------------+   Y
 |  +---->|       Section B       |   L
 |        +-----------------------+   O
 |        |          ...          |   A
 |        +-----------------------+   D
 +------->|       Section Z       |   |
          +-----------------------+---+
```

```{note}
The profile summary section is at the beginning of the payload. It's right after
the header, so its position is implicitly known after reading the header.
```

### Header

The [Header struct](https://github.com/llvm/llvm-project/blob/1a2960bab6381f2b288328e2371829b460ac020c/llvm/include/llvm/ProfileData/InstrProf.h#L1053-L1080) is the source of truth, and its fields describe
what's in the header. At a high level, `*Offset` fields record section byte
offsets, which readers use to locate interesting sections and skip
uninteresting ones.

```{note}
To maintain backward compatibility of the indexed profiles, existing fields
shouldn't be deleted from the struct definition, and the field order shouldn't be
modified. New fields should be appended.
```
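
The following hypothetical sketch shows how a reader follows an `*Offset` field. It reads one 64-bit offset from the header and returns a pointer to the section it names, with bounds checks. It assumes offsets are stored little-endian, which matches our understanding of the indexed format. `readLE64` and `locateSection` are made-up names.

```c++
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Read a little-endian uint64_t at Pos, or std::nullopt if out of bounds.
std::optional<uint64_t> readLE64(const std::vector<uint8_t> &Buf, std::size_t Pos) {
  if (Pos > Buf.size() || Buf.size() - Pos < 8)
    return std::nullopt;
  uint64_t V = 0;
  for (int I = 7; I >= 0; --I)
    V = (V << 8) | Buf[Pos + I];
  return V;
}

// Hypothetical helper: follow the "Byte Offset of section X" field stored at
// FieldPos and return a pointer to the start of section X.
std::optional<const uint8_t *> locateSection(const std::vector<uint8_t> &Buf,
                                             std::size_t FieldPos) {
  auto Off = readLE64(Buf, FieldPos);
  if (!Off || *Off >= Buf.size())
    return std::nullopt;
  return Buf.data() + *Off;
}
```

Because each section is reached through its own offset, a reader that only needs, say, function data can jump straight to it without parsing the sections in between.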

### Payload Sections

#### (CS) Profile Summary
This section comes right after the profile header. It stores the serialized profile
summary. For context-sensitive IR-based instrumentation PGO, this section stores
an additional profile summary corresponding to the context-sensitive profiles.

(function-data)=

#### Function data
This section stores functions and their profiling data as an on-disk hash table.
Profile data for functions with the same name is grouped together and shares one
hash table entry (the functions may come from different shared libraries, for
instance). The profile data for these functions is organized as a sequence of
key-value pairs, where the key is the [FuncHash](#FuncHash) and the value is the
profiled information for the function (represented
by [InstrProfRecord](https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693)).
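
The two-level lookup (function name, then `FuncHash`) can be modeled in memory as below. This hypothetical sketch uses `std::unordered_map` in place of the on-disk hash table and a bare counter vector in place of `InstrProfRecord`. It only illustrates the key structure.

```c++
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the profiled information of one function.
struct RecordSketch {
  std::vector<uint64_t> Counts;
};

// Hypothetical in-memory model of the function data table: one entry per
// function name, holding (FuncHash, record) pairs.
class FunctionDataSketch {
public:
  void add(const std::string &Name, uint64_t FuncHash, RecordSketch R) {
    Table[Name].emplace_back(FuncHash, std::move(R));
  }

  // Look up by name first (the hash table key), then by FuncHash.
  std::optional<RecordSketch> lookup(const std::string &Name,
                                     uint64_t FuncHash) const {
    auto It = Table.find(Name);
    if (It == Table.end())
      return std::nullopt;
    for (const auto &[Hash, Record] : It->second)
      if (Hash == FuncHash)
        return Record;
    return std::nullopt;
  }

private:
  std::unordered_map<std::string,
                     std::vector<std::pair<uint64_t, RecordSketch>>>
      Table;
};
```

A `FuncHash` mismatch typically means the function's IR has changed since it was profiled, so no record for that name applies to the current code.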

#### MemProf Profile data
This section stores functions' memory profiling data. See the
[MemProf binary serialization format RFC](https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html) for the design.

#### Binary Ids
This section carries over [binary id] information from raw profiles.

#### Temporal Profile Traces
This section carries over temporal profile information from raw profiles.
See [temporal profiling] for the design.

#### Virtual Table Names
This section stores the names of vtables from raw profiles in the indexed
profile.

Unlike function names, which are stored as keys of the [function data](#function-data) hash table,
vtable names need to be stored in a standalone section in indexed profiles.
This way, `llvm-profdata` can show the profiled vtable information in a
human-readable way.

## Profile Data Usage

`llvm-profdata` is the command-line tool for displaying and processing
instrumentation-based profile data. For supported usages, see the [llvm-profdata documentation](https://llvm.org/docs/CommandGuide/llvm-profdata.html).

[^1]: For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
[^2]: For example, IR-based instrumentation supports [lightweight instrumentation]
    and [temporal profiling]. Frontend instrumentation can support [single-byte counters].
[^3]: A raw profile file can contain the concatenation of multiple raw
    profiles, for example, from an executable and its shared libraries. The raw
    profile reader can parse all raw profiles from the file correctly.
[^4]: The counter section is used by a few variant types (like temporal
    profiling) and might have different semantics there.
[^5]: The step size of the data pointer is `sizeof(ProfileData)`, and the step
    size of the value profile pointer is calculated based on the number of collected
    values.

[lightweight instrumentation]: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
[temporal profiling]: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
[single-byte counters]: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685
[binary profile correlation]: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565
[binary id]: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
[type profiling]: https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600