| =================================== |
| Instrumentation Profile Format |
| =================================== |
| |
| .. contents:: |
| :local: |
| |
| |
| Overview |
| ========= |
| |
| Clang supports two types of profiling via instrumentation [1]_: frontend-based |
| and IR-based. Both support a variety of use cases [2]_. |
| This document describes two binary serialization formats (raw and indexed) for |
| storing instrumented profiles, with an emphasis on the IRPGO use case: where a |
| header field or payload section is interpreted differently across use cases, |
| this document describes the IRPGO interpretation. |
| |
| .. note:: |
| Frontend-generated profiles are used together with coverage mapping for |
| `source-based code coverage`_. The `coverage mapping format`_ is different from |
| the profile format. |
| |
| .. _`source-based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html |
| .. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html |
| |
| Raw Profile Format |
| =================== |
| |
| The raw profile is generated by running the instrumented binary. The raw profile |
| data from an executable or a shared library [3]_ consists of a header and |
| multiple sections, with each section as a memory dump. The raw profile data needs |
| to be reasonably compact and fast to generate. |
| |
| There are no backward or forward version compatibility guarantees for the raw profile |
| format. That is, compilers and tools `require`_ a specific raw profile version |
| to parse the profiles. |
| |
| .. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558 |
| |
| To feed profiles back into compilers for an optimized build (e.g., via |
| ``-fprofile-use`` for IR instrumentation), a raw profile must be converted into |
| the indexed format. |
| |
| General Storage Layout |
| ----------------------- |
| |
| The storage layout of the raw profile data format is illustrated below. Basically, |
| when the raw profile is read into a memory buffer, the actual byte offset of a |
| section is inferred from the section's order in the layout and the size information |
| of all the sections ahead of it. |
| |
| :: |
| |
|   +----+-----------------------+ |
|   |    |         Magic         | |
|   |    +-----------------------+ |
|   |    |        Version        | |
|   |    +-----------------------+ |
|   H    |     Size Info for     | |
|   E    |       Section 1       | |
|   A    +-----------------------+ |
|   D    |     Size Info for     | |
|   E    |       Section 2       | |
|   R    +-----------------------+ |
|   |    |          ...          | |
|   |    +-----------------------+ |
|   |    |     Size Info for     | |
|   |    |       Section N       | |
|   +----+-----------------------+ |
|   P    |       Section 1       | |
|   A    +-----------------------+ |
|   Y    |       Section 2       | |
|   L    +-----------------------+ |
|   O    |          ...          | |
|   A    +-----------------------+ |
|   D    |       Section N       | |
|   +----+-----------------------+ |
| |
| |
| .. note:: |
| Sections might be padded to meet specific alignment requirements. For |
| simplicity, header fields and data sections used solely for padding are |
| omitted from the data layout graph above and from the rest of this document. |
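| |
| The offset inference described above amounts to a running sum over section |
| sizes. The following is a minimal sketch under that reading, not the reader's |
| actual code: ``sectionOffsets`` and its parameters are hypothetical names, and |
| padding is ignored, as in the layout graph. |
| |
```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the header size and the byte size of each payload
// section in layout order, return the byte offset of every section within the
// buffer. Padding is ignored here.
std::vector<uint64_t> sectionOffsets(uint64_t HeaderSize,
                                     const std::vector<uint64_t> &SectionSizes) {
  std::vector<uint64_t> Offsets;
  Offsets.reserve(SectionSizes.size());
  uint64_t Cursor = HeaderSize; // The first section starts right after the header.
  for (uint64_t Size : SectionSizes) {
    assert(Cursor + Size >= Cursor && "offset overflow");
    Offsets.push_back(Cursor);
    Cursor += Size; // Each later section starts where the previous one ends.
  }
  return Offsets;
}
```
| |
| An empty section contributes no bytes, so the section after it starts at the |
| same offset. |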
| |
| Header |
| ------- |
| |
| ``Magic`` |
| The magic number encodes the profile format (raw, indexed or text). For the raw format, |
| the magic number also encodes the endianness (big or little) and C pointer |
| size (4 or 8 bytes) of the platform on which the profile is generated. |
| |
| A factory method reads the magic number to construct the appropriate reader and returns |
| an error upon an unrecognized format. Specifically, the factory method and raw profile |
| reader implementation make sure that a raw profile file can be read back on |
| a platform with the opposite endianness and/or the other C pointer size. |
| |
| ``Version`` |
| The lower 32 bits specify the actual version and the most significant 32 bits |
| specify the variant types of the profile. IR-based instrumentation PGO and |
| context-sensitive IR-based instrumentation PGO are two variant types. |
| |
| ``BinaryIdsSize`` |
| The byte size of the `binary id`_ section. |
| |
| ``NumData`` |
| The number of profile metadata records. The byte size of the `profile metadata`_ |
| section can be computed from this field. |
| |
| ``NumCounter`` |
| The number of entries in the profile counter section. The byte size of the `counter`_ |
| section can be computed from this field. |
| |
| ``NumBitmapBytes`` |
| The number of bytes in the profile `bitmap`_ section. |
| |
| ``NamesSize`` |
| The number of bytes in the name section. |
| |
| .. _`CountersDelta`: |
| |
| ``CountersDelta`` |
| This field records the in-memory address difference between the `profile metadata`_ |
| and counter section in the instrumented binary, i.e., ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``. |
| |
| It's used jointly with the `CounterPtr`_ field to compute the counter offset |
| relative to ``start(__llvm_prf_cnts)``. Check out calculation-of-counter-offset_ |
| for a visualized explanation. |
| |
| .. note:: |
| The ``__llvm_prf_data`` object file section might not be loaded into memory |
| when the instrumented binary runs, or might not be generated in the instrumented |
| binary in the first place. In those cases, ``CountersDelta`` is not used and |
| other mechanisms are used to match counters with instrumented code. See |
| `lightweight instrumentation`_ and `binary profile correlation`_ for examples. |
| |
| ``BitmapDelta`` |
| This field records the in-memory address difference between the `profile metadata`_ |
| and bitmap section in the instrumented binary, i.e., ``start(__llvm_prf_bits) - start(__llvm_prf_data)``. |
| |
| It's used jointly with the `BitmapPtr`_ to find the bitmap of a profile data |
| record, in a similar way to how counters are referenced as explained by |
| calculation-of-counter-offset_ . |
| |
| Similar to the `CountersDelta`_ field, this field may not be used in non-PGO variants |
| of profiles. |
| |
| ``NamesDelta`` |
| Records the in-memory address of the name section. It is used only for error |
| checking in the raw profile reader. |
| |
| ``NumVTables`` |
| Records the number of instrumented vtable entries in the binary. Used for |
| `type profiling`_. |
| |
| ``VNamesSize`` |
| Records the byte size of the virtual table names section. Used for `type profiling`_. |
| |
| ``ValueKindLast`` |
| Records the number of value kinds. The `VALUE_PROF_KIND`_ macro defines the value |
| kinds, each with a description. |
| |
| .. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186 |
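| |
| The split of the ``Version`` field described above can be sketched as follows. |
| The mask constants and function names are illustrative and are not taken from |
| the LLVM sources. They only mirror the "lower 32 bits / upper 32 bits" |
| description in this document. |
| |
```cpp
#include <cassert>
#include <cstdint>

// Illustrative masks mirroring the description of the Version field.
constexpr uint64_t kVersionMask = 0x00000000FFFFFFFFULL;
constexpr uint64_t kVariantMask = 0xFFFFFFFF00000000ULL;

// Extracts the actual format version from the 64-bit header field.
constexpr uint64_t actualVersion(uint64_t VersionField) {
  return VersionField & kVersionMask;
}

// Extracts the variant bits, left in place in the upper half.
constexpr uint64_t variantBits(uint64_t VersionField) {
  return VersionField & kVariantMask;
}

// Returns true if the given variant flag (a bit in the upper 32 bits) is set.
inline bool hasVariant(uint64_t VersionField, uint64_t Flag) {
  assert((Flag & kVersionMask) == 0 && "variant flags live in the upper half");
  return (VersionField & Flag) != 0;
}
```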
| |
| Payload Sections |
| ------------------ |
| |
| Binary Ids |
| ^^^^^^^^^^^ |
| Stores the binary ids of the instrumented binaries to associate binaries with |
| profiles for source code coverage. See the `binary id`_ RFC for the design. |
| |
| .. _`profile metadata`: |
| |
| Profile Metadata |
| ^^^^^^^^^^^^^^^^^^ |
| |
| This section stores the metadata to map counters and value profiles back to |
| instrumented code regions (e.g., LLVM IR for IRPGO). |
| |
| The in-memory representation of the metadata is `__llvm_profile_data`_. |
| Some fields are used to reference data from other sections in the profile. |
| The fields are documented as follows: |
| |
| .. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95 |
| |
| ``NameRef`` |
| The MD5 of the function's PGO name. The PGO name has the format |
| ``[<filepath><delimiter>]<mangled-name>``, where ``<filepath>`` and |
| ``<delimiter>`` are provided for local-linkage functions to distinguish |
| functions that might otherwise have identical names. |
| |
| .. _FuncHash: |
| |
| ``FuncHash`` |
| A checksum of the function's IR, taking the control flow graph and instrumented |
| value sites into account. See `computeCFGHash`_ for details. |
| |
| .. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685 |
| |
| .. _`CounterPtr`: |
| |
| ``CounterPtr`` |
| The in-memory address difference between the profile data record and the start of its |
| corresponding counters. The counter position is stored this way (as a link-time constant) |
| to reduce instrumented binary size compared with snapshotting the address of symbols directly. |
| See `commit a1532ed`_ for further information. |
| |
| .. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844 |
| |
| .. note:: |
| ``CounterPtr`` might represent a different value for non-IRPGO use cases. For |
| example, for `binary profile correlation`_, it represents the absolute address of the counter. |
| When in doubt, check the source code. |
| |
| .. _`BitmapPtr`: |
| |
| ``BitmapPtr`` |
| The in-memory address difference between the profile data record and the start address of |
| its corresponding bitmap. |
| |
| .. note:: |
| Similar to `CounterPtr`_, this field may represent a different value for non-IRPGO use cases. |
| |
| ``FunctionPointer`` |
| Records the function address when the instrumented binary runs. This is used to |
| map the profiled callee address of indirect calls to the ``NameRef`` during |
| conversion from raw to indexed profiles. |
| |
| ``Values`` |
| Represents value profiles in a two-dimensional array. The number of elements |
| in the first dimension is the number of instrumented value sites across all |
| kinds. Each element in the first dimension is the head of a linked list, and |
| each element in the second dimension is a linked list element carrying |
| ``<profiled-value, count>`` as payload. This is used by the compiler runtime when |
| writing out value profiles. |
| |
| .. note:: |
| Value profiling is supported by frontend and IR PGO instrumentation, |
| but it's not supported in all cases (e.g., `lightweight instrumentation`_). |
| |
| ``NumCounters`` |
| The number of counters for the instrumented function. |
| |
| ``NumValueSites`` |
| An array of counts, where each element is the number of instrumented sites |
| for one kind of value in the function. |
| |
| ``NumBitmapBytes`` |
| The number of bitmap bytes for the function. |
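| |
| The relationship between ``Values`` and ``NumValueSites`` described above can |
| be sketched as follows. ``ValueNode`` and both helper functions are |
| hypothetical names used for illustration and do not reproduce the runtime's |
| actual types. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical node type: one <profiled-value, count> pair in a site's list.
struct ValueNode {
  uint64_t Value;
  uint64_t Count;
  const ValueNode *Next;
};

// Sums the per-kind site counts (the NumValueSites array) to get the number of
// elements in the first dimension of Values.
inline std::size_t totalValueSites(const uint16_t *NumValueSites,
                                   std::size_t NumKinds) {
  assert((NumValueSites != nullptr || NumKinds == 0) && "missing site counts");
  std::size_t Total = 0;
  for (std::size_t Kind = 0; Kind < NumKinds; ++Kind)
    Total += NumValueSites[Kind];
  return Total;
}

// Walks one site's linked list and returns the total count recorded there.
inline uint64_t totalCountAtSite(const ValueNode *Head) {
  uint64_t Total = 0;
  for (const ValueNode *N = Head; N != nullptr; N = N->Next)
    Total += N->Count;
  return Total;
}
```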
| |
| .. _`counter`: |
| |
| Profile Counters |
| ^^^^^^^^^^^^^^^^^ |
| |
| For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_ |
| are stored contiguously, in an order that is consistent with the selection of instrumentation points. |
| |
| .. _calculation-of-counter-offset: |
| |
| As mentioned above, the recorded counter offset is relative to the profile metadata. |
| So how are function counters located in the raw profile data? |
| |
| Basically, the profile reader iterates over the profile metadata records (from the `profile metadata`_ |
| section) and makes use of the recorded relative distances, as illustrated below. |
| |
| :: |
| |
|         + --> start(__llvm_prf_data) --> +---------------------+ ------------+               |
|         |                                |       Data 1        |             |               |
|         |                                +---------------------+  =====||    |               |
|         |                                |       Data 2        |       ||    |               |
|         |                                +---------------------+       ||    |               |
|         |                                |        ...          |       ||    |               |
| Counter |                                +---------------------+       ||    |               |
|  Delta  |                                |       Data N        |       ||    |               |
|         |                                +---------------------+       ||    |  CounterPtr1  |
|         |                                                              ||    |               |
|         |                                              CounterPtr2     ||    |               |
|         |                                                              ||    |               |
|         |                                                              ||    |               |
|         + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |               |
|                                          |        ...          |       ||    |               |
|                                          +---------------------+  -----||----+               |
|                                          |    Counter for      |       ||                    |
|                                          |       Data 1        |       ||                    |
|                                          +---------------------+       ||                    |
|                                          |        ...          |       ||                    |
|                                          +---------------------+  =====||                    |
|                                          |    Counter for      |                             |
|                                          |       Data 2        |                             |
|                                          +---------------------+                             |
|                                          |        ...          |                             |
|                                          +---------------------+                             |
|                                          |    Counter for      |                             |
|                                          |       Data N        |                             |
|                                          +---------------------+                             |
| |
| |
| In the graph, |
| |
| * The profile header records ``CountersDelta`` (labeled ``CounterDelta`` here) with the value ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``. |
| We will call this initial value ``CounterDeltaInitVal`` below for convenience. |
| * For each profile data record ``ProfileDataN``, ``CounterPtr`` is recorded as |
| ``start(CounterN) - start(ProfileDataN)``, where ``ProfileDataN`` is the N-th |
| entry in ``__llvm_prf_data``, and ``CounterN`` represents the corresponding |
| profile counters. |
| |
| Each time the reader advances to the next data record, it `updates`_ ``CounterDelta`` |
| by subtracting the size of one ``ProfileData`` record. |
| |
| .. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444 |
| |
| For the counter corresponding to the first data record, the byte offset |
| relative to the start of the counter section is calculated as ``CounterPtr1 - CounterDeltaInitVal``. |
| When the profile reader advances to the second data record, ``CounterDelta`` |
| is updated to ``CounterDeltaInitVal - sizeof(ProfileData)``. |
| Thus the byte offset of the second record's counters relative to the start of the counter section is calculated |
| as ``CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))``. |
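| |
| The calculation above generalizes to every data record. The sketch below |
| illustrates the arithmetic only and is not the reader's implementation: |
| ``counterOffsets`` is a hypothetical helper, and ``DataRecordSize`` stands in |
| for ``sizeof(ProfileData)``. |
| |
```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Computes the byte offset of each record's counters relative to the start of
// the counter section. CounterPtrs[I] is the CounterPtr of the I-th data
// record, CountersDeltaInit is the value stored in the header, and
// DataRecordSize stands in for sizeof(ProfileData). Arithmetic is done on
// unsigned 64-bit values, so intermediate wrap-around is well defined.
std::vector<uint64_t> counterOffsets(const std::vector<uint64_t> &CounterPtrs,
                                     uint64_t CountersDeltaInit,
                                     uint64_t DataRecordSize) {
  assert(DataRecordSize > 0 && "a data record has a non-zero size");
  std::vector<uint64_t> Offsets;
  Offsets.reserve(CounterPtrs.size());
  uint64_t Delta = CountersDeltaInit;
  for (uint64_t Ptr : CounterPtrs) {
    Offsets.push_back(Ptr - Delta); // Offset from start(__llvm_prf_cnts).
    Delta -= DataRecordSize;        // Account for moving to the next record.
  }
  return Offsets;
}
```
| |
| Subtracting the record size at each step compensates for each ``CounterPtr`` |
| being measured from its own data record rather than from the start of the |
| data section. |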
| |
| .. _`bitmap`: |
| |
| Bitmap |
| ^^^^^^^ |
| This section is used for source-based `Modified Condition/Decision Coverage`_ code coverage. Check out `Bitmap RFC`_ |
| for the design. |
| |
| .. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage |
| .. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244 |
| |
| .. _`function names`: |
| |
| Names |
| ^^^^^^ |
| |
| This section contains the concatenated string of functions' PGO names, which |
| may be compressed. If compressed, the zlib library is used. |
| |
| Function names serve as keys in the PGO data hash table when raw profiles are |
| converted into indexed profiles. They are also crucial for ``llvm-profdata`` to |
| show the profiles in a human-readable way. |
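| |
| As a reminder of what a single name looks like, the PGO name format given under |
| ``NameRef`` (``[<filepath><delimiter>]<mangled-name>``) can be sketched as |
| follows. ``makePGOName`` is a hypothetical helper, and the delimiter is left as |
| a parameter because this document does not specify its value. |
| |
```cpp
#include <cassert>
#include <string>

// Builds a PGO name of the form [<filepath><delimiter>]<mangled-name>.
// The prefix is added only for local-linkage functions, so that two such
// functions with the same mangled name in different files get distinct names.
std::string makePGOName(const std::string &MangledName,
                        const std::string &FilePath, bool HasLocalLinkage,
                        char Delimiter) {
  assert(!MangledName.empty() && "a function always has a name");
  if (!HasLocalLinkage)
    return MangledName;
  return FilePath + Delimiter + MangledName;
}
```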
| |
| Virtual Table Profile Data |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| This section is used for `type profiling`_. Each entry corresponds to one virtual |
| table and is defined by the following C++ struct: |
| |
| .. code-block:: c++ |
| |
|    struct VTableProfData { |
|      // The start address of the vtable, collected at runtime. |
|      uint64_t StartAddress; |
|      // The byte size of the vtable. `StartAddress` and `ByteSize` together |
|      // specify an address range to look up. |
|      uint32_t ByteSize; |
|      // The hash of the vtable's (PGO) name. |
|      uint64_t MD5HashOfName; |
|    }; |
| |
| At profile use time, the compiler looks up a profiled address in the sorted vtable |
| address ranges and maps the address to a specific vtable through hashed name. |
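| |
| The address lookup described above can be sketched as a binary search over |
| sorted, non-overlapping ranges. This is a minimal illustration and not the |
| compiler's implementation: ``VTableRange`` is a simplified stand-in for the |
| struct above, and ``findVTable`` is a hypothetical name. |
| |
```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the per-vtable entry described above.
struct VTableRange {
  uint64_t StartAddress;
  uint32_t ByteSize;
  uint64_t MD5HashOfName;
};

// Returns the entry whose [StartAddress, StartAddress + ByteSize) range
// contains Addr, or nullptr if none does. Ranges must be sorted by
// StartAddress and must not overlap.
const VTableRange *findVTable(const std::vector<VTableRange> &Sorted,
                              uint64_t Addr) {
  assert(std::is_sorted(Sorted.begin(), Sorted.end(),
                        [](const VTableRange &A, const VTableRange &B) {
                          return A.StartAddress < B.StartAddress;
                        }));
  // Find the first range that starts strictly after Addr.
  auto It = std::upper_bound(Sorted.begin(), Sorted.end(), Addr,
                             [](uint64_t Value, const VTableRange &R) {
                               return Value < R.StartAddress;
                             });
  if (It == Sorted.begin())
    return nullptr; // Addr is below every range.
  const VTableRange &Candidate = *(It - 1);
  if (Addr - Candidate.StartAddress < Candidate.ByteSize)
    return &Candidate;
  return nullptr; // Addr falls in a gap between ranges.
}
```
| |
| The ``MD5HashOfName`` of the returned entry is what ties the profiled address |
| back to a named vtable. |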
| |
| Virtual Table Names |
| ^^^^^^^^^^^^^^^^^^^^ |
| |
| This section is similar to the `function names`_ section above, except that it contains the PGO |
| names of profiled virtual tables. It's a standalone section so that raw profile |
| readers can directly find each name set by accessing the corresponding profile |
| data section. |
| |
| This section is stored in raw profiles so that ``llvm-profdata`` can show the |
| profiles in a human-readable way. |
| |
| Value Profile Data |
| ^^^^^^^^^^^^^^^^^^^^ |
| |
| This section contains the profile data for value profiling. |
| |
| The value profiles corresponding to one profile metadata record are serialized contiguously |
| as one record, and value profile records are stored in the same order as the |
| respective profile data, such that a raw profile reader `advances`_ the pointer to |
| profile data and the pointer to value profile records simultaneously [5]_ to find |
| the value profiles for each profile data record, keyed per function and per `FuncHash`_. |
| |
| .. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457 |
| |
| Indexed Profile Format |
| =========================== |
| |
| Indexed profiles are generated by ``llvm-profdata``. In an indexed profile, |
| function data are organized as an on-disk hash table so that compilers can |
| look up profile data for functions in an IR module. |
| |
| Compilers and tools must retain backward compatibility with indexed profiles. |
| That is, a tool or a compiler built from a newer version of the code must understand |
| profiles generated by older tools or compilers. |
| |
| General Storage Layout |
| ----------------------- |
| |
| The ASCII art depicts the general storage layout of indexed profiles. |
| Specifically, the indexed profile header describes the byte offset of individual |
| payload sections. |
| |
| :: |
| |
|                   +-----------------------+---+ |
|                   |         Magic         |   | |
|                   +-----------------------+   | |
|                   |        Version        |   | |
|                   +-----------------------+   | |
|                   |       HashType        |   H |
|                   +-----------------------+   E |
|                   |      Byte Offset      |   A |
|           +------ |     of section A      |   D |
|           |       +-----------------------+   E |
|           |       |      Byte Offset      |   R |
|       +-----------|     of section B      |   | |
|       |   |       +-----------------------+   | |
|       |   |       |          ...          |   | |
|       |   |       +-----------------------+   | |
|       |   |       |      Byte Offset      |   | |
|   +---------------|     of section Z      |   | |
|   |   |   |       +-----------------------+---+ |
|   |   |   |       |    Profile Summary    |   | |
|   |   |   |       +-----------------------+   P |
|   |   |   +------>|       Section A       |   A |
|   |   |           +-----------------------+   Y |
|   |   +---------->|       Section B       |   L |
|   |               +-----------------------+   O |
|   |               |          ...          |   A |
|   |               +-----------------------+   D |
|   +-------------->|       Section Z       |   | |
|                   +-----------------------+---+ |
| |
| .. note:: |
| |
| The profile summary section is at the beginning of the payload. It comes right after the |
| header, so its position is implicitly known after reading the header. |
| |
| Header |
| -------- |
| |
| The `Header struct`_ is the source of truth, and its fields document |
| what's in the header. At a high level, ``*Offset`` fields record section byte |
| offsets, which are used by readers to locate interesting sections and skip |
| uninteresting ones. |
| |
| .. note:: |
| |
| To maintain backward compatibility of the indexed profiles, existing fields |
| shouldn't be deleted from the struct definition, and the field order shouldn't be |
| modified. New fields should be appended. |
| |
| .. _`Header struct`: https://github.com/llvm/llvm-project/blob/1a2960bab6381f2b288328e2371829b460ac020c/llvm/include/llvm/ProfileData/InstrProf.h#L1053-L1080 |
| |
| |
| Payload Sections |
| ------------------ |
| |
| (CS) Profile Summary |
| ^^^^^^^^^^^^^^^^^^^^^ |
| This section is right after the profile header. It stores the serialized profile |
| summary. For context-sensitive IR-based instrumentation PGO, this section stores |
| an additional profile summary corresponding to the context-sensitive profiles. |
| |
| .. _`function data`: |
| |
| Function data |
| ^^^^^^^^^^^^^^^^^^ |
| This section stores functions and their profiling data as an on-disk hash table. |
| Profile data for functions with the same name are grouped together and share one |
| hash table entry (the functions may come from different shared libraries, for |
| instance). The profile data for them are organized as a sequence of key-value |
| pairs, where the key is `FuncHash`_ and the value is the profiled information (represented |
| by `InstrProfRecord`_) for the function. |
| |
| .. _`InstrProfRecord`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693 |
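| |
| The two-level lookup described above (by name first, then by ``FuncHash``) can |
| be sketched with an in-memory container. This is only an illustration of the |
| grouping and not the on-disk hash table format: ``FunctionProfile``, |
| ``FunctionTable`` and ``lookupProfile`` are hypothetical names. |
| |
```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the profiled information of one function.
struct FunctionProfile {
  std::vector<uint64_t> Counts;
};

// Name -> sequence of (FuncHash, profile) pairs. All functions that share a
// name live in the same entry.
using FunctionTable =
    std::unordered_map<std::string,
                       std::vector<std::pair<uint64_t, FunctionProfile>>>;

// Finds the profile for a function by name and FuncHash. Returns nullptr when
// the name is unknown or no record under that name has a matching hash.
const FunctionProfile *lookupProfile(const FunctionTable &Table,
                                     const std::string &Name,
                                     uint64_t FuncHash) {
  assert(!Name.empty() && "function names are non-empty");
  auto Entry = Table.find(Name);
  if (Entry == Table.end())
    return nullptr;
  for (const auto &KV : Entry->second)
    if (KV.first == FuncHash)
      return &KV.second;
  return nullptr;
}
```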
| |
| MemProf Profile data |
| ^^^^^^^^^^^^^^^^^^^^^^ |
| This section stores functions' memory profiling data. See the |
| `MemProf binary serialization format RFC`_ for the design. |
| |
| .. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html |
| |
| Binary Ids |
| ^^^^^^^^^^^^^^^^^^^^^^ |
| This section is used to carry over `binary id`_ information from raw profiles. |
| |
| Temporal Profile Traces |
| ^^^^^^^^^^^^^^^^^^^^^^^^ |
| This section is used to carry over temporal profile information from raw profiles. |
| See `temporal profiling`_ for the design. |
| |
| Virtual Table Names |
| ^^^^^^^^^^^^^^^^^^^^ |
| This section is used to store the names of vtables from raw profiles in the indexed |
| profile. |
| |
| Unlike function names, which are stored as keys of the `function data`_ hash table, |
| vtable names need to be stored in a standalone section in indexed profiles. |
| This way, ``llvm-profdata`` can show the profiled vtable information in a |
| human-readable way. |
| |
| Profile Data Usage |
| ======================================= |
| |
| ``llvm-profdata`` is the command-line tool to display and process |
| instrumentation-based profile data. For supported usages, check out the `llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_. |
| |
| .. [1] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation |
| .. [2] For example, IR-based instrumentation supports `lightweight instrumentation`_ |
| and `temporal profiling`_. Frontend instrumentation could support `single-byte counters`_. |
| .. [3] A raw profile file could contain the concatenation of multiple raw |
| profiles, for example, from an executable and its shared libraries. The raw |
| profile reader can parse all raw profiles from such a file correctly. |
| .. [4] The counter section is used by a few variant types (like temporal |
| profiling) and might have different semantics there. |
| .. [5] The step size of the data pointer is ``sizeof(ProfileData)``, and the step |
| size of the value profile pointer is calculated based on the number of collected |
| values. |
| |
| .. _`lightweight instrumentation`: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 |
| .. _`temporal profiling`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068 |
| .. _`single-byte counters`: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685 |
| .. _`binary profile correlation`: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565 |
| .. _`binary id`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html |
| .. _`type profiling`: https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600 |