# BOLT Profile Formats

BOLT accepts profile data in several formats. This document describes each
format, how to generate it, and how BOLT consumes it.

The recommended workflow is to convert an unsymbolized profile (perf.data or
pre-aggregated) into a symbolized one (fdata or YAML) with `perf2bolt`:

```
# perf.data is consumed directly:
$ perf2bolt executable -p perf.data -o perf.fdata
# a pre-aggregated profile requires the `--pa` switch:
$ perf2bolt executable -p preagg --pa -o perf.fdata
# fdata is the default output format; YAML can also be emitted with `-w`:
$ perf2bolt executable -p perf.data -o perf.fdata -w perf.yaml
# the output format for `-o` can be switched with `--profile-format`:
$ perf2bolt executable -p perf.data -o perf.yaml --profile-format=yaml
```

# Unsymbolized profiles
Sample or trace profiles without symbol information. `perf2bolt` accepts them
and converts them into the symbolized formats used by `llvm-bolt`.

## Linux perf data

### Collection
Example that collects user-space branch stacks (brstack):
```bash
perf record -j any,u -e cycles:u -o perf.data -- ./binary
```

### Consumption modes

- **Branch samples (default)**: Branch stack samples from capable hardware
  (Intel LBR, AMD LBRv2/BRS, ARM BRBE).
  Used by default by `perf2bolt` and `llvm-bolt -p perf.data`.
- **Basic aggregation (`-ba`)**: Sample-based profile without branch stacks.
  Lower quality, but works on hardware and VMs without branch sampling support.
- **Tracing (`--itrace`)**: Branch stacks synthesized from a trace profile
  (Intel PT, ARM ETM). Requires an option string (e.g. `i10usl`); see the
  [perf documentation](https://github.com/torvalds/linux/blob/35f5aa9ccc83f4a4171cdb6ba023e514e2b2ecff/tools/perf/Documentation/itrace.txt)
  for details.
- **ARM SPE (`--spe`)**: Statistical Profiling Extension on supported ARM
  platforms, which provides short (1-deep) branch stacks.

### Build-id verification

BOLT verifies that the build-id in `perf.data` matches the input binary.
Use `--ignore-build-id` to skip this check.

## Pre-aggregated format

A pre-aggregated profile can be consumed directly by `perf2bolt --pa` or
`llvm-bolt --pa`. It lets external tools generate BOLT-compatible profiles
without going through `perf.data`.

### Entry types

```
E <event>
S <start> <count>
[TR] <branch> <ft_start> <ft_end> <count>
B <start> <end> <count> <mispred_count>
[Ff] <start> <end> <count>
r <start> <end> <count>
```

Where:
- `E`: Name of the sampling event used for subsequent entries.
- `S`: Aggregated basic sample at `<start>`.
- `T`: Aggregated trace, a branch from `<branch>` to `<ft_start>` followed by
  a fall-through to `<ft_end>`.
- `R`: Aggregated trace originating at a return.
- `B`: Aggregated branch from `<start>` to `<end>`, with `<mispred_count>`
  mispredictions.
- `F`: Aggregated fall-through from `<start>` to `<end>`.
- `f`: Aggregated fall-through with an external origin (distinguishes returns
  landing at a basic block head from regular internal jumps).
- `r`: Aggregated fall-through originating at an external return (no checks
  are performed on the fall-through start).
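
As a rough illustration of the grammar above, the following Python sketch splits one pre-aggregated line into its type letter, location fields, and counters, and checks the field count for each type. `parse_preagg_line` and `PreaggEntry` are hypothetical names, not BOLT APIs; locations are kept as raw strings, and counters are assumed to be decimal.

```python
from typing import List, NamedTuple, Optional

# Fields expected after the type letter, per the entry grammar above.
FIELD_COUNTS = {
    "E": 1,  # <event>
    "S": 2,  # <start> <count>
    "T": 4,  # <branch> <ft_start> <ft_end> <count>
    "R": 4,  # same layout as T
    "B": 4,  # <start> <end> <count> <mispred_count>
    "F": 3,  # <start> <end> <count>
    "f": 3,
    "r": 3,
}


class PreaggEntry(NamedTuple):
    kind: str                    # entry type letter
    locations: List[str]         # raw location strings, e.g. "4b196f"
    counts: List[int]            # trailing counters, assumed decimal
    event: Optional[str] = None  # set only for E entries


def parse_preagg_line(line: str) -> PreaggEntry:
    """Split one pre-aggregated profile line into its parts (hypothetical helper)."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    kind, rest = fields[0], fields[1:]
    if kind not in FIELD_COUNTS:
        raise ValueError(f"unknown entry type: {kind!r}")
    if len(rest) != FIELD_COUNTS[kind]:
        raise ValueError(f"{kind} expects {FIELD_COUNTS[kind]} fields, got {len(rest)}")
    if kind == "E":
        return PreaggEntry(kind, [], [], event=rest[0])
    # B carries two counters (count, mispredictions); every other type carries one.
    num_counts = 2 if kind == "B" else 1
    return PreaggEntry(kind, rest[:-num_counts], [int(c) for c in rest[-num_counts:]])


print(parse_preagg_line("T 4b196f 4b19e0 4b19ef 2"))
```

Keeping the per-type field counts in one table makes it easy to reject malformed lines early instead of misreading a location as a counter.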

### Location format

A location is written as `[<buildid>:]<offset>`, or as `X:<addr>` for an
external address:
- `<offset>`: Hex offset from the object base load address.
- `<buildid>:<offset>`: Offset within the object identified by `<buildid>`.
- `X:<addr>`: External address (outside the profiled binary).
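
The three location forms can be told apart by what precedes the last colon. Here is a minimal sketch, assuming offsets and addresses are hex (Python's `int(s, 16)` accepts them with or without a `0x` prefix). `parse_location` and `Location` are hypothetical names used only for illustration:

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    build_id: Optional[str]  # None when no build-id prefix is present
    offset: int              # object offset, or the raw address for X:<addr>
    external: bool           # True only for X:<addr>


def parse_location(text: str) -> Location:
    """Split a pre-aggregated location string (hypothetical helper)."""
    prefix, sep, value = text.rpartition(":")
    if not sep:
        # Plain <offset>, relative to the profiled object's base load address.
        return Location(None, int(text, 16), False)
    if prefix == "X":
        # External address outside the profiled binary.
        return Location(None, int(value, 16), True)
    # <buildid>:<offset>
    return Location(prefix, int(value, 16), False)


print(parse_location("41be50"))
print(parse_location("X:7f0012"))
```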

### Examples

Basic sample profile:
```
E cycles
S 41be50 3
E br_inst_retired.near_taken
S 41be60 6
```

Trace profile combining branches and fall-throughs:
```
T 4b196f 4b19e0 4b19ef 2
```

Legacy branch profile with separate branches and fall-throughs:
```
F 41be50 41be50 3
F 41be90 41be90 4
B 4b1942 39b57f0 3 0
B 4b196f 4b19e0 2 0
```
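
A `T` entry packs a taken branch and the fall-through that follows it into one record, so it can be expanded into the legacy `B`/`F` pair. The sketch below is hypothetical (`trace_to_legacy` is not a BOLT tool) and assumes a misprediction count of `0`, since a `T` entry has no misprediction field:

```python
from typing import List


def trace_to_legacy(line: str) -> List[str]:
    """Expand 'T <branch> <ft_start> <ft_end> <count>' into B and F lines.

    Assumption: T entries carry no misprediction count, so the B line uses 0.
    """
    kind, branch, ft_start, ft_end, count = line.split()
    if kind != "T":
        raise ValueError(f"expected a T entry, got {kind!r}")
    return [
        # Taken branch from <branch> to its target, the fall-through start.
        f"B {branch} {ft_start} {count} 0",
        # Fall-through range executed after the branch target.
        f"F {ft_start} {ft_end} {count}",
    ]


for legacy in trace_to_legacy("T 4b196f 4b19e0 4b19ef 2"):
    print(legacy)
```

For the trace example above, the generated `B` line is identical to the `B 4b196f 4b19e0 2 0` entry in the legacy example.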

### Generation

Pre-aggregated profiles can be generated by external tools. See
[ebpf-bolt](https://github.com/aaupov/ebpf-bolt) for a reference
implementation using eBPF-based collection.
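
Most of the work in such a generator is counting identical records. The sketch below aggregates raw `(source, target, mispredicted)` branch records into `B` entries under a single `E` header. The record tuple layout is a hypothetical stand-in for whatever a collector produces, offsets are assumed to already be relative to the binary's base load address, and the code is not taken from ebpf-bolt:

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical raw record: (source offset, target offset, mispredicted?).
BranchRecord = Tuple[int, int, bool]


def aggregate_branches(event: str, records: Iterable[BranchRecord]) -> List[str]:
    """Aggregate raw branch records into pre-aggregated 'B' lines."""
    taken: Counter = Counter()
    mispredicted: Counter = Counter()
    for src, dst, mispred in records:
        taken[(src, dst)] += 1
        if mispred:
            mispredicted[(src, dst)] += 1
    lines = [f"E {event}"]
    # Sort by (source, target) so the output is deterministic.
    for (src, dst), count in sorted(taken.items()):
        # Counter returns 0 for branches that were never mispredicted.
        lines.append(f"B {src:x} {dst:x} {count} {mispredicted[(src, dst)]}")
    return lines


records = [
    (0x4b196f, 0x4b19e0, False),
    (0x4b196f, 0x4b19e0, True),
    (0x4b1942, 0x39b57f0, False),
]
print("\n".join(aggregate_branches("cycles", records)))
```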

# Symbolized profiles
Profiles accepted by `llvm-bolt`. fdata is the legacy format; YAML is the
richer format that carries additional metadata.

## fdata format

A plaintext, space-separated branch profile format written by `perf2bolt` and
consumed by `llvm-bolt -data <file>`. It is also produced by BOLT
instrumentation.

### LBR mode format

Each line records a branch:

```
<is_sym_from> <sym_from> <off_from> <is_sym_to> <sym_to> <off_to> <mispreds> <branches>
```

Where:
- `<is_sym_from>`, `<is_sym_to>`: `1` if the name is an ELF symbol, `0` if
  it is a DSO name. Special values: `2` for local symbols (the name includes
  the filename), `3`/`4`/`5` for memory events.
- `<sym_from>`, `<sym_to>`: Symbol name or DSO name.
- `<off_from>`, `<off_to>`: Hex offset relative to the symbol/DSO.
- `<mispreds>`: Number of branch mispredictions.
- `<branches>`: Total number of branches.

Example:
```
1 main 3fb 0 /lib/ld-2.21.so 12 4 221
```
This line records 221 branches, 4 of them mispredicted, from offset `0x3fb`
in symbol `main` to offset `0x12` in the DSO `/lib/ld-2.21.so`.
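
Here is a minimal sketch of reading one LBR-mode line. It assumes names contain no spaces (fields are split on whitespace), offsets are hex, and the two counters are decimal. `parse_fdata_branch` and `FdataBranch` are hypothetical names, not part of BOLT:

```python
from typing import NamedTuple


class FdataBranch(NamedTuple):
    from_kind: int  # <is_sym_from>: 0 = DSO, 1 = symbol, 2 = local symbol, 3-5 = memory events
    from_name: str
    from_off: int
    to_kind: int    # <is_sym_to>, same encoding
    to_name: str
    to_off: int
    mispreds: int
    branches: int


def parse_fdata_branch(line: str) -> FdataBranch:
    """Parse one LBR-mode fdata line (hypothetical helper)."""
    f = line.split()
    if len(f) != 8:
        raise ValueError(f"expected 8 fields, got {len(f)}")
    return FdataBranch(
        int(f[0]), f[1], int(f[2], 16),  # source: kind, name, hex offset
        int(f[3]), f[4], int(f[5], 16),  # target: kind, name, hex offset
        int(f[6]), int(f[7]),            # mispredictions, total branches
    )


print(parse_fdata_branch("1 main 3fb 0 /lib/ld-2.21.so 12 4 221"))
```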

### No-LBR mode format

Requires a `no_lbr` header line, optionally followed on the same line by the
event name. Each subsequent line records the sample count at one location:

```
no_lbr <event_name>
<is_sym> <sym> <off> <count>
```
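
Once sample counts are keyed by location, writing a no-LBR file is straightforward. In this sketch, `write_no_lbr` is a hypothetical helper. It assumes every location is an ELF symbol (`<is_sym>` = `1`) with a hex offset, and it writes counts in decimal:

```python
from typing import Dict, Optional, Tuple


def write_no_lbr(samples: Dict[Tuple[str, int], int], event: Optional[str] = None) -> str:
    """Render {(symbol, offset): sample count} as no-LBR fdata text."""
    lines = ["no_lbr" if event is None else f"no_lbr {event}"]
    for (symbol, offset), count in sorted(samples.items()):
        # <is_sym> is 1 because every location here is an ELF symbol.
        lines.append(f"1 {symbol} {offset:x} {count}")
    return "\n".join(lines) + "\n"


print(write_no_lbr({("main", 0x3fb): 12, ("foo", 0x10): 5}, event="cycles"), end="")
```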

### Special headers

- `boltedcollection`: Indicates that the profile was collected on a BOLTed
  binary. Remapping it requires BAT (BOLT Address Translation) tables.

### Memory events format

Memory event types use `<is_sym>` values 3, 4, and 5 to record load address
information alongside the instruction location.

## YAML format

A structured profile format with basic-block granularity. It is more resilient
to binary changes than fdata and supports stale profile matching.

### Schema

Defined in `ProfileYAMLMapping.h`:

```yaml
header:
  profile-version: <uint32>
  binary-name: <string>
  binary-build-id: <string>       # optional
  profile-flags: [lbr|sample|memevent]
  profile-origin: <string>        # optional, how profile was obtained
  profile-events: <string>        # optional, event names
  dfs-order: <bool>               # optional, default true
  hash-func: <std-hash|xxh3>      # optional, default std-hash
functions:
  - name: <string>
    fid: <uint32>
    hash: <hex64>
    exec: <uint64>
    nblocks: <uint32>
    blocks:
      - bid: <uint32>
        insns: <uint32>
        hash: <hex64>             # optional
        exec: <uint64>            # optional
        succ: [{bid, cnt, mis}]   # optional
        calls: [{off, fid, cnt}]  # optional
    inline_tree: [...]            # optional, pseudo probe info
```
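
To make the schema concrete, the sketch below renders a minimal one-function profile as YAML text using only the Python standard library. Field names follow the schema above, but the values (binary name, hash, counts) are invented, `profile-version: 1` is an assumed value, and there is no guarantee that `llvm-bolt` accepts this exact output. Real profiles should come from `perf2bolt`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    bid: int
    insns: int
    exec: int = 0
    succ: List[Tuple[int, int, int]] = field(default_factory=list)  # (bid, cnt, mis)


@dataclass
class Function:
    name: str
    fid: int
    hash: int
    exec: int
    blocks: List[Block]


def render_profile(binary: str, funcs: List[Function]) -> str:
    """Render a minimal YAML profile using the schema's field names."""
    out = [
        "header:",
        "  profile-version: 1",  # assumed version value, for illustration only
        f"  binary-name: '{binary}'",
        "  profile-flags: [ lbr ]",
        "functions:",
    ]
    for fn in funcs:
        out += [
            f"  - name: {fn.name}",
            f"    fid: {fn.fid}",
            f"    hash: 0x{fn.hash:016X}",
            f"    exec: {fn.exec}",
            f"    nblocks: {len(fn.blocks)}",
            "    blocks:",
        ]
        for b in fn.blocks:
            out += [f"      - bid: {b.bid}", f"        insns: {b.insns}"]
            if b.exec:  # exec is optional; omit it for unexecuted blocks
                out.append(f"        exec: {b.exec}")
            if b.succ:  # successor edges as a flow sequence of mappings
                edges = ", ".join(f"{{ bid: {s}, cnt: {c}, mis: {m} }}" for s, c, m in b.succ)
                out.append(f"        succ: [ {edges} ]")
    return "\n".join(out) + "\n"


main = Function("main", fid=0, hash=0x1234, exec=10, blocks=[
    Block(0, insns=5, exec=10, succ=[(1, 7, 0), (2, 3, 1)]),
    Block(1, insns=2, exec=7),
    Block(2, insns=4, exec=3),
])
print(render_profile("a.out", [main]), end="")
```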

### Hash functions

- `std-hash`: Standard hash function (default, kept for backward compatibility).
- `xxh3`: XXH3 hash function (recommended; better distribution).

### Stale profile matching

BOLT can match a profile to a modified binary using block hashes and call
graph matching. When the binary changes between profile collection and
optimization, BOLT uses the hash values to find the corresponding blocks in
the new binary.
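
The core idea of hash-based matching can be shown with a deliberately simplified sketch. A block in the new binary inherits the count of a profiled block with the same hash, and blocks without a match stay unprofiled. BOLT's actual matching is more involved (it also uses call graph information, for instance). `match_blocks` and its inputs are hypothetical:

```python
from typing import Dict, List


def match_blocks(profiled: Dict[int, int], new_hashes: List[int]) -> Dict[int, int]:
    """Map execution counts from a stale profile onto a new function's blocks.

    profiled:   block hash -> execution count from the old profile
    new_hashes: block hashes of the new function, indexed by new block id
    Returns new block id -> count for blocks whose hash was found.
    Simplification: exact hash equality only; duplicate hashes are not resolved.
    """
    matched: Dict[int, int] = {}
    for bid, h in enumerate(new_hashes):
        if h in profiled:
            matched[bid] = profiled[h]
    return matched


old = {0xA1: 100, 0xB2: 60, 0xC3: 40}
new = [0xA1, 0xD4, 0xC3]  # the middle block was edited, so its hash changed
print(match_blocks(old, new))
```

Here the modified middle block gets no count. A real matcher has to do considerably more work to fill such gaps and to handle hashes that occur more than once.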