BOLT Profile Formats

BOLT accepts profile data in several formats. This document describes each format, how to generate it, and how BOLT consumes it.

The recommended workflow is to use perf2bolt to convert an unsymbolized profile (perf.data or pre-aggregated) into a symbolized one (fdata or YAML):

$ perf2bolt executable \
# perf.data is consumed directly:
  -p perf.data
# OR pre-aggregated requires `--pa` switch:
  -p preagg --pa
# fdata is the default output format, YAML is optionally emitted using `-w` flag:
  -o perf.fdata [-w perf.yaml]
# the output format for `-o` can be switched with `--profile-format`:
  -o perf.yaml --profile-format=yaml
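
When the conversion is scripted, the alternatives above can be assembled into a single argument list. The following is a minimal sketch, assuming only the switches shown in this section; the helper name `perf2bolt_argv` and its parameters are hypothetical and not part of BOLT.

```python
from typing import List, Optional


def perf2bolt_argv(
    executable: str,
    profile: str,
    output: str,
    *,
    pre_aggregated: bool = False,
    yaml_copy: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Build a perf2bolt command line (hypothetical helper, not a BOLT API)."""
    argv = ["perf2bolt", executable, "-p", profile]
    if pre_aggregated:
        # Pre-aggregated input requires the --pa switch.
        argv.append("--pa")
    argv += ["-o", output]
    if yaml_copy is not None:
        # -w additionally emits a YAML profile next to the -o output.
        argv += ["-w", yaml_copy]
    if output_format is not None:
        # --profile-format switches the format written to -o.
        argv.append(f"--profile-format={output_format}")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where perf2bolt is installed.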

Unsymbolized profiles

Sample or trace profiles without symbol information. perf2bolt accepts them and converts them into the symbolized profile formats used by llvm-bolt.

Linux perf data

Collection

Example collecting branch stacks (brstack):

perf record -j any,u -e cycles:u -o perf.data -- ./binary

Consumption modes

  • Branch samples (default): Branch stack samples from capable hardware (Intel LBR, AMD LBRv2/BRS, ARM BRBE). Used by default with perf2bolt and llvm-bolt -p perf.data.
  • Basic aggregation (-ba): Sample-based profile without branch stacks. Lower quality but works on hardware/VMs without branch sampling support.
  • Tracing (--itrace): Synthesizes branch stacks from a trace profile (Intel PT, ARM ETM). Requires a value (e.g. i10usl); see the perf documentation for details.
  • ARM SPE (--spe): Statistical Profiling Extension on supported ARM platforms providing short (1-deep) branch stacks.

Build-id verification

BOLT verifies that the build-id in perf.data matches the input binary. Use --ignore-build-id to skip this check.

Pre-aggregated format

A pre-aggregated profile is consumed directly by perf2bolt --pa or llvm-bolt --pa. The format lets external tools generate BOLT-compatible profiles without going through perf.data.

Entry types

E <event>
S <start> <count>
[TR] <branch> <ft_start> <ft_end> <count>
B <start> <end> <count> <mispred_count>
[Ff] <start> <end> <count>
r <start> <end> <count>

Where:

  • E — Name of the sampling event used for subsequent entries.
  • S — Aggregated basic sample at <start>.
  • T — Aggregated trace: branch from <branch> to <ft_start> with a fall-through to <ft_end>.
  • R — Aggregated trace originating at a return.
  • B — Aggregated branch from <start> to <end>.
  • F — Aggregated fall-through from <start> to <end>.
  • f — Aggregated fall-through with external origin (disambiguates returns hitting a basic block head from regular internal jumps).
  • r — Aggregated fall-through originating at an external return (no checks performed for fall-through start).

Location format

Locations have the format [<buildid>:]<offset>, or X:<addr> for external addresses:

  • <offset> — Hex offset from the object base load address.
  • <buildid>:<offset> — Offset within the object identified by <buildid>.
  • X:<addr> — External address (outside the profiled binary).

Examples

Basic samples profile:

E cycles
S 41be50 3
E br_inst_retired.near_taken
S 41be60 6

Trace profile combining branches and fall-throughs:

T 4b196f 4b19e0 4b19ef 2

Legacy branch profile with separate branches and fall-throughs:

F 41be50 41be50 3
F 41be90 41be90 4
B 4b1942 39b57f0 3 0
B 4b196f 4b19e0 2 0
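
The entry types and location format above can be read with a small line-oriented parser. This is a sketch under stated assumptions, not BOLT's own reader: offsets are taken to be hexadecimal as described, counts are assumed to be decimal, and the names `Location`, `Entry`, `parse_location`, and `parse_preagg` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Number of location fields that follow each entry type letter.
LOCATION_COUNT = {"S": 1, "T": 3, "R": 3, "B": 2, "F": 2, "f": 2, "r": 2}


@dataclass(frozen=True)
class Location:
    offset: int                     # hex offset (or address when external)
    build_id: Optional[str] = None  # set for <buildid>:<offset>
    external: bool = False          # set for X:<addr>


@dataclass(frozen=True)
class Entry:
    kind: str
    locations: Tuple[Location, ...]
    count: int
    mispreds: int = 0               # only present on B entries
    event: Optional[str] = None     # most recent E entry, if any


def parse_location(text: str) -> Location:
    prefix, sep, rest = text.rpartition(":")
    if not sep:
        return Location(int(text, 16))
    if prefix == "X":
        return Location(int(rest, 16), external=True)
    return Location(int(rest, 16), build_id=prefix)


def parse_preagg(text: str) -> List[Entry]:
    entries: List[Entry] = []
    event: Optional[str] = None
    for line_no, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields:
            continue
        kind, args = fields[0], fields[1:]
        if kind == "E":
            if len(args) != 1:
                raise ValueError(f"line {line_no}: E expects one event name")
            event = args[0]  # applies to subsequent entries
            continue
        if kind not in LOCATION_COUNT:
            raise ValueError(f"line {line_no}: unknown entry type {kind!r}")
        nloc = LOCATION_COUNT[kind]
        ncounts = 2 if kind == "B" else 1
        if len(args) != nloc + ncounts:
            raise ValueError(f"line {line_no}: wrong field count for {kind}")
        locations = tuple(parse_location(a) for a in args[:nloc])
        count = int(args[nloc])  # assumption: counts are decimal
        mispreds = int(args[nloc + 1]) if kind == "B" else 0
        entries.append(Entry(kind, locations, count, mispreds, event))
    return entries
```

Attaching the current event name to each entry keeps the E lines out of the result while preserving which event a sample belongs to.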

Generation

Pre-aggregated profiles can be generated by external tools. See ebpf-bolt for a reference implementation using eBPF-based collection.
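
As an illustration of what such a tool has to emit, the sketch below aggregates raw branch samples into B entries. It is not taken from ebpf-bolt; the function name `aggregate_branches` and the input tuple layout are hypothetical, and it assumes that <count> is the total number of times the branch was seen and <mispred_count> is the mispredicted subset.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aggregate_branches(samples: Iterable[Tuple[int, int, bool]]) -> List[str]:
    """Turn (from_offset, to_offset, mispredicted) samples into B lines."""
    counts: Counter = Counter()
    mispreds: Counter = Counter()
    for src, dst, mispredicted in samples:
        counts[(src, dst)] += 1
        if mispredicted:
            mispreds[(src, dst)] += 1
    # Offsets are written in hex without a 0x prefix, as in the examples above.
    return [
        f"B {src:x} {dst:x} {counts[(src, dst)]} {mispreds[(src, dst)]}"
        for src, dst in sorted(counts)
    ]
```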

Symbolized profiles

Symbolized profiles are the formats accepted by llvm-bolt. fdata is the legacy format; YAML is the richer, metadata-enabled format.

fdata format

A plain-text, space-separated branch profile format written by perf2bolt and consumed by llvm-bolt -data <file>. BOLT instrumentation also produces it.

LBR mode format

Each line records a branch:

<is_sym_from> <sym_from> <off_from> <is_sym_to> <sym_to> <off_to> <mispreds> <branches>

Where:

  • <is_sym_from>, <is_sym_to>: 1 if the name is an ELF symbol, 0 if it is a DSO name. Special values: 2 for local symbols (includes filename), 3/4/5 for memory events.
  • <sym_from>, <sym_to>: Symbol name or DSO name.
  • <off_from>, <off_to>: Hex offset relative to the symbol/DSO.
  • <mispreds>: Number of branch mispredictions.
  • <branches>: Total number of branches.

Example:

1 main 3fb 0 /lib/ld-2.21.so 12 4 221
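
A line in this format can be split into its eight fields as follows. This is a minimal sketch that assumes names contain no whitespace and that the two counters are decimal; `BranchRecord` and `parse_fdata_branch` are hypothetical names, not part of BOLT.

```python
from typing import NamedTuple


class BranchRecord(NamedTuple):
    from_kind: int    # <is_sym_from>
    from_name: str    # <sym_from>
    from_offset: int  # <off_from>, parsed from hex
    to_kind: int      # <is_sym_to>
    to_name: str      # <sym_to>
    to_offset: int    # <off_to>, parsed from hex
    mispreds: int
    branches: int


def parse_fdata_branch(line: str) -> BranchRecord:
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    return BranchRecord(
        int(fields[0]), fields[1], int(fields[2], 16),
        int(fields[3]), fields[4], int(fields[5], 16),
        int(fields[6]), int(fields[7]),
    )
```

For the example line above, the record describes a branch from main+0x3fb to offset 0x12 in /lib/ld-2.21.so, taken 221 times with 4 mispredictions.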

No-LBR mode format

Requires a no_lbr header line, optionally followed by an event name; sample entries follow, one per line:

no_lbr <event_name>
<is_sym> <sym> <off> <count>
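
The layout can be produced with a few lines of code. The sketch below assumes that <off> is written in hex, as in LBR mode, and that <count> is decimal; the function name `format_no_lbr` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def format_no_lbr(
    samples: Iterable[Tuple[int, str, int, int]],
    event_name: Optional[str] = None,
) -> str:
    """Render (is_sym, sym, off, count) tuples as a no-LBR fdata profile."""
    header = "no_lbr" if event_name is None else f"no_lbr {event_name}"
    lines = [header]
    for is_sym, sym, off, count in samples:
        # Assumption: the offset is hex without a 0x prefix, as in LBR mode.
        lines.append(f"{is_sym} {sym} {off:x} {count}")
    return "\n".join(lines) + "\n"
```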

Special headers

  • boltedcollection: Indicates that the profile was collected on a BOLTed binary. Requires BAT (BOLT Address Translation) tables for remapping.

Memory events format

Memory event types use <is_sym> values 3, 4, 5 to record load address information alongside the instruction location.

YAML format

A structured profile format with block-level granularity. It is more resilient to binary changes and supports stale profile matching.

Schema

Defined in ProfileYAMLMapping.h:

header:
  profile-version: <uint32>
  binary-name: <string>
  binary-build-id: <string>        # optional
  profile-flags: [lbr|sample|memevent]
  profile-origin: <string>         # optional, how profile was obtained
  profile-events: <string>         # optional, event names
  dfs-order: <bool>                # optional, default true
  hash-func: <std-hash|xxh3>      # optional, default std-hash
functions:
  - name: <string>
    fid: <uint32>
    hash: <hex64>
    exec: <uint64>
    nblocks: <uint32>
    blocks:
      - bid: <uint32>
        insns: <uint32>
        hash: <hex64>              # optional
        exec: <uint64>             # optional
        succ: [{bid, cnt, mis}]    # optional
        calls: [{off, fid, cnt}]   # optional
    inline_tree: [...]             # optional, pseudo probe info
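
To show how the schema nests, the sketch below walks a profile that has already been loaded into plain dictionaries and lists (for example by a YAML library, which is not shown) and totals the succ edge counters per function. The function name `edge_totals` is hypothetical, and missing optional fields are treated as empty or zero.

```python
from typing import Any, Dict, Tuple


def edge_totals(profile: Dict[str, Any]) -> Dict[str, Tuple[int, int]]:
    """Map each function name to (sum of cnt, sum of mis) over its succ edges."""
    totals: Dict[str, Tuple[int, int]] = {}
    for func in profile.get("functions", []):
        taken = mispredicted = 0
        for block in func.get("blocks", []):
            for edge in block.get("succ", []):  # succ is optional
                taken += edge.get("cnt", 0)
                mispredicted += edge.get("mis", 0)
        totals[func["name"]] = (taken, mispredicted)
    return totals
```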

Hash functions

  • std-hash: Standard hash function (default for backward compatibility).
  • xxh3: XXH3 hash function (recommended, better distribution).

Stale profile matching

BOLT supports matching profiles to modified binaries using block hashes and call graph matching. When the binary changes between profile collection and optimization, BOLT uses the hash values to find corresponding blocks in the new binary.
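
The idea of hash-based matching can be illustrated with a toy example. This is a deliberately simplified sketch and not BOLT's algorithm: it only transfers a block's exec count when its hash identifies exactly one block in the new binary, and it ignores call graph matching entirely. The function name `match_blocks_by_hash` and the input shapes are hypothetical.

```python
from typing import Dict, List


def match_blocks_by_hash(
    profile_blocks: List[dict],
    binary_hashes: Dict[int, int],
) -> Dict[int, int]:
    """Map new-binary block ids to exec counts from the stale profile.

    profile_blocks: entries with "hash" and "exec" keys, as in the YAML schema.
    binary_hashes:  block id -> block hash for the function in the new binary.
    """
    # Index the new binary's blocks by hash.
    by_hash: Dict[int, List[int]] = {}
    for bid, block_hash in binary_hashes.items():
        by_hash.setdefault(block_hash, []).append(bid)

    counts: Dict[int, int] = {}
    for block in profile_blocks:
        candidates = by_hash.get(block.get("hash"), [])
        if len(candidates) == 1:  # skip unmatched or ambiguous hashes
            bid = candidates[0]
            counts[bid] = counts.get(bid, 0) + block.get("exec", 0)
    return counts
```

In this toy version, a block whose index changed between builds still receives its count as long as its hash is unchanged and unique within the function.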