llvm-mca - LLVM Machine Code Analyzer
=====================================
SYNOPSIS
--------
:program:`llvm-mca` [*options*] [input]
DESCRIPTION
-----------
:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.
Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.
The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help diagnose potential performance
issues.
Given an assembly code sequence, llvm-mca estimates the IPC (Instructions Per
Cycle), as well as hardware resource pressure. The analysis and reporting style
were inspired by the IACA tool from Intel.
:program:`llvm-mca` recognizes special code comments that mark the regions of
the assembly code to be analyzed. A comment starting with the substring
``LLVM-MCA-BEGIN`` marks the beginning of a code region, and a comment starting
with the substring ``LLVM-MCA-END`` marks its end. For example:

.. code-block:: none

  # LLVM-MCA-BEGIN My Code Region
  ...
  # LLVM-MCA-END

Multiple regions can be specified provided that they do not overlap. A code
region can have an optional description. If no user-defined region is specified,
then :program:`llvm-mca` assumes a default region which contains every
instruction in the input file. Every region is analyzed in isolation, and the
final performance report is the union of all the reports generated for every
code region.
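As a minimal sketch of the region rules above, the following script writes an
assembly file with two non-overlapping regions and, if :program:`llvm-mca` is on
the ``PATH``, analyzes it. The file name ``regions.s``, the region descriptions,
and the instructions are made up for illustration; the triple and CPU name are
the ones used in the other examples in this document.

```shell
# Write a small x86 (AT&T syntax) assembly file with two non-overlapping regions.
# The region descriptions "add-chain" and "mul" are hypothetical.
cat > regions.s <<'EOF'
# LLVM-MCA-BEGIN add-chain
addl %esi, %edi
addl %edi, %eax
# LLVM-MCA-END
# LLVM-MCA-BEGIN mul
imull %esi, %eax
# LLVM-MCA-END
EOF

# Analyze the file only if llvm-mca is installed.
# Each region is analyzed in isolation and gets its own report.
if command -v llvm-mca >/dev/null 2>&1; then
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 regions.s
fi
```

Removing both pairs of comments from ``regions.s`` would make the tool fall back
to the default region, which covers all three instructions.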
Inline assembly directives may be used from source code to annotate the
assembly text:

.. code-block:: c++

  int foo(int a, int b) {
    __asm volatile("# LLVM-MCA-BEGIN foo");
    a += 42;
    __asm volatile("# LLVM-MCA-END");
    a *= b;
    return a;
  }

For example, you can compile code with clang, emit assembly, and pipe it
directly into :program:`llvm-mca` for analysis:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -S -o - | llvm-mca -mcpu=btver2

Or for Intel syntax:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -mllvm -x86-asm-syntax=intel -S -o - | llvm-mca -mcpu=btver2

OPTIONS
-------
If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.
If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input. If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.
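The input and output rules above can be sketched as follows. The file names
``input.s`` and ``report.txt`` and the single instruction are assumptions made
for this example, and the tool is only invoked if it is found on the ``PATH``.

```shell
# A one-instruction input file; the name and contents are made up for this sketch.
printf 'addl %%esi, %%edi\n' > input.s

if command -v llvm-mca >/dev/null 2>&1; then
    MCA="llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2"

    # Input omitted: read from standard input, report goes to standard output.
    $MCA < input.s

    # Explicit "-" for both the output and the input: same behavior as above.
    $MCA -o - - < input.s

    # Named input file, report written to a named file.
    $MCA -o report.txt input.s
fi
```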
.. option:: -help
Print a summary of command line options.
.. option:: -mtriple=<target triple>
Specify a target triple string.
.. option:: -march=<arch>
Specify the architecture for which to analyze the code. It defaults to the
host default target.
.. option:: -mcpu=<cpuname>
Specify the processor for which to analyze the code. By default, the cpu name
is autodetected from the host.
.. option:: -output-asm-variant=<variant id>
Specify the output assembly variant for the report generated by the tool.
On x86, possible values are [0, 1]. A value of 0 (resp. 1) for this flag enables
the AT&T (resp. Intel) assembly format for the code printed out by the tool in
the analysis report.
.. option:: -dispatch=<width>
Specify a different dispatch width for the processor. The dispatch width
defaults to the 'IssueWidth' field of the processor scheduling model. If width
is zero, then the default dispatch width is used.
.. option:: -register-file-size=<size>
Specify the size of the register file. When specified, this flag limits how
many temporary registers are available for register renaming purposes. A value
of zero for this flag means "unlimited number of temporary registers".
.. option:: -iterations=<number of iterations>
Specify the number of iterations to run. If this flag is set to 0, then the
tool sets the number of iterations to a default value (i.e. 100).
.. option:: -noalias=<bool>
If set, the tool assumes that loads and stores don't alias. This is the
default behavior.
.. option:: -lqueue=<load queue size>
Specify the size of the load queue in the load/store unit emulated by the tool.
By default, the tool assumes an unbounded number of entries in the load queue.
A value of zero for this flag is ignored, and the default load queue size is
used instead.
.. option:: -squeue=<store queue size>
Specify the size of the store queue in the load/store unit emulated by the
tool. By default, the tool assumes an unbounded number of entries in the store
queue. A value of zero for this flag is ignored, and the default store queue
size is used instead.
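The simulation parameters described so far can be combined on one command line.
The sketch below is only an illustration: every numeric value is arbitrary and
is not taken from any real processor model, and ``input.s`` is a made-up file.

```shell
# Arbitrary, illustrative values: narrow the dispatch width, cap the number of
# temporary registers, run more iterations, and bound the load and store queues.
SIM_FLAGS="-dispatch=2 -register-file-size=64 -iterations=300 -lqueue=16 -squeue=8"

# A one-instruction input file, made up for this sketch.
printf 'addl %%esi, %%edi\n' > input.s

# Run the analysis only if llvm-mca is installed.
if command -v llvm-mca >/dev/null 2>&1; then
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 $SIM_FLAGS input.s
fi
```

Passing zero for any of these flags selects the default described under that
option rather than a literal zero.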
.. option:: -timeline
Enable the timeline view.
.. option:: -timeline-max-iterations=<iterations>
Limit the number of iterations to print in the timeline view. By default, the
timeline view prints information for up to 10 iterations.
.. option:: -timeline-max-cycles=<cycles>
Limit the number of cycles in the timeline view. By default, the number of
cycles is set to 80.
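As a sketch of how the three timeline flags fit together, the script below
requests a timeline restricted to 3 iterations and 40 cycles. The file name
``loop.s``, the instruction sequence, and the chosen limits are assumptions
made for this example.

```shell
# A short AVX sequence with a dependency chain (made up for this sketch).
cat > loop.s <<'EOF'
vmulps %xmm0, %xmm1, %xmm2
vhaddps %xmm2, %xmm2, %xmm3
vhaddps %xmm3, %xmm3, %xmm4
EOF

# Enable the timeline view and tighten both of its default limits
# (10 iterations and 80 cycles).
TIMELINE_FLAGS="-timeline -timeline-max-iterations=3 -timeline-max-cycles=40"

# Run the analysis only if llvm-mca is installed.
if command -v llvm-mca >/dev/null 2>&1; then
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 $TIMELINE_FLAGS loop.s
fi
```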
.. option:: -resource-pressure
Enable the resource pressure view. This is enabled by default.
.. option:: -register-file-stats
Enable register file usage statistics.
.. option:: -dispatch-stats
Enable extra dispatch statistics. This view collects and analyzes instruction
dispatch events, as well as static/dynamic dispatch stall events. This view
is disabled by default.
.. option:: -scheduler-stats
Enable extra scheduler statistics. This view collects and analyzes instruction
issue events. This view is disabled by default.
.. option:: -retire-stats
Enable extra retire control unit statistics. This view is disabled by default.
.. option:: -instruction-info
Enable the instruction info view. This is enabled by default.
.. option:: -all-stats
Print all hardware statistics. This enables extra statistics related to the
dispatch logic, the hardware schedulers, the register file(s), and the retire
control unit. This option is disabled by default.
.. option:: -all-views
Enable all the views.
.. option:: -instruction-tables
Print resource pressure information based on the static information
available from the processor model. This differs from the resource pressure
view because it does not require the code to be simulated. It instead prints
the theoretical uniform distribution of resource pressure for every
instruction in sequence.
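One way to see the difference between the simulated resource pressure view and
the static instruction tables is to generate both reports for the same input
and compare them. In this sketch, the file names ``input.s``, ``simulated.txt``
and ``tables.txt`` and the single instruction are hypothetical.

```shell
# A one-instruction input file, made up for this sketch.
printf 'imull %%esi, %%eax\n' > input.s

if command -v llvm-mca >/dev/null 2>&1; then
    MCA="llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2"

    # Default report: resource pressure is computed by simulating the code.
    $MCA -o simulated.txt input.s

    # Static report: resource pressure comes straight from the processor model.
    $MCA -instruction-tables -o tables.txt input.s
fi
```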
EXIT STATUS
-----------
:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.
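A script that drives the tool can branch on this exit status. The helper below
is a minimal sketch; ``run_mca`` is a hypothetical name and is not part of
:program:`llvm-mca`.

```shell
# Hypothetical helper: run the given command and report how it exited.
run_mca() {
    if "$@"; then
        echo "analysis succeeded"
    else
        # In the else branch, $? still holds the exit status of the command.
        status=$?
        echo "analysis failed with exit status ${status}" >&2
        return "${status}"
    fi
}
```

For example, ``run_mca llvm-mca -mcpu=btver2 input.s`` would print the first
message when the tool returns 0 and would otherwise propagate the nonzero status
to the caller.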