 ==========
 KernelInfo
 ==========

 .. contents::
    :local:

 Introduction
 ============

 This LLVM IR pass reports statistics about code compiled for GPUs.  The goal
 of these statistics is to help identify bad code patterns and ways to mitigate
 them.  The pass operates at the LLVM IR level so that it can, in theory,
 support any LLVM-based compiler for a programming language that targets GPUs.

 By default, the pass runs at the end of LTO, and options like
 ``-Rpass=kernel-info`` enable its remarks.  Example ``opt`` and ``clang``
 command lines appear in the next section.

 Remarks include summary statistics (e.g., total size of static allocas) and
 individual occurrences (e.g., source location of each alloca).  Examples of the
 output appear in tests in ``llvm/test/Analysis/KernelInfo``.

 Example Command Lines
 =====================

 To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

 .. code-block:: shell

   $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
       -Rpass=kernel-info

 To analyze specified LLVM IR, perhaps previously generated by something like
 ``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

 .. code-block:: shell

   $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
       -pass-remarks=kernel-info -passes=kernel-info
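
 Because the remarks are emitted as compiler diagnostics, it can be convenient
 to capture them in a file for later inspection.  The following is a minimal
 sketch, not part of LLVM: the function name ``save_kernel_info_remarks`` and
 the log file name are hypothetical, and it assumes that ``opt`` writes its
 remarks to standard error.

 .. code-block:: shell

   # Hypothetical helper: run kernel-info on a bitcode file and save the
   # remarks (assumed to arrive on stderr) to a log file.
   save_kernel_info_remarks() {
     bc_file=$1
     log_file=$2
     # Fail early if opt is not installed or the input is missing.
     if ! command -v opt >/dev/null 2>&1; then
       echo "opt not found on PATH" >&2
       return 2
     fi
     if [ ! -f "$bc_file" ]; then
       echo "no such bitcode file: $bc_file" >&2
       return 2
     fi
     opt -disable-output "$bc_file" \
         -pass-remarks=kernel-info -passes=kernel-info 2> "$log_file"
   }

 For example, ``save_kernel_info_remarks
 test-openmp-nvptx64-nvidia-cuda-sm_70.bc kernel-info.txt`` would leave the
 remarks in ``kernel-info.txt``.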

 When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
 runs at the end of LTO by default.  ``-no-kernel-info-end-lto`` disables that
 behavior so you can position ``kernel-info`` explicitly:

 .. code-block:: shell

   $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
       -Rpass=kernel-info \
       -Xoffload-linker --lto-newpm-passes='lto<O2>'

   $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
       -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
       -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'

   $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
       -pass-remarks=kernel-info \
       -passes='lto<O2>'

   $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
       -pass-remarks=kernel-info -no-kernel-info-end-lto \
       -passes='module(kernel-info),lto<O2>'
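
 The two ``opt`` invocations above differ only in whether ``kernel-info`` keeps
 its default position at the end of LTO.  As an illustration, a small wrapper
 can make that choice explicit.  This is a sketch only: the function name
 ``kernel_info_opt_cmd`` and its ``end-lto``/``first`` arguments are
 hypothetical, and it prints the command line rather than running it.

 .. code-block:: shell

   # Hypothetical helper: print an opt command line that runs kernel-info
   # either at its default end-of-LTO position ("end-lto") or at the start of
   # an explicit pipeline ("first").
   kernel_info_opt_cmd() {
     bc_file=$1
     position=${2:-end-lto}
     case $position in
       end-lto)
         # Default behavior: kernel-info runs at the end of LTO.
         printf "opt -disable-output %s -pass-remarks=kernel-info -passes='lto<O2>'\n" "$bc_file"
         ;;
       first)
         # Disable the end-of-LTO run and place kernel-info explicitly.
         printf "opt -disable-output %s -pass-remarks=kernel-info -no-kernel-info-end-lto -passes='module(kernel-info),lto<O2>'\n" "$bc_file"
         ;;
       *)
         echo "unknown position: $position" >&2
         return 1
         ;;
     esac
   }

 Running ``kernel_info_opt_cmd test.bc first`` prints the explicit-position
 form, which can be reviewed before being executed.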