 OpenMP Command-Line Argument Reference
 ======================================
 Welcome to the OpenMP in LLVM command-line argument reference. This is not a
 complete list of arguments; it covers the essential command-line arguments you
 may need when compiling and linking OpenMP programs.
 Section :ref:`general_command_line_arguments` lists OpenMP command-line options
 for multicore programming, while :ref:`offload_command_line_arguments` lists
 options relevant to OpenMP target offloading.

 .. _general_command_line_arguments:

 OpenMP Command-Line Arguments
 -----------------------------

 ``-fopenmp``
 ^^^^^^^^^^^^
 Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
 compiler directives and generate parallel code.
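
 As a minimal sketch, the C program below (the file name ``sum.c`` is only
 illustrative) uses a ``parallel for`` loop with a ``reduction`` clause. Built
 with ``clang -fopenmp sum.c``, the loop iterations are shared among threads;
 built without ``-fopenmp``, the pragma is ignored and the loop runs serially
 with the same result.

 .. code-block:: c

    #include <stdio.h>

    /* Sum 0 + 1 + ... + (n - 1). With -fopenmp, iterations are split across
       threads and the per-thread partial sums are combined by the reduction
       clause; without -fopenmp the pragma is ignored and the loop is serial. */
    long sum_below(long n) {
        long total = 0;
        #pragma omp parallel for reduction(+ : total)
        for (long i = 0; i < n; ++i) {
            total += i;
        }
        return total;
    }

    int main(void) {
        /* Build, for example, with: clang -fopenmp sum.c -o sum */
        printf("%ld\n", sum_below(1000)); /* prints 499500 */
        return 0;
    }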

 ``-fopenmp-extensions``
 ^^^^^^^^^^^^^^^^^^^^^^^
 Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
 current extensions and their implementation status can be found on the
 `support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
 page.

 ``-fopenmp-simd``
 ^^^^^^^^^^^^^^^^^
 This option enables OpenMP only for single instruction, multiple data
 (SIMD) constructs; other OpenMP directives are ignored.
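
 A minimal sketch, assuming a file named ``axpy.c``: the ``simd`` directive
 below is processed when compiling with ``clang -fopenmp-simd -O2 axpy.c``, so
 the compiler may vectorize the loop, while no threads are created because only
 SIMD constructs are handled.

 .. code-block:: c

    #include <stdio.h>

    /* y[i] = a * x[i] + y[i]. The simd directive asks the compiler to
       vectorize the loop; it is honored under both -fopenmp and -fopenmp-simd
       and ignored when neither flag is given. */
    void axpy(int n, float a, const float *x, float *y) {
        #pragma omp simd
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }

    int main(void) {
        /* Build, for example, with: clang -fopenmp-simd -O2 axpy.c -o axpy */
        float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float y[4] = {1.0f, 1.0f, 1.0f, 1.0f};
        axpy(4, 2.0f, x, y);
        printf("%.1f %.1f %.1f %.1f\n", y[0], y[1], y[2], y[3]); /* prints 3.0 5.0 7.0 9.0 */
        return 0;
    }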

 ``-static-openmp``
 ^^^^^^^^^^^^^^^^^^
 Use the static OpenMP host runtime while linking.

 ``-fopenmp-version=<arg>``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 Set the OpenMP version to a specific version ``<arg>`` of the OpenMP standard.
 For example, you may use ``-fopenmp-version=45`` to select version 4.5 of
 the OpenMP standard. The default value is ``-fopenmp-version=51`` for ``Clang``.
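
 The OpenMP specification defines the ``_OPENMP`` macro as the ``yyyymm``
 release date of the supported version, which gives a simple way to observe the
 effect of ``-fopenmp-version``. The sketch below (the file name ``version.c``
 and the helper ``openmp_version_name`` are illustrative) prints the macro;
 compiling with ``clang -fopenmp -fopenmp-version=45 version.c`` should report
 ``201511``. The lookup table only covers the versions listed in it.

 .. code-block:: c

    #include <stdio.h>

    /* Hypothetical helper: map an _OPENMP release date (yyyymm) to the
       OpenMP version it denotes; unknown dates map to "unknown". */
    const char *openmp_version_name(long yyyymm) {
        switch (yyyymm) {
        case 201107: return "3.1";
        case 201307: return "4.0";
        case 201511: return "4.5";
        case 201811: return "5.0";
        case 202011: return "5.1";
        case 202111: return "5.2";
        default:     return "unknown";
        }
    }

    int main(void) {
    #ifdef _OPENMP
        /* Defined only when OpenMP is enabled, e.g. with -fopenmp. */
        printf("_OPENMP = %ld (OpenMP %s)\n", (long)_OPENMP,
               openmp_version_name((long)_OPENMP));
    #else
        printf("compiled without OpenMP\n");
    #endif
        return 0;
    }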

 .. _offload_command_line_arguments:

 Offloading Specific Command-Line Arguments
 ------------------------------------------

 .. _fopenmp-targets:

 ``-fopenmp-targets``
 ^^^^^^^^^^^^^^^^^^^^
 | Specify which OpenMP offloading targets should be supported. For example, you
   may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is
   often optional when :ref:`offload_arch` is provided.
 | It is also possible to offload to CPU architectures, for instance with
   ``-fopenmp-targets=x86_64-pc-linux-gnu``.

 .. _offload_arch:

 ``--offload-arch``
 ^^^^^^^^^^^^^^^^^^
 | Specify the device architecture for OpenMP offloading. For instance
   ``--offload-arch=sm_80`` to target an Nvidia Tesla A100,
   ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
   ``--offload-arch=sm_80,gfx90a`` to target both.
 | It is also possible to specify :ref:`fopenmp-targets` without specifying
   ``--offload-arch``. In that case, the executables ``amdgpu-arch`` or
   ``nvptx-arch`` will be executed as part of the compiler driver to
   detect the device architecture automatically.
 | Finally, the device architecture will also be automatically inferred with
   ``--offload-arch=native``.
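
 The following sketch (the file name ``saxpy.c`` is illustrative) contains a
 ``target`` region that these options compile for a device, for instance with
 ``clang -fopenmp --offload-arch=sm_80 saxpy.c``. If no suitable device is
 available at runtime, the host fallback version of the region runs by
 default; without ``-fopenmp``, the pragma is ignored and the loop runs
 serially on the host.

 .. code-block:: c

    #include <stdio.h>

    #define N 1024

    /* y = a * x + y, executed on an offloading device when one is available.
       map(to:) copies x to the device; map(tofrom:) copies y there and back. */
    void saxpy_target(int n, float a, const float *x, float *y) {
        #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }

    int main(void) {
        /* Build, for example, with:
             clang -fopenmp --offload-arch=sm_80 saxpy.c -o saxpy
             clang -fopenmp --offload-arch=native saxpy.c -o saxpy */
        static float x[N], y[N];
        for (int i = 0; i < N; ++i) {
            x[i] = (float)i;
            y[i] = 1.0f;
        }
        saxpy_target(N, 2.0f, x, y);
        printf("y[10] = %.1f\n", y[10]); /* prints y[10] = 21.0 */
        return 0;
    }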

 ``--offload-device-only``
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 Compile only the code that goes on the device. This option is mainly useful
 for debugging, in particular for inspecting the intermediate representation
 (IR) produced when compiling for the device. It may also be used when building
 device-only runtimes.

 ``--offload-host-only``
 ^^^^^^^^^^^^^^^^^^^^^^^
 Compile only the code that goes on the host. With this option enabled, the
 ``.llvm.offloading`` section with embedded device code will not be included in
 the intermediate representation.

 ``--offload-host-device``
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 Compile the target regions for both the host and the device. This is the
 default behavior.

 ``-Xopenmp-target <arg>``
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 Pass an argument ``<arg>`` to the offloading toolchain, for instance
 ``-Xopenmp-target -march=sm_80``.

 ``-Xopenmp-target=<triple> <arg>``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Pass an argument ``<arg>`` to the offloading toolchain for the target
 ``<triple>``. This is especially useful when an argument must differ for each
 triple, for instance ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
 -Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify a different device
 architecture for each target. Alternatively, :ref:`Xarch_host` and
 :ref:`Xarch_device` pass an argument to the host and device compilation
 toolchains, respectively.

 ``-Xoffload-linker<triple> <arg>``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
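 Pass an argument ``<arg>`` to the offloading linker for the target specified in
 ``<triple>``. The ``<triple>`` part is optional; when present, it is appended
 with a leading dash, as in ``-Xoffload-linker-nvptx64-nvidia-cuda <arg>``.
 Without it, ``<arg>`` is passed to all offloading linkers.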

 .. _Xarch_device:

 ``-Xarch_device <arg>``
 ^^^^^^^^^^^^^^^^^^^^^^^
 Pass an argument ``<arg>`` to the device compilation toolchain.

 .. _Xarch_host:

 ``-Xarch_host <arg>``
 ^^^^^^^^^^^^^^^^^^^^^
 Pass an argument ``<arg>`` to the host compilation toolchain.

 ``-foffload-lto[=<arg>]``
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 Enable device link-time optimization (LTO) and select the LTO mode ``<arg>``,
 either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO compiles
 faster while still achieving some performance gains. Without an argument, this
 option defaults to ``-foffload-lto=full``.

 ``-fopenmp-offload-mandatory``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | This option suppresses the generation of the host fallback code that is
   executed when offloading to the device fails. This is helpful when a target
   region contains code that cannot be compiled for the host, for instance
   unguarded device intrinsics.
 | This option can also be used to reduce compile time.
 | This option should not be used to verify that the code is offloaded to the
   device. Instead, set the environment variable
   ``OMP_TARGET_OFFLOAD='MANDATORY'``, which makes the program stop with an
   error instead of silently falling back to the host.
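
 To check programmatically where a target region ran, the standard OpenMP
 runtime routine ``omp_is_initial_device()`` can be called inside it, as in the
 sketch below (the helper name ``target_ran_on_host`` is hypothetical). Running
 the program with ``OMP_TARGET_OFFLOAD=MANDATORY`` then turns a failed offload
 into an error rather than a silent host fallback.

 .. code-block:: c

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    /* Hypothetical helper: return 1 if the target region executed on the host
       (including when compiled without OpenMP), 0 if it ran on a device. */
    int target_ran_on_host(void) {
        int on_host = 1;
    #ifdef _OPENMP
        #pragma omp target map(from: on_host)
        on_host = omp_is_initial_device();
    #endif
        return on_host;
    }

    int main(void) {
        /* Run, for example, with: OMP_TARGET_OFFLOAD=MANDATORY ./a.out
           With MANDATORY, a target region that cannot be offloaded is a
           runtime error instead of a fallback to the host. */
        printf("target region ran on the %s\n",
               target_ran_on_host() ? "host" : "device");
        return 0;
    }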

 ``-fopenmp-target-debug[=<arg>]``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Enable debugging in the device runtime library (RTL). Debugging must be both
 configured in the device runtime at compile time with
 ``-fopenmp-target-debug=<arg>`` and enabled at runtime with the environment
 variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. As of July 2023, this is only
 supported for Nvidia targets. Alternatively, the environment variable
 ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and AMD GPU targets. For
 more information, including the supported debugging arguments, see the
 `debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_.

 ``-fopenmp-target-jit``
 ^^^^^^^^^^^^^^^^^^^^^^^
 | Enable Just-in-Time (JIT) compilation for OpenMP offloading. Instead of
   binary code for the respective target, LLVM-IR for the device code is
   embedded in the object files. At runtime, the LLVM-IR is optimized again and
   compiled for the target device. The optimization level can be set at runtime
   with ``LIBOMPTARGET_JIT_OPT_LEVEL``; for instance,
   ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponds to optimization level ``-O3``.
   See the
   `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
   for instructions on extracting the embedded device code before or after the
   JIT and more.
 | JIT compilation for OpenMP offloading is also useful for debugging, since the
   device IR can be extracted, modified, and injected at runtime.

 ``--offload-new-driver``
 ^^^^^^^^^^^^^^^^^^^^^^^^
 OpenMP in upstream LLVM always uses the new offloading driver. However, this
 option is still required for experimental linking with CUDA or HIP files.

 ``--offload-link``
 ^^^^^^^^^^^^^^^^^^
 Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
 ``clang-linker-wrapper`` is the default offloading linker for OpenMP; this
 option enables it in toolchains that do not use it automatically. It is
 required when linking with CUDA or HIP files.

 ``-nogpulib``
 ^^^^^^^^^^^^^
 Do not link the device library for CUDA or HIP device compilation.

 ``-nogpuinc``
 ^^^^^^^^^^^^^
 Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
 include paths.