| OpenMP Command-Line Argument Reference |
| ====================================== |
| Welcome to the OpenMP in LLVM command line argument reference. The content is |
| not a complete list of arguments but includes the essential command-line |
| arguments you may need when compiling and linking OpenMP. |
| Section :ref:`general_command_line_arguments` lists OpenMP command line options |
| for multicore programming while :ref:`offload_command_line_arguments` lists |
| options relevant to OpenMP target offloading. |
| |
| .. _general_command_line_arguments: |
| |
| OpenMP Command-Line Arguments |
| ----------------------------- |
| |
| ``-fopenmp`` |
| ^^^^^^^^^^^^ |
| Enable the OpenMP compilation toolchain. The compiler will parse OpenMP |
| compiler directives and generate parallel code. |
| |
| ``-fopenmp-extensions`` |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of |
| current extensions and their implementation status can be found on the |
| `support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_ |
| page. |
| |
| ``-fopenmp-simd`` |
| ^^^^^^^^^^^^^^^^^ |
| This option enables OpenMP only for single instruction, multiple data |
| (SIMD) constructs. |
| |
| ``-static-openmp`` |
| ^^^^^^^^^^^^^^^^^^ |
| Use the static OpenMP host runtime while linking. |
| |
| ``-fopenmp-version=<arg>`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Set the OpenMP version to a specific version ``<arg>`` of the OpenMP standard. |
| For example, you may use ``-fopenmp-version=45`` to select version 4.5 of |
| the OpenMP standard. The default value is ``-fopenmp-version=51`` for ``Clang``. |
| |
| .. _offload_command_line_arguments: |
| |
| Offloading Specific Command-Line Arguments |
| ------------------------------------------ |
| |
| .. _fopenmp-targets: |
| |
| ``-fopenmp-targets`` |
| ^^^^^^^^^^^^^^^^^^^^ |
| | Specify which OpenMP offloading targets should be supported. For example, you |
| may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is |
| often optional when :ref:`offload_arch` is provided. |
| | It is also possible to offload to CPU architectures, for instance with |
| ``-fopenmp-targets=x86_64-pc-linux-gnu``. |
| |
| .. _offload_arch: |
| |
| ``--offload-arch`` |
| ^^^^^^^^^^^^^^^^^^ |
| | Specify the device architecture for OpenMP offloading. For instance |
| ``--offload-arch=sm_80`` to target an Nvidia Tesla A100, |
| ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or |
| ``--offload-arch=sm_80,gfx90a`` to target both. |
| | It is also possible to specify :ref:`fopenmp-targets` without specifying |
| ``--offload-arch``. In that case, the executables ``amdgpu-arch`` or |
| ``nvptx-arch`` will be executed as part of the compiler driver to |
| detect the device architecture automatically. |
| | Finally, the device architecture will also be automatically inferred with |
| ``--offload-arch=native``. |
| |
| ``--offload-device-only`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Compile only the code that goes on the device. This option is mainly for |
| debugging purposes. It is primarily used for inspecting the intermediate |
| representation (IR) output when compiling for the device. It may also be used |
| if device-only runtimes are created. |
| |
| ``--offload-host-only`` |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| Compile only the code that goes on the host. With this option enabled, the |
| ``.llvm.offloading`` section with embedded device code will not be included in |
| the intermediate representation. |
| |
| ``--offload-host-device`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Compile the target regions for both the host and the device. That is the |
| default option. |
| |
| ``-Xopenmp-target <arg>`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Pass an argument ``<arg>`` to the offloading toolchain, for instance |
| ``-Xopenmp-target -march=sm_80``. |
| |
| ``-Xopenmp-target=<triple> <arg>`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Pass an argument ``<arg>`` to the offloading toolchain for the target |
| ``<triple>``. That is especially useful when an argument must differ for each |
| triple. For instance ``-Xopenmp-target=nvptx64 --offload-arch=sm_80 |
| -Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device |
| architecture. Alternatively, :ref:`Xarch_host` and :ref:`Xarch_device` can |
| pass an argument to the host and device compilation toolchain. |
| |
| ``-Xoffload-linker<triple> <arg>`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Pass an argument ``<arg>`` to the offloading linker for the target specified in |
| ``<triple>``. |
| |
| .. _Xarch_device: |
| |
| ``-Xarch_device <arg>`` |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| Pass an argument ``<arg>`` to the device compilation toolchain. |
| |
| .. _Xarch_host: |
| |
| ``-Xarch_host <arg>`` |
| ^^^^^^^^^^^^^^^^^^^^^ |
| Pass an argument ``<arg>`` to the host compilation toolchain. |
| |
| ``-foffload-lto[=<arg>]`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Enable device link time optimization (LTO) and select the LTO mode ``<arg>``. |
| Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes |
| less time while still achieving some performance gains. If no argument is set, |
| this option defaults to ``-foffload-lto=full``. |
| |
| ``-fopenmp-offload-mandatory`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| | This option is set to avoid generating the host fallback code |
| executed when offloading to the device fails. That is |
| helpful when the target contains code that cannot be compiled for the host, for |
| instance, if it contains unguarded device intrinsics. |
| | This option can also be used to reduce compile time. |
| | This option should not be used when one wants to verify that the code is being |
| offloaded to the device. Instead, set the environment variable |
| ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being offloaded to |
| the device. |
| |
| ``-fopenmp-target-debug[=<arg>]`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Enable debugging in the device runtime library (RTL). Note that it is both |
| necessary to configure the debugging in the device runtime at compile-time with |
| ``-fopenmp-target-debug=<arg>`` and enable debugging at runtime with the |
| environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. Further, it is |
| currently only supported for Nvidia targets as of July 2023. Alternatively, the |
| environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and |
| AMD GPU targets. For more information, see the |
| `debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_. |
| The debugging instructions list the supported debugging arguments. |
| |
| ``-fopenmp-target-jit`` |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| | Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. Embed |
| LLVM-IR for the device code in the object files rather than binary code for the |
| respective target. At runtime, the LLVM-IR is optimized again and compiled for |
| the target device. The optimization level can be set at runtime with |
| ``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance, |
| ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponding to optimizations level ``-O3``. |
| See the |
| `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_ |
| for instructions on extracting the embedded device code before or after the |
| JIT and more. |
| | We want to emphasize that JIT for OpenMP offloading is good for debugging as |
| the target IR can be extracted, modified, and injected at runtime. |
| |
| ``--offload-new-driver`` |
| ^^^^^^^^^^^^^^^^^^^^^^^^ |
| In upstream LLVM, OpenMP only uses the new driver. However, enabling this |
| option for experimental linking with CUDA or HIP files is necessary. |
| |
| ``--offload-link`` |
| ^^^^^^^^^^^^^^^^^^ |
| Use the new offloading linker `clang-linker-wrapper` to perform the link job. |
| `clang-linker-wrapper` is the default offloading linker for OpenMP. This option |
| can be used to use the new offloading linker in toolchains that do not automatically |
| use it. It is necessary to enable this option when linking with CUDA or HIP files. |
| |
| ``-nogpulib`` |
| ^^^^^^^^^^^^^ |
| Do not link the device library for CUDA or HIP device compilation. |
| |
| ``-nogpuinc`` |
| ^^^^^^^^^^^^^ |
| Do not include the default CUDA or HIP headers, and do not add CUDA or HIP |
| include paths. |