| .. raw:: html |
| |
| <style type="text/css"> |
| .none { background-color: #FFCCCC } |
| .part { background-color: #FFFF99 } |
| .good { background-color: #CCFF99 } |
| </style> |
| |
| .. role:: none |
| .. role:: part |
| .. role:: good |
| |
| .. contents:: |
| :local: |
| |
| ================== |
| OpenCL Support |
| ================== |
| |
| Clang has complete support of OpenCL C versions from 1.0 to 2.0. |
| |
| Clang also supports :ref:`the C++ for OpenCL kernel language <cxx_for_opencl_impl>`. |
| |
| There is an ongoing work to support :ref:`OpenCL 3.0 <opencl_300>`. |
| |
| There are also other :ref:`new and experimental features <opencl_experimenal>` available. |
| |
| For general issues and bugs with OpenCL in clang refer to `Bugzilla |
| <https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=172679&product=clang&resolution=--->`__. |
| |
| Internals Manual |
| ================ |
| |
| This section acts as internal documentation for OpenCL features design |
| as well as some important implementation aspects. It is primarily targeted |
| at the advanced users and the toolchain developers integrating frontend |
| functionality as a component. |
| |
| OpenCL Metadata |
| --------------- |
| |
| Clang uses metadata to provide additional OpenCL semantics in IR needed for |
| backends and OpenCL runtime. |
| |
| Each kernel will have function metadata attached to it, specifying the arguments. |
| Kernel argument metadata is used to provide source level information for querying |
| at runtime, for example using the `clGetKernelArgInfo |
| <https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#167>`_ |
| call. |
| |
| Note that ``-cl-kernel-arg-info`` enables more information about the original |
| kernel code to be added e.g. kernel parameter names will appear in the OpenCL |
| metadata along with other information. |
| |
| The IDs used to encode the OpenCL's logical address spaces in the argument info |
| metadata follows the SPIR address space mapping as defined in the SPIR |
| specification `section 2.2 |
| <https://www.khronos.org/registry/spir/specs/spir_spec-2.0.pdf#18>`_ |
| |
| OpenCL Specific Options |
| ----------------------- |
| |
| In addition to the options described in :doc:`UsersManual` there are the |
| following options specific to the OpenCL frontend. |
| |
| All the options in this section are frontend-only and therefore if used |
| with regular clang driver they require frontend forwarding, e.g. ``-cc1`` |
| or ``-Xclang``. |
| |
| .. _opencl_cl_ext: |
| |
| .. option:: -cl-ext |
| |
| Disables support of OpenCL extensions. All OpenCL targets provide a list |
| of extensions that they support. Clang allows to amend this using the ``-cl-ext`` |
| flag with a comma-separated list of extensions prefixed with ``'+'`` or ``'-'``. |
| The syntax: ``-cl-ext=<(['-'|'+']<extension>[,])+>``, where extensions |
| can be either one of `the OpenCL published extensions |
| <https://www.khronos.org/registry/OpenCL>`_ |
| or any vendor extension. Alternatively, ``'all'`` can be used to enable |
| or disable all known extensions. |
| |
| Example disabling double support for the 64-bit SPIR target: |
| |
| .. code-block:: console |
| |
| $ clang -cc1 -triple spir64-unknown-unknown -cl-ext=-cl_khr_fp64 test.cl |
| |
| Enabling all extensions except double support in R600 AMD GPU can be done using: |
| |
| .. code-block:: console |
| |
| $ clang -cc1 -triple r600-unknown-unknown -cl-ext=-all,+cl_khr_fp16 test.cl |
| |
| .. _opencl_finclude_default_header: |
| |
| .. option:: -finclude-default-header |
| |
| Adds most of builtin types and function declarations during compilations. By |
| default the OpenCL headers are not loaded by the frontend and therefore certain |
| builtin types and most of builtin functions are not declared. To load them |
| automatically this flag can be passed to the frontend (see also :ref:`the |
| section on the OpenCL Header <opencl_header>`): |
| |
| .. code-block:: console |
| |
| $ clang -Xclang -finclude-default-header test.cl |
| |
| Alternatively the internal header `opencl-c.h` containing the declarations |
| can be included manually using ``-include`` or ``-I`` followed by the path |
| to the header location. The header can be found in the clang source tree or |
| installation directory. |
| |
| .. code-block:: console |
| |
| $ clang -I<path to clang sources>/lib/Headers/opencl-c.h test.cl |
| $ clang -I<path to clang installation>/lib/clang/<llvm version>/include/opencl-c.h/opencl-c.h test.cl |
| |
| In this example it is assumed that the kernel code contains |
| ``#include <opencl-c.h>`` just as a regular C include. |
| |
| Because the header is very large and long to parse, PCH (:doc:`PCHInternals`) |
| and modules (:doc:`Modules`) can be used internally to improve the compilation |
| speed. |
| |
| To enable modules for OpenCL: |
| |
| .. code-block:: console |
| |
| $ clang -target spir-unknown-unknown -c -emit-llvm -Xclang -finclude-default-header -fmodules -fimplicit-module-maps -fm odules-cache-path=<path to the generated module> test.cl |
| |
| Another way to circumvent long parsing latency for the OpenCL builtin |
| declarations is to use mechanism enabled by :ref:`-fdeclare-opencl-builtins |
| <opencl_fdeclare_opencl_builtins>` flag that is available as an alternative |
| feature. |
| |
| .. _opencl_fdeclare_opencl_builtins: |
| |
| .. option:: -fdeclare-opencl-builtins |
| |
| In addition to regular header includes with builtin types and functions using |
| :ref:`-finclude-default-header <opencl_finclude_default_header>`, clang |
| supports a fast mechanism to declare builtin functions with |
| ``-fdeclare-opencl-builtins``. This does not declare the builtin types and |
| therefore it has to be used in combination with ``-finclude-default-header`` |
| if full functionality is required. |
| |
| **Example of Use**: |
| |
| .. code-block:: console |
| |
| $ clang -Xclang -fdeclare-opencl-builtins test.cl |
| |
| .. _opencl_fake_address_space_map: |
| |
| .. option:: -ffake-address-space-map |
| |
| Overrides the target address space map with a fake map. |
| This allows adding explicit address space IDs to the bitcode for non-segmented |
| memory architectures that do not have separate IDs for each of the OpenCL |
| logical address spaces by default. Passing ``-ffake-address-space-map`` will |
| add/override address spaces of the target compiled for with the following values: |
| ``1-global``, ``2-constant``, ``3-local``, ``4-generic``. The private address |
| space is represented by the absence of an address space attribute in the IR (see |
| also :ref:`the section on the address space attribute <opencl_addrsp>`). |
| |
| .. code-block:: console |
| |
| $ clang -cc1 -ffake-address-space-map test.cl |
| |
| .. _opencl_builtins: |
| |
| OpenCL builtins |
| --------------- |
| |
| **Clang builtins** |
| |
| There are some standard OpenCL functions that are implemented as Clang builtins: |
| |
| - All pipe functions from `section 6.13.16.2/6.13.16.3 |
| <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#160>`_ of |
| the OpenCL v2.0 kernel language specification. |
| |
| - Address space qualifier conversion functions ``to_global``/``to_local``/``to_private`` |
| from `section 6.13.9 |
| <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#101>`_. |
| |
| - All the ``enqueue_kernel`` functions from `section 6.13.17.1 |
| <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#164>`_ and |
| enqueue query functions from `section 6.13.17.5 |
| <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#171>`_. |
| |
| **Fast builtin function declarations** |
| |
| The implementation of the fast builtin function declarations (available via the |
| :ref:`-fdeclare-opencl-builtins option <opencl_fdeclare_opencl_builtins>`) consists |
| of the following main components: |
| |
| - A TableGen definitions file ``OpenCLBuiltins.td``. This contains a compact |
| representation of the supported builtin functions. When adding new builtin |
| function declarations, this is normally the only file that needs modifying. |
| |
| - A Clang TableGen emitter defined in ``ClangOpenCLBuiltinEmitter.cpp``. During |
| Clang build time, the emitter reads the TableGen definition file and |
| generates ``OpenCLBuiltins.inc``. This generated file contains various tables |
| and functions that capture the builtin function data from the TableGen |
| definitions in a compact manner. |
| |
| - OpenCL specific code in ``SemaLookup.cpp``. When ``Sema::LookupBuiltin`` |
| encounters a potential builtin function, it will check if the name corresponds |
| to a valid OpenCL builtin function. If so, all overloads of the function are |
| inserted using ``InsertOCLBuiltinDeclarationsFromTable`` and overload |
| resolution takes place. |
| |
| OpenCL Extensions and Features |
| ------------------------------ |
| |
| Clang implements various extensions to OpenCL kernel languages. |
| |
| New functionality is accepted as soon as the documentation is detailed to the |
| level sufficient to be implemented. There should be an evidence that the |
| extension is designed with implementation feasibility in consideration and |
| assessment of complexity for C/C++ based compilers. Alternatively, the |
| documentation can be accepted in a format of a draft that can be further |
| refined during the implementation. |
| |
| Implementation guidelines |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| This section explains how to extend clang with the new functionality. |
| |
| **Parsing functionality** |
| |
| If an extension modifies the standard parsing it needs to be added to |
| the clang frontend source code. This also means that the associated macro |
| indicating the presence of the extension should be added to clang. |
| |
| The default flow for adding a new extension into the frontend is to |
| modify `OpenCLExtensions.def |
| <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/OpenCLExtensions.def>`_ |
| |
| This will add the macro automatically and also add a field in the target |
| options ``clang::TargetOptions::OpenCLFeaturesMap`` to control the exposure |
| of the new extension during the compilation. |
| |
| Note that by default targets like `SPIR` or `X86` expose all the OpenCL |
| extensions. For all other targets the configuration has to be made explicitly. |
| |
| Note that the target extension support performed by clang can be overridden |
| with :ref:`-cl-ext <opencl_cl_ext>` command-line flags. |
| |
| **Library functionality** |
| |
| If an extension adds functionality that does not modify standard language |
| parsing it should not require modifying anything other than header files and |
| ``OpenCLBuiltins.td`` detailed in :ref:`OpenCL builtins <opencl_builtins>`. |
| Most commonly such extensions add functionality via libraries (by adding |
| non-native types or functions) parsed regularly. Similar to other languages this |
| is the most common way to add new functionality. |
| |
| Clang has standard headers where new types and functions are being added, |
| for more details refer to |
| :ref:`the section on the OpenCL Header <opencl_header>`. The macros indicating |
| the presence of such extensions can be added in the standard header files |
| conditioned on target specific predefined macros or/and language version |
| predefined macros. |
| |
| **Pragmas** |
| |
| Some extensions alter standard parsing dynamically via pragmas. |
| |
| Clang provides a mechanism to add the standard extension pragma |
| ``OPENCL EXTENSION`` by setting a dedicated flag in the extension list entry of |
| ``OpenCLExtensions.def``. Note that there is no default behavior for the |
| standard extension pragmas as it is not specified (for the standards up to and |
| including version 3.0) in a sufficient level of detail and, therefore, |
| there is no default functionality provided by clang. |
| |
| Pragmas without detailed information of their behavior (e.g. an explanation of |
| changes it triggers in the parsing) should not be added to clang. Moreover, the |
| pragmas should provide useful functionality to the user. For example, such |
| functionality should address a practical use case and not be redundant i.e. |
| cannot be achieved using existing features. |
| |
| Note that some legacy extensions (published prior to OpenCL 3.0) still |
| provide some non-conformant functionality for pragmas e.g. add diagnostics on |
| the use of types or functions. This functionality is not guaranteed to remain in |
| future releases. However, any future changes should not affect backward |
| compatibility. |
| |
| .. _opencl_addrsp: |
| |
| Address spaces attribute |
| ------------------------ |
| |
| Clang has arbitrary address space support using the ``address_space(N)`` |
| attribute, where ``N`` is an integer number in the range specified in the |
| Clang source code. This addresses spaces can be used along with the OpenCL |
| address spaces however when such addresses spaces converted to/from OpenCL |
| address spaces the behavior is not governed by OpenCL specification. |
| |
| An OpenCL implementation provides a list of standard address spaces using |
| keywords: ``private``, ``local``, ``global``, and ``generic``. In the AST and |
| in the IR each of the address spaces will be represented by unique number |
| provided in the Clang source code. The specific IDs for an address space do not |
| have to match between the AST and the IR. Typically in the AST address space |
| numbers represent logical segments while in the IR they represent physical |
| segments. |
| Therefore, machines with flat memory segments can map all AST address space |
| numbers to the same physical segment ID or skip address space attribute |
| completely while generating the IR. However, if the address space information |
| is needed by the IR passes e.g. to improve alias analysis, it is recommended |
| to keep it and only lower to reflect physical memory segments in the late |
| machine passes. The mapping between logical and target address spaces is |
| specified in the Clang's source code. |
| |
| .. _cxx_for_opencl_impl: |
| |
| C++ for OpenCL Implementation Status |
| ==================================== |
| |
| Clang implements language version 1.0 published in `the official |
| release of C++ for OpenCL Documentation |
| <https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r2>`_. |
| |
| Limited support of experimental C++ libraries is described in the :ref:`experimental features <opencl_experimenal>`. |
| |
| Bugzilla bugs for this functionality are typically prefixed |
| with '[C++4OpenCL]' - click `here |
| <https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=204139&product=clang&query_format=advanced&resolution=---&short_desc=%5BC%2B%2B4OpenCL%5D&short_desc_type=allwordssubstr>`__ |
| to view the full bug list. |
| |
| |
| Missing features or with limited support |
| ---------------------------------------- |
| |
| - IR generation for global destructors is incomplete (See: |
| `PR48047 <https://llvm.org/PR48047>`_). |
| |
| .. _opencl_300: |
| |
| OpenCL C 3.0 Usage |
| ================== |
| |
| OpenCL C 3.0 language standard makes most OpenCL C 2.0 features optional. Optional |
| functionality in OpenCL C 3.0 is indicated with the presence of feature-test macros |
| (list of feature-test macros is `here <https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#features>`__). |
| Command-line flag :ref:`-cl-ext <opencl_cl_ext>` can be used to override features supported by a target. |
| |
| For cases when there is an associated extension for a specific feature (fp64 and 3d image writes) |
| user should specify both (extension and feature) in command-line flag: |
| |
| .. code-block:: console |
| |
| $ clang -cc1 -cl-std=CL3.0 -cl-ext=+cl_khr_fp64,+__opencl_c_fp64 ... |
| $ clang -cc1 -cl-std=CL3.0 -cl-ext=-cl_khr_fp64,-__opencl_c_fp64 ... |
| |
| |
| OpenCL C 3.0 Implementation Status |
| ---------------------------------- |
| |
| The following table provides an overview of features in OpenCL C 3.0 and their |
| implementation status. |
| |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Category | Feature | Status | Reviews | |
| +==============================+=========================+=========================================+======================+==============================================================================================+ |
| | Command line interface | New value for ``-cl-std`` flag | :good:`done` | https://reviews.llvm.org/D88300 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Predefined macros | New version macro | :good:`done` | https://reviews.llvm.org/D88300 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Predefined macros | Feature macros | :good:`done` | https://reviews.llvm.org/D95776 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Generic address space | :good:`done` | https://reviews.llvm.org/D95778 and https://reviews.llvm.org/D103401 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Builtin function overloads with generic address space | :good:`done` | https://reviews.llvm.org/D105526 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Program scope variables in global memory | :good:`done` | https://reviews.llvm.org/D103191 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | 3D image writes including builtin functions | :part:`worked on` | https://reviews.llvm.org/D106260 (frontend) | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | read_write images including builtin functions | :part:`worked on` | https://reviews.llvm.org/D104915 (frontend) and https://reviews.llvm.org/D107539 (functions) | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | C11 atomics memory scopes, ordering and builtin function | :good:`done` | https://reviews.llvm.org/D106111 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Blocks and Device-side kernel enqueue including builtin functions | :none:`unclaimed` | | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Pipes including builtin functions | :good:`done` | https://reviews.llvm.org/D107154 (frontend) and https://reviews.llvm.org/D105858 (functions) | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Work group collective builtin functions | :good:`done` | https://reviews.llvm.org/D105858 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Image types and builtin functions | :good:`done` | https://reviews.llvm.org/D103911 (frontend) and https://reviews.llvm.org/D107539 (functions) | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | Feature optionality | Double precision floating point type | :good:`done` | https://reviews.llvm.org/D96524 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | New functionality | RGBA vector components | :good:`done` | https://reviews.llvm.org/D99969 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | New functionality | Subgroup functions | :part:`worked on` | https://reviews.llvm.org/D105858 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| | New functionality | Atomic mem scopes: subgroup, all devices including functions | :part:`worked on` | https://reviews.llvm.org/D103241 | |
| +------------------------------+-------------------------+-----------------------------------------+----------------------+----------------------------------------------------------------------------------------------+ |
| |
| .. _opencl_experimenal: |
| |
| Experimental features |
| ===================== |
| |
| Clang provides the following new WIP features for the developers to experiment |
| and provide early feedback or contribute with further improvements. |
| Feel free to contact us on `cfe-dev |
| <https://lists.llvm.org/mailman/listinfo/cfe-dev>`_ or via `Bugzilla |
| <https://bugs.llvm.org/>`__. |
| |
| .. _opencl_experimental_cxxlibs: |
| |
| C++ libraries for OpenCL |
| ------------------------ |
| |
| There is ongoing work to support C++ standard libraries from `LLVM's libcxx |
| <https://libcxx.llvm.org/>`_ in OpenCL kernel code using C++ for OpenCL mode. |
| |
| It is currently possible to include `type_traits` from C++17 in the kernel |
| sources when the following clang extensions are enabled |
| ``__cl_clang_function_pointers`` and ``__cl_clang_variadic_functions``, |
| see :doc:`LanguageExtensions` for more details. The use of non-conformant |
| features enabled by the extensions does not expose non-conformant behavior |
| beyond the compilation i.e. does not get generated in IR or binary. |
| The extension only appear in metaprogramming |
| mechanism to identify or verify the properties of types. This allows to provide |
| the full C++ functionality without a loss of portability. To avoid unsafe use |
| of the extensions it is recommended that the extensions are disabled directly |
| after the header include. |
| |
| **Example of Use**: |
| |
| The example of kernel code with `type_traits` is illustrated here. |
| |
| .. code-block:: c++ |
| |
| #pragma OPENCL EXTENSION __cl_clang_function_pointers : enable |
| #pragma OPENCL EXTENSION __cl_clang_variadic_functions : enable |
| #include <type_traits> |
| #pragma OPENCL EXTENSION __cl_clang_function_pointers : disable |
| #pragma OPENCL EXTENSION __cl_clang_variadic_functions : disable |
| |
| using sint_type = std::make_signed<unsigned int>::type; |
| |
| __kernel void foo() { |
| static_assert(!std::is_same<sint_type, unsigned int>::value); |
| } |
| |
| The possible clang invocation to compile the example is as follows: |
| |
| .. code-block:: console |
| |
| $ clang -I<path to libcxx checkout or installation>/include test.clcpp |
| |
| Note that `type_traits` is a header only library and therefore no extra |
| linking step against the standard libraries is required. See full example |
| in `Compiler Explorer <https://godbolt.org/z/5WbnTfb65>`_. |
| |
| More OpenCL specific C++ library implementations built on top of libcxx |
| are available in `libclcxx <https://github.com/KhronosGroup/libclcxx>`_ |
| project. |