=====================
Clang Offload Bundler
=====================

.. contents::
   :local:

.. _clang-offload-bundler:

Introduction
============

For heterogeneous single source programming languages, use one or more
``--offload-arch=<target-id>`` Clang options to specify the target IDs of the
code to generate for the offload code regions.

The tool chain may perform multiple compilations of a translation unit to
produce separate code objects for the host and potentially multiple offloaded
devices. The ``clang-offload-bundler`` tool may be used as part of the tool
chain to combine these multiple code objects into a single bundled code object.

The tool chain may use a bundled code object as an intermediate step so that
each tool chain step consumes and produces a single file as in traditional
non-heterogeneous tool chains. The bundled code object contains the code objects
for the host and all the offload devices.

A bundled code object may also be used to bundle just the offloaded code
objects and be embedded as data into the host code object. The host compilation
includes an ``init`` function that uses the runtime corresponding to the
offload kind (see :ref:`clang-offload-kind-table`) to load the offload code
objects appropriate to the devices present when the host program is executed.

Supported File Formats
======================

Several text and binary file formats are supported for bundling/unbundling. See
:ref:`supported-file-formats-table` for a list of currently supported formats.

.. table:: Supported File Formats
   :name: supported-file-formats-table

   +--------------------+----------------+-------------+
   | File Format        | File Extension | Text/Binary |
   +====================+================+=============+
   | CPP output         | i              | Text        |
   +--------------------+----------------+-------------+
   | C++ CPP output     | ii             | Text        |
   +--------------------+----------------+-------------+
   | CUDA/HIP output    | cui            | Text        |
   +--------------------+----------------+-------------+
   | Dependency         | d              | Text        |
   +--------------------+----------------+-------------+
   | LLVM               | ll             | Text        |
   +--------------------+----------------+-------------+
   | LLVM Bitcode       | bc             | Binary      |
   +--------------------+----------------+-------------+
   | Assembler          | s              | Text        |
   +--------------------+----------------+-------------+
   | Object             | o              | Binary      |
   +--------------------+----------------+-------------+
   | Archive of objects | a              | Binary      |
   +--------------------+----------------+-------------+
   | Precompiled header | gch            | Binary      |
   +--------------------+----------------+-------------+
   | Clang AST file     | ast            | Binary      |
   +--------------------+----------------+-------------+

.. _clang-bundled-code-object-layout-text:

Bundled Text File Layout
========================

The format of bundled text files is currently very simple. The text of each
bundle entry is concatenated, and each entry is enclosed between a start comment
and an end comment. Both comments contain a magic string and the bundle entry ID.

::

  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ 1st Bundle Entry ID"
  Bundle 1
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ 1st Bundle Entry ID"
  ...
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ Nth Bundle Entry ID"
  Bundle N
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ Nth Bundle Entry ID"

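
The following Python sketch shows one way to produce and split a bundled text
file with this layout. It rests on several assumptions:

* A marker line reads ``<comment> __CLANG_OFFLOAD_BUNDLE____START__ <bundle-entry-id>``,
  using the magic string from the binary layout.
* The caller picks the comment prefix for the file type, for example ``//`` or ``;``.
* Every marker sits on its own line.

``bundle_text`` and ``unbundle_text`` are hypothetical names. The output is not
guaranteed to be byte-for-byte identical to what ``clang-offload-bundler`` writes.

```python
# Sketch of the bundled text layout: each entry sits between START and END
# marker comments. Assumption: markers read "<comment> <magic>__START__ <id>"
# with the magic string "__CLANG_OFFLOAD_BUNDLE__", one marker per line.
MAGIC = "__CLANG_OFFLOAD_BUNDLE__"


def bundle_text(entries: dict[str, str], comment: str) -> str:
    """Concatenate text entries, each between START and END marker comments."""
    parts = []
    for entry_id, text in entries.items():
        if not text.endswith("\n"):
            text += "\n"  # keep the END marker on its own line
        parts.append(f"{comment} {MAGIC}__START__ {entry_id}\n")
        parts.append(text)
        parts.append(f"{comment} {MAGIC}__END__ {entry_id}\n")
    return "".join(parts)


def unbundle_text(bundled: str, comment: str) -> dict[str, str]:
    """Split a bundled text file back into {bundle entry ID: text}."""
    start = f"{comment} {MAGIC}__START__ "
    end = f"{comment} {MAGIC}__END__ "
    entries: dict[str, str] = {}
    current_id = None  # ID of the entry being read, or None between entries
    body: list[str] = []
    for line in bundled.splitlines(keepends=True):
        marker = line.rstrip("\n")
        if current_id is None:
            if marker.startswith(start):
                current_id = marker[len(start):]
                body = []
        elif marker == end + current_id:
            entries[current_id] = "".join(body)
            current_id = None
        else:
            body.append(line)
    return entries
```

When unbundling, lines outside any START/END pair are ignored. An entry whose
text lacks a trailing newline gets one added, so the END marker stays on its
own line.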

.. _clang-bundled-code-object-layout:

Bundled Binary File Layout
==========================

The layout of a bundled code object is defined by the following table:

.. table:: Bundled Code Object Layout
   :name: bundled-code-object-layout-table

   =================================== ======= ================ ===============================
   Field                               Type    Size in Bytes    Description
   =================================== ======= ================ ===============================
   Magic String                        string  24               ``__CLANG_OFFLOAD_BUNDLE__``
   Number Of Bundle Entries            integer 8                Number of bundle entries.
   1st Bundle Entry Code Object Offset integer 8                Byte offset from beginning of
                                                                bundled code object to 1st code
                                                                object.
   1st Bundle Entry Code Object Size   integer 8                Byte size of 1st code object.
   1st Bundle Entry ID Length          integer 8                Character length of bundle
                                                                entry ID of 1st code object.
   1st Bundle Entry ID                 string  1st Bundle Entry Bundle entry ID of 1st code
                                               ID Length        object. This is not NUL
                                                                terminated. See
                                                                :ref:`clang-bundle-entry-id`.
   \...
   Nth Bundle Entry Code Object Offset integer 8
   Nth Bundle Entry Code Object Size   integer 8
   Nth Bundle Entry ID Length          integer 8
   Nth Bundle Entry ID                 string  Nth Bundle Entry
                                               ID Length
   1st Bundle Entry Code Object        bytes   1st Bundle Entry
                                               Code Object Size
   \...
   Nth Bundle Entry Code Object        bytes   Nth Bundle Entry
                                               Code Object Size
   =================================== ======= ================ ===============================
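
A minimal Python sketch of writing and reading this layout follows. The table
gives field sizes but not byte order, so the sketch makes three assumptions:

* Integers are unsigned 64-bit little-endian.
* Bundle entry IDs are ASCII.
* Code objects are stored back to back, immediately after the header, with no
  alignment padding.

``write_bundle`` and ``read_bundle`` are hypothetical names.

```python
import struct

# Assumptions not stated in the table: unsigned 64-bit little-endian integers
# ("<Q"), ASCII entry IDs, and no padding between the header and code objects.
MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"  # 24 bytes, not NUL terminated


def write_bundle(entries: list[tuple[str, bytes]]) -> bytes:
    """Serialize [(bundle entry ID, code object bytes), ...] into one blob."""
    ids = [entry_id.encode("ascii") for entry_id, _ in entries]
    # Header = magic + entry count + (offset, size, ID length, ID) per entry.
    header_size = len(MAGIC) + 8 + sum(3 * 8 + len(i) for i in ids)
    header = bytearray(MAGIC)
    header += struct.pack("<Q", len(entries))
    offset = header_size  # first code object starts right after the header
    for entry_id, (_, code) in zip(ids, entries):
        header += struct.pack("<QQQ", offset, len(code), len(entry_id))
        header += entry_id
        offset += len(code)
    return bytes(header) + b"".join(code for _, code in entries)


def read_bundle(data: bytes) -> dict[str, bytes]:
    """Parse a blob produced by write_bundle into {entry ID: code object}."""
    if not data.startswith(MAGIC):
        raise ValueError("missing __CLANG_OFFLOAD_BUNDLE__ magic string")
    pos = len(MAGIC)
    (count,) = struct.unpack_from("<Q", data, pos)
    pos += 8
    entries = {}
    for _ in range(count):
        offset, size, id_len = struct.unpack_from("<QQQ", data, pos)
        pos += 24
        entry_id = data[pos:pos + id_len].decode("ascii")
        pos += id_len
        entries[entry_id] = data[offset:offset + size]
    return entries
```

Take a host entry with an empty code object plus one device entry, both with
29-character IDs such as ``host-x86_64-unknown-linux-gnu`` and
``hip-amdgcn-amd-amdhsa--gfx906``. The header is then
24 + 8 + 2 × (3 × 8 + 29) = 138 bytes, so the first code object offset is 138.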

.. _clang-bundle-entry-id:

Bundle Entry ID
===============

Each entry in a bundled code object (see
:ref:`clang-bundled-code-object-layout`) has a bundle entry ID that indicates
the kind of the entry's code object and the runtime that manages it.

Bundle entry ID syntax is defined by the following BNF syntax:

.. code::

  <bundle-entry-id> ::= <offload-kind> "-" <target-triple> [ "-" <target-id> ]

Where:

**offload-kind**
  The runtime responsible for managing the bundled entry code object. See
  :ref:`clang-offload-kind-table`.

  .. table:: Bundled Code Object Offload Kind
     :name: clang-offload-kind-table

     ============= ==============================================================
     Offload Kind  Description
     ============= ==============================================================
     host          Host code object. ``clang-offload-bundler`` always includes
                   this entry as the first bundled code object entry. For an
                   embedded bundled code object this entry is not used by the
                   runtime and so is generally an empty code object.

     hip           Offload code object for the HIP language. Used for all
                   HIP language offload code objects when the
                   ``clang-offload-bundler`` is used to bundle code objects as
                   intermediate steps of the tool chain. Also used for AMD GPU
                   code objects before ABI version V4 when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     hipv4         Offload code object for the HIP language. Used for AMD GPU
                   code objects with at least ABI version V4 when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     openmp        Offload code object for the OpenMP language extension.
     ============= ==============================================================

**target-triple**
  The target triple of the code object.

**target-id**
  The canonical target ID of the code object. Present only if the target
  supports a target ID. See :ref:`clang-target-id`.
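
The target triple itself contains ``-`` separators, so splitting a bundle entry
ID requires knowing how many components the triple has. The sketch below
assumes that the triple is written in normalized four-component form
(``arch-vendor-os-environment``). An ID such as ``hip-amdgcn-amd-amdhsa--gfx906``
is therefore read as an empty environment followed by the target ID. The sketch
accepts only the offload kinds listed in the table above.
``parse_bundle_entry_id`` and ``BundleEntryID`` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Offload kinds from the offload kind table; anything else is rejected here.
OFFLOAD_KINDS = {"host", "hip", "hipv4", "openmp"}


class BundleEntryID(NamedTuple):
    offload_kind: str
    target_triple: str        # arch-vendor-os-environment (environment may be empty)
    target_id: Optional[str]  # None when the entry has no target ID


def parse_bundle_entry_id(text: str) -> BundleEntryID:
    # Assumption: a normalized four-component triple, so the ID splits into
    # kind, arch, vendor, os, environment and an optional target ID. The
    # target ID may itself contain '-' (e.g. "xnack-"), hence maxsplit=5.
    parts = text.split("-", 5)
    if len(parts) < 5:
        raise ValueError(f"not <kind>-<4-part triple>[-<target-id>]: {text!r}")
    kind = parts[0]
    if kind not in OFFLOAD_KINDS:
        raise ValueError(f"unknown offload kind: {kind!r}")
    target_id = parts[5] if len(parts) == 6 and parts[5] else None
    return BundleEntryID(kind, "-".join(parts[1:5]), target_id)
```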

Each entry of a bundled code object must have a different bundle entry ID. There
can be multiple entries for the same processor, provided they differ in target
feature settings. If there is an entry with a target feature specified as *Any*,
then all entries for the same processor must specify that target feature as
*Any*. There may be additional target specific restrictions.
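
The two rules above can be checked mechanically. This sketch makes two
assumptions about its input:

* All target IDs are already in canonical form.
* All target IDs belong to entries with the same offload kind and target triple.

It ignores any target specific restrictions. A feature left as *Any* must be
*Any* in every entry for that processor. An equivalent test is therefore that
all entries for a processor name exactly the same set of features.
``check_entries`` is a hypothetical helper.

```python
from collections import defaultdict


def feature_names(target_id: str) -> tuple[str, frozenset[str]]:
    """Split 'gfx90a:sramecc-:xnack+' into ('gfx90a', {'sramecc', 'xnack'})."""
    processor, *features = target_id.split(":")
    # Each specified feature ends in '+' (On) or '-' (Off); strip the sign.
    return processor, frozenset(f[:-1] for f in features)


def check_entries(target_ids: list[str]) -> None:
    """Raise ValueError if the entries break the uniqueness or Any rules."""
    if len(set(target_ids)) != len(target_ids):
        raise ValueError("bundle entry IDs must be unique")
    named: dict[str, set[frozenset[str]]] = defaultdict(set)
    for target_id in target_ids:
        processor, names = feature_names(target_id)
        named[processor].add(names)
    for processor, name_sets in named.items():
        # A feature omitted (Any) in one entry must be omitted in every entry
        # for the processor, so all entries must name the same features.
        if len(name_sets) > 1:
            raise ValueError(f"entries for {processor} disagree on Any features")
```

For example, ``gfx90a:xnack+``, ``gfx90a:xnack-`` and ``gfx906`` pass. Mixing
``gfx90a:xnack+`` with ``gfx90a`` is rejected.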

.. _clang-target-id:

Target ID
=========

A target ID is used to indicate the processor and optionally its configuration,
expressed by a set of target features, that affect ISA generation. Whether a
target supports target IDs, or whether the target triple alone is sufficient to
specify the ISA generation, is target specific.

It is used with the ``-mcpu=<target-id>`` and ``--offload-arch=<target-id>``
Clang compilation options to specify the kind of code to generate.

It is also used as part of the bundle entry ID to identify the code object. See
:ref:`clang-bundle-entry-id`.

Target ID syntax is defined by the following BNF syntax:

.. code::

  <target-id> ::= <processor> ( ":" <target-feature> ( "+" | "-" ) )*

Where:

**processor**
  Is the target specific processor name or any of its alternative processor
  names.

**target-feature**
  Is a target feature name that is supported by the processor. Each target
  feature must appear at most once in a target ID and can have one of three
  values:

  *Any*
    Specified by omitting the target feature from the target ID.
    A code object compiled with a target ID specifying *Any* (the default
    value) for a target feature can be loaded and executed on a processor
    configured with the target feature on or off.

  *On*
    Specified by ``+``, indicating the target feature is enabled. A code
    object compiled with a target ID specifying a target feature on
    can only be loaded on a processor configured with the target feature on.

  *Off*
    Specified by ``-``, indicating the target feature is disabled. A code
    object compiled with a target ID specifying a target feature off
    can only be loaded on a processor configured with the target feature off.
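
A small Python sketch of these rules follows.

* ``parse_target_id`` splits a target ID into its processor and a map from
  feature name to ``True`` (*On*) or ``False`` (*Off*). Omitted features mean
  *Any* and do not appear in the map.
* ``can_load`` applies the loading rules.

Both names are hypothetical. The processor configuration is modeled as a plain
dictionary. Processor names are compared literally, so alternative names should
be canonicalized first.

```python
def parse_target_id(target_id: str) -> tuple[str, dict[str, bool]]:
    """Split a target ID into (processor, {feature: True for On, False for Off}).

    Features omitted from the target ID are Any and do not appear in the dict.
    """
    processor, *tokens = target_id.split(":")
    if not processor:
        raise ValueError("missing processor")
    features: dict[str, bool] = {}
    for token in tokens:
        name, sign = token[:-1], token[-1:]
        if not name or sign not in ("+", "-"):
            raise ValueError(f"feature {token!r} must be <name>+ or <name>-")
        if name in features:
            raise ValueError(f"feature {name!r} appears more than once")
        features[name] = sign == "+"
    return processor, features


def can_load(code_object_target_id: str, processor: str, config: dict[str, bool]) -> bool:
    """Whether a code object may load on `processor` with feature settings `config`."""
    co_processor, required = parse_target_id(code_object_target_id)
    if co_processor != processor:
        return False
    # On/Off features must match the processor exactly; Any features are absent
    # from `required` and therefore match either setting.
    return all(config.get(name) == value for name, value in required.items())
```

With this model, a ``gfx90a`` code object loads whether ``xnack`` is on or off.
A ``gfx90a:xnack+`` code object loads only when ``config`` has ``xnack`` set to
``True``.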

There are two forms of target ID:

*Non-Canonical Form*
  The non-canonical form is used as the input to user commands to allow the user
  greater convenience. It allows both the primary and alternative processor names
  to be used, and the target features may be specified in any order.

*Canonical Form*
  The canonical form is used for all generated output to allow greater
  convenience for tools that consume the information. It is also used for
  internal passing of information between tools. Only the primary processor name
  is used, never an alternative name, and the target features are specified in
  alphabetic order. Command line tools convert the non-canonical form to the
  canonical form.
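
Converting to the canonical form amounts to two steps:

* Map the processor name to its primary name.
* Sort the features by name.

In this sketch the caller supplies the alias table. The ``carrizo`` to
``gfx801`` entry in the usage line is only an illustration. The authoritative
list of alternative names is the AMDGPU processor table referenced in the
target specific information. ``canonicalize`` is a hypothetical name, and it
does not validate the features.

```python
def canonicalize(target_id: str, aliases: dict[str, str]) -> str:
    """Return the canonical form of a (possibly non-canonical) target ID.

    `aliases` maps alternative processor names to primary names; it is
    supplied by the caller (an assumption of this sketch).
    """
    processor, *features = target_id.split(":")
    primary = aliases.get(processor, processor)
    # Features look like "<name>+" or "<name>-"; sort alphabetically by name.
    features.sort(key=lambda feature: feature[:-1])
    return ":".join([primary, *features])
```

For example, ``canonicalize("gfx90a:xnack+:sramecc-", {})`` returns
``gfx90a:sramecc-:xnack+``. ``canonicalize("carrizo:xnack+", {"carrizo": "gfx801"})``
returns ``gfx801:xnack+``.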

Target Specific Information
===========================

Target specific information is available for the following:

*AMD GPU*
  AMD GPU supports target IDs and target features. See `User Guide for AMDGPU
  Backend <https://llvm.org/docs/AMDGPUUsage.html>`_, which defines the supported
  `processors <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-processors>`_ and
  `target features <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-target-features>`_.

Most other targets do not support target IDs.

Archive Unbundling
==================

Unbundling a heterogeneous device archive (HDA) creates device specific
archives. A heterogeneous device archive is in a format compatible with the GNU
``ar`` utility. It contains a collection of bundled device binaries, where each
bundle file contains device binaries for a host and one or more targets. The
output device specific archive is also in a format compatible with the GNU
``ar`` utility. It contains a collection of device binaries for a single target.

.. code::

  Heterogeneous Device Archive, HDA = {F1.X, F2.X, ..., FN.Y}
  where, Fi = Bundle{Host-DeviceBinary, T1-DeviceBinary, T2-DeviceBinary, ...,
                     Tm-DeviceBinary},
         Ti = {Target i, qualified using Bundle Entry ID},
         X/Y = *.bc for AMDGPU and *.cubin for NVPTX

  Device Specific Archive, DSA(Tk) = {F1-Tk-DeviceBinary.X, F2-Tk-DeviceBinary.X, ...
                                      FN-Tk-DeviceBinary.Y}
  where, Fi-Tj-DeviceBinary.X represents device binary of i-th bundled device
  binary file for target Tj.

``clang-offload-bundler`` extracts compatible device binaries for a given target
from the bundled device binaries in a heterogeneous device archive. It then
creates a target specific device archive without bundling.

``clang-offload-bundler`` determines whether a device binary is compatible with
a target by comparing bundle entry IDs. Two bundle entry IDs are considered
compatible if:

* Their offload kinds are the same.
* Their target triples are the same.
* Their GPU architectures (processors) are the same.
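
The following sketches the compatibility test and the extraction of a device
specific archive from a heterogeneous one. Each archive member is modeled as a
name plus a dictionary from bundle entry ID to device binary. The sketch:

* implements only the three rules listed above,
* reads the processor as the part of the target ID before the first ``:``, and
* assumes four-component target triples.

The real tool may apply further checks. Reading or writing actual ``ar`` files
and naming the output members are not shown. All names are hypothetical.

```python
def split_entry_id(entry_id: str) -> tuple[str, str, str]:
    """Return (offload kind, target triple, processor); processor may be ''."""
    parts = entry_id.split("-", 5)  # kind + four triple components + target ID
    target_id = parts[5] if len(parts) == 6 else ""
    processor = target_id.split(":")[0]  # drop any target features
    return parts[0], "-".join(parts[1:5]), processor


def is_compatible(a: str, b: str) -> bool:
    """Same offload kind, same target triple, and same processor."""
    return split_entry_id(a) == split_entry_id(b)


def extract_device_archive(
    hda: list[tuple[str, dict[str, bytes]]], target: str
) -> list[tuple[str, bytes]]:
    """Collect every device binary in the HDA that is compatible with target."""
    dsa = []
    for member, bundle in hda:
        for entry_id, binary in bundle.items():
            if is_compatible(entry_id, target):
                dsa.append((member, binary))  # output member naming not modeled
    return dsa
```

Under these rules, ``hip-amdgcn-amd-amdhsa--gfx90a:xnack+`` is compatible with
``hip-amdgcn-amd-amdhsa--gfx90a``, because only the processor is compared. The
same binary is not compatible with an ``openmp`` entry for the same triple and
processor.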