======================
Clang Offload Packager
======================

.. contents::
   :local:

.. _clang-offload-packager:

Introduction
============

This tool bundles device files into a single image that carries the metadata
needed to use them. We use a custom binary format for bundling all the device
images together. The image format is a small header wrapped around a string
map. The bundled binaries this tool creates can then be embedded into the host
to create a fat binary.

Binary Format
=============

The binary format is marked by the ``0x10FF10AD`` magic bytes, followed by a
version. Each created binary contains its own magic bytes. This allows us to
locate all the embedded offloading sections even after they have been merged by
the linker, such as during relocatable linking. Conceptually, this binary
format is a serialization of a string map and an image buffer. The binary header
is described in the following :ref:`table<table-binary_header>`.

.. table:: Offloading Binary Header
    :name: table-binary_header

    +------------+--------------+----------------------------------------------------+
    |    Type    |  Identifier  | Description                                        |
    +============+==============+====================================================+
    | uint8_t[4] |    magic     | The magic bytes for the binary format (0x10FF10AD) |
    +------------+--------------+----------------------------------------------------+
    | uint32_t   |   version    | Version of this format (currently version 1)       |
    +------------+--------------+----------------------------------------------------+
    | uint64_t   |    size      | Size of this binary in bytes                       |
    +------------+--------------+----------------------------------------------------+
    | uint64_t   | entry offset | Absolute offset of the offload entries in bytes    |
    +------------+--------------+----------------------------------------------------+
    | uint64_t   |  entry size  | Size of the offload entries in bytes               |
    +------------+--------------+----------------------------------------------------+
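
As a rough illustration of the header layout in the table above, the following
Python sketch decodes it with the standard ``struct`` module. It assumes the
fields are packed in table order without padding (32 bytes in total), that the
magic is stored as the four bytes ``0x10 0xFF 0x10 0xAD``, and that integers are
little-endian. None of these assumptions is stated in this document, and the
names ``HEADER`` and ``parse_header`` are hypothetical.

```python
import struct

# Assumed layout: 4 magic bytes, a uint32 version, then three uint64 fields.
# Little-endian byte order and the absence of padding are assumptions.
HEADER = struct.Struct("<4sIQQQ")
MAGIC = bytes([0x10, 0xFF, 0x10, 0xAD])


def parse_header(blob: bytes) -> dict:
    """Decode and sanity-check the header at the start of ``blob``."""
    if len(blob) < HEADER.size:
        raise ValueError("buffer too small for an offloading header")
    magic, version, size, entry_offset, entry_size = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("missing 0x10FF10AD magic bytes")
    if size > len(blob):
        raise ValueError("size field exceeds the buffer")
    if entry_offset + entry_size > size:
        raise ValueError("offload entry lies outside the binary")
    return {
        "version": version,
        "size": size,
        "entry_offset": entry_offset,
        "entry_size": entry_size,
    }


# Build a synthetic header followed by 40 zero bytes standing in for an entry.
example = HEADER.pack(MAGIC, 1, HEADER.size + 40, HEADER.size, 40) + bytes(40)
header = parse_header(example)
```

The bounds checks are not part of the format itself; they only keep the sketch
from reading past the end of a truncated buffer.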

Once a binary is identified through its magic bytes, we use the size field to
take a slice of the binary blob containing the information for a single
offloading image. We can then use the entry offset field to find the offload
entry containing the image and metadata. The offload entry describes the device
image and contains the fields shown in the following
:ref:`table<table-binary_entry>`.

.. table:: Offloading Entry Table
    :name: table-binary_entry

    +----------+---------------+----------------------------------------------------+
    |   Type   |   Identifier  | Description                                        |
    +==========+===============+====================================================+
    | uint16_t |  image kind   | The kind of the device image (e.g. bc, cubin)      |
    +----------+---------------+----------------------------------------------------+
    | uint16_t | offload kind  | The producer of the image (e.g. openmp, cuda)      |
    +----------+---------------+----------------------------------------------------+
    | uint32_t |     flags     | Generic flags for the image                        |
    +----------+---------------+----------------------------------------------------+
    | uint64_t | string offset | Absolute offset of the string metadata table       |
    +----------+---------------+----------------------------------------------------+
    | uint64_t |  num strings  | Number of string entries in the table              |
    +----------+---------------+----------------------------------------------------+
    | uint64_t |  image offset | Absolute offset of the device image in bytes       |
    +----------+---------------+----------------------------------------------------+
    | uint64_t |   image size  | Size of the device image in bytes                  |
    +----------+---------------+----------------------------------------------------+
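
The entry layout above could be decoded along the following lines. This is a
sketch only: it assumes the fields are packed in table order without padding
(40 bytes), that integers are little-endian, and that all offsets are relative
to the start of the binary. ``ENTRY``, ``FIELDS`` and ``parse_entry`` are
hypothetical names, and the four image bytes are a stand-in for a real device
image.

```python
import struct

# Assumed packing: two uint16, one uint32, then four uint64 fields,
# little-endian and without padding.
ENTRY = struct.Struct("<HHIQQQQ")
FIELDS = (
    "image_kind",
    "offload_kind",
    "flags",
    "string_offset",
    "num_strings",
    "image_offset",
    "image_size",
)


def parse_entry(blob: bytes, entry_offset: int) -> dict:
    """Decode the offload entry at ``entry_offset`` and slice out the image."""
    if entry_offset + ENTRY.size > len(blob):
        raise ValueError("offload entry lies outside the buffer")
    entry = dict(zip(FIELDS, ENTRY.unpack_from(blob, entry_offset)))
    end = entry["image_offset"] + entry["image_size"]
    if end > len(blob):
        raise ValueError("device image lies outside the buffer")
    entry["image"] = blob[entry["image_offset"]:end]
    return entry


image = b"\x7fELF"   # stand-in for a real device image
entry_offset = 32    # directly after an assumed 32-byte header
image_offset = entry_offset + ENTRY.size
raw = ENTRY.pack(0x01, 0x01, 0, image_offset, 0, image_offset, len(image))
blob = bytes(entry_offset) + raw + image
entry = parse_entry(blob, entry_offset)
```

Because the image is located by an absolute offset and a size, the sketch never
needs to know what comes between the entry and the image.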

This table contains the offsets of the string table and the device image itself
along with some other integer information. The image kind lets us easily
identify the type of image stored here without needing to inspect the binary.
The offload kind is used to determine which registration code or linking
semantics are necessary for this image. These are stored as enumerations with
the following values for the :ref:`image kind<table-image_kind>` and the
:ref:`offload kind<table-offload_kind>`.

.. table:: Image Kind
    :name: table-image_kind

    +---------------+-------+---------------------------------------+
    |      Name     | Value | Description                           |
    +===============+=======+=======================================+
    | IMG_None      | 0x00  | No image information provided         |
    +---------------+-------+---------------------------------------+
    | IMG_Object    | 0x01  | The image is a generic object file    |
    +---------------+-------+---------------------------------------+
    | IMG_Bitcode   | 0x02  | The image is an LLVM-IR bitcode file  |
    +---------------+-------+---------------------------------------+
    | IMG_Cubin     | 0x03  | The image is a CUDA object file       |
    +---------------+-------+---------------------------------------+
    | IMG_Fatbinary | 0x04  | The image is a CUDA fatbinary file    |
    +---------------+-------+---------------------------------------+
    | IMG_PTX       | 0x05  | The image is a CUDA PTX file          |
    +---------------+-------+---------------------------------------+

.. table:: Offload Kind
    :name: table-offload_kind

    +------------+-------+---------------------------------------+
    |    Name    | Value | Description                           |
    +============+=======+=======================================+
    | OFK_None   | 0x00  | No offloading information provided    |
    +------------+-------+---------------------------------------+
    | OFK_OpenMP | 0x01  | The producer was OpenMP offloading    |
    +------------+-------+---------------------------------------+
    | OFK_CUDA   | 0x02  | The producer was CUDA                 |
    +------------+-------+---------------------------------------+
    | OFK_HIP    | 0x03  | The producer was HIP                  |
    +------------+-------+---------------------------------------+
    | OFK_SYCL   | 0x04  | The producer was SYCL                 |
    +------------+-------+---------------------------------------+
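
For readers who want to interpret these values in a script, the two tables map
directly onto enumerations. The following Python sketch mirrors the names and
values listed above; the class names and the ``describe`` helper are
hypothetical and not part of the tool.

```python
from enum import IntEnum


class ImageKind(IntEnum):
    """Values from the Image Kind table."""
    IMG_None = 0x00
    IMG_Object = 0x01
    IMG_Bitcode = 0x02
    IMG_Cubin = 0x03
    IMG_Fatbinary = 0x04
    IMG_PTX = 0x05


class OffloadKind(IntEnum):
    """Values from the Offload Kind table."""
    OFK_None = 0x00
    OFK_OpenMP = 0x01
    OFK_CUDA = 0x02
    OFK_HIP = 0x03
    OFK_SYCL = 0x04


def describe(image_kind: int, offload_kind: int) -> str:
    """Name the two kinds; values outside the tables raise ValueError."""
    return f"{OffloadKind(offload_kind).name} / {ImageKind(image_kind).name}"


summary = describe(0x03, 0x02)  # "OFK_CUDA / IMG_Cubin"
```

Raising on unknown values is a choice made for this sketch; a real consumer
might prefer to skip images whose kinds it does not recognize.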

The flags signify certain conditions, such as the presence of debugging
information or whether LTO was used. The string entry table can hold arbitrary
key-value pairs. It is stored as an array of entries in the
:ref:`string entry<table-binary_string>` format.

.. table:: Offloading String Entry
    :name: table-binary_string

    +----------+--------------+-------------------------------------------------------+
    |   Type   |  Identifier  | Description                                           |
    +==========+==============+=======================================================+
    | uint64_t |  key offset  | Absolute byte offset of the key in the string table   |
    +----------+--------------+-------------------------------------------------------+
    | uint64_t | value offset | Absolute byte offset of the value in the string table |
    +----------+--------------+-------------------------------------------------------+

The string entries simply provide offsets to a key and value pair in the
binary image's string table. The string table is simply a collection of
null-terminated strings with defined offsets in the image. The string entry
allows us to create a key-value pair from this string table. This is used for
passing arbitrary arguments to the image, such as the triple and architecture.
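
To make the relationship between string entries and the string table concrete,
the following Python sketch rebuilds the key-value map from a buffer. It
assumes each string entry is two little-endian ``uint64_t`` values (16 bytes)
and that strings are UTF-8. ``STRING_ENTRY``, ``read_cstring`` and
``read_string_map`` are hypothetical names.

```python
import struct

# Assumed packing of one string entry: key offset, then value offset.
STRING_ENTRY = struct.Struct("<QQ")


def read_cstring(blob: bytes, offset: int) -> str:
    """Read the null-terminated string that starts at ``offset``."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("utf-8")


def read_string_map(blob: bytes, string_offset: int, num_strings: int) -> dict:
    """Follow each entry's absolute offsets to build the key-value map."""
    result = {}
    for i in range(num_strings):
        position = string_offset + i * STRING_ENTRY.size
        key_offset, value_offset = STRING_ENTRY.unpack_from(blob, position)
        result[read_cstring(blob, key_offset)] = read_cstring(blob, value_offset)
    return result


# Synthetic buffer: two string entries followed by the string table.
strings = b"triple\x00nvptx64\x00arch\x00sm_70\x00"
base = 2 * STRING_ENTRY.size
table = STRING_ENTRY.pack(
    base + strings.index(b"triple"), base + strings.index(b"nvptx64")
) + STRING_ENTRY.pack(
    base + strings.index(b"arch"), base + strings.index(b"sm_70")
)
metadata = read_string_map(table + strings, 0, 2)
```

The example places the string table right after the entries, but any placement
would work because the entries store absolute offsets.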

All of these structures are combined to form a single binary blob. The order
does not matter because all offsets are absolute, which makes the format easier
to extend in the future. As mentioned previously, multiple offloading images are
bundled together by simply concatenating them in this format. Because we have
the magic bytes and size of each image, we can extract them as needed.
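
The extraction step can be sketched as a loop that checks the magic bytes,
reads the size, and advances by that size. The Python below assumes a 32-byte
little-endian header, and that the binaries sit back to back from the start of
the buffer with each size field covering the whole binary including any
padding. These are assumptions of the sketch, and ``split_binaries`` is a
hypothetical name.

```python
import struct

# Assumed header layout: 4 magic bytes, uint32 version, three uint64 fields.
HEADER = struct.Struct("<4sIQQQ")
MAGIC = bytes([0x10, 0xFF, 0x10, 0xAD])


def split_binaries(buffer: bytes) -> list:
    """Split a buffer of concatenated offloading binaries into slices."""
    binaries = []
    offset = 0
    # Trailing bytes too short to hold a header are ignored.
    while offset + HEADER.size <= len(buffer):
        magic, _version, size, _entry_offset, _entry_size = HEADER.unpack_from(
            buffer, offset
        )
        if magic != MAGIC:
            raise ValueError(f"no magic bytes at offset {offset}")
        if size < HEADER.size or offset + size > len(buffer):
            raise ValueError(f"invalid size field at offset {offset}")
        binaries.append(buffer[offset:offset + size])
        offset += size
    return binaries


# Two synthetic binaries: a 40-byte one and a bare 32-byte header.
first = HEADER.pack(MAGIC, 1, 40, HEADER.size, 8) + bytes(8)
second = HEADER.pack(MAGIC, 1, 32, HEADER.size, 0)
parts = split_binaries(first + second)
sizes = [len(part) for part in parts]
```

Each slice is self-describing, so the pieces can be handed to a header parser
independently of one another.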

Usage
=====

This tool can be used with the following arguments. Generally, information is
passed as key-value pairs to the ``--image=`` argument. The ``file`` and
``triple`` arguments are mandatory to make a valid image. The ``arch`` argument
is recommended.

.. code-block:: console

  OVERVIEW: A utility for bundling several object files into a single binary.
            The output binary can then be embedded into the host section table
            to create a fatbinary containing offloading code.

  USAGE: clang-offload-packager [options]

  OPTIONS:

  Generic Options:

    --help                      - Display available options (--help-hidden for more)
    --help-list                 - Display list of available options (--help-list-hidden for more)
    --version                   - Display the version of this program

  clang-offload-packager options:

    --image=<<key>=<value>,...> - List of key and value arguments. Required
                                  keywords are 'file' and 'triple'.
    -o <file>                   - Write output to <file>.

Example
=======

This tool takes one or more input files, each given by an ``--image`` option,
and creates a single output file with all the images combined.

.. code-block:: console

  clang-offload-packager -o out.bin --image=file=input.o,triple=nvptx64,arch=sm_70
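
When several images are packaged from a build script, the command line can be
assembled programmatically. The following Python sketch builds the argument
list from dictionaries of key-value pairs and checks the two required keys. It
assumes the ``--image`` option may be given once per image. The file names
``a.o`` and ``b.o`` and the helper ``packager_command`` are hypothetical.

```python
REQUIRED_KEYS = ("file", "triple")


def packager_command(output: str, images: list) -> list:
    """Build an argument list with one --image option per image."""
    argv = ["clang-offload-packager", "-o", output]
    for image in images:
        for key in REQUIRED_KEYS:
            if key not in image:
                raise ValueError(f"image is missing required key '{key}'")
        # Keys are emitted in insertion order as key=value pairs.
        pairs = ",".join(f"{key}={value}" for key, value in image.items())
        argv.append(f"--image={pairs}")
    return argv


command = packager_command(
    "out.bin",
    [
        {"file": "a.o", "triple": "nvptx64", "arch": "sm_70"},
        {"file": "b.o", "triple": "nvptx64", "arch": "sm_80"},
    ],
)
```

The resulting list can be passed to ``subprocess.run``. The sketch does not
escape values, so it only suits values without commas or equals signs.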

The inverse operation can be performed instead by passing the packaged binary as
input. In this mode, the matching images will be placed in the output specified
by the ``file`` option. If no ``file`` argument is provided, a name will be
generated for each matching image.

.. code-block:: console

  clang-offload-packager in.bin --image=file=output.o,triple=nvptx64,arch=sm_70