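This document describes the mechanisms of producing LLVM IR from MLIR. The overall flow has two stages: first, the IR is converted to the LLVM dialect (or another dialect translatable to LLVM IR); second, those dialects are translated to LLVM IR.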
This flow allows the non-trivial transformation to be performed within MLIR using MLIR APIs and makes the translation between MLIR and LLVM IR simple and potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR are expected to closely match the corresponding LLVM IR instructions and intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR as well as reduces the churn in case of changes.
SPIR-V to LLVM dialect conversion has a dedicated document.
Conversion to the LLVM dialect from other dialects is the first step to produce LLVM IR. All non-trivial IR modifications are expected to happen at this stage or before. The conversion is progressive: most passes convert one dialect to the LLVM dialect and keep operations from other dialects intact. For example, the -convert-memref-to-llvm pass will only convert operations from the memref dialect but will not convert operations from other dialects even if they use or produce memref-typed values.
The process relies on the Dialect Conversion infrastructure and, in particular, on the materialization hooks of TypeConverter to support progressive lowering by injecting unrealized_conversion_cast operations between converted and unconverted operations. After multiple partial conversions to the LLVM dialect are performed, the cast operations that have become no-ops can be removed by the -reconcile-unrealized-casts pass. That pass is not specific to the LLVM dialect and can remove any no-op casts.
Built-in types have a default conversion to LLVM dialect types provided by the LLVMTypeConverter class. Users targeting the LLVM dialect can reuse and extend this type converter to support other types. Extra care must be taken if the conversion rules for built-in types are overridden: all conversions must use the same type converter.
The types compatible with the LLVM dialect are kept as is.
Complex type is converted into an LLVM dialect literal structure type with two elements: the real part and the imaginary part.
The element type is converted recursively using these rules.
Example:
complex<f32> // -> !llvm.struct<(f32, f32)>
Index type is converted into an LLVM dialect integer type with the bitwidth specified by the data layout of the closest module. For example, on x86-64 CPUs it converts to i64. This behavior can be overridden by the type converter configuration, which is often exposed as a pass option by conversion passes.
Example:
index // -> on x86_64 i64
Ranked memref types are converted into an LLVM dialect literal structure type that contains the dynamic information associated with the memref object, referred to as descriptor. Only memrefs in the strided form can be converted to the LLVM dialect with the default descriptor format. Memrefs with other, less trivial layouts should be converted into the strided form first, e.g., by materializing the non-trivial address remapping due to layout as affine.apply operations.
The default memref descriptor is a struct with the following fields:
1. The pointer to the data buffer as allocated, referred to as "allocated pointer".
2. The pointer to the properly aligned data that the memref indexes, referred to as "aligned pointer".
3. A converted index-type integer containing the distance in number of elements between the beginning of the (aligned) buffer and the first element to be accessed through the memref, referred to as "offset".
4. An array containing as many converted index-type integers as the rank of the memref: the array represents the size, in number of elements, of the memref along the given dimension.
5. A second array containing as many converted index-type integers as the rank of the memref: the second array represents the "stride" (in tensor abstraction sense), i.e. the number of consecutive elements of the underlying buffer one needs to jump over to get to the next logically indexed element.

For constant memref dimensions, the corresponding size entry is a constant whose runtime value matches the static value. This normalization serves as an ABI for the memref type to interoperate with externally linked functions. In the particular case of rank 0 memrefs, the size and stride arrays are omitted, resulting in a struct containing two pointers and the offset.
Examples:
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
memref<1 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>, ptr<vector<4 x f32>>, i64, array<2 x i64>, array<2 x i64>)>
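The descriptor layout can be mirrored on the C++ side. The following is a minimal sketch, not part of MLIR: the names RankedDescriptor and makeRowMajorDescriptor are hypothetical, index is assumed to convert to a 64-bit integer, and the buffer is assumed to be contiguous and row-major with matching allocated and aligned pointers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of the default ranked memref descriptor.
template <typename T, std::size_t N>
struct RankedDescriptor {
  T *allocated;                         // Pointer as returned by the allocator.
  T *aligned;                           // Pointer used for indexing.
  std::int64_t offset;                  // Distance, in elements, from `aligned`.
  std::array<std::int64_t, N> sizes;    // Extent of each dimension.
  std::array<std::int64_t, N> strides;  // Step, in elements, per dimension.
};

// Fills a descriptor for a contiguous row-major buffer. The stride of a
// dimension is the product of the sizes of all dimensions that follow it.
template <typename T, std::size_t N>
RankedDescriptor<T, N> makeRowMajorDescriptor(
    T *data, const std::array<std::int64_t, N> &sizes) {
  RankedDescriptor<T, N> desc;
  desc.allocated = data;
  desc.aligned = data;
  desc.offset = 0;
  desc.sizes = sizes;
  std::int64_t running = 1;
  for (std::size_t i = N; i-- > 0;) {
    desc.strides[i] = running;
    running *= sizes[i];
  }
  return desc;
}
```

For a 2x4 buffer this yields sizes {2, 4} and strides {4, 1}, which corresponds to the array<2 x i64> fields of the converted memref<2x4xf32> type.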
Unranked memref types are converted to an LLVM dialect literal structure type that contains the dynamic information associated with the memref object, referred to as unranked descriptor. It contains:
1. a converted index-typed integer representing the dynamic rank of the memref;
2. a type-erased pointer (!llvm.ptr<i8>) to a ranked memref descriptor with the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic library functions. The pointer to the ranked memref descriptor points to some allocated memory, which may reside on the stack of the current function or on the heap. Conversion patterns for operations producing unranked memrefs are expected to manage the allocation. Note that this may lead to stack allocations (llvm.alloca) being performed in a loop and not reclaimed until the end of the current function.
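Function types are converted to LLVM dialect function types as follows:
- function argument and result types are converted recursively using these rules;
- if a function type has multiple results, they are wrapped into an LLVM dialect literal structure type since LLVM function types must have exactly one result;
- if a function type has no results, the corresponding LLVM dialect function type will have one !llvm.void result since LLVM function types must have a result;
- function types used as arguments or results of another function type are wrapped in an LLVM dialect pointer type;
- the structs corresponding to memref types, both ranked and unranked, appearing as function arguments are unbundled into individual function arguments to allow for specifying metadata such as aliasing information on individual pointers;
- the conversion of memref-typed arguments is subject to calling conventions.

Examples: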
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its result aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into pointers to functions.
!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>

// These rules apply recursively: a function type taking a function that takes
// another function
( ( (i32) -> (i64) ) -> () ) -> ()
// is converted into a function type taking a pointer-to-function that takes
// another pointer-to-function.
!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr<f32>, ptr<f32>, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr<i8>)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr<f32>, ptr<f32>, i64)>,
                   struct<(ptr<f64>, ptr<f64>, i64)>)> ()>
Conversion patterns are available to convert built-in function operations and standard call operations targeting those functions using these conversion rules.
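The argument counts implied by the unbundling rule can be checked with a few lines of C++. This is only an illustrative sketch: the function names are hypothetical and do not correspond to any MLIR API.

```cpp
#include <cassert>
#include <cstdint>

// Number of scalar arguments produced when a ranked memref of the given rank
// is unbundled under the default convention: the allocated pointer, the
// aligned pointer, the offset, `rank` sizes and `rank` strides.
constexpr std::int64_t numUnbundledRankedArgs(std::int64_t rank) {
  return 2 + 1 + 2 * rank;
}

// An unranked memref always unbundles into two arguments: the rank and a
// type-erased pointer to the ranked descriptor.
constexpr std::int64_t numUnbundledUnrankedArgs() { return 2; }
```

With rank 0 the result is 3, matching the (ptr<f32>, ptr<f32>, i64) signature for memref<f32>; with rank 2 it is 7, matching the converted signature for memref<?x?xf32>.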
LLVM IR only supports one-dimensional vectors, unlike MLIR where vectors can be multi-dimensional. Vector types cannot be nested in either IR. In the one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same size with element type converted using these conversion rules. In the n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types of one-dimensional vectors.
Examples:
vector<4x8 x f32>
// -> !llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// -> !llvm.struct<(ptr<array<4 x vector<8xf32>>>, ptr<array<4 x vector<8xf32>>>,
//                  i64, array<1 x i64>, array<1 x i64>)>
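The n-dimensional rule can be illustrated by a small C++ helper that builds the textual form of the converted type. It is a sketch under stated assumptions: convertVectorType is a hypothetical name, the output merely imitates the notation used in the examples above, and the element type is assumed to need no conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the textual LLVM dialect type for an MLIR vector of the given shape.
// The innermost dimension becomes a one-dimensional vector and every other
// dimension wraps the type in an array, from the inside out.
std::string convertVectorType(const std::vector<std::int64_t> &shape,
                              const std::string &elementType) {
  assert(!shape.empty() && "a vector type has at least one dimension");
  std::string result =
      "vector<" + std::to_string(shape.back()) + " x " + elementType + ">";
  for (std::size_t i = shape.size() - 1; i-- > 0;)
    result = "array<" + std::to_string(shape[i]) + " x " + result + ">";
  // Only the outermost array carries the dialect prefix in this notation.
  if (shape.size() > 1)
    result = "!llvm." + result;
  return result;
}
```

Calling convertVectorType({4, 8}, "f32") reproduces the first example, and a one-dimensional shape is returned as a plain vector type.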
Tensor types cannot be converted to the LLVM dialect. Operations on tensors must be bufferized before being converted.
Calling conventions provide a mechanism to customize the conversion of function and function call operations without changing how individual types are handled elsewhere. They are implemented simultaneously by the default type converter and by the conversion patterns for the relevant operations.
In case of multi-result functions, the returned values are inserted into a structure-typed value before being returned and extracted from it at the call site. This transformation is a part of the conversion and is transparent to the definitions and uses of the values being returned.
Example:
func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1) : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  // use as before
  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
}
The default calling convention converts memref-typed function arguments to LLVM dialect literal structs defined above before unbundling them into individual scalar arguments.
This convention is implemented in the conversion of std.func and std.call to the LLVM dialect, with the former unpacking the descriptor into a set of individual values and the latter packing those values back into a descriptor so as to make it transparently usable by other operations. Conversions from other dialects should take this convention into account.
This specific convention is motivated by the necessity to specify alignment and aliasing attributes on the raw pointers underpinning the memref.
Examples:
func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr<f32>,  // Allocated pointer.
               %arg1: !llvm.ptr<f32>,  // Aligned pointer.
               %arg2: i64,             // Offset.
               %arg3: i64,             // Size in dim 0.
               %arg4: i64) {           // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5) : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
  llvm.return
}
For unranked memrefs, the list of function arguments always contains two elements, same as the unranked memref descriptor: an integer rank, and a type-erased (!llvm.ptr<i8>) pointer to the ranked memref descriptor. Note that while the calling convention does not require allocation, casting to unranked memref does since one cannot take an address of an SSA value containing the ranked memref, which must be stored in some memory instead. The caller is in charge of ensuring the thread safety and management of the allocated memory, in particular the deallocation.
Example
func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,               // Rank.
               %arg1: !llvm.ptr<i8>) {   // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>

  "use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
  llvm.return
}
func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
  llvm.return
}
Lifetime. The second element of the unranked memref descriptor points to some memory in which the ranked memref descriptor is stored. By convention, this memory is allocated on the stack and has the lifetime of the function. (Note: due to function-length lifetime, creation of multiple unranked memref descriptors, e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to be returned from a function, the ranked descriptor it points to is copied into dynamically allocated memory, and the pointer in the unranked descriptor is updated accordingly. The allocation happens immediately before returning. It is the responsibility of the caller to free the dynamically allocated memory. The default conversion of std.call and std.call_indirect copies the ranked descriptor to newly allocated memory on the caller's stack. Thus, the convention of the ranked memref descriptor pointed to by an unranked memref descriptor being stored on the stack is respected.
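The copy performed before returning an unranked descriptor can be pictured in C++ as follows. This is a sketch rather than the code the conversion emits: UnrankedDescriptor, rankedDescriptorSize and copyToHeap are hypothetical names, index is assumed to be 64 bits wide, and the ranked descriptor is assumed to have no padding between its fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical C++ mirror of the unranked memref descriptor.
struct UnrankedDescriptor {
  std::int64_t rank;  // Dynamic rank of the memref.
  void *descriptor;   // Type-erased pointer to the ranked descriptor.
};

// Size in bytes of a ranked descriptor of the given rank: two pointers, one
// offset, `rank` sizes and `rank` strides (64-bit index assumed).
std::size_t rankedDescriptorSize(std::int64_t rank) {
  return 2 * sizeof(void *) +
         (1 + 2 * static_cast<std::size_t>(rank)) * sizeof(std::int64_t);
}

// Copies the ranked descriptor into heap memory so that it can outlive the
// current stack frame. The caller owns the result and must free() it.
UnrankedDescriptor copyToHeap(const UnrankedDescriptor &src) {
  std::size_t bytes = rankedDescriptorSize(src.rank);
  void *heap = std::malloc(bytes);
  assert(heap != nullptr && "allocation failed");
  std::memcpy(heap, src.descriptor, bytes);
  return UnrankedDescriptor{src.rank, heap};
}
```

Only the descriptor is copied; the data buffer it refers to is shared between the original and the copy.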
The “bare pointer” calling convention converts memref-typed function arguments to a single pointer to the aligned data. Note that this does not apply to uses of memref outside of function signatures; the default descriptor structures are still used there. This convention further restricts the supported cases to the following.
- memref types with default layout.
- memref types with all dimensions statically known.
- memref values allocated in such a way that the allocated and aligned pointer match. Alternatively, the same function must handle allocation and deallocation since only one pointer is passed to any callee.

Examples:
func @callee(memref<2x4xf32>)

func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
}

// ->

!descriptor = !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr<f32>)

llvm.func @caller(%arg0: !llvm.ptr<f32>) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointer are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values.
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  // With the default layout, the strides of a 2x4 memref are 4 and 1.
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr<f32>) -> ()
}
The “bare pointer” calling convention does not support unranked memrefs as their shape cannot be known at compile time.
In practical cases, it may be desirable to have externally-facing functions with a single argument corresponding to a memref argument. When interfacing with LLVM IR produced from C, the code needs to respect the corresponding calling convention. The conversion to the LLVM dialect provides an option to generate wrapper functions that take memref descriptors as pointers-to-struct compatible with data types produced by Clang when compiling C sources. The generation of such wrapper functions can additionally be controlled at a function granularity by setting the llvm.emit_c_interface unit attribute.
More specifically, a memref argument is converted into a pointer-to-struct argument of type {T*, T*, i64, i64[N], i64[N]}* in the wrapper function, where T is the converted element type and N is the memref rank. This type is compatible with that produced by Clang for the following C++ structure template instantiations or their equivalents in C.
template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};
Furthermore, we also rewrite function results to pointer parameters if the rewritten function result has a struct type. The special result parameter is added as the first parameter and is of pointer-to-struct type.
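From the C++ side, calling such an interface function could look like the sketch below. The function _mlir_ciface_sum is hypothetical: in a real build it would be emitted for an MLIR function such as func @sum(memref<?xf32>) -> f32 and only declared in C++, whereas here a stand-in body is supplied so that the example is self-contained. The helper sumBuffer is likewise an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Descriptor layout as given in the text above.
template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Stand-in for an MLIR-generated interface function (hypothetical). It reads
// the memref through the descriptor exactly as generated code would: aligned
// pointer, offset, then index times stride.
extern "C" float _mlir_ciface_sum(MemRefDescriptor<float, 1> *arg) {
  float total = 0.0f;
  for (std::intptr_t i = 0; i < arg->sizes[0]; ++i)
    total += arg->aligned[arg->offset + i * arg->strides[0]];
  return total;
}

// Wraps a contiguous buffer in a descriptor and calls the interface function.
// The descriptor lives on the caller's stack for the duration of the call.
float sumBuffer(float *data, std::intptr_t size) {
  MemRefDescriptor<float, 1> desc{data, data, 0, {size}, {1}};
  return _mlir_ciface_sum(&desc);
}
```

Because the result here is a scalar rather than a struct, it is returned directly instead of through an extra leading pointer parameter.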
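If enabled, the option will do the following.

For external functions declared in the MLIR module:
- a new function _mlir_ciface_<original name> is declared where memref arguments are converted to pointer-to-struct and the remaining arguments are converted as usual. Results are converted to a special argument if they are of struct type;
- the original function receives a body that populates the memref descriptors, stores them in stack-allocated memory and passes the pointers to the newly declared interface function.

For (non-external) functions defined in the MLIR module:
- a new function _mlir_ciface_<original name> is defined where memref arguments are converted to pointer-to-struct and the remaining arguments are converted as usual. Results are converted to a special argument if they are of struct type;
- the body of the new function loads the descriptors from the pointers, unpacks them into individual values, passes these values to the original function and, if needed, stores the result through the special result argument.

Examples: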
func @qux(%arg0: memref<?x?xf32>)

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                  array<2xi64>, array<2xi64>)>>
  llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9)
     : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                          array<2xi64>, array<2xi64>)>>) -> ()

  // The stored descriptor will be freed on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                              array<2xi64>, array<2xi64>)>>)
func @foo(%arg0: memref<?x?xf32>) {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                             array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.memref_2d_ptr

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
       i64, i64) -> ()
  llvm.return
}
func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
  return %arg0 : memref<?x?xf32>
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                             array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
               %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
    -> !llvm.memref_2d {
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
  llvm.return %7 : !llvm.memref_2d
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) {
  %0 = llvm.load %arg1 : !llvm.memref_2d_ptr
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64) -> !llvm.memref_2d
  llvm.store %8, %arg0 : !llvm.memref_2d_ptr
  llvm.return
}
Rationale: Introducing auxiliary functions for C-compatible interfaces is preferred to modifying the calling convention since it will minimize the effect of C compatibility on intra-module calls or calls between MLIR-generated functions. In particular, when calling external functions from an MLIR module in a (parallel) loop, storing a memref descriptor on the stack can lead to stack exhaustion and/or concurrent access to the same address. The auxiliary interface function serves as an allocation scope in this case. Furthermore, when targeting accelerators with separate memory spaces such as GPUs, stack-allocated descriptors passed by pointer would have to be transferred to the device memory, which introduces significant overhead. In such situations, auxiliary interface functions are executed on the host and only pass the values through the device function invocation mechanism.
Accesses to a memref element are transformed into an access to an element of the buffer pointed to by the descriptor. The position of the element in the buffer is calculated by linearizing memref indices in row-major order (lexically first index is the slowest varying, similar to C, but accounting for strides). The computation of the linear address is emitted as arithmetic operations in the LLVM dialect. Strides are extracted from the memref descriptor.
Examples:
An access to a memref with indices:
%0 = memref.load %m[%1,%2,%3,%4] : memref<?x?x4x8xf32, offset: ?>
is transformed into the equivalent of the following code:
// Compute the linearized index from strides.
// When strides or, in absence of explicit strides, the corresponding sizes are
// dynamic, extract the stride value from the descriptor (%m below denotes the
// converted descriptor value).
%stride1 = llvm.extractvalue %m[4, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                      array<4xi64>, array<4xi64>)>
%addr1 = arith.muli %stride1, %1 : i64

// When the stride or, in absence of explicit strides, the trailing sizes are
// known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = arith.muli %stride2, %2 : i64
%addr3 = arith.addi %addr1, %addr2 : i64

%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = arith.muli %stride3, %3 : i64
%addr5 = arith.addi %addr3, %addr4 : i64

// Multiplication with the known unit stride can be omitted.
%addr6 = arith.addi %addr5, %4 : i64

// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue %m[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                  array<4xi64>, array<4xi64>)>
%addr7 = arith.addi %addr6, %offset : i64

// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue %m[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                   array<4xi64>, array<4xi64>)>

// Get the address of the data pointer.
%ptr = llvm.getelementptr %aligned[%addr7]
     : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>

// Perform the actual load.
%0 = llvm.load %ptr : !llvm.ptr<f32>
For stores, the address computation code is identical and only the actual store operation is different.
Note: the conversion does not perform any sort of common subexpression elimination when emitting memref accesses.
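The address computation amounts to offset plus the sum of index-times-stride products. A minimal C++ sketch of the same arithmetic follows; linearIndex and loadElement are hypothetical helper names and index is assumed to be a 64-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Computes offset + sum(indices[i] * strides[i]), i.e. the position of the
// element relative to the aligned pointer, in number of elements.
template <std::size_t N>
std::int64_t linearIndex(std::int64_t offset,
                         const std::int64_t (&strides)[N],
                         const std::int64_t (&indices)[N]) {
  std::int64_t linear = offset;
  for (std::size_t i = 0; i < N; ++i)
    linear += indices[i] * strides[i];
  return linear;
}

// Loads the addressed element. As in the converted IR, the access goes
// through the aligned pointer, never the allocated one.
template <typename T, std::size_t N>
T loadElement(const T *aligned, std::int64_t offset,
              const std::int64_t (&strides)[N],
              const std::int64_t (&indices)[N]) {
  return aligned[linearIndex(offset, strides, indices)];
}
```

For a memref of shape 2x3x4x8 with natural strides {96, 32, 8, 1}, the indices {1, 2, 3, 4} and an offset of 5 give 96 + 64 + 24 + 4 + 5 = 193. A store would reuse linearIndex unchanged and differ only in the final memory operation.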
Utility classes common to many conversions to the LLVM dialect can be found under lib/Conversion/LLVMCommon. They include the following.
- LLVMConversionTarget specifies all LLVM dialect operations as legal.
- LLVMTypeConverter implements the default type conversion as described above.
- ConvertOpToLLVMPattern extends the conversion pattern class with LLVM dialect-specific functionality.
- VectorConvertOpToLLVMPattern extends the previous class to automatically unroll operations on higher-dimensional vectors into lists of operations on one-dimensional vectors.
- StructBuilder provides a convenient API for building IR that creates or accesses values of LLVM dialect structure types; it is the base class of MemRefDescriptor, UnrankedMemRefDescriptor and ComplexBuilder for the built-in types convertible to LLVM dialect structure types.

MLIR modules containing llvm.func, llvm.mlir.global and llvm.metadata operations can be translated to LLVM IR modules using the following scheme.
The translation mechanism provides extension hooks for translating custom operations to LLVM IR via a dialect interface LLVMTranslationDialectInterface:
- convertOperation translates an operation that belongs to the current dialect to LLVM IR given an IRBuilderBase and various mappings;
- amendOperation performs additional actions on an operation if it contains a dialect attribute that belongs to the current dialect, for example sets up instruction-level metadata.

Dialects containing operations or attributes that want to be translated to LLVM IR must provide an implementation of this interface and register it with the system. Note that registration may happen without creating the dialect, for example, in a separate library to avoid the need for the “main” dialect library to depend on LLVM IR libraries. The implementations of these methods may use the ModuleTranslation object provided to them, which holds the state of the translation and contains numerous utilities.
Note that this extension mechanism is intentionally restrictive. LLVM IR has a small, relatively stable set of instructions and types that MLIR intends to model fully. Therefore, the extension mechanism is provided only for LLVM IR constructs that are more often extended -- intrinsics and metadata. The primary goal of the extension mechanism is to support sets of intrinsics, for example those representing a particular instruction set. The extension mechanism does not allow for customizing type or block translation, nor does it support custom module-level operations. Such transformations should be performed within MLIR and target the corresponding MLIR constructs.
An experimental flow allows one to import a substantially limited subset of LLVM IR into MLIR, producing LLVM dialect operations.
mlir-translate -import-llvm filename.ll