Introduction to Declare Target

In OpenMP, declare target is a directive that can be applied to a function or variable (primarily global) to indicate to the compiler that it should be generated in a particular device's environment. In essence, it determines whether something should be emitted for the host, the device, or both. Examples of its usage for both data and functions can be seen below.

```fortran
module test_0
    integer :: sp = 0
!$omp declare target link(sp)
end module test_0

program main
    use test_0
!$omp target map(tofrom:sp)
    sp = 1
!$omp end target
end program
```

In the above example, we create a variable in a separate module, mark it as declare target with the link clause, and then map it, embedding it into the device IR and assigning to it inside the target region.

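```fortran
function func_t_device() result(i)
    !$omp declare target to(func_t_device) device_type(nohost)
    integer :: i
    i = 1
end function func_t_device

program main
    implicit none
    integer, external :: func_t_device  ! a function, so it is referenced in an expression, not CALLed
    integer :: r
!$omp target map(from:r)
    r = func_t_device()
!$omp end target
end program
```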

In the above example, we use declare target to state that the function is required on the device and that we will not be utilising it on the host, so we are, in theory, free to remove or ignore it there. A user could also, in this case, leave off the declare target from the function and it would be implicitly marked declare target any (for both host and device), as it has been utilised within a target region.

Declare Target as represented in the OpenMP Dialect

In the OpenMP dialect, declare target is not represented by a specific operation. Instead, it is an OpenMP dialect specific attribute that can be applied to any operation in any dialect, which simplifies its use. Rather than replacing or modifying existing global or function operations in a dialect, it is attached to them as extra metadata that the lowering can use in different ways as necessary.

The attribute is composed of multiple fields representing the clauses you would find on the declare target directive, e.g. the device type (nohost, any, host) or the capture clause (link or to). A small example of declare target applied to a Fortran real can be found below:

```mlir
fir.global internal @_QFEi {omp.declare_target =
    #omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
  %0 = fir.undefined f32
  fir.has_value %0 : f32
}
```

This would look similar for function style operations.

The application and access of this attribute is aided by an OpenMP dialect MLIR interface named DeclareTargetInterface. Operations can be cast to this interface to access the appropriate interface functions, e.g.:

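```cpp
// op: any operation that may carry the declare target attribute.
if (auto declareTargetGlobal =
        llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(op.getOperation()))
  if (declareTargetGlobal.isDeclareTarget()) {
    // Safe to query the attribute here, e.g. getDeclareTargetDeviceType().
  }
```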

Declare Target Fortran OpenMP Lowering

The initial lowering of declare target to MLIR for both use cases is done within the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp. However, some declare target related functions are also called directly from Flang's lowering bridge in flang/lib/Lower/Bridge.cpp.

The marking of operations with the declare target attribute happens in two phases. The second is optional and only needed when the first cannot mark an operation. The initial phase happens when the declare target directive and its clauses are first processed, with the primary data gathering for the directive and its clauses happening in a function called getDeclareTargetInfo. This data then feeds the markDeclareTarget function, which does the actual marking via the DeclareTargetInterface. If it encounters a variable or function that has been marked by multiple directives with two differing device types (e.g. host and nohost), it will change the device type to any.
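The merge rule applied by markDeclareTarget can be illustrated with a minimal C++ sketch. It models operations and the attribute with hypothetical plain structs (Operation, DeclareTargetAttr and the DeviceType/CaptureClause enums are stand-ins, not the dialect's real types) and only captures the behaviour described above: a first marking is applied as-is, and a later marking with a different device type widens it to any.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the dialect's device type and capture clause
// enums; the real definitions live in the OpenMP dialect.
enum class DeviceType { Host, NoHost, Any };
enum class CaptureClause { To, Link };

struct DeclareTargetAttr {
  DeviceType deviceType;
  CaptureClause captureClause;
};

// Hypothetical operation that may carry a declare target attribute.
struct Operation {
  std::optional<DeclareTargetAttr> declareTarget;
};

// Applies a declare target marking, merging with any earlier marking.
void markDeclareTarget(Operation &op, DeviceType deviceType,
                       CaptureClause captureClause) {
  if (op.declareTarget) {
    // Marked by an earlier directive: conflicting device types widen to Any;
    // an identical device type leaves the existing marking untouched.
    if (op.declareTarget->deviceType != deviceType)
      op.declareTarget = DeclareTargetAttr{DeviceType::Any, captureClause};
    return;
  }
  op.declareTarget = DeclareTargetAttr{deviceType, captureClause};
}

// Usage: one directive requests host, another nohost, for the same symbol.
void exampleConflictingMarks() {
  Operation op;
  markDeclareTarget(op, DeviceType::Host, CaptureClause::To);
  markDeclareTarget(op, DeviceType::NoHost, CaptureClause::To);
  assert(op.declareTarget->deviceType == DeviceType::Any);
}
```

Keeping the decision in one helper means every caller, including the deferred second phase, gets the same widening behaviour.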

Whenever we invoke genFIR on an OpenMPDeclarativeConstruct from the lowering bridge, we also invoke another function called gatherOpenMPDeferredDeclareTargets, which gathers information relevant to the application of the declare target attribute. This information includes the symbol it should be applied to, the device type clause and the capture clause, and it is stored in a vector that is part of the lowering bridge's instantiation of the AbstractConverter. It is only stored if we encounter a function or variable symbol that does not have an operation instantiated for it yet. This cannot happen as part of the initial marking, as the data must be stored in the lowering bridge, while the OpenMP lowering only has access to the abstract version of the converter.

The information produced by the first phase is used in the second phase, a form of deferred processing for declare target marked operations whose generation is delayed and which therefore cannot be processed in the first phase. The main case where this currently occurs is when a Fortran function interface has been marked. This is handled by the function markOpenMPDeferredDeclareTargetFunctions, which is called from the lowering bridge at the end of the lowering process, allowing us to mark those operations where possible. It iterates over the data previously gathered by gatherOpenMPDeferredDeclareTargets, checks whether any of the recorded symbols have now had their corresponding operations instantiated, and applies the declare target attribute where possible using markDeclareTarget. However, it is still possible for operations not to be generated for certain symbols, in particular function interfaces that are not directly used or defined within the current module. This means we cannot emit errors for left-over unmarked symbols. These must (and should) be caught by the initial semantic analysis.
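The two phases can be sketched in C++ under simplifying assumptions: the module is a hypothetical map from symbol name to an optional device type, a symbol appears in it only once its operation exists, and the any-widening merge rule is omitted. processDirective and markDeferred are illustrative names standing in for the first-phase marking and markOpenMPDeferredDeclareTargetFunctions respectively; they are not Flang's actual functions.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };

// Hypothetical record of a declare target request whose operation did not
// exist yet when the directive was lowered.
struct DeferredDeclareTarget {
  std::string symbol;
  DeviceType deviceType;
};

// Hypothetical module: symbol name -> declare target marking (if any).
// A symbol is present only once its operation has been generated.
using Module = std::map<std::string, std::optional<DeviceType>>;

// Phase one: mark the operation if it exists, otherwise record the request.
void processDirective(Module &mod, const std::string &symbol,
                      DeviceType deviceType,
                      std::vector<DeferredDeclareTarget> &deferred) {
  auto it = mod.find(symbol);
  if (it == mod.end()) {
    deferred.push_back({symbol, deviceType});
    return;
  }
  it->second = deviceType;
}

// Phase two, at the end of lowering: mark whatever has since been generated.
// Symbols still missing (e.g. unused interfaces) are skipped, not reported.
void markDeferred(Module &mod,
                  const std::vector<DeferredDeclareTarget> &deferred) {
  for (const auto &request : deferred) {
    auto it = mod.find(request.symbol);
    if (it != mod.end())
      it->second = request.deviceType;
  }
}

// Usage: an interface is marked before its function is generated.
void exampleDeferredMarking() {
  Module mod;
  mod["sub_a"] = std::nullopt; // operation already generated
  std::vector<DeferredDeclareTarget> deferred;
  processDirective(mod, "sub_a", DeviceType::Any, deferred);
  processDirective(mod, "iface_b", DeviceType::NoHost, deferred);
  assert(mod["sub_a"] == DeviceType::Any);
  assert(deferred.size() == 1);
  mod["iface_b"] = std::nullopt; // operation generated later in lowering
  markDeferred(mod, deferred);
  assert(mod["iface_b"] == DeviceType::NoHost);
}
```

markDeferred silently skips symbols that never received an operation, mirroring the point that left-over unmarked symbols cannot be diagnosed at this stage.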

NOTE: declare target can be applied to variables with an implicit SAVE attribute. However, by default Flang does not represent these as GlobalOps, which means we cannot tag and lower them as declare target normally. Instead, similarly to the way threadprivate handles these cases, we raise and initialize the variable as an internal GlobalOp and apply the attribute. This occurs in the function genDeclareTargetIntGlobal in flang/lib/Lower/OpenMP.cpp.

Declare Target Transformation Passes for Flang

There are currently two passes within Flang that are related to the processing of declare target:

  • OMPMarkDeclareTarget - This pass is in charge of marking functions called from target regions, or from other declare target marked functions, as declare target. It does so recursively, i.e. nested calls will also be implicitly marked. It currently tries to mark things as conservatively as possible, e.g. if a function is called from a target region it will apply nohost, unless the function is already marked declare target host, in which case it will apply the any device type. Functions called from declare target functions are handled similarly, except that the caller's device type is used where possible.
  • OMPFunctionFiltering - This is executed after the OMPMarkDeclareTarget pass, and its job is to conservatively remove host functions from the module where possible when compiling for the device. This helps ensure that most host code that is incompatible with the device is not lowered for the device. Host functions containing target regions must be preserved (e.g. so the target regions inside them can be lowered); these are marked with a declare target host attribute so that they will be removed after the target regions they contain have been outlined. Otherwise, it removes any function marked as a declare target host function, and any uses are replaced with undefs so that the remaining host code doesn't become broken.
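The recursive marking performed by OMPMarkDeclareTarget can be approximated with a small C++ sketch over a hypothetical call graph of function names, in which a target region is modelled as its own pseudo-caller node. markNested is an illustrative name, and the real pass walks MLIR call operations rather than strings. Functions reached from a target region receive nohost, and a function that already carries a different device type is widened to any. Walking from an explicitly marked declare target function instead would pass that function's own device type as parentType.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };

// Hypothetical call graph: caller -> callees. A target region is modelled as
// its own pseudo-caller node.
using CallGraph = std::map<std::string, std::vector<std::string>>;
using Marks = std::map<std::string, std::optional<DeviceType>>;

// Recursively marks every function reachable from `caller` with the parent's
// device type, widening to Any on conflict with an existing marking.
void markNested(const CallGraph &graph, Marks &marks, const std::string &caller,
                DeviceType parentType, std::set<std::string> &visited) {
  if (!visited.insert(caller).second)
    return; // already walked; also guards against recursive call cycles
  auto it = graph.find(caller);
  if (it == graph.end())
    return;
  for (const std::string &callee : it->second) {
    std::optional<DeviceType> &mark = marks[callee];
    if (!mark)
      mark = parentType;
    else if (*mark != parentType)
      mark = DeviceType::Any;
    markNested(graph, marks, callee, parentType, visited);
  }
}

// Usage: `helper` is explicitly marked host but is also reached (through
// `kernel_fn`) from a target region.
void exampleImplicitCapture() {
  CallGraph graph{{"target_region", {"kernel_fn"}}, {"kernel_fn", {"helper"}}};
  Marks marks{{"helper", DeviceType::Host}};
  std::set<std::string> visited;
  markNested(graph, marks, "target_region", DeviceType::NoHost, visited);
  assert(marks["kernel_fn"] == DeviceType::NoHost);
  assert(marks["helper"] == DeviceType::Any);
}
```

The visited set keeps the walk finite when functions call each other recursively.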

While this infrastructure could be generally applicable to more than just Flang, it is only utilised in the Flang frontend, so it resides there rather than in the OpenMP dialect codebase.

Declare Target OpenMP Dialect To LLVM-IR Lowering

The OpenMP dialect lowering of declare target is done through the amendOperation flow, as declare target is an attribute rather than an operation. This is triggered immediately after the corresponding operation has been lowered to LLVM-IR. As the attribute is applicable to different types of operations, we must specialise this function for each operation type that we may encounter. Currently, these are GlobalOps and FuncOps.

FuncOp processing is fairly simple. When compiling for the device, host marked functions are removed, including those that could not be removed earlier because they contained target directives. This leaves any, nohost (device) and indeterminable functions in the module to be lowered further. When compiling for the host, no filtering is done, because nohost functions must be available as a fallback implementation.
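A minimal C++ sketch of how the two device-side filtering steps fit together, assuming a hypothetical Func summary record rather than real FuncOps: the early OMPFunctionFiltering-like step drops host-marked functions unless they still contain target regions, and the later FuncOp handling in amendOperation drops the remaining host-marked ones on the device and nothing on the host. As a simplifying assumption, functions without any declare target marking are left untouched, and the replacement of uses with undef is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };

// Hypothetical summary of a function in the module.
struct Func {
  std::string name;
  std::optional<DeviceType> mark; // declare target device type, if any
  bool hasTargetRegions;
};

bool isHostMarked(const Func &f) {
  return f.mark && *f.mark == DeviceType::Host;
}

// Device compilation, early (OMPFunctionFiltering-like): drop host-marked
// functions unless they still contain target regions to outline.
void filterFunctions(std::vector<Func> &funcs) {
  funcs.erase(std::remove_if(funcs.begin(), funcs.end(),
                             [](const Func &f) {
                               return isHostMarked(f) && !f.hasTargetRegions;
                             }),
              funcs.end());
}

// LLVM-IR lowering of FuncOps: on the device, every remaining host-marked
// function is dropped now that its target regions have been outlined; on the
// host nothing is removed, so nohost functions remain as a fallback.
void lowerFunctions(std::vector<Func> &funcs, bool compilingForDevice) {
  if (!compilingForDevice)
    return;
  funcs.erase(std::remove_if(funcs.begin(), funcs.end(), isHostMarked),
              funcs.end());
}

// Usage: the device pipeline applied to four differently marked functions.
void exampleDeviceFiltering() {
  std::vector<Func> funcs{{"host_only", DeviceType::Host, false},
                          {"host_with_target", DeviceType::Host, true},
                          {"device_only", DeviceType::NoHost, false},
                          {"both", DeviceType::Any, false}};
  filterFunctions(funcs);
  assert(funcs.size() == 3); // host_with_target survives for outlining
  lowerFunctions(funcs, /*compilingForDevice=*/true);
  assert(funcs.size() == 2); // only device_only and both remain
}
```

Splitting the removal this way is what lets host functions with target regions survive long enough for those regions to be outlined.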

For GlobalOps, the processing is a little more complex. We currently leverage the registerTargetGlobalVariable and getAddrOfDeclareTargetVar OMPIRBuilder functions shared with Clang. These two functions invoke each other depending on the clauses and options provided to the OMPIRBuilder (in particular, unified shared memory). Their main purposes are generating a new global device pointer with a "ref_" prefix on the device, and enqueuing metadata to be produced by the OMPIRBuilder at module finalization time. This is done for both host and device, and it links the newly generated device global pointer and the host pointer together across the two modules.

Similarly to other metadata (e.g. for TargetOp) that is shared across both host and device modules, processing of GlobalOps on the device needs access to the previously generated host IR file, which is provided through another attribute applied to the ModuleOp by the compiler frontend. The file is loaded and consumed by the OMPIRBuilder to populate its OffloadInfoManager data structures, keeping host and device appropriately synchronised.

A second, and more important, point to remember is that, as we effectively replace the original LLVM-IR generated for the declare target marked GlobalOp, some corrections are needed for TargetOps (or other region operations that use the global directly) which still refer to the original lowered global. This is done via handleDeclareTargetMapVar, which is invoked as the final alteration to the lowered target region. It is only invoked for the device, as it is only required when we have emitted the "ref" pointer, and it effectively replaces all uses of the originally lowered global symbol with our new global ref pointer's symbol. Currently we do not remove or delete the old symbol, because the same symbol can be utilised across multiple target regions; if we removed it, we would risk breaking the lowering of target regions that are processed later. To appropriately delete these no longer necessary symbols we would need a deferred removal process at the end of the module, which is currently not in place. It may be possible to store this information in the OMPIRBuilder and perform the cleanup on finalization, but this is still open for discussion and implementation.
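The use replacement performed by handleDeclareTargetMapVar can be sketched in C++ on a hypothetical, heavily simplified IR in which instructions list the symbol names they use (the real translation operates on LLVM values). redirectDeclareTargetUses is an illustrative name, and the ref_sp naming simply follows the "ref_" prefix described for the device pointer. The sketch rewrites uses only inside the kernel currently being lowered and deliberately leaves the original global in the module, which is why a second kernel that has not been lowered yet still refers to it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified IR: an instruction lists the symbols it uses.
struct Instruction {
  std::string opcode;
  std::vector<std::string> operands;
};

struct Module {
  std::set<std::string> globals;
  std::map<std::string, std::vector<Instruction>> functions; // name -> body
};

// Called once a target region has been lowered into `kernel`: rewrite that
// kernel's uses of `global` to the device reference pointer `refPtr`. Other
// functions are not touched and `global` is intentionally not erased, since
// target regions lowered later may still refer to it.
void redirectDeclareTargetUses(Module &mod, const std::string &kernel,
                               const std::string &global,
                               const std::string &refPtr) {
  assert(mod.globals.count(refPtr) && "ref pointer must already exist");
  for (Instruction &inst : mod.functions.at(kernel))
    for (std::string &operand : inst.operands)
      if (operand == global)
        operand = refPtr;
}

// Usage: two kernels use `sp`, but only the first has been lowered so far.
void exampleTwoKernels() {
  Module mod;
  mod.globals.insert("sp");
  mod.globals.insert("ref_sp"); // created earlier, during global lowering
  mod.functions["kernel_a"].push_back(Instruction{"store", {"1", "sp"}});
  mod.functions["kernel_b"].push_back(Instruction{"load", {"sp"}});
  redirectDeclareTargetUses(mod, "kernel_a", "sp", "ref_sp");
  assert(mod.functions["kernel_a"][0].operands[1] == "ref_sp");
  assert(mod.functions["kernel_b"][0].operands[0] == "sp"); // not lowered yet
  assert(mod.globals.count("sp") == 1); // original global is kept
}
```

A deferred cleanup at module end, as discussed above, would be the point where globals like `sp` with no remaining uses could finally be erased.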

Current Support

For the moment, declare target should work for:

  • Marking functions/subroutines and function/subroutine interfaces for generation on host, device or both.
  • Implicit function/subroutine capture for calls emitted in a target region or in an explicitly marked declare target function/subroutine. Note: functions invoked via arguments passed to other functions must still be explicitly marked declare target themselves. For example, when passing a C function pointer and invoking it, both the interface and the C function in the other module must be marked declare target, with the same type of marking, as indicated by the specification.
  • Marking global variables with declare target's link clause and mapping the data to the device data environment utilising declare target. This may not work for all types yet, but for scalars and arrays of scalars, it should.

Doesn't work for, or needs further testing for:

  • Marking the following types with declare target link (needs further testing):
    • Descriptor based types, e.g. pointers/allocatables.
    • Derived types.
    • Members of derived types (use-case needs legality checking with OpenMP specification).
  • Marking global variables with declare target's to clause. Much of the lowering should already exist, but it needs further testing and likely some further changes to fully function.