| <!--===- docs/Directives.md |
| |
| Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| See https://llvm.org/LICENSE.txt for license information. |
| SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| |
| --> |
| |
| # Compiler directives supported by Flang |
| |
| A list of non-standard directives supported by Flang |
| |
| * `!dir$ fixed` and `!dir$ free` select Fortran source forms. Their effect |
| persists to the end of the current source file. |
| * `!dir$ ignore_tkr [[(TKRDMACP)] dummy-arg-name]...` in an interface definition |
| disables some semantic checks at call sites for the actual arguments that |
| correspond to some named dummy arguments (or all of them, by default). The |
| directive allow actual arguments that would otherwise be diagnosed as |
| incompatible in type (T), kind (K), rank (R), CUDA device (D), or managed/ |
| unified (M) status. The letter (A) is a shorthand for (TKRDM), and is the |
| default when no letters appear. The letter (C) checks for contiguity, for |
| example allowing |
| an element of an assumed-shape array to be passed as a dummy argument. When |
| the dummy argument is passed by descriptor, (C) specifies that the descriptor |
| should not be copied or reboxed, allowing the original descriptor to be passed |
| directly even if attributes like ALLOCATABLE or POINTER don't match exactly. |
| When the dummy argument is not passed by descriptor (e.g., an assumed-size |
| array in a BIND(C) interface), the base address is extracted from the actual |
| argument's descriptor and passed as a raw pointer. |
| The letter (P) ignores pointer and allocatable matching, so that one can pass |
| an allocatable array to routine with pointer array argument and vice versa. |
| The letter (M) disables matching of the actual argument's CUDA storage |
| (managed/unified) against the dummy's. Its main use is in host modules that |
| overload the same routine with both a host-typed and a `device`-typed |
| specific: placing (M) on the device-typed dummy turns that specific into an |
| overload discriminator. Under `-gpu=mem:unified` or `-gpu=mem:managed`, an |
| unattributed host actual is normally allowed to bind to a `device` dummy |
| (the host-to-device attribute check is relaxed). (M) on that dummy opts it |
| out of the relaxation: an unattributed host actual then binds to the |
| host-typed specific in the same overload set, while actuals with an |
| explicit `device`, `managed`, or `unified` attribute continue to bind to |
| the device-typed specific. For example: |
| ``` |
| interface compute |
| module procedure compute_host |
| module procedure compute_device |
| end interface |
| contains |
| subroutine compute_host(alpha) |
| real :: alpha |
| end |
| subroutine compute_device(alpha) |
| real, device :: alpha |
| !dir$ ignore_tkr(m) alpha |
| end |
| ! ... |
| real :: a ! plain host scalar |
| real, device :: d ! device scalar |
| call compute(a) ! always binds to compute_host |
| call compute(d) ! always binds to compute_device |
| ``` |
| For contrast: without `ignore_tkr(m)` on `compute_device`, |
| `call compute(a)` compiled with `-gpu=mem:unified` would instead resolve |
| to `compute_device`, because the matching rules let `a` bind to the |
| device dummy and rank it as a closer match than the host one (see the |
| "Attributed Argument Matching Distance Values" table in section 3.2.3 |
| of the CUDA Fortran Programming Guide). |
| For example, if one wanted to call a "set all bytes to zero" utility that |
| could be applied to arrays of any type or rank: |
| ``` |
| interface |
| subroutine clear(arr,bytes) |
| !dir$ ignore_tkr arr |
| integer(1), intent(out) :: arr(bytes) |
| end |
| end interface |
| ``` |
| Note that it's not allowed to pass array actual argument to `ignore_trk(R)` |
| dummy argument that is a scalar with `VALUE` attribute, for example: |
| ``` |
| interface |
| subroutine s(b) |
| !dir$ ignore_tkr(r) b |
| integer, value :: b |
| end |
| end interface |
| integer :: a(5) |
| call s(a) |
| ``` |
| The reason for this limitation is that scalars with `VALUE` attribute can |
| be passed in registers, so it's not clear how lowering should handle this |
| case. (Passing scalar actual argument to `ignore_tkr(R)` dummy argument |
| that is a scalar with `VALUE` attribute is allowed.) |
| * `!dir$ ivdep` asserts that there are no vector dependencies in the following loop, |
| allowing the compiler to vectorize or parallelize the loop if it chooses to do so |
| based on its cost model. It does not force vectorization. |
| * `!dir$ assume_aligned desginator:alignment`, where designator is a variable, |
| maybe with array indices, and alignment is what the compiler should assume the |
| alignment to be. E.g A:64 or B(1,1,1):128. The alignment should be a power of 2, |
| and is limited to 256. |
| [This directive is currently recognised by the parser, but not |
| handled by the other parts of the compiler]. |
| * `!dir$ vector always` forces vectorization on the following loop regardless |
| of cost model decisions. The loop must still be vectorizable. |
| [This directive currently only works on plain do loops without labels]. |
| * `!dir$ simd` works the same as `vector always` above, but provides an alternative |
| spelling and support for projects which would have used the classic-flang frontend |
| previously. |
| * `!dir$ vector vectorlength({fixed|scalable|<num>|<num>,fixed|<num>,scalable})` |
| specifies a hint to the compiler about the desired vectorization factor. If |
| `fixed` is used, the compiler should prefer fixed-width vectorization. |
| Scalable vectorization instructions may still be used with a fixed-width |
| predicate. If `scalable` is used the compiler should prefer scalable |
| vectorization, though it can choose to use fixed length vectorization or not |
| at all. `<num>` means that the compiler should consider using this specific |
| vectorization factor, which should be an integer literal. This directive |
| currently has the same limitations as `!dir$ vector always`. |
| * `!dir$ unroll [n]` specifies that the compiler ought to unroll the immediately |
| following loop `n` times. When `n` is `0` or `1`, the loop should not be unrolled |
| at all. When `n` is `2` or greater, the loop should be unrolled exactly `n` |
| times if possible. When `n` is omitted, the compiler should attempt to fully |
| unroll the loop. Some compilers accept an optional `=` before the `n` when `n` |
| is present in the directive. Flang does not. |
| * `!dir$ unroll_and_jam [N]` control how many times a loop should be unrolled and |
| jammed. It must be placed immediately before a loop that follows. `N` is an optional |
| integer that specifying the unrolling factor. When `N` is `0` or `1`, the loop |
| should not be unrolled at all. If `N` is omitted the optimizer will |
| selects the number of times to unroll the loop. |
| * `!dir$ prefetch designator[, designator]...`, where the designator list can be |
| a variable or an array reference. This directive is used to insert a hint to |
| the code generator to prefetch instructions for memory references. |
| * `!dir$ novector` disabling vectorization on the following loop. |
| * `!dir$ nounroll` disabling unrolling on the following loop. |
| * `!dir$ nounroll_and_jam` disabling unrolling and jamming on the following loop. |
| * `!dir$ inline` instructs the compiler to attempt to inline the called routines if the |
| directive is specified before a call statement or all call statements within the loop |
| body if specified before a DO LOOP or all function references if specified before an |
| assignment statement. |
| * `!dir$ forceinline` works in the same way as the `inline` directive, but it forces |
| inlining by the compiler on a function call statement. |
| * `!dir$ inlinealways <name>`. An alternative spelling to `forceinline`, providing compatibility |
| with older Fortran compilers, such as classic-flang. It can be specified at the callsite, or |
| in the function or subroutine you want to inline. `name` is optional and should only be used |
| when specifying the directive within a function, example: |
| ``` |
| function test |
| !DIR$ INLINEALWAYS test |
| ... |
| end function |
| ``` |
| * `!dir$ noinline` works in the same way as the `inline` directive, but prevents |
| any attempt of inlining by the compiler on a function call statement. |
| |
| # Directive Details |
| |
| ## Introduction |
| Directives are commonly used in Fortran programs to specify additional actions |
| to be performed by the compiler. The directives are always specified with the |
| `!dir$` or `cdir$` prefix. |
| |
| ## Loop Directives |
| |
| Some directives are associated with the following construct, for example loop |
| directives. Directives on loops are used to specify additional transformation to |
| be performed by the compiler like enabling vectorisation, unrolling, interchange |
| etc. |
| |
| Currently loop directives are not accepted in the presence of OpenMP or OpenACC |
| constructs on the loop. This should be implemented as it is used in some |
| applications. |
| |
| ### Array Expressions |
| It is to be decided whether loop directives should also be able to be associated |
| with array expressions. |
| |
| ## Semantics |
| Directives that are associated with constructs must appear in the same section |
| as the construct they are associated with, for example loop directives must |
| appear in the executable section as the loops appear there. To facilitate this |
| the parse tree is corrected to move such directives that appear in the |
| specification part into the execution part. |
| |
| When a directive that must be associated with a construct appears, a search |
| forward from that directive to the next non-directive construct is performed to |
| check that that construct matches the expected construct for the directive. |
| Skipping other intermediate directives allows multiple directives to appear on |
| the same construct. |
| |
| ## Lowering |
| Evaluation is extended with a new field called dirs for representing directives |
| associated with that Evaluation. When lowering loop directives, the associated |
| Do Loop's evaluation is found and the directive is added to it. This information |
| is used only during the lowering of the loop. |
| |
| ### Representation in LLVM |
| The `llvm.loop` metadata is used in LLVM to provide information to the optimizer |
| about the loop. For example, the `llvm.loop.vectorize.enable` metadata informs |
| the optimizer that a loop can be vectorized without considering its cost-model. |
| This attribute is added to the loop condition branch. |
| |
| ### Representation in MLIR |
| The MLIR LLVM dialect models this by an attribute called LoopAnnotation |
| Attribute. The attribute can be added to the latch of the loop in the cf |
| dialect and is then carried through lowering to the LLVM dialect. |
| |
| ## Testing |
| Since directives must maintain a flow from source to LLVM IR, an integration |
| test is provided that tests the `vector always` directive, as well as individual |
| lit tests for each of the parsing, semantics and lowering stages. |