| <!--===- docs/ArrayComposition.md |
| |
| Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| See https://llvm.org/LICENSE.txt for license information. |
| SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| |
| --> |
| |
| # Array Composition |
| |
| ```{contents} |
| --- |
| local: |
| --- |
| ``` |
| |
| This note attempts to describe the motivation for and design of an |
| implementation of Fortran 90 (and later) array expression evaluation that |
| minimizes the use of dynamically allocated temporary storage for |
| the results of calls to transformational intrinsic functions, and |
| making them more amenable to acceleration. |
| |
| The transformational intrinsic functions of Fortran of interest to |
| us here include: |
| |
| * Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`, |
| `DOT_PRODUCT`, |
| `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`) |
| * Axial reductions (`SUM(X,DIM=)`, &c.) |
| * Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`) |
| * Axial location reductions (`MAXLOC(DIM=`, &c.) |
| * `TRANSPOSE(M)` matrix transposition |
| * `RESHAPE` without `ORDER=` |
| * `RESHAPE` with `ORDER=` |
| * `CSHIFT` and `EOSHIFT` with scalar `SHIFT=` |
| * `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=` |
| * `PACK` and `UNPACK` |
| * `MATMUL` |
| * `SPREAD` |
| |
| Other Fortran intrinsic functions are technically transformational (e.g., |
| `COMMAND_ARGUMENT_COUNT`) but not of interest for this note. |
| The generic `REDUCE` is also not considered here. |
| |
| ## Arrays as functions |
| |
| A whole array can be viewed as a function that maps its indices to the values |
| of its elements. |
| Specifically, it is a map from a tuple of integers to its element type. |
| The rank of the array is the number of elements in that tuple, |
| and the shape of the array delimits the domain of the map. |
| |
| `REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers |
| `(J,K)` with `1<=J<=N` and `1<=J<=M` to real values. |
| |
| ## Array expressions as functions |
| |
| The same perspective can be taken of an array expression comprising |
| intrinsic operators and elemental functions. |
| Fortran doesn't allow one to apply subscripts directly to an expression, |
| but expressions have rank and shape, and one can view array expressions |
| as functions over index tuples by applying those indices to the arrays |
| and subexpressions in the expression. |
| |
| Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`). |
| The right-hand side of that assignment could be evaluated into a |
| temporary array `T` and then subscripted as it is copied into `B`. |
| ``` |
| REAL, ALLOCATABLE :: T(:,:) |
| ALLOCATE(T(N,M)) |
| DO CONCURRENT(J=1:N,K=1:M) |
| T(J,K)=A(J,K) + 1.0 |
| END DO |
| DO CONCURRENT(J=1:N,K=1:M) |
| B(J,K)=T(J,K) |
| END DO |
| DEALLOCATE(T) |
| ``` |
| But we can avoid the allocation, population, and deallocation of |
| the temporary by treating the right-hand side expression as if it |
| were a statement function `F(J,K)=A(J,K)+1.0` and evaluating |
| ``` |
| DO CONCURRENT(J=1:N,K=1:M) |
| A(J,K)=F(J,K) |
| END DO |
| ``` |
| |
| In general, when a Fortran array assignment to a non-allocatable array |
| does not include the left-hand |
| side variable as an operand of the right-hand side expression, and any |
| function calls on the right-hand side are elemental or scalar-valued, |
| we can avoid the use of a temporary. |
| |
| ## Transformational intrinsic functions as function composition |
| |
| Many of the transformational intrinsic functions listed above |
| can, when their array arguments are viewed as functions over their |
| index tuples, be seen as compositions of those functions with |
| functions of the "incoming" indices -- yielding a function for |
| an entire right-hand side of an array assignment statement. |
| |
| For example, the application of `TRANSPOSE(A + 1.0)` to the index |
| tuple `(J,K)` becomes `A(K,J) + 1.0`. |
| |
| Partial (axial) reductions can be similarly composed. |
| The application of `SUM(A,DIM=2)` to the index `J` is the |
| complete reduction `SUM(A(J,:))`. |
| |
| More completely: |
| * Reductions to scalars (`SUM(X)` without `DIM=`) become |
| runtime calls; the result needs no dynamic allocation, |
| being a scalar. |
| * Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)` |
| become scalar values like `SUM(X(J,K,:))` if `d=3`. |
| * Location reductions to indices (`MAXLOC(X)` without `DIM=`) |
| do not require dynamic allocation, since their results are |
| either scalar or small vectors of length `RANK(X)`. |
| * Axial location reductions (`MAXLOC(X,DIM=)`, &c.) |
| are handled like other axial reductions like `SUM(DIM=)`. |
| * `TRANSPOSE(M)` exchanges the two components of the index tuple. |
| * `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape |
| vector `S`, and then use it to linearize indices into offsets |
| in the storage order of `A` (whose shape must also be captured). |
| These conversions can involve division and/or modulus, which |
| can be optimized into a fixed-point multiplication using the |
| usual technique. |
| * `RESHAPE` with `ORDER=` is similar, but must permute the |
| components of the index tuple; it generalizes `TRANSPOSE`. |
| * `CSHIFT` applies addition and modulus. |
| * `EOSHIFT` applies addition and a conditional move (`MERGE`). |
| * `PACK` and `UNPACK` are likely to require a runtime call. |
| * `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but |
| might benefit from calling a highly optimized runtime |
| routine. |
| * `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply |
| applies `A` to a reduced index tuple. |
| |
| ## Determination of rank and shape |
| |
| An important part of evaluating array expressions without the use of |
| temporary storage is determining the shape of the result prior to, |
| or without, evaluating the elements of the result. |
| |
| The shapes of array objects, results of elemental intrinsic functions, |
| and results of intrinsic operations are obvious. |
| But it is possible to determine the shapes of the results of many |
| transformational intrinsic function calls as well. |
| |
| * `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed: |
| `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general. |
| (The `DIM=` argument is commonly a compile-time constant.) |
| * `SHAPE(MAXLOC(X))` is `[RANK(X)]`. |
| * `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed. |
| * `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`. |
| * `SHAPE(RESHAPE(..., SHAPE=S))` is `S`. |
| * `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`. |
| * `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)` |
| * `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`. |
| * `RANK(PACK(...))` is always 1. |
| * `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`. |
| * `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`. |
| * `SHAPE(SHAPE(X))` is `[RANK(X)]`. |
| * `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at |
| dimension `d`. |
| |
| This is useful because expression evaluations that *do* require temporaries |
| to hold their results (due to the context in which the evaluation occurs) |
| can be implemented with a separation of the allocation |
| of the temporary array and the population of the array. |
| The code that evaluates the expression, or that implements a transformational |
| intrinsic in the runtime library, can be designed with an API that includes |
| a pointer to the destination array as an argument. |
| |
| Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable |
| of evaluating their array expressions directly into the newly-allocated |
| storage for the allocatable array. |
| The implementation would generate code to calculate the shape, use it |
| to allocate the memory and populate the descriptor, and then drive a |
| loop nest around the expression to populate the array. |
| In cases where the analyzed shape is known at compile time, we should |
| be able to have the opportunity to avoid heap allocation in favor of |
| stack storage, if the scope of the variable is local. |
| |
| ## Automatic reallocation of allocatables |
| |
| Fortran 2003 introduced the ability to assign non-conforming array expressions |
| to ALLOCATABLE arrays with the implied semantics of reallocation to the |
| new shape. |
| The implementation of this feature also becomes more straightforward if |
| our implementation of array expressions has decoupled calculation of shapes |
| from the evaluation of the elements of the result. |
| |
| ## Rewriting rules |
| |
| Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into |
| the result of an array expression or subexpression. |
| |
| * Array constructors always yield vectors; higher-rank arrays that appear as |
| constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)})`. |
| * Array constructors with multiple constituents are concatenations of |
| their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},J>SIZE(X))`. |
| * Array constructors with implied DO loops are difficult when nested |
| triangularly. |
| * Whole array references can have lower bounds other than 1, so |
| `A => A(LBOUND(A,1):UBOUND(A,1),...)`. |
| * Array sections simply apply indices: `A(i:...:n){j} => A(i1+n*(j-1))`. |
| * Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`. |
| * Scalar operands ignore indices: `X{j,k} => X`. |
| Further, they are evaluated at most once. |
| * Elemental operators and functions apply indices to their arguments: |
| `(A(:,:) + B(:,:)){j,k}` => A(:,:){j,k} + B(:,:){j,k}`. |
| * `TRANSPOSE(X){j,k} => X{k,j}`. |
| * `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated. |
| If X is sufficiently expensive to compute elementally, it might be evaluated |
| into a temporary. |
| |
| (more...) |