| <!--===- docs/F202X.md |
| |
| Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| See https://llvm.org/LICENSE.txt for license information. |
| SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| |
| --> |
| |
| # A first take on Fortran 202X features for LLVM Flang |
| |
| I (Peter Klausler) have been studying the draft PDF of the |
| [Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf), |
| which will soon be published as ISO Fortran 2023. |
| I have compiled this summary of its changes relative to |
| the current Fortran 2018 standard from the perspective |
| of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang) |
| implementor. |
| |
| ## TL;DR |
| |
| Fortran 202X doesn't make very many changes to the language |
| relative to Fortran 2018, which was itself a small increment |
| over Fortran 2008. |
| Apart from `REDUCE` clauses that were added to the |
| [still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md) |
| `DO CONCURRENT` construct, there's little here for Fortran users |
| to get excited about. |
| |
| ## Priority of implementation in LLVM Flang |
| |
| We are working hard to ensure that existing working applications will |
| port successfully to LLVM Flang with minimal effort. |
| I am not particularly concerned with conforming to a new |
| standard as an end in itself. |
| |
| The only features below that appear to have already been implemented |
| in other compilers are the `REDUCE` clauses and the degree trigonometric |
| intrinsic functions, so those should have priority as an aid to |
| portability. |
| We would want to support them earlier even if they were not in a standard. |
| |
| The `REDUCE` clause also merits early implementation due to |
| its potential for performance improvements in real codes. |
| I don't see any other feature here that would be relevant to |
| performance (maybe a weak argument could be made for `SIMPLE`). |
| The bulk of this revision unfortunately comprises changes to Fortran that |
| are neither performance-related, already available in |
| some compilers, nor (obviously) in use in existing codes. |
| I will not prioritize implementing them myself over |
| other work until they become portability concerns or are |
| requested by actual users. |
| |
| Given Fortran's history of the latency between new |
| standards and the support for their features in real compilers, |
| and then the extra lag before the features are then actually used |
| in codes meant to be portable, I doubt that many of the items |
| below will have to be worked on any time soon due to user demand. |
| |
| If J3 had chosen to add more features that were material improvements |
| to Fortran -- and there's quite a long list of worthy candidates that |
| were passed over, like read-only pointers -- it would have made sense |
| for me to prioritize their implementation in LLVM Flang more |
| urgently. |
| |
| ## Specific change descriptions |
| |
| The individual features added to the language are summarized |
| in what I see as their order of significance to Fortran users. |
| |
| ### Alert: There's a breaking change! |
| |
| The Fortran committee used to abhor making breaking changes, |
| apart from fixes, so that conforming codes could be portable across |
| time as well as across compilers. |
| Fortran 202X, however, uncharacteristically perpetrates one such |
| change to existing semantics that will silently cause existing |
| codes to work differently, if that change were to be implemented |
| and enabled by default. |
| |
| Specifically, automatic reallocation of whole deferred-length character |
| allocatable scalars is now mandated when they appear for internal output |
| (e.g., `WRITE(A,*) ...`) |
| or as output arguments for some statements and intrinsic procedures |
| (e.g., `IOMSG=`, `ERRMSG=`). |
| So existing codes that allocate output buffers |
| for such things will, or would, now observe that their buffers are |
| silently changing their lengths during execution, rather than being |
| padded with blanks or being truncated. For example: |
| |
| ``` |
| character(:), allocatable :: buffer |
| allocate(character(20)::buffer) |
| write(buffer,'F5.3') 3.14159 |
| print *, len(buffer) |
| ``` |
| |
| prints 20 with Fortran 2018 but would print 5 with Fortran 202X. |
| |
| There would have no problem with the new standard changing the |
| behavior in the current error case of an unallocated variable; |
| defining new semantics for old errors is a generally safe means |
| for extending a programming language. |
| However, in this case, we'll need to protect existing conforming |
| codes from the surprising new reallocation semantics, which |
| affect cases that are not errors. |
| |
| When/if there are requests from real users to implement this breaking |
| change, and if it is implemented, I'll have to ensure that users |
| have the ability to control this change in behavior via an option &/or the |
| runtime environment, and when it's enabled, emit a warning at code |
| sites that are at risk. |
| This warning should mention a source change they can make to protect |
| themselves from this change by passing the complete substring (`A(:)`) |
| instead of a whole character allocatable. |
| |
| This feature reminds me of Fortran 2003's change to whole |
| allocatable array assignment, although in that case users were |
| put at risk only of extra runtime overhead that was needless in |
| existing codes, not a change in behavior, and users learned to |
| assign to whole array sections (`A(:)=...`) rather than to whole |
| allocatable arrays where the performance hit mattered. |
| |
| ### Major Items |
| |
| The features in this section are expensive to implement in |
| terms of engineering time to design, code, refactor, and test |
| (i.e., weeks or months, not days). |
| |
| #### `DO CONCURRENT REDUCE` |
| |
| J3 continues to ignore the |
| [serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md) |
| with `DO CONCURRENT`, despite the simplicity of the necessary fix and their |
| admirable willingness to repair the standard to fix problems with |
| other features (e.g., plugging holes in `PURE` procedure requirements) |
| and their less admirable willingness to make breaking changes (see above). |
| They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be |
| immediately useful to HPC codes and worth implementing soon. |
| |
| #### `SIMPLE` procedures |
| |
| The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE` |
| procedures. |
| There are things that one can do in a `PURE` procedure |
| but cannot in a `SIMPLE` one. But the virtue of being `SIMPLE` seems |
| to be its own reward, not a requirement to access any other |
| feature. |
| |
| `SIMPLE` procedures might have been more useful had `DO CONCURRENT` been |
| changed to require callees to be `SIMPLE`, not just `PURE`. |
| |
| The implementation of `SIMPLE` will be nontrivial: it involves |
| some parsing and symbol table work, and some generalization of the |
| predicate function `IsPureProcedure()`, extending the semantic checking on |
| calls in `PURE` procedures to ensure that `SIMPLE` procedures |
| only call other `SIMPLE` procedures, and modifying the intrinsic |
| procedure table to note that most intrinsics are now `SIMPLE` |
| rather than just `PURE`. |
| |
| I don't expect any codes to rush to change their `PURE` procedures |
| to be `SIMPLE`, since it buys little and reduces portability. |
| This makes `SIMPLE` a lower-priority feature. |
| |
| #### Conditional expressions and actual arguments |
| |
| Next on the list of "big ticket" items are C-style conditional |
| expressions. These come in two forms, each of which is a distinct |
| feature that would be nontrivial to implement, and I would not be |
| surprised to see some compilers implement one before the other. |
| |
| The first form is a new parenthesized expression primary that any C programmer |
| would recognize. It has straightforward parsing and semantics, |
| but will require support in folding and all other code that |
| processes expressions. Lowering will be nontrivial due to |
| control flow. |
| |
| The second form is a conditional actual argument syntax |
| that allows runtime selection of argument associations, as well |
| as a `.NIL.` syntax for optional arguments to signify an absent actual |
| argument. This would have been more useful if it had also been |
| allowed as a pointer assignment statement right-hand side, and |
| that might be a worthwhile extension. As this form is essentially |
| a conditional variable reference it may be cleaner to have a |
| distinct representation from the conditional expression primary |
| in the parse tree and strongly-typed `Expr<T>` representations. |
| |
| #### `ENUMERATION TYPE` |
| |
| Fortran 202X has a new category of type. The new non-interoperable |
| `ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately, |
| a powerful sum data type as in Haskell or Rust. Unlike the |
| current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new |
| type name and its distinct values. |
| |
| This feature may well be the item requiring the largest patch to |
| the compiler for its implementation, as it affects parsing, |
| type checking on assignment and argument association, generic |
| resolution, formatted I/O, NAMELIST, debugging symbols, &c. |
| It will indirectly affect every switch statement in the compiler |
| that switches over the six (now seven) type categories. |
| This will be a big project for little useful return to users. |
| |
| #### `TYPEOF` and `CLASSOF` |
| |
| Last on the list of "big ticket" items are the new TYPEOF and CLASSOF |
| type specifiers, which allow declarations to indirectly use the |
| types of previously-defined entities. These would have obvious utility |
| in a language with type polymorphism but aren't going to be very |
| useful yet in Fortran 202X (esp. `TYPEOF`), although they would be worth |
| supporting as a utility feature for a parametric module extension. |
| |
| `CLASSOF` has implications for semantics and lowering that need to |
| be thought through as it seems to provide a means of |
| declaring polymorphic local variables and function results that are |
| neither allocatables nor pointers. |
| |
| #### Coarray extensions: |
| |
| * `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector |
| * Arrays with coarray components |
| |
| #### "Rank Independent" Features |
| |
| The `RANK(n)` attribute declaration syntax is equivalent to |
| `DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons. |
| As `n` must be a constant expression, that's straightforward to implement, |
| though not terribly useful until the language acquires additional features. |
| (I can see some utility in being able to declare PDT components with a |
| `RANK` that depends on a `KIND` type parameter.) |
| |
| It is now possible to declare the lower and upper bounds of an explicit |
| shape entity using a constant-length vector specification expression |
| in a declaration, `ALLOCATE` statement, or pointer assignment with |
| bounds remapping. |
| For example, `real A([2,3])` is equivalent to `real A(2,3)`. |
| |
| The new `A(@V)` "multiple subscript" indexing syntax uses an integer |
| vector to supply a list of subscripts or of triplet bounds/strides. This one |
| has tough edge cases for lowering that need to be thought through; |
| for example, when the lengths of two or more of the vectors in |
| `A(@U,@V,@W)` are not known at compilation time, implementing the indexing |
| would be tricky in generated code and might just end up filling a |
| temporary with `[U,V,W]` first. |
| |
| The obvious use case for "multiple subscripts" would be as a means to |
| index into an assumed-rank dummy argument without the bother of a `SELECT RANK` |
| construct, but that usage is not supported in Fortran 202X. |
| |
| This feature may well turn out to be Fortran 202X's analog to Fortran 2003's |
| `LEN` derived type parameters. |
| |
| ### Minor Items |
| |
| So much for the major features of Fortran 202X. The longer list |
| of minor features can be more briefly summarized. |
| |
| #### New Edit Descriptors |
| |
| Fortran 202X has some noncontroversial small tweaks to formatted output. |
| The `AT` edit descriptor automatically trims character output. The `LZP`, |
| `LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a |
| means for controlling the output of leading zero digits. |
| |
| #### Intrinsic Module Extensions |
| |
| Addressing some issues and omissions in intrinsic modules: |
| |
| * LOGICAL8/16/32/64 and REAL16 |
| * IEEE module facilities upgraded to match latest IEEE FP standard |
| * C_F_STRPOINTER, F_C_STRING for NUL-terminated strings |
| * C_F_POINTER(LOWER=) |
| |
| #### Intrinsic Procedure Extensions |
| |
| The `SYSTEM_CLOCK` intrinsic function got some semantic tweaks. |
| |
| There are new intrinsic functions for trigonometric functions in |
| units of degrees and half-circles. |
| GNU Fortran already supports the forms that use degree units. |
| These should call into math library implementations that are |
| specialized for those units rather than simply multiplying |
| arguments or results with conversion factors. |
| * `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND` |
| * `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI` |
| |
| `SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL` |
| |
| There are two new character utility intrinsic |
| functions whose implementations have very low priority: `SPLIT` and `TOKENIZE`. |
| `TOKENIZE` requires memory allocation to return its results, |
| and could and should have been implemented once in some Fortran utility |
| library for those who need a slow tokenization facility rather than |
| requiring implementations in each vendor's runtime support library with |
| all the extra cost and compatibilty risk that entails. |
| |
| `SPLIT` is worse -- not only could it, like `TOKENIZE`, |
| have been supplied by a Fortran utility library rather than being |
| added to the standard, it's redundant; |
| it provides nothing that cannot be already accomplished by |
| composing today's `SCAN` intrinsic function with substring indexing: |
| |
| ``` |
| module m |
| interface split |
| module procedure :: split |
| end interface |
| !instantiate for all possible ck/ik/lk combinations |
| integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.) |
| contains |
| simple elemental subroutine split(string, set, pos, back) |
| character(*, kind=ck), intent(in) :: string, set |
| integer(kind=ik), intent(in out) :: pos |
| logical(kind=lk), intent(in), optional :: back |
| if (present(back)) then |
| if (back) then |
| pos = scan(string(:pos-1), set, .true.) |
| return |
| end if |
| end if |
| npos = scan(string(pos+1:), set) |
| pos = merge(pos + npos, len(string) + 1, npos /= 0) |
| end |
| end |
| ``` |
| |
| (The code above isn't a proposed implementation for `SPLIT`, just a |
| demonstration of how programs could use `SCAN` to accomplish the same |
| results today.) |
| |
| ## Source limitations |
| |
| Fortran 202X raises the maximum number of characters per free form |
| source line and the maximum total number of characters per statement. |
| Both of these have always been unlimited in this compiler (or |
| limited only by available memory, to be more accurate.) |
| |
| ## More BOZ usage opportunities |
| |
| BOZ literal constants (binary, octal, and hexadecimal constants, |
| also known as "typeless" values) have more conforming usage in the |
| new standard in contexts where the type is unambiguously known. |
| They may now appear as initializers, as right-hand sides of intrinsic |
| assignments to integer and real variables, in explicitly typed |
| array constructors, and in the definitions of enumerations. |
| |
| ## Citation updates |
| |
| The source base contains hundreds of references to the subclauses, |
| requirements, and constraints of the Fortran 2018 standard, mostly in code comments. |
| These will need to be mapped to their Fortran 202X counterparts once the |
| new standard is published, as the Fortran committee does not provide a |
| means for citing these items by names that are fixed over time like the |
| C++ committee does. |
| If we had access to the LaTeX sources of the standard, we could generate |
| a mapping table and automate this update. |