| <!--===- docs/C++17.md |
| |
| Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| See https://llvm.org/LICENSE.txt for license information. |
| SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| |
| --> |
| |
| # C++14/17 features used in f18 |
| |
| ```{contents} |
| --- |
| local: |
| --- |
| ``` |
| |
| The C++ dialect used in this project constitutes a subset of the |
| standard C++ programming language and library features. |
| We want our dialect to be compatible with the LLVM C++ language |
| subset that will be in use at the time that we integrate with that |
| project. |
| We also want to maximize portability, future-proofing, |
| compile-time error checking, and use of best practices. |
| |
| To that end, we have a C++ style guide (q.v.) that lays |
| out the details of how our C++ code should look and gives |
| guidance about feature usage. |
| |
| We have chosen to use some features of the recent C++17 |
| language standard in f18. |
| The most important of these are: |
| * sum types (discriminated unions) in the form of `std::variant` |
| * `using` template parameter packs |
| * generic lambdas with `auto` argument types |
| * product types in the form of `std::tuple` |
| * `std::optional` |
| |
| (`std::tuple` is actually a C++11 feature, but I include it |
| in this list because it's not particularly well known.) |
| |
| ## Sum types |
| |
| First, some background information to explain the need for sum types |
| in f18. |
| |
| Fortran is notoriously problematic to lex and parse, as tokenization |
| depends on the state of the partial parse; |
| the language has no reserved words in the sense that C++ does. |
| Fortran parsers implemented with distinct lexing and parsing phases |
| (generated by hand or with tools) need to implement them as |
| coroutines with complicated state, and experience has shown that |
| it's hard to get them right and harder to extend them as the language |
| evolves. |
| |
| Alternatively, with the use of backtracking, one can parse Fortran with |
| a unified lexer/parser. |
| We have chosen to do so because it is simpler and should reduce |
| both initial bugs and long-term maintenance. |
| |
| Specifically, f18's parser uses the technique of recursive descent with |
| backtracking. |
| It is constructed as the incremental composition of pure parsing functions |
| that each, when given a context (location in the input stream plus some state), |
| either _succeeds_ or _fails_ to recognize some piece of Fortran. |
| On success, they return a new state and some semantic value, and this is |
| usually an instance of a C++ `struct` type that encodes the semantic |
| content of a production in the Fortran grammar. |
| |
| This technique allows us to specify both the Fortran grammar and the |
| representation of successfully parsed programs with C++ code |
| whose functions and data structures correspond closely to the productions |
| of Fortran. |
| |
| The specification of Fortran uses a form of BNF with alternatives, |
| optional elements, sequences, and lists. Each of these constructs |
| in the Fortran grammar maps directly in the f18 parser to both |
| the means of combining other parsers as alternatives, &c., and to |
| the declarations of the parse tree data structures that represent |
| the results of successful parses. |
| Move semantics are used in the parsing functions to acquire and |
| combine the results of sub-parses into the result of a larger |
| parse. |
| |
| To represent nodes in the Fortran parse tree, we need a means of |
| handling sum types for productions that have multiple alternatives. |
| The bounded polymorphism supplied by the C++17 `std::variant` fits |
| those needs exactly. |
| For example, production R502 in Fortran defines the top-level |
| program unit of Fortran as being a function, subroutine, module, &c. |
| The `struct ProgramUnit` in the f18 parse tree header file |
| represents each program unit with a member that is a `std::variant` |
| over the six possibilities. |
| Similarly, the parser for that type in the f18 grammar has six alternatives, |
| each of which constructs an instance of `ProgramUnit` upon the result of |
| parsing a `Module`, `FunctionSubprogram`, and so on. |
| |
| Code that performs semantic analysis on the result of a successful |
| parse is typically implemented with overloaded functions. |
| A function instantiated on `ProgramUnit` will use `std::visit` to |
| identify the right alternative and perform the right actions. |
| The call to `std::visit` must pass a visitor that can handle all |
| of the possibilities, and f18 will fail to build if one is missing. |
| |
| Were we unable to use `std::variant` directly, we would likely |
| have chosen to implement a local `SumType` replacement; in the |
| absence of C++17's abilities of `using` a template parameter pack |
| and allowing `auto` arguments in anonymous lambda functions, |
| it would be less convenient to use. |
| |
| The other options for polymorphism in C++ at the level of C++11 |
| would be to: |
| * loosen up compile-time type safety and use a unified parse tree node |
| representation with an enumeration type for an operator and generic |
| subtree pointers, or |
| * define the sum types for the parse tree as abstract base classes from |
| which each particular alternative would derive, and then use virtual |
| functions (or the forbidden `dynamic_cast`) to identify alternatives |
| during analysis |
| |
| ## Product types |
| |
| Many productions in the Fortran grammar describe a sequence of various |
| sub-parses. |
| For example, R504 defines the things that may appear in the "specification |
| part" of a subprogram in the order in which they are allowed: `USE` |
| statements, then `IMPORT` statements, and so on. |
| |
| The parse tree node that represents such a thing needs to incorporate |
| the representations of those parses, of course. |
| It turns out to be convenient to allow these data members to be anonymous |
| components of a `std::tuple` product type. |
| This type facilitates the automation of code that walks over all of the |
| members in a type-safe fashion and avoids the need to invent and remember |
| needless member names -- the components of a `std::tuple` instance can |
| be identified and accessed in terms of their types, and those tend to be |
| distinct. |
| |
| So we use `std::tuple` for such things. |
| It has also been handy for template metaprogramming that needs to work |
| with lists of types. |
| |
| ## `std::optional` |
| |
| This simple little type is used wherever a value might or might not be |
| present. |
| It is especially useful for function results and |
| rvalue reference arguments. |
| It corresponds directly to the optional elements in the productions |
| of the Fortran grammar. |
| It is also used as a wrapper around a parse tree node type to define the |
| results of the various parsing functions, where presence of a value |
| signifies a successful recognition and absence denotes a failed parse. |
| It is used in data structures in place of nullable pointers to |
| avoid indirection as well as the possible confusion over whether a pointer |
| is allowed to be null. |