Sourabh Singh Tomar | 932aae7 | 2020-09-10 23:04:37 +0530 | [diff] [blame] | 1 | <!--===- docs/C++17.md |
| 2 | |
| 3 | Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| 4 | See https://llvm.org/LICENSE.txt for license information. |
| 5 | SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| 6 | |
| 7 | --> |
| 8 | |
Richard Barton | 271a7bb | 2020-09-11 14:17:19 +0100 | [diff] [blame] | 9 | # C++14/17 features used in f18 |
| 10 | |
cor3ntin | b7ff032 | 2023-09-25 14:02:39 +0200 | [diff] [blame] | 11 | ```{contents} |
| 12 | --- |
| 13 | local: |
| 14 | --- |
Richard Barton | 271a7bb | 2020-09-11 14:17:19 +0100 | [diff] [blame] | 15 | ``` |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 16 | |
peter klausler | 9c45b0d | 2019-02-27 16:00:37 -0800 | [diff] [blame] | 17 | The C++ dialect used in this project constitutes a subset of the |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 18 | standard C++ programming language and library features. |
peter klausler | 9c45b0d | 2019-02-27 16:00:37 -0800 | [diff] [blame] | 19 | We want our dialect to be compatible with the LLVM C++ language |
| 20 | subset that will be in use at the time that we integrate with that |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 21 | project. |
| 22 | We also want to maximize portability, future-proofing, |
| 23 | compile-time error checking, and use of best practices. |
| 24 | |
| 25 | To that end, we have a C++ style guide (q.v.) that lays |
| 26 | out the details of how our C++ code should look and gives |
| 27 | guidance about feature usage. |
| 28 | |
| 29 | We have chosen to use some features of the recent C++17 |
| 30 | language standard in f18. |
| 31 | The most important of these are: |
| 32 | * sum types (discriminated unions) in the form of `std::variant` |
peter klausler | 9c45b0d | 2019-02-27 16:00:37 -0800 | [diff] [blame] | 33 | * `using` template parameter packs |
| 34 | * generic lambdas with `auto` argument types |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 35 | * product types in the form of `std::tuple` |
| 36 | * `std::optional` |
| 37 | |
peter klausler | 9c45b0d | 2019-02-27 16:00:37 -0800 | [diff] [blame] | 38 | (`std::tuple` is actually a C++11 feature, but I include it |
| 39 | in this list because it's not particularly well known.) |
| 40 | |
Richard Barton | 271a7bb | 2020-09-11 14:17:19 +0100 | [diff] [blame] | 41 | ## Sum types |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 42 | |
peter klausler | 9c45b0d | 2019-02-27 16:00:37 -0800 | [diff] [blame] | 43 | First, some background information to explain the need for sum types |
| 44 | in f18. |
| 45 | |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 46 | Fortran is notoriously problematic to lex and parse, as tokenization |
| 47 | depends on the state of the partial parse; |
| 48 | the language has no reserved words in the sense that C++ does. |
| 49 | Fortran parsers implemented with distinct lexing and parsing phases |
| 50 | (generated by hand or with tools) need to implement them as |
| 51 | coroutines with complicated state, and experience has shown that |
peter klausler | 6dd3b8b | 2018-11-26 12:46:11 -0800 | [diff] [blame] | 52 | it's hard to get them right and harder to extend them as the language |
| 53 | evolves. |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 54 | |
| 55 | Alternatively, with the use of backtracking, one can parse Fortran with |
| 56 | a unified lexer/parser. |
| 57 | We have chosen to do so because it is simpler and should reduce |
| 58 | both initial bugs and long-term maintenance. |
| 59 | |
| 60 | Specifically, f18's parser uses the technique of recursive descent with |
| 61 | backtracking. |
| 62 | It is constructed as the incremental composition of pure parsing functions |
| 63 | that each, when given a context (location in the input stream plus some state), |
peter klausler | 6dd3b8b | 2018-11-26 12:46:11 -0800 | [diff] [blame] | 64 | either _succeeds_ or _fails_ to recognize some piece of Fortran. |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 65 | On success, they return a new state and some semantic value, and this is |
| 66 | usually an instance of a C++ `struct` type that encodes the semantic |
| 67 | content of a production in the Fortran grammar. |
| 68 | |
| 69 | This technique allows us to specify both the Fortran grammar and the |
| 70 | representation of successfully parsed programs with C++ code |
| 71 | whose functions and data structures correspond closely to the productions |
| 72 | of Fortran. |
| 73 | |
| 74 | The specification of Fortran uses a form of BNF with alternatives, |
| 75 | optional elements, sequences, and lists. Each of these constructs |
| 76 | in the Fortran grammar maps directly in the f18 parser to both |
| 77 | the means of combining other parsers as alternatives, &c., and to |
| 78 | the declarations of the parse tree data structures that represent |
| 79 | the results of successful parses. |
| 80 | Move semantics are used in the parsing functions to acquire and |
| 81 | combine the results of sub-parses into the result of a larger |
| 82 | parse. |
| 83 | |
| 84 | To represent nodes in the Fortran parse tree, we need a means of |
| 85 | handling sum types for productions that have multiple alternatives. |
| 86 | The bounded polymorphism supplied by the C++17 `std::variant` fits |
| 87 | those needs exactly. |
| 88 | For example, production R502 in Fortran defines the top-level |
| 89 | program unit of Fortran as being a function, subroutine, module, &c. |
| 90 | The `struct ProgramUnit` in the f18 parse tree header file |
| 91 | represents each program unit with a member that is a `std::variant` |
| 92 | over the six possibilities. |
| 93 | Similarly, the parser for that type in the f18 grammar has six alternatives, |
| 94 | each of which constructs an instance of `ProgramUnit` upon the result of |
| 95 | parsing a `Module`, `FunctionSubprogram`, and so on. |
| 96 | |
| 97 | Code that performs semantic analysis on the result of a successful |
| 98 | parse is typically implemented with overloaded functions. |
| 99 | A function instantiated on `ProgramUnit` will use `std::visit` to |
| 100 | identify the right alternative and perform the right actions. |
| 101 | The call to `std::visit` must pass a visitor that can handle all |
| 102 | of the possibilities, and f18 will fail to build if one is missing. |
| 103 | |
| 104 | Were we unable to use `std::variant` directly, we would likely |
| 105 | have chosen to implement a local `SumType` replacement; in the |
peter klausler | 9c45b0d | 2019-02-27 16:00:37 -0800 | [diff] [blame] | 106 | absence of C++17's abilities of `using` a template parameter pack |
| 107 | and allowing `auto` arguments in anonymous lambda functions, |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 108 | it would be less convenient to use. |
| 109 | |
| 110 | The other options for polymorphism in C++ at the level of C++11 |
| 111 | would be to: |
| 112 | * loosen up compile-time type safety and use a unified parse tree node |
| 113 | representation with an enumeration type for an operator and generic |
| 114 | subtree pointers, or |
| 115 | * define the sum types for the parse tree as abstract base classes from |
| 116 | which each particular alternative would derive, and then use virtual |
| 117 | functions (or the forbidden `dynamic_cast`) to identify alternatives |
| 118 | during analysis |
| 119 | |
Richard Barton | 271a7bb | 2020-09-11 14:17:19 +0100 | [diff] [blame] | 120 | ## Product types |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 121 | |
| 122 | Many productions in the Fortran grammar describe a sequence of various |
| 123 | sub-parses. |
| 124 | For example, R504 defines the things that may appear in the "specification |
| 125 | part" of a subprogram in the order in which they are allowed: `USE` |
| 126 | statements, then `IMPORT` statements, and so on. |
| 127 | |
| 128 | The parse tree node that represents such a thing needs to incorporate |
| 129 | the representations of those parses, of course. |
| 130 | It turns out to be convenient to allow these data members to be anonymous |
| 131 | components of a `std::tuple` product type. |
| 132 | This type facilitates the automation of code that walks over all of the |
| 133 | members in a type-safe fashion and avoids the need to invent and remember |
| 134 | needless member names -- the components of a `std::tuple` instance can |
| 135 | be identified and accessed in terms of their types, and those tend to be |
| 136 | distinct. |
| 137 | |
| 138 | So we use `std::tuple` for such things. |
| 139 | It has also been handy for template metaprogramming that needs to work |
| 140 | with lists of types. |
| 141 | |
Richard Barton | 271a7bb | 2020-09-11 14:17:19 +0100 | [diff] [blame] | 142 | ## `std::optional` |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 143 | |
| 144 | This simple little type is used wherever a value might or might not be |
| 145 | present. |
peter klausler | 1b1f60f | 2018-12-05 13:03:39 -0800 | [diff] [blame] | 146 | It is especially useful for function results and |
| 147 | rvalue reference arguments. |
peter klausler | d1cc618 | 2018-11-26 12:42:11 -0800 | [diff] [blame] | 148 | It corresponds directly to the optional elements in the productions |
| 149 | of the Fortran grammar. |
| 150 | It is also used as a wrapper around a parse tree node type to define the |
| 151 | results of the various parsing functions, where presence of a value |
| 152 | signifies a successful recognition and absence denotes a failed parse. |
| 153 | It is used in data structures in place of nullable pointers to |
| 154 | avoid indirection as well as the possible confusion over whether a pointer |
| 155 | is allowed to be null. |