<!--===- docs/C++17.md

  Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  See https://llvm.org/LICENSE.txt for license information.
  SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# C++14/17 features used in f18

```{contents}
---
local:
---
```

The C++ dialect used in this project constitutes a subset of the
standard C++ programming language and library features.
We want our dialect to be compatible with the LLVM C++ language
subset that will be in use at the time that we integrate with that
project.
We also want to maximize portability, future-proofing,
compile-time error checking, and use of best practices.

To that end, we have a C++ style guide (q.v.) that lays
out the details of how our C++ code should look and gives
guidance about feature usage.

We have chosen to use some features of the recent C++17
language standard in f18.
The most important of these are:
* sum types (discriminated unions) in the form of `std::variant`
* `using` template parameter packs
* generic lambdas with `auto` argument types
* product types in the form of `std::tuple`
* `std::optional`

(`std::tuple` is actually a C++11 feature, and generic lambdas arrived
in C++14; I include `std::tuple` in this list because it's not
particularly well known.)

## Sum types

First, some background information to explain the need for sum types
in f18.

Fortran is notoriously problematic to lex and parse, as tokenization
depends on the state of the partial parse;
the language has no reserved words in the sense that C++ does.
Fortran parsers implemented with distinct lexing and parsing phases
(generated by hand or with tools) need to implement them as
coroutines with complicated state, and experience has shown that
it's hard to get them right and harder to extend them as the language
evolves.

Alternatively, with the use of backtracking, one can parse Fortran with
a unified lexer/parser.
We have chosen to do so because it is simpler and should reduce
both initial bugs and long-term maintenance.

Specifically, f18's parser uses the technique of recursive descent with
backtracking.
It is constructed as the incremental composition of pure parsing functions
that each, when given a context (location in the input stream plus some state),
either _succeeds_ or _fails_ to recognize some piece of Fortran.
On success, each returns a new state and some semantic value, which is
usually an instance of a C++ `struct` type that encodes the semantic
content of a production in the Fortran grammar.
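As a rough illustration of this style (not f18's actual parser), the sketch below recognizes two hypothetical statement forms in which `print` is a keyword in one and an ordinary variable name in the other. All of the names here (`ParseState`, `Token`, `ParsePrint`, `ParseAssign`, `ParseStmt`) are invented for the example. For brevity, each function updates the state through a reference only on success instead of returning a new state.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical context: just the unconsumed input. A real parser carries more.
struct ParseState {
  std::string_view rest;
};

// Hypothetical semantic values for two statement forms.
struct PrintStmt { std::string item; };
struct AssignStmt { std::string variable; std::string value; };
using Stmt = std::variant<PrintStmt, AssignStmt>;

bool IsLetter(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool IsDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool IsEquals(char c) { return c == '='; }

// Skips blanks, then consumes a nonempty run of characters accepted by pred.
// On failure, `state` is left exactly as it was.
std::optional<std::string> Token(ParseState &state, bool (*pred)(char)) {
  ParseState attempt{state};
  while (!attempt.rest.empty() && attempt.rest.front() == ' ') {
    attempt.rest.remove_prefix(1);
  }
  std::size_t n{0};
  while (n < attempt.rest.size() && pred(attempt.rest[n])) {
    ++n;
  }
  if (n == 0) { return std::nullopt; }
  std::string text(attempt.rest.substr(0, n));
  state.rest = attempt.rest.substr(n);
  return text;
}

// Recognizes "print <name>". Works on a copy of the state and commits it
// only on success, so a failure implicitly backtracks.
std::optional<PrintStmt> ParsePrint(ParseState &state) {
  ParseState attempt{state};
  auto keyword{Token(attempt, IsLetter)};
  if (!keyword || *keyword != "print") { return std::nullopt; }
  auto item{Token(attempt, IsLetter)};
  if (!item) { return std::nullopt; }
  state = attempt;
  return PrintStmt{std::move(*item)};
}

// Recognizes "<name> = <digits>".
std::optional<AssignStmt> ParseAssign(ParseState &state) {
  ParseState attempt{state};
  auto variable{Token(attempt, IsLetter)};
  if (!variable) { return std::nullopt; }
  auto equals{Token(attempt, IsEquals)};
  if (!equals || *equals != "=") { return std::nullopt; }
  auto value{Token(attempt, IsDigit)};
  if (!value) { return std::nullopt; }
  state = attempt;
  return AssignStmt{std::move(*variable), std::move(*value)};
}

// Tries each alternative in turn from the same starting state.
std::optional<Stmt> ParseStmt(ParseState &state) {
  if (auto print{ParsePrint(state)}) { return Stmt{std::move(*print)}; }
  if (auto assign{ParseAssign(state)}) { return Stmt{std::move(*assign)}; }
  return std::nullopt;
}

// Usage: the same spelling is a keyword in one input and a variable in the other.
void Demo() {
  ParseState a{"print x"};
  assert(std::holds_alternative<PrintStmt>(*ParseStmt(a)));
  ParseState b{"print = 3"};
  assert(std::holds_alternative<AssignStmt>(*ParseStmt(b)));
}
```

Because no token is classified before the surrounding statement has been recognized, the absence of reserved words causes no trouble in this sketch.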

This technique allows us to specify both the Fortran grammar and the
representation of successfully parsed programs with C++ code
whose functions and data structures correspond closely to the productions
of Fortran.

The specification of Fortran uses a form of BNF with alternatives,
optional elements, sequences, and lists. Each of these constructs
in the Fortran grammar maps directly in the f18 parser to both
the means of combining other parsers as alternatives, &c., and to
the declarations of the parse tree data structures that represent
the results of successful parses.
Move semantics are used in the parsing functions to acquire and
combine the results of sub-parses into the result of a larger
parse.
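To make the mapping concrete, here is a minimal sketch for a made-up production. The grammar in the comment and the types `Label`, `Name`, `IntLiteral`, `Operand`, and `CallStmt` are assumptions for illustration and are not copied from the f18 parse tree; the helper `MakeCallStmt` is likewise hypothetical.

```cpp
#include <cassert>
#include <list>
#include <optional>
#include <string>
#include <tuple>
#include <utility>
#include <variant>

// Hypothetical grammar, in a BNF-like notation:
//   call-stmt -> [label] CALL name ( [operand-list] )
//   operand   -> name | int-literal
struct Label { int value; };
struct Name { std::string source; };
struct IntLiteral { long value; };

// Alternatives become a std::variant.
struct Operand {
  std::variant<Name, IntLiteral> u;
};

// A sequence becomes a std::tuple, an optional element becomes a
// std::optional, and a list becomes a std::list.
struct CallStmt {
  std::tuple<std::optional<Label>, Name, std::list<Operand>> t;
};

// Combines the results of sub-parses by moving them into the larger node.
CallStmt MakeCallStmt(
    std::optional<Label> &&label, Name &&name, std::list<Operand> &&operands) {
  return CallStmt{std::make_tuple(
      std::move(label), std::move(name), std::move(operands))};
}

// Usage: build the node for something like "10 CALL sub(x, 1)".
void Demo() {
  std::list<Operand> operands;
  operands.push_back(Operand{Name{"x"}});
  operands.push_back(Operand{IntLiteral{1}});
  CallStmt call{MakeCallStmt(Label{10}, Name{"sub"}, std::move(operands))};
  assert(std::get<Name>(call.t).source == "sub");
  assert(std::get<2>(call.t).size() == 2);
}
```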

To represent nodes in the Fortran parse tree, we need a means of
handling sum types for productions that have multiple alternatives.
The bounded polymorphism supplied by the C++17 `std::variant` fits
those needs exactly.
For example, production R502 in Fortran defines the top-level
program unit of Fortran as being a function, subroutine, module, &c.
The `struct ProgramUnit` in the f18 parse tree header file
represents each program unit with a member that is a `std::variant`
over the six possibilities.
Similarly, the parser for that type in the f18 grammar has six alternatives,
each of which constructs an instance of `ProgramUnit` upon the result of
parsing a `Module`, `FunctionSubprogram`, and so on.

Code that performs semantic analysis on the result of a successful
parse is typically implemented with overloaded functions.
A function instantiated on `ProgramUnit` will use `std::visit` to
identify the right alternative and perform the right actions.
The call to `std::visit` must pass a visitor that can handle all
of the possibilities, and f18 will fail to build if one is missing.
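The pattern can be sketched as follows. This is a simplified stand-in with only three alternatives; the member layouts and the `Describe` functions are hypothetical and do not reproduce the real `ProgramUnit` definition.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for parse tree node types.
struct MainProgram { std::string name; };
struct FunctionSubprogram { std::string name; };
struct Module { std::string name; };

// A sum type: a ProgramUnit holds exactly one of the alternatives.
struct ProgramUnit {
  std::variant<MainProgram, FunctionSubprogram, Module> u;
};

// One overload per alternative.
std::string Describe(const MainProgram &x) { return "program " + x.name; }
std::string Describe(const FunctionSubprogram &x) { return "function " + x.name; }
std::string Describe(const Module &x) { return "module " + x.name; }

std::string Describe(const ProgramUnit &unit) {
  // The generic lambda is instantiated once for every alternative, so
  // deleting any one of the overloads above turns this into a compile error.
  return std::visit([](const auto &x) { return Describe(x); }, unit.u);
}

// Usage: dispatch on whichever alternative is present.
void Demo() {
  assert(Describe(ProgramUnit{Module{"m"}}) == "module m");
}
```

The exhaustiveness check comes for free from overload resolution; nothing resembling a `default:` case can silently absorb a newly added alternative.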

Were we unable to use `std::variant` directly, we would likely
have chosen to implement a local `SumType` replacement; in the
absence of the ability to expand a template parameter pack in a
`using` declaration (C++17) and to declare `auto` arguments in
lambda functions (C++14), it would be less convenient to use.
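Both language features show up in the widely used "overloaded visitor" idiom, sketched below. The `Overloaded` helper, the `Value` alias, and the `Kind` function are written for this example and are assumptions, not f18 code.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Inherits the call operators of every lambda passed in. The pack expansion
// in the using-declaration requires C++17.
template <typename... Ts> struct Overloaded : Ts... {
  using Ts::operator()...;
};
// Deduction guide so that Overloaded{lambda1, lambda2} deduces its arguments.
template <typename... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

using Value = std::variant<int, double, std::string>;

std::string Kind(const Value &v) {
  return std::visit(
      Overloaded{
          [](int) { return std::string{"integer"}; },
          [](const std::string &) { return std::string{"string"}; },
          // Generic fallback: an `auto` parameter accepts any alternative
          // that has no better match among the lambdas above.
          [](const auto &) { return std::string{"other"}; },
      },
      v);
}

// Usage: a double has no dedicated lambda, so the generic one is chosen.
void Demo() {
  assert(Kind(Value{2.5}) == "other");
}
```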

The other options for polymorphism in C++ at the level of C++11
would be to:
* loosen up compile-time type safety and use a unified parse tree node
  representation with an enumeration type for an operator and generic
  subtree pointers, or
* define the sum types for the parse tree as abstract base classes from
  which each particular alternative would derive, and then use virtual
  functions (or the forbidden `dynamic_cast`) to identify alternatives
  during analysis

## Product types

Many productions in the Fortran grammar describe a sequence of various
sub-parses.
For example, R504 defines the things that may appear in the "specification
part" of a subprogram in the order in which they are allowed: `USE`
statements, then `IMPORT` statements, and so on.

The parse tree node that represents such a thing needs to incorporate
the representations of those parses, of course.
It turns out to be convenient to allow these data members to be anonymous
components of a `std::tuple` product type.
This type facilitates the automation of code that walks over all of the
members in a type-safe fashion and avoids the need to invent and remember
needless member names -- the components of a `std::tuple` instance can
be identified and accessed in terms of their types, and those tend to be
distinct.
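A minimal sketch of both points, access by type and automated traversal, might look like this. The node layout below is a simplified assumption with only three components, and `CountUses`, `ForEachComponent`, and `CountAll` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <tuple>

// Hypothetical stand-ins for statement nodes.
struct UseStmt { std::string module; };
struct ImportStmt { std::string name; };
struct DeclarationConstruct { std::string text; };

// The components are anonymous and appear in grammar order.
struct SpecificationPart {
  std::tuple<std::list<UseStmt>, std::list<ImportStmt>,
      std::list<DeclarationConstruct>>
      t;
};

// A component is selected by its type, so no member name is needed.
std::size_t CountUses(const SpecificationPart &spec) {
  return std::get<std::list<UseStmt>>(spec.t).size();
}

// Applies the visitor to every component in order, whatever its type.
template <typename Visitor>
void ForEachComponent(const SpecificationPart &spec, Visitor &&visitor) {
  std::apply(
      [&](const auto &...component) { (visitor(component), ...); }, spec.t);
}

std::size_t CountAll(const SpecificationPart &spec) {
  std::size_t total{0};
  ForEachComponent(spec, [&](const auto &list) { total += list.size(); });
  return total;
}

// Usage: fill two of the three components and count them.
void Demo() {
  SpecificationPart spec;
  std::get<std::list<UseStmt>>(spec.t).push_back(UseStmt{"iso_c_binding"});
  std::get<std::list<DeclarationConstruct>>(spec.t).push_back(
      DeclarationConstruct{"integer :: n"});
  assert(CountUses(spec) == 1);
  assert(CountAll(spec) == 2);
}
```

Access by type only compiles when the type occurs exactly once in the tuple, which is why it matters that the component types tend to be distinct.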

So we use `std::tuple` for such things.
It has also been handy for template metaprogramming that needs to work
with lists of types.
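As one small example of treating a tuple as a compile-time list of types, the sketch below rewraps the element types of a tuple as the alternatives of a variant. The `AsVariant` trait and the `Alternatives` alias are hypothetical and are not taken from f18.

```cpp
#include <tuple>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for parse tree node types.
struct Module {};
struct FunctionSubprogram {};
struct SubroutineSubprogram {};

// A std::tuple used purely as a list of types; no object is ever created.
using Alternatives =
    std::tuple<Module, FunctionSubprogram, SubroutineSubprogram>;

// Rewraps the types of a tuple as the alternatives of a variant.
template <typename Tuple> struct AsVariant;
template <typename... Ts> struct AsVariant<std::tuple<Ts...>> {
  using type = std::variant<Ts...>;
};

// The standard library already supplies length and indexing for such lists.
static_assert(std::tuple_size_v<Alternatives> == 3);
static_assert(
    std::is_same_v<std::tuple_element_t<1, Alternatives>, FunctionSubprogram>);
static_assert(std::is_same_v<AsVariant<Alternatives>::type,
    std::variant<Module, FunctionSubprogram, SubroutineSubprogram>>);
```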

## `std::optional`

This simple little type is used wherever a value might or might not be
present.
It is especially useful for function results and
rvalue reference arguments.
It corresponds directly to the optional elements in the productions
of the Fortran grammar.
It is also used as a wrapper around a parse tree node type to define the
results of the various parsing functions, where presence of a value
signifies a successful recognition and absence denotes a failed parse.
It is used in data structures in place of nullable pointers to
avoid indirection as well as the possible confusion over whether a pointer
is allowed to be null.
155is allowed to be null.