docs/C++17.md - llvm-project/flang - Git at Google

 <!--===- docs/C++17.md

    Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
    See https://llvm.org/LICENSE.txt for license information.
    SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

 -->

 # C++14/17 features used in f18

 ```{contents}
 ---
 local:
 ---
 ```

 The C++ dialect used in this project constitutes a subset of the
 standard C++ programming language and library features.
 We want our dialect to be compatible with the LLVM C++ language
 subset that will be in use at the time that we integrate with that
 project.
 We also want to maximize portability, future-proofing,
 compile-time error checking, and use of best practices.

 To that end, we have a C++ style guide (q.v.) that lays
 out the details of how our C++ code should look and gives
 guidance about feature usage.

 We have chosen to use some features of the recent C++17
 language standard in f18.
 The most important of these are:
 * sum types (discriminated unions) in the form of `std::variant`
 * `using` template parameter packs
 * generic lambdas with `auto` argument types
 * product types in the form of `std::tuple`
 * `std::optional`

 (`std::tuple` is actually a C++11 feature, but I include it
 in this list because it's not particularly well known.)

 ## Sum types

 First, some background information to explain the need for sum types
 in f18.

 Fortran is notoriously problematic to lex and parse, as tokenization
 depends on the state of the partial parse;
 the language has no reserved words in the sense that C++ does.
 Fortran parsers implemented with distinct lexing and parsing phases
 (generated by hand or with tools) need to implement them as
 coroutines with complicated state, and experience has shown that
 it's hard to get them right and harder to extend them as the language
 evolves.

 Alternatively, with the use of backtracking, one can parse Fortran with
 a unified lexer/parser.
 We have chosen to do so because it is simpler and should reduce
 both initial bugs and long-term maintenance.

 Specifically, f18's parser uses the technique of recursive descent with
 backtracking.
 It is constructed as the incremental composition of pure parsing functions
 that each, when given a context (location in the input stream plus some state),
 either _succeeds_ or _fails_ to recognize some piece of Fortran.
 On success, they return a new state and some semantic value, and this is
 usually an instance of a C++ `struct` type that encodes the semantic
 content of a production in the Fortran grammar.

 This technique allows us to specify both the Fortran grammar and the
 representation of successfully parsed programs with C++ code
 whose functions and data structures correspond closely to the productions
 of Fortran.

 The specification of Fortran uses a form of BNF with alternatives,
 optional elements, sequences, and lists.  Each of these constructs
 in the Fortran grammar maps directly in the f18 parser to both
 the means of combining other parsers as alternatives, &c., and to
 the declarations of the parse tree data structures that represent
 the results of successful parses.
 Move semantics are used in the parsing functions to acquire and
 combine the results of sub-parses into the result of a larger
 parse.

 To represent nodes in the Fortran parse tree, we need a means of
 handling sum types for productions that have multiple alternatives.
 The bounded polymorphism supplied by the C++17 `std::variant` fits
 those needs exactly.
 For example, production R502 in Fortran defines the top-level
 program unit of Fortran as being a function, subroutine, module, &c.
 The `struct ProgramUnit` in the f18 parse tree header file
 represents each program unit with a member that is a `std::variant`
 over the six possibilities.
 Similarly, the parser for that type in the f18 grammar has six alternatives,
 each of which constructs an instance of `ProgramUnit` upon the result of
 parsing a `Module`, `FunctionSubprogram`, and so on.

 Code that performs semantic analysis on the result of a successful
 parse is typically implemented with overloaded functions.
 A function instantiated on `ProgramUnit` will use `std::visit` to
 identify the right alternative and perform the right actions.
 The call to `std::visit` must pass a visitor that can handle all
 of the possibilities, and f18 will fail to build if one is missing.

 Were we unable to use `std::variant` directly, we would likely
 have chosen to implement a local `SumType` replacement; in the
 absence of C++17's abilities of `using` a template parameter pack
 and allowing `auto` arguments in anonymous lambda functions,
 it would be less convenient to use.

 The other options for polymorphism in C++ at the level of C++11
 would be to:
 * loosen up compile-time type safety and use a unified parse tree node
   representation with an enumeration type for an operator and generic
   subtree pointers, or
 * define the sum types for the parse tree as abstract base classes from
   which each particular alternative would derive, and then use virtual
   functions (or the forbidden `dynamic_cast`) to identify alternatives
   during analysis

 ## Product types

 Many productions in the Fortran grammar describe a sequence of various
 sub-parses.
 For example, R504 defines the things that may appear in the "specification
 part" of a subprogram in the order in which they are allowed: `USE`
 statements, then `IMPORT` statements, and so on.

 The parse tree node that represents such a thing needs to incorporate
 the representations of those parses, of course.
 It turns out to be convenient to allow these data members to be anonymous
 components of a `std::tuple` product type.
 This type facilitates the automation of code that walks over all of the
 members in a type-safe fashion and avoids the need to invent and remember
 needless member names -- the components of a `std::tuple` instance can
 be identified and accessed in terms of their types, and those tend to be
 distinct.

 So we use `std::tuple` for such things.
 It has also been handy for template metaprogramming that needs to work
 with lists of types.

 ## `std::optional`

 This simple little type is used wherever a value might or might not be
 present.
 It is especially useful for function results and
 rvalue reference arguments.
 It corresponds directly to the optional elements in the productions
 of the Fortran grammar.
 It is also used as a wrapper around a parse tree node type to define the
 results of the various parsing functions, where presence of a value
 signifies a successful recognition and absence denotes a failed parse.
 It is used in data structures in place of nullable pointers to
 avoid indirection as well as the possible confusion over whether a pointer
 is allowed to be null.
	<!--===- docs/C++17.md

	Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	See https://llvm.org/LICENSE.txt for license information.
	SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

	-->

	# C++14/17 features used in f18

	```{contents}
	---
	local:
	---
	```

	The C++ dialect used in this project constitutes a subset of the
	standard C++ programming language and library features.
	We want our dialect to be compatible with the LLVM C++ language
	subset that will be in use at the time that we integrate with that
	project.
	We also want to maximize portability, future-proofing,
	compile-time error checking, and use of best practices.

	To that end, we have a C++ style guide (q.v.) that lays
	out the details of how our C++ code should look and gives
	guidance about feature usage.

	We have chosen to use some features of the recent C++17
	language standard in f18.
	The most important of these are:
	* sum types (discriminated unions) in the form of `std::variant`
	* `using` template parameter packs
	* generic lambdas with `auto` argument types
	* product types in the form of `std::tuple`
	* `std::optional`

	(`std::tuple` is actually a C++11 feature, but I include it
	in this list because it's not particularly well known.)

	## Sum types

	First, some background information to explain the need for sum types
	in f18.

	Fortran is notoriously problematic to lex and parse, as tokenization
	depends on the state of the partial parse;
	the language has no reserved words in the sense that C++ does.
	Fortran parsers implemented with distinct lexing and parsing phases
	(generated by hand or with tools) need to implement them as
	coroutines with complicated state, and experience has shown that
	it's hard to get them right and harder to extend them as the language
	evolves.

	Alternatively, with the use of backtracking, one can parse Fortran with
	a unified lexer/parser.
	We have chosen to do so because it is simpler and should reduce
	both initial bugs and long-term maintenance.

	Specifically, f18's parser uses the technique of recursive descent with
	backtracking.
	It is constructed as the incremental composition of pure parsing functions
	that each, when given a context (location in the input stream plus some state),
	either _succeeds_ or _fails_ to recognize some piece of Fortran.
	On success, they return a new state and some semantic value, and this is
	usually an instance of a C++ `struct` type that encodes the semantic
	content of a production in the Fortran grammar.

	This technique allows us to specify both the Fortran grammar and the
	representation of successfully parsed programs with C++ code
	whose functions and data structures correspond closely to the productions
	of Fortran.

	The specification of Fortran uses a form of BNF with alternatives,
	optional elements, sequences, and lists. Each of these constructs
	in the Fortran grammar maps directly in the f18 parser to both
	the means of combining other parsers as alternatives, &c., and to
	the declarations of the parse tree data structures that represent
	the results of successful parses.
	Move semantics are used in the parsing functions to acquire and
	combine the results of sub-parses into the result of a larger
	parse.

	To represent nodes in the Fortran parse tree, we need a means of
	handling sum types for productions that have multiple alternatives.
	The bounded polymorphism supplied by the C++17 `std::variant` fits
	those needs exactly.
	For example, production R502 in Fortran defines the top-level
	program unit of Fortran as being a function, subroutine, module, &c.
	The `struct ProgramUnit` in the f18 parse tree header file
	represents each program unit with a member that is a `std::variant`
	over the six possibilities.
	Similarly, the parser for that type in the f18 grammar has six alternatives,
	each of which constructs an instance of `ProgramUnit` upon the result of
	parsing a `Module`, `FunctionSubprogram`, and so on.

	Code that performs semantic analysis on the result of a successful
	parse is typically implemented with overloaded functions.
	A function instantiated on `ProgramUnit` will use `std::visit` to
	identify the right alternative and perform the right actions.
	The call to `std::visit` must pass a visitor that can handle all
	of the possibilities, and f18 will fail to build if one is missing.

	Were we unable to use `std::variant` directly, we would likely
	have chosen to implement a local `SumType` replacement; in the
	absence of C++17's abilities of `using` a template parameter pack
	and allowing `auto` arguments in anonymous lambda functions,
	it would be less convenient to use.

	The other options for polymorphism in C++ at the level of C++11
	would be to:
	* loosen up compile-time type safety and use a unified parse tree node
	representation with an enumeration type for an operator and generic
	subtree pointers, or
	* define the sum types for the parse tree as abstract base classes from
	which each particular alternative would derive, and then use virtual
	functions (or the forbidden `dynamic_cast`) to identify alternatives
	during analysis

	## Product types

	Many productions in the Fortran grammar describe a sequence of various
	sub-parses.
	For example, R504 defines the things that may appear in the "specification
	part" of a subprogram in the order in which they are allowed: `USE`
	statements, then `IMPORT` statements, and so on.

	The parse tree node that represents such a thing needs to incorporate
	the representations of those parses, of course.
	It turns out to be convenient to allow these data members to be anonymous
	components of a `std::tuple` product type.
	This type facilitates the automation of code that walks over all of the
	members in a type-safe fashion and avoids the need to invent and remember
	needless member names -- the components of a `std::tuple` instance can
	be identified and accessed in terms of their types, and those tend to be
	distinct.

	So we use `std::tuple` for such things.
	It has also been handy for template metaprogramming that needs to work
	with lists of types.

	## `std::optional`

	This simple little type is used wherever a value might or might not be
	present.
	It is especially useful for function results and
	rvalue reference arguments.
	It corresponds directly to the optional elements in the productions
	of the Fortran grammar.
	It is also used as a wrapper around a parse tree node type to define the
	results of the various parsing functions, where presence of a value
	signifies a successful recognition and absence denotes a failed parse.
	It is used in data structures in place of nullable pointers to
	avoid indirection as well as the possible confusion over whether a pointer
	is allowed to be null.