docs/Overview.md - llvm-project/flang - Git at Google

 <!--===- docs/Overview.md

    Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
    See https://llvm.org/LICENSE.txt for license information.
    SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

 -->

 # Overview of Compiler Phases

 ```{contents}
 ---
 local:
 ---
 ```
 The Flang compiler transforms Fortran source code into an executable file.
 This transformation proceeds in three high level phases -- analysis, lowering,
 and code generation/linking.

 The first high level phase (analysis) transforms Fortran source code into a
 decorated parse tree and a symbol table.  During this phase, all user
 related errors are detected and reported.

 The second high level phase (lowering), changes the decorated parse tree and
 symbol table into the Fortran Intermediate Representation (FIR), which is a
 dialect of LLVM's Multi-Level Intermediate Representation or MLIR.  It then
 runs a series of passes on the FIR code which verify its validity, perform a
 series of optimizations, and finally transform it into LLVM's Intermediate
 Representation, or LLVM IR

 The third high level phase generates machine code and invokes a linker to
 produce an executable file.

 This document describes the first two high level phases.  Each of these is
 described in more detailed phases.

 Each detailed phase is described -- its inputs and outputs along with how to
 produce a readable version of the outputs.

 Each detailed phase produces either correct output or fatal errors.

 ## Analysis

 This high level phase validates that the program is correct and creates all of
 the information needed for lowering.

 ### Prescan and Preprocess

 See [Preprocessing.md](Preprocessing.md).

 **Input:** Fortran source and header files, command line macro definitions,
   set of enabled compiler directives (to be treated as directives rather than
   comments).

 **Output:**
 - A "cooked" character stream: the entire program as a contiguous stream of
   normalized Fortran source.
   Extraneous whitespace and comments are removed (except comments that are
   compiler directives that are not disabled) and case is normalized.  Also,
   directives are processed and macros expanded.
 - Provenance information mapping each character back to the source it came from.
   This is used in subsequent phases that need source locations.  This includes
   error messages, optimization reports, and debugging information.

 **Entry point:** `parser::Parsing::Prescan`

 **Commands:**
  - `flang -fc1 -E src.f90` dumps the cooked character stream
  - `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance
    information

 ### Parsing

 **Input:** Cooked character stream

 **Output:** A parse tree for each Fortran program unit in the source code
 representing a syntactically correct program, rooted at the program unit.  See:
 [Parsing.md](Parsing.md) and [ParserCombinators.md](ParserCombinators.md).

 **Entry point:** `parser::Parsing::Parse`

 **Commands:**
   - `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
   - `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
   - `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
   - `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree

 ### Semantic processing

 **Input:** the parse tree, the cooked character stream, and provenance
 information

 **Output:**
 * a symbol table
 * modified parse tree
 * module files, (see: [ModFiles.md](ModFiles.md))
 * the intrinsic procedure table
 * the target characteristics
 * the runtime derived type derived type tables (see: [RuntimeTypeInfo.md](RuntimeTypeInfo.md))

 **Entry point:** `semantics::Semantics::Perform`

 For more detail on semantic analysis, see: [Semantics.md](Semantics.md).
 Semantic processing performs several tasks:
 * validates labels, see: [LabelResolution.md](LabelResolution.md).
 * canonicalizes DO statements,
 * canonicalizes OpenACC and OpenMP code
 * resolves names, building a tree of scopes and symbols
 * rewrites the parse tree to correct parsing mistakes (when needed) once semantic information is available to clarify the program's meaning
 * checks the validity of declarations
 * analyzes expressions and statements, emitting error messages where appropriate
 * creates module files if the source code contains modules,
   see [ModFiles.md](ModFiles.md).

 In the course of semantic analysis, the compiler:
 * creates the symbol table
 * decorates the parse tree with semantic information (such as pointers into the symbol table)
 * creates the intrinsic procedure table
 * folds constant expressions

 At the end of semantic processing, all validation of the user's program is complete.  This is the last detailed phase of analysis processing.

 **Commands:**
   - `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
   - `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
   - `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table

 ## Lowering

 Lowering takes the parse tree and symbol table produced by analysis and
 produces LLVM IR.

 ### Create the lowering bridge

 **Inputs:**
   - the parse tree
   - the symbol table
   - The default KINDs for intrinsic types (specified by default or command line option)
   - The intrinsic procedure table (created in semantics processing)
   - The target characteristics (created during semantics processing)
   - The cooked character stream
   - The target triple -- CPU type, vendor, operating system
   - The mapping between Fortran KIND values to FIR KIND values

 The lowering bridge is a container that holds all of the information needed for lowering.

 **Output:** A container with all of the information needed for lowering

 **Entry point:** lower::LoweringBridge::create

 ### Initial lowering

 **Input:** the lowering bridge

 **Output:** A Fortran IR (FIR) representation of the program.

 **Entry point:** `lower::LoweringBridge::lower`

 The compiler then takes the information in the lowering bridge and creates a
 pre-FIR tree or PFT.  The PFT is a list of programs and modules.  The programs
 and modules contain lists of function-like units.  The function-like units
 contain a list of evaluations.  All of these contain pointers back into the
 parse tree.  The compiler walks the PFT generating FIR.

 **Commands:**
   - `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
   - `flang -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir

 ### Transformation passes

 **Input:** initial version of the FIR code

 **Output:** An LLVM IR representation of the program

 **Entry point:** `mlir::PassManager::run`

 The compiler then runs a series of passes over the FIR code.  The first is a
 verification pass.  It's followed by a series of transformation passes that
 perform various optimizations and transformations.  The final pass creates an
 LLVM IR representation of the program.

 **Commands:**
   - `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
   - `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll

 ## Object code generation and linking

 After the LLVM IR is created, the flang driver invokes LLVM's existing
 infrastructure to generate object code and invoke a linker to create the
 executable file.
	<!--===- docs/Overview.md

	Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	See https://llvm.org/LICENSE.txt for license information.
	SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

	-->

	# Overview of Compiler Phases

	```{contents}
	---
	local:
	---
	```
	The Flang compiler transforms Fortran source code into an executable file.
	This transformation proceeds in three high level phases -- analysis, lowering,
	and code generation/linking.

	The first high level phase (analysis) transforms Fortran source code into a
	decorated parse tree and a symbol table. During this phase, all user
	related errors are detected and reported.

	The second high level phase (lowering), changes the decorated parse tree and
	symbol table into the Fortran Intermediate Representation (FIR), which is a
	dialect of LLVM's Multi-Level Intermediate Representation or MLIR. It then
	runs a series of passes on the FIR code which verify its validity, perform a
	series of optimizations, and finally transform it into LLVM's Intermediate
	Representation, or LLVM IR

	The third high level phase generates machine code and invokes a linker to
	produce an executable file.

	This document describes the first two high level phases. Each of these is
	described in more detailed phases.

	Each detailed phase is described -- its inputs and outputs along with how to
	produce a readable version of the outputs.

	Each detailed phase produces either correct output or fatal errors.

	## Analysis

	This high level phase validates that the program is correct and creates all of
	the information needed for lowering.

	### Prescan and Preprocess

	See [Preprocessing.md](Preprocessing.md).

	Input: Fortran source and header files, command line macro definitions,
	set of enabled compiler directives (to be treated as directives rather than
	comments).

	Output:
	- A "cooked" character stream: the entire program as a contiguous stream of
	normalized Fortran source.
	Extraneous whitespace and comments are removed (except comments that are
	compiler directives that are not disabled) and case is normalized. Also,
	directives are processed and macros expanded.
	- Provenance information mapping each character back to the source it came from.
	This is used in subsequent phases that need source locations. This includes
	error messages, optimization reports, and debugging information.

	Entry point: `parser::Parsing::Prescan`

	Commands:
	- `flang -fc1 -E src.f90` dumps the cooked character stream
	- `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance
	information

	### Parsing

	Input: Cooked character stream

	Output: A parse tree for each Fortran program unit in the source code
	representing a syntactically correct program, rooted at the program unit. See:
	[Parsing.md](Parsing.md) and [ParserCombinators.md](ParserCombinators.md).

	Entry point: `parser::Parsing::Parse`

	Commands:
	- `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
	- `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
	- `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
	- `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree

	### Semantic processing

	Input: the parse tree, the cooked character stream, and provenance
	information

	Output:
	* a symbol table
	* modified parse tree
	* module files, (see: [ModFiles.md](ModFiles.md))
	* the intrinsic procedure table
	* the target characteristics
	* the runtime derived type derived type tables (see: [RuntimeTypeInfo.md](RuntimeTypeInfo.md))

	Entry point: `semantics::Semantics::Perform`

	For more detail on semantic analysis, see: [Semantics.md](Semantics.md).
	Semantic processing performs several tasks:
	* validates labels, see: [LabelResolution.md](LabelResolution.md).
	* canonicalizes DO statements,
	* canonicalizes OpenACC and OpenMP code
	* resolves names, building a tree of scopes and symbols
	* rewrites the parse tree to correct parsing mistakes (when needed) once semantic information is available to clarify the program's meaning
	* checks the validity of declarations
	* analyzes expressions and statements, emitting error messages where appropriate
	* creates module files if the source code contains modules,
	see [ModFiles.md](ModFiles.md).

	In the course of semantic analysis, the compiler:
	* creates the symbol table
	* decorates the parse tree with semantic information (such as pointers into the symbol table)
	* creates the intrinsic procedure table
	* folds constant expressions

	At the end of semantic processing, all validation of the user's program is complete. This is the last detailed phase of analysis processing.

	Commands:
	- `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
	- `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
	- `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table

	## Lowering

	Lowering takes the parse tree and symbol table produced by analysis and
	produces LLVM IR.

	### Create the lowering bridge

	Inputs:
	- the parse tree
	- the symbol table
	- The default KINDs for intrinsic types (specified by default or command line option)
	- The intrinsic procedure table (created in semantics processing)
	- The target characteristics (created during semantics processing)
	- The cooked character stream
	- The target triple -- CPU type, vendor, operating system
	- The mapping between Fortran KIND values to FIR KIND values

	The lowering bridge is a container that holds all of the information needed for lowering.

	Output: A container with all of the information needed for lowering

	Entry point: lower::LoweringBridge::create

	### Initial lowering

	Input: the lowering bridge

	Output: A Fortran IR (FIR) representation of the program.

	Entry point: `lower::LoweringBridge::lower`

	The compiler then takes the information in the lowering bridge and creates a
	pre-FIR tree or PFT. The PFT is a list of programs and modules. The programs
	and modules contain lists of function-like units. The function-like units
	contain a list of evaluations. All of these contain pointers back into the
	parse tree. The compiler walks the PFT generating FIR.

	Commands:
	- `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
	- `flang -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir

	### Transformation passes

	Input: initial version of the FIR code

	Output: An LLVM IR representation of the program

	Entry point: `mlir::PassManager::run`

	The compiler then runs a series of passes over the FIR code. The first is a
	verification pass. It's followed by a series of transformation passes that
	perform various optimizations and transformations. The final pass creates an
	LLVM IR representation of the program.

	Commands:
	- `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
	- `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll

	## Object code generation and linking

	After the LLVM IR is created, the flang driver invokes LLVM's existing
	infrastructure to generate object code and invoke a linker to create the
	executable file.