| <!--===- docs/ControlFlowGraph.md |
| |
| Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| See https://llvm.org/LICENSE.txt for license information. |
| SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| |
| --> |
| |
| # Control Flow Graph |
| |
| ```{contents} |
| --- |
| local: |
| --- |
| ``` |
| |
| ## Concept |
| After a Fortran subprogram has been parsed, its names resolved, and all its |
| semantic constraints successfully checked, the parse tree of its |
| executable part is translated into another abstract representation, |
| namely the _control flow graph_ described in this note. |
| |
| This second representation of the subprogram's executable part is |
| suitable for analysis and incremental modification as the subprogram |
| is readied for code generation. |
| Many high-level Fortran features are implemented by rewriting portions |
| of a subprogram's control flow graph in place. |
| |
| ### Control Flow Graph |
| A _control flow graph_ is a collection of simple (_i.e.,_ "non-extended") |
| basic _blocks_ that comprise straight-line sequences of _actions_ with a |
| single entry point and a single exit point, and a collection of |
| directed flow _edges_ (or _arcs_) denoting all possible transitions of |
| control flow that may take place during execution from the end of |
| one basic block to the beginning of another (or itself). |
| |
| A block that has multiple distinct successors in the flow of control |
| must end with an action that selects its successor. |
| |
| The sequence of actions that constitutes a basic block may |
| include references to user and library procedures. |
| Subprogram calls with implicit control flow afterwards, namely |
| alternate returns and `END=`/`ERR=` labels on input/output, |
| will be lowered in translation to a representation that materializes |
| that control flow into something similar to a computed `GO TO` or |
| C language `switch` statement. |
| |
| For convenience in optimization and to simplify the implementation of |
| data flow confluence functions, we may choose to maintain the |
| property that each flow arc is the sole outbound arc emanating from |
| its originating block, the sole inbound arc arriving at its destination, |
| or both. |
| Empty blocks would inserted to "split" arcs when necessary to maintain this |
| invariant property. |
| |
| Fortran subprograms (other than internal subprograms) can have multiple |
| entry points by using the obsolescent `ENTRY` statement. |
| We will implement such subprograms by constructing a union |
| of their dummy argument lists and using it as part of the definition |
| of a new subroutine or function that can be called by each of |
| the entry points, which are then all converted into wrapper routines that |
| pass a selector value as an additional argument to drive a `switch` on entry |
| to the new subprogram. |
| |
| This transformation ensures that every subprogram's control |
| flow graph has a well-defined `START` node. |
| |
| Statement labels can be used in Fortran on any statement, but only |
| the labels that decorate legal destinations of `GO TO` statements |
| need to be implemented in the control flow graph. |
| Specifically, non-executable statements like `DATA`, `NAMELIST`, and |
| `FORMAT` statements will be extracted into data initialization |
| records before or during the construction of the control flow |
| graph, and will survive only as synonyms for `CONTINUE`. |
| |
| Nests of multiple labeled `DO` loops that terminate on the same |
| label will be have that label rewritten so that `GO TO` within |
| the loop nest will arrive at the copy that most closely nests |
| the context. |
| The Fortran standard does not require us to do this, but XLF |
| (at least) works this way. |
| |
| ### Expressions and Statements (Operations and Actions) |
| Expressions are trees, not DAGs, of intrinsic operations, |
| resolved function references, constant literals, and |
| data designators. |
| |
| Expression nodes are represented in the compiler in a type-safe manner. |
| There is a distinct class or class template for every category of |
| intrinsic type, templatized over its supported kind type parameter values. |
| |
| Operands are storage-owning indirections to other instances |
| of `Expression`, instances of constant values, and to representations |
| of data and function references. |
| These indirections are not nullable apart from the situation in which |
| the operands of an expression are being removed for use elsewhere before |
| the expression is destructed. |
| |
| The ranks and the extents of the shapes of the results of expressions |
| are explicit for constant arrays and recoverable by analysis otherwise. |
| |
| Parenthesized subexpressions are scrupulously preserved in accordance with |
| the Fortran standard. |
| |
| The expression tree is meant to be a representation that is |
| as equally well suited for use in the symbol table (e.g., for |
| a bound of an explicit shape array) as it is for an action |
| in a basic block of the control flow graph (e.g., the right |
| hand side of an assignment statement). |
| |
| Each basic block comprises a linear sequence of _actions_. |
| These are represented as a doubly-linked list so that insertion |
| and deletion can be done in constant time. |
| |
| Only the last action in a basic block can represent a change |
| to the flow of control. |
| |
| ### Scope Transitions |
| Some of the various scopes of the symbol table are visible in the control flow |
| graph as `SCOPE ENTRY` and `SCOPE EXIT` actions. |
| `SCOPE ENTRY` actions are unique for their corresponding scopes, |
| while `SCOPE EXIT` actions need not be so. |
| It must be the case that |
| any flow of control within the subprogram will enter only scopes that are |
| not yet active, and exit only the most recently entered scope that has not |
| yet been deactivated; i.e., when modeled by a push-down stack that is |
| pushed by each traversal of a `SCOPE ENTRY` action, |
| the entries of the stack are always distinct, only the scope at |
| the top of the stack is ever popped by `SCOPE EXIT`, and the stack is empty |
| when the subprogram terminates. |
| Further, any references to resolved symbols must be to symbols whose scopes |
| are active. |
| |
| The `DEALLOCATE` actions and calls to `FINAL` procedures implied by scoped |
| lifetimes will be explicit in the sequence of actions in the control flow |
| graph. |
| |
| Parallel regions might be partially represented by scopes, or by explicit |
| operations similar to the scope entry and exit operations. |
| |
| ### Data Flow Representation |
| The subprogram text will be in static single assignment form by the time the |
| subprogram arrives at the bridge to the LLVM IR builder. |
| Merge points are actions at the heads of basic blocks whose operands |
| are definition points; definition points are actions at the ends of |
| basic blocks whose operands are expression trees (which may refer to |
| merge points). |
| |
| ### Rewriting Transformations |
| |
| #### I/O |
| #### Dynamic allocation |
| #### Array constructors |
| |
| #### Derived type initialization, deallocation, and finalization |
| The machinery behind the complicated semantics of Fortran's derived types |
| and `ALLOCATABLE` objects will be implemented in large part by the run time |
| support library. |
| |
| #### Actual argument temporaries |
| #### Array assignments, `WHERE`, and `FORALL` |
| |
| Array operations have shape. |
| |
| `WHERE` masks have shape. |
| Their effects on array operations are by means of explicit `MASK` operands that |
| are part of array assignment operations. |
| |
| #### Intrinsic function and subroutine calls |