| ========================= |
| Dependence Graphs in LLVM |
| ========================= |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| Dependence graphs are useful tools in compilers for analyzing relationships |
| between various program elements to help guide optimizations. The ideas |
| behind these graphs are described in papers [1]_ and [2]_. |
| |
| The implementation of these ideas in LLVM may be slightly different than |
| what is mentioned in the papers. These differences are documented in |
| the `implementation details <implementation-details_>`_. |
| |
| .. _DataDependenceGraph: |
| |
| Data Dependence Graph |
| ===================== |
| In its simplest form the Data Dependence Graph (or DDG) represents data |
| dependencies between individual instructions. Each node in such a graph |
| represents a single instruction and is referred to as an "atomic" node. |
| It is also possible to combine some atomic nodes that have a simple |
| def-use dependency between them into larger nodes that contain multiple- |
| instructions. |
| |
| As described in [1]_ the DDG uses graph abstraction to group nodes |
| that are part of a strongly connected component of the graph |
| into special nodes called pi-blocks. pi-blocks represent cycles of data |
| dependency that prevent reordering transformations. Since any strongly |
| connected component of the graph is a maximal subgraph of all the nodes |
| that form a cycle, pi-blocks are at most one level deep. In other words, |
| no pi-blocks are nested inside another pi-block, resulting in a |
| hierarchical representation that is at most one level deep. |
| |
| |
| For example, consider the following: |
| |
| .. code-block:: c++ |
| |
| for (int i = 1; i < n; i++) { |
| b[i] = c[i] + b[i-1]; |
| } |
| |
| This code contains a statement that has a loop carried dependence on |
| itself creating a cycle in the DDG. The figure below illustrates |
| how the cycle of dependency is carried through multiple def-use relations |
| and a memory access dependency. |
| |
| .. image:: cycle.png |
| |
| The DDG corresponding to this example would have a pi-block that contains |
| all the nodes participating in the cycle, as shown below: |
| |
| .. image:: cycle_pi.png |
| |
| Program Dependence Graph |
| ======================== |
| |
| The Program Dependence Graph (or PDG) has a similar structure as the |
| DDG, but it is capable of representing both data dependencies and |
| control-flow dependencies between program elements such as |
| instructions, groups of instructions, basic blocks or groups of |
| basic blocks. |
| |
| High-Level Design |
| ================= |
| |
| The DDG and the PDG are both directed graphs and they extend the |
| ``DirectedGraph`` class. Each implementation extends its corresponding |
| node and edge types resulting in the inheritance relationship depicted |
| in the UML diagram below: |
| |
| .. image:: uml_nodes_and_edges.png |
| |
| Graph Construction |
| ------------------ |
| |
| The graph build algorithm considers dependencies between elements of |
| a given set of instructions or basic blocks. Any dependencies coming |
| into or going out of instructions that do not belong to that range |
| are ignored. The steps in the build algorithm for the DDG are very |
| similar to the steps in the build algorithm for the PDG. As such, |
| one of the design goals is to reuse the build algorithm code to |
| allow creation of both DDG and PDG representations while allowing |
| the two implementations to define their own distinct and independent |
| node and edge types. This is achieved by using the well-known builder |
| design pattern to isolate the construction of the dependence graph |
| from its concrete representation. |
| |
| The following UML diagram depicts the overall structure of the design |
| pattern as it applies to the dependence graph implementation. |
| |
| .. image:: uml_builder_pattern.png |
| |
| Notice that the common code for building the two types of graphs are |
| provided in the ``DependenceGraphBuilder`` class, while the ``DDGBuilder`` |
| and ``PDGBuilder`` control some aspects of how the graph is constructed |
| by the way of overriding virtual methods defined in ``DependenceGraphBuilder``. |
| |
| Note also that the steps and the names used in this diagram are for |
| illustrative purposes and may be different from those in the actual |
| implementation. |
| |
| Design Trade-offs |
| ----------------- |
| |
| Advantages: |
| ^^^^^^^^^^^ |
| - Builder allows graph construction code to be reused for DDG and PDG. |
| - Builder allows us to create DDG and PDG as separate graphs. |
| - DDG nodes and edges are completely disjoint from PDG nodes and edges allowing them to change easily and independently. |
| |
| Disadvantages: |
| ^^^^^^^^^^^^^^ |
| - Builder may be perceived as over-engineering at first. |
| - There are some similarities between DDG nodes and edges compared to PDG nodes and edges, but there is little reuse of the class definitions. |
| |
| - This is tolerable given that the node and edge types are fairly simple and there is little code reuse opportunity anyway. |
| |
| |
| .. _implementation-details: |
| |
| Implementation Details |
| ====================== |
| |
| The current implementation of DDG differs slightly from the dependence |
| graph described in [1]_ in the following ways: |
| |
| 1. The graph nodes in the paper represent three main program components, namely *assignment statements*, *for loop headers* and *while loop headers*. In this implementation, DDG nodes naturally represent LLVM IR instructions. An assignment statement in this implementation typically involves a node representing the ``store`` instruction along with a number of individual nodes computing the right-hand-side of the assignment that connect to the ``store`` node via a def-use edge. The loop header instructions are not represented as special nodes in this implementation because they have limited uses and can be easily identified, for example, through ``LoopAnalysis``. |
| 2. The paper describes five types of dependency edges between nodes namely *loop dependency*, *flow-*, *anti-*, *output-*, and *input-* dependencies. In this implementation *memory* edges represent the *flow-*, *anti-*, *output-*, and *input-* dependencies. However, *loop dependencies* are not made explicit, because they mainly represent association between a loop structure and the program elements inside the loop and this association is fairly obvious in LLVM IR itself. |
| 3. The paper describes two types of pi-blocks; *recurrences* whose bodies are SCCs and *IN* nodes whose bodies are not part of any SCC. In this implementation, pi-blocks are only created for *recurrences*. *IN* nodes remain as simple DDG nodes in the graph. |
| |
| |
| References |
| ---------- |
| .. [1] "D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS." |
| .. [2] "J. FERRANTE (IBM), K. J. OTTENSTEIN (Michigan Technological University) and JOE D. WARREN (Rice University), 1987. The Program Dependence Graph and Its Use in Optimization." |