| \documentclass{article} |
| \begin{document} |
| |
| \title{Data Structure Analysis User's Manual} |
| \author{Written by Patrick Simmons} |
| |
| \maketitle |
| |
| \section{Introduction} |
| |
| Data Structure Analysis is a form of field-sensitive, |
| context-sensitive, unification-based alias analysis, described fully |
| in Chris Lattner's |
| thesis.\footnote{http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html} |
| DSA is also the LLVM analysis pass implementing this algorithm. The |
| goal of this user's manual is to document the external API of the LLVM |
| DSA pass for the benefit of those who wish to write clients for this |
| pass. |
| |
| \section{Brief Overview of DSA Algorithm} |
| |
| DSA roughly works in three distinct phases: Local, Bottom-Up, and |
| Top-Down. In order to motivate the artifacts created by the DSA |
| algorithm, these phases are briefly discussed below. Many |
| simplifications have been made, and many details have been omitted. |
| Those interested in the workings of DSA in any detail should consult |
| Chris Lattner's thesis. |
| |
| \subsection{Local} |
| |
| The local phase of DSA creates a node (``DSNode'') representing each |
| pointer and pointer target within a function. It then creates a graph |
| by creating edges between the nodes representing the ``points-to'' |
| relationship. However, because DSA is unification-based, there can be |
| only one edge from any single field. Thus, if the local analysis is |
| not able to determine which of multiple objects is the target of a |
| pointer, all possible targets must be merged into a single DSNode. |
| The local analysis is run on each function in the program, creating |
| separate graphs for each. |
| |
| \subsection{Bottom-Up} |
| |
| After the local phase has run, DSA iterates over the callgraph, |
| callees before callers, and clones the callee's graph into the caller |
| by matching the nodes corresponding to callee formal parameters to the |
| actual parameters given in the caller. This cloning is key to the |
| context-sensitivity of DSA. A number of tricks are used which in |
| practice prevent this pass from exhibiting its exponential worst-case |
| behavior. |
| |
| \subsection{Top-Down} |
| |
| The top-down phase of DSA iterates over the callgraph again, this time |
| callers before callees, and merges nodes in callees when necessary. |
| This has the unfortunate effect of destroying some of DSA's |
| context-sensitivity but is necessary for clients that which to use DSA |
| as simply an alias analysis (such DSA-AA). |
| |
| \section{Artifacts of DSA} |
| |
| The artifacts of DSA are a collection of directed graphs, called |
| ``DSGraphs'', with nodes (``DSNodes'') representing sets of objects |
| which may alias and edges representing the points-to relation between |
| these nodes. |
| |
| \subsection{The Function Graphs} |
| \label{fungraph} |
| |
| A DSGraph is created for every function in the program. This graph |
| contains nodes and edges representing the information from the local |
| function, the callees if bottom-up DSA has been run, and the callees |
| and callers if top-down DSA has been run. |
| |
| When interpreting the results of a function graph, it is important to |
| keep in mind the ``perspective'' of the DSA passes that have been run. |
| For instance, under bottom-up DSA, a DSNode representing the target of |
| a pointer argument to a function may not have the \texttt{global} flag |
| but still may represent a global variable in some contexts if a |
| particular caller passes the address of a global variable to the |
| function. Roughly, in bottom-up DSA, a function graph contains the |
| information on a function which is true no matter what calling the |
| calling context is. In contrast, under top-down DSA, a function graph |
| summarizes the information about a function taking into account all |
| possible contexts. Under bottom-up DSA, a node which may be either a |
| heap node or a stack node depending on context will have neither the |
| heap nor stack flags set. Under top-down DSA, a node which may be |
| either a heap node or a stack node depending on context will have both |
| flags set. Under bottom-up DSA, two arguments which may or may not |
| alias depending on context will be represented as two separate |
| DSNodes; under top-down DSA, these arguments will be unified into one |
| node. |
| |
| Thus, bottom-up DSA gives more context-sensitivity to clients, but at |
| the price of giving its results in a different and, for many clients, |
| less useful form. It is up to the client to evaluate the trade-offs |
| between the two passes. |
| |
| \subsection{The Globals Graph} |
| |
| The globals graph represents the interactions between global variables |
| and nodes reachable from global variables. The globals graph contains |
| exactly those DSNodes which are reachable from some DSNode |
| representing a global variable. The existence and design of this |
| graph are partly motivated by internal DSA optimizations unlikely to be |
| of concern to clients. |
| |
| It is important not to confuse the globals graph with the |
| \texttt{global} flag described later in this manual. While all |
| DSNodes marked with the \texttt{global} flag will be represented in |
| the globals graph since nodes with the \texttt{global} flag are |
| exactly those which may represent global variables, the globals graph |
| will also contain nodes which do not have the \texttt{global} flag, |
| such as the targets of global pointers. The set of nodes in the |
| globals graph is the transitive closure of the set of nodes with the |
| \texttt{global} flag under the points-to relation. |
| |
| \section{Interfaces to DSA Results} |
| |
| %Will mostly use DSGraph.h and DSNode.h for this |
| %Make sure to remember to mention the flags. |
| %Mention John's rules in this section. |
| |
| This section lists selected, generally useful functions from the |
| client API of DSA. |
| |
| \subsection{Special Types} |
| |
| DSA uses a number of special types in its arguments and return |
| values. These are summarized in this section. |
| |
| \subsubsection{DSNodeHandle} |
| |
| DSNodeHandle is a smart pointer designed to contain one thing and one |
| thing only: a pointer to a DSNode. The only interesting function it |
| contains is \texttt{getNode()}, which returns the pointer associated |
| with a DSNodeHandle. If a client pass wishes to build a data |
| structure containing DSNode references, this class should be used |
| instead of storing DSNode references directly. The reason for this is |
| that calling getNode() on a DSNodeHandle has the potential to render |
| invalid any raw DSNode pointer held by a client. |
| |
| While DSNodeHandle implements the <, >, and == comparison functions by |
| simply comparing the underlying raw pointers, it is not safe to use |
| DSNodeHandles in STL or other data structures that expect to be used |
| with immutable objects. Any accesses to DSNodes may result in the |
| pointer associated with a DSNodeHandle changing its value and |
| therefore change the result of these comparison operations. In |
| particular, it is not safe to store DSNodeHandles in an STL set or |
| map. |
| |
| \subsubsection{NodeMapTy} |
| |
| This is a typedef for \texttt{std::map<const DSNode*, DSNodeHandle>}. |
| |
| \subsubsection{InvNodeMapTy} |
| |
| This is a typedef for \texttt{std::multimap<DSNodeHandle, const DSNode*>}. |
| |
| \subsection{DataStructures : public ModulePass} |
| |
| \begin{itemize} |
| \item \texttt{DSGraph* getDSGraph(const Function\& F) const}: return |
| the DSGraph for the specified function, or null if none |
| \item \texttt{DSGraph* getGlobalsGraph() const}: return the globals |
| graph |
| \end{itemize} |
| |
| \subsection{DSGraph} |
| |
| \begin{itemize} |
| \item \texttt{DSGraph* getGlobalsGraph() const}: return the globals |
| graph |
| \item \texttt{node\_iterator node\_begin()/node\_end()}: iterate over all the nodes |
| in the DSGraph. Be extremely careful when using these methods. |
| \item \texttt{string getFunctionNames() const}: return a |
| space-separated list of the names of functions in this graph |
| \item \texttt{const std::list<DSCallSite>\& getFunctionCalls() const}: |
| return a list of call sites in the original local graph |
| \item \texttt{fc\_iterator fc\_begin()/fc\_end() const}: iterate over |
| call sites |
| \item \texttt{DSNodeHandle\& getNodeForValue(const Value* V) const}: |
| return the DSNode in this graph associated with the specified value |
| \item \texttt{bool hasNodeForValue(const Value* V) const}: return true |
| if there is a DSNode in this graph associated with the specified |
| value, false otherwise |
| \item \texttt{DSNodeHandle\& getReturnNodeFor(const Function\& F)}: get |
| the return node for the specified function (which must be associated |
| with the current DSGraph) |
| \item \texttt{static void computeNodeMapping(const DSNodeHandle\& NH1, |
| const DSNodeHandle\& NH2, NodeMapTy\& NodeMap, bool StrictChecking = |
| true)}: given roots in two different DSGraphs, traverse the nodes |
| reachable from the two graphs and construct the maping of nodes from |
| the first to the second graph. This function is useful in allowing |
| a client to analyze which nodes in two different DSGraphs represent |
| the ``same'' value in a certain context; however, the most common |
| potential uses of this function are handled by the more specialized |
| functions below. |
| \item \texttt{void computeGtoGGMapping(NodeMapTy\& NodeMap)}: compute |
| the mapping of nodes in this DSGraph to nodes in the globals graph |
| \item \texttt{void computeGGtoGMapping(InvNodeMapTy\& InvNodeMap)}: |
| compute the mapping of nodes in the globals graph to nodes in this |
| DSGraph |
| \item \texttt{void computeCalleeCallerMapping(DSCallSite CS, const |
| Function\& Callee, DSGraph\& CalleeGraph, NodeMapTy\& NodeMap)}: given |
| a call from a function in the current graph to the ``Callee'' |
| function, which must be associated with ``CalleeGraph'', construct a |
| mapping of nodes from the callee to nodes in this graph |
| \end{itemize} |
| |
| \subsection{DSNode} |
| |
| The DSNode class represents a node in a DSA function or globals graph |
| and therefore represents a set of objects which may alias. |
| Interesting analysis results about the behavior of this alias set are |
| stored as flags: |
| \begin{verbatim} |
| enum NodeTy { |
| ShadowNode = 0, // Nothing is known about this node... |
| AllocaNode = 1 << 0, // This node was allocated with alloca |
| HeapNode = 1 << 1, // This node was allocated with malloc |
| GlobalNode = 1 << 2, // This node was allocated by a global var decl |
| ExternFuncNode = 1 << 3, // This node contains external functions |
| ExternGlobalNode = 1 << 4, // This node contains external globals |
| UnknownNode = 1 << 5, // This node points to unknown allocated memory |
| IncompleteNode = 1 << 6, // This node may not be complete |
| |
| ModifiedNode = 1 << 7, // This node is modified in this context |
| ReadNode = 1 << 8, // This node is read in this context |
| |
| ArrayNode = 1 << 9, // This node is treated like an array |
| CollapsedNode = 1 << 10, // This node is collapsed |
| ExternalNode = 1 << 11, // This node comes from an external source |
| IntToPtrNode = 1 << 12, // This node comes from an int cast |
| PtrToIntNode = 1 << 13, // This node escapes to an int cast |
| VAStartNode = 1 << 14, // This node is from a vastart call |
| \end{verbatim} |
| |
| These flags may be accessed with the following accessors: |
| \begin{verbatim} |
| unsigned getNodeFlags() const; |
| bool isAllocaNode() const { return NodeType & AllocaNode; } |
| bool isHeapNode() const { return NodeType & HeapNode; } |
| bool isGlobalNode() const { return NodeType & GlobalNode; } |
| bool isExternFuncNode() const { return NodeType & ExternFuncNode; } |
| bool isUnknownNode() const { return NodeType & UnknownNode; } |
| bool isModifiedNode() const { return NodeType & ModifiedNode; } |
| bool isReadNode() const { return NodeType & ReadNode; } |
| bool isArrayNode() const { return NodeType & ArrayNode; } |
| bool isCollapsedNode() const { return NodeType & CollapsedNode; } |
| bool isIncompleteNode() const { return NodeType & IncompleteNode;} |
| bool isCompleteNode() const { return !isIncompleteNode(); } |
| bool isExternalNode() const { return NodeType & ExternalNode; } |
| bool isIntToPtrNode() const { return NodeType & IntToPtrNode; } |
| bool isPtrToIntNode() const { return NodeType & PtrToIntNode; } |
| bool isVAStartNode() const { return NodeType & VAStartNode; } |
| \end{verbatim} |
| |
| Other useful functions of this class are summarized below: |
| \begin{itemize} |
| \item \texttt{DSGraph* getParentGraph() const}: get the DSGraph of |
| which this node is a part |
| \item \texttt{unsigned getNumReferrers() const}: return the number of |
| edges pointing to this node |
| \item \texttt{unsigned getSize() const}: return the size of the |
| largest value represented by this node |
| \item \texttt{bool hasLink(unsigned offset) const}: return true if any |
| value represented by this node may have a pointer to another node at |
| the specified offset from the start of the value. |
| \item \texttt{DSNodeHandle\& getLilnk(unsigned offset)}: returns the |
| DSNodeHandle associated with the node pointed to by this node at the |
| specified offset |
| \item \texttt{void markReachableNodes(llvm::DenseSet<const DSNode*>\& |
| ReachableNodes) const}: populates ReachableNodes with pointers to |
| all DSNodes reachable from this node's edges. |
| \end{itemize} |
| |
| \subsection{DSCallSite} |
| |
| This class provides an interface to the call graph of the program as |
| seen by DSA. The interesting functions are printed below. |
| \begin{verbatim} |
| /// isDirectCall - Return true if this call site is a direct call of the |
| /// function specified by getCalleeFunc. If not, it is an indirect call to |
| /// the node specified by getCalleeNode. |
| /// |
| bool isDirectCall() const { return CalleeF != 0; } |
| bool isIndirectCall() const { return !isDirectCall(); } |
| |
| |
| // Accessor functions... |
| const Function &getCaller() const; |
| CallSite getCallSite() const { return Site; } |
| DSNodeHandle &getRetVal() { return RetVal; } |
| const DSNodeHandle &getRetVal() const { return RetVal; } |
| DSNodeHandle &getVAVal() { return VarArgVal; } |
| const DSNodeHandle &getVAVal() const { return VarArgVal; } |
| |
| DSNode *getCalleeNode() const { |
| assert(!CalleeF && CalleeN.getNode()); return CalleeN.getNode(); |
| } |
| const Function *getCalleeFunc() const { |
| assert(!CalleeN.getNode() && CalleeF); return CalleeF; |
| } |
| |
| unsigned getNumPtrArgs() const { return CallArgs.size(); } |
| \end{verbatim} |
| |
| %Final subsection |
| \subsection{Potential Pitfalls} |
| |
| The most important pitfall to using DSA successfully is to |
| misunderstand the information returned by it. This is more of a |
| problem when using a bottom-up-only analysis than when using a |
| top-down analysis. When designing a client pass, one must take care |
| not to infer things from DSA that are not implied by the information |
| given. It may be advisable for a pass author to reread |
| \S\ref{fungraph} of this manual, especially if bottom-up |
| DSA is to be used. |
| |
| \section{DSA Passes} |
| |
| Depending on their needs, clients may wish to use the results of any |
| one of a number of DSA passes. It is important to understand the |
| differences between them. The two most common passes are listed |
| below. A client uses a pass by indicating that it requires its |
| results in the normal LLVM manner: |
| \begin{verbatim} |
| EXAMPLE |
| \end{verbatim} |
| |
| \subsection{EQTDDataStructures} |
| |
| This is the ``final'' form of DSA and is probably the appropriate pass |
| for most external clients to use. By including this pass, a client |
| will get the results of the DSA analysis after all phases of DSA have |
| been run. Merging has been performed for aliasing in both callers and |
| callees, so fewer nodes will be incomplete than in bottom-up. |
| Aliasing information will therefore be most precise. |
| |
| The primary disadvantage of using EQTDDataStructures is the loss of |
| precision caused by the additional merging of nodes relative to |
| bottom-up. Depending on the needs of a client, therefore, |
| EquivBuDataStructures may be more appropriate in specific cases. |
| |
| \subsection{EquivBUDataStructures} |
| |
| This pass runs all forms of DSA except top-down propagation. The |
| DSGraphs returned by this pass take into account aliasing caused by |
| calleees of a function, but not by callers. Thus, this pass is not |
| appropriate for use as a traditional not-shared/may-shared/must-shared |
| alias analysis. However, as this pass is more context-sensitive than |
| EQTDDataStructures, it may be a better choice for certain clients. |
| |
| \end{document} |