| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| <html> |
| <head> |
| <title>Checker Developer Manual</title> |
| <link type="text/css" rel="stylesheet" href="menu.css"> |
| <link type="text/css" rel="stylesheet" href="content.css"> |
| <script type="text/javascript" src="scripts/menu.js"></script> |
| </head> |
| <body> |
| |
| <div id="page"> |
| <!--#include virtual="menu.html.incl"--> |
| |
| <div id="content"> |
| |
| <h3 style="color:red">This Page Is Under Construction</h3> |
| |
| <h1>Checker Developer Manual</h1> |
| |
| <p>The static analyzer engine performs path-sensitive exploration of the program and |
| relies on a set of checkers to implement the logic for detecting and |
| constructing specific bug reports. Anyone interested in implementing their own |
| checker should check out the talk <i>Building a Checker in 24 Hours</i> |
| (<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>, |
| <a href="https://youtu.be/kdxlsP5QVPw">video</a>) |
| and refer to this page for additional information on writing a checker. The static analyzer is |
| part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> |
| and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a> |
| for developer guidelines, and send your questions and proposals to the |
| <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>. |
| </p> |
| |
| <ul> |
| <li><a href="#start">Getting Started</a></li> |
| <li><a href="#analyzer">Static Analyzer Overview</a> |
| <ul> |
| <li><a href="#interaction">Interaction with Checkers</a></li> |
| <li><a href="#values">Representing Values</a></li> |
| </ul></li> |
| <li><a href="#idea">Idea for a Checker</a></li> |
| <li><a href="#registration">Checker Registration</a></li> |
| <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li> |
| <li><a href="#extendingstates">Custom Program States</a></li> |
| <li><a href="#bugs">Bug Reports</a></li> |
| <li><a href="#ast">AST Visitors</a></li> |
| <li><a href="#testing">Testing</a></li> |
| <li><a href="#commands">Useful Commands/Debugging Hints</a> |
| <ul> |
| <li><a href="#attaching">Attaching the Debugger</a></li> |
| <li><a href="#narrowing">Narrowing Down the Problem</a></li> |
| <li><a href="#visualizing">Visualizing the Analysis</a></li> |
| <li><a href="#debugprints">Debug Prints and Tricks</a></li> |
| </ul></li> |
| <li><a href="#additioninformation">Additional Sources of Information</a></li> |
| <li><a href="#links">Useful Links</a></li> |
| </ul> |
| |
| <h2 id=start>Getting Started</h2> |
| <ul> |
| <li>To check out the source code and build the project, follow steps 1-4 of |
| the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> |
| page.</li> |
| |
| <li>The analyzer source code is located under the Clang source tree: |
| <br><tt> |
| $ <b>cd llvm/tools/clang</b> |
| </tt> |
| <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>, |
| <tt>test/Analysis</tt>.</li> |
| |
| <li>The analyzer regression tests can be executed from Clang's build |
| directory: |
| <br><tt> |
| $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b> |
| </tt></li> |
| |
| <li>Analyze a file with the specified checker: |
| <br><tt> |
| $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b> |
| </tt></li> |
| |
| <li>List the available checkers: |
| <br><tt> |
| $ <b>clang -cc1 -analyzer-checker-help</b> |
| </tt></li> |
| |
| <li>See the analyzer help for different output formats, fine tuning, and |
| debug options: |
| <br><tt> |
| $ <b>clang -cc1 -help | grep "analyzer"</b> |
| </tt></li> |
| |
| </ul> |
| |
| <h2 id=analyzer>Static Analyzer Overview</h2> |
| The analyzer core performs symbolic execution of the given program. All the |
| input values are represented with symbolic values; further, the engine deduces |
| the values of all the expressions in the program based on the input symbols |
| and the path. The execution is path sensitive and every possible path through |
| the program is explored. The explored execution traces are represented with an |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object. |
| Each node of the graph is an |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, |
| which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>. |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> |
| represents the corresponding location in the program (or the CFG). |
| <tt>ProgramPoint</tt> is also used to record additional information on |
| when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt> |
| kind means that the state is the result of purging dead symbols - the |
| analyzer's equivalent of garbage collection. |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> |
| represents the abstract state of the program. It consists of: |
| <ul> |
| <li><tt>Environment</tt> - a mapping from source code expressions to symbolic |
| values |
| <li><tt>Store</tt> - a mapping from memory locations to symbolic values |
| <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any checker-defined state |
| </ul> |
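<p>The relationship between nodes, program points, and states can be sketched with
a small standalone model. This is only an illustration under stated assumptions:
<tt>ToyState</tt>, <tt>ToyNode</tt>, and <tt>ToyGraph</tt> are hypothetical names, not
the Clang classes, and the state holds nothing but textual constraints. It shows
how a branch on a symbol adds two successor nodes, one per outcome. C++17 is assumed.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy program state: constraints on symbols, kept as plain text.
// (Hypothetical model; the real ProgramState is far richer.)
using ToyState = std::map<std::string, std::string>; // symbol -> constraint

// A node pairs a program location with the state that holds there.
struct ToyNode {
  int ProgramPoint; // e.g. a statement index
  ToyState State;
  int Predecessor;  // index of the parent node, -1 for the root
};

struct ToyGraph {
  std::vector<ToyNode> Nodes;

  // Append a node and return its index.
  int add(int Point, ToyState State, int Pred) {
    Nodes.push_back(ToyNode{Point, std::move(State), Pred});
    return static_cast<int>(Nodes.size()) - 1;
  }

  // Split the path at From on a condition over Sym: one successor per
  // outcome, each carrying the constraint that holds on its branch.
  std::pair<int, int> branch(int From, int Point, const std::string &Sym,
                             const std::string &IfTrue,
                             const std::string &IfFalse) {
    assert(From >= 0 && From < static_cast<int>(Nodes.size()));
    ToyState TrueState = Nodes[From].State;
    ToyState FalseState = Nodes[From].State;
    TrueState[Sym] = IfTrue;
    FalseState[Sym] = IfFalse;
    int TrueNode = add(Point, std::move(TrueState), From);
    int FalseNode = add(Point, std::move(FalseState), From);
    return {TrueNode, FalseNode};
  }
};
```

<p>Calling <tt>branch</tt> on the root for a condition such as <tt>x &gt; 0</tt> yields
two nodes at the same program point whose states differ only in the constraint
recorded for the symbol.</p>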
| |
| <h3 id=interaction>Interaction with Checkers</h3> |
| |
| <p> |
| Checkers are not merely passive receivers of the analyzer core changes - they |
| actively participate in the <tt>ProgramState</tt> construction through the |
| <tt>GenericDataMap</tt> which can be used to store the checker-defined part |
| of the state. Each time the analyzer engine explores a new statement, it |
| notifies each checker registered to listen for that statement, giving it an |
| opportunity to either report a bug or modify the state. (As a rule of thumb, |
| the checker itself should be stateless.) The checkers are called one after another |
| in the predefined order; thus, calling all the checkers adds a chain to the |
| <tt>ExplodedGraph</tt>. |
| </p> |
| |
| <h3 id=values>Representing Values</h3> |
| |
| <p> |
| During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> |
| objects are used to represent the semantic evaluation of expressions. |
| They can represent things like concrete |
| integers, symbolic values, or memory locations (which are memory regions). |
| They are a discriminated union of "values", symbolic and otherwise. |
| If a value isn't symbolic, usually that means there is no symbolic |
| information to track. For example, if the value was an integer, such as |
| <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>, |
| and the checker doesn't usually need to track any state with the concrete |
| number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be |
| a symbolic value. This happens when the analyzer cannot reason about something |
| (yet). An example is floating point numbers. In such cases, the |
| <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>. |
| This represents a case that is outside the realm of the analyzer's reasoning |
| capabilities. <tt>SVals</tt> are value objects and their values can be viewed |
| using the <tt>.dump()</tt> method. Often they wrap persistent objects such as |
| symbols or regions. |
| </p> |
| |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) |
| is meant to represent an abstract, but named, symbolic value. Symbols represent |
| an actual (immutable) value. We might not know what its specific value is, but |
| we can associate constraints with that value as we analyze a path. For |
| example, we might record that the value of a symbol is greater than |
| <tt>0</tt>, etc. |
| </p> |
| |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol. |
| It is used to provide a lexicon of how to describe abstract memory. Regions can |
| layer on top of other regions, providing a layered approach to representing memory. |
| For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, |
| but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could |
| be used to represent the memory associated with a specific field of that object. |
| So how do we represent symbolic memory regions? That's what |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> |
| is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the |
| symbol is unique and has a unique name, that symbol names the region. |
| </p> |
| |
| <p> |
| Let's see how the analyzer processes the expressions in the following example: |
| </p> |
| |
| <p> |
| <pre class="code_example"> |
| int foo(int x) { |
| int y = x * 2; |
| int z = x; |
| ... |
| } |
| </pre> |
| </p> |
| |
| <p> |
| Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, |
| we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in |
| this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. |
| Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, |
| which references the value <b>currently bound</b> to <tt>x</tt>. That value is |
| symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. |
| Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, |
| and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When |
| we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, |
| and create a new <tt>SVal</tt> that represents their multiplication (which in |
| this case is a new symbolic expression, which we might call <tt>$1</tt>). When we |
| evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), |
| and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) |
| to the <tt>MemRegion</tt> in the symbolic store. |
| <br> |
| The second line is similar. When we evaluate <tt>x</tt> again, we do the same |
| dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt> |
| might reference the same underlying values. |
| </p> |
| |
| <p> |
| To summarize, MemRegions are unique names for blocks of memory. Symbols are |
| unique names for abstract symbolic values. Some MemRegions represent abstract |
| symbolic chunks of memory, and thus are also based on symbols. SVals are just |
| references to values, and can reference either MemRegions, Symbols, or concrete |
| values (e.g., the number 1). |
| </p> |
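<p>The walkthrough above can be mimicked with a toy model. The sketch below is not
the analyzer's implementation: <tt>ToySVal</tt>, <tt>ToyEngine</tt>, and the helper names
are hypothetical, regions are identified by a plain string, and multiplication by
anything non-constant simply produces a fresh symbol. It only illustrates that an
<tt>SVal</tt> is a discriminated union and that two values can reference the same
symbol. C++17 is assumed for <tt>std::variant</tt>.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy stand-ins for the value kinds described above (not the Clang classes).
struct Unknown {};                   // outside the model's reasoning
struct ConcreteInt { int Value; };   // a known integer such as 2
struct Symbol { int Id; };           // an abstract named value, e.g. $0
struct Region { std::string Name; }; // a named block of memory, e.g. "x"

// A value is a discriminated union over the kinds above.
using ToySVal = std::variant<Unknown, ConcreteInt, Symbol, Region>;

class ToyEngine {
  int NextSymbol = 0;

public:
  // The store binds memory regions (by name) to values.
  std::map<std::string, ToySVal> Store;

  // Create a fresh symbol: $0, $1, ...
  ToySVal freshSymbol() { return Symbol{NextSymbol++}; }

  // Lvalue-to-rvalue conversion: the value currently bound to region R.
  ToySVal load(const Region &R) const {
    auto It = Store.find(R.Name);
    return It == Store.end() ? ToySVal{Unknown{}} : It->second;
  }

  // Multiply two values: fold constants, give up on Unknown, and otherwise
  // name the result with a new symbol.
  ToySVal multiply(const ToySVal &L, const ToySVal &R) {
    if (std::holds_alternative<Unknown>(L) || std::holds_alternative<Unknown>(R))
      return Unknown{};
    const auto *LI = std::get_if<ConcreteInt>(&L);
    const auto *RI = std::get_if<ConcreteInt>(&R);
    if (LI && RI)
      return ConcreteInt{LI->Value * RI->Value};
    return freshSymbol();
  }
};

// Mirrors: int foo(int x) { int y = x * 2; int z = x; ... }
inline ToyEngine evaluateFoo() {
  ToyEngine E;
  E.Store["x"] = E.freshSymbol();                  // x is bound to $0 on entry
  ToySVal XVal = E.load(Region{"x"});
  assert(std::holds_alternative<Symbol>(XVal));
  E.Store["y"] = E.multiply(XVal, ConcreteInt{2}); // y is bound to $1
  E.Store["z"] = E.load(Region{"x"});              // z references $0 again
  return E;
}
```

<p>After <tt>evaluateFoo()</tt>, the values bound to <tt>x</tt> and <tt>z</tt> are distinct
<tt>ToySVal</tt> objects that both reference symbol <tt>$0</tt>, while <tt>y</tt> holds the
new symbol <tt>$1</tt>.</p>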
| |
| <!-- |
| TODO: Add a picture. |
| <br> |
| Symbols<br> |
| FunctionalObjects are used throughout. |
| --> |
| |
| <h2 id=idea>Idea for a Checker</h2> |
| Here are several questions which you should consider when evaluating your |
| checker idea: |
| <ul> |
| <li>Can the check be effectively implemented without path-sensitive |
| analysis? See <a href="#ast">AST Visitors</a>.</li> |
| |
| <li>How high is the false positive rate going to be? Looking at the occurrences |
| of the issue you want to write a checker for in the existing code bases might |
| give you some ideas. </li> |
| |
| <li>How will the current limitations of the analysis affect the false alarm |
| rate? Currently, the analyzer only reasons about one procedure at a time (no |
| inter-procedural analysis). Also, it uses a simple range-tracking-based |
| solver to model symbolic execution.</li> |
| |
| <li>Consult the <a |
| href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a> |
| to get some ideas for new checkers and consider starting with improving/fixing |
| bugs in the existing checkers.</li> |
| </ul> |
| |
| <p>Once an idea for a checker has been chosen, there are two key decisions that |
| need to be made: |
| <ul> |
| <li> Which events the checker should be tracking. This is discussed in more |
| detail in the section <a href="#events_callbacks">Events, Callbacks, and |
| Checker Class Structure</a>. |
| <li> What checker-specific data needs to be stored as part of the program |
| state (if any). This should be minimized as much as possible. More detail about |
| implementing custom program state is given in section <a |
| href="#extendingstates">Custom Program States</a>. |
| </ul> |
| |
| |
| <h2 id=registration>Checker Registration</h2> |
| All checker implementation files are located in the |
| <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe |
| how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of |
| stream APIs, was registered with the analyzer. |
| Similar steps should be followed for a new checker. |
| <ol> |
| <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was |
| created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>. |
| <li>The following registration code was added to the implementation file: |
| <pre class="code_example"> |
| void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) { |
|   mgr.registerChecker&lt;SimpleStreamChecker&gt;(); |
| } |
| </pre> |
| <li>A package was selected for the checker and the checker was defined in the |
| table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>. |
| Since all checkers should first be developed as "alpha", and the SimpleStreamChecker |
| performs UNIX API checks, the correct package is "alpha.unix", and the following |
| was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>: |
| <pre class="code_example"> |
| let ParentPackage = UnixAlpha in { |
| ... |
| def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;, |
|   HelpText&lt;"Check for misuses of stream APIs"&gt;, |
|   DescFile&lt;"SimpleStreamChecker.cpp"&gt;; |
| ... |
| } // end "alpha.unix" |
| </pre> |
| |
| <li>The source code file was made visible to CMake by adding it to |
| <tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>. |
| |
| </ol> |
| |
| After adding a new checker to the analyzer, one can verify that the new checker |
| was successfully added by seeing if it appears in the list of available checkers: |
| <br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt> |
| |
| <h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2> |
| |
| <p> All checkers inherit from the <tt><a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html"> |
| Checker</a></tt> template class; the template parameter(s) describe the type of |
| events that the checker is interested in processing. The various types of events |
| that are available are described in the file <a |
| href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html"> |
| CheckerDocumentation.cpp</a>. |
| |
| <p> For each event type requested, a corresponding callback function must be |
| defined in the checker class (<a |
| href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html"> |
| CheckerDocumentation.cpp</a> shows the |
| correct function name and signature for each event type). |
| |
| <p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to |
| take action at the following times: |
| |
| <ul> |
| <li>Before making a call to a function, check if the function is <tt>fclose</tt>. |
| If so, check the parameter being passed. |
| <li>After making a function call, check if the function is <tt>fopen</tt>. If |
| so, process the return value. |
| <li>When values go out of scope, check whether they are still-open file |
| descriptors, and report a bug if so. In addition, remove any information about |
| them from the program state in order to keep the state as small as possible. |
| <li>When file pointers "escape" (are used in a way that the analyzer can no longer |
| track them), mark them as such. This prevents false positives in the cases where |
| the analyzer cannot be sure whether the file was closed or not. |
| </ul> |
| |
| <p>The events that will be used for these actions are, respectively, <a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>, |
| <a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>, |
| <a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>, |
| and <a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>. |
| The high-level structure of the checker's class is thus: |
| |
| <pre class="code_example"> |
| class SimpleStreamChecker : public Checker&lt;check::PreCall, |
|                                            check::PostCall, |
|                                            check::DeadSymbols, |
|                                            check::PointerEscape&gt; { |
| public: |
| |
|   void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const; |
| |
|   void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const; |
| |
|   void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const; |
| |
|   ProgramStateRef checkPointerEscape(ProgramStateRef State, |
|                                      const InvalidatedSymbols &amp;Escaped, |
|                                      const CallEvent *Call, |
|                                      PointerEscapeKind Kind) const; |
| }; |
| </pre> |
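<p>To make the link between template parameters and callbacks more concrete, here
is a simplified standalone sketch of the idea. It is not how Clang implements
<tt>Checker</tt>; every name in it (<tt>ToyManager</tt>, <tt>ToyContext</tt>,
<tt>toycheck::PreCall</tt>, <tt>ToyChecker</tt>, <tt>ToyStreamChecker</tt>) is
hypothetical, and a call is reduced to the callee's name. The point it illustrates
is that each event tag subscribes the checker to one event, and that the
subscription only compiles if the checker defines the matching callback.
C++17 is assumed for the fold expression.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy context handed to callbacks; checkers record findings here instead of
// keeping state in themselves.
struct ToyContext {
  std::vector<std::string> Notes;
};

using ToyCallback = std::function<void(const std::string &, ToyContext &)>;

// Toy manager: holds the callbacks subscribed to each event kind.
struct ToyManager {
  std::vector<ToyCallback> PreCallbacks;
  std::vector<ToyCallback> PostCallbacks;

  // Notify subscribers before and after a (modeled) call to Callee.
  void runCall(const std::string &Callee, ToyContext &Ctx) const {
    for (const auto &CB : PreCallbacks)
      CB(Callee, Ctx);
    for (const auto &CB : PostCallbacks)
      CB(Callee, Ctx);
  }
};

namespace toycheck {
// Each event tag knows how to subscribe a checker to its own event.
struct PreCall {
  template <typename CheckerT>
  static void subscribe(const CheckerT *C, ToyManager &M) {
    assert(C && "checker must not be null");
    M.PreCallbacks.push_back(
        [C](const std::string &F, ToyContext &Ctx) { C->checkPreCall(F, Ctx); });
  }
};
struct PostCall {
  template <typename CheckerT>
  static void subscribe(const CheckerT *C, ToyManager &M) {
    assert(C && "checker must not be null");
    M.PostCallbacks.push_back(
        [C](const std::string &F, ToyContext &Ctx) { C->checkPostCall(F, Ctx); });
  }
};
} // namespace toycheck

// The template parameters list the events a checker wants to receive.
template <typename... Events> struct ToyChecker {
  template <typename CheckerT>
  static void registerWith(const CheckerT *C, ToyManager &M) {
    (Events::subscribe(C, M), ...); // one subscription per requested event
  }
};

// A stateless toy checker interested in fclose (before) and fopen (after).
struct ToyStreamChecker : ToyChecker<toycheck::PreCall, toycheck::PostCall> {
  void checkPreCall(const std::string &Callee, ToyContext &Ctx) const {
    if (Callee == "fclose")
      Ctx.Notes.push_back("pre:fclose");
  }
  void checkPostCall(const std::string &Callee, ToyContext &Ctx) const {
    if (Callee == "fopen")
      Ctx.Notes.push_back("post:fopen");
  }
};
```

<p>Removing <tt>checkPostCall</tt> from <tt>ToyStreamChecker</tt> while still listing
<tt>toycheck::PostCall</tt> makes registration fail to compile, which mirrors the
requirement that every requested event has a corresponding callback.</p>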
| |
| <h2 id=extendingstates>Custom Program States</h2> |
| |
| <p> Checkers often need to keep track of information specific to the checks they |
| perform. However, since checkers have no guarantee about the order in which the |
| program will be explored, or even that all possible paths will be explored, this |
| state information cannot be kept within individual checkers. Therefore, if |
| checkers need to store custom information, they need to add new categories of |
| data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of |
| several macros designed for this purpose. They are: |
| |
| <ul> |
| <li><a |
| href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>: |
| Used when the state information is a single value. The methods available for |
| state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and |
| <tt>remove</tt>. |
| <li><a |
| href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>: |
| Used when the state information is a list of values. The methods available for |
| state types declared with this macro are <tt>add</tt>, <tt>get</tt>, |
| <tt>remove</tt>, and <tt>contains</tt>. |
| <li><a |
| href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>: |
| Used when the state information is a set of values. The methods available for |
| state types declared with this macro are <tt>add</tt>, <tt>get</tt>, |
| <tt>remove</tt>, and <tt>contains</tt>. |
| <li><a |
| href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>: |
| Used when the state information is a map from a key to a value. The methods |
| available for state types declared with this macro are <tt>add</tt>, |
| <tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>. |
| </ul> |
| |
| <p>All of these macros take as parameters the name to be used for the custom |
| category of state information and the data type(s) to be used for storage. The |
| data type(s) specified will become the parameter type and/or return type of the |
| methods that manipulate the new category of state information. Each of these |
| methods is templated with the name of the custom data type. |
| |
| <p>For example, a common case is the need to track data associated with a |
| symbolic expression; a map type is the most logical way to implement this. The |
| key for this map will be a pointer to a symbolic expression |
| (<tt>SymbolRef</tt>). If the data type to be associated with the symbolic |
| expression is an integer, then the custom category of state information would be |
| declared as |
| |
| <pre class="code_example"> |
| REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int) |
| </pre> |
| |
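| The data would be looked up with <tt>get</tt>, which for a map returns a pointer |
| to the stored value (null if the key is absent): |
| |
| <pre class="code_example"> |
| ProgramStateRef state; |
| SymbolRef Sym; |
| ... |
| const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym); |
| </pre> |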
| |
| and set with the function |
| |
| <pre class="code_example"> |
| ProgramStateRef state; |
| SymbolRef Sym; |
| int newValue; |
| ... |
| ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue); |
| </pre> |
| |
| <p>In addition, the macros define a data type used for storing the data of the |
| new data category; the name of this type is the name of the data category with |
| "Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply |
| be the passed data type; for the other three macros, this will be a specialized |
| version of the <a |
| href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>, |
| <a |
| href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>, |
| or <a |
| href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a> |
| templated class. For the <tt>ExampleDataType</tt> example above, the type |
| created would be equivalent to writing the declaration: |
| |
| <pre class="code_example"> |
| typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy; |
| </pre> |
| |
| <p>These macros will cover a majority of use cases; however, they still have a |
| few limitations. They cannot be used inside namespaces (since they expand to |
| contain top-level namespace references), and the data types that they define |
| cannot be referenced from more than one file. |
| |
| <p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing |
| one, functions that modify the state will return a copy of the previous state |
| with the change applied. This updated state must then be provided to the |
| analyzer core by calling the <tt>CheckerContext::addTransition</tt> function. |
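<p>The consequence of immutability is easy to miss, so here is a minimal standalone
sketch of it. The classes below are hypothetical stand-ins (<tt>ToyState</tt>,
<tt>ToyStateRef</tt>, <tt>ToyContext</tt>), not the analyzer's types, and symbols are
plain strings. The sketch shows that <tt>set</tt> leaves the original state untouched
and that the new state only takes effect once it is handed over with
<tt>addTransition</tt>.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy immutable state: a map from a symbol name to an int.
class ToyState {
  std::map<std::string, int> Data;

public:
  // Returns a pointer to the stored value, or nullptr when Sym is absent.
  const int *get(const std::string &Sym) const {
    auto It = Data.find(Sym);
    return It == Data.end() ? nullptr : &It->second;
  }

  // Returns a NEW state with Sym bound to Value; *this is left untouched.
  std::shared_ptr<const ToyState> set(const std::string &Sym, int Value) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data[Sym] = Value;
    return Copy;
  }
};

using ToyStateRef = std::shared_ptr<const ToyState>;

// Toy stand-in for the checker context: it tracks the state the "core" is
// currently working with and every state that was handed over.
struct ToyContext {
  ToyStateRef Current = std::make_shared<ToyState>();
  std::vector<ToyStateRef> Transitions;

  void addTransition(ToyStateRef S) {
    assert(S && "state must not be null");
    Transitions.push_back(S);
    Current = S;
  }
};
```

<p>A checker callback that computes an updated state but returns without calling
<tt>addTransition</tt> has no effect on the rest of the path, which is a common
source of "my state update got lost" bugs.</p>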
| <h2 id=bugs>Bug Reports</h2> |
| |
| |
| <p> When a checker detects a mistake in the analyzed code, it needs a way to |
| report it to the analyzer core so that it can be displayed. The two classes used |
| to construct this report are <tt><a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt> |
| and <tt><a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html"> |
| BugReport</a></tt>. |
| |
| <p> |
| <tt>BugType</tt>, as the name would suggest, represents a type of bug. The |
| constructor for <tt>BugType</tt> takes two parameters: The name of the bug |
| type, and the name of the category of the bug. These are used (e.g.) in the |
| summary page generated by the scan-build tool. |
| |
| <p> |
| The <tt>BugReport</tt> class represents a specific occurrence of a bug. In |
| the most common case, three parameters are used to form a <tt>BugReport</tt>: |
| <ol> |
| <li>The type of bug, specified as an instance of the <tt>BugType</tt> class. |
| <li>A short descriptive string. This is placed at the location of the bug in |
| the detailed line-by-line output generated by scan-build. |
| <li>The context in which the bug occurred. This includes both the location of |
| the bug in the program and the program's state when the location is reached. These are |
| both encapsulated in an <tt>ExplodedNode</tt>. |
| </ol> |
| |
| <p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made |
| as to whether or not analysis can continue along the current path. This decision |
| is based on whether the detected bug is one that would prevent the program under |
| analysis from continuing. For example, leaking of a resource should not stop |
| analysis, as the program can continue to run after the leak. Dereferencing a |
| null pointer, on the other hand, should stop analysis, as there is no way for |
| the program to meaningfully continue after such an error. |
| |
| <p>If analysis can continue, then the most recent <tt>ExplodedNode</tt> |
| generated by the checker can be passed to the <tt>BugReport</tt> constructor |
| without additional modification. This <tt>ExplodedNode</tt> will be the one |
| returned by the most recent call to <a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>. |
| If no transition has been performed during the current callback, the checker should call <a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a> |
| and use the returned node for bug reporting. |
| |
| <p>If analysis cannot continue, then the current state should be transitioned |
| into a so-called <i>sink node</i>, a node from which no further analysis will be |
| performed. This is done by calling the <a |
| href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0"> |
| CheckerContext::generateSink</a> function; this function is the same as the |
| <tt>addTransition</tt> function, but marks the state as a sink node. Like |
| <tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated |
| state, which can then be passed to the <tt>BugReport</tt> constructor. |
| |
| <p> |
| After a <tt>BugReport</tt> is created, it should be passed to the analyzer core |
| by calling <a href = "http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>. |
| |
| <h2 id=ast>AST Visitors</h2> |
| Some checks might not require path-sensitivity to be effective. A simple AST walk |
| might be sufficient. If that is the case, consider implementing a Clang |
| compiler warning. On the other hand, a check might not be acceptable as a compiler |
| warning; for example, because of a relatively high false positive rate. In this |
| situation, AST callbacks <tt><b>checkASTDecl</b></tt> and |
| <tt><b>checkASTCodeBody</b></tt> are your best friends. |
| |
| <h2 id=testing>Testing</h2> |
| Every patch should be well tested with Clang regression tests. The checker tests |
| live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, |
| execute the following from the <tt>clang</tt> build directory: |
| <pre class="code"> |
| $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b> |
| </pre> |
| |
| <h2 id=commands>Useful Commands/Debugging Hints</h2> |
| |
| <h3 id=attaching>Attaching the Debugger</h3> |
| |
| <p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the |
| debugger to it directly:</p> |
| |
| <pre class="code"> |
| $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b> |
| $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b> |
| </pre> |
| |
| <p> |
| Otherwise, if your command line contains <tt><b>--analyze</b></tt>, |
| the actual clang instance would be run in a separate process. In |
| order to debug it, use the <tt><b>-###</b></tt> flag for obtaining |
| the command line of the child process: |
| </p> |
| |
| <pre class="code"> |
| $ <b>clang --analyze test.c -\#\#\#</b> |
| </pre> |
| |
| <p> |
| Below we describe a few useful command line arguments, all of which assume that |
| you are running <tt><b>clang -cc1</b></tt>. |
| </p> |
| |
| <h3 id=narrowing>Narrowing Down the Problem</h3> |
| |
| <p>While investigating a checker-related issue, instruct the analyzer to only |
| execute a single checker: |
| </p> |
| <pre class="code"> |
| $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b> |
| </pre> |
| |
| <p>If the analyzer crashes while processing a large file, use the |
| <tt><b>-analyzer-display-progress</b></tt> option to see which function is |
| failing.</p> |
| |
| <p>To selectively analyze only the given function, use the |
| <tt><b>-analyze-function</b></tt> option:</p> |
| <pre class="code"> |
| $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b> |
| ANALYZE (Syntax): test.c foo |
| ANALYZE (Syntax): test.c bar |
| ANALYZE (Path, Inline_Regular): test.c bar |
| ANALYZE (Path, Inline_Regular): test.c foo |
| $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b> |
| ANALYZE (Syntax): test.c foo |
| ANALYZE (Path, Inline_Regular): test.c foo |
| </pre> |
| |
| <b>Note: </b> a fully qualified function name has to be used when selecting |
| C++ functions and methods, Objective-C methods and blocks, e.g.: |
| |
| <pre class="code"> |
| $ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function="foo(int)"</b> |
| </pre> |
| |
| The fully qualified name can be found from the |
| <tt><b>-analyzer-display-progress</b></tt> output. |
| |
| <p>The bug reporter mechanism removes path diagnostics inside intermediate |
| function calls that have returned by the time the bug was found and contain |
| no interesting pieces. Usually it is up to the checkers to produce more |
| interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects. |
| However, you can disable path pruning while debugging with the |
| <tt><b>-analyzer-config prune-paths=false</b></tt> option. |
| |
| <h3 id=visualizing>Visualizing the Analysis</h3> |
| |
| <p>To dump the AST, which often helps in understanding how the program should |
| behave:</p> |
| <pre class="code"> |
| $ <b>clang -cc1 -ast-dump test.c</b> |
| </pre> |
| |
| <p>To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> |
| checkers:</p> |
| <pre class="code"> |
| $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b> |
| </pre> |
| |
| <p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be |
| visualized with another debug checker:</p> |
| <pre class="code"> |
| $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b> |
| </pre> |
| <p>Or, equivalently, with the <tt><b>-analyzer-viz-egraph-graphviz</b></tt> |
| option, which does the same thing - dumps the exploded graph in graphviz |
| <tt><b>.dot</b></tt> format.</p> |
| |
| <p>You can convert <tt><b>.dot</b></tt> files into other formats - in |
| particular, converting to <tt><b>.svg</b></tt> and viewing in your web |
| browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p> |
| <pre class="code"> |
| $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b> |
| </pre> |
| |
| <p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those |
| leading to bug reports from the exploded graph dump. This is useful |
| because exploded graphs are often huge and hard to navigate.</p> |
| |
| <p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding |
| the analyzer's false positives, because it gives comprehensive information |
| on every decision made by the analyzer across all analysis paths.</p> |
| |
| <p>There are more debug checkers available. To see all available debug checkers: |
| </p> |
| <pre class="code"> |
| $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b> |
| </pre> |
| |
| <h3 id=debugprints>Debug Prints and Tricks</h3> |
| |
| <p>To view a "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame |
| that has a <tt>clang::ento::ExprEngine</tt> object and execute:</p> |
| <pre class="code"> |
| (gdb) <b>p ViewGraph(0)</b> |
| </pre> |
| |
| <p>To see the <tt>ProgramState</tt> while debugging, use the following command:</p> |
| <pre class="code"> |
| (gdb) <b>p State->dump()</b> |
| </pre> |
| |
| <p>To see a <tt>clang::Expr</tt> while debugging, use the following command. If you |
| pass in a <tt>SourceManager</tt> object, it also dumps the corresponding line in the |
| source code.</p> |
| <pre class="code"> |
| (gdb) <b>p E->dump()</b> |
| </pre> |
| |
| <p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs |
| to:</p> |
| <pre class="code"> |
| (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b> |
| </pre> |
| |
| <h2 id=links>Making Your Checker Better</h2> |
| <ul> |
| <li>User-facing documentation is important for adoption! Make sure the <a href="/available_checks.html">checker list</a> |
| on the analyzer homepage is updated. Also ensure that the description in |
| <tt>Checkers.td</tt> is clear to non-analyzer-developers.</li> |
| <li>Warning and note messages should be clear and easy to understand, even if a bit long.</li> |
| <ul> |
| <li>Messages should start with a capital letter (unlike Clang warnings!) and should not |
| end with a period.</li> |
| <li>Articles are usually omitted, e.g. <tt>Dereference of a null pointer</tt> -&gt; |
| <tt>Dereference of null pointer</tt>.</li> |
| <li>Introduce <tt>BugReporterVisitor</tt>s to emit additional notes that explain the warning |
| to the user better. There are some existing visitors that might be useful for your check, |
| e.g. <tt>trackNullOrUndefValue</tt>. For example, SimpleStreamChecker should highlight |
| the event of opening the file when reporting a file descriptor leak.</li> |
| </ul> |
| <li>If the check tracks anything in the program state, it needs to implement the |
| <tt>checkDeadSymbols</tt> callback to clean the state up.</li> |
| <li>The check should conservatively assume that the program is correct when a tracked symbol |
| is passed to a function that is unknown to the analyzer. |
| The <tt>checkPointerEscape</tt> callback can help you handle that case.</li> |
| <li>Use safe and convenient APIs!</li> |
| <ul> |
| <li>Always use <tt>CheckerContext::generateErrorNode</tt> and |
| <tt>CheckerContext::generateNonFatalErrorNode</tt> for emitting bug reports. |
| Most importantly, never emit a report against <tt>CheckerContext::getPredecessor</tt>.</li> |
| <li>Prefer <tt>checkPreCall</tt> and <tt>checkPostCall</tt> to |
| <tt>checkPreStmt&lt;CallExpr&gt;</tt> and <tt>checkPostStmt&lt;CallExpr&gt;</tt>.</li> |
| <li>Use <tt>CallDescription</tt> to detect hardcoded API calls in the program.</li> |
| <li>Simplify <tt>C.getState()->getSVal(E, C.getLocationContext())</tt> to <tt>C.getSVal(E)</tt>.</li> |
| </ul> |
| <li>Common sources of crashes:</li> |
| <ul> |
| <li><tt>CallEvent::getOriginExpr</tt> is nullable - for example, it returns null for an |
| automatic destructor of a variable. The same applies to some values generated while the |
| call was modeled, e.g. <tt>SymbolConjured::getStmt</tt> is nullable.</li> |
| <li><tt>CallEvent::getDecl</tt> is nullable - for example, it returns null for a |
| call through a symbolic function pointer.</li> |
| <li><tt>addTransition</tt>, <tt>generateSink</tt>, <tt>generateNonFatalErrorNode</tt>, and |
| <tt>generateErrorNode</tt> can all return null, because the transition may lead to a node that has already been visited.</li> |
| <li>Methods of <tt>CallExpr</tt>/<tt>FunctionDecl</tt>/<tt>CallEvent</tt> that |
| return arguments crash when the argument index is out of bounds. Checking the function name |
| does not guarantee that the function has the expected number of arguments, |
| which is why you should use <tt>CallDescription</tt>.</li> |
| <li>Nullability of different entities within different kinds of symbols and regions is usually |
| documented via assertions in their constructors.</li> |
| <li><tt>NamedDecl::getName</tt> will fail if the name of the declaration is not a single token, |
| e.g. for destructors. You could use <tt>NamedDecl::getNameAsString</tt> for those cases. |
| Note that this method is much slower and should be used sparingly, e.g. only when generating reports |
| but not during analysis.</li> |
| <li>Is <tt>-analyzer-checker=core</tt> included in all test <tt>RUN:</tt> lines? Running the analyzer |
| with the core checks disabled has never been supported; it may cause unexpected behavior and |
| crashes. Do all your testing with the core checks enabled.</li> |
| </ul> |
| <li>Patterns that you should most likely avoid even if they're not technically wrong:</li> |
| <ul> |
| <li><tt>BugReporterVisitor</tt> should most likely not match the AST of the current program point |
| to decide when to emit a note. It is much easier to determine that by observing changes in |
| the program state.</li> |
| <li>In <tt>State->getSVal(Region)</tt>, if <tt>Region</tt> is not known to be a <tt>TypedValueRegion</tt> |
| and the optional type argument is not specified, the checker may accidentally try to dereference a |
| void pointer.</li> |
| <li>Checker logic should not depend on whether a certain value is a <tt>Loc</tt> or <tt>NonLoc</tt>. |
| It should be immediately obvious whether the <tt>SVal</tt> is a <tt>Loc</tt> or a |
| <tt>NonLoc</tt> depending on the AST that is being checked. Checking whether a value |
| is <tt>Loc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> or whether the value is |
| <tt>NonLoc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> is totally fine.</li> |
| <li>New symbols should not be constructed in the checker via direct calls to <tt>SymbolManager</tt>, |
| unless they are of the <tt>SymbolMetadata</tt> class tagged by the checker, |
| or they represent newly created values such as the return value in <tt>evalCall</tt>. |
| For modeling arithmetic/bitwise/comparison operations, <tt>SValBuilder</tt> should be used.</li> |
| <li>Custom <tt>ProgramPointTag</tt>s should not be created within the checker. There is usually |
| no good reason for a checker to chain multiple nodes together, because checkers aren't worklists.</li> |
| </ul> |
| <li>Checkers are encouraged to actively participate in the analysis by sharing |
| their knowledge about the program state with the rest of the analyzer, |
| but they should not disrupt the analysis unnecessarily:</li> |
| <ul> |
| <li>If a checker splits program state, this must be based on knowledge that |
| the newly appearing branches are definitely possible and worth exploring |
| from the user's perspective. Otherwise the state split should be delayed |
| until there's an indication that one of the paths is taken, or one of the |
| paths needs to be dropped entirely. For example, it is fine to eagerly split |
| paths while modeling <tt>isalpha(x)</tt> as long as <tt>x</tt> is constrained accordingly on |
| each path. At the same time, it is not a good idea to split paths over the |
| return value of <tt>printf()</tt> while modeling the call because nobody ever checks |
| for errors in <tt>printf</tt>; at best, it'd just double the remaining analysis time. |
| </li> |
| <li>Caution is advised when using <tt>CheckerContext::generateNonFatalErrorNode</tt> |
| because it generates an independent transition, much like <tt>addTransition</tt>. |
| It is easy to accidentally split paths while using it. Ideally, try to |
| structure the code so that it is obvious that every <tt>addTransition</tt> or |
| <tt>generateNonFatalErrorNode</tt> (or sequence of such if the split is intended) is |
| immediately followed by a return from the checker callback.</li> |
| <li>Multiple implementations of <tt>evalCall</tt> in different checkers should not conflict.</li> |
| <li>When implementing <tt>evalAssume</tt>, the checker should always return a non-null state |
| for either the true assumption or the false assumption (or both).</li> |
| <li>Checkers shall not mutate values of expressions (that is, use the <tt>ProgramState::BindExpr</tt> API) |
| unless they are fully responsible for computing the value. |
| Under no circumstances should they change non-<tt>Unknown</tt> values of expressions. |
| Currently the only valid use case for this API in checkers is to model the return value in the <tt>evalCall</tt> callback. |
| If expression values are incorrect, <tt>ExprEngine</tt> needs to be fixed instead.</li> |
| </ul> |
| </ul> |
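| <p>The null and argument-count pitfalls listed under "Common sources of crashes" come down to |
| guarding every lookup before using its result. The following is a minimal, self-contained sketch |
| of that guard, written without any Clang headers. <tt>MockCall</tt> and |
| <tt>getArgIfCallMatches</tt> are hypothetical stand-ins invented for illustration, not Clang |
| classes or APIs; in a real checker, <tt>CallDescription</tt> is the tool recommended above |
| for this job.</p> |

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a call site. This is NOT a Clang class.
struct MockCall {
  // Null when the callee is unknown, e.g. a call through a function pointer.
  const std::string *CalleeName;
  std::vector<int> Args;
};

// Returns a pointer to argument Index only if the callee is known, its name
// matches, and the call has exactly ExpectedArgs arguments. Returns nullptr
// otherwise, so the caller never indexes out of bounds.
const int *getArgIfCallMatches(const MockCall &Call, const std::string &Name,
                               std::size_t ExpectedArgs, std::size_t Index) {
  if (!Call.CalleeName)          // nullable callee: check before dereferencing
    return nullptr;
  if (*Call.CalleeName != Name)  // a name match alone is not enough...
    return nullptr;
  if (Call.Args.size() != ExpectedArgs || Index >= ExpectedArgs)
    return nullptr;              // ...the argument count must match as well
  return &Call.Args[Index];
}
```

| <p>A caller would test the returned pointer before reading through it, which mirrors how |
| nullable analyzer results should be handled in checker callbacks.</p> |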
| |
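| <p>The advice to return from the callback immediately after each transition can be illustrated |
| with a small model. This is only a sketch under stated assumptions: <tt>MockContext</tt> is a |
| hypothetical stand-in that merely records transitions and is not |
| <tt>clang::ento::CheckerContext</tt>, and both callback functions are invented for the |
| example.</p> |

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in that records every transition a callback requests.
// This is NOT clang::ento::CheckerContext.
struct MockContext {
  std::vector<std::string> Transitions;
  void addTransition(const std::string &StateLabel) {
    Transitions.push_back(StateLabel);
  }
};

// Problematic shape: when IsTracked is true, control falls through and a
// second transition is recorded, which models an accidental path split.
void callbackWithAccidentalSplit(MockContext &C, bool IsTracked) {
  if (IsTracked)
    C.addTransition("tracked-updated");
  C.addTransition("unchanged");
}

// Preferred shape: every transition is immediately followed by a return,
// so exactly one transition is recorded per invocation.
void callbackWithEarlyReturn(MockContext &C, bool IsTracked) {
  if (IsTracked) {
    C.addTransition("tracked-updated");
    return;
  }
  C.addTransition("unchanged");
}
```

| <p>In this model, calling the first function with <tt>IsTracked</tt> set records two transitions |
| from a single callback, while the early-return variant records exactly one in every case.</p> |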
| <h2 id=additioninformation>Additional Sources of Information</h2> |
| |
| <p>Here are some additional resources that are useful when working on the Clang |
| Static Analyzer:</p> |
| |
| <ul> |
| <li><a href="http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf">Xu, Zhongxing &amp; |
| Kremenek, Ted &amp; Zhang, Jian. (2010). A Memory Model for Static Analysis of C |
| Programs.</a></li> |
| <li><a href="https://github.com/llvm/llvm-project/blob/master/clang/lib/StaticAnalyzer/README.txt"> |
| The Clang Static Analyzer README</a></li> |
| <li><a href="https://github.com/llvm/llvm-project/blob/master/clang/docs/analyzer/RegionStore.txt"> |
| Documentation for how the Store works</a></li> |
| <li><a href="https://github.com/llvm/llvm-project/blob/master/clang/docs/analyzer/IPA.txt"> |
| Documentation about inlining</a></li> |
| <li> The "Building a Checker in 24 Hours" presentation given at the <a |
| href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developers' |
| Meeting</a>. Describes the construction of SimpleStreamChecker. <a |
| href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a> |
| and <a |
| href="https://youtu.be/kdxlsP5QVPw">video</a> |
| are available.</li> |
| <li> |
| <a href="https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf"> |
| Artem Dergachev: Clang Static Analyzer: A Checker Developer's Guide |
| </a> (reading the previous items first might be a good idea)</li> |
| <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li> |
| <li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains |
| up-to-date documentation about the APIs available in Clang. Relevant entries |
| have been linked throughout this page. Also of use is the |
| <a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes |
| from LLVM.</li> |
| <li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev"> |
| cfe-dev mailing list</a>. This is the primary mailing list used for |
| discussion of Clang development (including static code analysis). The |
| <a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains |
| a lot of information.</li> |
| </ul> |
| |
| </div> |
| </div> |
| </body> |
| </html> |