%%=============================================================================
\section{Structure}
\label{section:struct}
%%=============================================================================
%%-----------------------------------------------------------------------------
\subsection{Guiding Principles}
\label{section:struct:principles}
%%-----------------------------------------------------------------------------
SAFECode is implemented as a set of LLVM analysis and transform
passes. One important question in designing SAFECode was how to divide
the transformation work among different LLVM passes.
The following principles guide the design of SAFECode's software
architecture:
\begin{enumerate}
\item{\textbf{Separation of Concerns:}}
SAFECode passes should be as simple as possible. Previous versions of
SAFECode had passes that performed both run-time check instrumentation
and optimization of the run-time checks. As a result, the code was
complex, and passes had a dozen different options to turn features on
and off.
By separating concerns, each pass is smaller, easier to read, and
easier to understand. It also makes the software more flexible;
features can be enabled and disabled by simply choosing to run or not
run a particular transform pass.
\item{\textbf{Enable Integration into LLVM and Clang:}}
At some point in the future, we (or others) may want to integrate
parts of SAFECode into LLVM and/or Clang. Doing so would have many
benefits, including wide-scale adoption, better integration with the
compiler tool-chain, and additional developers.
To make integration into other LLVM projects easier, SAFECode attempts
to adhere to the next principle.
\item{\textbf{Make Whole-Program Analysis an Optimization:}}
A simple approach to implementing SAFECode is to first run
whole-program analysis passes to infer properties about the program
and then to have transforms use this information to instrument the
code with run-time checks when necessary. The problem is that LLVM
performs whole-program analysis in the linker; the linker, in turn,
runs LLVM transform passes more or less unconditionally.
Therefore, we want SAFECode instrumentation passes to require no
whole-program analysis at all, and we implement the more sophisticated
features as optimizations on the run-time checks. The front-end (e.g.,
Clang) can then decide whether to instrument a program and, if so, run
a simple transform pass, and the linker can run the whole-program
analysis passes and transformations.
\end{enumerate}
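The first and third principles can be illustrated with a small,
self-contained C++ sketch. It is a toy model, not SAFECode or LLVM
code: \texttt{ToyModule}, \texttt{ToyPass}, \texttt{makeInsertChecks},
\texttt{makeDropRepeatedChecks}, and \texttt{runPipeline} are
hypothetical names, a module is modeled as a flat list of strings, and
the code is assumed to be straight-line with pointers that are never
reassigned or freed. The sketch only aims to show that each pass has
one job and that a feature is enabled by adding its pass to the
pipeline rather than by setting an option.
\begin{verbatim}
```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an LLVM module: a flat list of operations such as
// "load p". Illustrative only; not an LLVM or SAFECode data structure.
using ToyModule = std::vector<std::string>;

// A pass has a name and a single job; it takes no feature flags.
struct ToyPass {
    std::string name;
    std::function<void(ToyModule&)> run;
};

// Instrumentation only: put "check p" before every "load p".
// It inspects one operation at a time, so it needs no whole-program data.
ToyPass makeInsertChecks() {
    return {"insert-checks", [](ToyModule& m) {
        ToyModule out;
        for (const std::string& op : m) {
            if (op.rfind("load ", 0) == 0) out.push_back("check " + op.substr(5));
            out.push_back(op);
        }
        m = std::move(out);
    }};
}

// Optimization only: drop a check identical to one already performed.
// ASSUMPTION: straight-line code, pointers never reassigned or freed.
// On a module without checks this pass changes nothing.
ToyPass makeDropRepeatedChecks() {
    return {"drop-repeated-checks", [](ToyModule& m) {
        std::set<std::string> seen;
        ToyModule out;
        for (const std::string& op : m) {
            bool isCheck = op.rfind("check ", 0) == 0;
            // insert() reports whether this exact check is new.
            if (isCheck && !seen.insert(op).second) continue;
            out.push_back(op);
        }
        m = std::move(out);
    }};
}

// Run the chosen passes in order; return the names of the passes that ran.
std::vector<std::string> runPipeline(ToyModule& m,
                                     const std::vector<ToyPass>& passes) {
    std::vector<std::string> ran;
    for (const ToyPass& p : passes) {
        assert(p.run && "every pass must provide a run function");
        p.run(m);
        ran.push_back(p.name);
    }
    return ran;
}
```
\end{verbatim}
Under these assumptions, a front-end would call \texttt{runPipeline}
with only \texttt{makeInsertChecks()}, and a later stage could add
\texttt{makeDropRepeatedChecks()}. Running the latter on uninstrumented
code leaves the module unchanged.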
%%-----------------------------------------------------------------------------
\subsection{Compilation Phases}
\label{section:struct:phases}
%%-----------------------------------------------------------------------------
SAFECode's transform passes can be roughly grouped into the following
phases:
\begin{enumerate}
\item{\textbf{Check Insertion Phase:}}
In this phase, SAFECode examines the code for operations that may
cause a memory safety error and inserts run-time checks as needed.
These run-time checks are simple and do not assume that everything
about the program is known. They are designed so that they can be
used by a front-end (like Clang) to instrument programs.
\item{\textbf{Check Optimization Phase:}}
During this phase, SAFECode attempts to optimize the run-time checks
it inserted in the Check Insertion Phase. Some of these optimizations
do not require whole-program analysis and could be integrated into a
front-end compiler; others do require whole-program analysis and would
normally be implemented in an optimizing linker.
A key feature of these optimization passes is that they work on both
instrumented and uninstrumented code. If there are no run-time checks
to optimize, they should do nothing.
An important optimization executed during this phase is Automatic Pool
Allocation. Automatic Pool Allocation changes all heap allocations to
allocate memory out of distinct pools, and it also modifies run-time
checks to include pool handles; the run-time checks can use these pool
handles to speed up their checks or to make their checks stricter.
\item{\textbf{Check Completion Phase:}}
The Check Completion Phase uses whole-program analysis to modify the
run-time checks in a program so that they carry completeness
information. Completeness means that everything that can be known
about a memory object is known to the compiler; a complete run-time
check can therefore be stricter about what it considers to be correct
behavior.
\item{\textbf{Debug Instrumentation Phase:}}
Finally, a debug phase instruments the run-time checks with debug
information for users who want to use SAFECode more as a debugger
than as a production-use memory safety system.
\end{enumerate}
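The effect of completeness on a run-time check can be sketched in a
few lines of C++. This is a toy model under stated assumptions, not
SAFECode's run-time: \texttt{ToyObject}, \texttt{CheckResult}, and
\texttt{toyAccessCheck} are hypothetical names, and the rule that an
incomplete check lets an unknown address pass is an assumption chosen
to illustrate what ``more strict'' could mean. The text above states
only that complete checks can be stricter.
\begin{verbatim}
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One memory object known to the toy run-time: [start, start + size).
struct ToyObject {
    std::uintptr_t start;
    std::size_t size;
};

enum class CheckResult { Pass, Fail };

// Toy load/store check. `complete` models a flag that a completion
// phase would set once whole-program analysis knows every object the
// pointer can refer to (ASSUMPTION made for this sketch).
CheckResult toyAccessCheck(const std::vector<ToyObject>& known,
                           std::uintptr_t addr, std::size_t len,
                           bool complete) {
    assert(len > 0 && "an access must touch at least one byte");
    for (const ToyObject& obj : known) {
        if (addr >= obj.start && addr - obj.start < obj.size) {
            // The access starts inside a known object, so it must
            // also end inside that object.
            std::size_t remaining = obj.size - (addr - obj.start);
            return len <= remaining ? CheckResult::Pass : CheckResult::Fail;
        }
    }
    // The address lies in no known object.
    // Incomplete: the object may come from code the analysis never
    // saw, so the check is lenient.
    // Complete: every valid object is known, so this is an error.
    return complete ? CheckResult::Fail : CheckResult::Pass;
}
```
\end{verbatim}
In this model, an access that overruns a known object fails whether or
not the check is complete; the two kinds of check differ only for
addresses that belong to no known object.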