docs/DataFlowSanitizer.rst - clang - Git at Google

 =================
 DataFlowSanitizer
 =================

 .. toctree::
    :hidden:

    DataFlowSanitizerDesign

 .. contents::
    :local:

 Introduction
 ============

 DataFlowSanitizer is a generalised dynamic data flow analysis.

 Unlike other Sanitizer tools, this tool is not designed to detect a
 specific class of bugs on its own.  Instead, it provides a generic
 dynamic data flow analysis framework to be used by clients to help
 detect application-specific issues within their own code.

 Usage
 =====

 With no program changes, applying DataFlowSanitizer to a program
 will not alter its behavior.  To use DataFlowSanitizer, the program
 uses API functions to apply tags to data to cause it to be tracked, and to
 check the tag of a specific data item.  DataFlowSanitizer manages
 the propagation of tags through the program according to its data flow.

 The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
 For further information about each function, please refer to the header
 file.

 ABI List
 --------

 DataFlowSanitizer uses a list of functions known as an ABI list to decide
 whether a call to a specific function should use the operating system's native
 ABI or whether it should use a variant of this ABI that also propagates labels
 through function parameters and return values.  The ABI list file also controls
 how labels are propagated in the former case.  DataFlowSanitizer comes with a
 default ABI list which is intended to eventually cover the glibc library on
 Linux but it may become necessary for users to extend the ABI list in cases
 where a particular library or function cannot be instrumented (e.g. because
 it is implemented in assembly or another language which DataFlowSanitizer does
 not support) or a function is called from a library or function which cannot
 be instrumented.

 DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
 The pass treats every function in the ``uninstrumented`` category in the
 ABI list file as conforming to the native ABI.  Unless the ABI list contains
 additional categories for those functions, a call to one of those functions
 will produce a warning message, as the labelling behavior of the function
 is unknown.  The other supported categories are ``discard``, ``functional``
 and ``custom``.

 * ``discard`` -- To the extent that this function writes to (user-accessible)
   memory, it also updates labels in shadow memory (this condition is trivially
   satisfied for functions which do not write to user-accessible memory).  Its
   return value is unlabelled.
 * ``functional`` -- Like ``discard``, except that the label of its return value
   is the union of the label of its arguments.
 * ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
   is called, where ``F`` is the name of the function.  This function may wrap
   the original function or provide its own implementation.  This category is
   generally used for uninstrumentable functions which write to user-accessible
   memory or which have more complex label propagation behavior.  The signature
   of ``__dfsw_F`` is based on that of ``F`` with each argument having a
   label of type ``dfsan_label`` appended to the argument list.  If ``F``
   is of non-void return type a final argument of type ``dfsan_label *``
   is appended to which the custom function can store the label for the
   return value.  For example:

 .. code-block:: c++

   void f(int x);
   void __dfsw_f(int x, dfsan_label x_label);

   void *memcpy(void *dest, const void *src, size_t n);
   void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                       dfsan_label dest_label, dfsan_label src_label,
                       dfsan_label n_label, dfsan_label *ret_label);

 If a function defined in the translation unit being compiled belongs to the
 ``uninstrumented`` category, it will be compiled so as to conform to the
 native ABI.  Its arguments will be assumed to be unlabelled, but it will
 propagate labels in shadow memory.

 For example:

 .. code-block:: none

   # main is called by the C runtime using the native ABI.
   fun:main=uninstrumented
   fun:main=discard

   # malloc only writes to its internal data structures, not user-accessible memory.
   fun:malloc=uninstrumented
   fun:malloc=discard

   # tolower is a pure function.
   fun:tolower=uninstrumented
   fun:tolower=functional

   # memcpy needs to copy the shadow from the source to the destination region.
   # This is done in a custom function.
   fun:memcpy=uninstrumented
   fun:memcpy=custom

 Example
 =======

 The following program demonstrates label propagation by checking that
 the correct labels are propagated.

 .. code-block:: c++

   #include <sanitizer/dfsan_interface.h>
   #include <assert.h>

   int main(void) {
     int i = 1;
     dfsan_label i_label = dfsan_create_label("i", 0);
     dfsan_set_label(i_label, &i, sizeof(i));

     int j = 2;
     dfsan_label j_label = dfsan_create_label("j", 0);
     dfsan_set_label(j_label, &j, sizeof(j));

     int k = 3;
     dfsan_label k_label = dfsan_create_label("k", 0);
     dfsan_set_label(k_label, &k, sizeof(k));

     dfsan_label ij_label = dfsan_get_label(i + j);
     assert(dfsan_has_label(ij_label, i_label));
     assert(dfsan_has_label(ij_label, j_label));
     assert(!dfsan_has_label(ij_label, k_label));

     dfsan_label ijk_label = dfsan_get_label(i + j + k);
     assert(dfsan_has_label(ijk_label, i_label));
     assert(dfsan_has_label(ijk_label, j_label));
     assert(dfsan_has_label(ijk_label, k_label));

     return 0;
   }

 Current status
 ==============

 DataFlowSanitizer is a work in progress, currently under development for
 x86\_64 Linux.

 Design
 ======

 Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.
	=================
	DataFlowSanitizer
	=================

	.. toctree::
	:hidden:

	DataFlowSanitizerDesign

	.. contents::
	:local:

	Introduction
	============

	DataFlowSanitizer is a generalised dynamic data flow analysis.

	Unlike other Sanitizer tools, this tool is not designed to detect a
	specific class of bugs on its own. Instead, it provides a generic
	dynamic data flow analysis framework to be used by clients to help
	detect application-specific issues within their own code.

	Usage
	=====

	With no program changes, applying DataFlowSanitizer to a program
	will not alter its behavior. To use DataFlowSanitizer, the program
	uses API functions to apply tags to data to cause it to be tracked, and to
	check the tag of a specific data item. DataFlowSanitizer manages
	the propagation of tags through the program according to its data flow.

	The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
	For further information about each function, please refer to the header
	file.

	ABI List
	--------

	DataFlowSanitizer uses a list of functions known as an ABI list to decide
	whether a call to a specific function should use the operating system's native
	ABI or whether it should use a variant of this ABI that also propagates labels
	through function parameters and return values. The ABI list file also controls
	how labels are propagated in the former case. DataFlowSanitizer comes with a
	default ABI list which is intended to eventually cover the glibc library on
	Linux but it may become necessary for users to extend the ABI list in cases
	where a particular library or function cannot be instrumented (e.g. because
	it is implemented in assembly or another language which DataFlowSanitizer does
	not support) or a function is called from a library or function which cannot
	be instrumented.

	DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
	The pass treats every function in the ``uninstrumented`` category in the
	ABI list file as conforming to the native ABI. Unless the ABI list contains
	additional categories for those functions, a call to one of those functions
	will produce a warning message, as the labelling behavior of the function
	is unknown. The other supported categories are ``discard``, ``functional``
	and ``custom``.

	* ``discard`` -- To the extent that this function writes to (user-accessible)
	memory, it also updates labels in shadow memory (this condition is trivially
	satisfied for functions which do not write to user-accessible memory). Its
	return value is unlabelled.
	* ``functional`` -- Like ``discard``, except that the label of its return value
	is the union of the label of its arguments.
	* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
	is called, where ``F`` is the name of the function. This function may wrap
	the original function or provide its own implementation. This category is
	generally used for uninstrumentable functions which write to user-accessible
	memory or which have more complex label propagation behavior. The signature
	of ``__dfsw_F`` is based on that of ``F`` with each argument having a
	label of type ``dfsan_label`` appended to the argument list. If ``F``
	is of non-void return type a final argument of type ``dfsan_label *``
	is appended to which the custom function can store the label for the
	return value. For example:

	.. code-block:: c++

	void f(int x);
	void __dfsw_f(int x, dfsan_label x_label);

	void memcpy(void dest, const void *src, size_t n);
	void __dfsw_memcpy(void dest, const void *src, size_t n,
	dfsan_label dest_label, dfsan_label src_label,
	dfsan_label n_label, dfsan_label *ret_label);

	If a function defined in the translation unit being compiled belongs to the
	``uninstrumented`` category, it will be compiled so as to conform to the
	native ABI. Its arguments will be assumed to be unlabelled, but it will
	propagate labels in shadow memory.

	For example:

	.. code-block:: none

	# main is called by the C runtime using the native ABI.
	fun:main=uninstrumented
	fun:main=discard

	# malloc only writes to its internal data structures, not user-accessible memory.
	fun:malloc=uninstrumented
	fun:malloc=discard

	# tolower is a pure function.
	fun:tolower=uninstrumented
	fun:tolower=functional

	# memcpy needs to copy the shadow from the source to the destination region.
	# This is done in a custom function.
	fun:memcpy=uninstrumented
	fun:memcpy=custom

	Example
	=======

	The following program demonstrates label propagation by checking that
	the correct labels are propagated.

	.. code-block:: c++

	#include <sanitizer/dfsan_interface.h>
	#include <assert.h>

	int main(void) {
	int i = 1;
	dfsan_label i_label = dfsan_create_label("i", 0);
	dfsan_set_label(i_label, &i, sizeof(i));

	int j = 2;
	dfsan_label j_label = dfsan_create_label("j", 0);
	dfsan_set_label(j_label, &j, sizeof(j));

	int k = 3;
	dfsan_label k_label = dfsan_create_label("k", 0);
	dfsan_set_label(k_label, &k, sizeof(k));

	dfsan_label ij_label = dfsan_get_label(i + j);
	assert(dfsan_has_label(ij_label, i_label));
	assert(dfsan_has_label(ij_label, j_label));
	assert(!dfsan_has_label(ij_label, k_label));

	dfsan_label ijk_label = dfsan_get_label(i + j + k);
	assert(dfsan_has_label(ijk_label, i_label));
	assert(dfsan_has_label(ijk_label, j_label));
	assert(dfsan_has_label(ijk_label, k_label));

	return 0;
	}

	Current status
	==============

	DataFlowSanitizer is a work in progress, currently under development for
	x86\_64 Linux.

	Design
	======

	Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.