 =============
 TypeSanitizer
 =============

 .. contents::
    :local:

 Introduction
 ============

 TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
 instrumentation module and a run-time library. C and C++ have type-based aliasing rules, and LLVM
 can exploit them for optimization using the type-based alias analysis (TBAA) metadata that Clang
 emits. In general, a pointer of a given type cannot be used to access an object of a different
 type, with only a few exceptions (such as accessing any object through a ``char`` pointer).

 These rules aren't always apparent to users, which leads to code that violates them
 (e.g. for type punning). Optimization passes may then introduce bugs into such code unless it
 is built with ``-fno-strict-aliasing``, which sacrifices performance.

 TypeSanitizer detects violations of these strict aliasing rules at run time, helping users find
 where such bugs originate in their code, even when the code looks valid at first glance.

 As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed,
 memory use, and code size. It also has a large compile-time overhead. Work is being done to
 reduce these impacts.

 The TypeSanitizer Algorithm
 ===========================

 For each TBAA type-access descriptor encoded in the LLVM IR, the instrumentation pass generates a
 descriptor table, so every type (and access descriptor) is identified by a unique pointer. These
 tables are emitted as comdat globals (except for types in anonymous namespaces), so the linker keeps
 a single copy and the pointer values are unique across the program.

 The descriptors refer to other descriptors to form a type aliasing tree, mirroring the structure of
 LLVM's TBAA metadata.
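
 As a rough illustration of this structure, the sketch below models descriptors as C structs that
 point at their parent. It treats two scalar types as allowed to alias when one is an ancestor of
 (or identical to) the other. The names ``type_desc`` and ``may_alias`` are hypothetical, and the
 model is a simplification: real TBAA access tags also carry struct-path and offset information,
 which this sketch ignores. Because each descriptor is unique, type identity is a pointer comparison.

 .. code-block:: c

     #include <stdbool.h>
     #include <stddef.h>

     /* Hypothetical, simplified descriptor: a type name plus a pointer to
      * its parent in the aliasing tree (NULL for the root). */
     struct type_desc {
         const char *name;
         const struct type_desc *parent;
     };

     /* True if `ancestor` is `t` itself or lies on t's parent chain.
      * Descriptors are unique, so pointer comparison identifies types. */
     static bool is_ancestor_or_self(const struct type_desc *ancestor,
                                     const struct type_desc *t) {
         for (; t != NULL; t = t->parent)
             if (t == ancestor)
                 return true;
         return false;
     }

     /* Simplified scalar rule: two types may alias if either one is an
      * ancestor of (or identical to) the other. */
     static bool may_alias(const struct type_desc *a, const struct type_desc *b) {
         return is_ancestor_or_self(a, b) || is_ancestor_or_self(b, a);
     }

     /* A tiny tree loosely modeled on the one Clang emits for C:
      * "omnipotent char" sits above the other scalar types. */
     static const struct type_desc root_desc = {"Simple C/C++ TBAA", NULL};
     static const struct type_desc char_desc = {"omnipotent char", &root_desc};
     static const struct type_desc int_desc = {"int", &char_desc};
     static const struct type_desc float_desc = {"float", &char_desc};

 In this model, ``int`` and ``float`` are siblings, so neither may alias the other. Both sit below
 ``omnipotent char``, so a ``char`` access may alias either of them.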

 The runtime uses 8 bytes of shadow memory (the size of a pointer to a type descriptor on 64-bit
 targets) for every byte of accessed application data. The shadow entry for the first byte of an
 object holds the pointer to its type descriptor. A shadow entry can also hold one of the following
 values:

 * 0 represents an unknown type.
 * A negative number marks an interior byte, i.e. a byte of an object other than its first. The
   value is the negated offset from the object's first byte; for example, -2 marks the third byte.
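
 The following sketch models this encoding for a 16-byte region, using a plain array with one
 ``intptr_t`` entry per application byte. It is a hypothetical model, not the runtime's code. The
 real runtime locates shadow entries by address arithmetic, and the names ``shadow_set_type`` and
 ``shadow_object_start`` are invented for illustration. The sketch also assumes that descriptor
 addresses are positive when read as signed integers, so they cannot be confused with interior markers.

 .. code-block:: c

     #include <stddef.h>
     #include <stdint.h>

     /* Hypothetical model: one pointer-sized shadow entry per app byte. */
     #define APP_BYTES 16
     static intptr_t shadow[APP_BYTES]; /* zero-initialized: all "unknown" */

     /* Record an object of `size` bytes at `offset` with descriptor `desc`.
      * The first entry holds the descriptor pointer (assumed positive as an
      * intptr_t); interior entries hold -1, -2, ... Caller keeps
      * offset + size <= APP_BYTES. */
     static void shadow_set_type(size_t offset, const void *desc, size_t size) {
         shadow[offset] = (intptr_t)desc;
         for (size_t i = 1; i < size; i++)
             shadow[offset + i] = -(intptr_t)i;
     }

     /* From any byte of an object, find its first byte by adding the
      * (negative) interior value. First bytes and unknown (zero) entries
      * return `offset` unchanged. */
     static size_t shadow_object_start(size_t offset) {
         intptr_t v = shadow[offset];
         return v < 0 ? (size_t)((intptr_t)offset + v) : offset;
     }

 In this model, an interior entry stores the distance back to the object's first byte. The start of
 an object can therefore be found from any of its bytes with a single addition.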

 The instrumentation first checks for an exact match between the type of the current access and the
 type recorded in shadow memory for that address. This is a cheap pointer comparison. If the types
 match, the instrumentation verifies that the remaining shadow entries of the object hold the expected
 negative numbers, and calls the "slow path" check if they do not. If the exact match fails, it checks
 whether the first shadow entry and the remaining entries are all 0. If they are, the memory has no
 known type yet. The instrumentation then sets the first entry to the access's type descriptor pointer
 and the rest of the object's shadow to the corresponding negative numbers.

 If the type in shadow memory is neither an exact match nor 0, the instrumentation calls the slower
 runtime check. The runtime check uses the full TBAA algorithm, just as the compiler does, to determine
 whether the two types are permitted to alias.
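
 The decision logic described above can be sketched as a C function over the same kind of modeled
 shadow array. This is a hypothetical sketch. ``check_access`` and ``slow_path_check`` are invented
 names, the slow path is a stub that only counts calls, and real TypeSanitizer emits the fast path as
 inlined LLVM IR rather than as a function. Sending a partially zero shadow range to the slow path is
 an assumption, because the description above does not specify that case.

 .. code-block:: c

     #include <stdbool.h>
     #include <stddef.h>
     #include <stdint.h>

     #define APP_BYTES 16
     static intptr_t shadow[APP_BYTES]; /* 0 = unknown type */
     static int slow_path_calls;        /* counts fallbacks, for illustration */

     /* Hypothetical stub for the runtime's full TBAA comparison. */
     static bool slow_path_check(size_t offset, const void *desc, size_t size) {
         (void)offset; (void)desc; (void)size;
         slow_path_calls++;
         return false; /* a real runtime would walk the descriptor trees */
     }

     /* Check an access of `size` bytes at `offset` with descriptor `desc`. */
     static bool check_access(size_t offset, const void *desc, size_t size) {
         intptr_t first = shadow[offset];
         if (first == (intptr_t)desc) {
             /* Exact match: interior entries must be -1, -2, ... */
             for (size_t i = 1; i < size; i++)
                 if (shadow[offset + i] != -(intptr_t)i)
                     return slow_path_check(offset, desc, size);
             return true;
         }
         if (first == 0) {
             /* Unknown type: if the whole range is unknown, adopt this type
              * (a partially typed range going to the slow path is assumed). */
             for (size_t i = 1; i < size; i++)
                 if (shadow[offset + i] != 0)
                     return slow_path_check(offset, desc, size);
             shadow[offset] = (intptr_t)desc;
             for (size_t i = 1; i < size; i++)
                 shadow[offset + i] = -(intptr_t)i;
             return true;
         }
         /* Neither an exact match nor unknown: defer to the full check. */
         return slow_path_check(offset, desc, size);
     }

 In this sketch, both common cases (an exact match and untyped memory) are resolved with a pointer
 comparison and a short loop. Only mismatches reach the more expensive slow path.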

 The instrumentation pass also inserts calls to the memset intrinsic that reset the shadow memory
 for regions updated by memset, memcpy, and memmove, as well as for allocas and byval arguments (and
 at lifetime.start/end markers). This marks those bytes as having an unknown type. The runtime
 intercepts the memset, memcpy, and related library calls to do the same for them.

 How to build
 ============

 Build LLVM/Clang with `CMake <https://llvm.org/docs/CMake.html>`_ and enable
 the ``compiler-rt`` runtime. For example, the following CMake configuration
 builds everything needed to use and test TypeSanitizer:

 .. code-block:: console

    $ cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" <path to source>/llvm

 Usage
 =====

 Compile and link your program with the ``-fsanitize=type`` flag. The
 TypeSanitizer run-time library should be linked into the final executable, so
 make sure to use ``clang`` (not ``ld``) for the final link step. For
 reasonable performance, add ``-O1`` or higher.

 By default, TypeSanitizer doesn't print the full stack trace in error messages. Set
 ``TYSAN_OPTIONS=print_stacktrace=1`` to print the full trace. For more readable stack traces,
 add ``-fno-omit-frame-pointer`` and ``-g``. To get perfect stack traces you may need to disable
 inlining (just use ``-O1``) and tail call elimination (``-fno-optimize-sibling-calls``).

 .. code-block:: console

     % cat example_AliasViolation.c
     int main(int argc, char **argv) {
       int x = 100;
       float *y = (float*)&x;
       *y += 2.0f;          // Strict aliasing violation
       return 0;
     }

     # Compile and link
     % clang -g -fsanitize=type example_AliasViolation.c

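 The program prints an error message to ``stderr`` each time a strict aliasing violation is detected.
 It does not terminate after a report, so a single run can detect many strict aliasing violations.
 In the output below, the compound assignment ``*y += 2.0f`` is reported twice: once for the read and
 once for the write.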

 .. code-block:: console

     % ./a.out
     ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
     READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
         #0 0x5b3b1145ff40 in main example_AliasViolation.c:4:10

     ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1146008a bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
     WRITE of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
         #0 0x5b3b11460089 in main example_AliasViolation.c:4:10

 Error terminology
 ------------------

 Some terms that may appear in TypeSanitizer errors are derived from
 `TBAA Metadata <https://llvm.org/docs/LangRef.html#tbaa-metadata>`_. This section provides a
 brief dictionary of these terms.

 * ``omnipotent char``: A special type that can alias any other type. Its name comes from the C/C++
   type ``char``, through which the bytes of any object may be accessed.
 * ``type p[x]``: A pointer to the type, where ``x`` is the number of indirections needed to reach
   the final value. For example, a pointer to a pointer to an integer is ``type p2 int``.
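
 As a rough guide, the declarations below annotate C objects with the terms listed above. The
 mapping is illustrative. ``type p1 int`` for ``int *`` is extrapolated from the documented ``p2``
 example, the helper ``sum_bytes`` is hypothetical, and the exact strings in a real report may differ.

 .. code-block:: c

     #include <stddef.h>

     /* Illustrative mapping from C objects to the terms above. The
      * "type p1 int" name is extrapolated from the documented p2 example. */
     static int value = 42;       /* object of type "int"                    */
     static int *ptr = &value;    /* object of type "type p1 int" (int *)    */
     static int **ptr_ptr = &ptr; /* object of type "type p2 int" (int **)   */

     /* Reading any object's bytes through unsigned char is allowed: such
      * accesses use "omnipotent char", which may alias every other type. */
     static unsigned sum_bytes(const void *obj, size_t size) {
         const unsigned char *bytes = (const unsigned char *)obj;
         unsigned total = 0;
         for (size_t i = 0; i < size; i++)
             total += bytes[i];
         return total;
     }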

 TypeSanitizer is still experimental. User-facing error messages are expected to improve in the future
 so that they no longer refer to LLVM IR-specific terms.

 Sanitizer features
 ==================

 ``__has_feature(type_sanitizer)``
 ------------------------------------

 In some cases one may need to execute different code depending on whether
 TypeSanitizer is enabled.
 :ref:`\_\_has\_feature <langext-__has_feature-__has_extension>` can be used for
 this purpose.

 .. code-block:: c

     #if defined(__has_feature)
     #  if __has_feature(type_sanitizer)
     // code that builds only under TypeSanitizer
     #  endif
     #endif

 ``__attribute__((no_sanitize("type")))``
 -----------------------------------------------

 You may want to exclude some code from TypeSanitizer instrumentation. The function
 attribute ``no_sanitize("type")`` disables type-aliasing instrumentation for that function.
 Depending on what the uninstrumented code does, instrumented code may then report false
 positives or miss real violations (false negatives). This attribute may not be supported by other
 compilers, so we suggest using it together with ``__has_feature(type_sanitizer)``.
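
 A minimal sketch of combining the attribute with ``__has_feature(type_sanitizer)``. The macro name
 ``NO_SANITIZE_TYPE`` and the function ``sum_ints`` are hypothetical placeholders. Only the attribute
 and the feature check come from Clang.

 .. code-block:: c

     /* Hypothetical helper macro: expands to the attribute only when the
      * compiler reports that TypeSanitizer is enabled, and to nothing
      * otherwise (including on compilers that lack __has_feature). */
     #if defined(__has_feature)
     #  if __has_feature(type_sanitizer)
     #    define NO_SANITIZE_TYPE __attribute__((no_sanitize("type")))
     #  endif
     #endif
     #ifndef NO_SANITIZE_TYPE
     #  define NO_SANITIZE_TYPE
     #endif

     /* Placeholder for a function excluded from type-aliasing checks, e.g.
      * legacy code that has not been audited yet. Its loads and stores are
      * not checked, which may cause false positives or negatives elsewhere. */
     NO_SANITIZE_TYPE
     static int sum_ints(const int *values, int count) {
         int total = 0;
         for (int i = 0; i < count; i++)
             total += values[i];
         return total;
     }

 Because the macro expands to nothing when TypeSanitizer is not enabled, the same source builds
 unchanged with other compilers and in non-sanitized builds.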

 ``__attribute__((disable_sanitizer_instrumentation))``
 --------------------------------------------------------

 The ``disable_sanitizer_instrumentation`` attribute can be applied to functions
 to prevent all kinds of instrumentation. As a result, it may introduce false
 positives and incorrect stack traces. It should therefore be used with care,
 and only if absolutely required, for example for code that cannot tolerate
 any instrumentation or its side effects. This attribute overrides
 ``no_sanitize("type")``.
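
 A similar guard works for this attribute. The sketch below uses ``__has_attribute``, which Clang
 and GCC provide, so that compilers without the attribute ignore it. The macro
 ``NO_SANITIZER_INSTRUMENTATION`` and the function ``uninstrumented_answer`` are hypothetical.

 .. code-block:: c

     /* Hypothetical helper macro: apply the attribute only on compilers
      * that recognize it, and expand to nothing otherwise. */
     #if defined(__has_attribute)
     #  if __has_attribute(disable_sanitizer_instrumentation)
     #    define NO_SANITIZER_INSTRUMENTATION __attribute__((disable_sanitizer_instrumentation))
     #  endif
     #endif
     #ifndef NO_SANITIZER_INSTRUMENTATION
     #  define NO_SANITIZER_INSTRUMENTATION
     #endif

     /* Placeholder for code that cannot tolerate any instrumentation. The
      * attribute disables every sanitizer's instrumentation for the function
      * and takes precedence over no_sanitize("type"). */
     NO_SANITIZER_INSTRUMENTATION
     static int uninstrumented_answer(void) {
         return 42;
     }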

 Ignorelist
 ----------

 TypeSanitizer supports the ``src`` and ``fun`` entity types in
 :doc:`SanitizerSpecialCaseList`, which can be used to suppress aliasing
 violation reports in the specified source files or functions. As with
 other methods of disabling instrumentation, this can result in false
 positives or false negatives.

 Limitations
 -----------

 * TypeSanitizer uses considerably more memory than a native run: 8 bytes of
   shadow memory for each byte of user memory.
 * Some transformation passes run before TypeSanitizer. If these passes
   optimize out an aliasing violation, TypeSanitizer cannot catch it.
 * Currently, all instrumentation is inlined. This can result in a **15x**
   (on average) increase in generated file size, and a **3x** to **7x** increase
   in compile time. In some documented cases this can cause the compiler to hang.
   There are plans to improve this in the future.
 * Codebases that use unions and struct-initialized variables can see incorrect
   results, as TypeSanitizer doesn't yet instrument these reliably.
 * Since Clang and LLVM's TBAA system is used to generate the checks used by the
   instrumentation, TypeSanitizer follows Clang and LLVM's rules for type aliasing.
   These rules may disagree with the standard in some situations. However, this
   means TypeSanitizer checks the same aliasing rules that Clang and LLVM rely on
   when optimizing, so it targets the violations that can cause bugs when
   compiling with Clang and LLVM.
 * TypeSanitizer cannot currently be run alongside other sanitizers such as
   AddressSanitizer, ThreadSanitizer or UndefinedBehaviorSanitizer.

 Current Status
 --------------

 TypeSanitizer is brand new and still in development. There are some known
 issues, especially in areas where the TBAA data emitted by Clang isn't extensive
 enough for TypeSanitizer's runtime.

 We are actively working on enhancing the tool, so stay tuned. Help, bug reports,
 pull requests, and ideas are all welcome. You can find open issues in the
 `issue tracker <https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20TySan%20label%3Acompiler-rt%3Atysan>`_.