 ===============
 ShadowCallStack
 ===============

 .. contents::
    :local:

 Introduction
 ============

 ShadowCallStack is an instrumentation pass, currently only implemented for
 aarch64, that protects programs against return address overwrites
 (e.g. stack buffer overflows). It works by saving a function's return address
 to a separately allocated 'shadow call stack' in the prolog of non-leaf
 functions and loading the return address from the shadow call stack
 in the function epilog. The return address is also stored on the regular stack
 for compatibility with unwinders, but is otherwise unused.

 The aarch64 implementation is considered production ready, and
 an `implementation of the runtime`_ has been added to Android's libc
 (bionic). An x86_64 implementation was evaluated using Chromium and was found
 to have critical performance and security deficiencies, so it was removed in
 LLVM 9.0. Details on the x86_64 implementation can be found in the
 `Clang 7.0.1 documentation`_.

 .. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
 .. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

 Comparison
 ----------

 To optimize for memory consumption and cache locality, the shadow call
 stack stores only an array of return addresses. This is in contrast to other
 schemes, like :doc:`SafeStack`, that mirror the entire stack and trade
 higher memory consumption for shorter function prologs and epilogs with fewer
 memory accesses.

 `Return Flow Guard`_ is a pure software implementation of shadow call stacks
 on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
 inherently racy due to the architecture's use of the stack for calls and
 returns.

 Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
 extension that would add native support to use a shadow stack to store/check
 return addresses at call/return time. Being a hardware implementation, it
 would not suffer from race conditions and would not incur the overhead of
 function instrumentation, but it does require operating system support.

 .. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
 .. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

 Compatibility
 -------------

 A runtime is not provided in compiler-rt, so one must be provided by the
 compiled application or the operating system. Integrating the runtime into
 the operating system should be preferred since otherwise all thread creation
 and destruction would need to be intercepted by the application.

 The instrumentation makes use of the platform register ``x18``.  On some
 platforms, ``x18`` is reserved, and on others, it is designated as a scratch
 register.  This generally means that any code that may run on the same thread
 as code compiled with ShadowCallStack must either target one of the platforms
 whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
 or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
 compiled without ``-ffixed-x18`` may be run on the same thread as code that
 uses ShadowCallStack by saving the register value temporarily on the stack
 (`example in Android`_), but this should be done with care since it risks
 leaking the shadow call stack address.

 .. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

 Because of the use of register ``x18``, the ShadowCallStack feature is
 incompatible with any other feature that may use ``x18``. However, there
 is no inherent reason why ShadowCallStack needs to use register ``x18``
 specifically; in principle, a platform could choose to reserve and use another
 register for ShadowCallStack, but this would be incompatible with the AAPCS64.

 Special unwind information is required on functions that are compiled
 with ShadowCallStack and that may be unwound, i.e. functions compiled with
 ``-fexceptions`` (which is the default in C++). Some unwinders (such as the
 libgcc 4.9 unwinder) do not understand this unwind info and will segfault
 when encountering it. LLVM libunwind processes this unwind info correctly,
 however. This means that if exceptions are used together with ShadowCallStack,
 the program must use a compatible unwinder.

 Security
 ========

 ShadowCallStack is intended to be a stronger alternative to
 ``-fstack-protector``. It protects against non-linear overflows and arbitrary
 memory writes to the return address slot.

 The instrumentation makes use of the ``x18`` register to reference the shadow
 call stack, meaning that references to the shadow call stack do not have
 to be stored in memory. This makes it possible to implement a runtime that
 avoids exposing the address of the shadow call stack to attackers that can
 read arbitrary memory. However, attackers could still try to exploit side
 channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
 to discover the address of the shadow call stack.

 .. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
 .. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
 .. _`[3]`: https://www.vusec.net/projects/anc/

 Unless care is taken when allocating the shadow call stack, it may be
 possible for an attacker to guess its address using the addresses of
 other allocations. Therefore, the address should be chosen to make this
 difficult. One way to do this is to allocate a large guard region without
 read/write permissions, randomly select a small region within it to be
 used as the address of the shadow call stack and mark only that region as
 read/write. This also mitigates somewhat against processor side channels.
 The intent is that the Android runtime `will do this`_, but the platform will
 first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
 memory allocations in certain processes, as this also limits the number of
 guard regions that can be allocated.

 .. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
 .. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745
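
 The guard-region approach described above can be sketched roughly as follows
 for a POSIX-like system. This is a minimal illustration, not the Android
 implementation: the names ``scs_region``, ``scs_region_alloc``,
 ``scs_region_free``, ``SCS_SIZE`` and ``GUARD_SIZE`` are hypothetical, the
 sizes are arbitrary, and ``getrandom`` is assumed to be available (as it is on
 Linux).

 .. code-block:: c

     #define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
     #include <assert.h>
     #include <stddef.h>
     #include <stdint.h>
     #include <sys/mman.h>
     #include <sys/random.h>

     /* Hypothetical sizes, chosen only for illustration. SCS_SIZE is assumed
        to be a multiple of the page size. */
     #define SCS_SIZE   ((size_t)64 * 1024)
     #define GUARD_SIZE ((size_t)16 * 1024 * 1024)

     struct scs_region {
       void *guard_base; /* start of the whole inaccessible reservation */
       void *scs_base;   /* start of the read/write slice inside it */
     };

     /* Returns 0 on success and -1 on failure. */
     static int scs_region_alloc(struct scs_region *out) {
       assert(out != NULL);
       assert(GUARD_SIZE % SCS_SIZE == 0 && GUARD_SIZE > SCS_SIZE);

       /* Reserve a large range with no read/write permissions. */
       void *guard = mmap(NULL, GUARD_SIZE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
       if (guard == MAP_FAILED)
         return -1;

       /* Randomly select one SCS_SIZE-sized slot within the reservation. */
       uint64_t r;
       if (getrandom(&r, sizeof r, 0) != (ssize_t)sizeof r) {
         munmap(guard, GUARD_SIZE);
         return -1;
       }
       size_t slot = (size_t)(r % (GUARD_SIZE / SCS_SIZE));
       void *scs = (char *)guard + slot * SCS_SIZE;

       /* Mark only the selected slot as read/write. */
       if (mprotect(scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) {
         munmap(guard, GUARD_SIZE);
         return -1;
       }

       out->guard_base = guard;
       out->scs_base = scs;
       return 0;
     }

     /* Releases the whole reservation, including the shadow call stack. */
     static void scs_region_free(struct scs_region *region) {
       munmap(region->guard_base, GUARD_SIZE);
     }

 Note that this sketch returns ``scs_base`` in an ordinary struct for clarity.
 A real runtime would instead load it into ``x18`` and avoid leaving a copy in
 memory that an attacker could read.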

 The runtime will need the address of the shadow call stack in order to
 deallocate it when destroying the thread. If the entire program is compiled
 with ``-ffixed-x18``, this is trivial: the address can be derived from the
 value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
 region is used, the address of the start of the guard region could then be
 stored at the start of the shadow call stack itself. But if it is possible
 for code compiled without ``-ffixed-x18`` to run on a thread managed by the
 runtime, which is the case on Android for example, the address must be stored
 somewhere else instead. On Android we store the address of the start of the
 guard region in TLS and deallocate the entire guard region including the
 shadow call stack at thread exit. This is considered acceptable given that
 the address of the start of the guard region is already somewhat guessable.
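
 As an illustration of deriving the address by masking, the following sketch
 assumes the shadow call stack has a power-of-two size and is aligned to that
 size. ``SCS_SIZE`` and ``scs_base_from_pointer`` are hypothetical names, and
 the pointer value is passed in as a plain integer rather than read from
 ``x18``.

 .. code-block:: c

     #include <assert.h>
     #include <stdint.h>

     /* Hypothetical size; it must be a power of two for the mask to work. */
     #define SCS_SIZE ((uintptr_t)8 * 1024)

     /* Given the current shadow call stack pointer (the value held in x18),
        return the start of the shadow call stack. This is only valid while
        the pointer lies within a stack that is SCS_SIZE bytes long and
        aligned to SCS_SIZE. */
     static uintptr_t scs_base_from_pointer(uintptr_t x18_value) {
       assert((SCS_SIZE & (SCS_SIZE - 1)) == 0);
       return x18_value & ~(SCS_SIZE - 1);
     }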

 One way in which the address of the shadow call stack could leak is in the
 ``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
 runtime `avoids this`_ by only storing the low bits of ``x18`` in the
 ``jmp_buf``, which requires the address of the shadow call stack to be
 aligned to its size.

 .. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49
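
 The idea of saving only the low bits can be modelled in plain C as below.
 This is a sketch of the arithmetic only, not the bionic code: the function
 names and ``SCS_SIZE`` are hypothetical, and register values are passed as
 integers. It relies on the shadow call stack being aligned to its size, so
 that the high bits of ``x18`` are the same at ``setjmp`` and ``longjmp`` time.

 .. code-block:: c

     #include <assert.h>
     #include <stdint.h>

     /* Hypothetical power-of-two size of the shadow call stack. */
     #define SCS_SIZE ((uintptr_t)8 * 1024)
     #define SCS_MASK (SCS_SIZE - 1)

     /* What a setjmp-like routine would record: only the offset within the
        shadow call stack, which reveals nothing about where the stack is. */
     static uintptr_t scs_offset_to_save(uintptr_t x18_value) {
       return x18_value & SCS_MASK;
     }

     /* What a longjmp-like routine would restore: the high bits come from
        the live x18 value and the low bits from the saved offset. */
     static uintptr_t scs_pointer_to_restore(uintptr_t live_x18,
                                             uintptr_t saved_offset) {
       assert((saved_offset & ~SCS_MASK) == 0);
       return (live_x18 & ~SCS_MASK) | saved_offset;
     }

 For example, if ``x18`` was 0x40 bytes into the stack at ``setjmp`` time and
 0x100 bytes into it at ``longjmp`` time, the restored value points 0x40 bytes
 into the same stack.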

 The architecture's call and return instructions (``bl`` and ``ret``) operate on
 a register rather than the stack, which means that leaf functions are generally
 protected from return address overwrites even without ShadowCallStack.

 Usage
 =====

 To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack``
 flag to both compile and link command lines. On aarch64, you also need to pass
 ``-ffixed-x18`` unless your target already reserves ``x18``.

 Low-level API
 -------------

 ``__has_feature(shadow_call_stack)``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In some cases one may need to execute different code depending on whether
 ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
 be used for this purpose.

 .. code-block:: c

     #if defined(__has_feature)
     #  if __has_feature(shadow_call_stack)
     // code that builds only under ShadowCallStack
     #  endif
     #endif

 ``__attribute__((no_sanitize("shadow-call-stack")))``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
 declaration to specify that the shadow call stack instrumentation should not be
 applied to that function, even if enabled globally.
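
 A minimal usage sketch follows. The macro ``NO_SHADOW_CALL_STACK`` and the
 functions ``twice`` and ``early_setup`` are hypothetical, and the macro
 expands to nothing on compilers other than Clang so that the code still
 builds there.

 .. code-block:: c

     #include <assert.h>

     #if defined(__clang__)
     #  define NO_SHADOW_CALL_STACK \
          __attribute__((no_sanitize("shadow-call-stack")))
     #else
     #  define NO_SHADOW_CALL_STACK /* attribute not applied */
     #endif

     /* Hypothetical helper; instrumented as usual when the flag is on. */
     static int twice(int x) { return 2 * x; }

     /* Hypothetical function excluded from instrumentation, for example
        because it may run before the runtime has set up x18 for the
        current thread. */
     NO_SHADOW_CALL_STACK
     int early_setup(int x) {
       assert(x >= 0 && x <= 1000);
       return twice(x) + 1;
     }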

 Example
 =======

 The following example code:

 .. code-block:: c++

     int foo() {
       return bar() + 1;
     }

 Generates the following aarch64 assembly when compiled with ``-O2``:

 .. code-block:: none

     stp     x29, x30, [sp, #-16]!
     mov     x29, sp
     bl      bar
     add     w0, w0, #1
     ldp     x29, x30, [sp], #16
     ret

 Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

 .. code-block:: none

     str     x30, [x18], #8
     stp     x29, x30, [sp, #-16]!
     mov     x29, sp
     bl      bar
     add     w0, w0, #1
     ldp     x29, x30, [sp], #16
     ldr     x30, [x18, #-8]!
     ret
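
 The effect of the two added instructions can be modelled in C as a push in
 the prolog and a pop in the epilog. This is a conceptual sketch only: the
 type and function names are hypothetical, and the ``top`` pointer stands in
 for ``x18``.

 .. code-block:: c

     #include <assert.h>
     #include <stddef.h>
     #include <stdint.h>

     #define SCS_SLOTS 64 /* hypothetical capacity */

     struct shadow_call_stack {
       uintptr_t slots[SCS_SLOTS];
       uintptr_t *top; /* plays the role of x18: the next free slot */
     };

     static void scs_init(struct shadow_call_stack *s) { s->top = s->slots; }

     /* Models `str x30, [x18], #8`: store the return address, then advance
        the pointer by one slot. */
     static void scs_push(struct shadow_call_stack *s,
                          uintptr_t return_address) {
       assert(s->top < s->slots + SCS_SLOTS);
       *s->top++ = return_address;
     }

     /* Models `ldr x30, [x18, #-8]!`: step the pointer back by one slot,
        then load the return address from it. */
     static uintptr_t scs_pop(struct shadow_call_stack *s) {
       assert(s->top > s->slots);
       return *--s->top;
     }

 Unlike this model, the generated code performs no bounds checks. The
 assertions are here only to keep the sketch well defined.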