|  | ======================================================= | 
|  | Hardware-assisted AddressSanitizer Design Documentation | 
|  | ======================================================= | 
|  |  | 
|  | This page is a design document for | 
|  | **hardware-assisted AddressSanitizer** (or **HWASAN**), | 
|  | a tool similar to :doc:`AddressSanitizer`, | 
|  | but based on partial hardware assistance. | 
|  |  | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | :doc:`AddressSanitizer` | 
|  | tags every 8 bytes of the application memory with a 1 byte tag (using *shadow memory*), | 
|  | uses *redzones* to find buffer-overflows and | 
|  | *quarantine* to find use-after-free. | 
|  | The redzones, the quarantine, and, to a lesser extent, the shadow, are the | 
|  | sources of AddressSanitizer's memory overhead. | 
|  | See the `AddressSanitizer paper`_ for details. | 
|  |  | 
|  | AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows | 
|  | software to use the 8 most significant bits of a 64-bit pointer as | 
|  | a tag. HWASAN uses `Address Tagging`_ | 
|  | to implement a memory safety tool, similar to :doc:`AddressSanitizer`, | 
|  | but with smaller memory overhead and slightly different (mostly better) | 
|  | accuracy guarantees. | 
|  |  | 
|  | Intel's `Linear Address Masking`_ (LAM) also provides address tagging for | 
|  | x86_64, though it is not widely available in hardware yet.  For x86_64, HWASAN | 
|  | has a limited implementation using page aliasing instead. | 
|  |  | 
|  | Algorithm | 
|  | ========= | 
|  | * Every heap/stack/global memory object is forcibly aligned by `TG` bytes | 
|  | (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**. | 
|  | * For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8). | 
|  | * The pointer to the object is tagged with `T`. | 
|  | * The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory). | 
|  | * Every load and store is instrumented to read the memory tag and compare it | 
|  | with the pointer tag; an exception is raised on a tag mismatch. | 
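
The steps above can be sketched in portable C by simulating memory as an array and pointers as 64-bit integers. This is an illustration under stated assumptions, not HWASAN's runtime: `TG` is 16, `TS` is 8, tags are passed in by the caller instead of being chosen at random, and every name (`arena`, `tag_object`, `access_ok`, `checked_load`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TG = 16, TAG_SHIFT = 56, ARENA_SIZE = 4096 };

/* Simulated application memory and its TG=>1 shadow (one tag byte per granule). */
static uint8_t arena[ARENA_SIZE];
static uint8_t shadow[ARENA_SIZE / TG];

static uint8_t pointer_tag(uint64_t p) { return (uint8_t)(p >> TAG_SHIFT); }
static uint64_t untag(uint64_t p) { return p & ((UINT64_C(1) << TAG_SHIFT) - 1); }

/* Tag the granules covering [offset, offset + size) and return the tagged pointer. */
static uint64_t tag_object(uint64_t offset, size_t size, uint8_t tag) {
    assert(offset % TG == 0 && offset + size <= ARENA_SIZE);
    for (uint64_t g = offset / TG; g < (offset + size + TG - 1) / TG; g++)
        shadow[g] = tag;
    return offset | ((uint64_t)tag << TAG_SHIFT);
}

/* The comparison that instrumentation performs before every load and store. */
static bool access_ok(uint64_t p) {
    uint64_t offset = untag(p);
    return offset < ARENA_SIZE && shadow[offset / TG] == pointer_tag(p);
}

/* Load one byte through a tagged pointer; returns false where HWASAN would trap. */
static bool checked_load(uint64_t p, uint8_t *out) {
    if (!access_ok(p))
        return false;
    *out = arena[untag(p)];
    return true;
}
```

With two adjacent objects tagged `0x2d` and `0x7e`, an access one byte past the end of the first object lands in a granule whose shadow tag is `0x7e`, so the comparison fails without any redzone between the objects.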
|  |  | 
|  | For a more detailed discussion of this approach see https://arxiv.org/pdf/1802.09517.pdf | 
|  |  | 
|  | Short granules | 
|  | -------------- | 
|  |  | 
|  | A short granule is a granule of size between 1 and `TG-1` bytes. The size | 
|  | of a short granule is stored at the location in shadow memory where the | 
|  | granule's tag is normally stored, while the granule's actual tag is stored | 
|  | in the last byte of the granule. This means that in order to verify that a | 
|  | pointer tag matches a memory tag, HWASAN must check for two possibilities: | 
|  |  | 
|  | * the pointer tag is equal to the memory tag in shadow memory, or | 
|  | * the shadow memory tag is actually a short granule size, the value being loaded | 
|  | is in bounds of the granule and the pointer tag is equal to the last byte of | 
|  | the granule. | 
|  |  | 
|  | Pointer tags between 1 and `TG-1` are possible and are as likely as any other | 
|  | tag. This means that these tags in memory have two interpretations: the full | 
|  | tag interpretation (where the pointer tag is between 1 and `TG-1` and the | 
|  | last byte of the granule is ordinary data) and the short tag interpretation | 
|  | (where the pointer tag is stored in the granule). | 
|  |  | 
|  | When HWASAN detects an error near a memory tag between 1 and `TG-1`, it | 
|  | will show both the memory tag and the last byte of the granule. Currently, | 
|  | it is up to the user to disambiguate the two possibilities. | 
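
The two-way check described in this section can be sketched as a single C function. It assumes `TG` is 16 and operates on one granule passed in as an array; `granule_access_ok` and its parameters are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TG = 16 };

/*
 * Check an access of `size` bytes starting at byte position `pos` of a granule.
 * `granule` holds the granule's TG bytes and `mem_tag` is its shadow byte.
 */
static bool granule_access_ok(const uint8_t granule[TG], uint8_t mem_tag,
                              uint8_t ptr_tag, size_t pos, size_t size) {
    assert(size >= 1 && pos + size <= TG);
    if (mem_tag == ptr_tag)
        return true;                 /* full-tag interpretation matches */
    if (mem_tag > TG - 1)
        return false;                /* shadow byte cannot be a short granule size */
    if (pos + size - 1 >= mem_tag)
        return false;                /* last accessed byte is outside the short granule */
    return granule[TG - 1] == ptr_tag;  /* real tag lives in the granule's last byte */
}
```

For a granule with 5 valid bytes, the shadow byte is 5 and the real tag sits in byte 15. A pointer whose own tag happens to be 5 also passes the first comparison, which is the ambiguity between the full tag and short tag interpretations noted above.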
|  |  | 
|  | Instrumentation | 
|  | =============== | 
|  |  | 
|  | Memory Accesses | 
|  | --------------- | 
|  | In the majority of cases, memory accesses are prefixed with a call to | 
|  | an outlined instruction sequence that verifies the tags. The code size | 
|  | and performance overhead of the call is reduced by using a custom calling | 
|  | convention that | 
|  |  | 
|  | * preserves most registers, and | 
|  | * is specialized to the register containing the address, and the type and | 
|  | size of the memory access. | 
|  |  | 
|  | Currently, the following sequence is used: | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | // int foo(int *a) { return *a; } | 
|  | // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c | 
|  | [...] | 
|  | foo: | 
|  | stp     x30, x20, [sp, #-16]! | 
|  | adrp    x20, :got:__hwasan_shadow               // load shadow address from GOT into x20 | 
|  | ldr     x20, [x20, :got_lo12:__hwasan_shadow] | 
|  | bl      __hwasan_check_x0_2_short_v2            // call outlined tag check | 
|  | // (arguments: x0 = address, x20 = shadow base; | 
|  | // "2" encodes the access type and size) | 
|  | ldr     w0, [x0]                                // inline load | 
|  | ldp     x30, x20, [sp], #16 | 
|  | ret | 
|  |  | 
|  | [...] | 
|  | __hwasan_check_x0_2_short_v2: | 
|  | sbfx    x16, x0, #4, #52                        // shadow offset | 
|  | ldrb    w16, [x20, x16]                         // load shadow tag | 
|  | cmp     x16, x0, lsr #56                        // extract address tag, compare with shadow tag | 
|  | b.ne    .Ltmp0                                  // jump to short tag handler on mismatch | 
|  | .Ltmp1: | 
|  | ret | 
|  | .Ltmp0: | 
|  | cmp     w16, #15                                // is this a short tag? | 
|  | b.hi    .Ltmp2                                  // if not, error | 
|  | and     x17, x0, #0xf                           // find the address's position in the short granule | 
|  | add     x17, x17, #3                            // adjust to the position of the last byte loaded | 
|  | cmp     w16, w17                                // check that position is in bounds | 
|  | b.ls    .Ltmp2                                  // if not, error | 
|  | orr     x16, x0, #0xf                           // compute address of last byte of granule | 
|  | ldrb    w16, [x16]                              // load tag from it | 
|  | cmp     x16, x0, lsr #56                        // compare with pointer tag | 
|  | b.eq    .Ltmp1                                  // if matches, continue | 
|  | .Ltmp2: | 
|  | stp     x0, x1, [sp, #-256]!                    // save original x0, x1 on stack (they will be overwritten) | 
|  | stp     x29, x30, [sp, #232]                    // create frame record | 
|  | mov     x1, #2                                  // set x1 to a constant indicating the type of failure | 
|  | adrp    x16, :got:__hwasan_tag_mismatch_v2      // call runtime function to save remaining registers and report error | 
|  | ldr     x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler) | 
|  | br      x16 | 
|  |  | 
|  | Heap | 
|  | ---- | 
|  |  | 
|  | Tagging the heap memory/pointers is done by `malloc`. | 
|  | This can be based on any malloc that forces all objects to be `TG`-aligned. | 
|  | `free` tags the memory with a different tag. | 
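
A minimal sketch of this idea, assuming a toy bump allocator over a simulated heap, shows why retagging on `free` catches use-after-free. All names (`toy_malloc`, `toy_free`, `toy_access_ok`) are hypothetical, a counter stands in for the random tag source, sizes are rounded up to whole granules (short granules are ignored), and `toy_free` takes the size explicitly where a real allocator would look it up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TG = 16, TAG_SHIFT = 56, HEAP_SIZE = 1024 };

static uint8_t heap_shadow[HEAP_SIZE / TG];
static uint64_t heap_top;       /* bump pointer, always TG-aligned */
static uint8_t next_tag = 1;    /* deterministic stand-in for a random tag */

static uint8_t fresh_tag(void) {
    uint8_t t = next_tag++;
    if (next_tag == 0)
        next_tag = 1;           /* skip tag 0, which marks untouched shadow here */
    return t;
}

static size_t round_up(size_t size) { return (size + TG - 1) / TG * TG; }

static void set_shadow(uint64_t offset, size_t rounded, uint8_t tag) {
    for (uint64_t g = offset / TG; g < (offset + rounded) / TG; g++)
        heap_shadow[g] = tag;
}

static bool toy_access_ok(uint64_t p) {
    uint64_t offset = p & ((UINT64_C(1) << TAG_SHIFT) - 1);
    return offset < HEAP_SIZE &&
           heap_shadow[offset / TG] == (uint8_t)(p >> TAG_SHIFT);
}

/* Returns a tagged pointer, or 0 when the toy heap is exhausted. */
static uint64_t toy_malloc(size_t size) {
    size_t rounded = round_up(size);
    if (rounded == 0 || heap_top + rounded > HEAP_SIZE)
        return 0;
    uint64_t offset = heap_top;
    uint8_t tag = fresh_tag();
    heap_top += rounded;
    set_shadow(offset, rounded, tag);
    return offset | ((uint64_t)tag << TAG_SHIFT);
}

/* Retag the memory so that stale pointers no longer match. */
static void toy_free(uint64_t p, size_t size) {
    assert(toy_access_ok(p));   /* a mismatch would indicate an invalid or double free */
    uint64_t offset = p & ((UINT64_C(1) << TAG_SHIFT) - 1);
    set_shadow(offset, round_up(size), fresh_tag());
}
```

After `toy_free`, the old pointer still carries its original tag while the shadow holds a new one, so any later access through it fails the tag check without the allocator having to keep the block in a quarantine.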
|  |  | 
|  | Stack | 
|  | ----- | 
|  |  | 
|  | Stack frames are instrumented by aligning all non-promotable allocas | 
|  | by `TG` and tagging stack memory in function prologue and epilogue. | 
|  |  | 
|  | Tags for different allocas in one function are **not** generated | 
|  | independently; doing that in a function with `M` allocas would require | 
|  | maintaining `M` live stack pointers, significantly increasing register | 
|  | pressure. Instead we generate a single base tag value in the prologue, | 
|  | and build the tag for alloca number `N` as `ReTag(BaseTag, N)`, where | 
|  | ReTag can be as simple as exclusive-or with constant `N`. | 
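
As a sketch of the exclusive-or variant mentioned above, assuming 8-bit tags and fewer than 256 allocas per frame, the per-alloca tags can be derived from one base tag as follows. The names `retag` and `frame_tags` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Derive the tag for one alloca from the frame's base tag and the alloca's index. */
static uint8_t retag(uint8_t base_tag, unsigned alloca_index) {
    assert(alloca_index < 256);
    return (uint8_t)(base_tag ^ alloca_index);
}

/* Fill `out` with the tags of the first `count` allocas of a frame. */
static void frame_tags(uint8_t base_tag, uint8_t *out, unsigned count) {
    for (unsigned i = 0; i < count; i++)
        out[i] = retag(base_tag, i);
}
```

Because exclusive-or with distinct constants yields distinct results, allocas in the same frame never share a tag, and only the single base tag has to stay live in a register.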
|  |  | 
|  | Stack instrumentation is expected to be a major source of overhead, | 
|  | but could be optional. | 
|  |  | 
|  | Globals | 
|  | ------- | 
|  |  | 
|  | Most globals in HWASAN instrumented code are tagged. This is accomplished | 
|  | using the following mechanisms: | 
|  |  | 
|  | * The address of each global has a static tag associated with it. The first | 
|  | defined global in a translation unit has a pseudorandom tag associated | 
|  | with it, based on the hash of the file path. Subsequent global tags are | 
|  | incremental from the previously-assigned tag. | 
|  |  | 
|  | * The global's tag is added to its symbol address in the object file's symbol | 
|  | table. This causes the global's address to be tagged when its address is | 
|  | taken. | 
|  |  | 
|  | * When the address of a global is taken directly (i.e. not via the GOT), a special | 
|  | instruction sequence needs to be used to add the tag to the address, | 
|  | because the tag would otherwise take the address outside of the small code | 
|  | model (4GB on AArch64). No changes are required when the address is taken | 
|  | via the GOT because the address stored in the GOT will contain the tag. | 
|  |  | 
|  | * An associated ``hwasan_globals`` section is emitted for each tagged global, | 
|  | which indicates the address of the global, its size and its tag.  These | 
|  | sections are concatenated by the linker into a single ``hwasan_globals`` | 
|  | section that is enumerated by the runtime (via an ELF note) when a binary | 
|  | is loaded and the memory is tagged accordingly. | 
|  |  | 
|  | A complete example is given below: | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | // int x = 1; int *f() { return &x; } | 
|  | // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c | 
|  |  | 
|  | [...] | 
|  | f: | 
|  | adrp    x0, :pg_hi21_nc:x            // set bits 12-63 to upper bits of untagged address | 
|  | movk    x0, #:prel_g3:x+0x100000000  // set bits 48-63 to tag | 
|  | add     x0, x0, :lo12:x              // set bits 0-11 to lower bits of address | 
|  | ret | 
|  |  | 
|  | [...] | 
|  | .data | 
|  | .Lx.hwasan: | 
|  | .word   1 | 
|  |  | 
|  | .globl  x | 
|  | .set x, .Lx.hwasan+0x2d00000000000000 | 
|  |  | 
|  | [...] | 
|  | .section        .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat | 
|  | .Lhwasan.note: | 
|  | .word   8                            // namesz | 
|  | .word   8                            // descsz | 
|  | .word   3                            // NT_LLVM_HWASAN_GLOBALS | 
|  | .asciz  "LLVM\000\000\000" | 
|  | .word   __start_hwasan_globals-.Lhwasan.note | 
|  | .word   __stop_hwasan_globals-.Lhwasan.note | 
|  |  | 
|  | [...] | 
|  | .section        hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2 | 
|  | .Lx.hwasan.descriptor: | 
|  | .word   .Lx.hwasan-.Lx.hwasan.descriptor | 
|  | .word   0x2d000004                   // tag = 0x2d, size = 4 | 
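
The descriptor word and the tagged symbol value in the listing above can be reproduced with a few lines of C. This sketch assumes the layout suggested by the comment `tag = 0x2d, size = 4` (tag in the top 8 bits, size in the low 24 bits of the 32-bit word); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct global_info {
    uint8_t tag;
    uint32_t size;
};

/* Pack a tag and a size (below 2^24) into one descriptor word. */
static uint32_t encode_global_info(uint8_t tag, uint32_t size) {
    assert(size <= 0xffffff);
    return ((uint32_t)tag << 24) | size;
}

/* Split a descriptor word back into tag and size. */
static struct global_info decode_global_info(uint32_t word) {
    struct global_info info = { (uint8_t)(word >> 24), word & 0xffffff };
    return info;
}

/* Mirror `.set x, .Lx.hwasan+0x2d00000000000000`: place the tag in bits 56-63. */
static uint64_t tagged_symbol_address(uint64_t untagged, uint8_t tag) {
    return untagged + ((uint64_t)tag << 56);
}
```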
|  |  | 
|  | Error reporting | 
|  | --------------- | 
|  |  | 
|  | Errors are generated by the `HLT` instruction and are handled by a signal handler. | 
|  |  | 
|  | Attribute | 
|  | --------- | 
|  |  | 
|  | HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching | 
|  | C function attribute. An alternative would be to re-use ASAN's attribute | 
|  | `sanitize_address`. The reasons to use a separate attribute are: | 
|  |  | 
|  | * Users may need to disable ASAN but not HWASAN, or vice versa, | 
|  | because the tools have different trade-offs and compatibility issues. | 
|  | * LLVM (ideally) does not use flags to decide which pass is being used; | 
|  | whether ASAN or HWASAN is applied is decided by the function attributes. | 
|  |  | 
|  | This does mean that users of HWASAN may need to add the new attribute | 
|  | to the code that already uses the old attribute. | 
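
On the C side, a function can be excluded from instrumentation with Clang's `no_sanitize` attribute. The sketch below assumes a Clang-compatible compiler and guards the attribute with `__has_feature(hwaddress_sanitizer)` so that it expands to nothing in other builds; the macro `NO_SANITIZE_HWADDRESS` and the function `raw_strlen` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Expand to the attribute only when the translation unit is built with HWASAN. */
#if defined(__has_feature)
#  if __has_feature(hwaddress_sanitizer)
#    define NO_SANITIZE_HWADDRESS __attribute__((no_sanitize("hwaddress")))
#  endif
#endif
#ifndef NO_SANITIZE_HWADDRESS
#  define NO_SANITIZE_HWADDRESS
#endif

/* Loads in this function are not tag-checked in HWASAN builds. */
NO_SANITIZE_HWADDRESS
static size_t raw_strlen(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Code that already opts out of ASAN through its own attribute would need a second macro like this one for HWASAN, which is the cost of keeping the two attributes separate.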
|  |  | 
|  |  | 
|  | Comparison with AddressSanitizer | 
|  | ================================ | 
|  |  | 
|  | HWASAN: | 
|  |  | 
|  | * Is less portable than :doc:`AddressSanitizer` | 
|  | as it relies on hardware `Address Tagging`_ (AArch64). | 
|  | Address Tagging can be emulated with compiler instrumentation, | 
|  | but it will require the instrumentation to remove the tags before | 
|  | any load or store, which is infeasible in any realistic environment | 
|  | that contains non-instrumented code. | 
|  | * May have compatibility problems if the target code uses higher | 
|  | pointer bits for other purposes. | 
|  | * May require changes in the OS kernels (e.g. Linux seems to dislike | 
|  | tagged pointers passed from user space: | 
|  | https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt). | 
|  | * **Does not require redzones to detect buffer overflows**, | 
|  | but the buffer overflow detection is probabilistic, with roughly | 
|  | `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4-bit and 8-bit `TS`, | 
|  | respectively). | 
|  | * **Does not require quarantine to detect heap-use-after-free, | 
|  | or stack-use-after-return**. | 
|  | The detection is similarly probabilistic. | 
|  |  | 
|  | The memory overhead of HWASAN is expected to be much smaller | 
|  | than that of AddressSanitizer: | 
|  | `1/TG` extra memory for the shadow | 
|  | and some overhead due to `TG`-aligning all objects. | 
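
The figures in this section follow from simple arithmetic, sketched below in C. The function names are hypothetical, and the model covers only the costs named above: the chance that a mismatching access happens to see an equal tag, the shadow size, and the padding from `TG`-alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Probability that two independent TS-bit tags are equal: 1/(2**TS). */
static double miss_probability(unsigned ts) {
    assert(ts < 32);
    return 1.0 / (double)(1u << ts);
}

/* Shadow bytes needed for `app_bytes` of memory at one tag byte per TG bytes. */
static size_t shadow_bytes(size_t app_bytes, size_t tg) {
    assert(tg > 0);
    return (app_bytes + tg - 1) / tg;
}

/* Size of an object after rounding up to the tagging granularity. */
static size_t aligned_size(size_t size, size_t tg) {
    assert(tg > 0);
    return (size + tg - 1) / tg * tg;
}
```

For example, 1 MiB of application memory with `TG` equal to 16 needs 64 KiB of shadow, and a 20-byte object occupies 32 bytes after alignment.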
|  |  | 
|  | Security Considerations | 
|  | ======================= | 
|  |  | 
|  | HWASAN is a bug detection tool and its runtime is not meant to be | 
|  | linked against production executables. While it may be useful for testing, | 
|  | HWASAN's runtime was not developed with security-sensitive | 
|  | constraints in mind and may compromise the security of the resulting executable. | 
|  |  | 
|  | Supported architectures | 
|  | ======================= | 
|  | HWASAN relies on `Address Tagging`_, which is currently widely available only on AArch64. | 
|  | For other 64-bit architectures it is possible to remove the address tags | 
|  | before every load and store by compiler instrumentation, but this variant | 
|  | will have limited deployability since not all of the code is | 
|  | typically instrumented. | 
|  |  | 
|  | On x86_64, HWASAN utilizes page aliasing to place tags in userspace address | 
|  | bits.  Currently only heap tagging is supported.  The page aliases rely on | 
|  | shared memory, which will cause heap memory to be shared between processes if | 
|  | the application calls ``fork()``.  Therefore, HWASAN on x86_64 is only safe for | 
|  | applications that do not fork. | 
|  |  | 
|  | HWASAN does not currently support 32-bit architectures since they do not | 
|  | support `Address Tagging`_ and the address space is too constrained to easily | 
|  | implement page aliasing. | 
|  |  | 
|  |  | 
|  | Related Work | 
|  | ============ | 
|  | * `SPARC ADI`_ implements a similar tool mostly in hardware. | 
|  | * `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses | 
|  | similar approaches ("lock & key"). | 
|  | * `Watchdog`_ discusses a heavier, but still somewhat similar | 
|  | "lock & key" approach. | 
|  | * *TODO: add more "related work" links. Suggestions are welcome.* | 
|  |  | 
|  |  | 
|  | .. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf | 
|  | .. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf | 
|  | .. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html | 
|  | .. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf | 
|  | .. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html | 
|  | .. _Linear Address Masking: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html |