docs/BoundsSafety.rst - llvm-project/clang - Git at Google

 ==================================================
 ``-fbounds-safety``: Enforcing bounds safety for C
 ==================================================

 .. contents::
    :local:

 Overview
 ========

 ``-fbounds-safety`` is a C extension to enforce bounds safety to prevent
 out-of-bounds (OOB) memory accesses, which remain a major source of security
 vulnerabilities in C. ``-fbounds-safety`` aims to eliminate this class of bugs
 by turning OOB accesses into deterministic traps.

 The ``-fbounds-safety`` extension offers bounds annotations that programmers can
 use to attach bounds to pointers. For example, programmers can add the
 ``__counted_by(N)`` annotation to parameter ``ptr``, indicating that the pointer
 has ``N`` valid elements:

 .. code-block:: c

    void foo(int *__counted_by(N) ptr, size_t N);

 Using this bounds information, the compiler inserts bounds checks on every
 pointer dereference, ensuring that the program does not access memory outside
 the specified bounds. The compiler requires programmers to provide enough bounds
 information so that the accesses can be checked at either run time or compile
 time — and it rejects code if it cannot.

 The most important contribution of ``-fbounds-safety`` is how it reduces the
 programmer's annotation burden by reconciling bounds annotations at ABI
 boundaries with the use of implicit wide pointers (a.k.a. "fat" pointers) that
 carry bounds information on local variables without the need for annotations. We
 designed this model so that it preserves ABI compatibility with C while
 minimizing adoption effort.

 The ``-fbounds-safety`` extension has been adopted on millions of lines of
 production C code and proven to work in a consumer operating system setting. The
 extension was designed to enable incremental adoption — a key requirement in
 real-world settings where modifying an entire project and its dependencies all
 at once is often not possible. It also addresses multiple of other practical
 challenges that have made existing approaches to safer C dialects difficult to
 adopt, offering these properties that make it widely adoptable in practice:

 * It is designed to preserve the Application Binary Interface (ABI).
 * It interoperates well with plain C code.
 * It can be adopted partially and incrementally while still providing safety
   benefits.
 * It is a conforming extension to C.
 * Consequently, source code that adopts the extension can continue to be
   compiled by toolchains that do not support the extension (CAVEAT: this still
   requires inclusion of a header file macro-defining bounds annotations to
   empty).
 * It has a relatively low adoption cost.

 This document discusses the key designs of ``-fbounds-safety``. The document is
 subject to be actively updated with a more detailed specification. The
 implementation plan can be found in :doc:`BoundsSafetyImplPlans`.


 Programming Model
 =================

 Overview
 --------

 ``-fbounds-safety`` ensures that pointers are not used to access memory beyond
 their bounds by performing bounds checking. If a bounds check fails, the program
 will deterministically trap before out-of-bounds memory is accessed.

 In our model, every pointer has an explicit or implicit bounds attribute that
 determines its bounds and ensures guaranteed bounds checking. Consider the
 example below where the ``__counted_by(count)`` annotation indicates that
 parameter ``p`` points to a buffer of integers containing ``count`` elements. An
 off-by-one error is present in the loop condition, leading to ``p[i]`` being
 out-of-bounds access during the loop's final iteration. The compiler inserts a
 bounds check before ``p`` is dereferenced to ensure that the access remains
 within the specified bounds.

 .. code-block:: c

    void fill_array_with_indices(int *__counted_by(count) p, unsigned count) {
       // off-by-one error (i < count)
       for (unsigned i = 0; i <= count; ++i) {
          // bounds check inserted:
          //   if (i >= count) trap();
          p[i] = i;
       }
    }

 A bounds annotation defines an invariant for the pointer type, and the model
 ensures that this invariant remains true. In the example below, pointer ``p``
 annotated with ``__counted_by(count)`` must always point to a memory buffer
 containing at least ``count`` elements of the pointee type. Changing the value
 of ``count``, like in the example below, may violate this invariant and permit
 out-of-bounds access to the pointer. To avoid this, the compiler employs
 compile-time restrictions and emits run-time checks as necessary to ensure the
 new count value doesn't exceed the actual length of the buffer. Section
 `Maintaining correctness of bounds annotations`_ provides more details about
 this programming model.

 .. code-block:: c

    int g;

    void foo(int *__counted_by(count) p, size_t count) {
       count++; // may violate the invariant of __counted_by
       count--; // may violate the invariant of __counted_by if count was 0.
       count = g; // may violate the invariant of __counted_by
                  // depending on the value of `g`.
    }

 The requirement to annotate all pointers with explicit bounds information could
 present a significant adoption burden. To tackle this issue, the model
 incorporates the concept of a "wide pointer" (a.k.a. fat pointer) – a larger
 pointer that carries bounds information alongside the pointer value. Utilizing
 wide pointers can potentially reduce the adoption burden, as it contains bounds
 information internally and eliminates the need for explicit bounds annotations.
 However, wide pointers differ from standard C pointers in their data layout,
 which may result in incompatibilities with the application binary interface
 (ABI). Breaking the ABI complicates interoperability with external code that has
 not adopted the same programming model.

 ``-fbounds-safety`` harmonizes the wide pointer and the bounds annotation
 approaches to reduce the adoption burden while maintaining the ABI. In this
 model, local variables of pointer type are implicitly treated as wide pointers,
 allowing them to carry bounds information without requiring explicit bounds
 annotations. Please note that this approach doesn't apply to function parameters
 which are considered ABI-visible. As local variables are typically hidden from
 the ABI, this approach has a marginal impact on it. In addition,
 ``-fbounds-safety`` employs compile-time restrictions to prevent implicit wide
 pointers from silently breaking the ABI (see `ABI implications of default bounds
 annotations`_). Pointers associated with any other variables, including function
 parameters, are treated as single object pointers (i.e., ``__single``), ensuring
 that they always have the tightest bounds by default and offering a strong
 bounds safety guarantee.

 By implementing default bounds annotations based on ABI visibility, a
 considerable portion of C code can operate without modifications within this
 programming model, reducing the adoption burden.

 The rest of the section will discuss individual bounds annotations and the
 programming model in more detail.

 Bounds annotations
 ------------------

 Annotation for pointers to a single object
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The C language allows pointer arithmetic on arbitrary pointers and this has been
 a source of many bounds safety issues. In practice, many pointers are merely
 pointing to a single object and incrementing or decrementing such a pointer
 immediately makes the pointer go out-of-bounds. To prevent this unsafety,
 ``-fbounds-safety`` provides the annotation ``__single`` that causes pointer
 arithmetic on annotated pointers to be a compile time error.

 * ``__single`` : indicates that the pointer is either pointing to a single
   object or null. Hence, pointers with ``__single`` do not permit pointer
   arithmetic nor being subscripted with a non-zero index. Dereferencing a
   ``__single`` pointer is allowed but it requires a null check. Upper and lower
   bounds checks are not required because the ``__single`` pointer should point
   to a valid object unless it's null.

 ``__single`` is the default annotation for ABI-visible pointers. This
 gives strong security guarantees in that these pointers cannot be incremented or
 decremented unless they have an explicit, overriding bounds annotation that can
 be used to verify the safety of the operation. The compiler issues an error when
 a ``__single`` pointer is utilized for pointer arithmetic or array access, as
 these operations would immediately cause the pointer to exceed its bounds.
 Consequently, this prompts programmers to provide sufficient bounds information
 to pointers. In the following example, the pointer on parameter p is
 single-by-default, and is employed for array access. As a result, the compiler
 generates an error suggesting to add ``__counted_by`` to the pointer.

 .. code-block:: c

    void fill_array_with_indices(int *p, unsigned count) {
       for (unsigned i = 0; i < count; ++i) {
          p[i] = i; // error
       }
    }


 External bounds annotations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

 "External" bounds annotations provide a way to express a relationship between a
 pointer variable and another variable (or expression) containing the bounds
 information of the pointer. In the following example, ``__counted_by(count)``
 annotation expresses the bounds of parameter p using another parameter count.
 This model works naturally with many C interfaces and structs because the bounds
 of a pointer is often available adjacent to the pointer itself, e.g., at another
 parameter of the same function prototype, or at another field of the same struct
 declaration.

 .. code-block:: c

    void fill_array_with_indices(int *__counted_by(count) p, size_t count) {
       // off-by-one error
       for (size_t i = 0; i <= count; ++i)
          p[i] = i;
    }

 External bounds annotations include ``__counted_by``, ``__sized_by``, and
 ``__ended_by``. These annotations do not change the pointer representation,
 meaning they do not have ABI implications.

 * ``__counted_by(N)`` : The pointer points to memory that contains ``N``
   elements of pointee type. ``N`` is an expression of integer type which can be
   a simple reference to declaration, a constant including calls to constant
   functions, or an arithmetic expression that does not have side effect. The
   ``__counted_by`` annotation cannot apply to pointers to incomplete types or
   types without size such as ``void *``. Instead, ``__sized_by`` can be used to
   describe the byte count.
 * ``__sized_by(N)`` : The pointer points to memory that contains ``N`` bytes.
   Just like the argument of ``__counted_by``, ``N`` is an expression of integer
   type which can be a constant, a simple reference to a declaration, or an
   arithmetic expression that does not have side effects. This is mainly used for
   pointers to incomplete types or types without size such as ``void *``.
 * ``__ended_by(P)`` : The pointer has the upper bound of value ``P``, which is
   one past the last element of the pointer. In other words, this annotation
   describes a range that starts with the pointer that has this annotation and
   ends with ``P`` which is the argument of the annotation. ``P`` itself may be
   annotated with ``__ended_by(Q)``. In this case, the end of the range extends
   to the pointer ``Q``. This is used for "iterator" support in C where you're
   iterating from one pointer value to another until a final pointer value is
   reached (and the final pointer value is not dereferencable).

 Accessing a pointer outside the specified bounds causes a run-time trap or a
 compile-time error. Also, the model maintains correctness of bounds annotations
 when the pointer and/or the related value containing the bounds information are
 updated or passed as arguments. This is done by compile-time restrictions or
 run-time checks (see `Maintaining correctness of bounds annotations`_
 for more detail). For instance, initializing ``buf`` with ``null`` while
 assigning non-zero value to ``count``, as shown in the following example, would
 violate the ``__counted_by`` annotation because a null pointer does not point to
 any valid memory location. To avoid this, the compiler produces either a
 compile-time error or run-time trap.

 .. code-block:: c

    void null_with_count_10(int *__counted_by(count) buf, unsigned count) {
       buf = 0;
       // This is not allowed as it creates a null pointer with non-zero length
       count = 10;
    }

 However, there are use cases where a pointer is either a null pointer or is
 pointing to memory of the specified size. To support this idiom,
 ``-fbounds-safety`` provides ``*_or_null`` variants,
 ``__counted_by_or_null(N)``, ``__sized_by_or_null(N)``, and
 ``__ended_by_or_null(P)``. Accessing a pointer with any of these bounds
 annotations will require an extra null check to avoid a null pointer
 dereference.

 Internal bounds annotations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

 A wide pointer (sometimes known as a "fat" pointer) is a pointer that carries
 additional bounds information internally (as part of its data). The bounds
 require additional storage space making wide pointers larger than normal
 pointers, hence the name "wide pointer". The memory layout of a wide pointer is
 equivalent to a struct with the pointer, upper bound, and (optionally) lower
 bound as its fields as shown below.

 .. code-block:: c

    struct wide_pointer_datalayout {
       void* pointer; // Address used for dereferences and pointer arithmetic
       void* upper_bound; // Points one past the highest address that can be
                          // accessed
       void* lower_bound; // (Optional) Points to lowest address that can be
                          // accessed
    };

 Even with this representational change, wide pointers act syntactically as
 normal pointers to allow standard pointer operations, such as pointer
 dereference (``*p``), array subscript (``p[i]``), member access (``p->``), and
 pointer arithmetic, with some restrictions on bounds-unsafe uses.

 ``-fbounds-safety`` has a set of "internal" bounds annotations to turn pointers
 into wide pointers. These are ``__bidi_indexable`` and ``__indexable``. When a
 pointer has either of these annotations, the compiler changes the pointer to the
 corresponding wide pointer. This means these annotations will break the ABI and
 will not be compatible with plain C, and thus they should generally not be used
 in ABI surfaces.

 * ``__bidi_indexable`` : A pointer with this annotation becomes a wide pointer
   to carry the upper bound and the lower bound, the layout of which is
   equivalent to ``struct { T *ptr; T *upper_bound; T *lower_bound; };``. As the
   name indicates, pointers with this annotation are "bidirectionally indexable",
   meaning that they can be indexed with either a negative or a positive offset
   and the pointers can be incremented or decremented using pointer arithmetic. A
   ``__bidi_indexable`` pointer is allowed to hold an out-of-bounds pointer
   value. While creating an OOB pointer is undefined behavior in C,
   ``-fbounds-safety`` makes it well-defined behavior. That is, pointer
   arithmetic overflow with ``__bidi_indexable`` is defined as equivalent of
   two's complement integer computation, and at the LLVM IR level this means
   ``getelementptr`` won't get ``inbounds`` keyword. Accessing memory using the
   OOB pointer is prevented via a run-time bounds check.

 * ``__indexable`` : A pointer with this annotation becomes a wide pointer
   carrying the upper bound (but no explicit lower bound), the layout of which is
   equivalent to ``struct { T *ptr; T *upper_bound; };``. Since ``__indexable``
   pointers do not have a separate lower bound, the pointer value itself acts as
   the lower bound. An ``__indexable`` pointer can only be incremented or indexed
   in the positive direction. Indexing it in the negative direction will trigger
   a compile-time error. Otherwise, the compiler inserts a run-time
   check to ensure pointer arithmetic doesn't make the pointer smaller than the
   original ``__indexable`` pointer (Note that ``__indexable`` doesn't have a
   lower bound so the pointer value is effectively the lower bound). As pointer
   arithmetic overflow will make the pointer smaller than the original pointer,
   it will cause a trap at runtime. Similar to ``__bidi_indexable``, an
   ``__indexable`` pointer is allowed to have a pointer value above the upper
   bound and creating such a pointer is well-defined behavior. Dereferencing such
   a pointer, however, will cause a run-time trap.

 * ``__bidi_indexable`` offers the best flexibility out of all the pointer
   annotations in this model, as ``__bidi_indexable`` pointers can be used for
   any pointer operation. However, this comes with the largest code size and
   memory cost out of the available pointer annotations in this model. In some
   cases, use of the ``__bidi_indexable`` annotation may be duplicating bounds
   information that exists elsewhere in the program. In such cases, using
   external bounds annotations may be a better choice.

 ``__bidi_indexable`` is the default annotation for non-ABI visible pointers,
 such as local pointer variables — that is, if the programmer does not specify
 another bounds annotation, a local pointer variable is implicitly
 ``__bidi_indexable``. Since ``__bidi_indexable`` pointers automatically carry
 bounds information and have no restrictions on kinds of pointer operations that
 can be used with these pointers, most code inside a function works as is without
 modification. In the example below, ``int *buf`` doesn't require manual
 annotation as it's implicitly ``int *__bidi_indexable buf``, carrying the bounds
 information passed from the return value of malloc, which is necessary to insert
 bounds checking for ``buf[i]``.

 .. code-block:: c

    void *__sized_by(size) malloc(size_t size);

    int *__counted_by(n) get_array_with_0_to_n_1(size_t n) {
       int *buf = malloc(sizeof(int) * n);
       for (size_t i = 0; i < n; ++i)
          buf[i] = i;
       return buf;
    }

 Annotations for sentinel-delimited arrays
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 A C string is an array of characters. The null terminator — the first null
 character ('\0') element in the array — marks the end of the string.
 ``-fbounds-safety`` provides ``__null_terminated`` to annotate C strings and the
 generalized form ``__terminated_by(T)`` to annotate pointers and arrays with an
 end marked by a sentinel value. The model prevents dereferencing a
 ``__terminated_by`` pointer beyond its end. Calculating the location of the end
 (i.e., the address of the sentinel value), requires reading the entire array in
 memory and would have some performance costs. To avoid an unintended performance
 hit, the model puts some restrictions on how these pointers can be used.
 ``__terminated_by`` pointers cannot be indexed and can only be incremented one
 element at a time. To allow these operations, the pointers must be explicitly
 converted to ``__indexable`` pointers using the intrinsic function
 ``__unsafe_terminated_by_to_indexable(P, T)`` (or
 ``__unsafe_null_terminated_to_indexable(P)``) which converts the
 ``__terminated_by`` pointer ``P`` to an ``__indexable`` pointer.

 * ``__null_terminated`` : The pointer or array is terminated by ``NULL`` or
   ``0``. Modifying the terminator or incrementing the pointer beyond it is
   prevented at run time.

 * ``__terminated_by(T)`` : The pointer or array is terminated by ``T`` which is
   a constant expression. Accessing or incrementing the pointer beyond the
   terminator is not allowed. This is a generalization of ``__null_terminated``
   which is defined as ``__terminated_by(0)``.

 Annotation for interoperating with bounds-unsafe code
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 A pointer with the ``__unsafe_indexable`` annotation behaves the same as a plain
 C pointer. That is, the pointer does not have any bounds information and pointer
 operations are not checked.

 ``__unsafe_indexable`` can be used to mark pointers from system headers or
 pointers from code that has not adopted -fbounds safety. This enables
 interoperation between code using ``-fbounds-safety`` and code that does not.

 Default pointer types
 ---------------------

 ABI visibility and default annotations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Requiring ``-fbounds-safety`` adopters to add bounds annotations to all pointers
 in the codebase would be a significant adoption burden. To avoid this and to
 secure all pointers by default, ``-fbounds-safety`` applies default bounds
 annotations to pointer types.
 Default annotations apply to pointer types of declarations

 ``-fbounds-safety`` applies default bounds annotations to pointer types used in
 declarations. The default annotations are determined by the ABI visibility of
 the pointer. A pointer type is ABI-visible if changing its size or
 representation affects the ABI. For instance, changing the size of a type used
 in a function parameter will affect the ABI and thus pointers used in function
 parameters are ABI-visible pointers. On the other hand, changing the types of
 local variables won't have such ABI implications. Hence, ``-fbounds-safety``
 considers the outermost pointer types of local variables as non-ABI visible. The
 rest of the pointers such as nested pointer types, pointer types of global
 variables, struct fields, and function prototypes are considered ABI-visible.

 All ABI-visible pointers are treated as ``__single`` by default unless annotated
 otherwise. This default both preserves ABI and makes these pointers safe by
 default. This behavior can be controlled with macros, i.e.,
 ``__ptrcheck_abi_assume_*ATTR*()``, to set the default annotation for
 ABI-visible pointers to be either ``__single``, ``__bidi_indexable``,
 ``__indexable``, or ``__unsafe_indexable``. For instance,
 ``__ptrcheck_abi_assume_unsafe_indexable()`` will make all ABI-visible pointers
 be ``__unsafe_indexable``. Non-ABI visible pointers — the outermost pointer
 types of local variables — are ``__bidi_indexable`` by default, so that these
 pointers have the bounds information necessary to perform bounds checks without
 the need for a manual annotation. All ``const char`` pointers or any typedefs
 equivalent to ``const char`` pointers are ``__null_terminated`` by default. This
 means that ``char8_t`` is ``unsigned char`` so ``const char8_t *`` won't be
 ``__null_terminated`` by default. Similarly, ``const wchar_t *`` won't be
 ``__null_terminated`` by default unless the platform defines it as ``typedef
 char wchar_t``. Please note, however, that the programmers can still explicitly
 use ``__null_terminated`` in any other pointers, e.g., ``char8_t
 *__null_terminated``, ``wchar_t *__null_terminated``, ``int
 *__null_terminated``, etc. if they should be treated as ``__null_terminated``.
 The same applies to other annotations.
 In system headers, the default pointer attribute for ABI-visible pointers is set
 to ``__unsafe_indexable`` by default.

 The ``__ptrcheck_abi_assume_*ATTR*()`` macros are defined as pragmas in the
 toolchain header (See `Portability with toolchains that do not support the
 extension`_ for more details about the toolchain header):

 .. code-block:: C

 #define __ptrcheck_abi_assume_single() \
    _Pragma("clang abi_ptr_attr set(single)")

 #define __ptrcheck_abi_assume_indexable() \
   _Pragma("clang abi_ptr_attr set(indexable)")

 #define __ptrcheck_abi_assume_bidi_indexable() \
   _Pragma("clang abi_ptr_attr set(bidi_indexable)")

 #define __ptrcheck_abi_assume_unsafe_indexable() \
   _Pragma("clang abi_ptr_attr set(unsafe_indexable)")


 ABI implications of default bounds annotations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Although simply modifying types of a local variable doesn't normally impact the
 ABI, taking the address of such a modified type could create a pointer type that
 has an ABI mismatch. Looking at the following example, ``int *local`` is
 implicitly ``int *__bidi_indexable`` and thus the type of ``&local`` is a
 pointer to ``int *__bidi_indexable``. On the other hand, in ``void foo(int
 **)``, the parameter type is a pointer to ``int *__single`` (i.e., ``void
 foo(int *__single *__single)``) (or a pointer to ``int *__unsafe_indexable`` if
 it's from a system header). The compiler reports an error for casts between
 pointers whose elements have incompatible pointer attributes. This way,
 ``-fbounds-safety`` prevents pointers that are implicitly ``__bidi_indexable``
 from silently escaping thereby breaking the ABI.

 .. code-block:: c

    void foo(int **);

    void bar(void) {
       int *local = 0;
       // error: passing 'int *__bidi_indexable*__bidi_indexable' to parameter of
       // incompatible nested pointer type 'int *__single*__single'
       foo(&local);
    }

 A local variable may still be exposed to the ABI if ``typeof()`` takes the type
 of local variable to define an interface as shown in the following example.

 .. code-block:: C

    // bar.c
    void bar(int *) { ... }

    // foo.c
    void foo(void) {
       int *p; // implicitly `int *__bidi_indexable p`
       extern void bar(typeof(p)); // creates an interface of type
                                   // `void bar(int *__bidi_indexable)`
    }

 Doing this may break the ABI if the parameter is not ``__bidi_indexable`` at the
 definition of function ``bar()`` which is likely the case because parameters are
 ``__single`` by default without an explicit annotation.

 In order to avoid an implicitly wide pointer from silently breaking the ABI, the
 compiler reports a warning when ``typeof()`` is used on an implicit wide pointer
 at any ABI visible context (e.g., function prototype, struct definition, etc.).

 .. _Default pointer types in typeof:

 Default pointer types in ``typeof()``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 When ``typeof()`` takes an expression, it respects the bounds annotation on
 the expression type, including the bounds annotation is implcit. For example,
 the global variable ``g`` in the following code is implicitly ``__single`` so
 ``typeof(g)`` gets ``char *__single``. The similar is true for the parameter
 ``p``, so ``typeof(p)`` returns ``void *__single``. The local variable ``l`` is
 implicitly ``__bidi_indexable``, so ``typeof(l)`` becomes
 ``int *__bidi_indexable``.

 .. code-block:: C

    char *g; // typeof(g) == char *__single

    void foo(void *p) {
       // typeof(p) == void *__single

       int *l; // typeof(l) == int *__bidi_indexable
    }

 When the type of expression has an "external" bounds annotation, e.g.,
 ``__sized_by``, ``__counted_by``, etc., the compiler may report an error on
 ``typeof`` if the annotation creates a dependency with another declaration or
 variable. For example, the compiler reports an error on ``typeof(p1)`` shown in
 the following code because allowing it can potentially create another type
 dependent on the parameter ``size`` in a different context (Please note that an
 external bounds annotation on a parameter may only refer to another parameter of
 the same function). On the other hand, ``typeof(p2)`` works resulting in ``int
 *__counted_by(10)``, since it doesn't depend on any other declaration.

 .. TODO: add a section describing constraints on external bounds annotations

 .. code-block:: C

    void foo(int *__counted_by(size) p1, size_t size) {
       // typeof(p1) == int *__counted_by(size)
       // -> a compiler error as it tries to create another type
       // dependent on `size`.

       int *__counted_by(10) p2; // typeof(p2) == int *__counted_by(10)
                                 // -> no error

    }

 When ``typeof()`` takes a type name, the compiler doesn't apply an implicit
 bounds annotation on the named pointer types. For example, ``typeof(int*)``
 returns ``int *`` without any bounds annotation. A bounds annotation may be
 added after the fact depending on the context. In the following example,
 ``typeof(int *)`` returns ``int *`` so it's equivalent as the local variable is
 declared as ``int *l``, so it eventually becomes implicitly
 ``__bidi_indexable``.

 .. code-block:: c

    void foo(void) {
       typeof(int *) l; // `int *__bidi_indexable` (same as `int *l`)
    }

 The programmers can still explicitly add a bounds annotation on the types named
 inside ``typeof``, e.g., ``typeof(int *__bidi_indexable)``, which evaluates to
 ``int *__bidi_indexable``.


 Default pointer types in ``sizeof()``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 When ``sizeof()`` takes a type name, the compiler doesn't apply an implicit
 bounds annotation on the named pointer types. This means if a bounds annotation
 is not specified, the evaluated pointer type is treated identically to a plain C
 pointer type. Therefore, ``sizeof(int*)`` remains the same with or without
 ``-fbounds-safety``. That said, programmers can explicitly add attribute to the
 types, e.g., ``sizeof(int *__bidi_indexable)``, in which case the sizeof
 evaluates to the size of type ``int *__bidi_indexable`` (the value equivalent to
 ``3 * sizeof(int*)``).

 When ``sizeof()`` takes an expression, i.e., ``sizeof(expr``, it behaves as
 ``sizeof(typeof(expr))``, except that ``sizeof(expr)`` does not report an error
 with ``expr`` that has a type with an external bounds annotation dependent on
 another declaration, whereas ``typeof()`` on the same expression would be an
 error as described in :ref:`Default pointer types in typeof`.
 The following example describes this behavior.

 .. code-block:: c

    void foo(int *__counted_by(size) p, size_t size) {
       // sizeof(p) == sizeof(int *__counted_by(size)) == sizeof(int *)
       // typeof(p): error
    };

 Default pointer types in ``alignof()``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 ``alignof()`` only takes a type name as the argument and it doesn't take an
 expression. Similar to ``sizeof()`` and ``typeof``, the compiler doesn't apply
 an implicit bounds annotation on the pointer types named inside ``alignof()``.
 Therefore, ``alignof(T *)`` remains the same with or without
 ``-fbounds-safety``, evaluating into the alignment of the raw pointer ``T *``.
 The programmers can explicitly add a bounds annotation to the types, e.g.,
 ``alignof(int *__bidi_indexable)``, which returns the alignment of ``int
 *__bidi_indexable``. A bounds annotation including an internal bounds annotation
 (i.e., ``__indexable`` and ``__bidi_indexable``) doesn't affect the alignment of
 the original pointer. Therefore, ``alignof(int *__bidi_indexable)`` is equal to
 ``alignof(int *)``.


 Default pointer types used in C-style casts
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 A pointer type used in a C-style cast (e.g., ``(int *)src``) inherits the same
 pointer attribute in the type of src. For instance, if the type of src is ``T
 *__single`` (with ``T`` being an arbitrary C type), ``(int *)src`` will be ``int
 *__single``. The reasoning behind this behavior is so that a C-style cast
 doesn't introduce any unexpected side effects caused by an implicit cast of
 bounds attribute.

 Pointer casts can have explicit bounds annotations. For instance, ``(int
 *__bidi_indexable)src`` casts to ``int *__bidi_indexable`` as long as src has a
 bounds annotation that can implicitly convert to ``__bidi_indexable``. If
 ``src`` has type ``int *__single``, it can implicitly convert to ``int
 *__bidi_indexable`` which then will have the upper bound pointing to one past
 the first element. However, if src has type ``int *__unsafe_indexable``, the
 explicit cast ``(int *__bidi_indexable)src`` will cause an error because
 ``__unsafe_indexable`` cannot cast to ``__bidi_indexable`` as
 ``__unsafe_indexable`` doesn't have bounds information. `Cast rules`_ describes
 in more detail what kinds of casts are allowed between pointers with different
 bounds annotations.

 Default pointer types in typedef
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Pointer types in ``typedef``\s do not have implicit default bounds annotations.
 Instead, the bounds annotation is determined when the ``typedef`` is used. The
 following example shows that no pointer annotation is specified in the ``typedef
 pint_t`` while each instance of ``typedef``'ed pointer gets its bounds
 annotation based on the context in which the type is used.

 .. code-block:: c

    typedef int * pint_t; // int *

    pint_t glob; // int *__single glob;

    void foo(void) {
       pint_t local; // int *__bidi_indexable local;
    }

 Pointer types in a ``typedef`` can still have explicit annotations, e.g.,
 ``typedef int *__single``, in which case the bounds annotation ``__single`` will
 apply to every use of the ``typedef``.

 Array to pointer promotion to secure arrays (including VLAs)
 ------------------------------------------------------------

 Arrays on function prototypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 In C, arrays on function prototypes are promoted (or "decayed") to a pointer to
 its first element (e.g., ``&arr[0]``). In ``-fbounds-safety``, arrays are also
 decayed to pointers, but with the addition of an implicit bounds annotation,
 which includes variable-length arrays (VLAs). As shown in the following example,
 arrays on function prototypes are decalyed to corresponding ``__counted_by``
 pointers.

 .. code-block:: c

    // Function prototype: void foo(int n, int *__counted_by(n) arr);
    void foo(int n, int arr[n]);

    // Function prototype: void bar(int *__counted_by(10) arr);
    void bar(int arr[10]);

 This means the array parameters are treated as `__counted_by` pointers within
 the function and callers of the function also see them as the corresponding
 `__counted_by` pointers.

 Incomplete arrays on function prototypes will cause a compiler error unless it
 has ``__counted_by`` annotation in its bracket.

 .. code-block:: c

    void f1(int n, int arr[]); // error

    void f3(int n, int arr[__counted_by(n)]); // ok

    void f2(int n, int arr[n]); // ok, decays to int *__counted_by(n)

    void f4(int n, int *__counted_by(n) arr); // ok

    void f5(int n, int *arr); // ok, but decays to int *__single,
                              // and cannot be used for pointer arithmetic

 Array references
 ^^^^^^^^^^^^^^^^

 In C, similar to arrays on the function prototypes, a reference to array is
 automatically promoted (or "decayed") to a pointer to its first element (e.g.,
 ``&arr[0]``).

 In `-fbounds-safety`, array references are promoted to ``__bidi_indexable``
 pointers which contain the upper and lower bounds of the array, with the
 equivalent of ``&arr[0]`` serving as the lower bound and ``&arr[array_size]``
 (or one past the last element) serving as the upper bound. This applies to all
 types of arrays including constant-length arrays, variable-length arrays (VLAs),
 and flexible array members annotated with `__counted_by`.

 In the following example, reference to ``vla`` promotes to ``int
 *__bidi_indexable``, with ``&vla[n]`` as the upper bound and ``&vla[0]`` as the
 lower bound. Then, it's copied to ``int *p``, which is implicitly ``int
 *__bidi_indexable p``. Please note that value of ``n`` used to create the upper
 bound is ``10``, not ``100``, in this case because ``10`` is the actual length
 of ``vla``, the value of ``n`` at the time when the array is being allocated.

 .. code-block:: c

    void foo(void) {
       int n = 10;
       int vla[n];
       n = 100;
       int *p = vla; // { .ptr: &vla[0], .upper: &vla[10], .lower: &vla[0] }
                     // it's `&vla[10]` because the value of `n` was 10 at the
                     // time when the array is actually allocated.
       // ...
    }

 By promoting array references to ``__bidi_indexable``, all array accesses are
 bounds checked in ``-fbounds-safety``, just as ``__bidi_indexable`` pointers
 are.

 Maintaining correctness of bounds annotations
 ---------------------------------------------

 ``-fbounds-safety`` maintains correctness of bounds annotations by performing
 additional checks when a pointer object and/or its related value containing the
 bounds information is updated.

 For example, ``__single`` expresses an invariant that the pointer must either
 point to a single valid object or be a null pointer. To maintain this invariant,
 the compiler inserts checks when initializing a ``__single`` pointer, as shown
 in the following example:

 .. code-block:: c

    void foo(void *__sized_by(size) vp, size_t size) {
       // Inserted check:
       // if ((int*)upper_bound(vp) - (int*)vp < sizeof(int) && !!vp) trap();
       int *__single ip = (int *)vp;
    }

 Additionally, an explicit bounds annotation such as ``int *__counted_by(count)
 buf`` defines a relationship between two variables, ``buf`` and ``count``:
 namely, that ``buf`` has ``count`` number of elements available. This
 relationship must hold even after any of these related variables are updated. To
 this end, the model requires that assignments to ``buf`` and ``count`` must be
 side by side, with no side effects between them. This prevents ``buf`` and
 ``count`` from temporarily falling out of sync due to updates happening at a
 distance.

 The example below shows a function ``alloc_buf`` that initializes a struct that
 members that use the ``__counted_by`` annotation. The compiler allows these
 assignments because ``sbuf->buf`` and ``sbuf->count`` are updated side by side
 without any side effects in between the assignments.

 Furthermore, the compiler inserts additional run-time checks to ensure the new
 ``buf`` has at least as many elements as the new ``count`` indicates as shown in
 the transformed pseudo code of function ``alloc_buf()`` in the example below.

 .. code-block:: c

    typedef struct {
       int *__counted_by(count) buf;
       size_t count;
    } sized_buf_t;

    void alloc_buf(sized_buf_t *sbuf, sized_t nelems) {
       sbuf->buf = (int *)malloc(sizeof(int) * nelems);
       sbuf->count = nelems;
    }

    // Transformed pseudo code:
    void alloc_buf(sized_buf_t *sbuf, sized_t nelems) {
       // Materialize RHS values:
       int *tmp_ptr = (int *)malloc(sizeof(int) * nelems);
       int tmp_count = nelems;
       // Inserted check:
       //   - checks to ensure that `lower <= tmp_ptr <= upper`
       //   - if (upper(tmp_ptr) - tmp_ptr < tmp_count) trap();
       sbuf->buf = tmp_ptr;
       sbuf->count = tmp_count;
    }

 Whether the compiler can optimize such run-time checks depends on how the upper
 bound of the pointer is derived. If the source pointer has ``__sized_by``,
 ``__counted_by``, or a variant of such, the compiler assumes that the upper
 bound calculation doesn't overflow, e.g., ``ptr + size`` (where the type of
 ``ptr`` is ``void *__sized_by(size)``), because when the ``__sized_by`` pointer
 is initialized, ``-fbounds-safety`` inserts run-time checks to ensure that ``ptr
 + size`` doesn't overflow and that ``size >= 0``.

 Assuming the upper bound calculation doesn't overflow, the compiler can simplify
 the trap condition ``upper(tmp_ptr) - tmp_ptr < tmp_count`` to ``size <
 tmp_count`` so if both ``size`` and ``tmp_count`` values are known at compile
 time such that ``0 <= tmp_count <= size``, the optimizer can remove the check.

 ``ptr + size`` may still overflow if the ``__sized_by`` pointer is created from
 code that doesn't enable ``-fbounds-safety``, which is undefined behavior.

 In the previous code example with the transformed ``alloc_buf()``, the upper
 bound of ``tmp_ptr`` is derived from ``void *__sized_by_or_null(size)``, which
 is the return type of ``malloc()``. Hence, the pointer arithmetic doesn't
 overflow or ``tmp_ptr`` is null. Therefore, if ``nelems`` was given as a
 compile-time constant, the compiler could remove the checks.

 Cast rules
 ----------

 ``-fbounds-safety`` does not enforce overall type safety and bounds invariants
 can still be violated by incorrect casts in some cases. That said,
 ``-fbounds-safety`` prevents type conversions that change bounds attributes in a
 way to violate the bounds invariant of the destination's pointer annotation.
 Type conversions that change bounds attributes may be allowed if it does not
 violate the invariant of the destination or that can be verified at run time.
 Here are some of the important cast rules.

 Two pointers that have different bounds annotations on their nested pointer
 types are incompatible and cannot implicitly cast to each other. For example,
 ``T *__single *__single`` cannot be converted to ``T *__bidi_indexable
 *__single``. Such a conversion between incompatible nested bounds annotations
 can be allowed using an explicit cast (e.g., C-style cast). Hereafter, the rules
 only apply to the top pointer types. ``__unsafe_indexable`` cannot be converted
 to any other safe pointer types (``__single``, ``__bidi_indexable``,
 ``__counted_by``, etc) using a cast. The extension provides builtins to force
 this conversion, ``__unsafe_forge_bidi_indexable(type, pointer, char_count)`` to
 convert pointer to a ``__bidi_indexable`` pointer of type with ``char_count``
 bytes available and ``__unsafe_forge_single(type, pointer)`` to convert pointer
 to a single pointer of type type. The following examples show the usage of these
 functions. Function ``example_forge_bidi()`` gets an external buffer from an
 unsafe library by calling ``get_buf()`` which returns ``void
 *__unsafe_indexable.`` Under the type rules, this cannot be directly assigned to
 ``void *buf`` (implicitly ``void *__bidi_indexable``). Thus,
 ``__unsafe_forge_bidi_indexable`` is used to manually create a
 ``__bidi_indexable`` from the unsafe buffer.

 .. code-block:: c

    // unsafe_library.h
    void *__unsafe_indexable get_buf(void);
    size_t get_buf_size(void);

    // my_source1.c (enables -fbounds-safety)
    #include "unsafe_library.h"
    void example_forge_bidi(void) {
       void *buf =
         __unsafe_forge_bidi_indexable(void *, get_buf(), get_buf_size());
       // ...
    }

    // my_source2.c (enables -fbounds-safety)
    #include <stdio.h>
    void example_forge_single(void) {
       FILE *fp = __unsafe_forge_single(FILE *, fopen("mypath", "rb"));
       // ...
    }

 * Function ``example_forge_single`` takes a file handle by calling fopen defined
   in system header ``stdio.h``. Assuming ``stdio.h`` did not adopt
   ``-fbounds-safety``, the return type of ``fopen`` would implicitly be ``FILE
   *__unsafe_indexable`` and thus it cannot be directly assigned to ``FILE *fp``
   in the bounds-safe source. To allow this operation, ``__unsafe_forge_single``
   is used to create a ``__single`` from the return value of ``fopen``.

 * Similar to ``__unsafe_indexable``, any non-pointer type (including ``int``,
   ``intptr_t``, ``uintptr_t``, etc.) cannot be converted to any safe pointer
   type because these don't have bounds information. ``__unsafe_forge_single`` or
   ``__unsafe_forge_bidi_indexable`` must be used to force the conversion.

 * Any safe pointer types can cast to ``__unsafe_indexable`` because it doesn't
   have any invariant to maintain.

 * ``__single`` casts to ``__bidi_indexable`` if the pointee type has a known
   size. After the conversion, the resulting ``__bidi_indexable`` has the size of
   a single object of the pointee type of ``__single``. ``__single`` cannot cast
   to ``__bidi_indexable`` if the pointee type is incomplete or sizeless. For
   example, ``void *__single`` cannot convert to ``void *__bidi_indexable``
   because void is an incomplete type and thus the compiler cannot correctly
   determine the upper bound of a single void pointer.

 * Similarly, ``__single`` can cast to ``__indexable`` if the pointee type has a
   known size. The resulting ``__indexable`` has the size of a single object of
   the pointee type.

 * ``__single`` casts to ``__counted_by(E)`` only if ``E`` is 0 or 1.

 * ``__single`` can cast to ``__single`` including when they have different
   pointee types as long as it is allowed in the underlying C standard.
   ``-fbounds-safety`` doesn't guarantee type safety.

 * ``__bidi_indexable`` and ``__indexable`` can cast to ``__single``. The
   compiler may insert run-time checks to ensure the pointer has at least a
   single element or is a null pointer.

 * ``__bidi_indexable`` casts to ``__indexable`` if the pointer does not have an
   underflow. The compiler may insert run-time checks to ensure the pointer is
   not below the lower bound.

 * ``__indexable`` casts to ``__bidi_indexable``. The resulting
   ``__bidi_indexable`` gets the lower bound same as the pointer value.

 * A type conversion may involve both a bitcast and a bounds annotation cast. For
   example, casting from ``int *__bidi_indexable`` to ``char *__single`` involve
   a bitcast (``int *`` to ``char *``) and a bounds annotation cast
   (``__bidi_indexable`` to ``__single``). In this case, the compiler performs
   the bitcast and then converts the bounds annotation. This means, ``int
   *__bidi_indexable`` will be converted to ``char *__bidi_indexable`` and then
   to ``char *__single``.

 * ``__terminated_by(T)`` cannot cast to any safe pointer type without the same
   ``__terminated_by(T)`` attribute. To perform the cast, programmers can use an
   intrinsic function such as ``__unsafe_terminated_by_to_indexable(P)`` to force
   the conversion.

 * ``__terminated_by(T)`` can cast to ``__unsafe_indexable``.

 * Any type without ``__terminated_by(T)`` cannot cast to ``__terminated_by(T)``
   without explicitly using an intrinsic function to allow it.

   + ``__unsafe_terminated_by_from_indexable(T, PTR [, PTR_TO_TERM])`` casts any
     safe pointer PTR to a ``__terminated_by(T)`` pointer. ``PTR_TO_TERM`` is an
     optional argument where the programmer can provide the exact location of the
     terminator. With this argument, the function can skip reading the entire
     array in order to locate the end of the pointer (or the upper bound).
     Providing an incorrect ``PTR_TO_TERM`` causes a run-time trap.

   + ``__unsafe_forge_terminated_by(T, P, E)`` creates ``T __terminated_by(E)``
     pointer given any pointer ``P``. Tmust be a pointer type.

 Portability with toolchains that do not support the extension
 -------------------------------------------------------------

 The language model is designed so that it doesn't alter the semantics of the
 original C program, other than introducing deterministic traps where otherwise
 the behavior is undefined and/or unsafe. Clang provides a toolchain header
 (``ptrcheck.h``) that macro-defines the annotations as type attributes when
 ``-fbounds-safety`` is enabled and defines them to empty when the extension is
 disabled. Thus, the code adopting ``-fbounds-safety`` can compile with
 toolchains that do not support this extension, by including the header or adding
 macros to define the annotations to empty. For example, the toolchain not
 supporting this extension may not have a header defining ``__counted_by``, so
 the code using ``__counted_by`` must define it as nothing or include a header
 that has the define.

 .. code-block:: c

    #if defined(__has_feature) && __has_feature(bounds_safety)
    #define __counted_by(T) __attribute__((__counted_by__(T)))
    // ... other bounds annotations
    #else #define __counted_by(T) // defined as nothing
    // ... other bounds annotations
    #endif

    // expands to `void foo(int * ptr, size_t count);`
    // when extension is not enabled or not available
    void foo(int *__counted_by(count) ptr, size_t count);

 Other potential applications of bounds annotations
 ==================================================

 The bounds annotations provided by the ``-fbounds-safety`` programming model
 have potential use cases beyond the language extension itself. For example,
 static and dynamic analysis tools could use the bounds information to improve
 diagnostics for out-of-bounds accesses, even if ``-fbounds-safety`` is not used.
 The bounds annotations could be used to improve C interoperability with
 bounds-safe languages, providing a better mapping to bounds-safe types in the
 safe language interface. The bounds annotations can also serve as documentation
 specifying the relationship between declarations.

 Limitations
 ===========

 ``-fbounds-safety`` aims to bring the bounds safety guarantee to the C language,
 and it does not guarantee other types of memory safety properties. Consequently,
 it may not prevent some of the secondary bounds safety violations caused by
 other types of safety violations such as type confusion. For instance,
 ``-fbounds-safety`` does not perform type-safety checks on conversions between
 `__single`` pointers of different pointee types (e.g., ``char *__single`` →
 ``void *__single`` → ``int *__single``) beyond what the foundation languages
 (C/C++) already offer.

 ``-fbounds-safety`` heavily relies on run-time checks to keep the bounds safety
 and the soundness of the type system. This may incur significant code size
 overhead in unoptimized builds and leaving some of the adoption mistakes to be
 caught only at run time. This is not a fundamental limitation, however, because
 incrementally adding necessary static analysis will allow us to catch issues
 early on and remove unnecessary bounds checks in unoptimized builds.