| =============== |
| Opaque Pointers |
| =============== |
| |
| The Opaque Pointer Type |
| ======================= |
| |
| Traditionally, LLVM IR pointer types have contained a pointee type. For example, |
| ``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However, |
| due to a lack of pointee type semantics and various issues with having pointee |
| types, there is a desire to remove pointee types from pointers. |
| |
| The opaque pointer type project aims to replace all pointer types containing |
| pointee types in LLVM with an opaque pointer type. The new pointer type is |
| represented textually as ``ptr``. |
| |
| Some instructions still need to know what type to treat the memory pointed to by |
| the pointer as. For example, a load needs to know how many bytes to load from |
| memory and what type to treat the resulting value as. In these cases, |
| instructions themselves contain a type argument. For example, the load |
| instruction from older versions of LLVM |
| |
| .. code-block:: llvm |
| |
| load i64* %p |
| |
| becomes |
| |
| .. code-block:: llvm |
| |
| load i64, ptr %p |
| |
| Address spaces are still used to distinguish between different kinds of pointers |
| where the distinction is relevant for lowering (e.g. data vs function pointers |
| have different sizes on some architectures). Opaque pointers are not changing |
| anything related to address spaces and lowering. For more information, see |
| `DataLayout <LangRef.html#langref-datalayout>`_. Opaque pointers in non-default |
| address space are spelled ``ptr addrspace(N)``. |
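| |
| As a minimal sketch of this spelling, the IR below loads through a pointer in |
| address space 1 and then converts it to the default address space. The function |
| names and the choice of address space 1 are illustrative assumptions; what a |
| given address space means depends on the target. |
| |
| .. code-block:: llvm |
| |
|    ; Load through a pointer in (assumed) address space 1. |
|    define i32 @load_as1(ptr addrspace(1) %p) { |
|      %v = load i32, ptr addrspace(1) %p |
|      ret i32 %v |
|    } |
| |
|    ; Changing the address space still needs an explicit addrspacecast. |
|    define ptr @to_default(ptr addrspace(1) %p) { |
|      %q = addrspacecast ptr addrspace(1) %p to ptr |
|      ret ptr %q |
|    } |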
| |
| Opaque pointers were first proposed all the way back in |
| `2015 <https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html>`_. |
| |
| Issues with explicit pointee types |
| ================================== |
| |
| LLVM IR pointers can be cast back and forth between pointers with different |
| pointee types. The pointee type does not necessarily represent the actual |
| underlying type in memory. In other words, the pointee type carries no real |
| semantics. |
| |
| Historically LLVM was some sort of type-safe subset of C. Having pointee types |
| provided an extra layer of checks to make sure that the Clang frontend matched |
| its frontend values/operations with the corresponding LLVM IR. However, as other |
| languages like C++ adopted LLVM, the community realized that pointee types were |
| more of a hindrance for LLVM development and that the extra type checking with |
| some frontends wasn't worth it. |
| |
| LLVM's type system was `originally designed |
| <https://llvm.org/pubs/2003-05-01-GCCSummit2003.html>`_ to support high-level |
| optimization. However, years of LLVM implementation experience have demonstrated |
| that the pointee type system design does not effectively support |
| optimization. Memory optimization algorithms, such as SROA, GVN, and AA, |
| generally need to look through LLVM's struct types and reason about the |
| underlying memory offsets. The community realized that pointee types hinder LLVM |
| development, rather than helping it. Some of the initially proposed high-level |
| optimizations have evolved into `TBAA |
| <https://llvm.org/docs/LangRef.html#tbaa-metadata>`_ due to limitations with |
| representing higher-level language information directly via SSA values. |
| |
| Pointee types provide some value to frontends because the IR verifier uses types |
| to detect straightforward type confusion bugs. However, frontends also have to |
| deal with the complexity of inserting bitcasts everywhere that they might be |
| required. The community consensus is that the costs of pointee types |
| outweigh the benefits, and that they should be removed. |
| |
| Many operations do not actually care about the underlying type. These |
| operations, typically intrinsics, usually end up taking an arbitrary pointer |
| type ``i8*`` and sometimes a size. This causes lots of redundant no-op bitcasts |
| in the IR to and from a pointer with a different pointee type. |
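| |
| The following sketch illustrates the point with ``llvm.memset``. The function |
| name ``@zero`` is hypothetical, and the two halves belong to two separate |
| modules (one in typed pointer mode, one in opaque pointer mode); they are shown |
| together only for comparison. |
| |
| .. code-block:: llvm |
| |
|    ; Typed pointers: the i32* must be cast to i8* before the call. |
|    declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1) |
| |
|    define void @zero(i32* %p) { |
|      %raw = bitcast i32* %p to i8* |
|      call void @llvm.memset.p0i8.i64(i8* %raw, i8 0, i64 4, i1 false) |
|      ret void |
|    } |
| |
|    ; Opaque pointers: the pointer is passed directly, with no bitcast. |
|    declare void @llvm.memset.p0.i64(ptr, i8, i64, i1) |
| |
|    define void @zero(ptr %p) { |
|      call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 4, i1 false) |
|      ret void |
|    } |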
| |
| No-op bitcasts take up memory/disk space and also take up compile time to look |
| through. However, perhaps the biggest issue is the code complexity required to |
| deal with bitcasts. When looking up through def-use chains for pointers, it's |
| easy to forget to call ``Value::stripPointerCasts()`` to find the true underlying |
| pointer obfuscated by bitcasts. And when looking down through def-use chains, |
| passes need to iterate through bitcasts to handle uses. Removing no-op pointer |
| bitcasts prevents a category of missed optimizations and makes writing LLVM |
| passes a little bit easier. |
| |
| Fewer no-op pointer bitcasts also reduce the chances of incorrect bitcasts with |
| regard to address spaces. People maintaining backends that care a lot about |
| address spaces have complained that frontends like Clang often incorrectly |
| bitcast pointers, losing address space information. |
| |
| An analogous transition that happened earlier in LLVM is integer signedness. |
| Currently there is no distinction between signed and unsigned integer types, but |
| rather each integer operation (e.g. add) contains flags to signal how to treat |
| the integer. Previously LLVM IR distinguished between unsigned and signed |
| integer types and ran into similar issues of no-op casts. The transition from |
| manifesting signedness in types to instructions happened early on in LLVM's |
| timeline to make LLVM easier to work with. |
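| |
| As a small illustration of signedness living in operations rather than types, |
| the sketch below applies signed and unsigned operations to values of the same |
| ``i32`` type. The function and value names are hypothetical. |
| |
| .. code-block:: llvm |
| |
|    define i32 @signedness(i32 %a, i32 %b) { |
|      ; nsw: signed overflow yields poison |
|      %sum = add nsw i32 %a, %b |
|      ; Signed and unsigned division of the same i32 operands. |
|      %sq = sdiv i32 %sum, %b |
|      %uq = udiv i32 %sum, %b |
|      ; The comparison predicate, not the type, selects signedness. |
|      %lt = icmp slt i32 %sq, %uq |
|      %r = select i1 %lt, i32 %sq, i32 %uq |
|      ret i32 %r |
|    } |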
| |
| Opaque Pointers Mode |
| ==================== |
| |
| During the transition phase, LLVM can be used in two modes: In typed pointer |
| mode all pointer types have a pointee type and opaque pointers cannot be used. |
| In opaque pointers mode (the default), all pointers are opaque. The opaque |
| pointer mode can be disabled using ``-opaque-pointers=0`` in |
| LLVM tools like ``opt``, or ``-Xclang -no-opaque-pointers`` in clang. |
| Additionally, opaque pointer mode is automatically disabled for IR and bitcode |
| files that explicitly mention ``i8*`` style typed pointers. |
| |
| In opaque pointer mode, all typed pointers used in IR, bitcode, or created |
| using ``PointerType::get()`` and similar APIs are automatically converted into |
| opaque pointers. This simplifies migration and allows testing existing IR with |
| opaque pointers. |
| |
| .. code-block:: llvm |
| |
| define i8* @test(i8* %p) { |
| %p2 = getelementptr i8, i8* %p, i64 1 |
| ret i8* %p2 |
| } |
| |
| ; Is automatically converted into the following if -opaque-pointers |
| ; is enabled: |
| |
| define ptr @test(ptr %p) { |
| %p2 = getelementptr i8, ptr %p, i64 1 |
| ret ptr %p2 |
| } |
| |
| Migration Instructions |
| ====================== |
| |
| In order to support opaque pointers, two types of changes tend to be necessary. |
| The first is the removal of all calls to ``PointerType::getElementType()`` and |
| ``Type::getPointerElementType()``. |
| |
| In the LLVM middle-end and backend, this is usually accomplished by inspecting |
| the type of relevant operations instead. For example, memory access related |
| analyses and optimizations should use the types encoded in the load and store |
| instructions instead of querying the pointer type. |
| |
| Here are some common ways to avoid pointer element type accesses: |
| |
| * For loads, use ``getType()``. |
| * For stores, use ``getValueOperand()->getType()``. |
| * Use ``getLoadStoreType()`` to handle both of the above in one call. |
| * For getelementptr instructions, use ``getSourceElementType()``. |
| * For calls, use ``getFunctionType()``. |
| * For allocas, use ``getAllocatedType()``. |
| * For globals, use ``getValueType()``. |
| * For consistency assertions, use |
| ``PointerType::isOpaqueOrPointeeTypeEquals()``. |
| * To create a pointer type in a different address space, use |
| ``PointerType::getWithSamePointeeType()``. |
| * To check that two pointers have the same element type, use |
| ``PointerType::hasSameElementTypeAs()``. |
| * While it is preferred to write code in a way that accepts both typed and |
| opaque pointers, ``Type::isOpaquePointerTy()`` and |
| ``PointerType::isOpaque()`` can be used to handle opaque pointers specially. |
| ``PointerType::getNonOpaquePointerElementType()`` can be used as a marker in |
| code-paths where opaque pointers have been explicitly excluded. |
| * To get the type of a byval argument, use ``getParamByValType()``. Similar |
| methods exist for other ABI-affecting attributes that need to know the |
| element type, such as byref, sret, inalloca and preallocated. |
| * Some intrinsics require an ``elementtype`` attribute, which can be retrieved |
| using ``getParamElementType()``. This attribute is required in cases where |
| the intrinsic does not naturally encode a needed element type. This is also |
| used for inline assembly. |
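| |
| To make the list above more concrete, the sketch below marks where each type |
| is encoded in opaque pointer IR, with comments naming the accessor from the |
| list that would return it. All names (``%struct.S``, ``@g``, ``@callee``, |
| ``@example``) are hypothetical, and the empty inline assembly string only |
| serves to show the ``elementtype`` attribute. |
| |
| .. code-block:: llvm |
| |
|    %struct.S = type { i32, i64 } |
| |
|    ; getValueType() -> i32 |
|    @g = global i32 0 |
| |
|    declare void @callee(ptr byval(%struct.S)) |
| |
|    define i64 @example(ptr %p, ptr %fn) { |
|      ; getAllocatedType() -> %struct.S |
|      %a = alloca %struct.S |
|      ; getType() -> i32 |
|      %v = load i32, ptr %p |
|      ; getValueOperand()->getType() -> i32 |
|      store i32 %v, ptr @g |
|      ; getSourceElementType() -> %struct.S |
|      %f = getelementptr %struct.S, ptr %a, i32 0, i32 1 |
|      ; getFunctionType() -> i64 (ptr) |
|      %r = call i64 %fn(ptr %f) |
|      ; getParamByValType(0) -> %struct.S |
|      call void @callee(ptr byval(%struct.S) %a) |
|      ; getParamElementType(0) -> i32 |
|      call void asm sideeffect "", "*m"(ptr elementtype(i32) %p) |
|      ret i64 %r |
|    } |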
| |
| Note that some of the methods mentioned above only exist to support both typed |
| and opaque pointers at the same time, and will be dropped once the migration |
| has completed. For example, ``isOpaqueOrPointeeTypeEquals()`` becomes |
| meaningless once all pointers are opaque. |
| |
| While direct usage of pointer element types is immediately apparent in code, |
| there is a more subtle issue that opaque pointers need to contend with: A lot |
| of code assumes that pointer equality also implies that the used load/store |
| type or GEP source element type is the same. Consider the following examples |
| with typed and opaque pointers: |
| |
| .. code-block:: llvm |
| |
| define i64 @test(i32* %p) { |
| store i32 0, i32* %p |
| %bc = bitcast i32* %p to i64* |
| %v = load i64, i64* %bc |
| ret i64 %v |
| } |
| |
| define i64 @test(ptr %p) { |
| store i32 0, ptr %p |
| %v = load i64, ptr %p |
| ret i64 %v |
| } |
| |
| Without opaque pointers, a check that the pointer operand of the load and |
| store are the same also ensures that the accessed type is the same. Using a |
| different type requires a bitcast, which will result in distinct pointer |
| operands. |
| |
| With opaque pointers, the bitcast is not present, and this check is no longer |
| sufficient. In the above example, it could result in store to load forwarding |
| of an incorrect type. Code making such assumptions needs to be adjusted to |
| check the accessed type explicitly: |
| ``LI->getType() == SI->getValueOperand()->getType()``. |
| |
| Frontends |
| --------- |
| |
| Frontends need to be adjusted to track pointee types independently of LLVM, |
| insofar as they are necessary for lowering. For example, clang now tracks the |
| pointee type in the ``Address`` structure. |
| |
| Frontends using the C API through an FFI interface should be aware that a |
| number of C API functions are deprecated and will be removed as part of the |
| opaque pointer transition:: |
| |
| LLVMBuildLoad -> LLVMBuildLoad2 |
| LLVMBuildCall -> LLVMBuildCall2 |
| LLVMBuildInvoke -> LLVMBuildInvoke2 |
| LLVMBuildGEP -> LLVMBuildGEP2 |
| LLVMBuildInBoundsGEP -> LLVMBuildInBoundsGEP2 |
| LLVMBuildStructGEP -> LLVMBuildStructGEP2 |
| LLVMBuildPtrDiff -> LLVMBuildPtrDiff2 |
| LLVMConstGEP -> LLVMConstGEP2 |
| LLVMConstInBoundsGEP -> LLVMConstInBoundsGEP2 |
| LLVMAddAlias -> LLVMAddAlias2 |
| |
| Additionally, it will no longer be possible to call ``LLVMGetElementType()`` |
| on a pointer type. |
| |
| It is possible to control whether opaque pointers are used (if you want to |
| override the default) using ``LLVMContext::setOpaquePointers``. |
| |
| Temporarily disabling opaque pointers |
| ===================================== |
| |
| In LLVM 15, opaque pointers are enabled by default, but it is still possible to |
| use typed pointers using a number of opt-in flags. |
| |
| For users of the clang driver interface, it is possible to temporarily restore |
| the old default using the ``-DCLANG_ENABLE_OPAQUE_POINTERS=OFF`` cmake option, |
| or by passing ``-Xclang -no-opaque-pointers`` to a single clang invocation. |
| |
| For users of the clang cc1 interface, ``-no-opaque-pointers`` can be passed. |
| Note that the ``CLANG_ENABLE_OPAQUE_POINTERS`` cmake option has no effect on |
| the cc1 interface. |
| |
| Usage for LTO can be disabled by passing ``-Wl,-plugin-opt=no-opaque-pointers`` |
| to the clang driver. |
| |
| For users of LLVM as a library, opaque pointers can be disabled by calling |
| ``setOpaquePointers(false)`` on the ``LLVMContext``. |
| |
| For users of LLVM tools like opt, opaque pointers can be disabled by passing |
| ``-opaque-pointers=0``. |
| |
| Version Support |
| =============== |
| |
| **LLVM 14:** Supports all necessary APIs for migrating to opaque pointers and |
| deprecates/removes incompatible APIs. However, using opaque pointers in the |
| optimization pipeline is **not** fully supported. This release can be used to |
| make out-of-tree code compatible with opaque pointers, but opaque pointers |
| should **not** be enabled in production. |
| |
| **LLVM 15:** Opaque pointers are enabled by default. Typed pointers are still |
| supported. |
| |
| **LLVM 16:** Opaque pointers are enabled by default. Typed pointers are |
| supported on a best-effort basis only and not tested. |
| |
| **LLVM 17:** Only opaque pointers are supported. Typed pointers are not |
| supported. |
| |
| Transition State |
| ================ |
| |
| As of January 2023: |
| |
| Typed pointers are **not** supported on the ``main`` branch as a matter of |
| policy. Fixes for typed pointer support are not accepted. Typed pointer |
| support code may be removed without notice at any time. |
| |
| However, tests are still in the process of being converted to opaque pointers. |
| As such, care must be taken when actively removing typed pointer support, to |
| avoid breaking remaining tests. |
| |
| The following typed pointer functionality has already been removed: |
| |
| * The ``CLANG_ENABLE_OPAQUE_POINTERS`` cmake flag is no longer supported. |
| * C APIs that do not support opaque pointers (like ``LLVMBuildLoad``) are no |
| longer supported. |
| * Typed pointer IR and bitcode are implicitly upgraded to use opaque pointers, |
| unless ``-opaque-pointers=0`` is passed. |
| |
| The following typed pointer functionality is still to be removed: |
| |
| * The ``-no-opaque-pointers`` cc1 flag, the ``-opaque-pointers=0`` opt flag, and |
| the ``-plugin-opt=no-opaque-pointers`` LTO flag. |
| * Support for typed pointers in LLVM libraries. |