=====================================
Garbage Collection Safepoints in LLVM
=====================================

.. contents::
   :local:
   :depth: 2

Status
=======

This document describes a set of extensions to LLVM to support garbage
collection. By now, these mechanisms are well proven: a commercial Java
implementation with a fully relocating collector has shipped using them.
There are a couple of places where bugs might still linger; these are called
out below.

They are still listed as "experimental" to indicate that no forward or backward
compatibility guarantees are offered across versions. If your use case is such
that you need some form of forward compatibility guarantee, please raise the
issue on the llvm-dev mailing list.

LLVM still supports an alternate mechanism for conservative garbage collection
support using the ``gcroot`` intrinsic. The ``gcroot`` mechanism is mostly of
historical interest at this point with one exception - its implementation of
shadow stacks has been used successfully by a number of language frontends and
is still supported.

Overview & Core Concepts
========================

To collect dead objects, garbage collectors must be able to identify
any references to objects contained within executing code, and,
depending on the collector, potentially update them. The collector
does not need this information at all points in code - that would make
the problem much harder - but only at well-defined points in the
execution known as 'safepoints'. For most collectors, it is sufficient
to track at least one copy of each unique pointer value. However, for
a collector which wishes to relocate objects directly reachable from
running code, a higher standard is required.

One additional challenge is that the compiler may compute intermediate
results ("derived pointers") which point outside of the allocation or
even into the middle of another allocation. The eventual use of this
intermediate value must yield an address within the bounds of the
allocation, but such "exterior derived pointers" may be visible to the
collector. Given this, a garbage collector cannot safely rely on the
runtime value of an address to indicate the object it is associated
with. If the garbage collector wishes to move any object, the
compiler must provide a mapping, for each pointer, to an indication of
its allocation.

To simplify the interaction between a collector and the compiled code,
most garbage collectors are organized in terms of three abstractions:
load barriers, store barriers, and safepoints.

#. A load barrier is a bit of code executed immediately after the
   machine load instruction, but before any use of the value loaded.
   Depending on the collector, such a barrier may be needed for all
   loads, merely loads of a particular type (in the original source
   language), or none at all.

#. Analogously, a store barrier is a code fragment that runs
   immediately before the machine store instruction, but after the
   computation of the value stored. The most common use of a store
   barrier is to update a 'card table' in a generational garbage
   collector.

#. A safepoint is a location at which pointers visible to the compiled
   code (i.e. currently in registers or on the stack) are allowed to
   change. After the safepoint completes, the actual pointer value
   may differ, but the 'object' (as seen by the source language)
   pointed to will not.

   Note that the term 'safepoint' is somewhat overloaded. It refers to
   both the location at which the machine state is parsable and the
   coordination protocol involved in bringing application threads to a
   point at which the collector can safely use that information. The
   term "statepoint" as used in this document refers exclusively to the
   former.

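As a rough illustration of the store barrier described in the list above, the
following C++ sketch marks a card dirty before performing the store. It is a
toy model, not LLVM code: the ``Heap`` type, the 512-byte card size and the
one-byte-per-card table are all assumptions made for the example.

.. code-block:: cpp

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Assumed card size: 2^9 = 512 bytes of heap per card table entry.
  constexpr std::size_t kCardShift = 9;

  // Hypothetical heap model: an array of 64-bit words plus a card table.
  struct Heap {
    std::vector<std::uint64_t> words;
    std::vector<std::uint8_t> cards;  // 0 = clean, 1 = dirty

    explicit Heap(std::size_t num_words)
        : words(num_words, 0),
          cards(((num_words * sizeof(std::uint64_t)) >> kCardShift) + 1, 0) {}

    // The value to store has already been computed by the caller; the
    // barrier runs after that computation and before the store itself.
    void store(std::size_t index, std::uint64_t value) {
      assert(index < words.size());
      const std::size_t byte_offset = index * sizeof(std::uint64_t);
      cards[byte_offset >> kCardShift] = 1;  // store barrier: dirty the card
      words[index] = value;                  // the actual store
    }
  };

A generational collector built along these lines would only rescan the cards
marked dirty when looking for old-to-young references.
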
This document focuses on the last item - compiler support for
safepoints in generated code. We will assume that an outside
mechanism has decided where to place safepoints. From our
perspective, all safepoints will be function calls. To support
relocation of objects directly reachable from values in compiled code,
the collector must be able to:

#. identify every copy of a pointer (including copies introduced by
   the compiler itself) at the safepoint,
#. identify which object each pointer relates to, and
#. potentially update each of those copies.

This document describes the mechanism by which an LLVM based compiler
can provide this information to a language runtime/collector, and
ensure that all pointers can be read and updated if desired.

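To make the three requirements above concrete, here is a minimal C++ sketch of
what a collector might do with that information at a safepoint. The
``RootSlot`` record and the forwarding table are hypothetical and are not part
of LLVM. The sketch only handles pointers to the start of an object; derived
pointers need the extra base information discussed later in this document.

.. code-block:: cpp

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <unordered_map>
  #include <vector>

  using Address = std::uint64_t;

  // Hypothetical record: where one copy of a pointer lives at the safepoint
  // (a stack slot or a saved register, modelled here as a plain variable).
  struct RootSlot {
    Address *location;
  };

  // Rewrite every recorded copy using the collector's forwarding table
  // (old object address -> new object address). Returns the number of
  // copies that were updated.
  std::size_t update_roots(
      const std::vector<RootSlot> &roots,
      const std::unordered_map<Address, Address> &forwarding) {
    std::size_t updated = 0;
    for (const RootSlot &root : roots) {
      assert(root.location != nullptr);
      auto it = forwarding.find(*root.location);
      if (it != forwarding.end()) {
        *root.location = it->second;  // the object moved: patch this copy
        ++updated;
      }
    }
    return updated;
  }

If even one copy were missing from ``roots``, the compiled code could later
read a stale address, which is why every copy has to be reported.
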
Abstract Machine Model
^^^^^^^^^^^^^^^^^^^^^^^

At a high level, LLVM has been extended to support compiling to an abstract
machine which extends the actual target with a non-integral pointer type
suitable for representing a garbage collected reference to an object. In
particular, such non-integral pointer types have no defined mapping to an
integer representation. This semantic quirk allows the runtime to pick an
integer mapping for each point in the program, allowing relocations of objects
without visible effects.

This high level abstract machine model is used for most of the optimizer. As
a result, transform passes do not need to be extended to look through explicit
relocation sequences. Before starting code generation, we switch
representations to an explicit form. The exact location chosen for lowering
is an implementation detail.

Note that most of the value of the abstract machine model comes for collectors
which need to model potentially relocatable objects. For a compiler which
supports only a non-relocating collector, you may wish to consider starting
with the fully explicit form.

Warning: There is one currently known semantic hole in the definition of
non-integral pointers which has not been addressed upstream. To work around
this, you need to disable speculation of loads unless the memory type
(non-integral pointer vs anything else) is known to be unchanged. That is, it
is not safe to speculate a load if doing so causes a non-integral pointer value
to be loaded as any other type or vice versa. In practice, this restriction is
well isolated to isSafeToSpeculate in ValueTracking.cpp.

Explicit Representation
^^^^^^^^^^^^^^^^^^^^^^^

A frontend could directly generate this low level explicit form, but
doing so may inhibit optimization. Instead, it is recommended that
compilers with relocating collectors target the abstract machine model just
described.

The heart of the explicit approach is to construct (or rewrite) the IR in a
manner where the possible updates performed by the garbage collector are
explicitly visible in the IR. Doing so requires that we:

#. create a new SSA value for each potentially relocated pointer, and
   ensure that no uses of the original (non relocated) value are
   reachable after the safepoint,
#. specify the relocation in a way which is opaque to the compiler to
   ensure that the optimizer cannot introduce new uses of an
   unrelocated value after a statepoint. This prevents the optimizer
   from performing unsound optimizations.
#. record a mapping of live pointers (and the allocation they're
   associated with) for each statepoint.

At the most abstract level, inserting a safepoint can be thought of as
replacing a call instruction with a call to a multiple return value
function which calls the original target of the call, returns
its result, and returns updated values for any live pointers to
garbage collected objects.

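The "multiple return value function" view in the preceding paragraph can be
modelled in ordinary C++ as follows. This is only an analogy to aid intuition:
``statepoint``, ``StatepointResult`` and the ``relocate`` callback are names
invented for this sketch and do not correspond to any LLVM API.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <functional>
  #include <utility>
  #include <vector>

  using GcRef = std::uintptr_t;

  // What the conceptual safepoint hands back: the original call's result
  // plus a (possibly changed) value for every pointer that was live.
  template <typename R>
  struct StatepointResult {
    R call_result;
    std::vector<GcRef> relocated;
  };

  // Hypothetical model of a safepoint: run the wrapped call, then let the
  // "collector" (the relocate callback) pick a new value for each live
  // pointer. Callers must use only the returned values afterwards.
  template <typename R>
  StatepointResult<R> statepoint(const std::function<R()> &target,
                                 std::vector<GcRef> live,
                                 const std::function<GcRef(GcRef)> &relocate) {
    assert(target && relocate);
    R result = target();  // a collection may happen during this call
    for (GcRef &ref : live)
      ref = relocate(ref);
    return {std::move(result), std::move(live)};
  }

The important property is that the caller never touches its old pointer values
again after the call; it continues with the entries of ``relocated`` instead.
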
Note that the task of identifying all live pointers to garbage
collected values, transforming the IR to expose a pointer giving the
base object for every such live pointer, and inserting all the
intrinsics correctly is explicitly out of scope for this document.
The recommended approach is to use the :ref:`utility passes
<statepoint-utilities>` described below.

This abstract function call is concretely represented by a sequence of
intrinsic calls known collectively as a "statepoint relocation sequence".

Let's consider a simple call in LLVM IR:

.. code-block:: llvm

  declare void @foo()
  define ptr addrspace(1) @test1(ptr addrspace(1) %obj)
         gc "statepoint-example" {
    call void @foo()
    ret ptr addrspace(1) %obj
  }

Depending on our language we may need to allow a safepoint during the execution
of ``foo``. If so, we need to let the collector update local values in the
current frame. If we don't, we'll be accessing a potentially invalid reference
once we eventually return from the call.

In this example, we need to relocate the SSA value ``%obj``. Since we can't
actually change the value in the SSA value ``%obj``, we need to introduce a new
SSA value ``%obj.relocated`` which represents the potentially changed value of
``%obj`` after the safepoint and update any following uses appropriately. The
resulting relocation sequence is:

.. code-block:: llvm

  define ptr addrspace(1) @test1(ptr addrspace(1) %obj)
         gc "statepoint-example" {
    %safepoint = call token (i64, i32, ptr, i32, i32, ...) @llvm.experimental.gc.statepoint.p0(i64 0, i32 0, ptr elementtype(void ()) @foo, i32 0, i32 0, i32 0, i32 0) ["gc-live" (ptr addrspace(1) %obj)]
    %obj.relocated = call ptr addrspace(1) @llvm.experimental.gc.relocate.p1(token %safepoint, i32 0, i32 0)
    ret ptr addrspace(1) %obj.relocated
  }

Ideally, this sequence would have been represented as an M argument, N
return value function (where M is the number of values being
relocated + the original call arguments and N is the original return
value + each relocated value), but LLVM does not easily support such a
representation.

Instead, the statepoint intrinsic marks the actual site of the
safepoint or statepoint. The statepoint returns a token value (which
exists only at compile time). To get back the original return value
of the call, we use the ``gc.result`` intrinsic. To get the relocation
of each pointer in turn, we use the ``gc.relocate`` intrinsic with the
appropriate index. Note that both the ``gc.relocate`` and ``gc.result`` are
tied to the statepoint. The combination forms a "statepoint relocation
sequence" and represents the entirety of a parseable call or 'statepoint'.

When lowered, this example would generate the following x86 assembly:

.. code-block:: gas

  	  .globl	test1
  	  .align	16, 0x90
  	  pushq	%rax
  	  callq	foo
  .Ltmp1:
  	  movq	(%rsp), %rax  # This load is redundant (oops!)
  	  popq	%rdx
  	  retq

Each of the potentially relocated values has been spilled to the
stack, and that location has been recorded in the
:ref:`Stack Map section <stackmap-section>`. If the garbage collector
needs to update any of these pointers during the call, it knows
exactly what to change.

The relevant parts of the StackMap section for our example are:

.. code-block:: gas

  # This describes the call site
  # Stack Maps: callsite 2882400000
  	  .quad	2882400000
  	  .long	.Ltmp1-test1
  	  .short	0
  # .. 8 entries skipped ..
  # This entry describes the spill slot which is directly addressable
  # off RSP with offset 0. Given the value was spilled with a pushq,
  # that makes sense.
  # Stack Maps:   Loc 8: Direct RSP     [encoding: .byte 2, .byte 8, .short 7, .int 0]
  	  .byte	2
  	  .byte	8
  	  .short	7
  	  .long	0

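For readers writing a runtime-side parser, the following C++ sketch decodes a
single location entry laid out exactly as in the listing above: a one-byte
kind, a one-byte size, a two-byte DWARF register number and a four-byte signed
offset, all little-endian. The ``Location`` struct and ``decode_location``
function are hypothetical names. The Stack Map format documentation is the
authoritative description of the section, and other versions of the format may
lay this record out differently.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>

  // Fields of one location entry as shown in the listing (assumed layout).
  struct Location {
    std::uint8_t kind;        // 2 is "Direct" in the listing above
    std::uint8_t size;        // size of the value in bytes
    std::uint16_t dwarf_reg;  // DWARF register number (7 is RSP on x86-64)
    std::int32_t offset;      // signed offset from that register
  };

  // Decode one 8-byte little-endian record starting at `bytes`.
  Location decode_location(const std::uint8_t *bytes) {
    assert(bytes != nullptr);
    Location loc;
    loc.kind = bytes[0];
    loc.size = bytes[1];
    loc.dwarf_reg = static_cast<std::uint16_t>(bytes[2] | (bytes[3] << 8));
    const std::uint32_t raw = static_cast<std::uint32_t>(bytes[4]) |
                              (static_cast<std::uint32_t>(bytes[5]) << 8) |
                              (static_cast<std::uint32_t>(bytes[6]) << 16) |
                              (static_cast<std::uint32_t>(bytes[7]) << 24);
    loc.offset = static_cast<std::int32_t>(raw);
    return loc;
  }

Feeding it the bytes ``2, 8, 7, 0, 0, 0, 0, 0`` from the example yields a
Direct location of size 8, relative to DWARF register 7, at offset 0.
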
This example was taken from the tests for the :ref:`RewriteStatepointsForGC`
utility pass. As such, its full StackMap can be easily examined with the
following command.

.. code-block:: bash

  opt -passes=rewrite-statepoints-for-gc test/Transforms/RewriteStatepointsForGC/basics.ll -S | llc -debug-only=stackmaps

Simplifications for Non-Relocating GCs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some of the complexity in the previous example is unnecessary for a
non-relocating collector. While a non-relocating collector still needs the
information about which locations contain live references, it doesn't need to
represent explicit relocations. As such, the previously described explicit
lowering can be simplified to remove all of the ``gc.relocate`` intrinsic
calls and leave uses in terms of the original reference value.

Here's the explicit lowering for the previous example for a non-relocating
collector:

.. code-block:: llvm

  define void @manual_frame(ptr %a, ptr %b) gc "statepoint-example" {
    %alloca = alloca ptr
    %allocb = alloca ptr
    store ptr %a, ptr %alloca
    store ptr %b, ptr %allocb
    call token (i64, i32, ptr, i32, i32, ...) @llvm.experimental.gc.statepoint.p0(i64 0, i32 0, ptr elementtype(void ()) @func, i32 0, i32 0, i32 0, i32 0) ["gc-live" (ptr %alloca, ptr %allocb)]
    ret void
  }

Recording On Stack Regions
^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to the explicit relocation form previously described, the
statepoint infrastructure also allows the listing of allocas within the gc
pointer list. Allocas can be listed with or without additional explicit gc
pointer values and relocations.

An alloca in the gc region of the statepoint operand list will cause the
address of the stack region to be listed in the stackmap for the statepoint.

This mechanism can be used to describe explicit spill slots if desired. It
then becomes the generator's responsibility to ensure that values are
spilled/filled to/from the alloca as needed on either side of the safepoint.
Note that there is no way to indicate a corresponding base pointer for such
an explicitly specified spill slot, so usage is restricted to values for
which the associated collector can derive the object base from the pointer
itself.

This mechanism can be used to describe on-stack objects containing
references provided that the collector can map from the location on the
stack to a heap map describing the internal layout of the references the
collector needs to process.

WARNING: At the moment, this alternate form is not well exercised. It is
recommended to use this with caution and expect to have to fix a few bugs.
In particular, the RewriteStatepointsForGC utility pass does not do
anything for allocas today.

Base & Derived Pointers
^^^^^^^^^^^^^^^^^^^^^^^

A "base pointer" is one which points to the starting address of an allocation
(object). A "derived pointer" is one which is offset from a base pointer by
some amount. When relocating objects, a garbage collector needs to be able
to relocate each derived pointer associated with an allocation to the same
offset from the new address.

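The "same offset from the new address" rule in the paragraph above amounts to
a single line of arithmetic. The C++ sketch below spells it out; the
``LivePointer`` pair and ``relocate_pointer`` function are illustrative names
only, and addresses are modelled as plain 64-bit integers.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>

  using Address = std::uint64_t;

  // A derived pointer together with the base pointer of its allocation.
  struct LivePointer {
    Address base;
    Address derived;
  };

  // Move the pair to `new_base`, keeping the derived pointer at the same
  // offset from the base. Unsigned wraparound keeps the arithmetic correct
  // even when the derived pointer lies below the base.
  LivePointer relocate_pointer(const LivePointer &old, Address new_base) {
    assert(new_base != 0 && "relocation target must be a real address");
    const Address offset = old.derived - old.base;
    return LivePointer{new_base, new_base + offset};
  }

Note that the computation needs both values of the pair: the derived pointer
alone is not enough to know how far the object moved.
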
"Interior derived pointers" remain within the bounds of the allocation
they're associated with. As a result, the base object can be found at
runtime provided the bounds of allocations are known to the runtime system.

"Exterior derived pointers" are outside the bounds of the associated object;
they may even fall within *another* allocation's address range. As a result,
there is no way for a garbage collector to determine which allocation they
are associated with at runtime and compiler support is needed.

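The difference between the two kinds of derived pointer can be seen with a
small C++ sketch of a bounds-based lookup. It assumes a hypothetical runtime
table mapping each allocation's start address to its size; neither the table
nor ``find_containing_allocation`` is part of LLVM.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <map>

  using Address = std::uint64_t;

  // Hypothetical runtime table: allocation start address -> size in bytes.
  using AllocationTable = std::map<Address, std::uint64_t>;

  // Find the allocation whose address range contains `addr`. On success,
  // stores its start address in `*base_out` and returns true.
  bool find_containing_allocation(const AllocationTable &allocs, Address addr,
                                  Address *base_out) {
    assert(base_out != nullptr);
    auto it = allocs.upper_bound(addr);  // first allocation starting after addr
    if (it == allocs.begin())
      return false;                      // addr is below every allocation
    --it;                                // last allocation starting at or before addr
    if (addr - it->first >= it->second)
      return false;                      // addr is past the end of that allocation
    *base_out = it->first;
    return true;
  }

With two adjacent 32-byte objects at ``0x1000`` and ``0x1020``, the interior
pointer ``0x1010`` is correctly attributed to the first object. An exterior
pointer computed from the first object as ``0x1000 + 0x28`` is attributed to
the second object instead, which is the failure that the compiler-provided
base information avoids.
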
The ``gc.relocate`` intrinsic supports an explicit operand for describing the
allocation associated with a derived pointer. This operand is frequently
referred to as the base operand. Strictly speaking, it does not have to be
a base pointer, but it does need to lie within the bounds of the associated
allocation. Some collectors may require that the operand be an actual base
pointer rather than merely an internal derived pointer. Note that during
lowering both the base and derived pointer operands are required to be live
over the associated call safepoint even if the base is otherwise unused
afterwards.

.. _gc_transition_args:

GC Transitions
^^^^^^^^^^^^^^^^^^

As a practical consideration, many garbage-collected systems allow code that is
collector-aware ("managed code") to call code that is not collector-aware
("unmanaged code"). It is common that such calls must also be safepoints, since
it is desirable to allow the collector to run during the execution of
unmanaged code. Furthermore, it is common that coordinating the transition from
managed to unmanaged code requires extra code generation at the call site to
inform the collector of the transition. In order to support these needs, a
statepoint may be marked as a GC transition, and data that is necessary to
perform the transition (if any) may be provided as additional arguments to the
statepoint.

Note that although in many cases statepoints may be inferred to be GC
transitions based on the function symbols involved (e.g. a call from a
function with GC strategy "foo" to a function with GC strategy "bar"),
indirect calls that are also GC transitions must also be supported. This
requirement is the driving force behind the decision to require that GC
transitions are explicitly marked.

Let's revisit the sample given above, this time treating the call to ``@foo``
as a GC transition. Depending on our target, the transition code may need to
access some extra state in order to inform the collector of the transition.
Let's assume a hypothetical GC--somewhat unimaginatively named "hypothetical-gc"
--that requires that a TLS variable must be written to before and after a call
to unmanaged code. The resulting relocation sequence is:

.. code-block:: llvm

  @Flag = thread_local global i32 0, align 4

  define ptr addrspace(1) @test1(ptr addrspace(1) %obj)
         gc "hypothetical-gc" {

    %safepoint = call token (i64, i32, ptr, i32, i32, ...) @llvm.experimental.gc.statepoint.p0(i64 0, i32 0, ptr elementtype(void ()) @foo, i32 0, i32 1, i32 0, i32 0) ["gc-transition" (ptr @Flag), "gc-live" (ptr addrspace(1) %obj)]
    %obj.relocated = call ptr addrspace(1) @llvm.experimental.gc.relocate.p1(token %safepoint, i32 0, i32 0)
    ret ptr addrspace(1) %obj.relocated
  }

During lowering, this will result in an instruction selection DAG that looks
something like:

::

  CALLSEQ_START
  ...
  GC_TRANSITION_START (lowered i32 *@Flag), SRCVALUE i32* Flag
  STATEPOINT
  GC_TRANSITION_END (lowered i32 *@Flag), SRCVALUE i32 *Flag
  ...
  CALLSEQ_END

In order to generate the necessary transition code, the backend for each target
supported by "hypothetical-gc" must be modified to lower ``GC_TRANSITION_START``
and ``GC_TRANSITION_END`` nodes appropriately when the "hypothetical-gc"
strategy is in use for a particular function. Assuming that such lowering has
been added for X86, the generated assembly would be:

.. code-block:: gas

  	  .globl	test1
  	  .align	16, 0x90
  	  pushq	%rax
  	  movl $1, %fs:Flag@TPOFF
  	  callq	foo
  	  movl $0, %fs:Flag@TPOFF
  .Ltmp1:
  	  movq	(%rsp), %rax  # This load is redundant (oops!)
  	  popq	%rdx
  	  retq

Note that the design as presented above is not fully implemented: in particular,
strategy-specific lowering is not present, and all GC transitions are emitted as
a single no-op before and after the call instruction. These no-ops are often
removed by the backend during dead machine instruction elimination.

Yevgeny Rouban | 4d26f41 | 2021-05-27 09:01:55 +0700 | [diff] [blame] | 418 | Before the abstract machine model is lowered to the explicit statepoint model |
| 419 | of relocations by the :ref:`RewriteStatepointsForGC` pass it is possible for |
| 420 | any derived pointer to get its base pointer and offset from the base pointer |
| 421 | by using the ``gc.get.pointer.base`` and the ``gc.get.pointer.offset`` |
| 422 | intrinsics respectively. These intrinsics are inlined by the |
| 423 | :ref:`RewriteStatepointsForGC` pass and must not be used after this pass. |
| 424 | |
Philip Reames | 5017ab5 | 2015-02-26 01:18:21 +0000 | [diff] [blame] | 425 | |
.. _statepoint-stackmap-format:

Stack Map Format
================

Locations for each pointer value which may need to be read and/or updated by
the runtime or collector are provided in a separate section of the
generated object file as specified in the PatchPoint documentation.
This special section is encoded per the
:ref:`Stack Map format <stackmap-format>`.

The general expectation is that a JIT compiler will parse and discard this
format; it is not particularly memory efficient. If you need an alternate
format (e.g. for an ahead of time compiler), see discussion under
:ref:`open work items <OpenWork>` below.

Each statepoint generates the following Locations:

* Constant which describes the calling convention of the call target. This
  constant is a valid :ref:`calling convention identifier <callingconv>` for
  the version of LLVM used to generate the stackmap. No additional compatibility
  guarantees are made for this constant over what LLVM provides elsewhere w.r.t.
  these identifiers.
* Constant which describes the flags passed to the statepoint intrinsic.
* Constant which describes the number of following deopt *Locations* (not
  operands). Will be 0 if no "deopt" bundle is provided.
* Variable number of Locations, one for each deopt parameter listed in the
  "deopt" operand bundle. At the moment, only deopt parameters with a bitwidth
  of 64 bits or less are supported. Values of a type larger than 64 bits can be
  specified and reported only if a) the value is constant at the call site, and
  b) the constant can be represented with less than 64 bits (assuming zero
  extension to the original bitwidth).
* Variable number of relocation records, each of which consists of
  exactly two Locations. Relocation records are described in detail
  below.

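The list above can be read as a small grammar over the flat Location array of
one statepoint record. The following sketch splits such an array into its
parts. The ``Location`` struct is a deliberately simplified stand-in for the
real stack map Location (which also carries a kind, size, and register
number), and ``StatepointLocations`` and ``splitStatepointLocations`` are
hypothetical names, not part of LLVM.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <utility>
  #include <vector>

  // Simplified stand-in for a stack map Location.
  struct Location {
    bool isConstant;
    std::int64_t value;  // constant payload, or an opaque slot id in this sketch
  };

  struct StatepointLocations {
    std::int64_t callingConv;
    std::int64_t flags;
    std::vector<Location> deopt;
    std::vector<std::pair<Location, Location>> relocations;  // (base, derived)
  };

  // Splits the flat Location list of one statepoint record into its parts.
  // Returns std::nullopt if the list does not have the documented shape.
  std::optional<StatepointLocations>
  splitStatepointLocations(const std::vector<Location> &locs) {
    if (locs.size() < 3) return std::nullopt;
    for (std::size_t i = 0; i < 3; ++i)
      if (!locs[i].isConstant) return std::nullopt;
    if (locs[2].value < 0) return std::nullopt;

    StatepointLocations out;
    out.callingConv = locs[0].value;
    out.flags = locs[1].value;
    const std::size_t numDeopt = static_cast<std::size_t>(locs[2].value);

    const std::size_t remaining = locs.size() - 3;
    if (numDeopt > remaining) return std::nullopt;
    if ((remaining - numDeopt) % 2 != 0) return std::nullopt;  // pairs only

    std::size_t i = 3;
    for (std::size_t n = 0; n < numDeopt; ++n) out.deopt.push_back(locs[i++]);
    for (; i < locs.size(); i += 2)
      out.relocations.emplace_back(locs[i], locs[i + 1]);
    assert(i == locs.size());
    return out;
  }

A consumer would run this once per statepoint record and then walk
``relocations`` to find the pointers it has to update.
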
Each relocation record provides sufficient information for a collector to
relocate one or more derived pointers. Each record consists of a pair of
Locations. The second element in the record represents the pointer (or
pointers) which need to be updated. The first element in the record provides a
pointer to the base of the object with which the pointer(s) being relocated is
associated. This information is required for handling generalized derived
pointers since a pointer may be outside the bounds of the original allocation,
but still needs to be relocated with the allocation. Additionally:

* The base pointer is guaranteed to also appear explicitly as a relocation
  pair if it is used after the statepoint.
* There may be fewer relocation records than gc parameters in the IR
  statepoint. Each *unique* pair will occur at least once; duplicates
  are possible.
* The Locations within each record may either be of pointer size or a
  multiple of pointer size. In the latter case, the record must be
  interpreted as describing a sequence of pointers and their corresponding
  base pointers. If the Location is of size N x sizeof(pointer), then
  there will be N records of one pointer each contained within the Location.
  Both Locations in a pair can be assumed to be of the same size.

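To illustrate why the base pointer is needed, the sketch below updates the
derived pointers of one relocation record whose Locations hold N pointers
each. It is a model under stated assumptions, not collector code: the two
vectors stand for the memory described by the two Locations, and ``Forward``
and ``relocateRecord`` are hypothetical names. A real collector would also
have to read every old base value before overwriting any slot that doubles as
a base.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  using Address = std::uintptr_t;

  // Hypothetical callback: maps the old address of an object's base to the
  // address the collector moved it to.
  using Forward = Address (*)(Address oldBase);

  // Updates every derived pointer of one relocation record in place.
  void relocateRecord(const std::vector<Address> &bases,
                      std::vector<Address> &derived, Forward forward) {
    assert(bases.size() == derived.size());  // both Locations have equal size
    for (std::size_t i = 0; i < derived.size(); ++i) {
      // Unsigned wraparound keeps this correct even when the derived pointer
      // lies before the base or past the end of the allocation.
      const Address offset = derived[i] - bases[i];
      derived[i] = forward(bases[i]) + offset;
    }
  }

The offset is preserved across the move, so a derived pointer that pointed
eight bytes before its base still does so afterwards.
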
Note that the Locations used in each section may describe the same
physical location, e.g. a stack slot may appear as a deopt location,
a gc base pointer, and a gc derived pointer.

The LiveOut section of the StkMapRecord will be empty for a statepoint
record.

Safepoint Semantics & Verification
==================================

The fundamental correctness property for compiled code w.r.t. the garbage
collector is a dynamic one. It must be
the case that there is no dynamic trace such that an operation
involving a potentially relocated pointer is observably-after a
safepoint which could relocate it. 'observably-after' in this usage
means that an outside observer could observe this sequence of events
in a way which precludes the operation being performed before the
safepoint.

To understand why this 'observably-after' property is required,
consider a null comparison performed on the original copy of a
relocated pointer. Assuming that control flow follows the safepoint,
there is no way to observe externally whether the null comparison is
performed before or after the safepoint. (Remember, the original
Value is unmodified by the safepoint.) The compiler is free to make
either scheduling choice.

The actual correctness property implemented is slightly stronger than
this. We require that there be no *static path* on which a
potentially relocated pointer is 'observably-after' it may have been
relocated. This is slightly stronger than is strictly necessary (and
thus may disallow some otherwise valid programs), but greatly
simplifies reasoning about correctness of the compiled code.

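As a rough illustration of what a static check of this kind looks like, the
sketch below scans a straight-line instruction sequence and reports the first
use of a pointer that may have been relocated. It is deliberately more
conservative than the property described above, since it treats program order
as observable order and has no notion of control flow. The ``Op`` and ``Inst``
types and ``firstStaleUse`` are hypothetical and unrelated to LLVM's own
verifier.

.. code-block:: c++

  #include <cstddef>
  #include <optional>
  #include <set>
  #include <vector>

  enum class Op { DefPointer, Safepoint, Relocate, Use };

  struct Inst {
    Op op;
    int value;  // id defined (DefPointer, Relocate) or used (Use);
                // ignored for Safepoint
  };

  // Returns the index of the first use of a possibly stale pointer, if any.
  std::optional<std::size_t> firstStaleUse(const std::vector<Inst> &code) {
    std::set<int> valid;  // pointer ids that are safe to use right now
    for (std::size_t i = 0; i < code.size(); ++i) {
      switch (code[i].op) {
      case Op::DefPointer:
      case Op::Relocate:
        valid.insert(code[i].value);
        break;
      case Op::Safepoint:
        valid.clear();  // every pointer live here may have been moved
        break;
      case Op::Use:
        if (valid.count(code[i].value) == 0) return i;
        break;
      }
    }
    return std::nullopt;
  }

A sequence that uses only the relocated value after the safepoint passes; one
that touches the original value after the safepoint is flagged.
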
By construction, this property will be upheld by the optimizer if
correctly established in the source IR. This is a key invariant of
the design.

The existing IR Verifier pass has been extended to check most of the
local restrictions on the intrinsics mentioned in their respective
documentation. The current implementation in LLVM does not check the
key relocation invariant, but there is ongoing work on developing such
a verifier. Please ask on llvm-dev if you're interested in
experimenting with the current version.

.. _statepoint-utilities:

Utility Passes for Safepoint Insertion
======================================

.. _RewriteStatepointsForGC:

RewriteStatepointsForGC
^^^^^^^^^^^^^^^^^^^^^^^^

The pass RewriteStatepointsForGC transforms a function's IR to lower from the
abstract machine model described above to the explicit statepoint model of
relocations. To do this, it replaces all calls or invokes of functions which
might contain a safepoint poll with a ``gc.statepoint`` and associated full
relocation sequence, including all required ``gc.relocates``.

This pass only applies to GCStrategy instances where the ``UseRS4GC`` flag
is set. The two builtin GC strategies with this set are the
"statepoint-example" and "coreclr" strategies.

As an example, given this code:

.. code-block:: llvm

  define ptr addrspace(1) @test1(ptr addrspace(1) %obj)
         gc "statepoint-example" {
    call void @foo()
    ret ptr addrspace(1) %obj
  }

The pass would produce this IR:

.. code-block:: llvm

  define ptr addrspace(1) @test1(ptr addrspace(1) %obj) gc "statepoint-example" {
    %statepoint_token = call token (i64, i32, ptr, i32, i32, ...) @llvm.experimental.gc.statepoint.p0(i64 2882400000, i32 0, ptr elementtype(void ()) @foo, i32 0, i32 0, i32 0, i32 0) [ "gc-live"(ptr addrspace(1) %obj) ]
    %obj.relocated = call coldcc ptr addrspace(1) @llvm.experimental.gc.relocate.p1(token %statepoint_token, i32 0, i32 0) ; (%obj, %obj)
    ret ptr addrspace(1) %obj.relocated
  }

In the above examples, the addrspace(1) marker on the pointers is the mechanism
that the ``statepoint-example`` GC strategy uses to distinguish references from
non-references. This is controlled via ``GCStrategy::isGCManagedPointer``. The
``statepoint-example`` and ``coreclr`` strategies (the only two default
strategies that support statepoints) both use addrspace(1) to determine which
pointers are references; however, custom strategies don't have to follow this
convention.

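The following sketch shows the shape of such a predicate without depending on
LLVM headers. All types here (``ToyPointerType``, ``ToyStrategy``,
``AddrSpaceOneStrategy``, ``CustomStrategy``) are hypothetical stand-ins and
do not reproduce the signature of ``GCStrategy::isGCManagedPointer``; the
point is only that the address-space rule is one policy among many.

.. code-block:: c++

  #include <set>
  #include <utility>

  // Toy stand-in for a pointer type; only the address space matters here.
  struct ToyPointerType {
    unsigned addressSpace;
  };

  struct ToyStrategy {
    virtual ~ToyStrategy() = default;
    // True if pointers of this type are references managed by the collector.
    virtual bool isManaged(const ToyPointerType &ty) const = 0;
  };

  // The convention described above: addrspace(1) pointers are references.
  struct AddrSpaceOneStrategy final : ToyStrategy {
    bool isManaged(const ToyPointerType &ty) const override {
      return ty.addressSpace == 1;
    }
  };

  // A custom strategy is free to pick a different rule, here a configurable
  // set of address spaces.
  struct CustomStrategy final : ToyStrategy {
    explicit CustomStrategy(std::set<unsigned> spaces)
        : managed(std::move(spaces)) {}
    bool isManaged(const ToyPointerType &ty) const override {
      return managed.count(ty.addressSpace) != 0;
    }
    std::set<unsigned> managed;
  };
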
This pass can be used as a utility function by a language frontend that doesn't
want to manually reason about liveness, base pointers, or relocation when
constructing IR. As currently implemented, RewriteStatepointsForGC must be
run after SSA construction (i.e. mem2reg).

RewriteStatepointsForGC will ensure that appropriate base pointers are listed
for every relocation created. It will do so by duplicating code as needed to
propagate the base pointer associated with each pointer being relocated to
the appropriate safepoints. The implementation assumes that the following
IR constructs produce base pointers: loads from the heap, addresses of global
variables, function arguments, function return values. Constant pointers (such
as null) are also assumed to be base pointers. In practice, this constraint
can be relaxed to producing interior derived pointers provided the target
collector can find the associated allocation from an arbitrary interior
derived pointer.

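The base pointer assumptions above can be sketched as a walk over def-use
edges. This toy model covers only straight chains of address computations and
casts; the code duplication the pass performs to carry base pointers through
more complex value flow is out of scope. ``Kind``, ``Value``,
``isAssumedBase``, and ``findBase`` are hypothetical names chosen for this
example.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <vector>

  enum class Kind { Argument, Load, CallResult, Global, NullConstant, GEP, Cast };

  struct Value {
    Kind kind;
    int operand;  // index of the pointer operand for GEP and Cast; -1 otherwise
  };

  // Everything except an address computation or a cast is assumed to
  // produce a base pointer, matching the list in the text above.
  bool isAssumedBase(Kind k) { return k != Kind::GEP && k != Kind::Cast; }

  // Walks from value `v` through GEPs and casts to its assumed base.
  int findBase(const std::vector<Value> &values, int v) {
    assert(v >= 0 && static_cast<std::size_t>(v) < values.size());
    while (!isAssumedBase(values[static_cast<std::size_t>(v)].kind)) {
      const int next = values[static_cast<std::size_t>(v)].operand;
      assert(next >= 0 && next < v);  // definitions precede their uses
      v = next;
    }
    return v;
  }

For a cast of a GEP of a function argument, ``findBase`` returns the index of
the argument.
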
By default RewriteStatepointsForGC passes in ``0xABCDEF00`` as the statepoint
ID and ``0`` as the number of patchable bytes to the newly constructed
``gc.statepoint``. These values can be configured on a per-callsite
basis using the attributes ``"statepoint-id"`` and
``"statepoint-num-patch-bytes"``. If a call site is marked with a
``"statepoint-id"`` function attribute and its value is a positive
integer (represented as a string), then that value is used as the ID
of the newly constructed ``gc.statepoint``. If a call site is marked
with a ``"statepoint-num-patch-bytes"`` function attribute and its
value is a positive integer, then that value is used as the 'num patch
bytes' parameter of the newly constructed ``gc.statepoint``. The
``"statepoint-id"`` and ``"statepoint-num-patch-bytes"`` attributes
are not propagated to the ``gc.statepoint`` call or invoke if they
could be successfully parsed.

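The defaulting rule can be sketched as follows. The attribute map, the
``StatepointParams`` struct, and both function names are hypothetical; the
sketch also assumes that the value is written in decimal, that zero is
accepted, and that the patch byte count fits in 32 bits, none of which is
stated above.

.. code-block:: c++

  #include <cstdint>
  #include <map>
  #include <optional>
  #include <string>

  constexpr std::uint64_t kDefaultStatepointId = 0xABCDEF00;
  constexpr std::uint32_t kDefaultNumPatchBytes = 0;

  // Parses a non-empty string of decimal digits that fits in 64 bits.
  std::optional<std::uint64_t> parseDecimal(const std::string &s) {
    if (s.empty()) return std::nullopt;
    std::uint64_t result = 0;
    for (char c : s) {
      if (c < '0' || c > '9') return std::nullopt;
      const std::uint64_t digit = static_cast<std::uint64_t>(c - '0');
      if (result > (UINT64_MAX - digit) / 10) return std::nullopt;  // overflow
      result = result * 10 + digit;
    }
    return result;
  }

  using Attributes = std::map<std::string, std::string>;

  struct StatepointParams {
    std::uint64_t id;
    std::uint32_t numPatchBytes;
  };

  // Uses the attribute value when it is present and parses; else the default.
  StatepointParams paramsForCallSite(const Attributes &attrs) {
    StatepointParams p{kDefaultStatepointId, kDefaultNumPatchBytes};
    if (auto it = attrs.find("statepoint-id"); it != attrs.end()) {
      if (auto v = parseDecimal(it->second)) p.id = *v;
    }
    if (auto it = attrs.find("statepoint-num-patch-bytes"); it != attrs.end()) {
      if (auto v = parseDecimal(it->second); v && *v <= UINT32_MAX)
        p.numPatchBytes = static_cast<std::uint32_t>(*v);
    }
    return p;
  }

An attribute value that does not parse leaves the default in place, which
matches the "if they could be successfully parsed" condition above.
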
In practice, RewriteStatepointsForGC should be run much later in the pass
pipeline, after most optimization is already done. This helps to improve
the quality of the generated code when compiled with garbage collection support.

.. _RewriteStatepointsForGC_intrinsic_lowering:

RewriteStatepointsForGC intrinsic lowering
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As a part of lowering to the explicit model of relocations
RewriteStatepointsForGC performs GC specific lowering for the following
intrinsics:

* ``gc.get.pointer.base``
* ``gc.get.pointer.offset``
* ``llvm.memcpy.element.unordered.atomic.*``
* ``llvm.memmove.element.unordered.atomic.*``

There are two possible lowerings for the memcpy and memmove operations:
GC leaf lowering and GC parseable lowering. If a call is explicitly marked with
the "gc-leaf-function" attribute, the call is lowered to a GC leaf call to the
'``__llvm_memcpy_element_unordered_atomic_*``' or
'``__llvm_memmove_element_unordered_atomic_*``' symbol. Such a call cannot
take a safepoint. Otherwise, the call is made GC parseable by wrapping the
call into a statepoint. This makes it possible to take a safepoint during the
copy operation. Note that a GC parseable copy operation is not required to
take a safepoint. For example, a short copy operation may be performed without
taking a safepoint.

GC parseable calls to the '``llvm.memcpy.element.unordered.atomic.*``' and
'``llvm.memmove.element.unordered.atomic.*``' intrinsics are lowered to calls
to the '``__llvm_memcpy_element_unordered_atomic_safepoint_*``' and
'``__llvm_memmove_element_unordered_atomic_safepoint_*``' symbols respectively.
This way the runtime can provide implementations of copy operations with and
without safepoints.

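The choice between the two families of symbols can be summarized in a small
helper. This is a sketch only: ``CopyKind`` and ``loweredSymbol`` are
hypothetical names, and the helper assumes that the ``*`` suffix is the
element size in bytes and that the sizes are limited to 1, 2, 4, 8, and 16.

.. code-block:: c++

  #include <cassert>
  #include <string>

  enum class CopyKind { Memcpy, Memmove };

  // Returns the runtime symbol a copy intrinsic call would be lowered to.
  // `isGCLeaf` models the presence of the "gc-leaf-function" attribute.
  std::string loweredSymbol(CopyKind kind, bool isGCLeaf, unsigned elementSize) {
    // Assumed set of element sizes; see the lead-in.
    assert(elementSize == 1 || elementSize == 2 || elementSize == 4 ||
           elementSize == 8 || elementSize == 16);
    std::string name = kind == CopyKind::Memcpy
                           ? "__llvm_memcpy_element_unordered_atomic_"
                           : "__llvm_memmove_element_unordered_atomic_";
    if (!isGCLeaf) name += "safepoint_";  // GC parseable variant
    return name + std::to_string(elementSize);
  }
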
GC parseable lowering also involves adjusting the arguments for the call.
Memcpy and memmove intrinsics take derived pointers as source and destination
arguments. If a copy operation takes a safepoint it might need to relocate the
underlying source and destination objects. This requires the corresponding base
pointers to be available in the copy operation. In order to make the base
pointers available RewriteStatepointsForGC replaces derived pointers with base
pointer and offset pairs. For example:

.. code-block:: llvm

  declare void @__llvm_memcpy_element_unordered_atomic_safepoint_1(
    i8 addrspace(1)* %dest_base, i64 %dest_offset,
    i8 addrspace(1)* %src_base, i64 %src_offset,
    i64 %length)

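To show why base and offset pairs help, here is a sketch of how a runtime
might structure such a copy routine. It is not the implementation any runtime
uses: ``PollFn`` and ``copyWithSafepoints`` are hypothetical, the base
pointers are passed by reference to model slots the collector can update, and
a plain ``memcpy`` stands in for the element-wise unordered atomic copy.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstring>

  // Hypothetical safepoint poll; it may move either object and must then
  // update the corresponding base pointer.
  using PollFn = void (*)(unsigned char *&destBase, unsigned char *&srcBase);

  // Copies `length` bytes in chunks, polling for a safepoint between chunks.
  void copyWithSafepoints(unsigned char *&destBase, std::size_t destOffset,
                          unsigned char *&srcBase, std::size_t srcOffset,
                          std::size_t length, std::size_t chunk, PollFn poll) {
    assert(chunk > 0);
    std::size_t done = 0;
    while (done < length) {
      const std::size_t n = length - done < chunk ? length - done : chunk;
      // Re-derive both pointers from base + offset on every iteration, since
      // the previous poll may have moved the underlying objects.
      std::memcpy(destBase + destOffset + done, srcBase + srcOffset + done, n);
      done += n;
      if (done < length) poll(destBase, srcBase);
    }
  }

Had the routine received only derived pointers, it would have no way to
recompute them after a poll moved the source or destination object.
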
.. _PlaceSafepoints:

PlaceSafepoints
^^^^^^^^^^^^^^^^

The pass PlaceSafepoints inserts safepoint polls sufficient to ensure running
code checks for a safepoint request in a timely manner. This pass is expected
to be run before RewriteStatepointsForGC and thus does not produce full
relocation sequences.

As an example, given input IR of the following:

.. code-block:: llvm

  define void @test() gc "statepoint-example" {
    call void @foo()
    ret void
  }

  declare void @do_safepoint()
  define void @gc.safepoint_poll() {
    call void @do_safepoint()
    ret void
  }


This pass would produce the following IR:

.. code-block:: llvm

  define void @test() gc "statepoint-example" {
    call void @do_safepoint()
    call void @foo()
    ret void
  }

In this case, we've added an (unconditional) entry safepoint poll. Note that
despite appearances, the entry poll is not necessarily redundant. We'd have to
know that ``foo`` and ``test`` were not mutually recursive for the poll to be
redundant. In practice, you'd probably want your poll definition to contain
a conditional branch of some form.

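One common shape for such a conditional poll, sketched in C++ rather than IR,
is a cheap flag check guarding the slow path. The flag, the counter, and the
memory ordering used here are assumptions made for illustration; a real
runtime might instead poll a per-thread word or rely on a page protection
trick.

.. code-block:: c++

  #include <atomic>

  // Set by the collector when it wants all threads to reach a safepoint.
  std::atomic<bool> SafepointRequested{false};

  // Counter kept only so that the behavior of the sketch can be observed.
  int SafepointsTaken = 0;

  // Slow path: in a real runtime this would block until the collection is done.
  void do_safepoint() { ++SafepointsTaken; }

  // Poll body with a conditional branch: the fast path is a single load
  // followed by a branch that is normally not taken.
  inline void safepoint_poll() {
    if (SafepointRequested.load(std::memory_order_acquire))
      do_safepoint();
  }

With a body of this shape inlined at every poll site, the common case costs a
load and a branch instead of a call.
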
At the moment, PlaceSafepoints can insert safepoint polls at method entry and
loop backedge locations. Extending this to work with return polls would be
straightforward if desired.

PlaceSafepoints includes a number of optimizations to avoid placing safepoint
polls at particular sites unless needed to ensure timely execution of a poll
under normal conditions. PlaceSafepoints does not attempt to ensure timely
execution of a poll under worst case conditions such as heavy system paging.

The implementation of a safepoint poll action is specified by looking up a
function of the name ``gc.safepoint_poll`` in the containing Module. The body
of this function is inserted at each poll site desired. While calls or invokes
inside this method are transformed into ``gc.statepoint`` calls, recursive poll
insertion is not performed.

This pass is useful for any language frontend which only has to support
garbage collection semantics at safepoints. If you need other abstract
frame information at safepoints (e.g. for deoptimization or introspection),
you can insert safepoint polls in the frontend. If you have the latter case,
please ask on llvm-dev for suggestions. There's been a good amount of work
done on making such a scheme work well in practice which is not yet documented
here.


Supported Architectures
=======================

Support for statepoint generation requires some code for each backend.
Today, only AArch64 and X86_64 are supported.

.. _OpenWork:

Limitations and Half Baked Ideas
================================

Mixing References and Raw Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Support for languages which allow unmanaged pointers to garbage collected
objects (i.e. pass a pointer to an object to a C routine) is missing from the
abstract machine model. At the moment, the best idea on how to approach this
involves an intrinsic or opaque function which hides the connection between
the reference value and the raw pointer. The problem is that having a
ptrtoint or inttoptr cast (which is common for such use cases) breaks the
rules used for inferring base pointers for arbitrary references when
lowering out of the abstract model to the explicit physical model. Note
that a frontend which lowers directly to the physical model doesn't have
any problems here.

Objects on the Stack
^^^^^^^^^^^^^^^^^^^^

As noted above, the explicit lowering supports objects allocated on the
stack provided the collector can find a heap map given the stack address.

The missing pieces are a) integration with rewriting (RS4GC) from the
abstract machine model and b) support for optionally decomposing on stack
objects so as not to require heap maps for them. The latter is required
for ease of integration with some collectors.

Lowering Quality and Representation Overhead
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The current statepoint lowering is known to be somewhat poor. In the very
long term, we'd like to integrate statepoints with the register allocator;
in the near term this is unlikely to happen. We've found the quality of
lowering to be relatively unimportant as hot statepoints are almost always
inliner bugs.

Concerns have been raised that the statepoint representation results in a
large amount of IR being produced for some examples and that this
contributes to higher than expected memory usage and compile times. There are
no immediate plans to make changes due to this, but alternate models may be
explored in the future.

Relocations Along Exceptional Edges
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Relocations along exceptional paths are currently broken in ToT. In
particular, there is currently no way to represent a rethrow on a path which
also has relocations. See `this llvm-dev discussion
<https://groups.google.com/forum/#!topic/llvm-dev/AE417XjgxvI>`_ for more
detail.

Bugs and Enhancements
=====================

Currently known bugs and enhancements under consideration can be
tracked by performing a `bugzilla search
<https://bugs.llvm.org/buglist.cgi?cmdtype=runnamed&namedcmd=Statepoint%20Bugs&list_id=64342>`_
for [Statepoint] in the summary field. When filing new bugs, please
use this tag so that interested parties see the newly filed bug. As
with most LLVM features, design discussions take place on the
`Discourse forums <https://discourse.llvm.org>`_ and patches
should be sent to `llvm-commits
<http://lists.llvm.org/mailman/listinfo/llvm-commits>`_ for review.