
 ===================================
 Stack maps and patch points in LLVM
 ===================================

 .. contents::
    :local:
    :depth: 2

 Definitions
 ===========

 In this document we refer to the "runtime" collectively as all
 components that serve as the LLVM client, including the LLVM IR
 generator, object code consumer, and code patcher.

 A stack map records the location of ``live values`` at a particular
 instruction address. These ``live values`` do not refer to all the
 LLVM values live across the stack map. Instead, they are only the
 values that the runtime requires to be live at this point. For
 example, they may be the values the runtime will need to resume
 program execution at that point independent of the compiled function
 containing the stack map.

 LLVM emits stack map data into the object code within a designated
 :ref:`stackmap-section`. This stack map data contains a record for
 each stack map. The record stores the stack map's instruction address
 and contains an entry for each mapped value. Each entry encodes a
 value's location as a register, stack offset, or constant.

 A patch point is an instruction address at which space is reserved for
 patching a new instruction sequence at run time. In LLVM IR, patch
 points look much like calls. They take arguments that follow a calling
 convention and may return a value. They also imply stack map
 generation, which allows the runtime to locate the patchpoint and
 find the location of ``live values`` at that point.

 Motivation
 ==========

 This functionality is currently experimental but is potentially useful
 in a variety of settings, the most obvious being a runtime (JIT)
 compiler. Example applications of the patchpoint intrinsics are
 implementing an inline call cache for polymorphic method dispatch or
 optimizing the retrieval of properties in dynamically typed languages
 such as JavaScript.

 The intrinsics documented here are currently used by the JavaScript
 compiler within the open source WebKit project (see the `FTL JIT
 <https://trac.webkit.org/wiki/FTLJIT>`_), but they are designed to be
 used whenever stack maps or code patching are needed. Because the
 intrinsics have experimental status, compatibility across LLVM
 releases is not guaranteed.

 The stack map functionality described in this document is separate
 from the functionality described in
 :ref:`stack-map`. ``GCFunctionMetadata`` provides the location of
 pointers into a collected heap captured by the ``llvm.gcroot`` intrinsic,
 which can also be considered a "stack map". Unlike the stack maps
 defined above, the ``GCFunctionMetadata`` stack map interface does not
 provide a way to associate live register values of arbitrary type with
 an instruction address, nor does it specify a format for the resulting
 stack map. The stack maps described here could potentially provide
 richer information to a garbage collecting runtime, but that usage
 will not be discussed in this document.

 Intrinsics
 ==========

 The following two kinds of intrinsics can be used to implement stack
 maps and patch points: ``llvm.experimental.stackmap`` and
 ``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
 stack map record, and they both allow some form of code patching. They
 can be used independently (i.e. ``llvm.experimental.patchpoint``
 implicitly generates a stack map without the need for an additional
 call to ``llvm.experimental.stackmap``). The choice of which to use
 depends on whether it is necessary to reserve space for code patching
 and whether any of the intrinsic arguments should be lowered according
 to calling conventions. ``llvm.experimental.stackmap`` does not
 reserve any space, nor does it expect any call arguments. If the
 runtime patches code at the stack map's address, it will destructively
 overwrite the program text. This is unlike
 ``llvm.experimental.patchpoint``, which reserves space for in-place
 patching without overwriting surrounding code. The
 ``llvm.experimental.patchpoint`` intrinsic also lowers a specified
 number of arguments according to its calling convention. This allows
 patched code to make in-place function calls without marshaling.

 Each instance of one of these intrinsics generates a stack map record
 in the :ref:`stackmap-section`. The record includes an ID, allowing
 the runtime to uniquely identify the stack map, and the offset within
 the code from the beginning of the enclosing function.

 '``llvm.experimental.stackmap``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax:
 """""""

 ::

       declare void
         @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

 Overview:
 """""""""

 The '``llvm.experimental.stackmap``' intrinsic records the location of
 specified values in the stack map without generating any code.

 Operands:
 """""""""

 The first operand is an ID to be encoded within the stack map. The
 second operand is the number of shadow bytes following the
 intrinsic. The variable number of operands that follow are the ``live
 values`` for which locations will be recorded in the stack map.

 To use this intrinsic as a bare-bones stack map, with no code patching
 support, the number of shadow bytes can be set to zero.

 Semantics:
 """"""""""

 The stack map intrinsic generates no code in place, unless nops are
 needed to cover its shadow (see below). However, its offset from
 function entry is stored in the stack map. This is the relative
 instruction address immediately following the instructions that
 precede the stack map.

 The stack map ID allows a runtime to locate the desired stack map
 record. LLVM passes this ID through directly to the stack map
 record without checking uniqueness.
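
 Because uniqueness is not checked, a runtime that looks stack map
 records up by ID has to decide what a duplicate means. A minimal C++
 sketch, assuming a hypothetical trimmed-down ``RecordInfo`` struct and a
 hypothetical ``indexById`` helper (neither is an LLVM API), that keeps
 the first record for each ID and reports whether any ID repeats:

 .. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <unordered_map>
   #include <vector>

   // Hypothetical, trimmed-down view of a parsed record (ID and offset only).
   struct RecordInfo {
     uint64_t id;          // the <id> operand, copied through unchanged
     uint32_t instOffset;  // offset from the start of the enclosing function
   };

   // Builds an ID -> record-index map. LLVM does not check that IDs are
   // unique, so this sketch keeps the first record for each ID and returns
   // false if it saw any duplicate, leaving the policy to the caller.
   bool indexById(const std::vector<RecordInfo>& records,
                  std::unordered_map<uint64_t, std::size_t>& index) {
     bool unique = true;
     for (std::size_t i = 0; i < records.size(); ++i) {
       // emplace() leaves an existing entry untouched and reports false.
       if (!index.emplace(records[i].id, i).second) unique = false;
     }
     return unique;
   }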

 LLVM guarantees a shadow of instructions following the stack map's
 instruction offset during which neither the end of the basic block nor
 another call to ``llvm.experimental.stackmap`` or
 ``llvm.experimental.patchpoint`` may occur. This allows the runtime to
 patch the code at this point in response to an event triggered from
 outside the code. The code for instructions following the stack map
 may be emitted in the stack map's shadow, and these instructions may
 be overwritten by destructive patching. Without shadow bytes, this
 destructive patching could overwrite program text or data outside the
 current function. We disallow overlapping stack map shadows so that
 the runtime does not need to consider this corner case.

 For example, a stack map with an 8-byte shadow:

 .. code-block:: llvm

   call void @runtime()
   call void (i64, i32, ...) @llvm.experimental.stackmap(i64 77, i32 8,
                                                         ptr %ptr)
   %val = load i64, ptr %ptr
   %add = add i64 %val, 3
   ret i64 %add

 May require one byte of nop-padding:

 .. code-block:: none

   0x00 callq _runtime
   0x05 nop                <--- stack map address
   0x06 movq (%rdi), %rax
   0x07 addq $3, %rax
   0x0a popq %rdx
   0x0b ret                <---- end of 8-byte shadow

 Now, if the runtime needs to invalidate the compiled code, it may
 patch 8 bytes of code at the stack map's address as follows:

 .. code-block:: none

   0x00 callq _runtime
   0x05 movl  $0xffff, %eax <--- patched code at stack map address
   0x0a callq *%rax
   0x0c nop                 <---- end of 8-byte shadow

 This way, after the normal call to the runtime returns, the code will
 execute a patched call to a special entry point that can rebuild a
 stack frame from the values located by the stack map.
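
 The patch bytes themselves are easy to produce. The following C++
 sketch is illustrative only: ``buildInvalidationPatch`` is a
 hypothetical helper, it targets x86-64, and it assumes the special entry
 point has an address that fits in 32 bits, so that ``movl $imm32, %eax``
 (which zero-extends into ``%rax``) can load it before ``callq *%rax``.
 A trailing ``nop`` fills the eighth byte. Copying the bytes into
 executable memory, and coordinating with threads that may be executing
 the code, remain the runtime's responsibility.

 .. code-block:: cpp

   #include <array>
   #include <cassert>
   #include <cstdint>

   // Builds an 8-byte x86-64 invalidation patch for a stack map shadow.
   // Assumes the entry point's address fits in 32 bits, because
   // `movl $imm32, %eax` zero-extends into %rax.
   std::array<uint8_t, 8> buildInvalidationPatch(uint32_t entryPoint) {
     std::array<uint8_t, 8> patch{};
     patch[0] = 0xB8;             // movl $imm32, %eax
     for (int i = 0; i < 4; ++i)  // imm32, little-endian
       patch[1 + i] = static_cast<uint8_t>(entryPoint >> (8 * i));
     patch[5] = 0xFF;             // callq *%rax: opcode FF /2 ...
     patch[6] = 0xD0;             // ... with a ModRM byte selecting %rax
     patch[7] = 0x90;             // nop: fills the last byte of the shadow
     return patch;
   }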

 '``llvm.experimental.patchpoint.*``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax:
 """""""

 ::

       declare void
         @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                            ptr <target>, i32 <numArgs>, ...)
       declare i64
         @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                           ptr <target>, i32 <numArgs>, ...)

 Overview:
 """""""""

 The '``llvm.experimental.patchpoint.*``' intrinsics create a function
 call to the specified ``<target>`` and record the location of specified
 values in the stack map.

 Operands:
 """""""""

 The first operand is an ID, the second operand is the number of bytes
 reserved for the patchable region, the third operand is the target
 address of a function (optionally null), and the fourth operand
 specifies how many of the following variable operands are considered
 function call arguments. The remaining variable number of operands are
 the ``live values`` for which locations will be recorded in the stack
 map.

 Semantics:
 """"""""""

 The patch point intrinsic generates a stack map. It also emits a
 function call to the address specified by ``<target>`` if the address
 is not a constant null. The function call and its arguments are
 lowered according to the calling convention specified at the
 intrinsic's callsite. Variants of the intrinsic with non-void return
 type also return a value according to calling convention.

 On PowerPC, note that ``<target>`` must be the ABI function pointer for the
 intended target of the indirect call. Specifically, when compiling for the
 ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
 the C/C++ function-pointer representation.

 Requesting zero patch point arguments is valid. In this case, all
 variable operands are handled just like
 ``llvm.experimental.stackmap``. The difference is that space will
 still be reserved for patching, a call will be emitted, and a return
 value is allowed.

 The locations of the arguments are not normally recorded in the stack
 map because they are already fixed by the calling convention. The
 remaining ``live values`` will have their location recorded, which
 could be a register, stack location, or constant. A special calling
 convention has been introduced for use with stack maps, ``anyregcc``,
 which forces the arguments to be loaded into registers but allows
 those registers to be dynamically allocated. These argument registers
 will have their register locations recorded in the stack map in
 addition to the remaining ``live values``.

 The patch point also emits nops to cover at least ``<numBytes>`` of
 instruction encoding space. Hence, the client must ensure that
 ``<numBytes>`` is enough to encode a call to the target address on the
 supported targets. If the call target is constant null, then there is
 no minimum requirement. A zero-byte null target patchpoint is
 valid.
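
 To make the size requirement concrete, the C++ sketch below encodes
 the x86-64 sequence used in the example later in this section
 (``movabsq`` into ``%r11`` followed by ``callq *%r11``), which takes 13
 bytes. ``encodeR11Call`` and ``checkReservedBytes`` are hypothetical
 helpers, and LLVM is free to choose a different call sequence, so a real
 client should derive its minimum from the LLVM version and target it
 uses rather than from this sketch.

 .. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <stdexcept>
   #include <vector>

   // Encodes one possible x86-64 call sequence for a patch point target:
   //   movabsq $target, %r11   ; 49 BB imm64  (10 bytes)
   //   callq   *%r11           ; 41 FF D3     (3 bytes)
   std::vector<uint8_t> encodeR11Call(uint64_t target) {
     std::vector<uint8_t> code = {0x49, 0xBB};  // REX.W+B, mov imm64 into %r11
     for (int i = 0; i < 8; ++i)                // imm64, little-endian
       code.push_back(static_cast<uint8_t>(target >> (8 * i)));
     code.insert(code.end(), {0x41, 0xFF, 0xD3});  // REX.B, call *%r11
     return code;
   }

   // Rejects a <numBytes> value too small for a call to a non-null target.
   // A constant null target (modeled here as 0) has no minimum size.
   void checkReservedBytes(uint32_t numBytes, uint64_t target) {
     if (target == 0) return;
     if (numBytes < encodeR11Call(target).size())
       throw std::invalid_argument("<numBytes> too small for the call sequence");
   }

 With this sequence, the 15 bytes reserved in that example leave two
 bytes, which appear as ``nop`` instructions in its listing.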

 The runtime may patch the code emitted for the patch point, including
 the call sequence and nops. However, the runtime may not assume
 anything about the code LLVM emits within the reserved space. Partial
 patching is not allowed. The runtime must patch all reserved bytes,
 padding with nops if necessary.
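
 A minimal C++ sketch of this rule, assuming x86-64 and single-byte
 ``0x90`` nops (``padToReserved`` is a hypothetical helper; a runtime may
 prefer multi-byte nop forms, which this sketch does not attempt):

 .. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <stdexcept>
   #include <vector>

   // Pads a replacement sequence so it covers every reserved byte, using
   // single-byte x86 nops (0x90).
   std::vector<uint8_t> padToReserved(std::vector<uint8_t> code,
                                      std::size_t numBytes) {
     if (code.size() > numBytes)
       throw std::length_error("patch does not fit in the reserved region");
     code.resize(numBytes, 0x90);  // append nops up to numBytes
     return code;
   }

 For example, the 4-byte ``movslq 4(%r8), %r9`` (encoded as
 ``4D 63 48 04``) from the ``anyregcc`` example below, padded to 15 bytes,
 is followed by eleven ``nop`` bytes.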

 This example shows a patch point reserving 15 bytes, with one argument
 in ``%rdi``, and a return value in ``%rax`` per the native calling convention:

 .. code-block:: llvm

   %target = inttoptr i64 -281474976710654 to ptr
   %val = call i64 (i64, i32, ptr, i32, ...)
            @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                              ptr %target, i32 1, ptr %ptr)
   %add = add i64 %val, 3
   ret i64 %add

 May generate:

 .. code-block:: none

   0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
   0x0a callq   *%r11
   0x0d nop
   0x0e nop                               <--- end of reserved 15-bytes
   0x0f addq    $0x3, %rax
   0x13 movq    %rax, 8(%rsp)

 Note that no stack map locations will be recorded. If the patched code
 sequence does not need arguments fixed to specific calling convention
 registers, then the ``anyregcc`` convention may be used:

 .. code-block:: llvm

   %val = call anyregcc i64 (i64, i32, ptr, i32, ...)
            @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                              ptr %target, i32 1, ptr %ptr)

 The stack map now indicates the location of the ``%ptr`` argument and
 return value:

 .. code-block:: none

   Stack Map: ID=78, Loc0=%r9 Loc1=%r8

 The patch code sequence may now use the argument that happened to be
 allocated in ``%r8`` and return a value allocated in ``%r9``:

 .. code-block:: none

   0x00 movslq  4(%r8), %r9            <--- patched code at patch point address
   0x04 nop
   ...
   0x0e nop                            <--- end of reserved 15-bytes
   0x0f addq    $0x3, %r9
   0x13 movq    %r9, 8(%rsp)

 .. _stackmap-format:

 Stack Map Format
 ================

 The existence of a stack map or patch point intrinsic within an LLVM
 Module forces code emission to create a :ref:`stackmap-section`. The
 format of this section follows:

 .. code-block:: none

   Header {
     uint8  : Stack Map Version (current version is 3)
     uint8  : Reserved (expected to be 0)
     uint16 : Reserved (expected to be 0)
   }
   uint32 : NumFunctions
   uint32 : NumConstants
   uint32 : NumRecords
   StkSizeRecord[NumFunctions] {
     uint64 : Function Address
     uint64 : Stack Size
     uint64 : Record Count
   }
   Constants[NumConstants] {
     uint64 : LargeConstant
   }
   StkMapRecord[NumRecords] {
     uint64 : PatchPoint ID
     uint32 : Instruction Offset
     uint16 : Reserved (record flags)
     uint16 : NumLocations
     Location[NumLocations] {
       uint8  : Register | Direct | Indirect | Constant | ConstantIndex
       uint8  : Reserved (expected to be 0)
       uint16 : Location Size
       uint16 : Dwarf RegNum
       uint16 : Reserved (expected to be 0)
       int32  : Offset or SmallConstant
     }
     uint32 : Padding (only if required to align to 8 byte)
     uint16 : Padding
     uint16 : NumLiveOuts
     LiveOuts[NumLiveOuts] {
       uint16 : Dwarf RegNum
       uint8  : Reserved
       uint8  : Size in Bytes
     }
     uint32 : Padding (only if required to align to 8 byte)
   }
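
 Because every field has a fixed size, the section can be decoded with a
 simple sequential reader. The following C++ sketch is one possible
 parser for version 3, not LLVM's own implementation (LLVM has a
 separate, templated ``StackMapParser`` in
 ``llvm/Object/StackMapParser.h``). It assumes the section is parsed on
 the host that produced it, so fields are read in host byte order, and
 that the section starts at an 8-byte-aligned address, so padding can be
 computed from the offset within the section. All type and function names
 are hypothetical.

 .. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <cstring>
   #include <stdexcept>
   #include <utility>
   #include <vector>

   struct Location {
     uint8_t kind;       // Register=1 ... ConstantIndex=5
     uint16_t size;      // Location Size
     uint16_t dwarfReg;  // Dwarf RegNum
     int32_t offset;     // Offset or SmallConstant
   };
   struct LiveOut { uint16_t dwarfReg; uint8_t size; };
   struct Record {
     uint64_t id;
     uint32_t instOffset;
     std::vector<Location> locations;
     std::vector<LiveOut> liveOuts;
   };
   struct FunctionInfo { uint64_t address, stackSize, recordCount; };
   struct StackMap {
     std::vector<FunctionInfo> functions;
     std::vector<uint64_t> constants;
     std::vector<Record> records;
   };

   // Bounds-checked sequential reader; values are read in host byte order.
   class Reader {
    public:
     Reader(const uint8_t* data, std::size_t size) : data_(data), size_(size) {}
     template <typename T> T read() {
       if (pos_ + sizeof(T) > size_) throw std::out_of_range("truncated stack map");
       T value;
       std::memcpy(&value, data_ + pos_, sizeof(T));
       pos_ += sizeof(T);
       return value;
     }
     // Skips padding; offsets are relative to the start of the section.
     void alignTo8() { pos_ = (pos_ + 7) & ~static_cast<std::size_t>(7); }

    private:
     const uint8_t* data_;
     std::size_t size_;
     std::size_t pos_ = 0;
   };

   StackMap parseStackMapV3(const uint8_t* data, std::size_t size) {
     Reader r(data, size);
     if (r.read<uint8_t>() != 3) throw std::runtime_error("unsupported version");
     r.read<uint8_t>(); r.read<uint16_t>();  // reserved header fields
     const uint32_t numFunctions = r.read<uint32_t>();
     const uint32_t numConstants = r.read<uint32_t>();
     const uint32_t numRecords = r.read<uint32_t>();
     StackMap sm;
     for (uint32_t i = 0; i < numFunctions; ++i) {
       FunctionInfo f;
       f.address = r.read<uint64_t>();
       f.stackSize = r.read<uint64_t>();
       f.recordCount = r.read<uint64_t>();
       sm.functions.push_back(f);
     }
     for (uint32_t i = 0; i < numConstants; ++i)
       sm.constants.push_back(r.read<uint64_t>());
     for (uint32_t i = 0; i < numRecords; ++i) {
       Record rec;
       rec.id = r.read<uint64_t>();
       rec.instOffset = r.read<uint32_t>();
       r.read<uint16_t>();  // reserved (record flags)
       const uint16_t numLocations = r.read<uint16_t>();
       for (uint16_t j = 0; j < numLocations; ++j) {
         Location loc;
         loc.kind = r.read<uint8_t>();
         r.read<uint8_t>();  // reserved
         loc.size = r.read<uint16_t>();
         loc.dwarfReg = r.read<uint16_t>();
         r.read<uint16_t>();  // reserved
         loc.offset = r.read<int32_t>();
         rec.locations.push_back(loc);
       }
       r.alignTo8();        // uint32 padding, present only when needed
       r.read<uint16_t>();  // padding
       const uint16_t numLiveOuts = r.read<uint16_t>();
       for (uint16_t j = 0; j < numLiveOuts; ++j) {
         LiveOut lo;
         lo.dwarfReg = r.read<uint16_t>();
         r.read<uint8_t>();  // reserved
         lo.size = r.read<uint8_t>();
         rec.liveOuts.push_back(lo);
       }
       r.alignTo8();  // trailing padding
       sm.records.push_back(std::move(rec));
     }
     return sm;
   }

 A runtime would typically call such a parser once per module, right
 after the section is emitted, and then convert the result into its own
 representation.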

 The first byte of each location encodes a type that indicates how to
 interpret the ``RegNum`` and ``Offset`` fields as follows:

 ======== ========== =================== ===========================
 Encoding Type       Value               Description
 -------- ---------- ------------------- ---------------------------
 0x1      Register   Reg                 Value in a register
 0x2      Direct     Reg + Offset        Frame index value
 0x3      Indirect   [Reg + Offset]      Spilled value
 0x4      Constant   Offset              Small constant
 0x5      ConstIndex Constants[Offset]   Large constant
 ======== ========== =================== ===========================

 In the common case, a value is available in a register, and the
 ``Offset`` field will be zero. Values spilled to the stack are encoded
 as ``Indirect`` locations. The runtime must load those values from a
 stack address, typically in the form ``[BP + Offset]``. If an
 ``alloca`` value is passed directly to a stack map intrinsic, then
 LLVM may fold the frame index into the stack map as an optimization to
 avoid allocating a register or stack slot. These frame indices will be
 encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
 also optimize constants by emitting them directly in the stack map,
 either in the ``Offset`` of a ``Constant`` location or in the constant
 pool, referred to by ``ConstantIndex`` locations.
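
 The table translates directly into code. The C++ sketch below resolves
 one location against a hypothetical ``FrameState`` that holds the x86-64
 register values of a stopped frame, indexed by DWARF register number. It
 assumes location sizes of at most 8 bytes and a little-endian host, and it
 sign-extends small constants, which is a choice a runtime may make
 differently.

 .. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <cstring>
   #include <stdexcept>
   #include <vector>

   enum LocationKind : uint8_t {
     Register = 0x1, Direct = 0x2, Indirect = 0x3,
     Constant = 0x4, ConstantIndex = 0x5
   };

   struct Location {
     uint8_t kind;       // one of LocationKind
     uint16_t size;      // Location Size in bytes
     uint16_t dwarfReg;  // Dwarf RegNum
     int32_t offset;     // Offset or SmallConstant
   };

   // Hypothetical snapshot of a stopped x86-64 frame: register values indexed
   // by DWARF number (0-15 are the GPRs, 16 is the return address column).
   struct FrameState { uint64_t regs[17]; };

   static uint64_t regValue(const FrameState& f, uint16_t dwarfReg) {
     if (dwarfReg >= 17) throw std::out_of_range("unsupported DWARF register");
     return f.regs[dwarfReg];
   }

   // Returns the value a location denotes (for Direct, the address itself).
   // Indirect reads process memory, so it is only valid while the frame is live.
   uint64_t resolveLocation(const Location& loc, const FrameState& frame,
                            const std::vector<uint64_t>& constants) {
     switch (loc.kind) {
     case Register:
       return regValue(frame, loc.dwarfReg);
     case Direct:  // Reg + Offset, e.g. the address of an alloca
       return regValue(frame, loc.dwarfReg) + static_cast<int64_t>(loc.offset);
     case Indirect: {  // spilled value at [Reg + Offset]
       if (loc.size > 8) throw std::invalid_argument("location too large");
       uintptr_t addr = regValue(frame, loc.dwarfReg) + static_cast<int64_t>(loc.offset);
       uint64_t value = 0;
       std::memcpy(&value, reinterpret_cast<const void*>(addr), loc.size);
       return value;
     }
     case Constant:  // small constant in the Offset field, sign-extended
       return static_cast<uint64_t>(static_cast<int64_t>(loc.offset));
     case ConstantIndex:  // large constant: index into Constants[]
       return constants.at(static_cast<uint32_t>(loc.offset));
     default:
       throw std::invalid_argument("unknown location kind");
     }
   }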

 At each callsite, a "liveout" register list is also recorded. These
 are the registers that are live across the stackmap and therefore must
 be saved by the runtime. This is an important optimization when the
 patchpoint intrinsic is used with a calling convention that by default
 preserves most registers as callee-save.

 Each entry in the liveout register list contains a DWARF register
 number and size in bytes. The stackmap format deliberately omits
 specific subregister information. Instead, the runtime must interpret
 this information conservatively. For example, if the stackmap reports
 one byte at ``%rax``, then the value may be in either ``%al`` or
 ``%ah``. It doesn't matter in practice, because the runtime will
 simply save ``%rax``. However, if the stackmap reports 16 bytes at
 ``%ymm0``, then the runtime can safely optimize by saving only
 ``%xmm0``.
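
 A conservative policy of this kind can be written as a small lookup.
 This C++ sketch assumes x86-64 and the System V DWARF register numbering
 (0-15 for the general-purpose registers, 17-32 for ``%xmm0`` through
 ``%xmm15``, numbers that also denote the overlapping ``%ymm``
 registers); ``bytesToSave`` is a hypothetical helper:

 .. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Conservative reading of one live-out entry on x86-64 (System V DWARF
   // numbering). Returns how many bytes of the register the runtime should
   // save, or 0 for register numbers this sketch does not handle.
   unsigned bytesToSave(uint16_t dwarfReg, uint8_t sizeInBytes) {
     if (dwarfReg <= 15)
       return 8;  // any sub-register (%al, %ah, ...) is covered by the full GPR
     if (dwarfReg >= 17 && dwarfReg <= 32)
       return sizeInBytes <= 16 ? 16 : 32;  // %xmmN suffices for <= 16 bytes
     return 0;
   }

 Under this policy, one byte reported at ``%rax`` (DWARF 0) still saves
 all 8 bytes of ``%rax``, while 16 bytes reported at ``%ymm0`` (DWARF 17)
 save only ``%xmm0``.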

 The stack map format is a contract between an LLVM SVN revision and
 the runtime. It is currently experimental and may change in the short
 term, but minimizing the need to update the runtime is
 important. Consequently, the stack map design is motivated by
 simplicity and extensibility. Compactness of the representation is
 secondary because the runtime is expected to parse the data
 immediately after compiling a module and encode the information in its
 own format. Since the runtime controls the allocation of sections, it
 can reuse the same stack map space for multiple modules.

 Stackmap support is currently only implemented for 64-bit
 platforms. However, a 32-bit implementation should be able to use the
 same format with an insignificant amount of wasted space.

 .. _stackmap-section:

 Stack Map Section
 ^^^^^^^^^^^^^^^^^

 A JIT compiler can easily access this section by providing its own
 memory manager via the LLVM C API
 ``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
 manager, the JIT provides a callback of type
 ``LLVMMemoryManagerAllocateDataSectionCallback``. When LLVM creates
 this section, it invokes the callback and passes the section name. The
 JIT can record the in-memory address of the section at this time and
 later parse it to recover the stack map data.

 For MachO (e.g. on Darwin), the stack map section name is
 ``__llvm_stackmaps``. The segment name is ``__LLVM_STACKMAPS``.

 For ELF (e.g. on Linux), the stack map section name is
 ``.llvm_stackmaps``. The segment name is ``__LLVM_STACKMAPS``.
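
 A hedged C++ sketch of such a callback follows. Its parameter list
 mirrors ``LLVMMemoryManagerAllocateDataSectionCallback`` from
 ``llvm-c/ExecutionEngine.h``, with ``LLVMBool`` written as ``int`` so the
 sketch compiles without LLVM headers; ``JitState``,
 ``isStackMapSection``, and ``allocateDataSection`` are hypothetical
 names. It relies on C++17 ``std::aligned_alloc`` and omits the
 bookkeeping a real memory manager needs to free its sections.

 .. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <cstdlib>
   #include <cstring>

   // Hypothetical per-JIT state, passed to the callback as its Opaque pointer.
   struct JitState {
     uint8_t* stackMapSection = nullptr;
     uintptr_t stackMapSize = 0;
   };

   // True for the MachO and ELF stack map section names.
   bool isStackMapSection(const char* name) {
     return std::strcmp(name, "__llvm_stackmaps") == 0 ||
            std::strcmp(name, ".llvm_stackmaps") == 0;
   }

   // Same parameter list as LLVMMemoryManagerAllocateDataSectionCallback,
   // with LLVMBool spelled as int. This sketch does not record allocations
   // for later release; a real memory manager must.
   uint8_t* allocateDataSection(void* opaque, uintptr_t size, unsigned alignment,
                                unsigned /*sectionID*/, const char* sectionName,
                                int /*isReadOnly*/) {
     // aligned_alloc needs a power-of-two alignment and a size that is a
     // multiple of it; use at least 16 bytes and round the size up.
     std::size_t align = alignment > 16 ? alignment : 16;
     std::size_t rounded = (size + align - 1) / align * align;
     if (rounded == 0) rounded = align;
     auto* mem = static_cast<uint8_t*>(std::aligned_alloc(align, rounded));
     if (mem != nullptr && isStackMapSection(sectionName)) {
       // Remember where the section lives; parse it once code is finalized.
       auto* state = static_cast<JitState*>(opaque);
       state->stackMapSection = mem;
       state->stackMapSize = size;
     }
     return mem;
   }

 A JIT would pass ``allocateDataSection``, together with its code-section,
 finalize, and destroy callbacks and a ``JitState*`` as the opaque
 pointer, to ``LLVMCreateSimpleMCJITMemoryManager()``.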

 Stack Map Usage
 ===============

 The stack map support described in this document can be used to
 precisely determine the location of values at a specific position in
 the code. LLVM does not maintain any mapping between those values and
 any higher-level entity. The runtime must be able to interpret the
 stack map record given only the ID, offset, and the order of the
 locations, records, and functions, which LLVM preserves.
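
 Mapping records back to functions relies only on that ordering. A
 minimal C++ sketch, assuming (as the ``Record Count`` field implies) that
 records are grouped by function in the same order as the
 ``StkSizeRecord`` entries; ``FunctionInfo``, ``RecordInfo``, and
 ``recordAddresses`` are hypothetical names:

 .. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <stdexcept>
   #include <vector>

   struct FunctionInfo { uint64_t address, stackSize, recordCount; };
   struct RecordInfo { uint64_t id; uint32_t instOffset; };

   // Computes the absolute instruction address of every record by pairing the
   // records, in order, with the functions that own them according to each
   // function's Record Count.
   std::vector<uint64_t> recordAddresses(const std::vector<FunctionInfo>& funcs,
                                         const std::vector<RecordInfo>& records) {
     std::vector<uint64_t> addrs;
     std::size_t next = 0;
     for (const FunctionInfo& f : funcs) {
       for (uint64_t i = 0; i < f.recordCount; ++i, ++next) {
         if (next >= records.size())
           throw std::out_of_range("more records claimed than present");
         addrs.push_back(f.address + records[next].instOffset);
       }
     }
     if (next != records.size())
       throw std::out_of_range("records not claimed by any function");
     return addrs;
   }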

 Note that this is quite different from the goal of debug information,
 which is a best-effort attempt to track the location of named
 variables at every instruction.

 An important motivation for this design is to allow a runtime to
 commandeer a stack frame when execution reaches an instruction address
 associated with a stack map. The runtime must be able to rebuild a
 stack frame and resume program execution using the information
 provided by the stack map. For example, execution may resume in an
 interpreter or a recompiled version of the same function.

 This usage restricts LLVM optimization. Clearly, LLVM must not move
 stores across a stack map. However, loads must also be handled
 conservatively. If the load may trigger an exception, hoisting it
 above a stack map could be invalid. For example, the runtime may
 determine that a load is safe to execute without a type check given
 the current state of the type system. If the type system changes while
 some activation of the load's function exists on the stack, the load
 becomes unsafe. The runtime can prevent subsequent execution of that
 load by immediately patching any stack map location that lies between
 the current call site and the load (typically, the runtime would
 simply patch all stack map locations to invalidate the function). If
 the compiler had hoisted the load above the stack map, then the
 program could crash before the runtime could take back control.

 To enforce these semantics, stackmap and patchpoint intrinsics are
 considered to potentially read and write all memory. This may limit
 optimization more than some clients desire. This limitation may be
 avoided by marking the call site as "readonly". In the future we may
 also allow metadata to be added to the intrinsic call to express
 aliasing, thereby allowing optimizations to hoist certain loads above
 stack maps.

 Direct Stack Map Entries
 ^^^^^^^^^^^^^^^^^^^^^^^^

 As shown in :ref:`stackmap-format`, a Direct stack map location
 records the address of a frame index. This address is itself the value
 that the runtime requested. This differs from Indirect locations,
 which refer to a stack location from which the requested value must
 be loaded. Direct locations can communicate the address of an alloca,
 while Indirect locations handle register spills.

 For example:

 .. code-block:: none

   entry:
     %a = alloca i64...
     llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

 The runtime can determine this alloca's relative location on the
 stack immediately after compilation, or at any time thereafter. This
 differs from Register and Indirect locations, because the runtime can
 only read the values in those locations when execution reaches the
 instruction address of the stack map.

 This functionality requires LLVM to treat entry-block allocas
 specially when they are directly consumed by an intrinsic. (This is
 the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
 transformations must not substitute the alloca with any intervening
 value. This can be verified by the runtime simply by checking that the
 stack map's location is a Direct location type.


 Supported Architectures
 =======================

 Support for StackMap generation and the related intrinsics requires
 some code for each backend. Today, only a subset of LLVM's backends
 are supported. The currently supported architectures are X86_64,
 PowerPC, and AArch64.
	===================================
	Stack maps and patch points in LLVM
	===================================

	.. contents::
	:local:
	:depth: 2

	Definitions
	===========

	In this document we refer to the "runtime" collectively as all
	components that serve as the LLVM client, including the LLVM IR
	generator, object code consumer, and code patcher.

	A stack map records the location of ``live values`` at a particular
	instruction address. These ``live values`` do not refer to all the
	LLVM values live across the stack map. Instead, they are only the
	values that the runtime requires to be live at this point. For
	example, they may be the values the runtime will need to resume
	program execution at that point independent of the compiled function
	containing the stack map.

	LLVM emits stack map data into the object code within a designated
	:ref:`stackmap-section`. This stack map data contains a record for
	each stack map. The record stores the stack map's instruction address
	and contains a entry for each mapped value. Each entry encodes a
	value's location as a register, stack offset, or constant.

	A patch point is an instruction address at which space is reserved for
	patching a new instruction sequence at run time. Patch points look
	much like calls to LLVM. They take arguments that follow a calling
	convention and may return a value. They also imply stack map
	generation, which allows the runtime to locate the patchpoint and
	find the location of ``live values`` at that point.

	Motivation
	==========

	This functionality is currently experimental but is potentially useful
	in a variety of settings, the most obvious being a runtime (JIT)
	compiler. Example applications of the patchpoint intrinsics are
	implementing an inline call cache for polymorphic method dispatch or
	optimizing the retrieval of properties in dynamically typed languages
	such as JavaScript.

	The intrinsics documented here are currently used by the JavaScript
	compiler within the open source WebKit project, see the `FTL JIT
	<https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
	used whenever stack maps or code patching are needed. Because the
	intrinsics have experimental status, compatibility across LLVM
	releases is not guaranteed.

	The stack map functionality described in this document is separate
	from the functionality described in
	:ref:`stack-map`. `GCFunctionMetadata` provides the location of
	pointers into a collected heap captured by the `GCRoot` intrinsic,
	which can also be considered a "stack map". Unlike the stack maps
	defined above, the `GCFunctionMetadata` stack map interface does not
	provide a way to associate live register values of arbitrary type with
	an instruction address, nor does it specify a format for the resulting
	stack map. The stack maps described here could potentially provide
	richer information to a garbage collecting runtime, but that usage
	will not be discussed in this document.

	Intrinsics
	==========

	The following two kinds of intrinsics can be used to implement stack
	maps and patch points: ``llvm.experimental.stackmap`` and
	``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
	stack map record, and they both allow some form of code patching. They
	can be used independently (i.e. ``llvm.experimental.patchpoint``
	implicitly generates a stack map without the need for an additional
	call to ``llvm.experimental.stackmap``). The choice of which to use
	depends on whether it is necessary to reserve space for code patching
	and whether any of the intrinsic arguments should be lowered according
	to calling conventions. ``llvm.experimental.stackmap`` does not
	reserve any space, nor does it expect any call arguments. If the
	runtime patches code at the stack map's address, it will destructively
	overwrite the program text. This is unlike
	``llvm.experimental.patchpoint``, which reserves space for in-place
	patching without overwriting surrounding code. The
	``llvm.experimental.patchpoint`` intrinsic also lowers a specified
	number of arguments according to its calling convention. This allows
	patched code to make in-place function calls without marshaling.

	Each instance of one of these intrinsics generates a stack map record
	in the :ref:`stackmap-section`. The record includes an ID, allowing
	the runtime to uniquely identify the stack map, and the offset within
	the code from the beginning of the enclosing function.

	'``llvm.experimental.stackmap``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:
	"""""""

	::

	declare void
	@llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

	Overview:
	"""""""""

	The '``llvm.experimental.stackmap``' intrinsic records the location of
	specified values in the stack map without generating any code.

	Operands:
	"""""""""

	The first operand is an ID to be encoded within the stack map. The
	second operand is the number of shadow bytes following the
	intrinsic. The variable number of operands that follow are the ``live
	values`` for which locations will be recorded in the stack map.

	To use this intrinsic as a bare-bones stack map, with no code patching
	support, the number of shadow bytes can be set to zero.

	Semantics:
	""""""""""

	The stack map intrinsic generates no code in place, unless nops are
	needed to cover its shadow (see below). However, its offset from
	function entry is stored in the stack map. This is the relative
	instruction address immediately following the instructions that
	precede the stack map.

	The stack map ID allows a runtime to locate the desired stack map
	record. LLVM passes this ID through directly to the stack map
	record without checking uniqueness.

	LLVM guarantees a shadow of instructions following the stack map's
	instruction offset during which neither the end of the basic block nor
	another call to ``llvm.experimental.stackmap`` or
	``llvm.experimental.patchpoint`` may occur. This allows the runtime to
	patch the code at this point in response to an event triggered from
	outside the code. The code for instructions following the stack map
	may be emitted in the stack map's shadow, and these instructions may
	be overwritten by destructive patching. Without shadow bytes, this
	destructive patching could overwrite program text or data outside the
	current function. We disallow overlapping stack map shadows so that
	the runtime does not need to consider this corner case.

	For example, a stack map with 8 byte shadow:

	.. code-block:: llvm

	call void @runtime()
	call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
	i64* %ptr)
	%val = load i64* %ptr
	%add = add i64 %val, 3
	ret i64 %add

	May require one byte of nop-padding:

	.. code-block:: none

	0x00 callq _runtime
	0x05 nop <--- stack map address
	0x06 movq (%rdi), %rax
	0x07 addq $3, %rax
	0x0a popq %rdx
	0x0b ret <---- end of 8-byte shadow

	Now, if the runtime needs to invalidate the compiled code, it may
	patch 8 bytes of code at the stack map's address at follows:

	.. code-block:: none

	0x00 callq _runtime
	0x05 movl $0xffff, %rax <--- patched code at stack map address
	0x0a callq *%rax <---- end of 8-byte shadow

	This way, after the normal call to the runtime returns, the code will
	execute a patched call to a special entry point that can rebuild a
	stack frame from the values located by the stack map.

	'``llvm.experimental.patchpoint.*``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:
	"""""""

	::

	declare void
	@llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
	i8* <target>, i32 <numArgs>, ...)
	declare i64
	@llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
	i8* <target>, i32 <numArgs>, ...)

	Overview:
	"""""""""

	The '``llvm.experimental.patchpoint.*``' intrinsics creates a function
	call to the specified ``<target>`` and records the location of specified
	values in the stack map.

	Operands:
	"""""""""

	The first operand is an ID, the second operand is the number of bytes
	reserved for the patchable region, the third operand is the target
	address of a function (optionally null), and the fourth operand
	specifies how many of the following variable operands are considered
	function call arguments. The remaining variable number of operands are
	the ``live values`` for which locations will be recorded in the stack
	map.

	Semantics:
	""""""""""

	The patch point intrinsic generates a stack map. It also emits a
	function call to the address specified by ``<target>`` if the address
	is not a constant null. The function call and its arguments are
	lowered according to the calling convention specified at the
	intrinsic's callsite. Variants of the intrinsic with non-void return
	type also return a value according to calling convention.

	On PowerPC, note that ``<target>`` must be the ABI function pointer for the
	intended target of the indirect call. Specifically, when compiling for the
	ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
	the C/C++ function-pointer representation.

	Requesting zero patch point arguments is valid. In this case, all
	variable operands are handled just like
	``llvm.experimental.stackmap.*``. The difference is that space will
	still be reserved for patching, a call will be emitted, and a return
	value is allowed.

	The location of the arguments are not normally recorded in the stack
	map because they are already fixed by the calling convention. The
	remaining ``live values`` will have their location recorded, which
	could be a register, stack location, or constant. A special calling
	convention has been introduced for use with stack maps, anyregcc,
	which forces the arguments to be loaded into registers but allows
	those register to be dynamically allocated. These argument registers
	will have their register locations recorded in the stack map in
	addition to the remaining ``live values``.

	The patch point also emits nops to cover at least ``<numBytes>`` of
	instruction encoding space. Hence, the client must ensure that
	``<numBytes>`` is enough to encode a call to the target address on the
	supported targets. If the call target is constant null, then there is
	no minimum requirement. A zero-byte null target patchpoint is
	valid.

	The runtime may patch the code emitted for the patch point, including
	the call sequence and nops. However, the runtime may not assume
	anything about the code LLVM emits within the reserved space. Partial
	patching is not allowed. The runtime must patch all reserved bytes,
	padding with nops if necessary.
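
	The all-or-nothing rule can be sketched as a helper that always
	rewrites the full reserved range. This is a minimal illustration that
	assumes x86-64 (where ``0x90`` is a one-byte ``nop``) and a writable
	code buffer; ``patch_reserved`` is a hypothetical name, not an LLVM API.

	.. code-block:: python

	  def patch_reserved(code: bytearray, start: int, num_bytes: int,
	                     new_seq: bytes, nop: int = 0x90) -> None:
	      """Overwrite all num_bytes reserved at code[start:], never just a prefix."""
	      if len(new_seq) > num_bytes:
	          raise ValueError("patch sequence does not fit in the reserved space")
	      if start < 0 or start + num_bytes > len(code):
	          raise IndexError("reserved range lies outside the code buffer")
	      # Pad the unused tail with one-byte nops (0x90 on x86-64, an assumption).
	      padding = bytes([nop]) * (num_bytes - len(new_seq))
	      code[start:start + num_bytes] = bytes(new_seq) + padding

	A real runtime must also make the write visible to instruction fetch
	(for example, by flushing the instruction cache where the architecture
	requires it), which this sketch omits.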

	This example shows a patch point reserving 15 bytes, with one argument
	in ``%rdi`` and a return value in ``%rax`` per the native calling convention:

	.. code-block:: llvm

	  %target = inttoptr i64 -281474976710654 to i8*
	  %val = call i64 (i64, i32, ...)*
	           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
	                                             i8* %target, i32 1, i64* %ptr)
	  %add = add i64 %val, 3
	  ret i64 %add

	May generate:

	.. code-block:: none

	  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
	  0x0a callq   *%r11
	  0x0d nop
	  0x0e nop                               <--- end of reserved 15 bytes
	  0x0f addq    $0x3, %rax
	  0x13 movq    %rax, 8(%rsp)

	Note that no stack map locations will be recorded. If the patched code
	sequence does not need arguments fixed to specific calling convention
	registers, then the ``anyregcc`` convention may be used:

	.. code-block:: llvm

	  %val = call anyregcc i64 (i64, i32, ...)*
	           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
	                                             i8* %target, i32 1, i64* %ptr)

	The stack map now indicates the locations of the ``%ptr`` argument and
	the return value:

	.. code-block:: none

	  Stack Map: ID=78, Loc0=%r9 Loc1=%r8
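
	Register locations are reported as DWARF register numbers, so the
	runtime needs a per-target table to turn them into machine registers.
	The sketch below assumes the x86-64 System V DWARF numbering and uses
	hypothetical helper names. In the record above, ``Loc0`` holds the
	return value and ``Loc1`` the ``%ptr`` argument.

	.. code-block:: python

	  from typing import Sequence

	  # x86-64 DWARF register numbers 0-15 (note: 1 is %rdx and 2 is %rcx).
	  X86_64_DWARF_GPRS = [
	      "rax", "rdx", "rcx", "rbx", "rsi", "rdi", "rbp", "rsp",
	      "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
	  ]

	  def dwarf_reg_name(regnum: int) -> str:
	      if 0 <= regnum < len(X86_64_DWARF_GPRS):
	          return "%" + X86_64_DWARF_GPRS[regnum]
	      if 17 <= regnum <= 32:          # 17-32 are %xmm0-%xmm15
	          return f"%xmm{regnum - 17}"
	      return f"dwarf{regnum}"         # not covered by this sketch

	  def describe_record(record_id: int, regnums: Sequence[int]) -> str:
	      locs = " ".join(f"Loc{i}={dwarf_reg_name(r)}"
	                      for i, r in enumerate(regnums))
	      return f"Stack Map: ID={record_id}, {locs}"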

	The patched code sequence may now use the argument that happened to be
	allocated in ``%r8`` and return a value in ``%r9``:

	.. code-block:: none

	  0x00 movslq 4(%r8), %r9 <--- patched code at patch point address
	  0x04 nop
	  ...
	  0x0e nop                <--- end of reserved 15 bytes
	  0x0f addq   $0x3, %r9
	  0x13 movq   %r9, 8(%rsp)

	.. _stackmap-format:

	Stack Map Format
	================

	The existence of a stack map or patch point intrinsic within an LLVM
	Module forces code emission to create a :ref:`stackmap-section`. The
	format of this section follows:

	.. code-block:: none

	  Header {
	    uint8  : Stack Map Version (current version is 3)
	    uint8  : Reserved (expected to be 0)
	    uint16 : Reserved (expected to be 0)
	  }
	  uint32 : NumFunctions
	  uint32 : NumConstants
	  uint32 : NumRecords
	  StkSizeRecord[NumFunctions] {
	    uint64 : Function Address
	    uint64 : Stack Size
	    uint64 : Record Count
	  }
	  Constants[NumConstants] {
	    uint64 : LargeConstant
	  }
	  StkMapRecord[NumRecords] {
	    uint64 : PatchPoint ID
	    uint32 : Instruction Offset
	    uint16 : Reserved (record flags)
	    uint16 : NumLocations
	    Location[NumLocations] {
	      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
	      uint8  : Reserved (expected to be 0)
	      uint16 : Location Size
	      uint16 : Dwarf RegNum
	      uint16 : Reserved (expected to be 0)
	      int32  : Offset or SmallConstant
	    }
	    uint32 : Padding (only if required to align to 8 byte)
	    uint16 : Padding
	    uint16 : NumLiveOuts
	    LiveOuts[NumLiveOuts] {
	      uint16 : Dwarf RegNum
	      uint8  : Reserved
	      uint8  : Size in Bytes
	    }
	    uint32 : Padding (only if required to align to 8 byte)
	  }
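
	A runtime will typically decode this layout once, right after
	compilation. The following is a minimal Python sketch, assuming a
	little-endian target and a section that starts on an 8-byte boundary
	(so the padding rules can be applied to offsets relative to the section
	start); ``parse_stackmap_v3``, ``Record``, and ``Location`` are
	illustrative names, not an LLVM API.

	.. code-block:: python

	  import struct
	  from dataclasses import dataclass, field
	  from typing import List, Tuple

	  @dataclass
	  class Location:
	      kind: int    # 1=Register 2=Direct 3=Indirect 4=Constant 5=ConstantIndex
	      size: int    # Location Size
	      regnum: int  # Dwarf RegNum
	      offset: int  # Offset or SmallConstant (signed)

	  @dataclass
	  class Record:
	      patchpoint_id: int
	      instruction_offset: int
	      locations: List[Location] = field(default_factory=list)
	      liveouts: List[Tuple[int, int]] = field(default_factory=list)  # (regnum, size)

	  def parse_stackmap_v3(data: bytes):
	      """Return (functions, constants, records) from a version 3 section."""
	      pos = 0

	      def read(fmt: str) -> tuple:
	          nonlocal pos
	          values = struct.unpack_from("<" + fmt, data, pos)  # little-endian
	          pos += struct.calcsize("<" + fmt)
	          return values

	      def align8() -> None:
	          # All fields here are 4-byte multiples, so padding is 0 or 4 bytes.
	          if pos % 8:
	              read("I")

	      version, _, _ = read("BBH")
	      if version != 3:
	          raise ValueError(f"unsupported stack map version {version}")
	      num_functions, num_constants, num_records = read("III")
	      # Each entry: (function address, stack size, record count).
	      functions = [read("QQQ") for _ in range(num_functions)]
	      constants = [read("Q")[0] for _ in range(num_constants)]
	      records = []
	      for _ in range(num_records):
	          rec_id, inst_offset, _flags, num_locs = read("QIHH")
	          rec = Record(rec_id, inst_offset)
	          for _ in range(num_locs):
	              kind, _, size, regnum, _, offset = read("BBHHHi")
	              rec.locations.append(Location(kind, size, regnum, offset))
	          align8()
	          _, num_liveouts = read("HH")
	          for _ in range(num_liveouts):
	              regnum, _, size = read("HBB")
	              rec.liveouts.append((regnum, size))
	          align8()
	          records.append(rec)
	      return functions, constants, records

	Because LLVM preserves the order of functions and records, a runtime
	can plausibly attribute the first ``Record Count`` records to the first
	function, the next group to the second, and so on; this sketch leaves
	that grouping to the caller.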

	The first byte of each location encodes a type that indicates how to
	interpret the ``RegNum`` and ``Offset`` fields as follows:

	======== ========== =================== ===========================
	Encoding Type       Value               Description
	-------- ---------- ------------------- ---------------------------
	0x1      Register   Reg                 Value in a register
	0x2      Direct     Reg + Offset        Frame index value
	0x3      Indirect   [Reg + Offset]      Spilled value
	0x4      Constant   Offset              Small constant
	0x5      ConstIndex Constants[Offset]   Large constant
	======== ========== =================== ===========================

	In the common case, a value is available in a register, and the
	``Offset`` field will be zero. Values spilled to the stack are encoded
	as ``Indirect`` locations. The runtime must load those values from a
	stack address, typically in the form ``[BP + Offset]``. If an
	``alloca`` value is passed directly to a stack map intrinsic, then
	LLVM may fold the frame index into the stack map as an optimization to
	avoid allocating a register or stack slot. These frame indices will be
	encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
	also optimize constants by emitting them directly in the stack map,
	either in the ``Offset`` of a ``Constant`` location or in the constant
	pool, referred to by ``ConstantIndex`` locations.
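
	The five location types map onto a small interpreter. This sketch
	assumes the runtime has captured the register file as a mapping from
	DWARF register number to value and supplies a ``load`` function for
	memory reads; it ignores ``Location Size``, which a real runtime would
	use to truncate loaded values. All names are hypothetical.

	.. code-block:: python

	  from collections import namedtuple
	  from typing import Callable, Dict, Sequence

	  # Hypothetical decoded form of one Location entry.
	  Loc = namedtuple("Loc", "kind regnum offset")

	  REGISTER, DIRECT, INDIRECT, CONSTANT, CONST_INDEX = 1, 2, 3, 4, 5

	  def resolve(loc: Loc, regs: Dict[int, int], load: Callable[[int], int],
	              constants: Sequence[int]) -> int:
	      """Recover the value (or, for Direct, the address) a location describes."""
	      if loc.kind == REGISTER:
	          return regs[loc.regnum]                     # value in a register
	      if loc.kind == DIRECT:
	          return regs[loc.regnum] + loc.offset        # frame index address
	      if loc.kind == INDIRECT:
	          return load(regs[loc.regnum] + loc.offset)  # spilled value
	      if loc.kind == CONSTANT:
	          return loc.offset                           # small constant
	      if loc.kind == CONST_INDEX:
	          return constants[loc.offset]                # large constant
	      raise ValueError(f"unknown location type {loc.kind}")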

	At each callsite, a "liveout" register list is also recorded. These
	are the registers that are live across the stackmap and therefore must
	be saved by the runtime. This is an important optimization when the
	patchpoint intrinsic is used with a calling convention that by default
	preserves most registers as callee-save.

	Each entry in the liveout register list contains a DWARF register
	number and size in bytes. The stackmap format deliberately omits
	specific subregister information. Instead the runtime must interpret
	this information conservatively. For example, if the stackmap reports
	one byte at ``%rax``, then the value may be in either ``%al`` or
	``%ah``. It doesn't matter in practice, because the runtime will
	simply save ``%rax``. However, if the stackmap reports 16 bytes at
	``%ymm0``, then the runtime can safely optimize by saving only
	``%xmm0``.
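
	A conservative save policy along these lines might look like the
	following sketch. It assumes the x86-64 DWARF numbering, in which 0-15
	are the general-purpose registers and 17-32 name ``%xmm0``-``%xmm15``
	(LLVM reuses these numbers for the overlapping ``%ymm`` registers); the
	helper names are hypothetical.

	.. code-block:: python

	  def bytes_to_save(regnum: int, size: int) -> int:
	      """Conservative number of bytes to save for one liveout entry."""
	      if 0 <= regnum <= 15:
	          # Any GPR subregister (%al, %ah, %eax, ...): save the full 64-bit register.
	          return 8
	      if 17 <= regnum <= 32:
	          # Only the low 16 bytes are live: %xmmN suffices; otherwise save %ymmN.
	          return 16 if size <= 16 else 32
	      raise ValueError(f"DWARF register {regnum} is not handled by this sketch")

	  def save_plan(liveouts):
	      """Map each liveout (regnum, size) pair to the bytes the runtime saves."""
	      return {regnum: bytes_to_save(regnum, size) for regnum, size in liveouts}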

	The stack map format is a contract between an LLVM SVN revision and
	the runtime. It is currently experimental and may change in the short
	term, but minimizing the need to update the runtime is
	important. Consequently, the stack map design is motivated by
	simplicity and extensibility. Compactness of the representation is
	secondary because the runtime is expected to parse the data
	immediately after compiling a module and encode the information in its
	own format. Since the runtime controls the allocation of sections, it
	can reuse the same stack map space for multiple modules.

	Stackmap support is currently only implemented for 64-bit
	platforms. However, a 32-bit implementation should be able to use the
	same format with an insignificant amount of wasted space.

	.. _stackmap-section:

	Stack Map Section
	^^^^^^^^^^^^^^^^^

	A JIT compiler can easily access this section by providing its own
	memory manager via the LLVM C API
	``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
	manager, the JIT provides a callback of type
	``LLVMMemoryManagerAllocateDataSectionCallback``. When LLVM creates
	this section, it invokes the callback and passes the section name. The
	JIT can record the in-memory address of the section at this time and
	later parse it to recover the stack map data.
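
	The bookkeeping can be modeled as follows. In the C API the callback
	receives an opaque context pointer, the size, alignment, section ID,
	section name, and a read-only flag, and returns the allocated memory;
	the Python class below is only a hypothetical model of that logic, not
	a binding to the C API, and it assumes the section name arrives exactly
	as given for MachO and ELF in this section.

	.. code-block:: python

	  # Stack map section names for MachO and ELF, respectively.
	  STACKMAP_SECTION_NAMES = {"__llvm_stackmaps", ".llvm_stackmaps"}

	  class RecordingMemoryManager:
	      """Hypothetical stand-in for a JIT memory manager.

	      allocate_data_section mirrors the C callback's parameters
	      (minus its opaque context pointer) but returns Python buffers.
	      """

	      def __init__(self) -> None:
	          self.sections = []    # keep every allocation alive
	          self.stackmap = None  # buffer that will hold the stack map data

	      def allocate_data_section(self, size: int, alignment: int,
	                                section_id: int, section_name: str,
	                                is_read_only: bool) -> bytearray:
	          buf = bytearray(size)
	          self.sections.append(buf)
	          if section_name in STACKMAP_SECTION_NAMES:
	              self.stackmap = buf  # parse after the module is finalized
	          return buf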

	For MachO (e.g. on Darwin), the stack map section name is
	"__llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".

	For ELF (e.g. on Linux), the stack map section name is
	".llvm_stackmaps". ELF segments are unnamed, so no segment name applies.

	Stack Map Usage
	===============

	The stack map support described in this document can be used to
	precisely determine the location of values at a specific position in
	the code. LLVM does not maintain any mapping between those values and
	any higher-level entity. The runtime must be able to interpret the
	stack map record given only the ID, offset, and the order of the
	locations, records, and functions, which LLVM preserves.

	Note that this is quite different from the goal of debug information,
	which is a best-effort attempt to track the location of named
	variables at every instruction.

	An important motivation for this design is to allow a runtime to
	commandeer a stack frame when execution reaches an instruction address
	associated with a stack map. The runtime must be able to rebuild a
	stack frame and resume program execution using the information
	provided by the stack map. For example, execution may resume in an
	interpreter or a recompiled version of the same function.

	This usage restricts LLVM optimization. Clearly, LLVM must not move
	stores across a stack map. However, loads must also be handled
	conservatively. If the load may trigger an exception, hoisting it
	above a stack map could be invalid. For example, the runtime may
	determine that a load is safe to execute without a type check given
	the current state of the type system. If the type system changes while
	some activation of the load's function exists on the stack, the load
	becomes unsafe. The runtime can prevent subsequent execution of that
	load by immediately patching any stack map location that lies between
	the current call site and the load (typically, the runtime would
	simply patch all stack map locations to invalidate the function). If
	the compiler had hoisted the load above the stack map, then the
	program could crash before the runtime could take back control.

	To enforce these semantics, stackmap and patchpoint intrinsics are
	considered to potentially read and write all memory. This may limit
	optimization more than some clients desire. This limitation may be
	avoided by marking the call site as "readonly". In the future we may
	also allow metadata to be added to the intrinsic call to express
	aliasing, thereby allowing optimizations to hoist certain loads above
	stack maps.

	Direct Stack Map Entries
	^^^^^^^^^^^^^^^^^^^^^^^^

	As shown in :ref:`stackmap-format`, a Direct stack map location
	records the address of a frame index. This address is itself the value
	that the runtime requested. This differs from Indirect locations,
	which refer to stack locations from which the requested values must
	be loaded. Direct locations can communicate the address of an alloca,
	while Indirect locations handle register spills.

	For example:

	.. code-block:: none

	  entry:
	    %a = alloca i64...
	    call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 <ID>,
	                                   i32 <shadowBytes>, i64* %a)

	The runtime can determine this alloca's relative location on the
	stack immediately after compilation, or at any time thereafter. This
	differs from Register and Indirect locations, because the runtime can
	only read the values in those locations when execution reaches the
	instruction address of the stack map.

	This functionality requires LLVM to treat entry-block allocas
	specially when they are directly consumed by an intrinsic. (This is
	the same requirement imposed by the llvm.gcroot intrinsic.) LLVM
	transformations must not substitute the alloca with any intervening
	value. This can be verified by the runtime simply by checking that the
	stack map's location is a Direct location type.
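
	The runtime-side check amounts to a few lines. In this sketch,
	locations are assumed to be already decoded into
	``(kind, regnum, offset)`` tuples, and the caller is assumed to know
	which live-value positions it passed as allocas; ``alloca_slots`` is a
	hypothetical helper. Because each result is a base register plus a
	fixed offset, the runtime can record it right after compilation.

	.. code-block:: python

	  DIRECT = 0x2  # location encoding for Direct (frame index) values

	  def alloca_slots(locations, indices):
	      """Return (regnum, offset) for each alloca operand, checking it is Direct.

	      locations: decoded (kind, regnum, offset) tuples from one record.
	      indices: positions of the alloca operands among the live values.
	      """
	      slots = []
	      for i in indices:
	          kind, regnum, offset = locations[i]
	          if kind != DIRECT:
	              raise ValueError(f"location {i} has type {kind}, expected Direct")
	          slots.append((regnum, offset))  # address = value of regnum + offset
	      return slots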


	Supported Architectures
	=======================

	Support for StackMap generation and the related intrinsics requires
	some code for each backend. Today, only a subset of LLVM's backends
	are supported. The currently supported architectures are X86_64,
	PowerPC, and AArch64.