Allow Location Descriptions on the DWARF Expression Stack

1. Extension

In DWARF 5, expressions are evaluated using a typed value stack, a separate location area, and an independent loclist mechanism. This extension unifies all three mechanisms into a single generalized DWARF expression evaluation model that allows both typed values and location descriptions to be manipulated on the evaluation stack. Both single and multiple location descriptions are supported on the stack. In addition, the call frame information (CFI) is extended to support the full generality of location descriptions. This is done in a manner that is backwards compatible with DWARF 5. The extension involves changes to the DWARF 5 sections 2.5 (pp 26-38), 2.6 (pp 38-45), and 6.4 (pp 171-182).

The extension permits operations to act on location descriptions in an incremental, consistent, and composable manner. It allows a small number of operations to be defined that address the requirements of heterogeneous devices while also benefiting non-heterogeneous devices. It also acts as a foundation for addressing other issues that have been raised and that would benefit all devices.

Other approaches were explored that involved adding specialized operations and rules. However, these resulted in the need for more operations that did not compose. They also resulted in operations with context-sensitive semantics and corner cases that had to be defined. The observation was that numerous specialized, context-sensitive operations are harder for both producers and consumers than a smaller number of general, composable operations that have consistent semantics regardless of context.

First, section 2. Heterogeneous Computing Devices describes heterogeneous devices and the features they have that are not addressed by DWARF 5. Then section 3. DWARF 5 presents a brief simplified overview of the DWARF 5 expression evaluation model that highlights the difficulties for supporting the heterogeneous features. Next, section 4. Extension Solution provides an overview of the proposal, using simplified examples to illustrate how it can address the issues of heterogeneous devices and also benefit non-heterogeneous devices. Then overall conclusions are covered in section 5. Conclusion. Appendix A. Changes to DWARF Debugging Information Format Version 5 gives changes relative to the DWARF Version 5 standard. Finally, appendix B. Further Information has references to further information.

2. Heterogeneous Computing Devices

GPUs and other heterogeneous computing devices have features not common to CPU computing devices.

These devices often have many more registers than a CPU. This helps reduce memory accesses, which tend to be more expensive than on a CPU due to the much larger number of concurrently executing threads. In addition to the traditional scalar registers of a CPU, these devices often have many wide vector registers.

Example GPU Hardware

They may support masked vector instructions that are used by the compiler to map high level language threads onto the lanes of the vector registers. As a consequence, multiple language threads execute in lockstep as the vector instructions are executed. This is termed single instruction multiple thread (SIMT) execution.

SIMT/SIMD Execution Model

GPUs can have multiple memory address spaces in addition to the single global memory address space of a CPU. These additional address spaces are accessed using distinct instructions and are often local to a particular thread or group of threads.

For example, a GPU may have a per-thread-block address space that is implemented as scratchpad memory, with explicit hardware support to isolate portions of it to specific groups of threads created as a single thread block.

A GPU may also use global memory in a non-linear manner. For example, to support a SIMT per-lane address space efficiently, there may be instructions that support interleaved access.

Through optimization, source variables may be located across these different storage kinds. SIMT execution requires locations to be able to express the selection of runtime-defined pieces of vector registers. With these more complex locations, it is beneficial to be able to factorize their calculation, which requires all location kinds to be supported uniformly; otherwise, duplication is necessary.

3. DWARF 5

Before presenting the proposed solution to supporting heterogeneous devices, a brief overview of the DWARF 5 expression evaluation model will be given to highlight the aspects being addressed by the extension.

3.1 How DWARF Maps Source Language To Hardware

DWARF is a standardized way to specify debug information. It describes source language entities such as compilation units, functions, types, variables, etc. It is either embedded directly in sections of the code object executables, or split into separate files that they reference.

DWARF maps between source program language entities and their hardware representations. For example:

  • It maps a hardware instruction program counter to a source language program line, and vice versa.
  • It maps a source language function to the hardware instruction program counter for its entry point.
  • It maps a source language variable to its hardware location when at a particular program counter.
  • It provides information to allow virtual unwinding of hardware registers for a source language function call stack.
  • In addition, it provides much other information about the source language program.

In particular, there is great diversity in the way a source language entity could be mapped to a hardware location. The location may involve runtime values. For example, a source language variable location could be:

  • In a register.
  • At a memory address.
  • At an offset from the current stack pointer.
  • Optimized away, but with a known compile-time value.
  • Optimized away, but with an unknown value, such as happens for unused variables.
  • Spread across a combination of the above kinds of locations.
  • At a memory address, but also transiently loaded into registers.

To support this, DWARF 5 defines a rich expression language composed of loclist expressions and operation expressions. Loclist expressions allow the result to vary depending on the PC. Operation expressions are made up of a list of operations that are evaluated on a simple stack machine.

A DWARF expression can be used as the value of different attributes of different debugging information entries (DIEs). A DWARF expression can also be used as an argument to call frame information (CFI) entry operations. An expression is evaluated in a context dictated by where it is used. The context may include:

  • Whether the expression needs to produce a value or the location of an entity.
  • The current execution point including process, thread, PC, and stack frame.
  • An initial stack value or base object: some expressions are evaluated with the stack initialized with a specific value, or with the location of a base object that is available using the DW_OP_push_object_address operation.
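The following Python sketch illustrates an operation expression evaluated one operation at a time on a simple stack machine, with a context that carries the result kind, the current PC, an optional initial stack, and an optional object address for DW_OP_push_object_address. The `Context`, `ResultKind`, and `evaluate` names and the tuple encoding of operations are hypothetical choices for illustration, not part of DWARF or of any consumer's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class ResultKind(Enum):
    VALUE = "value"        # expression must produce a value
    LOCATION = "location"  # expression must produce a location description


@dataclass
class Context:
    """Hypothetical evaluation context; the field names are illustrative."""
    result_kind: ResultKind
    pc: int
    initial_stack: List[int] = field(default_factory=list)
    object_address: Optional[int] = None


Operation = Tuple[str, tuple]  # (opcode name, operands)


def evaluate(expr: List[Operation], ctx: Context) -> List[int]:
    """Evaluate one operation at a time; this sketch only models plain values."""
    stack = list(ctx.initial_stack)  # some contexts pre-initialize the stack
    for opcode, operands in expr:
        if opcode == "DW_OP_push_object_address":
            if ctx.object_address is None:
                raise ValueError("no object address available in this context")
            stack.append(ctx.object_address)
        elif opcode == "DW_OP_plus_uconst":
            stack.append(stack.pop() + operands[0])
        else:
            raise NotImplementedError(opcode)
    return stack


ctx = Context(ResultKind.VALUE, pc=0x400, object_address=0x1000)
expr = [("DW_OP_push_object_address", ()), ("DW_OP_plus_uconst", (0x10,))]
print(hex(evaluate(expr, ctx)[-1]))  # 0x1010
```

The same expression evaluates differently depending only on what the context supplies, which is why DWARF ties evaluation to where an expression is used.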

3.2 Examples

The following examples illustrate how DWARF expressions involving operations are evaluated in DWARF 5. DWARF also has expressions involving location lists that are not covered in these examples.

3.2.1 Dynamic Array Size

The first example is for an operation expression associated with a DIE attribute that provides the number of elements in a dynamic array type. Such an attribute dictates that the expression must be evaluated in the context of providing a value result kind.

Dynamic Array Size Example

In this hypothetical example, the compiler has allocated an array descriptor in memory and placed the descriptor's address in architecture register SGPR0. The first location of the array descriptor is the runtime size of the array.

A possible expression to retrieve the dynamic size of the array is:

DW_OP_regval_type SGPR0 Generic
DW_OP_deref

The expression is evaluated one operation at a time. Operations have operands and can pop and push entries on a stack.

Dynamic Array Size Example: Step 1

The expression evaluation starts with the first operation, DW_OP_regval_type. This operation reads the current value of the architecture register specified by its first operand: SGPR0. The second operand specifies the type of the data to read, which determines its size; here it is the generic type. The read value is pushed on the stack. Each stack element is a value and its associated type.

Dynamic Array Size Example: Step 2

The type must be a DWARF base type. It specifies the encoding, byte ordering, and size of values of the type. DWARF defines that each architecture has a default generic type: it has an architecture-specific integral encoding and byte ordering, and it is the size of the architecture's global memory address.
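A minimal sketch of a typed stack entry, and of DW_OP_regval_type pushing one, is shown below. It assumes a hypothetical architecture whose generic type is 8 bytes and little-endian. `BaseType`, `TypedValue`, `GENERIC`, and `dw_op_regval_type` are illustrative names, and truncating the register contents to the type's size is also an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseType:
    name: str
    encoding: str    # e.g. "integral", "signed", "float"
    byte_order: str  # "little" or "big"
    size: int        # size in bytes


# Assumption: a hypothetical architecture with 8-byte little-endian global
# addresses; DWARF only says the generic type is integral and address-sized.
GENERIC = BaseType("generic", "integral", "little", 8)


@dataclass(frozen=True)
class TypedValue:
    value: int
    type: BaseType  # every stack element carries its base type


def dw_op_regval_type(stack, registers, reg, base_type):
    """Read a register as base_type and push the typed value on the stack."""
    # Assumption: keep only the low-order bytes that fit the type's size.
    raw = registers[reg] & ((1 << (8 * base_type.size)) - 1)
    stack.append(TypedValue(raw, base_type))


registers = {"SGPR0": 0x0000_7FFF_0000_1000}  # hypothetical descriptor address
stack = []
dw_op_regval_type(stack, registers, "SGPR0", GENERIC)
print(hex(stack[-1].value))  # 0x7fff00001000
```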

The DW_OP_deref operation pops a value off the stack, treats it as a global memory address, and reads the contents of that location using the generic type. It pushes the read value on the stack as the value and its associated generic type.

Dynamic Array Size Example: Step 3

The evaluation stops when it reaches the end of the expression. The result of an expression that is evaluated with a value result kind context is the top element of the stack, which provides the value and its type.
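Putting the steps together, the following Python sketch evaluates the dynamic array size expression against a hypothetical register file and memory. It assumes a generic type of 8 little-endian bytes, represents memory as a dictionary from address to byte, and models each stack entry as a `(value, type_name)` pair; none of these names come from a real DWARF consumer.

```python
# Assumptions (hypothetical): the generic type is 8 bytes, little-endian;
# memory is a dict of address -> byte; registers are a dict of name -> int.
GENERIC_SIZE = 8


def read_memory(memory, address, size):
    return bytes(memory[address + i] for i in range(size))


def evaluate(expr, registers, memory):
    stack = []  # each entry is (value, type_name)
    for op, *operands in expr:
        if op == "DW_OP_regval_type":
            reg, type_name = operands
            stack.append((registers[reg], type_name))
        elif op == "DW_OP_deref":
            address, _ = stack.pop()  # popped value is a global memory address
            raw = read_memory(memory, address, GENERIC_SIZE)
            stack.append((int.from_bytes(raw, "little"), "Generic"))
        else:
            raise NotImplementedError(op)
    # Value result kind: the result is the top stack element (value and type).
    return stack[-1]


descriptor = 0x2000  # hypothetical descriptor address held in SGPR0
memory = {descriptor + i: b for i, b in enumerate((42).to_bytes(8, "little"))}
registers = {"SGPR0": descriptor}
expr = [("DW_OP_regval_type", "SGPR0", "Generic"), ("DW_OP_deref",)]
print(evaluate(expr, registers, memory))  # (42, 'Generic')
```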

3.2.2 Variable Location in Register

This example is for an operation expression associated with a DIE attribute that provides the location of a source language variable. Such an attribute dictates that the expression must be evaluated in the context of providing a location result kind.

DWARF defines the locations of objects in terms of location descriptions.

In this example, the compiler has allocated a source language variable in architecture register SGPR0.

Variable Location in Register Example

A possible expression to specify the location of the variable is:

DW_OP_regx SGPR0

Variable Location in Register Example: Step 1

The DW_OP_regx operation creates a location description that specifies the location of the architecture register specified by the operand: SGPR0. Unlike values, location descriptions are not pushed on the stack; instead, they are conceptually placed in a location area. Location descriptions also have no associated type; they only denote the location of the base of the object.

Variable Location in Register Example: Step 2

Again, evaluation stops when it reaches the end of the expression. The result of an expression that is evaluated with a location result kind context is the location description in the location area.
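The separation between the value stack and the location area in DWARF 5 can be sketched as follows. The `RegisterLocation` class and `evaluate_location` function are hypothetical; the point is only that DW_OP_regx places a location description in the location area rather than pushing anything on the stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterLocation:
    register: str  # denotes where the object lives; no type is attached


def evaluate_location(expr):
    stack = []          # typed values; DW_OP_regx never pushes here
    location_area = []  # DWARF 5 keeps location descriptions apart from the stack
    for op, *operands in expr:
        if op == "DW_OP_regx":
            location_area.append(RegisterLocation(operands[0]))
        else:
            raise NotImplementedError(op)
    assert not stack  # nothing was pushed on the value stack
    # Location result kind: the result is the location description in the area.
    return location_area[-1]


print(evaluate_location([("DW_OP_regx", "SGPR0")]))  # RegisterLocation(register='SGPR0')
```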

3.2.3 Variable Location in Memory

The next example is for an operation expression associated with a DIE attribute that provides the location of a source language variable that is allocated in a stack frame. The compiler has placed the stack frame pointer in architecture register SGPR0, and allocated the variable at offset 0x10 from the stack frame base. The stack frames are allocated in global memory, so SGPR0 contains a global memory address.

Variable Location in Memory Example

A possible expression to specify the location of the variable is:

DW_OP_regval_type SGPR0 Generic
DW_OP_plus_uconst 0x10

Variable Location in Memory Example: Step 1

As in the previous example, the DW_OP_regval_type operation pushes the stack frame pointer global memory address onto the stack. The generic type is the size of a global memory address.

Variable Location in Memory Example: Step 2

The DW_OP_plus_uconst operation pops a value from the stack, which must have a type with an integral encoding, adds the value of its operand, and pushes the result back on the stack with the same associated type. In this example, that computes the global memory address of the source language variable.
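DW_OP_plus_uconst can be sketched as follows. The `TypedValue` class and the set of integral encodings are hypothetical, and wrapping the sum to the width of the type is an assumption made by this sketch rather than something stated in the text.

```python
from dataclasses import dataclass

INTEGRAL_ENCODINGS = {"integral", "signed", "unsigned"}  # illustrative names


@dataclass(frozen=True)
class TypedValue:
    value: int
    encoding: str
    size: int  # bytes


def dw_op_plus_uconst(stack, addend):
    top = stack.pop()
    if top.encoding not in INTEGRAL_ENCODINGS:
        raise TypeError("DW_OP_plus_uconst requires an integral type")
    # Assumption: the sum wraps to the type's size, like fixed-width arithmetic.
    mask = (1 << (8 * top.size)) - 1
    # The result keeps the same associated type as the popped value.
    stack.append(TypedValue((top.value + addend) & mask, top.encoding, top.size))


stack = [TypedValue(0x7FFF0000, "integral", 8)]  # hypothetical frame pointer
dw_op_plus_uconst(stack, 0x10)
print(hex(stack[-1].value))  # 0x7fff0010
```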

Variable Location in Memory Example: Step 3

Evaluation stops when it reaches the end of the expression. If the expression that is evaluated has a location result kind context, and the location area is empty, then the top stack element must be a value with the generic type. The value is implicitly popped from the stack, and treated as a global memory address to create a global memory location description, which is placed in the location area. The result of the expression is the location description in the location area.
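The end-of-expression rule for a location result kind can be sketched as below. Stack entries are modeled as hypothetical `(value, type_name)` pairs, and `MemoryLocation`, `finish_location`, and `evaluate` are illustrative names, not a consumer's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLocation:
    address: int  # address in the global memory address space


def finish_location(stack, location_area):
    """Apply the DWARF 5 end-of-expression rule for a location result kind."""
    if not location_area:
        value, type_name = stack.pop()  # implicitly pop the top stack value
        if type_name != "Generic":
            raise TypeError("implicit memory location needs a generic-typed value")
        location_area.append(MemoryLocation(value))
    return location_area[-1]


def evaluate(expr, registers):
    stack, location_area = [], []
    for op, *operands in expr:
        if op == "DW_OP_regval_type":
            stack.append((registers[operands[0]], operands[1]))
        elif op == "DW_OP_plus_uconst":
            value, type_name = stack.pop()
            stack.append((value + operands[0], type_name))
        else:
            raise NotImplementedError(op)
    return finish_location(stack, location_area)


registers = {"SGPR0": 0x8000}  # hypothetical stack frame base
expr = [("DW_OP_regval_type", "SGPR0", "Generic"), ("DW_OP_plus_uconst", 0x10)]
print(hex(evaluate(expr, registers).address))  # 0x8010
```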

Variable Location in Memory Example: Step 4

3.2.4 Variable Spread Across Different Locations

This example is for a source variable that is partly in a register, partly undefined, and partly in memory.

Variable Spread Across Different Locations Example

DWARF defines composite location descriptions that can have one or more parts. Each part specifies a location description and the number of bytes used from it. The following operation expression creates a composite location description.

DW_OP_regx SGPR3
DW_OP_piece 4
DW_OP_piece 2
DW_OP_bregx SGPR0 0x10
DW_OP_piece 2

Variable Spread Across Different Locations Example: Step 1

The DW_OP_regx operation creates a register location description in the location area.

Variable Spread Across Different Locations Example: Step 2

The first DW_OP_piece operation creates an incomplete composite location description in the location area with a single part. The location description in the location area is used to define the beginning of the part for the size specified by the operand, namely 4 bytes.

Variable Spread Across Different Locations Example: Step 3

A subsequent DW_OP_piece adds a new part to an incomplete composite location description already in the location area. The parts form a contiguous set of bytes. If there are no other location descriptions in the location area and no value on the stack, then the part implicitly uses the undefined location description. Again, the operand specifies the size of the part in bytes. The undefined location description can be used to indicate a part that has been optimized away; in this case, 2 bytes of undefined value.

Variable Spread Across Different Locations Example: Step 4

The DW_OP_bregx operation reads the architecture register specified by the first operand (SGPR0) as the generic type, adds the value of the second operand (0x10), and pushes the result on the stack.
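DW_OP_bregx can be sketched in a few lines. In DWARF 5 its register operand is an unsigned LEB128 number and its offset operand is a signed LEB128 number, so the offset may be negative. The 64-bit generic type and the `dw_op_bregx` helper are assumptions for illustration.

```python
GENERIC_BITS = 64  # assumption: hypothetical 64-bit generic type


def dw_op_bregx(stack, registers, reg, offset):
    """Push register contents (read as the generic type) plus a signed offset."""
    mask = (1 << GENERIC_BITS) - 1  # keep the result within the generic type
    stack.append(((registers[reg] + offset) & mask, "Generic"))


stack = []
dw_op_bregx(stack, {"SGPR0": 0x8000}, "SGPR0", 0x10)
print(hex(stack[-1][0]))  # 0x8010
```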

Variable Spread Across Different Locations Example: Step 5

The next DW_OP_piece operation adds another part to the already created incomplete composite location description.

If there is no other location description in the location area but there is a value on the stack, the new part is a memory location description. The memory address used is popped from the stack. In this case, the operand of 2 indicates that the part is 2 bytes of memory.

Variable Spread Across Different Locations Example: Step 6

Evaluation stops when it reaches the end of the expression. If the expression that is evaluated has a location result kind context, and the location area has an incomplete composite location description, the incomplete composite location is implicitly converted to a complete composite location description. The result of the expression is the location description in the location area.
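The whole composite example can be traced with the following Python sketch, which applies the three DW_OP_piece cases from the steps above: use a location description from the location area if there is one, otherwise treat a stack value as a memory address, otherwise use the undefined location description. The location classes, the `Composite` record, and the tuple encoding of operations are hypothetical, and stack values are modeled as plain integers of the generic type.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RegisterLocation:
    register: str


@dataclass(frozen=True)
class MemoryLocation:
    address: int


@dataclass(frozen=True)
class UndefinedLocation:
    pass


@dataclass
class Composite:
    parts: List[Tuple[object, int]]  # (location description, size in bytes)
    complete: bool = False


def evaluate(expr, registers):
    stack: List[int] = []             # generic-typed values as plain ints
    location_area: List[object] = []  # non-composite location descriptions
    composite: Optional[Composite] = None  # the incomplete composite, if any
    for op, *operands in expr:
        if op == "DW_OP_regx":
            location_area.append(RegisterLocation(operands[0]))
        elif op == "DW_OP_bregx":
            stack.append(registers[operands[0]] + operands[1])
        elif op == "DW_OP_piece":
            # Choose the part's location using the DWARF 5 piece rules.
            if location_area:
                part = location_area.pop()
            elif stack:
                part = MemoryLocation(stack.pop())
            else:
                part = UndefinedLocation()
            if composite is None:
                composite = Composite([])
            composite.parts.append((part, operands[0]))
        else:
            raise NotImplementedError(op)
    if composite is None:
        raise ValueError("this sketch only handles composite results")
    composite.complete = True  # implicit conversion at the end of evaluation
    return composite


registers = {"SGPR0": 0x8000, "SGPR3": 0}  # hypothetical register contents
expr = [("DW_OP_regx", "SGPR3"), ("DW_OP_piece", 4), ("DW_OP_piece", 2),
        ("DW_OP_bregx", "SGPR0", 0x10), ("DW_OP_piece", 2)]
result = evaluate(expr, registers)
for location, size in result.parts:
    print(location, size)
```

Keeping the incomplete composite separate from the location area in this sketch mirrors the rule that a new part only consults "other" location descriptions in the area.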