Application developers spend a significant amount of time debugging the applications that they create. Hence it is important that a compiler provides support for a good debug experience. DWARF[1] is the standard debugging file format used by compilers and debuggers. The LLVM infrastructure supports debug info generation using metadata[2]. Support for generating debug metadata is present in MLIR by way of MLIR attributes. Flang can leverage these MLIR attributes to generate good debug information.
We can break the work for debug generation into two separate tasks: line table generation and full debug info generation.
By default, Flang will not generate any debug or line table information. Debug information will be generated if one of the following flags is present.
-gline-tables-only, -g1 : Emit debug line number tables only
-g : Emit full debug info
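The flag handling above can be sketched as a small lookup. This is only an illustration: the `DebugLevel` enum, the `debug_level` helper and the "last flag wins" rule are assumptions made for the sketch, not part of the Flang driver.

```python
from enum import Enum


class DebugLevel(Enum):
    NONE = 0              # default: no debug or line table information
    LINE_TABLES_ONLY = 1  # -gline-tables-only, -g1
    FULL = 2              # -g


# Flags taken from the list above; the table itself is illustrative.
_FLAG_LEVELS = {
    "-gline-tables-only": DebugLevel.LINE_TABLES_ONLY,
    "-g1": DebugLevel.LINE_TABLES_ONLY,
    "-g": DebugLevel.FULL,
}


def debug_level(args):
    """Return the debug level implied by a list of command-line arguments."""
    level = DebugLevel.NONE
    for arg in args:
        if arg in _FLAG_LEVELS:
            # Assumption: when several debug flags are given, the last one wins.
            level = _FLAG_LEVELS[arg]
    return level
```

For instance, `debug_level(["-O2", "-g1"])` yields `DebugLevel.LINE_TABLES_ONLY`, while an argument list without any of the flags yields `DebugLevel.NONE`.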
There is an existing AddDebugFoundationPass which adds a FusedLoc with a SubprogramAttr on FuncOp. This allows MLIR to generate LLVM IR metadata for that function. However, the following values are hardcoded at the moment. These will instead be passed from the driver.
Calling convention: DW_CC_normal by default and DW_CC_program if it is the main program.
DISubroutineTypeAttr currently has a fixed type. This will be changed to match the signature of the actual function/subroutine.
Full debug info will include metadata to describe functions, variables and types. Flang will generate debug metadata in the form of MLIR attributes. These attributes will be converted to the format expected by LLVM IR in DebugTranslation[4].
Debug metadata generation can be broken down into two steps.
1. MLIR attributes are generated by reading information from the AST or FIR. This step can happen anytime before or during conversion to the LLVM dialect. Examples of the metadata generated in this step are DILocalVariableAttr and DIDerivedTypeAttr.
2. Changes that can only happen during or after conversion to the LLVM dialect. One example is passing DIGlobalVariableExpressionAttr while creating LLVM::GlobalOp. Another example is the generation of DbgDeclareOp, which is required for local variables. It can only be created after conversion to the LLVM dialect as it requires the LLVM.Ptr type.
The changes required for step 2 are quite minimal. The bulk of the work happens in step 1.
One design decision that we need to make is to decide where to perform step 1. Here are some possible options:
During conversion to LLVM dialect
Pros:
Cons:
DeclareOp is removed before this pass.
Even if DeclareOp is retained, creating debug metadata while some ops have been converted to the LLVM dialect and others have not may cause its own issues. We have to walk the ops chain to extract the information, which may be problematic in this case.
During a pass before conversion to LLVM dialect
This is similar to what AddDebugFoundationPass is currently doing.
Pros:
Cons:
During Lowering from AST
Pros:
Cons:
The design below assumes that we are extracting the information from FIR. If we generate debug metadata during lowering, the description below may need to change, although the generated metadata remains the same in both cases.
The AddDebugFoundationPass will be renamed to AddDebugInfo pass. The information mentioned in the line info section above will be passed to it from the driver. This pass will run quite late in the pipeline but before DeclareOp is removed.
In this pass, we will iterate through the GlobalOp, TypeInfoOp, FuncOp and DeclareOp ops to extract the source information and build the MLIR attributes. A class will be added to handle conversion of MLIR and FIR types to DITypeAttr.
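As a rough illustration of the type conversion class mentioned above, the sketch below maps textual scalar types to a stand-in for a basic debug type and caches the result. The names `BasicType` and `DebugTypeConverter` are hypothetical, and only `iN` and `fN` spellings are handled; the real class would work on MLIR and FIR type objects and cover derived types, arrays and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicType:
    """Stand-in for a basic debug type attribute (hypothetical)."""
    name: str
    size_in_bits: int
    encoding: str


class DebugTypeConverter:
    """Hypothetical converter from textual scalar types to debug types."""

    def __init__(self):
        self._cache = {}  # one debug type per distinct source type

    def convert(self, type_str):
        if type_str not in self._cache:
            self._cache[type_str] = self._convert_uncached(type_str)
        return self._cache[type_str]

    def _convert_uncached(self, type_str):
        width = type_str[1:]
        if type_str.startswith("i") and width.isdigit():
            return BasicType("integer", int(width), "DW_ATE_signed")
        if type_str.startswith("f") and width.isdigit():
            return BasicType("real", int(width), "DW_ATE_float")
        # Anything else is outside the scope of this sketch.
        raise NotImplementedError(f"no debug type for {type_str!r}")
```

Caching matters because the same type is typically referenced by many variables, and each should point to a single debug type entry.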
The following sections provide details of how various language constructs will be handled. In these sections, the LLVM IR metadata and MLIR attributes are used interchangeably. As an example, DILocalVariableAttr is an MLIR attribute which gets translated to LLVM IR's DILocalVariable.
In MLIR, local variables are represented by DILocalVariableAttr, which stores information like source location and type. They also require a DbgDeclareOp, which binds the DILocalVariableAttr to a location.
In FIR, DeclareOp has source information about the variable. The DeclareOp will be processed to create a DILocalVariableAttr. This attribute is attached to the memref op of the DeclareOp using a FusedLoc approach.
During conversion to the LLVM dialect, when an op is encountered that has a DILocalVariableAttr in its FusedLoc, a DbgDeclareOp is created which binds the attribute to its location.
The changes in the IR look as follows:
    // original fir
    %2 = fir.alloca i32 loc(#loc4)
    %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}

    // Fir with FusedLoc.
    %2 = fir.alloca i32 loc(#loc38)
    %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
    #di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
    #loc38 = loc(fused<#di_local_variable5>[#loc4])

    // After conversion to llvm dialect
    #di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
    %1 = llvm.alloca %0 x i64
    llvm.intr.dbg.declare #di_local_variable = %1
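The two phases for local variables can be modelled in a few lines. This is a toy model, not MLIR code: `LocalVariable`, `FusedLoc`, `Op`, `attach_debug_info` and `lower` are hypothetical stand-ins chosen to mirror the IR shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalVariable:
    """Stand-in for DILocalVariableAttr."""
    name: str
    line: int


@dataclass
class FusedLoc:
    base: str                                  # e.g. "#loc4"
    metadata: Optional[LocalVariable] = None   # fused debug attribute, if any


@dataclass
class Op:
    name: str                      # e.g. "fir.alloca" or "fir.declare"
    result: str                    # e.g. "%2"
    loc: FusedLoc
    operand: Optional["Op"] = None  # memref for a declare op


def attach_debug_info(declare, var):
    """Phase 1: fuse the variable attribute into the memref op's location."""
    memref = declare.operand
    memref.loc = FusedLoc(memref.loc.base, var)


def lower(ops):
    """Phase 2: emit a dbg.declare next to each lowered op that carries one."""
    out = []
    for op in ops:
        if op.name == "fir.alloca":
            out.append(f"{op.result} = llvm.alloca")
            if op.loc.metadata is not None:
                out.append(
                    f"llvm.intr.dbg.declare {op.loc.metadata.name} = {op.result}"
                )
        # fir.declare ops produce no output in this toy lowering.
    return out
```

The point of the split is that phase 1 needs the DeclareOp (source information) while phase 2 needs the converted pointer value, and the fused location is what carries the attribute between them.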
Arguments work in a similar way, but they present a difficulty: the DeclareOp's memref points to a BlockArgument. Unlike the op in the local variable case, BlockArguments are not handled by FIRToLLVMLowering. This can be handled by adding the DbgDeclareOp for arguments after conversion to the LLVM dialect, either in FIRToLLVMLowering or in a separate pass.
In debug metadata, the Fortran module will be represented by DIModuleAttr. The variables or functions inside the module will have a scope pointing to the parent module.
    module helper
      real glr
      ...
    end module helper

    !1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
    !2 = !DIModule(scope: !1, name: "helper" ...)
    !3 = !DIGlobalVariable(scope: !2, name: "glr" ...)

Use of a module results in the following metadata.

    !4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
Modules are not first class entities in FIR, so there is no way to get the location where they are declared in the source file. However, the fact that a variable or function is part of a module, along with the name of that module, can be extracted from its mangled name. There is a GlobalOp generated for each module variable in FIR, and there is also a DeclareOp in each function where the module variable is used.
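To illustrate extracting the module from a mangled name, the sketch below splits a name such as `_QMhelperFchangeEi` (taken from the local variable example) into its parts. It assumes that only the tags visible in that name occur (`M` for module, `F` for procedure, `E` for the variable) and that the names themselves are lower case; `parse_unique_name` is a hypothetical helper, not an existing Flang function.

```python
import re

# Assumption: only the tags visible in the example name are handled.
_TAGS = {"M": "module", "F": "procedure", "E": "variable"}


def parse_unique_name(uniq_name):
    """Split a mangled name into its tagged parts."""
    if not uniq_name.startswith("_Q"):
        raise ValueError(f"not a mangled name: {uniq_name!r}")
    parts = {}
    # Each part is an upper-case tag followed by a lower-case identifier.
    for tag, name in re.findall(r"([A-Z])([a-z0-9_]+)", uniq_name[2:]):
        if tag in _TAGS:
            parts[_TAGS[tag]] = name
    return parts
```

With this, a global whose parts contain a `module` entry but no `procedure` entry would be treated as a module variable. The name `_QMhelperEglr` used in the test for the `glr` variable is formed by analogy with the example and is an assumption.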
We will use the GlobalOp to generate the DIModuleAttr and the associated DIGlobalVariableAttr. A DeclareOp for a module variable will be used to generate a DIImportedEntityAttr. Care will be taken to avoid generating duplicate DIImportedEntityAttr entries in the same function.
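Avoiding duplicate imported entities amounts to remembering which (function, module) pairs have already been seen. A minimal sketch, with the hypothetical helper `imported_modules` taking one pair per DeclareOp of a module variable:

```python
def imported_modules(declares):
    """Return one (function, module) pair per import, in first-seen order.

    declares: iterable of (function_name, module_name) pairs, one for each
    DeclareOp that refers to a module variable.
    """
    seen = set()
    imports = []
    for func, module in declares:
        key = (func, module)
        if key not in seen:
            seen.add(key)
            imports.append(key)
    return imports
```

A function that uses two variables from the same module thus gets a single import entry, while two functions using the same module each get their own.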
A derived type will be represented in metadata by DICompositeType with a tag of DW_TAG_structure_type. It will have elements which point to the components.
    type :: t_pair
      integer :: i
      real :: x
    end type

    !1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
    !2 = !{!3, !4}
    !3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
    !4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
    !5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
    !6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
In FIR, RecordType and TypeInfoOp can be used to get information about the location of the derived type and the types of its components. We may also use FusedLoc on TypeInfoOp to encode location information for all the components of the derived type.
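The `size` and `offset` values of the member entries in the `t_pair` example can be reproduced with a simple layout calculation. This sketch assumes each component is placed at the next offset that satisfies its alignment; in the compiler the sizes and alignments would come from the target data layout, and `layout_members` is a hypothetical helper.

```python
def layout_members(components):
    """Compute bit offsets for derived type components.

    components: list of (name, size_in_bits, align_in_bits) tuples.
    Assumption: each component starts at the next suitably aligned offset.
    """
    members = []
    offset = 0
    for name, size, align in components:
        offset = (offset + align - 1) // align * align  # round up to alignment
        members.append({"name": name, "size": size, "offset": offset})
        offset += size
    return members
```

For `t_pair`, `layout_members([("i", 32, 32), ("x", 32, 32)])` gives offsets 0 and 32, matching the `DW_TAG_member` entries above.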
A common block will be represented in metadata by DICommonBlockAttr, which will be used as the scope by the variables inside the common block. DIExpression can be used to give the offset of any given variable inside the global storage for the common block.
    integer a, b
    common /test/ a, b

    ;@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !8
    !1 = !DISubprogram()
    !2 = !DICommonBlock(scope: !1, name: "test" ...)
    !3 = !DIGlobalVariable(scope: !2, name: "a" ...)
    !4 = !DIExpression()
    !5 = !DIGlobalVariableExpression(var: !3, expr: !4)
    !6 = !DIGlobalVariable(scope: !2, name: "b" ...)
    !7 = !DIExpression(DW_OP_plus_uconst, 4)
    !8 = !DIGlobalVariableExpression(var: !6, expr: !7)
In FIR, a common block results in a GlobalOp with common linkage. Every function where the common block is used has a DeclareOp for each variable. This DeclareOp will point to the global storage through CoordinateOp and AddrOfOp. The CoordinateOp has the offset of the location of this variable in the global storage. There is enough information to generate the required metadata, although it requires walking up the chain from the DeclareOp to locate the CoordinateOp and AddrOfOp.
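Once the byte offset of each variable has been recovered from the CoordinateOp, building the expressions is straightforward. In the sketch below, expressions are modelled as tuples and `common_block_expressions` is a hypothetical helper; a variable at offset 0 gets an empty expression, as for `a` in the example above.

```python
def common_block_expressions(variables):
    """Map each common block variable to the expression locating it.

    variables: list of (name, byte_offset) pairs, where byte_offset is the
    offset of the variable inside the common block's global storage.
    """
    exprs = {}
    for name, offset in variables:
        # Offset 0 needs no operation; otherwise add the constant offset.
        exprs[name] = () if offset == 0 else ("DW_OP_plus_uconst", offset)
    return exprs
```

For `common /test/ a, b` with two 4-byte integers, `common_block_expressions([("a", 0), ("b", 4)])` reproduces the two `DIExpression` entries in the metadata above.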
The type of a fixed size array is represented using DICompositeType. The DISubrangeAttr is used to provide the bounds in any given dimension.
    integer abc(4,5)

    !1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
    !2 = !{ !3, !4 }
    !3 = !DISubrange(lowerBound: 1, upperBound: 4 ...)
    !4 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
    !5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
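The subranges for a fixed size array follow directly from its extents. A small sketch, assuming the extents are known as integers and using Fortran's default lower bound of 1 (`fixed_array_subranges` is a hypothetical helper, and subranges are modelled as dictionaries):

```python
def fixed_array_subranges(extents, lower_bound=1):
    """Return one subrange per dimension for a fixed size array."""
    return [
        {"lowerBound": lower_bound, "upperBound": lower_bound + extent - 1}
        for extent in extents
    ]
```

For `integer abc(4,5)`, `fixed_array_subranges([4, 5])` gives the bounds 1..4 and 1..5 shown in the metadata.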
The debug metadata for an adjustable array looks similar to that of a fixed size array with one change: the bounds are not constant values but point to a DILocalVariableAttr.
In FIR, the DeclareOp points to a ShapeOp, and we can walk the chain to get the value that represents the array bound in any dimension. We will create a DILocalVariableAttr that points to that location. This variable will be used in the DISubrangeAttr. Note that this DILocalVariableAttr does not correspond to any source variable.
An assumed size array is treated as a raw array. Debug information will not provide any upper bound information for the last dimension.
The assumed shape array will use a representation similar to that of a fixed size array, but there will be two differences.
1. There will be a datalocation field which will be an expression. This will enable the debugger to get the data pointer from the array descriptor.
2. The field in DISubrangeAttr for array bounds will be an expression, which will allow the debugger to get the bounds from the descriptor.
    integer(4), intent(out) :: a(:,:)

    !1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !2, dataLocation: !3)
    !2 = !{!5, !7}
    !3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
    !4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
    !5 = !DISubrange(lowerBound: 1, upperBound: !4 ...)
    !6 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 56, DW_OP_deref)
    !7 = !DISubrange(lowerBound: 1, upperBound: !6, ...)
    !8 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
In the assumed shape case, the rank can be determined from FIR's SequenceType. This allows us to generate a DISubrangeAttr in each dimension.
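Given the rank, the per-dimension expressions can be generated in a loop. The sketch below is an illustration only: the descriptor layout constants are assumptions inferred from the offsets 32 and 56 in the example above (a 24-byte header followed by 24-byte per-dimension entries with the extent 8 bytes into each entry), and `extent_expression` and `subrange_expressions` are hypothetical helpers.

```python
# Assumed descriptor layout, inferred from the offsets in the example above.
HEADER_SIZE = 24    # bytes before the first per-dimension entry
DIM_SIZE = 24       # bytes per dimension entry
EXTENT_OFFSET = 8   # position of the extent inside a dimension entry


def extent_expression(dim):
    """Expression that reads the extent of dimension `dim` (0-based)."""
    offset = HEADER_SIZE + dim * DIM_SIZE + EXTENT_OFFSET
    return (
        "DW_OP_push_object_address",
        "DW_OP_plus_uconst",
        offset,
        "DW_OP_deref",
    )


def subrange_expressions(rank):
    """One upper bound expression per dimension of an assumed shape array."""
    return [extent_expression(dim) for dim in range(rank)]
```

For the rank 2 array in the example, the two generated expressions use offsets 32 and 56. A real implementation would take these offsets from the descriptor definition for the target rather than hardcoding them.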
Assumed rank arrays are currently unsupported in flang. Their representation will be similar to the array representation for assumed shape arrays, with the following differences.
1. DICompositeTypeAttr will have a rank field which will be an expression. It will be used to get the rank value from the descriptor.
2. Instead of a DISubrangeAttr for each dimension, there will be a single DIGenericSubrange which will allow debuggers to calculate bounds in any dimension.
Pointers and allocatables will be represented using DICompositeTypeAttr. It is a quirk of DWARF that scalar allocatable or pointer variables will show up in the debug info as a pointer to a scalar, while array pointer or allocatable variables show up as arrays. The behavior is the same in gfortran and classic flang.
    integer, allocatable :: ar(:)
    integer, pointer :: sc

    !1 = !DILocalVariable(name: "sc", type: !2)
    !2 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !3, associated: !9 ...)
    !3 = !DIBasicType(tag: DW_TAG_base_type, name: "integer", ...)
    !4 = !DILocalVariable(name: "ar", type: !5 ...)
    !5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, elements: !6, dataLocation: !8, allocated: !9)
    !6 = !{!7}
    !7 = !DISubrange(lowerBound: !10, upperBound: !11 ...)
    !8 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
    !9 = !DIExpression(DW_OP_push_object_address, DW_OP_deref, DW_OP_lit0, DW_OP_ne)
    !10 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 24, DW_OP_deref)
    !11 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
In FIR, these variables are represented as !fir.box<!fir.heap<>> or !fir.box<!fir.ptr<>>. There is also an allocatable or pointer attribute on the DeclareOp. This allows us to generate the allocated/associated status of these variables. The metadata to get the information from the descriptor is similar to that for arrays.
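To show how a debugger would use these expressions, here is a toy evaluator for the handful of DWARF operations that appear in the metadata above. It is a sketch for illustration only: memory is modelled as a dictionary from address to value, and `evaluate` is a hypothetical helper, not part of any debugger API.

```python
def evaluate(expr, object_address, memory):
    """Evaluate a flat list of DWARF operations and operands.

    memory: dict mapping an address to the value stored there (toy model).
    """
    stack = []
    i = 0
    while i < len(expr):
        op = expr[i]
        if op == "DW_OP_push_object_address":
            stack.append(object_address)
        elif op == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        elif op == "DW_OP_plus_uconst":
            i += 1  # the next item is the constant operand
            stack.append(stack.pop() + expr[i])
        elif op == "DW_OP_lit0":
            stack.append(0)
        elif op == "DW_OP_ne":
            top = stack.pop()
            second = stack.pop()
            stack.append(1 if second != top else 0)
        else:
            raise ValueError(f"unsupported operation: {op}")
        i += 1
    return stack[-1]


# The allocated/associated expression from the example metadata.
ALLOCATED = [
    "DW_OP_push_object_address",
    "DW_OP_deref",
    "DW_OP_lit0",
    "DW_OP_ne",
]
```

With a descriptor at some address whose first word is a non-null data pointer, `evaluate(ALLOCATED, address, memory)` yields 1; with a null data pointer it yields 0, which is how the allocated or associated status is reported.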
The DIStringTypeAttr can represent both fixed size and allocatable strings. For the allocatable case, the stringLengthExpression and stringLocationExpression are used to provide the length and the location of the string respectively.
    character(len=:), allocatable :: var
    character(len=20) :: fixed

    !1 = !DILocalVariable(name: "var", type: !2)
    !2 = !DIStringType(name: "character(*)", stringLengthExpression: !4, stringLocationExpression: !3 ...)
    !3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
    !4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8)
    !5 = !DILocalVariable(name: "fixed", type: !6)
    !6 = !DIStringType(name: "character (20)", size: 160)
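The `size: 160` for the fixed string above is the length in characters converted to bits. A one-line sketch, assuming one byte per character by default (the `bytes_per_char` parameter and the helper name are hypothetical):

```python
def fixed_string_size_in_bits(length, bytes_per_char=1):
    """Size in bits of a fixed length character variable."""
    return length * bytes_per_char * 8
```

For `character(len=20)`, this gives 20 * 1 * 8 = 160 bits, matching the metadata.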
Variables introduced through association (e.g. a => b) will be treated like normal variables, although we may need to handle the case where the DeclareOp of one variable points to the DeclareOp of another variable.
FIR does not seem to have a way to extract information about namelists.
    namelist /abc/ x3, y3

    (gdb) p abc
    $1 = ( x3 = 100, y3 = 500 )
    (gdb) p x3
    $2 = 100
    (gdb) p y3
    $3 = 500
Even without namelist support, we should be able to see the values of individual variables like x3 and y3 in the above example. But we would not be able to evaluate the namelist and have the debugger print the values of all the variables in it, as shown above for abc.
Some metadata types that are needed for Fortran are present in LLVM IR but are absent from MLIR. A non-comprehensive list is given below.
1. DICommonBlockAttr
2. DIGenericSubrangeAttr
3. DISubrangeAttr in MLIR takes IntegerAttr at the moment, so it only works with fixed size arrays. It also needs to accept DIExpressionAttr or DILocalVariableAttr to support assumed shape and adjustable arrays.
4. DICompositeTypeAttr will need to have fields for datalocation, rank, allocated and associated.
5. DIStringTypeAttr
Tests will use fir-opt.
Tests with flang -fc1 will check that end-to-end debug info generation works.
GDB's gdb.fortran testsuite will be run with llvm-flang.