===================================== | |
The PDB TPI and IPI Streams | |
===================================== | |
.. contents:: | |
:local: | |
.. _tpi_intro: | |
Introduction | |
============ | |
The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
all types used in the program. Each is organized as a :ref:`header <tpi_header>`
followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
referenced from various streams and records throughout the PDB by their
:ref:`type index <type_indices>`. In general, the sequence of type records
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
(directed acyclic graph), which means that a type record B can only refer to
a type record A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases
where this property does not hold (particularly when dealing with object files
compiled with MASM), an implementation should try very hard to preserve it, as
it means the entire type graph can be constructed in a single pass.
.. important:: | |
Type records form a topologically sorted DAG (directed acyclic graph). | |
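
The ordering property can be checked in one forward pass. The following is a
minimal sketch, not LLVM's implementation: ``TypeRefs`` and
``isTopologicallySorted`` are hypothetical names, and it assumes that the type
indices referenced by each record have already been extracted and that all of
them point into the same stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a type record: only the type indices it
// refers to. Real records carry these inside their CodeView payload.
struct TypeRefs {
  std::vector<uint32_t> Referenced;
};

// Returns true if every record refers only to simple types (indices below
// TypeIndexBegin) or to records that appear earlier in the stream.
inline bool isTopologicallySorted(const std::vector<TypeRefs> &Records,
                                  uint32_t TypeIndexBegin) {
  for (size_t I = 0; I < Records.size(); ++I) {
    // The type index of a record is its position plus TypeIndexBegin.
    uint32_t Self = TypeIndexBegin + static_cast<uint32_t>(I);
    for (uint32_t Ref : Records[I].Referenced) {
      if (Ref >= Self)
        return false; // forward reference or self reference
    }
  }
  return true;
}
```

A consumer that finds this check failing cannot rely on single-pass
construction and would need to resolve forward references in a second pass.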
.. _tpi_ipi: | |
TPI vs IPI Stream | |
================= | |
Recent versions of the PDB format (that is, all versions covered by this
document) have two streams with identical layout, henceforth referred to as the
TPI stream and IPI stream. The rest of this document describes an on-disk
format that applies equally to the TPI Stream and the IPI Stream. The only
difference between the two is in *which* CodeView records are allowed to
appear in each one, summarized by the following table:

+----------------------+---------------------+
| TPI Stream           | IPI Stream          |
+======================+=====================+
| LF_POINTER           | LF_FUNC_ID          |
+----------------------+---------------------+
| LF_MODIFIER          | LF_MFUNC_ID         |
+----------------------+---------------------+
| LF_PROCEDURE         | LF_BUILDINFO        |
+----------------------+---------------------+
| LF_MFUNCTION         | LF_SUBSTR_LIST      |
+----------------------+---------------------+
| LF_LABEL             | LF_STRING_ID        |
+----------------------+---------------------+
| LF_ARGLIST           | LF_UDT_SRC_LINE     |
+----------------------+---------------------+
| LF_FIELDLIST         | LF_UDT_MOD_SRC_LINE |
+----------------------+---------------------+
| LF_ARRAY             |                     |
+----------------------+---------------------+
| LF_CLASS             |                     |
+----------------------+---------------------+
| LF_STRUCTURE         |                     |
+----------------------+---------------------+
| LF_INTERFACE         |                     |
+----------------------+---------------------+
| LF_UNION             |                     |
+----------------------+---------------------+
| LF_ENUM              |                     |
+----------------------+---------------------+
| LF_TYPESERVER2       |                     |
+----------------------+---------------------+
| LF_VFTABLE           |                     |
+----------------------+---------------------+
| LF_VTSHAPE           |                     |
+----------------------+---------------------+
| LF_BITFIELD          |                     |
+----------------------+---------------------+
| LF_METHODLIST        |                     |
+----------------------+---------------------+
| LF_PRECOMP           |                     |
+----------------------+---------------------+
| LF_ENDPRECOMP        |                     |
+----------------------+---------------------+

The usage of these records is described in more detail in | |
:doc:`CodeView Type Records <CodeViewTypes>`. | |
.. _type_indices: | |
Type Indices | |
============ | |
A type index is a 32-bit integer that uniquely identifies a type inside of an
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
value of the type index for the first type record from the TPI stream is given
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`,
although in practice this value is always equal to 0x1000 (4096).

Any type index with the high bit set is considered to come from the IPI stream,
although this appears to be more of a hack, and LLVM does not generate type
indices of this nature. They can, however, occasionally be observed in
Microsoft PDBs, so one should be prepared to handle them. Note that a set high
bit is sufficient, but not necessary, to conclude that a type index comes from
the IPI stream.

Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
to come from the appropriate stream, and any type index less than this is a
bitmask which can be decomposed as follows:
.. code-block:: none | |

  .---------------------------.------.----------.
  |           Unused          | Mode |   Kind   |
  '---------------------------'------'----------'
  |+32                        |+12   |+8        |+0

- **Kind** - A value from the following enum: | |
.. code-block:: c++ | |
enum class SimpleTypeKind : uint32_t { | |
None = 0x0000, // uncharacterized type (no type) | |
Void = 0x0003, // void | |
NotTranslated = 0x0007, // type not translated by cvpack | |
HResult = 0x0008, // OLE/COM HRESULT | |
SignedCharacter = 0x0010, // 8 bit signed | |
UnsignedCharacter = 0x0020, // 8 bit unsigned | |
NarrowCharacter = 0x0070, // really a char | |
WideCharacter = 0x0071, // wide char | |
Character16 = 0x007a, // char16_t | |
Character32 = 0x007b, // char32_t | |
SByte = 0x0068, // 8 bit signed int | |
Byte = 0x0069, // 8 bit unsigned int | |
Int16Short = 0x0011, // 16 bit signed | |
UInt16Short = 0x0021, // 16 bit unsigned | |
Int16 = 0x0072, // 16 bit signed int | |
UInt16 = 0x0073, // 16 bit unsigned int | |
Int32Long = 0x0012, // 32 bit signed | |
UInt32Long = 0x0022, // 32 bit unsigned | |
Int32 = 0x0074, // 32 bit signed int | |
UInt32 = 0x0075, // 32 bit unsigned int | |
Int64Quad = 0x0013, // 64 bit signed | |
UInt64Quad = 0x0023, // 64 bit unsigned | |
Int64 = 0x0076, // 64 bit signed int | |
UInt64 = 0x0077, // 64 bit unsigned int | |
Int128Oct = 0x0014, // 128 bit signed int | |
UInt128Oct = 0x0024, // 128 bit unsigned int | |
Int128 = 0x0078, // 128 bit signed int | |
UInt128 = 0x0079, // 128 bit unsigned int | |
Float16 = 0x0046, // 16 bit real | |
Float32 = 0x0040, // 32 bit real | |
Float32PartialPrecision = 0x0045, // 32 bit PP real | |
Float48 = 0x0044, // 48 bit real | |
Float64 = 0x0041, // 64 bit real | |
Float80 = 0x0042, // 80 bit real | |
Float128 = 0x0043, // 128 bit real | |
Complex16 = 0x0056, // 16 bit complex | |
Complex32 = 0x0050, // 32 bit complex | |
Complex32PartialPrecision = 0x0055, // 32 bit PP complex | |
Complex48 = 0x0054, // 48 bit complex | |
Complex64 = 0x0051, // 64 bit complex | |
Complex80 = 0x0052, // 80 bit complex | |
Complex128 = 0x0053, // 128 bit complex | |
Boolean8 = 0x0030, // 8 bit boolean | |
Boolean16 = 0x0031, // 16 bit boolean | |
Boolean32 = 0x0032, // 32 bit boolean | |
Boolean64 = 0x0033, // 64 bit boolean | |
Boolean128 = 0x0034, // 128 bit boolean | |
}; | |
- **Mode** - A value from the following enum: | |
.. code-block:: c++ | |
enum class SimpleTypeMode : uint32_t { | |
Direct = 0, // Not a pointer | |
NearPointer = 1, // Near pointer | |
FarPointer = 2, // Far pointer | |
HugePointer = 3, // Huge pointer | |
NearPointer32 = 4, // 32 bit near pointer | |
FarPointer32 = 5, // 32 bit far pointer | |
NearPointer64 = 6, // 64 bit near pointer | |
NearPointer128 = 7 // 128 bit near pointer | |
}; | |
Note that for pointers, the bitness is represented in the mode. So a ``void*``
would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for a
32-bit target but a type index with ``Mode=NearPointer64, Kind=Void`` if built
for a 64-bit target.

By convention, the type index for ``std::nullptr_t`` is constructed the same way
as the type index for ``void*``, but using the bitless enumeration value
``NearPointer``.
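
The decomposition described in this section can be sketched as follows. This is
an illustrative helper, not LLVM's API: ``DecodedTypeIndex``,
``decodeTypeIndex`` and the mask constants are hypothetical names, and the
default of 0x1000 for ``TypeIndexBegin`` is the value observed in practice
rather than a guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Field positions follow the diagram above: Kind occupies bits 0-7 and
// Mode occupies bits 8-11. All names here are illustrative.
constexpr uint32_t KindMask = 0x000000FFu;
constexpr uint32_t ModeMask = 0x00000F00u;
constexpr uint32_t ModeShift = 8;
constexpr uint32_t HighBit = 0x80000000u;

struct DecodedTypeIndex {
  bool HighBitSet;   // index was marked as coming from the IPI stream
  bool IsSimple;     // below TypeIndexBegin once the high bit is cleared
  uint32_t Mode;     // SimpleTypeMode value (meaningful only if IsSimple)
  uint32_t Kind;     // SimpleTypeKind value (meaningful only if IsSimple)
  uint32_t ArrayPos; // position in the record list (only if !IsSimple)
};

inline DecodedTypeIndex decodeTypeIndex(uint32_t Raw,
                                        uint32_t TypeIndexBegin = 0x1000) {
  DecodedTypeIndex D{};
  D.HighBitSet = (Raw & HighBit) != 0;
  uint32_t Index = Raw & ~HighBit; // clear the high bit before classifying
  D.IsSimple = Index < TypeIndexBegin;
  if (D.IsSimple) {
    D.Mode = (Index & ModeMask) >> ModeShift;
    D.Kind = Index & KindMask;
  } else {
    D.ArrayPos = Index - TypeIndexBegin;
  }
  return D;
}
```

For example, ``decodeTypeIndex(0x0603)`` yields ``Mode == 6``
(``NearPointer64``) and ``Kind == 0x03`` (``Void``), which is the type index of
``void*`` in a 64-bit build.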
.. _tpi_header: | |
Stream Header | |
============= | |
The TPI Stream begins, at offset 0, with a header that has the following layout:
.. code-block:: c++ | |
struct TpiStreamHeader { | |
uint32_t Version; | |
uint32_t HeaderSize; | |
uint32_t TypeIndexBegin; | |
uint32_t TypeIndexEnd; | |
uint32_t TypeRecordBytes; | |
uint16_t HashStreamIndex; | |
uint16_t HashAuxStreamIndex; | |
uint32_t HashKeySize; | |
uint32_t NumHashBuckets; | |
int32_t HashValueBufferOffset; | |
uint32_t HashValueBufferLength; | |
int32_t IndexOffsetBufferOffset; | |
uint32_t IndexOffsetBufferLength; | |
int32_t HashAdjBufferOffset; | |
uint32_t HashAdjBufferLength; | |
}; | |
- **Version** - A value from the following enum. | |
.. code-block:: c++ | |
enum class TpiStreamVersion : uint32_t { | |
V40 = 19950410, | |
V41 = 19951122, | |
V50 = 19961031, | |
V70 = 19990903, | |
V80 = 20040203, | |
}; | |
Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be | |
``V80``, and no other values have been observed. It is assumed that should | |
another value be observed, the layout described by this document may not be | |
accurate. | |
- **HeaderSize** - ``sizeof(TpiStreamHeader)`` | |
- **TypeIndexBegin** - The numeric value of the type index representing the | |
first type record in the TPI stream. This is usually the value 0x1000 as type | |
indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for | |
a discussion of reserved type indices). | |
- **TypeIndexEnd** - One greater than the numeric value of the type index | |
representing the last type record in the TPI stream. The total number of type | |
records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``. | |
- **TypeRecordBytes** - The number of bytes of type record data following the header. | |
- **HashStreamIndex** - The index of a stream which contains a list of hashes for
  every type record. This value may be -1 (``0xFFFF``, since the field is a
  ``uint16_t``), indicating that hash information is not present. In practice a
  valid stream index is always observed, so any producer implementation should be
  prepared to emit this stream to ensure compatibility with tools which may
  expect it to be present.
- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate | |
hash table, although this has not been observed in practice and it's unclear what it | |
might be used for. | |
- **HashKeySize** - The size of a hash value (usually 4 bytes). | |
- **NumHashBuckets** - The number of buckets used to generate the hash values in the | |
aforementioned hash streams. | |
- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
  the TPI Hash Stream of the list of hash values. It should be assumed that there
  are either 0 hash values, or a number equal to the number of type records in the
  TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength``
  is neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we
  can consider the PDB malformed.
- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
  within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of
  pairs of ``uint32_t`` values where the first value is a
  :ref:`Type Index <type_indices>` and the second value is the offset in the type
  record data of the type with this index. This can be used to do a binary search
  followed by a linear search to get amortized O(log n) lookup by type index.
- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within | |
the TPI hash stream of a serialized hash table whose keys are the hash values | |
in the hash value buffer and whose values are type indices. This appears to | |
be useful in incremental linking scenarios, so that if a type is modified an | |
entry can be created mapping the old hash value to the new type index so that | |
a PDB file consumer can always have the most up to date version of the type | |
without forcing the incremental linker to garbage collect and update | |
references that point to the old version to now point to the new version. | |
The layout of this hash table is described in :doc:`HashTable`. | |
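
The consistency rules stated in the field descriptions above can be sketched as
a small validation routine. This is a minimal sketch under stated assumptions,
not LLVM's reader: ``readTpiHeader`` and ``numTypeRecords`` are hypothetical
helpers, the host is assumed to be little-endian (so the on-disk bytes can be
copied directly into the struct), and ``Data`` is assumed to hold the stream's
bytes starting at offset 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct TpiStreamHeader {
  uint32_t Version;
  uint32_t HeaderSize;
  uint32_t TypeIndexBegin;
  uint32_t TypeIndexEnd;
  uint32_t TypeRecordBytes;
  uint16_t HashStreamIndex;
  uint16_t HashAuxStreamIndex;
  uint32_t HashKeySize;
  uint32_t NumHashBuckets;
  int32_t HashValueBufferOffset;
  uint32_t HashValueBufferLength;
  int32_t IndexOffsetBufferOffset;
  uint32_t IndexOffsetBufferLength;
  int32_t HashAdjBufferOffset;
  uint32_t HashAdjBufferLength;
};
static_assert(sizeof(TpiStreamHeader) == 56, "header must have no padding");

// Number of type records described by the header.
inline uint32_t numTypeRecords(const TpiStreamHeader &H) {
  return H.TypeIndexEnd - H.TypeIndexBegin;
}

// Hypothetical helper: copies the header out of Data and applies the checks
// described above. Returns false if the stream looks malformed.
inline bool readTpiHeader(const uint8_t *Data, size_t Size,
                          TpiStreamHeader &Out) {
  if (Size < sizeof(TpiStreamHeader))
    return false;
  std::memcpy(&Out, Data, sizeof(Out)); // assumes a little-endian host
  if (Out.HeaderSize != sizeof(TpiStreamHeader))
    return false;
  if (Out.TypeIndexEnd < Out.TypeIndexBegin)
    return false;
  // The type record data must fit in the bytes that follow the header.
  if (Out.TypeRecordBytes > Size - sizeof(TpiStreamHeader))
    return false;
  // Either no hash values at all, or exactly one per type record.
  uint64_t Expected = uint64_t(numTypeRecords(Out)) * Out.HashKeySize;
  if (Out.HashValueBufferLength != 0 && Out.HashValueBufferLength != Expected)
    return false;
  return true;
}
```

The sketch deliberately does not reject ``Version`` values other than ``V80``;
whether to do so is a policy decision for the consumer.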
.. _tpi_records: | |
CodeView Type Record List | |
========================= | |
Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number
of such records (i.e. the length of the array) can be determined by computing the
value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.

O(log n) random access is provided by way of the Type Index Offsets array (if
present) described previously.
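
The lookup through the Type Index Offsets array can be sketched as a binary
search for the nearest preceding entry, followed by a linear walk over the
records. This is an illustration, not LLVM's implementation:
``TypeIndexOffset`` and ``findRecordOffset`` are hypothetical names, and the
host is assumed to be little-endian. The sketch also relies on a detail of the
CodeView record format that this document does not state: each record starts
with a ``uint16_t`` length that counts the bytes following the length field
itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One entry of the Type Index Offsets array (illustrative name).
struct TypeIndexOffset {
  uint32_t Index;  // a type index
  uint32_t Offset; // offset of that record within the type record data
};

// Returns the offset of the record with type index TI within Records, or -1
// if it cannot be found. Offsets must be sorted by Index.
inline int64_t findRecordOffset(const std::vector<TypeIndexOffset> &Offsets,
                                const uint8_t *Records, size_t RecordBytes,
                                uint32_t TI) {
  // Binary search: find the last entry whose Index is <= TI.
  auto It = std::upper_bound(
      Offsets.begin(), Offsets.end(), TI,
      [](uint32_t V, const TypeIndexOffset &E) { return V < E.Index; });
  if (It == Offsets.begin())
    return -1; // TI precedes every indexed record
  --It;

  // Linear search: walk forward one record at a time.
  uint32_t Cur = It->Index;
  size_t Pos = It->Offset;
  while (Pos + sizeof(uint16_t) <= RecordBytes) {
    if (Cur == TI)
      return static_cast<int64_t>(Pos);
    uint16_t Len;
    std::memcpy(&Len, Records + Pos, sizeof(Len)); // little-endian host
    Pos += sizeof(uint16_t) + Len; // Len excludes the length field itself
    ++Cur;
  }
  return -1; // ran off the end of the record data
}
```

The cost of the linear step is bounded by the spacing between consecutive
entries in the array, so a sparser array trades lookup time for space.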