=====================================
The PDB TPI and IPI Streams
=====================================

.. contents::
   :local:

.. _tpi_intro:

Introduction
============

The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
all types used in the program. Each is organized as a :ref:`header <tpi_header>`
followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
referenced from various streams and records throughout the PDB by their
:ref:`type index <type_indices>`. In general, the sequence of type records
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
(directed acyclic graph), which means that a type record B can only refer to
a type record A if ``A.TypeIndex < B.TypeIndex``. There are rare cases where
this property does not hold (particularly when dealing with object files
produced by MASM), but an implementation should go to great lengths to preserve
it, because it allows the entire type graph to be constructed in a single pass.

.. important::
   Type records form a topologically sorted DAG (directed acyclic graph).
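
Because every reference must point backwards, the ordering property can be
verified with a single linear scan. The following is a minimal sketch, assuming
the references have already been extracted from the records (which requires
decoding every record kind) into a flat list. The ``TypeRef`` struct and
``isTopologicallySorted`` function are hypothetical names, not part of any
existing PDB library.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  // Hypothetical flattened view of the references inside one stream: the
  // record with type index `From` mentions type index `To`. Only references
  // into the same stream (TPI or IPI) are listed.
  struct TypeRef {
    uint32_t From;
    uint32_t To;
  };

  // Returns true if every reference to another record points at a strictly
  // smaller type index, i.e. the records form a topologically sorted DAG.
  constexpr bool isTopologicallySorted(const TypeRef *Refs, size_t NumRefs,
                                       uint32_t TypeIndexBegin) {
    for (size_t I = 0; I < NumRefs; ++I) {
      const TypeRef &R = Refs[I];
      if (R.To < TypeIndexBegin)
        continue; // simple type index: no record is involved
      if (R.To >= R.From)
        return false; // forward or self reference
    }
    return true;
  }

Such a check is mostly useful as a debugging aid for producers; consumers still
have to tolerate the rare streams that violate the ordering.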

.. _tpi_ipi:

TPI vs IPI Stream
=================

Recent versions of the PDB format (i.e., all versions covered by this document)
have two streams with identical layout, henceforth referred to as the TPI stream
and the IPI stream. Everything this document says about the on-disk format
applies equally to the TPI Stream and the IPI Stream. The only difference
between the two is *which* CodeView records are allowed to appear in each one,
as summarized by the following table:

+----------------------+---------------------+
| TPI Stream           | IPI Stream          |
+======================+=====================+
| LF_POINTER           | LF_FUNC_ID          |
+----------------------+---------------------+
| LF_MODIFIER          | LF_MFUNC_ID         |
+----------------------+---------------------+
| LF_PROCEDURE         | LF_BUILDINFO        |
+----------------------+---------------------+
| LF_MFUNCTION         | LF_SUBSTR_LIST      |
+----------------------+---------------------+
| LF_LABEL             | LF_STRING_ID        |
+----------------------+---------------------+
| LF_ARGLIST           | LF_UDT_SRC_LINE     |
+----------------------+---------------------+
| LF_FIELDLIST         | LF_UDT_MOD_SRC_LINE |
+----------------------+---------------------+
| LF_ARRAY             |                     |
+----------------------+---------------------+
| LF_CLASS             |                     |
+----------------------+---------------------+
| LF_STRUCTURE         |                     |
+----------------------+---------------------+
| LF_INTERFACE         |                     |
+----------------------+---------------------+
| LF_UNION             |                     |
+----------------------+---------------------+
| LF_ENUM              |                     |
+----------------------+---------------------+
| LF_TYPESERVER2       |                     |
+----------------------+---------------------+
| LF_VFTABLE           |                     |
+----------------------+---------------------+
| LF_VTSHAPE           |                     |
+----------------------+---------------------+
| LF_BITFIELD          |                     |
+----------------------+---------------------+
| LF_METHODLIST        |                     |
+----------------------+---------------------+
| LF_PRECOMP           |                     |
+----------------------+---------------------+
| LF_ENDPRECOMP        |                     |
+----------------------+---------------------+

The usage of these records is described in more detail in
:doc:`CodeView Type Records <CodeViewTypes>`.
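
A producer that writes both streams has to route each record according to its
leaf kind. A minimal sketch of that decision follows. The numeric values are
the ones Microsoft's ``cvinfo.h`` assigns to these leaf kinds, while
``belongsInIpiStream`` is a hypothetical helper.

.. code-block:: c++

  #include <cstdint>

  // Leaf kinds of the records in the IPI column of the table above.
  enum : uint16_t {
    LF_FUNC_ID = 0x1601,
    LF_MFUNC_ID = 0x1602,
    LF_BUILDINFO = 0x1603,
    LF_SUBSTR_LIST = 0x1604,
    LF_STRING_ID = 0x1605,
    LF_UDT_SRC_LINE = 0x1606,
    LF_UDT_MOD_SRC_LINE = 0x1607,
  };

  // Returns true if a record with the given leaf kind belongs in the IPI
  // stream. Any other kind (including every kind in the TPI column) is
  // treated as a TPI record.
  constexpr bool belongsInIpiStream(uint16_t Kind) {
    switch (Kind) {
    case LF_FUNC_ID:
    case LF_MFUNC_ID:
    case LF_BUILDINFO:
    case LF_SUBSTR_LIST:
    case LF_STRING_ID:
    case LF_UDT_SRC_LINE:
    case LF_UDT_MOD_SRC_LINE:
      return true;
    default:
      return false;
    }
  }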

.. _type_indices:

Type Indices
============

A type index is a 32-bit integer that uniquely identifies a type inside of an
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
value of the type index for the first type record in the TPI stream is given
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`,
although in practice this value is always equal to 0x1000 (4096).

Any type index with the high bit set is considered to come from the IPI stream.
This appears to be more of a hack, and LLVM does not generate type indices of
this nature. They can, however, occasionally be observed in Microsoft PDBs, so
one should be prepared to handle them. Note that having the high bit set is not
a necessary condition for a type index to come from the IPI stream; it is only
a sufficient one.
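
Testing and clearing the high bit are single mask operations. The helpers below
are a sketch with hypothetical names; keep in mind that ``hasIpiHighBit``
returning ``false`` does not prove that the index belongs to the TPI stream.

.. code-block:: c++

  #include <cstdint>

  // Bit 31 of a type index. When set, the index refers to the IPI stream.
  constexpr uint32_t IpiHighBit = 0x80000000u;

  // Sufficient, but not necessary, evidence that TI points into the IPI stream.
  constexpr bool hasIpiHighBit(uint32_t TI) { return (TI & IpiHighBit) != 0; }

  // The same index with the high bit cleared.
  constexpr uint32_t clearIpiHighBit(uint32_t TI) { return TI & ~IpiHighBit; }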

Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
to come from the appropriate stream, and any type index less than this is a
bitmask which can be decomposed as follows:

.. code-block:: none

  .---------------------------.------.----------.
  |           Unused          | Mode |   Kind   |
  '---------------------------'------'----------'
  |+32                        |+12   |+8        |+0

- **Kind** - A value from the following enum:

  .. code-block:: c++

    enum class SimpleTypeKind : uint32_t {
      None = 0x0000,          // uncharacterized type (no type)
      Void = 0x0003,          // void
      NotTranslated = 0x0007, // type not translated by cvpack
      HResult = 0x0008,       // OLE/COM HRESULT

      SignedCharacter = 0x0010,   // 8 bit signed
      UnsignedCharacter = 0x0020, // 8 bit unsigned
      NarrowCharacter = 0x0070,   // really a char
      WideCharacter = 0x0071,     // wide char
      Character16 = 0x007a,       // char16_t
      Character32 = 0x007b,       // char32_t
      Character8 = 0x007c,        // char8_t

      SByte = 0x0068,       // 8 bit signed int
      Byte = 0x0069,        // 8 bit unsigned int
      Int16Short = 0x0011,  // 16 bit signed
      UInt16Short = 0x0021, // 16 bit unsigned
      Int16 = 0x0072,       // 16 bit signed int
      UInt16 = 0x0073,      // 16 bit unsigned int
      Int32Long = 0x0012,   // 32 bit signed
      UInt32Long = 0x0022,  // 32 bit unsigned
      Int32 = 0x0074,       // 32 bit signed int
      UInt32 = 0x0075,      // 32 bit unsigned int
      Int64Quad = 0x0013,   // 64 bit signed
      UInt64Quad = 0x0023,  // 64 bit unsigned
      Int64 = 0x0076,       // 64 bit signed int
      UInt64 = 0x0077,      // 64 bit unsigned int
      Int128Oct = 0x0014,   // 128 bit signed int
      UInt128Oct = 0x0024,  // 128 bit unsigned int
      Int128 = 0x0078,      // 128 bit signed int
      UInt128 = 0x0079,     // 128 bit unsigned int

      Float16 = 0x0046,                 // 16 bit real
      Float32 = 0x0040,                 // 32 bit real
      Float32PartialPrecision = 0x0045, // 32 bit PP real
      Float48 = 0x0044,                 // 48 bit real
      Float64 = 0x0041,                 // 64 bit real
      Float80 = 0x0042,                 // 80 bit real
      Float128 = 0x0043,                // 128 bit real

      Complex16 = 0x0056,                 // 16 bit complex
      Complex32 = 0x0050,                 // 32 bit complex
      Complex32PartialPrecision = 0x0055, // 32 bit PP complex
      Complex48 = 0x0054,                 // 48 bit complex
      Complex64 = 0x0051,                 // 64 bit complex
      Complex80 = 0x0052,                 // 80 bit complex
      Complex128 = 0x0053,                // 128 bit complex

      Boolean8 = 0x0030,   // 8 bit boolean
      Boolean16 = 0x0031,  // 16 bit boolean
      Boolean32 = 0x0032,  // 32 bit boolean
      Boolean64 = 0x0033,  // 64 bit boolean
      Boolean128 = 0x0034, // 128 bit boolean
    };

- **Mode** - A value from the following enum:

  .. code-block:: c++

    enum class SimpleTypeMode : uint32_t {
      Direct = 0,        // Not a pointer
      NearPointer = 1,   // Near pointer
      FarPointer = 2,    // Far pointer
      HugePointer = 3,   // Huge pointer
      NearPointer32 = 4, // 32 bit near pointer
      FarPointer32 = 5,  // 32 bit far pointer
      NearPointer64 = 6, // 64 bit near pointer
      NearPointer128 = 7 // 128 bit near pointer
    };
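
Following the bit layout above, both fields can be extracted with a shift and a
mask. This sketch assumes the high bit has already been cleared and returns raw
integers; the helper names are hypothetical, and the results can be converted
to the ``SimpleTypeKind`` and ``SimpleTypeMode`` enums with ``static_cast``.

.. code-block:: c++

  #include <cstdint>

  // True if TI (with the IPI high bit already cleared) is a reserved simple
  // type index rather than a reference to a type record.
  constexpr bool isSimpleTypeIndex(uint32_t TI, uint32_t TypeIndexBegin) {
    return TI < TypeIndexBegin;
  }

  // Bits 0-7 hold the Kind.
  constexpr uint32_t simpleTypeKind(uint32_t TI) { return TI & 0xFFu; }

  // Bits 8-11 hold the Mode.
  constexpr uint32_t simpleTypeMode(uint32_t TI) { return (TI >> 8) & 0xFu; }

For example, ``0x0474`` decodes to ``Kind=Int32`` (0x74) and
``Mode=NearPointer32`` (4), i.e. an ``int*`` in a 32-bit program.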

For pointers, the bitness is encoded in the mode. A ``void*`` therefore has a
type index with ``Mode=NearPointer32, Kind=Void`` when built for 32 bits, but a
type index with ``Mode=NearPointer64, Kind=Void`` when built for 64 bits.

By convention, the type index for ``std::nullptr_t`` is constructed the same
way as the type index for ``void*``, but using the bitless enumeration value
``NearPointer``.
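
Building a simple type index is the inverse operation: shift the mode into bits
8-11 and OR in the kind. The sketch below uses a hypothetical helper name and
constants copied from the enums above to derive the ``void*`` and
``std::nullptr_t`` indices just described.

.. code-block:: c++

  #include <cstdint>

  // Values copied from the SimpleTypeKind and SimpleTypeMode enums.
  constexpr uint32_t KindVoid = 0x0003;
  constexpr uint32_t ModeNearPointer = 1;
  constexpr uint32_t ModeNearPointer32 = 4;
  constexpr uint32_t ModeNearPointer64 = 6;

  // Packs Mode into bits 8-11 and Kind into bits 0-7.
  constexpr uint32_t makeSimpleTypeIndex(uint32_t Kind, uint32_t Mode) {
    return (Mode << 8) | Kind;
  }

  constexpr uint32_t VoidPtr32 = makeSimpleTypeIndex(KindVoid, ModeNearPointer32); // 0x0403
  constexpr uint32_t VoidPtr64 = makeSimpleTypeIndex(KindVoid, ModeNearPointer64); // 0x0603
  constexpr uint32_t NullptrT = makeSimpleTypeIndex(KindVoid, ModeNearPointer);    // 0x0103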

.. _tpi_header:

Stream Header
=============

At offset 0 of the TPI Stream is a header with the following layout:

.. code-block:: c++

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;

    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;

    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;

    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;

    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

- **Version** - A value from the following enum.

  .. code-block:: c++

    enum class TpiStreamVersion : uint32_t {
      V40 = 19950410,
      V41 = 19951122,
      V50 = 19961031,
      V70 = 19990903,
      V80 = 20040203,
    };

  Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
  ``V80``, and no other values have been observed. If another value is ever
  encountered, the layout described by this document may not be accurate.

- **HeaderSize** - ``sizeof(TpiStreamHeader)``

- **TypeIndexBegin** - The numeric value of the type index representing the
  first type record in the TPI stream. This is usually the value 0x1000, as
  type indices lower than this are reserved (see :ref:`Type Indices
  <type_indices>` for a discussion of reserved type indices).

- **TypeIndexEnd** - One greater than the numeric value of the type index
  representing the last type record in the TPI stream. The total number of
  type records in the TPI stream can be computed as
  ``TypeIndexEnd - TypeIndexBegin``.

- **TypeRecordBytes** - The number of bytes of type record data following the
  header.

- **HashStreamIndex** - The index of a stream which contains a list of hashes
  for every type record. This value may be -1 (``0xFFFF``, since the field is a
  ``uint16_t``), indicating that hash information is not present. In practice a
  valid stream index is always observed, so any producer implementation should
  be prepared to emit this stream to ensure compatibility with tools which may
  expect it to be present.

- **HashAuxStreamIndex** - Presumably the index of a stream which contains a
  separate hash table, although this has not been observed in practice and it's
  unclear what it might be used for.

- **HashKeySize** - The size of a hash value (usually 4 bytes).

- **NumHashBuckets** - The number of buckets used to generate the hash values
  in the aforementioned hash streams.

- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
  the TPI Hash Stream of the list of hash values. It should be assumed that
  there are either 0 hash values, or a number equal to the number of type
  records in the TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if
  ``HashValueBufferLength`` is neither 0 nor equal to
  ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can consider the PDB
  malformed.

- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
  within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list
  of pairs of ``uint32_t`` values, where the first value is a :ref:`Type Index
  <type_indices>` and the second value is the offset in the type record data of
  the type with this index. A consumer can binary search this list for the
  closest entry at or before the desired type index, then scan forward linearly
  through the type records from that entry's offset, giving O(log n) lookup by
  type index.

- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
  the TPI hash stream of a serialized hash table whose keys are the hash values
  in the hash value buffer and whose values are type indices. This appears to
  be useful in incremental linking scenarios: if a type is modified, an entry
  can be created mapping the old hash value to the new type index. A PDB
  consumer can then always find the most up-to-date version of the type
  without the incremental linker having to garbage collect the old version and
  update every reference that points to it. The layout of this hash table is
  described in :doc:`HashTable`.
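
Several of the rules above can be combined into a sanity check that a consumer
might run before trusting the rest of the stream. The sketch below reads the
fields straight out of the raw little-endian bytes, at offsets derived from the
``TpiStreamHeader`` layout (every field is naturally aligned, so the struct has
no padding and is 56 bytes). The function and constant names are hypothetical,
and rejecting anything other than ``V80`` is a conservative choice based on the
caveat in the **Version** description.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  // Reads a little-endian uint32_t regardless of the host byte order.
  constexpr uint32_t readU32LE(const uint8_t *P) {
    return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
           uint32_t(P[3]) << 24;
  }

  // Byte offsets of the fields used below, within TpiStreamHeader.
  constexpr size_t OffVersion = 0;
  constexpr size_t OffHeaderSize = 4;
  constexpr size_t OffTypeIndexBegin = 8;
  constexpr size_t OffTypeIndexEnd = 12;
  constexpr size_t OffHashKeySize = 24;
  constexpr size_t OffHashValueBufferLength = 36;
  constexpr uint32_t TpiHeaderSize = 56; // sizeof(TpiStreamHeader)
  constexpr uint32_t TpiStreamVersionV80 = 20040203;

  // Returns false if the header at the start of Data should be rejected.
  constexpr bool checkTpiHeader(const uint8_t *Data, size_t Size) {
    if (Size < TpiHeaderSize)
      return false;
    if (readU32LE(Data + OffVersion) != TpiStreamVersionV80)
      return false; // unknown version: the layout may differ
    if (readU32LE(Data + OffHeaderSize) != TpiHeaderSize)
      return false;
    uint32_t Begin = readU32LE(Data + OffTypeIndexBegin);
    uint32_t End = readU32LE(Data + OffTypeIndexEnd);
    if (End < Begin)
      return false;
    // Either no hash values at all, or exactly one per type record.
    uint64_t Expected = uint64_t(End - Begin) * readU32LE(Data + OffHashKeySize);
    uint32_t HashLen = readU32LE(Data + OffHashValueBufferLength);
    return HashLen == 0 || HashLen == Expected;
  }

Since both streams share the same layout, the same check applies unchanged to
the IPI stream header.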

.. _tpi_records:

CodeView Type Record List
=========================

Following the header, there are ``TypeRecordBytes`` bytes of data that
represent a variable length array of :doc:`CodeView type records
<CodeViewTypes>`. The number of such records (i.e., the length of the array)
can be determined by computing the value ``Header.TypeIndexEnd -
Header.TypeIndexBegin``.

O(log n) access by type index is provided by the Type Index Offsets array (if
present) described previously.
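
Both operations rely on the fact that every CodeView record begins with a
2-byte length (which does not count the length field itself) followed by a
2-byte record kind. The sketch below counts the records, for comparison with
``TypeIndexEnd - TypeIndexBegin``, and looks up a record's offset by binary
searching the Type Index Offsets array and then scanning forward. The
``TypeIndexOffset`` struct and the function names are hypothetical, and the
offsets array is assumed to be sorted by type index.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  // One entry of the Type Index Offsets buffer.
  struct TypeIndexOffset {
    uint32_t TypeIndex; // type index of a record
    uint32_t Offset;    // offset of that record in the type record data
  };

  // Reads the little-endian 2-byte record length at P.
  constexpr uint16_t readU16LE(const uint8_t *P) {
    return uint16_t(P[0] | P[1] << 8);
  }

  // Counts the records in the type record data. Returns -1 if a record is
  // truncated or too short to hold its 2-byte kind.
  constexpr int64_t countTypeRecords(const uint8_t *Data, size_t Size) {
    int64_t Count = 0;
    size_t Off = 0;
    while (Off < Size) {
      if (Size - Off < 2)
        return -1;
      size_t Len = readU16LE(Data + Off);
      if (Len < 2 || Size - Off - 2 < Len)
        return -1;
      Off += 2 + Len; // skip the length field plus the record body
      ++Count;
    }
    return Count;
  }

  // Returns the offset of the record for type index TI, or -1 if not found.
  constexpr int64_t findRecordOffset(const TypeIndexOffset *Index,
                                     size_t NumEntries, const uint8_t *Data,
                                     size_t Size, uint32_t TI) {
    // Binary search for the first entry whose type index is greater than TI.
    size_t Lo = 0, Hi = NumEntries;
    while (Lo < Hi) {
      size_t Mid = Lo + (Hi - Lo) / 2;
      if (Index[Mid].TypeIndex <= TI)
        Lo = Mid + 1;
      else
        Hi = Mid;
    }
    if (Lo == 0)
      return -1; // TI precedes every indexed record
    // Start from the closest entry at or before TI and walk forward.
    uint32_t Cur = Index[Lo - 1].TypeIndex;
    size_t Off = Index[Lo - 1].Offset;
    while (Cur < TI) {
      if (Off + 2 > Size)
        return -1; // ran off the end of the record data
      Off += 2 + readU16LE(Data + Off);
      ++Cur;
    }
    return Off + 2 <= Size ? int64_t(Off) : -1;
  }

When the Type Index Offsets array is absent, ``findRecordOffset`` can still be
called with a single synthetic entry ``{TypeIndexBegin, 0}``, because the first
record starts at offset 0 of the record data; the lookup then degrades to a
full linear scan.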