| ======================================== |
| The PDB Info Stream (aka the PDB Stream) |
| ======================================== |
| |
| .. contents:: |
|    :local: |
| |
| .. _pdb_stream_header: |
| |
| Stream Header |
| ============= |
| At offset 0 of the PDB Stream is a header with the following layout: |
| |
| |
| .. code-block:: c++ |
| |
|   struct PdbStreamHeader { |
|     ulittle32_t Version; |
|     ulittle32_t Signature; |
|     ulittle32_t Age; |
|     Guid UniqueId; |
|   }; |
| |
| - **Version** - A value from the following enum: |
| |
| .. code-block:: c++ |
| |
|   enum class PdbStreamVersion : uint32_t { |
|     VC2 = 19941610, |
|     VC4 = 19950623, |
|     VC41 = 19950814, |
|     VC50 = 19960307, |
|     VC98 = 19970604, |
|     VC70Dep = 19990604, |
|     VC70 = 20000404, |
|     VC80 = 20030901, |
|     VC110 = 20091201, |
|     VC140 = 20140508, |
|   }; |
| |
|   Although the meaning of this field appears obvious, in practice we have |
|   never observed a value other than ``VC70``, even with modern versions of |
|   the toolchain, and it is unclear why the other values exist. We assume |
|   that parts of the PDB stream's layout, and perhaps the layout of other |
|   streams, change when the value is something other than ``VC70``. |
| |
| - **Signature** - A 32-bit timestamp generated by a call to ``time()`` when |
|   the PDB file is written. Because a timestamp with 1-second granularity |
|   cannot guarantee uniqueness, this field does not serve its intended |
|   purpose and is typically ignored in favor of the ``Guid`` field, described |
|   below. |
| |
| - **Age** - The number of times the PDB file has been written. This can be used |
| along with ``Guid`` to match the PDB to its corresponding executable. |
| |
| - **Guid** - A 128-bit identifier intended to be unique across space and time, |
|   stored in the ``UniqueId`` member of the header. In general, this can be |
|   thought of as the result of calling the Win32 API |
|   `UuidCreate <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx>`__, |
|   although LLVM cannot rely on that function, as it must work on non-Windows |
|   platforms. |
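
As a rough illustration of the layout above, the following sketch decodes the
header from a raw byte buffer. The names ``ParsedPdbStreamHeader``,
``readLE32`` and ``parsePdbStreamHeader`` are hypothetical and are not part of
LLVM. The sketch assumes only that the three integer fields are little-endian
and that the GUID occupies the 16 bytes that follow them.

.. code-block:: c++

  #include <array>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical in-memory form of the 28-byte header described above.
  struct ParsedPdbStreamHeader {
    uint32_t Version;
    uint32_t Signature;
    uint32_t Age;
    std::array<uint8_t, 16> UniqueId;
  };

  // Reads a little-endian 32-bit value starting at Data[Offset].
  // The caller must ensure that Offset + 4 <= Data.size().
  inline uint32_t readLE32(const std::vector<uint8_t> &Data, size_t Offset) {
    return static_cast<uint32_t>(Data[Offset]) |
           (static_cast<uint32_t>(Data[Offset + 1]) << 8) |
           (static_cast<uint32_t>(Data[Offset + 2]) << 16) |
           (static_cast<uint32_t>(Data[Offset + 3]) << 24);
  }

  // Fills Out from the first 28 bytes of Data. Returns false if the buffer
  // is too short to contain a complete header.
  inline bool parsePdbStreamHeader(const std::vector<uint8_t> &Data,
                                   ParsedPdbStreamHeader &Out) {
    const size_t HeaderSize = 3 * sizeof(uint32_t) + 16;
    if (Data.size() < HeaderSize)
      return false;
    Out.Version = readLE32(Data, 0);
    Out.Signature = readLE32(Data, 4);
    Out.Age = readLE32(Data, 8);
    for (size_t I = 0; I < 16; ++I)
      Out.UniqueId[I] = Data[12 + I];
    return true;
  }

A consumer following the observation above would typically compare
``Version`` against ``VC70`` (20000404) before interpreting the rest of the
stream.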
| |
| .. _pdb_named_stream_map: |
| |
| Named Stream Map |
| ================ |
| |
| Following the header is a serialized hash table whose key type is a string and |
| whose value type is an integer. A mapping ``X -> Y`` means that the stream |
| named ``X`` has stream index ``Y`` in the underlying MSF file. Not all streams |
| are named; for example, the :doc:`TPI Stream <TpiStream>` has a fixed index, so |
| there is no need to look up its index by name. In practice there are usually |
| only a small number of named streams, and these are enumerated in the table of |
| streams in :doc:`index`. Conversely, if a stream does have a name (and as such |
| is in the named stream map), consulting the Named Stream Map is likely to be |
| the only way to discover the stream's MSF stream index. Several important |
| streams, such as the global string table (called ``/names``), can only be |
| located this way, so it is important to both produce and consume the map |
| correctly; tools will not function correctly without it. |
| |
| .. important:: |
|    Some streams are located by fixed indices (e.g. the TPI Stream has index 2), |
|    but other streams are located by fixed names (e.g. the string table is |
|    called ``/names``) and can only be located by consulting the Named Stream |
|    Map. |
| |
| The on-disk layout of the Named Stream Map consists of two components. The first is |
| a buffer of string data prefixed by a 32-bit length. The second is a serialized |
| hash table whose key and value types are both ``uint32_t``. The key is the offset |
| of a null-terminated string in the string data buffer specifying the name of the |
| stream, and the value is the MSF stream index of the stream with said name. |
| Note that although the key is an integer, the hash function used to find the right |
| bucket hashes the string at the corresponding offset in the string data buffer. |
| |
| The on-disk layout of the serialized hash table is described at :doc:`HashTable`. |
| |
| Note that the entire Named Stream Map is not length-prefixed, so the only way to |
| get to the data following it is to de-serialize it in its entirety. |
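
The relationship between the two components can be sketched as follows. This
example does not parse the serialized hash table itself, whose layout is
described elsewhere; it assumes the ``(key, value)`` pairs have already been
deserialized and shows only how each key is resolved against the string data
buffer. The helper names ``nameAtOffset`` and ``resolveNamedStreams`` are
hypothetical.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Returns the null-terminated string that starts at Offset in the string
  // data buffer, or an empty string if Offset is out of range or the string
  // has no terminator inside the buffer.
  inline std::string nameAtOffset(const std::vector<char> &StringData,
                                  uint32_t Offset) {
    if (Offset >= StringData.size())
      return std::string();
    size_t End = Offset;
    while (End < StringData.size() && StringData[End] != '\0')
      ++End;
    if (End == StringData.size())
      return std::string();
    return std::string(StringData.begin() + Offset, StringData.begin() + End);
  }

  // Turns deserialized (string offset, MSF stream index) pairs into a
  // name -> stream index map. Entries whose offset cannot be resolved are
  // skipped.
  inline std::map<std::string, uint32_t> resolveNamedStreams(
      const std::vector<char> &StringData,
      const std::vector<std::pair<uint32_t, uint32_t>> &Entries) {
    std::map<std::string, uint32_t> Result;
    for (const auto &Entry : Entries) {
      std::string Name = nameAtOffset(StringData, Entry.first);
      if (!Name.empty())
        Result[Name] = Entry.second;
    }
    return Result;
  }

With a map like this, a consumer would look up ``/names`` to find the stream
index of the global string table.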
| |
| |
| .. _pdb_stream_features: |
| |
| PDB Feature Codes |
| ================= |
| Following the Named Stream Map, and consuming all remaining bytes of the PDB |
| Stream, is a list of values from the following enumeration: |
| |
| .. code-block:: c++ |
| |
|   enum class PdbRaw_FeatureSig : uint32_t { |
|     VC110 = 20091201, |
|     VC140 = 20140508, |
|     NoTypeMerge = 0x4D544F4E, |
|     MinimalDebugInfo = 0x494E494D, |
|   }; |
| |
| The meaning of these values is summarized by the following table: |
| |
| +------------------+-------------------------------------------------+ |
| | Flag             | Meaning                                         | |
| +==================+=================================================+ |
| | VC110            | - No other feature flags are present            | |
| |                  | - PDB contains an :doc:`IPI Stream <TpiStream>` | |
| +------------------+-------------------------------------------------+ |
| | VC140            | - Other feature flags may be present            | |
| |                  | - PDB contains an :doc:`IPI Stream <TpiStream>` | |
| +------------------+-------------------------------------------------+ |
| | NoTypeMerge      | - Presumably duplicate types can appear in the  | |
| |                  |   TPI Stream, although it's unclear why this    | |
| |                  |   might happen.                                 | |
| +------------------+-------------------------------------------------+ |
| | MinimalDebugInfo | - Program was linked with /DEBUG:FASTLINK       | |
| |                  | - There is no TPI / IPI stream, all type info   | |
| |                  |   is contained in the original object files.    | |
| +------------------+-------------------------------------------------+ |
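
One possible reading of the table is sketched below. It assumes the trailing
bytes have already been decoded into 32-bit values, treats ``VC110`` as
ending the list (since no other feature flags are present with it), and
ignores unrecognized values. The ``PdbFeatures`` struct and the
``parseFeatureCodes`` function are hypothetical names used only for
illustration.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  enum class PdbRaw_FeatureSig : uint32_t {
    VC110 = 20091201,
    VC140 = 20140508,
    NoTypeMerge = 0x4D544F4E,
    MinimalDebugInfo = 0x494E494D,
  };

  // Hypothetical summary of the properties implied by the feature codes.
  struct PdbFeatures {
    bool ContainsIpiStream = false;
    bool NoTypeMerge = false;
    bool MinimalDebugInfo = false;
  };

  inline PdbFeatures parseFeatureCodes(const std::vector<uint32_t> &Codes) {
    PdbFeatures F;
    for (size_t I = 0; I < Codes.size(); ++I) {
      switch (static_cast<PdbRaw_FeatureSig>(Codes[I])) {
      case PdbRaw_FeatureSig::VC110:
        // Assumption: no other feature flags follow VC110, so stop here.
        F.ContainsIpiStream = true;
        return F;
      case PdbRaw_FeatureSig::VC140:
        F.ContainsIpiStream = true;
        break;
      case PdbRaw_FeatureSig::NoTypeMerge:
        F.NoTypeMerge = true;
        break;
      case PdbRaw_FeatureSig::MinimalDebugInfo:
        F.MinimalDebugInfo = true;
        break;
      default:
        // Assumption: unknown values are skipped rather than rejected.
        break;
      }
    }
    return F;
  }

Stopping at ``VC110`` is a design choice that follows from the first row of
the table; a stricter consumer might instead report an error if further
values follow it.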
| |
| Matching a PDB to its executable |
| ================================ |
| The linker is responsible for writing both the PDB and the final executable, and |
| as a result is the only entity capable of writing the information necessary to |
| match the PDB to the executable. |
| |
| To accomplish this, the linker generates a GUID for the PDB (or reuses the |
| existing GUID if it is linking incrementally) and increments the Age field. |
| |
| The executable is a PE/COFF file, and a PE/COFF file contains a number of |
| "directories". For our purposes here, we are interested in the "debug |
| directory". The exact format of a debug directory is described by the |
| `IMAGE_DEBUG_DIRECTORY structure <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680307(v=vs.85).aspx>`__. |
| For this particular case, the linker emits a debug directory of type |
| ``IMAGE_DEBUG_TYPE_CODEVIEW``. The format of this record is defined in |
| ``llvm/DebugInfo/CodeView/CVDebugRecord.h``; here it suffices to say that it |
| includes the same ``Guid`` and ``Age`` fields. At runtime, a debugger or tool |
| can scan the COFF executable image for a debug directory of the correct type |
| and verify that the Guid and Age match. |
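
The final comparison can be sketched as follows. The record layout used here
(a 4-byte ``RSDS`` magic, a 16-byte GUID, a little-endian 32-bit age, then
the PDB path) is an assumption based on the commonly documented PDB 7.0
CodeView record and should be checked against ``CVDebugRecord.h``. The names
``PdbIdentity``, ``parseRsdsRecord`` and ``identitiesMatch`` are
hypothetical.

.. code-block:: c++

  #include <array>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical pair of values used to match a PDB to its executable.
  struct PdbIdentity {
    std::array<uint8_t, 16> Guid;
    uint32_t Age;
  };

  // Extracts the GUID and age from the payload of a CodeView debug
  // directory entry. Returns false if the payload is too short or does not
  // start with the assumed "RSDS" magic.
  inline bool parseRsdsRecord(const std::vector<uint8_t> &Record,
                              PdbIdentity &Out) {
    if (Record.size() < 24)
      return false;
    if (Record[0] != 'R' || Record[1] != 'S' || Record[2] != 'D' ||
        Record[3] != 'S')
      return false;
    for (size_t I = 0; I < 16; ++I)
      Out.Guid[I] = Record[4 + I];
    Out.Age = static_cast<uint32_t>(Record[20]) |
              (static_cast<uint32_t>(Record[21]) << 8) |
              (static_cast<uint32_t>(Record[22]) << 16) |
              (static_cast<uint32_t>(Record[23]) << 24);
    return true;
  }

  // A PDB matches an executable when both the GUID and the age agree.
  inline bool identitiesMatch(const PdbIdentity &FromPdb,
                              const PdbIdentity &FromExe) {
    return FromPdb.Guid == FromExe.Guid && FromPdb.Age == FromExe.Age;
  }

Here ``FromPdb`` would be filled from the PDB Stream header and ``FromExe``
from the debug directory of the executable.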