| The PDB Serialized Hash Table Format |
| ==================================== |
| |
| .. contents:: |
| :local: |
| |
| .. _hash_intro: |
| |
| Introduction |
| ============ |
| |
| One of the design goals of the PDB format is to provide accelerated access to |
| debug information, and for this reason there are several occasions where hash |
| tables are serialized and embedded directly to the file, rather than requiring |
| a consumer to read a list of values and reconstruct the hash table on the fly. |
| |
| The serialization format supports hash tables of arbitrarily large size and |
| capacity, as well as value types and hash functions. The only supported key |
| value type is a uint32. The only requirement is that the producer and consumer |
| agree on the hash function. As such, the hash function can is not discussed |
| further in this document, it is assumed that for a particular instance of a PDB |
| file hash table, the appropriate hash function is being used. |
| |
| On-Disk Format |
| ============== |
| |
| .. code-block:: none |
| |
| .--------------------.-- +0 |
| | Size | |
| .--------------------.-- +4 |
| | Capacity | |
| .--------------------.-- +8 |
| | Present Bit Vector | |
| .--------------------.-- +N |
| | Deleted Bit Vector | |
| .--------------------.-- +M ─╮ |
| | Key | │ |
| .--------------------.-- +M+4 │ |
| | Value | │ |
| .--------------------.-- +M+4+sizeof(Value) │ |
| ... ├─ |Capacity| Bucket entries |
| .--------------------. │ |
| | Key | │ |
| .--------------------. │ |
| | Value | │ |
| .--------------------. ─╯ |
| |
| - **Size** - The number of values contained in the hash table. |
| |
| - **Capacity** - The number of buckets in the hash table. Producers should |
| maintain a load factor of no greater than ``2/3*Capacity+1``. |
| |
| - **Present Bit Vector** - A serialized bit vector which contains information |
| about which buckets have valid values. If the bucket has a value, the |
| corresponding bit will be set, and if the bucket doesn't have a value (either |
| because the bucket is empty or because the value is a tombstone value) the bit |
| will be unset. |
| |
| - **Deleted Bit Vector** - A serialized bit vector which contains information |
| about which buckets have tombstone values. If the entry in this bucket is |
| deleted, the bit will be set, otherwise it will be unset. |
| |
| - **Keys and Values** - A list of ``Capacity`` hash buckets, where the first |
| entry is the key (always a uint32), and the second entry is the value. The |
| state of each bucket (valid, empty, deleted) can be determined by examining |
| the present and deleted bit vectors. |
| |
| |
| .. _hash_bit_vectors: |
| |
| Present and Deleted Bit Vectors |
| =============================== |
| |
| The bit vectors indicating the status of each bucket are serialized as follows: |
| |
| .. code-block:: none |
| |
| .--------------------.-- +0 |
| | Word Count | |
| .--------------------.-- +4 |
| | Word_0 | ─╮ |
| .--------------------.-- +8 │ |
| | Word_1 | │ |
| .--------------------.-- +12 ├─ |Word Count| values |
| ... │ |
| .--------------------. │ |
| | Word_N | │ |
| .--------------------. ─╯ |
| |
| The words, when viewed as a contiguous block of bytes, represent a bit vector |
| with the following layout: |
| |
| .. code-block:: none |
| |
| .------------. .------------.------------. |
| | Word_N | ... | Word_1 | Word_0 | |
| .------------. .------------.------------. |
| | | | | | |
| +N*32 +(N-1)*32 +64 +32 +0 |
| |
| where the k'th bit of this bit vector represents the status of the k'th bucket |
| in the hash table. |