|  | The PDB Serialized Hash Table Format | 
|  | ==================================== | 
|  |  | 
|  | .. contents:: | 
|  | :local: | 
|  |  | 
|  | .. _hash_intro: | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | One of the design goals of the PDB format is to provide accelerated access to | 
|  | debug information, and for this reason there are several occasions where hash | 
|  | tables are serialized and embedded directly to the file, rather than requiring | 
|  | a consumer to read a list of values and reconstruct the hash table on the fly. | 
|  |  | 
|  | The serialization format supports hash tables of arbitrarily large size and | 
|  | capacity, as well as value types and hash functions.  The only supported key | 
|  | value type is a uint32.  The only requirement is that the producer and consumer | 
|  | agree on the hash function.  As such, the hash function can is not discussed | 
|  | further in this document, it is assumed that for a particular instance of a PDB | 
|  | file hash table, the appropriate hash function is being used. | 
|  |  | 
|  | On-Disk Format | 
|  | ============== | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | .--------------------.-- +0 | 
|  | |        Size        | | 
|  | .--------------------.-- +4 | 
|  | |      Capacity      | | 
|  | .--------------------.-- +8 | 
|  | | Present Bit Vector | | 
|  | .--------------------.-- +N | 
|  | | Deleted Bit Vector | | 
|  | .--------------------.-- +M                  ─╮ | 
|  | |        Key         |                        │ | 
|  | .--------------------.-- +M+4                 │ | 
|  | |       Value        |                        │ | 
|  | .--------------------.-- +M+4+sizeof(Value)   │ | 
|  | ...                                  ├─ |Capacity| Bucket entries | 
|  | .--------------------.                        │ | 
|  | |        Key         |                        │ | 
|  | .--------------------.                        │ | 
|  | |       Value        |                        │ | 
|  | .--------------------.                       ─╯ | 
|  |  | 
|  | - **Size** - The number of values contained in the hash table. | 
|  |  | 
|  | - **Capacity** - The number of buckets in the hash table.  Producers should | 
|  | maintain a load factor of no greater than ``2/3*Capacity+1``. | 
|  |  | 
|  | - **Present Bit Vector** - A serialized bit vector which contains information | 
|  | about which buckets have valid values.  If the bucket has a value, the | 
|  | corresponding bit will be set, and if the bucket doesn't have a value (either | 
|  | because the bucket is empty or because the value is a tombstone value) the bit | 
|  | will be unset. | 
|  |  | 
|  | - **Deleted Bit Vector** - A serialized bit vector which contains information | 
|  | about which buckets have tombstone values.  If the entry in this bucket is | 
|  | deleted, the bit will be set, otherwise it will be unset. | 
|  |  | 
|  | - **Keys and Values** - A list of ``Capacity`` hash buckets, where the first | 
|  | entry is the key (always a uint32), and the second entry is the value.  The | 
|  | state of each bucket (valid, empty, deleted) can be determined by examining | 
|  | the present and deleted bit vectors. | 
|  |  | 
|  |  | 
|  | .. _hash_bit_vectors: | 
|  |  | 
|  | Present and Deleted Bit Vectors | 
|  | =============================== | 
|  |  | 
|  | The bit vectors indicating the status of each bucket are serialized as follows: | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | .--------------------.-- +0 | 
|  | |     Word Count     | | 
|  | .--------------------.-- +4 | 
|  | |        Word_0      |        ─╮ | 
|  | .--------------------.-- +8    │ | 
|  | |        Word_1      |         │ | 
|  | .--------------------.-- +12   ├─ |Word Count| values | 
|  | ...                   │ | 
|  | .--------------------.         │ | 
|  | |       Word_N       |         │ | 
|  | .--------------------.        ─╯ | 
|  |  | 
|  | The words, when viewed as a contiguous block of bytes, represent a bit vector | 
|  | with the following layout: | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | .------------.         .------------.------------. | 
|  | |   Word_N   |   ...   |   Word_1   |   Word_0   | | 
|  | .------------.         .------------.------------. | 
|  | |            |         |            |            | | 
|  | +N*32      +(N-1)*32    +64          +32          +0 | 
|  |  | 
|  | where the k'th bit of this bit vector represents the status of the k'th bucket | 
|  | in the hash table. |