| ===================================== |
| The MSF File Format |
| ===================================== |
| |
| .. contents:: |
| :local: |
| |
| .. _msf_layout: |
| |
| File Layout |
| =========== |
| |
| The MSF file format consists of the following components: |
| |
| 1. :ref:`msf_superblock` |
| 2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM) |
| 3. Data |
| |
| Each component is stored as an indexed block, the length of which is specified |
| in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the |
| following pattern (sometimes referred to as an "interval"): |
| |
| 1. 1 block of data |
| 2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1) |
| 3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2) |
| 4. ``SuperBlock::BlockSize - 3`` blocks of data |
| |
| In the first interval, the first data block is used to store |
| :ref:`msf_superblock`. |
| |
| The following diagram demonstrates the general layout of the file (\| denotes |
| the end of an interval, and is for visualization purposes only): |
| |
| +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+ |
| | Block Index | 0 | 1 | 2 | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... | |
| +=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+ |
| | Meaning | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data | \| | Data | FPM1 | FPM2 | Data | \| | ... | |
| +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+ |
| |
| The file may end after any block, including immediately after a FPM1. |
| |
| .. note:: |
| LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf" |
| variant), so the rest of this document will assume a block size of 4096. |
| |
| .. _msf_superblock: |
| |
| The Superblock |
| ============== |
| At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as |
| follows: |
| |
| .. code-block:: c++ |
| |
| struct SuperBlock { |
| char FileMagic[sizeof(Magic)]; |
| ulittle32_t BlockSize; |
| ulittle32_t FreeBlockMapBlock; |
| ulittle32_t NumBlocks; |
| ulittle32_t NumDirectoryBytes; |
| ulittle32_t Unknown; |
| ulittle32_t BlockMapAddr; |
| }; |
| |
| - **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"`` |
| followed by the bytes ``1A 44 53 00 00 00``. |
| - **BlockSize** - The block size of the internal file system. Valid values are |
| 512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary |
| depending on the block sizes. For the purposes of LLVM, we handle only block |
| sizes of 4KiB, and all further discussion assumes a block size of 4KiB. |
| - **FreeBlockMapBlock** - The index of a block within the file, at which begins |
| a bitfield representing the set of all blocks within the file which are "free" |
| (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` |
| for more information. |
| **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! |
| - **NumBlocks** - The total number of blocks in the file. ``NumBlocks * |
| BlockSize`` should equal the size of the file on disk. |
| - **NumDirectoryBytes** - The size of the stream directory, in bytes. The |
| stream directory contains information about each stream's size and the set of |
| blocks that it occupies. It will be described in more detail later. |
| - **BlockMapAddr** - The index of a block within the MSF file. At this block is |
| an array of ``ulittle32_t``'s listing the blocks that the stream directory |
| resides on. For large MSF files, the stream directory (which describes the |
| block layout of each stream) may not fit entirely on a single block. As a |
| result, this extra layer of indirection is introduced, whereby this block |
| contains the list of blocks that the stream directory occupies, and the stream |
| directory itself can be stitched together accordingly. The number of |
| ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / |
| BlockSize)``. |
| |
| .. _msf_freeblockmap: |
| |
| The Free Block Map |
| ================== |
| |
| The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a |
| series of blocks which contains a bit flag for every block in the file. The |
| flag will be set to 0 if the block is in use, and 1 if the block is unused. |
| |
| Each file contains two FPMs, one of which is active at any given time. This |
| feature is designed to support incremental and atomic updates of the underlying |
| MSF file. While writing to an MSF file, if the active FPM is FPM1, you can |
| write your new modified bitfield to FPM2, and vice versa. Only when you commit |
| the file to disk do you need to swap the value in the SuperBlock to point to |
| the new ``FreeBlockMapBlock``. |
| |
| The Free Block Maps are stored as a series of single blocks throughout the file |
| at intervals of BlockSize. Because each FPM block is of size ``BlockSize`` |
| bytes, it contains 8 times as many bits as an interval has blocks. This means |
| that the first block of each FPM refers to the first 8 intervals of the file |
| (the first 32768 blocks), the second block of each FPM refers to the next 8 |
| blocks, and so on. This results in far more FPM blocks being present than are |
| required, but in order to maintain backwards compatibility the format must stay |
| this way. |
| |
| The Stream Directory |
| ==================== |
| The Stream Directory is the root of all access to the other streams in an MSF |
| file. Beginning at byte 0 of the stream directory is the following structure: |
| |
| .. code-block:: c++ |
| |
| struct StreamDirectory { |
| ulittle32_t NumStreams; |
| ulittle32_t StreamSizes[NumStreams]; |
| ulittle32_t StreamBlocks[NumStreams][]; |
| }; |
| |
| And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes. |
| Note that each of the last two arrays is of variable length, and in particular |
| that the second array is jagged. |
| |
| **Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4 |
| streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}. |
| |
| Stream 0: ceil(1000 / 4096) = 1 block |
| |
| Stream 1: ceil(8000 / 4096) = 2 blocks |
| |
| Stream 2: ceil(16000 / 4096) = 4 blocks |
| |
| Stream 3: ceil(9000 / 4096) = 3 blocks |
| |
| In total, 10 blocks are used. Let's see what the stream directory might look |
| like: |
| |
| .. code-block:: c++ |
| |
| struct StreamDirectory { |
| ulittle32_t NumStreams = 4; |
| ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000}; |
| ulittle32_t StreamBlocks[][] = { |
| {4}, |
| {5, 6}, |
| {11, 9, 7, 8}, |
| {10, 15, 12} |
| }; |
| }; |
| |
| In total, this occupies ``15 * 4 = 60`` bytes, so |
| ``SuperBlock->NumDirectoryBytes`` would equal ``60``, and |
| ``SuperBlock->BlockMapAddr`` would be an array of one ``ulittle32_t``, since |
| ``60 <= SuperBlock->BlockSize``. |
| |
| Note also that the streams are discontiguous, and that part of stream 3 is in the |
| middle of part of stream 2. You cannot assume anything about the layout of the |
| blocks! |
| |
| Alignment and Block Boundaries |
| ============================== |
| As may be clear by now, it is possible for a single field (whether it be a high |
| level record, a long string field, or even a single ``uint16``) to begin and |
| end in separate blocks. For example, if the block size is 4096 bytes, and a |
| ``uint16`` field begins at the last byte of the current block, then it would |
| need to end on the first byte of the next block. Since blocks are not |
| necessarily contiguously laid out in the file, this means that both the consumer |
| and the producer of an MSF file must be prepared to split data apart |
| accordingly. In the aforementioned example, the high byte of the ``uint16`` |
| would be written to the last byte of block N, and the low byte would be written |
| to the first byte of block N+1, which could be tens of thousands of bytes later |
| (or even earlier!) in the file, depending on what the stream directory says. |