Bundle schema

As of patch 3.11.2 for Harvest, in preparation for the launch of Heist, the distribution and patching of Path of Exile changed from the previous monolithic Content.ggpk file to a system of multiple bundles, more akin to what had previously been deployed for console game clients.

The path hashing scheme was updated in patch 3.21.2, and the differences are explicitly noted on this page.

On Steam there are tens of thousands of bundle files, each containing related assets. Because the files are smaller, it is easier for Steam to do its write-a-new-file type of atomic patching. In the Standalone client the bundles are contained in a Content.ggpk with a "node count" of 3, which retains the existing PDIR and FILE structure for patching.

Bundle file format

Bundles start with a fixed header indicating, among other things, the total uncompressed size of the bundle and the total compressed size of the payloads in the bundle. It is followed by a list of sizes for the compressed chunks in the file, whose lengths sum up to the total compressed payload size.

A work-in-progress 010 Editor template for the bundle file:

uint32 uncompressed_size;
uint32 total_payload_size;
uint32 head_payload_size;
struct head_payload_t {
    enum <uint32> {Kraken_6 = 8, Mermaid_A = 9, Leviathan_C = 13 } first_file_encode;
    uint32 unk10;
    uint64 uncompressed_size2;
    uint64 total_payload_size2;
    uint32 block_count;
    uint32 uncompressed_block_granularity;
    uint32 unk28[4];
    uint32 block_sizes[block_count];
} head;
local int i <hidden=true>;
for (i = 0; i < head.block_count; ++i) {
    struct block_t {
        ubyte data[head.block_sizes[i]];
    } block;
}

At the end of the file are compressed blocks of varying sizes which when uncompressed are 256 KiB in size (matching the common value in the uncompressed_block_granularity header?), except for the last one which may be less. The last one is uncompressed_size-256*1024*(block_count-1).
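As a rough illustration of the layout above, the following Python sketch reads the fixed header and derives the extent of each block. It assumes little-endian fields and the 256 KiB block size described above. The names BundleHeader, parse_bundle_header and block_extents are made up for this example and do not come from any existing tool.

```python
import struct
from typing import Iterator, NamedTuple, Tuple

# Fixed part of the template: three uint32 fields, then head_payload_t up to
# and including unk28. Little-endian byte order is an assumption of this sketch.
_FIXED = struct.Struct("<3I2I2Q2I4I")


class BundleHeader(NamedTuple):
    uncompressed_size: int
    total_payload_size: int
    block_sizes: Tuple[int, ...]


def parse_bundle_header(data: bytes) -> BundleHeader:
    """Read the fixed header and the list of compressed block sizes."""
    fields = _FIXED.unpack_from(data, 0)
    uncompressed_size, total_payload_size = fields[0], fields[1]
    block_count = fields[7]
    block_sizes = struct.unpack_from(f"<{block_count}I", data, _FIXED.size)
    if sum(block_sizes) != total_payload_size:
        raise ValueError("block sizes do not add up to total_payload_size")
    return BundleHeader(uncompressed_size, total_payload_size, block_sizes)


def block_extents(
    header: BundleHeader, granularity: int = 256 * 1024
) -> Iterator[Tuple[int, int, int]]:
    """Yield (file_offset, compressed_size, uncompressed_size) per block."""
    offset = _FIXED.size + 4 * len(header.block_sizes)
    remaining = header.uncompressed_size
    for compressed_size in header.block_sizes:
        # Every block expands to the full granularity except possibly the last.
        uncompressed = min(granularity, remaining)
        yield offset, compressed_size, uncompressed
        offset += compressed_size
        remaining -= uncompressed
```

The granularity is a parameter so that a caller can pass the value of uncompressed_block_granularity instead, should the two ever differ.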

Blocks start with a two-byte header like 8C 06, 8C 0A and 8C 0C. These are Oodle headers for the Kraken, Mermaid and Leviathan compression methods respectively. The list is not exhaustive, and some headers denote an uncompressed payload, but all blocks are fed into Oodle.

Note that Oodle decompressors might write up to 64 bytes out of bounds at the end of the buffer, so when calling one make sure to over-allocate the buffer but still pass in the actual decompressed payload size.
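The over-allocation can be sketched in Python as below. The raw_decompress argument is a hypothetical stand-in for whatever binding to the Oodle library is in use; its signature is an assumption of this example and not the real Oodle API.

```python
import ctypes
from typing import Callable

# Bytes a decompressor might write past the end of the output, per the note above.
OODLE_SLACK = 64

# Hypothetical binding: (src, src_len, dst_buffer, dst_len) -> bytes written.
RawDecompress = Callable[[bytes, int, ctypes.Array, int], int]


def decompress_block(
    compressed: bytes, uncompressed_size: int, raw_decompress: RawDecompress
) -> bytes:
    """Decompress one block into an over-allocated buffer."""
    # Allocate extra room, but report only the real payload size to the decoder.
    buffer = ctypes.create_string_buffer(uncompressed_size + OODLE_SLACK)
    written = raw_decompress(compressed, len(compressed), buffer, uncompressed_size)
    if written != uncompressed_size:
        raise ValueError("decompressor returned an unexpected size")
    # Drop the slack bytes before handing the data back.
    return buffer.raw[:uncompressed_size]
```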

The block boundaries do not correspond to any file boundaries; this information is contained elsewhere.

Bundle index format

At the top level of the bundle tree there is one file beginning with an underscore that contains all the bundle and file information: _.index.bin.

The binary index file is a compressed bundle that does not contain any data from files. Instead, it holds several lists of information needed to find the containing bundle and the extents for logical files.

A binary template for the index bundle contents:

uint32 bundle_count;
struct bundles_t
{
    local int i;
    for (i = 0; i < bundle_count; ++i) {
        struct {
            uint32 name_length;
            char name[name_length];
            uint32 bundle_uncompressed_size;
        } bundle_info;
    }
} bundles;

uint32 file_count;
struct files_t
{
    local int i;
    for (i = 0; i < file_count; ++i) {
        struct {
            uint64 hash <format=hex>;
            uint32 bundle_index <comment=BundleIndexComment>;
            uint32 file_offset;
            uint32 file_size;
        } file_info;
    }
} files;

string BundleIndexComment(int bundle_index)
{
 return bundles.bundle_info[bundle_index].name;
}

uint32 path_rep_count;
struct path_rep_t
{
    uint64 hash <format=hex>;
    uint32 payload_offset;
    uint32 payload_size;
    uint32 payload_recursive_size;
} path_rep[path_rep_count];

// The file ends in a nested compressed bundle containing
// compact representation of all possible paths.
local int bundle_start = FTell();
ubyte path_rep_bundle[FileSize() - bundle_start];

The list of file_info is ordered by bundle_index. For groups with equal bundle_index the beginning of the list is ordered by distinct values of file_offset. Following those there may be several files with the same (bundle_index, file_offset, file_size), denoting distinct files that have the same payload.
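For readers who prefer code to templates, here is a minimal Python sketch of the same layout, applied to the already decompressed index contents. Little-endian fields and UTF-8 bundle names are assumptions, and the names Index and parse_index are invented for this example.

```python
import struct
from typing import Dict, List, NamedTuple, Tuple


class Index(NamedTuple):
    bundles: List[Tuple[str, int]]             # (name, bundle_uncompressed_size)
    files: Dict[int, Tuple[int, int, int]]     # hash -> (bundle_index, offset, size)
    path_reps: List[Tuple[int, int, int, int]]  # (hash, offset, size, recursive_size)
    path_rep_bundle: bytes                     # nested bundle, still compressed


def parse_index(data: bytes) -> Index:
    pos = 0

    def u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    bundles = []
    for _ in range(u32()):
        name_length = u32()
        name = data[pos:pos + name_length].decode("utf-8")
        pos += name_length
        bundles.append((name, u32()))

    files = {}
    for _ in range(u32()):
        file_hash, bundle_index, offset, size = struct.unpack_from("<QIII", data, pos)
        pos += 20
        files[file_hash] = (bundle_index, offset, size)

    path_reps = []
    for _ in range(u32()):
        path_reps.append(struct.unpack_from("<QIII", data, pos))
        pos += 20

    # Everything that remains is the nested bundle with the path specifications.
    return Index(bundles, files, path_reps, data[pos:])
```

Keying files by hash makes lookups by path hash cheap. A tool that relies on the ordering described above would keep the entries in a list instead.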

The hash field is generated by one of two different algorithms and schemes depending on the game version:

Up until 3.21.2 the hash is the FNV1a hash of the full file path in lower case. The path is salted by appending ++ to the end of the file name before hashing, so the hashed string takes the format <lower_file_name>++.

For example, the file Art/UIDivinationImages.txt becomes art/uidivinationimages.txt++, which results in the hash 0x574cc9062dcda786. This can be used for looking up a file and its corresponding section of a bundle.

Since 3.21.2 the scheme is instead MurmurHash64A with a seed of 0x1337b33f, applied to the full file path in lower case but without the ++ suffix of the previous scheme.
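Both schemes can be sketched in a few lines of Python. This is an illustrative sketch and not code from the game client: it uses the 64-bit variant of FNV1a, assumes paths are encoded as UTF-8 before hashing, and the helper name file_path_hash is made up for this example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a64(data: bytes) -> int:
    """64-bit FNV-1a: xor each byte in, then multiply by the FNV prime."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & MASK64
    return h


def murmurhash64a(data: bytes, seed: int) -> int:
    """MurmurHash64A over little-endian 8-byte words."""
    m = 0xC6A4A7935BD1E995
    r = 47
    h = (seed ^ (len(data) * m)) & MASK64
    whole = len(data) - len(data) % 8
    for i in range(0, whole, 8):
        k = int.from_bytes(data[i:i + 8], "little")
        k = (k * m) & MASK64
        k ^= k >> r
        k = (k * m) & MASK64
        h ^= k
        h = (h * m) & MASK64
    tail = data[whole:]
    if tail:
        # The trailing 1 to 7 bytes are mixed in as one little-endian integer.
        h ^= int.from_bytes(tail, "little")
        h = (h * m) & MASK64
    h ^= h >> r
    h = (h * m) & MASK64
    h ^= h >> r
    return h


def file_path_hash(path: str, legacy: bool) -> int:
    """Hash a file path with the pre-3.21.2 (legacy) or current scheme."""
    lowered = path.lower()
    if legacy:
        return fnv1a64((lowered + "++").encode("utf-8"))
    return murmurhash64a(lowered.encode("utf-8"), 0x1337B33F)
```

With legacy=True, file_path_hash("Art/UIDivinationImages.txt", legacy=True) hashes the string art/uidivinationimages.txt++, the same input as in the example above.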

The bundle at the end of the binary index file contains information on how to generate a set of paths from base paths and append operations. The entries in path_rep indicate the specification extents for this payload. Each element slices out a part of the payload at offset payload_offset of size payload_size.

The hash field of path_rep is generated the same way as the one for file paths, and the hashed directory path has no trailing /. For the legacy 3.11.2 scheme the path name keeps its mixed upper and lower case and has ++ appended to the end. For example Art/2DArt/SkillIcons/passives/Assassin/4K/ will become Art/2DArt/SkillIcons/passives/Assassin/4K++ and have a hash of 0xe8deca74810f821f.

From 3.21.2 onward the directory path is unconditionally lowercased and has no ++ suffix.

When using payload_size to generate paths for all slices, the number of paths is equal to the number of files.

The payload_recursive_size field in path_rep_t is similar to payload_size but also generates file paths from all subdirectories of the directory entry. Some directory entries generate no paths of their own when using payload_size as they have no files of their own, just subdirectories.

The name of a directory is not explicitly expressed, but for non-empty directories it can be obtained by finding the last slash in any generated path in the directory. All generated file paths in a directory entry share the same parent directory; path generation may not introduce additional subdirectories.
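Recovering the directory name from the generated paths could look like the following sketch. The function name directory_name is hypothetical, and the check that all paths share one parent simply restates the rule above.

```python
from typing import Iterable, Optional


def directory_name(paths: Iterable[str]) -> Optional[str]:
    """Return the common parent directory of the paths of one directory entry."""
    # Cut each path at its last slash; a path without a slash lives at the root.
    parents = {p.rsplit("/", 1)[0] if "/" in p else "" for p in paths}
    if not parents:
        # An entry with no files of its own: the name cannot be recovered here.
        return None
    if len(parents) != 1:
        raise ValueError("paths of one directory entry must share a parent")
    return parents.pop()
```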

Note that since 3.21.2 all paths generated are forced lower-case, losing previously known path information and requiring applications to adapt to this change in path lookups.

Path specification encoding

A path specification section has two kinds of elements: unsigned 32-bit integers (words) and null-terminated narrow strings.

The full set can be generated with an algorithm that alternates between two phases, one constructing reference base strings and one emitting resulting paths. This cycle can restart with an empty set of references in a section.

A word of zero toggles the phase and the section always starts with a zero, bringing it into a fresh base phase.

In the base phase there are alternating non-zero words and strings. The word is a one-based index into the list of base strings. If the word refers to an existing base string, the new string is appended to a copy of that base string and the result is inserted at the end of the list. If the index is out of bounds, the string is inserted as-is at the end of the list.

In the generation phase there are alternating non-zero words and strings. The word is again a one-based index into the reference base list. If the index is in bounds, the string is appended to a copy of the base string and recorded as an output. If it is out of bounds, the string is recorded as-is as an output.
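The two phases can be sketched as a small Python decoder. It assumes little-endian words and UTF-8 strings, and it clears the list of base strings whenever a new base phase starts. The function name decode_path_spec is made up for this example.

```python
import struct
from typing import List


def decode_path_spec(section: bytes) -> List[str]:
    """Expand one path specification section into its list of paths."""
    bases: List[str] = []
    paths: List[str] = []
    base_phase = False  # the leading zero word toggles this to True
    pos = 0
    while pos + 4 <= len(section):
        (word,) = struct.unpack_from("<I", section, pos)
        pos += 4
        if word == 0:
            base_phase = not base_phase
            if base_phase:
                bases.clear()  # a fresh base phase starts with no references
            continue
        # A non-zero word is always followed by a null-terminated string.
        end = section.index(b"\0", pos)
        text = section[pos:end].decode("utf-8")
        pos = end + 1
        index = word - 1  # words are one-based
        if index < len(bases):
            text = bases[index] + text
        if base_phase:
            bases.append(text)
        else:
            paths.append(text)
    return paths
```

A section for one path_rep entry would be obtained by slicing the decompressed payload from payload_offset to payload_offset + payload_size and passing the slice to the function.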


Wikis Content is available under CC BY-NC-SA 3.0 unless otherwise noted.