This is a backup of Ghostwheel's notes on the BSA file format. It was uploaded because Ghostwheel's site was down.
The only alterations were to adjust the table width so it fitted on my site and to adjust some of the colours.
The Elder Scrolls 3 (Morrowind) BSA File Format
Overview
A BSA file is used by Morrowind to store various data files such as icons, meshes and textures (normally those files can be found in the Data Files\Icons, Data Files\Meshes and Data Files\Textures directories of the Morrowind installation). While it is possible to store other files in a BSA archive, it seems that Morrowind's engine will look only for those 3 types of files inside BSA archives.
In order to be used, a BSA file must be referenced in the [Archives] section of the morrowind.ini file. In addition, in the [General] section of the same file, the TryArchiveFirst key should be set either to 0 (use the newer data, whether from the archives or from the Data Files directory) or to 1 (use archives only).
File format
A BSA file can be split into 6 sections:
| Name | Description |
| 1. Header | Contains version number, total size of sections 2, 3 and 4 and the number of files |
| 2. File size / offset table | Contains size of each file and relative offset of the file data in section 6 |
| 3. Archive directory offset table | Contains relative offset of each file name in section 4 |
| 4. Archive directory | Contains file names |
| 5. Hash table | Contains 2 32-bit hash values for each file name |
| 6. File data | Contains actual file data |
The sections themselves are described below. All 4-byte fields should be treated as normal 32-bit unsigned integers stored low byte first (little-endian).
Header section
The header section is always 12 bytes long. It contains the following information:
| Field name | Size (bytes) | Description |
| Version | 4 | Always contains the following hex sequence: 00 01 00 00 |
| Hash table offset (archive directory size) | 4 | Contains the total size of sections 2-4. Adding the header size (12) to this value gives the offset of the hash table section from the beginning of the file. |
| Number of files | 4 | Number of files in the archive |
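The header layout above can be read with a few lines of C. The following is only a sketch: the names bsa_header, read_le32 and bsa_parse_header are invented for this example and are not taken from any existing tool. It assumes the 12 header bytes have already been read into a buffer.

```c
#include <stdint.h>

#define BSA_HEADER_SIZE 12u
/* The bytes 00 01 00 00 read as a little-endian 32-bit integer. */
#define BSA_VERSION 0x00000100u

/* Hypothetical structure mirroring the three header fields. */
struct bsa_header {
    uint32_t version;    /* always 00 01 00 00 on disk */
    uint32_t dir_size;   /* total size of sections 2-4 */
    uint32_t file_count; /* number of files in the archive */
};

/* Read a 32-bit unsigned integer stored low byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 12-byte header. Returns 0 on success, or -1 if the
   version field does not hold the expected value. */
static int bsa_parse_header(const unsigned char *buf, struct bsa_header *out)
{
    out->version = read_le32(buf);
    out->dir_size = read_le32(buf + 4);
    out->file_count = read_le32(buf + 8);
    return out->version == BSA_VERSION ? 0 : -1;
}
```

Reading byte by byte like this keeps the code independent of the byte order of the machine it runs on.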
File size/offset section
This section contains the size of each file and its relative offset in the file data section. Each record in this section corresponds to the matching records in sections 3-5 (in other words, the first record in this section corresponds to the first file listed in the archive directory and to the first record in the hash table section, and so on). The section size is 8 bytes * number of files (from the header). Each record contains the following information:
| Field name | Size (bytes) | Description |
| File size | 4 | File size |
| File offset | 4 | Relative offset of the file data in section 6 (file data section) |
Archive directory offset section
This section contains the relative offset of each file name in the archive directory section. The section size is 4 bytes * number of files (from the header). Each record contains just a single field:
| Field name | Size (bytes) | Description |
| File name offset | 4 | Relative offset of the file name in the archive directory section |
Archive directory section
This section contains the list of file names, one after another. Each file name includes the relative path of the file (as if it were in the Data Files\ directory). Each name is stored as ASCII characters (not UNICODE) in C format (meaning that each string ends with a '\0' byte). Since file names differ in length, the records in this section have different sizes (the number of characters in the file name plus 1 for the terminating '\0'). To find a particular file name, use the previous section, since it contains the offset at which each file name is stored. Please also note that all file names are stored in lowercase.
The total size of this section can be calculated as (archive directory size field from the header section) - (size of the file size/offset section) - (size of the archive directory offset section). In other words, (archive directory size field from the header section) - 12 * (number of files).
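Putting the section sizes given so far together, the starting offset of every section follows from the two header values. The sketch below shows that arithmetic. The names bsa_layout and bsa_compute_layout are made up for this example, and the position of the file data section assumes that it directly follows the 8 bytes * number of files hash table, as the section order suggests.

```c
#include <stdint.h>

/* Hypothetical structure holding the offset of each section,
   counted from the beginning of the file. */
struct bsa_layout {
    uint64_t size_offset_table; /* section 2 */
    uint64_t name_offset_table; /* section 3 */
    uint64_t names;             /* section 4 */
    uint64_t names_size;        /* size of section 4 in bytes */
    uint64_t hash_table;        /* section 5 */
    uint64_t file_data;         /* section 6 */
};

/* Derive the layout from the header fields. Returns 0 on success, or -1
   if the directory size is too small for the two fixed-size tables. */
static int bsa_compute_layout(uint32_t dir_size, uint32_t file_count,
                              struct bsa_layout *out)
{
    uint64_t n = file_count;

    if (12 * n > dir_size)
        return -1;

    out->size_offset_table = 12;            /* right after the header */
    out->name_offset_table = 12 + 8 * n;    /* after n 8-byte records */
    out->names = 12 + 12 * n;               /* after n 4-byte records */
    out->names_size = dir_size - 12 * n;
    out->hash_table = 12 + (uint64_t)dir_size;
    out->file_data = out->hash_table + 8 * n;
    return 0;
}
```

For example, an archive with 2 files whose names take 12 bytes in total has a directory size of 36, so the hash table starts at offset 48 and the file data at offset 64.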
Hash table section
This section contains two 32-bit hash values for each file name (not the file data!) in the archive. The section size is 8 bytes * number of files. Each record contains two fields:
| Field name | Size (bytes) | Description |
| Hash value 1 | 4 | 1st hash value (calculated based on the first half of the file name string) |
| Hash value 2 | 4 | 2nd hash value (calculated based on the second half of the file name string) |
Algorithms for calculating hash values can be found below.
File data section
This section contains the actual data from the files. Each file is stored in the archive as is; no compression is used. The data of the files is stored one file after another. To retrieve the data of a particular file, use its file size/offset record: it gives the offset within this section at which the file data starts, and the size of the file.
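As an illustration of how the tables work together, here is a sketch that looks up the name and data of the file at a given record index. It assumes that the whole archive has been loaded into memory and that the file data section starts right after the hash table. The names bsa_entry and bsa_get_entry are invented for this example.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one archived file inside the loaded archive. */
struct bsa_entry {
    const char *name;          /* '\0'-terminated, lowercase */
    const unsigned char *data; /* points into the archive buffer */
    uint32_t size;             /* file size in bytes */
};

/* Read a 32-bit unsigned integer stored low byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill 'out' for the file at record position 'index'. Returns 0 on
   success, -1 if the index or any offset falls outside the archive. */
static int bsa_get_entry(const unsigned char *bsa, size_t bsa_len,
                         uint32_t index, struct bsa_entry *out)
{
    if (bsa_len < 12)
        return -1;

    uint64_t dir_size = read_le32(bsa + 4);
    uint64_t count = read_le32(bsa + 8);
    if (index >= count || 12 * count > dir_size)
        return -1;

    uint64_t names = 12 + 12 * count;            /* section 4 */
    uint64_t hash_table = 12 + dir_size;         /* section 5 */
    uint64_t file_data = hash_table + 8 * count; /* section 6 */
    if (file_data > bsa_len)
        return -1;

    /* Section 2 record: file size, then relative data offset. */
    uint32_t size = read_le32(bsa + 12 + 8 * (size_t)index);
    uint32_t data_off = read_le32(bsa + 12 + 8 * (size_t)index + 4);
    /* Section 3 record: relative offset of the file name. */
    uint32_t name_off = read_le32(bsa + 12 + 8 * (size_t)count + 4 * (size_t)index);

    if (names + name_off >= hash_table)
        return -1;
    if (file_data + data_off + size > bsa_len)
        return -1;
    /* The name must be terminated before the hash table begins. */
    if (!memchr(bsa + names + name_off, 0, (size_t)(hash_table - names - name_off)))
        return -1;

    out->name = (const char *)(bsa + names + name_off);
    out->data = bsa + file_data + data_off;
    out->size = size;
    return 0;
}
```

Since no compression is used, out->data can be written to disk directly as the extracted file.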
Hash value algorithms
As noted above, two hash values must be calculated for each file name. The file name is split in half (the first half is either the same length as the second or 1 byte shorter) and a hash value is calculated separately for each half. The code below, written in C notation, calculates the hash values.
Used variables:
char *name;      // file name string (in C format)
unsigned hash1;  // first hash value
unsigned hash2;  // second hash value
The following code calculates both hash values (I tried to make it as readable as possible for non-C folks: the += and ++ operators are not used on purpose):
unsigned full_len = strlen(name); // length of the file name string
unsigned half_len = (full_len >> 1);
unsigned sum, off, temp, i, n;

sum = off = 0;
for(i = 0; i < half_len; i = i + 1) {
    temp = ( ((unsigned)(name[i])) << (off & 0x1F) );
    sum = sum ^ temp;
    off = off + 8;
}
hash1 = sum;

sum = off = 0;
for(; i < full_len; i = i + 1) {
    temp = ( ((unsigned)(name[i])) << (off & 0x1F) );
    sum = sum ^ temp;
    n = temp & 0x1F;
    // "rotate right" by n bits; the mask keeps the left shift defined when n is 0
    sum = (sum << ((32-n) & 0x1F)) | (sum >> n);
    off = off + 8;
}
hash2 = sum;
For those who are not familiar with C: << means bitwise left shift, >> means bitwise right shift, & is bitwise AND, | is bitwise OR and ^ is bitwise XOR. All unsigned variables are assumed to be 32-bit.
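For convenience, the same calculation can be wrapped in a self-contained function. This is a sketch rather than a reference implementation: the names bsa_hash, rotr32 and bsa_hash_name are invented here, and the characters are read as unsigned bytes, which gives the same result as the code above for the ASCII names stored in BSA archives.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type for the two hash values of one file name. */
struct bsa_hash {
    uint32_t hash1; /* from the first half of the name */
    uint32_t hash2; /* from the second half of the name */
};

/* Rotate a 32-bit value right by n bits (n is reduced to 0..31). */
static uint32_t rotr32(uint32_t v, unsigned n)
{
    n = n & 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Calculate both hash values for a '\0'-terminated file name. */
static struct bsa_hash bsa_hash_name(const char *name)
{
    size_t full_len = strlen(name);
    size_t half_len = full_len / 2;
    struct bsa_hash h;
    uint32_t sum = 0;
    unsigned off = 0;
    size_t i;

    /* First half: XOR each byte in at bit position 0, 8, 16, 24, 0, ... */
    for (i = 0; i < half_len; i++) {
        sum ^= (uint32_t)(unsigned char)name[i] << (off & 0x1F);
        off += 8;
    }
    h.hash1 = sum;

    /* Second half: the same XOR, followed by a rotate right whose
       amount is taken from the low 5 bits of the shifted byte. */
    sum = 0;
    off = 0;
    for (; i < full_len; i++) {
        uint32_t temp = (uint32_t)(unsigned char)name[i] << (off & 0x1F);
        sum ^= temp;
        sum = rotr32(sum, temp & 0x1F);
        off += 8;
    }
    h.hash2 = sum;
    return h;
}
```

Worked by hand for the 4-character string "abcd": the first half "ab" gives hash1 = 0x61 ^ (0x62 << 8) = 0x00006261. In the second half, 'c' (0x63) is rotated right by 3 bits to give 0x6000000C, and 'd' shifted left by 8 (0x6400) is XORed in with a rotation of 0, so hash2 = 0x6000640C.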
Record ordering
All records in the BSA file sections must be placed in a specific order. As mentioned before, the records in sections 2-5 (file size/offset, directory offset, archive directory and hash table sections) are correlated: the records related to one file must be in exactly the same position in all of those sections. All records in these sections must be sorted so that the (hash value 1, hash value 2) pairs end up in ascending order. If this is not done, Morrowind will not recognize the files in the archive.
Also, in the standard Morrowind files (morrowind.bsa, tribunal.bsa) the file data section is sorted as well, but the files there are placed in alphabetical order of their names. I am not sure whether this is necessary, but since I wanted to reproduce the exact archive after an unpack/pack cycle, I treat this ordering as important too.
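When writing an archive, the record order can be produced with the standard qsort function. The sketch below reads the (hash value 1, hash value 2) pair as "compare hash value 1 first, then hash value 2", which is my interpretation of the ordering described above. The names bsa_record, bsa_record_cmp, bsa_sort_records and bsa_has_collision are invented for this example. Because equal pairs end up next to each other after sorting, the same pass can also detect two files that share both hash values.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory record for one file that is about to be packed. */
struct bsa_record {
    uint32_t hash1;
    uint32_t hash2;
    const char *name; /* lowercase relative path */
    uint32_t size;    /* file size in bytes */
};

/* qsort comparator: ascending by hash1, then by hash2. */
static int bsa_record_cmp(const void *a, const void *b)
{
    const struct bsa_record *x = a;
    const struct bsa_record *y = b;

    if (x->hash1 != y->hash1)
        return x->hash1 < y->hash1 ? -1 : 1;
    if (x->hash2 != y->hash2)
        return x->hash2 < y->hash2 ? -1 : 1;
    return 0;
}

/* Put the records into the order used by sections 2-5. */
static void bsa_sort_records(struct bsa_record *recs, size_t n)
{
    qsort(recs, n, sizeof recs[0], bsa_record_cmp);
}

/* On sorted records: returns 1 if two neighbours share both hash
   values, 0 otherwise. */
static int bsa_has_collision(const struct bsa_record *recs, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++) {
        if (bsa_record_cmp(&recs[i - 1], &recs[i]) == 0)
            return 1;
    }
    return 0;
}
```

Sorting one array of complete records and then writing sections 2-5 from it keeps the records of each file at the same position in every section.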
Other notes
A BSA archive does not store file dates;
A BSA archive always stores file names in lowercase;
All files inside a BSA archive must have a unique pair of hash values, otherwise Morrowind will report a "hash map collision" error.
Files from Data Files\Sound, Data Files\Music, Data Files\Fonts and Data Files\Splash were not recognized by Morrowind's engine when placed inside a BSA archive.
One BSA archiver implementation with uncommented C++ source can be found at http://www34.brinkster.com/ghostwheel/bsapack.zip.
[It can also be found on Planet Elder Scrolls]