I'm parsing raw EXT4-formatted block devices using only Python, primarily as a learning exercise, but I'm having trouble manually generating the expected Group Descriptor checksums. The resources I find online contain conflicting, missing or (seemingly) incorrect information. I am able to correctly calculate the expected block bitmap and inode bitmap checksums, using the following calculation:
0xffffffff - (crc32c(s_uuid + bitmap_block))
as opposed to:
crc32c(s_uuid + bitmap_block)
as most of the documentation suggests (though I don't understand why inverting it yields the expected checksum value). However, I am unable to calculate the expected group descriptor checksum. The documentation suggests this should be:
crc32c(s_uuid + bg_num + group_desc) & 0xffff
I have tried calculating it as the documentation suggests, inverting as before, with and without the block group number, and using the full block, the 64-byte descriptor, or only the first 32 bytes as the group descriptor. I have tried all of these both with the 16-bit checksum field zeroed out and with that field skipped in the calculation. Nothing I try yields the expected checksum value.
For reference, both METADATA_CSUM and FLEX_BG feature flags are set, and maybe the group_desc part of the calculation that I am using is incorrect as a result of this.
Can anyone provide more information on how to correctly calculate the group descriptor checksum within EXT4 group descriptors? Can anyone also advise why the bitmap checksums only yield the expected (correct) values when subtracted from 0xffffffff despite no documentation I've found suggesting that this is necessary?
- Can you please clarify what you mean by "raw EXT4-formatted block devices"? To me, raw means it is not formatted. (Romeo Ninov, Jul 18, 2024 at 11:14)
- In which case I would probably refer to it as being unformatted as opposed to EXT4-formatted. By raw, I mean the block device is being read as-is, be that parsing out the partition from /dev/sda, parsing from /dev/sda5, or /dev/mapper/root, for example. Perhaps "raw" wasn't the most appropriate term to use, so apologies if that has caused confusion. (genericuser99, Jul 18, 2024 at 11:26)
1 Answer
Firstly, regarding the "0xffffffff" inversion: as per Mark Adler's answer here, CRC32 implementations typically XOR with 0xffffffff both before and after the main calculation. However, for whatever reason, the Linux kernel implementation doesn't do the post-XOR step (if anyone can shed light on why this is the case, that would be appreciated!). I'm guessing that the checksums in the block group descriptors are stored without the post-XOR step, whereas the Python implementation I was using to calculate the crc32c checksum applies it. It was therefore necessary to do that XOR step again to get back to the expected value. For a 32-bit value, subtracting from 0xffffffff is the same as XOR-ing with 0xffffffff, which is why the subtraction worked.
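To illustrate the point about the post-XOR step, here is a minimal pure-Python sketch. The function names `crc32c_raw` and `crc32c_standard` are my own, and the bit-by-bit loop is chosen for clarity rather than speed; it is not how the kernel or any particular library implements it.

```python
def crc32c_raw(data: bytes, crc: int = 0xFFFFFFFF) -> int:
    """CRC32C (Castagnoli) with the initial 0xffffffff seed but NO final XOR.

    0x82F63B78 is the bit-reflected Castagnoli polynomial.
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc


def crc32c_standard(data: bytes) -> int:
    """The usual CRC32C, i.e. the raw value with the final XOR applied."""
    return crc32c_raw(data) ^ 0xFFFFFFFF


check = b"123456789"
# 0xE3069283 is the published check value for standard CRC32C.
print(hex(crc32c_standard(check)))
# Undoing the final XOR by subtraction gives the raw (no post-XOR) value.
print(0xFFFFFFFF - crc32c_standard(check) == crc32c_raw(check))
```

If the assumption above holds, a library that returns the standard value needs `0xffffffff - crc` (or `crc ^ 0xffffffff`) to match what is stored on disk, whereas `crc32c_raw` matches directly.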
Secondly, as per the documentation I'd found, to calculate the checksum, it was indeed:
crc32c(s_uuid + bg_num + group_desc) & 0xffff
The vital bit of information which was missing, however, was that the block group number used is of type __le32, i.e. a 4-byte little-endian value. This should've been fairly obvious, but I was stupidly only using a single byte for this value, which was obviously going to produce an incorrect checksum. The second bit which wasn't abundantly clear in the documentation is that, when calculating against the group descriptor structure, the 2-byte checksum field must be zeroed out (which is common practice when calculating checksums for structures containing the checksum itself). However, I did find some information elsewhere suggesting that, when calculating the checksum, you 'just skip over' the checksum field itself (whatever that means!). Finally, in case it isn't clear, the size of the group descriptor structure to calculate the checksum against is specified in the EXT4 superblock (s_desc_size), and will likely be 64 bytes, possibly 32.
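Putting those points together, the calculation can be sketched as follows. This is an illustration under stated assumptions, not a verified implementation: the function names are hypothetical, the checksum field offset of 0x1E within the descriptor is taken from the ext4 on-disk layout documentation, and `crc32c_raw` is a slow pure-Python CRC32C without the final XOR (matching what worked for me).

```python
import struct

BG_CHECKSUM_OFFSET = 0x1E  # assumed offset of the 2-byte bg_checksum field


def crc32c_raw(data: bytes, crc: int = 0xFFFFFFFF) -> int:
    """CRC32C (Castagnoli), seeded with 0xffffffff, with no final XOR."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc


def group_desc_checksum(s_uuid: bytes, bg_num: int, group_desc: bytes) -> int:
    """Checksum over s_uuid + bg_num (as __le32) + descriptor, truncated to 16 bits.

    group_desc must be exactly the descriptor size given by the superblock
    (typically 64 bytes, possibly 32).
    """
    if len(s_uuid) != 16:
        raise ValueError("s_uuid must be 16 bytes")
    if len(group_desc) not in (32, 64):
        raise ValueError("expected a 32- or 64-byte group descriptor")
    desc = bytearray(group_desc)
    # Zero the stored checksum so it does not feed into its own calculation.
    desc[BG_CHECKSUM_OFFSET:BG_CHECKSUM_OFFSET + 2] = b"\x00\x00"
    # The block group number is packed as a 4-byte little-endian integer.
    data = s_uuid + struct.pack("<I", bg_num) + bytes(desc)
    return crc32c_raw(data) & 0xFFFF
```

To verify a descriptor, compare the result with the stored field, e.g. `struct.unpack_from("<H", group_desc, BG_CHECKSUM_OFFSET)[0]`. I have not tested this against filesystems that store a separate checksum seed in the superblock instead of deriving it from s_uuid, so treat that case as unhandled here.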
Flexible block groups (FLEX_BG) don't affect the checksums, i.e. there aren't any "master" checksums calculated against the whole flex group.
Hopefully, this information will be of value to others if they have similar queries!