
What I presumed is:

  • File systems like ext4 and XFS write blocks to disk (whether a hard disk or an SSD) along with a checksum.
  • When reading back, if the checksum fails, the read operation will NOT return data and will indicate an error or EOF (or no data).
  • On a reboot (even after an abrupt shutdown where the OS's in-memory pages were not flushed), the Linux OS checks the integrity of file blocks using the checksum, and if an application tries to read a file that contains such error-detected blocks, the read operation will fail when it reaches the aforementioned block.
  • If the block where corruption was detected (by the Linux OS) is in the middle of a file, then even if subsequent blocks do NOT contain an error, the read operation will fail after reaching the corrupted block and the application will never be able to read beyond it.
    e.g. Let's say the file size is 4 KB and the block size is 1 KB, so the file is spread across 4 blocks A, B, C and D. Let's say Linux detected that block C is corrupted. Now the read operation will be able to provide the first 2 KB of the file. But there is no way to read the last 1 KB even if the application places the file position after 3 KB.

If the above assumptions are correct [please note that an intruder tampering with the file is not considered here]:

  • The application does NOT need to add its own checksum when writing to a file and verify it when reading back (even after an abrupt OS shutdown).
  • The application can assume data returned from a read is what it wrote originally, and the worst thing that can happen is NOT being able to read data from the end of the file.

Are these assumptions wrong (in the context of OS = Red Hat Enterprise Linux 8.x, file system = ext4 or XFS)? If so, please let me know which of them are wrong.

  • I don't think ext4 protects against bitrot like that. Commented Jun 22, 2023 at 14:58
  • This can be done at a lower (block) layer: docs.kernel.org/admin-guide/device-mapper/dm-integrity.html (which can be combined with dm-crypt too, including with LUKS). Commented Jun 22, 2023 at 15:36
  • Use Btrfs as the filesystem, as it natively provides checksums for all files. Commented Jun 23, 2023 at 10:01

1 Answer


File systems like ext4 and XFS write blocks to disk (whether a hard disk or an SSD) along with a checksum.

No, neither file system has data checksums.

When reading back, if the checksum fails, the read operation will NOT return data and will indicate an error or EOF (or no data).

No. As said, no data checksumming is done, but even if it were, there would be no EOF (that would be wrong!); instead, the error code EIO would be set.
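
To make the distinction concrete, here is a minimal sketch of what that looks like from the application's side (the path /tmp/example.dat is just a placeholder): read(2) reports end-of-file with a return value of 0 and an I/O error with -1 plus errno (typically EIO); an error is never disguised as EOF.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* Placeholder path: any regular file works for this demonstration. */
        int fd = open("/tmp/example.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        ssize_t n = read(fd, buf, sizeof buf);

        if (n > 0) {
            printf("read %zd bytes\n", n);
        } else if (n == 0) {
            /* Return value 0 means end-of-file; that is not an error. */
            puts("EOF");
        } else {
            /* On an unreadable block the call returns -1 and sets errno
               (typically EIO); it does not pretend to be at end-of-file. */
            fprintf(stderr, "read failed: %s\n", strerror(errno));
        }

        close(fd);
        return 0;
    }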

On a reboot (even after an abrupt shutdown where the OS's in-memory pages were not flushed), the Linux OS checks the integrity of file blocks using the checksum.

No. These are journaling file systems, and hence whole-filesystem checksum checks are usually not necessary. Also, as said, there are no data checksums in either file system (there are checksums for the file metadata, at least in XFS).

Also, no: even for the checksums that do exist, and even for file systems that have data checksums, Linux as an operating system will not do a whole-volume check. That would take ages, and it is something that can very well be left to userland software. It would also do more harm than good, considering that a broken checksum probably means the storage device is beginning to fail. As said, complete file system checks are very rarely done on modern file systems; instead, checks on access are common.

and if an application tries to read a file that contains such error-detected blocks, the read operation will fail when it reaches the aforementioned block.

No, because no data checksums are present. But let's assume they were:

If the block where corruption was detected (by the Linux OS) is in the middle of a file, then even if subsequent blocks do NOT contain an error, the read operation will fail after reaching the corrupted block and the application will never be able to read beyond it.

No. I don't think Linux or POSIX makes any guarantee about that, and no reasonable driver for a file system with data checksums would read 1 GB of data to verify checksums just because you seek to the position after that 1 GB. So, your assumption is wrong.
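
To illustrate (a sketch under the assumption that /tmp/big.dat is an existing file larger than 1 GiB; the name is made up), seeking is pure bookkeeping on the file offset, and only the blocks backing the range actually read need to be fetched:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical large file; the name is made up for this example. */
        int fd = open("/tmp/big.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* lseek() only updates the file offset of this open file description;
           it performs no I/O, so nothing in the first 1 GiB is read or verified. */
        if (lseek(fd, 1024LL * 1024 * 1024, SEEK_SET) == (off_t)-1) {
            perror("lseek");
            close(fd);
            return 1;
        }

        char buf[4096];
        /* Only the blocks/extents backing this 4 KiB range need to be fetched. */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            fprintf(stderr, "read failed: %s\n", strerror(errno));
        else
            printf("read %zd bytes at offset 1 GiB\n", n);

        close(fd);
        return 0;
    }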

e.g. Let's say the file size is 4 KB and the block size is 1 KB, so the file is spread across 4 blocks A, B, C and D.

That is not, in detail, how XFS works, and usually also not how ext4 works. You might want to look up what extent-based allocation is.

Let's say Linux detected that block C is corrupted.

Again, even if either file system had data checksums, that would only happen the moment C is read. Not earlier.

Now the read operation will be able to provide the first 2 KB of the file.

Why shouldn't you be able to fseek beyond that?

But there is no way to read the last 1 KB even if the application places the file position after 3 KB.

As explained above, that's wrong.
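
To make that concrete with the 4 KB example (a sketch; /tmp/abcd.dat stands in for the hypothetical four-block file): position the stream after 3 KB and read the last 1 KB. Only the storage backing bytes 3072..4095 ("block D") is needed for this read, so a bad block C does not get in the way.

    #include <stdio.h>

    int main(void)
    {
        /* Stand-in for the hypothetical 4 KB file made of blocks A, B, C, D. */
        FILE *fp = fopen("/tmp/abcd.dat", "rb");
        if (!fp) { perror("fopen"); return 1; }

        /* Positioning after 3 KB is pure offset arithmetic; no data is read here. */
        if (fseek(fp, 3 * 1024, SEEK_SET) != 0) {
            perror("fseek");
            fclose(fp);
            return 1;
        }

        char buf[1024];
        /* This read only touches the storage behind "block D"; blocks A, B and C
           are never accessed, so a corrupt block C cannot make it fail. */
        size_t n = fread(buf, 1, sizeof buf, fp);
        if (ferror(fp))
            fputs("fread failed\n", stderr);
        else
            printf("read %zu bytes of the last 1 KB\n", n);

        fclose(fp);
        return 0;
    }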

However,

The application does NOT need to add its own checksum when writing to a file and verify it when reading back (even after an abrupt OS shutdown).

Yes, placing data integrity in the hands of the application instead of the block device abstraction would be a design mistake.

The whole point of a file system is that applications do not need to be aware of the physical medium they store data on. Keeping that abstraction free from leaking the ugly details of the underlying hardware means that applications cannot be aware of the error probabilities, the kinds of errors that can occur, or their positions on the physical medium.

All this, however, would be necessary to make a sensible choice on redundancy information.

In short: When using a file system to store data on a block device, it's the block device layer that guarantees data integrity, not the application layer.

The application can assume data returned from a read is what it wrote originally, and the worst thing that can happen is NOT being able to read data from the end of the file.

Applications must assume the data read from files is correct; otherwise they couldn't assume that the application itself (which is loaded from a file) is correct, or that anything else they do is correct. Again, that's a guarantee the file system offers to the application. You're using a file system, so you assume that.

Are these assumptions wrong (in the context of OS = Red Hat Enterprise Linux 8.x, file system = ext4 or XFS)? If so, please let me know which of them are wrong.

As seen above, basically all your assumptions were wrong :(

  • "Yes, placing data integrity in the hands of the application instead of the hands of the block device abstraction would be a bad design mistake." -- I'm not sure all database engineers would agree here. And this, "Applications Must assume data read from files is correct", is just wrong enough to be dangerous. Isn't the usual recommendation to validate all inputs, since you never know where a file came from, or if it was created by a broken or outright malicious system? And for a reason, since a lot of the data we process comes from someone else we've never met. Commented Jun 22, 2023 at 20:05
  • Unrelated to the bigger issues, even with extent-based allocation, if the storage is divided into numbered blocks, a 4 kB file will still need four 1 kB blocks. Whether the bookkeeping says it's 4 blocks starting at block 123, or if it says it's the 4 blocks 123, 124, 125, 126, it doesn't really change the basic idea. Commented Jun 22, 2023 at 20:08
  • @ilkkachu hi :) easier things to address first: Re: extents/blocks: yeah, that's why I chose the wording "not in detail how… works". I hoped that would make things clearer; seemingly didn't. (and the thing is that the existing checksums refer to metadata, and that describes extents) Commented Jun 23, 2023 at 6:57
  • @ilkkachu re: Database engineers: Heh, that's a good point you're making there. But I'm not sure it's really in conflict with what I say, in practice. What databases typically do (disclaimer: no idea about anything but small-scale InnoDB(Mysql), Postgres and SQLite) is having mechanisms for recovery when the system fails as specified, especially mechanisms recovering from a system breakdown with unfinished data writes. The assumption is still that what is read from storage was what was written, only that it's uncertain whether a write completed. Now, I bet you're 100% on point and especially Commented Jun 23, 2023 at 7:01
  • … large databases beyond terabytes with high throughput do well having their own data optionally contain checksums, for data integrity instead of just checking whether an operation completed successfully. "Perfect storage" is a bad approximation once you have enough billions of bits that can flip with nonzero probability. Still, the usual approach (and what we see in servers) is that a) RAM being a likely culprit here, we integrate forward error correction into that hardware (ECC) and b) storage being a likely culprit here, we integrate error correction into that: Commented Jun 23, 2023 at 7:05
