1

I have an old 16GB SD card that started giving I/O errors. Knowing it had gone bad, I dumped all the content to an image file to see what I could restore, and the Disk manager alerted me that 16.1MB were unreadable and had been replaced with zeros.

As a post-mortem, I ran badblocks -n /dev/sdx (read only analysis). After ~18 hours, it reported 309 blocks as bad. To do some other testing, I erased just the partition table (it had two partitions, FAT32 + ext4) and ran badblocks -w /dev/sdx (read/write analysis). To my surprise, this time it took <6 hours and reported no bad blocks. I ran the read only analysis once again, and it also very quickly reported no bad sectors.

How is that possible? I was under the impression that badblocks checks for physical block damage, and that in a flash device, if a block is reported as "bad", it's because the flash chip has already run out of spare space to replace damaged blocks. But since removing the partition table seemingly fixed the issue, it looks as if the errors were some kind of software corruption.

After running badblocks these 3 times, I created a new partition table and a new partition, and filling it with data from /dev/zero reported no I/O error whatsoever.

From what I can see, these are the possible explanations:

  • badblocks checks for FS bad blocks and my FS was corrupted, not the SD card flash itself
  • badblocks somehow marked these blocks, and they are now automatically ignored by the hardware controller and/or the FS
  • the SD card noticed the bad memory areas while badblocks was finding them and has now replaced them with fresh ones. If so, why didn't it do so when I got I/O errors from reading the files?

Which one is the most likely?

2 Answers

6

When a block goes bad, how the device reacts depends on the next operation involving that block. If it’s a write, then the data in the bad block doesn’t need to be preserved: the storage device can remap the bad block, unless it’s run out of spare blocks. If it’s a read, then the data in the bad block can’t be discarded; if it can’t be read at all, an error is reported. Storage devices are conservative in the latter scenario: they don’t remap blocks on read errors, on the off-chance that the block might be read successfully in different circumstances.

This explains what you’ve seen: when reading, badblocks reports errors, and the scan takes a while (because either the hardware or the software, or both, retries reads before giving up). When writing, the storage device remaps as it encounters errors, and badblocks doesn’t find anything wrong.
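To make the read/write asymmetry concrete, here is a toy Python model of it. This is only a sketch under simplifying assumptions: `ToyFlash`, `wear_out` and the one-spare-per-remap policy are made up for illustration, and a real SD controller is far more complex and does not expose any of this.

```python
class ToyFlash:
    """Toy model: logical blocks are mapped onto physical blocks (hypothetical)."""

    def __init__(self, n_logical, n_spare):
        # Logical block i initially lives in physical block i.
        self.mapping = {i: i for i in range(n_logical)}
        self.data = {i: b"\x00" for i in range(n_logical)}  # physical -> content
        self.spares = list(range(n_logical, n_logical + n_spare))
        self.bad = set()  # physical blocks that can no longer be read

    def wear_out(self, logical):
        """Simulate the physical block behind a logical block going bad."""
        self.bad.add(self.mapping[logical])

    def read(self, logical):
        phys = self.mapping[logical]
        if phys in self.bad:
            # The old data cannot simply be discarded, so the error is reported.
            raise OSError(f"read error at logical block {logical}")
        return self.data[phys]

    def write(self, logical, payload):
        phys = self.mapping[logical]
        if phys in self.bad:
            # The old data is being overwritten anyway, so remapping is safe.
            if not self.spares:
                raise OSError("no spare blocks left")
            phys = self.spares.pop(0)
            self.mapping[logical] = phys
        self.data[phys] = payload


if __name__ == "__main__":
    card = ToyFlash(n_logical=8, n_spare=2)
    card.wear_out(3)
    try:
        card.read(3)
    except OSError as exc:
        print("before write:", exc)
    card.write(3, b"\xaa")  # silently remapped to a spare block
    print("after write:", card.read(3))
```

In this model a read-only scan keeps hitting the error at logical block 3, while a write pass followed by a read finds nothing wrong, which matches the sequence described in the question.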

3

How is that possible? I was under the impression that badblocks checks for physical block damage, and that in a flash device, if a block is reported as "bad", it's because the flash chip has already run out of spare space to replace damaged blocks. But since removing the partition table seemingly fixed the issue, it looks as if the errors were some kind of software corruption.

badblocks can't look behind the flash translation layer. It's a tool from the transition period between the time when hard drives were actual spinning disks whose layout the operating system knew about, so that the OS had to account for errors and deal with them itself, and more modern approaches to error reporting. The former is no longer the case even for hard drives, so badblocks is probably not very useful for those either. Quite honestly, unless you are in a data center setting with massive redundancy schemes spanning many (not 2 or 3) disks, a disk whose SMART (or equivalent) reports a single uncorrectable block error should be replaced ASAP. There's no value in looking more closely: you're losing data, and the loss will very likely speed up, so replacing the disk before you've lost all your data is a Good Idea™.

badblocks has no notion of the flash translation layer, nor of the fact that writing to a logical block can still succeed even if the data previously stored at that logical block was unreadable, because the written data ends up somewhere else entirely. So in an aging flash device, it's normal that data at some logical address can't be read while writing to that same address succeeds.

So: wrong tool. You really won't gain any more insight into how an SD card's stored data has aged beyond "there are read errors here, and here, and here". Anything more detailed is not something the SD protocol can transport, and not something that the flash controller, which translates between the SD commands and the actual storage side, is willing to tell you.
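For what it's worth, that "errors here, and here, and here" view is all a host-side scan can produce. A minimal Python sketch of such a read-only pass could look like the following. The function name `find_unreadable` and the 4096-byte block size are my own choices for illustration, and the sketch assumes a Unix-like system where `os.pread` is available. Reads may also be served from the page cache rather than the device, so treat the result as a rough indication only.

```python
import os


def find_unreadable(path, block_size=4096):
    """Return the byte offsets of blocks whose read raises an I/O error."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, block_size):
            try:
                os.pread(fd, block_size, offset)
            except OSError:
                # All we learn is *where* the read failed, not why.
                bad_offsets.append(offset)
    finally:
        os.close(fd)
    return bad_offsets
```

Calling `find_unreadable("/dev/sdx")` (with the device name from the question, and sufficient permissions) would list logical offsets only. It says nothing about which physical flash cells are involved or how many spare blocks remain.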

Therefore:

Does badblocks "fix" blocks on an SD card?

No. But on an SD card, writing data to a block is a "write to a different location" operation, and it stresses the card further: writing disturbs neighboring cells, and your neighboring cells are already degraded. So by running a write test, you're causing additional wear yourself.
