
I've been running a RAID1 setup with two Samsung 970 EVO 500GB NVMe drives on an Asus Z390 Maximus XI Formula motherboard for the last two years. This is my home server running Ubuntu 22.04, and I chose RAID1 specifically so the system would survive one of the drives failing. I wanted to upgrade the machine from 32GB to 128GB RAM for more ZFS cache, but since the system didn't POST by itself, I opted for a CMOS reset using the button on the motherboard's I/O panel. After the reset I anticipated having to change the SATA mode from AHCI back to RAID (Intel Rapid Storage Technology) in the BIOS before being able to boot from my storage again as normal. Unfortunately this was not the case.

I tried the CSM (Compatibility Support Module) settings recommended online: enabling it, setting Boot Device Control to UEFI and Legacy OPROM, and giving UEFI priority for booting from storage devices. No change.

I also tried manually rebuilding the RAID Volume without initializing/formatting the disks by creating a new RAID1 volume, but no change.

So I moved on to data recovery by moving one of the disks to a separate PC with a minimal Ubuntu install. lsblk and sudo fdisk -l both show the disk being recognized, and the type reported by fdisk is clearly Linux RAID:

Disk model: Samsung SSD 970 EVO 500GB               
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 23ABA6D9-5923-4891-B67E-2208F422E40C

Device         Start       End   Sectors   Size Type
/dev/nvme0n1p1  4096 976506887 976502792 465.6G Linux RAID

Steps taken in Ubuntu:

  1. Copied the entire disk content to my local device: sudo dd if=/dev/nvme0n1 of=~/nvme_backup.img bs=4M status=progress
  2. Installed and ran testdisk sudo testdisk /dev/nvme0n1
  3. Chose the [EFI GPT] partition table type, analyzed, and found Linux Raid 4096 976506887 976502792 [ubuntu-server:0], the name of my RAID volume! So I accepted it and chose Write.
  4. After reboot I attempted to mount the volume, got mount: /mnt/recovery: unknown filesystem type 'linux_raid_member', and realized I needed to assemble the degraded RAID using mdadm.
  5. I followed this blogpost to examine the metadata, finding the following:
sudo mdadm --examine /dev/nvme0n1p1 
/dev/nvme0n1p1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f4c5d883:1a67c158:7d34d079:c5e09397
           Name : ubuntu-server:0
  Creation Time : Mon Jul 10 16:26:43 2023
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 976502784 sectors (465.63 GiB 499.97 GB)
     Array Size : 488251392 KiB (465.63 GiB 499.97 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=-264183 sectors DEVICE TOO SMALL
          State : clean TRUNCATED DEVICE
    Device UUID : de5af620:88784a78:693bf52a:bab766a4

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Apr 27 23:57:35 2025
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : 88404970 - correct
         Events : 9909


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
  6. Attempted to start the RAID again using mdadm (my knowledge is very limited here, so I tried both; see the size check after the output below):
~$ sudo mdadm --assemble --run /dev/md0 /dev/nvme0n1p1 --force
mdadm: Device /dev/nvme0n1p1 is not large enough for data described in superblock
mdadm: no RAID superblock on /dev/nvme0n1p1
mdadm: /dev/nvme0n1p1 has no superblock - assembly aborted
~$ sudo mdadm --assemble --run /dev/md0 /dev/nvme0n1 --force
mdadm: Cannot assemble mbr metadata on /dev/nvme0n1
mdadm: /dev/nvme0n1 has no superblock - assembly aborted
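
To sanity-check that "not large enough" message against the numbers in the --examine output above, a rough calculation (values copied from that output, so only a sketch) shows the shortfall:

# the superblock wants data_offset + array_size sectors on the device,
# but the partition is smaller than that
REAL_DEV=/dev/nvme0n1p1
PART_SECTORS=$(sudo blockdev --getsz "$REAL_DEV")   # actual partition size in 512-byte sectors
DATA_OFFSET=264192                                  # "Data Offset" from --examine
ARRAY_SECTORS=$(( 488251392 * 2 ))                  # "Array Size" in KiB, converted to sectors
echo "shortfall: $(( DATA_OFFSET + ARRAY_SECTORS - PART_SECTORS )) sectors"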

My hope was that this last step would allow me to mount the disk directly and find my now inaccessible data (OS files and configs; luckily all the important data is still on the ZFS pool). I initially thought it would be as simple as moving a single disk to another system, enabling IRST and booting from the drive, since the disks are "mirrored". Worth noting is that the data is clearly still readable if I use:

sudo photorec /dev/nvme0n1

But I would of course prefer some way to get access to the proper file system again.

Any help or guidance is greatly appreciated.

  • Judging by your setup, your system is probably configured for UEFI? A BIOS reset generally wipes out the EFI boot entries, so you probably just needed to add a new boot entry from the EFI menu in your BIOS and that's it. By default, a BIOS reset does not kill a RAID configuration! Commented Apr 30 at 8:17
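
If the EFI-entry route from the comment is attempted from a live Linux environment, efibootmgr can recreate the entry. A minimal sketch, assuming the EFI system partition survived (guessed here as partition 1 of the boot drive, which may not hold after testdisk rewrote the partition table) and the standard Ubuntu shim path:

# recreate a UEFI boot entry for Ubuntu (adjust disk, partition number and loader path)
sudo efibootmgr -c -d /dev/nvme0n1 -p 1 -L "ubuntu" -l '\EFI\ubuntu\shimx64.efi'
sudo efibootmgr -v    # list entries and boot order to verify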

1 Answer


You can use

  • a small image file
  • a loop device
  • dmsetup

to create a virtual device which contains your data but seems larger, so that mdadm is happy:

MISSING_SECTORS=264183                 # from "after=-264183 sectors" in --examine above
FILL_DEV="/dev/loop0"
REAL_DEV=/dev/nvme0n1p1

# file of zeros that supplies the missing sectors (512-byte blocks)
dd if=/dev/zero of=append.img bs=512 count=$MISSING_SECTORS
losetup $FILL_DEV append.img
REAL_DEV_SECTORS=$(blockdev --getsz $REAL_DEV)

# concatenate the real partition and the zero-filled loop device
cat <<EOF | dmsetup create larger-blockdev
0 $REAL_DEV_SECTORS linear $REAL_DEV 0
$REAL_DEV_SECTORS $MISSING_SECTORS linear $FILL_DEV 0
EOF

mdadm --examine /dev/mapper/larger-blockdev

That should allow you to assemble the array and access the filesystem, and then shrink it (by at least 264183 sectors) so it fits on the real device.
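
For example (a sketch assuming the array holds a plain ext4 filesystem directly, with no LVM in between; adjust the tools if your layout differs):

mdadm --assemble --run /dev/md0 /dev/mapper/larger-blockdev   # start the degraded single-disk mirror
e2fsck -f /dev/md0                    # filesystem must be clean before shrinking
resize2fs /dev/md0 460G               # shrink well below the real partition size
mdadm --stop /dev/md0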

Permanent fix - shrink the MDRAID device

You cannot shrink an MD device directly. But you can overwrite it with a new one (different UUIDs) with the correct size.

You can do this safely by creating a read-only DM device over the filesystem range so that mdadm can only overwrite the MD metadata area (the first 264192 sectors).

REAL_DEV=/dev/nvme0n1p1
SECTORS_TOTAL=$(blockdev --getsz "$REAL_DEV")

MD_HEADER_SECTORS=264192                        # old data offset from --examine
DATA_SECTORS=$((SECTORS_TOTAL - MD_HEADER_SECTORS))

# read-only view of the filesystem area so mdadm cannot touch it
echo "0 $DATA_SECTORS linear $REAL_DEV $MD_HEADER_SECTORS" |
    dmsetup create fs-area-ro --readonly

# check that this worked with e.g. fsck or mount on /dev/mapper/fs-area-ro

# writable header area plus the read-only data area, stitched back together
cat <<EOF | dmsetup create md-rebuild-target
0 $MD_HEADER_SECTORS linear $REAL_DEV 0
$MD_HEADER_SECTORS $DATA_SECTORS linear /dev/mapper/fs-area-ro 0
EOF

# --force is needed because mdadm normally refuses a single-device RAID1
mdadm --create /dev/md0 \
  --force \
  --level=1 \
  --raid-devices=1 \
  /dev/mapper/md-rebuild-target \
  --metadata=1.2 --data-offset=$(( MD_HEADER_SECTORS/2 ))K

mdadm --stop /dev/md0

# md-rebuild-target holds fs-area-ro open, so remove it first
dmsetup remove md-rebuild-target
dmsetup remove fs-area-ro

mdadm --assemble /dev/md0 $REAL_DEV
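
If that final assemble succeeds, the filesystem should be reachable again through the re-created (slightly smaller) array; roughly, reusing the mount point from the question:

cat /proc/mdstat                      # the array should show up as md0
mkdir -p /mnt/recovery
mount -o ro /dev/md0 /mnt/recovery    # mount read-only first to be safe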
