
My RAID1 array at /dev/md128 was working fine, but it seems to have completely disappeared after a reboot, and mdadm now reports that neither disk has a superblock. How can I fix this?

Background: the system is running CentOS 7. There are two SSDs (sda, sdb) and two HDDs (sdc, sdd). There should be a RAID1 array /dev/md128 consisting of sdc and sdd, but nothing shows up. It was working perfectly until the machine was rebooted for a kernel update.

Array not listed in /etc/mdadm.conf:

# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/boot level=raid1 num-devices=2 UUID=a2f6b6fe:31c80062:67e7a858:a21502a9
ARRAY /dev/md/boot_efi level=raid1 num-devices=2 UUID=ffbc39c9:ff982933:b77aece5:b44bec5f
ARRAY /dev/md/root level=raid1 num-devices=2 UUID=b31f6af6:78305117:7ca807e7:7691d745
ARRAY /dev/md/swap level=raid0 num-devices=2 UUID=f31db9e8:e136e642:1ae8f2d0:7178c956

Trying to assemble the array manually:

# mdadm --verbose --assemble /dev/md128 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md128
mdadm: no recogniseable superblock on /dev/sdc1
mdadm: /dev/sdc1 has no superblock - assembly aborted
# mdadm -E /dev/sdc1
mdadm: No md superblock detected on /dev/sdc1.
# mdadm -E /dev/sdd1
mdadm: No md superblock detected on /dev/sdd1.

Other things I have checked: smartctl shows no errors (both drives are around 3 months old and lightly used), and mdadm -E /dev/sdc doesn't show any superblock at the whole-device level either. Reverting to the older kernel made no difference. I'm happy to add other output; I'm just trying not to make the question unnecessarily long.

Any ideas appreciated! In the meantime, I'm planning to dd both drives to spares on hand.
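For reference, the copies will just be raw images of each member onto a spare drive, something along these lines (sde and sdf stand in for whatever the spare drives end up being named):

# dd if=/dev/sdc of=/dev/sde bs=64K conv=noerror,sync status=progress
# dd if=/dev/sdd of=/dev/sdf bs=64K conv=noerror,sync status=progress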

1 Answer


I got this fixed, and for anyone else's benefit here's what worked for me. The approach was to mount one of the RAID1 member disks outside the array. Be careful: take a copy of the disk before starting. In my case the RAID1 contained an LVM physical volume.

  1. Create a new array using one of the disks,
mdadm --create /dev/md128 --raid-devices=2 --level=1 /dev/sdc1 missing
  2. Re-create the LVM structure. You can do this manually or restore it from the automatic backups:
pvcreate /dev/md128
vgcfgrestore --list vg00
vgcfgrestore --force -f /etc/lvm/backup/vg00 vg00
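If vgcfgrestore complains that it cannot find the device with the UUID recorded in the backup, the physical volume may need to be created with that exact UUID. Something like this should work; the UUID placeholder comes from the pv0 entry in /etc/lvm/backup/vg00:

pvcreate --uuid "<pv-uuid-from-backup>" --restorefile /etc/lvm/backup/vg00 /dev/md128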
  3. Rename the volume group (temporarily), so that it won't clash with the original vg00 that will appear on the loop device in a later step. Renaming by UUID avoids any ambiguity:
vgrename yyyyyy-9OHC-OlB2-izuQ-dyPi-jw2S-zzzzzz vg00new
  4. Find the start of the LVM data on the other disk. I didn't have anything in /etc/mdadm.conf, so I couldn't easily get this information. Instead I searched for the LVM label signature:
grep -a -b -o LABELONE /dev/sdd1

The signature is described in the LVM on-disk format documentation, which says the label is stored in the second sector of the physical volume. My sectors are 512 bytes, so I subtracted 512 from the offset grep returned (giving 134217728 in my case) and created a read-only loop device starting there:

losetup -f /dev/sdd1 --read-only -o 134217728
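If you would rather compute the offset than do the subtraction by hand, something along these lines should give the same result (this assumes 512-byte sectors and that the first LABELONE match is the right one):

# byte offset of the label, minus one sector, gives the start of the PV
offset=$(( $(grep -a -b -o LABELONE /dev/sdd1 | head -n 1 | cut -d: -f1) - 512 ))
losetup -f /dev/sdd1 --read-only -o "$offset"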
  5. Scan for LVM data on the loop device:
vgscan
lvscan

Commands like lsblk and lvdisplay should now show volumes in both vg00 and vg00new. Check that the device nodes exist under /dev/vg00; if not, activate the volumes with lvchange -a y vg00/<volname>.
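If several volumes need activating, the whole group can be brought up in one go, and lvs gives a quick overview of what exists in each group (both are standard LVM commands, nothing specific to this recovery):

vgchange -a y vg00
lvs -o vg_name,lv_name,lv_size,lv_attr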

  6. Copy the data to the RAID1 array. This can be done by mounting and using cp,
mkdir /data/old
mount -t <fstype> /dev/vg00/<volname> /data/old
cp -pr /data/old/* /data/current/

or, depending on your data, you may want to use dd for each logical volume,

dd if=/dev/vg00/vol1 of=/dev/vg00new/vol1 bs=1M conv=sparse

Note that conv=sparse is important for thin provisioned LVs since it avoids fully allocating the space.
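If there are several logical volumes, a small loop saves repetition. This is only a sketch: it assumes the LV names match in both groups (they will if the metadata was restored from the same backup) and that every listed LV is a plain volume you actually want to copy; thin pools and other internal volumes would need to be skipped:

# copy each LV from the old group (on the loop device) to the new one on md128
for lv in $(lvs --noheadings -o lv_name vg00); do
    dd if="/dev/vg00/$lv" of="/dev/vg00new/$lv" bs=1M conv=sparse
done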

  7. Now the loop device can be disconnected,
lvchange -a n vg00/<volname>
losetup -d /dev/loop0
  8. This should leave you with only vg00new on /dev/md128 (check with lsblk). Rename the VG back to its original name:
vgrename yyyyyy-9OHC-OlB2-izuQ-dyPi-jw2S-zzzzzz vg00

Finally, after you have made 100% sure everything is copied and working correctly and done any fsck that you need to do, add /dev/sdd1 back into the RAID1 array.

mdadm --manage /dev/md128 --add /dev/sdd1
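The rebuild can be watched in /proc/mdstat. Since the array was never listed in /etc/mdadm.conf (which may well be related to why it vanished), it is probably also worth recording it there so it gets assembled on boot; check the generated ARRAY line before appending it:

cat /proc/mdstat
mdadm --detail --brief /dev/md128 >> /etc/mdadm.conf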

Credit to @frostschutz for the essence of the solution, found at https://unix.stackexchange.com/a/98803/384096

I still don't know how the problem happened, which is a bit concerning, but at least this got it running again.

  • Thank you for taking the time to answer your own question. I'm pleased to see you got your data back. (Commented Apr 30, 2022 at 11:35)
  • Possibly worth mentioning: if your machine has come back up with only one of the two drives partially mounted (as mine had), you might need to run mdadm --stop /dev/md0 before the command in step 1 will work; the system listed the drive as busy until I did this. (Commented Jul 12, 2022 at 10:57)
