2

How can I make a Linux-based device reboot once its rootfs becomes unavailable?

Only a software watchdog is available.

The problem is that the rootfs is mounted over NFS. When I stop the NFS server, the device hangs. I want it to reboot instead. How can I achieve this?

In other words: when the rootfs becomes unusable, is there anything at the kernel level that can reset the whole system? I don't care about open or corrupted files and resources.

Note: I don't have the kernel sources for this architecture. The device is headless, no monitor or keyboard is attached. There is a root console with agetty (defined in /etc/inittab).

8
  • The software watchdog is inside the kernel, so it should be possible to trigger it as normal. Loop: read the contents of the rootfs; if there is no error, reset the watchdog timer. This will catch both reads that return an error and reads that block. Commented Oct 8, 2020 at 7:19
  • Where shall I put this loop? Into a bash script? Commented Oct 8, 2020 at 10:30
  • The problem is that once the rootfs goes down, even the root console (via agetty) seems blocked. Commented Oct 8, 2020 at 10:33
  • Related - linux watchdog and systemd watchdog Commented Oct 8, 2020 at 11:10
  • There is no hardware watchdog available. Even if /dev/watchdog exists, it can be a software-based implementation. I'm using it, but once the rootfs goes down, it seems all processes get blocked. Commented Oct 8, 2020 at 12:07
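
The loop suggested in the first comment could be sketched as a small shell script. This is only a sketch under assumptions: the names WATCHDOG_DEV, PROBE_FILE and INTERVAL are made up for illustration, the probe file is an arbitrary choice, and a read may be served from the page cache without touching the NFS server, so a file that is not cached (or a different probe) may be needed in practice. The idea is that the watchdog is fed only after a successful read; if the read fails or blocks, feeding stops and the watchdog timeout should reset the system.

```shell
#!/bin/sh
# Sketch only: feed the watchdog while a read from the rootfs still succeeds.
# WATCHDOG_DEV, PROBE_FILE and INTERVAL are illustrative names, not from the thread.
WATCHDOG_DEV="${WATCHDOG_DEV:-/dev/watchdog}"
PROBE_FILE="${PROBE_FILE:-/etc/hostname}"
INTERVAL="${INTERVAL:-5}"

# Succeeds only if the given file can be read. On a hard NFS mount this call
# blocks while the server is away, which also stops the feeding.
probe_rootfs() {
    cat "$1" >/dev/null 2>&1
}

# Writes one byte to the already-open watchdog descriptor (fd 3).
feed_watchdog() {
    printf '.' >&3
}

watch_loop() {
    # Opening the device arms the watchdog; fd 3 stays open for the whole run.
    exec 3>"$WATCHDOG_DEV" || return 1
    while probe_rootfs "$PROBE_FILE"; do
        feed_watchdog
        sleep "$INTERVAL"
    done
    # Probe failed: stop feeding but keep fd 3 open, because closing the
    # device disarms the watchdog on some drivers.
    while :; do
        sleep "$INTERVAL" 2>/dev/null || :
    done
}

# Start the loop only when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    watch_loop
fi
```

Started as `./rootfs-watch.sh run` (a hypothetical file name), the script still depends on `cat` and `sleep` being loadable; if those binaries themselves hang on the dead rootfs, the feeding stops as well, which is the desired outcome here.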

2 Answers

0

You didn't state if you have a physical keyboard attached, but if you do, then the "Magic SysRq Keys" might help. In your case

  • Alt+SysRq+S for emergency sync-to-disk, and
  • Alt+SysRq+B for immediate reboot

should do the job. Note that this only works if these key combinations have not been disabled; see the setting in /proc/sys/kernel/sysrq, which is an ORed bitmask of the allowed SysRq actions (reproduced from here):

  2 =   0x2 - enable control of console logging level
  4 =   0x4 - enable control of keyboard (SAK, unraw)
  8 =   0x8 - enable debugging dumps of processes etc.
 16 =  0x10 - enable sync command
 32 =  0x20 - enable remount read-only
 64 =  0x40 - enable signalling of processes (term, kill, oom-kill)
128 =  0x80 - allow reboot/poweroff
256 = 0x100 - allow nicing of all RT tasks
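
As a rough illustration of how the bitmask is read, the following sketch tests whether a given action bit is set. The function name sysrq_allows is made up for this example. It also handles the two special values documented by the kernel: 0 disables SysRq completely and 1 enables all functions.

```shell
#!/bin/sh
# sysrq_allows MASK BIT: succeed if the action BIT is permitted by MASK.
# (Illustrative helper, not an existing tool.)
# 0 disables SysRq entirely, 1 enables every function, larger values are a bitmask.
sysrq_allows() {
    mask=$1
    bit=$2
    [ "$mask" -eq 1 ] && return 0
    [ $(( mask & bit )) -ne 0 ]
}

# Read the live setting if the file exists; fall back to 0 elsewhere.
current=$(cat /proc/sys/kernel/sysrq 2>/dev/null || echo 0)
if sysrq_allows "$current" 128; then
    echo "keyboard reboot (SysRq+B) allowed"
else
    echo "keyboard reboot (SysRq+B) not allowed"
fi
```

For example, a setting of 176 (128 + 32 + 16) permits reboot, remount read-only and sync, but not signalling of processes (64).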

You can also trigger this from a shell script/program by writing to /proc/sysrq-trigger:

echo "b" > /proc/sysrq-trigger

This will work no matter what the settings in /proc/sys/kernel/sysrq are, which only restrict keyboard-induced SysRq-events.

4
  • No, I have no keyboard nor monitor attached. Only a root console via agetty. Commented Oct 8, 2020 at 10:31
  • But the problem is once rootfs goes down, even this root console seems blocked. Commented Oct 8, 2020 at 10:32
  • So, in theory this could work, but I would need something that can survive a broken rootfs. /proc may still be alive, but I would have to start a program that runs from memory and writes to this file once the rootfs becomes unavailable. Do you have any idea how I can achieve this? Commented Oct 8, 2020 at 12:10
  • @Daniel Ah, that's tricky; I cannot come up with a quick solution to this unfortunately ... Commented Oct 8, 2020 at 12:19
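
One possible direction for the idea in the comments above, as an untested sketch rather than a confirmed solution: a long-running shell that probes the rootfs in a background child with a timeout and writes to /proc/sysrq-trigger when the probe fails or hangs. The names SYSRQ_TRIGGER, PROBE_FILE and TIMEOUT are made up for illustration. The write itself uses the shell builtin echo, so it does not need to load anything from the rootfs, but `cat` and `sleep` are external binaries; whether they remain usable once the NFS server is gone is an assumption (copying them to a tmpfs beforehand might help).

```shell
#!/bin/sh
# Sketch only. SYSRQ_TRIGGER, PROBE_FILE and TIMEOUT are illustrative names.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
PROBE_FILE="${PROBE_FILE:-/etc/hostname}"
TIMEOUT="${TIMEOUT:-10}"

# probe_with_timeout FILE SECONDS: read FILE in a background child and give up
# after SECONDS, so a hung NFS read does not block the monitoring shell itself.
probe_with_timeout() {
    cat "$1" >/dev/null 2>&1 &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$2" ]; then
            # May have no effect on a task stuck in NFS I/O.
            kill "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Child finished in time: report its exit status.
    wait "$pid"
}

# echo is a shell builtin, so this write does not load a binary from the rootfs.
emergency_reboot() {
    echo b > "$SYSRQ_TRIGGER"
}

monitor() {
    while probe_with_timeout "$PROBE_FILE" "$TIMEOUT"; do
        sleep 1
    done
    emergency_reboot
}

# Start monitoring only when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    monitor
fi
```

As with any probe based on reading a file, the read may be satisfied from the page cache, so the choice of PROBE_FILE matters.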
0

Sounds like you would need the mount option onerror=panic for your NFS root filesystem, but I’m not sure if it will work with NFS. You might also need to mount the NFS root filesystem with the NFS-specific mount option soft so it will time out and return an error instead of retrying forever.

Note: the soft mount option may cause file corruption and/or data loss, but in the comments you specifically said you don't care about that.

Worth a try, maybe?
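
For an NFS root, the mount options are normally passed on the kernel command line through the nfsroot= parameter, so a soft mount might look like the fragment below. The server address, export path, timeout values and the ip=dhcp setting are placeholders chosen for illustration. Note that soft only makes requests fail with an error after the retries are exhausted (timeo is in tenths of a second); it does not reboot the device by itself.

```
# Kernel command line fragment (this comment line is not part of it).
# 192.0.2.10, /srv/nfsroot, timeo=50 and retrans=3 are placeholder values.
root=/dev/nfs nfsroot=192.0.2.10:/srv/nfsroot,soft,timeo=50,retrans=3 ip=dhcp
```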

2
  • How should I configure onerror=panic? If I put it in fstab, the mount fails with an "invalid argument" error. Commented Oct 9, 2020 at 11:57
  • onerror is UFS-specific. It can't be applied to NFS mounts. Commented Oct 9, 2020 at 12:04
