We have a recurring issue where sssd ends up in a failed state whenever an oom-killer event kills a user's memory-hogging job. I have no idea why this happens, since the oom-killer is not touching sssd itself, but afterwards sssd is in a state where you cannot even get a login prompt to log in as root, so power cycling the server becomes the only solution.
As a workaround I want to detect the broken sssd in the syslog and restart it when that happens. So I created an /etc/rsyslog.d/oom-sssd-restart.conf file with
:msg, contains, "was terminated by own WATCHDOG" ^/usr/etc/restart-sssd.sh
This script runs
systemctl restart sssd </dev/null >> /tmp/restart-sssd.log
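For anyone reproducing this, a slightly fuller version of such a restart script might look like the sketch below. Only the systemctl line and the log path come from my script; the timestamps, the stderr capture, the exit-status line and the LOG/SYSTEMCTL overrides are assumptions added for illustration.

```shell
#!/bin/bash
# Sketch only: LOG and SYSTEMCTL default to the paths from the post and can
# be overridden from the environment (the overrides are an assumption here).
LOG="${LOG:-/tmp/restart-sssd.log}"
SYSTEMCTL="${SYSTEMCTL:-/usr/bin/systemctl}"

restart_sssd() {
    {
        echo "$(date '+%F %T') restart of sssd requested"
        # stdin comes from /dev/null so systemctl never waits for input
        "$SYSTEMCTL" restart sssd </dev/null
        restart_rc=$?
        echo "$(date '+%F %T') systemctl exit status: $restart_rc"
    } >> "$LOG" 2>&1    # capture stdout and stderr in the same log
    return "$restart_rc"
}
```

The real script would end with a call to restart_sssd. Logging the exit status next to stderr makes it easier to tell a "Permission denied" on the binary apart from an "Access denied" coming back from systemd.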
When I first tested this on my Rocky 8 box, using logger to simulate the message, it failed to restart sssd, with an error in the above log saying
/usr/etc/restart-sssd.sh: line 6: /usr/bin/systemctl: Permission denied
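The logger test can be sketched roughly as follows. The "sssd" tag and the "test:" prefix are made up for this example; the only part that matters is the substring the rsyslog filter matches on.

```shell
# Emit a fake syslog line containing the string the rsyslog filter matches on.
# The "sssd" tag and the "test:" prefix are hypothetical, chosen for this sketch.
simulate_watchdog_kill() {
    logger -t sssd "test: was terminated by own WATCHDOG"
}
```

Calling simulate_watchdog_kill should cause rsyslog to run the script, after which /tmp/restart-sssd.log shows whether the restart went through.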
When I put SELinux in permissive mode and tested again, it worked. Running
ausearch -m AVC -ts recent -c restart-sssd.sh | audit2allow
gave me
allow syslogd_t systemd_systemctl_exec_t:file { execute execute_no_trans open read };
So, okay, I just needed to do the audit2allow thing, which I did
ausearch -m AVC -ts recent -c restart-sssd.sh | audit2allow -a -M systemlogd_exec_systemctl
semodule -i systemlogd_exec_systemctl.pp
But now when SELinux is enforcing and I test the trigger, it still fails, and the log gives me a different error
Failed to restart sssd.service: Access denied
See system logs and 'systemctl status sssd.service' for details.
and ausearch is empty, so it gives me no indication of why SELinux is causing this failure. I even ran semodule -DB to disable the dontaudit rules and make sure all denials are audited.
If I do
semanage permissive -a syslogd_t
that also makes it work. This is better than disabling SELinux entirely, and I guess I will go this route. But I still do not understand why SELinux is causing the call to systemctl to fail without logging any reason in the audit trail.
Have you tried audit2allow -a to be sure you're not missing something with your ausearch filter?
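One way to widen the filter is sketched below. It rests on an assumption not confirmed in this thread: that the second denial is raised by systemd itself when it checks whether syslogd_t may restart the unit. Checks of that kind are normally recorded as USER_AVC rather than AVC, and under systemd's command name rather than the script's, so a search on -m AVC -c restart-sssd.sh would not show them. The function name is hypothetical.

```shell
# Search every SELinux-related record type, not only kernel AVCs, and do not
# filter on the command name. The time argument defaults to "recent".
show_selinux_denials() {
    ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -ts "${1:-recent}"
}
```

Run show_selinux_denials right after triggering the failure. If a USER_AVC record with tclass=service turns up, piping that output into audit2allow should produce the missing rule.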