Settings for long timeout in SBD_DELAY_START
This document (7023572) is provided subject to the disclaimer at the end of this document.
Issue number one: the SBD service can time out during start, because the delay configured with SBD_DELAY_START can be longer than the default start timeout that systemd applies to system services.
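To check whether a node is likely to be affected, the configured delay can be compared with the systemd start timeout. The following is a minimal sketch and not part of the original procedure. It assumes that SBD_DELAY_START is set to a plain number of seconds in /etc/sysconfig/sbd and that the unmodified systemd default start timeout of 90 seconds applies. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare SBD_DELAY_START with the systemd start timeout.
# Assumptions (not from the article): the delay is a plain number of
# seconds, and the unmodified systemd default start timeout is 90 s.

DEFAULT_START_TIMEOUT=90

# Print the value of SBD_DELAY_START from a sysconfig-style file.
read_delay_start() {
    # $1: path to the sysconfig file, e.g. /etc/sysconfig/sbd
    sed -n 's/^SBD_DELAY_START=//p' "$1" | tail -n 1 | tr -d '"'
}

# Succeed (exit 0) if the delay reaches or exceeds the timeout.
delay_exceeds_timeout() {
    # $1: delay in seconds, $2: start timeout in seconds
    case "$1" in
        ''|*[!0-9]*) return 2 ;;   # not a plain number: cannot decide
    esac
    [ "$1" -ge "$2" ]
}

# Example run against the real configuration, if it exists.
if [ -r /etc/sysconfig/sbd ]; then
    delay=$(read_delay_start /etc/sysconfig/sbd)
    if delay_exceeds_timeout "$delay" "$DEFAULT_START_TIMEOUT"; then
        echo "SBD_DELAY_START=$delay may exceed the start timeout"
    fi
fi
```

A non-numeric value such as "yes" is reported as undecidable (return code 2) rather than guessed at, because the effective delay then depends on the SBD device timeouts.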
Issue number two: when the fenced node returns, it starts corosync and thereby blocks the cluster. From a cluster perspective everything appears to have worked, for example fencing, but the surviving node then waited until the fenced node had returned.
The logs show entries similar to:
Dec 03 15:29:25  animal pengine: notice: LogActions: Start fs_mysap (animal - blocked)
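Entries of this kind can be picked out of a log with a simple filter. This is a minimal sketch; the function name is hypothetical, and the log location varies by setup, so the file (or standard input) has to be supplied by the caller.

```shell
#!/bin/sh
# Sketch: list policy engine actions that are reported as blocked.
# Reads the given log files, or standard input if none are given.
blocked_actions() {
    grep -E 'LogActions:.*blocked\)' "$@"
}

# Usage (the log path is an assumption, adjust to the local setup):
#   blocked_actions /var/log/messages
```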
cp /usr/lib/systemd/system/sbd.service /etc/systemd/system/sbd.service
and add in section
and add in section
so the file looks like
Description=Shared-storage based fencing daemon
ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid watch
ExecStop=/usr/bin/kill -TERM $MAINPID
# Could this benefit from exit codes for restart?
# Does this need to be set to msgwait * 1.2?
# If SBD crashes, it'll very likely suicide immediately due to the
# hardware watchdog. But one can always try.
and then issue
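As a hedged sketch of this follow-up step, based on the generic systemd workflow and not quoted from this article: systemd picks up an edited unit file only after `systemctl daemon-reload`, and `systemctl show` reports the start timeout that is then in effect. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: reload systemd and show the start timeout of a unit.
# Assumption: this is the generic systemd procedure after editing a unit
# file under /etc/systemd/system; it is not quoted from the article.

reload_and_show_timeout() {
    # $1: unit name, e.g. sbd.service
    command -v systemctl >/dev/null 2>&1 || {
        echo "systemctl not found" >&2
        return 127
    }
    systemctl daemon-reload || return 1      # re-read unit files (needs root)
    systemctl show -p TimeoutStartUSec "$1"  # print the effective start timeout
}

# Usage (as root on the cluster node):
#   reload_and_show_timeout sbd.service
```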
Issue number two is caused by the corosync service starting on the returning node without first waiting for the SBD timeout.
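Whether sbd.service is ordered ahead of corosync on a node can be inspected through the unit's ordering properties. The following sketch assumes that the relevant mechanism is an ordering dependency such as `Before=corosync.service` on sbd.service; that setting is an assumption here and is not stated in the text above. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether a unit's Before= list names corosync.service.
# Assumption: ordering sbd.service before corosync.service is the
# mechanism that keeps corosync from starting during the SBD delay.

# Succeed if the "Before=" line passed in $1 contains corosync.service.
orders_before_corosync() {
    case " ${1#Before=} " in
        *" corosync.service "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example run on a node where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    before=$(systemctl show -p Before sbd.service 2>/dev/null) || before=""
    if orders_before_corosync "$before"; then
        echo "sbd.service is ordered before corosync.service"
    else
        echo "sbd.service is NOT ordered before corosync.service"
    fi
fi
```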
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7023572
- Creation Date: 10-Dec-2018
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise High Availability Extension