SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise High Availability Extension 15
After a Pacemaker cluster node is fenced, the pacemaker.service unit fails with exit status 100. Pacemaker starts without errors when it is restarted manually.
Edit the /etc/sysconfig/sbd file.
Set the SBD_DELAY_START parameter to "yes".
Set the msgwait timeout of the SBD device to slightly less than the time it takes for the SBD fencing action to complete and for sbd.service to start again after the reboot. Set the watchdog timeout to 50% of the new msgwait timeout. This is an optimization process and must be tuned on a system-by-system basis. For personalized SBD optimization assistance from a SUSE architect, professional consultancy is available through SUSE Professional Services.
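The steps above can be sketched in shell. This is a minimal sketch under stated assumptions, not a SUSE-provided procedure: it works on a scratch copy with sample contents instead of the live /etc/sysconfig/sbd, and the msgwait value of 60 seconds is a placeholder rather than a recommendation.

```shell
#!/bin/sh
# Work on a scratch copy so nothing on the live system is touched.
# On a real node the file to edit is /etc/sysconfig/sbd.
cfg=$(mktemp)

# Sample contents, assumed for illustration only.
printf '%s\n' 'SBD_PACEMAKER=yes' 'SBD_DELAY_START=no' > "$cfg"

# Set SBD_DELAY_START to "yes": replace the existing line if there is one,
# otherwise append the setting.
if grep -q '^SBD_DELAY_START=' "$cfg"; then
    sed 's/^SBD_DELAY_START=.*/SBD_DELAY_START="yes"/' "$cfg" > "$cfg.new" \
        && mv "$cfg.new" "$cfg"
else
    printf '%s\n' 'SBD_DELAY_START="yes"' >> "$cfg"
fi

# Derive the watchdog timeout as 50% of the chosen msgwait timeout.
msgwait=60                  # placeholder value in seconds, tune per system
watchdog=$((msgwait / 2))
echo "msgwait=${msgwait}s watchdog=${watchdog}s"
```

On a cluster node, the timeouts currently stored on the device can be inspected with `sbd -d <device> dump`, where `<device>` stands for the path of the SBD device.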
If a node attempts to rejoin the cluster after it is fenced and before the msgwait timeout completes, pacemaker.service will fail to start with an exit status of 100. Enabling the SBD_DELAY_START setting delays the startup of sbd.service by the msgwait timeout. While this increases the time the node needs to rejoin, it ensures that the node can rejoin without running into the msgwait conflict. This failure is more commonly seen in environments optimized for quick reboots, such as virtual and Public Cloud environments.
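The timing condition described above can be illustrated with a small check. This is only a sketch: both values are hypothetical and given in seconds, and on a real system the rejoin time would have to be measured.

```shell
#!/bin/sh
# Hypothetical values, in seconds.
msgwait=60         # msgwait timeout configured on the SBD device
rejoin_after=45    # time from fencing until the node tries to start cluster services

if [ "$rejoin_after" -lt "$msgwait" ]; then
    # The node comes back before msgwait has elapsed.
    shortfall=$((msgwait - rejoin_after))
    result="too early by ${shortfall}s: pacemaker.service is expected to fail unless startup is delayed"
else
    result="ok: msgwait has already elapsed"
fi
echo "$result"
```

With these sample numbers the node returns 15 seconds before msgwait expires, which is the gap the SBD_DELAY_START delay is meant to cover.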
Per the SBD man page:
Set msgwait timeout to N seconds. This should be twice the watchdog timeout.
This is the time after which a message written to the node's slot will be
considered delivered. (Or long enough for the node to detect that it needed
Settings for long timeout in SBD_DELAY_START