How to safely change sbd timeout settings in a running pacemaker cluster
This document (7023689) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12
Situation
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version : 2.1
UUID : 62caa488-cbee-4449-84c3-5fd0659dcc09
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 90
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 180
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped
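When comparing settings before and after the change, only the watchdog and msgwait timeouts matter. The following is a minimal sketch, not part of the original procedure, that extracts these two values from dump output. The function name sbd_timeouts is hypothetical, and the sample input repeats the header values shown above so that the snippet runs without a cluster.

```shell
#!/bin/sh
# Print the watchdog and msgwait timeouts found in "sbd dump" output on stdin.
# sbd_timeouts is a hypothetical helper name, not part of the sbd package.
sbd_timeouts() {
    awk -F' *: *' '
        /^Timeout \(watchdog\)/ { w = $2 }
        /^Timeout \(msgwait\)/  { m = $2 }
        END { printf "watchdog=%s msgwait=%s\n", w, m }
    '
}

# On a cluster node this would be: sbd -d <device> dump | sbd_timeouts
# Here the function is fed the header values from the dump above.
sbd_timeouts <<'EOF'
Timeout (watchdog) : 90
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 180
EOF
# prints: watchdog=90 msgwait=180
```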
Resolution
- The following commands must be executed as root or as a user with equivalent permissions.
- Make sure none of the cluster resources are in a stopped state before putting the cluster into maintenance mode.
- Make sure the cluster is stopped and restarted as described below; otherwise the new settings will not be activated on the cluster nodes.
- After stopping the cluster services, verify that the sbd service was stopped successfully by checking the output of: systemctl status sbd.
- If existing sbd devices are exchanged for new ones, remember to update /etc/sysconfig/sbd accordingly.
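When several sbd devices are configured, the device list in /etc/sysconfig/sbd and the -d arguments passed to sbd have to match. The sketch below assumes the devices are stored in the SBD_DEVICE variable as a semicolon-separated list and turns that entry into -d arguments. The function name sbd_device_args is hypothetical, and the demonstration uses a temporary file instead of the real configuration.

```shell
#!/bin/sh
# Turn the SBD_DEVICE entry of an sbd sysconfig file into "-d <device>" arguments.
# $1: path to the sysconfig file (normally /etc/sysconfig/sbd).
# sbd_device_args is a hypothetical helper name.
sbd_device_args() {
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$1" || exit 1
        # Assumption: SBD_DEVICE holds one or more devices separated by semicolons.
        printf '%s\n' "$SBD_DEVICE" | tr ';' '\n' | sed -n '/./s/^/-d /p' | paste -s -d ' ' -
    )
}

# Demonstration with a temporary file so the real configuration is not touched.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
SBD_DEVICE="/dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe;/dev/disk/by-id/scsi-36000c29d7b18a8c4a6e980da7fd74fab"
EOF
sbd_device_args "$cfg"
rm -f "$cfg"
```

The printed arguments can be compared with the ones used when recreating the metadata on the devices.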
1. Run the following command to display the current settings of the sbd device:
# sbd -d <device> dump
2. Put the cluster into maintenance mode:
3. Verify that all cluster resources are in the "unmanaged" state:
4. Stop the cluster services on all nodes:
5. Recreate the metadata on the sbd device(s):
Full example (using three sbd disks; -4 sets the msgwait timeout and -1 sets the watchdog timeout, both in seconds):
# sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe -d /dev/disk/by-id/scsi-36000c29d7b18a8c4a6e980da7fd74fab -d /dev/disk/by-id/scsi-36000c2912306cd2a42adc9c0c95f450c -4 20 -1 10 create
6. Start the cluster services on all nodes:
7. Verify the new settings and the slot list on the sbd device:
# sbd -d <device> dump
# sbd -d <device> list
8. Put the cluster back into normal mode:
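The steps above can be sketched as one command sequence. This is a minimal sketch under stated assumptions, not an authoritative script. The crm commands for maintenance mode and for stopping and starting the cluster are assumptions based on the crmsh tooling; on some installations the cluster services are stopped and started with systemctl instead. The function name print_sbd_timeout_plan is hypothetical. The function only prints the commands for review and does not execute them, because steps 4 and 6 have to be carried out on every node.

```shell
#!/bin/sh
# Print the command sequence for changing sbd timeouts; nothing is executed.
# $1: sbd device, $2: new watchdog timeout (s), $3: new msgwait timeout (s).
# print_sbd_timeout_plan is a hypothetical helper name.
print_sbd_timeout_plan() {
    dev=$1 watchdog=$2 msgwait=$3
    cat <<EOF
# step 1: display the current settings
sbd -d $dev dump
# step 2: maintenance mode (assumed crmsh syntax)
crm configure property maintenance-mode=true
# step 3: all resources should be listed as unmanaged
crm status
# step 4: on every node (assumed crmsh syntax), then check that sbd is stopped
crm cluster stop
systemctl status sbd
# step 5: recreate the metadata
sbd -d $dev -4 $msgwait -1 $watchdog create
# step 6: on every node (assumed crmsh syntax)
crm cluster start
# step 7: verify the new settings
sbd -d $dev dump
sbd -d $dev list
# step 8: leave maintenance mode (assumed crmsh syntax)
crm configure property maintenance-mode=false
EOF
}

print_sbd_timeout_plan /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe 10 20
```

Printing rather than running the sequence keeps the operator in control of the per-node steps and of the check that all resources are unmanaged before the cluster is stopped.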
Additional Information
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version : 2.1
UUID : f2faed5e-c0a5-46a8-8fb8-45d7bab44182
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 20
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe list
0 sles12cluster2 clear
1 sles12cluster1 clear
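In both the old settings (90 and 180) and the new settings (10 and 20), msgwait is twice the watchdog timeout. The sketch below checks a pair of values against that relation. It assumes the commonly documented guideline that msgwait should be at least twice the watchdog timeout; the function name check_sbd_ratio is hypothetical, and the pair 10/15 is an invented counterexample.

```shell
#!/bin/sh
# Succeed when msgwait is at least twice the watchdog timeout.
# $1: watchdog timeout in seconds, $2: msgwait timeout in seconds.
# check_sbd_ratio is a hypothetical helper name.
check_sbd_ratio() {
    [ "$2" -ge $(( $1 * 2 )) ]
}

# The first two pairs come from the dumps in this document; the third is invented.
for pair in "90 180" "10 20" "10 15"; do
    set -- $pair
    if check_sbd_ratio "$1" "$2"; then
        echo "watchdog=$1 msgwait=$2: ok"
    else
        echo "watchdog=$1 msgwait=$2: msgwait is less than twice the watchdog timeout"
    fi
done
```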
Output of systool -vc watchdog after the change (12 SP4 and later):
systool -vc watchdog
Class = "watchdog"
Class Device = "watchdog0"
Class Device path = "/sys/devices/virtual/watchdog/watchdog0"
bootstatus = "0"
dev = "249:0"
identity = "Software Watchdog"
nowayout = "0"
pretimeout = "0"
pretimeout_available_governors= "noop"
pretimeout_governor = "noop"
state = "active"
status = "0x8000"
timeout = "10"
uevent = "MAJOR=249 MINOR=0 DEVNAME=watchdog0"
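The timeout attribute reported by systool can also be read directly from sysfs. The following is a minimal sketch, assuming the watchdog device is exposed under /sys/class/watchdog as in the output above. The function name watchdog_timeout and the expected value of 10 (the watchdog timeout written to the sbd header in this example) are illustrative.

```shell
#!/bin/sh
# Print the timeout of a watchdog device as exposed in sysfs.
# $1: device name, e.g. watchdog0
# $2: sysfs class directory (optional, defaults to /sys/class/watchdog)
# watchdog_timeout is a hypothetical helper name.
watchdog_timeout() {
    cat "${2:-/sys/class/watchdog}/$1/timeout"
}

# Compare the active timeout with the value written to the sbd header.
expected=10
if actual=$(watchdog_timeout watchdog0 2>/dev/null); then
    if [ "$actual" = "$expected" ]; then
        echo "watchdog0 timeout matches the sbd header: $actual"
    else
        echo "watchdog0 timeout is $actual, expected $expected"
    fi
else
    echo "watchdog0 timeout is not readable on this host"
fi
```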
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7023689
- Creation Date: 30-Jan-2019
- Modified Date: 12-Jan-2023
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com