How to safely change sbd timeout settings in a running pacemaker cluster
This document (7023689) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12
Situation
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version : 2.1
UUID : 62caa488-cbee-4449-84c3-5fd0659dcc09
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 90
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 180
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped
Resolution
Note: Make sure the pacemaker service, and the sbd service that depends on it, are stopped and restarted on all nodes as described. Otherwise the new settings will not become active.
- Run the command
sbd -d <device> dump
to display the current settings of a sbd device.
- Change the cluster into maintenance mode:
crm configure property maintenance-mode=true
- Disable STONITH so that no fencing can be triggered during the following steps. Run:
crm configure property stonith-enabled=false
- Stop the pacemaker service on all cluster nodes:
systemctl stop pacemaker
- Recreate the metadata on the sbd devices, where -4 sets the msgwait timeout and -1 sets the watchdog timeout (both in seconds):
sbd -d <device> -4 xx -1 xx create
Full example (using three sbd disks):
sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe -d /dev/disk/by-id/scsi-36000c29d7b18a8c4a6e980da7fd74fab -d /dev/disk/by-id/scsi-36000c2912306cd2a42adc9c0c95f450c -4 20 -1 10 create
- Optionally, set the stonith-timeout to a desired value:
crm configure property stonith-timeout=xx
- Start the pacemaker service on all nodes:
systemctl start pacemaker
- Check the sbd partition information:
sbd -d <device> dump
and make sure the cluster nodes have been assigned a slot:
sbd -d <device> list
- Re-arm the STONITH device:
crm configure property stonith-enabled=true
- Disable the maintenance mode:
crm configure property maintenance-mode=false
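The values passed to -4, -1 and stonith-timeout in the steps above are related. The following is a minimal sketch that derives them from a chosen watchdog timeout and prints the resulting commands without running them. It assumes the commonly cited rule of thumb that msgwait is twice the watchdog timeout and stonith-timeout is msgwait plus 20 %; that rule is not stated in this document, and sbd_plan is a hypothetical helper name, not part of sbd or crmsh.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the sbd and crm commands for a
# given watchdog timeout and list of sbd devices.
# Assumptions not taken from this document:
#   msgwait         = 2 x watchdog
#   stonith-timeout = msgwait + 20 %
# sbd_plan is a hypothetical helper name.

sbd_plan() {
    watchdog=$1
    shift
    msgwait=$((watchdog * 2))
    stonith=$((msgwait * 12 / 10))

    # Build one "-d <device>" argument per remaining parameter.
    devargs=""
    for dev in "$@"; do
        devargs="$devargs -d $dev"
    done

    echo "sbd$devargs -4 $msgwait -1 $watchdog create"
    echo "crm configure property stonith-timeout=$stonith"
}

# Dry run for a 10 second watchdog timeout on one device.
sbd_plan 10 /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
```

For a watchdog timeout of 10 this prints an sbd command with -4 20 -1 10, matching the full example above, and a stonith-timeout of 24. Review the printed commands before running them by hand at the corresponding step.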
Additional Information
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version : 2.1
UUID : f2faed5e-c0a5-46a8-8fb8-45d7bab44182
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 20
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe list
0 sles12cluster2 clear
1 sles12cluster1 clear
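To compare the new header values against the intended ones in a script, the dump output can be filtered for a single timeout. The following is a minimal sketch; sbd_timeout is a hypothetical helper name, and the sample text is copied from the dump shown above rather than read from a real device.

```shell
#!/bin/sh
# Sketch only: extract one timeout value from "sbd -d <device> dump"
# output supplied on stdin. sbd_timeout is a hypothetical helper name.

sbd_timeout() {
    # $1 is the label inside the parentheses, e.g. watchdog or msgwait.
    awk -F: -v key="Timeout ($1)" '
        { label = $1; sub(/[ \t]+$/, "", label) }   # trim spaces before the colon
        label == key { gsub(/[ \t]/, "", $2); print $2 }
    '
}

# Sample lines taken from the dump above. On a cluster node, pipe the
# real output instead:  sbd -d <device> dump | sbd_timeout msgwait
sample='Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 20'

printf '%s\n' "$sample" | sbd_timeout watchdog
printf '%s\n' "$sample" | sbd_timeout msgwait
```

With the sample text the two calls print 10 and 20. Run the same check against every sbd device, since each device carries its own header.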
Output of systool -vc watchdog after the change (12 SP4 and later):
systool -vc watchdog
Class = "watchdog"
Class Device = "watchdog0"
Class Device path = "/sys/devices/virtual/watchdog/watchdog0"
bootstatus = "0"
dev = "249:0"
identity = "Software Watchdog"
nowayout = "0"
pretimeout = "0"
pretimeout_available_governors= "noop"
pretimeout_governor = "noop"
state = "active"
status = "0x8000"
timeout = "10"
uevent = "MAJOR=249 MINOR=0 DEVNAME=watchdog0"
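Where systool is not installed, the timeout attribute listed above can usually be read directly from sysfs. The following is a minimal sketch, assuming the kernel exposes the watchdog attributes under /sys/class/watchdog/watchdog0 as the systool listing suggests; watchdog_timeout is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print the active watchdog timeout from sysfs.
# Assumption: the attributes are exposed under /sys/class/watchdog/watchdog0.
# watchdog_timeout is a hypothetical helper name.

watchdog_timeout() {
    # Optional $1 overrides the sysfs directory of the watchdog device.
    dir=${1:-/sys/class/watchdog/watchdog0}
    if [ -r "$dir/timeout" ]; then
        cat "$dir/timeout"
    else
        echo "no readable timeout attribute in $dir" >&2
        return 1
    fi
}

# Report the value if available; do not abort when no watchdog is present.
watchdog_timeout || true
```

The value should match the Timeout (watchdog) entry in the sbd header once pacemaker and sbd have been restarted on that node.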
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7023689
- Creation Date: 30-Jan-2019
- Modified Date: 14-Dec-2021
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com