How to safely change sbd timeout settings in a running pacemaker cluster

This document (7023689) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise High Availability Extension 15

Situation

For various reasons, the timeout settings of the configured sbd device(s) may need to be adjusted. In the following example, the watchdog timeout (90) and msgwait timeout (180) are to be changed:
 
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version     : 2.1
UUID               : 62caa488-cbee-4449-84c3-5fd0659dcc09
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 90
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 180
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped

Note that this document is not intended to show what values should be used, only *how* to change them. For recommendations on values, see TID https://www.suse.com/support/kb/doc/?id=000017952
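
If more than one sbd device is configured, inspect each of them. A minimal sketch, assuming the devices are listed in the SBD_DEVICE variable in /etc/sysconfig/sbd (the default location), separated by ';':

    # Dump the header of every configured sbd device.
    source /etc/sysconfig/sbd
    IFS=';' read -ra devices <<< "$SBD_DEVICE"
    for dev in "${devices[@]}"; do
        sbd -d "$dev" dump
    done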

Resolution

The following commands need to be executed as the root user or a user with equivalent permissions.

Note: Make sure the pacemaker service and the dependent sbd service are stopped and restarted on all nodes as described below. Otherwise the new settings will not become active on all nodes.
  1. Run the command

    sbd -d <device> dump

    to display the current settings of an sbd device.
     
  2. Change the cluster into maintenance mode:

    crm configure property maintenance-mode=true
     
  3. As an additional safeguard, deactivate the STONITH option so that no fencing can happen during the upcoming tasks (a quick check of both properties is sketched after this list). Run:

    crm configure property stonith-enabled=false
     
  4. Stop the pacemaker service on all cluster nodes (this also stops the dependent sbd service; a verification sketch follows after this list):

    systemctl stop pacemaker
     
  5. Recreate the metadata on the sbd devices, where -1 sets the watchdog timeout and -4 the msgwait timeout, both in seconds:

    sbd -d <device> -4 <msgwait> -1 <watchdog> create

    Full example (using three sbd disks; a variant that reads the device list from /etc/sysconfig/sbd follows after this list):

    sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe -d /dev/disk/by-id/scsi-36000c29d7b18a8c4a6e980da7fd74fab -d /dev/disk/by-id/scsi-36000c2912306cd2a42adc9c0c95f450c -4 20 -1 10 create
     
  6. Optionally, set the stonith-timeout to a desired value (see the example after this list):

    crm configure property stonith-timeout=<seconds>
     
  7. Start the pacemaker service on all nodes:

    systemctl start pacemaker
     
  8. Check the sbd partition information:

    sbd -d <device> dump

    and make sure the cluster nodes have been assigned a slot (sample output is shown under Additional Information below):

    sbd -d <device> list
     
  9. Re-arm the STONITH device:

    crm configure property stonith-enabled=true
     
  10. Disable the maintenance mode:

    crm configure property maintenance-mode=false
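
Regarding steps 2 and 3: before stopping pacemaker, it may be worth confirming that both properties really took effect. A minimal check:

    # Both properties should appear with the values set above.
    crm configure show | grep -E 'maintenance-mode|stonith-enabled'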
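
Regarding step 4: stopping pacemaker also stops the dependent sbd service. A quick way to verify on each node that both are down:

    # Run on every node after "systemctl stop pacemaker":
    systemctl is-active pacemaker sbd    # neither should report "active"
    pgrep -a sbd                         # should produce no output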
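
Regarding step 5: instead of typing out every device path, the device list can be read from /etc/sysconfig/sbd. A sketch using the example values from above (watchdog 10, msgwait 20), assuming ';'-separated entries in SBD_DEVICE:

    # Recreate the metadata (-1 = watchdog, -4 = msgwait, in seconds)
    # on all configured devices, then verify the new headers.
    source /etc/sysconfig/sbd
    IFS=';' read -ra devices <<< "$SBD_DEVICE"
    args=()
    for dev in "${devices[@]}"; do args+=(-d "$dev"); done
    sbd "${args[@]}" -4 20 -1 10 create
    for dev in "${devices[@]}"; do sbd -d "$dev" dump; done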
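
Regarding step 6: authoritative recommendations are in the TID linked in the Situation section; SUSE documentation commonly suggests a stonith-timeout of at least msgwait plus roughly 20%. With the example msgwait of 20 seconds, that gives:

    # Example only; with msgwait=20, msgwait + 20% = 24 seconds:
    crm configure property stonith-timeout=24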

Additional Information

Output of sbd -d <device> dump after the change:
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version     : 2.1
UUID               : f2faed5e-c0a5-46a8-8fb8-45d7bab44182
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 20
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe list
0       sles12cluster2  clear
1       sles12cluster1  clear


Output of systool -vc watchdog after the change (SLE 12 SP4 and later):
systool -vc watchdog
Class = "watchdog"

  Class Device = "watchdog0"
  Class Device path = "/sys/devices/virtual/watchdog/watchdog0"
    bootstatus          = "0"
    dev                 = "249:0"
    identity            = "Software Watchdog"
    nowayout            = "0"
    pretimeout          = "0"
    pretimeout_available_governors= "noop"
    pretimeout_governor = "noop"
    state               = "active"
    status              = "0x8000"
    timeout             = "10"
    uevent              = "MAJOR=249
MINOR=0
DEVNAME=watchdog0"
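
The timeout of 10 reported by the kernel watchdog matches the new sbd watchdog timeout. On systems where util-linux's wdctl is available, the same information can be cross-checked with:

    wdctl /dev/watchdog0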


Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7023689
  • Creation Date: 30-Jan-2019
  • Modified Date: 21-Oct-2021
    • SUSE Linux Enterprise High Availability Extension
