SBD Stonith fails despite seemingly correct setup (OPENAIS STONITH)

This document (7008921) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11
SUSE Linux Enterprise Server 11 Service Pack 1

Situation

SBD STONITH has been set up and configured in the cluster, the block device has been verified as available on all nodes, and STONITH is enabled. Nevertheless, when the corosync process fails on a node, that node is not fenced.
This can be simulated by issuing
 
   pkill -9 corosync

on one node. The surviving nodes write the reset message to the SBD slot and consider the "failing" node fenced. The "failed" node, however, is never actually reset; it keeps running and merely increases the load of the system.

Resolution

The reason for this behaviour is that no working watchdog module is loaded. This can be identified by checking /var/log/messages, after the cluster software has started, for entries like

Jun 30 17:19:18 mercury sbd: [3117]: notice: Using watchdog device: /dev/watchdog
Jun 30 17:19:18 mercury sbd: [3117]: ERROR: WDIOC_SETTIMEOUT: Failed to set watchdog timer to 10 seconds.: Inappropriate ioctl for device
Jun 30 17:19:18 mercury sbd: [3117]: CRIT: Please validate your watchdog configuration!
Jun 30 17:19:18 mercury sbd: [3117]: CRIT: Choose a different watchdog driver or specify -T to silence this check if you are sure.
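Searching the log for these sbd messages can be scripted. The following is a minimal sketch, not a supported tool: the function name check_sbd_watchdog is hypothetical, and the log path defaults to /var/log/messages as used in this document.

```shell
# Hypothetical helper: report sbd watchdog errors found in a syslog file.
# Usage: check_sbd_watchdog [logfile]   (defaults to /var/log/messages)
# Returns 1 if sbd reported a watchdog problem, 0 if not, 2 if unreadable.
check_sbd_watchdog() {
    logfile="${1:-/var/log/messages}"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # Match the sbd error lines quoted above: the failed WDIOC_SETTIMEOUT
    # ioctl or the "validate your watchdog configuration" hint.
    if grep -E 'sbd: .*(WDIOC_SETTIMEOUT|validate your watchdog configuration)' "$logfile"; then
        echo "sbd reported a watchdog problem"
        return 1
    fi
    echo "no sbd watchdog errors found in $logfile"
    return 0
}
```

Calling check_sbd_watchdog without arguments after starting the cluster software prints any matching lines; a return code of 1 indicates the situation described here.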

The administrator therefore has to make sure that the proper watchdog module is loaded. Which module is correct depends on the hardware.
On Xen domUs, softdog can be used. In that case, the solution is to add

   modprobe softdog

to /etc/init.d/boot.local so that the module is loaded at boot.
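Whether the module is actually loaded can be checked before starting the cluster software. The sketch below is an illustration only: the function name check_watchdog_loaded and its overridable arguments are hypothetical, and the defaults (softdog, /dev/watchdog, /proc/modules) are taken from this document or from standard Linux paths.

```shell
# Hypothetical helper: check that a watchdog module is loaded and that the
# watchdog device node exists.
# Usage: check_watchdog_loaded [module] [device] [modules_file]
check_watchdog_loaded() {
    module="${1:-softdog}"
    device="${2:-/dev/watchdog}"
    modules_file="${3:-/proc/modules}"
    # Each line of /proc/modules starts with the module name and a space.
    if ! grep -q "^${module} " "$modules_file"; then
        echo "module $module is not loaded"
        return 1
    fi
    if [ ! -e "$device" ]; then
        echo "$device does not exist"
        return 1
    fi
    echo "module $module is loaded and $device exists"
    return 0
}
```

Note that the presence of /dev/watchdog alone proves little: in the log excerpt above, sbd opened the device and only failed when setting the timeout. The sbd messages in /var/log/messages remain the decisive check.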

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7008921
  • Creation Date: 30-Jun-2011
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server
