SBD Operation Guidelines for HAE Clusters

This document (7011346) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11 (HAE)
SUSE Linux Enterprise Server 11 (SLES)
Split Brain Detection (SBD)

Situation

SBD operations fail or do not complete in a timely manner. Several factors affect proper STONITH SBD functionality in an HAE cluster. This TID provides a brief set of guidelines. See TID7009485 - SBD setup - debug and verify (OPENAIS) and http://linux-ha.org/wiki/SBD_Fencing for additional details.

There are several variables associated with SBD functionality. Each variable, and how to determine its value, is shown below.

TOTEM Token (Default=5000): The time, in milliseconds, spent detecting a failure of a processor.

# cat /etc/corosync/corosync.conf
<snip>
totem {
    version:    2
    token:      5000
</snip>
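
To read the value without inspecting the file by hand, something like the following sketch may help. The function name get_totem_token is hypothetical, and the fallback of 5000 is simply the default stated in this TID, not a value queried from corosync.

```shell
# Hypothetical helper: print the totem token value (milliseconds) from a
# corosync.conf-style file. Prints 5000 (the default stated in this TID)
# when the totem section has no token line.
get_totem_token() {
    conf="${1:-/etc/corosync/corosync.conf}"
    awk '
        /^[ \t]*#/ { next }                              # skip comment lines
        /^[ \t]*totem[ \t]*[{]/ { depth = 1; next }      # entering totem { ... }
        depth > 0 && /[{]/ { depth++ }                   # nested block, e.g. interface {
        depth > 0 && /[}]/ { depth-- }                   # leaving a block
        depth == 1 && /^[ \t]*token[ \t]*:/ {            # "token:" only, not token_*
            split($0, parts, ":")
            value = parts[2]
            gsub(/[ \t]/, "", value)
        }
        END { print (value != "" ? value : 5000) }
    ' "$conf"
}
```

Called without an argument, the function reads /etc/corosync/corosync.conf; pass another path to inspect a copy of the file.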

Watchdog Timeout (Default=5): The time interval, in seconds, within which at least one response from the SBD device has to be received.

Message Wait Timeout (Default=10): The time delay, in seconds, incurred when another node sends the poison pill.

# /usr/sbin/sbd -d /dev/sdb1 dump
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
==Header on disk /dev/sdb1 is dumped
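
The two timeouts of interest can be pulled out of the dump with a small filter. This is a sketch only: sbd_timeout is a hypothetical helper name, and it assumes the dump keeps the "Timeout (label) : value" layout shown above.

```shell
# Hypothetical helper: read "sbd -d <device> dump" output on stdin and print
# the timeout (seconds) whose label is given as $1, e.g. watchdog or msgwait.
sbd_timeout() {
    awk -F: -v name="$1" '
        index($1, "Timeout (" name ")") == 1 {   # line starts with the wanted label
            gsub(/[ \t]/, "", $2)                 # strip padding around the value
            print $2
        }
    '
}
```

For example, `/usr/sbin/sbd -d /dev/sdb1 dump | sbd_timeout msgwait` would print the Message Wait value for the device used in this TID.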

STONITH Timeout (Default=60): How long, in seconds, to wait for the STONITH action to complete. If the nvpair XML tag for stonith-timeout is missing, the default of 60 seconds is assumed.

# /usr/sbin/cibadmin -Q
<snip>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.6-b988976485d15cb702c9307df55512d323831a5e"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-stonith-timeout" name="stonith-timeout" value="120"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1352238282"/>
      </cluster_property_set>
    </crm_config>
</snip>
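
The same lookup, including the fallback to 60 when the nvpair is missing, could be scripted roughly as follows. The function name stonith_timeout is hypothetical. The sketch assumes the name attribute precedes the value attribute, as in the output above, and that the value is expressed in seconds.

```shell
# Hypothetical helper: read "cibadmin -Q" output on stdin and print the
# stonith-timeout value. Prints 60 when no such nvpair is present, matching
# the default described in this TID.
# Assumes name="..." comes before value="..." and the value is in seconds.
stonith_timeout() {
    value=$(sed -n 's/.*name="stonith-timeout" *value="\([0-9][0-9]*\)[^"]*".*/\1/p' | head -n 1)
    echo "${value:-60}"
}
```

A typical call would be `/usr/sbin/cibadmin -Q | stonith_timeout`.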

Resolution

The principal guidelines for SBD functionality are listed below. These are guidelines only; if you deviate from them, make sure you know what you are doing.
1. Watchdog < Message Wait < STONITH Timeout
2. Message Wait = 2 x Watchdog
3. STONITH Timeout >= Message Wait + (Message Wait / 100 * 20)
4. TOTEM Token >= 5 seconds
5. STONITH Timeout < 300 seconds
6. Watchdog <= 120 seconds
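
The six guidelines above can be checked mechanically once the four values are known. The following is a minimal sketch, not a supported tool: check_sbd_timeouts is a hypothetical name, the first three arguments are in seconds, and the token is in milliseconds as in corosync.conf. Guideline 3 is evaluated as 100 x STONITH Timeout >= 120 x Message Wait so that the shell can stay in integer arithmetic.

```shell
# Hypothetical helper: check_sbd_timeouts WATCHDOG MSGWAIT STONITH TOKEN_MS
# Prints one line per guideline that is not met; returns 0 only if all hold.
check_sbd_timeouts() {
    wd=$1; mw=$2; st=$3; token_ms=$4; rc=0
    # 1. Watchdog < Message Wait < STONITH Timeout
    if [ "$wd" -ge "$mw" ] || [ "$mw" -ge "$st" ]; then
        echo "guideline 1 not met"; rc=1
    fi
    # 2. Message Wait = 2 x Watchdog
    [ "$mw" -eq $((2 * wd)) ] || { echo "guideline 2 not met"; rc=1; }
    # 3. STONITH Timeout >= Message Wait + 20%, scaled by 100 to avoid fractions
    [ $((100 * st)) -ge $((120 * mw)) ] || { echo "guideline 3 not met"; rc=1; }
    # 4. TOTEM Token >= 5 seconds (5000 ms)
    [ "$token_ms" -ge 5000 ] || { echo "guideline 4 not met"; rc=1; }
    # 5. STONITH Timeout < 300 seconds
    [ "$st" -lt 300 ] || { echo "guideline 5 not met"; rc=1; }
    # 6. Watchdog <= 120 seconds
    [ "$wd" -le 120 ] || { echo "guideline 6 not met"; rc=1; }
    return $rc
}
```

For instance, `check_sbd_timeouts 5 10 60 5000` checks the default values listed in this TID.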

Common Recommendations
Variable          Default   Suggestion 1   Suggestion 2
Watchdog          5         20             30
Message Wait      10        40             60
STONITH Timeout   60        90             120
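
As a worked illustration of guidelines 2 and 3, the sketch below derives the Message Wait value and the smallest STONITH Timeout that satisfies guideline 3 from a chosen Watchdog timeout. The name derive_sbd_timeouts is hypothetical, and the result is a lower bound, not a recommendation.

```shell
# Hypothetical helper: derive_sbd_timeouts WATCHDOG
# Applies guideline 2 (Message Wait = 2 x Watchdog) and guideline 3
# (STONITH Timeout >= Message Wait + 20%), rounding the minimum up.
derive_sbd_timeouts() {
    wd=$1
    mw=$((2 * wd))
    st_min=$(( (mw * 120 + 99) / 100 ))   # ceiling of 1.2 x Message Wait
    echo "watchdog=$wd msgwait=$mw stonith_min=$st_min"
}
```

With a Watchdog timeout of 20 this yields a minimum STONITH Timeout of 48, and with 30 it yields 72, so the suggested values of 90 and 120 leave additional headroom above the minimum.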

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011346
  • Creation Date: 12-NOV-12
  • Modified Date: 12-NOV-12
  • SUSE
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server
