One HAE node fails to start at boot with openais showing the sbd help screen

This document (7011300) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11 (SLES)
SUSE Linux Enterprise High Availability Extension 11 (HAE)
Split Brain Detection (SBD) Partitions

Situation

In a two-node cluster, one server starts clustering while the other fails. The node that failed to start clustering shows the sbd command help text in /var/log/boot.msg:

<snip>
Starting OpenAIS/Corosync daemon (corosync): Starting SBD - Shared storage fencing tool.
Syntax:
        sbd <options> <command> <cmdarguments>
Options:
<snip/>
The /etc/sysconfig/sbd files do not match on all nodes in the cluster. The following configuration details were found:

Node 1 /etc/sysconfig/sbd
SBD_DEVICE="/dev/sdd1;/dev/sdc1;/dev/sdb1"
SBD_OPTS="-W"

Node 2 /etc/sysconfig/sbd
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"

The HAE Cluster Information Base stonith sbd resource configuration:
<primitive class="stonith" id="stonith-sbd" type="external/sbd">
  <instance_attributes id="stonith-sbd-instance_attributes">
    <nvpair id="stonith-sbd-instance_attributes-sbd_device" name="sbd_device" value="/dev/sdb1;/dev/sdc1;/dev/sdd1"/>
  </instance_attributes>
</primitive>
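
A mismatch like this can be spotted by extracting the SBD_DEVICE value from each node's file and comparing the strings. The following is a minimal sketch, not part of the original procedure; the function names get_sbd_device and same_device_list are made up for illustration.

```shell
# Print the SBD_DEVICE value from a sysconfig-style file.
# $1: path to the file (defaults to /etc/sysconfig/sbd)
get_sbd_device() {
    local file="${1:-/etc/sysconfig/sbd}"
    # Use the last uncommented SBD_DEVICE= line and strip optional quotes.
    sed -n 's/^SBD_DEVICE="\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$file" | tail -n 1
}

# Compare two device lists as plain strings, so the order matters.
same_device_list() {
    [ "$1" = "$2" ]
}
```

For example, after fetching the other node's file with `scp hn2:/etc/sysconfig/sbd /tmp/sbd.hn2`, running `same_device_list "$(get_sbd_device)" "$(get_sbd_device /tmp/sbd.hn2)" || echo mismatch` would flag the two files shown above, because the same devices appear in a different order.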

Resolution

Make sure the SBD_DEVICE list in each node's /etc/sysconfig/sbd file matches the CIB sbd_device value in both content and order. Node 2's /etc/sysconfig/sbd already matches the CIB device list in both respects, so copy Node 2's /etc/sysconfig/sbd to Node 1.

1. Fix the list of devices and their order.
# cat /etc/sysconfig/sbd
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"

2. Confirm that the shared devices listed in /etc/sysconfig/sbd exist under the same device path on all nodes and refer to the same physical media. That is, /dev/sdb1, /dev/sdc1 and /dev/sdd1 must exist on every node, and each path must point to the same device everywhere. One device cannot be /dev/sdc1 on node 1 and /dev/sdf1 on node 2.

3. Confirm that the same /etc/sysconfig/sbd exists on all nodes. Copy the /etc/sysconfig/sbd from step 1 to every other node in the cluster.
# scp /etc/sysconfig/sbd hn2:/etc/sysconfig/

4. Reformat the SBD partition on each of the listed devices:
# sbd -d /dev/sdb1 -d /dev/sdc1 -d /dev/sdd1 create

5. Reboot one node in the cluster. When it comes back online, reboot another node. Repeat the process until each node in the cluster has been rebooted.
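
Steps 2 and 3 can be spot-checked with a few helper functions. This is only a sketch under assumptions: it relies on the /dev/disk/by-id symlinks that udev normally creates, on passwordless root ssh between the nodes, and all function names are hypothetical.

```shell
# Print every persistent name that resolves to the given device node.
# $1: device path, e.g. /dev/sdb1
# $2: directory of persistent symlinks (defaults to /dev/disk/by-id)
persistent_ids() {
    local target link dir="${2:-/dev/disk/by-id}"
    target=$(readlink -f "$1") || return 1
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}

# Print one MD5 sum of /etc/sysconfig/sbd per node named on the command line.
# A node that cannot be reached yields the marker ERROR instead of a sum.
sbd_config_sums() {
    local node
    for node in "$@"; do
        ssh -n "$node" md5sum /etc/sysconfig/sbd 2>/dev/null | awk '{print $1}' | grep . || echo ERROR
    done
}

# Read checksums on stdin; succeed only if there is exactly one distinct
# value and it is not the failure marker.
sums_agree() {
    local sums
    sums=$(sort -u)
    [ -n "$sums" ] && [ "$(printf '%s\n' "$sums" | wc -l | tr -d ' ')" -eq 1 ] && [ "$sums" != "ERROR" ]
}
```

Running `persistent_ids /dev/sdb1` on every node should print the same names everywhere if the path refers to the same physical device. For step 3, `sbd_config_sums hn1 hn2 | sums_agree && echo identical` reports whether the files match, where hn1 is an assumed hostname for the first node.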

The corrected configuration files would look like this:

Node 1 /etc/sysconfig/sbd
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"

Node 2 /etc/sysconfig/sbd
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"

The HAE Cluster Information Base stonith sbd resource configuration would look like either of the following:
<primitive class="stonith" id="stonith-sbd" type="external/sbd">
  <instance_attributes id="stonith-sbd-instance_attributes">
    <nvpair id="stonith-sbd-instance_attributes-sbd_device" name="sbd_device" value="/dev/sdb1;/dev/sdc1;/dev/sdd1"/>
  </instance_attributes>
</primitive>
-OR-

<primitive class="stonith" id="stonith-sbd" type="external/sbd" />
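
To see which of the two forms a running cluster uses, the sbd_device value can be pulled out of the CIB XML. The filter below is a rough sketch with a hypothetical name; it assumes the name attribute comes directly before the value attribute on the same line, as in the listing above.

```shell
# Read CIB XML on stdin and print the sbd_device value, if one is set.
# Prints nothing when the stonith resource has no sbd_device parameter.
cib_sbd_device() {
    sed -n 's/.*name="sbd_device"[[:space:]]*value="\([^"]*\)".*/\1/p'
}
```

For example, `cibadmin -Q | cib_sbd_device` would print /dev/sdb1;/dev/sdc1;/dev/sdd1 for the first form and nothing for the second. Any value it prints should be identical to the SBD_DEVICE value in /etc/sysconfig/sbd.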

Cause

The /etc/sysconfig/sbd file must match on all nodes in the HAE cluster.

Both the device names and their order matter. A safe practice is to modify /etc/sysconfig/sbd on one node and then copy it to all other nodes in the cluster. If the CIB defines an sbd_device parameter, it must list the same devices in the same order. If the CIB sbd_device parameter is missing, the cluster uses the devices from /etc/sysconfig/sbd.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011300
  • Creation Date: 02-Nov-2012
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server
