10.3 STONITH Resources and Configuration

To set up fencing, you need to configure one or more STONITH resources. The stonithd daemon requires no configuration. All configuration is stored in the CIB. A STONITH resource is a resource of class stonith (see Section 6.3.2, Supported Resource Agent Classes). STONITH resources are a representation of STONITH plug-ins in the CIB. Apart from the fencing operations, STONITH resources can be started, stopped, and monitored like any other resource. Starting or stopping a STONITH resource means loading or unloading the STONITH device driver on a node. Starting and stopping are thus only administrative operations and do not translate to any operation on the fencing device itself. However, monitoring does translate to logging in to the device (to verify that the device will work in case it is needed). When a STONITH resource fails over to another node, it enables the current node to talk to the STONITH device by loading the respective driver.

STONITH resources can be configured like any other resource. For details on how to do so, refer to the documentation of your preferred cluster management tool.

The list of parameters (attributes) depends on the respective STONITH type. To view a list of parameters for a specific device, use the stonith command:

stonith -t stonith-device-type -n

For example, to view the parameters for the ibmhmc device type, enter the following:

stonith -t ibmhmc -n

To get a short help text for the device, use the -h option:

stonith -t stonith-device-type -h
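
To compare several device types in one go, the parameter query can be wrapped in a small shell function. This is a minimal sketch: the function name show_stonith_params is hypothetical, and it assumes that the stonith command is installed and in the PATH.

```shell
# Hypothetical helper: print the parameter names of each given STONITH type.
# Assumes the stonith command shown above is installed and in the PATH.
show_stonith_params() {
    for dev_type in "$@"; do
        echo "== $dev_type =="
        stonith -t "$dev_type" -n
    done
}

# Usage (commented out so that sourcing this snippet has no side effects):
# show_stonith_params ibmhmc apcmaster apcsmart
```

Replacing -n with -h in the function would print the short help text for each type instead.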

10.3.1 Example STONITH Resource Configurations

In the following, find some example configurations written in the syntax of the crm command line tool. To apply them, put the sample in a text file (for example, sample.txt) and run:

root # crm < sample.txt

For more information about configuring resources with the crm command line tool, refer to Section 8.0, Configuring and Managing Cluster Resources (Command Line).

Example 10-1 Configuration of an IBM RSA Lights-out Device

An IBM RSA lights-out device might be configured like this:

configure
  primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
    params nodename=alice ip_address=192.168.0.101 \
    username=USERNAME password=PASSW0RD
  primitive st-ibmrsa-2 stonith:external/ibmrsa-telnet \
    params nodename=bob ip_address=192.168.0.102 \
    username=USERNAME password=PASSW0RD
  location l-st-alice st-ibmrsa-1 -inf: alice
  location l-st-bob st-ibmrsa-2 -inf: bob
commit

In this example, location constraints keep each STONITH resource off the node it is meant to fence, for the following reason: There is always a certain probability that a STONITH operation will fail. Therefore, a STONITH operation is not reliable if the node to be fenced is also the node executing it. If the node is reset, it cannot send the notification about the outcome of the fencing operation. The only way to do that would be to assume that the operation will succeed and to send the notification beforehand. But if the operation then fails, problems could arise. Therefore, by convention, stonithd refuses to terminate its host.
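
For clusters with more than two nodes, the pattern of Example 10-1 (one resource per node, each banned from the node it fences) can be generated instead of typed by hand. The following is a sketch under stated assumptions: the function name gen_ibmrsa_config and the NODE=IP argument format are hypothetical, and USERNAME and PASSW0RD remain placeholders as in the example.

```shell
# Hypothetical helper: emit crm configuration with one external/ibmrsa-telnet
# resource per node, each banned (-inf) from the node it is meant to fence.
# Arguments are NODE=IP pairs; USERNAME and PASSW0RD are placeholders.
gen_ibmrsa_config() {
    printf '%s\n' "configure"
    i=1
    for pair in "$@"; do
        node=${pair%%=*}   # text before the first "="
        ip=${pair#*=}      # text after the first "="
        printf '%s\n' "primitive st-ibmrsa-$i stonith:external/ibmrsa-telnet \\"
        printf '%s\n' "  params nodename=$node ip_address=$ip \\"
        printf '%s\n' "  username=USERNAME password=PASSW0RD"
        printf '%s\n' "location l-st-$node st-ibmrsa-$i -inf: $node"
        i=$((i + 1))
    done
    printf '%s\n' "commit"
}

# Usage: write the configuration to a file, review it, then load it:
# gen_ibmrsa_config alice=192.168.0.101 bob=192.168.0.102 > sample.txt
# crm < sample.txt
```

Writing to a file first, rather than piping straight into crm, leaves room to review the constraints before they are committed.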

Example 10-2 Configuration of a UPS Fencing Device

The configuration of a UPS type fencing device is similar to the example above. The details are not covered here. All UPS devices employ the same mechanics for fencing. How the device is accessed varies. Old UPS devices only had a serial port, usually connected at 1200 baud using a special serial cable. Many new ones still have a serial port, but often they also have a USB or Ethernet interface. The kind of connection you can use depends on what the plug-in supports.

For example, compare the apcmaster with the apcsmart device by using the stonith -t stonith-device-type -h command:

stonith -t apcmaster -h

returns the following information:

STONITH Device: apcmaster - APC MasterSwitch (via telnet)
NOTE: The APC MasterSwitch accepts only one (telnet)
connection/session a time. When one session is active,
subsequent attempts to connect to the MasterSwitch will fail.
For more information see http://www.apc.com/
List of valid parameter names for apcmaster STONITH device:
ipaddr
login
password

With

stonith -t apcsmart -h

you get the following output:

STONITH Device: apcsmart - APC Smart UPS
(via serial port - NOT USB!). 
Works with higher-end APC UPSes, like
Back-UPS Pro, Smart-UPS, Matrix-UPS, etc.
(Smart-UPS may have to be >= Smart-UPS 700?).
See http://www.networkupstools.org/protocols/apcsmart.html
for protocol compatibility details.
For more information see http://www.apc.com/
List of valid parameter names for apcsmart STONITH device:
ttydev
hostlist

The first plug-in supports APC UPS with a network port and telnet protocol. The second plug-in uses the APC SMART protocol over the serial line, which is supported by many APC UPS product lines.
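
Based on the parameter lists above, a resource definition for each plug-in might look as follows. This is a sketch only: the resource names, the IP address, the credentials, the serial device /dev/ttyS0, and the host list are assumed placeholder values. Only the parameter names are taken from the help output. The snippet writes the file sample.txt without loading it, and in practice you would keep only the definition that matches your hardware.

```shell
# Write a sample crm configuration to sample.txt (not applied automatically).
# All values are placeholders; only the parameter names come from the
# "stonith -t ... -h" output above.
cat > sample.txt <<'EOF'
configure
  primitive st-apcmaster stonith:apcmaster \
    params ipaddr=192.168.0.110 login=USERNAME password=PASSW0RD
  primitive st-apcsmart stonith:apcsmart \
    params ttydev=/dev/ttyS0 hostlist="alice"
commit
EOF
# To apply after review: crm < sample.txt
```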

Example 10-3 Configuration of a Kdump Device

Kdump belongs to the Special Fencing Devices and is in fact the opposite of a fencing device. The plug-in checks if a kernel dump is in progress on a node. If so, it returns true and acts as if the node has been fenced.

The Kdump plug-in must be used in concert with another, real STONITH device, for example, external/ipmi. For the fencing mechanism to work properly, you must specify that Kdump is checked before a real STONITH device is triggered. Use crm configure fencing_topology to specify the order of the fencing devices as shown in the following procedure.

  1. Use the stonith:fence_kdump resource agent (provided by the package fence-agents) to monitor all nodes with the Kdump function enabled. Find a configuration example for the resource below:

    configure
      primitive st-kdump stonith:fence_kdump \
        params nodename="alice" \
        pcmk_host_check="static-list" \
        pcmk_reboot_action="off" \
        pcmk_monitor_action="metadata" \
        pcmk_reboot_retries="1" \
        timeout="60"
    commit

    nodename: Name of the node to be monitored. If you need to monitor more than one node, configure more STONITH resources. To prevent a specific node from using a fencing device, add location constraints.

    timeout: The fencing action will be started after the timeout of the resource.

  2. In /etc/sysconfig/kdump on each node, configure KDUMP_POSTSCRIPT to send a notification to all nodes when the Kdump process is finished. For example:

    /usr/lib/fence_kdump_send -i INTERVAL -p PORT -c 1 alice bob charlie [...]

    The node that does a Kdump will restart automatically after Kdump has finished.

  3. Write a new initrd to include the library fence_kdump_send with network enabled. Use the -f option to overwrite the existing file, so the new file will be used for the next boot process:

    root # dracut -f -a kdump

  4. Open a port in the firewall for the fence_kdump resource. The default port is 7410.

  5. To have Kdump checked before a real fencing mechanism (like external/ipmi) is triggered, use a configuration similar to the following:

    fencing_topology \
      alice: kdump-node1 ipmi-node1 \
      bob: kdump-node2 ipmi-node2

    For more details on fencing_topology, run:

    crm configure help fencing_topology
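
The KDUMP_POSTSCRIPT setting from step 2 must name every node that should receive the notification, so it can help to generate the line from a node list. The following is a minimal sketch: the function name kdump_postscript_line is hypothetical, and it assumes that /etc/sysconfig/kdump uses the usual VARIABLE="value" form. The port 7410 is the default named in step 4. The interval of 10 in the usage line is only an illustrative value.

```shell
# Hypothetical helper: print a KDUMP_POSTSCRIPT line for /etc/sysconfig/kdump.
# Arguments: INTERVAL PORT NODE...
# Assumes the file uses the VARIABLE="value" form.
kdump_postscript_line() {
    interval=$1
    port=$2
    shift 2
    # "$*" joins the remaining arguments (the node names) with spaces.
    printf 'KDUMP_POSTSCRIPT="/usr/lib/fence_kdump_send -i %s -p %s -c 1 %s"\n' \
        "$interval" "$port" "$*"
}

# Usage (the interval value is illustrative):
# kdump_postscript_line 10 7410 alice bob charlie
```

The function only prints the line. Copy the output into /etc/sysconfig/kdump on each node, then rebuild the initrd as described in step 3.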