10.2 Node Level Fencing

In a Pacemaker cluster, node level fencing is implemented by STONITH (Shoot The Other Node in the Head). The High Availability Extension includes the stonith command line tool, an extensible interface for remotely powering down a node in the cluster. For an overview of the available options, run stonith --help or refer to the stonith man page.

10.2.1 STONITH Devices

To use node level fencing, you first need a fencing device. To list the STONITH devices supported by the High Availability Extension, run one of the following commands on any of the nodes:

root # stonith -L

or

root # crm ra list stonith
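Because either command may be missing, depending on which packages are installed, the two calls can be wrapped in a small helper. The following is a minimal sketch, not part of the product: the function name list_stonith_types is hypothetical, and it only invokes the two commands shown above.

```shell
#!/bin/sh
# Sketch: print the supported STONITH device types using whichever of the
# two commands from the text is installed on this node.
# list_stonith_types is a hypothetical helper name, not a shipped tool.
list_stonith_types() {
    if command -v stonith >/dev/null 2>&1; then
        stonith -L
    elif command -v crm >/dev/null 2>&1; then
        crm ra list stonith
    else
        echo "neither stonith nor crm found in PATH" >&2
        return 1
    fi
}

# Example call (run as root on a cluster node):
# list_stonith_types
```

The helper returns a non-zero exit status if neither tool is available, so a calling script can detect that no fencing tooling is installed.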

STONITH devices may be classified into the following categories:

Power Distribution Units (PDU)

Power Distribution Units are an essential element in managing power capacity and functionality for critical network, server and data center equipment. They can provide remote load monitoring of connected equipment and individual outlet power control for remote power recycling.

Uninterruptible Power Supplies (UPS)

An uninterruptible power supply provides emergency power to connected equipment by supplying power from a separate source if a utility power failure occurs.

Blade Power Control Devices

If you are running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing individual blades.

Lights-out Devices

Lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular and may even become standard in off-the-shelf computers. However, if they share a power supply with their host (a cluster node), they might not work when needed: if the node loses power, the device that is supposed to control it loses power too and becomes useless. Therefore, it is highly recommended to use battery-backed lights-out devices. In addition, these devices are accessed over the network, which may introduce a single point of failure or raise security concerns.

Testing Devices

Testing devices are used exclusively for testing purposes. They are usually gentler on the hardware. Before the cluster goes into production, they must be replaced with real fencing devices.

The choice of the STONITH device depends mainly on your budget and the kind of hardware you use.

10.2.2 STONITH Implementation

The STONITH implementation of SUSE® Linux Enterprise High Availability Extension consists of two components:

stonithd

stonithd is a daemon that can be accessed by local processes or over the network. It accepts commands that correspond to the fencing operations reset, power-off, and power-on. It can also check the status of the fencing device.

The stonithd daemon runs on every node in the CRM HA cluster. The stonithd instance running on the DC (designated coordinator) node receives a fencing request from the CRM. It is up to this instance and the stonithd instances on the other nodes to carry out the desired fencing operation.

STONITH Plug-ins

For every supported fencing device there is a STONITH plug-in that is capable of controlling it. A STONITH plug-in is the interface to the fencing device. The STONITH plug-ins contained in the cluster-glue package reside in /usr/lib64/stonith/plugins on each node. (If you have also installed the fence-agents package, the plug-ins it contains are installed as /usr/sbin/fence_*.) All STONITH plug-ins present the same interface to stonithd, but differ considerably on the device side, reflecting the nature of each fencing device.

Some plug-ins support more than one device. A typical example is ipmilan (or external/ipmi), which implements the IPMI protocol and can control any device that supports this protocol.
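
To see which plug-ins are actually present on a node, you can inspect the two locations named above. The following is a minimal sketch under stated assumptions: the function name list_fencing_plugins is hypothetical, the default paths are the ones given in this section, and both can be overridden (for example on systems that use a different library directory).

```shell
#!/bin/sh
# Sketch: list the STONITH plug-ins and fence agents installed on this node.
# list_fencing_plugins is a hypothetical helper name, not a shipped tool.
# Default paths are the ones named in the text; both can be overridden.
list_fencing_plugins() {
    plugin_dir=${1:-/usr/lib64/stonith/plugins}
    agent_dir=${2:-/usr/sbin}

    # cluster-glue plug-ins, printed relative to the plug-in directory
    if [ -d "$plugin_dir" ]; then
        (cd "$plugin_dir" && find . -type f | sed 's|^\./||' | sort)
    fi

    # fence-agents executables, if that package is installed
    for agent in "$agent_dir"/fence_*; do
        if [ -x "$agent" ]; then
            basename "$agent"
        fi
    done
}

# Example call with the default paths:
# list_fencing_plugins
```

If neither location exists, the function prints nothing. Files are listed as they appear on disk, so the names may differ slightly from the device type names reported by stonith -L.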