6.3 Cluster Resources

As a cluster administrator, you need to create cluster resources for every resource or application you run on servers in your cluster. Cluster resources can include Web sites, e-mail servers, databases, file systems, virtual machines, and any other server-based applications or services you want to make available to users at all times.

6.3.1 Resource Management

Before you can use a resource in the cluster, it must be set up. For example, to use an Apache server as a cluster resource, set up the Apache server first and complete the Apache configuration before starting the respective resource in your cluster.

If a resource has specific environment requirements, make sure they are present and identical on all cluster nodes. This kind of configuration is not managed by the High Availability Extension. You must do this yourself.

NOTE: Do Not Touch Services Managed by the Cluster

When managing a resource with the High Availability Extension, the same resource must not be started or stopped otherwise (outside of the cluster, for example manually or on boot or reboot). The High Availability Extension software is responsible for all service start or stop actions.

If you need to execute testing or maintenance tasks after the services are already running under cluster control, make sure to put the resources, nodes, or the whole cluster into maintenance mode before you touch any of them manually. For details, see Section 16.2, Different Options for Maintenance Tasks.

After having configured the resources in the cluster, use the cluster management tools to start, stop, clean up, remove or migrate any resources manually. For details how to do so with your preferred cluster management tool:

6.3.2 Supported Resource Agent Classes

For each cluster resource you add, you need to define the standard that the resource agent conforms to. Resource agents abstract the services they provide and present an accurate status to the cluster, which allows the cluster to be non-committal about the resources it manages. The cluster relies on the resource agent to react appropriately when given a start, stop or monitor command.

Typically, resource agents come in the form of shell scripts. The High Availability Extension supports the following classes of resource agents:

Open Cluster Framework (OCF) Resource Agents

OCF RA agents are best suited for use with High Availability, especially when you need multi-state resources or special monitoring abilities. The agents are generally located in /usr/lib/ocf/resource.d/provider/. Their functionality is similar to that of LSB scripts. However, the configuration is always done with environmental variables which allow them to accept and process parameters easily. The OCF specification (as it relates to resource agents) can be found at https://github.com/ClusterLabs/OCF-spec/blob/master/ra/1.0/resource-agent-api.md. OCF specifications have strict definitions of which exit codes must be returned by actions, see Section 9.3, OCF Return Codes and Failure Recovery. The cluster follows these specifications exactly.

All OCF Resource Agents are required to have at least the actions start, stop, status, monitor, and meta-data. The meta-data action retrieves information about how to configure the agent. For example, if you want to know more about the IPaddr agent by the provider heartbeat, use the following command:

OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/IPaddr meta-data

The output is information in XML format, including several sections (general description, available parameters, available actions for the agent).

Alternatively, use the crmsh to view information on OCF resource agents. For details, see Section 8.1.3, Displaying Information about OCF Resource Agents.

Linux Standards Base (LSB) Scripts

LSB resource agents are generally provided by the operating system/distribution and are found in /etc/init.d. To be used with the cluster, they must conform to the LSB init script specification. For example, they must have several actions implemented, which are, at minimum, start, stop, restart, reload, force-reload, and status. For more information, see http://refspecs.linuxbase.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html.

The configuration of those services is not standardized. If you intend to use an LSB script with High Availability, make sure that you understand how the relevant script is configured. Often you can find information about this in the documentation of the relevant package in /usr/share/doc/packages/PACKAGENAME.


Starting with SUSE Linux Enterprise 12, systemd is a replacement for the popular System V init daemon. Pacemaker can manage systemd services if they are present. Instead of init scripts, systemd has unit files. Generally the services (or unit files) are provided by the operating system. In case you want to convert existing init scripts, find more information at http://0pointer.de/blog/projects/systemd-for-admins-3.html.


There are currently many common types of system services that exist in parallel: LSB (belonging to System V init), systemd, and (in some distributions) upstart. Therefore, Pacemaker supports a special alias which intelligently figures out which one applies to a given cluster node. This is particularly useful when the cluster contains a mix of systemd, upstart, and LSB services. Pacemaker will try to find the named service in the following order: as an LSB (SYS-V) init script, a systemd unit file, or an Upstart job.


Monitoring plug-ins (formerly called Nagios plug-ins) allow to monitor services on remote hosts. Pacemaker can do remote monitoring with the monitoring plug-ins if they are present. For detailed information, see Section 6.6.1, Monitoring Services on Remote Hosts with Monitoring Plug-ins.

STONITH (Fencing) Resource Agents

This class is used exclusively for fencing related resources. For more information, see Section 10.0, Fencing and STONITH.

The agents supplied with the High Availability Extension are written to OCF specifications.

6.3.3 Types of Resources

The following types of resources can be created:


A primitive resource, the most basic type of resource.

Learn how to create primitive resources with your preferred cluster management tool:


Groups contain a set of resources that need to be located together, started sequentially and stopped in the reverse order. For more information, refer to Groups.


Clones are resources that can be active on multiple hosts. Any resource can be cloned, provided the respective resource agent supports it. For more information, refer to Clones.

Multi-state Resources (formerly known as Master/Slave Resources)

Multi-state resources are a special type of clone resources that can have multiple modes. For more information, refer to Multi-state Resources.

6.3.4 Resource Templates

If you want to create lots of resources with similar configurations, defining a resource template is the easiest way. After having been defined, it can be referenced in primitives—or in certain types of constraints, as described in Section 6.5.3, Resource Templates and Constraints.

If a template is referenced in a primitive, the primitive will inherit all operations, instance attributes (parameters), meta attributes, and utilization attributes defined in the template. Additionally, you can define specific operations or attributes for your primitive. If any of these are defined in both the template and the primitive, the values defined in the primitive will take precedence over the ones defined in the template.

Learn how to define resource templates with your preferred cluster configuration tool:

6.3.5 Advanced Resource Types

Whereas primitives are the simplest kind of resources and therefore easy to configure, you will probably also need more advanced resource types for cluster configuration, such as groups, clones or multi-state resources.


Some cluster resources depend on other components or resources. They require that each component or resource starts in a specific order and runs together on the same server with resources it depends on. To simplify this configuration, you can use cluster resource groups.

Example 6-3 Resource Group for a Web Server

An example of a resource group would be a Web server that requires an IP address and a file system. In this case, each component is a separate resource that is combined into a cluster resource group. The resource group would run on one or more servers. In case of a software or hardware malfunction, the group would fail over to another server in the cluster, similar to an individual cluster resource.

Figure 6-1 Group Resource

Groups have the following properties:

Starting and Stopping

Resources are started in the order they appear in and stopped in the reverse order.


If a resource in the group cannot run anywhere, then none of the resources located after that resource in the group is allowed to run.


Groups may only contain a collection of primitive cluster resources. Groups must contain at least one resource, otherwise the configuration is not valid. To refer to the child of a group resource, use the child’s ID instead of the group’s ID.


Although it is possible to reference the group’s children in constraints, it is usually preferable to use the group’s name instead.


Stickiness is additive in groups. Every active member of the group will contribute its stickiness value to the group’s total. So if the default resource-stickiness is 100 and a group has seven members (five of which are active), the group as a whole will prefer its current location with a score of 500.

Resource Monitoring

To enable resource monitoring for a group, you must configure monitoring separately for each resource in the group that you want monitored.

Learn how to create groups with your preferred cluster management tool:


You may want certain resources to run simultaneously on multiple nodes in your cluster. To do this you must configure a resource as a clone. Examples of resources that might be configured as clones include cluster file systems like OCFS2. You can clone any resource provided. This is supported by the resource’s Resource Agent. Clone resources may even be configured differently depending on which nodes they are hosted.

There are three types of resource clones:

Anonymous Clones

These are the simplest type of clones. They behave identically anywhere they are running. Because of this, there can only be one instance of an anonymous clone active per machine.

Globally Unique Clones

These resources are distinct entities. An instance of the clone running on one node is not equivalent to another instance on another node; nor would any two instances on the same node be equivalent.

Stateful Clones (Multi-state Resources)

Active instances of these resources are divided into two states, active and passive. These are also sometimes called primary and secondary, or master and slave. Stateful clones can be either anonymous or globally unique. See also Multi-state Resources.

Clones must contain exactly one group or one regular resource.

When configuring resource monitoring or constraints, clones have different requirements than simple resources. For details, see Pacemaker Explained, available from http://www.clusterlabs.org/doc/. Refer to section Clones - Resources That Get Active on Multiple Hosts.

Learn how to create clones with your preferred cluster management tool:

Multi-state Resources

Multi-state resources are a specialization of clones. They allow the instances to be in one of two operating modes (called master or slave, but can mean whatever you want them to mean). Multi-state resources must contain exactly one group or one regular resource.

When configuring resource monitoring or constraints, multi-state resources have different requirements than simple resources. For details, see Pacemaker Explained, available from http://www.clusterlabs.org/doc/. Refer to section Multi-state - Resources That Have Multiple Modes.

6.3.6 Resource Options (Meta Attributes)

For each resource you add, you can define options. Options are used by the cluster to decide how your resource should behave—they tell the CRM how to treat a specific resource. Resource options can be set with the crm_resource --meta command or with Hawk2 as described in Adding a Primitive Resource.

Table 6-1 Options for a Primitive Resource





If not all resources can be active, the cluster will stop lower priority resources to keep higher priority ones active.



In what state should the cluster attempt to keep this resource? Allowed values: stopped, started, master.



Is the cluster allowed to start and stop the resource? Allowed values: true, false. If the value is set to false, the status of the resource is still monitored and any failures are reported (which is different from setting a resource to maintenance="true").



Can the resources be touched manually? Allowed values: true, false. If set to true, all resources become unmanaged: the cluster will stop monitoring them and thus be oblivious about their status. You can stop or restart cluster resources at will, without the cluster attempting to restart them.



How much does the resource prefer to stay where it is?



How many failures should occur for this resource on a node before making the node ineligible to host this resource?

INFINITY (disabled)


What should the cluster do if it ever finds the resource active on more than one node? Allowed values: block (mark the resource as unmanaged), stop_only, stop_start.



How many seconds to wait before acting as if the failure had not occurred (and potentially allowing the resource back to the node on which it failed)?

0 (disabled)


Allow resource migration for resources which support migrate_to/migrate_from actions.



The name of the remote node this resource defines. This both enables the resource as a remote node and defines the unique name used to identify the remote node. If no other parameters are set, this value will also be assumed as the host name to connect to at remote-portport.

WARNING: Use Unique IDs

This value must not overlap with any existing resource or node IDs.

none (disabled)


Custom port for the guest connection to pacemaker_remote.



The IP address or host name to connect to if the remote node’s name is not the host name of the guest.

remote-node (value used as host name)


How long before a pending guest connection will time out.


6.3.7 Instance Attributes (Parameters)

The scripts of all resource classes can be given parameters which determine how they behave and which instance of a service they control. If your resource agent supports parameters, you can add them with the crm_resource command or with Hawk2 as described in Adding a Primitive Resource. In the crm command line utility and in Hawk2, instance attributes are called params or Parameter, respectively. The list of instance attributes supported by an OCF script can be found by executing the following command as root:

root # crm ra info [class:[provider:]]resource_agent

or (without the optional parts):

root # crm ra info resource_agent

The output lists all the supported attributes, their purpose and default values.

For example, the command

root # crm ra info IPaddr

returns the following output:

Manages virtual IPv4 addresses (portable version) (ocf:heartbeat:IPaddr)
This script manages IP alias IP addresses
It can add an IP alias, or remove one.   
Parameters (* denotes required, [] the default):
ip* (string): IPv4 address
The IPv4 address to be configured in dotted quad notation, for example
nic (string, [eth0]): Network interface
The base network interface on which the IP address will be brought
If left empty, the script will try and determine this from the    
routing table.                                                    
Do NOT specify an alias interface in the form eth0:1 or anything here;
rather, specify the base interface only.                              
cidr_netmask (string): Netmask
The netmask for the interface in CIDR format. (ie, 24), or in
dotted quad notation                        
If unspecified, the script will also try to determine this from the
routing table.                                                     
broadcast (string): Broadcast address
Broadcast address associated with the IP. If left empty, the script will
determine this from the netmask.                                        
iflabel (string): Interface label
You can specify an additional label for your IP address here.
lvs_support (boolean, [false]): Enable support for LVS DR
Enable support for LVS Direct Routing configurations. In case a IP
address is stopped, only move it to the loopback device to allow the
local node to continue to service requests, but no longer advertise it
on the network.                                                       
local_stop_script (string): 
Script called when the IP is released
local_start_script (string): 
Script called when the IP is added
ARP_INTERVAL_MS (integer, [500]): milliseconds between gratuitous ARPs
milliseconds between ARPs                                         
ARP_REPEAT (integer, [10]): repeat count
How many gratuitous ARPs to send out when bringing up a new address
ARP_BACKGROUND (boolean, [yes]): run in background
run in background (no longer any reason to do this)
ARP_NETMASK (string, [ffffffffffff]): netmask for ARP
netmask for ARP - in nonstandard hexadecimal format.
Operations' defaults (advisory minimum):
start         timeout=90
stop          timeout=100
monitor_0     interval=5s timeout=20s

NOTE: Instance Attributes for Groups, Clones or Multi-state Resources

Note that groups, clones and multi-state resources do not have instance attributes. However, any instance attributes set will be inherited by the group's, clone's or multi-state resource's children.

6.3.8 Resource Operations

By default, the cluster will not ensure that your resources are still healthy. To instruct the cluster to do this, you need to add a monitor operation to the resource’s definition. Monitor operations can be added for all classes or resource agents. For more information, refer to Section 6.4, Resource Monitoring.

Table 6-2 Resource Operation Properties




Your name for the action. Must be unique. (The ID is not shown).


The action to perform. Common values: monitor, start, stop.


How frequently to perform the operation. Unit: seconds


How long to wait before declaring the action has failed.


What conditions need to be satisfied before this action occurs. Allowed values: nothing, quorum, fencing. The default depends on whether fencing is enabled and if the resource’s class is stonith. For STONITH resources, the default is nothing.


The action to take if this action ever fails. Allowed values:

  • ignore: Pretend the resource did not fail.

  • block: Do not perform any further operations on the resource.

  • stop: Stop the resource and do not start it elsewhere.

  • restart: Stop the resource and start it again (possibly on a different node).

  • fence: Bring down the node on which the resource failed (STONITH).

  • standby: Move all resources away from the node on which the resource failed.


If false, the operation is treated as if it does not exist. Allowed values: true, false.


Run the operation only if the resource has this role.


Can be set either globally or for individual resources. Makes the CIB reflect the state of in-flight operations on resources.


Description of the operation.

6.3.9 Timeout Values

Timeouts values for resources can be influenced by the following parameters:

  • op_defaults (global timeout for operations),

  • a specific timeout value defined in a resource template,

  • a specific timeout value defined for a resource.

NOTE: Priority of Values

If a specific value is defined for a resource, it takes precedence over the global default. A specific value for a resource also takes precedence over a value that is defined in a resource template.

Getting timeout values right is very important. Setting them too low will result in a lot of (unnecessary) fencing operations for the following reasons:

  1. If a resource runs into a timeout, it fails and the cluster will try to stop it.

  2. If stopping the resource also fails (for example, because the timeout for stopping is set too low), the cluster will fence the node. It considers the node where this happens to be out of control.

You can adjust the global default for operations and set any specific timeout values with both crmsh and Hawk2. The best practice for determining and setting timeout values is as follows:

Determining Timeout Values

  1. Check how long it takes your resources to start and stop (under load).

  2. If needed, add the op_defaults parameter and set the (default) timeout value accordingly:

    1. For example, set op_defaults to 60 seconds:

      crm(live)configure#  op_defaults timeout=60
    2. For resources that need longer periods of time, define individual timeout values.

  3. When configuring operations for a resource, add separate start and stop operations. When configuring operations with Hawk2, it will provide useful timeout proposals for those operations.