active/active, active/passive
A concept describing how services run on nodes. In an active-passive scenario, one or more services run on the active node while the passive node waits for the active node to fail. In an active-active scenario, each node is active and passive at the same time: it runs some services itself but can take over other services from the other node. Compare with primary/secondary and dual-primary in DRBD terminology.
arbitrator
Additional instance in a Geo cluster that helps to reach consensus about decisions such as failover of resources across sites. Arbitrators are single machines that run one or more booth instances in a special mode.
AutoYaST
AutoYaST is a system for installing one or more SUSE Linux Enterprise systems automatically and without user intervention.
bindnetaddr (bind network address)
The network address the Corosync executive should bind to.
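To illustrate what "network address" means here: for a node whose cluster interface is 192.168.1.10 with a /24 netmask, the network address is 192.168.1.0 (the host bits cleared). The following minimal Python sketch derives it with the standard `ipaddress` module; the helper name `bind_network_address` and the example addresses are hypothetical.

```python
import ipaddress


def bind_network_address(interface: str) -> str:
    """Return the network address (host bits cleared) for an address in CIDR form."""
    # ip_interface keeps the host address and its prefix together;
    # .network applies the prefix mask, giving the network the host sits on.
    iface = ipaddress.ip_interface(interface)
    return str(iface.network.network_address)


print(bind_network_address("192.168.1.10/24"))  # 192.168.1.0
print(bind_network_address("10.20.30.40/16"))   # 10.20.0.0
```

The same call works for IPv6 interfaces, for example `fd00::5/64` yields `fd00::`.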
booth
The instance that manages the failover process between the sites of a Geo cluster. It aims to get multi-site resources active on one and only one site. This is achieved by using so-called tickets, which are treated as failover domains between cluster sites in case a site goes down.
boothd (booth daemon)
Each of the participating clusters and arbitrators in a Geo cluster runs a service, the boothd. It connects to the booth daemons running at the other sites and exchanges connectivity details.
CCM (consensus cluster membership)
The CCM determines which nodes make up the cluster and shares this information across the cluster. The CCM reports any node that joins or leaves the cluster, as well as any loss of quorum. A CCM module runs on each node of the cluster.
CIB (cluster information base)
A representation of the whole cluster configuration and status (cluster options, nodes, resources, constraints, and their relationships to each other). It is written in XML and resides in memory. A master CIB is kept and maintained on the DC (designated coordinator) and replicated to the other nodes. Normal read and write operations on the CIB are serialized through the master CIB.
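To make the XML layout more concrete, the following Python sketch parses a heavily trimmed, hypothetical CIB and lists its nodes, resources, and constraints. The section names (configuration, nodes, resources, constraints, status) follow the general CIB layout; the node names, resource IDs, and the helper `summarize_cib` are made up for illustration, and a real CIB carries many more attributes. On a live cluster you would query the CIB with the cluster tools (for example `cibadmin --query`) rather than parse a hand-written string.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed, hypothetical CIB: only the overall layout is realistic.
CIB_XML = """
<cib>
  <configuration>
    <crm_config/>
    <nodes>
      <node id="1" uname="alice"/>
      <node id="2" uname="bob"/>
    </nodes>
    <resources>
      <primitive id="admin-ip" class="ocf" provider="heartbeat" type="IPaddr2"/>
      <primitive id="web-server" class="ocf" provider="heartbeat" type="apache"/>
    </resources>
    <constraints>
      <rsc_colocation id="web-with-ip" rsc="web-server" with-rsc="admin-ip" score="INFINITY"/>
    </constraints>
  </configuration>
  <status/>
</cib>
"""


def summarize_cib(xml_text: str) -> dict:
    """Collect node names, resource IDs, and constraint IDs from a CIB document."""
    root = ET.fromstring(xml_text.strip())
    config = root.find("configuration")
    return {
        "nodes": [n.get("uname") for n in config.iter("node")],
        "resources": [p.get("id") for p in config.iter("primitive")],
        "constraints": [c.get("id") for c in config.find("constraints")],
    }


print(summarize_cib(CIB_XML))
```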
cluster
A high-performance cluster is a group of computers (real or virtual) sharing the application load to achieve faster results. A high-availability cluster is designed primarily to secure the highest possible availability of services.
cluster partition
Whenever communication fails between one or more nodes and the rest of the cluster, a cluster partition occurs. The nodes of the cluster are split into partitions but remain active. They can only communicate with nodes in the same partition and are unaware of the separated nodes. As the loss of the nodes on the other partition cannot be confirmed, a split brain scenario develops (see also split brain).
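How a cluster falls apart into partitions can be modeled as finding connected groups of nodes over the links that still work. The sketch below is a simplified Python model that assumes reachability is transitive (if A reaches B and B reaches C, all three end up in one partition); the node names and the `partitions` helper are hypothetical, and real membership protocols such as Corosync's are considerably more involved.

```python
from collections import defaultdict


def partitions(nodes, links):
    """Group nodes into partitions: sets of nodes that can still reach each other.

    `links` lists the node pairs whose direct communication still works.
    Reachability is treated as transitive (a simplifying assumption).
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    result = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collects every node reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


nodes = ["alice", "bob", "carol", "dave", "eve"]
# The link between carol and dave has failed, splitting the cluster in two.
links = [("alice", "bob"), ("bob", "carol"), ("dave", "eve")]
print([sorted(p) for p in partitions(nodes, links)])
# [['alice', 'bob', 'carol'], ['dave', 'eve']]
```

Neither side can tell whether the other side has crashed or is merely unreachable, which is exactly the uncertainty that leads to split brain.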
concurrency violation
A resource that should be running on only one node in the cluster is running on several nodes.
conntrack tools
Tools that allow interaction with the in-kernel connection tracking system, enabling stateful packet inspection for iptables. Used by the High Availability Extension to synchronize the connection status between cluster nodes.
CRM (cluster resource manager)
The main management entity responsible for coordinating all non-local interactions. The High Availability Extension uses Pacemaker as CRM. Each node of the cluster has its own CRM instance, but the one running on the DC is the one elected to relay decisions to the other non-local CRMs and process their input. A CRM interacts with several components: local resource managers (both on its own node and on the other nodes), non-local CRMs, administrative commands, the fencing functionality, the membership layer, and booth.
crmd (cluster resource manager daemon)
The CRM is implemented as a daemon, crmd, which has an instance on each cluster node. All cluster decision-making is centralized by electing one of the crmd instances to act as a master. If the elected crmd process (or the node it runs on) fails, a new one is established.
crmsh
The command line utility crmsh manages your cluster, nodes, and resources.
See Section 7.0, Configuring and Managing Cluster Resources (Command Line) for more information.
Csync2
A synchronization tool that can be used to replicate configuration files across all nodes in the cluster, and even across Geo clusters.
DC (designated coordinator)
One CRM in the cluster is elected as the Designated Coordinator (DC). The DC is the only entity in the cluster that can decide that a cluster-wide change needs to be performed, such as fencing a node or moving resources around. The DC is also the node where the master copy of the CIB is kept. All other nodes get their configuration and resource allocation information from the current DC. The DC is elected from all nodes in the cluster after a membership change.
Disaster
Unexpected interruption of critical infrastructure induced by nature, humans, hardware failure, or software bugs.
Disaster Recovery
Disaster recovery is the process by which a business function is restored to the normal, steady state after a disaster.
Disaster Recovery Plan
A strategy to recover from a disaster with minimum impact on IT infrastructure.
DLM (distributed lock manager)
DLM coordinates disk access for clustered file systems and administers file locking to increase performance and availability.
DRBD
DRBD is a block device designed for building high availability clusters. The whole block device is mirrored via a dedicated network and is seen as a network RAID-1.
existing cluster
The term existing cluster is used to refer to any cluster that consists of at least one node. Existing clusters have a basic Corosync configuration that defines the communication channels, but they do not necessarily have a resource configuration yet.
failover
Occurs when a resource or node fails on one machine and the affected resources are started on another node.
failover domain
A named subset of cluster nodes that are eligible to run a cluster service if a node fails.
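A minimal Python sketch of how a failover target might be chosen from a failover domain, assuming the domain is an ordered preference list of nodes. The function name and node names are hypothetical; in Pacemaker, this kind of placement preference is normally expressed with location constraints and scores rather than an explicit list.

```python
def pick_failover_node(domain, online, failed):
    """Return the first node in the failover domain that is online and not the failed node.

    `domain` is an ordered list of eligible nodes (assumed preference order).
    Returns None if no eligible node is available, i.e. the service stays down.
    """
    for node in domain:
        if node != failed and node in online:
            return node
    return None


domain = ["alice", "bob", "carol"]
print(pick_failover_node(domain, online={"bob", "carol"}, failed="alice"))  # bob
print(pick_failover_node(domain, online=set(), failed="alice"))             # None
```

Nodes outside the domain are never considered, even if they are online, which is what restricts a service to its named subset of nodes.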
fencing
Describes the concept of preventing access to a shared resource by isolated or failing cluster members. Should a cluster node fail, it will be shut down or reset to prevent it from causing trouble. This way, resources are locked out of a node whose status is uncertain.
geo cluster (geographically dispersed cluster)
See Geo cluster.
load balancing
The ability to make several servers participate in the same service and do the same work.
local cluster
A single cluster in one location (for example, all nodes are located in one data center). Network latency is negligible. Storage is typically accessed synchronously by all nodes.
LRM (local resource manager)
Responsible for performing operations on resources. It uses the resource agent scripts to carry out these operations. The LRM is dumb in that it does not know of any policy. It needs the DC to tell it what to do.
mcastaddr (multicast address)
IP address to be used for multicasting by the Corosync executive. The IP address can be either IPv4 or IPv6.
mcastport (multicast port)
The port to use for cluster communication.
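The two settings above can be sanity-checked before they are written to the Corosync configuration. The Python sketch below uses the standard `ipaddress` module to confirm that mcastaddr is a multicast address (IPv4 or IPv6) and that mcastport is a plausible UDP port. The lower bound of 2 reflects the note in the corosync.conf(5) manual page that Corosync also uses mcastport - 1 for sending; the function name and example values are hypothetical.

```python
import ipaddress


def check_multicast_settings(mcastaddr: str, mcastport: int) -> None:
    """Raise ValueError if the pair cannot be used for multicast cluster traffic."""
    # ip_address() accepts both IPv4 and IPv6 notation.
    addr = ipaddress.ip_address(mcastaddr)
    if not addr.is_multicast:
        raise ValueError(f"{mcastaddr} is not a multicast address")
    # Lower bound of 2 so that mcastport - 1 is also a usable UDP port.
    if not 2 <= mcastport <= 65535:
        raise ValueError(f"{mcastport} is outside the usable UDP port range")


check_multicast_settings("239.255.1.1", 5405)  # IPv4 multicast: accepted
check_multicast_settings("ff15::1", 5405)      # IPv6 multicast: accepted
try:
    check_multicast_settings("192.168.1.1", 5405)
except ValueError as err:
    print(err)  # 192.168.1.1 is not a multicast address
```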
metro cluster
A single cluster that can stretch over multiple buildings or data centers, with all sites connected by Fibre Channel. Network latency is usually low (<5 ms for distances of approximately 20 miles). Storage is frequently replicated (mirroring or synchronous replication).
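A quick calculation shows why distance alone hardly matters at metro scale. Assuming light in optical fibre travels at roughly 200,000 km/s (about two thirds of its speed in a vacuum), a 20-mile fibre path adds only about a third of a millisecond of round-trip propagation delay; most of the remaining latency budget is typically consumed by network equipment and indirect cable routes. The constants and function below are illustrative assumptions, not measured values.

```python
MILES_TO_KM = 1.609344
# Assumption: light in fibre covers roughly 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def fibre_round_trip_ms(distance_miles: float) -> float:
    """Propagation-only round-trip time over a straight fibre run."""
    one_way_ms = distance_miles * MILES_TO_KM / FIBRE_KM_PER_MS
    return 2 * one_way_ms


print(round(fibre_round_trip_ms(20), 2))  # 0.32
```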
multicast
A technology used for one-to-many communication within a network that can be used for cluster communication. Corosync supports both multicast and unicast.
Geo cluster
Consists of multiple, geographically dispersed sites with a local cluster each. The sites communicate via IP. Failover across the sites is coordinated by a higher-level entity, the booth. Geo clusters need to cope with limited network bandwidth and high latency. Storage is replicated asynchronously.
node
Any computer (real or virtual) that is a member of a cluster and invisible to the user.
PE (policy engine)
The policy engine computes the actions that need to be taken to implement policy changes in the CIB. The PE also produces a transition graph containing a list of (resource) actions and dependencies to achieve the next cluster state. The PE always runs on the DC.
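A transition graph is essentially a dependency graph of actions, so it can be executed in topological order. The following Python sketch (Python 3.9 or later, for `graphlib`) groups a hypothetical set of actions into waves: every action in a wave has all its prerequisites completed, so actions within one wave could run in parallel. The scenario (moving a web server, its IP address, and an independent database from alice to bob), the action names, and the `execution_waves` helper are invented for this example and do not reflect Pacemaker's internal action naming or its executor.

```python
from graphlib import TopologicalSorter

# Hypothetical transition: each action maps to the actions that must finish first.
transition = {
    "web_stop_alice": set(),
    "db_stop_alice": set(),
    "ip_stop_alice": {"web_stop_alice"},   # stop the web server before its IP
    "db_start_bob": {"db_stop_alice"},
    "ip_start_bob": {"ip_stop_alice"},
    "web_start_bob": {"ip_start_bob"},     # start the IP before the web server
}


def execution_waves(graph):
    """Group actions into waves; each wave only depends on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(transition), start=1):
    print(number, wave)
```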
quorum
In a cluster, a cluster partition is defined to have quorum (is quorate) if it has the majority of nodes (or votes). Quorum distinguishes exactly one partition. It is part of the algorithm to prevent several disconnected partitions or nodes from proceeding and causing data and service corruption (split brain). Quorum is a prerequisite for fencing, which then ensures that quorum is indeed unique.
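The majority rule can be written as a one-line check. In the Python sketch below, a partition is quorate only if it holds strictly more than half of all votes, which guarantees that two disjoint partitions can never both be quorate. The function name is hypothetical and each node is assumed to carry one vote; Corosync's actual quorum handling offers further options (for example, special handling for two-node clusters).

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition is quorate only with a strict majority of all votes."""
    return 2 * partition_votes > total_votes


# Vote counts of the partitions after a split.
for split in [(3, 2), (2, 2), (4, 1)]:
    total = sum(split)
    print(split, [has_quorum(votes, total) for votes in split])
# (3, 2) [True, False]
# (2, 2) [False, False]
# (4, 1) [True, False]
```

The 2/2 case shows that an even split leaves no partition quorate, so neither side may proceed on its own.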
RA (resource agent)
A script acting as a proxy to manage a resource (for example, to start, stop, or monitor a resource). The High Availability Extension supports three different kinds of resource agents: OCF (Open Cluster Framework) resource agents, LSB (Linux Standard Base) resource agents (standard LSB init scripts), and Heartbeat resource agents. For more information, refer to Section 4.2.2, Supported Resource Agent Classes.
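To show the proxy role concretely, here is a toy Python resource agent that dispatches the start, stop, and monitor actions and reports OCF-style exit codes. The numeric codes (0 for success, 3 for an unimplemented action, 7 for not running) follow the OCF conventions; everything else, including the in-memory `_state` standing in for a real service and the `run` helper, is hypothetical. Real OCF agents are usually shell scripts that receive the action as a command-line argument, exit with the code, and also implement actions such as meta-data and validate-all.

```python
# Exit codes from the OCF resource agent conventions (subset).
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical in-memory state standing in for a real service.
_state = {"running": False}


def start() -> int:
    _state["running"] = True
    return OCF_SUCCESS


def stop() -> int:
    # Stopping an already stopped service must also succeed.
    _state["running"] = False
    return OCF_SUCCESS


def monitor() -> int:
    return OCF_SUCCESS if _state["running"] else OCF_NOT_RUNNING


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def run(action: str) -> int:
    """Dispatch an action name to its handler and return the exit code."""
    handler = ACTIONS.get(action)
    return handler() if handler else OCF_ERR_UNIMPLEMENTED


print(run("monitor"))  # 7: the service has not been started
print(run("start"))    # 0
print(run("monitor"))  # 0
print(run("promote"))  # 3: not implemented in this sketch
```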
ReaR (Relax-and-Recover)
An administrator tool set for creating disaster recovery images.
resource
Any type of service or application that is known to Pacemaker. Examples include an IP address, a file system, or a database.
The term resource is also used for DRBD, where it names a set of block devices that are using a common connection for replication.
RRP (redundant ring protocol)
Allows the use of multiple redundant local area networks for resilience against partial or total network faults. This way, cluster communication can still be kept up as long as a single network is operational. Corosync supports the Totem Redundant Ring Protocol.
SBD (STONITH block device)
In an environment where all nodes have access to shared storage, a small partition is used for disk-based fencing.
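The idea of disk-based fencing can be sketched as message passing through shared storage. In the simplified Python model below, each node owns a message slot on the shared partition; fencing a node means writing a reset message into its slot, and each node's daemon watches its own slot and resets itself when it finds that message. The `SharedDisk` class and the helper functions are hypothetical stand-ins; a real SBD setup also relies on a hardware watchdog so that a hung node is still reset.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDisk:
    """Hypothetical stand-in for the small SBD partition: one message slot per node."""
    slots: dict = field(default_factory=dict)

    def allocate(self, node: str) -> None:
        self.slots.setdefault(node, "clear")

    def write(self, node: str, message: str) -> None:
        if node not in self.slots:
            raise KeyError(f"no slot allocated for {node}")
        self.slots[node] = message

    def read(self, node: str) -> str:
        return self.slots[node]


def fence(disk: SharedDisk, victim: str) -> None:
    # The fencing node does not contact the victim over the network;
    # it only leaves a message in the victim's slot on shared storage.
    disk.write(victim, "reset")


def poll_own_slot(disk: SharedDisk, me: str) -> str:
    # Each node watches its own slot; on "reset" it would reboot itself.
    return "self-reset" if disk.read(me) == "reset" else "ok"


disk = SharedDisk()
for name in ("alice", "bob"):
    disk.allocate(name)
fence(disk, "bob")
print(poll_own_slot(disk, "alice"))  # ok
print(poll_own_slot(disk, "bob"))    # self-reset
```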
SFEX (shared disk file exclusiveness)
SFEX provides storage protection over SAN.
split brain
A scenario in which the cluster nodes are divided into two or more groups that do not know of each other (either through a software or hardware failure). STONITH prevents a split brain situation from badly affecting the entire cluster. Also known as a partitioned cluster scenario.
The term split brain is also used in DRBD but means that the two nodes contain different data.
SPOF (single point of failure)
Any component of a cluster that, should it fail, triggers the failure of the entire cluster.
STONITH
The acronym for "Shoot the other node in the head". It refers to the fencing mechanism that shuts down a misbehaving node to prevent it from causing trouble in a cluster.
switchover
Planned, on-demand moving of services to other nodes in a cluster. See failover.
ticket
A component used in Geo clusters. A ticket grants the right to run certain resources on a specific cluster site. A ticket can only be owned by one site at a time. Resources can be bound to a certain ticket by dependencies. The respective resources are started only if the defined ticket is available at a site. Conversely, if the ticket is removed, the resources depending on that ticket are automatically stopped.
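The ticket rules can be captured in a small Python model: a ticket has at most one owner site, resources are bound to tickets, and a site may run a resource only while it holds that resource's ticket. The `TicketRegistry` class, the ticket names, and the site names are hypothetical; in a real Geo cluster, tickets are managed by booth and the resource dependencies are defined as ticket constraints in the cluster configuration.

```python
class TicketRegistry:
    """Toy model: a ticket is owned by at most one site at a time."""

    def __init__(self):
        self.owner = {}        # ticket name -> site currently holding it
        self.dependents = {}   # ticket name -> resources bound to it

    def bind(self, ticket, resource):
        self.dependents.setdefault(ticket, []).append(resource)

    def grant(self, ticket, site):
        holder = self.owner.get(ticket)
        if holder is not None and holder != site:
            raise RuntimeError(f"{ticket} is already granted to {holder}")
        self.owner[ticket] = site

    def revoke(self, ticket):
        # Removing the ticket implicitly stops every resource bound to it.
        self.owner.pop(ticket, None)

    def running_resources(self, site):
        """Resources allowed to run at `site`: those whose ticket the site holds."""
        return sorted(
            res
            for ticket, resources in self.dependents.items()
            if self.owner.get(ticket) == site
            for res in resources
        )


reg = TicketRegistry()
reg.bind("ticket-A", "web-server")
reg.bind("ticket-A", "database")
reg.grant("ticket-A", "site-1")
print(reg.running_resources("site-1"))  # ['database', 'web-server']
print(reg.running_resources("site-2"))  # []
reg.revoke("ticket-A")
print(reg.running_resources("site-1"))  # []
```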
unicast
A technology for sending messages to a single network destination. Corosync supports both multicast and unicast. In Corosync, unicast is implemented as UDP-unicast (UDPU).