The holy cluster. Part 2 HAE Setup

July 13, 2017 | By: Chris Mertens

Welcome to part 2! Still not braindead after part one (Holy Cluster Part 1 ISCSI Setup)? Fantastic! So here we go. In part 1 I showed you how to set up ISCSI shared storage for the cluster. In this second part I will show you how to set up the HAE servers, including creating the STONITH Block Device (SBD) used for the cluster “heartbeat/split brain avoidance”, and will then go through the initial setup of the cluster until it is up and running.

So let’s not waste time and continue with the fun part. Compared to the ISCSI part, the HAE part is the fairly easy section of the guide ;-).

1. Prerequisites

What you should have in place right now from part 1 is:

1 x SLES 12 SP2 Server running ISCSI LIO server

Shared storage for the SBD device and a shared device for later use in part 3.

2 x SLES 12 SP2 servers as cluster nodes + ISCSI initiators, with the above devices already mapped.

Also, the High Availability Extension (HAE) needs to be installed on those servers.

For the rest of this guide please also note the following naming:
ISCSI LIO Server: skinner.simpsons.gov

Cluster Nodes: bart.simpsons.gov
maggy.simpsons.gov

2. Creating the shared SBD device for cluster “heartbeat/split brain avoidance”

Let’s find the corresponding device for the SBD.

bart:~ # ll /dev/disk/by-path/
total 0
lrwxrwxrwx 1 root root  9 Jul 11 08:23 ip-192.168.2.235:3260-iscsi-iqn.2003-01.org.linux-iscsi.skinner.x8664:sn.1f9f4047108d-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 11 08:23 ip-192.168.2.235:3260-iscsi-iqn.2003-01.org.linux-iscsi.skinner.x8664:sn.1f9f4047108d-lun-1 -> ../../sdc
....

The above shows us that the SBD device (lun-0) is mapped to sdb. As we are going to reference the device by its ID in the SBD config, let’s find out the ID:

bart:~ # ll /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Jul 10 19:42 scsi-36001405b17796fe35ea4f19917a2b53f -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 10 19:42 scsi-36001405c746d5e5b05c479eb99d12d97 -> ../../sdb
.....

The above shows us that the device ID for the corresponding device is

scsi-36001405c746d5e5b05c479eb99d12d97
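If you prefer not to match the symlinks by eye, the lookup can be scripted. The following is only a minimal sketch: the function name find_stable_names is made up for this example, and it assumes readlink -f (GNU coreutils) is available, which should be the case on SLES.

```shell
# Print every symlink in a by-id style directory that resolves to the given device.
# $1 = device node (e.g. /dev/sdb)
# $2 = directory holding the symlinks (assumed default: /dev/disk/by-id)
find_stable_names() {
    target=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty directory).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Calling find_stable_names /dev/sdb on bart should then list the by-id path shown above, possibly together with other aliases that point at the same disk.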

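Now we can create the SBD device. Because the device is shared storage, this only needs to be done once, from one of the nodes (running create again from the other node would simply re-initialize the same device):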

bart:~ # sbd -d /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 create
Initializing device /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97
Creating version 2.1 header on device 4 (uuid: 92eccbda-d6b6-4196-8e37-6b6d97379af1)
Initializing 255 slots on device 4
Device /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 is initialized.

Looking up the ID for the SBD on maggy shows that we are indeed using the exact same devices there:

maggy:~ # ll /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Jul 10 19:42 scsi-36001405b17796fe35ea4f19917a2b53f -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 10 19:42 scsi-36001405c746d5e5b05c479eb99d12d97 -> ../../sdb
.....

To enable the watchdog kernel module, which is needed for self-fencing (aka SMITH, “shoot myself in the head”), please load it into the running kernel on both nodes:

bart:~ # modprobe -v softdog
insmod /lib/modules/4.4.59-92.24-default/kernel/drivers/watchdog/softdog.ko

To make sure the module is also loaded after a reboot, please create a file under /etc/modules-load.d/ :

bart:~ # vi /etc/modules-load.d/softdog.conf

With the following content:

softdog
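The same file can also be written without opening an editor, which is handy when you have to repeat the step on every node. This is just a sketch; persist_module is a hypothetical helper name, and the optional second argument exists only so the function can be tried out against a scratch directory first.

```shell
# Write a one-line modules-load.d file so the module is loaded at boot.
# $1 = module name
# $2 = target directory (defaults to /etc/modules-load.d)
persist_module() {
    module=$1
    conf_dir=${2:-/etc/modules-load.d}
    printf '%s\n' "$module" > "$conf_dir/$module.conf"
}
```

Run as root, persist_module softdog creates /etc/modules-load.d/softdog.conf with the single line softdog, the same result as the vi step above.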

Now we need to enable the sbd service so that it is also started automatically during boot:

maggy:~ # systemctl enable sbd.service

Dumping the SBD configuration it should look like this:

bart:~ # sbd -d /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 dump
==Dumping header on disk /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97
Header version     : 2.1
UUID               : 4017caae-6a35-41bf-aec6-2ea838d10a9d
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
==Header on disk /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 is dumped
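The timeouts in the dump are worth a second look. A commonly cited rule of thumb for SBD is that msgwait should be at least twice the watchdog timeout, which is what the defaults above (5 and 10) give you. The sketch below parses the dump output and checks that relation; the function names sbd_timeout and check_sbd_timeouts are invented for this example and are not part of the sbd tool.

```shell
# Extract one timeout value from `sbd ... dump` output read on stdin.
# $1 = timeout name as printed in the dump (watchdog, allocate, loop, msgwait)
sbd_timeout() {
    sed -n "s/^Timeout ($1)[[:space:]]*:[[:space:]]*//p"
}

# Read a dump on stdin and check that msgwait >= 2 * watchdog.
check_sbd_timeouts() {
    dump=$(cat)
    wd=$(printf '%s\n' "$dump" | sbd_timeout watchdog)
    mw=$(printf '%s\n' "$dump" | sbd_timeout msgwait)
    if [ -z "$wd" ] || [ -z "$mw" ]; then
        echo "could not parse timeouts" >&2
        return 2
    fi
    if [ "$mw" -ge $((wd * 2)) ]; then
        echo "ok: msgwait=$mw watchdog=$wd"
    else
        echo "warning: msgwait=$mw is less than twice watchdog=$wd"
        return 1
    fi
}
```

Usage would be to pipe the dump command shown above into it, for example sbd -d /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 dump | check_sbd_timeouts.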

3. Creating initial cluster configuration

First we need to copy the corosync.conf template (corosync handles the network communication between cluster nodes) under /etc/corosync/ so that the ha-cluster-init script can write its configuration to it during the initial cluster configuration:

bart:~ # cd /etc/corosync/

bart:/etc/corosync # ll
total 8
lrwxrwxrwx 1 root root 54 Jul 10 18:30 corosync.conf.example -> /usr/share/doc/packages/corosync/corosync.conf.example
lrwxrwxrwx 1 root root 59 Jul 10 18:30 corosync.conf.example.unicast -> /usr/share/doc/packages/corosync/corosync.conf.example.udpu
drwxr-xr-x 1 root root  0 Sep 28  2016 uidgid.d

bart:/etc/corosync # cp corosync.conf.example corosync.conf

Next we will start the ha-cluster-init script to create the initial cluster configuration. Please see the script’s output below in order to know what to answer for each question raised during the script’s execution:

bart:/etc/corosync # ha-cluster-init
Enabling sshd.service
Generating ssh key
Configuring csync2
Generating csync2 shared key (this may take a while)...done
Enabling csync2.socket
csync2 checking files

Configure Corosync:
This will configure the cluster messaging layer. You will need
to specify a network address over which to communicate (default
is eth0's network, but you can use the network address of any
active interface), a multicast address and multicast port.

/etc/corosync/corosync.conf already exists - overwrite? [y/N] y
Network address to bind to (e.g.: 192.168.1.0) [192.168.2.0]
Multicast address (e.g.: 239.x.x.x) [239.7.10.129]
Multicast port [5405]

Configure SBD:
If you have shared storage, for example a SAN or iSCSI target,
you can use it avoid split-brain scenarios by configuring SBD.
This requires a 1 MB partition, accessible to all nodes in the
cluster. The device path must be persistent and consistent
across all nodes in the cluster, so /dev/disk/by-id/* devices
are a good choice. Note that all data on the partition you
specify here will be destroyed.

Do you wish to use SBD? [y/N] y
Path to storage device (e.g. /dev/disk/by-id/...) [] /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97
All data on /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 will be destroyed
Are you sure you wish to use this device [y/N] y
Initializing SBD......done
Enabling hawk.service
HA Web Konsole is now running, to see cluster status go to:
https://192.168.2.238:7630/
Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
Enabling pacemaker.service
Waiting for cluster........done
Loading initial configuration

Configure Administration IP Address:
Optionally configure an administration virtual IP
address. The purpose of this IP address is to
provide a single IP that can be used to interact
with the cluster, rather than using the IP address
of any specific cluster node.

Do you wish to configure an administration IP? [y/N] N
Done (log saved to /var/log/ha-cluster-bootstrap.log)

If everything ran through without any error, then congratulations: you just configured your first cluster node. You should now be able to check that it is up and running via the cluster resource manager monitor:

bart:/etc/corosync # crm_mon

The output should look like this:

Stack: corosync
Current DC: bart (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 19:55:31 2017
Last change: Mon Jul 10 19:54:38 2017 by root via cibadmin on bart

1 node configured
1 resource configured

Online: [ bart ]

stonith-sbd     (stonith:external/sbd): Started bart

If the above worked out and the first cluster node bart is running happily, it is time to give him a friend. So we will configure the second node maggy to join the cluster.

On the second node, executing the ha-cluster-join script is actually all that we need to do, provided all the other prerequisites have been met:

maggy:/etc/corosync # ha-cluster-join

Join This Node to Cluster:
  You will be asked for the IP address of an existing node, from which
  configuration will be copied.  If you have not already configured
  passwordless ssh between nodes, you will be prompted for the root
  password of the existing node.

  IP address or hostname of existing node (e.g.: 192.168.1.1) [] 192.168.2.238
  Enabling sshd.service
  Retrieving SSH keys from 192.168.2.238
Password:
  One new SSH key installed
  Configuring csync2
  Enabling csync2.socket
  Merging known_hosts
  Probing for new partitions......done
  Enabling hawk.service
    HA Web Konsole is now running, to see cluster status go to:
      https://192.168.2.239:7630/
    Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
  Enabling pacemaker.service
  Waiting for cluster....done
  Done (log saved to /var/log/ha-cluster-bootstrap.log)

If everything ran through without any error, then congratulations: you just configured your second cluster node and should be able to check that it is up and running via the cluster resource manager monitor as well. That was easy, wasn’t it ;-)!?

maggy:/etc/corosync # crm_mon

Stack: corosync
Current DC: bart (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 19:58:26 2017
Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart

2 nodes configured
1 resource configured

Online: [ bart maggy ]

stonith-sbd     (stonith:external/sbd): Started bart

Executing the cluster resource manager monitor on the first node bart again should now show the second node maggy as well:

bart:/etc/corosync # crm_mon

Stack: corosync
Current DC: bart (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 19:59:04 2017
Last change: Mon Jul 10 19:54:38 2017 by root via cibadmin on bart

2 nodes configured
1 resource configured

Online: [ bart maggy ]

stonith-sbd     (stonith:external/sbd): Started bart

4. Optionals, additionals and basic config changes

First of all, congratulations again: you just configured your first SLES 12 SP2 HAE cluster! Now you can start playing around with it.

As a first test you can check that when one node goes down the other node takes over, and that the failed node rejoins the cluster cleanly after rebooting. For this you will need to “shoot node one in the head” by executing:

bart:/etc/corosync # echo b > /proc/sysrq-trigger

Before executing the above and shooting bart in the head, have the cluster resource manager monitor running on maggy to watch what happens:

maggy:/etc/corosync # crm_mon

Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:01:55 2017
Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart

2 nodes configured
1 resource configured

Node bart: UNCLEAN (offline)
Online: [ maggy ]

stonith-sbd     (stonith:external/sbd): Started[ bart maggy ]

While bart is being rebooted crm_mon will show the following:

Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:02:13 2017
Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart

2 nodes configured
1 resource configured

Online: [ maggy ]
OFFLINE: [ bart ]

stonith-sbd     (stonith:external/sbd): Started maggy

After bart has restarted, crm_mon will show both nodes online again:

Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:02:42 2017
Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart

2 nodes configured
1 resource configured

Online: [ bart maggy ]

stonith-sbd     (stonith:external/sbd): Started maggy
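If you want to script this test instead of watching the monitor, you can parse the “Online:” line. The following is a rough sketch that assumes the default crm_mon layout shown above (not the -n view); the function names online_nodes and node_is_online are made up for illustration.

```shell
# Read crm_mon output on stdin and print the online nodes, one per line.
online_nodes() {
    sed -n 's/^Online: \[\(.*\)\].*$/\1/p' | tr -s ' ' '\n' | sed '/^$/d'
}

# Succeed if the node given as $1 appears in the Online list read from stdin.
node_is_online() {
    for n in $(online_nodes); do
        if [ "$n" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

A waiting loop on maggy could then look like: until crm_mon -1 | node_is_online bart; do sleep 5; done. The -1 option, which prints the status once and exits, is described a bit further down.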

You also can have a look at the current cluster configuration:

maggy:/etc/corosync # crm configure show
node 1084752622: bart
node 1084752623: maggy
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30s
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

Executing the same on bart shows the exact same configuration:

bart:~ # crm configure show
node 1084752622: bart
node 1084752623: maggy
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30s
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

Some additional and useful options for crm_mon are:

maggy:/etc/corosync # crm_mon -n

Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:08:19 2017
Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart

2 nodes configured
1 resource configured

Node bart: online
Node maggy: online
        stonith-sbd     (stonith:external/sbd): Started

As you can see the above shows resources grouped by node.

Another useful option executes the command only once and then exits the monitor again, so that you do not need to “Ctrl + C” out of it:

maggy:/etc/corosync # crm_mon -1
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:09:07 2017
Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart

2 nodes configured
1 resource configured

Online: [ bart maggy ]

 stonith-sbd    (stonith:external/sbd): Started maggy
maggy:~ #

As you can see, the above only gets the cluster status once and then exits again.

Next you could add a Dummy resource just to be able to play around even more. It will also be needed for a possible third part of the guide, which will show some more advanced cluster configuration.

The following will create a Dummy resource called dummy:

maggy:/etc/corosync # crm configure primitive dummy Dummy

Now looking at the cluster configuration again it’ll look like this:

maggy:/etc/corosync # crm configure show
node 1084752622: bart
node 1084752623: maggy
primitive dummy Dummy
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30s
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

You see that primitive dummy Dummy has been added to the configuration. It will show up on the second node as well, as it gets synced over immediately.

The cluster resource manager monitor will also immediately show the new Dummy resource:

maggy:/etc/corosync # crm_mon -n

Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:11:33 2017
Last change: Mon Jul 10 20:09:49 2017 by root via cibadmin on maggy

2 nodes configured
2 resources configured

Node bart: online
        dummy   (ocf::heartbeat:Dummy): Started
Node maggy: online
        stonith-sbd     (stonith:external/sbd): Started

Of course you can also combine the crm_mon options, as shown below using “-n1”:

bart:~ # crm_mon -n1
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:12:42 2017
Last change: Mon Jul 10 20:09:49 2017 by root via cibadmin on maggy

2 nodes configured
2 resources configured

Node bart: online
        dummy   (ocf::heartbeat:Dummy): Started
Node maggy: online
        stonith-sbd     (stonith:external/sbd): Started
bart:~ #
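Because the -n1 output is grouped by node and printed only once, it is easy to feed into a script. As a small sketch (resource_node is a hypothetical helper, and the parsing assumes exactly the layout shown above), this prints the node a given resource is listed under:

```shell
# Read `crm_mon -n1` output on stdin and print the node that lists resource $1.
resource_node() {
    awk -v rsc="$1" '
        # Remember the most recent "Node <name>: ..." header, without the colon.
        /^Node / { node = $2; sub(/:$/, "", node); next }
        # Resource lines are indented; their first field is the resource name.
        $1 == rsc { print node }
    '
}
```

For example, crm_mon -n1 | resource_node dummy would print bart for the state shown above.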

Last but not least, as an optional “basic” configuration task we can change the corosync configuration (network communication) to use unicast instead of always flooding the network with multicast. The only thing to watch out for is that when adding new nodes to the cluster, you make sure they really have been added by the ha-cluster-join script to the nodelist section of corosync.conf.

To configure unicast communication the following needs to be done.

Stop the pacemaker service on all nodes:

bart:~ # systemctl stop pacemaker

Check the status of the pacemaker and corosync services and that they are indeed stopped:

bart:~ # systemctl status pacemaker

bart:~ # systemctl status corosync

Edit the corosync.conf as shown below: comment out the mcastaddr line, add the transport line at the end of the totem section, and add the nodelist section:

bart:~ # vi /etc/corosync/corosync.conf

# Please read the corosync.conf.5 manual page

totem {
        version:        2
        secauth:        on
        crypto_hash:    sha1
        crypto_cipher:  aes256
        cluster_name:   hacluster
        clear_node_high_bit: yes

        token:          5000
        token_retransmits_before_loss_const: 10
        join:           60
        consensus:      6000
        max_messages:   20

        interface {
                ringnumber:     0
                bindnetaddr:    192.168.2.0
#               mcastaddr:      239.7.10.129
                mcastport:      5405
                ttl:            1
        }
        transport: udpu
}
logging {
        fileline:       off
        to_stderr:      no
        to_logfile:     no
        logfile:        /var/log/cluster/corosync.log
        to_syslog:      yes
        debug:          off
        timestamp:      on
        logger_subsys {
                subsys: QUORUM
                debug:  off
        }
}
nodelist {
        node {
                ring0_addr: 192.168.2.238
                nodeid: 1
        }
        node {
                ring0_addr: 192.168.2.239
                nodeid: 2
        }
}
quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
}
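A typo in the transport value or a missing nodelist entry is easy to overlook, so a quick sanity check before restarting the cluster does not hurt. The sketch below is a plain text check and not a real corosync.conf parser; the function names are invented for this example, and the “at least two ring0_addr entries” rule simply reflects the two-node cluster of this guide.

```shell
# Print the value of the transport setting in the given corosync.conf.
corosync_transport() {
    sed -n 's/^[[:space:]]*transport:[[:space:]]*//p' "$1"
}

# Print all ring0_addr values from the nodelist, one per line.
corosync_ring0_addrs() {
    sed -n 's/^[[:space:]]*ring0_addr:[[:space:]]*//p' "$1"
}

# Check that unicast (udpu) is set and that at least two nodes are listed.
# $1 = config file (defaults to /etc/corosync/corosync.conf)
check_unicast_conf() {
    conf=${1:-/etc/corosync/corosync.conf}
    transport=$(corosync_transport "$conf")
    nodes=$(( $(corosync_ring0_addrs "$conf" | wc -l) ))
    if [ "$transport" != "udpu" ]; then
        echo "transport is '$transport', expected 'udpu'"
        return 1
    fi
    if [ "$nodes" -lt 2 ]; then
        echo "only $nodes ring0_addr entries in nodelist"
        return 1
    fi
    echo "ok: udpu with $nodes nodes"
}
```

Running check_unicast_conf without arguments checks /etc/corosync/corosync.conf on the current node.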

After the above has been changed on one of the nodes, we need to manually sync it to the other node. Hm, sync it without having the cluster running, how should this work? Yes, we could use scp, but a far easier way is to use a nice tool that the cluster setup already configured for us. Everything is already set up, ssh keys have been exchanged and so on. The tool that we are going to use here is csync2.

Perform the following in order to sync the changed corosync.conf from the current cluster node to all other nodes within the cluster ring:

bart:~ # cd /etc/corosync/

bart:/etc/corosync # csync2 -c corosync.conf -u

Last but not least we need to start the cluster again on all the nodes:

bart:~ # systemctl start pacemaker

So that was it for part 2. I hope I will be able to release part 3 soon after my summer vacation. Hope you had fun!? Take care and happy clustering!

Comments

mohsinalmelkar says:

July 14, 2017 at 10:20 am

It’s very usable … Thank you very much for this.

Leilei says:

October 18, 2018 at 9:23 am

Thanks. Chris.

I just have one question: I tried using the ha-cluster-join script to join a node to the existing cluster, but it then sits at “Waiting for cluster” for a long time with no output. I don’t know how to check this issue.

Chris Mertens says:

November 16, 2018 at 12:08 pm

Depending on which version of SLES you are using, you might run into issues when joining new nodes into the cluster while using unicast as outlined in my guide.

Maybe the following TID will help you solve this issue:

https://www.suse.com/de-de/support/kb/doc/?id=7021065


Dayal says:

January 15, 2019 at 3:07 am

Hi Chris

I need your advice. For some reason the SBD device got corrupted and I deleted the SBD partition. I wanted to create it again, but that fails and says it already exists in the FS. How can I fix this situation?
Thank you
Dayal

Chris Mertens says:

January 15, 2019 at 7:07 am

Hi Dayal,

Yes, this can happen in very rare cases. Please have a look at the following TID in order to re-create the SBD and hopefully fix your issue:

https://www.suse.com/de-de/support/kb/doc/?id=7018194

Regards,
Chris
