The holy cluster. Part 2 HAE Setup
Welcome to part 2! Still not braindead after part one (Holy Cluster Part 1 ISCSI Setup)? Fantastic! So here we go. In part 1 I showed you how to set up ISCSI shared storage for the cluster. In this second part I will show you how to set up the HAE servers, including creating the STONITH Block Device (SBD) used for the cluster “heartbeat/split brain avoidance”, and I will also go through the initial setup of the cluster until it is up and running.
So let’s not waste time and continue with the fun part. Compared to the ISCSI part, the HAE part is the fairly easy section of the guide ;-).
1. Prerequisites
What you should have in place right now from part 1 is:
1 x SLES 12 SP2 Server running ISCSI LIO server
Shared storage for the SBD device and a shared device for later use in part 3.
2 x SLES 12 SP2 servers as cluster nodes + ISCSI initiators, having the above devices mapped already.
The High Availability Extension (HAE) also needs to be installed on those two servers.
For the rest of this guide please also note the following naming:
ISCSI LIO Server: skinner.simpsons.gov
Cluster Nodes: bart.simpsons.gov
maggy.simpsons.gov
2. Creating the shared SBD device for cluster “heartbeat/split brain avoidance”
Let’s find the corresponding device for the SBD.
bart:~ # ll /dev/disk/by-path/
total 0
lrwxrwxrwx 1 root root 9 Jul 11 08:23 ip-192.168.2.235:3260-iscsi-iqn.2003-01.org.linux-iscsi.skinner.x8664:sn.1f9f4047108d-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root 9 Jul 11 08:23 ip-192.168.2.235:3260-iscsi-iqn.2003-01.org.linux-iscsi.skinner.x8664:sn.1f9f4047108d-lun-1 -> ../../sdc
....
The above shows us that the sbd device (lun-0) is mapped to sdb. As we are going to use the device ID with the SBD config let’s find out the ID:
bart:~ # ll /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root 9 Jul 10 19:42 scsi-36001405b17796fe35ea4f19917a2b53f -> ../../sdc
lrwxrwxrwx 1 root root 9 Jul 10 19:42 scsi-36001405c746d5e5b05c479eb99d12d97 -> ../../sdb
.....
The above shows us that the device ID for the corresponding device is
scsi-36001405c746d5e5b05c479eb99d12d97
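If you do not want to match the two listings by eye, the lookup can be scripted. The following is only a minimal sketch: the function name by_id_for is made up for this guide, and it assumes GNU readlink (for the -f option) is available, which is the case on SLES.

```shell
# Print every entry of a by-id style directory whose symlink resolves
# to the same device as the first argument.
# by_id_for is a hypothetical helper name, not part of any SUSE tool.
# Usage: by_id_for /dev/sdb [/dev/disk/by-id]
by_id_for() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty directory).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Called as by_id_for /dev/sdb on bart it should print the /dev/disk/by-id/ path that points to sdb. A device can have more than one by-id entry, so expect one line per matching link.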
Now we can create the SBD device. As the device is shared, this only needs to be done once, from one of the nodes (running it again from the second node would just overwrite the header):
bart:~ # sbd -d /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 create
Initializing device /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97
Creating version 2.1 header on device 4 (uuid: 92eccbda-d6b6-4196-8e37-6b6d97379af1)
Initializing 255 slots on device 4
Device /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 is initialized.
Looking up the ID for the SBD on maggy shows that we indeed are using the exact same devices here:
maggy:~ # ll /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root 9 Jul 10 19:42 scsi-36001405b17796fe35ea4f19917a2b53f -> ../../sdc
lrwxrwxrwx 1 root root 9 Jul 10 19:42 scsi-36001405c746d5e5b05c479eb99d12d97 -> ../../sdb
.....
To enable the watchdog kernel module, which SBD needs for self-fencing (aka SMITH, “shoot myself in the head”), load it into the running kernel on both nodes:
bart:~ # modprobe -v softdog
insmod /lib/modules/4.4.59-92.24-default/kernel/drivers/watchdog/softdog.ko
To make sure the module is also loaded after a reboot, create a file under /etc/modules-load.d/ :
bart:~ # vi /etc/modules-load.d/softdog.conf
With the following content:
softdog
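If you prefer not to open an editor on every node, the same file can be written from the shell. This is just a sketch: persist_module is a name I made up for this guide, and the optional second argument only exists so you can try it against a scratch directory first.

```shell
# Write a one-line modules-load.d file so the module is loaded at boot.
# persist_module is a hypothetical helper name.
# Usage: persist_module softdog [/etc/modules-load.d]
persist_module() {
    module=$1
    dir=${2:-/etc/modules-load.d}
    # The file name is free to choose as long as it ends in .conf.
    printf '%s\n' "$module" > "$dir/$module.conf"
}
```

Running persist_module softdog as root on bart and on maggy should leave you with the same /etc/modules-load.d/softdog.conf as the manual edit above.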
Now we need to enable the sbd service to also be automatically started during boot:
maggy:~ # systemctl enable sbd.service
Dumping the SBD configuration it should look like this:
bart:~ # sbd -d /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 dump
==Dumping header on disk /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97
Header version : 2.1
UUID : 4017caae-6a35-41bf-aec6-2ea838d10a9d
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 is dumped
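In the dump above msgwait (10) is twice the watchdog timeout (5), which matches the usual rule of thumb for SBD. If you ever change the timeouts, a small check like the following can catch a bad combination. It is a rough sketch only: check_sbd_timeouts is a made-up name, and it assumes the dump keeps the “Timeout (name) : value” layout shown above.

```shell
# Read "sbd ... dump" output on stdin and check msgwait >= 2 x watchdog.
# check_sbd_timeouts is a hypothetical helper name.
# Exit codes: 0 = ok, 1 = msgwait too small, 2 = timeouts not found.
check_sbd_timeouts() {
    awk -F':' '
        /Timeout \(watchdog\)/ { w = $2 + 0 }
        /Timeout \(msgwait\)/  { m = $2 + 0 }
        END {
            if (w == 0 || m == 0) { print "timeouts not found"; exit 2 }
            if (m >= 2 * w) { print "ok: msgwait=" m " watchdog=" w; exit 0 }
            print "warn: msgwait=" m " is less than 2 x watchdog=" w
            exit 1
        }'
}
```

Usage would be to pipe the dump into it, for example: sbd -d /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 dump | check_sbd_timeouts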
3. Creating initial cluster configuration
First we need to copy the corosync.conf template (corosync handles the network communication between the cluster nodes) under /etc/corosync/ so that the ha-cluster-init script can write its configuration to it during the initial cluster configuration:
bart:~ # cd /etc/corosync/
bart:/etc/corosync # ll
total 8
lrwxrwxrwx 1 root root 54 Jul 10 18:30 corosync.conf.example -> /usr/share/doc/packages/corosync/corosync.conf.example
lrwxrwxrwx 1 root root 59 Jul 10 18:30 corosync.conf.example.unicast -> /usr/share/doc/packages/corosync/corosync.conf.example.udpu
drwxr-xr-x 1 root root 0 Sep 28 2016 uidgid.d
bart:/etc/corosync # cp corosync.conf.example corosync.conf
Next we will start the ha-cluster-init script to create the initial cluster configuration. Please see the script’s output below in order to know what to answer for each question raised during the script’s execution:
bart:/etc/corosync # ha-cluster-init
Enabling sshd.service
Generating ssh key
Configuring csync2
Generating csync2 shared key (this may take a while)...done
Enabling csync2.socket
csync2 checking files
Configure Corosync:
This will configure the cluster messaging layer. You will need to specify a network address over which to communicate (default is eth0's network, but you can use the network address of any active interface), a multicast address and multicast port.
/etc/corosync/corosync.conf already exists - overwrite? [y/N] y
Network address to bind to (e.g.: 192.168.1.0) [192.168.2.0]
Multicast address (e.g.: 239.x.x.x) [239.7.10.129]
Multicast port [5405]
Configure SBD:
If you have shared storage, for example a SAN or iSCSI target, you can use it avoid split-brain scenarios by configuring SBD. This requires a 1 MB partition, accessible to all nodes in the cluster. The device path must be persistent and consistent across all nodes in the cluster, so /dev/disk/by-id/* devices are a good choice. Note that all data on the partition you specify here will be destroyed.
Do you wish to use SBD? [y/N] y
Path to storage device (e.g. /dev/disk/by-id/...) [] /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97
All data on /dev/disk/by-id/scsi-36001405c746d5e5b05c479eb99d12d97 will be destroyed
Are you sure you wish to use this device [y/N] y
Initializing SBD......done
Enabling hawk.service
HA Web Konsole is now running, to see cluster status go to:
https://192.168.2.238:7630/
Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
Enabling pacemaker.service
Waiting for cluster........done
Loading initial configuration
Configure Administration IP Address:
Optionally configure an administration virtual IP address. The purpose of this IP address is to provide a single IP that can be used to interact with the cluster, rather than using the IP address of any specific cluster node.
Do you wish to configure an administration IP? [y/N] N
Done (log saved to /var/log/ha-cluster-bootstrap.log)
If everything ran through without any error then congratulations, you just configured your first cluster node! You should now be able to check that it is up and running via the cluster resource manager monitor:
bart:/etc/corosync # crm_mon
The output should look like this:
Stack: corosync
Current DC: bart (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 19:55:31 2017 Last change: Mon Jul 10 19:54:38 2017 by root via cibadmin on bart
1 node configured
1 resource configured
Online: [ bart ]
stonith-sbd (stonith:external/sbd): Started bart
If the above worked out and the first cluster node bart is running happily, it is time to put a friend at his side. So we will configure the second node maggy to join the cluster:
On the second node executing the ha-cluster-join script is actually all that we need to do if all the other prerequisites have been met:
maggy:/etc/corosync # ha-cluster-join
Join This Node to Cluster:
You will be asked for the IP address of an existing node, from which configuration will be copied. If you have not already configured passwordless ssh between nodes, you will be prompted for the root password of the existing node.
IP address or hostname of existing node (e.g.: 192.168.1.1) [] 192.168.2.238
Enabling sshd.service
Retrieving SSH keys from 192.168.2.238
Password:
One new SSH key installed
Configuring csync2
Enabling csync2.socket
Merging known_hosts
Probing for new partitions......done
Enabling hawk.service
HA Web Konsole is now running, to see cluster status go to:
https://192.168.2.239:7630/
Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
Enabling pacemaker.service
Waiting for cluster....done
Done (log saved to /var/log/ha-cluster-bootstrap.log)
If all worked out and ran through without any error then congratulations you just configured your second cluster node and should be able to monitor if it is up and running now via the cluster resource manager monitor as well. That was easy wasn’t it ;-)!? :
maggy:/etc/corosync # crm_mon
Stack: corosync
Current DC: bart (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 19:58:26 2017 Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart
2 nodes configured
1 resource configured
Online: [ bart maggy ]
stonith-sbd (stonith:external/sbd): Started bart
Executing cluster resource manager monitor on the first node bart again should also show the second node maggy now as well:
bart:/etc/corosync # crm_mon
Stack: corosync
Current DC: bart (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 19:59:04 2017 Last change: Mon Jul 10 19:54:38 2017 by root via cibadmin on bart
2 nodes configured
1 resource configured
Online: [ bart maggy ]
stonith-sbd (stonith:external/sbd): Started bart
4. Optionals, additionals and basic config changes
First of all congratulations again you just configured your first SLES 12 SP2 HAE cluster! Now you can start playing around with it.
As a first test you can check that when one node goes down the other node takes over, and that the failed node rejoins the cluster fine after rebooting. For this you will need to “shoot node one in the head” by executing:
bart:/etc/corosync # echo b > /proc/sysrq-trigger
While executing the above and shooting bart into the head have the cluster resource manager monitor running on maggy to monitor what happens.
maggy:/etc/corosync # crm_mon
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:01:55 2017 Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart
2 nodes configured
1 resource configured
Node bart: UNCLEAN (offline)
Online: [ maggy ]
stonith-sbd (stonith:external/sbd): Started[ bart maggy ]
While bart is being rebooted crm_mon will show the following:
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:02:13 2017 Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart
2 nodes configured
1 resource configured
Online: [ maggy ]
OFFLINE: [ bart ]
stonith-sbd (stonith:external/sbd): Started maggy
After bart restarted the crm_mon will show the following again:
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:02:42 2017 Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart
2 nodes configured
1 resource configured
Online: [ bart maggy ]
stonith-sbd (stonith:external/sbd): Started maggy
You also can have a look at the current cluster configuration:
maggy:/etc/corosync # crm configure show
node 1084752622: bart
node 1084752623: maggy
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30s
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true
Executing the same on bart shows the exact same configuration:
bart:~ # crm configure show
node 1084752622: bart
node 1084752623: maggy
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30s
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true
Some additional and useful options for crm_mon are:
maggy:/etc/corosync # crm_mon -n
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:08:19 2017 Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart
2 nodes configured
1 resource configured
Node bart: online
Node maggy: online
        stonith-sbd (stonith:external/sbd): Started
As you can see the above shows resources grouped by node.
Another useful option to only execute the command once and then exit the monitor again so that you do not need to “Ctrl + C” out of it is the following:
maggy:/etc/corosync # crm_mon -1
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:09:07 2017 Last change: Mon Jul 10 19:57:30 2017 by hacluster via crmd on bart
2 nodes configured
1 resource configured
Online: [ bart maggy ]
stonith-sbd (stonith:external/sbd): Started maggy
maggy:~ #
As you can see the above only gets the cluster status once and then exits again.
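Because crm_mon -1 prints the status once and returns, it is also handy for small scripts. Here is a minimal sketch that checks whether a given node appears in the “Online:” line. The function name node_is_online is made up for this guide, and the parsing assumes the output layout shown above (other Pacemaker versions may format it differently).

```shell
# Read "crm_mon -1" output on stdin and succeed only if the node given
# as first argument is listed in the "Online: [ ... ]" line.
# node_is_online is a hypothetical helper name.
node_is_online() {
    awk -v node="$1" '
        /^[[:space:]]*Online:/ {
            # Fields look like: Online: [ bart maggy ]
            for (i = 1; i <= NF; i++) if ($i == node) found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

A possible use after the reboot test would be: crm_mon -1 | node_is_online bart && echo "bart is back"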
Next you could add another Dummy resource just to be able to play around even more with it. And it will be needed for a possible third part of the guide which will show some more advanced cluster configuration.
The following will create a Dummy resource called dummy:
maggy:/etc/corosync # crm configure primitive dummy Dummy
Now looking at the cluster configuration again it’ll look like this:
maggy:/etc/corosync # crm configure show
node 1084752622: bart
node 1084752623: maggy
primitive dummy Dummy
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30s
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true
You see primitive dummy Dummy has been added to the configuration and it’ll show on the second node as well as it did get synced over immediately.
Cluster resource manager monitor will also immediately show the new Dummy resource as well:
maggy:/etc/corosync # crm_mon -n
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:11:33 2017 Last change: Mon Jul 10 20:09:49 2017 by root via cibadmin on maggy
2 nodes configured
2 resources configured
Node bart: online
        dummy (ocf::heartbeat:Dummy): Started
Node maggy: online
        stonith-sbd (stonith:external/sbd): Started
Of course you can also combine the crm_mon options, as outlined below using “-n1”:
bart:~ # crm_mon -n1
Stack: corosync
Current DC: maggy (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Mon Jul 10 20:12:42 2017 Last change: Mon Jul 10 20:09:49 2017 by root via cibadmin on maggy
2 nodes configured
2 resources configured
Node bart: online
        dummy (ocf::heartbeat:Dummy): Started
Node maggy: online
        stonith-sbd (stonith:external/sbd): Started
bart:~ #
Last but not least, as an optional “basic” configuration task, we can change the corosync configuration (network communication) to use unicast instead of always flooding the network with multicast. The only thing to watch out for is that when you add new nodes to the cluster via the ha-cluster-join script, you make sure they really have been added to the nodelist section of corosync.conf.
To configure unicast communication the following needs to be done.
Stop the pacemaker service on all nodes:
bart:~ # systemctl stop pacemaker
Check the status of pacemaker and corosync service and that they are indeed stopped
bart:~ # systemctl status pacemaker
bart:~ # systemctl status corosync
Edit corosync.conf: add the line transport: udpu to the totem section, add a nodelist section with one node entry per cluster node, and comment out the mcastaddr line:
bart:~ # vi /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
totem {
        version: 2
        secauth: on
        crypto_hash: sha1
        crypto_cipher: aes256
        cluster_name: hacluster
        clear_node_high_bit: yes
        token: 5000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 6000
        max_messages: 20
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.2.0
                # mcastaddr: 239.7.10.129
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: no
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
nodelist {
        node {
                ring0_addr: 192.168.2.238
                nodeid: 1
        }
        node {
                ring0_addr: 192.168.2.239
                nodeid: 2
        }
}
quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
}
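A typo in the transport line (it is udpu, not updu) is easy to make and annoying to find once the cluster refuses to start. Before syncing the file you could run a quick sanity check like the one below. It is only a sketch based on the configuration in this guide: check_udpu_conf is a made-up name, and the three checks (transport is udpu, at least two ring0_addr entries, mcastaddr commented out) simply mirror what I changed above rather than everything corosync itself validates.

```shell
# Sanity-check a corosync.conf for the unicast setup used in this guide.
# check_udpu_conf is a hypothetical helper name.
# Usage: check_udpu_conf [/etc/corosync/corosync.conf]
check_udpu_conf() {
    conf=${1:-/etc/corosync/corosync.conf}
    # 1. transport must be spelled exactly "udpu"
    grep -Eq '^[[:space:]]*transport:[[:space:]]*udpu[[:space:]]*$' "$conf" \
        || { echo "transport is not udpu"; return 1; }
    # 2. the nodelist needs an address for each of the two nodes
    nodes=$(grep -Ec '^[[:space:]]*ring0_addr:' "$conf" || true)
    [ "$nodes" -ge 2 ] \
        || { echo "fewer than 2 ring0_addr entries"; return 1; }
    # 3. this guide comments out the multicast address
    if grep -Eq '^[[:space:]]*mcastaddr:' "$conf"; then
        echo "mcastaddr is still active"
        return 1
    fi
    echo "ok: udpu with $nodes nodes"
}
```

Run it without arguments on the node where you edited the file. With the configuration above it should report ok: udpu with 2 nodes.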
After the above has been changed on one of the nodes we need to manually sync it to the other node. Hm, sync it without having the cluster running, how should this work? Yeah, we could use scp to do that, but an easier way is to use a nice tool that the cluster setup already configured for us. Everything is already set up, the ssh keys have been exchanged and so on. The tool that we are going to use here is csync2.
Perform the following in order to sync the changed corosync.conf from the current cluster node to all other nodes within the cluster ring:
bart:~ # cd /etc/corosync/
bart:/etc/corosync # csync2 -c corosync.conf -u
Last but not least we need to start the cluster again on all the nodes:
bart:~ # systemctl start pacemaker
So that was it for part 2. I hope I will be able to release part 3 soon after my summer vacation. Hope you had fun!? Take care and happy clustering!
Comments
Its very usable …Thank you very much for this .
Thanks. Chris.
I just have one question: I tried to use the ha-cluster-join script to join a node to the existing cluster. It gets to “Waiting for cluster” and then sits there for a long time with no output. I don’t know how to check this issue.
Depending on which version of SLES you are using you might run into issues when using unicast as outlined in my guide when joining new nodes into the cluster.
Maybe the following TID will help you solving this issue:
https://www.suse.com/de-de/support/kb/doc/?id=7021065
Hi Chris
I need your advice. For some reason the SBD device got corrupted and I deleted the SBD partition. I wanted to create it again, but that fails and says it already exists in the FS. How can I fix this situation?
Thank you
Dayal
Hi Dayal,
yes this can happen in very rare cases. Please have a look at the following TID in order to re-create the sbd and hopefully fix your issue:
https://www.suse.com/de-de/support/kb/doc/?id=7018194
Regards,
Chris