Release Notes for SUSE Linux Enterprise High Availability Extension 11 Service Pack 4

Version 11.4.7 (2016-09-08)

Abstract

These release notes apply to all SUSE Linux Enterprise High Availability Extension 11 Service Pack 4 based products (e.g. for x86, x86_64, Itanium, Power and System z). Some sections may not apply to a particular architecture or product. Where this is not obvious, the respective architectures are listed explicitly in these notes. Instructions for installing SUSE Linux Enterprise High Availability Extension can be found in the README file on the CD.

Manuals can be found in the docu directory of the installation media. Any documentation (if installed) can be found in the /usr/share/doc/ directory of the installed system.

This SUSE product includes materials licensed to SUSE under the GNU General Public License (GPL). The GPL requires SUSE to provide the source code that corresponds to the GPL-licensed material. The source code is available for download at http://www.suse.com/download-linux/source-code.html. Also, for up to three years after distribution of the SUSE product, upon request, SUSE will mail a copy of the source code. Requests should be sent by e-mail to mailto:sle_source_request@suse.com or as otherwise instructed at http://www.suse.com/download-linux/source-code.html. SUSE may charge a reasonable fee to recover distribution costs.


1. Purpose and News
1.1. Purpose
1.2. What's New in SUSE Linux Enterprise High Availability Extension 11 SP4
2. Features and Versions
2.1. Resource Management
2.1.1. Data Replication—Distributed Remote Block Device (DRBD)
2.1.2. IP Load Balancing—Linux Virtual Server (LVS)
2.1.3. Distributed Lock Manager (DLM)
2.2. Other Changes and Version Updates
2.2.1. Support coredumps with STONITH Enabled and Timeouts for Kdump
2.2.2. IPVS support for iptables -m
3. Changed Functionality in SUSE Linux Enterprise High Availability Extension 11 SP4
4. Deprecated Functionality in SUSE Linux Enterprise High Availability Extension 11 SP4
5. Infrastructure, Package and Architecture Specific Information
5.1. Architecture Independent Information
5.1.1. Changes in Packaging and Delivery
5.1.2. Security
5.1.3. Network
5.2. Systems Management
5.2.1. Hawk Wizards for Common Configurations
5.2.2. Hawk Wizard for DB2 HADR
5.2.3. Hawk Wizard for DB2
5.2.4. Hawk Wizard for Configuring the Oracle Database
5.2.5. Hawk Wizards for Configuring Common Scenarios
5.3. AMD64/Intel64 64-Bit (x86_64) and Intel/AMD 32-Bit (x86) Specific Information
5.3.1. System and Vendor Specific Information
6. Other Updates
7. Update-Related Notes
8. Supported Deployment Scenarios for SUSE Linux Enterprise High Availability Extension 11 SP4
8.1. Local Cluster
8.2. Metro Area Cluster
8.3. Geographical Clustering
9. Known Issues in SUSE Linux Enterprise High Availability Extension 11 SP4
9.1. Linux Virtual Server Tunnelling Support
9.2. Samba CTDB Cluster Rolling Update Support
10. Further Notes on Functionality
10.1. Cluster-concurrent RAID1 Resynchronization
10.2. Quotas on OCFS2 Filesystem
11. Support Statement for SUSE Linux Enterprise High Availability Extension 11 SP4
12. Technical Information
13. Miscellaneous
13.1. crmsh: Enable Anonymous Shadow CIBs
14. More Information and Feedback

Chapter 1. Purpose and News

1.1. Purpose

SUSE Linux Enterprise High Availability Extension is an affordable, integrated suite of robust open source clustering technologies that enable enterprises to implement highly available Linux clusters and eliminate single points of failure.

Used with SUSE Linux Enterprise Server, it helps firms maintain business continuity, protect data integrity, and reduce unplanned downtime for their mission-critical Linux workloads.

SUSE Linux Enterprise High Availability Extension provides all of the essential monitoring, messaging, and cluster resource management functionality of proprietary third-party solutions, but at a more affordable price, making it accessible to a wider range of enterprises.

It is optimized to work with SUSE Linux Enterprise Server, and its tight integration ensures customers have the most robust, secure, and up to date high availability solution. Based on an innovative, highly flexible policy engine, it supports a wide range of clustering scenarios.

With static or stateless content, the High Availability cluster can be used without a cluster file system. This includes web-services with static content as well as printing systems or communication systems like proxies that do not need to recover data.

Finally, its open source license minimizes the risk of vendor lock-in, and its adherence to open standards encourages interoperability with industry standard tools and technologies.

1.2. What's New in SUSE Linux Enterprise High Availability Extension 11 SP4

In Service Pack 4, a number of improvements have been added, some of which are called out explicitly here. For the full list of changes and bugfixes, refer to the change logs of the RPM packages. These changes are in addition to those that have already been added with Service Packs 1, 2, and 3. Here are some highlights:

SUSE Linux Enterprise High Availability Extension 11 Service Pack 4 includes a new feature called pacemaker_remote. It allows nodes that do not run the cluster stack (pacemaker+corosync) to be integrated into the cluster and have the cluster manage their resources just as if they were real cluster nodes. This makes it well suited for large-scale SAP deployments, with worker nodes used as a scale-out clustering option. With this feature, you can deploy an HA cluster with more than 40 nodes.
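A remote node is typically defined as a resource using the ocf:pacemaker:remote agent. A minimal sketch in crm shell syntax, assuming a hypothetical remote host reachable at 192.168.1.50 that runs the pacemaker_remote service and shares the cluster authkey:

primitive remote-node1 ocf:pacemaker:remote \
    params server=192.168.1.50 \
    op monitor interval=30s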

SP4 improves usability in HAWK by introducing new templates and wizards. Oracle and DB2 related templates and wizards make it easier to boost your database availability with SP4.

Chapter 2. Features and Versions

This section includes an overview of some of the major features and new functionality provided by SUSE Linux Enterprise High Availability Extension 11 SP4.

2.1. Resource Management

2.1.1. Data Replication—Distributed Remote Block Device (DRBD)

Data replication is part of a disaster prevention strategy in most large enterprises. Using network connections, data is replicated between different nodes to ensure consistent data storage in case of a site failure.

Data replication is provided in SUSE Linux Enterprise High Availability Extension 11 SP4 with DRBD. This software-based data replication allows customers to use different types of storage systems and communication layers without vendor lock-in. At the same time, data replication is deeply integrated into the operating system and thus provides ease of use. Features related to data replication and included with this product release are:

  • YaST setup tools to assist initial setup

  • Fully synchronous, memory synchronous or asynchronous modes of operation

  • Differential storage resynchronization after failure

  • Bandwidth of background resynchronization tunable

  • Shared secret to authenticate the peer upon connect

  • Configurable handler scripts for various DRBD events

  • Online data verification

With these features, data replication can be configured and used more easily, and with improved storage resynchronization, recovery times are decreased significantly.

The distributed replicated block device (DRBD) version included supports active/active mirroring, enabling the use of services such as cLVM2 or OCFS2 on top.
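As an illustration of several of the features listed above, a minimal DRBD resource definition might look as follows. The host names, devices, and addresses are hypothetical, and some option names and sections differ between DRBD versions, so check drbd.conf(5) for the version shipped with your service pack:

resource r0 {
  protocol C;                        # fully synchronous replication (A = asynchronous, B = memory synchronous)
  net {
    cram-hmac-alg sha1;
    shared-secret "example-secret";  # shared secret to authenticate the peer upon connect
  }
  on alice {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on bob {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}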

2.1.2. IP Load Balancing—Linux Virtual Server (LVS)

Linux Virtual Server (LVS) is an advanced IP load balancing solution for Linux. IP load balancing provides a high-performance, scalable network infrastructure. Such infrastructure is typically used by enterprise customers for web servers or other network-related service workloads.

With LVS, network requests can be spread over multiple nodes to scale the available resources and balance the resulting workload. By monitoring the compute nodes, LVS can handle node failures and redirect requests to other nodes, maintaining the availability of the service.

2.1.3. Distributed Lock Manager (DLM)

The DLM in SUSE Linux Enterprise High Availability Extension 11 SP4 supports both TCP and SCTP for network communications, allowing for improved cluster redundancy in scenarios where network interface bonding is not feasible.

2.2. Other Changes and Version Updates

2.2.1. Support coredumps with STONITH Enabled and Timeouts for Kdump

The kdumpcheck STONITH plugin did not work as expected. This plug-in checks if a kernel dump is in progress on a node. If so, it returns true, and acts as if the node has been fenced. This avoids fencing a node that is already down but doing a dump, which takes some time.

Use the stonith:fence_kdump resource agent (provided by the package fence-agents) to monitor all nodes with the kdump function enabled. In /etc/sysconfig/kdump, configure KDUMP_POSTSCRIPT to send a notification to all nodes when the kdump process is finished. The node that does a kdump will restart automatically after kdump has finished.

Do not forget to open a port in the firewall for the fence_kdump resource. The default port is 7410.
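A minimal sketch of such a setup, assuming two hypothetical nodes alice and bob; the exact path of fence_kdump_send and additional parameters depend on your installation:

# /etc/sysconfig/kdump on each kdump-enabled node (path may vary):
KDUMP_POSTSCRIPT="/usr/lib/fence_kdump_send alice bob"

# cluster configuration (crm shell):
primitive st-kdump stonith:fence_kdump \
    params pcmk_host_list="alice bob" \
    op monitor interval="60s"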

2.2.2. IPVS support for iptables -m

Setting up Linux Virtual Server (LVS) in combination with SMTP did not work due to missing iptables -m ipvs support. IPVS (IP Virtual Server) is used for Linux Virtual Server.

The iptables package has been updated to a newer version and now includes the ipvs match support. To make it work, you need to load the kernel module xt_ipvs. It is provided by the cluster-network-* packages.
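As an illustration, marking packets that belong to an IPVS virtual SMTP service might look like this. The addresses are hypothetical; verify the available match options with iptables -m ipvs --help on your system:

modprobe xt_ipvs
iptables -t mangle -A PREROUTING -m ipvs --vaddr 10.0.0.10 --vport 25 \
    -j MARK --set-mark 1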

Chapter 3. Changed Functionality in SUSE Linux Enterprise High Availability Extension 11 SP4

Chapter 4. Deprecated Functionality in SUSE Linux Enterprise High Availability Extension 11 SP4

Chapter 5. Infrastructure, Package and Architecture Specific Information

5.1. Architecture Independent Information

5.1.1. Changes in Packaging and Delivery

5.1.1.1. SBD: Set "pcmk_delay_max" to Prevent Double-fencing

In previous Service Packs of SLE HA 11, SBD STONITH resources for two-node clusters could be configured with:

primitive stonith-sbd stonith:external/sbd \
op start start-delay=15

The start operation configured with a start-delay prevented double-fencing (both nodes being fenced) in case of a split-brain situation. In SLE HA 11 SP4, this solution no longer prevents double-fencing and must not be used anymore.

In SLE HA 11 SP4, we introduced a feature called "random fencing delay" for the same purpose.

Configure the SBD resource with the parameter pcmk_delay_max, for example:

primitive stonith-sbd stonith:external/sbd \
params pcmk_delay_max=30

The parameter pcmk_delay_max enables random fencing delay for the SBD fencing device and specifies a maximum number of seconds for the random delay.

5.1.2. Security

5.1.2.1. Improved pssh -P Output

Using pssh with the -P option now prints the host name in front of each line received.
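For example, running a command on several nodes listed in a (hypothetical) hosts file now yields output that can be attributed to each node:

pssh -P -h cluster-nodes.txt uptime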

5.1.3. Network

5.1.3.1. crmsh: Manage Multiple Resources as One Using Resource Tags

It may be desired to start and stop multiple resources all at once, without having explicit dependencies between those resources.

This feature adds resource tags to Pacemaker. Tags are collections of resources that do not imply any colocation or ordering constraints, but can be referenced in constraints or when starting and stopping resources.
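A short sketch in crm shell syntax, assuming existing resources ip-1, web-1, and db-1 (hypothetical names):

crm configure tag services: ip-1 web-1 db-1
# the tag can then be used to stop or start all tagged resources at once
crm resource stop services
crm resource start services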

5.1.3.2. Enable Colocating Resources Without Further Dependency Between Them

Sometimes, it is desired that two resources should run on the same node. However, there should be no further dependency implied between the two resources, so that if one fails, the other one can keep running.

This can be accomplished using a third, dummy resource which both resources depend on in turn. To make it easier to create a configuration like this, the command "assist weak-bond" has been added to crmsh.
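For example, to colocate two (hypothetical) resources app-1 and app-2 without introducing any further dependency between them:

crm configure assist weak-bond app-1 app-2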

5.1.3.3. Avoid Starting openais at Boot

In the past, you had to start the cluster at boot unconditionally if you wanted to make sure that the cluster stops when the server stops.

Now it is possible to start the cluster later.

The openais service is still controlled via sysconfig; additionally, there is now a parameter START_ON_BOOT=Yes/No in /etc/sysconfig/openais.

  • If START_ON_BOOT=Yes (default), the openais service will start at boot.

  • If START_ON_BOOT=No, the openais service will not start at boot. You can then start it manually whenever needed, as shown in the sketch below.
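A short sketch of the workflow:

# /etc/sysconfig/openais
START_ON_BOOT=No

# later, start the cluster manually when needed:
rcopenais start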

5.2. Systems Management

5.2.1. Hawk Wizards for Common Configurations

Multiple wizards have been added to Hawk to ease configuration, including cLVM + MD-RAID and DRBD.

5.2.2. Hawk Wizard for DB2 HADR

Hawk now includes a wizard for configuring a cluster resource for the DB2 HADR database.

5.2.3. Hawk Wizard for DB2

Hawk now includes a wizard for configuring a cluster resource for the DB2 database.

5.2.4. Hawk Wizard for Configuring the Oracle Database

Hawk now includes a wizard for configuring cluster resources for the Oracle database.

5.2.5. Hawk Wizards for Configuring Common Scenarios

Hawk now includes additional wizards for configuring common cluster configurations.

5.3. AMD64/Intel64 64-Bit (x86_64) and Intel/AMD 32-Bit (x86) Specific Information

5.3.1. System and Vendor Specific Information

5.3.1.1. Additional Relax-and-Recover Version 1.16 (rear116)

When Relax-and-Recover version 1.10.0 does not work, the newer version 1.16 could help.

In addition to Relax-and-Recover version 1.10.0, which is still provided in the RPM package "rear", we provide Relax-and-Recover version 1.16 as an additional, completely separate RPM package "rear116".

A separate package name, rear116, is used so that users for whom version 1.10.0 does not support their particular needs can manually upgrade to version 1.16, while users who have a working disaster recovery procedure with version 1.10.0 do not need to upgrade. Therefore the package name contains the version, and the packages conflict with each other to avoid an installed version accidentally being replaced by another version.

When you have a working disaster recovery procedure, do not change it!

For each rear version upgrade you must carefully re-validate that your particular disaster recovery procedure still works for you.

See in particular the section "Version upgrades" at https://en.opensuse.org/SDB:Disaster_Recovery.

Chapter 6. Other Updates

Chapter 7. Update-Related Notes

This section includes update-related information for this release.

Chapter 8. Supported Deployment Scenarios for SUSE Linux Enterprise High Availability Extension 11 SP4

The SUSE Linux Enterprise High Availability Extension stack supports a wide range of different cluster topologies.

Local and Metro Area (stretched) clusters are supported as part of a SUSE Linux Enterprise High Availability Extension subscription. Geographical clustering requires an additional Geo Clustering for SUSE Linux Enterprise High Availability Extension subscription.

8.1. Local Cluster

In a local cluster environment, all nodes are connected to the same storage network and on the same network segment; redundant network interconnects are provided. Latency is below 1 millisecond, and network bandwidth is at least 1 Gigabit/s.

Cluster storage is fully symmetric on all nodes, either provided via the storage layer itself, mirrored via MD Raid1, cLVM2, or replicated via DRBD.

In a local cluster all nodes run in a single corosync domain, forming a single cluster.

8.2. Metro Area Cluster

In a Metro Area cluster, the network segment can be stretched to a maximum latency of 15 milliseconds between any two nodes (approximately 20 miles or 30 kilometers in physical distance), but fully symmetric and meshed network inter-connectivity is required.

Cluster storage is assumed to be fully symmetric as in local deployments.

As a stretched version of the local cluster, all nodes in a Metro Area cluster run in a single corosync domain, forming a single cluster.

8.3. Geographical Clustering

A Geo scenario is primarily defined by the network topology; network latency higher than 15 milliseconds, reduced network bandwidth, and not fully interconnected subnets. In these scenarios, each site by itself must satisfy the requirements of and be configured as a local or metropolitan cluster as defined above. A maximum of three sites are then connected via Geo Clustering for SUSE Linux Enterprise High Availability Extension; for this, direct TCP connections between the sites must be possible, and typical latency should not exceed 1 second.

Storage is typically asymmetrically replicated by the storage layer, such as DRBD, MD Raid1, or vendor-specific solutions.

DLM, OCFS2, and cLVM2 are not available across site boundaries.

Chapter 9. Known Issues in SUSE Linux Enterprise High Availability Extension 11 SP4

9.1. Linux Virtual Server Tunnelling Support

The LVS TCP/UDP load balancer currently only works with Direct Routing and NAT setups. IP-over-IP tunnelling forwarding to the real servers does not currently work.

9.2. Samba CTDB Cluster Rolling Update Support

The CTDB resource should be stopped on all nodes prior to update. Rolling CTDB updates are not supported for this release, due to the risk of corruption on nodes running previous CTDB versions.

Chapter 10. Further Notes on Functionality

10.1. Cluster-concurrent RAID1 Resynchronization

To ensure data integrity, a full RAID1 resync is triggered when a device is re-added to the mirror group. This can impact performance, and it is thus advised to use multipath IO to reduce exposure to mirror loss.

Due to the need of the cluster to keep both mirrors up to date and consistent on all nodes, a mirror failure on one node is treated as if the failure had been observed cluster-wide, evicting the mirror on all nodes. Again, multipath IO is recommended to reduce this risk.

In situations where the primary focus is on redundancy and not on scale-out, building a storage target node (using md raid1 in a fail-over configuration or using drbd) and reexporting via iSCSI, NFS, or CIFS could be a viable option.

10.2. Quotas on OCFS2 Filesystem

To use quotas on an OCFS2 filesystem, the filesystem has to be created with the appropriate quota features: the 'usrquota' filesystem feature is needed for accounting quotas for individual users, and the 'grpquota' filesystem feature is needed for accounting quotas for groups. These features can also be enabled later on an unmounted filesystem using tunefs.ocfs2.

For quota-tools to operate on the filesystem, you have to mount the filesystem with the 'usrquota' (and/or 'grpquota') mount option.

When a filesystem has the appropriate quota feature enabled, it maintains in its metadata how much space and how many files each user (group) uses. Since OCFS2 treats quota information as filesystem-internal metadata, there is no need to ever run the quotacheck(8) program. Instead, all the needed functionality is built into fsck.ocfs2 and the filesystem driver itself.

To enable enforcement of the limits imposed on each user or group, run the quotaon(8) program as you would for any other filesystem.

The commands quota(1), setquota(8), and edquota(8) work as usual with OCFS2 filesystems. The commands repquota(8) and warnquota(8) do not work with OCFS2 because of a limitation in the current kernel interface.
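A minimal end-to-end sketch of the steps above, assuming a hypothetical device /dev/vg1/ocfs2vol mounted at /srv/ocfs2 and a hypothetical user alice:

mkfs.ocfs2 --fs-features=usrquota,grpquota /dev/vg1/ocfs2vol     # enable quota features at creation time
# or, on an existing unmounted filesystem:
tunefs.ocfs2 --fs-features=usrquota,grpquota /dev/vg1/ocfs2vol
mount -o usrquota,grpquota /dev/vg1/ocfs2vol /srv/ocfs2
quotaon -ug /srv/ocfs2                                           # enable enforcement of user and group limits
setquota -u alice 1048576 2097152 0 0 /srv/ocfs2                 # soft/hard block limits in KiB, no inode limits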

For performance reasons, each cluster node performs quota accounting locally and synchronizes this information with a common central storage every 10 seconds (this interval is tunable via tunefs.ocfs2 using the options 'usrquota-sync-interval' and 'grpquota-sync-interval'). Thus, quota information need not be exact at all times, and as a consequence a user or group can slightly exceed its quota limit when operating on several cluster nodes in parallel.

Chapter 11. Support Statement for SUSE Linux Enterprise High Availability Extension 11 SP4

Support requires an appropriate subscription from SUSE; for more information, see http://www.suse.com/products/server/services_support.html.

A Geo Clustering for SUSE Linux Enterprise High Availability Extension subscription is needed to receive support and maintenance to run geographical clustering scenarios, including manual and automated setups.

Support for the DRBD storage replication is independent of the cluster scenario and included as part of the SUSE Linux Enterprise High Availability Extension product and does not require the addition of a Geo Clustering for SUSE Linux Enterprise High Availability Extension subscription.

General Support Statement

The following definitions apply:

  • L1: Installation and problem determination - technical support designed to provide compatibility information, installation and configuration assistance, usage support, on-going maintenance and basic troubleshooting. Level 1 Support is not intended to correct product defect errors.

  • L2: Reproduction of problem isolation - technical support designed to duplicate customer problems, isolate problem areas and potential issues, and provide resolution for problems not resolved by Level 1 Support.

  • L3: Code Debugging and problem resolution - technical support designed to resolve complex problems by engaging engineering in patch provision, resolution of product defects which have been identified by Level 2 Support.

SUSE will only support the usage of original (unchanged or not recompiled) packages.

Chapter 12. Technical Information

Chapter 13. Miscellaneous

13.1. crmsh: Enable Anonymous Shadow CIBs

When scripting cluster configuration changes, the scripts are more robust if changes are applied with a single commit. Creating a shadow CIB to collect the changes makes this easier, but has previously required naming the shadow CIB.

crmsh now allows the creation of shadow CIBs without explicitly specifying a name. The name will be determined automatically and will not clash with any other previously created shadow CIBs.
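A sketch of the intended workflow in the interactive crm shell; this is an illustration, not verbatim output, and the generated name is reported when the shadow CIB is created:

cib new                                       # create a shadow CIB with an auto-generated name
configure primitive dummy-1 ocf:heartbeat:Dummy
configure primitive dummy-2 ocf:heartbeat:Dummy
cib commit                                    # apply the collected changes in one commit (pass the generated name if required)
cib use live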

Chapter 14. More Information and Feedback

  • Read the READMEs on the CDs.

  • Get detailed changelog information about a particular package from the RPM:

    rpm --changelog -qp <FILENAME>.rpm

    <FILENAME> is the name of the RPM.

  • Check the ChangeLog file in the top level of CD1 for a chronological log of all changes made to the updated packages.

  • Find more information in the docu directory of CD1 of the SUSE Linux Enterprise High Availability Extension CDs. This directory includes a PDF version of the High Availability Guide.

  • http://www.suse.com/documentation/sle_ha/ contains additional or updated documentation for SUSE Linux Enterprise High Availability Extension 11.

  • Visit http://www.suse.com/products/ for the latest product news from SUSE and http://www.suse.com/download-linux/source-code.html for additional information on the source code of SUSE Linux Enterprise products.

Copyright (c) 2016 SUSE LLC.

Thanks for using SUSE Linux Enterprise High Availability Extension in your business.

The SUSE Linux Enterprise High Availability Extension Team.