Simulating a Cluster Network Failure

This document (7017617) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise High Availability Extension 15

Situation

Simulating a network failure to test the cluster behavior in case of a split brain.

This is normally done on the physical level by removing a network cable or a switch, simulating the real-world scenario in which the OS has no control over, or indication of, the problem apart from the cluster no longer being able to communicate.

In many cases this preferred approach is not applicable: the cluster nodes may be virtual machines with no physical connection that can be removed, or removing the physical connection would be too difficult, might affect other areas, or is otherwise undesirable at the moment.

Please keep in mind that bringing down the interface with, for example

   ifdown eth1

is NOT recommended. It will most likely only cause different, erratic issues. Disabling an interface in this or any comparable way is not a valid test of the cluster communication.

As a further argument against an

   ifdown eth1

this removes the IP address locally, so any local application or service that relies on this part of the network will get an error. This means that the test would not only trigger a cluster communication issue, but most likely also a local resource failure.
  

Resolution

To simulate a cluster communication failure, use iptables to drop the packets to and from the IP address that is configured for the cluster communication.

Assume the following setup:

Node A uses local IP 192.168.20.193 for cluster communication

Node B uses local IP 192.168.20.228 for cluster communication
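If it is not obvious which local IP address a node actually uses for the cluster communication, this can be checked on each node with, for example

   corosync-cfgtool -s

which prints the ring status, including the local address used for each ring. The configured addresses can also be found in /etc/corosync/corosync.conf.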

The idea is to block the communication between the nodes. This can be done by implementing firewall rules on one node, to

   not send to the other IP

and

   not receive from the other IP

Following the above example with Node A and Node B, one can implement this by running on Node B

   iptables -A INPUT -s 192.168.20.193 -j DROP; iptables -A OUTPUT -d 192.168.20.193 -j DROP

which means that all traffic coming from 192.168.20.193 (Node A) and all traffic going to 192.168.20.193 (Node A) will be dropped by the kernel on Node B.

This breaks the cluster communication without removing or influencing any relevant local network settings and without any system notification to a service, socket or application.

For the cluster stack this appears to be a split brain.
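To verify that the rules are in place and to watch the cluster react, one can for example run on Node B

   iptables -L -n -v
   corosync-cfgtool -s
   crm_mon -1

The packet counters in the iptables output increase for the two DROP rules once cluster traffic is being dropped, corosync-cfgtool -s shows the ring status, and crm_mon -1 gives a one-shot view of the cluster, in which the other node is eventually reported as offline / unclean (the exact wording depends on the pacemaker version).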

You can remove this block again at any time by flushing the iptables rules with

   iptables -F

This might be especially useful, as the split brain might lead to the node with the iptables rules being the survivor. But it also means that the other node, once fenced, boots back into the still-blocked communication and might then reboot the formerly surviving node because of startup fencing.

Keep in mind that -F removes all iptables rules, so if iptables / the firewall is also used for something else, flushing might affect other areas.
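If flushing everything is not an option, the two test rules can instead be removed individually by repeating them with -D instead of -A, for example on Node B

   iptables -D INPUT -s 192.168.20.193 -j DROP; iptables -D OUTPUT -d 192.168.20.193 -j DROP

which deletes exactly the rules added above and leaves any other rules untouched.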

Please also keep in mind that if the IPs used for the cluster communication are also used by applications, then there might not only be a cluster split brain, but also a resource failure.

Additional Information

To use either

   physical separation

or

  iptables

is the recommended way to test cluster communication with corosync clusters.

See also: Corosync-and-ifdown-on-active-network-interface  

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7017617
  • Creation Date: 19-May-2016
  • Modified Date: 05-Nov-2021
  • SUSE Linux Enterprise Server
