Azure Load-Balancer Detection Hardening

This document (7024128) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 12

Situation

The setup guides for SLES High Availability on Azure describe making the cluster's floating IP reachable through an additional resource of type anything, for example:
crm configure primitive nc_NW1_nfs anything \
  params binfile="/usr/bin/nc" cmdline_options="-l -k 61000" \
  op monitor timeout=20s interval=10 depth=0

This configures the anything resource type to run /usr/bin/nc, which listens for the health probes of the Azure Load Balancer. Without this listener the probes fail, and the floating cluster IP is not reachable.
 
The binary used is the old and trusty netcat, which in itself is fine. However, testing revealed that in some scenarios the resource can become blocked, because the listen backlog in nc is hard-coded to 1 and because of the general limitations of nc.

This condition can be seen in the netstat output: the second column, Recv-Q, shows "2":
oldhanae2:~ # netstat -nlp | grep "\/nc "
tcp        2      0 0.0.0.0:61000           0.0.0.0:*               LISTEN      12813/nc           

In this state, nc no longer answers the load balancer's requests, so the floating IP is not available.
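To check a node for this condition without reading the netstat output by eye, the output can be filtered. The following is a minimal bash sketch, not a SUSE-provided tool: the function name check_probe_listener is hypothetical, it assumes the net-tools netstat column layout shown above (Recv-Q in column 2, PID/program in column 7), and the default threshold of 2 simply matches the stuck nc example, so adjust it for other listeners.

```shell
#!/usr/bin/env bash
# check_probe_listener PORT [THRESHOLD]  (hypothetical helper)
# Reads `netstat -nlp` output on stdin and prints one line for every TCP
# listener on PORT whose Recv-Q (column 2) is at or above THRESHOLD.
# The default threshold of 2 matches the stuck nc example in this document.
# Returns 1 if such a listener was found, 0 otherwise.
check_probe_listener() {
    local port="$1" threshold="${2:-2}"
    awk -v port="$port" -v thr="$threshold" '
        # Only TCP sockets in LISTEN state whose local address ends in :PORT
        ($1 == "tcp" || $1 == "tcp6") && $6 == "LISTEN" && $4 ~ (":" port "$") {
            if (($2 + 0) >= (thr + 0)) {
                printf "STUCK port=%s recvq=%s prog=%s\n", port, $2, $7
                found = 1
            }
        }
        END { exit (found ? 1 : 0) }
    '
}

# Example on a live node (assumes net-tools provides netstat):
#   netstat -nlp | check_probe_listener 61000 || echo "probe listener looks blocked"
```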

As a result of this condition, messages similar to the following example (netcat listening on port 61000) are written to /var/log/messages:

 
2020-01-01T00:00:01.1000000+00:00 oldhanae2 kernel: [12345.678910] TCP: request_sock_TCP: Possible SYN flooding on port 61000. Sending cookies.  Check SNMP counters.
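To see whether, and how often, this message has been logged for the probe port, the log can be filtered. A minimal sketch, assuming the message has exactly the format shown in the example above (other kernel versions may word it differently); syn_flood_ports is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# syn_flood_ports (hypothetical helper): read syslog lines on stdin, extract
# the port from every "Possible SYN flooding on port N." kernel message and
# print one "port=N count=C" line per distinct port.
syn_flood_ports() {
    sed -n 's/.*Possible SYN flooding on port \([0-9][0-9]*\)\..*/\1/p' \
        | sort -n \
        | uniq -c \
        | awk '{ printf "port=%s count=%s\n", $2, $1 }'
}

# Usage on a node (reading the log usually requires root):
#   syn_flood_ports < /var/log/messages
```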

Resolution

The solution suggested by SUSE in this case is to use the more powerful and reliable socat instead of nc.

To implement this, install the socat RPM on each cluster node:
zypper in socat
      
Then stop the resource; for the example above, that is:
crm resource stop nc_NW1_nfs
Stopping the resource makes the node unavailable through the load balancer, so downtime is required.

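Next, change the configuration:
crm configure edit nc_NW1_nfs

In the editor, change the definition (the example below shows a resource named rsc_nc_HN1_HDB03; apply the same change to your own resource) from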
primitive rsc_nc_HN1_HDB03 anything \
   params binfile="/usr/bin/nc" cmdline_options="-l -k 61000" \
   op monitor timeout=20s interval=10 
 
to
primitive rsc_nc_HN1_HDB03 anything \
   params binfile="/usr/bin/socat" cmdline_options="-U TCP-LISTEN:61000,backlog=10,fork,reuseaddr /dev/null" \
   op monitor timeout=20s interval=10 
  
Then start the resource again:
crm resource start nc_NW1_nfs

Because the anything resource agent simply runs an installed binary, the change is straightforward.

In summary, the changes are:
  • for the binary: replace "/usr/bin/nc" with "/usr/bin/socat"
  • for the parameters: replace "-l -k <MY-PORT>" with "-U TCP-LISTEN:<MY-PORT>,backlog=10,fork,reuseaddr /dev/null"
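The two substitutions can also be previewed mechanically before editing the live configuration. A minimal sketch, assuming the primitive is printed with both values in double quotes exactly as in the examples above; nc_to_socat is a hypothetical helper name, and it only prints the rewritten text without changing the cluster.

```shell
#!/usr/bin/env bash
# nc_to_socat (hypothetical helper): read a primitive definition on stdin and
# print it with the nc binary and options replaced by the socat equivalents.
# Lines that do not match (e.g. the "op monitor" line) pass through unchanged.
nc_to_socat() {
    sed -E \
        -e 's#binfile="/usr/bin/nc"#binfile="/usr/bin/socat"#' \
        -e 's#cmdline_options="-l -k ([0-9]+)"#cmdline_options="-U TCP-LISTEN:\1,backlog=10,fork,reuseaddr /dev/null"#'
}

# Usage: preview the new definition for an existing resource
#   crm configure show nc_NW1_nfs | nc_to_socat
```

The output can then be compared with what is entered in crm configure edit.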

Additional Information

The most up-to-date solution for existing Pacemaker clusters still using nc is to switch the configuration to the azure-lb resource agent, which is part of the resource-agents package. The following minimum package versions are required:
  • For SLES 12 SP4/SP5, the version must be at least resource-agents-4.3.018.a7fb5035-3.30.1.
  • For SLES 15/15 SP1, the version must be at least resource-agents-4.3.0184.6ee15eb2-4.13.1.
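To find which resources in an existing cluster still use nc, and therefore need to be moved, the configuration can be searched. A minimal sketch; find_nc_primitives is a hypothetical helper, and it assumes crm configure show starts every object in the first column, indents continuation lines, and quotes the binfile value as in the examples in this document.

```shell
#!/usr/bin/env bash
# find_nc_primitives (hypothetical helper): read `crm configure show` output
# on stdin and print the ID of every primitive whose binfile is /usr/bin/nc.
find_nc_primitives() {
    awk '
        # An unindented line starts a new object: remember its ID only if it
        # is a primitive, otherwise clear it.
        /^[^ \t]/ { id = ($1 == "primitive") ? $2 : "" }
        # Report the current primitive once if it runs /usr/bin/nc.
        id != "" && index($0, "binfile=\"/usr/bin/nc\"") { print id; id = "" }
    '
}

# Usage on a cluster node:
#   crm configure show | find_nc_primitives
```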
To move a resource to the azure-lb resource agent, first stop it:
crm resource stop nc_NW1_nfs
   
This makes the node unavailable through the load balancer, so downtime is required.

Change the configuration:
crm configure edit nc_NW1_nfs

In the editor, change the definition (shown for the example resource rsc_nc_HN1_HDB03) from
primitive rsc_nc_HN1_HDB03 anything \
   params binfile="/usr/bin/nc" cmdline_options="-l -k 61000" \
   op monitor timeout=20s interval=10 
to
primitive rsc_nc_HN1_HDB03 azure-lb port=61000 \
   op monitor timeout=20s interval=10
   
Then start the resource again:
crm resource start nc_NW1_nfs
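The azure-lb definition can likewise be derived from an existing nc primitive, which helps when several resources have to be converted. A minimal sketch; nc_to_azure_lb is a hypothetical helper, it assumes the nc options have the form "-l -k <port>" as in the examples above, copies the monitor operation used in this document, and only prints the new definition.

```shell
#!/usr/bin/env bash
# nc_to_azure_lb (hypothetical helper): read an nc-based primitive on stdin
# and print an equivalent azure-lb primitive using the same ID and port.
# Exits with status 1 if no primitive ID or "-l -k <port>" option is found.
nc_to_azure_lb() {
    awk '
        $1 == "primitive" { id = $2 }
        # "-l -k " is 6 characters long; the port follows it.
        match($0, /-l -k [0-9]+/) {
            port = substr($0, RSTART + 6, RLENGTH - 6)
        }
        END {
            if (id == "" || port == "") exit 1
            printf "primitive %s azure-lb port=%s \\\n", id, port
            print "   op monitor timeout=20s interval=10"
        }
    '
}

# Usage: print the replacement definition for an existing resource
#   crm configure show nc_NW1_nfs | nc_to_azure_lb
```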

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7024128
  • Creation Date: 19-Sep-2019
  • Modified Date: 25-Jun-2020
    • SUSE Linux Enterprise High Availability Extension


For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com
