corosync[35436]: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22).

This document (7022316) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11 Service Pack 4
SUSE Linux Enterprise High Availability Extension 12

Situation

On a Pacemaker cluster node, the following message appears in /var/log/messages:

Nov  2 16:05:41 sapnode1 corosync[35436]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.


After debug logging is enabled for corosync, the following messages are also logged:

Nov  2 16:17:16 sapnode1 corosync[35436]:  [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
Nov  2 16:17:16 sapnode1 corosync[35436]:  [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
Nov  2 16:17:16 sapnode1 corosync[35436]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Nov  2 16:17:16 sapnode1 corosync[35436]:  [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
Nov  2 16:17:16 sapnode1 corosync[35436]:  [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
Nov  2 16:17:16 sapnode1 corosync[35436]:  [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)


and the cluster will refuse to start.


Resolution

Once you are sure that no firewall on the network is blocking the cluster traffic, check

   /etc/corosync/corosync.conf

for the bindnetaddr setting and verify that it is the network address of the interface corosync should listen on, that is, the address that results from applying the interface's netmask to its IP address.
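
As a rough aid for this check, the bindnetaddr values can be pulled out of the configuration text with a few lines of Python. This is only a sketch: the helper name find_bindnetaddr and the sample text are made up for illustration, and the pattern assumes the plain "bindnetaddr: value" layout shown in this document.

```python
import re

# Hypothetical helper: return every bindnetaddr value found in
# corosync.conf-style text. Assumes one "bindnetaddr: <address>" per line.
BINDNETADDR_RE = re.compile(r"^\s*bindnetaddr:\s*(\S+)", re.MULTILINE)

def find_bindnetaddr(conf_text: str) -> list[str]:
    return BINDNETADDR_RE.findall(conf_text)

# Illustrative sample only; on a real node the text would be read from
# /etc/corosync/corosync.conf.
sample = """
    interface {
        bindnetaddr:    6.101.0.0
"""
print(find_bindnetaddr(sample))
```

Each value returned this way can then be compared against the network address of the interface corosync is meant to use.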

An example of an invalid configuration:

corosync.conf :

    interface {
        bindnetaddr:    6.101.0.0

while the network interface is configured with netmask 255.0.0.0 (/8):

/bin/ip a

[...]

14: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP

    link/ether 38:63:bb:2b:7f:94 brd ff:ff:ff:ff:ff:ff

    inet 6.101.35.3/8 brd 6.255.255.255 scope global bond0


Applying the /8 netmask to 6.101.35.3 gives the network address 6.0.0.0, so the valid bindnetaddr in this case is 6.0.0.0.
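
The same calculation can be sketched with Python's standard ipaddress module. The function names below are illustrative and not part of corosync or any SUSE tool; the input is the address/prefix pair exactly as printed by "ip a".

```python
import ipaddress

def expected_bindnetaddr(cidr: str) -> str:
    """Network address for an interface address given as 'IP/prefix'."""
    iface = ipaddress.ip_interface(cidr)
    return str(iface.network.network_address)

def bindnetaddr_matches(cidr: str, bindnetaddr: str) -> bool:
    """True if bindnetaddr equals the interface's network address."""
    iface = ipaddress.ip_interface(cidr)
    return ipaddress.ip_address(bindnetaddr) == iface.network.network_address

# Values taken from the example in this document.
print(expected_bindnetaddr("6.101.35.3/8"))              # 6.0.0.0
print(bindnetaddr_matches("6.101.35.3/8", "6.101.0.0"))  # False
print(bindnetaddr_matches("6.101.35.3/8", "6.0.0.0"))    # True
```
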
After making the change, corosync needs to be restarted:

   on SLES 11: rcopenais restart
   on SLES 12: systemctl restart pacemaker


Cause

The bindnetaddr is configured incorrectly: it does not match the network address of the interface as defined by the system's network settings.

The problem usually only occurs after migration from SLES 11 SP3 to SLES 11 SP4, where new logic for picking up the right interface was introduced.


Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7022316
  • Creation Date: 14-Nov-2017
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise High Availability Extension

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com