Cluster node will not join after one node was upgraded.

This document (7022565) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11 Service Pack 4
SUSE Linux Enterprise High Availability Extension 11 Service Pack 3

Situation

A cluster node does not join the cluster properly, and resources cannot be started on it.

Nodes show as UNCLEAN (offline)

Current DC: NONE

cib: Bad global update

Errors in /var/log/messages:

cib[8530]: error: cib_perform_op: Discarding update with feature set '3.0.10' greater than our own '3.0.8'

cib[8530]: error: cib_process_request: Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=<servername>/crmd/202, version=0.155.0)

cib[8530]: warning: cib_process_diff: Bad global update <cib_update_diff>

cib[8530]: error: cib_process_diff: Diff -1.-1.-1 -> -1.-1.-1 from <servername> not applied to 0.155.0: + and - versions in the diff did not change in global update
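To confirm that a node is affected, the log can be searched for the feature set message shown above. The following is a minimal sketch, not a SUSE-provided tool: the function name find_feature_set_mismatches is hypothetical, and the pattern assumes the message wording is exactly as in the excerpt above.

```python
import re

# Pattern built from the log excerpt above; the exact wording is assumed
# to be stable for the affected Pacemaker versions.
FEATURE_SET_RE = re.compile(
    r"Discarding update with feature set '([\d.]+)' greater than our own '([\d.]+)'"
)


def find_feature_set_mismatches(lines):
    """Return (peer_feature_set, local_feature_set) pairs found in log lines."""
    mismatches = []
    for line in lines:
        match = FEATURE_SET_RE.search(line)
        if match:
            mismatches.append((match.group(1), match.group(2)))
    return mismatches


# Sample lines copied from the errors listed above.
sample = [
    "cib[8530]: error: cib_perform_op: Discarding update with feature set "
    "'3.0.10' greater than our own '3.0.8'",
    "cib[8530]: warning: cib_process_diff: Bad global update <cib_update_diff>",
]

for peer, local in find_feature_set_mismatches(sample):
    print(f"peer feature set {peer} is newer than local feature set {local}")
```

In practice the lines would be read from /var/log/messages on the node that fails to join, for example by passing an open file object to the function.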

Resolution

Best option:

1. Upgrade the node running the older code to the same version that is running on the DC node of the cluster.

Alternative option:

1. Bring up the older node first and make sure it is the DC before bringing the newer node into the cluster.

Note: If changes have been made to the cluster configuration on the upgraded node, these changes will be lost.

Note: You may also need to make a few changes to the cluster configuration on the older node to make sure its cluster configuration (cib.xml) has been updated with a newer time stamp.
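For the alternative option, it helps to confirm which node is the DC before the newer node is started. The sketch below reads saved cluster status text and reports the DC. It assumes the status contains a line beginning with "Current DC:" as shown in the Situation section; the function name current_dc, the node name "node-old", and the text following the node name are hypothetical.

```python
import re


def current_dc(status_text):
    """Return the DC node name from status text, or None if no DC is elected."""
    match = re.search(r"^Current DC:\s*(\S+)", status_text, flags=re.MULTILINE)
    if match is None or match.group(1) == "NONE":
        return None
    return match.group(1)


# "Current DC: NONE" is the symptom described in the Situation section.
broken_status = "Current DC: NONE\n"

# Hypothetical status after the older node has been started first.
healthy_status = "Current DC: node-old - partition with quorum\n"

print(current_dc(broken_status))
print(current_dc(healthy_status))
```

Only start the newer node once the function returns the name of the older node.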

Cause

One node in the cluster had been upgraded to a newer version of Pacemaker, which provides a feature set greater than the one supported by the older version.

In this case, one node had been upgraded to SLES11sp4 (newer Pacemaker code) and the cluster was restarted before the other node in the cluster had been upgraded. In addition, the SLES11sp4 node was brought up first and became the current DC (Designated Coordinator) of the cluster with a new admin epoch.
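The feature sets in the log message ('3.0.10' and '3.0.8') have to be read as dotted numbers, not as plain strings, to see which one is greater. The sketch below illustrates the difference. It assumes a component-by-component numeric comparison, which matches the wording of the log message but is not taken from the Pacemaker source; the helper names are hypothetical.

```python
def parse_feature_set(text):
    """Split a dotted feature set string such as '3.0.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_newer(candidate, ours):
    """Return True if candidate is a greater feature set than ours."""
    return parse_feature_set(candidate) > parse_feature_set(ours)


# Numeric comparison: 10 is greater than 8 in the last component.
print(is_newer("3.0.10", "3.0.8"))  # True

# Plain string comparison gets this wrong, because "1" sorts before "8".
print("3.0.10" > "3.0.8")  # False
```

This is why the older node, with feature set 3.0.8, discards updates coming from the upgraded node.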

Additional Information

You may also want to verify that the "bindnetaddr:" address in the configuration file /etc/corosync/corosync.conf is correct on all nodes in the cluster. SLES11sp4 performs stricter checking than SLES11sp3 did. To verify that the correct "bindnetaddr:" is selected, launch "yast cluster" and check the "Bind Network Address" drop-down, which calculates the correct value for your bound networks.
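As a cross-check alongside "yast cluster", the expected value can be calculated by hand. The sketch below assumes an IPv4 interface where "bindnetaddr:" should be the network address derived from the interface address and netmask. All addresses, the sample configuration, and the function names are hypothetical.

```python
import ipaddress
import re


def expected_bindnetaddr(ip_address, netmask):
    """Return the network address for an interface address and netmask."""
    interface = ipaddress.ip_interface(f"{ip_address}/{netmask}")
    return str(interface.network.network_address)


def configured_bindnetaddrs(conf_text):
    """Extract every uncommented bindnetaddr value from corosync.conf text."""
    return re.findall(r"^\s*bindnetaddr:\s*(\S+)", conf_text, flags=re.MULTILINE)


# Hypothetical excerpt of /etc/corosync/corosync.conf.
sample_conf = """
totem {
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastport: 5405
    }
}
"""

expected = expected_bindnetaddr("192.168.5.92", "255.255.255.0")
for value in configured_bindnetaddrs(sample_conf):
    status = "ok" if value == expected else "MISMATCH"
    print(f"bindnetaddr {value}: {status} (expected {expected})")
```

Run the same check on every node, since each node has its own copy of corosync.conf.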

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.