Fencing, crashes and hangs on systems with Multipath and OCFS2

This document (7000097) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 10 All Support Packs
SUSE Linux Enterprise Server 9 All Support Packs

Situation

On systems with OCFS2 and Multipath, one or more of the following is observed:
  • OCFS2 fences a node
  • The cluster is unstable
  • Hangs or crashes are observed

Resolution

This condition has been observed where the MPIO polling interval is set higher than the OCFS2 heartbeat threshold. As a result, OCFS2 may fence a node when a disk path fails over.

The MPIO polling interval is the time, in seconds, between path checks. The default setting is 5; however, in many environments this number is raised. The default OCFS2 heartbeat threshold is 7. If the MPIO settings have not been changed from their defaults, no modification should be needed.

To fix this situation, the polling_interval in /etc/multipath.conf must be lower than the O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb. Either lower the polling_interval or raise the O2CB_HEARTBEAT_THRESHOLD.
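The rule above can be checked mechanically. The following is a minimal sketch in Python, not a SUSE-provided tool: the function names are hypothetical, the parsing is a simple line match rather than a full parser for either file format, and the fallback values are the defaults quoted in this document (5 and 7).

```python
import re

# Defaults as stated in this document; assumed to apply when a value is unset.
DEFAULT_POLLING_INTERVAL = 5
DEFAULT_HEARTBEAT_THRESHOLD = 7


def read_polling_interval(multipath_conf: str) -> int:
    """Return the first polling_interval found in multipath.conf text, else the default."""
    match = re.search(r"^\s*polling_interval\s+(\d+)", multipath_conf, re.MULTILINE)
    return int(match.group(1)) if match else DEFAULT_POLLING_INTERVAL


def read_heartbeat_threshold(o2cb_sysconfig: str) -> int:
    """Return O2CB_HEARTBEAT_THRESHOLD from sysconfig text, else the default.

    An empty assignment (O2CB_HEARTBEAT_THRESHOLD=) is treated as unset.
    """
    match = re.search(
        r'^\s*O2CB_HEARTBEAT_THRESHOLD="?(\d+)"?\s*$', o2cb_sysconfig, re.MULTILINE
    )
    return int(match.group(1)) if match else DEFAULT_HEARTBEAT_THRESHOLD


def settings_are_safe(multipath_conf: str, o2cb_sysconfig: str) -> bool:
    """True when polling_interval is lower than O2CB_HEARTBEAT_THRESHOLD."""
    return read_polling_interval(multipath_conf) < read_heartbeat_threshold(o2cb_sysconfig)
```

To use it, pass in the contents of /etc/multipath.conf and /etc/sysconfig/o2cb read as text. A result of False indicates that the polling interval is not lower than the heartbeat threshold, which is the condition described in this document.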

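Non-working configuration
The following is an example of a non-working configuration. The polling_interval is 10, while O2CB_HEARTBEAT_THRESHOLD is left empty, so the default of 7 applies and the threshold is lower than the polling interval.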

/etc/multipath.conf
defaults {
    udev_dir /dev
    polling_interval 10
    default_selector "round-robin 0"
    default_path_grouping_policy multibus
    rr_weight priorities
    failback immediate
    no_path_retry queue
}
devnode_blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
    device {
        vendor DEC.*
        product MSA[15]00
    }
}
devices {
    device {
        vendor "COMPAQ"
        product "MSA1000 VOLUME"
        path_grouping_policy multibus
    }
}

/etc/sysconfig/o2cb
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=HappyHippo
O2CB_HEARTBEAT_THRESHOLD=

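Working configuration
The following is an example of a working configuration. The polling_interval is still 10, and O2CB_HEARTBEAT_THRESHOLD is set to 14, which is higher than the polling interval.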

/etc/multipath.conf
defaults {
    udev_dir /dev
    polling_interval 10
    default_selector "round-robin 0"
    default_path_grouping_policy multibus
    rr_weight priorities
    failback immediate
    no_path_retry queue
}
devnode_blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
    device {
        vendor DEC.*
        product MSA[15]00
    }
}
devices {
    device {
        vendor "COMPAQ"
        product "MSA1000 VOLUME"
        path_grouping_policy multibus
    }
}

/etc/sysconfig/o2cb
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=HappyHippo
O2CB_HEARTBEAT_THRESHOLD=14

Adjusting the threshold
The O2CB_HEARTBEAT_THRESHOLD may need to be raised further. Start with a value higher than the polling_interval, then increase it until the system is stable. It is not uncommon for the threshold to be as high as 45 or 60 on heavily loaded systems.
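When the value has to be changed repeatedly during tuning, the edit to /etc/sysconfig/o2cb can be scripted. The following is a minimal sketch in Python; the function name is hypothetical, and it only rewrites the text of the file. It is assumed here that the change takes effect once the o2cb service is restarted and that the same value should be used on every node in the cluster.

```python
import re


def set_heartbeat_threshold(o2cb_sysconfig: str, threshold: int) -> str:
    """Return the sysconfig text with O2CB_HEARTBEAT_THRESHOLD set to threshold.

    An existing assignment (including an empty one) is replaced in place;
    if none exists, a new line is appended.
    """
    new_line = f"O2CB_HEARTBEAT_THRESHOLD={threshold}"
    pattern = re.compile(r"^O2CB_HEARTBEAT_THRESHOLD=.*$", re.MULTILINE)
    if pattern.search(o2cb_sysconfig):
        return pattern.sub(new_line, o2cb_sysconfig, count=1)
    # No existing assignment: append one, adding a newline first if needed.
    separator = "" if (not o2cb_sysconfig or o2cb_sysconfig.endswith("\n")) else "\n"
    return o2cb_sysconfig + separator + new_line + "\n"
```

Applied to the non-working example in this document with a threshold of 14, the function produces the /etc/sysconfig/o2cb contents shown in the working example. Keep a backup of the original file before writing the result back.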

 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7000097
  • Creation Date: 15-Apr-2008
  • Modified Date: 16-Mar-2021
    • SUSE Linux Enterprise Server
