How to enable SBD to crashdump before rebooting.

This document (000019873) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 15 SP1
SUSE Linux Enterprise High Availability Extension 15 SP2
SUSE Linux Enterprise High Availability Extension 12 SP4
SUSE Linux Enterprise High Availability Extension 12 SP5

Situation

A kernel crash dump (vmcore) is needed from a node in a cluster that also uses SBD as its STONITH device.
Use this configuration only while troubleshooting or debugging a particular problem, and disable it afterwards: it can delay resource failover and thus impact production highly available resources.

Resolution

Kdump must be configured on all nodes of the cluster before making the following cluster changes.
Each node must then be rebooted so that the new crashkernel boot options take effect.
Reference Documentation:
Manual Kdump Configuration
TID 000016171 - Configure crashkernel memory for kernel core dump analysis
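Before changing the cluster configuration, it can help to confirm on each node that the reboot actually picked up a crashkernel reservation and that kdump is running. The following is a minimal sketch, assuming systemd and a service named kdump (as on SLE 12 and 15). The function names are hypothetical helpers, not part of any SUSE tool.

```shell
#!/bin/bash
# Sketch: verify the kdump prerequisites on one cluster node.
# has_crashkernel and check_kdump_ready are hypothetical helper names.

# Return 0 if the kernel command line contains a crashkernel= option.
# The file argument defaults to the running kernel's command line.
has_crashkernel() {
    local cmdline_file="${1:-/proc/cmdline}"
    grep -qE '(^|[[:space:]])crashkernel=' "$cmdline_file"
}

# Print "kdump ready", or the first problem found and return 1.
check_kdump_ready() {
    local cmdline_file="${1:-/proc/cmdline}"
    if ! has_crashkernel "$cmdline_file"; then
        echo "no crashkernel= on kernel command line; reboot after configuring kdump"
        return 1
    fi
    # Assumes the service is called kdump, as on SLE 12 and 15.
    if ! systemctl is-active --quiet kdump; then
        echo "kdump service is not active"
        return 1
    fi
    echo "kdump ready"
}
```

Source the file and run `check_kdump_ready` on each node before continuing.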

Two settings, one in the cluster configuration and one in the SBD configuration file, should cover most fencing scenarios.
1. Add the parameter crashdump=1 to the stonith:external/sbd primitive.
This can be done with the crm shell (for example "crm configure edit") or with the Hawk web interface.
primitive stonith-sbd stonith:external/sbd \
	params pcmk_delay_max=30s crashdump=1
Note: This covers the normal situation in which fencing is explicitly requested by the cluster and the fencing target is still able to eat poison pills. The parameter tells the fence agent to write a "crashdump" poison pill to the SBD device instead of "reset" or "off". If the fencing target is still able to eat poison pills, it will crash dump. This setting can be changed while the cluster is running.

2. Set the following line in /etc/sysconfig/sbd on every cluster node:
SBD_TIMEOUT_ACTION=flush,crashdump
Note: This covers abnormal situations in which the sbd inquisitor has not received enough healthy updates from the sbd watchers within the watchdog timeout, or in which the sbd inquisitor has died unexpectedly. This setting is only read when sbd.service starts, so it takes effect after sbd has been restarted on the node.
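Both settings can also be applied from the command line. The sketch below assumes crmsh is installed and that the SBD primitive is named "stonith-sbd" as in the example; the helper function names are hypothetical. It uses the crmsh subcommands `crm resource param <rsc> set|delete <param>` and `crm configure show <id>`; check `crm help resource` on your crmsh version before relying on them.

```shell
#!/bin/bash
# Sketch only: helpers for the two settings described above.
# Assumes crmsh and a primitive named "stonith-sbd" (adjust as needed).

# Step 1: set crashdump=1 on the SBD stonith primitive (cluster may be running).
enable_sbd_crashdump() {
    local rsc="${1:-stonith-sbd}"
    # Change a single instance attribute of the primitive.
    crm resource param "$rsc" set crashdump 1 || return 1
    # Show the resulting primitive so the change can be reviewed.
    crm configure show "$rsc"
}

# Remove the parameter again once troubleshooting is finished.
disable_sbd_crashdump() {
    local rsc="${1:-stonith-sbd}"
    crm resource param "$rsc" delete crashdump
}

# Step 2: set SBD_TIMEOUT_ACTION in this node's sbd sysconfig file.
# Run on every node; sbd reads the file only when sbd.service starts.
set_sbd_timeout_action() {
    local action="${1:-flush,crashdump}"
    local file="${2:-/etc/sysconfig/sbd}"
    if grep -q '^[[:space:]]*SBD_TIMEOUT_ACTION=' "$file"; then
        # Replace the existing setting in place (GNU sed -i).
        sed -i "s|^[[:space:]]*SBD_TIMEOUT_ACTION=.*|SBD_TIMEOUT_ACTION=$action|" "$file"
    else
        # No setting yet: append one.
        printf 'SBD_TIMEOUT_ACTION=%s\n' "$action" >> "$file"
    fi
}
```

For example, run `enable_sbd_crashdump` once from any node and `set_sbd_timeout_action` on each node. Then restart the cluster stack on one node at a time (typically `crm cluster stop` followed by `crm cluster start`) so that sbd.service is restarted and rereads the file.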
 

Additional Information

SBD_OPTS= in /etc/sysconfig/sbd also accepts an option that sets a specific watchdog timeout to use before crash dumping, if one is needed. This feature is disabled by default.
-C N           Watchdog timeout to set before crashdumping.
Note: This option did not work in certain versions of the sbd RPM. The problem has been fixed in current updates, so make sure sbd is up to date before relying on it.
Changelog entry for sbd:
- Update to version 1.4.0+20191028.d937f9d:
- sbd-inquisitor: use crashdump timeout
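If the -C option is needed, it can be added to SBD_OPTS on each node. The following is a sketch with a hypothetical helper (`set_sbd_crashdump_watchdog_timeout` is not part of sbd). It reads the current SBD_OPTS by sourcing the sysconfig file in a subshell, drops any earlier -C value, and appends -C with the given number of seconds, keeping any other options.

```shell
#!/bin/bash
# Sketch: add "-C <seconds>" to SBD_OPTS, replacing any earlier -C value.
# set_sbd_crashdump_watchdog_timeout is a hypothetical helper name.
set_sbd_crashdump_watchdog_timeout() {
    local seconds="$1"
    local file="${2:-/etc/sysconfig/sbd}"
    local opts
    if [[ ! "$seconds" =~ ^[0-9]+$ ]]; then
        echo "seconds must be a non-negative integer" >&2
        return 1
    fi
    # sysconfig files use shell syntax: source in a subshell to read SBD_OPTS.
    opts=$( . "$file" >/dev/null 2>&1; printf '%s' "${SBD_OPTS:-}" )
    # Drop an existing "-C N" and squeeze leftover whitespace.
    opts=$(printf '%s' "$opts" | sed -E 's/(^|[[:space:]])-C[[:space:]]*[0-9]+//g; s/[[:space:]]+/ /g; s/^ //; s/ $//')
    opts="${opts:+$opts }-C $seconds"
    # Note: options containing | or & would need escaping for sed.
    if grep -q '^SBD_OPTS=' "$file"; then
        sed -i "s|^SBD_OPTS=.*|SBD_OPTS=\"$opts\"|" "$file"
    else
        printf 'SBD_OPTS="%s"\n' "$opts" >> "$file"
    fi
}
```

For example, `set_sbd_crashdump_watchdog_timeout 120` writes SBD_OPTS="-C 120" when no other options are set; the value 120 is only an illustration, not a recommendation. Like SBD_TIMEOUT_ACTION, SBD_OPTS is read when sbd.service starts.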
Reference:
   sbd(8) man page

 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019873
  • Creation Date: 10-Feb-2021
  • Modified Date: 11-Feb-2021
    • SUSE Linux Enterprise High Availability Extension
