OSD gets restarted automatically after its ceph-osd process terminates with an assert failure

This document (000020266) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 6

Situation

An OSD gets restarted automatically after its ceph-osd process terminates because of an assert failure, or because it received a segmentation fault signal (SIGSEGV) for other reasons.

Resolution

This behavior is intentional (see below for an explanation). To disable restarts altogether, create a drop-in directory for the ceph-osd service with "mkdir /etc/systemd/system/ceph-osd@.service.d/" and add a file "10-no-restart.conf" to it with the following content:

  [Service]
  Restart=no

Next, reload the systemd configuration with "systemctl daemon-reload" to notify systemd about the changed unit settings.
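
The steps above can be sketched as a small shell helper. This is only an illustration: the function name write_no_restart_dropin and its optional root-directory argument are assumptions made here so the helper can be tried against a scratch directory first, and the OSD ID 0 in the comments is a placeholder.

```shell
#!/bin/sh
# Sketch: create the no-restart drop-in for all ceph-osd@ instances.
# write_no_restart_dropin is a hypothetical helper name. The optional
# first argument overrides the systemd unit directory (for dry runs).
write_no_restart_dropin() {
    root="${1:-/etc/systemd/system}"
    dir="$root/ceph-osd@.service.d"

    # Create the drop-in directory if it does not exist yet.
    mkdir -p "$dir" || return 1

    # Write the override that disables automatic restarts.
    printf '[Service]\nRestart=no\n' > "$dir/10-no-restart.conf"
}

# On a real node, run as root:
#   write_no_restart_dropin && systemctl daemon-reload
# Then check the effective setting for one OSD (ID 0 is a placeholder):
#   systemctl show ceph-osd@0.service -p Restart
```

Removing the file "10-no-restart.conf" again and running "systemctl daemon-reload" returns the service to its packaged restart behavior.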

Cause

The systemd unit for ceph-osd is configured to restart the service automatically after a crash, up to a certain number of times within a configurable time frame.
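
To see which restart settings are actually in effect on a given node, the unit properties can be listed with "systemctl show". The following is a minimal sketch: the helper name show_restart_policy is hypothetical, and the unit name in the usage comment assumes an OSD with ID 0. It filters the output by prefix because the exact property names for the restart limits differ between systemd versions.

```shell
#!/bin/sh
# Sketch: print the restart-related properties of a systemd unit.
# show_restart_policy is a hypothetical helper name.
show_restart_policy() {
    unit="${1:?usage: show_restart_policy UNIT}"

    # Keep only properties whose names start with "Restart" or
    # "StartLimit" (restart mode, restart delay, burst and interval).
    systemctl show "$unit" | grep -E '^(Restart|StartLimit)'
}

# Example (OSD ID 0 is a placeholder):
#   show_restart_policy ceph-osd@0.service
```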

Additional Information

The advantage of automatically restarting the service is that incidents such as the daemon being terminated by the kernel's out-of-memory (OOM) killer will not leave the OSD down for an extended time until it is restarted manually.

However, in some situations, such as a failing disk causing the ceph-osd process to terminate via an assert() failure, it may be beneficial not to restart the process so that the downed OSD shows up in the cluster status.

Which behavior makes more sense depends on system administration preferences and the specific monitoring setup. The recommendation is to monitor disk health via its SMART status, but for some disk models, the SMART status does not reliably indicate a degrading disk. In that case, a slowly failing disk may go unnoticed for a long time before it finally breaks completely.

For more information on systemd services and their configuration parameters, please refer to https://www.freedesktop.org/software/systemd/man/systemd.service.html

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020266
  • Creation Date: 01-Jun-2021
  • Modified Date: 01-Jun-2021
    • SUSE Enterprise Storage
