Object Storage Daemons (OSDs) can fail due to an internal data inconsistency

This document (7024257) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 6
Ceph version 14.2.4

Situation

With Ceph 14.2.4, OSDs can fail due to an internal data inconsistency. This poses no immediate threat to data availability, because Ceph automatically re-replicates the data from the remaining OSDs to other OSDs.

Resolution

If OSDs crash and do not come up again (they are marked as out and down), check the respective OSD log for the following messages:

... 3 rocksdb: [db/db_impl_compaction_flush.cc:2660] Compaction error: Corruption: block checksum mismatch:...
... block checksum mismatch: expected XXXXXXXXXX, got XXXXXXXXXX in db/XXXXXX.sst offset ...
... In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]'
FAILED ceph_assert(available >= allocated) ...

If these messages appear in the log of the OSD that is out, re-deploy that OSD as soon as possible. Refer to the relevant SES 6 online documentation for the procedure.
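The log check described above can be sketched as a small Python helper. This is only an illustrative sketch, not a SUSE or Ceph tool: the function name find_signatures and the signature labels are hypothetical, and the search strings are the fixed parts of the messages quoted in this article.

```python
# Fixed substrings taken from the log messages quoted in this article.
# The labels (dictionary keys) are hypothetical names chosen for this sketch.
SIGNATURES = {
    "rocksdb compaction corruption": "Compaction error: Corruption",
    "block checksum mismatch": "block checksum mismatch",
    "allocator assertion": "FAILED ceph_assert(available >= allocated)",
}


def find_signatures(log_lines):
    """Return the sorted labels of all signatures found in the given lines."""
    found = set()
    for line in log_lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                found.add(label)
    return sorted(found)
```

To use it, open the log file of the affected OSD (typically somewhere under /var/log/ceph, although the exact path is an assumption that depends on the deployment) and pass the open file object to find_signatures. An empty result means none of the messages above were found in that log.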

NOTE: Before re-creating the OSD with the above procedure, you can first try running the command below for the affected OSD. If the OSD still crashes, follow the replacement procedure:

ceph-osd -i $ID --mkfs

Replace $ID with the number of the affected OSD. For example, if the affected OSD is OSD.13, the command is:

ceph-osd -i 13 --mkfs
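The $ID substitution above can also be sketched in Python, for cases where the OSD is known by its name (such as OSD.13) rather than its number. The helper name mkfs_command is hypothetical. It only builds the argument list for the command shown in this article and does not execute anything.

```python
def mkfs_command(osd_name):
    """Build the argument list for the command above, without running it.

    Accepts either a bare number ("13") or a name such as "OSD.13".
    """
    prefix, _, number = osd_name.strip().lower().rpartition(".")
    if prefix not in ("", "osd") or not (number.isascii() and number.isdigit()):
        raise ValueError(f"not an OSD name or ID: {osd_name!r}")
    return ["ceph-osd", "-i", number, "--mkfs"]
```

Returning a list rather than a single string keeps the ID as a separate argument, so the result could be handed to a process launcher without shell quoting concerns. Whether and how to run it on the OSD host is left to the operator.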

Cause

This is caused by a bug in the Ceph Nautilus code. The root cause is still under investigation.

Additional Information

Also see: https://tracker.ceph.com/issues/42223

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7024257
  • Creation Date: 13-Nov-2019
  • Modified Date: 03-Mar-2020
    • SUSE Enterprise Storage
