MDS daemon fails to start with "FAILED assert(g_conf->mds_wipe_sessions)" message

This document (000020284) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 5.5

Situation

The MDS daemon fails to start with the message "FAILED assert(g_conf->mds_wipe_sessions)".

The crash output is shown below:
0> 2021-06-05 21:21:09.112447 7f3a4baf6700 -1 /home/abuild/rpmbuild/BUILD/ceph-12.2.13-706-gff66d09906/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7f3a4baf6700 time 2021-06-05 21:21:09.109003
/home/abuild/rpmbuild/BUILD/ceph-12.2.13-706-gff66d09906/src/mds/journal.cc: 1602: FAILED assert(g_conf->mds_wipe_sessions)

 ceph version 12.2.13-706-gff66d09906 (ff66d09906a7c2d8f4dbf1d17cbdfce9c10483ca) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x5654ce7c3e7e]
 2: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x4506) [0x5654ce76cf46]
 3: (EUpdate::replay(MDSRank*)+0x26) [0x5654ce76e906]
 4: (MDLog::_replay_thread()+0x602) [0x5654ce725412]
 5: (MDLog::ReplayThread::entry()+0xd) [0x5654ce4af7bd]
 6: (()+0x96b4) [0x7f3a58a5d6b4]
 7: (clone()+0x6d) [0x7f3a57a8a2dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Resolution

- Ensure that no clients are trying to connect to CephFS.
- Add "mds_wipe_sessions = true" to ceph.conf in the [global] or [mds] section.
- Start the MDS daemon. If it starts successfully, remove the "mds_wipe_sessions = true" setting and restart the MDS to confirm that it no longer crashes without the option. If it still crashes, add the setting back.
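
As a sketch of the second step, the temporary setting might look like the following fragment of ceph.conf. It assumes the [mds] section is used; per the steps above, the [global] section is also valid. The comments are illustrative only.

```ini
[mds]
# Temporary workaround: allow the MDS to wipe the session map during
# journal replay instead of failing the assert.
# Remove this line again once the MDS has started successfully.
mds_wipe_sessions = true
```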


Cause

The error occurs when the MDS tries to replay an open session from the journal. The code that triggers the crash was added in v12.2.12. The `FAILED assert(g_conf->mds_wipe_sessions)` message means that, by default, the MDS terminates in this situation unless the mds_wipe_sessions option is set. The cluster log records the condition as follows:
> 2021-06-05 00:00:03.309852 7f51730f0700 -1 log_channel(cluster) log [ERR] : error replaying open sessions(0) sessionmap v 223454462 table 0

This message shows that the MDS currently has 0 sessions (the session map is empty) and that the session map version is unexpectedly 0. When a session journal event is replayed, the session map version is expected to be no lower than the version of the event being replayed, which here is 223454462.

With mds_wipe_sessions set to true, the MDS no longer crashes in this case. Instead, it wipes the sessions from the session map (which is already empty anyway) and bumps the session map version to the event version.
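
The behavior described above can be illustrated with a small Python model. This is a simplified sketch based only on the description in this document, not the Ceph source code; the function name `replay_session_event`, its parameters, and the exception class are hypothetical and exist only for illustration.

```python
class SessionMapVersionError(Exception):
    """Hypothetical stand-in for the failed assert that aborts the MDS."""


def replay_session_event(table_version, event_version, wipe_sessions):
    """Return the session map version after replaying one journal event.

    table_version: current session map version (the "table" value in the log)
    event_version: version recorded in the journal event ("sessionmap v")
    wipe_sessions: models the mds_wipe_sessions option
    """
    if table_version >= event_version:
        # The session map is at or ahead of the event, as expected.
        return table_version
    if not wipe_sessions:
        # Default behavior: the MDS terminates.
        raise SessionMapVersionError(
            "error replaying open sessions: sessionmap v %d table %d"
            % (event_version, table_version)
        )
    # With the option set: sessions are wiped and the session map
    # version is bumped to the event version.
    return event_version
```

With the values from the log line above, `replay_session_event(0, 223454462, False)` raises the error, while `replay_session_event(0, 223454462, True)` returns the event version and replay can continue.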
 

Status

Top Issue

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020284
  • Creation Date: 11-Jun-2021
  • Modified Date: 11-Jun-2021
    • SUSE Enterprise Storage
