Cephadm host check intermittently fails
This document (000021074) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Enterprise Storage 7
Situation
After the timeout values for sshd have been changed to
ClientAliveInterval 600
ClientAliveCountMax 0
on a node, "ceph -s" shows
health: HEALTH_WARN
1 hosts fail cephadm check
roughly every twenty minutes.
Resolution
On the affected node, set ClientAliveInterval in /etc/ssh/sshd_config to a value other than 600, then restart the sshd service.
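As an illustration only, the relevant lines in /etc/ssh/sshd_config could look like the fragment below. The value 900 is an arbitrary placeholder chosen for this sketch; this document only requires that the value differ from 600.

```
# /etc/ssh/sshd_config (fragment)
# 900 is a hypothetical example value; any value other than 600 avoids the issue
ClientAliveInterval 900
ClientAliveCountMax 0
```

On a systemd-based system the service can then typically be restarted with "systemctl restart sshd".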
Cause
The OpenSSH version shipped with SUSE Linux Enterprise 15 SP2 and earlier terminates a connection immediately after ClientAliveInterval seconds if ClientAliveCountMax is set to zero. SUSE Enterprise Storage 7 is based on SUSE Linux Enterprise 15 SP2 and is therefore affected. The Ceph orchestrator performs a host check every ten minutes. If ClientAliveInterval is also set to 600 seconds (ten minutes), sshd is likely to terminate the existing connection just as Ceph tries to perform a host check, which can result in a spurious host check failure. Setting ClientAliveInterval to any other value avoids this situation.
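The condition described above can be checked mechanically. The following Python sketch is not part of any SUSE or Ceph tooling; the function names parse_sshd_options and is_risky are hypothetical. It makes several simplifying assumptions: values are given as plain seconds, the first occurrence of a keyword is the effective one, "Match" blocks are ignored, and unset options fall back to the stock OpenSSH defaults (ClientAliveInterval 0, ClientAliveCountMax 3).

```python
import re

# Host check interval of the Ceph orchestrator as stated in this document (seconds).
HOST_CHECK_INTERVAL = 600


def parse_sshd_options(text):
    """Return a dict of lower-cased keyword -> first value found in the text."""
    options = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Keyword and argument are separated by whitespace and/or "=".
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "match":
            break  # simplification: conditional Match blocks are not evaluated
        # sshd uses the first value it obtains for a keyword.
        options.setdefault(key, value)
    return options


def is_risky(text):
    """True if the interval equals the host check interval and the count is zero."""
    opts = parse_sshd_options(text)
    interval = opts.get("clientaliveinterval", "0")   # assumed default
    count_max = opts.get("clientalivecountmax", "3")  # assumed default
    return (
        interval.isdigit()
        and int(interval) == HOST_CHECK_INTERVAL
        and count_max.isdigit()
        and int(count_max) == 0
    )
```

To check a node, the contents of /etc/ssh/sshd_config can be read and passed to is_risky, for example is_risky(open("/etc/ssh/sshd_config").read()). A result of True only indicates the combination described in this document; whether the issue actually occurs also depends on the installed OpenSSH version.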
Additional Information
This issue does not happen with SUSE Enterprise Storage 7.1, which is based on SUSE Linux Enterprise 15 SP3, due to a newer OpenSSH version. ClientAliveCountMax set to zero disables connection termination there.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021074
- Creation Date: 17-May-2023
- Modified Date: 17-May-2023
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com