Unstable clock source issues
This document (000021799) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12
Situation
The kernel's clocksource watchdog may mark the current clock source as unstable and switch to a different one. This can happen under a number of different conditions.
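The clock source currently in use can be read from sysfs. The following is a minimal sketch, assuming the standard Linux sysfs location /sys/devices/system/clocksource/clocksource0 and an x86 system where the TSC would normally be selected. The function names are hypothetical and are used here for illustration only.

```python
from pathlib import Path

# Standard sysfs location for the clocksource state on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def summarize(current_text, available_text):
    """Turn the raw contents of the two sysfs files into a small summary."""
    current = current_text.strip()
    available = available_text.split()
    return {
        "current": current,
        "available": available,
        # Assumption: on x86 the TSC is normally the preferred clocksource,
        # so running on anything else may indicate an earlier switch.
        "tsc_in_use": current == "tsc",
    }


def read_clocksource_state(sysfs_dir=SYSFS_DIR):
    """Read the live state from sysfs (only works on a Linux system)."""
    return summarize(
        (sysfs_dir / "current_clocksource").read_text(),
        (sysfs_dir / "available_clocksource").read_text(),
    )
```

On a running system, read_clocksource_state() returns the summary for the live kernel. A result where "tsc_in_use" is False is only a hint that a switch may have taken place. The kernel log should be checked to confirm it.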
Resolution
The following released code updates help to reduce the occurrence of a clock source being marked as unstable:
commit c86ff8c55b8a ("clocksource: Avoid accidental unstable marking of clocksources") introduced some protection by relaxing the watchdog process when the system is loaded, so it is less likely that it will trigger as a false positive and switch the clocksource. This is backported to most supported SLE branches.
commit c37e85c135ce ("clocksource: Loosen clocksource watchdog constraints") increased the watchdog timeout for similar reasons. This is present in SLE15-SP6 and later kernels.
Cause
This problem can be a side effect of kernel stalls. Such stalls have been observed, for example, in a particular part of the block layer as a result of lost wake-ups.
Manufacturer BIOS, hardware firmware and other code have also been seen to cause some clock stability issues. It is also possible for external factors to contribute to an unstable clock source.
Additional Information
If a server reports an unstable clocksource on multiple occasions (and the server still exhibits the same issues after being fully patched, including all hardware BIOS, UEFI and firmware updates), then it is important not to ignore the problem and to attempt to identify the root cause.
During troubleshooting, it is useful to establish whether the clock source is considered stable during the boot process and only becomes unreliable at some later point in time.
If the root cause cannot be identified, the issue should be raised with SUSE.
Note that the 'tsc=nowatchdog' / 'tsc=reliable' setting should only be used as a temporary workaround. It must not be set while troubleshooting the root cause, as it will hide the real problem.
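Before troubleshooting starts, it can help to confirm that neither workaround is active on the running kernel. The following is a minimal sketch that checks a kernel command line string, such as the contents of /proc/cmdline, for the two settings named above. The function name is hypothetical.

```python
# The two workaround settings named in this document.
WORKAROUND_PARAMS = ("tsc=nowatchdog", "tsc=reliable")


def tsc_workarounds(cmdline):
    """Return the workaround parameters present in a kernel command line."""
    return [token for token in cmdline.split() if token in WORKAROUND_PARAMS]
```

For example, tsc_workarounds(open("/proc/cmdline").read()) returns an empty list when neither setting is in effect. This only reflects the command line of the currently booted kernel, so the boot loader configuration should also be reviewed.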
Other related code changes:
commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on qualified platorms")
commit b4bac279319d ("x86/tsc: Use topology_max_packages() to get package number")
Other related documentation:
TSC Clocksource Switching to HPET During High I/O Load
troubleshooting: broken TSC/clocksource
Note that the 'troubleshooting: broken TSC/clocksource' document mostly deals with nohz_full, which is a rare use case, but section 2.6 is relevant.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021799
- Creation Date: 22-Apr-2025
- Modified Date: 15-May-2025
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com