Unrecoverable NTP time drift in XEN HVM domU if clocksource=tsc after live migration

This document (000020410) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15 SP3
SUSE Linux Enterprise Server 15 SP2
SUSE Linux Enterprise Server 15 SP1
SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12 SP5
SUSE Linux Enterprise Server 12 SP4
SUSE Linux Enterprise Server 12 SP3
SUSE Linux Enterprise Server 12 SP2

Situation

This document is only relevant if clocksource=tsc is used.

Only older hardware that does not yet support TSC scaling is affected.

Resolution

The solution is to use chrony, because it is more resilient to differences in CPU clock frequency and recovers better from time drift.

Cause

During the boot of the host, XEN will estimate the TSC frequency and report it in xl dmesg:  

# xl dmesg | grep Hz
  (XEN) Platform timer is 14.318MHz HPET
  (XEN) Detected 2399.999 MHz processor.


The MHz value varies across reboots and is not 100% accurate. The value is used to decide whether TSC access needs to be emulated. If the source and destination hosts calculated different values, TSC emulation will be used.

SUSE introduced the parameter suse_vtsc_tolerance=KHz to allow for some variance between CPU clock frequencies when migrating between hardware. It relaxes this check to provide native, unemulated access to the TSC.

If the value is too large, live migration will still take place but may cause an irrecoverable NTP drift within an HVM domU. ntpd can only handle differences of up to 500 ppm.
1 ppm is a difference of one microsecond per second.
500 ppm equates to roughly 1 MHz at a CPU base frequency of 2 GHz.
If 500 ppm is exceeded, ntpd silently stops working.
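
The ppm arithmetic above can be checked with a short calculation. This is a minimal sketch; the function name ppm_to_hz is made up for illustration and is not part of any tool. The kHz figure is printed only because suse_vtsc_tolerance is given in kHz. It is not a recommended setting.

```python
def ppm_to_hz(base_hz: float, ppm: float) -> float:
    """Return the frequency difference in Hz that `ppm` represents at `base_hz`."""
    return base_hz * ppm / 1_000_000


NTPD_MAX_PPM = 500  # ntpd limit as stated in this document

delta_hz = ppm_to_hz(2_000_000_000, NTPD_MAX_PPM)  # 2 GHz base frequency
print(f"{delta_hz / 1e6:.3f} MHz")  # difference expressed in MHz
print(f"{delta_hz / 1e3:.0f} kHz")  # same difference expressed in kHz
```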

During the boot of the dom0 the kernel will also estimate the TSC frequency and report it in dmesg:

# dmesg | grep -Ei '(clock|tsc|hz)'
...
[    0.012021] tsc: Detected 2399.998 MHz processor
...
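
To compare the values across hosts, the detected frequency can be extracted from both the XEN and the dom0 kernel message. The following is a rough sketch that works on the two message formats shown in this document; the helper name detected_mhz is hypothetical, and other kernel or XEN versions may word the message differently.

```python
import re
from typing import Optional

# Matches "Detected 2399.999 MHz processor" in both xl dmesg and dmesg output.
_DETECTED_MHZ = re.compile(r"Detected\s+([0-9]+(?:\.[0-9]+)?)\s+MHz processor")


def detected_mhz(log_text: str) -> Optional[float]:
    """Return the first detected processor frequency in MHz, or None if absent."""
    match = _DETECTED_MHZ.search(log_text)
    return float(match.group(1)) if match else None


xen_log = "(XEN) Detected 2399.999 MHz processor."
dom0_log = "[    0.012021] tsc: Detected 2399.998 MHz processor"
print(detected_mhz(xen_log), detected_mhz(dom0_log))
```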


An NTP client can run in dom0 to compare the speed of the system clock with the speed of an external reference clock. The calculated drift is used to adjust the rate at which time advances. Both ntpd and chronyd write the drift to a file at regular intervals. Extra logging can be enabled to see how the drift varies over time (ntpd writes statistics to "loopstats", chronyd writes them to "tracking.log").

The real TSC frequency can be estimated from the reported MHz value and the drift value.
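
A minimal sketch of that estimate follows. The function name real_tsc_hz and the 10 ppm drift value are made up for illustration. The sign convention is an assumption: a positive drift is taken to mean that the hardware runs faster than the reported value. Check the sign against the documentation of the NTP client in use before relying on the result.

```python
def real_tsc_hz(reported_mhz: float, drift_ppm: float) -> float:
    """Estimate the real TSC frequency in Hz.

    Assumption: positive drift_ppm means the TSC is faster than reported.
    """
    reported_hz = reported_mhz * 1_000_000
    return reported_hz * (1_000_000 + drift_ppm) / 1_000_000


# 2399.998 MHz is the value from the dmesg example; 10 ppm is hypothetical.
estimate = real_tsc_hz(2399.998, 10.0)
print(f"{estimate / 1e6:.6f} MHz")
```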

Similar to dom0, an HVM domU will also estimate the TSC frequency at boot time. The assumed frequency will be used for the entire runtime of the domU. If the clocksource is forced to be 'tsc' instead of 'xen' (via the cmdline options "clocksource=tsc nohz=off highres=off"), the actual hardware frequency of the TSC is expected to remain the same.

Based on the reported MHz values and the reported drift on both dom0s, the drift within the domU after the live migration can be estimated. The domU kernel expects the TSC to advance at a certain speed. If the frequency differs on the other host, the TSC counts fewer or more ticks per second than the domU assumes: time in the domU will run slower if the frequency is lower, and it will run faster if the frequency is higher. An NTP client can correct such differences up to a certain point. chronyd can handle a significantly higher drift than ntpd.

The drift is in ppm; us is 1,000,000, the number of microseconds per second.

src_dom0_real_Hz   dst_dom0_real_Hz
---------------- = ----------------
       us              us+drift

                   (us+drift) * src_dom0_real_Hz
dst_dom0_real_Hz = -----------------------------
                                us
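
The relation above can be evaluated in both directions. The sketch below solves it once for the drift and once for the destination frequency. The function names and the two example frequencies are hypothetical; a positive result means that the destination TSC is faster than the source TSC.

```python
US_PER_SECOND = 1_000_000  # "us" in the relation above


def expected_drift_ppm(src_real_hz: float, dst_real_hz: float) -> float:
    """Drift in ppm a domU booted on src would see after moving to dst."""
    return (dst_real_hz - src_real_hz) * US_PER_SECOND / src_real_hz


def dst_hz_for_drift(src_real_hz: float, drift_ppm: float) -> float:
    """Destination frequency in Hz that produces the given drift."""
    return (US_PER_SECOND + drift_ppm) * src_real_hz / US_PER_SECOND


src_hz = 2_399_999_000  # hypothetical real frequency of the source host
dst_hz = 2_400_100_000  # hypothetical real frequency of the destination host
drift = expected_drift_ppm(src_hz, dst_hz)
print(f"{drift:.2f} ppm, within ntpd limit: {abs(drift) <= 500}")
```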


Maximum target frequency for a migration from a 2500 MHz host, given the 500 ppm limit of ntpd:
(((1000*1000)+500) * (2.5*1000*1000*1000)) / (1000*1000) = 2501250000 Hz = 2501.250 MHz
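
The same calculation can be turned into a frequency window around the source host. This is a sketch under the assumption that the 500 ppm limit applies symmetrically in both directions; the function name target_window_mhz is made up for illustration.

```python
def target_window_mhz(src_mhz: float, max_ppm: float = 500.0) -> tuple:
    """Return (lowest, highest) destination frequency in MHz within max_ppm."""
    low = src_mhz * (1_000_000 - max_ppm) / 1_000_000
    high = src_mhz * (1_000_000 + max_ppm) / 1_000_000
    return low, high


low, high = target_window_mhz(2500.0)
print(f"{low:.3f} MHz .. {high:.3f} MHz")  # upper bound matches the example above
```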

To enable tracking in ntpd, add this to ntp.conf:
  statsdir /var/lib/ntp/drift/
  filegen loopstats  file loopstats  type day enable


If AppArmor is enabled, ntpd lacks the permissions to write the statistics files.

To enable tracking in chronyd, add this to chrony.conf:
  logdir /var/log/chrony
  log tracking


Restart the daemons.

The statistics files will be updated at varying intervals.
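
Once ntpd has written some loopstats entries, the frequency column can be read with a few lines of code. The sketch below assumes the field layout described in the ntpd documentation (day as MJD, seconds past midnight, offset in seconds, frequency in ppm, followed by further fields); verify this against the installed version. The sample line is purely illustrative and not measured data, and the names LoopSample and parse_loopstats_line are hypothetical. The tracking.log file of chronyd uses a different layout and is not handled here.

```python
from typing import NamedTuple


class LoopSample(NamedTuple):
    mjd: int          # modified Julian day
    seconds: float    # seconds past midnight (UTC)
    offset_s: float   # clock offset in seconds
    freq_ppm: float   # frequency (drift) in ppm


def parse_loopstats_line(line: str) -> LoopSample:
    """Parse the first four fields of one loopstats line (assumed layout)."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"unexpected loopstats line: {line!r}")
    return LoopSample(int(fields[0]), float(fields[1]),
                      float(fields[2]), float(fields[3]))


# Illustrative line only, not taken from a real system.
sample = "50935 75440.031 0.000006019 13.778190 0.000351733 0.013380 6"
print(parse_loopstats_line(sample).freq_ppm)
```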

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020410
  • Creation Date: 08-Oct-2021
  • Modified Date: 08-Oct-2021
  • SUSE Linux Enterprise Server

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com