Case files of a TSE: Would you have the time?

October 13, 2021 | By: Anthony Stalker

This is the first part of a series that attempts to showcase the kind of work that SUSE Support does and how we help customers resolve issues when running SUSE Products. The cases that are selected will be based on real cases. However, all details will be fully anonymized and stripped of identifying marks.

This is a case where the time from when I took it to when it was resolved happened to be about half an hour. Being half an hour late might not mean a lot to some people, but computer systems are much more sensitive to time and need it to be accurate and synchronized. That’s why it’s crucial to have a solid NTP (Network Time Protocol) infrastructure. This case shows how important attention to detail can be when troubleshooting a system.

Customer’s description:

We configured our server in yast to use only local ntp server as time sources. It looks that yast wrote the config to /etc/chrony.conf:

pool ntp01.customer.com iburst
pool ntp02.customer.com iburst

But I see on firewall plenty of drops while the server tries to reach ntp sources outside in internet.

Why? How can I stop polling server outside of two defined in conf?

There was also the output of chronyc> sources from the chrony shell, showing which NTP sources were being polled. It contained reachable local pools and some unreachable internet sources. For more information on interpreting the output of `ntpq -p` and `chronyc sources -v`, please see a TID that I wrote on the subject.

Notes on timekeeping

What the customer wants to achieve here is excellent practice in timekeeping. It’s correct that they want to use local time sources. This has the potential to improve security, and it’s the only way to give time to servers which are not connected to the internet. Perhaps the greatest advantage is that we would expect the local time sources to be consistent with each other. While having accurate time is important, having a consistent relative time across the network is most important.

Also good practice is having either 1 time-server (no redundancy, but also no dispute about the time) or at least 3 time-servers configured. Two is probably the worst number of time-servers, since if they tell different times, there is no possibility of consensus. It’s one of the reasons we always want at least 3 nodes in a High Availability Cluster and why 2 is the worst number of hardware watchdog devices in a cluster.
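To make the point about two disagreeing time-servers concrete, here is a toy sketch in shell. It is not chrony's source selection algorithm, just a simple majority count under my own assumptions: offsets are whole milliseconds, and the function name has_majority and the tolerance argument are made up for this illustration.

```shell
# Toy illustration only: for each reported offset, count how many sources
# agree with it within a tolerance, and report whether a strict majority exists.
# Usage: has_majority TOLERANCE_MS OFFSET_MS [OFFSET_MS ...]
has_majority() {
    tolerance="$1"; shift
    total=$#
    for candidate in "$@"; do
        agree=0
        for other in "$@"; do
            diff=$(($candidate - $other))
            # Take the absolute value of the difference.
            [ "$diff" -lt 0 ] && diff=$((0 - $diff))
            [ "$diff" -le "$tolerance" ] && agree=$(($agree + 1))
        done
        # A strict majority means more than half of all sources agree.
        if [ $(($agree * 2)) -gt "$total" ]; then
            echo "majority around ${candidate} ms"
            return 0
        fi
    done
    echo "no majority"
    return 1
}
```

With two sources that are half an hour apart, `has_majority 10 0 1800000` finds no majority, because each source only agrees with itself. Add a third source close to the first, as in `has_majority 10 0 5 1800000`, and the outlier is outvoted.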

What the customer has configured here, though, are 2 pools of time-servers. A pool is basically just a list of servers, so if one of the servers shows a time that’s really off or if it’s unreachable, NTP will just get the time from the next server in the “list.” So, what the customer wants here is very reasonable. They’ve configured some local time-servers, but have all these unreachable internet time sources polluting their configuration.

However, they can’t figure out where they are all coming from. Quite reasonably, they wrote in a specific question: Why is the system trying to reach these internet time sources? How do I fix it?

The problem and the fix

What the customer didn’t notice is that they had an “include” directive in their /etc/chrony.conf configuration:

# Also include any directives found in configuration files in /etc/chrony.d
include /etc/chrony.d/*.conf

And there was a drop-in file called ‘/etc/chrony.d/pool.conf’ present with the contents:

pool 2.opensuse.pool.ntp.org iburst

What was happening was that chrony parsed /etc/chrony.conf, hit the “include” directive, followed it and picked up the internet pool.
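A quick way to see everything chrony will pick up is to search the main file and the included files together. The following is a minimal sketch, not an official SUSE tool: the function name list_time_sources is my own, and it only follows one level of “include” directives with absolute paths.

```shell
# List every active "pool" or "server" directive in a chrony configuration,
# including files pulled in through "include" lines of the main file.
# Usage: list_time_sources [PATH_TO_CHRONY_CONF]
list_time_sources() {
    conf="${1:-/etc/chrony.conf}"
    files="$conf"
    # The command substitution is deliberately unquoted so that the shell
    # expands glob patterns such as /etc/chrony.d/*.conf into file names.
    for f in $(awk '$1 == "include" { print $2 }' "$conf"); do
        [ -f "$f" ] && files="$files $f"
    done
    # -H prefixes each match with the file it came from.
    grep -HE '^[[:space:]]*(pool|server)[[:space:]]' $files
}
```

Because each match is prefixed with its file name, a source that comes from a drop-in file rather than from /etc/chrony.conf stands out immediately. Commented-out lines are skipped, so only directives that are actually parsed are shown.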

The likely origin is that the customer left the box for synchronizing with internet time-servers ticked when setting up the NTP service with YaST.

Now that we know what the problem was, how do we fix it? Since this is Linux, it’s straightforward and effortless. I suggested two options. The first is commenting out the ‘include’ line, which tells chrony to ignore anything outside /etc/chrony.conf; that is fine as long as no other included files provide important functionality. The second is deleting the ‘/etc/chrony.d/pool.conf’ file completely or commenting out its only line.
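For the option of commenting out the pool line, a small helper along these lines would do it while keeping a backup. This is a sketch under my own assumptions: comment_out_pools is a hypothetical name, and the backup simply gets a .bak suffix, which means it no longer matches the *.conf pattern of the include directive.

```shell
# Comment out every active "pool" line in a chrony drop-in file,
# keeping a backup copy with a .bak suffix next to it.
# Usage: comment_out_pools PATH_TO_FILE
comment_out_pools() {
    file="$1"
    cp -p "$file" "$file.bak" || return 1
    # Prefix uncommented pool lines with "#"; all other lines stay untouched.
    sed 's/^[[:space:]]*pool[[:space:]]/#&/' "$file.bak" > "$file"
}
```

After running `comment_out_pools /etc/chrony.d/pool.conf` as root, restart the service with `systemctl restart chronyd` and check `chronyc sources` to confirm that only the local sources remain.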

Practice point:

When we’re troubleshooting, you will often hear SUSE Support ask you to “comment out” one or several lines in a file. What does this mean, and why do we do it?

Commenting out a line in a configuration means putting a comment character at the beginning of a line. This is most often, but not always, “#”. In samba configurations, “;” is preferred for comments, for example.

cat /etc/fstab

# This text can be considered inert.
# Nothing on this line will be parsed as part of the configuration. 
# This can be useful to document changes and deviations from standard configurations.

UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /                       btrfs  defaults                      0  0
UUID=7facfae7-41e5-4699-9d78-555b032be5c8  swap                    swap   defaults                      0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /var                    btrfs  subvol=/@/var                 0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /usr/local              btrfs  subvol=/@/usr/local           0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /tmp                    btrfs  subvol=/@/tmp                 0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /srv                    btrfs  subvol=/@/srv                 0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /root                   btrfs  subvol=/@/root                0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /opt                    btrfs  subvol=/@/opt                 0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /home                   btrfs  subvol=/@/home                0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /boot/grub2/x86_64-efi  btrfs  subvol=/@/boot/grub2/x86_64-efi  0  0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb  /boot/grub2/i386-pc     btrfs  subvol=/@/boot/grub2/i386-pc  0  0
#UUID=6CC5-4290                             /boot/efi               vfat   defaults                      0  2

In this case, the /boot/efi filesystem line is “commented out”, meaning that if, for example, I run a mount -a command, the system will not attempt to mount it. We use this to isolate the issue and quickly adjust configuration items without overwriting them.
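When a file has many comments, it helps to look only at the lines that are actually parsed. Here is a minimal sketch; active_lines is a hypothetical helper name, and it only treats “#” as the comment character, so it would not catch “;” comments in a samba configuration.

```shell
# Print only the lines of a configuration file that will be parsed:
# everything except blank lines and lines whose first non-blank character is "#".
# Usage: active_lines PATH_TO_FILE
active_lines() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

Run against the /etc/fstab above, `active_lines /etc/fstab` would list every UUID line except the commented-out /boot/efi entry.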

Conclusion

The customer wrote back shortly afterwards to confirm that he was satisfied with the solution and with how we handled the case.

Sometimes the solution can be easy, but spotting the problem can require attention to detail, close reading and a dash of knowing where to look.

Have you ever seen an issue where all it took to fix it was changing one character or removing a line? Did you ever read a config file over and over again, only for a colleague to point out the issue at first glance? Were you that colleague who spotted it? I know I’ve been both. Please don’t hesitate to share your stories in the comments below!

Oct 07th, 2022
