Case files of a TSE: Would you have the time?
October 13, 2021 | By: Anthony Stalker
This is the first part of a series that attempts to showcase the kind of work that SUSE Support does and how we help customers resolve issues when running SUSE Products. The cases that are selected will be based on real cases. However, all details will be fully anonymized and stripped of identifying marks.
This is a case where the time from when I took it to when it was resolved happened to be about half an hour. Being half an hour late might not mean a lot to some people, but computer systems are much more sensitive to time and need it to be accurate and synchronized. That’s why it’s crucial to have a solid NTP (Network Time Protocol) infrastructure. This case shows how important attention to detail can be when troubleshooting a system.
We configured our server in yast to use only local ntp server as time sources. It looks that yast wrote the config to /etc/chrony.conf:
pool ntp01.customer.com iburst
pool ntp02.customer.com iburst
But I see on firewall plenty of drops while the server tries to reach ntp sources outside in internet.
Why? How can I stop polling server outside of two defined in conf?
The customer also attached the output of the sources command from the chronyc shell (chronyc> sources), which showed the NTP sources being polled. It contained reachable local pools and some unreachable internet sources. For more information on interpreting the output of `ntpq -p` and `chronyc sources -v`, please see a TID that I wrote on the subject.
Notes on timekeeping
What the customer wants to achieve here is excellent practice in timekeeping. It’s correct that they want to use local time sources. This has the potential to improve security, and it’s the only way to give time to servers which are not connected to the internet. Perhaps the greatest advantage is that we would expect the local time sources to be consistent with each other. Accurate time is important, but consistent relative time across the network is the most important thing.
It’s also good practice to configure either 1 time-server (no redundancy, but also no dispute about the time) or 3. Two is probably the worst number of time-servers, since if they report different times, there is no possibility of consensus. It’s one of the reasons we always want at least 3 nodes in a High Availability Cluster and why 2 is the worst number of hardware watchdog devices in a cluster.
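The consensus argument can be illustrated with a small sketch. This is not chrony's actual source selection algorithm, only a simplified majority vote; the function name `agreeing_majority`, the source names and the 50 ms tolerance are assumptions made up for the illustration.

```python
def agreeing_majority(offsets, tolerance=0.05):
    """Return the names of the largest group of sources whose offsets agree
    within `tolerance` seconds, or None if that group is not a strict majority.

    Simplified illustration only; real NTP clients use a more careful
    interval-based selection.
    """
    best = []
    names = sorted(offsets)
    for anchor in names:
        # Everyone whose offset is close to the anchor's offset.
        group = [n for n in names
                 if abs(offsets[n] - offsets[anchor]) <= tolerance]
        if len(group) > len(best):
            best = group
    # A strict majority is needed to outvote a source that is wrong.
    if len(best) * 2 > len(offsets):
        return best
    return None


print(agreeing_majority({"a": 0.001}))                        # ['a']
print(agreeing_majority({"a": 0.001, "b": 1.8}))              # None
print(agreeing_majority({"a": 0.001, "b": 1.8, "c": 0.003}))  # ['a', 'c']
```

With one source there is nothing to dispute, with two disagreeing sources nobody wins, and with three the outlier is outvoted.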
What the customer has configured here, though, are 2 pools of time-servers. A pool is basically just a list of servers, so if one of the servers shows a time that’s really off or if it’s unreachable, NTP will just get the time from the next server in the “list.” So, what the customer wants here is very reasonable. They’ve configured some local time-servers, but have all these unreachable internet time sources polluting their configuration.
However, they can’t figure out where they are all coming from. Quite reasonably, they wrote in a specific question: Why is the system trying to reach these internet time sources? How do I fix it?
The problem and the fix
What the customer didn’t notice is that they had an “include” directive in their /etc/chrony.conf configuration:
# Also include any directives found in configuration files in /etc/chrony.d
include /etc/chrony.d/*.conf
And there was a drop-in file called ‘/etc/chrony.d/pool.conf’ present with the contents:
pool 2.opensuse.pool.ntp.org iburst
What was happening is that chrony parsed the /etc/chrony.conf, hit the “include” directive, followed it and included the internet pool.
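To make this kind of hidden source easier to spot, the lookup can be sketched in a few lines of Python. This is a minimal illustration, not a SUSE or chrony tool: the function name `find_time_sources` is hypothetical, it only follows `include` directives and only reports `pool` and `server` lines, and the demo runs against a throwaway directory that mimics the files from this case rather than the real /etc.

```python
import glob
import os
import tempfile


def find_time_sources(conf_path, seen=None):
    """Return (file, line_number, line) for every active pool or server
    directive reachable from conf_path through include directives."""
    seen = set() if seen is None else seen
    real = os.path.realpath(conf_path)
    if real in seen:  # guard against include loops
        return []
    seen.add(real)
    found = []
    with open(conf_path, encoding="utf-8") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.strip()
            # Skip blank lines and chrony comment lines (#, ;, ! or %).
            if not line or line[0] in "#;!%":
                continue
            words = line.split()
            keyword = words[0].lower()
            if keyword in ("pool", "server"):
                found.append((conf_path, number, line))
            elif keyword == "include" and len(words) > 1:
                # The include argument may be a wildcard pattern.
                for included in sorted(glob.glob(words[1])):
                    found.extend(find_time_sources(included, seen))
    return found


# Demo: rebuild the situation from this case in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "chrony.d"))
    main_conf = os.path.join(root, "chrony.conf")
    with open(main_conf, "w", encoding="utf-8") as handle:
        handle.write("pool ntp01.customer.com iburst\n"
                     "pool ntp02.customer.com iburst\n"
                     f"include {root}/chrony.d/*.conf\n")
    drop_in = os.path.join(root, "chrony.d", "pool.conf")
    with open(drop_in, "w", encoding="utf-8") as handle:
        handle.write("pool 2.opensuse.pool.ntp.org iburst\n")
    for path, number, line in find_time_sources(main_conf):
        print(f"{os.path.relpath(path, root)}:{number}: {line}")
```

Pointed at a real /etc/chrony.conf, the same function would list every file that contributes a time source, which is the detail that was easy to miss here.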
The likely origin is that the customer left the option to synchronize with internet time-servers ticked when setting up the NTP service with YaST.
Now that we know what the problem was, how do we fix it? Since this is Linux, it’s straightforward. I suggested two options. The first is to comment out the ‘include’ line, which tells chrony to ignore anything outside /etc/chrony.conf; that is fine as long as no other included files provide important functionality. The second is to delete the ‘/etc/chrony.d/pool.conf’ file completely or comment out its only line.
When we’re troubleshooting, you will often hear SUSE Support ask you to “comment out” one or several lines in a file. What does this mean, and why do we do it?
Commenting out a line in a configuration file means putting a comment character at the beginning of the line. This is most often, but not always, “#”. In Samba configurations, for example, “;” is preferred for comments.
cat /etc/fstab
# This text can be considered inert.
# Nothing on this line will be parsed as part of the configuration.
# This can be useful to document changes and deviations from standard configurations.
UUID=f8626266-d78c-4c21-bc08-962bde178cdb / btrfs defaults 0 0
UUID=7facfae7-41e5-4699-9d78-555b032be5c8 swap swap defaults 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /var btrfs subvol=/@/var 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /usr/local btrfs subvol=/@/usr/local 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /tmp btrfs subvol=/@/tmp 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /srv btrfs subvol=/@/srv 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /root btrfs subvol=/@/root 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /opt btrfs subvol=/@/opt 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /home btrfs subvol=/@/home 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /boot/grub2/x86_64-efi btrfs subvol=/@/boot/grub2/x86_64-efi 0 0
UUID=f8626266-d78c-4c21-bc08-962bde178cdb /boot/grub2/i386-pc btrfs subvol=/@/boot/grub2/i386-pc 0 0
#UUID=6CC5-4290 /boot/efi vfat defaults 0 2
In this case, the /boot/efi filesystem line is “commented out”, meaning that if, for example, I run a mount -a command, the system will not attempt to mount it. We use this to isolate an issue and to quickly adjust configuration items without overwriting them.
The customer wrote back shortly afterwards to confirm that he was satisfied with the solution and how we handled the case.
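The effect of a comment character can also be shown programmatically. The sketch below is purely illustrative: the helper names `comment_out` and `active_lines` are made up, it treats only “#” as the comment marker, and it works on a string rather than editing /etc/fstab itself.

```python
def comment_out(text, needle, marker="#"):
    """Prefix every active line that contains `needle` with `marker`."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        is_active = bool(stripped) and not stripped.startswith(marker)
        if is_active and needle in line:
            line = marker + line
        result.append(line)
    return "\n".join(result)


def active_lines(text, marker="#"):
    """Return only the lines a parser would still act on."""
    return [line for line in text.splitlines()
            if line.strip() and not line.strip().startswith(marker)]


fstab = (
    "UUID=f8626266-d78c-4c21-bc08-962bde178cdb / btrfs defaults 0 0\n"
    "UUID=6CC5-4290 /boot/efi vfat defaults 0 2"
)
edited = comment_out(fstab, "/boot/efi")
print(edited)                # the /boot/efi line now starts with '#'
print(active_lines(edited))  # only the root filesystem line remains
```

The original line is still in the file, so reverting the change only means removing the marker again.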
Sometimes the solution can be easy, but spotting the problem can require attention to detail, close reading and a dash of knowing where to look.
Have you ever seen an issue where all it took to fix it was changing one character or removing a line? Did you ever read a config file over and over again, only for a colleague to point out the issue at first glance? Were you that colleague who spotted it? I know I’ve been both. Please don’t hesitate to share your stories in the comments below!
Anthony Stalker
I'm a Frontline Technical Support Engineer working in the EMEA region. If you open a Support Case with SUSE for a technical problem, maybe I will be the person who helps you get to a solution.