Troubleshooting failures caused by network issues outside of the OS

This document (000019863) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise in the Public Cloud

Situation

An unexpected service outage occurs. The OS logs show no connection or system problem, only failures of services that rely on the network. The system recovers after a reboot. Workloads in Public Cloud environments may be particularly affected.

Resolution

Collect network state information directly from the kernel on an ongoing basis. This record can be used to show that the service issues started after an external network failure.

From the command line, set up a bash script to detect a previous rtmon.log file, rotate it out, and then execute rtmon:
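echo '#!/bin/bash
# Per SUSE TID 000019863

# Timestamp used to name the rotated log
timestamp=$(date +%Y%m%d-%H%M)

# Rotate and compress a log left over from a previous run
if [ -f /var/log/rtmon.log ]; then
  mv /var/log/rtmon.log "/var/log/rtmon.log-$timestamp" && xz -z "/var/log/rtmon.log-$timestamp"
fi

# Record rtnetlink events to the binary log file
/usr/sbin/rtmon file /var/log/rtmon.log' > /usr/local/sbin/rtmon.sh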

Mark the new rtmon.sh script as executable:
chmod +x /usr/local/sbin/rtmon.sh

Create a systemd service unit configuration:
echo '[Unit]
Description=RTNetlink Monitor Daemon

[Service]
ExecStart=/usr/local/sbin/rtmon.sh

[Install]
WantedBy=network.target' > /etc/systemd/system/rtmon.service

Then enable and start the rtmon.service:
systemctl enable rtmon --now
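
After starting the service, it can be worth confirming that events are actually being captured. The following is a minimal sketch, not part of the original procedure: the helper name `rtmon_log_ok` is hypothetical, and it assumes rtmon writes its initial link dump to the log shortly after start, so a non-empty file indicates that capture is working.

```shell
#!/bin/bash
# rtmon_log_ok: hypothetical helper for illustration only.
# Succeeds if the given capture file exists and is non-empty.
rtmon_log_ok() {
  [ -s "$1" ]
}

# Example (run as root on the monitored system):
#   systemctl is-active rtmon && rtmon_log_ok /var/log/rtmon.log && echo "rtmon is capturing"
```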

The log file is binary. To view its contents, run:
ip monitor file /var/log/rtmon.log
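
Because the decoded events are plain text, standard tools can narrow them down to the moments of interest. This is a minimal sketch, assuming the decoded link lines follow the `ip link show` layout described under Additional Information; the function name `link_down_events` is hypothetical and not part of iproute2.

```shell
#!/bin/bash
# link_down_events: hypothetical helper for illustration only.
# Reads decoded `ip monitor` text on stdin and prints only the lines that
# report a lost carrier (NO-CARRIER) or an operational state of DOWN.
link_down_events() {
  grep -E 'NO-CARRIER|state DOWN'
}

# Example (run as root on the affected system):
#   ip monitor file /var/log/rtmon.log | link_down_events
```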

Cause

External network outage.

Additional Information

The events can be monitored using the `ip monitor` utility, which shows them in a similar way to `ip link show` (and `ip addr show`):
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0d:3a:c6:47:fd brd ff:ff:ff:ff:ff:ff

That is, each event is shown as:
<ifindex>: <ifname[@link]>: <ifflags> <attributes>
where:
- <ifindex>: interface number
- <ifname>: currently assigned interface name (cannot be changed while the interface is UP)
- [@link]: the binding to the underlying interface,
    e.g. vlan interface `bond0.42` on top of `bond0`
- <ifflags>:
    - UP: administratively enabled/started/`ip link set up`
    - LOWER_UP: carrier detected (on an UP interface)
    - NO-CARRIER: inverted LOWER_UP, shown when LOWER_UP is not set
- <attributes>:
    - Device attributes such as mtu, qlen, operational link state, etc.

When the carrier is detected or inherited from underlying interfaces,
the kernel sends a NEWLINK message with the IFF_UP and IFF_LOWER_UP bits set.
When the carrier is lost, the kernel sends a NEWLINK message without IFF_LOWER_UP,
and on `ip link set down` it sends one without the IFF_UP flag.
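
The flag rules above can be sketched as a small classifier over a single link line. This is an illustrative sketch only: the function name `link_flags_summary` is hypothetical, and it assumes the line starts with the `<ifindex>: <ifname[@link]>: <ifflags>` layout shown earlier, without any prefix that `ip monitor` may add for some event types.

```shell
#!/bin/bash
# link_flags_summary: hypothetical helper for illustration only.
# Takes one link line in the "<ifindex>: <ifname[@link]>: <ifflags> ..." layout
# and prints "<ifname> admin=<up|down> carrier=<up|down>".
link_flags_summary() {
  local line=$1 name flags admin=down carrier=down
  # The second ": "-separated field is the interface name (possibly with @link).
  name=$(printf '%s\n' "$line" | awk -F': ' '{print $2}')
  # The flags are the comma-separated list between "<" and ">".
  flags=$(printf '%s\n' "$line" | sed -n 's/^[^<]*<\([^>]*\)>.*/\1/p')
  # UP means administratively enabled; LOWER_UP means carrier detected.
  case ",$flags," in *,UP,*) admin=up ;; esac
  case ",$flags," in *,LOWER_UP,*) carrier=up ;; esac
  printf '%s admin=%s carrier=%s\n' "$name" "$admin" "$carrier"
}

# Example:
#   link_flags_summary '2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP'
```

An interface reported with admin=up and carrier=down matches the NO-CARRIER case: the OS still has the interface enabled, but the link beneath it has gone away.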

More information about interface states (and further operstates) can be found here:

https://www.kernel.org/doc/Documentation/networking/operstates.txt

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019863
  • Creation Date: 04-Mar-2021
  • Modified Date: 05-Mar-2021
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com