NFS file system hangs. New mount attempts hang also.

This document (000019722) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server

Situation

An NFS mount appears to be hung or stalled indefinitely. New NFS mount attempts, pointing to the same server, may also hang.

Network packet captures show that the NFS client is sending out packets destined to the NFS Server's port 2049, but no responses are seen. Often (but not always) this will take the form of a TCP SYN packet being sent but getting no reply. In those specific cases (no reply to SYN), evidence of the connection attempt may be seen on the NFS client machine with the commands:

netstat -nt | grep :2049
ss -nt | grep :2049

If one or more connections show the status "SYN_SENT" (displayed as "SYN-SENT" by ss) for an extended period of time, then something is blocking these attempts. Normally, a connection remains in this state for only a fraction of a second, because the other side replies and the state changes.
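As a rough sketch of the check described above, the following shell function counts connections to port 2049 that are still waiting for a reply to their SYN. It reads "ss -nt" style output on standard input. The function name count_syn_sent_nfs is illustrative only and is not part of any SUSE tool.

```shell
# Count TCP connections to port 2049 that are still in SYN-SENT state.
# Reads "ss -nt" output on stdin (ss prints the state as SYN-SENT).
count_syn_sent_nfs() {
    awk '$1 == "SYN-SENT" && $5 ~ /:2049$/ { n++ } END { print n + 0 }'
}

# Example: ss -nt | count_syn_sent_nfs
```

Running the example several seconds apart and getting a non-zero count each time suggests, but does not prove, that connection attempts are being blocked.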

There might also be occurrences of the following message in the NFS client machine's /var/log/messages:

nfs: server <hostname or address> not responding

However, packets on other connections (both new and old) between the same client machine and server machine may be having no problems. This is what sets this scenario apart from most other cases of "nfs: server not responding": There is not a global problem affecting all communication between the two devices.

NOTE: There can be many different causes of a hanging NFS mount, or of the "not responding" logs. This document discusses only one unique type of scenario.
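To see which servers the "not responding" message refers to and how often it appears, a small helper along these lines may be useful. It is only a sketch: the function name is hypothetical, and it assumes the log lines contain the text "nfs: server <name> not responding" as shown above.

```shell
# Count "not responding" messages per NFS server.
# Reads syslog-style lines on stdin and prints "<count> <server>" per server.
nfs_not_responding_by_server() {
    sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Example: nfs_not_responding_by_server < /var/log/messages
```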

Resolution

If the problem has already occurred, a reboot of the NFS client machine will likely be necessary, though some other options exist. Read the rest of this section plus the "Cause" section for the various options and details.

The unique scenario described above happens because many modern firewalls and smart routers will detect and block TCP connection reuse, even though connection reuse is a practice which NFS has traditionally relied upon.

The evolution of the NFS specification plus recent code changes have allowed NFS 4.1 and 4.2 clients to stop relying on connection reuse. So if either of those versions of NFS is used, and if new enough updates are present (late 2021 or later), this issue can be avoided. The change is present as follows:

SLES 15 SP4 and SP5: Present in the originally shipping kernel-default.
SLES 15 SP3: Present beginning in kernel-default-5.3.18-59.24.1.
SLES 15 SP2: Present beginning in kernel-default-5.3.18-24.83.2.
SLES 12 SP5: Present beginning in kernel-default-4.12.14-122.77.1.

This change / update is needed on NFS Client systems, not on NFS Servers.
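The minimum versions listed above can be compared against an installed kernel-default package with a helper such as the one below. This is a sketch under assumptions: version_ge is a hypothetical name, and it relies on GNU "sort -V", which approximates but is not identical to RPM's own version comparison.

```shell
# Return success if version $1 is the same as or newer than version $2.
# Uses GNU "sort -V": the older of the two versions sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example for SLES 15 SP3 (minimum version taken from the list above):
# installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel-default | sort -V | tail -n 1)
# version_ge "$installed" 5.3.18-59.24.1 && echo "fix present" || echo "update needed"
```

Keep in mind that the newest installed package is not necessarily the running kernel; the change only takes effect once the client has booted into the updated kernel.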

For SUSE Linux products other than regular SLES, or for different flavors of the kernel, you can also check for these changes within the output of the following command (replace "<kernel-package-name>" with the installed package name):

rpm -q --changelog <kernel-package-name> | grep -A2 -B2 1186264

- SUNRPC: prevent port reuse on transports which don't request it
  (bnc#1186264 bnc#1189021).
- commit a89b568
- kabi fix for NFSv4.1: Don't rebind to the same source port when
  reconnecting to the server
  (bnc#1186264 bnc#1189021)
- commit 844eb4c
- NFSv4.1: Don't rebind to the same source port when
  reconnecting to the server
  (bnc#1186264 bnc#1189021)
- commit 4b89a40
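The changelog check above can be wrapped in a small function when several kernel packages need to be inspected. This is a minimal sketch: the function name changelog_has_fix and the package glob in the example are assumptions and should be adapted to the kernel flavor in use.

```shell
# Return success if the changelog text on stdin mentions bug 1186264.
changelog_has_fix() {
    grep -q 'bnc#1186264'
}

# Example: check every installed kernel package (the glob is an assumption).
# for pkg in $(rpm -qa 'kernel-*'); do
#     if rpm -q --changelog "$pkg" | changelog_has_fix; then
#         echo "$pkg: fix present"
#     else
#         echo "$pkg: fix not found"
#     fi
# done
```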

For NFS v4.0, v3, v2, or for older Linux distributions where the change is not available, the options for dealing with connection-reuse blockage are:

1. Reset (power cycle) the smart router (or other device) which has decided to block the specific reused connection between the NFS client and NFS server. Better yet, completely turn off the "smart connection reuse" feature of that device, so this won't happen again in the future.

2. If the NFS file system is mounted again with a different NFS version, a new connection definition will be used and the existing blockage can be avoided. This trick will only work once per version (until the NFS client is rebooted). The newly chosen version may eventually become blocked as well, for the same reason. NFS version is controlled with the mount option nfsvers= or vers= and can be set to 3, 4.0, 4.1, or 4.2. (On older Linux distributions such as SLES 11, mounting with minor versions needs to be specified in the format "vers=4,minorversion=1").

Mounting with another version may allow work to resume without reboot, but applications may still experience disruption upon this transition, because any resources, files, locks, etc. which were held will be lost and have to be obtained again. Chances are, however, that those resources were already lost after the initial blockage. Additionally, different versions support different features. In rare scenarios, functionality may change upon switching versions.

3. Reboot the NFS client. In theory, a successful umount of all of the NFS client machine's mounts which point to the same NFS Server may be enough to get around this problem, but this is not always as easy as it sounds. To explain in more detail:

The underlying need is to get the NFS/RPC layers at the NFS client machine to forget the connection definition used previously. To do that, it must forget the details of the NFS mount. This is extremely difficult, because:

a. Any existing mount that is already suffering from this problem will typically not umount, because the file system is considered "busy".

b. If the mount is considered "busy", some administrators might attempt to use umount -l (lazy umount) or -f (forced umount). But these often do not accomplish a clean umount, even if they appear to. These methods might only superficially remove the mounts from a mount list. Data about the mounts (including knowledge of the source port used previously) may still be held in memory, so attempts to mount again afterwards may still fail.

c. Often, different NFS mounts pointing to the same NFS Server will be sharing a connection. More than one mount may need to be cleared before the connection definition will be forgotten.

So in many cases, a reboot of the client machine will be necessary.
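For options 2 and 3 above, it helps to know which mounts point to the affected server and what a mount command with an explicit version looks like. The helpers below are a sketch only: the function names, the server name nfs1.example.com, and the paths in the example are hypothetical. The first function reads /proc/mounts format on standard input and does not handle IPv6 literal addresses. The second only prints a command and does not run it.

```shell
# List mount points of NFS mounts that point to the given server ($1).
# Reads /proc/mounts format on stdin.
nfs_mounts_for_server() {
    awk -v srv="$1" '$3 ~ /^nfs4?$/ && index($1, srv ":") == 1 { print $2 }'
}

# Print (do not run) a mount command that requests a specific NFS version.
# $1 = version (3, 4.0, 4.1 or 4.2), $2 = server:/export, $3 = mount point
print_mount_cmd() {
    printf 'mount -t nfs -o vers=%s %s %s\n' "$1" "$2" "$3"
}

# Example:
# nfs_mounts_for_server nfs1.example.com < /proc/mounts
# print_mount_cmd 4.1 nfs1.example.com:/export /mnt/data
```

Listing the mounts first shows how many file systems share the connection to that server, which is relevant to point c above.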

Cause

Something is blocking the NFS client's communication attempts with the NFS Server. This is being blocked as an individual connection. Other connections between the two devices are typically unaffected. This blockage often comes from a smart router, frontend device, firewall, or some other kind of network security device or policy.

The connection being blocked is usually a repeat of a previous connection which the NFS client was using before it ran into some temporary trouble. As such, this repeat is commonly called "connection reuse". NFS needs to do connection reuse fairly often. An NFS mount will initially use a somewhat randomly chosen connection definition. For any number of reasons, that connection may eventually be interrupted. When the NFS client code attempts to re-establish the connection, the same connection definition will be used.

As background information: A unique TCP connection is defined by 4 factors: client IP address, client port, server IP address, server port. Some protocols use a differently-defined connection each time they make a connection, usually by varying the client's source port number. In contrast, NFS traditionally needs precisely the same connection definition it was using before a problem came up, because of the way NFS recovery works in NFS versions 2, 3, and 4.0.
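To make the four factors concrete, the following sketch prints them for each connection to port 2049 found in "ss -nt" output. The function name nfs_tuples is hypothetical. Comparing the client port before and after a reconnect shows whether the same connection definition is being reused.

```shell
# Print "client-IP client-port server-IP server-port" for every TCP
# connection to port 2049. Reads "ss -nt" output on stdin.
nfs_tuples() {
    awk '$5 ~ /:2049$/ {
        laddr = $4; lport = $4; paddr = $5; pport = $5
        sub(/:[^:]*$/, "", laddr)   # strip ":port" to leave the address
        sub(/.*:/, "", lport)       # strip everything up to the last colon
        sub(/:[^:]*$/, "", paddr)
        sub(/.*:/, "", pport)
        print laddr, lport, paddr, pport
    }'
}

# Example: ss -nt | nfs_tuples
```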

Many modern smart routers or other security-conscious devices have a "smart connection reuse" feature. Those devices can detect connection reuse and may treat it with suspicion if it happens within a short time frame. They may dynamically start manipulating or blocking such connections. While it is true that some instances of connection reuse may be a clue that malicious activity is present, in many cases the communication is safe and necessary. Therefore, blocking connection reuse can lead to problems for NFS file systems.

Status

Top Issue

Additional Information

Some forms of TCP kernel tuning may make NFS Client's connection reuse happen more frequently, and therefore increase the potential for blockage to arise.

To check the current values of certain TCP tuning parameters, run the following command:

sysctl -a | less

Then search for the following settings. The values shown below are the defaults. If they have been altered, they could be contributing to the problem. Returning to the defaults is recommended. Permanent configuration for these is usually controlled in /etc/sysctl.conf, though other utilities (sapconf, saptune, tuned, and others) may be altering them.

net.ipv4.tcp_tw_recycle = 0
# In newer kernels, this parameter no longer exists, which yields equivalent behavior to "0".

net.ipv4.tcp_tw_reuse = 0
# In newer kernels, the default has changed to 2. It is generally safe to use either 0 or 2, but 1 can bring about various problems.

net.ipv4.tcp_retries1 = 3

net.ipv4.tcp_retries2 = 15
# NOTE: In other problem scenarios, it may have been decided to use a value lower than the default of 15. Lowering this setting can be risky. Changes which help one problem can cause other problems. If you feel you must set this value lower than the default of 15, it is still recommended to use at least 8. RFC 1122 recommends the length of certain timeouts, and those recommendations cannot be met with a tcp_retries2 value lower than 8.
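Instead of searching the sysctl output by hand, the four settings can be compared against the defaults listed above with a filter like this one. It is a sketch, not a supported tool: the function name check_tcp_tuning is hypothetical, and it treats both 0 and 2 as acceptable for tcp_tw_reuse, as described above.

```shell
# Report settings that differ from the defaults listed above.
# Reads "sysctl -a" style input ("key = value") on stdin.
# Prints nothing when all four settings are at their defaults or absent.
check_tcp_tuning() {
    awk -F ' = ' '
        $1 == "net.ipv4.tcp_tw_recycle" && $2 != 0 {
            print $1 " is " $2 ", default is 0" }
        $1 == "net.ipv4.tcp_tw_reuse" && $2 != 0 && $2 != 2 {
            print $1 " is " $2 ", default is 0 or 2" }
        $1 == "net.ipv4.tcp_retries1" && $2 != 3 {
            print $1 " is " $2 ", default is 3" }
        $1 == "net.ipv4.tcp_retries2" && $2 != 15 {
            print $1 " is " $2 ", default is 15" }
    '
}

# Example: sysctl -a 2>/dev/null | check_tcp_tuning
```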

For anyone who is using a non-SUSE distribution and may be wondering how to get the NFS 4.1 and 4.2 client changes that eliminate connection reuse, those are fully present in upstream kernel 5.15. Of course, any other distributor might back-port these changes into older kernels, just as SUSE has.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.