NFS file system is hung. New mount attempts hang also.
This document (000019722) is provided subject to the disclaimer at the end of this document.
Network packet captures show that the NFS client is sending packets to port 2049 on the NFS server, but no responses are seen. Often (but not always) this takes the form of a TCP SYN packet that is sent but never answered. In those SYN cases, evidence of the connection attempt may also be visible on the NFS client machine for an extended period, as a connection with a status of "SYN_SENT". That can be checked with the command:
netstat -nt | grep :2049
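As a rough sketch of how to narrow that check down, the following shell function filters `netstat -nt` output for connections to remote port 2049 that are stuck in SYN_SENT. The function name `nfs_syn_sent` is hypothetical, and the sketch assumes the usual Linux net-tools column layout (local address in column 4, foreign address in column 5, state in column 6).

```shell
# Hypothetical helper: read `netstat -nt` output on stdin and print only
# the TCP connections to remote port 2049 that are in state SYN_SENT.
# Assumes net-tools columns: $4 = local, $5 = foreign, $6 = state.
nfs_syn_sent() {
    awk '$1 ~ /^tcp/ && $5 ~ /:2049$/ && $6 == "SYN_SENT"'
}

# Usage on a live system:
#   netstat -nt | nfs_syn_sent
```

If the same line keeps appearing over several minutes, that matches the symptom described above.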
There may also be occurrences of the following message in the NFS client machine's /var/log/messages:
nfs: server <hostname or address> not responding
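One possible way to look for that message is sketched below. The function name `nfs_not_responding` is hypothetical; it simply matches the message text quoted above on whatever log data is fed to it.

```shell
# Hypothetical helper: read log lines on stdin and print those that
# contain the "nfs: server ... not responding" message.
nfs_not_responding() {
    grep -E 'nfs: server .+ not responding'
}

# Usage on a live system (log location as named in this document):
#   nfs_not_responding < /var/log/messages
```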
However, packets on OTHER connections (both new and old) between the same client machine and server machine may be having no problems. This is what sets this scenario apart from most other cases of "nfs: server not responding": There is no global problem affecting ALL communication between the two devices.
NOTE: There can be many different causes of a hanging NFS mount, or of the "not responding" logs. This document discusses only one type of scenario.
For more explanation and other options, see the discussion below:
Something is blocking the individual connection which NFS is attempting to use. It is not necessarily blocking other connections between the two devices. This blockage often comes from a smart router, frontend device, or some other kind of network security device or policy.
The connection being blocked is usually a repeat of a previous connection which the NFS mount was using before it ran into some kind of trouble. As such, this attempt is commonly called "connection reuse". NFS needs to do "connection reuse" fairly often. An NFS mount will initially get established successfully on a somewhat randomly chosen connection definition. For any number of reasons that connection may eventually be interrupted. When the NFS file system is used again, the connection will need to be re-established.
As background information: A unique TCP connection is defined by 4 factors: client IP address, client port, server IP address, server port. Some protocols use a differently defined connection each time they make a connection (usually by varying the client's source port number), but NFS sometimes needs to use the same connection it was using before a problem came up, because of the way NFS recovery works.
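To make the 4 factors concrete, here is a minimal sketch that prints them for each NFS connection found in `netstat -nt` output. The function name `nfs_tuples` is hypothetical, and the same net-tools column layout is assumed (local address in column 4, foreign address in column 5).

```shell
# Hypothetical helper: read `netstat -nt` output on stdin and print the
# 4-tuple (client IP, client port, server IP, server port) of every TCP
# connection whose remote port is 2049.
nfs_tuples() {
    awk '$1 ~ /^tcp/ && $5 ~ /:2049$/ {
        lp = $4; sub(/.*:/, "", lp)        # client (local) port
        la = $4; sub(/:[^:]*$/, "", la)    # client (local) address
        ra = $5; sub(/:[^:]*$/, "", ra)    # server (remote) address
        print la, lp, ra, 2049
    }'
}

# Usage on a live system:
#   netstat -nt | nfs_tuples
```

Running this before and after a reconnect attempt and seeing the same client port both times would suggest that the client is trying to reuse the earlier connection definition.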
Many modern smart routers or other security conscious devices will have a "smart connection reuse" feature of some kind. Many such devices treat connection reuse (if it happens within a short time frame) with suspicion and may dynamically start manipulating or blocking connections which are being reused. While it is true that some instances of connection reuse may indicate that malicious activity is going on, connection reuse is not illegal and in fact is very necessary in some cases, such as NFS. Thus, blocking connection reuse can lead to problems for NFS file systems.
The options available for getting past a connection-reuse blockage are:
1. Reset the smart router (or other device) which is blocking the connection between the NFS client and NFS server. Better yet, completely turn off the "smart connection reuse" feature of that device, so this won't happen again in the future.
2. Reboot the NFS client. To explain this need in more detail:
The underlying need is to get the NFS/RPC layers on the NFS client machine to forget the connection definition they were using previously. To do that, they must "forget" the details of this NFS mount. This is extremely difficult, because:
a. Any existing mount that is already suffering from this problem will typically not umount, because the file system is considered "busy".
b. Attempts to use umount -l (lazy umount) or -f (forced umount) will likely not help. Even if umount -l -f appears to succeed, it may only superficially remove the mounts from the mount list. Data about the mounts will still be held in other locations.
c. Often, multiple different NFS mounts pointing to the same NFS server will be sharing a connection. There may be more than one mount to clear away before the connection definition will be forgotten.
So in the majority of cases, if option #1 above is not possible, a reboot of the client machine will be needed.
To check the current values of certain TCP tuning parameters, run the command:
sysctl -a | less
and then search for the following settings. The values shown below are the defaults. If they have been altered, this could be contributing to the problem, and they should be reset to these defaults. Permanent configuration for these is usually controlled in /etc/sysctl.conf:
net.ipv4.tcp_tw_recycle = 0
# On newer kernels, this parameter no longer exists, which yields equivalent behavior to "0".
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
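The comparison against the values listed above can also be scripted. The following is only a sketch: the function name `check_tcp_defaults` is hypothetical, and the expected values are copied from the list in this document. Kernel defaults can differ between versions, so a reported mismatch is a prompt to investigate rather than proof of a misconfiguration. A parameter that does not exist on the running kernel is simply not reported.

```shell
# Hypothetical helper: read "name = value" lines (the format printed by
# sysctl) on stdin and report any setting that differs from the values
# listed in this document.
check_tcp_defaults() {
    awk -F ' *= *' '
        BEGIN {
            want["net.ipv4.tcp_tw_recycle"] = 0
            want["net.ipv4.tcp_tw_reuse"]   = 0
            want["net.ipv4.tcp_retries1"]   = 3
            want["net.ipv4.tcp_retries2"]   = 15
        }
        ($1 in want) && $2 != want[$1] {
            printf "%s is %s, expected %s\n", $1, $2, want[$1]
        }
    '
}

# Usage on a live system:
#   sysctl -a 2>/dev/null | check_tcp_defaults
```

No output means that none of the listed settings present on the system differ from the values above.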
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019722
- Creation Date: 21-Sep-2020
- Modified Date: 23-Sep-2020
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: firstname.lastname@example.org