NFS file system is hung. New mount attempts hang also.

This document (000019722) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server

Situation

An NFS mount appears to be hung or stalled indefinitely.  New NFS mount attempts, pointing to the same server, may also hang.

Network packet captures show that the NFS client is sending out packets destined to the NFS Server's port 2049, but no responses are seen.  Often (not always) this will take the form of a TCP SYN packet being sent, but getting no reply.  In those (SYN) cases, evidence of the connection attempt may also be seen on the NFS client machine, for an extended period, showing a status of "SYN_SENT".  That can be checked with the command:

netstat -nt | grep :2049
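As a rough sketch, the output of the command above can be narrowed to only the stuck connection attempts. The helper name nfs_syn_sent is hypothetical (not part of any package), and the filter assumes the usual netstat -nt column layout, with the foreign address in column 5 and the state in column 6:

```shell
# Hypothetical helper: reads `netstat -nt` output on stdin and prints only
# connections to a remote port 2049 that are still in the SYN_SENT state.
# Assumes the usual column layout: $5 = foreign address, $6 = state.
nfs_syn_sent() {
    awk '$6 == "SYN_SENT" && $5 ~ /:2049$/'
}

# Example usage on the NFS client:
#   netstat -nt | nfs_syn_sent
```

If the same line keeps appearing across repeated runs, the client is probably still retrying the same connection without getting an answer.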

There may also be occurrences of the following message in the NFS client machine's /var/log/messages:

nfs: server <hostname or address> not responding
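One possible way to look for this message is sketched below. The helper name nfs_not_responding is hypothetical, and the default path assumes the system still logs to /var/log/messages; on systems that log only to the journal, the same pattern would have to be applied to that output instead:

```shell
# Hypothetical helper: prints every "nfs: server ... not responding" line
# found in a log file.
nfs_not_responding() {
    # $1 = log file to search (assumed default: /var/log/messages)
    grep -E 'nfs: server .+ not responding' "${1:-/var/log/messages}"
}

# Example usage:
#   nfs_not_responding
#   nfs_not_responding /var/log/messages
```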

However, packets on OTHER connections (both new and old) between the same client machine and server machine may be having no problems.  This is what sets this scenario apart from most other cases of "nfs: server not responding":  There is no global problem affecting ALL communication between the two devices.

NOTE:  There can be many different causes of a hanging NFS mount, or of the "not responding" logs.  This document discusses only one type of scenario.

Resolution

Short-term solution:  A reboot of the NFS client machine will likely be necessary.

For more explanation and other options, see the discussion below:

Something is blocking the individual connection which NFS is attempting to use.  It is not necessarily blocking other connections between the two devices.  This blockage often comes from a smart router, frontend device, or some other kind of network security device or policy.

The connection being blocked is usually a repeat of a previous connection which the NFS mount was using before it ran into some kind of trouble.  As such, this attempt is commonly called "connection reuse".   NFS needs to do "connection reuse" fairly often.  An NFS mount will initially get established successfully on a somewhat randomly chosen connection definition.  For any number of reasons that connection may eventually be interrupted.  When the NFS file system is used again, the connection will need to be re-established.

As background information:  A unique TCP connection is defined by 4 factors:  Client IP address, client port, server IP address, server port.  Some protocols use a differently-defined connection each time they make a connection (usually by varying the client's source port number), but NFS sometimes needs to use the same connection it was using before a problem came up, because of the way NFS recovery works.

Many modern smart routers or other security conscious devices will have a "smart connection reuse" feature of some kind.  Many such devices treat connection reuse (if it happens within a short time frame) with suspicion and may dynamically start manipulating or blocking connections which are being reused.  While it is true that some instances of connection reuse may indicate that malicious activity is going on, connection reuse is not illegal and in fact is very necessary in some cases, such as NFS.  Thus, blocking connection reuse can lead to problems for NFS file systems.

The options available for getting past a connection-reuse blockage are:

1.  Reset the smart router (or other device) which is blocking the connection between the NFS client and NFS server.  Better yet, completely turn off the "smart connection reuse" feature of that device, so this won't happen again in the future.

-OR-

2.  Reboot the NFS client.  To explain this need in more detail:

The underlying need is to get the NFS/RPC layers at the NFS client machine to forget the connection definition they were using previously.  To do that, they must "forget" the details of this NFS mount.  This is extremely difficult, because:

a.  Any existing mount that is already suffering from this problem will typically not umount, because the file system is considered "busy".

b.  Attempts to use umount -l (lazy umount) or -f (forced umount) are unlikely to help.  Even if umount -l -f appears to succeed, it may only superficially remove the mounts from the mount list.  Data about the mounts will still be held in other locations.

c.  Often, multiple different NFS mounts pointing to the same NFS Server will be sharing a connection.  There may be more than one mount to clear away before the connection definition will be forgotten.

So in the majority of cases, if option #1 above is not possible, a reboot of the client machine will be needed.
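To get a feel for point c, the following sketch lists which NFS mounts point to which server, based on the contents of /proc/mounts. The helper name nfs_mounts_by_server is hypothetical, and the parsing assumes the device field has the form server:/export; it only shows which mounts might be sharing a connection, not whether they actually do:

```shell
# Hypothetical helper: reads /proc/mounts-style lines on stdin and prints
# "server mountpoint" for every mount of type nfs or nfs4, sorted by server.
# Assumes the device field looks like "server:/export"; bracketed IPv6
# server addresses are not handled by this simple split.
nfs_mounts_by_server() {
    awk '$3 == "nfs" || $3 == "nfs4" { split($1, dev, ":"); print dev[1], $2 }' | sort
}

# Example usage:
#   nfs_mounts_by_server < /proc/mounts
```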

Cause

"Smart Connection Reuse" features of various "Smart Routers" or other security-conscious devices.

Status

Top Issue

Additional Information

Some forms of TCP kernel tuning may make connection reuse happen more often, and therefore increase the potential for blockage to arise.

To check the current values of certain TCP tuning settings, run the command:

sysctl -a | less

and then search for the following settings.  The values shown below are the defaults.  If they have been altered, this could be contributing to the problem.  They should be reset to these defaults.  Permanent configuration for these is usually controlled in /etc/sysctl.conf:

net.ipv4.tcp_tw_recycle = 0
# On newer kernels, this parameter no longer exists, which yields equivalent behavior to "0".

net.ipv4.tcp_tw_reuse = 0

net.ipv4.tcp_retries1 = 3

net.ipv4.tcp_retries2 = 15
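Instead of searching by hand, the comparison could be scripted along the following lines. This is only a sketch: the helper name check_tcp_defaults is hypothetical, and the expected values are simply the defaults listed above. A setting that does not exist on the running kernel (such as tcp_tw_recycle on newer kernels) produces no output:

```shell
# Hypothetical helper: reads "key = value" lines (as printed by sysctl) on
# stdin and reports any of the four settings that differ from the defaults
# listed in this document.
check_tcp_defaults() {
    awk -F' *= *' '
        BEGIN {
            want["net.ipv4.tcp_tw_recycle"] = "0"
            want["net.ipv4.tcp_tw_reuse"]   = "0"
            want["net.ipv4.tcp_retries1"]   = "3"
            want["net.ipv4.tcp_retries2"]   = "15"
        }
        ($1 in want) && $2 != want[$1] {
            printf "%s is %s (default %s)\n", $1, $2, want[$1]
        }
    '
}

# Example usage:
#   sysctl -a 2>/dev/null | check_tcp_defaults
```

No output means none of the four settings deviates from the listed defaults.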

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019722
  • Creation Date: 21-Sep-2020
  • Modified Date: 23-Sep-2020
  • Product: SUSE Linux Enterprise Server
