SUSE Linux Enterprise Server 11 Service Pack 3 (SLES 11 SP3)
SUSE Linux Enterprise Server 11 Service Pack 2 (SLES 11 SP2)
A SLES 11 SP2 or SP3 system (typically using kernel 3.0.80-x or newer) is acting as an NFS client. In other words, it has mounted one or more file systems from remote NFS server(s). (Mount type nfs).
The NFS-mounted file system works for a while, but after some time, any process on the client machine that tries to access the NFS mount, or to get data or statistics from it, might stall. The process is waiting for a response from the NFS client layer which never arrives. The NFS server is still fully functional, and other NFS clients are not necessarily affected.
Typically, when this occurs, the nfs client will log messages to /var/log/messages which state:
kernel: nfs: server <server-name> not responding, still trying ...
This message would normally imply that the NFS client is sending requests but the NFS server is not answering. Historically, the first step when this error occurs has been to examine the TCP communication between this NFS client and the NFS server, and see whether it breaks down during the periods when the error is occurring. However, due to a bug in some 3.0.x kernels, this error can instead be logged when the problem is actually internal to the NFS/RPC code on the NFS client system, and not due to TCP or network communication problems. With this bug, the NFS client layer *believes* it is sending requests to the NFS server, but those requests are not really making it to the TCP layer or out onto the network.
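To see how often the message appears, and for which server, the log can be scanned for it. The following is a minimal Python sketch, not an official tool. The function name and the sample log lines are hypothetical, and the pattern assumes the message wording quoted above, allowing for an optional kernel timestamp between "kernel:" and "nfs:".

```python
import re

# Matches the message quoted in this document. The lazy ".*?" tolerates an
# optional kernel timestamp such as "[12345.678]" (an assumption about the
# local syslog format).
NOT_RESPONDING = re.compile(
    r"kernel:.*?nfs: server (\S+) not responding, still trying"
)

def count_not_responding(lines):
    """Return a dict mapping each server name to its number of matching lines."""
    counts = {}
    for line in lines:
        match = NOT_RESPONDING.search(line)
        if match:
            server = match.group(1)
            counts[server] = counts.get(server, 0) + 1
    return counts

# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "Mar  3 10:15:01 client1 kernel: nfs: server nfs01 not responding, still trying",
    "Mar  3 10:15:09 client1 kernel: [ 812.44] nfs: server nfs01 not responding, still trying",
    "Mar  3 10:16:00 client1 sshd[1234]: Accepted publickey for root",
]

if __name__ == "__main__":
    print(count_not_responding(SAMPLE))
```

To scan the real log, the lines could be read from /var/log/messages, for example with open("/var/log/messages", errors="replace"), and passed to the same function.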
More than one issue has been identified and corrected for this symptom. To resolve the known cases, the recommendations are:
1. For a host running SLES 11 SP3, and acting as an NFS client, update the kernel to at least 3.0.101-0.21.
2. For a host running SLES 11 SP2 (which is now out of maintenance), and acting as an NFS client:
A. If a Long Term Service Pack Support (LTSS) contract is present, update the kernel to at least 3.0.101-0.7.19.1 from the SLES 11 SP2 LTSS maintenance channel.
B. If an LTSS contract is not present, update to kernel 3.0.101-0.7.17 from the regular maintenance channel. This kernel contains all but one of the potential fixes for this symptom. If it does not correct the problem being seen, the options are to upgrade to SP3 or to obtain an LTSS contract.
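The version comparison implied by the recommendations above can be sketched in Python. This is an illustrative helper only, under several assumptions. The dictionary keys and function names are hypothetical. The version string is assumed to have the form quoted in this document, optionally followed by a flavor suffix such as "-default". A plain numeric field-by-field comparison is used, which is not the package manager's own comparison algorithm. How the installed kernel version string is obtained is left open. A version must only be compared against the minimum for its own service pack.

```python
import re

# Minimum fixed kernel versions quoted in this document.
# The keys are hypothetical labels chosen for this sketch.
FIXED_KERNELS = {
    "sles11sp3": "3.0.101-0.21",
    "sles11sp2-ltss": "3.0.101-0.7.19.1",
    "sles11sp2": "3.0.101-0.7.17",
}

def version_key(release):
    """Turn '3.0.101-0.21-default' into the tuple (3, 0, 101, 0, 21).

    Fields are split on '.' and '-'; parsing stops at the first
    non-numeric field (for example a flavor suffix such as 'default').
    """
    key = []
    for field in re.split(r"[.-]", release):
        if not field.isdigit():
            break
        key.append(int(field))
    return tuple(key)

def is_at_least(installed, minimum):
    """Return True if 'installed' is the same as or newer than 'minimum'."""
    return version_key(installed) >= version_key(minimum)

if __name__ == "__main__":
    # Example: an SP3 client still on an older 3.0.80 kernel.
    print(is_at_least("3.0.80-0.7-default", FIXED_KERNELS["sles11sp3"]))
```

If the check returns False for the relevant service pack, the kernel update described above applies. If it returns True, the symptom may have a different cause.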
If the kernel is already as new as (or newer than) the fixed kernels listed in this TID, do not assume that the issue being encountered is the one described in this document. Instead, investigate whether TCP communication is failing between the NFS client and NFS server. Communication failures can be temporary, and can affect just one TCP connection at a time, so tests with "ping" or with other applications that use TCP connections may not be conclusive. Failure of all communication would explain the NFS failure, but success of other communication does not prove that NFS's own TCP communication is succeeding. Often, the specific NFS connection activity must be examined with tcpdump.
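As a starting point for such a capture, the following Python sketch assembles a tcpdump command line restricted to traffic between the client and one NFS server. It only builds and prints the command and does not run it. The server name, interface, and output file are hypothetical placeholders. Port 2049 is assumed because it is the usual NFS port, so a server configured differently, or an investigation that also needs portmapper or mountd traffic, would require a different filter.

```python
import shlex

def tcpdump_command(server, interface, outfile, port=2049):
    """Build a tcpdump argument list for capturing NFS traffic.

    -i selects the interface, -s 0 captures full packets,
    -w writes raw packets to a file for later analysis.
    """
    return [
        "tcpdump",
        "-i", interface,
        "-s", "0",
        "-w", outfile,
        "host", server, "and", "port", str(port),
    ]

if __name__ == "__main__":
    # All three values below are placeholders for illustration.
    cmd = tcpdump_command("nfs01.example.com", "eth0", "/tmp/nfs-client.pcap")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The capture is most useful when it is running while the "not responding" message is being logged, so that it shows whether requests are actually leaving the client at that time.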
On SLES 11 SP2, it might also be possible to avoid this symptom by downgrading the kernel to 3.0.74. This would likely resolve some cases of this symptom, but not others.
To the author's knowledge, this issue has only been reported by users of NFS v3. However, this may be a misleading coincidence, as the number of NFS v4 users is small compared to the number of NFS v3 users. The code fix was made in the sunrpc code, which is used by both NFS v3 and v4.