SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)
A SLES 12 SP3 machine is acting as an NFS (Network File System) client to a NetApp NFS Server. The SLES NFS client has 5 NFS mounts which all point to the NetApp device, and a fair amount of work is being done.
From time to time, work being done on the NFS mounts bogs down and throughput is very low.
This particular problem was not seen with SLES 11, only after upgrading to SLES 12.
See the "Cause" section for the reasons behind this solution:
To limit the number of outstanding RPC requests that can be present on a single NFS connection, there are two settings which can be useful. Either one might be enough to relieve the symptoms, but understanding them both may be helpful to implementing a solid strategy.
1. In /etc/sysctl.conf, the maximum size of the sunrpc TCP slot table may need to be set when NetApp NFS Servers are involved. This value defaults to 65536, but some NetApp devices reportedly do not tolerate more than 128 outstanding requests on one connection. The setting controls how large the slot table can grow; the table does not start at the size specified here. In the kernels present in SLES 12, the slot table is auto-tuned and grows as needed, up to this maximum.
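The exact setting line is not shown in the text above. The following is a minimal sketch for checking the value currently in effect, assuming the parameter meant is sunrpc.tcp_max_slot_table_entries (the sunrpc setting whose default is 65536), that it is exposed under /proc/sys/sunrpc/, and that the corresponding /etc/sysctl.conf entry would read "sunrpc.tcp_max_slot_table_entries = 128". The helper names are illustrative only.

```python
from pathlib import Path

# Assumption: procfs location of sunrpc.tcp_max_slot_table_entries.
# The file only exists while the sunrpc kernel module is loaded.
DEFAULT_PATH = "/proc/sys/sunrpc/tcp_max_slot_table_entries"


def read_max_slots(path=DEFAULT_PATH):
    """Return the current slot table maximum as an int, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def exceeds_limit(current, limit=128):
    """True if the configured maximum is above the limit discussed for NetApp."""
    return current is not None and current > limit


if __name__ == "__main__":
    value = read_max_slots()
    if value is None:
        print("slot table maximum not readable (is the sunrpc module loaded?)")
    elif exceeds_limit(value):
        print(f"slot table maximum is {value}; a value of 128 may be needed")
    else:
        print(f"slot table maximum is {value}")
```

A changed value in /etc/sysctl.conf normally takes effect after "sysctl -p" or a reboot. Connections that already exist may keep their previous limit until the file systems are remounted.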
2. When multiple mounts are made on one NFS client and point to the same NFS server, they will typically share one TCP connection (and will share other resources as well). This means that one slot table may be servicing multiple NFS mounts. It is possible to force mounts to use separate connections (and therefore separate slot tables) with the mount parameter:
Where "n" is a non-zero number. Any mount which has a unique "n" value will have its own connection. Any mounts which use the same "n" value will share a connection. Note, therefore, that "n" does not represent "how many mounts can share a connection". Rather, it is an arbitrary tag, and any mounts with the same tag will share a connection.
On any mounts where this setting is not used, it is assumed that each of those mounts (which point to the same NFS Server device) can share a connection.
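The tag semantics described above can be sketched as a small grouping function. The mount parameter itself is not shown in the text; this sketch assumes it is the SUSE-specific option "sharetransport=n", and the server name, mount points, and function names are hypothetical.

```python
import re
from collections import defaultdict


def transport_tag(options):
    """Return the sharetransport value from a comma-separated option string.

    Returns None when the option is absent (assumed option name, see above).
    """
    match = re.search(r"(?:^|,)sharetransport=(\d+)(?:,|$)", options)
    return int(match.group(1)) if match else None


def group_by_connection(mounts):
    """Group mount points that would share one connection.

    mounts: iterable of (mount_point, server, options) tuples.
    Mounts to the same server with the same tag share a connection;
    untagged mounts to the same server share one as well.
    """
    groups = defaultdict(list)
    for mount_point, server, options in mounts:
        groups[(server, transport_tag(options))].append(mount_point)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical example: five mounts against one server.
    example = [
        ("/mnt/a", "netapp1", "rw,sharetransport=1"),
        ("/mnt/b", "netapp1", "rw,sharetransport=1"),
        ("/mnt/c", "netapp1", "rw,sharetransport=2"),
        ("/mnt/d", "netapp1", "rw"),
        ("/mnt/e", "netapp1", "rw,hard"),
    ]
    for (server, tag), points in group_by_connection(example).items():
        print(server, tag, points)
```

In this example the five mounts end up on three connections: tag 1 carries two mounts, tag 2 carries one, and the two untagged mounts share the default connection.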
Considering the above 2 methods together:
If the slot table is being limited to 128 in order to "protect" the NetApp device, then systems using multiple NFS mounts (which point to the same NetApp server) may need to use multiple TCP connections, so that each mount (or each group of mounts) has its own slot table. With low to medium load, or some mixture of high and low loads, 5 mounts might be able to perform well with 128 slots. But 5 very busy mounts might begin to suffer from this limitation, especially in cases where other bottlenecks are also affecting performance.
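As a rough illustration of this trade-off, the arithmetic can be sketched as follows. It assumes every connection is capped at 128 slots and that load is spread evenly across the mounts on a connection, which is a simplification; the function names are illustrative only.

```python
def total_slots(mounts_per_connection, max_slots=128):
    """Upper bound on outstanding RPC requests across all connections.

    mounts_per_connection: list with one entry per TCP connection,
    giving the number of mounts that share it.
    """
    return len(mounts_per_connection) * max_slots


def average_slots_per_mount(mounts_per_connection, max_slots=128):
    """Average slots available per mount, assuming evenly spread load."""
    mounts = sum(mounts_per_connection)
    return total_slots(mounts_per_connection, max_slots) / mounts


if __name__ == "__main__":
    # 5 mounts sharing one connection: 128 slots in total.
    print(total_slots([5]), average_slots_per_mount([5]))
    # 5 mounts split into groups of 2 and 3: 256 slots in total.
    print(total_slots([2, 3]), average_slots_per_mount([2, 3]))
    # 5 mounts, each on its own connection: 640 slots in total.
    print(total_slots([1] * 5), average_slots_per_mount([1] * 5))
```

More connections raise the total number of requests that can be outstanding against the server, so the grouping is a balance between client throughput and the load placed on the NetApp device.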
The details about NetApp (discussed below) come from a 3rd party, and are being presented "as is" without verification:
NetApp servers (at least some) may want to limit the number of outstanding RPC requests that can accumulate on one TCP connection. If it goes beyond 128, the NetApp device may consider a potential denial-of-service attack to be occurring, and may try to throttle that TCP connection by setting its TCP receive window to 0. This prevents further data from being accepted until the window is opened up again.
On Linux, the number of simultaneous outstanding RPC requests that can be issued on one TCP connection is controlled by the TCP slot table size, which is also known as "sunrpc.tcp_slot_table_entries". In older Linux kernels, such as those in SLES 11, this table size defaulted to 16 and could only be changed manually. In the newer kernels in SLES 12, however, the size of the table is auto-tuned and grows as needed. If demand is high enough to grow the table beyond 128, NetApp servers might decide to throttle the connection.
The NetApp practice of throttling the connection in this case is not necessarily viewed favorably by Linux NFS developers. If the tuning suggestions given in the "Resolution" section of this document do not help enough, administrators are encouraged to pursue a solution through NetApp configuration and/or NetApp support services.