Long client timeouts when failing over the NFS Ganesha IP resource

This document (7023651) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 5
SUSE Linux Enterprise High Availability Extension 12

Situation

When NFS Ganesha is configured with the CephFS storage back-end in the high availability Active-Passive setup described in the SES 5 Deployment Guide, failing over the IP resource results in an outage of more than two minutes on the clients.

Resolution

Some delay in client functionality is necessary after an NFS Server cluster failover, because the NFS Server has to give clients time to reclaim old locks and leases before any client can request new access. This "grace time" (and other factors) can cause NFS clients to stall for a few minutes after an NFS Server failover.

A thorough technical analysis may be needed to determine the exact reasons for various delays in any given cluster failover sequence. There is usually more than one factor involved.

Because the grace and lease times mentioned above can be altered via configuration, many cluster administrators are tempted to lower these timers in order to speed up failover. This should be done with caution, however, because normal functionality (even during times when no failover is in progress) can be broken if these values are set too low.

To lower these values for this purpose, edit the "/etc/ganesha/ganesha.conf" file and specify smaller values for "Lease_Lifetime" and "Grace_Period".

To protect against broken functionality, keep both values at 30 seconds or more; higher values are safer. The defaults are 60 and 90 seconds respectively. The example below lowers them to 40 and 50 seconds respectively. Note that this block needs to be outside of the EXPORT block:

NFSv4 {
Lease_Lifetime = 40;
Grace_Period = 50;
}
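Before restarting anything, it can be useful to confirm that the values actually present in the file respect the 30-second floor. The following is a minimal sketch, not part of the product: the function names "ganesha_timer" and "check_ganesha_timers" are hypothetical, and it assumes each setting is written on its own line in the "Key = value;" form, with the key spelled exactly as shown above.

```shell
# Hypothetical helper: print the value assigned to a key such as
# Lease_Lifetime or Grace_Period in a Ganesha-style config file.
# Assumes one "Key = value;" setting per line; commented lines are ignored.
ganesha_timer() {
    # $1 = key name, $2 = config file path
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*;.*/\1/p" "$2" | tail -n 1
}

# Hypothetical helper: fail if either timer is set below the 30-second
# floor recommended in this document. A missing key is only reported.
check_ganesha_timers() {
    conf=$1
    min=30
    for key in Lease_Lifetime Grace_Period; do
        value=$(ganesha_timer "$key" "$conf")
        if [ -z "$value" ]; then
            echo "$key: not set in $conf (default applies)"
        elif [ "$value" -lt "$min" ]; then
            echo "$key = $value is below ${min}s" >&2
            return 1
        else
            echo "$key = $value"
        fi
    done
}
```

For example, after sourcing these functions, run "check_ganesha_timers /etc/ganesha/ganesha.conf" and check the exit status.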

- Restart the NFS Ganesha service so that the changes take effect by migrating the NFS IP resource, e.g.:

# crm resource migrate ganesha-ip <insert_node_name>

NOTE: See the Additional Information section on how to prevent the configuration file from being overwritten via DeepSea and / or openATTIC.

Cause

Some of the delay is due to the default NFS lease lifetime and grace period settings. The rest may be due to other parts of the cluster configuration.

Additional Information

To verify the change is in effect, look for the following entries in the "/var/log/ganesha/ganesha.log" file after a migration of the resource, which should show the configured grace value. With the above example, a duration of 50 seconds should be logged:

...

17/01/2019 10:26:29 : epoch 5c404a45 : node : ganesha.nfsd-xxxxxx[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 50

...
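The log check can also be scripted. The sketch below is an illustration only: the function name "last_grace_duration" is hypothetical, and it assumes the log message has exactly the wording shown in the excerpt above.

```shell
# Hypothetical helper: print the duration from the most recent
# "NFS Server Now IN GRACE" entry in a Ganesha log file.
# Prints nothing if no such entry is found.
last_grace_duration() {
    # $1 = log file path
    sed -n 's/.*NFS Server Now IN GRACE, duration \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

For example, "last_grace_duration /var/log/ganesha/ganesha.log" should print the configured "Grace_Period" value once the resource has been migrated.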

To keep the default behaviour, continue using openATTIC to configure Ganesha exports, and still retain any custom settings, adjust the "/etc/sysconfig/nfs-ganesha" file on the Ganesha nodes as follows:

OPTIONS="-L /var/log/ganesha/ganesha.log -f /etc/ganesha/main.conf -N NIV_EVENT"
EPOCH_EXEC="/bin/true"

Now create the "/etc/ganesha/main.conf" file with the following content:

%include ganesha.conf

NFSv4 {
Lease_Lifetime = 40;
Grace_Period = 50;
}

With this setup, Ganesha reads the changes made to "/etc/ganesha/ganesha.conf" via openATTIC together with any custom settings from "/etc/ganesha/main.conf".

NOTE: With the above method, verify the modified "/etc/sysconfig/nfs-ganesha" file after applying updates or patches to the servers to confirm that the changes are still in place.
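The verification mentioned in the note above could be sketched as follows. This is an assumption-laden example, not a supported tool: the function name "options_use_config" is hypothetical, and it assumes the OPTIONS line separates "-f" and the path with a single space, as in the example shown earlier. It performs a simple substring match, so a longer path that starts with the expected one would also match.

```shell
# Hypothetical helper: succeed if the OPTIONS line of the given sysconfig
# file contains "-f <expected config path>".
options_use_config() {
    # $1 = sysconfig file, $2 = expected config path
    grep '^OPTIONS=' "$1" | grep -Fq -- "-f $2"
}
```

For example, "options_use_config /etc/sysconfig/nfs-ganesha /etc/ganesha/main.conf" returns a non-zero exit status if an update has reset the OPTIONS line.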

Alternatively, openATTIC can be prevented from overwriting changes to the modified "/etc/ganesha/ganesha.conf" file. In that case the configuration file(s) must be maintained entirely by hand and openATTIC cannot be used to create exports. See the "Customizing the default Configuration" section in the online Deployment Guide documentation.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.