Supportconfig gets stuck when executing the SES plugin

This document (000020532) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 6

Situation

When running supportconfig to generate a supportconfig archive for a SUSE Enterprise Storage (SES) node, the run takes a very long time or appears to hang while executing the SES plugin.

Resolution

There can be several different reasons for this. One workaround is to exclude the SES plugin, generate the supportconfig without the SES log information, and then provide the relevant SES log files individually.

To exclude a supportconfig plugin, in this case the SES plugin (pses), use the -o option, which toggles the specified keyword on or off. Since the pses plugin is enabled (on) by default, the following command disables it:

supportconfig -o pses

To determine where exactly the plugin is having problems, note that supportconfig stores the information it needs to generate the supportconfig tar file in the following temporary directory:

/var/log/scc_`hostname`_`date`_`time`/

A command like the following lists the Ceph-related files created or modified by supportconfig, newest first within each directory. The most recent file can provide clues as to what is causing supportconfig to be stuck:

ls -ltR /var/log/scc_`hostname`_`date`_`time`/ceph/ | more

Additionally, while supportconfig is stuck, checking "/var/log/scc_`hostname`_`date`_`time`/plugin-ses.txt" may provide a hint as to what exactly it is stuck on.
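
The checks above can be combined into a small helper. The following is only a sketch: the function names newest_scc_dir and recent_files are hypothetical, and it assumes that the running supportconfig's working directory is the most recently modified scc_* directory under /var/log and that GNU find is available.

```shell
#!/bin/bash
# Hypothetical helpers, not part of supportconfig.

# Print the most recently modified scc_* directory under the given base
# directory (default: /var/log). The trailing slash in the glob restricts
# the match to directories, so finished scc_*.txz archives are ignored.
newest_scc_dir() {
    local base="${1:-/var/log}"
    ls -td "$base"/scc_*/ 2>/dev/null | head -n 1
}

# Print the N most recently modified files below a directory, newest first.
# Requires GNU find for -printf.
recent_files() {
    local dir="$1" count="${2:-5}"
    find "$dir" -type f -printf '%T@ %p\n' | sort -rn | head -n "$count" | cut -d' ' -f2-
}
```

While supportconfig is stuck, running something like recent_files "$(newest_scc_dir)" 5 in a second terminal shows the files it touched last. If plugin-ses.txt is among them, tail -n 20 on that file shows the last output that was captured.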

Cause

Two known causes can be:

1. Very large SES log files, caused by the cluster being unhealthy for an extended period of time while log rotation is not working, not configured, or not configured optimally.
2. A large number of crash entries, causing cluster health status commands to take a very long time to complete.

Additional Information

For cause 1 in the Cause section, verify the logrotate configuration. See the logrotate section of the SLES 15 SP1 online documentation for details.
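
Before reviewing the logrotate configuration, it can help to confirm whether any log files are in fact unusually large. The following is a minimal sketch: the function name large_logs is hypothetical, /var/log/ceph is assumed to be the Ceph log directory, and the 1024 MiB default threshold is an arbitrary example value. It relies on GNU find.

```shell
#!/bin/bash
# Hypothetical helper: list files larger than a threshold, biggest first.
# Arguments: directory (assumed default /var/log/ceph),
#            threshold in MiB (arbitrary default 1024).
# Output: one line per file, "<size in bytes> <path>".
large_logs() {
    local dir="${1:-/var/log/ceph}" min_mib="${2:-1024}"
    find "$dir" -type f -size +"${min_mib}M" -printf '%s %p\n' | sort -rn
}
```

For example, large_logs /var/log/ceph 500 prints every file under /var/log/ceph that is larger than 500 MiB. Files that show up here are candidates for checking whether log rotation is being applied to them.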

For cause 2, see the information on the usage of "ceph crash" in the SES 6 online documentation. Alternatively, consider removing older crash entries by running "ceph crash prune 1". In this example, specifying 1 keeps only the crashes that are not older than 1 day. A large number of daemon crashes indicates an underlying problem with the Ceph cluster; be sure that this has been addressed.
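
To judge whether the number of crash entries is likely to be a factor, the output of "ceph crash ls" can be counted before pruning. The following is a rough sketch: the function names and the limit of 100 are hypothetical, and the line count is only an approximation of the number of entries, since the output may include a header line depending on the Ceph version.

```shell
#!/bin/bash
# Hypothetical helpers around the "ceph crash" commands.

# Print the number of lines returned by "ceph crash ls".
# This approximates the number of crash entries (a header line may be included).
crash_ls_lines() {
    ceph crash ls | wc -l
}

# Succeed (exit status 0) if the line count exceeds the given limit.
# The default of 100 is an arbitrary example value, not a documented threshold.
crash_backlog_exceeds() {
    local limit="${1:-100}"
    [ "$(crash_ls_lines)" -gt "$limit" ]
}
```

A possible usage is: crash_backlog_exceeds 100 && echo "consider: ceph crash prune 1". The prune command is deliberately only printed here, not executed, so that the crash entries can be reviewed first.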

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.