Usage of hb_report for SLES HAE

This document (7007262) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11
SUSE Linux Enterprise Server 10

Situation

There was an incident within the cluster that needs to be investigated. SUSE Technical Support has requested an hb_report file (a cluster report) for the analysis.

Resolution

The hb_report utility (on newer versions of SLES, a wrapper for the crm report command) is an essential tool for diagnosing problems in a SLES HAE cluster. When investigating an incident in a cluster, it is important to capture the log files and configuration of all cluster nodes for the time of the incident. Other tools and methods can be used, but they are not as efficient in a SLES HAE cluster context.

For hb_report to work as intended, it must be able to gather information from all nodes. See the 'Additional Information' section for details.

Before uploading your data to Support, always double-check that the final hb_report file contains a subdirectory for each cluster node, named after the node and containing that node's data.

 

Collecting cluster report 


 
If SSH connections between your nodes are already configured and working (see 'Additional Information' if they still need to be configured), you can run the hb_report command on any cluster node. Running it on one node is sufficient, because hb_report collects the logs and cluster configuration from all cluster members.
 
hb_report can also extract cluster history data from rotated logs: it inspects, for example, rotated /var/log/pacemaker/pacemaker.log-<date>.xz files and, if logged events fall within the defined time range, copies those events into the final hb_report file. If the rotated logs have already been removed, those events cannot be collected.
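Before running hb_report, it can help to check which time range the local (possibly rotated) logs still cover on a node. The following is a minimal sketch, assuming the rotated files are either uncompressed or compressed with xz or gzip; log_coverage is a hypothetical helper name, not part of hb_report. It prints the first and last line of each file, using the same sed expression as the checks later in this document, and has to be run on each node separately.

```shell
#!/bin/bash
# Hypothetical helper (not part of hb_report): print the first and last line
# of each given log file, so you can see which time range is still covered.
# Assumes files are plain text, .xz or .gz compressed.
log_coverage() {
    local f
    for f in "$@"; do
        [ -e "$f" ] || continue          # skip unmatched globs / missing files
        echo "== $f"
        case "$f" in
            *.xz) xzcat -- "$f" ;;
            *.gz) zcat -- "$f" ;;
            *)    cat -- "$f" ;;
        esac | sed '1b;$b;d'             # keep only the first and last line
    done
}

# Example usage (run on every cluster node):
#   log_coverage /var/log/pacemaker/pacemaker.log*
```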
 

Example 1: Collecting a cluster report as the root user

This example shows collecting the report as the root user. Assuming there was an incident to investigate on 14.10.2022 at 16:45, the relevant data would be from that time and from some time before and after it, to ensure that all information that might have led to the incident is captured.
 
The timeframe in question could then be from 14.10.2022 00:00 to 14.10.2022 23:59. It is also often helpful to force the resulting output filename to contain both the date and the time it was generated.

The following hb_report command uses such parameters:
 
hb_report -f "2022/10/14 00:00" -t "2022/10/14 23:59" /tmp/hb_report-$(date +"%Y%m%d-%H%M")

With the syntax above, the resulting report is created with a name in this format:

drwxrwsr-x+ 4 sfsc-dlm suse      14 Oct 14 13:09 hb_report-20221014-1348
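If you want the -f and -t values to be derived from the incident time instead of typed by hand, a small helper can compute a symmetric window around it. This is a minimal sketch, assuming GNU date (as shipped with SLES) and an incident time in a format that date -d accepts, such as 2022-10-14 16:45; report_window and its 12-hour default margin are hypothetical choices, not hb_report options. Converting to epoch seconds first avoids date misreading a relative offset such as "+12 hours" as a time zone.

```shell
#!/bin/bash
# Hypothetical helper (not part of hb_report): print the -f and -t values for
# a window of MARGIN hours before and after an incident, one per line, in the
# "YYYY/MM/DD HH:MM" format used above. Assumes GNU date; the incident time is
# interpreted in the local time zone (TZ).
report_window() {
    local incident="$1" margin_hours="${2:-12}" epoch
    epoch=$(date -d "$incident" +%s) || return 1       # incident as epoch seconds
    date -d "@$(( epoch - margin_hours * 3600 ))" +"%Y/%m/%d %H:%M"   # -f value
    date -d "@$(( epoch + margin_hours * 3600 ))" +"%Y/%m/%d %H:%M"   # -t value
}

# Example usage (hypothetical), 12 hours before and after the incident:
#   { read -r from; read -r to; } < <(report_window "2022-10-14 16:45" 12)
#   hb_report -f "$from" -t "$to" /tmp/hb_report-$(date +"%Y%m%d-%H%M")
```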
 

Example 2: Collecting a cluster report as a non-root user with sudo

 
To collect the report as a non-root user with sudo (see the sudo configuration in 'Additional Information'), add the '-u <non-root user>' option. For example:
 
sudo hb_report -f "2022/10/14 00:00" -t "2022/10/14 23:59"  -u sadmin1 

 
To double-check the content of the hb_report file, first list it:

 
ls hb_report* 
# hb_report-Fri-14-Oct-2022.tar.bz2 
 

Checking whether pacemaker.log was collected for the time in question: 

 
tar --wildcards -xOjf ./hb_report-Fri-14-Oct-2022.tar.bz2 hb_report-Fri-14-Oct-2022/*/pacemaker.log | sed '1b;$b;d' 
Oct 14 00:05:55 oldhanaa1 pacemaker-controld  [31997] (crm_timer_popped)        info: Cluster Recheck Timer (I_PE_CALC) just popped (900000ms) 
Oct 14 23:50:56 oldhanaa1 pacemaker-controld  [31997] (do_state_transition)     notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd 
 

Checking whether ha-log.txt (which corresponds to the messages file) was collected for the time in question:

 
tar --wildcards -xOjf ./hb_report-Fri-14-Oct-2022.tar.bz2 hb_report-Fri-14-Oct-2022/*/ha-log.txt | sed '1b;$b;d' 
2022-10-14T00:00:01.007277+02:00 oldhanaa2 CRON[10297]: (root) CMD ([ -x /usr/lib64/sa/sa1 ] && exec /usr/lib64/sa/sa1 1 1) 
2022-10-14T23:55:01.691802+02:00 oldhanaa2 CRON[21550]: (root) CMD ([ -x /usr/lib64/sa/sa2 ] && exec /usr/lib64/sa/sa2 -A) 


In the above output, ha-log.txt contains logged events from only one node (oldhanaa2), so data needed for the analysis is missing for the other node. In this case, a sysadmin has to manually restore the messages file of the missing node for the time in question from a backup and provide it separately.
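Because the tar commands above concatenate the logs of all nodes into one stream, it is not always obvious which node a line comes from. A per-node check makes gaps like this easier to spot. The following is a minimal sketch, assuming the report layout shown above (<report directory>/<node name>/<log file>) and that you pass the cluster node names yourself; check_report is a hypothetical helper, not part of hb_report. GNU tar detects compression automatically when reading, so it should also work on the .tar.bz2 file created by hb_report, provided bzip2 is installed.

```shell
#!/bin/bash
# Hypothetical helper (not part of hb_report/crm report): for every node name
# given, check that <report>/<node>/<file> exists in the report tarball and
# print its first and last line, i.e. the time range it covers.
# Returns 1 if any file is missing, 2 if the tarball cannot be listed.
check_report() {
    local tarball="$1"; shift
    local listing top node file rc=0
    listing=$(tar -tf "$tarball") || return 2      # compression auto-detected
    top=${listing%%/*}                             # top-level report directory
    for node in "$@"; do
        for file in pacemaker.log ha-log.txt; do
            if grep -Fqx "$top/$node/$file" <<<"$listing"; then
                printf '%-8s%s/%s\n' OK "$node" "$file"
                tar -xOf "$tarball" "$top/$node/$file" | sed '1b;$b;d'
            else
                printf '%-8s%s/%s\n' MISSING "$node" "$file"
                rc=1
            fi
        done
    done
    return "$rc"
}

# Example usage (node names are from the output above):
#   check_report ./hb_report-Fri-14-Oct-2022.tar.bz2 oldhanaa1 oldhanaa2
```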

Additional Information

If the SSH connections between the cluster nodes that hb_report uses have not been configured yet, there are two ways to do this. Depending on your environment's requirements, hb_report gathers data from the other cluster nodes either via root SSH access or via a defined user with sudo (see 'Running cluster reports without root access' for details).
  

Configuration to collect cluster report as root with root SSH access between cluster nodes 


 
Root SSH access between cluster nodes is configured by default if the ha-cluster-bootstrap package or the YaST cluster module was used for the initial cluster deployment.
 
If the cluster was set up manually, or if root SSH access was removed or is not working, it is best to set up SSH keys without a passphrase so that the script can traverse the cluster without a sysadmin having to enter the root password repeatedly, once for each node of the cluster.

To set up SSH keys (RSA in this example), run the following command as root:

 
ssh-keygen -t rsa  

 
This creates the following two keys in the /root/.ssh/ directory:

 
id_rsa 
id_rsa.pub
    

Copy the public key to all remaining cluster nodes:
 
ssh-copy-id OTHER_NODE 


Also append the public key to the local authorized keys file, so that hb_report also works when started from other nodes:
 
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys


After this, root can SSH from one server to another without a password. Repeat these steps on each member of the cluster.
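To confirm that the setup works before running hb_report, you can try a non-interactive login to every node. This is a minimal sketch; check_ssh is a hypothetical helper, not part of hb_report. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password. Note that in batch mode ssh also fails for a node whose host key is not yet in /root/.ssh/known_hosts, so connect to each node interactively once if a node reports FAIL unexpectedly.

```shell
#!/bin/bash
# Hypothetical helper (not part of hb_report): verify that root can log in to
# every listed node over SSH without any password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
check_ssh() {
    local node rc=0
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$node" true </dev/null; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage (run as root on each node; node names are placeholders):
#   check_ssh node1 node2
```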
 
If root SSH access is too permissive for your needs, either collect cluster reports without root (as described below) or see the sshd_config(5) man page for the 'Match' block, which can be used to restrict access for a particular user.

   

Configuration to collect cluster report without root user 


 
General documentation on collecting a cluster report without the root user is available in 'Running cluster reports without root access'.
 
This option uses SSH agent forwarding and sudo. SSH agent forwarding forwards connections to an authentication agent (such as ssh-agent(1)), which allows a sysadmin's local SSH keys to be used to log in to a final node via a jump host. Here, the jump host is the cluster node where the cluster report is collected, and the final node is any other cluster node.
 
An example sudoers(5) definition, for a user in the 'sysadmins' group who has access to all cluster nodes via SSH with their own SSH key and needs to collect the cluster report as a non-root user:
 
Host_Alias CLUSTER = node1, node2
Runas_Alias R = root
Cmnd_Alias HA_ALLOWED = /usr/sbin/hb_report *, /usr/sbin/crm report *
Defaults!HA_ALLOWED env_keep+=SSH_AUTH_SOCK
%sysadmins CLUSTER = (R) NOPASSWD: HA_ALLOWED

 
This sudo(8) definition must be present on all cluster nodes. It allows the user to preserve the SSH_AUTH_SOCK environment variable (which points to the UNIX socket that SSH uses to obtain keys from the SSH agent) while running hb_report (or crm report) as root via sudo.
 
The user collecting the cluster report without the root account must make sure that forwarding of the authentication agent connection (for example to ssh-agent(1)) is enabled when connecting to the node where the cluster report will be collected, e.g. with the OpenSSH client option 'ssh -A' or the PuTTY option 'Allow agent forwarding'.
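After logging in with agent forwarding, a quick check can confirm that the forwarded agent is actually usable before running sudo hb_report with the -u option. This is a minimal sketch; check_agent_forwarding is a hypothetical helper name. It only inspects the current session: it checks that SSH_AUTH_SOCK is set, that it points to a UNIX socket, and that ssh-add -l (which exits with 0 only when the agent holds at least one key) can list keys through it.

```shell
#!/bin/bash
# Hypothetical helper: check that SSH agent forwarding is active in the
# current session, which the sudo/env_keep setup above depends on.
check_agent_forwarding() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set: agent forwarding is not active" >&2
        return 1
    fi
    if [ ! -S "$SSH_AUTH_SOCK" ]; then
        echo "SSH_AUTH_SOCK=$SSH_AUTH_SOCK is not a socket" >&2
        return 1
    fi
    # ssh-add -l: exit 0 = keys listed, 1 = no keys, 2 = cannot reach agent
    if ! ssh-add -l >/dev/null 2>&1; then
        echo "no usable keys available from the agent" >&2
        return 1
    fi
    echo "agent forwarding OK"
}

# Example usage, after connecting with 'ssh -A' to the collecting node:
#   check_agent_forwarding && sudo hb_report -f "..." -t "..." -u sadmin1
```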
 
  • https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/app-crmreport-nonroot.html 
  • https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/app-ha-troubleshooting.html#sec-ha-troubleshooting-misc 
  • https://www.suse.com/support/kb/doc/?id=000020662 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID:7007262
  • Creation Date: 25-Nov-2010
  • Modified Date:09-Dec-2022
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server


For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com
