rsc_azure-events resource primitive fails with error: Unable to get instance info

This document (000021356) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15 SP4
SUSE Linux Enterprise Server 15 SP5
SUSE Linux Enterprise Server for SAP Applications 15 SP4
SUSE Linux Enterprise Server for SAP Applications 15 SP5

Situation

Microsoft, the hosting provider, applied a maintenance update to the host agent that monitors the VM on its physical host node in Azure. The update briefly interrupted connectivity, and the following error was logged in /var/log/messages on the cluster nodes:
python3[2504]: 2024-01-29T02:21:11.121086Z ERROR ExtHandler ExtHandler Error fetching the goal state: [ProtocolError] Error fetching goal state: [ResourceGoneError] [HTTP Failed] [410: Gone] b'<?xml version="1.0" encoding="utf-8"?>\n<Error xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n    <Code>ResourceNotAvailable</Code>\n    <Message>The resource requested is no longer available. Please refresh your cache.</Message>\n    <Details></Details>\n</Error>'
python3[2504]: Traceback (most recent call last):
python3[2504]:   File "bin/WALinuxAgent-2.10.0.6-py3.9.egg/azurelinuxagent/common/protocol/wire.py", line 788, in update_goal_state
python3[2504]:   File "bin/WALinuxAgent-2.10.0.6-py3.9.egg/azurelinuxagent/common/utils/restutil.py", line 478, in http_request
...
...
python3[2504]:     raise ResourceGoneError(response_error)
python3[2504]: azurelinuxagent.common.exception.ResourceGoneError: [ResourceGoneError] [HTTP Failed] [410: Gone] b'<?xml version="1.0" encoding="utf-8"?>\n<Error xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n    <Code>ResourceNotAvailable</Code>\n    <Message>The resource requested is no longer available. Please refresh your cache.</Message>\n    <Details></Details>\n</Error>'
This error caused the cluster resource primitive "rsc_azure-events" to fail with the following messages:
azure-events: WARNING: Failed to reach the server: Gone
azure-events: ERROR: getInstanceInfo: Unable to get instance info
pacemaker-execd[8993]:  notice: rsc_azure-events_monitor_10000[12431] error output [ ocf-exit-reason:getInstanceInfo: Unable to get instance info ]
pacemaker-controld[8996]:  notice: Result of monitor operation for rsc_azure-events on azlsapa6per01: error (getInstanceInfo: Unable to get instance info) 
After three failed restart attempts, the resource reached its migration threshold and was marked as failed on the node. An administrator then had to resolve the issue and clean up the resource manually:
pacemaker-schedulerd[8995]:  warning: Unexpected result (error: getInstanceInfo: Unable to get instance info) was recorded for monitor of rsc_azure-events:0 on azlsapa6per01 at Jan 29 10:21:18 2024 
pacemaker-schedulerd[8995]:  warning: cln_azure-events cannot run on azlsapa6per01 due to reaching migration threshold (clean up resource to allow again)
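
To check whether a node has hit this condition, the two characteristic messages quoted above can be searched for in the system log. The following is a minimal sketch, not part of the original article; the function name azure_events_failures is hypothetical, and the patterns are taken directly from the log lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: print only the log lines that carry the failure
# signature quoted in this document. Reads the files given as arguments,
# or standard input when no file is given.
azure_events_failures() {
    grep -E 'Unable to get instance info|cannot run on .* due to reaching migration threshold' "$@"
}
```

A typical call would be "azure_events_failures /var/log/messages". If matches are found, "crm_mon -1rf" shows the current fail counts, and once the underlying issue has passed, "crm resource cleanup rsc_azure-events" should clear the failure so the resource is allowed to run on the node again.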

Resolution

To let a host maintenance operation complete without causing a complete resource failure, modify the resource primitive configuration so that the cluster waits 60 seconds before starting the restart. This gives the Azure agents enough time to finish their maintenance tasks. Configure the resource primitive as follows:

primitive rsc_azure-events azure-events \
        op monitor interval=10s timeout=240s \
        op start timeout=10s interval=0s start-delay=60s \
        op stop timeout=10s interval=0s \
        meta failure-timeout=60s
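
The primitive can be changed interactively with "crm configure edit rsc_azure-events". Afterwards it may be useful to confirm that both new settings are present. The following is a minimal sketch under that assumption; the function name has_recommended_settings is hypothetical and only checks for the two values shown in the configuration above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the configuration text on standard
# input contains both settings recommended in this document.
has_recommended_settings() {
    cfg=$(cat)
    # Both the start delay and the failure timeout must be present.
    printf '%s\n' "$cfg" | grep -q 'start-delay=60s' &&
        printf '%s\n' "$cfg" | grep -q 'failure-timeout=60s'
}
```

For example, "crm configure show rsc_azure-events | has_recommended_settings && echo settings present" reports whether the running configuration already contains the change.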

Cause

A maintenance update was applied to the host agent that monitors the virtual machine (VM) on the physical host node in Azure. The update took longer than the time the cluster resource needed to begin a restart attempt, so the restart attempts failed while the update was still in progress.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021356
  • Creation Date: 13-Feb-2024
  • Modified Date: 19-Apr-2024
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications
