iTCO_wdt does not accept a Watchdog Timeout greater than 63 seconds

This document (7011426) is provided subject to the disclaimer at the end of this document.

Environment


SUSE Linux Enterprise High Availability Extension 11 Service Pack 1
SUSE Linux Enterprise High Availability Extension 11 Service Pack 2
SUSE Linux Enterprise Server 10 Service Pack 4
SUSE Linux Enterprise High Availability Extension 11

Situation

Setting up sbd as a STONITH device with the iTCO_wdt hardware watchdog appears to work. The sbd Watchdog Timeout is set to 65 seconds, as in this example:

jupiter:~ # sbd -d /dev/disk/by-id/scsi-mywatchdogdevice dump
==Dumping header on disk /dev/disk/by-id/scsi-mywatchdogdevice
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 65
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 130

However, the log files show entries like these when the cluster starts:

Nov 22 12:33:32 jupiter sbd: [3845]: ERROR: WDIOC_SETTIMEOUT: Failed to set watchdog timer to 65 seconds.: Invalid argument
Nov 22 12:33:32 jupiter sbd: [3845]: CRIT: Please validate your watchdog configuration!
Nov 22 12:33:32 jupiter sbd: [3845]: CRIT: Choose a different watchdog driver or specify -T to silence this check if you are sure.

These entries indicate that the Watchdog Timeout value could not be passed to the hardware watchdog. As a result, the cluster is not protected by the watchdog, which is, as the log file states, a critical issue that should be resolved as soon as possible.

This applies to all Watchdog Timeout values greater than 63 seconds when the iTCO_wdt hardware watchdog is used.
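A quick way to see whether an existing sbd device is affected is to read the watchdog timeout from the dump output and compare it with the 63-second limit. The following is a minimal sketch; the function names sbd_watchdog_timeout and timeout_fits_itco are hypothetical helpers, not part of sbd, and the parsing assumes the dump format shown above.

```shell
# Hypothetical helper: read `sbd ... dump` output on stdin and print
# the value of the "Timeout (watchdog)" line.
sbd_watchdog_timeout() {
    awk -F: '/^Timeout \(watchdog\)/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Hypothetical helper: succeed (exit status 0) if the given timeout in
# seconds is within the 63-second limit described in this document.
timeout_fits_itco() {
    [ "$1" -le 63 ]
}

# Example use on a cluster node (device path taken from this document):
#   t=$(sbd -d /dev/disk/by-id/scsi-mywatchdogdevice dump | sbd_watchdog_timeout)
#   timeout_fits_itco "$t" || echo "Watchdog timeout $t is too large for iTCO_wdt"
```

With the header shown in the Situation section, the helper would report 65, which fails the check.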

Resolution

One solution is to use a lower Watchdog Timeout; any value less than or equal to 63 seconds should work. For example, running

sbd -d /dev/disk/by-id/scsi-mywatchdogdevice -1 60 -4 120 create

and then restarting all cluster nodes would resolve the issue.
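In both examples in this document the msgwait timeout is twice the watchdog timeout (65 and 130, 60 and 120). Assuming that ratio is wanted, the command line can be derived from a single value. This is only a sketch: sbd_create_cmd is a hypothetical helper that prints the command instead of running it, so the result can be reviewed before the device header is rewritten.

```shell
# Hypothetical helper: print an sbd create command for the given device
# and watchdog timeout, using msgwait = 2 * watchdog timeout (assumption
# taken from the two examples in this document).
sbd_create_cmd() {
    dev=$1
    wd=$2
    if [ "$wd" -gt 63 ]; then
        echo "watchdog timeout $wd exceeds 63 seconds" >&2
        return 1
    fi
    echo "sbd -d $dev -1 $wd -4 $((wd * 2)) create"
}

# Example:
#   sbd_create_cmd /dev/disk/by-id/scsi-mywatchdogdevice 60
```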

If a Watchdog Timeout of more than 63 seconds is really necessary, check whether another hardware watchdog is available.

As a last resort, softdog could be used.

Cause

This is a hardware limitation of iTCO version 1. The driver rejects timer values above 0x03f (63) for version 1 and above 0x3ff (1023) for version 2, as the following check in drivers/watchdog/iTCO_wdt.c shows:

        /* Reject timer values above the limit for the detected iTCO version. */
        if (((iTCO_wdt_private.iTCO_version == 2) && (tmrval > 0x3ff)) ||
            ((iTCO_wdt_private.iTCO_version == 1) && (tmrval > 0x03f)))
                return -EINVAL;
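
The upper-bound check can be mirrored in a few lines of shell to see which values each iTCO version accepts. This is an illustrative sketch only: itco_tmrval_ok is a hypothetical function, it covers just the two limits visible in the excerpt, and it reports versions other than 1 and 2 as unknown because the excerpt does not cover them.

```shell
# Hypothetical helper mirroring the range check from the driver excerpt.
# Usage: itco_tmrval_ok VERSION TMRVAL
# Exit status 0: value is within the limit for that version.
# Exit status 1: value exceeds the limit.
# Exit status 2: version not covered by the excerpt.
itco_tmrval_ok() {
    version=$1
    tmrval=$2
    case "$version" in
        1) max=$((0x03f)) ;;   # 63
        2) max=$((0x3ff)) ;;   # 1023
        *) echo "unknown iTCO version: $version" >&2; return 2 ;;
    esac
    [ "$tmrval" -le "$max" ]
}

# Example:
#   itco_tmrval_ok 1 65 || echo "65 is rejected for iTCO version 1"
```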


Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011426
  • Creation Date: 27-Nov-2012
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server
