iTCO_wdt does not accept a watchdog timeout greater than 63 seconds

This document (7011426) is provided subject to the disclaimer at the end of this document.

Environment


SUSE Linux Enterprise High Availability Extension 11 Service Pack 1
SUSE Linux Enterprise High Availability Extension 11 Service Pack 2
SUSE Linux Enterprise Server 10 Service Pack 4
SUSE Linux Enterprise High Availability Extension 11

Situation

Setting up an sbd STONITH device with the iTCO_wdt hardware watchdog appears to work. In this example, sbd has the watchdog timeout set to 65 seconds:

jupiter:~ # sbd -d /dev/disk/by-id/scsi-mywatchdogdevice dump
==Dumping header on disk /dev/disk/by-id/scsi-mywatchdogdevice
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 65
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 130

However, the log files show entries like these when the cluster starts:

Nov 22 12:33:32 jupiter sbd: [3845]: ERROR: WDIOC_SETTIMEOUT: Failed to set watchdog timer to 65 seconds.: Invalid argument
Nov 22 12:33:32 jupiter sbd: [3845]: CRIT: Please validate your watchdog configuration!
Nov 22 12:33:32 jupiter sbd: [3845]: CRIT: Choose a different watchdog driver or specify -T to silence this check if you are sure.

These entries indicate that the watchdog timeout could not be passed to the hardware watchdog. As a result, the cluster is not protected by the watchdog, which, as the log states, is a critical issue. It should be resolved as soon as possible.
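To check whether a node is affected, the log can be searched for the error shown above. The following is a minimal sketch; the helper name find_watchdog_timeout_errors is hypothetical, and the sample text is copied from the log excerpt in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the sbd log lines that report a failed
# WDIOC_SETTIMEOUT call. Reads log text from stdin.
find_watchdog_timeout_errors() {
    grep 'sbd: .*WDIOC_SETTIMEOUT: Failed to set watchdog timer'
}

# Sample input copied from the log excerpt in this document.
sample_log='Nov 22 12:33:32 jupiter sbd: [3845]: ERROR: WDIOC_SETTIMEOUT: Failed to set watchdog timer to 65 seconds.: Invalid argument
Nov 22 12:33:32 jupiter sbd: [3845]: CRIT: Please validate your watchdog configuration!'

# Only the ERROR line matches the pattern.
printf '%s\n' "$sample_log" | find_watchdog_timeout_errors
```

On a real node the helper would be fed the system log instead, for example find_watchdog_timeout_errors < /var/log/messages, assuming syslog writes to that file.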

This applies to every watchdog timeout value greater than 63 seconds when the iTCO_wdt hardware watchdog is used.

Resolution

One solution is to use a lower value for the watchdog timeout. Any value less than or equal to 63 seconds should work. For example, recreating the sbd device with a watchdog timeout (-1) of 60 seconds and a msgwait timeout (-4) of 120 seconds

sbd -d /dev/disk/by-id/scsi-mywatchdogdevice -1 60 -4 120 create

and then restarting all cluster nodes would resolve the issue.
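After recreating the device, the configured value can be compared against the 63-second limit. This is a rough sketch rather than a supported tool: the helper names sbd_watchdog_timeout and check_itco_timeout are hypothetical, and the parsing assumes the dump output has the layout shown earlier in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the watchdog timeout (in seconds) found in
# `sbd ... dump` output read from stdin.
sbd_watchdog_timeout() {
    awk -F: '/^Timeout \(watchdog\)/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical helper: report whether a timeout fits the 63-second limit
# that this document describes for iTCO_wdt.
check_itco_timeout() {
    if [ "$1" -le 63 ]; then
        echo "ok: $1 seconds"
    else
        echo "too large: $1 seconds (limit is 63)"
    fi
}

# Sample input copied from the dump shown in this document.
sample='Timeout (watchdog) : 65
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 130'

# Prints: too large: 65 seconds (limit is 63)
check_itco_timeout "$(printf '%s\n' "$sample" | sbd_watchdog_timeout)"
```

On a cluster node, the sample text would be replaced by the real dump, for example by piping the output of sbd -d /dev/disk/by-id/scsi-mywatchdogdevice dump into sbd_watchdog_timeout.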

If a watchdog timeout greater than 63 seconds is really necessary, check whether another hardware watchdog is available.

As a last resort, the softdog software watchdog can be used.

Cause

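This is a hardware limitation of version 1 of iTCO. It is visible in the following check in drivers/watchdog/iTCO_wdt.c, which rejects timer values above 0x03f for version 1 and above 0x3ff for version 2. The rejected request is what sbd reports as "Invalid argument" in the log: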

        if (((iTCO_wdt_private.iTCO_version == 2) && (tmrval > 0x3ff)) ||
            ((iTCO_wdt_private.iTCO_version == 1) && (tmrval > 0x03f)))
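For reference, the two hexadecimal limits in the check above can be converted to decimal as sketched below. The helper name hex_to_dec is hypothetical. The numbers are limits on the driver's internal timer value (tmrval) as they appear in the excerpt; how that value is derived from the requested number of seconds is not shown in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: print a hexadecimal constant as a decimal number.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Limit for iTCO version 1 in the excerpt; prints 63.
hex_to_dec 0x03f

# Limit for iTCO version 2 in the excerpt; prints 1023.
hex_to_dec 0x3ff
```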


Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011426
  • Creation Date: 27-NOV-12
  • Modified Date: 29-NOV-12
    • SUSE Linux Enterprise Server
