Cluster settings that affect the number of actions or jobs that can be executed in parallel

This document (7024060) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise High Availability Extension 11 Service Pack 4

Situation

Which cluster settings might affect the number of actions or jobs that can be executed in parallel in a cluster?
Jobs or actions include operations such as "start", "stop", and "monitor" for each resource in the cluster.

Resolution

1. LRMD_MAX_CHILDREN=
/etc/sysconfig/pacemaker
  LRMD_MAX_CHILDREN=4
Details: This option has been deprecated in favor of node-action-limit. For backward compatibility, if it is set it still affects the number of in-flight actions that run on a cluster node.
Action: Comment this option out and use "node-action-limit" instead.

2. node-action-limit=   -->  Cluster property -->  cib-bootstrap-options --> node-action-limit=
Details: This is a per-node limit: the number of in-flight actions that can run on a local cluster node.
** It defaults to 2x the number of CPU cores.

3. batch-limit=     --> Cluster property --> cib-bootstrap-options -->  batch-limit=
Details: This is a cluster-wide limit on the number of actions.
** It is the number of jobs that the Transition Engine (TE) is allowed to execute in parallel. The TE is the logic in Pacemaker’s CRMd that executes the actions determined by the Policy Engine (PE). The "correct" value depends on the speed and load of your network and cluster nodes.
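For item 1 above, the following is a minimal sketch of how one might check whether LRMD_MAX_CHILDREN is still active (uncommented) in the contents of /etc/sysconfig/pacemaker. The function name active_setting and the sample text are illustrative assumptions, not part of Pacemaker; the sketch only parses simple NAME=value lines and does not read the file or change any setting.

```python
def active_setting(sysconfig_text, name="LRMD_MAX_CHILDREN"):
    """Return the value of the last uncommented NAME=value line, or None.

    Hypothetical helper for illustration; it handles only plain
    NAME=value lines, not full shell syntax.
    """
    value = None
    for raw in sysconfig_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            # Drop optional surrounding quotes; a later line overrides an earlier one.
            value = val.strip().strip("\"'")
    return value


# Sample content (an assumption for this example, not taken from a real system).
sample = """\
# LRMD_MAX_CHILDREN=8
LRMD_MAX_CHILDREN=4
"""

print(active_setting(sample))
```

If the function returns a value, the deprecated option is still set and can be commented out in favor of node-action-limit.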

Cluster Resource Manager (CRM) logic:
1) Check whether the number of in-flight actions has reached the cluster-wide limit (batch-limit).
   * If so, hold the action.
   * If not, go to step 2).
2) Check whether the number of in-flight actions on that node has reached the per-node limit (node-action-limit).
   * If so, hold the action.
   * If not, issue it.
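The two checks above can be sketched as a small decision function. This is an illustration of the described logic only, not Pacemaker's actual implementation; the function name may_issue, the counters, and the example limits are assumptions chosen for the example.

```python
def may_issue(cluster_in_flight, node_in_flight, batch_limit, node_action_limit):
    """Return True if a new action may be issued, False if it must be held.

    Hypothetical model of the two-step check described in this document.
    """
    # Step 1: cluster-wide limit (batch-limit).
    if cluster_in_flight >= batch_limit:
        return False
    # Step 2: per-node limit (node-action-limit).
    if node_in_flight >= node_action_limit:
        return False
    return True


# Example values are assumptions: a node with 4 CPU cores uses the
# documented default of 2x CPU cores for node-action-limit.
cpu_cores = 4
node_limit = 2 * cpu_cores
batch_limit = 30  # arbitrary value for illustration

print(may_issue(10, 3, batch_limit, node_limit))   # below both limits
print(may_issue(10, 8, batch_limit, node_limit))   # per-node limit reached
print(may_issue(30, 0, batch_limit, node_limit))   # cluster-wide limit reached
```

In this model, an action is held as soon as either limit is reached, and the cluster-wide check is evaluated first.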

The CRM also takes CPU load into consideration when scheduling actions.
If crmd detects a high load (by default, 80% of 2 x the number of CPUs), it logs a message similar to the following and delays scheduling actions even if batch-limit and node-action-limit have not been reached:
crmd[2034]:   notice: High CPU load detected: 18.410000
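As a worked example of the threshold described above, the following sketch computes 80% of 2 x the number of CPUs and compares a load value against it. The function names, the CPU count of 8, and the use of a strict "greater than" comparison are assumptions for illustration; only the load value 18.41 is taken from the sample log message.

```python
def high_load_threshold(cpu_count, fraction=0.8):
    """Threshold as described in this document: fraction of (2 x number of CPUs)."""
    return fraction * 2 * cpu_count


def is_high_load(load, cpu_count, fraction=0.8):
    """Hypothetical check; the exact comparison used by crmd is an assumption."""
    return load > high_load_threshold(cpu_count, fraction)


cpu_count = 8  # assumed CPU count for this example
print(round(high_load_threshold(cpu_count), 2))
print(is_high_load(18.41, cpu_count))
```

Under these assumptions, a node with 8 CPUs would treat any load above roughly 12.8 as high, so the logged load of 18.41 would delay further scheduling.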

Cause

The purpose of throttling or limiting the number of parallel actions is to keep Pacemaker from overloading the nodes to the point where actions start timing out, causing unnecessary failures and the need for recovery operations.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7024060
  • Creation Date: 13-Aug-2019
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise High Availability Extension
