rancher-logging-root-fluentd-0 pod keeps restarting continuously with exit code 137 even after increasing memory

This document (000021839) is provided subject to the disclaimer at the end of this document.

Environment

  • SUSE Rancher 2.9.x
  • Rancher-logging 104.1.x+

Situation

  • In a cluster with rancher-logging installed, the rancher-logging-root-fluentd-0 pod restarts continuously with exit code '137' (128 + 9, meaning the container was killed with SIGKILL, which typically indicates an out-of-memory kill). The issue persists even after the pod's memory is increased significantly.

  • The rancher-logging-root-fluentd-0 pod shows only the following error:

    Last status: Exited with 137: Error, Started: Fri Feb 28, 2025 4:50:53 PM, Exited: Fri Feb 28, 2025 4:57:07 PM
  • Upon further investigation, the rancher-logging-root-fluentbit pod shows the following errors:

    [2025/03/04 11:01:38] [error] [net] TCP connection failed: rancher-logging-root-fluentd.cattle-logging-system.svc.cluster.local:24240 (Connection refused)
    [2025/03/04 11:01:38] [error] [output:forward:forward.0] no upstream connections available
    [2025/03/04 11:01:38] [ warn] [engine] failed to flush chunk '1-1741004097.135890300.flb', retry in 320 seconds: task_id=147, input=tail.0 > output=forward.0 (out_id=0)

Resolution

  • Configure the Output buffer to use the type 'file' instead of 'memory', so that queued chunks are kept on disk rather than in the fluentd pod's memory.
  • Below is an example Output snippet for Elasticsearch:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: efk
  namespace: cattle-logging-system
spec:
  elasticsearch:
    buffer:
      flush_interval: 30s
      flush_mode: interval
      flush_thread_count: 4
      queued_chunks_limit_size: 300
      type: file                          # buffer chunks on disk instead of in memory
  • In addition, log in to Rancher, explore the desired cluster, and go to Apps > Installed Apps > Rancher-Logging. Click "Edit/Upgrade" and review whether the 'Buffer_Chunk_Size' and 'Buffer_Max_Size' settings shown below can be tuned to values that best suit the cluster's needs, as discussed in https://github.com/rancher/rancher-docs/issues/90
inputTail:
    Buffer_Chunk_Size: ''
    Buffer_Max_Size: ''
  • Verify that the rancher-logging-root-fluentd-0 pod no longer restarts and that logs are sent successfully.
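
As a minimal sketch of the 'inputTail' tuning mentioned in the steps above, the fragment below fills in the two empty settings. The values are illustrative assumptions only, not recommendations from this document; upstream Fluent Bit uses 32k for both tail buffer settings when they are left unset, and suitable values depend on the longest log lines and the log volume in the cluster.

```yaml
# Illustrative values only (assumed, not taken from this document).
inputTail:
    Buffer_Chunk_Size: '64k'    # initial read buffer size per monitored file
    Buffer_Max_Size: '256k'     # upper limit the per-file buffer may grow to
```

Larger buffers allow longer log lines to be read, at the cost of more memory per monitored file, so it is reasonable to raise them gradually while watching the Fluent Bit pods.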

Cause

  • By default, Fluent Bit uses memory as the primary, temporary place to store records while it processes data. In some scenarios a persistent, filesystem-based buffering mechanism is preferable because it provides aggregation and data safety capabilities.
  • These issues can occur when the destination is slow or the cluster produces large volumes of log data.
  • It is therefore important to configure buffering correctly for slow destinations or heavy backpressure.
  • More information can be found here: https://docs.fluentbit.io/manual/administration/buffering-and-storage
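
To illustrate the difference between memory and filesystem buffering described above, the following is a hedged sketch in upstream Fluent Bit's own YAML configuration format. It is not the rancher-logging chart's values layout: in Rancher the Fluent Bit configuration is generated by the logging operator, so this fragment only shows the concept. The storage path, the limits, and the tail path are assumed values; the host and port are the ones that appear in the error log in the Situation section.

```yaml
# Upstream Fluent Bit YAML configuration, for illustration only.
# Paths and size limits are assumptions, not rancher-logging defaults.
service:
  storage.path: /var/log/flb-storage/      # directory for filesystem-backed chunks
  storage.sync: normal
  storage.backlog.mem_limit: 5M            # memory used when replaying backlog chunks

pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      storage.type: filesystem             # buffer on disk as well as in memory
  outputs:
    - name: forward
      match: '*'
      host: rancher-logging-root-fluentd.cattle-logging-system.svc.cluster.local
      port: 24240
      storage.total_limit_size: 1G         # cap on disk space queued for this output
```

With 'storage.type: filesystem', chunks that cannot be flushed to a slow or unavailable destination are held on disk rather than only in memory, which is the behavior the linked Fluent Bit documentation describes.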

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021839
  • Creation Date: 15-May-2025
  • Modified Date: 22-May-2025
    • SUSE Rancher
