Rancher log forwarding to an Elasticsearch endpoint stops functioning as a result of connection reload behaviour in Rancher v2.3, prior to v2.3.8, and v2.4, prior to v2.4.4

This document (000020044) is provided subject to the disclaimer at the end of this document.

Situation

Issue

In Rancher v2.0 - v2.2, v2.3 prior to v2.3.8, and v2.4 prior to v2.4.4, a previously functioning log forwarding configuration to an Elasticsearch instance could stop forwarding logs, without any configuration change and whilst the Elasticsearch endpoint was still available. The logs of the rancher-logging-fluentd Pod(s) in the cattle-logging Namespace of the affected cluster show messages of the following format:

failed to flush the buffer, retry_time=0, next_retry_seconds=2019-07-24 07:07:31 +0000, chunk=58e67cdcd7d1406de13fe55a26fe6cad, error_class=Fluent::Plugin::ElasticSearchOutput::RecoverableRequestFailure error="could not push logs to ElasticSearch cluster ({:host=>elasticsearch.example.com, :port=>443, :scheme=>\"https\"}): connect_write timeout reached"

or

failed to flush the buffer. retry_time=10 next_retry_seconds=2019-07-24 07:07:31 +0000 chunk="58e67cdcd7d1406de13fe55a26fe6cad" error_class=Elasticsearch::Transport::Transport::Error error="Cannot get new connection from pool."
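
To check saved fluentd Pod logs for these two symptoms, a small script can count the matching lines. The following is a minimal sketch, not an official diagnostic tool: the function and signature names are hypothetical, and the patterns are derived only from the two example messages above, so messages worded differently will not be counted.

```python
import re

# Error signatures derived from the two example messages in this article.
# The dictionary keys are illustrative labels, not names used by fluentd.
SIGNATURES = {
    "connect_write_timeout": re.compile(
        r"could not push logs to ElasticSearch cluster .*connect_write timeout reached"
    ),
    "connection_pool_exhausted": re.compile(r"Cannot get new connection from pool"),
}


def classify_line(line):
    """Return the label of the matching signature, or None if the line does not match."""
    # Both example messages begin with this buffer flush failure text.
    if "failed to flush the buffer" not in line:
        return None
    for name, pattern in SIGNATURES.items():
        if pattern.search(line):
            return name
    return None


def count_symptoms(lines):
    """Count how many of the given log lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        name = classify_line(line)
        if name is not None:
            counts[name] += 1
    return counts
```

For example, calling count_symptoms on the lines of a file saved from the rancher-logging-fluentd Pod logs returns a dictionary of counts per signature. A non-zero count suggests this issue, but does not by itself confirm the root cause.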

Pre-requisites

  • A Rancher v2.x instance, running Rancher v2.0 - v2.2, v2.3 prior to v2.3.8, or v2.4 prior to v2.4.4
  • A Rancher managed cluster with Rancher log forwarding configured to an Elasticsearch endpoint

Root cause

By default, the fluent-plugin-elasticsearch fluentd plugin attempts to reload the host list from Elasticsearch after 10000 requests. This behaviour is a result of default functionality in the elasticsearch-ruby gem, as documented in the plugin's FAQ. The reload is not compatible with all Elasticsearch environments, and when it fails the plugin stops forwarding further log events.

Resolution

In Rancher v2.3, from v2.3.8, and Rancher v2.4, from v2.4.4, the Rancher log forwarding configuration for Elasticsearch endpoints was updated to include the option reload_connections false. This disables the default connection reload behaviour, preventing occurrences of this issue.
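
To confirm whether a given fluentd configuration carries this option, the configuration text can be scanned for the reload_connections directive. The sketch below is a rough check under stated assumptions: the function name and the sample configuration are hypothetical and do not reproduce the configuration Rancher generates, and the check does not parse fluentd sections, so it treats every reload_connections line in the text alike.

```python
# Hypothetical sample only; the host, port and scheme values are taken from
# the example log message in this article, not from a real Rancher cluster.
SAMPLE_CONFIG = """
<match **>
  @type elasticsearch
  host elasticsearch.example.com
  port 443
  scheme https
  reload_connections false
</match>
"""


def reload_connections_disabled(config_text):
    """Return True if reload_connections is present and always set to false.

    The plugin reloads connections by default, so a configuration without
    the directive is reported as not disabled.
    """
    values = []
    for raw_line in config_text.splitlines():
        # Drop comments and surrounding whitespace before inspecting the line.
        line = raw_line.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) == 2 and parts[0] == "reload_connections":
            values.append(parts[1].lower())
    return bool(values) and all(value == "false" for value in values)
```

Calling reload_connections_disabled(SAMPLE_CONFIG) returns True. A result of False for the configuration of an affected cluster would be consistent with a Rancher version that predates the fix.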

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020044
  • Creation Date: 06-May-2021
  • Modified Date: 06-May-2021
    • SUSE Rancher
