Rancher log forwarding to an Elasticsearch endpoint stops functioning as a result of connection reload behaviour in Rancher v2.3, prior to v2.3.8, and v2.4, prior to v2.4.4
This document (000020044) is provided subject to the disclaimer at the end of this document.
Situation
Issue
In Rancher v2.0 - v2.2, v2.3 prior to v2.3.8, and v2.4 prior to v2.4.4, a previously functioning log forwarding configuration to an Elasticsearch instance could stop forwarding logs, without any configuration change and whilst the Elasticsearch endpoint was still available. The logs of the rancher-logging-fluentd Pod(s) in the cattle-logging Namespace of the affected cluster show messages of the following format:
failed to flush the buffer, retry_time=0, next_retry_seconds=2019-07-24 07:07:31 +0000, chunk=58e67cdcd7d1406de13fe55a26fe6cad, error_class=Fluent::Plugin::ElasticSearchOutput::RecoverableRequestFailure error="could not push logs to ElasticSearch cluster ({:host=>elasticsearch.example.com, :port=>443, :scheme=>\"https\"}): connect_write timeout reached"
or
failed to flush the buffer. retry_time=10 next_retry_seconds=2019-07-24 07:07:31 +0000 chunk="58e67cdcd7d1406de13fe55a26fe6cad" error_class=Elasticsearch::Transport::Transport::Error error="Cannot get new connection from pool."
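As a rough aid for spotting these messages in a large volume of Pod logs, the following Python sketch counts lines that match either of the two errors quoted above. The function names and the SIGNATURES table are illustrative assumptions, not part of Rancher or Fluentd; the matched substrings are taken from the messages above.

```python
# Substrings copied from the two log messages quoted in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "connect_write_timeout": "connect_write timeout reached",
    "pool_exhausted": "Cannot get new connection from pool.",
}


def classify_fluentd_line(line):
    """Return the label of the matching error signature, or None."""
    if "failed to flush the buffer" not in line:
        return None
    for label, needle in SIGNATURES.items():
        if needle in line:
            return label
    return None


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = {label: 0 for label in SIGNATURES}
    for line in lines:
        label = classify_fluentd_line(line)
        if label is not None:
            counts[label] += 1
    return counts
```

The input could be, for example, the lines of a saved log file from a rancher-logging-fluentd Pod in the cattle-logging Namespace. A non-zero count only suggests this issue; the same errors can also occur when the Elasticsearch endpoint is genuinely unreachable.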
Pre-requisites
- A Rancher v2.x instance, running Rancher v2.0 - v2.2, v2.3 prior to v2.3.8, or v2.4 prior to v2.4.4
- A Rancher managed cluster with Rancher log forwarding configured to an Elasticsearch endpoint
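The affected version ranges listed above can be expressed as a small check. This is a minimal Python sketch; the helper names are hypothetical, and it treats any version outside the ranges named in this article as not affected, which is an assumption rather than a statement from the article.

```python
# First fixed patch release for each minor release line named in this article.
FIRST_FIXED = {(2, 3): (2, 3, 8), (2, 4): (2, 4, 4)}


def parse_version(version):
    """Parse 'v2.3.7' or '2.3.7' into a tuple of ints such as (2, 3, 7).

    Any suffix after a hyphen (for example '-rc1') is ignored, which is a
    simplification made for this sketch.
    """
    core = version.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version):
    """Return True if the version falls in a range this article lists as affected."""
    parsed = parse_version(version)
    minor = parsed[:2]
    if (2, 0) <= minor <= (2, 2):
        return True  # v2.0 - v2.2: all releases are listed as affected
    fixed = FIRST_FIXED.get(minor)
    if fixed is None:
        return False  # assumption: versions not named in the article are not flagged
    return parsed < fixed
```

For example, is_affected("v2.3.7") returns True and is_affected("v2.4.4") returns False.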
Root cause
By default, the fluent-plugin-elasticsearch Fluentd plugin will attempt to reload the host list from Elasticsearch after 10000 requests. This behaviour is a result of default functionality in the elasticsearch-ruby gem, as documented in the plugin's FAQ. This reload behaviour is not compatible with all Elasticsearch environments, and failure of the reload results in the plugin failing to forward further log events.
Resolution
In Rancher v2.3, from v2.3.8, and Rancher v2.4, from v2.4.4, the Rancher log forwarding configuration for Elasticsearch endpoints was updated to include the option reload_connections false. This disables the default connection reload behaviour, preventing occurrences of this issue.
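To illustrate what the option looks like in a Fluentd configuration, the Python sketch below embeds a hypothetical match block and checks whether each Elasticsearch output sets reload_connections false. The sample configuration is an assumption written for illustration and is not the configuration Rancher generates; the host value is taken from the example log message above, and the function name is invented for this sketch.

```python
# Hypothetical Fluentd configuration, for illustration only.
SAMPLE_CONFIG = """
<match **>
  @type elasticsearch
  host elasticsearch.example.com
  port 443
  scheme https
  reload_connections false
  <buffer>
    @type file
  </buffer>
</match>
"""


def reload_disabled(config_text):
    """Return one bool per elasticsearch <match> block.

    True means the block sets 'reload_connections false' at its top level.
    """
    results = []
    block = None  # top-level directives of the current <match> block
    depth = 0     # nesting depth of sub-sections such as <buffer>
    for raw in config_text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comment lines
        first = tokens[0]
        if block is None:
            if first.startswith("<match"):
                block, depth = {}, 0
        elif first == "</match>":
            if block.get("@type") == "elasticsearch":
                results.append(block.get("reload_connections") == "false")
            block = None
        elif first.startswith("</"):
            depth -= 1  # leaving a nested section
        elif first.startswith("<"):
            depth += 1  # entering a nested section
        elif depth == 0 and len(tokens) >= 2:
            block[first] = tokens[1]
    return results
```

Calling reload_disabled(SAMPLE_CONFIG) returns one entry for the single Elasticsearch output in the sample. The parser is deliberately simple and does not handle every construct of the Fluentd configuration syntax.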
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020044
- Creation Date: 06-May-2021
- Modified Date: 06-May-2021
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com