How to prevent Prometheus from scraping noisy kube-apiserver metrics

This document (000021191) is provided subject to the disclaimer at the end of this document.

Environment

Rancher 2.6.x, 2.7.x with cluster monitoring V2 enabled.

Situation

Prometheus memory utilization increases with the number of time series it ingests. That number depends on the metrics exposed by the kube-apiserver instances and on the size of the cluster workloads. Run the following query in the Prometheus expression browser to display the 10 metric names with the most time series:

topk(10, count by (__name__)({__name__=~".+"}))
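The same query can also be run outside the UI through the Prometheus HTTP API endpoint /api/v1/query. The Python sketch below is one way to do that with only the standard library. It assumes Prometheus is reachable on localhost:9090, for example through a kubectl port-forward to a service assumed here to be named rancher-monitoring-prometheus in cattle-monitoring-system. Adjust the URL and service name to match your installation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable on localhost:9090, e.g. through
#   kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090:9090
# (the service name is assumed; check it with `kubectl get svc -n cattle-monitoring-system`).
PROMETHEUS_URL = "http://localhost:9090"
TOP_METRICS_QUERY = 'topk(10, count by (__name__)({__name__=~".+"}))'


def build_query_url(base_url: str, query: str) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def top_metrics(payload: dict) -> list[tuple[str, int]]:
    """Convert an /api/v1/query vector response into (metric name, series count) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    rows = [
        # Each sample value is [timestamp, "string value"].
        (sample["metric"].get("__name__", ""), int(float(sample["value"][1])))
        for sample in payload["data"]["result"]
    ]
    # Sort explicitly instead of relying on the order of the API response.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def main() -> None:
    url = build_query_url(PROMETHEUS_URL, TOP_METRICS_QUERY)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            payload = json.load(resp)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"could not query Prometheus at {PROMETHEUS_URL}: {exc}")
        return
    for name, count in top_metrics(payload):
        print(f"{count:>10}  {name}")


if __name__ == "__main__":
    main()
```

Each printed line shows the number of series for a metric name, largest first. This makes it easy to spot candidates for dropping, such as high-cardinality histogram buckets.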

Note:
The example below assumes that apiserver_request_duration_seconds_bucket is the metric that needs to be dropped.

Resolution

Go to Cluster Explorer in Rancher and select the cluster where monitoring is installed.
In the Installed Apps section, find the rancher-monitoring app and choose the Edit & Upgrade option[1].
Leave the version unchanged and use the Edit YAML section to update the chart values. The default values of the monitoring chart can be found in the upstream Rancher charts repository[2].
Locate the kubeApiServer section of the chart values and add a drop rule under serviceMonitor.metricRelabelings. The example config below drops the apiserver_request_duration_seconds_bucket metric before it is ingested. If a kubeApiServer section already exists in the values, add the metricRelabelings entry to it rather than creating a second kubeApiServer key. More information on metric relabelings can be found in the Prometheus documentation[3].

kubeApiServer:
  serviceMonitor:
    metricRelabelings:
    - action: drop
      regex: apiserver_request_duration_seconds_bucket
      sourceLabels:
      - __name__
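To make the effect of the rule concrete, the Python sketch below models the drop action in simplified form. The values of the sourceLabels are joined with Prometheus' default separator (;) and matched against regex, which Prometheus anchors at both ends. This is an illustration of the semantics only. Prometheus itself evaluates the rule with Go's RE2 engine, not Python's re module.

```python
import re

# Simplified illustration of the relabel "drop" action; not Prometheus' own code.
SEPARATOR = ";"  # Prometheus' default relabel separator


def should_drop(labels: dict[str, str], source_labels: list[str], regex: str) -> bool:
    """Return True if a series with these labels would be dropped by the rule."""
    # Source label values are joined with the separator; a missing label counts as "".
    value = SEPARATOR.join(labels.get(name, "") for name in source_labels)
    # Relabel regexes are anchored at both ends, so only a full match drops the series.
    return re.fullmatch(regex, value) is not None


RULE_REGEX = "apiserver_request_duration_seconds_bucket"

for name in (
    "apiserver_request_duration_seconds_bucket",
    "apiserver_request_duration_seconds_sum",
    "apiserver_request_duration_seconds_count",
):
    dropped = should_drop({"__name__": name}, ["__name__"], RULE_REGEX)
    print(f"{name}: {'dropped' if dropped else 'kept'}")
```

Because the match is anchored, only the _bucket series are dropped. The _sum and _count series of the same histogram are still ingested, so average request latency can still be derived from them. However, histogram_quantile() queries over this metric will no longer return results.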

Once the config is added to the chart, click Update. After the update completes, restart the Prometheus statefulset for the changes to take effect.

kubectl rollout restart statefulset prometheus-rancher-monitoring-prometheus -n cattle-monitoring-system

Check the status of the statefulset and verify the pod is recreated.

kubectl rollout status statefulset prometheus-rancher-monitoring-prometheus -n cattle-monitoring-system

Use the Prometheus expression browser to verify the apiserver_request_duration_seconds_bucket metric is no longer being scraped.
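The check can also be scripted against the HTTP API. The Python sketch below assumes Prometheus is reachable on localhost:9090, for example through a kubectl port-forward. It runs an instant count() query for the dropped metric and reports whether any live series remain. Series ingested before the change stay in storage until retention expires, and an instant query can still return them for up to the default five-minute lookback window. An empty result is therefore expected only after that interval.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is port-forwarded to localhost:9090, e.g. with
#   kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090:9090
# (the service name is assumed; adjust it to your installation).
PROMETHEUS_URL = "http://localhost:9090"
CHECK_QUERY = 'count({__name__="apiserver_request_duration_seconds_bucket"})'


def metric_absent(payload: dict) -> bool:
    """True when an instant query returned no samples, i.e. no live series match."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return len(payload["data"]["result"]) == 0


def main() -> None:
    url = f"{PROMETHEUS_URL}/api/v1/query?" + urllib.parse.urlencode({"query": CHECK_QUERY})
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            payload = json.load(resp)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"could not query Prometheus at {PROMETHEUS_URL}: {exc}")
        return
    if metric_absent(payload):
        print("apiserver_request_duration_seconds_bucket is no longer being ingested")
    else:
        print("series still present; wait for the lookback window (5m by default) and retry")


if __name__ == "__main__":
    main()
```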

Additional Information

References:

[1] https://ranchermanager.docs.rancher.com/pages-for-subheaders/monitoring-and-alerting
[2] https://github.com/rancher/charts
[3] https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.