Adding missing cAdvisor labels to k8s 1.24+

This document (000020951) is provided subject to the disclaimer at the end of this document.

Environment

RKE1 on k8s 1.24+, Rancher 2.6.8+ or 2.7.0+

Situation

 Monitoring V2 dashboards are not displaying CPU/Memory usage.

Resolution

Rancher Engineering team has put in place a workaround that consists of the following:
  • Creating a cAdvisor standalone instance
  • Creating ServiceMonitor with some relabeling
  • Disabling kubelet.serviceMonitor.cAdvisor in the rancher-monitoring chart
  1. Example standalone cAdvisor instance and ServiceMonitor YAML:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: cadvisor
  name: cadvisor
rules:
- apiGroups:
  - policy
  resourceNames:
  - cadvisor
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: cadvisor
  name: cadvisor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cadvisor
subjects:
- kind: ServiceAccount
  name: cadvisor
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: docker/default
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cadvisor
      name: cadvisor
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app: cadvisor
        name: cadvisor
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
        - --docker_only
        - --store_container_labels=false
        - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
        image: gcr.io/cadvisor/cadvisor:v0.45.0
        name: cadvisor
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 800m
            memory: 2000Mi
          requests:
            cpu: 400m
            memory: 400Mi
        volumeMounts:
        - mountPath: /rootfs
          name: rootfs
          readOnly: true
        - mountPath: /var/run
          name: var-run
          readOnly: true
        - mountPath: /sys
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /dev/disk
          name: disk
          readOnly: true
      priorityClassName: system-node-critical
      serviceAccountName: cadvisor
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: node-role.kubernetes.io/controlplane
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/etcd
        value: "true"
        effect: NoExecute
      volumes:
      - hostPath:
          path: /
        name: rootfs
      - hostPath:
          path: /var/run
        name: var-run
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /var/lib/docker
        name: docker
      - hostPath:
          path: /dev/disk
        name: disk
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
spec:
  allowedHostPaths:
  - pathPrefix: /
  - pathPrefix: /var/run
  - pathPrefix: /sys
  - pathPrefix: /var/lib/docker
  - pathPrefix: /dev/disk
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'
---
apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  labels:
    app: cadvisor
  namespace: kube-system
spec:
  selector:
    app: cadvisor
  ports:
  - name: cadvisor
    port: 8080
    protocol: TCP
    targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
spec:
  endpoints:
  - metricRelabelings:
    - sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_name
    - action: labeldrop
      regex: container_label_io_kubernetes_container_name
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_namespace
    port: cadvisor
    relabelings:
    - sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: node
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
      replacement: /metrics/cadvisor
    - sourceLabels:
      - job
      targetLabel: job
      replacement: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: cadvisor
  1. On the Rancher Monitoring chart, change the kubelet.serviceMonitor.cAdvisor to false
kubelet:
  serviceMonitor:
    cAdvisor: false

Cause

This is caused by the changes on k8s 1.24, by the removal of the Dockershim

Status

Reported to Engineering

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID:000020951
  • Creation Date: 24-Jan-2023
  • Modified Date:25-Jan-2023
    • SUSE Rancher

< Back to Support Search

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com

SUSE Support Forums

Get your questions answered by experienced Sys Ops or interact with other SUSE community experts.

Join Our Community

Support Resources

Learn how to get the most from the technical support you receive with your SUSE Subscription, Premium Support, Academic Program, or Partner Program.


SUSE Customer Support Quick Reference Guide SUSE Technical Support Handbook Update Advisories
Support FAQ

Open an Incident

Open an incident with SUSE Technical Support, manage your subscriptions, download patches, or manage user access.

Go to Customer Center