Pods return to the Initializing status even though containers in the pod are running

This document (000020002) is provided subject to the disclaimer at the end of this document.

Situation

Issue

After starting successfully, pods return to the 'Initializing' status. On review, an init container is failing to start even though the other containers in the pod are running. If one of the running containers fails, it will not be restarted because of the failed init container. This can be seen in the following example:

[Image: one pod in the Initializing status]

The init container is failing to start even though the other containers are running:

[Image: init container crashing]

Workaround

Issue background

This happens when the init container is removed from the host machine. Because init containers have run to completion and are terminated, this can happen when docker system prune or docker container prune is run on the node. When the kubelet sees that the init container no longer exists, it tries to rerun it. Depending on what the init container does, this may fail on a pod that is already running (e.g. linkerd).

Avoiding the issue

Where possible, avoid manually pruning images on Kubernetes nodes. The kubelet has a built-in cleanup mechanism that removes unused containers and images. Where manual cleanup cannot be avoided, stopped init containers should not be removed. A list of init container IDs can be generated with the following command:

kubectl get pods --all-namespaces -o jsonpath='{range .items[*].status.initContainerStatuses[*]}{.containerID}{"\n"}{end}' | cut -d/ -f3
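To make the last step of that pipeline concrete, here is a minimal sketch of what cut -d/ -f3 does to a container ID. It assumes the IDs are reported in the <runtime>://<id> form (for example docker://...). The function name strip_runtime_prefix and the sample ID are made up for illustration and are not part of the original command.

```shell
#!/bin/sh
# Hypothetical helper: read container IDs on stdin, one per line, and strip
# the "<runtime>://" prefix, mirroring the `cut -d/ -f3` step above.
# "docker://<id>" splits on "/" into "docker:", "" and "<id>", so field 3 is the ID.
strip_runtime_prefix() {
    cut -d/ -f3
}

# Sample value in the assumed containerID format (the ID itself is invented).
printf '%s\n' 'docker://0123456789abcdef' | strip_runtime_prefix
```

The bare IDs produced this way can be compared directly with the output of docker ps --no-trunc, which is what the script further down relies on.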

The following script builds a list of exited containers that are not init containers, removes them on a remote node, and then prunes unused images, e.g.

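NODE_TO_CLEAN=<node_ip>
USER=<user>

# IDs of every init container known to the cluster, with the runtime prefix stripped
INIT_CONTAINERS=$(kubectl get pods --all-namespaces -o jsonpath='{range .items[*].status.initContainerStatuses[*]}{.containerID}{"\n"}{end}' | cut -d/ -f3)
# IDs of all exited containers on the node; sed drops the carriage returns added by the ssh tty
TERMED_PODS=$(ssh -o LogLevel=QUIET -t ${USER}@${NODE_TO_CLEAN} sudo docker ps -qa --filter status=exited --no-trunc | sed -e 's/\r//g')

# Exited containers that are not init containers.
# The variables are quoted so that each ID stays on its own line for comm.
CONTAINERS_TO_REMOVE=$(comm -23 <(echo "$TERMED_PODS" | sort) <(echo "$INIT_CONTAINERS" | sort))
# Serialize the variable so it can be re-declared in the remote shell
PASS_CONTAINERS=$(typeset -p CONTAINERS_TO_REMOVE)

ssh -o LogLevel=QUIET -t ${USER}@${NODE_TO_CLEAN} bash <<EOF
    $PASS_CONTAINERS
    sudo docker rm \${CONTAINERS_TO_REMOVE} && sudo docker image prune -af
EOF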
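Before removing anything, it can help to check the list comparison on its own. The sketch below isolates the set-difference step (exited containers minus init containers) so it can be run against sample data without touching a node. The function name removable_containers and the IDs are hypothetical. It uses temporary files instead of process substitution so that it also runs in a plain POSIX shell, which is an assumption about the environment rather than a requirement of the script above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the IDs that are in the first list but
# not in the second, one per line.
#   $1 - newline-separated IDs of exited containers on the node
#   $2 - newline-separated IDs of init containers reported by Kubernetes
removable_containers() {
    exited_file=$(mktemp) && init_file=$(mktemp) || return 1
    # comm requires both inputs sorted in the same collation order,
    # so the locale is pinned for sort and comm alike.
    printf '%s\n' "$1" | LC_ALL=C sort -u > "$exited_file"
    printf '%s\n' "$2" | LC_ALL=C sort -u > "$init_file"
    # -23 suppresses lines unique to the second file and lines common to both;
    # sed drops the blank line produced when a list is empty.
    LC_ALL=C comm -23 "$exited_file" "$init_file" | sed '/^$/d'
    rm -f "$exited_file" "$init_file"
}

# Invented IDs for illustration only.
exited=$(printf '%s\n' aaa111 bbb222 ccc333)
init=$(printf '%s\n' bbb222 ddd444)

removable_containers "$exited" "$init"
```

With this sample data, bbb222 is treated as an init container and is left out of the result, so only the other two exited containers would be passed to docker rm.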

Resolution

There is currently no long-term resolution.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020002
  • Creation Date: 06-May-2021
  • Modified Date: 06-May-2021
    • SUSE Rancher

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com
