How to set up Nodelocal DNS cache on Rancher 2.x
This document (000020174) is provided subject to the disclaimer at the end of this document.
Situation
Why use Nodelocal DNS cache?
Like many applications in a containerised architecture, CoreDNS or kube-dns runs in a distributed fashion. In certain circumstances, DNS reliability and latency can be impacted by this approach. The causes relate notably to conntrack race conditions or exhaustion, cloud provider limits, and the unreliable nature of the UDP protocol.
A number of workarounds exist; however, long-term mitigation of these and other issues resulted in a redesign of the Kubernetes DNS architecture, the outcome being the Nodelocal DNS cache project.
Requirements
- A Kubernetes cluster of v1.15 or greater created by Rancher v2.x or RKE
- A Linux cluster, Windows is currently not supported
- Access to the cluster
Resolution
Installing
There are two installation approaches. Both are non-invasive: pods that are currently running will not be modified, and the DNS configuration takes effect for pods started after the installation is complete.
RKE1: Using a Rancher version after v2.4.x, or RKE version after v1.1.0
Update the cluster using 'Edit as YAML' in the Rancher UI. With RKE, edit the cluster.yaml file instead.
Note: Updating the cluster as shown below will create the node-local-dns DaemonSet and restart the kubelet container on each node.
As in the documentation, update or add the dns.nodelocal.ip_address field using the following as an example:
dns:
  [..]
  nodelocal:
    ip_address: "169.254.20.10"
New pods created after the change will have the node-local-dns link-local address configured as the nameserver in /etc/resolv.conf.
Note: No further action is needed to use node-local-dns (unlike Options A and B below); the changes to /etc/resolv.conf will take effect for pods started from this point onwards.
RKE1: Using a Rancher version before v2.4.x, or RKE version before v1.1.0
Install the YAML manifest by navigating to the cluster and clicking the Launch kubectl button in the Rancher UI. The command can also be run from a terminal where a kubeconfig for the cluster is currently configured.
Environment variables are replaced before applying the manifest. One assumption is that the cluster service discovery domain name is cluster.local (the default); adjust the command if needed.
curl -sL https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml \
| sed -e 's/__PILLAR__DNS__DOMAIN__/cluster.local/g' \
| sed -e "s/__PILLAR__DNS__SERVER__/$(kubectl get service --namespace kube-system kube-dns -o jsonpath='{.spec.clusterIP}')/g" \
| sed -e 's/__PILLAR__LOCAL__DNS__/169.254.20.10/g' \
| kubectl apply -f -
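To illustrate what the sed pipeline above does, the following self-contained sketch substitutes the same __PILLAR__ placeholders into a simplified stand-in template line. The kube-dns ClusterIP used here (10.43.0.10, the RKE default) is an assumption; the real command reads it from the cluster with kubectl.

```shell
# Simplified stand-in for a line from nodelocaldns.yaml; the real manifest
# contains these placeholders throughout its Corefile and DaemonSet spec.
template='nameserver __PILLAR__DNS__SERVER__ domain __PILLAR__DNS__DOMAIN__ local __PILLAR__LOCAL__DNS__'

# Apply the same substitutions as the install pipeline (values assumed).
result=$(printf '%s\n' "$template" \
  | sed -e 's/__PILLAR__DNS__DOMAIN__/cluster.local/g' \
  | sed -e 's/__PILLAR__DNS__SERVER__/10.43.0.10/g' \
  | sed -e 's/__PILLAR__LOCAL__DNS__/169.254.20.10/g')

printf '%s\n' "$result"
```

Running this prints the template with the cluster domain, kube-dns ClusterIP, and link-local address filled in, which is exactly the transformation applied to the full manifest before it reaches kubectl.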
Ensure the node-local-dns pods start successfully; a pod should start on each control plane and worker node.
kubectl get -n kube-system pod -l k8s-app=node-local-dns
When deploying the YAML manifest, there are two options to configure the cluster to use the new node-local-dns configuration; please choose from Option A or B below.
Option A - Configure the Kubelet
By default, the Kubelet will configure the /etc/resolv.conf of pods with the kube-dns Service ClusterIP as the nameserver. Configuring all new pods to query node-local-dns will require updating the Kubelet arguments.
Note: Updating the arguments as shown below will restart the kubelet container on each node.
- If the cluster was provisioned by Rancher, edit the cluster in the UI and click on Edit as YAML.
- If the cluster was provisioned by RKE, edit the cluster.yml file directly.
Update the kubelet service with the cluster-dns argument and IP address. Click save, or run an rke up, to put this change into effect.
services:
  kubelet:
    extra_args:
      cluster-dns: "169.254.20.10"
New pods created after the change will have the node-local-dns link-local address configured as the nameserver in /etc/resolv.conf.
Option B - Configure Workloads
Alternatively, node-local-dns can be configured on a per-workload basis by updating the workload with a dnsPolicy and dnsConfig.
- If using the Rancher UI, edit the workload, navigate to Show advanced options > Networking > DNS Nameservers and add 169.254.20.10. Additionally, set the DNS Policy to None.
- If configuring by YAML, patch the following into the pod spec to adjust the dnsPolicy and dnsConfig:
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 169.254.20.10
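As a fuller illustration, a minimal complete Pod manifest with this configuration might look like the sketch below; the pod name and image are placeholders. Note that with dnsPolicy set to None, the kubelet no longer injects the cluster search domains, so they should be listed explicitly if short service names need to keep resolving (the namespace shown is an assumption).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-example        # placeholder name
spec:
  containers:
    - name: app
      image: nginx         # placeholder image
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 169.254.20.10
    searches:              # re-add cluster search domains manually
      - default.svc.cluster.local
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "5"
```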
RKE2: Using any RKE2 Kubernetes version
Update the default HelmChart for CoreDNS; setting the nodelocal.enabled: true value will install node-local-dns in the cluster. Please see the RKE2 documentation for more details.
Testing
Once installed, start a new pod to test DNS queries.
kubectl run --restart=Never --rm -it --image=tutum/dnsutils dns-test -- dig google.com
Unless Option B was used to install node-local-dns, you should expect to see 169.254.20.10 as the server, and a successful answer to the query.
To verify that a pod or container is using node-local-dns, check the /etc/resolv.conf file, for example:
kubectl exec -it <pod name> -- grep nameserver /etc/resolv.conf
nameserver 169.254.20.10
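The same check can be sketched without a cluster; the snippet below extracts the nameserver from a sample resolv.conf (the sample content is an assumption matching what a node-local-dns pod would typically contain).

```shell
# Sample of what /etc/resolv.conf may look like inside a pod using
# node-local-dns (search domains and options are assumed defaults).
resolv_conf='search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 169.254.20.10
options ndots:5'

# Extract the configured nameserver, as the grep check above does in the pod.
ns=$(printf '%s\n' "$resolv_conf" | awk '/^nameserver/ {print $2}')
printf 'nameserver %s\n' "$ns"
```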
Removing Nodelocal DNS cache
To remove node-local-dns from a cluster, the installation steps are reversed.
Note: Pods created with the node-local-dns nameserver in /etc/resolv.conf will need to be restarted to use the kube-dns Service as a nameserver again.
Using a Rancher version after v2.4.x, or RKE version after v1.1.0
Remove the dns.nodelocal configuration from the cluster YAML.
Using a Rancher version before v2.4.x, or RKE version before v1.1.0
1. Remove the Kubelet configuration (Option A), or remove the dnsConfig from workloads (Option B).
2. If Option A was taken, delete any pods in workloads that were started since the Kubelet configuration change so that they are started with the kube-dns ClusterIP again.
3. Remove the node-local-dns objects with the following command:
curl -sL https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml | kubectl delete -f -
Note: It is important to perform these steps in order, and only complete step 3 once the pods using node-local-dns have been started again with the kube-dns ClusterIP configured in /etc/resolv.conf.
Additional Information
Troubleshooting
Node-local-dns performs external lookups on behalf of pods; each lookup occurs from the node-local-dns DaemonSet pod running on the same node as the requesting pod. For internal lookups, CoreDNS is used. By default, node-local-dns caches successful queries for 30s and negative queries for 5s. For an architecture overview, please see the diagram in the upstream Kubernetes documentation.
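The caching behaviour described above corresponds to the cache plugin settings in the node-local-dns Corefile. The following is a simplified sketch of the cluster-domain server block; the exact contents come from the substituted upstream manifest, and the forward address (a kube-dns ClusterIP, 10.43.0.10 as the RKE default) is an assumption.

```text
cluster.local:53 {
    errors
    cache {
        success 9984 30    # cache up to 9984 successful answers for 30s
        denial 9984 5      # cache up to 9984 negative answers for 5s
    }
    reload
    loop
    bind 169.254.20.10
    forward . 10.43.0.10 { # internal lookups forwarded to CoreDNS
        force_tcp
    }
    prometheus :9253
}
```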
In no specific order, the following can help understand a DNS issue further.
Check all kube-dns and node-local-dns objects
Ensure there are no obvious issues with scheduling CoreDNS and node-local-dns pods in the cluster.
kubectl get all -n kube-system -l k8s-app=node-local-dns
kubectl get all -n kube-system -l k8s-app=kube-dns
All node-local-dns and kube-dns pods should be ready and running, and the kube-dns Service should exist. Check the events if needed to locate any warning or failure messages.
kubectl describe ds -n kube-system -l k8s-app=node-local-dns
kubectl describe rs -n kube-system -l k8s-app=kube-dns
Check the logs and ConfigMap of kube-dns and node-local-dns pods
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=node-local-dns
kubectl get configmap -n kube-system coredns -o yaml
kubectl get configmap -n kube-system node-local-dns -o yaml
Enable logging and perform a DNS test
Note: query logging can increase the log output from CoreDNS; enabling it only temporarily while investigating is suggested.
- Enable query logging to understand the pattern from workloads
- Run a DaemonSet to perform queries from a pod running on each node in the cluster
Ask questions to further isolate the issue
- Is it only DNS that is affected, or is all connectivity affected?
- Are internal, external or all DNS queries failing?
- Are all nodes and workloads experiencing the issue, or a specific node or workload?
  - Nodes use the upstream DNS configured in /etc/resolv.conf; queries failing from a node could indicate the issue is with upstream DNS
- What is the error reported by applications?
  - If logs are aggregated, queries can be performed on the logs to identify timelines and impact
- Is the issue intermittent or constantly occurring?
  - If the issue is intermittent, configure monitoring or a loop to identify when the issue occurs; when it does, are internal, external or all queries affected?
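The loop suggested for intermittent issues can be sketched as a small shell function. The probe command here is an assumption; in practice it would be something like a dig run via kubectl exec against a test pod.

```shell
# Run a DNS probe repeatedly and log timestamped failures, to help
# correlate intermittent failures with cluster events. The probe command
# passed in is a placeholder for a real query, e.g.:
#   probe_dns 'kubectl exec dns-test -- dig +short +time=2 google.com' 60
probe_dns() {
  query_cmd="$1"
  attempts="$2"
  failures=0
  i=1
  while [ "$i" -le "$attempts" ]; do
    if ! sh -c "$query_cmd" >/dev/null 2>&1; then
      failures=$((failures + 1))
      echo "$(date -u) query failed (attempt $i)"
    fi
    i=$((i + 1))
  done
  # Print the total failure count as the last line.
  echo "$failures"
}
```

In a real investigation the loop would run with a short sleep between attempts and its output redirected to a file, so failure timestamps can be compared against node and pod events.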
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020174
- Creation Date: 31-Oct-2021
- Modified Date: 27-Jul-2022
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com