How to troubleshoot Overlay Network Connectivity issues

This document (000020831) is provided subject to the disclaimer at the end of this document.

Situation

Pod-to-pod communication is not working.

Resolution

Pod-to-pod communication depends on multiple factors. Above all, network traffic must be allowed between the nodes. The following checks help trace the root cause of the problem.

 
  • Check that the overlay network ports are open between the nodes, particularly when the nodes span multiple subnets, VLANs, or data centers. Testing from one node to the nodes in the other network is usually sufficient, e.g., `nc -uvz <node IP> 8472` (8472/UDP is the port used with Canal; change the port as needed for other network providers). [https://rancher.com/docs/rancher/v2.6/en/installation/requirements/ports/#commonly-used-ports]
 
  • Check DNS from a test pod using suitable tools. Avoid the default busybox image, whose nslookup has known issues; `rancherlabs/swiss-army-knife` works well for this. Run `dig <hostname> @<coredns pod IP>` and repeat it for every CoreDNS pod IP.
 
            - Use the same test pod to test the upstream nameservers (all three, over a few retries): `dig <hostname> @<upstream ns IP>`

[https://docs.ranchermanager.rancher.io/v2.5/troubleshooting/other-troubleshooting-tips/dns]
 
                Note: In an air-gapped environment, the swiss-army-knife image is typically not available. In that case, try a busybox image version whose network tools work, such as busybox image v1.28.
 
  • Run the overlay network test described in the Rancher documentation. It checks pod-to-pod connectivity between the nodes: https://docs.ranchermanager.rancher.io/v2.5/troubleshooting/other-troubleshooting-tips/networking#check-if-overlay-network-is-functioning-correctly
 
              [Note: The overlay test checks pod-to-pod communication using ICMP only, so the test can pass while TCP traffic is still blocked. Also test with network tools such as nc and iperf.]
 
  • Check for known issues in the infrastructure (VM) layer, and confirm that the overlay network ports are allowed at the switch level.
                e.g., in the case of VMware vSphere version 6.7u2:
 
  1. Change the VXLAN port to 8472 (when NSX is not used) or 4789 (when NSX is used).
  2. Disable the VXLAN hardware offload feature on the VMXNET3 NIC (which recent Linux driver versions enable by default). [https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-202111001.html (refer to PR 2766401), https://github.com/projectcalico/calico/issues/4727]
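
Step 2 above does not name a command. As a minimal sketch, assuming `ethtool` is installed on the node and the VMXNET3 interface is called `ens192` (a hypothetical name), the UDP tunnel segmentation offloads could be switched off as shown here. The two feature names are the ones commonly reported for this kind of issue and are an assumption, not something the referenced release notes prescribe; confirm them with `ethtool -k <interface>` first. The function names and the `DRY_RUN` switch are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: turn off UDP tunnel (VXLAN) segmentation offload on one NIC.
# Assumptions: ethtool is installed, and the feature names below are the
# relevant ones for your driver (check with: ethtool -k <interface>).

OFFLOAD_FEATURES="tx-udp_tnl-segmentation tx-udp_tnl-csum-segmentation"

# Print the ethtool commands that would disable each feature on an interface.
offload_off_cmds() {
  local iface="$1" feature
  for feature in $OFFLOAD_FEATURES; do
    echo "ethtool -K $iface $feature off"
  done
}

# Apply the commands (needs root); with DRY_RUN=1 only print them.
disable_vxlan_offload() {
  local iface="$1" cmd
  if [ "${DRY_RUN:-0}" = "1" ]; then
    offload_off_cmds "$iface"
    return 0
  fi
  offload_off_cmds "$iface" | while read -r cmd; do
    # Word splitting of $cmd is intentional: it holds a full command line.
    $cmd || echo "could not apply: $cmd" >&2
  done
}
```

For example, `DRY_RUN=1 disable_vxlan_offload ens192` prints the two commands without changing anything. Settings applied with `ethtool -K` do not survive a reboot, so a change that helps still needs to be made persistent through the distribution's network configuration.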

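The overlay port check from the first bullet can be repeated across several nodes with a small wrapper. This is only a sketch, assuming an `nc` build that supports `-u`, `-v`, `-z` and `-w`; the function names, the `DRY_RUN` switch and the node IPs are illustrative. The same wrapper also covers TCP checks of the kind recommended in the note on the ICMP-based overlay test.

```shell
#!/usr/bin/env bash
# Sketch: probe one port on a list of remote nodes with nc.
# Assumptions: nc supports -u, -v, -z and -w; node IPs are supplied by hand.

# Print the nc command used for one probe (proto is "udp" or "tcp").
probe_cmd() {
  local proto="$1" host="$2" port="$3"
  if [ "$proto" = "udp" ]; then
    echo "nc -uvz -w 3 $host $port"
  else
    echo "nc -vz -w 3 $host $port"
  fi
}

# Probe every node; with DRY_RUN=1 only print the commands.
# Returns non-zero if at least one probe failed.
check_nodes() {
  local proto="$1" port="$2"
  shift 2
  local host cmd failed=0
  for host in "$@"; do
    cmd="$(probe_cmd "$proto" "$host" "$port")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$cmd"
    elif $cmd; then
      echo "OK   $proto/$port -> $host"
    else
      echo "FAIL $proto/$port -> $host"
      failed=1
    fi
  done
  return "$failed"
}
```

A typical call would be `check_nodes udp 8472 <node IP 1> <node IP 2>`, run from one node against the nodes in the other network. Because UDP is connectionless, `nc` reports success unless an ICMP port-unreachable message comes back, so a UDP pass is weaker evidence than a TCP pass.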

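The DNS checks above (every CoreDNS pod IP, then every upstream nameserver, over a few retries) can be looped rather than typed by hand. The following is a sketch under assumptions: it expects `dig` in the test pod, takes the server IPs as arguments, and uses illustrative function names plus `RETRIES` and `DRY_RUN` variables that are not part of any Rancher tooling.

```shell
#!/usr/bin/env bash
# Sketch: query one hostname against several DNS servers, a few times each.
# Assumptions: dig is available in the test pod; server IPs are passed by hand.

# Print the dig command for one query (short answer, 2 s timeout, one try).
dig_cmd() {
  local name="$1" server="$2"
  echo "dig +short +time=2 +tries=1 $name @$server"
}

# Run (or, with DRY_RUN=1, print) the query against each server RETRIES times.
# A query counts as OK only if dig exits 0 and returns a non-empty answer.
check_dns() {
  local name="$1"
  shift
  local retries="${RETRIES:-3}" server i cmd out failed=0
  for server in "$@"; do
    i=1
    while [ "$i" -le "$retries" ]; do
      cmd="$(dig_cmd "$name" "$server")"
      if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
      elif out="$($cmd)" && [ -n "$out" ]; then
        echo "OK   $server (attempt $i)"
      else
        echo "FAIL $server (attempt $i)"
        failed=1
      fi
      i=$((i + 1))
    done
  done
  return "$failed"
}
```

For instance, `check_dns <hostname> <coredns pod IP 1> <coredns pod IP 2>` covers the CoreDNS pods, and a second call with the upstream nameserver IPs covers the upstream resolvers. Intermittent failures on a single server across retries point at that server or the path to it rather than at DNS as a whole.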
     

    Additional Information

    Reference Articles & Links:
    https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-202111001.html -Refer PR 2766401
    https://github.com/projectcalico/calico/issues/4727
     

    Disclaimer

    This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

    • Document ID: 000020831
    • Creation Date: 27-Oct-2022
    • Modified Date: 28-Oct-2022
      • SUSE Rancher