SLES 10 SP2 networking under Xen, troubleshooting and recommendations
This guide seeks to address common networking issues with SLES 10 SP2. Between SLES 10 SP1 and SLES 10 SP2, the networking configuration was redone to enable requested enterprise features.
This guide is targeted at new and intermediate Xen users. Advanced users may find the information in this guide to be somewhat lacking or redundant. Then again, most advanced users will not be reading this document.
With SLES 10 SP2 the decision was made to follow the upstream model of networking. There were also some significant networking backend changes.
Netloops are gone
Under SLES 10 SP1 there was the concept of a netloop, which is basically a pair of virtual Ethernet NICs connected via a software crossover cable. It had a few limitations and was considered slow by some. With SLES 10 SP2, netloops are gone, and with them the need to configure more than 8 devices.
ethX is now a Bridge
Under SLES 10 SP2, ethX is now both a bridge and a NIC; pethX is still present. Because ethX is both a bridge and an interface, you will notice that the xenbrX bridges are gone.
Recommended: Simplified Network Bridging
On SLES 10 SP2, the network load order has changed. The new load order is first bonded interfaces, VLAN, tunnel networks and finally bridged networks. While YaST does not allow you to set up bridged networks, this new load order allows you to simplify network configurations. If you so choose, you can now configure networking to be the same in both the Xen and the non-Xen kernel (i.e. bridges are present in both kernels).
YaST can create the bridges for you, but creating them by hand may be faster and easier. The by-hand configuration is easier in the respect that fine tuning the bridges requires you to edit the configuration files anyway. Personally, I prefer the manual configuration simply because YaST does not support editing bridges when using the Xen kernel, and YaST may not acknowledge the bridges if you do any fine tuning of the configuration file.
There are considerable advantages to configuring networking this way. The most obvious advantage is that the networking is set up at a lower level; the regular Xen networking is done almost at the end of the boot process. Furthermore, the Xen scripts completely rip out the existing network and then build it again. This method accomplishes the same thing without having to rip out networking components. Finally, since the networking configuration is done when networking is brought up, programs and init scripts loaded after the network are guaranteed to have the devices that they want. Services like DHCP and Heartbeat can benefit immensely from a configuration done this way.
- Boot into the non-Xen kernel
- Go to /etc/sysconfig/network
- Back up the files if you want to. Otherwise, any mistakes can be fixed by using YaST
- For each physical device, there is an “ifcfg-eth-…” file. Open each file for editing
- Remove the lines reading “BOOTPROTO”, “IPADDR”, “NETMASK”, and “NETWORK”. Add “BRIDGE=SWITCH” to the file. The following is an example:
    BRIDGE=SWITCH
    BROADCAST=''
    ETHTOOL_OPTIONS=''
    NAME='Intel PRO/1000 GT Desktop Adapter'
    NETMASK=''
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    UNIQUE='JNkJ.49XPvc+GN1h3'
    USERCONTROL='no'
- Save and close the file
- Create a new file for each bridge. The naming convention is “ifcfg-” followed by the name you want. For example, “ifcfg-br0” will create a device named “br0”. To create “xenbr0” you would name the file “ifcfg-xenbr0”. Populate the file with the values below. Replace the value of BRIDGE_PORTS with the correct Ethernet device name; VLAN and tunnel devices are also eligible. If you leave the IPADDR value empty, you create a bridge that the Dom0 does not have access to (this is useful for DomU’s that are designed for a DMZ, such as mail or web servers).
    DEVICE=switch
    ONBOOT=YES
    TYPE=Bridge
    IPADDR=10.0.10.15
    NETMASK=255.255.255.0
    NETWORK=
    BROADCAST=
    STARTMODE=auto
    USERCONTROL=no
    BRIDGE='yes'
    BRIDGE_PORTS='eth0' #leave this option blank for a private network between host and DomU's
    BRIDGE_AGEINGTIME='300'
    BRIDGE_FORWARDDELAY='0'
    BRIDGE_HELLOTIME='2'
    BRIDGE_MAXAGE='20'
    BRIDGE_PATHCOSTS='19'
    BRIDGE_PORTPRIORITIES=
    BRIDGE_PRIORITY=
    BRIDGE_STP='on'
- Save and close the file
- Test the configurations
- If the network configuration was successful, then the output of “ifconfig” should look like this:
    br0       Link encap:Ethernet  HWaddr 00:1B:21:0F:84:00
              inet addr:10.0.10.15  Bcast:10.0.10.255  Mask:255.255.255.0
              inet6 addr: fe80::21b:21ff:fe0f:8400/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:46271419 errors:0 dropped:0 overruns:0 frame:0
              TX packets:21236676 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:43393110460 (41382.8 Mb)  TX bytes:12426640835 (11850.9 Mb)

    br1       Link encap:Ethernet  HWaddr 00:A0:C9:84:70:9C
              inet6 addr: fe80::2a0:c9ff:fe84:709c/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:9297077 errors:0 dropped:0 overruns:0 frame:0
              TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:722740368 (689.2 Mb)  TX bytes:468 (468.0 b)

    eth0      Link encap:Ethernet  HWaddr 00:1B:21:0F:84:00
              inet6 addr: fe80::21b:21ff:fe0f:8400/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:46627087 errors:0 dropped:0 overruns:0 frame:0
              TX packets:22811639 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:44293485060 (42241.5 Mb)  TX bytes:12623872584 (12039.0 Mb)
              Base address:0xec00 Memory:dffe0000-e0000000

    eth1      Link encap:Ethernet  HWaddr 00:A0:C9:84:70:9C
              inet6 addr: fe80::2a0:c9ff:fe84:709c/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:9642149 errors:0 dropped:0 overruns:0 frame:0
              TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:881866945 (841.0 Mb)  TX bytes:1236 (1.2 Kb)

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:957153 errors:0 dropped:0 overruns:0 frame:0
              TX packets:957153 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:4506435510 (4297.6 Mb)  TX bytes:4506435510 (4297.6 Mb)
- Comment out the (network-script network-bridge) line in the /etc/xen/xend-config.sxp file.
- Modify any other services that listen based on simple device names, such as Heartbeat, to use the new, permanent bridges
- Reboot and test to make sure that it will work
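Before rebooting, the edits to the physical device files can be sanity-checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the parser handles plain KEY=VALUE lines only (no trailing comments), and empty values such as NETMASK='' are treated as removed, as in the example file above.

```python
# Keys that the steps above say should no longer carry a value
# in a physical device file once it is attached to a bridge.
LEFTOVER_KEYS = ("BOOTPROTO", "IPADDR", "NETMASK", "NETWORK")


def parse_ifcfg(text):
    """Parse KEY=VALUE lines, skipping blank lines and comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding single or double quotes from the value.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def check_physical_ifcfg(text):
    """Return a list of problems found in a physical ifcfg file."""
    settings = parse_ifcfg(text)
    problems = []
    for key in LEFTOVER_KEYS:
        if settings.get(key):
            problems.append(f"{key} is still set to {settings[key]!r}")
    if not settings.get("BRIDGE"):
        problems.append("BRIDGE is missing")
    return problems


example = "BRIDGE=SWITCH\nBROADCAST=''\nNETMASK=''\nSTARTMODE='auto'\n"
print(check_physical_ifcfg(example))
```

Feed it the text of each “ifcfg-eth-…” file; an empty list means the file matches the steps above.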
Troubleshooting: Routing Difficulties
A far too common problem is routing. Most often, the network routes somehow get mangled when the configuration scripts bring up the network. This is usually seen on systems with more than one NIC.
On an affected system, the output of “route -n” may look similar to this:
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    10.10.0.0       0.0.0.0         255.255.255.0   U     0      0        0 prvbr0
    188.8.131.52    0.0.0.0         255.255.252.0   U     0      0        0 br0
    127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
A correct output should look like this:
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    10.10.0.0       0.0.0.0         255.255.255.0   U     0      0        0 prvbr0
    184.108.40.206  0.0.0.0         255.255.252.0   U     0      0        0 br0
    127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
    0.0.0.0         220.127.116.11  0.0.0.0         UG    0      0        0 br0
Alternatively you can use the “ip route show” command, which omits the nice table, but is a little more readable.
On an incorrectly configured system, “ip route show” presents the user with the following:
    10.10.0.0/24 dev prvbr0 proto kernel scope link src 10.10.0.1
    18.104.22.168/22 dev br0 proto kernel scope link src 22.214.171.124
    127.0.0.0/8 dev lo scope link
On a correctly configured system, “ip route show” presents the user with the following:
    10.10.0.0/24 dev prvbr0 proto kernel scope link src 10.10.0.1
    126.96.36.199/22 dev br0 proto kernel scope link src 188.8.131.52
    127.0.0.0/8 dev lo scope link
    default via 184.108.40.206 dev br0
In the “route -n” output, the line to look for is the one with a destination of 0.0.0.0; in the “ip route show” output, it is the “default via” line. If you do not see that line, then the default route has not been set. In that situation you might be able to ping or access resources on a local subnet, but you will be unable to reach resources outside the immediate subnet.
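The same check can be scripted. The sketch below uses hypothetical function names and looks for a default route in text captured from either command. The sample route text is illustrative only and borrows the 10.0.0.1 gateway from the routes file example later in this guide.

```python
def has_default_route(ip_route_output):
    """Return True if `ip route show` output contains a "default" line."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default":
            return True
    return False


def has_default_route_n(route_n_output):
    """Return True if `route -n` output has a 0.0.0.0/0.0.0.0 entry."""
    for line in route_n_output.splitlines():
        fields = line.split()
        # Columns: Destination Gateway Genmask Flags Metric Ref Use Iface
        if len(fields) == 8 and fields[0] == "0.0.0.0" and fields[2] == "0.0.0.0":
            return True
    return False


# Illustrative input: local routes only, then the same with a default route.
broken = (
    "10.10.0.0/24 dev prvbr0 proto kernel scope link src 10.10.0.1\n"
    "127.0.0.0/8 dev lo scope link\n"
)
working = broken + "default via 10.0.0.1 dev br0\n"

print(has_default_route(broken), has_default_route(working))
```

The functions only inspect text, so they can be fed output saved from a machine that is otherwise unreachable.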
The fix to this common problem is pretty easy. When the networking scripts bring up the networking configuration, a file in /etc/sysconfig/network called routes is parsed. In almost every case where there have been complaints of being unable to ping Dom0 or Dom0 communications problems, it is caused by the routes file not being correctly populated.
The syntax is as follows:
host_or_net gateway netmask device
For the default route, it is a little easier. Replace the IP address and the device name in the example below with the correct values for your network. If you have created static bridges that have an IP address bound to them, then you will want to use the bridge device name.
default 10.0.0.1 - eth0
The device in this example must have an IP address. I have seen people populate the file with pethX and xenbrX devices, even though those devices do not have an IP address bound to them. You must use a device that has an IP address, and that applies to the pseudo Ethernet devices as well. An Ethernet bridge is a layer 2 device that, by definition, works with Ethernet frames and MAC addresses and understands nothing higher, so pointing the route at pethX or any other device without an IP address leaves packets with no way of knowing where to go or how to get there, much less with the correct headers. Routing is done at layer 3 of the OSI model and deals with IP, so the device used in a route must be a device with an IP address.
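To make the rule concrete, here is a small sketch that flags routes pointing at devices without an IP address. The function names are hypothetical, the column order follows the syntax line above, and the set of devices that have an address is supplied by you (for example, read off the “ifconfig” output); nothing here queries the system.

```python
def parse_routes(routes_text):
    """Yield (destination, gateway, netmask, device) for each route line."""
    for raw in routes_text.splitlines():
        line = raw.strip()
        # Assumption: lines starting with '#' are comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 4:
            yield tuple(fields[:4])


def routes_on_unaddressed_devices(routes_text, devices_with_ip):
    """Return (destination, device) pairs whose device has no IP address."""
    return [
        (destination, device)
        for destination, _gateway, _netmask, device in parse_routes(routes_text)
        if device not in devices_with_ip
    ]


# pethX carries no IP address, so this default route would be flagged.
print(routes_on_unaddressed_devices("default 10.0.0.1 - peth0\n", {"br0", "lo"}))
```

An empty result means every route in the file uses a device that has an IP address.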
Troubleshooting: A bridge that acts like a bridge
A really common misconception is that the network bridge will forward all traffic on to every device. This usually comes up when people try to sniff traffic from a DomU. That is not how it works. The default Xen networking scripts create a bridge that only passes on traffic addressed to MAC addresses on the other side of the bridge. This is called a transparent bridge, and transparent bridges are quite dumb: they perform no layer 3 or TCP/IP functions and act purely as a link layer (layer 2) bridge; by definition they know nothing about any protocol higher than ARP. Because a layer 2 bridge operates by comparing the source and destination MAC addresses and passing frames from one side of the bridge to the other, packet sniffing does not work unless all frames are forwarded to the other side of the bridge.
Network bridges are slower than switches or routers. A bridge works by processing all the Ethernet frames that it receives on both sides of the bridge; it looks at each frame and determines the destination MAC address. If the destination MAC address is known to be on the other side of the bridge, then the bridge forwards the frame. If it is known to be on the same side as the sender, the frame is not forwarded.
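The forwarding decision can be sketched in a few lines. This is a toy model, not the Linux bridge code: the class name, the port names (“eth0”, “vif1.0”) and the return strings are made up for illustration. One case the description above simplifies is that a learning bridge floods frames for destinations it has not learned yet, which the sketch includes.

```python
class TransparentBridge:
    """Toy learning bridge: remembers which port each source MAC was seen on."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> port name

    @staticmethod
    def is_group_address(mac):
        # Broadcast and multicast MACs have the lowest bit of the first octet set.
        return int(mac.split(":")[0], 16) & 1 == 1

    def handle_frame(self, src_mac, dst_mac, in_port):
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac.lower()] = in_port
        dst = dst_mac.lower()
        if self.is_group_address(dst):
            return "flood"  # broadcast or multicast goes to all other ports
        out_port = self.mac_table.get(dst)
        if out_port is None:
            return "flood"  # destination not learned yet
        if out_port == in_port:
            return "filter"  # destination is on the sender's side; do not forward
        return f"forward to {out_port}"


bridge = TransparentBridge()
print(bridge.handle_frame("00:1B:21:0F:84:00", "ff:ff:ff:ff:ff:ff", "eth0"))
print(bridge.handle_frame("00:A0:C9:84:70:9C", "00:1B:21:0F:84:00", "vif1.0"))
```

Once both MAC addresses are learned, a frame between them only ever goes to one port, which is why a sniffer sitting on a third port sees nothing of that conversation.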
Bridges can forward the following types of traffic:
- Multicast traffic
- Broadcast traffic within a subnet
- Unicast traffic to the Dom
There are ways to forward all the packets across the bridge. The method to do so, however, is advanced and requires knowledge of firewall rules. Suffice it to say, it is not covered in this guide, since forwarding all packets across the bridge is rarely practical.
Troubleshooting: Bridge Performance Issues
Without a doubt, software bridges are slower than physical bridges and switches. Obviously, this is because hardware bridges and switches are designed to do a very few tasks and to do them well. The bridges that are created on Linux have been shown to be as efficient as most hardware bridges when utilization of CPU resources is under 56 percent. Once the system reaches 86 percent, the performance degradation is enough to worry about. The bridge design in SLES 10 SP2 is more efficient than in SLES 10 SP1, and users should notice a performance increase.
Where you may notice some performance issues is with the initial connection. For example, if the forwarding delay time is set a little too high, then the bridge may not appear to be up. Also, if the MAC address that packets are being sent to or from has been idle for too long, then the connection may take a little longer to establish. In most cases this is NOT a problem.
The other consideration to take into account is that the maximum number of MAC addresses that a bridge can remember is 4096 for the entire bridge. This number may seem high, but it can be an issue on poorly designed networks, on networks with a large number of computers, routers, printers, etc., and on networks where the subnet is large enough to allow for thousands of hosts. In most cases, this will never be an issue. Yet many schools and governments routinely build networks with a /20 or even larger subnet (a /20 subnet mask, or 255.255.240.0, covers 4096 addresses). Aside from the nightmarish management of such a network (think about the broadcast traffic, managing a DHCP zone for it, etc.), Xen host bridges could be overwhelmed when they are flooded with traffic.
For these reasons, it is recommended that production Xen hosts be placed on a subnet or VLAN that is separate from workstations and printer devices. Further, it is recommended that subnets be limited to a reasonable size, usually no bigger than /23 (512 addresses) or /22 (1024 addresses). Most default subnets of /24, or 256 addresses, are sufficient and perform reasonably well.
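The subnet sizes quoted above are easy to verify with Python's standard ipaddress module. The sketch below compares each prefix length against the 4096-entry figure given in this guide; the 10.0.0.0 base network and the function name are arbitrary choices for illustration.

```python
import ipaddress

# Limit on remembered MAC addresses per bridge, as stated in this guide.
BRIDGE_MAC_LIMIT = 4096


def subnet_report(cidr):
    """Return (address count, netmask, whether the subnet can fill the table)."""
    network = ipaddress.ip_network(cidr)
    count = network.num_addresses
    return count, str(network.netmask), count >= BRIDGE_MAC_LIMIT


for prefix in (20, 22, 23, 24):
    count, netmask, risky = subnet_report(f"10.0.0.0/{prefix}")
    print(f"/{prefix}  {netmask:<15}  {count:>5} addresses  risky={risky}")
```

Note that the address count includes the network and broadcast addresses, so the number of usable hosts is two lower in each case.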
If you locate your Xen hosts on a separate subnet or VLAN, then you may have problems with dynamic browsing technologies. If you have implemented, or are planning on implementing dynamic browsing technologies within a DomU on a separate subnet, it is recommended that you understand how those changes will affect your situation.
Problems with Windows Browsing: If you locate a Windows-based DomU on a different subnet from your workstations, then you may have browsing problems unless you have a WINS server. For example, if you have a PDC in a DomU, then you may not be able to see the Windows domain, or you may be unable to browse for the server. If this is a problem, it is recommended to have at least one Windows-based server or Samba configured as a WINS server, or to use static IP address assignments in the lmhosts files.
Problems with SLP: If you locate an SLP directory agent in a DomU that is on a different subnet from the workstations, the workstations may have problems seeing the directory agent, or the SLP service agents may not register. In such cases, configure the workstations to look at a specific IP address for the SLP servers, serve out the SLP directory agent via DHCP and configure the workstations to use DHCP, or locate a directory agent on each subnet and configure the directory agents to see each other statically. eDirectory installations inside of a DomU will be affected by this unless SLP is configured properly or static host files have been set up. This issue is most notably seen on Open Enterprise Server, NetWare 6.5 and eDirectory installations.
Recommendation: Bridging and Bonding: redundancy for the DomU’s
Time and time again the question comes up of how to build redundancy in for the DomU’s. I find this question fascinating, since it seems that people expect higher uptimes for the DomU than for the Dom0, and they somehow expect the virtual interfaces presented in the DomU to go down more often than the physical interfaces.
The following design is what I see done most often. I would recommend it for those wishing to increase the redundancy:
- Bond two or more NIC interfaces in Dom0, using YaST
- Create a bridge on the bonded interface
- Use the bridge for the Dom0
Sounds simple, right? The simple designs have the highest uptime and are the easiest to fix in the event of a problem. Other designs are more elegant, but they tend to be extremely complex. In the event that you need support or have to solve a problem, you will want a simple design so that you can get back up faster.
Some people, however, decide to bond in the DomU. This, in my opinion, introduces unnecessary complexity. Why? Dom0 can handle the bonding, and the bonding for all DomU’s needs to be done but once. Bonding the interfaces in the DomU may provide the same benefit, but at the cost of a whole lot more work. Finally, if the DomU’s have the NICs bonded, then any hardware you migrate the DomU to must have the same number of NICs, and the physical NICs must be bound to the same networks. If not, then you run the risk of the DomU not coming up properly if you move it to another machine.
Unless you have a really fancy, expensive switch, I tend to recommend mode 0, or balance-rr. This mode provides redundancy and load balancing. If one of the NICs goes down, then the connection is still up.
Dynamic link aggregation, or mode 4, is a mode where two or more links are treated by the switch as a single link via the 802.3ad specification. Dynamic link aggregation is not what most people think, as they assume that all the NICs will magically become a mega NIC (i.e. four 1Gb cards become a 4Gb card). In practice:
- Two or more links are treated as a single link by the switch
- Bandwidth is aggregated together
- Individual connections cannot use greater than the bandwidth provided by a single link
- Outbound connections are sent out one of the slave interfaces, not both
- The switch MUST support active link aggregation; if it does not, only ONE NIC will be used
If you choose to use link aggregation, please remember that performance will not be X times faster; rather, bandwidth will be X times greater. The difference is subtle, but if you use link aggregation you should expect to serve out more connections at the speed of a single NIC (i.e. serve out four ~1Gb connections, instead of serving out one 4Gb connection or two 2Gb connections). Also, each connection will be limited by how fast the application can handle it. Since CPU resources are consumed by the connection and the application needs CPU resources too, you can quickly reach the point of diminishing returns, where data comes in faster than the CPU can handle and therefore faster than the application can process it, resulting in lower performance than if you had used other technology.
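The arithmetic behind that distinction can be written down directly. The sketch below is an idealized model with a hypothetical function name: it assumes every connection is pinned to exactly one link and that connections spread perfectly evenly across the links, which real traffic will not always do, and it ignores the CPU limits described above.

```python
def link_aggregation_limits(links, link_speed_gb, connections):
    """Return (fastest single connection, best-case total) in Gb.

    Idealized: one connection never spans more than one link, and
    connections are spread perfectly across the available links.
    """
    if links < 1 or connections < 0:
        raise ValueError("need at least one link and a non-negative connection count")
    # A single connection is capped at the speed of one link.
    fastest_single = link_speed_gb if connections else 0
    # Only as many links can be busy as there are connections.
    best_case_total = min(links, connections) * link_speed_gb
    return fastest_single, best_case_total


for connections in (1, 2, 4, 10):
    single, total = link_aggregation_limits(4, 1, connections)
    print(f"{connections:>2} connections: {single} Gb each at most, {total} Gb in total at most")
```

With four 1Gb links, one connection still tops out at 1Gb; it takes at least four concurrent connections to approach the 4Gb total.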