Load balancing howto: LVS + ldirectord + heartbeat 2
Contents:
- Environment
- Problem
- Solution
- Goals
- Configuration: linux-director (load balancer)
- Configuration: real servers
- Testing the cluster
- Caveats
- Alternative solutions
- Conclusion
- External links
Environment
SuSE Linux Enterprise Server 10 Service Pack 2
heartbeat-2.1.3-0.9
IPVS v1.2.1
Problem
A high-capacity load balancing solution is needed to provide highly available and scalable services, both now and as demand grows.
Solution
Linux Virtual Server (LVS) provides the means to build a scalable, high-performing virtual server cluster. Heartbeat 2 can be used to further increase the availability of the virtual services.
Limitations
- Iptables redirection to avoid ARP problems with direct routing load balancing is not covered.
- Heartbeat 2 SSH STONITH is used without quorumd or pingd. Very limited “tiebreaker” capability.
Concepts
LVS hides the real servers behind a virtual IP address and load balances incoming requests across them based on a scheduling algorithm. It implements transport-layer load balancing inside the Linux kernel, also called Layer 4 switching.
There are 3 types of LVS load balancing:
- Network Address Translation (NAT)
Incoming requests arrive at the virtual IP and are forwarded to the real servers by rewriting the destination IP address. The real servers send their responses to the load balancer, which rewrites the source address back to the virtual IP and forwards the response to the client. Because all traffic passes through the load balancer, it usually becomes the bottleneck of the cluster.
- IP Tunneling
LVS sends requests to the real servers through an IP tunnel (redirecting to a different IP address) and the real servers reply directly to the client using their own routing tables. Cluster members can be in different subnets.
- Direct routing
Packets from end users are forwarded directly to the real server. The IP packet is not modified, as the real servers are configured to accept traffic for the shared cluster virtual IP address on a virtual non-ARP alias interface. The response from the real server is sent directly to the client. The real servers and the load balancer have to be in the same physical network segment (Layer 2).
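For reference, these forwarding methods map directly to flags of ipvsadm, the user-space administration tool for LVS. A minimal, purely illustrative sketch using the addresses of the example cluster built later in this howto (ldirectord will manage this table automatically, so treat this only as an experiment):

# add a virtual service on the VIP, TCP port 80, weighted least-connection scheduler
ipvsadm -A -t 192.168.0.200:80 -s wlc

# add the real servers; the forwarding flag selects the method:
#   -g direct routing ("gate"), -i IP tunneling ("ipip"), -m NAT ("masq")
ipvsadm -a -t 192.168.0.200:80 -r 192.168.0.110:80 -g -w 1
ipvsadm -a -t 192.168.0.200:80 -r 192.168.0.120:80 -g -w 1

# inspect the table, then clear it again before handing control to ldirectord
ipvsadm -L -n
ipvsadm -C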
As the load-balancer is the only entry point for all incoming requests, it would present a single point of failure for the cluster. A backup load-balancer is needed as well as a monitoring program that can fail over the service along with the connection statuses.
In this example, the Linux Director Daemon (ldirectord) is used to monitor and administer the real servers in the LVS cluster, and heartbeat 2 is used as the fail-over monitor for the load balancers (ldirectord).
Note:
The terms linux-director and ldirector are used in this document to refer to the load-balancing server.
Ldirectord monitors the health of the real servers by periodically requesting a known URL and checking that the response contains an expected string. If a service fails on a server, that server is taken out of the pool of real servers and will be reinserted once it comes back online.
Goals
Image 1: Load balancer graph.
Load balancing behaviour
The linux-directors and real servers will each have one real interface with their own IP address and one virtual alias interface configured with the shared virtual IP (VIP) 192.168.0.200.
- A client will send a request for a web page from 192.168.0.200.
- The load balancer checks the IP address and port number and, if they match a virtual service, chooses a real server from the cluster by the scheduling algorithm and adds the connection to the hash table that records connections.
- The load balancer forwards the packet (VIP is unchanged) to the chosen real server.
- When the real server receives the forwarded packet, it finds that the packet is addressed to its loopback alias interface, processes the request and returns the result directly to the client.
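The connection hash table mentioned above can be inspected at any time on the active linux-director; a quick sketch (the exact output depends on current traffic):

# list the current connection entries (numeric output)
ipvsadm -L -c -n

# show per-service packet and byte counters
ipvsadm -L -n --stats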
High availability behaviour
- Node level monitoring
If one of the nodes (ldirector1/ldirector2) running cluster resources stops sending out heartbeat signals, declare it dead, reboot the node and fail over all resources to a different node.
- Service level monitoring
If the VIP or the ldirectord service fails, try to restart the service; if that fails, reboot the node and fail over all resources to a different node.
- Service “stickiness”
If a dead or stand-by node becomes active again, keep the resources where they currently run and do not fail back.
Configuration: linux-director (load balancer)
It is recommended to disable SuSEfirewall2 for the configuration to avoid networking issues.
rcSuSEfirewall2 stop
chkconfig SuSEfirewall2_init off
chkconfig SuSEfirewall2_setup off
Required software
Install heartbeat and ldirectord by running:
zypper install heartbeat heartbeat-ldirectord perl-MailTools
IP forwarding
The linux-directors must be able to route traffic to the real servers. This is achieved by enabling IPv4 packet forwarding in the kernel. Edit /etc/sysctl.conf
and add net.ipv4.ip_forward = 1.
# /etc/sysctl.conf
net.ipv4.ip_forward = 1
For the changes to take effect, run:
sysctl -p
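To confirm that forwarding is really active, either of the following should report the value 1:

sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward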
Ldirectord
Create the file /etc/ha.d/ldirectord.cf
and add:
# /etc/ha.d/ldirectord.cf
checktimeout=3
checkinterval=5
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=yes

virtual=192.168.0.200:80
	fallback=127.0.0.1:80
	real=192.168.0.110:80 gate
	real=192.168.0.120:80 gate
	service=http
	request="test.html"
	receive="Still alive"
	scheduler=wlc
	protocol=tcp
	checktype=negotiate

Indent the lines below virtual= with a [TAB], not white space.
Explanation
virtual=192.168.0.200:80
Defines a virtual service by IP-address and port.
real=192.168.0.110:80 gate
Defines a real service by IP-address and port. The second argument defines the forwarding method, which in this case (gate) translates to *Direct routing*.
request="test.html"
Defines what file to request.
receive="Still alive"
Defines the expected response.
See “man ldirectord
” for configuration directives not covered here.
What ldirector does:
Ldirectord will connect to each real server once every 5 seconds (checkinterval) and request 192.168.0.110:80/test.html (real/request). If it does not receive the expected string “Still alive” (receive) within 3 seconds (checktimeout), it considers the check failed and removes the server from the available pool. The server is added again once a check succeeds.
Because of the quiescent=yes setting, the real servers are not removed from the LVS table. Rather, their weight is set to 0 so that no new connections are accepted. Already established connections persist until they time out.
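What ldirectord does under the hood is simply adjust the weight of the real server in the IPVS table. Once the virtual service is up, the same effect can be reproduced by hand with ipvsadm for testing (a sketch; ldirectord will overwrite manual changes on its next check):

# drain a real server: weight 0 stops new connections, established ones continue
ipvsadm -e -t 192.168.0.200:80 -r 192.168.0.110:80 -g -w 0

# put it back into rotation
ipvsadm -e -t 192.168.0.200:80 -r 192.168.0.110:80 -g -w 1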
Test
Start ldirectord and check the real server table:
/etc/init.d/ldirectord start
Starting ldirectord...    success

ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.200:http wlc persistent 600
  -> 192.168.0.110:http           Route   0      0          0
  -> 192.168.0.120:http           Route   0      0          0
Note:
The weight in this case is 0 because the real servers are not yet configured and ldirectord could not fetch the test.html page.
Disable ldirectord service
Make sure ldirectord is *not running* and won’t start on boot. Only heartbeat 2 will be allowed to start and stop the service.
/etc/init.d/ldirectord stop
/sbin/chkconfig ldirectord off
Heartbeat 2
This heartbeat 2 setup is not a production setup, as no redundant heartbeat media are used. In a production setup, at least two media over which the heartbeat signals can propagate are needed. The SSH STONITH device should never be used in production! Always use hardware STONITH devices such as power switches, DRAC, iLO, etc.
Heartbeat 2 runs on the two Linux-Directors (load balancers) and handles bringing up the interface for the virtual address. This is the address to which end-users should connect. It will also monitor the ldirectord daemon.
Main configuration file
Create a file /etc/ha.d/ha.cf
and add:
# /etc/ha.d/ha.cf
crm on
udpport 694
bcast eth0
node ldirector1 ldirector2
Note: The order of directives is significant.
Explanation
crm on
Use heartbeat version 2
udpport 694
Which port Heartbeat will use for its UDP intra-cluster communication
node ldirector1 ldirector2
Node names. Output of uname -n
bcast eth0
Use device eth0 to broadcast the heartbeat
Node authentication
The authkeys configuration file contains information for Heartbeat to use when authenticating cluster members.
Create /etc/ha.d/authkeys
and add:
# /etc/ha.d/authkeys
auth 1
1 sha1 YourSecretKey
1
– key number associated with this line
sha1
– key signature method.
YourSecretKey
– shared secret key
This file must not be readable or writable by anyone other than root:
chmod 600 /etc/ha.d/authkeys
Name resolution
Add node names to /etc/hosts
on both linux-directors:
# /etc/hosts
192.168.0.10 ldirector1
192.168.0.20 ldirector2
The names used here must match the output of uname -n.
Time synchronization
Even though it is not required, time synchronization is very useful in any cluster environment where you want to compare log files from different nodes. The time server should be outside the cluster. See the Novell documentation on how to configure an NTP client through YaST2.
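As an alternative to the YaST2 module, a minimal command-line sketch, assuming the standard SLES ntp package (service name ntp/rcntp) and a reachable time source ntp.example.com (substitute your own time server):

# /etc/ntp.conf -- point both linux-directors at the same external time source
server ntp.example.com iburst

# enable and start the NTP service
chkconfig ntp on
rcntp restart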
Propagate configuration to all nodes
To configure Heartbeat 2 on the other nodes in the cluster run:
/usr/lib/heartbeat/ha_propagate
on the heartbeat node you just configured or just copy over the /etc/ha.d/ha.cf
and /etc/ha.d/authkeys
.
To start the cluster, run:
/etc/init.d/heartbeat start
on both nodes.
It can take more than a minute to elect the Designated Coordinator (DC) and synchronize the cluster when it starts for the first time.
Check the cluster status with:
crm_mon -i 5

============
Last updated: Tue May 20 04:58:32 2008
Current DC: ldirector1 (5792135e-ed53-438b-8a71-85f0285464c2)
2 Nodes configured.
0 Resources configured.
============

Node: ldirector2 (f8f2ad4a-a05d-416a-92a9-66b759768fb9): online
Node: ldirector1 (5792135e-ed53-438b-8a71-85f0285464c2): online
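For scripted or one-off checks there are non-interactive equivalents; a short sketch:

# one-shot cluster status instead of the refreshing display
crm_mon -1

# query which node is currently the Designated Coordinator
crmadmin -D

# list the nodes known to the cluster
crmadmin -N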
Connect to hb_gui
On both linux-directors, set the password for the user hacluster:
/usr/bin/passwd hacluster
This is the user that can connect to the heartbeat cluster management console hb_gui
.
Connect to hb_gui
by running:
/usr/bin/hb_gui &
on any of the Ldirectors and specify the IP, username hacluster
and the newly set password.
Create resource group “load_balancer”
A resource group places constraints on the resources to make their management easier. It ensures that resources within the group run on the same node and start in a specific order, from top to bottom, and stop in the reverse order.
Resource: virtual IP
- Create a group named “load_balancer”; leave “colocation” and “ordered” set to “true”
- Add the resource “IPaddr2”
- Add the parameter “ip” with the value 192.168.0.200.
This is the virtual IP address that clients will connect to.
- Add the parameter “lvs_support” with a value of “true”
Image 2: Resource group configuration – IP address.
- Add an operation named “monitor”, interval “20”, timeout “10”, start delay “0”, on fail “restart”
Image 3: Resource group configuration – resource monitor.
These values are by default in milliseconds and can be read this way:
Check the VIP service every 20 milliseconds. If the monitor does not get a response within 10 milliseconds, try to restart the service on this node. If the restart fails, fence-off/reboot the node and fail over the resource to an active node.
Note:
If you wonder what the difference between IPaddr and IPaddr2 is: the first one uses the “ifconfig” command and the second the “ip” command to set up the interface. Moreover, IPaddr2 can be used as a cloned cluster IP.
Resource: ldirector
- Add a native resource named “ldirectord” with the class “ocf/heartbeat” that belongs to the group “load_balancer”.
- Add the parameter “configfile” with the value of “/etc/ha.d/ldirectord.cf“
- Add an operation named “monitor”, interval “20”, timeout “10”, start delay “0”, on fail “restart”
Image 4: Resource group configuration – ldirector.
- Start the resource group
Highlight the “load_balancer” group and click the “Play” button in the top bar. The resource group should come up with a green light.
Image 5: Starting the resource group.
STONITH
“Shoot the other node in the head” (STONITH) is a fencing technique. If a node loses communication with the cluster, it will be fenced off from the cluster. As heartbeat cannot know for sure whether the node is really dead, it uses STONITH to turn that uncertainty into a fact by powering down or rebooting the errant node.
- Generate SSH keys
On both nodes, execute as root:
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):    # leave empty
Enter same passphrase again:                   # leave empty
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
Generate the SSH keys without a passphrase (just hit enter when prompted for passphrase)
- Distribute SSH keys
Distribute the SSH keys to both nodes using the following commands:

# on ldirector1
ssh-copy-id -i /root/.ssh/id_rsa.pub ldirector2

# on ldirector2
ssh-copy-id -i /root/.ssh/id_rsa.pub ldirector1
After propagating the SSH keys, test if you can execute commands without being prompted for a password.
Here is an example from ldirector2:

ldirector2:~ # ssh -q -x -n -l root "ldirector1" "ls -l /"
total 22
drwxr-xr-x   2 root root 2920 May 19 22:47 bin
drwxr-xr-x   3 root root  624 May 19 23:03 boot
drwxr-xr-x   9 root root 6760 May 20 01:03 dev
drwxr-xr-x  79 root root 6616 May 20 05:03 etc
-- snip --
- Activate the ATD daemon
The ATD daemon is used to execute the SSH STONITH reboot command. Activate it by running:
/etc/init.d/atd start
chkconfig atd on
- STONITH clone resource
Clones are resources that can run simultaneously on multiple nodes.
- Add a native resource with the resource_id “ssh_stonith” and the resource type “external/ssh”
- Add parameter “hostlist” with the value “ldirector1,ldirector2“.
- Check the “Clone” button and set the Attributes “clone_max” to “2” and “clone_node_max” to “1“.
This will make sure that only one STONITH clone can run on a single node and a total of two STONITH resources can run in the whole cluster.
Image 6: Clone resource configuration – SSH STONITH.
- Finalize configuration
- Start the STONITH clone by pressing the “play” button.
- Highlight the cluster (“linux-ha”), go to the “Configurations” tab, check the “Stonith Enabled” box and set “Default Resource Stickiness” to “INFINITY”
Image 7: Completed configuration.
WARNING:
If your STONITH device does not work properly, the resources might never fail over in case of a failure: the healthy node will try to STONITH the faulty node and will not take over the resources until it gets a confirmation of a successful STONITH. If you have issues with this, try disabling STONITH through the hb_gui.

Configuration check
As all resources should be running now, check on the linux-director that is currently running the load_balancer group whether ldirectord and the virtual IP address are active. In this example, ldirector2 is running the resources:
ldirector2:~ # ip add sh eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:00:00:f0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.20/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.200/24 brd 192.168.0.255 scope global secondary eth0

ldirector2:~ # ps x | grep ldirector
 9918 ?        S      0:00 /usr/bin/perl -w /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf start
Configuration: real servers
Virtual interface
Edit /etc/sysconfig/network/ifcfg-lo and add:

# /etc/sysconfig/network/ifcfg-lo
IPADDR_0=192.168.0.200   # VIP
NETMASK_0=255.255.255.255
NETWORK_0=192.168.0.0
BROADCAST_0=192.168.0.255
LABEL_0='0'
Restart the network:
/etc/init.d/network restart
The new lo:0 virtual interface is now active:

ip add sh lo
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 192.168.0.200/32 brd 192.168.0.255 scope global lo:0
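For a quick test without editing ifcfg files or restarting the network, the same alias can be added (non-persistently) at runtime; a sketch:

# add the VIP as a host address on the loopback device, labelled lo:0
ip addr add 192.168.0.200/32 brd 192.168.0.255 dev lo label lo:0

# remove it again
ip addr del 192.168.0.200/32 dev lo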
Restrict ARP advertisements
Clients will send all HTTP requests to the VIP 192.168.0.200. Before they can connect to the IP, an ARP request is made to match a MAC address to the requested IP address. Since the linux-directors and real servers both have an interface configured with the same virtual IP address, each one of them can randomly reply to an ARP request for 192.168.0.200. This would break the load balancing for the cluster. To solve this problem, ARP replies for the virtual interfaces have to be disabled.
Edit /etc/sysctl.conf and add:

# /etc/sysctl.conf
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2
Load the changes with:
sysctl -p
Explanation
net.ipv4.conf.all.arp_ignore = 1
Enables configuration of the arp_ignore option.
net.ipv4.conf.eth0.arp_ignore = 1
Do not respond to ARP requests if the requested IP address is configured on the “lo” (loopback) device or any virtual eth0:X device.
net.ipv4.conf.all.arp_announce = 2
Enables configuration of the arp_announce option.
net.ipv4.conf.eth0.arp_announce = 2
As the source IP address of an ARP request is entered into the ARP cache on the destination, it has the effect of announcing this address. This is undesirable for lo or any other virtual interface on the real servers. With this setting, whenever the real server makes an ARP request, it tries to use the real IP address as the source IP of the ARP request.
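To verify the effective values, or to apply them at runtime without editing the file, a short sketch:

# read back the effective values; they should report 1, 1, 2, 2
sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.eth0.arp_ignore \
       net.ipv4.conf.all.arp_announce net.ipv4.conf.eth0.arp_announce

# or set them on the fly via /proc
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce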
Default gateway
The real servers need to be set up so that their default route points to the gateway router’s address on the server network, not to an address on one of the linux-directors. In this example, 192.168.0.254 is the default gateway.
echo "default 192.168.0.254" > /etc/sysconfig/network/routes; rcnetwork restart; # and check the routing table route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo 0.0.0.0 192.168.0.254 0.0.0.0 UG 0 0 0 eth0
Web server
- Install Apache2 by running:
zypper install apache2
- Create a test.html page that ldirectord will periodically check to determine if the service is available:
echo "Still alive" > /srv/www/htdocs/test.html echo "Real server 1" > /srv/www/htdocs/index.html
Note:
The default SLES10 SP2 Apache2 DocumentRoot is used in this example.
Repeat the same on real-server2, but change index.html to “Real server 2” so it is visible which web server is serving the request.
Start HTTP service:
/etc/init.d/apache2 start
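Before relying on ldirectord, the same check it performs can be run by hand from the active linux-director; a sketch (wget is assumed to be installed, which it normally is on SLES):

# fetch the test page from each real server and look for the expected string
wget -q -O - http://192.168.0.110/test.html | grep "Still alive"
wget -q -O - http://192.168.0.120/test.html | grep "Still alive"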
Note:
We only use a virtual HTTP service here. It is possible to configure ldirectord to check other services as well, such as Oracle listener, MySQL, SMTP, POP/IMAP, FTP, LDAP, NNTP and others.

Ldirectord test
After setting up and starting the Apache web server on both real servers, check on the linux-director that is currently running the load_balancer resource group whether both servers are available in the IPVS server pool:
ldirector2:~ # ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.200:80 wlc
  -> 192.168.0.110:80             Route   1      0          0
  -> 192.168.0.120:80             Route   1      0          0
Now both servers are shown with a weight of 1. Connect with a browser to 192.168.0.200; a page served by real-server1 or real-server2 should come up.
Image 8: Load balancing test.
Testing the cluster
heartbeat
- put one of the ldirectors on stand-by using the hb_gui
result: resource fail-over
- connect to the active linux-director and kill the ldirectord process (killall ldirectord)
result: resource restart
- kill the heartbeat process (killall heartbeat)
result: node reboot and resource fail-over
- kill the network connection between ldirector1 and ldirector2
result: split-brain situation. Both nodes get quorum. The first one to send a successful STONITH takes over the resources. Usually, the DC is the faster one to STONITH the other node. In the case of SSH STONITH, this does not work that well, as a network connection is needed for the ssh command.
ldirectord
- connect to 192.168.0.200 a couple of times
result: index.html from real-server 1 or 2 is shown
- kill the connection to real-server1 (wait 10 seconds) and check connectivity again
result: index.html from real-server2 is shown
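While running these tests it is convenient to watch the IPVS table and the ldirectord log on the active linux-director; a quick sketch:

# refresh the IPVS table every second while you pull real servers in and out
watch -n 1 ipvsadm -L -n

# in a second terminal, follow what ldirectord is doing
tail -f /var/log/ldirectord.log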
Caveats
- Ldirectord does not start
When trying to start ldirectord, the following error occurs:

/etc/init.d/ldirectord start
Starting ldirectord...
Can't locate Mail/Send.pm in @INC (@INC contains: /usr/lib/perl5/5.8.8/i586-linux-thread-multi /usr/lib/perl5/5.8.8
-- snip --
Install perl-MailTools:
zypper install perl-MailTools
or the CPAN Mail::Send module:
env FTP_PASSIVE=1 cpan -i Mail::Send
- Ldirectord fails to start with the error:

/etc/init.d/ldirectord start
Error [19251] reading file /etc/ha.d/ldirectord.cf at line X: Unknown command fallback=127.0.0.1:80
Make sure the directives in /etc/ha.d/ldirectord.cf under virtual=192.168.0.200:80 begin with a [TAB].
- Ldirectord shows real servers with a weight of 0
Check if you can connect to the real servers directly with a browser and that the web page tree matches the ldirectord request="test.html" and receive="Still alive" directives.
- SSH STONITH fails to reboot a node
Make sure that the ATD and ssh daemons are running. Test if you can ssh to both nodes without a passphrase.

Alternative solutions
Keepalived
Keepalived provides strong and robust health checking for LVS clusters. It implements a health-checking framework on multiple layers for server failover, and a VRRPv2 stack to handle director failover.

Piranha
Piranha provides the ability to load-balance incoming IP network requests across a farm of servers. Its IP load balancing is based on the open source Linux Virtual Server (LVS) technology.

Conclusion
The combination of Heartbeat 2, ldirectord and LVS provides a robust framework of open source tools to build highly available clusters that can load balance work between two or more servers, ensuring optimal resource utilization, scalability and availability of services. The example shown here depicts a basic working setup that can be fine-tuned to meet more specific needs.
External links
Ultramonkey
Linux-ha project
Linux virtual server
Heartbeat 2 DTD
Split-brain, quorum, fencing

Note: example configuration files are attached.
Comments
I was just asked by a customer to help them develop a simple load balancing solution for BorderManager, and this, combined with the just released JeOS might just do the trick! Thanks for the tip!
These are the gems that keep me browsing over to http://www.novell.com/communities/
Novell / Novell Community Members: Please keep this level of detail and technical accuracy up!
This has finally brought all the snippets of info together into one place.
I’m using it on a 2 node cluster, using ldirectord and heartbeat on the same nodes as the web service. All working great.
Perhaps you could show the XML outputted from cibadmin so people don’t have to install a GUI on a server? 🙂
The moment I set the loopback alias on any of the real servers, the server running ldirectord cannot see the server any more. Let’s say I have a pool of webservers. On web01, I have:
lo:0 Link encap:Local Loopback
inet addr:10.0.0.100 Mask:255.255.255.255
UP LOOPBACK RUNNING MTU:16436 Metric:1
…where 10.0.0.100 just also happens to be the IP of an alias of my primary interface on the Load Balancer. From my Load Balancer, web01’s real IP address has just become unresponsive.
I’m convinced my /etc/sysctl.conf is missing *something*, I just can’t put my finger on what. Please help.
… Turning off a firewall isn’t a good idea in an enterprise environment! What contortions are necessary to get SuSEfirewall2 to work with this at least on the load balancers? I assume FW_ROUTE and FW_MASQUERADE. What else?
Hi, I am curious about the load balancer. Can we do this with HTTPS as well? If we can, it would be very helpful to me.
Tenebris,
Could your issue have something to do with the last octet of your subnet mask? Your post shows it as 255 — perhaps changing it to 0 will help.
Mark G. Harvey
Pres., Denver Area Novell Users Group
What about connection state synchronization? Without that, the failover is disruptive…