SUSE Conversations


Load balancing howto: LVS + ldirectord + heartbeat 2



By: jslezacek

May 23, 2008 11:56 am



Environment

SuSE Linux Enterprise Server 10 Service Pack 2
heartbeat-2.1.3-0.9
IPVS v1.2.1

Problem

A high-capacity load balancing solution is needed to provide highly available and scalable services, both now and in the future.

Solution

Linux Virtual Server (LVS) provides the means of building a scalable, high-performance virtual server cluster. Heartbeat 2 can be used to further increase the availability of the virtual services.

Limitations

  • Iptables redirection to avoid ARP problems with direct routing load balancing is not covered.
  • Heartbeat 2 SSH STONITH is used without quorumd or pingd. Very limited “tiebreaker” capability.

Concepts
LVS hides the real servers behind a virtual IP and load balances incoming requests across the cluster nodes based on a scheduling algorithm. It implements transport-layer load balancing inside the Linux kernel, also called Layer-4 switching.

There are 3 types of LVS load balancing:

  • Network Address Translation (NAT)
    Incoming requests arrive at the virtual IP and are forwarded to the real servers by rewriting the destination IP address. The real servers send their responses back through the load balancer, which rewrites the source address back to the virtual IP before forwarding the response to the client. As all traffic passes through the load balancer, it usually becomes the bottleneck of the cluster.
  • IP Tunneling
    LVS sends requests to real servers through an IP tunnel (redirecting to a different IP address) and the real servers reply directly to the client using their own routing tables. Cluster members can be in different subnets.
  • Direct routing
    Packets from end users are forwarded directly to the real server. The IP packet is not modified, as the real servers are configured to accept traffic for the shared cluster virtual IP address on a virtual non-ARP alias interface. The response from the real server is sent directly to the client. The real servers and the load balancer (LVS) have to be in the same physical network segment (Layer 2). A manual ipvsadm sketch of this mode follows the list below.
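
For orientation, the direct-routing table that ldirectord will manage later in this article could also be created by hand with ipvsadm. This is only an illustrative sketch using the example addresses from the Goals section; in the actual setup ldirectord maintains these entries, so there is no need to run these commands.

# add a virtual service on the VIP, scheduled with weighted least-connection (wlc)
ipvsadm -A -t 192.168.0.200:80 -s wlc
# add both real servers with direct routing ("gatewaying", -g) and weight 1
ipvsadm -a -t 192.168.0.200:80 -r 192.168.0.110:80 -g -w 1
ipvsadm -a -t 192.168.0.200:80 -r 192.168.0.120:80 -g -w 1
# display the resulting table
ipvsadm -L -n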

As the load balancer is the only entry point for all incoming requests, it would present a single point of failure for the cluster. A backup load balancer is needed, as well as a monitoring program that can fail over the service along with the connection state.

In this example, Linux Director Daemon (ldirectord) is used to monitor and administer the real servers in the LVS cluster, and heartbeat 2 is used as the fail-over monitor for the load balancers (ldirectord).

Note:
The terms linux-director and ldirector are used in this document to refer to the load-balancing server.

Ldirectord monitors the health of the real servers by periodically requesting a known URL and checking that the response contains an expected string. If a service fails on a server, then the server is taken out of the pool of real-servers and will be reinserted once it comes back on line.

Goals

Image 1: Load balancer graph.

Load balancing behaviour
Linux-directors and real servers will each have one real interface with their own IP address and one virtual alias interface configured with the shared virtual IP (VIP) 192.168.0.200.

  1. A client will send a request for a web page from 192.168.0.200.
  2. The load balancer checks the IP address and port number, and if they match a virtual service, a real server is chosen from the cluster by the scheduling algorithm and the connection is added to the hash table that records connections.
  3. The load balancer forwards the packet (VIP is unchanged) to the chosen real server.
  4. When the real server receives the forwarded packet, it finds that the packet is addressed to the IP on its loopback alias interface, processes the request and returns the result directly to the client.

High availability behaviour

  1. Node level monitoring
    If one of the nodes (ldirector1/ldirector2) running cluster resources stops sending out heartbeat signals, it is declared dead, the node is rebooted and all resources fail over to the other node.
  2. Service level monitoring
    If the VIP or the ldirectord service fails, heartbeat tries to restart it; if the restart fails, the node is rebooted and all resources fail over to the other node.
  3. Service “stickiness”
    If a dead or stand-by node becomes active again, the resources stay where they are currently running and do not fail back.

Configuration: linux-director (load balancer)

It is recommended to disable SuSEfirewall2 for the configuration to avoid networking issues.

rcSuSEfirewall2 stop
chkconfig SuSEfirewall2_init off
chkconfig SuSEfirewall2_setup off

Required software

Install heartbeat and ldirectord by running:

zypper install heartbeat heartbeat-ldirectord perl-MailTools

IP forwarding

The Linux-Directors must be able to route traffic to the real-servers. This is achieved by enabling IPv4 packet forwarding in the kernel. Edit /etc/sysctl.conf and add net.ipv4.ip_forward = 1.

# /etc/sysctl.conf
net.ipv4.ip_forward = 1

For the changes to take effect, run:

sysctl -p
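
To confirm that forwarding is active, read the value back:

cat /proc/sys/net/ipv4/ip_forward
1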

Ldirectord

Create the file /etc/ha.d/ldirectord.cf and add:

# /etc/ha.d/ldirectord.cf
checktimeout=3
checkinterval=5
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=192.168.0.200:80
	fallback=127.0.0.1:80
	real=192.168.0.110:80 gate
 	real=192.168.0.120:80 gate
	service=http
	request="test.html"
	receive="Still alive"
	scheduler=wlc
	protocol=tcp
	checktype=negotiate

IMPORTANT: The directives under “virtual=” have to start with a [TAB], not with spaces.

Explanation

virtual=192.168.0.200:80
Defines a virtual service by IP-address and port.

real=192.168.0.110:80 gate
Defines a real service by IP-address and port. The second argument defines the forwarding method, which in this case (gate) translates to *Direct routing*.

request="test.html"
Defines what file to request.

receive="Still alive"
Defines the expected response.

See “man ldirectord” for configuration directives not covered here.

What ldirectord does:

Ldirectord will connect to each real server once every 5 seconds (checkinterval) and request 192.168.0.110:80/test.html (real/request). If the expected string “Still alive” (receive) is not received within 3 seconds (checktimeout), the server is removed from the available pool. It is added back once a check succeeds again.
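
The same check can be reproduced by hand from the linux-director, for example with wget (this assumes the real servers are already serving test.html as described later in this article):

wget -q -O - http://192.168.0.110/test.html
Still alive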

Because of the quiescent=yes setting, failed real servers are not removed from the LVS table. Instead, their weight is set to 0 so that no new connections are accepted. Already established connections persist until they time out.

Test
Start ldirectord and check the real server table:

/etc/init.d/ldirectord start
Starting ldirectord... success

ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.200:http wlc persistent 600
  -> 192.168.0.110:http           Route   0      0          0         
  -> 192.168.0.120:http           Route   0      0          0

Note:
The weight in this case is 0 because the real-servers are not configured and ldirectord could not fetch the test.html page.

Disable ldirectord service
Make sure ldirectord is *not running* and won’t start on boot. Only heartbeat 2 will be allowed to start and stop the service.

/etc/init.d/ldirectord stop
/sbin/chkconfig ldirectord off

Heartbeat 2

This heartbeat 2 setup is not a production setup, as no redundant heartbeat media is used. In a production setup at least two media are needed over which the heartbeat signals can be propagated. The SSH STONITH device should never be used in production! Always use hardware STONITH devices such as power switches, DRAC, iLO, etc.
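
For reference only, a production ha.cf would typically declare at least two heartbeat media, for example a second NIC and/or a serial link. The device names below are assumptions for illustration and are not used in this howto:

# additional media directives in /etc/ha.d/ha.cf (example only)
bcast eth0            # primary LAN
bcast eth1            # dedicated crossover link between the directors
baud 19200
serial /dev/ttyS0     # optional serial heartbeat link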

Heartbeat 2 runs on the two Linux-Directors (load balancers) and handles bringing up the interface for the virtual address. This is the address to which end-users should connect. It will also monitor the ldirectord daemon.

Main configuration file

Create a file /etc/ha.d/ha.cf and add:

# /etc/ha.d/ha.cf
crm on
udpport 694
bcast eth0
node ldirector1 ldirector2

Note: The order of directives is significant.

Explanation

crm on
Use heartbeat version 2

udpport 694
Which port Heartbeat will use for its UDP intra-cluster communication

node ldirector1 ldirector2
Node names. Output of uname -n

bcast eth0
Use device eth0 to broadcast the heartbeat

Node authentication

The authkeys configuration file contains information for Heartbeat to use when authenticating cluster members.

Create /etc/ha.d/authkeys and add:

# /etc/ha.d/authkeys 
auth 1 
1 sha1 YourSecretKey

1 – key number associated with this line
sha1 – key signature method.
YourSecretKey – shared secret key
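
Any string will do as the shared secret. If you prefer a random one, something like the following generates a reasonable value (md5sum is only used here as a convenient way to produce a random-looking string):

dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1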

This file must not be readable or writable by anyone other than root:

chmod 600 /etc/ha.d/authkeys

Name resolution

Add node names to /etc/hosts on both linux-directors:

# /etc/hosts
192.168.0.10    ldirector1
192.168.0.20    ldirector2

The name used here should be the output of uname -n.

Time synchronization
Even though not required, time synchronization is very useful in any cluster environment where you want to compare log files from different nodes. The time server should be outside the cluster. See the Novell documentation on how to configure an NTP client through YaST2.
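
As a minimal sketch (the YaST2 NTP client module does the same more comfortably), pointing ntpd at an external time server on both linux-directors could look like this; the server name is only an example:

echo "server pool.ntp.org" >> /etc/ntp.conf
chkconfig ntp on
rcntp restart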

Propagate configuration to all nodes

To configure Heartbeat 2 on the other nodes in the cluster run:

/usr/lib/heartbeat/ha_propagate

on the heartbeat node you just configured, or simply copy /etc/ha.d/ha.cf and /etc/ha.d/authkeys to the other node.

To start the cluster, run:

/etc/init.d/heartbeat start

on both nodes.

It can take more than a minute to elect the Designated Coordinator (DC) and synchronize the cluster when it starts for the first time.
Check the cluster status with:

crm_mon -i 5
============
Last updated: Tue May 20 04:58:32 2008
Current DC: ldirector1 (5792135e-ed53-438b-8a71-85f0285464c2)
2 Nodes configured.
0 Resources configured.
============

Node: ldirector2 (f8f2ad4a-a05d-416a-92a9-66b759768fb9): online
Node: ldirector1 (5792135e-ed53-438b-8a71-85f0285464c2): online

Connect to hb_gui

On the linux-directors set the password for user: hacluster

/usr/bin/passwd hacluster

This is the user that can connect to the heartbeat cluster management console hb_gui.

Connect to hb_gui by running:

/usr/bin/hb_gui &

on either of the linux-directors, then specify the server IP, the username hacluster and the newly set password.

Create resource group “load_balancer”

A resource group places constraints on its resources to make their management easier. It enforces that resources within the group run on the same node, start in a specific order from top to bottom, and stop in the reverse order.

Resource: virtual IP

  1. Create a group named “load_balancer”, leave “colocation” and “ordered” as “true”.
  2. Add the resource “IPaddr2”.
  3. Add the value 192.168.0.200 for the parameter “ip”.
    This is the virtual IP address that clients will connect to.
  4. Add the parameter “lvs_support” with a value of “true”.

    Image 2: Resource group configuration – IP address.

  5. Add an operation named “monitor”, interval “20”, timeout “10”, start delay “0”, On fail “restart”.

    Image 3: Resource group configuration – resource monitor.

These values are by default in milliseconds and can be read this way:
Check the VIP service every 20 milliseconds. If the monitor does not get a response within 10 milliseconds, try to restart the service on this node. If the restart fails, fence-off/reboot the node and fail over the resource to an active node.

Note:
If you wonder what the difference is between IPaddr and IPaddr2: the first one uses the “ifconfig” command and the second the “ip” command to set up the interface. Moreover, IPaddr2 can be used as a cloned cluster IP.
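
For illustration only, the two resource agents end up running commands roughly like the ones below to bring up the virtual address (heartbeat runs them for you, there is no need to do this by hand):

# IPaddr: alias interface via ifconfig
ifconfig eth0:0 192.168.0.200 netmask 255.255.255.0 up
# IPaddr2: secondary address via iproute2
ip addr add 192.168.0.200/24 brd 192.168.0.255 dev eth0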

Resource: ldirector

  1. Add a native resource named “ldirectord” with the class “ocf/heartbeat” that belongs to the group “load_balancer”.
  2. Add the parameter “configfile” with the value “/etc/ha.d/ldirectord.cf”.
  3. Add an operation named “monitor”, interval “20”, timeout “10”, start delay “0”, On fail “restart”.

    Image 4: Resource group configuration – ldirector.

  4. Start the resource group
    Highlight the “load_balancer” group and click on the “Play” button on the top bar. The resource group should come up with a green light.

    Image 5: Starting the resource group.
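
One advantage of hb_gui is that it writes the CIB XML for you. On servers without a GUI, a roughly equivalent resource group can be loaded with cibadmin, using the same parameter and operation values entered above. The snippet below is only a sketch of what heartbeat 2 generates; the ids are illustrative and the exact XML on your cluster will differ, so treat it as a starting point rather than a drop-in file.

<group id="load_balancer">
  <primitive id="virtual_ip" class="ocf" provider="heartbeat" type="IPaddr2">
    <operations>
      <op id="virtual_ip_mon" name="monitor" interval="20" timeout="10" on_fail="restart"/>
    </operations>
    <instance_attributes id="virtual_ip_ia">
      <attributes>
        <nvpair id="virtual_ip_addr" name="ip" value="192.168.0.200"/>
        <nvpair id="virtual_ip_lvs" name="lvs_support" value="true"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="ldirectord" class="ocf" provider="heartbeat" type="ldirectord">
    <operations>
      <op id="ldirectord_mon" name="monitor" interval="20" timeout="10" on_fail="restart"/>
    </operations>
    <instance_attributes id="ldirectord_ia">
      <attributes>
        <nvpair id="ldirectord_cf" name="configfile" value="/etc/ha.d/ldirectord.cf"/>
      </attributes>
    </instance_attributes>
  </primitive>
</group>

Saved as load_balancer.xml, it can be loaded into the resources section with:

cibadmin -o resources -C -x load_balancer.xml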

STONITH
“Shoot the other node in the head” (STONITH) is a fencing technique. If a node loses communication with the cluster, it is fenced off from the cluster. As heartbeat cannot know for sure whether the node is really dead, it uses STONITH to turn that uncertainty into certainty by powering down or rebooting the errant node.

  • Generate SSH keys
    On both nodes, execute as root:
    ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    Created directory '/root/.ssh'.
    Enter passphrase (empty for no passphrase): # leave empty
    Enter same passphrase again: 		    # leave empty
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.

    Generate the SSH keys without a passphrase (just hit enter when prompted for passphrase)

  • Distribute SSH keys
    Distribute the SSH keys to both nodes using the following commands:

    #on ldirector1
    ssh-copy-id -i /root/.ssh/id_rsa.pub ldirector2
    
    #on ldirector2
    ssh-copy-id -i /root/.ssh/id_rsa.pub ldirector1

    After propagating the SSH keys, test if you can execute commands without being prompted for a password.
    Here is an example from ldirector2:

    ldirector2:~ # ssh -q -x -n -l root "ldirector1" "ls -l /"
    total 22
    drwxr-xr-x  2 root root 2920 May 19 22:47 bin
    drwxr-xr-x  3 root root  624 May 19 23:03 boot
    drwxr-xr-x  9 root root 6760 May 20 01:03 dev
    drwxr-xr-x 79 root root 6616 May 20 05:03 etc
    -- snip --
  • Activate the ATD daemon
    The ATD daemon is used to execute the SSH STONITH reboot command. Activate it by running:
    /etc/init.d/atd start
    chkconfig atd on
  • STONITH clone resource
    Clones are resources that can run simultaneously on multiple nodes.

    1. Add a native resource with resource_id “ssh_stonith” and resource type “external/ssh”.
    2. Add the parameter “hostlist” with the value “ldirector1,ldirector2”.
    3. Check the “Clone” button and set the attributes “clone_max” to “2” and “clone_node_max” to “1”.
      This makes sure that at most one STONITH clone instance runs on a single node and a total of two run in the whole cluster.

      Image 6: Clone resource configuration – SSH STONITH.

  • Finalize configuration
    1. Start the STONITH clone by pressing the “play” button.
    2. Highlight the “linux-ha” cluster entry, then under the “Configurations” tab check the “Stonith Enabled” box and set “Default Resource Stickiness” to “INFINITY”.

      Image 7: Completed configuration.

    WARNING:
    If your STONITH device does not work properly, the resources might never fail over in the case of a failure: the healthy node will try to STONITH the faulty node and will not take over the resources until it gets confirmation of a successful STONITH. If you run into this, try disabling STONITH through the hb_gui.
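
    If you prefer the command line, the same cluster property can usually be toggled with crm_attribute. The property name stonith-enabled is an assumption based on later heartbeat/pacemaker releases; verify the exact name shown in the hb_gui “Configurations” tab on your version:

    crm_attribute -t crm_config -n stonith-enabled -v false
    # and to re-enable it later:
    crm_attribute -t crm_config -n stonith-enabled -v true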

    Configuration check

    As all resources should be running now, check on the linux-director that is currently running the load_balancer group whether ldirectord and the virtual IP address are up. In this example, ldirector2 is running the resources:

    ldirector2:~ # ip add sh eth0
    2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 00:50:56:00:00:f0 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.20/24 brd 192.168.0.255 scope global eth0
        inet 192.168.0.200/24 brd 192.168.0.255 scope global secondary eth0
    
    ldirector2:~ # ps x |grep ldirector
     9918 ?        S      0:00 /usr/bin/perl -w /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf start

    Configuration: real servers

    Virtual interface

    Edit /etc/sysconfig/network/ifcfg-lo and add:

    # /etc/sysconfig/network/ifcfg-lo
    IPADDR_0=192.168.0.200    # VIP
    NETMASK_0=255.255.255.255
    NETWORK_0=192.168.0.0
    BROADCAST_0=192.168.0.255
    LABEL_0='0'

    Restart the network:

    /etc/init.d/network restart

    The new lo:0 virtual interface is now active:

    ip add sh lo
    1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
        inet 192.168.0.200/32 brd 192.168.0.255 scope global lo:0

    Restrict ARP advertisements

    Clients will send all HTTP requests to the VIP 192.168.0.200. Before they can connect to the IP, an ARP request is made to match a MAC address to the requested IP address. Since the linux-directors and real servers both have an interface configured with the same virtual IP address, each one of them can randomly reply to an ARP request for 192.168.0.200. This would break the load balancing for the cluster. To solve this problem, ARP replies for the virtual interfaces have to be disabled.

    Edit /etc/sysctl.conf and add:

    # /etc/sysctl.conf
    net.ipv4.conf.all.arp_ignore = 1
    net.ipv4.conf.eth0.arp_ignore = 1
    net.ipv4.conf.all.arp_announce = 2
    net.ipv4.conf.eth0.arp_announce = 2

    Load the changes with:

    sysctl -p

    Explanation

    net.ipv4.conf.all.arp_ignore = 1
    Enables configuration of the arp_ignore option on all interfaces.

    net.ipv4.conf.eth0.arp_ignore = 1
    Do not respond to ARP requests on eth0 if the requested IP address is configured on the “lo” (loopback) device or any virtual eth0:X device.

    net.ipv4.conf.all.arp_announce = 2
    Enables configuration of the arp_announce option on all interfaces.

    net.ipv4.conf.eth0.arp_announce = 2
    As the source IP address of an ARP request is entered into the ARP cache on the destination, it has the effect of announcing this address. This is undesirable for the lo interface or any other virtual interfaces on the real servers.

    Using this setting, whenever the real server sends an ARP request, it uses the real IP address as the source IP of the request instead of the VIP.
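
    To confirm that the settings are in place on a real server, read them back, and optionally verify from another machine on the same subnet that only the linux-director answers ARP for the VIP (arping is part of the iputils package):

    sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.eth0.arp_announce
    net.ipv4.conf.all.arp_ignore = 1
    net.ipv4.conf.eth0.arp_announce = 2

    # from another host: only one MAC address (the active director) should reply
    arping -I eth0 -c 3 192.168.0.200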

    Default gateway

    The real-servers need to be set up so that their default route is set to the gateway router’s address on the server network and not an address on one of the linux-directors. In this example, 192.168.0.254 is the default gateway.

    echo "default 192.168.0.254 - -" > /etc/sysconfig/network/routes
    rcnetwork restart
    # and check the routing table
    route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
    169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
    127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
    0.0.0.0         192.168.0.254   0.0.0.0         UG    0      0        0 eth0

    Web server

    1. Install Apache2 by running:
        zypper install apache2
    2. Create a test.html page that ldirectord will periodically check to determine if the service is available:
      echo "Still alive" > /srv/www/htdocs/test.html
      echo "Real server 1" > /srv/www/htdocs/index.html

    Note:
    The default SLES10 SP2 Apache2 DocumentRoot is used in this example.

    Repeat the same on real-server2, but change the index.html content to “Real server 2” so it is visible which web server is serving the request.

    Start HTTP service:

    /etc/init.d/apache2 start

    Note:
    We only use a virtual HTTP service here. It is possible to configure ldirectord to check other services as well, such as Oracle listener, MySQL, SMTP, POP/IMAP, FTP, LDAP, NNTP and others.

    Ldirectord test

    After setting up and starting the Apache web server on both real-servers, check on the linux-director that is currently running the load_balancer resource group whether both servers are available in the IPVS server pool:

    ldirector2:~ # ipvsadm -Ln
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    TCP  192.168.0.200:80 wlc
      -> 192.168.0.110:80             Route   1      0          0         
      -> 192.168.0.120:80             Route   1      0          0

    Now both servers show up with a weight of 1. Connect with a browser to 192.168.0.200; a page served by real-server1 or real-server2 should come up.
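
    To watch the load balancing without a browser, a quick loop from a client machine works as well (curl is assumed to be installed; depending on scheduler and persistence settings, the responses may alternate between the real servers or stick to one for a while):

    for i in 1 2 3 4 5; do curl -s http://192.168.0.200/index.html; done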

    Image 8: Load balancing test.

    Testing the cluster

    heartbeat

    • put one of the ldirectors on stand-by using the hb_gui
      result: resource fail-over
    • connect to the active linux-director and kill the ldirectord process (killall ldirectord)
      result: resource restart
    • kill the heartbeat process (killall heartbeat)
      result: node reboot and resource fail-over
    • kill network connection between ldirector1 and ldirector2
      result: split-brain situation. Both nodes get quorum.
      The first one to send a successful STONITH takes over resources. Usually, the DC is the faster one to STONITH the other node.
      In the case of SSH STONITH, this does not work that well as a network connection is needed for the ssh command.

    ldirectord

    • connect to 192.168.0.200 a couple of times
      result: index.html from real-server 1 or 2 is shown
    • kill connection to real-server1 (wait 10 sec) and check connectivity again
      result: index.html from real-server2 is shown

    Caveats

    • Ldirectord does not start
      When trying to start ldirectord, the following error occurs:

      /etc/init.d/ldirectord start
      Starting ldirectord... Can't locate Mail/Send.pm in @INC (@INC contains: /usr/lib/perl5/5.8.8/i586-linux-thread-multi /usr/lib/perl5/5.8.8
      -- snip --

      Install perl-MailTools:

      zypper install perl-MailTools

      or the CPAN Mail::Send module:

      env FTP_PASSIVE=1 cpan -i Mail::Send

      Another possible failure is this error:

      /etc/init.d/ldirectord start
      Error [19251] reading file /etc/ha.d/ldirectord.cf at line X: Unknown command  fallback=127.0.0.1:80

      Make sure the directives in /etc/ha.d/ldirectord.cf under virtual=192.168.0.200:80 begin with a [TAB].
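
      A quick way to verify the indentation is to display control characters; with cat -A, a leading tab shows up as ^I at the start of the line:

      cat -A /etc/ha.d/ldirectord.cf
      # the lines under virtual= must begin with ^I (a tab), not with spaces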

    • Ldirectord shows real-servers with a weight of 0
      Check whether you can connect to the real servers directly with a browser and that the web page tree matches the ldirectord request="test.html" and receive="Still alive" directives.
    • SSH STONITH fails to reboot a node
      Make sure that the ATD and ssh daemons are running. Test if you can ssh to both nodes without a passphrase.

      Alternative solutions

      Keepalived
      Keepalived provides strong and robust health checking for LVS clusters. It implements a multi-layer health-checking framework for server failover and a VRRPv2 stack to handle director failover.

      Piranha
      Piranha provides the ability to load-balance incoming IP network requests across a farm of servers. IP Load Balancing is based on open source Linux Virtual Server (LVS) technology.

      Conclusion

      The combination of Heartbeat 2, ldirectord and LVS provides a robust framework of open source tools for building highly available clusters that load balance work between two or more servers to ensure optimal resource utilization, scalability and availability of services. The example shown here depicts a basic working setup that can be fine-tuned to meet more specific needs.

      External links

      Ultramonkey
      Linux-ha project
      Linux virtual server
      Heartbeat 2 DTD
      Split-brain, quorum, fencing

      note: example configuration files are attached.


Categories: SUSE Linux Enterprise Server, Technical Solutions

Disclaimer: As with everything else at SUSE Conversations, this content is definitely not supported by SUSE (so don't even think of calling Support if you try something and it blows up).  It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.

9 Comments

  1. By:gbianchi77

    I was just asked by a customer to help them develop a simple load balancing solution for BorderManager, and this, combined with the just released JeOS might just do the trick! Thanks for the tip!

  2. By:ericgearhart

    These are the gems that keep me browsing to over http://www.novell.com/communities/

    Novell / Novell Community Members: Please keep this level of detail and technical accuracy up!

  3. By:Anonymous

    This has finally brought all the snippets of info together into one place.

    I’m using it on a 2 node cluster, using ldirectord and heartbeat on the same nodes as the web service. All working great.

    Perhaps you could show the XML outputted from cibadmin so people don’t have to install a GUI on a server? :-)

  4. By:Tenebris

    The moment I set the loopback alias on any of the real servers, the server running ldirectord cannot see the server any more. Let’s say I have a pool of webservers. On web01, I have:

    lo:0 Link encap:Local Loopback
    inet addr:10.0.0.100 Mask:255.255.255.255
    UP LOOPBACK RUNNING MTU:16436 Metric:1

    …where 10.0.0.100 just also happens to be the IP of an alias of my primary interface on the Load Balancer. From my Load Balancer, web01's real IP address has just become unresponsive.

    I’m convinced my /etc/sysctl.conf is missing *something*, I just can’t put my finger on what. Please help.

  5. By:caruthers

    … Turning off a firewall isn’t a good idea in an enterprise environment! What contortions are necessary to get SuSEfirewall2 to work with this at least on the load balancers? I assume FW_ROUTE and FW_MASQUERADE. What else?

  6. By:mirza_shafeeq

    Hi, I am curious about the load balancer. Can we do this with HTTPS as well? If we can, it would be very helpful to me.

  7. By:markgharvey

    Tenebris,

    Could your issue have something to do with the last octet of your subnet mask? Your post shows it as 255 — perhaps changing it to 0 will help.

    Mark G. Harvey
    Pres., Denver Area Novell Users Group

  8. By:ciriarte

    What about connection state synchronization? Without that, the failover is disruptive…

  9. By:frissko

    This has finally brought all the snippets of info together into one place. I’m using it on a 2 node cluster, using ldirectord and heartbeat on the same nodes as the web service. All working great.
