SUSE Blog

Speeding up SSL – All You Need to Know About HAProxy

By: chabowski

May 8, 2017 5:10 am


The following article has been contributed by Marcus “Darix” Rückert, Senior Software Engineer in the Operations & Services Team at SUSE. It first appeared on his personal homepage.

 

I have been an HAProxy user for quite a few years now, even running snapshots in production for a very long time, especially after support was added to terminate SSL connections directly in HAProxy. Getting rid of stunnel was so nice…

For a very long time this setup served me really well. But over time more and more services were put behind HAProxy, and the total number of connections went up. We started to see some performance issues, which at first sounds weird: if you look at the benchmarks on the HAProxy website, it can handle thousands, if not hundreds of thousands, of connections per second.

Photo by Dmitri Popov

 

So what happened?

HAProxy is, at its core, a single-process, event-driven program. This means that while it handles one connection and does the computation for that request, no other connection is handled. Every bit of code therefore needs to be as fast as possible, so it can quickly switch to the next connection. In general this model works really well and is widely used.

When SSL comes into the picture, things get tricky. SSL handshakes are computationally expensive, so in the end your request rate is limited by the number of handshakes you can perform per second.
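To get a feeling for how expensive that is, OpenSSL's built-in benchmark gives a rough per-core number. With an RSA certificate, each full handshake costs roughly one private-key sign operation, so the sign/s column approximates one core's handshake ceiling (rsa2048 is just an example key size here; match your certificate's):

```shell
# Rough per-core TLS handshake ceiling: the "sign/s" column of the
# summary line approximates full handshakes per second on one core.
openssl speed -seconds 1 rsa2048
```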

Need for speed

Now you have two options to get more speed: faster computation (meaning faster CPUs or crypto accelerators) or spreading the work over more CPUs. The first is not always an option: per-core performance is still growing, but not by much. Which leaves us with option 2.

HAProxy has the nbproc directive, but the documentation discourages its use. So step 1 was to ask upstream about it. The answer, in short: this problem is the only real use case for it. I argued that the warning could perhaps be relaxed, but without it people would probably use the directive even in situations where SSL is not the bottleneck; I learned that lesson during my time with lighttpd. At least I now had some steps to proceed with.

Our base configuration looks as follows:

global
  log /dev/log daemon
  maxconn 32768
  chroot /var/lib/haproxy
  user haproxy
  group haproxy
  stats socket /var/lib/haproxy/stats user haproxy group haproxy mode 0640 level operator

  tune.bufsize 32768
  tune.ssl.default-dh-param 2048

  ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
  ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
  ssl-default-server-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
  ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11

defaults
  log     global
  mode    http
  option  log-health-checks
  option  log-separate-errors
  option  dontlognull
  option  httplog
  option  splice-auto
  option  socket-stats
  retries 3
  option  redispatch
  maxconn 10000
  timeout connect     5s
  timeout client     60s
  timeout server    450s

frontend http
  bind 0.0.0.0:80    tfo
  bind :::80 v6only  tfo

  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/
  bind :::443 v6only tfo ssl crt /etc/ssl/services/

  acl is_ssl ssl_fc

  default_backend nginx

backend nginx
  option forwardfor
  server nginx 127.0.0.1:81 check inter 2s


A few comments about the config:

  • We define the ciphers globally, so we don’t have to specify them on each bind statement later.
  • IPv6 sockets get bound with v6only, so we don’t handle IPv4 connections on the IPv6 sockets.
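Before deploying such a cipher string, it can be handy to check what it actually expands to, for example with the OpenSSL command line (shown here on a shortened two-suite string; the full list from the config works the same way):

```shell
# Expand a cipher string into the concrete suites it selects,
# including protocol version and key exchange per suite.
openssl ciphers -v 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384'
```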

So let’s find out what our baseline speed is. Both benchmarks ran in parallel.

$ ab2 -c 20 -n 10000 https://myhost.mydomain.tld/
[snip]
Requests per second:    247.93 [#/sec] (mean)
Time per request:       80.668 [ms] (mean)
Time per request:       4.033 [ms] (mean, across all concurrent requests)
Transfer rate:          55.45 [Kbytes/sec] received
[snip]

$ ab2 -c 20 -n 10000 http://myhost.mydomain.tld/
[snip]
Requests per second:    630.70 [#/sec] (mean)
Time per request:       31.711 [ms] (mean)
Time per request:       1.586 [ms] (mean, across all concurrent requests)
Transfer rate:          141.04 [Kbytes/sec] received
[snip]

The properties of the SSL connection are:

TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,4096,256

The Solution

The changes to the global options and defaults are minor. First we configure seven processes in total: one for request routing and plain HTTP access, and six for handling SSL. With the bind-process setting in the defaults block we ensure that any block without its own bind-process statement is owned by the first process.

global
[snip]
  nbproc 7

defaults
[snip]
  bind-process 1
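If each process should additionally sit on its own core, HAProxy's cpu-map directive can pin them. The mapping below is a sketch assuming a machine with at least seven cores; it is not part of the original setup:

```
global
  nbproc 7
  # pin process 1 (routing) to core 0 and the six SSL
  # processes to cores 1-6 (assumes >= 7 cores)
  cpu-map 1 0
  cpu-map 2 1
  cpu-map 3 2
  cpu-map 4 3
  cpu-map 5 4
  cpu-map 6 5
  cpu-map 7 6
```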

Next is our new SSL-only frontend. All we do here is terminate SSL and then forward the traffic back to our routing backend. The SSL block is bound to processes 2-7.

listen ssl
  bind-process 2-7

The intuitive idea for the bind statements would be:

  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/ process 2-7
  bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 2-7

Each bind statement gets assigned six processes. You could leave out the process statement at the end of the bind lines, and then all processes assigned to this block will be used.

This creates 2 sockets shared among the six processes. When the kernel sends a connection to those sockets, each process tries to grab the connection, but in the end only one process gets it. This might lead to unbalanced work.

Another option is shown below where we get multiple sockets with SO_REUSEPORT. In this case the kernel will distribute the load more fairly over all the processes involved. Our configuration block would look like this:

  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/ process 2
  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/ process 3
  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/ process 4
  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/ process 5
  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/ process 6
  bind 0.0.0.0:443   tfo ssl crt /etc/ssl/services/ process 7

  bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 2
  bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 3
  bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 4
  bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 5
  bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 6
  bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 7

The SSL session cache is shared between all processes, and the TLS tickets are encrypted using a private key that is generated before all the child processes are forked.
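If you need to size that shared cache, both the number of entries and their lifetime are tunable in the global section. The values below are illustrative, not taken from the original setup:

```
global
  # number of sessions the shared SSL session cache can hold (illustrative)
  tune.ssl.cachesize 100000
  # seconds a cached SSL session stays valid (illustrative; 300 is the default)
  tune.ssl.lifetime 300
```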

A small optimization for the internal traffic is the tcp-smart-connect option: HAProxy sends the data (i.e. the proxy protocol header and request data) directly in the first packet. This ensures that the HTTP backend has the request available immediately and saves it from having to poll for the data.

  option tcp-smart-connect

As the last step in the listen block, we configure forwarding to the next backend. We use send-proxy-v2 here so the HTTP backend knows the real remote IP.
In theory we could use Unix domain sockets here, but there is no splice support for Unix domain sockets yet ☹️.

  server http 127.0.0.1:84 send-proxy-v2

Last but not least, we define the additional listening socket for our internal routing. We cannot reuse the normal port 80 sockets, as we only want to accept the proxy protocol on this dedicated socket.

frontend http
[snip]
  #
  # the socket for routing the requests
  #
  bind 127.0.0.1:84  tfo accept-proxy

  acl is_ssl fc_rcvd_proxy
[snip]

One important last point is the is_ssl ACL. Normally you would use ssl_fc (SSL front-end connection) to check whether a connection was received via an SSL socket. As we moved SSL termination out of this scope, we cannot use it anymore. But connections from the SSL frontend forward the connection data via the proxy protocol, and as only one socket uses this feature, we can safely assume that those connections came from our SSL frontend and are thus secure.
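What you then do with is_ssl is unchanged. A typical use, shown here purely as an illustration (it is not part of the original configuration), is redirecting plain-HTTP clients to HTTPS:

```
frontend http
  # anything that did not arrive through the SSL frontend
  # gets redirected to HTTPS (illustrative use of is_ssl)
  http-request redirect scheme https if !is_ssl
```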

The complete configuration with inline comments for all the changes looks as follows:

global
  log /dev/log daemon
  maxconn 32768
  chroot /var/lib/haproxy
  user haproxy
  group haproxy
  stats socket /var/lib/haproxy/stats user haproxy group haproxy mode 0640 level operator

  tune.bufsize 32768
  tune.ssl.default-dh-param 2048

  ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
  ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
  ssl-default-server-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
  ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11

  # launch 7 processes
  # process 1 for plain routing
  # process 2-7 for SSL
  nbproc 7

defaults
  log     global
  mode    http
  option  log-health-checks
  option  log-separate-errors
  option  dontlognull
  option  httplog
  option  splice-auto
  option  socket-stats
  retries 3
  option  redispatch
  maxconn 10000
  timeout connect     5s
  timeout client     60s
  timeout server    450s
  # by default use process 1
  bind-process 1

# our new SSL-only frontend. All we do here is terminate SSL and then
# forward the traffic back to our routing backend
listen ssl
  # bind processes 2-7
  bind-process 2-7
  #
  # Intuitive idea:
  # bind 0.0.0.0:443   tfo ssl no-sslv3 crt /etc/ssl/services/ process 2-7
  # bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 2-7
  #
  # Each bind statement gets assigned 6 processes. You could leave out the
  # process statement at the end of the bind lines and then all processes
  # assigned to this block will be used.
  #
  # This creates 2 sockets shared among the 6 processes. When the kernel sends
  # a connection to those sockets each process tries to grab the connection
  # but in the end only 1 process gets it. This might lead to unbalanced work.
  #
  # Another option is shown below where we get multiple sockets with 
  # SO_REUSEPORT. In this case the kernel will distribute the load more fairly
  # over all the processes involved.
  #
  # The ssl session cache is shared between all processes and the TLS tickets
  # are encrypted using a private key that is generated before all the 
  # child processes are forked.  
  #
  bind 0.0.0.0:443   tfo ssl no-sslv3 crt /etc/ssl/services/ process 2
  bind 0.0.0.0:443   tfo ssl no-sslv3 crt /etc/ssl/services/ process 3
  bind 0.0.0.0:443   tfo ssl no-sslv3 crt /etc/ssl/services/ process 4
  bind 0.0.0.0:443   tfo ssl no-sslv3 crt /etc/ssl/services/ process 5
  bind 0.0.0.0:443   tfo ssl no-sslv3 crt /etc/ssl/services/ process 6
  bind 0.0.0.0:443   tfo ssl no-sslv3 crt /etc/ssl/services/ process 7

  bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 2
  bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 3
  bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 4
  bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 5
  bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 6
  bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 7

  #
  # Optimization:
  #
  # Directly send the data (ie: the proxy protocol header and request data) in
  # the first packet. This ensures that the http backend has the request
  # available immediately and saves it from having to poll for the data.
  #
  option tcp-smart-connect

  # forward to next backend.
  # we use send-proxy-v2 here so the http backend knows the real remote IP
  #
  # In theory we could use unix domain sockets here. But there is no splice 
  # support for unix domain sockets yet *sad face* 
  server http 127.0.0.1:84 send-proxy-v2 


frontend http 
  bind 0.0.0.0:80 tfo 
  bind :::80 v6only tfo 

  # bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ 
  # bind :::443 v6only tfo ssl crt /etc/ssl/services/ 

  # 
  # the socket for routing the requests 
  # 
  bind 127.0.0.1:84 tfo accept-proxy 

  acl is_ssl fc_rcvd_proxy 

  default_backend nginx 

backend nginx 
  option forwardfor 
  server nginx 127.0.0.1:81 check inter 2s

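One piece that is only implied above is the nginx side: since the backend sees every connection coming from 127.0.0.1, nginx has to be told to trust the X-Forwarded-For header that option forwardfor adds. A minimal sketch follows; the listen address matches the config above, while the rest is an assumption, not from the original setup:

```
# nginx side (sketch): recover the real client IP from the
# X-Forwarded-For header added by HAProxy's "option forwardfor"
server {
    listen 127.0.0.1:81;
    set_real_ip_from 127.0.0.1;
    real_ip_header X-Forwarded-For;
}
```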

And we can validate the different listening sockets:

$ ss -tplen | grep haproxy
LISTEN     0      128   127.0.0.1:84   *:*    users:(("haproxy",pid=8194,fd=21)) ino:281097 sk:f <->
LISTEN     0      128          *:443   *:*    users:(("haproxy",pid=8200,fd=11)) ino:281087 sk:10 <->
LISTEN     0      128          *:443   *:*    users:(("haproxy",pid=8199,fd=10)) ino:281086 sk:11 <->
LISTEN     0      128          *:443   *:*    users:(("haproxy",pid=8198,fd=9)) ino:281085 sk:12 <->
LISTEN     0      128          *:443   *:*    users:(("haproxy",pid=8197,fd=8)) ino:281084 sk:13 <->
LISTEN     0      128          *:443   *:*    users:(("haproxy",pid=8196,fd=7)) ino:281083 sk:14 <->
LISTEN     0      128          *:443   *:*    users:(("haproxy",pid=8195,fd=6)) ino:281082 sk:15 <->
LISTEN     0      128          *:80    *:*    users:(("haproxy",pid=8194,fd=19)) ino:281095 sk:16 <->
LISTEN     0      128         :::443  :::*    users:(("haproxy",pid=8200,fd=17)) ino:281093 sk:17 v6only:1 <->
LISTEN     0      128         :::443  :::*    users:(("haproxy",pid=8199,fd=16)) ino:281092 sk:18 v6only:1 <->
LISTEN     0      128         :::443  :::*    users:(("haproxy",pid=8198,fd=15)) ino:281091 sk:19 v6only:1 <->
LISTEN     0      128         :::443  :::*    users:(("haproxy",pid=8197,fd=14)) ino:281090 sk:1a v6only:1 <->
LISTEN     0      128         :::443  :::*    users:(("haproxy",pid=8196,fd=13)) ino:281089 sk:1b v6only:1 <->
LISTEN     0      128         :::443  :::*    users:(("haproxy",pid=8195,fd=12)) ino:281088 sk:1c v6only:1 <->
LISTEN     0      128         :::80   :::*    users:(("haproxy",pid=8194,fd=20)) ino:281096 sk:1d v6only:1 <->

In the example above we see:

  • pid 8194 handles *:80, [::]:80 and 127.0.0.1:84.
  • pids 8195-8200 handle *:443 and [::]:443.
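To double-check that distribution, you can count the distinct PIDs behind each port. Here is a small pipeline run against sample lines like the ones above; pipe your real `ss -tplen` output into it instead:

```shell
# Count distinct haproxy PIDs listening on *:443 -- with nbproc 7 and
# processes 2-7 bound to SSL, this should come out as 6.
# (Sample lines embedded for illustration only.)
ss_sample='LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8200,fd=11))
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8199,fd=10))
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8198,fd=9))
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8197,fd=8))
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8196,fd=7))
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8195,fd=6))
LISTEN 0 128 *:80 *:* users:(("haproxy",pid=8194,fd=19))'
printf '%s\n' "$ss_sample" | grep ' \*:443 ' | grep -o 'pid=[0-9]*' | sort -u | wc -l
```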

 

And this really helped?

$ ab2 -c 20 -n 10000 https://myhost.mydomain.tld/
[snip]
Requests per second:    678.84 [#/sec] (mean)
Time per request:       29.462 [ms] (mean)
Time per request:       1.473 [ms] (mean, across all concurrent requests)
Transfer rate:          151.81 [Kbytes/sec] received
[snip]

One could say … YES 👍. With the burden of SSL removed from the main process, we also get a sharp increase in performance for plain HTTP:

$ ab2 -c 20 -n 10000 http://myhost.mydomain.tld/
[snip]
Requests per second:    19338.32 [#/sec] (mean)
Time per request:       1.034 [ms] (mean)
Time per request:       0.052 [ms] (mean, across all concurrent requests)
Transfer rate:          4324.68 [Kbytes/sec] received
[snip]

Why don’t you test it yourself?  😁

Categories: Technical Solutions

Disclaimer: As with everything else in the SUSE Blog, this content is definitely not supported by SUSE (so don't even think of calling Support if you try something and it blows up).  It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.

1 Comment

  1. By: rich_c

    Great read, out of interest are you matching the number of processes to the number of cores?
