Terminating a high volume of SSL connections cost effectively

I have recently set up a Node.js based WebSocket server that has been tested to handle around 2,000 new connection requests per second on a small EC2 instance (m1.small). Considering the cost of an m1.small instance, and the ability to put multiple instances behind a WebSocket-capable proxy server such as HAProxy, we are very happy with the results.

However, we realised we had not done any testing using SSL yet, so we looked into a number of SSL options. It became apparent that terminating SSL connections at the proxy server is ideal, because then the proxy server can inspect the traffic and insert headers such as X-Forwarded-For so that the server knows which IP the request came from.
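To illustrate what the backend does with that header, here is a minimal sketch of reading the client address in Node.js. The helper name `clientIp` is hypothetical, and it assumes a single trusted proxy directly in front of the server that appends the address it saw to X-Forwarded-For, so the right-most entry is the one to trust.

```javascript
// Hypothetical helper (not from the original post): pick the client IP from
// an X-Forwarded-For header, falling back to the socket address when the
// header is absent. Assumes exactly one trusted proxy appends to the header.
function clientIp(headers, socketAddress) {
  // Node.js lower-cases incoming header names
  const forwarded = headers['x-forwarded-for'];
  if (typeof forwarded !== 'string' || forwarded.trim() === '') {
    return socketAddress;
  }
  // The header is a comma-separated list; the last entry is the address
  // appended by the proxy closest to this server
  const entries = forwarded.split(',');
  return entries[entries.length - 1].trim();
}

// Usage inside a request handler:
//   const ip = clientIp(req.headers, req.socket.remoteAddress);
```

Earlier entries in the list are supplied by the client or by upstream hops and can be spoofed, which is why the sketch only reads the last one.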

So I looked into a number of solutions such as Pound, stunnel and stud, all of which allow incoming connections on port 443 to be terminated and then passed on to HAProxy on port 80, which in turn passes the connection on to the web servers. Unfortunately, I found that sending traffic to the SSL termination proxy on a c1.medium (High CPU) instance very quickly consumed all CPU resources, at a rate of only 50 or so requests per second. I tried all three of the solutions listed above, and all of them performed roughly the same, as I assume they all rely on OpenSSL under the hood anyway. I tried a 64-bit High CPU extra large instance (c1.xlarge) and found that performance only scaled linearly with cost. So based on EC2 pricing, I'd need to pay roughly $600 per month for 200 SSL requests per second, as opposed to $60 per month for 2,000 non-SSL requests per second. The former price becomes economically unviable very quickly when we start planning to accept thousands or tens of thousands of requests per second.
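To put those figures side by side, here is a rough projection using only the numbers quoted above. It assumes cost keeps scaling linearly with request rate, as observed between c1.medium and c1.xlarge; the function name and the 10,000 requests per second target are illustrative.

```javascript
// Linear projection of monthly cost from a measured baseline.
// All inputs come from the figures in the question; nothing here is measured.
function projectedMonthlyCost(baselineCostUsd, baselineRate, targetRate) {
  return baselineCostUsd * (targetRate / baselineRate);
}

const targetRate = 10000; // requests per second (illustrative target)
const sslCost = projectedMonthlyCost(600, 200, targetRate);   // SSL termination
const plainCost = projectedMonthlyCost(60, 2000, targetRate); // non-SSL

console.log(`SSL: $${sslCost}/month, non-SSL: $${plainCost}/month`);
console.log(`SSL costs ${sslCost / plainCost}x as much per request`);
```

On these numbers, SSL termination costs about 100 times as much per request as the unencrypted path.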

I also tried terminating the SSL using Node.js' https server, and the performance was very similar to Pound, stunnel and stud, so no clear advantage to that approach.

So what I am hoping someone can help with is advising how I can get around this ridiculous cost we have to absorb to provide SSL connections. I have heard that SSL hardware accelerators provide much better performance as the hardware is designed for SSL encryption and decryption, but as we are currently using Amazon EC2 for all of our servers, using SSL hardware accelerators is not an option unless we have a separate data centre with physical servers. I am just struggling to see how the likes of Amazon, Google, Facebook can provide all their traffic over SSL when the cost of this is so high. There must be a better solution out there.

Any advice or ideas would be greatly appreciated.

Thanks, Matt

There are 4 answers

0
Robert On

I just realized Amazon's Elastic Load Balancer is super slow for SSL Termination... I did a simple test on www.blitz.io (no relation, just a customer) with 1 to 250 concurrent connections over 1 minute. It failed horribly... But if I do TCP 443 on front end of ELB and TCP 443 on backend with no certificate, it wipes out a small instance's CPU when running IIS and an SSL cert on that instance. I need just handshakes, it's a simple web service serving clients from all over the place. New connection setup and teardown every time.

How can I design a high traffic SSL web service, preferably with SSL all the way to the backend for strict security compliance?

0
Gnarfoz On

I do not know much about the CPU power available on different EC2 instances, but I assume your problem lies not with your choice of TLS-terminating proxy software, but with their configuration. Without any configuration, I'm assuming all of them would offer all cipher suites they support, including (very) slow ones. And they'll probably let the client pick the one it likes best, too.

Not all TLS cipher suites are created equal; some have higher CPU costs than others, be it from the key exchange or the cipher itself. Depending on the software used, there should be a way to specify a string of ciphers the server accepts (and also a way to make the server insist on its own preference order). For OpenSSL, cipher strings work this way: http://www.openssl.org/docs/apps/ciphers.html#CIPHER_STRINGS

If you're going for speed, at least make sure you're not using ciphers that employ Diffie-Hellman (the non-elliptic-curve kind) key exchanges. To disable cipher suites using DH key exchange, make sure the string includes !DH at some point. You can test which ciphers a given string makes available with, for example, openssl ciphers -v 'HIGH:!aNULL:!DH:!ECDH'.

This string disables both normal Diffie-Hellman as well as Elliptic Curve Diffie-Hellman key exchanges. This probably only leaves RSA key exchange, depending on your OpenSSL version.
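Since the question also tried Node.js' https server, here is a minimal sketch of passing that cipher string to it. `ciphers` and `honorCipherOrder` are standard Node.js TLS options; the helper name `buildTlsOptions` and the file paths in the usage comment are placeholders, not anything from the original setup.

```javascript
// Sketch: TLS options for Node's https/tls server using the cipher string
// discussed above. The helper name is hypothetical.
function buildTlsOptions(key, cert) {
  return {
    key,
    cert,
    // No anonymous suites, no DH or ECDH key exchange
    ciphers: 'HIGH:!aNULL:!DH:!ECDH',
    // Make the server's preference order win over the client's
    honorCipherOrder: true,
  };
}

// Usage (paths and handler are placeholders):
//   const https = require('https');
//   const fs = require('fs');
//   const options = buildTlsOptions(
//     fs.readFileSync('server.key'),
//     fs.readFileSync('server.crt')
//   );
//   https.createServer(options, handler).listen(443);
```

Which suites the string actually leaves enabled depends on the OpenSSL build Node.js is linked against, so it is worth checking with the openssl ciphers -v command first.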

Regarding ciphers, you should probably test on your intended EC2 hardware. Without hardware acceleration, you should probably prefer RC4 over AES128 over AES256 over anything else, at least according to this benchmark.
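A quick way to run that kind of test on the target instance is a small micro-benchmark like the sketch below. It only compares AES-128 and AES-256 in CBC mode, because RC4 is often unavailable in current OpenSSL builds; the function name, chunk size and data volume are arbitrary choices, and bulk cipher throughput says nothing about handshake cost.

```javascript
const crypto = require('crypto');

// Encrypt at least `totalBytes` of zeros with the given cipher and return
// the throughput in MiB per second.
function cipherThroughput(algorithm, keyBytes, totalBytes) {
  const key = crypto.randomBytes(keyBytes);
  const iv = crypto.randomBytes(16); // AES block size
  const chunk = Buffer.alloc(16 * 1024);
  const cipher = crypto.createCipheriv(algorithm, key, iv);

  let done = 0;
  const start = process.hrtime.bigint();
  while (done < totalBytes) {
    cipher.update(chunk);
    done += chunk.length;
  }
  cipher.final();
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;

  return done / (1024 * 1024) / seconds;
}

for (const [name, keyBytes] of [['aes-128-cbc', 16], ['aes-256-cbc', 32]]) {
  const rate = cipherThroughput(name, keyBytes, 64 * 1024 * 1024);
  console.log(`${name}: ${rate.toFixed(0)} MiB/s`);
}
```

Results vary a lot between instance types, so the numbers are only meaningful when measured on the hardware that will do the termination.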

I also suggest reading this wonderful post, especially the enlightening first diagram showing the impact of DH on TLS handshake performance.

Lastly, make sure you're using TLS session caching. That saves some CPU, too.
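If the terminating process is a Node.js tls or https server, session caching can be sketched with its `newSession` and `resumeSession` events. The function name, the in-memory `Map` and the size limit below are assumptions for illustration; with several proxy instances, a store shared between them would take the place of the Map.

```javascript
// Hypothetical in-memory TLS session cache for a Node.js tls/https server.
function attachSessionCache(server, maxEntries = 10000) {
  const sessions = new Map(); // session id (hex) -> serialized session data

  server.on('newSession', (sessionId, sessionData, callback) => {
    if (sessions.size >= maxEntries) {
      // Evict the oldest entry; a Map iterates in insertion order
      sessions.delete(sessions.keys().next().value);
    }
    sessions.set(sessionId.toString('hex'), sessionData);
    callback();
  });

  server.on('resumeSession', (sessionId, callback) => {
    // Passing null as the data makes the client do a full handshake
    callback(null, sessions.get(sessionId.toString('hex')) || null);
  });

  return sessions;
}

// Usage (options and handler are placeholders):
//   const server = require('https').createServer(options, handler);
//   attachSessionCache(server);
```

A resumed session skips the expensive public-key step of the handshake, which is where the CPU saving comes from. It only helps clients that reconnect, so it will not reduce the cost of first-time connections.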

1
Aishugopi On

The performance of Node.js' https server is very similar to Pound, stunnel and stud, and there is no clear advantage to that approach.

0
kzahel On

I'm also wondering how to do this effectively. AWS SSL termination is dreadfully slow, but perhaps there is some way to improve its performance. Stud seemed promising but, like you mentioned, it also has a large CPU cost.