How to increase the requests per second on an Amazon EC2 t2.micro instance?


I recently launched an Amazon EC2 instance, a t2.micro. After installing WildFly 8.2.0.Final, I tried to load test the web server. I tested the server serving a static page of less than 500 bytes, and a dynamic page that writes to and reads from MySQL. To my surprise, I got similar results: both tests topped out at around 1000 RPS. I monitored the system using top -d 1; the CPU never reached its maximum and there was free memory. I think either EC2 has some limitation on concurrent connections, or my setup needs improvement.
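
For completeness, the stock CentOS 7 limits that could cap concurrent connections can be checked roughly like this while the test is running (values shown are whatever your defaults happen to be):

# Max open file descriptors for the current user (each socket consumes one)
ulimit -n

# Kernel backlog limit for listening sockets
cat /proc/sys/net/core/somaxconn

# Summary of current TCP socket states
ss -s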

My setup is CentOS 7, WildFly/JBoss 8.2.0.Final, and MariaDB 5.5. The test tool is JMeter, run in distributed mode or command-line mode. Tests were performed from a remote host, from a host on the same subnet, and on localhost; all gave the same result.
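
For reference, a non-GUI JMeter run of the kind described here looks roughly like this (the test-plan file and remote-host IPs are placeholders):

# Command-line (non-GUI) mode on a single machine
jmeter -n -t loadtest.jmx -l results.jtl

# Distributed mode, driving remote JMeter servers
jmeter -n -t loadtest.jmx -l results.jtl -R 10.0.0.11,10.0.0.12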

Can you please help identify where the bottleneck is? Are there any limitations on an Amazon EC2 instance that could affect this? Thanks.


There are 2 answers

dgil (Best Answer)

Yes, there are some limitations depending on the EC2 instance type, and one of them is network performance.

Amazon doesn't publish the exact limitations of each instance type, but in the Instance Types Matrix you can see that the t2.micro has low to moderate network performance. If you need better network performance, the AWS instance types page shows which instances have enhanced networking:

Enhanced Networking

Enhanced Networking enables you to get significantly higher packet per second (PPS) performance, lower network jitter and lower latencies. This feature uses a new network virtualization stack that provides higher I/O performance and lower CPU utilization compared to traditional implementations. In order to take advantage of Enhanced Networking, you should launch an HVM AMI in VPC, and install the appropriate driver. Enhanced Networking is currently supported in C4, C3, R3, I2, M4, and D2 instances. For instructions on how to enable Enhanced Networking on EC2 instances, see the Enhanced Networking on Linux and Enhanced Networking on Windows tutorials. To learn more about this feature, check out the Enhanced Networking FAQ section.
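
If you do move to an instance type that supports enhanced networking, you can verify from the guest whether the driver is actually in use; the instance ID below is a placeholder:

# On the instance: show which driver backs the primary interface
# (ixgbevf indicates enhanced networking; a plain virtual driver means it is not active)
ethtool -i eth0

# From a machine with the AWS CLI installed: check the SR-IOV attribute
aws ec2 describe-instance-attribute --instance-id i-0123456789abcdef0 --attribute sriovNetSupport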

You can find more information in these SO and SF questions:

BobMcGee

You're right that 1000 RPS feels awfully low for Wildfly, given that the Undertow server powering it is one of the fastest in Java land and among the 10 fastest, period.

Starting points for optimization: make sure request logging is not enabled (it can cause an I/O bottleneck), use the latest stable JVM, and it's probably worth running the most recent WildFly version that your app works with.
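
For example, assuming the default default-server and default-host names in the Undertow subsystem (adjust if yours differ), the access log can be checked for and removed from the JBoss CLI:

# Inside jboss-cli.sh --connect
# Check whether an access log is configured (this fails harmlessly if none exists)
/subsystem=undertow/server=default-server/host=default-host/setting=access-log:read-resource

# Remove it if present, then reload the server
/subsystem=undertow/server=default-server/host=default-host/setting=access-log:remove
reload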

With that done, you're almost certainly being bottlenecked by connection creation, not your AWS instance. This could be within JMeter, or within the Wildfly subsystem.

To eliminate JMeter as a culprit, try ApacheBench ("ab") at the same concurrency level, and then try it again with the -k option on (to allow connection reuse); example commands follow the list below.

  • If the first ApacheBench number is much higher than JMeter's, the issue is the thread-based networking model that JMeter uses (another load-testing tool, such as Gatling or locust.io, may be needed).
  • If the second number is much higher than the first, the bottleneck is proven to be connection creation. This may be solved by tuning the Undertow server settings.
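
A minimal pair of runs for that comparison might look like this (URL and request count are placeholders; match -c to your JMeter thread count):

# One fresh connection per request -- compare this against your JMeter result
ab -c 100 -n 10000 http://your-server:8080/your-page

# Same test with HTTP keep-alive, so connections are reused
ab -c 100 -k -n 10000 http://your-server:8080/your-page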

As far as WildFly goes, I'd have to see the config.xml, but you may be able to improve performance by tweaking the Undertow subsystem settings. The defaults are usually solid, but you want a very low number of I/O threads (either 1, or the number of CPUs, no more).
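
As a rough sketch (assuming the default worker name), the I/O thread count lives in the io subsystem and can be inspected and lowered from the CLI:

# Inside jboss-cli.sh --connect
# Show the current worker configuration used by Undertow
/subsystem=io/worker=default:read-resource

# On a single-vCPU t2.micro, pin I/O threads to 1 (takes effect after reload)
/subsystem=io/worker=default:write-attribute(name=io-threads, value=1)
reload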

I have seen a trivial Wildfly 10 application far exceed the performance you're seeing on a t2.micro instance.


Benchmark results, with Wildfly 10 + docker + Java 8:

Server setup (EC2 t2.micro running the latest Amazon Linux, in us-east-1, different AZs):

sudo yum install docker
sudo service docker start
sudo docker run --rm -it -p 8080:8080 svanoort/jboss-demo-app:0.7-lomem

Client (another t2.micro, minimal load, different AZ):

ab -c 16 -k -n 1000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500

16 concurrent connections with keep-alive, serving 500 bytes of cached randomly pre-generated data

Results over multiple runs: 430 requests per second (RPS), 1171 RPS, 1527 RPS, 1686 RPS, 1977 RPS, 2471 RPS, 3339 RPS, eventually peaking at ~6500 RPS after hundreds of thousands of requests.

Notice how that goes up over time? It's important to prewarm the server before benchmarking, to allow for enough handler threads to be created, and to allow for JIT compilation. 10,000 requests is a good starting point.
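
In practice that can simply be a throwaway warm-up run against the same endpoint before taking any numbers:

# Discardable warm-up: ~10,000 keep-alive requests before measuring
ab -c 16 -k -n 10000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500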

If I turn off connection keep-alive? It peaks at ~1450 RPS with concurrency 16. BUT WAIT! With a single connection (concurrency 1), it only gives ~340-350 RPS. Increasing concurrency beyond 16 does not give higher performance; it remains fairly stable (even up to 512 concurrent connections).
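
For concreteness, those two variants correspond roughly to dropping -k and then dropping the concurrency (both shown here without keep-alive):

# No keep-alive, 16 concurrent connections
ab -c 16 -n 1000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500

# No keep-alive, a single connection at a time
ab -c 1 -n 1000 http://$SERVER_PRIVATE_IP:8080/rest/cached/500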

If I increase the requested data size to 2000 bytes by using http://$SERVER_PRIVATE_IP:8080/rest/cached/2000, it still hits 1367 RPS, showing that almost all of the time is spent on connection handling.

With very large (300k) requests and connection keep-alive, I hit about 50 MB/s between hosts, but I've seen up to 90 MB/s in optimal situations.

Very impressive performance for JBoss/Wildfly there, I'd say. Note that higher concurrency may be needed if there is more latency between hosts, to allow for the impact of round-trip time on connection creation.