We have a web application with REST services developed in Java 8 using JAX-RS, deployed as a WAR on Apache Tomcat 9. Recently we faced an issue in production where Tomcat would respond very slowly under high traffic and would occasionally drop connections as well. So we decided to run a few load tests to determine the throughput of our system.
We are using the following specs in production:
- Ubuntu 18.04 with a 64-bit OpenJDK JVM
- 16-core CPU and 64 GB RAM
- Tomcat 9
Our Tomcat server.xml connector configuration is:
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="300" minSpareThreads="50" acceptCount="250"
           acceptorThreadCount="2" enableLookups="false"
           SSLEnabled="true" scheme="https" secure="true"
           keystoreFile="/certficate/file/path.jks"
           keystorePass="password"
           clientAuth="false" sslProtocol="TLS" />
We have a REST API with a simple 'ping' method that just returns a small JSON response with HTTP 200, e.g.
@GET
@Path("/ping")
@Produces(MediaType.APPLICATION_JSON)
public EdoServiceResponse ping() {
    return new EdoServiceResponse();
}
{ "status" 200, "responseText" : "OK" }
We used this endpoint to perform the load tests with Apache JMeter 5.4.1. Our findings:
No of threads | Avg response time (ms) | Total requests in 1 min |
---|---|---|
50 | 12 | 243,002 |
100 | 22 | 277,016 |
250 | 40 | 384,729 |
500 | 76 | 400,048 |
1,000 | 124 | 469,712 |
2,000 | 229 | 480,784 |
5,000 | 507 | 336,921 |
10,000 | 1,843 | 74,677 |
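To make the numbers easier to compare, here is a quick throwaway sketch that converts the per-minute totals from the table above into requests per second:

// Quick sketch: convert the "Total requests in 1 min" column into requests/sec
public class ThroughputFromTable {
    public static void main(String[] args) {
        int[] threads = {50, 100, 250, 500, 1000, 2000, 5000, 10000};
        int[] totalPerMinute = {243002, 277016, 384729, 400048, 469712, 480784, 336921, 74677};

        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%,6d threads -> %,7.0f req/s%n",
                    threads[i], totalPerMinute[i] / 60.0);
        }
    }
}

That works out to roughly 4,000 req/s at 50 threads, peaking around 8,000 req/s at 1,000-2,000 threads, and then dropping off sharply at 5,000 threads and above.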
So as we can see, the average response time starts increasing even at a fairly small load like 100 or 250 threads. This is a simple REST API without any DB connection or business logic. We observed that CPU usage never goes above 40% and memory usage stays below 10% at all times; the Tomcat process never occupies more than 4 GB. We also checked the max open file limit, but it's set to 65,000, which is well above our requirement. So we are not able to figure out where the bottleneck is that causes the response time to increase proportionally with the number of threads.
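One pattern visible in the numbers (rough sketch below): the concurrency implied by Little's Law (throughput × average latency) comes out very close to the JMeter thread count for each run:

// Rough Little's Law check: implied concurrency = throughput (req/s) * avg latency (s).
// If this is close to the JMeter thread count, the client threads are fully occupied,
// so once total throughput stops scaling, latency has to grow with the thread count.
public class LittlesLawCheck {
    public static void main(String[] args) {
        int[] threads = {100, 500, 2000};
        int[] avgLatencyMs = {22, 76, 229};
        int[] totalPerMinute = {277016, 400048, 480784};

        for (int i = 0; i < threads.length; i++) {
            double reqPerSec = totalPerMinute[i] / 60.0;
            double impliedConcurrency = reqPerSec * avgLatencyMs[i] / 1000.0;
            System.out.printf("%,5d threads: implied concurrency ~ %.0f%n",
                    threads[i], impliedConcurrency);
        }
    }
}

That gives roughly 102, 507 and 1,835 for the 100-, 500- and 2,000-thread runs, so the growing average response time seems to track the point where total throughput flattens out rather than some fixed per-request slowdown.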
We tried changing server.xml parameters like maxThreads, acceptCount, maxConnections etc., but there was no significant change. We also tried setting the max heap size, which did not help. The only change that gave a slight performance improvement (around 15-25%) was adding garbage collection parameters in catalina.sh:
-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5 -XX:InitiatingHeapOccupancyPercent=70
We need to figure out what our system's capacity for handling load actually is. With CPU and memory underutilised, we don't understand why requests take longer even at a relatively low concurrency of 100-200 threads, and we don't know what steps to take next to improve throughput.
Any help would be appreciated. Thanks.