I would like to tune Tomcat to fail fast when all worker threads are occupied (for example, threads stuck waiting for database connections because the database suddenly starts performing badly).
I checked a few articles, specifically: https://netflixtechblog.com/tuning-tomcat-for-a-high-throughput-fail-fast-system-e4d7b2fc163f and https://tomcat.apache.org/tomcat-8.5-doc/config/http.html
I have a few questions and would be grateful if you could help me:
- Why does Netflix track the number of concurrent requests in memory and then reply with 503 in their own code once the threshold is reached? What is the rationale for doing this themselves? As far as I understand from the Tomcat documentation (see link above), once maxThreads and acceptCount are both exhausted, Tomcat will reply with "connection refused" by itself. So configuring a suitably small acceptCount should do the trick.
From the Tomcat docs:
Each incoming request requires a thread for the duration of that request. If more simultaneous requests are received than can be handled by the currently available request processing threads, additional threads will be created up to the configured maximum (the value of the maxThreads attribute). If still more simultaneous requests are received, they are stacked up inside the server socket created by the Connector, up to the configured maximum (the value of the acceptCount attribute). Any further simultaneous requests will receive "connection refused" errors, until resources are available to process them.
From the article:
Track the number of active concurrent requests in memory and use it for fast failing. If the number of concurrent requests is near the estimated active threads (8 in our example) then return an http status code of 503. This will prevent too many worker threads becoming busy because once the peak throughput is hit, any extra threads which become active will be doing a very light weight job of returning 503 and then be available for further processing.
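For reference, my understanding of what the article describes is roughly the following servlet filter. This is my own sketch, not Netflix's actual code; the threshold of 8, the class name, and the `/*` mapping are just placeholders:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;

// Sketch of the "track concurrent requests in memory and fail fast with 503" idea.
@WebFilter("/*")
public class ConcurrencyLimitFilter implements Filter {

    // Roughly the number of threads that can do useful work at peak (8 in the article's example).
    private static final int MAX_CONCURRENT = 8;

    private final AtomicInteger active = new AtomicInteger();

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        if (active.incrementAndGet() > MAX_CONCURRENT) {
            active.decrementAndGet();
            // Returning 503 is very cheap, so this worker thread is free again almost immediately.
            ((HttpServletResponse) response).sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            return;
        }
        try {
            chain.doFilter(request, response);
        } finally {
            active.decrementAndGet();
        }
    }

    @Override
    public void init(FilterConfig filterConfig) {
        // no initialization needed
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }
}
```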
I assume that even with the Tomcat NIO connector, worker threads are not shared between in-flight requests the way they can be in NodeJS, for example. One HTTP request gets one allocated worker thread, which will not handle any other HTTP request until it finishes with this one, even while the request is waiting on blocking I/O such as a database read.
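As far as I can tell, this is easy to verify by logging the current thread name inside a blocking handler. A minimal sketch, assuming a Spring Boot app with the default NIO connector on port 8080 (the endpoint and class name are just placeholders):

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical endpoint: the same Tomcat worker thread (e.g. "http-nio-8080-exec-3") is logged
// at the start and stays bound to this request for the duration of the blocking call.
@RestController
public class BlockingDemoController {

    @GetMapping("/slow")
    public String slow() throws InterruptedException {
        System.out.println("Handled by " + Thread.currentThread().getName());
        Thread.sleep(5_000); // stands in for a slow database or network call
        return "done";
    }
}
```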
Given everything mentioned above, I believe this could be a proper configuration:
# not using an executor, but the 'default' internal thread pool of the connector

server.tomcat.max-connections = -1 # maps to the connector's maxConnections; for NIO/NIO2 only, setting the value to -1 disables the maxConnections feature and connections are not counted.
# Or we can leave the default, i.e. 10k. Only the poller thread (hence little resources used) keeps these connections alive, so we can use keep-alive
# and avoid establishing connections again and again, which can be costly and resource consuming.

server.tomcat.max-threads = 96 # we assume that 8 cores can handle up to this many threads, given that each thread spends significant time in database and/or other network calls
# (in the article above they only increased maxThreads to 3 times the number of cores)

server.tomcat.accept-count = ?? # from one point of view, this could be something really small like 16 or 32;
# from another point of view, maybe something like 512:
# imagine that a temporary network glitch makes an upstream service unreachable and we need a few seconds to hit the timeouts and open the circuit breaker.
# It would be nice to queue the other requests and successfully process the independent ones a few seconds later rather than reject them right away.
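For completeness, my understanding is that if I were tuning the connector directly in conf/server.xml instead of through Spring Boot properties, the equivalent would look roughly like this (port, acceptCount and connectionTimeout values are just illustrative placeholders):

```xml
<!-- Sketch of the equivalent NIO connector configuration in conf/server.xml -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="96"
           maxConnections="-1"
           acceptCount="32"
           connectionTimeout="20000" />
```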
P.S. I am using Tomcat 8.5 and the NIO connector.
Any help / advice would be highly appreciated.