I'm facing a critical issue.
My application architecture is as follows:
nginx -> web app (express/nodejs) -> api (jetty/java) -> mysql
The API application is well optimized, so it is not the bottleneck here (~200ms/request at 100 requests/s).
My web application:
When profiling, I noticed that HTML rendering with the Swig template engine blocks the event loop for too long, which dramatically increases the waiting time of other pending requests.
Rendering a 1MB text/html response with Swig takes ~250ms.
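For reference, the Express/Swig wiring is the standard setup, roughly like this (simplified sketch; the real bootstrap code may differ slightly):

var express = require('express');
var swig = require('swig');

var app = express();
app.engine('html', swig.renderFile);   // Swig renders *.html views
app.set('view engine', 'html');
app.set('views', __dirname + '/views');

// res.render('list', model) compiles and executes the template synchronously
// on the event loop, so a ~250ms render stalls every other pending request.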
Here is the output of my stress test:
$ node stress.js 20
Receive response [0] - 200 - 431.682654ms
Receive response [1] - 200 - 419.248099ms
Receive response [2] - 200 - 670.558033ms
Receive response [4] - 200 - 920.763105ms
Receive response [3] - 200 - 986.20115ms
Receive response [7] - 200 - 1521.330763ms
Receive response [5] - 200 - 1622.569327ms
Receive response [9] - 200 - 1424.500137ms
Receive response [13] - 200 - 1643.676996ms
Receive response [14] - 200 - 1595.958319ms
Receive response [10] - 200 - 1798.043086ms
Receive response [15] - 200 - 1551.028243ms
Receive response [8] - 200 - 1944.247382ms
Receive response [6] - 200 - 2044.866157ms
Receive response [11] - 200 - 2162.960215ms
Receive response [17] - 200 - 1941.155794ms
Receive response [16] - 200 - 1992.213563ms
Receive response [12] - 200 - 2315.330372ms
Receive response [18] - 200 - 2571.841722ms
Receive response [19] - 200 - 2523.899486ms
AVG: 1604.10ms
As you can see, the later the request, the longer its waiting time.
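stress.js essentially does the following (simplified sketch; the target URL and port are placeholders):

var http = require('http');

var total = parseInt(process.argv[2], 10) || 20;
var done = 0;
var sum = 0;

for (var i = 0; i < total; i++) {
    (function(idx) {
        var start = process.hrtime();
        // Placeholder URL: point this at the page under test.
        http.get('http://localhost:3000/list', function(res) {
            res.resume(); // drain the body
            res.on('end', function() {
                var diff = process.hrtime(start);
                var ms = diff[0] * 1e3 + diff[1] / 1e6;
                sum += ms;
                console.log('Receive response [' + idx + '] - ' + res.statusCode + ' - ' + ms + 'ms');
                if (++done === total) {
                    console.log('AVG: ' + (sum / total).toFixed(2) + 'ms');
                }
            });
        });
    })(i);
}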
When I return only a response code instead of rendering the HTML, by modifying the code as follows:
function render(req, res, next, model) {
    return res.status(200).end(); // added: respond immediately and skip rendering
    res.render('list', model);    // never reached
}
The stress-test output changes to:
$ node stress.js 20
Receive response [0] - 200 - 147.738725ms
Receive response [1] - 200 - 204.656645ms
Receive response [2] - 200 - 176.583635ms
Receive response [3] - 200 - 218.785931ms
Receive response [4] - 200 - 194.479036ms
Receive response [6] - 200 - 191.531871ms
Receive response [5] - 200 - 265.371646ms
Receive response [7] - 200 - 294.373466ms
Receive response [8] - 200 - 262.097708ms
Receive response [10] - 200 - 282.183757ms
Receive response [11] - 200 - 249.842496ms
Receive response [9] - 200 - 371.228602ms
Receive response [14] - 200 - 236.945983ms
Receive response [13] - 200 - 304.847457ms
Receive response [12] - 200 - 377.766879ms
Receive response [15] - 200 - 332.011981ms
Receive response [16] - 200 - 306.347012ms
Receive response [17] - 200 - 284.942474ms
Receive response [19] - 200 - 249.047099ms
Receive response [18] - 200 - 315.11977ms
AVG: 263.30ms
Here are some solutions I have already tried, but none of them reduced the response time:
Use the Node.js cluster module (2 workers on my server):
if (conf.cluster) {
    // cluster setup
    var cluster = require('cluster');
    var numCPUs = require('os').cpus().length;

    if (cluster.isMaster) {
        for (var i = 0; i < numCPUs; i++) {
            cluster.fork();
        }

        cluster.on('exit', function(worker, code, signal) {
            console.log('Worker ' + worker.process.pid + ' died');
            // create new worker
            cluster.fork();
        });
    } else {
        rek('server').listen(conf.port, function() {
            console.log('Application started at port ' + conf.port + ' [PID: ' + process.pid + ']');
        });
    }
} else {
    rek('server').listen(conf.port, function() {
        console.log('Application started at port ' + conf.port + ' [PID: ' + process.pid + ']');
    });
}
Use JXCore with 16 threads (the maximum thread count):
jx mt-keep:16 app.js
Use NGINX load balancing
Start 4 Node processes:
$ PORT=3000 forever start app.js
$ PORT=3001 forever start app.js
$ PORT=3002 forever start app.js
$ PORT=3003 forever start app.js
nginx.conf
upstream webapp {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

server {
    listen 80;

    location / {
        proxy_pass http://webapp;
    }

    [...]
}
I thought all of the above solutions would provide multiple processes/threads that do not block each other when executing a heavy task like HTML rendering, but the results did not meet my expectation: the waiting time is not reduced, even though the logs show that requests are in fact served by multiple processes/threads.
Am I missing something here?
Or could you show me another solution to reduce the waiting time?
Creating a cluster isn't going to reduce the response time of an individual request, but it will allow you to process responses in parallel without blocking I/O. Of course, to use cluster properly you'll need to set up your own logic for the master to control the workers efficiently; adding a cluster without proper logic will never give you any real benefit. To make this work correctly, your master needs to handle all of the incoming requests and distribute them to the workers to process. The workers then send the results back to the master, which handles the rest.
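Here is a minimal sketch of that pattern, where the master owns the HTTP server and round-robins the heavy rendering work to workers over IPC (the message shape, job IDs, and placeholder render are just illustrative):

var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {
    var workers = [];
    for (var i = 0; i < 2; i++) {
        workers.push(cluster.fork());
    }

    var pending = {};
    var jobId = 0;
    var next = 0;

    workers.forEach(function(worker) {
        worker.on('message', function(msg) {
            // A worker finished rendering: flush the buffered response.
            var res = pending[msg.id];
            if (res) {
                res.writeHead(200, { 'Content-Type': 'text/html' });
                res.end(msg.html);
                delete pending[msg.id];
            }
        });
    });

    http.createServer(function(req, res) {
        var id = jobId++;
        pending[id] = res;
        // Round-robin the CPU-heavy work across workers.
        workers[next++ % workers.length].send({ id: id, url: req.url });
    }).listen(3000);
} else {
    process.on('message', function(msg) {
        // Stand-in for the expensive Swig render; the CPU work now happens
        // in a worker instead of blocking the master's event loop.
        var html = '<html>rendered for ' + msg.url + '</html>';
        process.send({ id: msg.id, html: html });
    });
}

This way the master stays free to accept new connections while renders run in parallel across the workers.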