System information (AWS EC2 m4.large instance behind Elastic Beanstalk):
Region: us-west-1
Memory: 8GB
CPU: 2 cores / 2.4GHz
PHP Version: 7.0.22 (ZTS) with FPM
Nginx Version: 1.10.2
There is an API used by web, mobile, and other clients. Each endpoint makes database requests and uses a cache (APCu or Redis).
Apache
Apache served ~40 requests per second. Latency was ~500-1200ms, depending on the API endpoint.
Nginx
We then decided to move to Nginx, but ran into strange behavior: throughput dropped to ~20 requests per second, and latency keeps increasing over the course of a test (e.g. a test starts at 300ms and ends at >31000ms).
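As a rough way to compare the two setups, Little's law (requests in flight = throughput × latency) can be applied to the figures above. This is only a sketch: the law assumes a steady state, which the Nginx run never really reaches, and the helper name `in_flight` is made up for illustration. The value 75 is `pm.max_children` from the pool config shown further down.

```python
# Little's law: average requests in flight = throughput (req/s) * latency (s).
# Throughput and latency figures are the ones quoted above; the steady-state
# assumption behind the law is only approximately true for these tests.

FPM_MAX_CHILDREN = 75  # pm.max_children from www.conf


def in_flight(throughput_rps: float, latency_s: float) -> float:
    """Estimate how many requests are inside the system at once."""
    return throughput_rps * latency_s


estimates = {
    "apache, fast endpoint": in_flight(40, 0.5),
    "apache, slow endpoint": in_flight(40, 1.2),
    "nginx, start of test": in_flight(20, 0.3),
    "nginx, end of test": in_flight(20, 31.0),
}

for label, value in estimates.items():
    note = "exceeds pm.max_children" if value > FPM_MAX_CHILDREN else "fits in the pool"
    print(f"{label}: ~{value:.0f} in flight ({note})")
```

If the estimate is anywhere near right, by the end of the Nginx run there are far more requests waiting than FPM has workers, which would point to a queue building up somewhere between Nginx and PHP-FPM rather than to a slow individual request.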
/etc/nginx/nginx.conf:
    user webapp;
    pid /var/run/nginx.pid;
    worker_processes auto;
    worker_rlimit_nofile 10000;
    error_log /var/log/nginx/error.log;
    include /usr/share/nginx/modules/*.conf;

    events {
        worker_connections 4096;
        use epoll;
        multi_accept on;
    }

    http {
        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
        access_log /var/log/nginx/access.log main;

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;

        fastcgi_buffers 8 16k;
        fastcgi_buffer_size 32k;
        fastcgi_connect_timeout 60;
        fastcgi_send_timeout 300;
        fastcgi_read_timeout 300;

        charset utf-8;
        client_max_body_size 50m;

        gzip on;
        gzip_vary on;
        gzip_min_length 10240;
        gzip_proxied expired no-cache no-store private auth;
        gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml application/json;
        gzip_disable "MSIE [1-6]\.";

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        upstream php {
            server 127.0.0.1:9000;
        }

        include /etc/nginx/conf.d/*.conf;
        index index.html index.htm;
    }
/fpm/pools/www.conf:
    [www]
    user = webapp
    group = webapp
    listen = 127.0.0.1:9000
    pm = dynamic
    pm.max_children = 75
    pm.start_servers = 30
    pm.min_spare_servers = 30
    pm.max_spare_servers = 35
    pm.max_requests = 500
... the rest is default
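The pool values can be sanity-checked with a short script. The ordering rules below are, as far as I know, the ones PHP-FPM enforces for `pm = dynamic`. The per-worker memory figure is purely hypothetical (I have not measured it) and is there only to show how a worst-case memory ceiling would be worked out against the 8GB on the instance. The function names are made up for illustration.

```python
# Values copied from the [www] pool above.
pool = {
    "max_children": 75,
    "start_servers": 30,
    "min_spare_servers": 30,
    "max_spare_servers": 35,
}

TOTAL_RAM_MB = 8 * 1024   # 8GB instance memory
ASSUMED_WORKER_MB = 60    # HYPOTHETICAL per-worker memory, not a measurement


def check_dynamic_pool(p: dict) -> list:
    """Return a list of ordering problems for a pm = dynamic pool."""
    problems = []
    if not p["min_spare_servers"] <= p["start_servers"] <= p["max_spare_servers"]:
        problems.append("start_servers must lie between min and max spare servers")
    if p["max_spare_servers"] > p["max_children"]:
        problems.append("max_spare_servers must not exceed max_children")
    return problems


def worst_case_memory_mb(max_children: int, per_worker_mb: int) -> int:
    """Memory used if every allowed worker is alive at the same time."""
    return max_children * per_worker_mb


print("ordering problems:", check_dynamic_pool(pool) or "none")
ceiling = worst_case_memory_mb(pool["max_children"], ASSUMED_WORKER_MB)
print(f"worst case: {ceiling} MB of {TOTAL_RAM_MB} MB")
```

With a real per-worker figure substituted for the assumed one, this shows whether `pm.max_children = 75` leaves any room to raise the limit.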
Performance is measured with Apache JMeter, using custom scenarios.
Tests are run from the same region (another EC2 instance).
cURL stats:
    lookup:        0.125
    connect:       0.125
    appconnect:    0.221
    pretransfer:   0.221
    redirect:      0.137
    starttransfer: 0.252
    total:         0.389
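To make the cURL numbers easier to read, they can be turned into per-phase durations. This sketch assumes the values are curl's cumulative `--write-out` timers in seconds, each measured from the start of the request. The phase names are my own labels, and `redirect` is left out because curl reports it separately from this chain.

```python
# Cumulative timers from the cURL run above, assumed to be in seconds.
curl_timings = {
    "lookup": 0.125,
    "connect": 0.125,
    "appconnect": 0.221,
    "pretransfer": 0.221,
    "starttransfer": 0.252,
    "total": 0.389,
}

t = curl_timings

# Each phase is the gap between two consecutive cumulative timers.
phases = {
    "dns": t["lookup"],
    "tcp_connect": t["connect"] - t["lookup"],
    "tls_handshake": t["appconnect"] - t["connect"],
    "request_setup": t["pretransfer"] - t["appconnect"],
    "wait_for_first_byte": t["starttransfer"] - t["pretransfer"],
    "download": t["total"] - t["starttransfer"],
}

for name, seconds in phases.items():
    print(f"{name:>20}: {seconds * 1000:6.0f} ms")
```

Read this way, a single request spends most of its time on DNS, the TLS handshake, and the body transfer, and only about 31 ms waiting for the first byte from the server, so one request on its own is fast.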
tcptraceroute also looks fine (1ms).
Please advise! I cannot find the cause of the problem myself. Thanks!