Nginx + PHP-FPM scaling issues on AWS EC2 instance - php

We are performing a load test with Locust (1,000 users) against a page of our application.
Instance type: t3a.medium
The instance runs behind a load balancer, and we use an RDS Aurora database that peaks at around 70% CPU utilization. The EC2 instance metrics are healthy. EDIT: instance memory consumption stays within 800 MB of the available 4 GB.
We see multiple 502 Bad Gateway errors, and occasionally 500 and 520 errors as well.
Error 1:
2020/10/08 16:58:21 [error] 4344#4344: *41841 connect() to unix:/var/run/php/php7.2-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: <PublicIP>, server: <Domain name>, request: "GET <webpage> HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.2-fpm.sock:", host: "<Domain name>"
Error 2 (alert):
2020/10/08 19:15:11 [alert] 9109#9109: *105735 socket() failed (24: Too many open files) while connecting to upstream, client: <PublicIP>, server: <Domain name>, request: "GET <webpage> HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.2-fpm.sock:", host: "<Domain name>"
The relevant configuration files are listed below:
Nginx Configuration
server {
listen 80;
listen [::]:80;
root /var/www/####;
index index.php;
access_log /var/log/nginx/###access.log;
error_log /var/log/nginx/####error.log ;
server_name #####;
client_max_body_size 100M;
autoindex off;
location / {
try_files $uri $uri/ /index.php?$query_string;
}
location ~ \.php$ {
include fastcgi_params;
fastcgi_intercept_errors on;
fastcgi_index index.php;
fastcgi_pass unix:/var/run/php/php7.2-fpm.sock;
fastcgi_param SCRIPT_FILENAME $document_root/$fastcgi_script_name;
}
}
/etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 8096;
multi_accept on;
use epoll;
epoll_events 512;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
gzip on;
gzip_comp_level 2;
gzip_min_length 1000;
gzip_types text/xml text/css;
gzip_http_version 1.1;
gzip_vary on;
gzip_disable "MSIE [4-6] \.";
include /etc/nginx/conf.d/*.conf;
}
/etc/php/7.2/fpm/php-fpm.conf
emergency_restart_threshold = 10
emergency_restart_interval = 1m
process_control_timeout = 10s
Important php-fpm pool parameters:
user = www-data
group = www-data
listen = /run/php/php7.2-fpm.sock
listen.owner = www-data
listen.group = www-data
;listen.mode = 0660
pm = static
pm.max_children = 300
/etc/security/limits.conf
nginx soft nofile 30000
nginx hard nofile 50000
/etc/sysctl.conf
net.nf_conntrack_max = 131072
net.core.somaxconn = 131072
net.core.netdev_max_backlog = 65535
kernel.msgmnb = 131072
kernel.msgmax = 131072
fs.file-max = 131072
What are we missing? Can anyone please point us in the right direction?

We were able to resolve this issue. The problem was that php-fpm did not have access to enough system resources (mainly file descriptors). You may need to change the values according to your hardware specifications.
Our final configuration looks like this:
In /etc/security/limits.conf, add the following lines:
nginx soft nofile 10000
nginx hard nofile 30000
root soft nofile 10000
root hard nofile 30000
www-data soft nofile 10000
www-data hard nofile 30000
In /etc/sysctl.conf, add the following values:
net.nf_conntrack_max = 231072
net.core.somaxconn = 231072
net.core.netdev_max_backlog = 65535
kernel.msgmnb = 231072
kernel.msgmax = 231072
fs.file-max = 70000
In /etc/nginx/nginx.conf, change or add directives so that it ends up with these values (adjust them to your use case and server capacity):
worker_processes auto;
worker_rlimit_nofile 30000;
events {
worker_connections 8096;
multi_accept on;
use epoll;
epoll_events 512;
}
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
gzip on;
gzip_comp_level 2;
gzip_min_length 1000;
gzip_types text/xml text/css;
gzip_http_version 1.1;
gzip_vary on;
gzip_disable "MSIE [4-6] \.";
In /etc/php/7.2/fpm/php-fpm.conf, change the values to look like this:
emergency_restart_threshold = 10
emergency_restart_interval = 1m
process_control_timeout = 10s
rlimit_files = 10000
In /etc/php/7.2/fpm/pool.d/www.conf, change the values to look like this:
user = www-data
group = www-data
listen.backlog = 4096
listen.owner = www-data
listen.group = www-data
;listen.mode = 0660
pm = static
pm.max_children = 1000
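The fix above raises pm.max_children from 300 to 1000 with pm = static; before copying that number, it is worth sizing it from memory rather than guessing, since with pm = static all children are forked up front. A rough sketch of the usual rule (both inputs below are assumptions; measure your real per-child RSS with ps):

```shell
# Sketch: size pm.max_children from available RAM. Both numbers are assumptions.
# Measure real per-child usage with something like:
#   ps -o rss= -C php-fpm7.2 | awk '{s+=$1; n++} END {print s/n/1024 " MB avg"}'
avail_mb=3200      # RAM you can spare for PHP on a 4 GB t3a.medium (assumed)
per_child_mb=40    # average php-fpm child RSS in MB (assumed)
max_children=$(( avail_mb / per_child_mb ))
echo "pm.max_children ~ $max_children"
```

If the computed number is far below what you set statically, the pool can drive the box into swap under full load.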

Related

nginx + php-fpm70 getting 502 bad gateway

I'm getting a 502 Bad Gateway error on one of my Magento 2 sites.
Here are my configuration files.
nginx.conf configuration file
user nginx;
worker_processes 4;
worker_rlimit_nofile 100000;
pid /var/run/nginx.pid;
events {
use epoll;
# worker_connections 1024;
worker_connections 10240;
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
access_log off;
error_log /var/log/nginx/error.log warn;
rewrite_log on;
access_log /var/log/nginx/access.log main buffer=32k flush=300;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
autoindex off;
server_tokens off;
port_in_redirect off;
open_file_cache max=10000 inactive=5m;
open_file_cache_valid 2m;
open_file_cache_min_uses 1;
open_file_cache_errors on;
types_hash_max_size 4096;
client_header_buffer_size 16k;
large_client_header_buffers 4 32k;
fastcgi_send_timeout 3600;
fastcgi_read_timeout 3600;
fastcgi_buffers 8 256k;
fastcgi_buffer_size 256k;
fastcgi_connect_timeout 3600;
############
client_max_body_size 1024M;
client_body_buffer_size 128k;
server_names_hash_max_size 1024;
client_body_timeout 300;
client_header_timeout 300;
keepalive_timeout 600;
keepalive_requests 100000;
send_timeout 60;
server_names_hash_bucket_size 128;
gzip on;
gzip_comp_level 6;
gzip_http_version 1.0;
gzip_proxied any;
gzip_min_length 1100;
gzip_buffers 16 8k;
gzip_types any;
gzip_types text/plain text/css application/octet-stream application/json application/x-javascript application/javascript text/xml application/xml application/xml+rss text/javascript text/x-javascript font/ttf application/font-woff font/opentype application/vnd.ms-fontobject image/svg+xml;
gzip_disable "msie6";
gzip_vary on;
include /etc/nginx/conf.d/*.conf;
}
php-fpm configuration file
[domain.com.com]
listen = /var/run/php/domain.com-fpm.sock
listen.allowed_clients = 127.0.0.1
listen.owner = nginx
listen.group = nginx
user = domain_live
group = domain_live
; Choose how the process manager will control the number of child processes.
pm = dynamic
pm.max_children = 150
pm.start_servers = 60
pm.min_spare_servers = 50
pm.max_spare_servers = 90
pm.max_requests = 500
pm.status_path = /status
slowlog = /var/www/www.domain.com.com/logs/php-fpm-www-slow.log
rlimit_core = unlimited
php_admin_value[error_log] = /var/www/www.domain.com.com/logs/php-fpm-www-error.log
php_admin_flag[log_errors] = on
; Set session path to a directory owned by process user
php_value[session.save_handler] = files
php_value[session.save_path] = /var/www/www.domain.com.com/application/var/session
php_value[soap.wsdl_cache_dir] = /var/lib/php/wsdlcache
I am getting the error logs below.
php-fpm error logs:
WARNING: [pool domain.com] child 4314 exited on signal 11 (SIGSEGV - core dumped) after 4556.304032 seconds from start
NOTICE: [pool domain.com] child 6422 started
nginx error logs:
2018/05/18 13:28:02 [error] 4247#4247: *1460 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: www.domain.com, request: "GET /checkout/cart/ HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/domain.com-fpm.sock:", host: "www.domain.com.com", referrer: "https://www.domain.com.com/customer/account/login/"
Please help me resolve this issue. I've already tried multiple approaches, but without success.

Optimize Nginx + PHP-FPM for 5 million daily pageviews

We run a few high-volume websites which together generate around 5 million pageviews per day. We have fairly overkill servers because we anticipate growth, but a few active users report that the site is sometimes slow on the first pageview. I've seen this myself every once in a while: the first pageview takes 3-5 seconds, then it's instant for the rest of the day. This has happened to me maybe twice in the last 24 hours, so not often enough to figure out what's happening. Every page on our site uses PHP, but one of the times it happened to me it was on a PHP page without any database calls, which makes me think the issue is limited to NGINX, PHP-FPM, or network settings.
We have 3 NGINX servers running behind a load balancer. Our database is on a separate cluster. I've included our configuration files for nginx and php-fpm, as well as our current RAM usage and PHP-FPM status, taken in the middle of the day (average traffic for us). Please take a look and let me know if you see any red flags in my setup or have suggestions to optimize further.
Specs for each NGINX Server:
OS: CentOS 7
RAM: 128GB
CPU: 32 cores (2.4Ghz each)
Drives: 2xSSD on RAID 1
RAM Usage (free -g)
total used free shared buff/cache available
Mem: 125 15 10 3 100 103
Swap: 15 0 15
PHP-FPM status (IE: http://server1_ip/status)
pool: www
process manager: dynamic
start time: 03/Mar/2016:03:42:49 -0800
start since: 1171262
accepted conn: 69827961
listen queue: 0
max listen queue: 0
listen queue len: 0
idle processes: 1670
active processes: 1
total processes: 1671
max active processes: 440
max children reached: 0
slow requests: 0
php-fpm config file:
[www]
user = nginx
group = nginx
listen = /var/opt/remi/php70/run/php-fpm/php-fpm.sock
listen.owner = nginx
listen.group = nginx
listen.mode = 0660
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 6000
pm.start_servers = 1600
pm.min_spare_servers = 1500
pm.max_spare_servers = 2000
pm.max_requests = 1000
pm.status_path = /status
slowlog = /var/opt/remi/php70/log/php-fpm/www-slow.log
php_admin_value[error_log] = /var/opt/remi/php70/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
php_value[session.save_handler] = files
php_value[session.save_path] = /var/opt/remi/php70/lib/php/session
php_value[soap.wsdl_cache_dir] = /var/opt/remi/php70/lib/php/wsdlcache
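As a quick sanity check on those pool numbers: with pm = dynamic and pm.max_spare_servers = 2000, idle children alone can pin a significant amount of RAM (the status output above shows 1670 idle processes). A hedged back-of-envelope, where the per-child RSS is an assumption:

```shell
# Estimate RAM pinned by spare php-fpm children. The per-child RSS is assumed;
# measure the real value with: ps -o rss= -C php-fpm | awk '{s+=$1} END {print s/1024 " MB"}'
max_spare=2000      # pm.max_spare_servers from the pool config above
per_child_mb=30     # assumed average RSS per idle child, in MB
spare_ram_gb=$(( max_spare * per_child_mb / 1024 ))
echo "up to ~${spare_ram_gb} GB held by idle children"
```

On a 128 GB box this is affordable, but it is worth knowing the cost is there.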
nginx config file:
user nginx;
worker_processes 32;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
events {
worker_connections 1000;
multi_accept on;
use epoll;
}
http {
log_format main '$remote_addr - $remote_user [$time_iso8601] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 10 10;
send_timeout 60;
types_hash_max_size 2048;
client_max_body_size 50M;
client_body_buffer_size 5m;
client_body_timeout 60;
client_header_timeout 60;
fastcgi_buffers 256 16k;
fastcgi_buffer_size 128k;
fastcgi_connect_timeout 60s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
reset_timedout_connection on;
server_names_hash_bucket_size 100;
#compression
gzip on;
gzip_vary on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/javascript application/xml;
gzip_disable "MSIE [1-6]\.";
include /etc/nginx/mime.types;
default_type application/octet-stream;
# Load modular configuration files from the /etc/nginx/conf.d directory.
# See http://nginx.org/en/docs/ngx_core_module.html#include
# for more information.
include /etc/nginx/conf.d/*.conf;
server {
listen 80 default_server;
listen [::]:80 default_server;
server_name domain1.com;
root /folderpath;
location / {
index index.php;
}
location = /favicon.ico { access_log off; log_not_found off; }
location = /robots.txt { access_log off; log_not_found off; }
#server status
location /server-status {
stub_status on;
access_log off;
auth_basic "Restricted";
auth_basic_user_file /etc/nginx/.htpasswd;
}
location = /status {
access_log off;
allow 127.0.0.1;
auth_basic "Restricted";
auth_basic_user_file /etc/nginx/.htpasswd;
fastcgi_pass unix:/var/opt/remi/php70/run/php-fpm/php-fpm.sock;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
location ~ \.php$ {
try_files $uri =404;
fastcgi_pass unix:/var/opt/remi/php70/run/php-fpm/php-fpm.sock;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
UPDATE:
I installed opcache as per the suggestion below. Not sure if it fixes the issue. Here are my settings:
opcache.enable=1
opcache.memory_consumption=1024
opcache.interned_strings_buffer=64
opcache.max_accelerated_files=32531
opcache.max_wasted_percentage=10
Two minor tips:
If you use opcache, monitor it to check that its configuration (especially memory size) is OK and to avoid an OOM reset; you can use https://github.com/rlerdorf/opcache-status (a single PHP page).
Increase pm.max_requests to keep reusing the same processes.
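On the first tip, it also helps to confirm that opcache.max_accelerated_files (32531 in the settings above) actually exceeds the number of PHP files being served. A small sketch; the docroot path here is a placeholder, not taken from the question:

```shell
# Count PHP files under a docroot to sanity-check opcache.max_accelerated_files.
# DOCROOT is a placeholder; point it at your real document root.
DOCROOT="${DOCROOT:-/var/www/html}"
php_files=$(find "$DOCROOT" -name '*.php' 2>/dev/null | wc -l)
echo "PHP files: $php_files (opcache.max_accelerated_files should exceed this)"
```

If the count is higher than the setting, opcache starts evicting scripts and recompiling, which shows up as exactly the kind of occasional slow first hit described here.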

502 Bad Gateway: nginx, php5-fpm, 175/209 connect() failed (111: Connection refused) while connecting to upstream

Running Shopware 5 on a Debian Jessie machine with nginx and php5-fpm, we very often get a 502 Bad Gateway. This happens mostly in the backend when longer operations are running, such as thumbnail creation, even though this work is done in small chunks of single ajax requests.
The server, with 64 GB RAM and 16 cores, is mostly idle, because there is no real traffic on it. We currently use it as a staging system until we have fixed all errors like this one.
Error log:
In the nginx error log, the following lines can then be found:
[error] 20524#0: *175 connect() failed (111: Connection refused) while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "POST /backend/MediaManager/createThumbnails HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
[error] 20524#0: *175 no live upstreams while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "POST /backend/Log/createLog HTTP/1.1", upstream: "fastcgi://php-fpm", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
[error] 20524#0: *175 connect() failed (111: Connection refused) while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "GET /backend/login/getLoginStatus?_dc=1457014588680 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
[error] 20522#0: *209 connect() failed (111: Connection refused) while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "GET /backend/login/getLoginStatus?_dc=1457014618682 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
It may be notable that at first a lot of "*175 connect" errors occur, and then finally a "*209 connect".
Config files:
I'll try to post only the significant lines related to this topic and will leave out the lines that are commented out.
php-fpm:
/etc/php5-fpm/pool.d/www.conf:
[www]
user = www-data
group = www-data
listen = /var/run/php5-fpm.sock
listen.owner = www-data
listen.group = www-data
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
nginx:
/etc/nginx/nginx.conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
events {
worker_connections 768;
multi_accept on;
}
http {
## MIME types.
include /etc/nginx/mime.types;
default_type application/octet-stream;
## Default log and error files.
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
## Use sendfile() syscall to speed up I/O operations and speed up
## static file serving.
sendfile on;
## Handling of IPs in proxied and load balancing situations.
# set_real_ip_from 192.168.1.0/24; # set to your proxies ip or range
# real_ip_header X-Forwarded-For;
## Timeouts.
client_body_timeout 60;
client_header_timeout 60;
keepalive_timeout 10 10;
send_timeout 60;
## Reset lingering timed out connections. Deflect DDoS.
reset_timedout_connection on;
## Body size.
client_max_body_size 10m;
## TCP options.
tcp_nodelay on;
## Optimization of socket handling when using sendfile.
tcp_nopush on;
## Compression.
gzip on;
gzip_buffers 16 8k;
gzip_comp_level 1;
gzip_http_version 1.1;
gzip_min_length 10;
gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript image/x-icon application/vnd.ms-fontobject font/opentype application/x-font-ttf;
gzip_vary on;
gzip_proxied any; # Compression for all requests.
gzip_disable "msie6";
## Hide the Nginx version number.
server_tokens off;
## Upstream to abstract backend connection(s) for PHP.
upstream php-fpm {
server unix:/var/run/php5-fpm.sock;
# server 127.0.0.1:9000;
## Create a backend connection cache.
keepalive 32;
}
## Include additional configs
include /etc/nginx/conf.d/*.conf;
## Include all vhosts.
include /etc/nginx/sites-enabled/*;
}
/etc/nginx/sites-available/site.conf:
server {
listen 80;
listen 443 ssl;
server_name xxxxxxxx.com;
root /var/www/shopware;
## Access and error logs.
access_log /var/log/nginx/xxxxxxxx.com.access.log;
error_log /var/log/nginx/xxxxxxxx.com.error.log;
## leaving out lots of shopware/mediafiles-related settings
## ....
## continue:
location ~ \.php$ {
try_files $uri $uri/ =404;
## NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
fastcgi_split_path_info ^(.+\.php)(/.+)$;
## required for upstream keepalive
# disabled due to failed connections
#fastcgi_keep_conn on;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param SHOPWARE_ENV $shopware_env if_not_empty;
fastcgi_param ENV $shopware_env if_not_empty; # BC for older SW versions
fastcgi_buffers 8 16k;
fastcgi_buffer_size 32k;
client_max_body_size 24M;
client_body_buffer_size 128k;
## upstream "php-fpm" must be configured in http context
fastcgi_pass php-fpm;
}
}
What to do now? Please let me know if I should provide further information for this question.
Update
After applying the nginx and fpm settings from #peixotorms, the errors in the nginx logs changed to:
30 upstream timed out (110: Connection timed out) while reading response header from upstream
But the issue itself isn't solved. It has just taken another form...
It might sound strange, but your problem is most probably due to the fact that you're running PHP on a unix socket instead of a TCP port. You will start seeing 502 errors (and others) when you have around 300 concurrent requests (sometimes fewer) to php on a socket configuration.
Also, your pm.max_children is way too low, unless you want to limit your server to around 5 simultaneous php requests maximum: http://php.net/manual/en/install.fpm.configuration.php
Configure it this way and those errors should go away.
For your nginx.conf, change the following values:
worker_processes 4;
worker_rlimit_nofile 750000;
# handles connection stuff
events {
worker_connections 50000;
multi_accept on;
use epoll;
}
upstream php-fpm {
keepalive 30;
server 127.0.0.1:9001;
}
Your /etc/php5-fpm/pool.d/www.conf
(use these settings because you have plenty of RAM and CPU):
[www]
user = www-data
group = www-data
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
listen = 127.0.0.1:9001
listen.allowed_clients = 127.0.0.1
listen.backlog = 65000
pm = dynamic
pm.max_children = 1024
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 16
pm.max_requests = 10000
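As a sanity check on the pool values above: php-fpm validates the dynamic-pm knobs at startup and requires min_spare_servers <= start_servers <= max_spare_servers <= max_children, refusing to start otherwise. A quick sketch checking the suggested numbers:

```shell
# Check the php-fpm dynamic-pm ordering invariant for the pool values above.
max_children=1024; start_servers=8; min_spare=4; max_spare=16
if [ "$min_spare" -le "$start_servers" ] \
   && [ "$start_servers" -le "$max_spare" ] \
   && [ "$max_spare" -le "$max_children" ]; then
  echo "pm settings consistent"
else
  echo "pm settings invalid: php-fpm will refuse to start" >&2
fi
```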
Also add this to your location ~ \.php$ { block:
location ~ \.php$ {
try_files $uri $uri/ =404;
## NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
fastcgi_split_path_info ^(.+\.php)(/.+)$;
## required for upstream keepalive
# disabled due to failed connections
#fastcgi_keep_conn on;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param SHOPWARE_ENV $shopware_env if_not_empty;
fastcgi_param ENV $shopware_env if_not_empty; # BC for older SW versions
fastcgi_keep_conn on;
fastcgi_connect_timeout 20s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;
fastcgi_pass php-fpm;
}
EDIT:
Change the values below in your /etc/php5/fpm/php.ini file, then restart:
safe_mode = Off
output_buffering = Off
zlib.output_compression = Off
max_execution_time = 900
max_input_time = 900
memory_limit = 2048M
post_max_size = 120M
file_uploads = On
upload_max_filesize = 120M
Try binding to 0.0.0.0:9000:
listen = 0.0.0.0:9000

nginx: connection timeout to php5-fpm

I've moved my server from apache2+fcgi to nginx+fpm because I wanted a lighter environment; Apache's memory footprint was high. The server is a dual core (I know, not very much) with 8 GB of RAM. It also runs a rather busy FreeRADIUS server and its related MySQL. The CPU load average is ~1, with some obvious peaks.
One of those peaks happens every 30 minutes, when I get web pings from some controlled devices. With Apache the server load spiked a lot, slowing everything down. Now with nginx the process is much faster (I also did some optimization in the code), though now I miss some of these connections. I configured both nginx and fpm to what I believe should be enough, but I must be missing something, because at these moments php is (apparently) unable to reply to nginx. This is a recap of the config:
nginx/1.8.1
user www-data;
worker_processes auto;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
# multi_accept on;
}
client_body_buffer_size 10K;
client_header_buffer_size 1k;
client_max_body_size 20m;
large_client_header_buffers 2 1k;
location ~ \.php$ {
fastcgi_split_path_info ^(.+\.php)(.*)$;
set $fsn /$yii_bootstrap;
if (-f $document_root$fastcgi_script_name){
set $fsn $fastcgi_script_name;
}
fastcgi_pass 127.0.0.1:9011;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fsn;
fastcgi_param PATH_INFO $fastcgi_path_info;
fastcgi_param PATH_TRANSLATED $document_root$fsn;
fastcgi_read_timeout 150s;
}
php5-fpm 5.4.45-1~dotdeb+6.1
[pool01]
listen = 127.0.0.1:9011
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 150
pm.start_servers = 2
pm.min_spare_servers = 2
pm.max_spare_servers = 8
pm.max_requests = 2000
pm.process_idle_timeout = 10s
When the peak arrives I start seeing this in fpm logs:
[18-Feb-2016 11:30:04] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 13 total children
[18-Feb-2016 11:30:05] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 15 total children
[18-Feb-2016 11:30:06] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 17 total children
[18-Feb-2016 11:30:07] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children
and worse in nginx's error.log
2016/02/18 11:30:22 [error] 23400#23400: *209920 connect() failed (110: Connection timed out) while connecting to upstream, client: 79.1.1.9, server: host.domain.com, request: "GET /ping/?whoami=abc02 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209923 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.1.9.71, server: host.domain.com, request: "GET /utilz/pingme.php?whoami=abc01 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209925 connect() failed (110: Connection timed out) while connecting to upstream, client: 3.7.0.4, server: host.domain.com, request: "GET /ping/?whoami=abc03 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209926 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.7.2.1, server: host.domain.com, request: "GET /ping/?whoami=abc04 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
Those connections are lost!
First question: why does nginx return a timeout within 22s (the pings are made at minutes 00 and 30 of every hour) if fastcgi_read_timeout is set to 150s?
Second question: why do I get so many fpm warnings? The total children displayed never reaches pm.max_children. I know warnings are not errors, but I get warned... Is there a relation between those messages and nginx's timeouts?
Given that the server handles regular traffic perfectly fine, and has no problem with RAM or swap even during these peaks (it always has ~1.5 GB or more free), is there better tuning to handle those ping connections (that doesn't involve changing the schedule)? Should I raise pm.start_servers and/or pm.min_spare_servers?
You need some changes, and I would also recommend upgrading your PHP to 5.6.
Nginx tuning: /etc/nginx/nginx.conf
user www-data;
pid /var/run/nginx.pid;
error_log /var/log/nginx/error.log crit;
# NOTE: Max simultaneous requests = worker_processes*worker_connections/(keepalive_timeout*2)
worker_processes 1;
worker_rlimit_nofile 750000;
# handles connection stuff
events {
worker_connections 50000;
multi_accept on;
use epoll;
}
# http request stuff
http {
access_log off;
log_format main '$remote_addr $host $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $ssl_cipher $request_time';
types_hash_max_size 2048;
server_tokens off;
fastcgi_read_timeout 180;
keepalive_timeout 20;
keepalive_requests 1000;
reset_timedout_connection on;
client_body_timeout 20;
client_header_timeout 10;
send_timeout 10;
tcp_nodelay on;
tcp_nopush on;
sendfile on;
directio 100m;
client_max_body_size 100m;
server_names_hash_bucket_size 100;
include /etc/nginx/mime.types;
default_type application/octet-stream;
# index default files
index index.html index.htm index.php;
# use hhvm with php-fpm as backup
upstream php {
keepalive 30;
server 127.0.0.1:9001; # php5-fpm (check your port)
}
# Virtual Host Configs
include /etc/nginx/sites-available/*;
}
For a default server, create and add to /etc/nginx/sites-available/default.conf
# default virtual host
server {
listen 80;
server_name localhost;
root /path/to/your/files;
access_log off;
log_not_found off;
# handle static files first
location / {
index index.html index.htm index.php ;
}
# serve static content directly by nginx without logs
location ~* \.(jpg|jpeg|gif|png|bmp|css|js|ico|txt|pdf|swf|flv|mp4|mp3)$ {
access_log off;
log_not_found off;
expires 7d;
# Enable gzip for some static content only
gzip on;
gzip_comp_level 6;
gzip_vary on;
gzip_types text/plain text/css application/json application/x-javascript application/javascript text/javascript image/svg+xml application/vnd.ms-fontobject application/x-font-ttf font/opentype;
}
# no cache for xml files
location ~* \.(xml)$ {
access_log off;
log_not_found off;
expires 0s;
add_header Pragma no-cache;
add_header Cache-Control "no-cache, no-store, must-revalidate, post-check=0, pre-check=0";
gzip on;
gzip_comp_level 6;
gzip_vary on;
gzip_types text/plain text/xml application/xml application/rss+xml;
}
# run php only when needed
location ~ .php$ {
# basic php params
fastcgi_pass php;
fastcgi_index index.php;
fastcgi_keep_conn on;
fastcgi_connect_timeout 20s;
fastcgi_send_timeout 30s;
fastcgi_read_timeout 30s;
# fast cgi params
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param SCRIPT_NAME $fastcgi_script_name;
fastcgi_param QUERY_STRING $query_string;
fastcgi_param REQUEST_METHOD $request_method;
fastcgi_param CONTENT_TYPE $content_type;
fastcgi_param CONTENT_LENGTH $content_length;
fastcgi_param REQUEST_URI $request_uri;
fastcgi_param DOCUMENT_URI $document_uri;
fastcgi_param DOCUMENT_ROOT $document_root;
fastcgi_param REMOTE_ADDR $remote_addr;
fastcgi_param REMOTE_PORT $remote_port;
}
}
Ideally, you want php5-fpm to restart automatically if it starts failing, so you can add this to /etc/php5/fpm/php-fpm.conf:
emergency_restart_threshold = 60
emergency_restart_interval = 1m
process_control_timeout = 10s
Change /etc/php5/fpm/pool.d/www.conf
[www]
user = www-data
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
listen = 127.0.0.1:9001
listen.allowed_clients = 127.0.0.1
listen.backlog = 65000
pm = dynamic
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 16
; max number of simultaneous requests that will be served (if each php page needs 32 MB, then 128x32 = 4 GB RAM)
pm.max_children = 128
; We want to keep it high (10k to 50k) to prevent constant respawning; however, if there is a memory leak in the PHP code we will have a problem.
pm.max_requests = 10000
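The "seems busy ... you may need to increase pm.start_servers" warnings in the question go away when enough children already exist before the spike arrives. For reference, php-fpm's documented default for pm.start_servers is min_spare_servers + (max_spare_servers - min_spare_servers) / 2; a sketch applying that formula to the pool above:

```shell
# php-fpm's default pm.start_servers formula, applied to the pool above.
min_spare=4; max_spare=16
start_servers=$(( min_spare + (max_spare - min_spare) / 2 ))
echo "default pm.start_servers would be $start_servers"
```

For a predictable spike every 30 minutes, raising min_spare_servers (and with it start_servers) is usually more effective than raising max_children, since the bottleneck is spawn latency, not capacity.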

Nginx + PHP-FPM Using CPU and Not RAM

I am doing load testing on an Nginx server and I'm running into an issue where the CPU hits 100% but only 50% of the RAM is being utilized. The server:
2 vCPU
2 GB of RAM
40GB SSD Drive
Rackspace High Performance Server
This is my Nginx Config
worker_processes 2;
error_log /var/log/nginx/error.log crit;
pid /var/run/nginx.pid;
events {
worker_connections 1524;
use epoll;
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
#access_log /var/log/nginx/access.log main;
access_log off;
# Sendfile copies data between one FD and other from within the kernel.
# More efficient than read() + write(), since that requires transferring data to and from user space.
sendfile on;
# Tcp_nopush causes nginx to attempt to send its HTTP response head in one packet,
# instead of using partial frames. This is useful for prepending headers before calling sendfile,
# or for throughput optimization.
tcp_nopush on;
# don't buffer data-sends (disable Nagle algorithm). Good for sending frequent small bursts of data in real time.
tcp_nodelay on;
# allow the server to close the connection after a client stops responding. Frees up socket-associated memory.
reset_timedout_connection on;
#keepalive_timeout 0;
keepalive_timeout 65;
# send the client a "request timed out" if the body is not loaded by this time. Default 60.
client_body_timeout 10;
# If the client stops reading data, free up the stale client connection after this much time. Default 60.
send_timeout 2;
open_file_cache max=200000 inactive=20s;
open_file_cache_valid 30s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
gzip on;
server_tokens off;
client_max_body_size 20m;
client_body_buffer_size 128k;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
gzip_disable "MSIE [1-6]\.";
# Load config files from the /etc/nginx/conf.d directory
# The default server is in conf.d/default.conf
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=microcache:10m max_size=1000m inactive=60m;
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
Kernel additions in /etc/sysctl.conf
# Increase system IP port limits to allow for more connections
net.ipv4.ip_local_port_range = 2000 65000
net.ipv4.tcp_window_scaling = 1
# number of packets to keep in backlog before the kernel starts dropping them
net.ipv4.tcp_max_syn_backlog = 3240000
# increase socket listen backlog
net.core.somaxconn = 3240000
net.ipv4.tcp_max_tw_buckets = 1440000
# Increase TCP buffer sizes
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_congestion_control = cubic
Example VHost Config with PHP-FPM
server {
listen 80;
server_name www.example.com;
location / {
root /data/sites/example.com/public_html;
index index.php index.html index.htm;
try_files $uri $uri/ /index.php?rt=$uri&$args;
}
location ~ \.php {
root /data/sites/example.com/public_html;
fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param SCRIPT_NAME $fastcgi_script_name;
fastcgi_index index.php;
fastcgi_param PATH_INFO $fastcgi_script_name;
fastcgi_param ENV production;
include fastcgi_params;
}
}
The server can handle about 60 active SBU connections clicking around, or about 300 requests per second. Is the fact that it is not fully utilizing RAM while maxing out CPU a bad thing? Can I optimize this further?
