I am running WordPress 3.9.1 on a 2 GB RAM / 2x CPU DigitalOcean VPS with nginx and php5-fpm over TCP/IP (127.0.0.1:9000) under Debian 7 x64. I am experiencing frequent 504 Gateway Timeout errors.
The nginx error log shows:
upstream timed out (110: Connection timed out) while connecting to upstream or
upstream timed out (110: Connection timed out) while reading response header from upstream
www.conf:
listen = 127.0.0.1:9000
pm = dynamic
pm.max_children = 25
pm.start_servers = 4
pm.min_spare_servers = 4
pm.max_spare_servers = 8
pm.max_requests = 200
listen_backlog = 65536
request_terminate_timeout = 30
php.ini:
max_execution_time = 30
nginx.conf:
worker_processes 2;
worker_connections 1024;
client_max_body_size 10M;
fastcgi_read_timeout 30s;
client_header_timeout 30s;
client_body_timeout 30s;
fastcgi_send_timeout 30s;
fastcgi_buffers 8 16k;
fastcgi_buffer_size 32k;
keepalive_timeout 65;
gzip on;
DigitalOcean's recommended settings for sysctl.conf are:
# Digital Ocean Recommended Settings:
net.core.wmem_max=12582912
net.core.rmem_max=12582912
net.ipv4.tcp_rmem= 10240 87380 12582912
net.ipv4.tcp_wmem= 10240 87380 12582912
The following was removed completely from php5-fpm.conf (/etc/nginx/conf.d/):
upstream php5-fpm-sock {
server unix:/var/run/php5-fpm.sock;
}
I also tried disabling my plugins, with no luck. The site becomes unresponsive, the WordPress dashboard shows a "Connection lost" message, and eventually a 504 Gateway Timeout appears.
Has anyone had a similar experience?
Any suggestions would be appreciated.
Thanks.
Nginx worker_processes should correspond to the number of CPU threads; since you have a 2-core CPU, worker_processes 2 is correct.
Next, try increasing the php-fpm worker counts to a small multiple of your CPU threads, something like:
pm.start_servers = 4
pm.min_spare_servers = 4
pm.max_spare_servers = 8
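As a rough sanity check on those numbers (a sketch only; the process name php5-fpm is assumed, and you should measure under real load rather than at idle), compare the CPU thread count and free RAM against the average size of an FPM worker:
nproc                                   # CPU threads available
free -m                                 # total / available RAM in MB
ps --no-headers -o rss -C php5-fpm | awk '{s+=$1; n++} END { if (n) printf "avg worker: %.0f MB across %d workers\n", s/n/1024, n }'
The available RAM divided by the average worker size gives a reasonable ceiling for pm.max_children.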
Also, nginx is complaining about upstream timeouts. Do you have any upstream configs/servers defined?
I am having issues with a long-running PHP script:
<?php
sleep(70); # sleep for longer than the 60 s timeout
phpinfo();
The script is terminated every time after 60 seconds with a 504 Gateway Time-out response from Nginx.
When I inspect the Nginx errors I can see that the request times out:
... [error] 1312#1312: *2023 upstream timed out (110: Connection timed out) while reading response header from upstream, ... , upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock", ...
I went through the related questions and tried increasing the timeouts by creating a /etc/nginx/conf.d/timeout.conf file with the following content:
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
fastcgi_read_timeout 600;
fastcgi_send_timeout 600;
fastcgi_connect_timeout 600;
I also read through the Nginx documentation for both fastcgi and core modules, searching for any configurations with defaults set to 60 seconds.
I ruled out the client_* timeouts because they return HTTP 408 instead of HTTP 504 responses.
This is my Nginx server config portion of FastCGI:
location ~ \.php$ {
fastcgi_pass unix:/run/php/php7.0-fpm.sock;
include fastcgi_params;
}
From what I have read so far, this doesn't seem to be an issue with PHP; rather, Nginx is to blame for the timeout. Nonetheless, I tried modifying the limits in PHP as well:
My values from the phpinfo():
default_socket_timeout=600
max_execution_time=300
max_input_time=-1
memory_limit=512M
The php-fpm pool config also has the following enabled:
catch_workers_output = yes
request_terminate_timeout = 600
There is nothing in the php-fpm logs.
I am also using Amazon's Load Balancer to route to the server, but the timeout configuration is also increased from the default 60 seconds.
I don't know where else to look; I restarted both php-fpm and nginx after every change.
Thank you
As often happens in these cases, I was actually editing the wrong configuration file, one that never got loaded by Nginx.
Adding the following to the right file did the trick:
fastcgi_read_timeout 600;
fastcgi_send_timeout 600;
fastcgi_connect_timeout 600;
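If you want to avoid the same trap, a quick hedged check is to look at the configuration Nginx actually loads rather than the file you think it loads (nginx -T, available since 1.9.2, dumps the fully merged config):
sudo nginx -t                                  # shows which nginx.conf is in use and validates it
sudo nginx -T | grep "fastcgi_.*_timeout"      # effective timeout directives after all includes
If your new values don't show up in that output, the file isn't being included.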
I use Laravel Forge for spinning up my EC2 environments, and it builds a LEMP stack for me. I recently started getting 504 timeouts on requests.
I'm no sysadmin (hence the subscription to Forge), but I looked through the logs and narrowed the issue down to these two repeated entries:
in: /var/log/nginx/default-error.log
2017/09/15 09:32:17 [error] 2308#2308: *1 upstream timed out (110: Connection timed out) while sending request to upstream, client: x.x.x.x, server: xxxx.com, request: "POST /upload HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php7.1-fpm.sock", host: "xxxx.com", referrer: "https://xxxx.com/rest/of/the/path"
in: /var/log/php7.1-fpm.log
[15-Sep-2017 09:35:09] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 14 total children
It seems like fpm opens connections that never die, and from my RDS load logs I can see that the RAM is constantly maxed out.
I've tried:
Rolling back to a known stable version of my app (from 2 months ago)
Reinstalling my EC2 with 5.6, 7.0, and 7.1 (with their respective fpm)
Doing all the above on 14.04 and 16.04
Creating a bigger RDS
Right now the only thing that works is a beefy RDS instance (8 GB RAM) plus recycling fpm workers every 300 requests (pm.max_requests). But obviously throwing resources at this problem is not the solution.
Here is my config for /etc/php/7.1/fpm/pool.d/www.conf
user = forge
group = forge
listen = /run/php/php7.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0666
pm = dynamic
pm.max_children = 30
pm.start_servers = 7
pm.min_spare_servers = 6
pm.max_spare_servers = 10
pm.process_idle_timeout = 7s;
pm.max_requests = 300
And here is my config for nginx.conf
listen 80;
listen [::]:80;
listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name xxxx.com;
root /home/forge/xxxx.com/public;
# FORGE SSL (DO NOT REMOVE!)
ssl_certificate /etc/nginx/ssl/xxxx.com/111111/server.crt;
ssl_certificate_key /etc/nginx/ssl/xxxx.com/111111/server.key;
ssl_protocols xxxx;
ssl_ciphers ...;
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/nginx/dhparams.pem;
add_header X-Frame-Options "SAMEORIGIN";
add_header X-XSS-Protection "1; mode=block";
add_header X-Content-Type-Options "nosniff";
index index.html index.htm index.php;
charset utf-8;
# FORGE CONFIG (DOT NOT REMOVE!)
include forge-conf/xxxx.com/server/*;
location / {
try_files $uri $uri/ /index.php?$query_string;
}
location = /favicon.ico { access_log off; log_not_found off; }
location = /robots.txt  { access_log off; log_not_found off; }
access_log /var/log/nginx/xxxx.com-access.log;
error_log /var/log/nginx/xxxx.com-error.log error;
error_page 404 /index.php;
location ~ \.php$ {
fastcgi_split_path_info ^(.+\.php)(/.+)$;
fastcgi_pass unix:/var/run/php/php7.1-fpm.sock;
fastcgi_index index.php;
fastcgi_read_timeout 60;
include fastcgi_params;
}
location ~ /\.(?!well-known).* {
deny all;
}
location ~* \.(?:ico|css|js|gif|jpe?g|png)$ {
expires 30d;
add_header Pragma public;
add_header Cache-Control "public";
}
OK, after a lot of debugging and testing I've narrowed it down to a few causes.
Primary cause for me: the AWS RDS instance backing my MySQL had 500 MB of memory. Looking back, all these issues started once the DB size surpassed 400 MB.
Solution: make sure you have at least 2x your DB size in RAM at all times. Otherwise the indexes (the B+trees) no longer fit in memory, MySQL constantly swaps pages to and from disk, and a single query can take upwards of 15 seconds.
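A hedged way to confirm this on your own instance (the hostname and user below are placeholders): compare the total data + index size against the InnoDB buffer pool. Once the working set no longer fits in memory, query times balloon exactly as described above.
mysql -h your-rds-endpoint -u youruser -p -e "SELECT ROUND(SUM(data_length + index_length)/1024/1024) AS db_mb FROM information_schema.tables; SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"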
Another primary cause for problems like these: unoptimized SQL queries.
Solution: keep the data on your local machine at a size similar to the data on the server, so slow queries surface before they reach production.
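To see which queries are the offenders, and to confirm the "connections that never die" symptom from the question, a simple hedged check is to watch the live connection list while the site is slow (again, endpoint and user are placeholders) and then EXPLAIN the statements that keep showing up:
mysql -h your-rds-endpoint -u youruser -p -e "SHOW FULL PROCESSLIST; SHOW STATUS LIKE 'Threads_connected';"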
Running Shopware 5 on a Debian Jessie machine with nginx and php5-fpm, we very often get a 502 Bad Gateway. This happens mostly in the backend during longer operations such as thumbnail creation, even though the work is split into small chunks of single AJAX requests.
The server, with 64 GB RAM and 16 cores, is essentially idle because there is no real traffic on it. We currently use it as a staging system until we have fixed all errors like this one.
Error log:
The following lines can then be found in the nginx error log:
[error] 20524#0: *175 connect() failed (111: Connection refused) while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "POST /backend/MediaManager/createThumbnails HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
[error] 20524#0: *175 no live upstreams while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "POST /backend/Log/createLog HTTP/1.1", upstream: "fastcgi://php-fpm", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
[error] 20524#0: *175 connect() failed (111: Connection refused) while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "GET /backend/login/getLoginStatus?_dc=1457014588680 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
[error] 20522#0: *209 connect() failed (111: Connection refused) while connecting to upstream, client: xx.xx.xx.xx, server: domain.com, request: "GET /backend/login/getLoginStatus?_dc=1457014618682 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com", referrer: "http://www.domain.com/backend/"
It may be notable that at first a lot of "*175 connect" errors occur and then finally a "*209 connect".
Config files:
I'll post only the lines relevant to this topic and leave out everything that is commented out.
php-fpm:
/etc/php5-fpm/pool.d/www.conf:
[www]
user = www-data
group = www-data
listen = /var/run/php5-fpm.sock
listen.owner = www-data
listen.group = www-data
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
nginx:
/etc/nginx/nginx.conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
events {
worker_connections 768;
multi_accept on;
}
http {
## MIME types.
include /etc/nginx/mime.types;
default_type application/octet-stream;
## Default log and error files.
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
## Use sendfile() syscall to speed up I/O operations and speed up
## static file serving.
sendfile on;
## Handling of IPs in proxied and load balancing situations.
# set_real_ip_from 192.168.1.0/24; # set to your proxies ip or range
# real_ip_header X-Forwarded-For;
## Timeouts.
client_body_timeout 60;
client_header_timeout 60;
keepalive_timeout 10 10;
send_timeout 60;
## Reset lingering timed out connections. Deflect DDoS.
reset_timedout_connection on;
## Body size.
client_max_body_size 10m;
## TCP options.
tcp_nodelay on;
## Optimization of socket handling when using sendfile.
tcp_nopush on;
## Compression.
gzip on;
gzip_buffers 16 8k;
gzip_comp_level 1;
gzip_http_version 1.1;
gzip_min_length 10;
gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript image/x-icon application/vnd.ms-fontobject font/opentype application/x-font-ttf;
gzip_vary on;
gzip_proxied any; # Compression for all requests.
gzip_disable "msie6";
## Hide the Nginx version number.
server_tokens off;
## Upstream to abstract backend connection(s) for PHP.
upstream php-fpm {
server unix:/var/run/php5-fpm.sock;
# server 127.0.0.1:9000;
## Create a backend connection cache.
keepalive 32;
}
## Include additional configs
include /etc/nginx/conf.d/*.conf;
## Include all vhosts.
include /etc/nginx/sites-enabled/*;
}
/etc/nginx/sites-available/site.conf:
server {
listen 80;
listen 443 ssl;
server_name xxxxxxxx.com;
root /var/www/shopware;
## Access and error logs.
access_log /var/log/nginx/xxxxxxxx.com.access.log;
error_log /var/log/nginx/xxxxxxxx.com.error.log;
## leaving out lots of shopware/mediafiles-related settings
## ....
## continue:
location ~ \.php$ {
try_files $uri $uri/ =404;
## NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
fastcgi_split_path_info ^(.+\.php)(/.+)$;
## required for upstream keepalive
# disabled due to failed connections
#fastcgi_keep_conn on;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param SHOPWARE_ENV $shopware_env if_not_empty;
fastcgi_param ENV $shopware_env if_not_empty; # BC for older SW versions
fastcgi_buffers 8 16k;
fastcgi_buffer_size 32k;
client_max_body_size 24M;
client_body_buffer_size 128k;
## upstream "php-fpm" must be configured in http context
fastcgi_pass php-fpm;
}
}
What should I do now? Please let me know if I should provide further information.
Update
After applying the nginx and fpm settings from @peixotorms, the errors in the nginx logs changed to:
30 upstream timed out (110: Connection timed out) while reading response header from upstream
But the issue itself isn't solved; it has just taken on another face...
It might sound strange, but your problem is most probably due to the fact that you're running PHP on a unix socket instead of a TCP port. You will start seeing 502 errors (and others) at around 300 concurrent requests to PHP (sometimes fewer) on a socket configuration.
Also, your pm.max_children is way too low, unless you want to limit your server to a maximum of around 5 simultaneous PHP requests: http://php.net/manual/en/install.fpm.configuration.php
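Before changing anything, one hedged way to confirm the pool itself is the bottleneck is to look for FPM's own saturation warnings (the log path below assumes Debian's php5-fpm packaging; adjust it to yours):
grep -iE "max_children|seems busy" /var/log/php5-fpm.log | tail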
Configure it this way, and those errors should go away:
For your nginx.conf change the following values:
worker_processes 4;
worker_rlimit_nofile 750000;
# handles connection stuff
events {
worker_connections 50000;
multi_accept on;
use epoll;
}
upstream php-fpm {
keepalive 30;
server 127.0.0.1:9001;
}
Your /etc/php5-fpm/pool.d/www.conf
(Use these settings because you have plenty of RAM and CPU.)
[www]
user = www-data
group = www-data
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
listen = 127.0.0.1:9001
listen.allowed_clients = 127.0.0.1
listen.backlog = 65000
pm = dynamic
pm.max_children = 1024
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 16
pm.max_requests = 10000
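A small follow-up sketch for applying and verifying the new pool (commands assume Debian's php5-fpm service names):
sudo php5-fpm -t && sudo service php5-fpm restart    # validate the pool config, then restart
watch -n 2 'ps --no-headers -C php5-fpm | wc -l'     # worker count should now grow and shrink with load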
Also update your location ~ \.php$ block as follows:
location ~ \.php$ {
try_files $uri $uri/ =404;
## NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
fastcgi_split_path_info ^(.+\.php)(/.+)$;
## required for upstream keepalive
# disabled due to failed connections
#fastcgi_keep_conn on;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param SHOPWARE_ENV $shopware_env if_not_empty;
fastcgi_param ENV $shopware_env if_not_empty; # BC for older SW versions
fastcgi_keep_conn on;
fastcgi_connect_timeout 20s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;
fastcgi_pass php-fpm;
}
EDIT:
Change the following values in your /etc/php5/fpm/php.ini file and restart:
safe_mode = Off
output_buffering = Off
zlib.output_compression = Off
max_execution_time = 900
max_input_time = 900
memory_limit = 2048M
post_max_size = 120M
file_uploads = On
upload_max_filesize = 120M
Try binding to 0.0.0.0:9000:
listen = 0.0.0.0:9000
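To confirm FPM is actually listening on the TCP port after the change (a quick check; note that 0.0.0.0 binds on every interface, so keep listen.allowed_clients or a firewall in place if the box is reachable from outside):
sudo ss -lntp | grep ':9000'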
Suddenly, today around 10 a.m., the number of active client requests on Nginx increased drastically.
http://gyazo.com/a34263e00065b2c52d03b0c295b5cfa3
As the active requests increase, the server's CPU usage climbs and it starts returning bad responses.
http://gyazo.com/28ff3e4cfe73ebbc76eb74f225d91d3d
Please help me understand what is happening in my environment.
My environment is below:
Amazon ELB <-> Nginx(ver.1.4.3) <-> php-fpm(ver.5.4.23) <-> WordPress(ver.3.9.2) <-> MySQL(ver.5.5.31)
/etc/nginx/conf.d/default.conf:
server {
listen 80 default;
server_name example.com;
root /var/www/vhosts/example;
index index.html index.htm;
charset utf-8;
access_log /var/log/nginx/access.log main;
error_log /var/log/nginx/error.log;
include /etc/nginx/drop;
rewrite /wp-admin$ $scheme://$host$uri/ permanent;
set $mobile '';
location ~* ^/wp-(content|admin|includes) {
index index.php index.html index.htm;
if ($request_filename ~ .*\.php) {
break;
proxy_pass http://backend;
}
include /etc/nginx/expires;
}
location / {
if ($request_filename ~ .*\.php) {
break;
proxy_pass http://backend;
}
include /etc/nginx/expires;
set $do_not_cache 0;
if ($http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_" ) {
set $do_not_cache 1;
}
if ($request_method = POST) {
set $do_not_cache 1;
}
proxy_no_cache $do_not_cache;
proxy_cache_bypass $do_not_cache;
proxy_redirect off;
proxy_cache czone;
proxy_cache_key "$scheme://$host$request_uri$mobile";
proxy_cache_valid 200 0m;
proxy_pass http://backend;
}
}
server {
listen unix:/var/run/nginx-backend.sock default;
server_name _;
root /var/www/vhosts/example;
index index.php index.html index.htm;
access_log /var/log/nginx/backend.access.log backend;
keepalive_timeout 25;
port_in_redirect off;
gzip off;
gzip_vary off;
include /etc/nginx/wp-multisite-subdir;
}
/etc/php-fpm.d/www.conf:
[www]
listen = /var/run/php-fpm.sock
listen.owner = nginx
listen.group = nginx
listen.mode = 0666
user = nginx
group = nginx
pm = dynamic
pm.max_children = 15
pm.start_servers = 1
pm.min_spare_servers = 1
pm.max_spare_servers = 4
pm.max_requests = 200
rlimit_files = 131072
rlimit_core = unlimited
request_terminate_timeout = 90
request_slowlog_timeout = 60
slowlog = /var/log/php-fpm/www-slow.log
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
php_admin_value[upload_max_filesize] = 64M
php_admin_value[post_max_size] = 64M
php_admin_value[max_execution_time] = 60
Server spec:
$ uname -a
Linux ip-172-31-1-34 3.4.82-69.112.amzn1.x86_64 #1 SMP Mon Feb 24 16:31:21 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/system-release
Amazon Linux AMI release 2013.09
It looks like the server is getting a lot more hits. This could be caused by something like a search engine crawling your site(s), or potentially an attack. It would be worth checking the log files to see whether there has been a drastic increase in access and which files are being requested.
If it is an attack, it could be worth looking into something like fail2ban to automatically block any dodgy connections. You should also make sure that WordPress and its plugins are kept up to date for security patches (you say you're using v3.9.2 when v4 is available).
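A hedged starting point for that log check (field positions assume a combined-style format, so they may need adjusting for the custom "main" format used above): the top client IPs and the most requested paths.
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head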
I'm running an Nginx + PHP-FPM server and I have a script that is supposed to run for over 30 minutes. After 240 seconds it stops and Nginx returns a 502 Gateway error.
PHP-FPM Log for the execution:
[03-May-2013 19:52:02] WARNING: [pool www] child 2949, script '/var/www/shell/import_db.php' (request: "GET /shell/import_db.php") execution timed out (239.971196 sec), terminating
[03-May-2013 19:52:02] WARNING: [pool www] child 2949 exited on signal 15 (SIGTERM) after 540.054237 seconds from start
[03-May-2013 19:52:02] NOTICE: [pool www] child 3049 started
Nginx log for the execution:
2013/05/03 19:52:02 [error] 2970#0: *1 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 98.172.80.203, server: www.example.net, request: "GET /shell/import_db.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.example.net"
I've run this script on an Apache + suPHP server and it executes flawlessly. Nonetheless, I am including set_time_limit(0); at the top of my script.
The max_execution_time according to phpinfo() is 1800 seconds (it was originally 300; I bumped it up in an attempt to get this working).
FastCGI Nginx Configuration:
## Fcgi Settings
include fastcgi_params;
fastcgi_connect_timeout 60;
fastcgi_send_timeout 180s;
fastcgi_read_timeout 600s;
fastcgi_buffer_size 4k;
fastcgi_buffers 512 4k;
fastcgi_busy_buffers_size 8k;
fastcgi_temp_file_write_size 256k;
fastcgi_intercept_errors off;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_param SCRIPT_NAME $fastcgi_script_name;
# nginx will buffer objects to disk that are too large for the buffers above
fastcgi_temp_path /tmpfs/nginx/tmp 1 2;
#fastcgi_keep_conn on; # NGINX 1.1.14
expires off; ## Do not cache dynamic content
I've restarted php-fpm and nginx to no avail. What setting or configuration am I missing, or what else could I check?
When you run nginx with php-fpm, nginx tries to stay connected to php-fpm's children. You should set the following directive to change that:
fastcgi_keep_conn off;
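For reference, a sketch of how to apply and verify the change (the directive goes next to your other fastcgi_* settings; commands assume a Debian-style layout):
grep -R "fastcgi_keep_conn" /etc/nginx/
sudo nginx -t && sudo service nginx reload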