Nginx php-fpm 504 Gateway Timeout error - PHP

After hours of searching and debugging, I give up!
There are thousands of questions and articles about long-running PHP processes, but none of them solved my issue.
I have a PHP script with the following code:
$cur = 0;
// Second, loop for $timeout seconds checking if process is running
while ($cur < 31) {
    sleep(1);
    $cur += 1;
    echo "\n ---- $cur ------ \n";
}
It is simply intended to run for 31 seconds.
I have Nginx with PHP configured as FastCGI (php-fpm) on a Debian server.
I set
max_execution_time = 600
in
/etc/php5/fpm/php.ini
and I even set it in
/etc/php5/cli/php.ini
I also set
request_terminate_timeout = 600
in
/etc/php5/fpm/pool.d/www.conf
I also made these changes in the http section of nginx.conf:
client_header_timeout 600;
client_body_timeout 600;
send_timeout 600;
fastcgi_read_timeout 600;
fastcgi_send_timeout 600;
client_max_body_size 600;
fastcgi_buffers 8 128k;
fastcgi_buffer_size 128k;
And I put the same directives inside the server section, plus these directives inside the location section of the nginx configuration:
send_timeout 600;
fastcgi_read_timeout 600;
fastcgi_send_timeout 600;
client_max_body_size 600;
fastcgi_buffers 8 128k;
fastcgi_buffer_size 128k;
But I still encounter the Gateway Timeout error in the browser!
(And yes, I restarted php-fpm and nginx thousands of times.)
Do you guys have any idea?

Please don't take my answer as an insult, but did you make sure that your web server is on, and did you try accessing another page of the site?

After seeing this answer, I tend to believe the situation is as follows: nginx is trying to fill its FastCGI buffer (buffering is enabled by default) while your script is taking too long to return the first byte, resulting in the timeout. Provided I am correct, there are two things you need to do in order to resolve this:
Switch fastcgi_buffering to off (see the nginx snippet after the code below)
Alter your script so that flush() and ob_flush() are called after each iteration:
while ($cur < 31) {
    ++$cur;
    echo "\n ---- $cur ------ \n";
    ob_flush();  // move PHP's output buffer to the SAPI buffer first...
    flush();     // ...then push it out to nginx
    sleep(1);
}
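For the buffering part, a minimal sketch of the nginx side, assuming a typical php-fpm location block (the socket path below is an assumption, keep whatever fastcgi_pass target you already use):

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/var/run/php5-fpm.sock;  # assumed upstream; use your own
    fastcgi_buffering off;                     # stream the response instead of buffering it
    fastcgi_read_timeout 600s;                 # keep a generous read timeout as well
}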
hth

I believe that you need to include the proxy_read_timeout directive in your Nginx configuration file. My own configuration file looks like this:
server {
    proxy_read_timeout 300s;
    ...
}
You'll note that this is in my server block; the directive is also valid inside the http and location blocks.
*Edit to add: this is because Nginx proxies requests to the PHP-FPM server; the directives you attempted to use are only valid for content that is being served by Nginx itself, not for content that is being proxied.
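For reference, a minimal sketch showing how the two timeout families line up (the values are placeholders): proxy_read_timeout applies to upstreams reached via proxy_pass, while fastcgi_read_timeout applies to upstreams reached via fastcgi_pass, so it can help to set the one that matches how PHP-FPM is wired in.

server {
    proxy_read_timeout   600s;   # used when the upstream is reached via proxy_pass
    fastcgi_read_timeout 600s;   # used when the upstream is reached via fastcgi_pass
    ...
}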

Related

Handling at least 200 Concurrent requests per second for a Laravel REST API with NGINX

I have developed a REST API with Laravel 8, with PostgreSQL/PostGIS as the backend. My main aim is to handle at least 200 concurrent requests per second, with the API returning its result within a second for all 200 concurrent requests. I have listed my configurations below, but I still cannot reach 200 requests per second; the results come back in about 3 seconds instead.
Technical Workflow
NGINX 1.18.0 -> PHP-FPM -> Laravel 8 -> PGBouncer -> PostgreSQL 12/PostGIS 2.4
AWS Server - M6G.8X Large
Operating System - Ubuntu 20.04 LTS
NGINX Configuration
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
# Load module section
load_module "modules/ngx_http_brotli_filter_module.so";
load_module "modules/ngx_http_brotli_static_module.so";
events {
    worker_connections 1024;
}
http {
    ##
    # Basic Settings
    ##
    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 8m;
    large_client_header_buffers 4 4k;
    client_body_timeout 12;
    client_header_timeout 12;
    keepalive_requests 2000;
    send_timeout 10;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;
    ...
    ...
}
PHP-FPM Configuration
Tried with static, dynamic and ondemand modes
pm = ondemand
pm.max_children = 5200
pm.process_idle_timeout = 5s
pm.max_requests = 2000
request_terminate_timeout = 600
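For reference, pm.max_children is usually bounded by the memory available per worker rather than set to a large constant; a rough sizing sketch (both numbers below are illustrative assumptions, not measurements of this server):

<?php
// Rough pm.max_children sizing: memory budget divided by average worker size.
// Both figures are assumptions for illustration; measure real worker sizes with ps or smem.
$phpFpmBudgetMb = 32 * 1024; // RAM you are willing to dedicate to PHP-FPM workers
$avgWorkerMb    = 60;        // average resident size of one Laravel worker
echo 'pm.max_children ~ ' . intdiv($phpFpmBudgetMb, $avgWorkerMb) . PHP_EOL; // ~546

The practical ceiling also depends on CPU count and on nginx's worker_connections, so a value in the thousands rarely translates into real extra throughput.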
PGBouncer Configuration
Tried with session, transaction and statement modes
pool_mode = transaction
max_client_conn = 50000
default_pool_size = 50
reserve_pool_size = 25
reserve_pool_timeout = 3
server_lifetime = 300
server_idle_timeout = 15
server_connect_timeout = 5
server_login_retry = 2
PostgreSQL Configuration
shared_buffers = 32GB
effective_cache_size = 96GB
work_mem = 33MB
maintenance_work_mem = 2GB
min_wal_size = 512MB
max_wal_size = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
listen_addresses = '*'
max_connections = 1000
random_page_cost = 1.1
effective_io_concurrency = 200
max_worker_processes = 32
max_parallel_workers_per_gather = 16
max_parallel_workers = 32
parallel_setup_cost = 1000.0
parallel_tuple_cost = 0.1
min_parallel_table_scan_size = 100MB
min_parallel_index_scan_size = 5MB
parallel_leader_participation = on
The Laravel REST API accesses the DB for the following needs:
A SELECT statement calling a Postgres function (marked PARALLEL SAFE) to get the output
An INSERT statement adding a new record to a functional table
An INSERT statement adding a new record to an audit table
Even with the above configuration I only get 65-70 requests/sec of throughput when testing 200 parallel requests from JMeter, with the requests completing in about 3 seconds. The API responds within 300-400 ms on average when checked in Postman. Please help me resolve this issue.
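As a back-of-the-envelope check using only the numbers quoted above (Little's law: throughput ≈ concurrency / latency):

<?php
// Little's law sanity check: throughput ~= concurrency / latency.
$concurrency = 200;                            // parallel JMeter requests
printf("%.0f req/s\n", $concurrency / 3.0);    // ~3 s observed latency   -> ~67 req/s (matches the 65-70 seen)
printf("%.0f req/s\n", $concurrency / 0.35);   // 300-400 ms solo latency -> ~570 req/s would be the ceiling

So 65-70 req/s is exactly what 200 in-flight requests at ~3 s latency can deliver; the real question is what pushes per-request latency from ~350 ms alone to ~3 s under concurrency.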

nginx passing request to incorrect php-fpm pool

There is a machine with nginx and php-fpm on it. There are 2 servers, 2 php-fpm pools (each one with chroot) and 2 directories that have the same structure and similar files/PHP classes.
One pool is listening on 127.0.0.1:22333 while the other on 127.0.0.1:22335.
The problem is that when I make a request to the second server, it is somehow executed by the first pool. Stranger still, sometimes it takes PHP classes from one directory (the first pool's), sometimes from the other. There is no specific pattern; it seems to happen randomly.
e.g. Nginx logs show that the request comes to the second server, while php-fpm logs show that it was handled by the first pool.
But it never happens the other way around (requests to the first server are always executed by the first php-fpm pool).
Pools are set up in the same way:
same user
same group
pm = dynamic
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 30
pm.max_requests = 300
chroot = ...
chdir = /
php_flag[display_errors] = on
php_admin_value[error_log] = /logs/error.log
php_admin_flag[log_errors] = on
php_admin_value[memory_limit] = 64M
catch_workers_output = yes
php_admin_value[upload_tmp_dir] = ...
php_admin_value[curl.cainfo] = ...
The Nginx configuration for PHP in each server looks like:
fastcgi_pass 127.0.0.1:2233X;
fastcgi_index index.php;
include /etc/nginx/fastcgi_params;
fastcgi_param DOCUMENT_ROOT /;
fastcgi_param SCRIPT_FILENAME $fastcgi_script_name;
fastcgi_param PATH_INFO $fastcgi_script_name;
fastcgi_intercept_errors off;
Had the same problem.
The best answer on this so far was on ServerFault, which suggested opcache.enable=0 and pointed me to a quite interesting behaviour of PHP:
the APC/OPcache cache is shared between all PHP-FPM pools
Digging further through the OPcache documentation, I found this php.ini option:
opcache.validate_root=1
opcache.validate_root boolean
Prevents name collisions in chroot'ed environments. This should be enabled in all chroot'ed environments to prevent access to files outside the chroot.
Setting this option to 1 (default is 0) and restarting php-fpm fixed the problem for me.
EDIT:
Searching for the right words (validate_root), I found much more on this bug:
https://bugs.php.net/bug.php?id=69090
https://serverfault.com/a/877508/268837
Following the notes from the bug discussion, you should also consider setting opcache.validate_permission=1
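Putting the two options together, a minimal sketch of the relevant php.ini lines for chroot'ed pools:

; prevent OPcache key collisions between chroot'ed PHP-FPM pools
opcache.validate_root = 1
; also validate file permissions against the pool's user, per the bug discussion
opcache.validate_permission = 1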

Multiple PHP Pools for SAME User - Nginx Upstream on Debian

I'm trying to take advantage of an nginx upstream using sockets, but I'm receiving errors in my log:
connect() to unix:/var/run/user_fpm2.sock failed (2: No such file or directory) while connecting to upstream
I might be going about this wrong, so I'm looking for some advice/input.
Here's the nginx conf block:
upstream backend {
    server unix:/var/run/user_fpm1.sock;
    server unix:/var/run/user_fpm2.sock;
    server unix:/var/run/user_fpm3.sock;
}
And:
location ~ \.php$ {
    fastcgi_split_path_info ^(.+\.php)(.*)$;
    fastcgi_pass backend;
    fastcgi_index index.php;
    include fastcgi_params;
}
Then, I have 3 PHP pools at /etc/php/7.0/fpm/pool.d/ that look pretty much the same as below. The only difference between the pools is _fpm1, _fpm2, and _fpm3 to match the upstream block.
[user]
listen = /var/run/user_fpm1.sock
listen.owner = user
listen.group = user
listen.mode = 0660
user = user
group = user
pm = ondemand
pm.max_children = 200
pm.process_idle_timeout = 30s
pm.max_requests = 500
request_terminate_timeout = 120s
chdir = /
php_admin_value[session.save_path] = "/home/user/_sessions"
php_admin_value[open_basedir] = "/home/user:/usr/share/pear:/usr/share/php:/tmp:/usr/local/lib/php"
I've noticed that /var/run only ever has the user_fpm3.sock file.
Am I going about this wrong? Is it possible to make this upstream config work? All advice and critique welcome.
I'm running PHP 7 on Debian Jessie with nginx 1.10.3; the server has 6 CPUs and 12 GB RAM.
Thanks in advance.
UPDATE: I figured out the answer myself, but I'm leaving the question in case someone else is trying to do the same thing, or there's a way to optimize this further.
All I had to do was change my pool names to [user_one], [user_two], and [user_three].
Changing the name of each PHP pool fixed the problem, like so:
[user_one]
[user_two]
[user_three]
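Presumably the three identical [user] pool names meant only the last definition took effect, which would explain why only user_fpm3.sock ever appeared. A sketch of how the first corrected pool could start (the filename is an assumption; the remaining directives stay as in the original pool):

; /etc/php/7.0/fpm/pool.d/user_one.conf
[user_one]
listen = /var/run/user_fpm1.sock
listen.owner = user
listen.group = user
listen.mode = 0660
user = user
group = user
; ...rest of the directives unchanged...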

Large File Uploads (> 4GB) with nginx and php-fpm

I am trying to upload large files with nginx and php-fpm and my own PHP script.
For files larger than 4 GB the PHP upload fails: I get an HTTP 200 response with my error message from PHP.
print_r($_FILES) shows me:
[file] => Array
(
[name] => foo.dat
[type] =>
[tmp_name] =>
[error] => 3
[size] => 0
)
According to http://php.net/manual/en/features.file-upload.errors.php error 3 means:
UPLOAD_ERR_PARTIAL
Value: 3;
The uploaded file was only partially uploaded.
All the data gets sent to nginx, and yet PHP says size=0.
I tried the upload in an HTML form and as an XMLHttpRequest with Google Chrome and Firefox, as well as with a little upload program in Java, so an error on the client side or the connection is unlikely.
According to the Chrome network inspector all the data gets sent correctly and the Content-Length in the request header is correct. After the data is sent I get the server response within 5 seconds, so a timeout is unlikely.
My nginx configuration:
client_max_body_size 8000m;
client_body_buffer_size 512k;
fastcgi_buffers 512 32k;
fastcgi_busy_buffers_size 128k;
fastcgi_buffer_size 128k;
fastcgi_temp_file_write_size 256k;
fastcgi_max_temp_file_size 0;
fastcgi_intercept_errors on;
fastcgi_connect_timeout 300;
fastcgi_send_timeout 18000;
fastcgi_read_timeout 1800;
And the php.ini for the php-fpm instance:
upload_max_filesize = 8000M
post_max_size = 8000M
max_input_time = 1800
max_execution_time = 300
memory_limit = 512M
;upload_tmp_dir = // not specified, which means system default according to the php docs. That's /tmp on my machine
The nginx and php-fpm error logs say nothing about this. I have restarted nginx and php-fpm multiple times already.
I am running nginx/1.6.0 with PHP 5.5.11 on 64-bit Debian 6.
I will test this on a 2nd server now, since I'm thinking the issue might be with Debian and outside of nginx/php-fpm.
Any ideas on how to debug this further?
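One quick debugging sketch (not part of the original post): request a small script through nginx from the same pool to confirm the limits PHP actually sees and whether the build uses 64-bit integers.

<?php
// Drop this in the web root and request it through nginx so it runs in the same php-fpm pool.
echo 'upload_max_filesize: ', ini_get('upload_max_filesize'), PHP_EOL;
echo 'post_max_size:       ', ini_get('post_max_size'), PHP_EOL;
echo 'max_input_time:      ', ini_get('max_input_time'), PHP_EOL;
echo 'upload_tmp_dir:      ', ini_get('upload_tmp_dir') ?: sys_get_temp_dir(), PHP_EOL;
echo 'PHP_INT_SIZE:        ', PHP_INT_SIZE, ' bytes (8 means 64-bit integers)', PHP_EOL;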
UPDATE 1:
There is enough disk space. To test for partition and/or permission problems, I created an upload folder
a) under the same mount point as the web root
b) owned by the same user that php-fpm runs as
so simply: upload_tmp_dir = /home/myuser/upload
and I restarted php-fpm.
The problem is still the same: I get an HTTP 200 with PHP error=3 for the file upload. Smaller files can be uploaded and appear temporarily in the new temp folder.
FIX
After updating to the latest PHP 5.6 version the error is gone. I am using the same php.ini file with that version, so it's probably a changed default setting or a fixed bug in PHP (the 4 GB boundary is consistent with a 32-bit size counter overflowing somewhere in the older version's upload handling).

High MySQL Connections in Sleeping State + High Number of PHP Instances Running / Magento

UPDATE
Still in pain ... nothing found :(
I'm honestly willing to donate to anyone who could help me solve this; it's getting obsessive, lol.
On a Proxmox host, I have a VM with Debian installed.
On this Debian box, Nginx, PHP5-FPM, APC, Memcached and MySQL are running with a big Magento multi-website setup.
Sometimes (randomly, or around 9 am, it depends) the server load increases.
What I can see during this peak is:
A high number of PHP-FPM instances in htop
A high number of MySQL connections, most of them in sleeping state with a big time value like 180 or sometimes more
The server's memory is not full; free -h tells me memory is not the issue here
TCP connections from visitors are not high, so I don't think traffic is the issue either
It looks like there is something (a PHP script, I would say) that is triggered either by cron or by a visitor (a search or something else), takes a long time to process, and probably locks some MySQL tables, preventing other processes from running and leading to a massive freeze.
I'm trying hard to figure out what is causing this problem, or just find ways to debug it efficiently (see the slow-log sketch after the list below).
What I have tried already:
Tracing some of the PHP processes with htop to find some information. That's how I found out that MySQL's process had messages indicating it could not connect to a resource because it was busy.
Searched in /var/log/messages and /var/log/syslog for information but got nothing relevant.
Searched in /var/log/mysql for some error logs but got nothing at all.
Searched in /var/log/php5-fpm.log and got many messages indicating that processes are exiting with code 3 after a long period of time (probably the process waiting for a MySQL resource and never getting it?), like:
WARNING: [pool www] child 23839 exited with code 3 after 1123.453563 seconds from start
or even :
WARNING: [pool www] child 29452 exited on signal 15 (SIGTERM) after 2471.593537 seconds from start
Searched the Nginx website's error log and found multiple messages indicating that visitors' connections timed out due to the 60-second timeout I set in the Nginx config file.
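One way to pin down the script is php-fpm's built-in slow log (a sketch; the path and threshold below are assumptions): when a request runs longer than the threshold, php-fpm writes its PHP backtrace to the slow log, which usually points straight at the blocking MySQL call.

; in the [www] pool configuration, e.g. /etc/php5/fpm/pool.d/www.conf
slowlog = /var/log/php5-fpm.slow.log   ; path is an assumption, pick your own
request_slowlog_timeout = 10s          ; dump a backtrace for any request slower than this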
Here are my settings:
Nginx website config file:
location ~ \.php$ {
    if (!-e $request_filename) {
        rewrite / /index.php last;
    }
    try_files $uri =404;
    expires off;
    fastcgi_read_timeout 60s;
    fastcgi_index index.php;
    fastcgi_split_path_info ^(.*\.php)(/.*)?$;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}
Nginx main config file:
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
fastcgi_read_timeout 60;
client_max_body_size 30M;
PHP-FPM is in ondemand mode
default_socket_timeout = 60
mysql.connect_timeout = 60
PHP-FPM pool config file:
pm=ondemand
pm.max_children = 500
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 10
pm.process_idle_timeout = 10s;
pm.max_requests = 5000 (I was thinking about reducing this value to force processes to respawn; if someone has experience with that, I'm interested in hearing about it)
Thank you for your time reading this; I will update the content here if needed.
Regards
Sorcy
Did you check the cron jobs in crontab and in Magento to make sure this is not caused by a job?
Does this weird server behaviour slow down your site? I'm not sure, but this could also be a Slowloris DDoS attack, where a lot of HTTP connections are opened and, because of a bug, never get closed. Maybe that gives you a hint.
