How to diagnose NGINX/PHP slowness - php

I'm developing an NGINX/PHP web site. There's a significant discrepancy between the NGINX and PHP processing times and I don't know how to diagnose it.
Pulling a JPEG from the NGINX server is speedy.
ab -l -c 100 -n 10000 http://blah.com/a_40kb_file.jpg
7,700 Requests per second. NGINX log says served between 2ms and 8ms.
Pulling a PHP front page is another matter. It's a simple form with no graphics so each connection represents a page. Each download is approx 3kb. Each page is different as it contains a randomised token in the form.
ab -l -c 100 -n 10000 http://blah.com/
900 requests per second. NGINX log says served between 15ms and 250ms. No errors reported from NGINX. PHP-FPM complained about reaching pm.max_children; I increased it until the errors stopped.
The PHP script records execution time by simple microtime(true) at the beginning and end. These are showing that:
Calling a single page, the NGINX log time and the PHP run time broadly align (approx 1 to 2 ms).
Under simulated load, the NGINX time goes mad but the PHP execution time remains the same.
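For reference, a minimal sketch of the in-script timing described above; the variable names and the logging line are illustrative, not from the actual script:

<?php
// Wall-clock time at the very start of the script.
$t0 = microtime(true);

// ... build the form page with its randomised token ...

// Wall-clock time at the end, logged so it can be compared with the NGINX times.
$elapsed_ms = (microtime(true) - $t0) * 1000;
error_log(sprintf('page rendered in %.2f ms', $elapsed_ms));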
NGINX is waiting around somewhere for something to happen and I don't know how to diagnose where. Are there any tools/methods available?
NGINX.CONF
worker_processes auto;
events {
    worker_connections 768;
}
Dev platform:
1 Core VM. Vbox 6.0. Guest: Ubuntu 18.04. Intel i5-6260U.
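One way to narrow down where the wait happens is to log NGINX's own timers next to the upstream (PHP-FPM) timers. A minimal sketch for nginx.conf; the format name and log path are illustrative:

# inside http { }
log_format timing '$remote_addr "$request" status=$status '
                  'request_time=$request_time '
                  'upstream_connect=$upstream_connect_time '
                  'upstream_header=$upstream_header_time '
                  'upstream_response=$upstream_response_time';
access_log /var/log/nginx/timing.log timing;

If $request_time balloons under load while $upstream_response_time stays small, the wait is on the NGINX side (for example waiting to get a connection to PHP-FPM). If $upstream_response_time balloons as well while the in-script microtime stays flat, requests are most likely queuing inside PHP-FPM before the script starts, which the in-script timer cannot see; the PHP-FPM status page (pm.status_path, see the listen queue counters) and slowlog are the usual tools for that side.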

Related

php-fpm processes stuck on state "getting request informations"

My webserver has been experiencing a problem where the number of active php-fpm processes slowly increases until the pm.max_children setting is reached, at which point it's stuck and I need to restart php-fpm.
(OS: Ubuntu 20.04, webserver: Caddy, php-fpm version: 7.1, pm = dynamic, running the Laravel 5.5 framework)
I've enabled the php-fpm status page and found that many processes are stuck in the "Getting request informations" state.
Example row from the output of /status?html&full (this one has been stuck for over an hour):
pid: 1772235
state: Getting request informations
start time: 24/Jun/2021:15:03:07 +0000
start since: 5111
requests: 131
request duration: 4625314443
request method: POST
request uri: /api.php?t=removed&e=/role/checkOut/3461
content length: 5542139
user: -
script: /var/www/nameremoved/app/fe/production/api.php
last request cpu: 0.00
last request memory: 0
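For reference, the status output above is exposed by a single pool option on the php-fpm side; a minimal sketch, assuming the pool is named [www] and the web server forwards /status to it:

; in the pool config, e.g. www.conf (path depends on the install)
[www]
pm.status_path = /status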
Can anyone shed some light on what the "Getting request informations" state is? I can't seem to find it documented anywhere.
In php.ini I have:
max_execution_time = 180
Yet this seems to be ignored.
The scripts being run are from Laravel 5.5 and definitely shouldn't take more than a few seconds to execute; they are just basic database operations, maybe with file uploads that can be up to 500MB.
I guess my next step could be to set the php-fpm setting:
request_terminate_timeout
and see if that terminates the processes.
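A minimal sketch of that pool setting; the value is illustrative and should sit comfortably above the longest legitimate request, e.g. a large upload:

; in the pool config, e.g. www.conf
[www]
; Forcibly terminate any request running longer than this, even in situations
; where max_execution_time does not apply
request_terminate_timeout = 30m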
The strange thing is I have an identical server set up in a different location (requests are routed to either server based on location) which does not have this problem.
Any advice appreciated :)
UPDATE 25/6/2021
Still happening; it seems to happen only for POST requests with file uploads.
UPDATE 29/6/2021
I've set request_terminate_timeout=2h
This successfully kills the requests stuck in the "Getting request informations" state, so it kinda solves the problem, but I still have no idea what was causing it.
UPDATE 16/6/2022
Now using PHP 8.1, Laravel 8, Caddy v2.4.6; the same problem is still occurring.
I've added global before and after middleware in Laravel to log each HTTP request with the php-fpm process id to try to find the culprit, but it seems the problem occurs before the "before" middleware is even hit.
I have the same behavior with Ubuntu 20.04.3 LTS, Laravel 8, php-fpm 7.4 and caddy 2.4.5.
Restarting either the caddy or php-fpm service immediately frees up the processes. So I first quickly "fixed" it by restarting caddy every 15 minutes via crontab.
Since this doesn't happen with nginx, I'm now running caddy -> nginx -> php-fpm; it works so far.

Docker - Nginx + PHP-FPM = reaching timeout

Simple question: I'm getting Maximum execution time of 0 seconds exceeded, but I don't know why. There is a lot written about it, but the script has no set_time_limit() or anything like that; in the PHP-FPM config I have php_value[max_execution_time] = 0, and even with this I'm getting a timeout from PHP-FPM's side (nginx is OK, no 504 Gateway Timeout or anything like that is thrown). The issue is a long-running HTTP request being killed after approx. 3 minutes. The setup is standard nginx + PHP-FPM (7.2) running in Docker.
Thanks for any pointers!
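For reference, a minimal sketch of the FPM-side settings in play here; the request_terminate_timeout line is my assumption of something worth checking, not taken from the poster's config:

; PHP-FPM pool config
[www]
php_value[max_execution_time] = 0
; Assumption to verify: if request_terminate_timeout is set anywhere in the FPM
; config it will still kill long requests, even with max_execution_time = 0.
; Setting it to 0 disables it:
;request_terminate_timeout = 0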

PHP Uploading with FastCGI on IIS 7.5 stalling/taking forever

Okay, first off: this works 100% fine on one server setup but is messed up on a very similar one, which is why I think it has to be an IIS issue somewhere; I just don't know where.
I have a very standard PHP upload script, but it keeps locking up/freezing then resuming itself on larger files (over 250mb.)
No errors return, and the upload does finish and work fine for files up to 4gb - but it takes forever. You can watch the size of the tmp files as they upload, and it will just stop receiving data - sometimes for several minutes at a time, then just pick back up right where it left off and continue the upload.
I have configured the following in IIS:
CGI:
Time-out: 00:30:00
Activity Timeout: 300000
Idle Timeout: 300000
Request Timeout: 300000
Request Filtering:
Max allowed content length: 4294967295
Max URL Length: 4096
Max query string: 2048
PHP:
post_max_size: 4G
upload_max_filesize: 4G
max_execution_time: 300000
max_file_uploads: 300000
max_input_time: -1
memory_limit: -1
I was previously getting errors from the script taking too long; however, upping the Activity, Idle, and Request timeouts has fixed that issue. The uploads do work fine, but take FOREVER.
I have the exact same IIS settings on another dev box running the same upload script and it works flawlessly - so I don't know what I'm missing.
PHP is 5.4.14. I get nothing in the PHP error log or Windows Event Viewer (since no errors are actually thrown as far as I can tell.)
Anyone have any idea of what settings I could be missing somewhere?
Welp, that was stupid. I just asked around and someone did in fact turn on "intrusion prevention" at the router level for the one server that was having issues. Disabling that seems to resolve the problem.

NginX issues HTTP 499 error after 60 seconds despite config. (PHP and AWS)

At the end of last week I noticed a problem on one of my medium AWS instances where Nginx always returns an HTTP 499 response if a request takes more than 60 seconds. The page being requested is a PHP script.
I've spent several days trying to find answers and have tried everything that I can find on the internet including several entries here on Stack Overflow, nothing works.
I've tried modifying the PHP settings, PHP-FPM settings and Nginx settings. You can see a question I raised on the NginX forums on Friday (http://forum.nginx.org/read.php?9,237692), though that has received no response, so I am hoping that I might be able to find an answer here before I am forced to move back to Apache, which I know just works.
This is not the same problem as the HTTP 500 errors reported in other entries.
I've been able to replicate the problem with a fresh micro AWS instance of NginX using PHP 5.4.11.
To help anyone who wishes to see the problem in action I'm going to take you through the set-up I ran for the latest Micro test server.
You'll need to launch a new AWS Micro instance (so it's free) using the AMI ami-c1aaabb5
This PasteBin entry has the complete set-up to run to mirror my test environment. You'll just need to change example.com within the NginX config at the end
http://pastebin.com/WQX4AqEU
Once that's set up, you just need to create the sample PHP file which I am testing with:
<?php
sleep(70);
die( 'Hello World' );
?>
Save that into the webroot and then test. If you run the script from the command line using php or php-cgi, it will work. If you access the script via a webpage and tail the access log /var/log/nginx/example.access.log, you will notice that you receive the HTTP 1.1 499 response after 60 seconds.
Now that you can see the timeout, I'll go through some of the config changes I've made to both PHP and NginX to try to get around this. For PHP I'll create several config files so that they can be easily disabled
Update the PHP FPM Config to include external config files
sudo echo '
include=/usr/local/php/php-fpm.d/*.conf
' >> /usr/local/php/etc/php-fpm.conf
Create a new PHP-FPM config to override the request timeout
sudo echo '[www]
request_terminate_timeout = 120s
request_slowlog_timeout = 60s
slowlog = /var/log/php-fpm-slow.log
' > /usr/local/php/php-fpm.d/timeouts.conf
Change some of the global settings to ensure the emergency restart interval is 2 minutes
# Create a global tweaks
sudo echo '[global]
error_log = /var/log/php-fpm.log
emergency_restart_threshold = 10
emergency_restart_interval = 2m
process_control_timeout = 10s
' > /usr/local/php/php-fpm.d/global-tweaks.conf
Next, we will change some of the PHP.INI settings, again using separate files
# Log PHP Errors
sudo echo '[PHP]
log_errors = on
error_log = /var/log/php.log
' > /usr/local/php/conf.d/errors.ini
sudo echo '[PHP]
post_max_size=32M
upload_max_filesize=32M
max_execution_time = 360
default_socket_timeout = 360
mysql.connect_timeout = 360
max_input_time = 360
' > /usr/local/php/conf.d/filesize.ini
As you can see, this increases the socket timeout to 6 minutes (360 seconds) and will help log errors.
Finally, I'll edit some of the NginX settings to increase the timeouts on that side
First I edit the file /etc/nginx/nginx.conf and add this to the http directive
fastcgi_read_timeout 300;
Next, I edit the file /etc/nginx/sites-enabled/example which we created earlier (See the pastebin entry) and add the following settings into the server directive
client_max_body_size 200;
client_header_timeout 360;
client_body_timeout 360;
fastcgi_read_timeout 360;
keepalive_timeout 360;
proxy_ignore_client_abort on;
send_timeout 360;
lingering_timeout 360;
Finally I add the following into the location ~ \.php$ section of the server directive
fastcgi_read_timeout 360;
fastcgi_send_timeout 360;
fastcgi_connect_timeout 1200;
Before retrying the script, I restart both nginx and php-fpm to ensure that the new settings have been picked up. I then try accessing the page and still receive the HTTP/1.1 499 entry within the NginX example.error.log.
So, where am I going wrong? This just works on apache when I set PHP's max execution time to 2 minutes.
I can see that the PHP settings have been picked up by running phpinfo() from a web-accessible page. I just don't get it; I actually think that too much has been increased, as it should only need PHP's max_execution_time and default_socket_timeout changed, as well as NginX's fastcgi_read_timeout within just the server->location directive.
Update 1
Having performed some further tests to show that the problem is not the client dying, I have modified the test file to:
<?php
file_put_contents('/www/log.log', 'My first data');
sleep(70);
file_put_contents('/www/log.log','The sleep has passed');
die('Hello World after sleep');
?>
If I run the script from a web page then I can see the contents of the file being set to the first string. 60 seconds later the error appears in the NginX log. 10 seconds later the contents of the file change to the second string, proving that PHP is completing the process.
Update 2
Setting fastcgi_ignore_client_abort on; does change the response from an HTTP 499 to an HTTP 200, though still nothing is returned to the end client.
Update 3
Having installed Apache and PHP (5.3.10) directly onto the box (using apt) and then increased the execution time, the problem does appear to happen on Apache as well. The symptoms are now the same as with NginX: an HTTP 200 response, but the actual client connection times out beforehand.
I've also started to notice, in the NginX logs, that if I test using Firefox, it makes a double request (as if the PHP script executes twice when it runs longer than 60 seconds). Though that does appear to be the client re-requesting when the first request fails.
The cause of the problem is the Elastic Load Balancers on AWS. They, by default, timeout after 60 seconds of inactivity which is what was causing the problem.
So it wasn't NginX, PHP-FPM or PHP but the load balancer.
To fix this, simply go into the ELB "Description" tab, scroll to the bottom, and click the "(Edit)" link beside the value that says "Idle Timeout: 60 seconds"
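If you prefer the command line, the same attribute can be changed with the AWS CLI; a sketch, assuming a classic ELB named my-elb and a 120-second idle timeout:

aws elb modify-load-balancer-attributes \
    --load-balancer-name my-elb \
    --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":120}}'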
Actually I faced the same issue on one server, and I figured out that after making nginx configuration changes I hadn't restarted the nginx server, so with every hit of the nginx URL I was getting a 499 HTTP response. After an nginx restart it started working properly with HTTP 200 responses.
I thought I would leave my two cents. First, the problem is not related to PHP (it still could be, PHP always surprises me :P), that's for sure. It's mainly caused by a server being proxied to itself, more specifically a hostname/alias issue; in your case it could be that the load balancer is requesting nginx, nginx is calling back to the load balancer, and it keeps going that way.
I have experienced a similar issue with nginx as the load balancer and apache as the webserver/proxy
In my case - nginx was sending a request to an AWS ALB and getting a timeout with a 499 status code.
The solution was to add this line:
proxy_next_upstream off;
The default value for this in current versions of nginx is proxy_next_upstream error timeout; - which means that on a timeout it tries the next 'server' - which in the case of an ALB is the next IP in the list of resolved ips.
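A minimal sketch of where that directive sits, assuming the ALB is reached via proxy_pass (the hostname is illustrative):

location / {
    proxy_pass https://my-alb.example.com;
    # return the timeout to the client instead of retrying against the next resolved IP
    proxy_next_upstream off;
}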
You need to find out where the problem lives. I don't know the exact answer, but let's try to find it.
We have 3 elements here: nginx, php-fpm and php. As you said, the same PHP settings under Apache are OK. Is it really the same setup? Did you try Apache instead of nginx on the same OS/host/etc.?
If we see that PHP is not a suspect, then we have two suspects: nginx & php-fpm.
To exclude nginx: try to set up the same "system" with Ruby. See https://github.com/garex/puppet-module-nginx to get an idea of the simplest Ruby setup. Or use Google (maybe that will be even better).
My main suspect here is php-fpm.
Try to play with these settings:
php-fpm's request_terminate_timeout
nginx's fastcgi_ignore_client_abort

Memcached concurrency w/ lighttpd php

I'm having an issue with memcached. Not sure if it's memcached, PHP, or TCP sockets, but every time I try a benchmark with a concurrency of 50 or more against a page that uses memcached, some of those requests fail under apache ab. I get the (99) Cannot assign requested address error.
When I do a concurrency test of 5000 against a regular phpinfo() page, everything is fine. No failed requests.
It seems like memcached cannot support high concurrency, or am I missing something? I'm running memcached with the -c 5000 flag.
Server: (2) Quad Core Xeon 2.5GHz, 64GB RAM, 4TB RAID 10, 64-bit OpenSUSE 11.1
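For reference, -c is memcached's cap on simultaneous client connections; a typical invocation sketch with illustrative values:

# -d daemonize, -m memory in MB, -p TCP port, -u user, -c max simultaneous connections
memcached -d -m 64 -p 11211 -u memcache -c 5000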
Ok, I've figured it out. Maybe this will help others who have the same problem.
It seems like the issue can be a combination of things.
Set server.max-worker in lighttpd.conf to a higher number
Original: 16 Now: 32
Turned off keep-alive in lighttpd.conf; it was keeping connections open for too long.
server.max-keep-alive-requests = 0
Raise the ulimit -n open-files limit.
ulimit -n 65535
If you're on linux use:
server.event-handler = "linux-sysepoll"
server.network-backend = "linux-sendfile"
Increase max-fds
server.max-fds = 2048
Lower the TCP TIME_WAIT before recycling; this closes connections faster.
In /etc/sysctl.conf add:
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 3
Make sure you force it to reload with: /sbin/sysctl -p
After making the changes, my server now handles 30,000 concurrent connections and 1,000,000 total requests without any issues, failed requests, or write errors in apache ab.
Command used to benchmark: ab -n 1000000 -c 30000 http://localhost/test.php
My Apache can't get even close to this benchmark. Lighttpd makes me laugh at Apache now. Apache crawls at around 200 concurrency.
I'm using just a 4-byte integer as a page counter for testing purposes. Other PHP pages work fine even with 5,000 concurrent connections and 100,000 requests. This server has a lot of horsepower and RAM, so I know that's not the issue.
The page that seems to die has nothing but 5 lines of code to test the page counter using memcached. Making the connection gives me this error: (99) Cannot assign requested address.
This problem starts to arise at 50 concurrent connections.
I'm running memcached with -c 5000 for 5000 concurrency.
Everything is on one machine (localhost)
The only process running is SSH, Lighttpd, PHP, and Memcached
There are no users connected to this box (test machine)
Linux -nofile is set to 32000
That's all I have for now; I'll post more information as I find it. It seems like there are a lot of people with this problem.
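If anyone wants to confirm whether this is ephemeral-port/TIME_WAIT exhaustion (my assumption for what error 99 means here, given the sysctl fixes above), a quick sketch to run while the ab benchmark is going:

# range of local ports available for outgoing connections (e.g. to memcached)
cat /proc/sys/net/ipv4/ip_local_port_range

# number of sockets currently stuck in TIME_WAIT
ss -tan state time-wait | wc -l

# overall socket summary
ss -s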
I just tested something similar with a file:
$mc = memcache_connect('localhost', 11211);
$visitors = memcache_get($mc, 'visitors') + 1;
memcache_set($mc, 'visitors', $visitors, 0, 30);
echo $visitors;
running on a tiny virtual machine with nginx, php-fastcgi, and memcached.
I ran ab -c 250 -t 60 http://testserver/memcache.php from my laptop in the same network without seeing any errors.
Where are you seeing the error? In your php error log?
This is what I used for Nginx/php-fpm, adding these lines in /etc/sysctl.conf (Rackspace dedicated servers with Memcached/Couchbase/Puppet):
# Memcached fix
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 3
I hope it helps.
