I have FastCGI with PHP.
The server runs fine for several hours, and then suddenly the PHP error log fills up with:
mod_fcgid: can't apply process slot for /var/www/php-bin/website/php
At the same time, there is no spike in activity on the web, CPUs are not spiking, all seems normal based on the server usage.
If I restart apache, all is running ok again for several hours and then situation repeats.
I have tried to set higher values to fcgid settings:
FcgidMaxProcesses 18000
FcgidMaxProcessesPerClass 3800
but the problem still persists.
What is interesting is that I also have a second server with a totally different setup and software versions (FastCGI as a PHP module), and the same problem occasionally occurs there as well, though not as frequently.
So I am wondering: can this problem be caused by some PHP script? Both servers share some of the same PHP libraries.
In general, how do I track down what is causing this? On the debug server the problem never appears, and on production I cannot blindly change settings and restart the server over and over again.
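One low-risk starting point on production is Apache's status page, which shows what every worker is doing at the moment the errors appear. A minimal sketch, assuming mod_status is loaded and Apache 2.4 access-control syntax (the URL and access rule are placeholders to adapt):

ExtendedStatus On
<Location "/server-status">
    SetHandler server-status
    Require local
</Location>

Checking it while the errors occur shows whether the FastCGI-backed workers all pile up in one state, which narrows the problem down without restarting anything.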
Related
For about two months now I've been experiencing slowdowns on my website (timeout problems). I've done some checking on the server, and my settings and its behavior are as follows (LAMP, VPS with 6 GB RAM and 4 CPU cores; I'm no expert on Linux or Apache):
When it suddenly starts to hang, I check the browser's network activity and find that images take 24-30 seconds to load (small images, 4 KB to 180 KB), and some of them fail to load entirely. It sometimes happens to .css files as well (10 seconds to load). During these periods only 1 core is used and RAM stays at 1.4 GB tops. The server hosts a website based on a CMS (Joomla, SSL, gzip enabled).
I have apache MPM as prefork with these settings:
KeepAlive On
KeepAliveTimeout 3
MaxKeepAliveRequests 500
StartServers 5
MinSpareServers 5
MaxSpareServers 10
ServerLimit 100
MaxClients 100
MaxRequestsPerChild 3000
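As a sanity check on these values: prefork capacity is usually memory-bound, not CPU-bound. Assuming roughly 50 MB of RSS per PHP-enabled child (an assumed figure; measure your own with ps), MaxClients 100 could need about 100 × 50 MB = 5 GB at full load, which is most of a 6 GB VPS once the OS and MariaDB take their share, so the limit itself looks plausible but is worth verifying against the real per-child memory use.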
I have mod_security enabled, but it shows no suspicious behavior. I also have server-status enabled, and as far as I can tell the server doesn't look very loaded (most processes in the K and W states). The access log shows the usual traffic, and there are no entries in the error logs.
The database is MariaDB; there are no hung queries during these periods and nothing in the slow query log.
The thing is, even if I restart the Apache service, the website still hangs. So I tried restarting the server (shutdown -r), and when the server and services came back up, it still hung. Sometimes, when I'm not monitoring the website, it returns to normal after 20 minutes, but sometimes it takes as long as 3 hours. The problem is that this is a production server and the issue is unpredictable: sometimes it happens 2 days in a row, then not for 3 or 4 days, and sometimes twice in the same day.
Any idea what could be happening here? I'm out of clues right now. Thanks in advance.
I strongly suspect it's the server host. I once had the exact same problem, and the first thing I did was get a copy of the website and run it on a WAMP server on my local machine. That way I was able to clear up whether it was a server issue or a CMS issue.
I posted the case on webmasters.stackexchange.com but had no luck; they closed the question. As a last resort, a couple of weeks ago I finally decided to move the entire website to a different server with the same characteristics, and boom, problem solved. So, bottom line, it seems the problem was a strange issue with the server itself.
We have nginx/1.6.2 running with php5-fpm (5.6) on a Debian 8 system.
In the past few days we have seen higher load than usual due to more users hitting our servers, with most visitors coming in the evening hours between 6 pm and midnight.
For a couple of days now, two different servers running the above setup have shown very slow response rates for several hours at a time. In Munin, we saw that there were suddenly hundreds of nginx connections in the "writing" state where there were previously only about 20 at a time.
We do not get any errors other than timed-out connections on remote hosts when trying to access those servers. All the logs I checked looked normal.
The problem can be fixed with a restart of php5-fpm.
My question now is: why do hundreds of connections suddenly claim they are writing? Is there a known issue, or perhaps a config setting we missed, that could cause this?
Here is the complete list of symptoms we see:
Instead of fewer than 20 very fast active connections per second, we see 100 to 900 connections in the writing state (all nginx connections hit php5-fpm; static content is not served by these servers). Average runtime for the PHP scripts is 80 ms.
The problem occurs only if the total number of nginx requests per second goes above 300/s. Throughput then drops from ~350 to ~250 req/s, but those 250 req/s show up to 900 "writing" connections.
Many of these connections eventually time out and return no valid result.
There are no errors in our logs
The network (eth) and database traffic, as well as the CPU load, correspond to the lower level of 250 req/s to which the total drops, so as far as we can tell there is no actual "writing" happening.
For the setup:
As stated above. We use PHP's built-in Zend OPcache, plus APCu for some user-variable caching; one of the servers runs a memcache instance (which works fine throughout the problem) and the other runs Redis, which also runs fine while the problem occurs.
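One way to tell busy PHP workers from stuck sockets in this situation is php5-fpm's own status page, which reports active and idle workers (and, in the full view, each worker's state and request duration). A minimal sketch, assuming Debian's pool layout (the path and URL are illustrative):

; /etc/php5/fpm/pool.d/www.conf
pm.status_path = /fpm-status

The path then has to be exposed through an nginx location restricted to localhost. If all workers show as active with long request durations while nginx reports "writing", the bottleneck is inside PHP rather than in nginx.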
Can anyone shed some light on what the problem might be?
Thanks!
We found the problem: APCu seems to be unstable with PHP 5.6.
Details:
Debian 8
nginx/1.6.2
PHP 5.6.14-0+deb8u1
APCu 4.0.7 (Revision: 328290, 126M shm_size)
We used xhprof to profile requests while the server was slow (see question) and noticed that APCu took > 100 ms per read/write operation. Clearing the APCu variables did not help. All other parts of the code ran at normal speed.
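For reference, the xhprof instrumentation needed for this is only a few lines (a minimal sketch using xhprof's standard API; the output path is illustrative):

<?php
// Start profiling at the top of the request, capturing CPU and memory stats.
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

// ... application code runs here ...

// Stop profiling and persist the raw data for later inspection.
$data = xhprof_disable();
file_put_contents('/tmp/xhprof_' . uniqid() . '.dat', serialize($data));

In the collected profiles, the >100 ms APCu operations stand out immediately against the 80 ms average script runtime mentioned in the question.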
We completely disabled our use of APCu and the system has been stable since.
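For anyone wanting to do the same without touching code paths, APCu can also be switched off from the ini side (APCu 4.x still uses the apc.* namespace; the file location follows Debian's convention):

; e.g. /etc/php5/mods-available/apcu.ini
apc.enabled=0

followed by a php5-fpm restart.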
So it seems that this APCu version is unstable under load with PHP 5.6, at least for us.
We had the same problem, and the reason was that the data in Redis had grown beyond "maxmemory", so Redis was unable to write any more data. I could log in with redis-cli but couldn't set a value. If you are having this issue, log in to Redis using redis-cli and try to set something; if the Redis memory is full, you'll get an error.
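This takes a minute to confirm from redis-cli: with the default noeviction policy, writes start failing once maxmemory is reached (illustrative session; the exact error text can vary by version):

$ redis-cli
127.0.0.1:6379> SET probe 1
(error) OOM command not allowed when used memory > 'maxmemory'.
127.0.0.1:6379> CONFIG GET maxmemory
1) "maxmemory"
2) "104857600"

Raising maxmemory or choosing an eviction policy via maxmemory-policy resolves it.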
I have a cloud server at Rackspace with cPanel installed and some 16 sites running on it. Fourteen of them run under a single account (this is a Drupal multisite installation). Everything had been running fine for the last 5 months. Recently the server became unresponsive and had to be rebooted; it later turned out that it had run out of memory.
The issue now continues to occur intermittently. When it happens, I can see a lot of PHP processes like the one below popping up at once; memory usage climbs, and free memory drops to nearly 200 MB out of the total 8 GB:
/usr/bin/php /home/username/public_html/index.php
All sites become inaccessible and the load average spikes. After 5-8 minutes the huge number of PHP processes disappears, and memory usage returns to normal. The whole episode lasts no more than 5-6 minutes.
I checked all the server logs and could not find any trace of the issue. I also checked the server with maldet and rkhunter and found no traces of malicious code or backdoors.
Strangely, the issue is not tied to the busiest hours; it occurs during off-peak hours as well. There is no pattern to when it happens.
I found that there were 150 PHP instances running at once yesterday.
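One way to catch this in the act is a per-minute process snapshot from cron, so the next occurrence leaves a record of which PHP processes appeared and how long they had been running (the log path is illustrative):

* * * * * ps -eo pid,etime,rss,args | grep '[p]hp' >> /var/log/php-procs.log

Correlating those timestamps with the access logs should reveal whether a single site or URL triggers the burst.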
Can someone point me in the right direction? Is this a server-side issue, or does it have something to do with the sites' internal code?
I'm running a fairly typical LAMP stack with PHP running through mod_fcgid. I'd consider the server to be under "high load" given the amount of traffic it receives.
There is an intermittent problem where Apache reports all connections to be in the "Sending content" state ("W" on the monitor) when accessing sites that rely on PHP.
There are no PHP errors to speak of; it's as though PHP isn't actually being called during these "lockup" periods. However, in the Apache site logs I'm seeing the following:
(103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request function
[warn] mod_fcgid: can't apply process slot for /var/www/cgi-bin/php.fcgi
During this time I can still access sites that do not depend on PHP, such as the Apache status page and HTML-only virtual hosts (those without the PHP handler include).
The php.fcgi script has PHP_FCGI_MAX_REQUESTS=500 set, because I have read there is a race condition problem with PHP running in CGI mode. The fcgid.conf also has MaxProcessCount=15 set.
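For context, a typical php.fcgi wrapper of the kind described is just a few lines of shell (paths are illustrative; only variables exported here reach the PHP processes):

#!/bin/sh
# Recycle each PHP process after 500 requests to contain leaks.
PHP_FCGI_MAX_REQUESTS=500
export PHP_FCGI_MAX_REQUESTS
exec /usr/bin/php-cgi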
Has anyone else experienced this bug, and if so, how can it be resolved?
I managed to fix this one myself.
To solve this problem, add stricter checks for process hangs to the FastCGI configuration, and reduce the lifetime of your PHP instances:
IPCConnectTimeout 20
ProcessLifeTime 120
IdleTimeout 60
IdleScanInterval 30
MaxRequestsPerProcess 499
MaxProcessCount 100
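Note the MaxRequestsPerProcess value of 499: keeping it one below the wrapper's PHP_FCGI_MAX_REQUESTS=500 lets mod_fcgid retire each process before PHP exits on its own mid-request, which is the usual way to sidestep the race condition mentioned in the question.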
Depending on your requirements, this can serve a well-configured server handling in excess of 50k hits per hour.
You will find that the number of recorded defunct/"zombie" PHP processes increases significantly. This is good, however: previously those processes would simply have become unresponsive, and the FastCGI manager would have kept piping requests to them!
I would also advise removing all override directives from your php.fcgi script, as these can cause problems with your system. Try to manage as much as possible from the primary FastCGI configuration in Apache.
We went with Nginx + http://php-fpm.org/
Try "strace -p <pid>" on one of the stuck PHP processes to see what it is blocked on.
I also saw lock-ups happen when some PHP software tried to request a file from the same server it was running on (file_get_contents('http://localhost...')).
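The failure mode there is a self-inflicted deadlock: the script occupies one FastCGI slot while waiting for a second slot to answer its loopback request, so once every slot is busy with such scripts, they all wait on each other forever. A minimal sketch of the anti-pattern and the usual fix (file and function names are hypothetical):

<?php
// Anti-pattern: holds one PHP slot while waiting for another slot
// on the same server to serve the loopback HTTP request.
$html = file_get_contents('http://localhost/render.php?id=1');

// Safer: run the rendering in-process, with no HTTP round-trip.
require 'render.php';
$html = render(1);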
I've written a PHP script that takes in two CSV files, processes them, and returns an HTML table. I developed it on my MacBook running Apache. When I uploaded the script to our production server, it began having problems. Production is an Ubuntu 10.04 LTS server running nginx, which forwards requests to Apache/PHP.
I added some debugging statements and tailed the logs so I could see exactly where execution stops, but no errors are thrown anywhere. The first file is 1.9 MB, and the script processes 366 KB of it before failing. I've tested this several times, and it always fails at the same place. I don't believe it's the file, as it's the same file I used for testing the script and it never had a problem on my MacBook.
I've searched the internet and have increased several timeout parameters in nginx, to no avail. I'm not sure where to look or what to look for at this point. Can someone point me in the right direction?
Have you fully turned on error reporting on the server?
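If not, the quickest way while debugging is to force it at the top of the script itself (standard PHP settings; remove them again once done):

<?php
// Surface every notice, warning, and error for this request.
error_reporting(E_ALL);
ini_set('display_errors', '1');
// Make sure failures also land in the PHP error log.
ini_set('log_errors', '1');

If the script still dies at the same byte offset with nothing logged, the next suspects are hard limits such as memory_limit and the buffering or timeouts on the nginx-to-Apache proxy hop.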