I am running a PHP application (Laravel and MySQL) on an Ubuntu VPS with nginx and php5-fpm installed (both with default settings). I soon experienced some totally random 502 errors, apparently because php5-fpm timed out and lost its connection to nginx every now and then.
I was desperately looking for a solution on SO and any other resource I could find, but the error persisted: The webserver didn't respond about 40 times over 2 days, with a "downtime" of about 2 mins each. I changed the workers in php5-fpm, the maximum execution time... nothing. The server only showed very low CPU and RAM usage.
I eventually killed the VPS and set up a new one from scratch - with the same result. But instead of showing 502 errors, the request simply takes about 40 secs of constant loading without any content or error displayed. And about 2 mins later, once I hit reload the page loads instantly.
The only thing left I could think of was replacing php5-fpm, which I did: I tried HHVM instead. But again the result was the same constant loading.
I seriously don't know what to do anymore... has anyone run into the same problem before?
Cheers
With the help of slow logs I found the issue: an external service (a GeoJSON request) randomly slowed down the page and therefore caused the error.
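Since the culprit was a slow external request, one way to spot this kind of problem earlier is to time every outbound call and log the slow ones. The application in question is PHP; this is only a minimal sketch of the idea in Python, and the name timed_call and the 2-second threshold are assumptions made for illustration:

```python
import logging
import time

logger = logging.getLogger("external_calls")


def timed_call(name, func, *args, slow_after=2.0, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds, was_slow).

    Calls that take slow_after seconds or longer are logged as warnings,
    even when func raises.
    """
    start = time.monotonic()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.monotonic() - start
        if elapsed >= slow_after:
            logger.warning("external call %s took %.2fs", name, elapsed)
    return result, elapsed, elapsed >= slow_after
```

Timing alone does not stop a hanging request from tying up a worker, so it is worth pairing with a hard timeout on the HTTP client itself, for example urllib.request.urlopen(url, timeout=5).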
For about 2 months my website has been slowing down intermittently (a timeout problem). I have checked the server and my settings, and this is the setup and behavior (LAMP on a VPS with 6GB RAM and 4 CPU cores; I'm no expert on Linux or Apache):
When it suddenly starts to hang, the browser's network tab shows that images take 24-30 seconds to load (small images from 4K to 180K), and some of them fail to load. It sometimes also happens to .css files (10 seconds to load). During this period only 1 core is used and RAM stays at 1.4GB tops. The server hosts a CMS-based website (Joomla, with SSL and gzip enabled).
I have apache MPM as prefork with these settings:
KeepAlive On
KeepAliveTimeout 3
MaxKeepAliveRequests 500
StartServers 5
MinSpareServers 5
MaxSpareServers 10
ServerLimit 100
MaxClients 100
MaxRequestsPerChild 3000
I have mod_security enabled, but there isn't any suspicious behavior. I also have server-status enabled, and although I'm not sure, the server doesn't look very loaded (most of the processes are in the K and W states). The access log shows the usual behavior, and there is nothing in the error logs.
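To put numbers on that server-status impression, the machine-readable variant of the status page (/server-status?auto, provided by mod_status) can be tallied. A minimal sketch in Python; the function name is hypothetical, and it assumes the status text has already been fetched:

```python
from collections import Counter

# Scoreboard keys as listed on the mod_status page.
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def summarize_scoreboard(status_text):
    """Count worker states from the text returned by /server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            counts = Counter(board)
            return {SCOREBOARD_KEYS.get(key, key): n for key, n in counts.items()}
    return {}  # no Scoreboard line found
```

If the busy states (everything except "_" and ".") stay far below MaxClients 100 while pages hang, the worker pool is probably not the bottleneck, which would point away from the prefork settings.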
The database is MariaDB; there are no hung queries during these periods and nothing in the slow query log.
The thing is, even if I restart the Apache service, the website still hangs. So I tried restarting the server (shutdown -r), and when the server and services are up again, it still hangs. Sometimes, when I'm not monitoring the website, it comes back to normal after 20 minutes, but sometimes it takes as long as 3 hours. The problem is that it's a production server and the issue doesn't follow a schedule: sometimes it happens 2 days in a row, then again after 3 or 4 days, and sometimes twice in the same day.
Any idea what could be happening here? I'm out of clues right now. Thanks in advance
I strongly suspect it's the server host. I once had the exact same problem, and the first thing I did was take a copy of the website and run it on a WAMP server on my local machine. That way I was able to settle whether it was a server issue or a CMS issue.
I posted the case on webmasters.stackexchange.com but had no luck; they closed the question. As a last resort, a couple of weeks ago I finally decided to move the entire website to a different server with the same characteristics, and boom, problem solved. So, bottom line, the problem seems to have been something strange with the server itself.
Is it possible that a request for a page, where the server or PHP might have an issue, freezes and even disconnects unrelated SSH connections?
I am running a simple webpage (10 pictures and some text) in a dockerized environment with a separate reverse proxy, web server and database (nginx, php-fpm and postgresql).
The whole system was up without a restart for a year or so, without problems. Now I have a new issue (it started about a month ago) with page/system freezes. When I visit my webpage it locks up from time to time (sometimes 1 instance is enough, other times I need to open up to 20) and needs about 30 seconds to start reacting again. The strange thing is that if I am connected to the server over SSH in parallel, it sometimes (not always) also disconnects my terminal. This is why I believe it has something to do with the system (but I can't find anything there, so I am trying a different perspective here).
server (only remote access available):
Debian GNU/Linux 9.4 (stretch)
Kernel: 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
68GB Ram, 8 Core, 2x4 TB HDDs and 1TB SSD
1 GBit-Uplink
I have monitoring installed and there does not seem to be any high workload on the IOs, network, CPU, or other during the lock up (I am not monitoring php stats though). I also have the same setup running on a local test server (different hardware and Kernel 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux) and that server has no freezing issues, so again an argument against the issue being with the dockerized environment or my page code.
What I have done so far on the hardware side:
1.) SMART diagnostics - without any obvious issues (the "backup disk (not the one the servers are saved on)" has shown for some time: 191 G-Sense_Error_Rate 0x0032 001 001 000, but the provider ran a separate test some time ago and said that the disk has no issue, and that the G-Sense_Error_Rate has little informational value anyhow)
2.) atop (htop and iotop are live tools and SSH disconnects, so I can't watch them as the problem occurs) over a 1s interval and 300 samples (thus 5 mins), during which I was able to produce multiple freezes, but there were no obvious load issues (granted, this is the first time I am looking at those things, but there was also none of the high-level line coloring that atop does automatically)
3.) I also have a dockerized monitoring stack running (the freeze occurs both with it running and with it disabled, so it should not come from there either) where I can view the containers separately, and they also do not show anything alarming
4.) restarted the whole server - issue continues
5.) memtester-d 55 of 65 RAM without issues
6.) no problems in syslog
7.) pinged the server while producing the error: the ping is quick at 27ms, but while the server hangs I lose about 1 ping in 10 (during those 30-40s; afterwards the ping is perfect again). I cannot figure out why that is.
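Because SSH drops and pings are lost at the same moment, it may help to log timestamped connection probes from a machine outside the server, against both the SSH and the web port, and then line the gaps up with the atop samples. A minimal sketch in Python; the function names, the 1-second interval and the 300 samples (mirroring the atop run above) are assumptions for illustration:

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Return the TCP connect latency in seconds, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or timed out
        return None


def watch(host, port, samples=300, interval=1.0):
    """Print one timestamped line per probe so gaps can be matched to freezes."""
    for _ in range(samples):
        latency = probe_tcp(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "FAILED" if latency is None else f"{latency * 1000:.1f} ms"
        print(f"{stamp} {host}:{port} {status}")
        time.sleep(interval)
```

If probes to port 22 and to the web port fail in the same seconds while CPU and I/O are idle, that would suggest the network path or the host rather than PHP or the containers, though it would not prove it.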
Where else could I look?
Any suggestions are highly appreciated!
Thanks!
Strange that this has only started to happen within the last few months and was fine previously.
Are you pulling down the latest images for nginx, postgres, etc.? Maybe it's a problem with the version of the images, and you could try pinning a specific release.
MySQL 5.1.73
Apache/2.2.15
PHP 5.6.13
CentOS release 6.5
Cakephp 3.1
After about 4 minutes (3 min, 57 seconds) the import process I'm running stops. There are no errors or warnings in any log that I can find. The import process consists of a lot of SQL calls and data processing, nothing too crazy, but it can take about 10 minutes to get through 5500 records if it's doing a full compare for updates.
Firefox: Secure Connection Failed - The connection to the server was reset while the page was loading.
Chrome: ERR_NO RESPONSE
The PHP time limit (set_time_limit) is set to 900, and it is working: I can set it to 5 seconds and get an error. The limit is not being reached.
I can sleep another controller for 10 minutes, and this error does not happen, indicating that something in the actual program is causing it to fail, and not the hosting service killing the request because it's taking too long (read about VPS doing this to prevent spam).
The php errors are turned all the way up in the php.ini, and just to be sure, in the controller itself.
The import process completes if I reduce the size of the file being imported. If it's just long enough, it will complete AND show the browser message. This indicates to me it's not failing at the same point of execution each time.
I have deleted all the cache and restarted the server.
I do not see any output in the Apache logs other than that the request was made.
I do not see any errors in the MySQL log; however, I don't know whether that's because logging is not turned on.
The exact same code works on my local host without any issue. It's not a perfect match to the server, but it's close. Ubuntu Desktop vs Centos, php 5.5 vs php 5.6
I have kept an eye on the memory usage and don't see any issues there.
At this point I'm looking for any good suggestions on what else to look at or insights into what could be causing the failure. There are a lot of possible places to look, and without an error, it's really difficult to narrow down where the issue might be. Thanks in advance for any advice!
UPDATE
After taking a closer look at the memory usage during the request, I noticed it was getting much higher than it ideally should.
The httpd (Apache) process gets killed and a new one is spawned. Once the new one runs out of memory, the error shows up on the screen. When I had looked at it previously, it was only at 30%, probably because the old process had just been killed. Watching it the whole way through, I saw it get as high as 80%, which together with the other processes was enough to make it run out of memory, and a killed process can't log anything, hence no errors or warnings. It is interesting to me that the process just starts right back up.
I found a command to show which processes had been killed due to memory which proved very useful:
dmesg | egrep -i 'killed process'
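The same check can be scripted so that the victims are counted per process name. A rough Python sketch; the regular expression is an assumption, since the wording of these kernel messages differs between kernel versions, and the function name is hypothetical:

```python
import re
from collections import Counter

# Intended to match lines such as
#   "Killed process 2592 (httpd) total-vm:..."           and
#   "Killed process 2592, UID 48, (httpd) total-vm:..."
# Assumption: other kernels may word this differently.
KILLED = re.compile(
    r"Killed process (\d+)(?:, UID \d+,)? \(([^)]+)\)", re.IGNORECASE
)


def oom_victims(dmesg_text):
    """Return a Counter of process names that the OOM killer terminated."""
    return Counter(match.group(2) for match in KILLED.finditer(dmesg_text))
```

The input can come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout, or from a saved copy of the kernel log.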
I had similar problems with DebugKit.
I had a bug in my code, and during the memory peak the context was written as HTML into the error "log".
We have nginx/1.6.2 running with php5-fpm (5.6) on a Debian 8 system.
Over the past few days we have had higher load than usual due to more users hitting our servers, with most visitors coming in the evening hours between 6pm and midnight.
For the past couple of days, two different servers running the above setup have shown very slow response rates for several hours. In Munin we saw that there were suddenly hundreds of nginx connections in the "writing" state where there were previously only about 20 at a time.
We do not get any errors other than timed out connections on remote hosts when trying to access those servers. All logs I saw were just normal.
The problem can be fixed with a restart of php5-fpm.
My question now is: why do suddenly hundreds of processes claim they are writing? Is there some known issue or maybe config setting we missed which could cause this?
Here is the complete list of symptoms we see:
Instead of < 20 very fast active connections/s we see 100 to 900 connections in the writing state (all nginx connections hit php5-fpm; static content is not served by these servers). The average runtime of the PHP scripts is 80ms.
The problem occurs only if the total number of nginx requests goes above 300/s. It then drops from ~350 to ~250 req/s, but these 250 show up as up to 900 "writing" connections.
Many of these connections eventually time out and return no correct result.
There are no errors in our logs.
The eth / database traffic as well as the CPU load correspond to the lower level of 250 req/s to which the total drops, so there is no "writing" happening afaik.
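For anyone wanting to track the "writing" figure outside Munin: nginx exposes these counters as plain text through its stub_status module, and they are easy to parse. A minimal sketch in Python; it assumes stub_status is enabled at some location and that its output has already been fetched, and the function name is hypothetical:

```python
import re


def parse_stub_status(text):
    """Extract the connection counters from nginx stub_status output."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not active or not states:
        raise ValueError("text does not look like stub_status output")
    reading, writing, waiting = (int(value) for value in states.groups())
    return {
        "active": int(active.group(1)),
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }
```

As far as I understand nginx, a connection is counted as "Writing" from the moment the request has been read until the response is finished, which includes the time spent waiting for php5-fpm. A pile-up there would then be consistent with slow PHP workers rather than with actual network writes, but treat that reading as an assumption to verify.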
For the setup:
As stated above. We use Zend's built-in opcode cache and APCu as a cache for some user variables. One of the servers runs a memcache instance (which works fine throughout the problem) and the other runs Redis, which also runs fine while the problem occurs.
Can anyone shed some light on what the problem might be?
Thanks!
We found the problem: APCu seems to be unstable with PHP 5.6.
Details:
debian 8
nginx/1.6.2
PHP 5.6.14-0+deb8u1
APCu 4.0.7 (Revision: 328290, 126M shm_size)
We used xhprof to profile requests when the server was slow (see question) and noticed that APCu took > 100ms per read/write operation. Clearing the APCu variables did not help. All other parts of the code ran at normal speed.
We completely disabled our use of APCu and the system has been stable since.
So it seems that this APCu version is unstable under load with PHP 5.6, at least for us.
We had the same problem, and the reason was that the data in Redis exceeded "maxmemory", so Redis was unable to write any more data. I could log in with redis-cli but couldn't set a value. If you are having this issue, log in to Redis using redis-cli and try to set something; if the Redis memory is full, you'll get an error.
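The manual redis-cli test can also be automated as a health check. A minimal sketch in Python that speaks the Redis wire protocol directly over a socket, so no client library is needed; it assumes an instance without a password or TLS, and the function names and the key healthcheck:probe are hypothetical:

```python
import socket


def encode_command(*parts):
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = str(part).encode()
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def redis_can_write(host="127.0.0.1", port=6379, key="healthcheck:probe",
                    timeout=2.0):
    """Send one SET and return (ok, first_reply_line)."""
    with socket.create_connection((host, port), timeout=timeout) as conn, \
            conn.makefile("rb") as reader:
        conn.sendall(encode_command("SET", key, "1"))
        reply = reader.readline().decode(errors="replace").strip()
    # "+OK" means the write was accepted; error replies start with "-".
    return reply == "+OK", reply
```

When the memory limit is hit and nothing can be evicted, the reply should start with "-OOM", which is the error the answer above describes seeing in redis-cli.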
I have a cloud server at Rackspace with cPanel installed, with some 16 sites running on it. Of these, 14 sites run under a single account (a Drupal multisite installation). Everything had been running fine for the last 5 months. Recently my server became unresponsive and had to be rebooted. It later turned out that the server had run out of memory.
The issue now continues to occur intermittently. When it does, I can see a lot of php processes popping up at once, and memory usage increases and comes to nearly 200 MB out of the total 8GB.
/usr/bin/php /home/username/public_html/index.php
All sites become inaccessible, and the load average spikes. After 5-8 minutes the huge number of php processes disappears and memory usage returns to normal. This lasts no more than 5-6 minutes.
I checked all server logs and could not find any trace of the issue. I also scanned the server with maldet and rkhunter and could not find any traces of malicious code or back doors.
The strange thing is that the issue does not occur only during peak hours; it occurs during off-peak hours as well. There is no pattern to when it occurs.
I can find that there were 150 php instances running at once yesterday.
Can someone guide me in the correct direction? Is this a server side issue or has something to do with the internal site functions?
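Since the bursts are short and leave nothing in the logs, one option is to record the process list at regular intervals (for example from cron) and compare the timestamps with the access logs and cron schedules afterwards. A minimal sketch in Python; the function names are hypothetical, and it assumes a Linux ps that accepts -eo args=:

```python
import subprocess
import time
from collections import Counter


def matching_commands(ps_output, match="php"):
    """Count identical command lines that contain `match`."""
    lines = (line.strip() for line in ps_output.splitlines())
    return Counter(line for line in lines if line and match in line)


def snapshot(match="php"):
    """Return (timestamp, Counter) for the processes running right now."""
    # `ps -eo args=` prints one full command line per process, without a header.
    out = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    ).stdout
    return time.strftime("%Y-%m-%d %H:%M:%S"), matching_commands(out, match)
```

Grouping by full command line shows which index.php is being started 150 times, which may help tell a site-level cause (a cron task or a burst of requests to one site) from a server-level one.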