Server getting slower over time - PHP

I am using a dedicated server for my PHP application. The server slows down day by day; after a reboot, everything goes back to normal. I cache my JSON results as files and serve them to clients. When everything is normal, the response time is about 50 ms, but when the server slows down, the response time goes up to 17 seconds or more.
This issue affects the whole server; I can't even log in with SSH when it happens.
I don't have enough knowledge about servers.
How can I track this problem?
The system has been up for 6 days now and the slowdown has started again.
Here are my results:
# lsof | wc -l
34255
# free
              total        used        free      shared  buff/cache   available
Mem:       32641048     1826832     6598216      232780    24216000    29805868
Swap:      16760828           0    16760828
My server has 32 GB RAM, an 8-core CPU, and CentOS 7.
I run a Laravel application with 500 unique users daily.
I restarted the MySQL, httpd and nginx services and cleared the memory cache; nothing changed. Only a full server reboot helps.
Static files are served normally, but anything served by the PHP application over HTTP is very slow and gets slower day by day.
Logging in with SSH is getting slower too. I use Plesk as the control panel, and it is also getting slower.
In other words, this problem affects not just my application but the whole server.
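A minimal way to start tracking this, sketched below under assumptions (the script path, log file and 5-minute interval are arbitrary choices, not from the post): snapshot a few key numbers on a schedule so a healthy day can be compared against a slow one.

#!/bin/bash
# Hypothetical /root/snapshot.sh - run from cron, e.g. */5 * * * * root /root/snapshot.sh
{
  date
  uptime                                                 # load averages
  free -m                                                # memory/swap in MB
  lsof 2>/dev/null | wc -l                               # open file handles (same check as above)
  ss -s                                                  # socket summary (watch TIME_WAIT growth)
  ps -eo pid,stat,etime,rss,cmd --sort=-rss | head -15   # biggest processes by memory
} >> /var/log/slowdown-snapshots.log 2>&1

Diffing the snapshots from day 1 against day 6 should show which number (open files, sockets, a particular process's memory, load) is actually creeping up.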

Related

php-cgi suddenly very slow with minimal CPU on DMZ server

We have a Moodle IIS implementation where the primary data/IIS server is on our LAN, but we also have a public-facing IIS server in our DMZ. Until recently, performance when accessing Moodle via the DMZ server was on par with accessing via the LAN server, but last week I noticed that access via the DMZ was very slow and I was often getting 500 timeouts. I increased the Activity Timeout for FastCGI and the timeouts disappeared, but the site is now painfully slow.
I monitored Activity Monitor when browsing the site using the LAN server and php-cgi.exe shows CPU goes up while actively browsing (20-25% or so). Monitoring the same on the DMZ server shows no change in CPU utilisation for the php-cgi processes - they all stay at 0-1%.
I moved the DMZ server to the LAN and the performance was immediately as expected: pages loaded quickly and php-cgi CPU utilisation goes up to 20-25% while browsing.
I tested pings and bandwidth by copying files between the LAN and DMZ servers: pings are around 20 ms and bandwidth seems capped at 100 Mbps when going via the DMZ. That was unexpected, but I don't have historic pings to prove that latency used to be lower and bandwidth used to be higher.
Our core network provider recently performed maintenance and access to our DMZ dropped completely for a period until they 'fixed' the issue. It feels like they've introduced a bottleneck recently (traffic now routing through a 100 Mbps adapter?) and I have an open ticket, but I'm not sure how to prove this is the issue.
The only logs I can think to check are for IIS and looking at response-time. It looks like this has gone up 2-4x since the maintenance, but it's not as conclusive as I'd like (I'm guessing due to a good amount being locally cached). Is there anything else I could/should be looking at?
Servers are Windows Server 2012 R2 Datacenter, PHP is 7.4 NTS 64-bit, and Moodle is 3.10.
Many thanks.
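As a hedged sketch of how the suspected 100 Mbps cap could be documented (assuming iperf3 can be installed on both Windows servers; LAN-SRV is a placeholder hostname):

# On the LAN server: start an iperf3 listener
iperf3 -s

# On the DMZ server: 30-second tests in both directions (-R reverses), saved as JSON evidence
iperf3 -c LAN-SRV -t 30 --json > dmz_to_lan.json
iperf3 -c LAN-SRV -t 30 -R --json > lan_to_dmz.json

If both runs top out around 95 Mbit/s while LAN-to-LAN runs reach the expected rate, that is reasonably strong evidence of a 100 Mbps link in the DMZ path to hand to the network provider.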
It is difficult to reproduce your problem from your description. When a server hangs, crashes or performs poorly, it is usually necessary to capture the server's thread stacks (a thread dump) for later analysis. So I suggest you open a case via https://support.microsoft.com; professional technicians there can assist you in capturing the dump file and analyzing it.

Webpage request with 10 pictures and some text freezes on nginx and php-fpm and disconnects other services?

Is it possible that a request for a page, where the server or PHP might have an issue, freezes and even disconnects other, unrelated SSH sessions?
I am running a simple webpage (10 pictures and some text) in a Dockerized environment with a separate reverse proxy, web server and database (nginx, php-fpm and PostgreSQL).
The whole system was up without a restart for a year or so, without problems. Now I have a newly occurring issue (for about a month) with page/system freezes. When I visit my webpage it locks up from time to time (sometimes one visit is enough; other times I need to open it up to 20 times) and needs about 30 seconds to start reacting again. The strange thing is that if I am connected to the server in parallel over SSH, it sometimes (not always) also disconnects my terminal. That is why I believe it has something to do with the system (but I can't find anything there, so I am trying a different perspective here).
server (only remote access available):
Debian GNU/Linux 9.4 (stretch)
Kernel: 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
68 GB RAM, 8 cores, 2x 4 TB HDDs and a 1 TB SSD
1 GBit uplink
I have monitoring installed and there does not seem to be any high load on I/O, network, CPU or anything else during the lock-up (I am not monitoring PHP stats, though). I also have the same setup running on a local test server (different hardware and kernel 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux) and that server has no freezing issues, which again argues against the issue being in the Dockerized environment or my page code.
What I have done so far on the hardware side:
1.) SMART diagnostics - no obvious issues (the backup disk, not the one the servers are stored on, has shown 191 G-Sense_Error_Rate 0x0032 001 001 000 for some time, but the provider ran a separate test a while ago and said the disk has no issue, and that the G-Sense_Error_Rate has little informational value anyhow)
2.) atop over a 1 s interval and 300 samples (i.e. 5 minutes), during which I was able to produce multiple freezes, but there were no obvious load issues (htop and iotop are live views and SSH disconnects, so I can't watch them while the problem occurs; see the atop logging sketch after this list). Granted, this is the first time I am looking at these things, but there was also none of the high-load line colouring that atop does automatically.
3.) I also have a Dockerized monitoring stack running (the freeze occurs both with it running and with it disabled, so it should not come from there either) where I can view the containers separately, and they do not show anything alarming either.
4.) Restarted the whole server - the issue continues.
5.) Ran memtester on 55 of the 65 GB of RAM without issues.
6.) No problems in syslog.
7.) Pinged the server while producing the error: pings are quick at 27 ms, but when the server hangs I lose about 1 ping in 10 (during those 30-40 s; afterwards ping is perfect again). I cannot figure out why that is.
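Since the interactive tools die with the SSH session, one option (a sketch; the file path and duration are arbitrary) is to let atop record to a file in the background and replay it after a freeze:

# Record one sample per second for an hour, detached from the terminal
# so an SSH disconnect does not kill it
nohup atop -w /var/log/atop_freeze.raw 1 3600 >/dev/null 2>&1 &

# After a freeze, replay the recording and step through the samples
# around the time of the lock-up (t = forward, T = backward)
atop -r /var/log/atop_freeze.raw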
Where else could I look?
Any suggestions are highly appreciated!
Thanks!
Strange that this has only started happening within the last few months and was fine previously.
Are you pulling down the latest image for nginx, postgres... etc.? Maybe it's a problem with the version of the images, and you could try pinning to a specific release.
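For example (a sketch only; the tags below are placeholders, not known-good versions for this setup), the running versions can be checked and explicitly pinned tags pulled instead of :latest:

# See which image versions the containers are actually running
docker ps --format '{{.Names}}\t{{.Image}}'

# Pull pinned tags instead of :latest (versions here are illustrative)
docker pull nginx:1.14.2
docker pull postgres:9.6.24
# ...then reference the same pinned tags in the compose file's image: lines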

Apache server "Waiting..." for 40 seconds occasionally

I have an Apache 2.4, PHP and MySQL based web application. The application works like a charm most of the time, with 40 ms response times. It is already performance-tuned with Apache caching, PHP's OPcache and Apache mod_deflate.
However, rarely (once every 2 hours or so), when I open the application in a browser, it hangs at "Waiting..." for some 40 seconds and then picks back up to full speed.
What do you think the possible reasons could be?
Just for information: it is all on localhost, and there is no network involved.
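One hedged way to see where those 40 seconds go (a sketch; the log format name, file paths and thresholds are assumptions): log per-request duration in Apache and enable the MySQL slow query log, then check which side recorded the stall.

# Apache: add %D (request duration in microseconds) to the access log in httpd.conf, then reload:
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
#   CustomLog "logs/access_timed_log" timed

# MySQL: log statements slower than 1 second (SET GLOBAL does not survive a restart)
mysql -e "SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 1;"
mysql -e "SHOW VARIABLES LIKE 'slow_query_log_file';"

# After a stall, list requests that took longer than 10 s (%D is the last field)
awk '$NF > 10000000' logs/access_timed_log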

Nginx php-fpm clogs up with writing connections under high load

We have nginx/1.6.2 running with php5-fpm (PHP 5.6) on a Debian 8 system.
In the past few days we have had higher load than usual due to more users hitting our servers, with most visitors coming in the evening hours between 6 pm and midnight.
For a couple of days now, two different servers running the above setup have shown very slow response times for several hours at a stretch. In Munin we saw that there were suddenly hundreds of nginx connections in the "writing" state where there were previously only about 20 at a time.
We do not get any errors other than timed-out connections on remote hosts when trying to access those servers. All the logs I looked at were normal.
The problem can be fixed with a restart of php5-fpm.
My question now is: why do hundreds of connections suddenly claim they are writing? Is there some known issue or maybe a config setting we missed that could cause this?
Here is the complete list of symptoms we see:
Instead of fewer than 20 very fast active connections at a time, we see 100 to 900 connections in the writing state (all nginx connections hit php5-fpm; static content is not served by these servers). Average runtime for the PHP scripts is 80 ms.
The problem occurs only when the total rate of nginx requests goes above 300/s. It then drops from ~350 to ~250 req/s, but those 250 req/s show up to 900 "writing" connections.
Many of these connections eventually time out and return no correct result.
There are no errors in our logs
The network and database traffic as well as the CPU load correspond to the lower level of 250 req/s to which the total drops, so as far as I can tell no actual "writing" is happening.
For the setup:
As stated above. We use Zend's built-in opcode cache and APCu for some user-variable caching; one of the servers runs a memcache instance (which works fine throughout the problem) and the other runs Redis, which also runs fine while the problem occurs.
Can anyone shed some light on what the problem might be?
Thanks!
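One thing worth checking in this situation (a sketch; the status path and slowlog location are assumptions that depend on the pool config): whether the php5-fpm pool is simply saturated, i.e. every child is busy and nginx queues up "writing" connections waiting for a free one.

# How many FPM children are currently running vs. the pool's pm.max_children
ps -C php5-fpm --no-headers | wc -l

# With pm.status_path = /fpm-status enabled in the pool config,
# "listen queue" and "active processes" show whether requests are piling up
curl -s http://127.0.0.1/fpm-status

# With request_slowlog_timeout set, scripts stuck past the threshold get stack traces here
tail -n 50 /var/log/php5-fpm.slow.log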
We found the problem: APCu seems to be unstable with PHP 5.6.
Details:
debian 8
nginx/1.6.2
PHP 5.6.14-0+deb8u1
APCu 4.0.7 (Revision: 328290, 126M shm_size)
We used xhprof to profile requests while the server was slow (see question) and noticed that APCu took > 100 ms per read/write operation. Clearing the APCu variables did not help. All other parts of the code ran at normal speed.
We completely disabled our use of APCu and the system has been stable since.
So it seems that this APCu version is unstable under load with PHP 5.6, at least for us.
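For reference, a sketch of how APCu could be taken out of the equation entirely on a Debian 8 / PHP 5.6 box like this one (the module and service names are assumptions based on the post; the original poster may simply have removed the apcu_* calls from the code instead):

# Confirm the extension is loaded and see its settings
php -m | grep -i apcu
php -i | grep -i 'apc\.'

# Disable the extension and restart the FPM service
php5dismod apcu
service php5-fpm restart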
We had the same problem, and the reason was that the data in Redis exceeded the "maxmemory" limit, so Redis was unable to write any more data. I could log in with redis-cli but couldn't set a value. If you are having this issue, log in to Redis using redis-cli and try to set something; if Redis memory is full, you'll get an error.
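A quick check along those lines (a sketch; no instance-specific details assumed):

# Compare current memory use against the configured limit and eviction policy
redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human|maxmemory_policy'

# Try a throwaway write; on a full instance with the noeviction policy this fails with
# "OOM command not allowed when used memory > 'maxmemory'"
redis-cli SET __write_test__ 1
redis-cli DEL __write_test__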

Nginx scaling and bottleneck identification on an EC2 cluster

I am developing a big application and I have to load test it. It is an EC2-based cluster with one High-CPU Extra Large instance for the application, which runs PHP / nginx.
This application reads data from a Redis server which holds some 5k-10k key values; it then builds the response, logs the data to a MongoDB server, and replies to the client.
Whenever I send a request to the app server, it does all its computations in about 20-25 ms, which is awesome.
I am now trying to do some load testing, and I run a PHP-based app on my laptop to send requests to the server - many thousands of them quickly over 20-30 seconds. During this load period, whenever I open the app URL in the browser, it replies with an execution time of around 25-35 ms, which is again good. So I am sure that Redis and Mongo are not causing bottlenecks. But it takes about 25 seconds to get the response back during load.
The High-CPU Extra Large instance has 8 GB RAM and 8 cores.
Also, during the load test, the top command shows about 4-6 php-cgi processes consuming some 15-20% of CPU.
I have 50 worker processes on nginx and 1024 worker connections.
What could be the issue causing the bottleneck?
If this doesn't work out, I am seriously considering moving to a Java application with an embedded web server and an embedded cache.
UPDATE: increasing PHP_FCGI_CHILDREN to 8 halved the response time during load.
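For context on that update, a sketch of how the php-cgi child count is typically raised when PHP runs as FastCGI behind nginx (the bind address and the exact values are assumptions, not the poster's actual config):

# Environment variables read by php-cgi when it manages its own children
export PHP_FCGI_CHILDREN=8         # roughly one child per core; this is the change that helped here
export PHP_FCGI_MAX_REQUESTS=1000  # recycle children periodically
php-cgi -b 127.0.0.1:9000          # bind the FastCGI pool that nginx proxies to

# Verify the children were actually spawned
ps aux | grep '[p]hp-cgi' | wc -l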
50 worker processes is too many; you need only one worker process per CPU core. Using more worker processes causes extra inter-process switching, which wastes time.
What you can do now:
1. Set worker processes to the minimum (one worker per CPU core, e.g. 4 worker processes if you have 4 CPU cores), but set worker connections to the maximum (10240, for example); see the verification sketch below.
2. Tune the TCP stack via sysctl. You can hit stack limits if you have many connections.
3. Get statistics from the nginx stub_status module (you can use Munin + nginx; it is easy to set up and gives you enough information about system status).
4. Check the nginx error.log and the system messages log for errors.
5. Tune nginx (decrease connection timeouts and maximum request size).
I hope that helps.
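A sketch of how the suggestions above could be verified on a running box (file paths and the stub_status URL are assumptions):

# Current worker settings in the loaded config (on older nginx, grep /etc/nginx/nginx.conf instead)
nginx -T 2>/dev/null | grep -E 'worker_processes|worker_connections'
grep -c ^processor /proc/cpuinfo        # CPU core count to match worker_processes against

# Kernel/TCP limits a sysctl tuning pass would look at
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.ip_local_port_range

# Live connection counters, assuming stub_status is exposed at /nginx_status
curl -s http://127.0.0.1/nginx_status

# Recent errors
tail -n 50 /var/log/nginx/error.log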
