Debugging the Cause of Stuck PHP Processes

I'm trying to figure out what is causing my system to open a large number of PHP processes. This issue has occurred 3 times over the last 2 weeks, and it can crash our application if it goes undetected for several hours: once it has opened 300 database connections, nobody else can connect.
The application is based on CakePHP 2.x and runs across multiple EC2 instances that share an RDS database.
The primary indicator that something is going wrong is a high number of database connections. We have CloudWatch monitoring set up to notify us in Slack when average connections stay above 40 for more than 5 minutes (normally connections don't go much above 10).
Looking at New Relic I can also see that the number of PHP processes steadily increased by one per minute. This is on our operations server, which only handles background processing and tasks and does not serve any web traffic.
Over the same time the graphs on the web servers appear normal.
New Relic's information on long-running processes shows nothing to suggest that any PHP process ran for 20+ minutes. However, these processes were killed manually, which may be why they are not visible in New Relic - I believe it may not record processes that are killed.
While this issue has now occurred 3 times, I'm still unsure what is causing the problem or how to find out what a particular running PHP process is doing.
The last time this happened I could see all the PHP processes running, and could see they had been running for some time, but I had no idea what they were doing or how to find out, and to prevent the database from becoming overloaded I had to kill them all.
Are there any tools, or other information I am overlooking, that might help me determine which particular process is causing this issue?
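For context, one idea I'm experimenting with is a small watchdog script on the operations server that snapshots the MySQL process list whenever connections climb, so the offending queries and hosts get recorded even if I end up killing the PHP processes again. A rough sketch (host, credentials, threshold and log path are placeholders):

<?php
// Snapshot the process list so stuck connections can be traced to a host and query later.
// Host, credentials, threshold and log path below are placeholders.
$pdo  = new PDO('mysql:host=my-rds-endpoint;dbname=information_schema', 'user', 'pass');
$rows = $pdo->query('SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO FROM PROCESSLIST')
            ->fetchAll(PDO::FETCH_ASSOC);

if (count($rows) > 40) {  // same threshold as the CloudWatch alarm
    file_put_contents(
        '/var/log/processlist-snapshots.log',
        date('c') . ' ' . json_encode($rows) . PHP_EOL,
        FILE_APPEND
    );
}

Run from cron every minute, that would at least answer "what were those connections doing" after the fact.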

Related

How can I open multiple copies of phpMyAdmin simultaneously to our MySQL database server when it is busy?

This is my first nervous question on SO because all of my questions in the last decade have already had excellent answers.
I have searched all the terms that I can think of with no hits that appear to address the problem - either on SO or Google generally...
For the last 15 years we have used phpMyAdmin to administer a Linux MySQL manufacturing database of about 100 tables, some of which now hold 50 to 300 million records each. Ongoing development is constant, and manual lookups of various tables to correct erroneous data, or changes to table indexes etc., are frequent as the data grows. All of this is internal to our fast network, i.e. accessed via our intranet. Most queries are short, and the database runs responsively at a low average load.
As may be understood, DBA mistakes happen. For example, to speed up a slow query, an additional index may be added to a large table without enough thought. At that point the re-indexing may take 30 minutes, and the manufacturing applications (written in PHP for Apache2, also on a Linux server) come to an immediate halt. This is not appreciated in the factory.
And here is the real problem: I cannot then, from my development PC, open a second instance of phpMyAdmin to kill the unwanted MySQL process while it is still busy, which is the very time I need to the most :-) The browser just sits waiting for the phpMyAdmin page to load until the long query has finished.
If I happen to have a second instance of phpMyAdmin open already, I can look up the process and kill it satisfactorily. Normally, my only resort is to restart Apache2 and/or MySQL on the server. This is too drastic, and it requires restarting many client machines as well in order to re-establish the necessary manufacturing connections to the database.
I have seen reference on SO that Apache will queue requests from the same IP address in the case of php programs using file-based session management, but it seems to me that I have no control over how phpMyAdmin uses its sessions.
I also read some time ago that if multiple CPU cores were brought into play on the database server, multiple simultaneous connections could be made despite one such query still being busy. I cannot now find any reference to this concept.
Does anyone please know how to permit or force a second phpMyAdmin connection from the same PC to the same database server using phpMyAdmin while the first instance of phpMyAdmin is still tied up with a previous slow query?
Many thanks, Jem Stanners
Try MySQL Workbench:
https://dev.mysql.com/downloads/workbench/
Try upgrading the server's RAM and processors.
Consider cleaning the tables and deleting rows if possible.
Consider shifting to Oracle (cost is to be considered).
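A further workaround along the same lines (a minimal sketch, not a polished tool): keep a tiny standalone PHP page on the server that opens its own connection and can kill a runaway query by ID, so you are not dependent on phpMyAdmin's session while the slow operation is running. Host and credentials are placeholders, and the page should obviously be access-restricted:

<?php
// kill-query.php - emergency kill switch that does not go through phpMyAdmin's session.
// Call it as kill-query.php?id=12345 to kill that MySQL process, or with no id to list them.
$mysqli = new mysqli('db-host', 'admin_user', 'secret');  // placeholders

$id = (int) ($_GET['id'] ?? 0);
if ($id > 0) {
    $mysqli->query('KILL ' . $id);
    echo "Sent KILL for process $id";
} else {
    $res = $mysqli->query('SHOW FULL PROCESSLIST');
    while ($row = $res->fetch_assoc()) {
        echo $row['Id'] . "\t" . $row['Time'] . "s\t" . substr((string) $row['Info'], 0, 120) . "\n";
    }
}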

Apache server slow under high HTTP API call load

I am running an HTTP API which should be called more than 30,000 times per minute simultaneously.
Currently I can call it 1,200 times per minute. At 1,200 calls per minute, all the requests are completed and get a response immediately.
But if I call it 12,000 times per minute simultaneously, it takes 10 minutes to complete all the requests, and during those 10 minutes I cannot browse any webpage on the server. It is very slow.
I am running CentOS 7
Server specification:
CPU: Intel® Xeon® E5-1650 v3 hexa-core (Haswell)
RAM: 256 GB DDR4 ECC
Hard drive: 2 x 480 GB SSD (software RAID 1)
Connection: 1 Gbit/s
API: a simple PHP script that echoes the timestamp
<?php echo time();
I checked with the top command and there is no load on the server.
Please help me with this.
Thanks
Sounds like a congestion problem.
It doesn't matter how quick your script/page handling is: if the next request comes in within the execution time of the previous one, it is going to use resources (CPU, RAM, disk, network traffic and connections) and make everything running in parallel to it slower.
There are multiple things you could do, but you need to figure out what exactly the problem is for your setup and decide if the measure produces the desired result.
If the core problem is that resources get hogged by parallel processes, you could lower the connection limits so more connections go into wait mode, which keeps more resources available for actually handing out a page instead of congesting everything even more.
Take a look at this:
http://oxpedia.org/wiki/index.php?title=Tune_apache2_for_more_concurrent_connections
If the server accepts connections quicker than it can handle them, you are going to have a problem whichever setting you change; it should start dropping connections at some point. If you cram French baguettes down its throat quicker than it can open its mouth, it is going to suffocate either way.
If the system gets overwhelmed on the network side of things (transfer speed limits, the maximum number of concurrent connections the OS allows, etc.), then you should consider using a load balancer. Only after the load balancer confirms the server has the capacity to actually take care of the page request will it send the user on.
This usually works well when you do any kind of processing which slows down page loading (server side code execution, large volumes of data etc).
Optimise performance
There are many ways to execute PHP code on a webserver, and I assume you use Apache. I am no expert, but there are modes like CGI and FastCGI, for example, which can greatly enhance execution speed, and tweaking the settings connected to these can also show you what is happening. It could for example be that you are using too few PHP worker processes to handle that number of concurrent connections.
Have a look at something like this for example
http://blog.layershift.com/which-php-mode-apache-vs-cgi-vs-fastcgi/
There is no 'best fit for all' solution here. To fix it, you need to figure out what the bottleneck for the server is, and act accordingly.
12,000 calls per minute == 200 calls a second.
You could limit your test case to a multiple of those 200 and increase/decrease it while changing settings. Your goal is to dish out that number of requests in as short a time as possible, ensuring the congestion never occurs.
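As a rough way to run that experiment from PHP itself (the URL and batch size are placeholders), you could fire a batch of concurrent requests with curl_multi and time how long the batch takes, then repeat with larger batches while you tweak settings:

<?php
// Fire $batch concurrent requests at the API and time the whole batch.
$url   = 'http://your-server/api.php';  // placeholder
$batch = 200;                           // one second's worth at 12,000 calls/minute

$mh = curl_multi_init();
$handles = [];
for ($i = 0; $i < $batch; $i++) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

$start = microtime(true);
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
printf("%d requests finished in %.2f seconds\n", $batch, microtime(true) - $start);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);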
That said: consequences.
When you implement changes to optimise for the maximum number of page loads you want to achieve, you inadvertently introduce other conditions. For example, if maximum RAM usage by Apache is the problem, then raising that limit will give better performance, but it increases the chance that the OS runs out of memory when other processes also want to claim more.
Adding a load balancer adds another possible layer of failure and possible slowdowns. Yes, you prevent congestion, but is it worth the slowdown caused by the rerouting?
Upping performance will increase the load on the system, making it possible to accept more concurrent connections, so somewhere along the line a different bottleneck will pop up. High traffic on any process can always end in that process crashing. Apache is a very well built web server, so in theory it should protect you against that, but tweaking settings wrongly can still cause crashes.
So experiment with care and test before you use it live.

Server overload due to multiple xhr requests

Recently I started experiencing performance issues with my online application hosted on Bluehost.
I have an online form that takes a company name, with an "onKeyUp" event handler tied to that field. Every time you type a character into the field, it sends a request to the server, which makes multiple MySQL queries to get the data. The MySQL queries all together take about 1-2 seconds, but since a request is sent after every character typed, it easily overloads the server.
The solution to this problem was to cancel the previous XHR request before sending a new one, and it seemed to work fine for me (for about a year) until today. I'm not sure if Bluehost changed any configuration on the server (I have a VPS) or any PHP/Apache settings, but right now my application is very slow due to the number of users I have.
I would understand a gradual decrease in performance caused by database growth, but this happened suddenly over the weekend and speeds dropped roughly tenfold: a request that used to take about 1-2 seconds now takes 10-16 seconds.
I connected to the server via SSH and ran a stress test, sending lots of queries to see what the process monitor (top) would show. As I expected, a PHP process was created for every new request and put in a queue for processing. This queue waiting apparently accounted for most of the wait time.
Now I'm confused: is it possible that before (the hypothetical changes on the server), every XHR abort actually caused the corresponding PHP process to quit, reducing the extra load on the server and therefore making it faster, and that now for some reason this no longer works?
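For what it's worth, PHP does not necessarily stop when the browser aborts the XHR: by default it only notices the broken connection the next time it tries to send output, and with output buffering even that is not guaranteed. A rough sketch of how the endpoint could check for an abort between queries ($pdo and the query loop are illustrative):

<?php
ignore_user_abort(false);   // the default: allow the script to be stopped on abort

foreach ($queries as $sql) {            // illustrative: one slow lookup per step
    $stmt = $pdo->query($sql);
    // ... collect results ...

    echo ' ';                           // attempt to send output so PHP notices a dropped connection
    flush();
    if (connection_aborted()) {
        exit;                           // stop working for a request nobody is waiting on
    }
}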
I have WAMP installed on Windows 7 as my test environment, and when I export the same database and run the stress test locally it is fast, just like it used to be on the server. But on Windows I don't have a handy process monitor like top, so I cannot see whether PHP processes are actually being created and killed respectively.
Not sure how to do the troubleshooting at this point.
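One thing that may be worth ruling out first (the same file-based session queuing mentioned in the phpMyAdmin question above): with PHP's default file-based sessions, the session file is locked for the whole request, so several autocomplete requests from the same browser are handled one at a time, which looks exactly like processes waiting in a queue. A minimal sketch, assuming the lookup endpoint only reads the session:

<?php
session_start();
$userId = $_SESSION['user_id'] ?? null;  // read whatever the endpoint needs first (placeholder)

// Release the session lock so parallel requests from the same browser
// are not serialized while the MySQL queries run.
session_write_close();

// ... run the company-name lookup queries and echo the response ...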

PHP/Mysql Connection Time

I have a script which attempts to stop long running queries on a MySQL server. The logic - when the server starts to slow down for whatever reason, it accumulates a rush of queries as each user refreshes his page, each connection hanging in a queue, not being stopped by PHP's time limit, and preventing new connections. In addition, a mistaken query might use a lot of resources.
I encountered a strange situation recently with this system. We have two cron scripts running constantly. Normally their connections never show a Time of more than 1 in SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST. For some reason, the other day, these connections' times increased to 50+ seconds, but they did not have a query attached to them. I was not able to see this live, but it was recorded in the log clearly enough to be traced back to these processes.
My question is why these connections would suddenly increase in duration, especially given that they did not have a query but were in sleep mode. (As proof, my logs showed Time=78, State='', Info=0 - not sure why 0.) In PHP I am using PDO with the standard options, except for an ATTR_TIMEOUT of 30 for CLI scripts. There was also slowness reported on the site at the time of these problem connections.
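One detail that might be related (a sketch only, with connection details and the work function as placeholders): a connection that a long-running cron script keeps open but idle shows up exactly like that in the process list, Command=Sleep, no query, and a Time that keeps growing. Closing the handle while the script does non-database work keeps it from sitting there:

<?php
foreach ($jobs as $job) {                                        // $jobs is illustrative
    $pdo = new PDO($dsn, $user, $pass, [PDO::ATTR_TIMEOUT => 30]);
    // ... run the queries this job needs ...
    $pdo = null;                  // close the connection; left open, it sleeps and holds a slot

    process_results_slowly($job); // placeholder for non-database work
}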

What can be causing an "exceeded process limit" error?

I launched a website about a week ago and I sent out an email blast to a mailing list telling everyone the website was live. Right after that the website went down and the general error log was flooded with "exceeded process limit" errors. Since then, I've tried to really clean up a lot of the code and minimize database connections. I will still see that error about once a day in the error log. What could be causing this error? I tried to call the web host and they said it had something to do with my code but couldn't point me in any direction as to what was wrong with the code or which page was causing the error. Can anyone give me any more information? Like for instance, what is a process and how many processes should I have?
Wow. Big question.
Obviously, you're maxing out your Apache child worker processes. To get a rough idea of how many you can create, use top to get the rough memory footprint of one httpd process. If you are using WordPress or another CMS, it could easily be 50-100 MB each (if you're using the PHP module for Apache). Then, assuming the machine is only used for web serving, take your total memory, subtract a chunk for OS use, and divide the rest by 100 MB (in this example). That's the maximum number of worker processes you can have; set it in your httpd.conf. Once you do this and restart Apache, monitor top and make sure you don't start swapping memory. If you do, you have set the number of workers too high.
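To make that concrete with assumed numbers: on a machine with 8 GB of RAM, reserving 1 GB for the OS and measuring roughly 100 MB per Apache/PHP process, (8192 - 1024) / 100 ≈ 70, so you would cap MaxClients (MaxRequestWorkers on Apache 2.4) at around 70.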
If there is any other stuff running, like a MySQL server, make space for that before you compute the number of workers you can have. If this number is small, to roughly quote a great man, 'you're gonna need a bigger boat'. Just kidding. If you see really high memory usage for an httpd process, say over 100 MB, you can set the max-requests-per-child limit lower to shorten the life of each httpd process, which helps clean up bloated workers.
Another area to look at is the response time for a request: how long does each request take? For a quick check, use the Firebug plugin for Firefox and look at the 'Net' tab to see how long your initial request takes to get a response (not images and such). If requests are taking more than 1 or 2 seconds to respond, that's a big problem, as you get a sort of logjam. The cause could be PHP code or MySQL queries taking too long; if you're using WordPress, make sure you use a good caching plugin to lower the stress on MySQL.
Honestly, though, unless you're simply under-utilizing memory by having too few workers, optimizing your Apache isn't something easily addressed in a short post without details on your server (memory, CPU count, etc.) and your httpd.conf settings.
Note: if you don't have server access you'll have a hard time figuring out memory usage.
The process limit is typically something enforced by shared web host providers, and it generally refers to the number of processes executing under your account. This will typically equate to the number of connections made to your server at once (assuming one PHP process per connection).
There are many factors that come into play. You should figure out what that limit is from your hosting provider, and then find a new one that can handle your load.
