Debugging MySQL "gone away" errors, randomly getting multiple sleeping threads - PHP

To start with the issue itself: my server randomly has issues where pages stop loading for a few minutes (often hitting a 300-second timeout), or load extremely slowly. It does not necessarily correlate with an increase in traffic, but it is more likely to happen when traffic rises. Sometimes I can just be messing around, things get slow for a couple of minutes, then everything is back to normal, with the same files being accessed. The system itself is very simple: it does nothing beyond basic data insertion and reading, and no table has more than a couple of thousand rows. Google Analytics shows concurrent access peaking at around 300 per hour at busy times.
This is an old project that recently came to our company. The PHP code is very old, and I spent a lot of time refactoring queries to use a centralized PDO connection instead of calling mysql_connect before every query. However, these issues did not happen on the old server, only on the new one.
Now for the catch: our hosting service said this was actually an issue on their server, and I couldn't find a cause in the code either. After a couple of months we moved to a new, dedicated server at a different host... and the issue remains. So there has to be something wrong somewhere in the code, or a configuration I need to change.
Thing is, there's no specific file or script causing it. Now that I have WHM access, I can see there are no issues with memory or CPU usage. The slow query log is empty, with the logging threshold set at 20 seconds. What happens is that everything just runs very slowly, then returns to normal just as suddenly.
The errors I can see in the log are MySQL "server has gone away", or others like
AH01071: Got error 'Primary script unknown
or
(70007)The timeout specified has expired: [client IP] AH01075: Error dispatching request to : (polling)
(The timeout is set at 300 seconds; nothing should take more than a couple of seconds to run.)
I turned on the general log and noticed one interesting thing: when the slowdown happens, some threads take more than a minute to close. I query the log like this:
SELECT *,
COUNT(DISTINCT(command_type)) as conta,
MIN(event_time) as inicio,
MAX(event_time) as fim,
timediff(MAX(event_time),MIN(event_time)) as diferenca
FROM `general_log`
WHERE user_host LIKE '%hostname%'
GROUP BY thread_id HAVING diferenca > "00:01:00"
And it shows a couple of results, one of which I know for a fact lines up with a perceived slowness. More interestingly, the last command before the slowdown is always this SELECT:
SELECT * FROM publicidade WHERE secao = 1 ORDER BY posicao ASC, data_insercao DESC LIMIT 2
Thing is, this SELECT always runs fast, on a table with 29 rows, returning 1 result. It makes no sense for it to break things, but it is always the last query shown; I even found two situations where the thread never reached QUIT.
So at this point I'm at my wits' end. The problem keeps happening, it's intermittent, it comes and goes for no discernible reason, and I cannot find out why. The slow query log is empty, so it's not as if a query is hanging; the connection just goes to sleep after this query until it times out, with no further PHP errors, nothing. And then minutes later the same scripts run fine as if nothing had happened. I have even seen situations where www and non-www gave different results (one fine, the other not), as well as direct access via IP.
How can I debug what's going on? What could be a likely reason?
By the way, persistent connections are off.
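In the meantime, here is the kind of timing instrumentation I can bolt onto the shared PDO connection to catch the stall from the PHP side. This is a rough sketch, not the real code; the function name, the 1-second threshold, and the log path are placeholders:

<?php
// Thin wrapper around the shared PDO instance: run a query, time it,
// and log anything that stalls (threshold and log path are arbitrary).
function timedQuery(PDO $pdo, string $sql, array $params = []): array
{
    $start = microtime(true);
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $elapsed = microtime(true) - $start;
    if ($elapsed > 1.0) {
        error_log(sprintf("[slow %.2fs] %s\n", $elapsed, $sql), 3, '/tmp/query_timing.log');
    }
    return $rows;
}

If the publicidade SELECT above ever shows up in that log with a multi-second time, the stall is between PHP and MySQL rather than inside the query itself.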

Unfortunately, this is simply not enough information to recommend anything specific.
Have you checked server I/O?
You should check for reverse DNS lookup issues if your MySQL user permissions are granted per domain name (you can try switching to per-IP grants).
You can try MySQLTuner-perl; it might give you some hints you missed.
You can also try the Webyog tools (trial); they might give you some clues.
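To test the reverse-lookup theory concretely, you can time raw connects in a loop; if reverse DNS is the problem, an occasional connect will take whole seconds instead of milliseconds. A minimal sketch (the DSN and credentials are placeholders):

<?php
// Probe raw MySQL connect latency: a reverse-DNS stall shows up as an
// occasional multi-second connect (DSN and credentials are placeholders).
$dsn = 'mysql:host=db.example.com;dbname=test;charset=utf8';
for ($i = 1; $i <= 20; $i++) {
    $start = microtime(true);
    try {
        $pdo = new PDO($dsn, 'user', 'pass', [PDO::ATTR_TIMEOUT => 10]);
    } catch (PDOException $e) {
        echo "connect $i failed: {$e->getMessage()}\n";
        continue;
    }
    printf("connect %2d: %.3fs\n", $i, microtime(true) - $start);
    $pdo = null; // close before the next attempt
    sleep(1);
}

If the connects turn out to be the slow part, per-IP grants (or skip_name_resolve in my.cnf) are the usual fix.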

Related

Possible causes for connection interrupted, LAMP stack

MySQL 5.1.73
Apache/2.2.15
PHP 5.6.13
CentOS release 6.5
Cakephp 3.1
After about 4 minutes (3 minutes, 57 seconds) the import process I'm running stops. There are no errors or warnings in any log that I can find. The import process consists of a lot of SQL calls and data processing, nothing too crazy, but it can take about 10 minutes to get through 5,500 records if it's doing a full compare for updates.
Firefox: Secure Connection Failed - The connection to the server was reset while the page was loading.
Chrome: ERR_NO RESPONSE
The PHP time limit is set to 900, and it is working: I can set it to 5 seconds and get an error. The limit is not being reached.
I can make another controller sleep for 10 minutes, and this error does not happen, indicating that something in the actual program is causing the failure, not the hosting service killing the request because it takes too long (I've read about VPS providers doing this to prevent spam).
PHP error reporting is turned all the way up in php.ini and, just to be sure, in the controller itself.
The import process completes if I reduce the size of the file being imported. If the file is just long enough, it will complete AND still show the browser error. This indicates to me that it's not failing at the same point of execution each time.
I have cleared all caches and restarted the server.
I do not see any output in the Apache logs other than that the request was made.
I do not see any errors in the MySQL log; however, I don't know whether that's because logging isn't turned on.
The exact same code works on my local machine without any issue. It's not a perfect match for the server, but it's close: Ubuntu Desktop vs CentOS, PHP 5.5 vs PHP 5.6.
I have kept an eye on the memory usage and don't see any issues there.
At this point I'm looking for any good suggestions on what else to look at or insights into what could be causing the failure. There are a lot of possible places to look, and without an error, it's really difficult to narrow down where the issue might be. Thanks in advance for any advice!
UPDATE
After taking a closer look at the memory usage during the request, I noticed it was getting much higher than it should.
The httpd (Apache) process gets killed and a new one is spawned. Once the new process runs out of memory, the error shows up on the screen. When I had looked previously, usage was only at 30%, probably because the old process had just been killed. Watching it the whole way through, I saw it get as high as 80%, which together with the other processes was enough to run the machine out of memory, and a killed process can't log anything, hence the lack of errors and warnings. It is interesting to me that the process just starts right back up.
I found a command to show which processes had been killed due to memory, which proved very useful:
dmesg | egrep -i 'killed process'
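To watch the memory climb before the OOM killer fires, I also added high-water-mark logging inside the import loop, roughly like this (the loop body is a stand-in for the real per-record work):

<?php
// Log PHP's memory high-water mark every 100 records during the import.
$records = range(1, 5500); // stand-in for the real rows being imported
foreach ($records as $i => $record) {
    // ... the real per-record compare/update work goes here ...
    if ($i % 100 === 0) {
        error_log(sprintf(
            'record %d: %.1f MB current, %.1f MB peak',
            $i,
            memory_get_usage(true) / 1048576,
            memory_get_peak_usage(true) / 1048576
        ));
    }
}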
I did have similar problems with DebugKit.
I had a bug in my code during the memory peak, and the context was being written to the HTML in the error "log".

PHP/MySQL Connection Time

I have a script that attempts to stop long-running queries on a MySQL server. The logic: when the server starts to slow down for whatever reason, it accumulates a rush of queries as each user refreshes their page, each connection hanging in a queue, not being stopped by PHP's time limit, and preventing new connections. In addition, a mistaken query might use a lot of resources.
I encountered a strange situation with this system recently. We have two cron scripts running constantly. Normally, their connections never show a Time greater than 1 in SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST. For some reason, the other day, these connections grew to 50+ seconds, but they did not have a query attached to them. I was not able to see this live, but it was recorded in the log clearly enough to be traced back to these processes.
My question is: why would these connections suddenly increase in duration? Especially given that they did not have a query, but were in Sleep mode. (As evidence, my logs showed Time=78, State='', Info=0; I'm not sure why 0.) In PHP, I am using PDO with the standard options, except an ATTR_TIMEOUT of 30 for CLI scripts. Slowness on the site was also reported at the time of these problem connections.
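For reference, the core of the kill script is roughly this (simplified; the 30-second threshold and the credentials are placeholders):

<?php
// Find connections that have been running a query longer than the
// threshold and kill them (threshold and credentials are placeholders).
$pdo = new PDO('mysql:host=localhost', 'watchdog', 'secret');
$rows = $pdo->query(
    "SELECT ID, USER, COMMAND, TIME, INFO
       FROM INFORMATION_SCHEMA.PROCESSLIST
      WHERE COMMAND = 'Query' AND TIME > 30"
)->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $p) {
    error_log("killing {$p['ID']} ({$p['USER']}, {$p['TIME']}s): {$p['INFO']}");
    $pdo->exec('KILL ' . (int)$p['ID']);
}

The puzzle is that the cron connections above would not match this filter at all: they sat in COMMAND = 'Sleep' with no query, yet their TIME kept growing.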

PHP/MySQL: What would cause a white page even though displaying PHP errors is turned on?

You'll have to excuse my lack of detail in this question, as I am still trying to work out what's going on.
I understand there may not be a straight answer to this but any help I can get will help me further debug the issue.
My issue is that, all of a sudden, my PHP script will exit and display a white page: no PHP or MySQL errors on the page, and none in the error logs.
The issue occurs at very random times. When it does occur, it "appears" to be when a large number of MySQL queries run at once. By large, I mean a few hundred when sending out emails, sometimes thousands if a large import is occurring.
The last time this happened was last night, when a user tried to send out 118 SMS messages. After each SMS was queued and also stored in the archive, there would have been roughly a couple of hundred queries.
I tried to replicate the issue today by sending 125 and then 250 SMS messages on two different occasions. Both worked fine. I then tried sending 250 SMS messages and 250 emails, and that also worked fine.
I am using Amazon Elastic Beanstalk for my PHP pages and RDS for my MySQL database.
Does this sound like a PHP or a MySQL issue? And if neither is giving me anything in the error logs, do you have any suggestions on how to debug this further? Are there other hidden logs, or logging I should turn on?
Or are there any MySQL or PHP settings I should look at to try to get around the issue?
Configuration side:
First, look into the server's error log (it is different from the PHP error log). Apache, for example, has its own log files for module startup, server messages, and so on. PHP's error log is separate, so the absence of messages there doesn't mean anything.
Second, look into php.ini and check your log settings: which level of errors gets written.
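A quick way to verify what PHP is actually configured to log at runtime, regardless of what you believe php.ini says (edits sometimes don't take effect because of a wrong ini file or per-directory overrides):

<?php
// Dump the effective error-logging configuration at runtime.
var_dump(
    ini_get('display_errors'),
    ini_get('log_errors'),
    ini_get('error_log'),
    error_reporting()
);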
Program side:
First, split your code so that it processes a maximum of 50 records per run, and redo your scripts so that they run and re-run until all necessary actions have been executed (a minimal shape for this is sketched below).
Second, look into your time/memory limits: are they sufficient for your operations? Say sending one mail takes 1 second; if your time limit is 30 seconds, you can send a maximum of 30 emails per run. This relates to the first point, since you want to partition your tasks into segments that can safely execute within the given limits.
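Here is a minimal shape for that kind of batching (a sketch only; the table, file path, batch size, and credentials are all placeholders):

<?php
// Process at most 50 records per run, remembering the offset between
// runs; re-invoke the script until the queue is drained.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholders
$batchSize = 50;
$offset = (int)@file_get_contents('/tmp/import_offset'); // 0 on the first run
$stmt = $pdo->prepare('SELECT * FROM import_queue LIMIT :off, :lim');
$stmt->bindValue(':off', $offset, PDO::PARAM_INT);
$stmt->bindValue(':lim', $batchSize, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $row) {
    // ... send one email / import one record here ...
}
file_put_contents('/tmp/import_offset', $offset + count($rows));
// when count($rows) < $batchSize, everything has been processed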
If this helps anyone, the issue ended up being my DNS provider (Route 53). Even though I had increased my PHP time limit (max_execution_time), my DNS provider had a time limit of 60 seconds, so as soon as the 60 seconds ticked past, it killed the request. That's why I didn't get any errors.
I've increased this limit, but I will also be taking another look at my code :)

What can be causing an "exceeded process limit" error?

I launched a website about a week ago and sent out an email blast to a mailing list telling everyone the website was live. Right after that, the website went down and the general error log was flooded with "exceeded process limit" errors. Since then, I've tried to clean up a lot of the code and minimize database connections, but I still see that error about once a day in the log. What could be causing it? I called the web host and they said it had something to do with my code, but they couldn't point me toward what was wrong or which page was causing the error. Can anyone give me more information? For instance, what is a process, and how many processes should I have?
Wow. Big question.
Obviously, you're maxing out your Apache child worker processes. To get a rough idea of how many you can create, use top to find the rough memory footprint of one httpd process. If you are using WordPress or another CMS, it could easily be 50-100 MB each (if you're using the PHP module for Apache). Then, assuming the machine is only used for web serving, take your total memory, subtract a chunk for OS use, and divide the remainder by 100 MB (in this example); that's the maximum number of worker processes you can have. Set it in your httpd.conf. Once you do this and restart Apache, monitor top and make sure you don't start swapping memory; if you do, you have set the number of workers too high.
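As a back-of-the-envelope version of that calculation (a sketch for Linux; the 1 GB OS reserve and the 100 MB per-worker figure are assumptions to replace with your own measurements from top):

<?php
// Rough MaxClients estimate from total RAM on Linux.
$meminfo = file_get_contents('/proc/meminfo');
preg_match('/MemTotal:\s+(\d+) kB/', $meminfo, $m);
$totalMb     = (int)($m[1] / 1024);
$osReserveMb = 1024; // assumption: leave ~1 GB for the OS
$perWorkerMb = 100;  // assumption: footprint of one httpd process
printf("suggested MaxClients: %d\n", (int)(($totalMb - $osReserveMb) / $perWorkerMb));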
If anything else is running, like a MySQL server, make space for it before you compute the number of workers you can have. If that number is small, to roughly quote a great man, 'you're gonna need a bigger boat'. Just kidding. If you see really high memory usage for an httpd process, like over 100 MB, you can lower the max requests per child to shorten the life of each httpd process; this can help clean up bloated workers.
Another area to look at is the response time per request: how long does each request take? For a quick check, use the Firebug plugin for Firefox and look at the 'Net' tab to see how long your initial request takes to respond (not images and such). If for some reason requests take more than 1 or 2 seconds to respond, that's a big problem, as you get a sort of logjam. The cause could be PHP code or MySQL queries taking too long. To address the latter, if you're using WordPress, make sure to use a good caching plugin to lower the stress on MySQL.
Honestly, though, unless you're simply underutilizing memory by having too few workers, optimizing Apache isn't something easily addressed in a short post without details on your server (memory, CPU count, etc.) and your httpd.conf settings.
Note: if you don't have server access, you'll have a hard time figuring out memory usage.
The process limit is typically enforced by shared hosting providers, and it generally refers to the number of processes executing under your account. This will typically equate to the number of simultaneous connections to your server (assuming one PHP process per connection).
Many factors come into play. You should find out from your hosting provider what that limit is, and then find a plan (or a new host) that can handle your load.

MySQL progressively decaying...?

I have a Zend/PHP script that reads rows from a table in one MySQL DB, transforms the data, and adds rows to a second table in another MySQL DB.
As I've been debugging the script, it has been getting less and less far along before throwing an error. Right now it quits after adding 60 rows; in the beginning it was adding 300+. The source data hasn't changed.
I've got try...catch blocks around every ounce of code, and I'm not getting anything but a generic "broken"-style error. It's possible something in the Zend Framework is throwing an error that isn't being caught, but I don't understand the relation to the number of rows being added.
It literally went from adding 83 rows, to 80, to 74, to 63, to 60... with no code changes in between. I emptied the target database between tries, I've optimized and flushed the database, I've restarted MySQL, and I've restarted the WHOLE DARN SERVER... and it sticks to the same pattern.
Any wild guesses on what I could look at or try?
There must be data accumulating somewhere on disk (or on another machine you are not considering), or else you would not see a slowdown that survives restarts. I would have guessed a memory leak eating up VM and gradually forcing the machine into heavy swapping, slowing things down; but that, again, should not persist across restarts.
That is assuming you are not hitting some funky hardware error, like dead cooling on a Pentium 4 or newer CPU, causing the CPU itself to gradually heat up and throttle down in response, making your script able to do less and less work before it hits the script execution time limit configured in PHP.
The problem turned out to be a memory leak of some kind in MySQL: after persuading the host to upgrade to a more 21st-century version, the problem vanished. We did verify the memory leak before doing so.
