I have a script which attempts to stop long-running queries on a MySQL server. The logic: when the server starts to slow down for whatever reason, it accumulates a rush of queries as each user refreshes their page, each connection hanging in a queue, not being stopped by PHP's time limit, and blocking new connections. In addition, a mistaken query might use a lot of resources.
I recently encountered a strange situation with this system. We have two cron scripts running constantly. Normally, their connections never show a Time greater than 1 in "SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST". For some reason, the other day, these connections climbed to 50+ seconds, but they did not have a query attached to them. I was not able to see this live, but it was recorded in the log clearly enough to be traced back to these processes.
My question is: why would these connections suddenly increase in duration? Especially given that they had no query attached and were in sleep mode. (As proof, my logs showed Time=78, State='', Info=0 - not sure why 0.) In PHP, I am using PDO with the standard options, except an ATTR_TIMEOUT of 30 for CLI scripts. Also, slowness on the site was reported at the time of these problem connections.
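For reference, here is a minimal sketch of the kind of watchdog script I mean (the credentials and the 30-second threshold are placeholders; the monitoring user needs the PROCESS privilege to see other users' threads and SUPER or CONNECTION_ADMIN to kill them):

<?php
// Minimal watchdog sketch (placeholder credentials). It kills statements that
// have been running longer than the threshold, and only logs idle ("Sleep")
// connections, since those have no query attached (INFO is NULL for them).
$pdo = new PDO('mysql:host=localhost', 'watchdog', 'secret', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$threshold = 30; // seconds a query may run before being stopped

$rows = $pdo->query(
    'SELECT ID, USER, COMMAND, TIME, STATE, INFO
       FROM INFORMATION_SCHEMA.PROCESSLIST
      WHERE TIME > ' . $threshold
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $p) {
    error_log(sprintf('Time=%d,State=%s,Info=%s', $p['TIME'], $p['STATE'], $p['INFO'] ?? 'NULL'));
    if ($p['COMMAND'] !== 'Sleep' && $p['INFO'] !== null) {
        $pdo->exec('KILL QUERY ' . (int) $p['ID']); // stop the statement, keep the connection
    }
}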
Related
To start with the issue itself: my server randomly has episodes where pages stop loading for a few minutes (often hitting the 300-second timeout), or load extremely slowly. It does not necessarily correlate with an increase in traffic, but is more likely to happen when traffic increases. Sometimes I can just be messing around, things get slow for a couple of minutes, then everything is back to normal, with the same files accessed. The system itself is very simple: it does nothing beyond basic data insertion and reading, and no table has more than a couple thousand rows. Google Analytics shows concurrent access of around 300 per hour at peak times.
This is an old project that came to our company recently. The PHP code is very old, and I spent a lot of time refactoring queries to use a centralized PDO connection instead of calling mysql_connect before every query. However, on the old server these issues did not happen; they only occur on the new one.
Now for the catch. Our hosting service said that this was actually an issue on their server, and I couldn't find a cause in the code either. After a couple of months, we decided to move to a new, dedicated server on a different host...and the issue remains. So there has to be something wrong with the code somewhere, or a configuration I need to change.
Thing is, there's no specific file or script causing it. Since I now have WHM access, I can see there are no issues with memory or CPU usage. The slow query log is empty, with its threshold set at 20 seconds. What happens is that everything just runs very slowly, then just as easily returns to normal.
Errors that I can see in the log include "MySQL server has gone away", or others like
AH01071: Got error 'Primary script unknown
or
(70007)The timeout specified has expired: [client IP] AH01075: Error dispatching request to : (polling). (The timeout is set at 300 seconds; nothing should take more than a couple of seconds to run.)
I turned the general log on and noticed one interesting thing: when it happens, some threads take more than a minute to close. The query I use to analyze the log goes like this:
SELECT *,
COUNT(DISTINCT(command_type)) as conta,
MIN(event_time) as inicio,
MAX(event_time) as fim,
timediff(MAX(event_time),MIN(event_time)) as diferenca
FROM `general_log`
WHERE user_host LIKE '%hostname%'
GROUP BY thread_id HAVING diferenca > "00:01:00"
And it shows a couple of results, one of which I know for a fact lines up with a perceived slowness. More interestingly, the last command before the slowdown is this select:
SELECT * FROM publicidade WHERE secao = 1 ORDER BY posicao ASC, data_insercao DESC LIMIT 2
Thing is, this select always runs fast, on a table with 29 rows, returning 1 result. It makes no sense for this to screw things up, but it is always the last query command shown; I even found 2 situations where the thread never got to QUIT.
So at this point I'm at my wits' end. Problems keep happening; they're intermittent, coming and going for no discernible reason, and I cannot find out why. The slow query log is empty, so it's not like there's a query hanging on; the connection just goes to sleep after this query until it times out, with no further PHP errors, nothing. And then minutes later the same scripts run fine as if nothing happened. I could even see situations where www / non-www gave different results (one is fine, the other is not), as well as direct access via IP.
How can I debug what's going on? What could be a likely reason?
By the way, persistent connections are off.
Unfortunately, this is simply not enough information to recommend anything useful.
Have you checked server I/O?
You should check for reverse DNS lookup issues if your MySQL user permissions are granted 'per domain' (you can try switching to 'per IP').
You can try MySQLTuner-perl; it might give you some hints you missed.
You can try the Webyog tools (trial versions); they might also give you some clues.
I'm tuning my project's MySQL database, and like many people I got the suggestion to reduce wait_timeout, but it is unclear to me whether this session variable excludes query execution time or includes it. I have set it to 5 seconds, taking into account that I sometimes have queries which execute for 3-5 seconds (yes, that is slow; there are few of them, but they still exist), so MySQL connections have at least 1-2 seconds to be picked up by PHP scripts before they are closed by MySQL.
In the MySQL docs there is no clear explanation of when it starts counting that timeout and whether it includes execution time. Perhaps your experience can help. Thanks.
Does the wait_timeout session variable exclude query execution time?
Yes, it excludes query time.
max_execution_time is what controls how long the server will keep a long-running query alive before stopping it (as a server variable it exists from MySQL 5.7.8 on, applies to SELECT statements, and is given in milliseconds).
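As a quick way to see and set both values from PHP, here is a sketch with placeholder connection details:

<?php
// Inspect and adjust the two timeouts (placeholder DSN and credentials).
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'secret');

// wait_timeout       : idle seconds allowed between queries on this connection
// max_execution_time : milliseconds a SELECT may run before the server stops it
$stmt = $pdo->query(
    "SHOW SESSION VARIABLES WHERE Variable_name IN ('wait_timeout', 'max_execution_time')"
);
foreach ($stmt as $row) {
    echo $row['Variable_name'] . ' = ' . $row['Value'] . PHP_EOL;
}

// Only idle time counts against wait_timeout; a 3-5 second query does not.
$pdo->exec('SET SESSION wait_timeout = 60');          // seconds of allowed idle time
$pdo->exec('SET SESSION max_execution_time = 10000'); // cap SELECTs at 10 seconds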
Do you use php connection pools? If so, 5 seconds is an extremely short wait_timeout. Make it longer; 60 seconds might be good. Why? The whole point of connection pools is to hold some idle connections open from php to MySQL, so php can handle requests from users without the overhead of opening connections.
Here's how it works.
1. php sits there listening for incoming requests.
2. A request arrives, and the php script starts running.
3. The php script asks for a database connection.
4. php (the mysqli or PDO module) looks to see whether it has an idle connection waiting in the connection pool. If so, it passes that connection to the php script to use.
5. If there's no idle connection, php creates one and passes it to the php script to use. This connection creation starts the wait_timeout countdown.
6. The php script uses the connection for a query. This stops the wait_timeout countdown and starts the max_execution_time countdown.
7. The query completes. This stops the max_execution_time countdown and restarts the wait_timeout countdown. Steps 6 and 7 repeat as often as needed.
8. The php script releases the connection, and php puts it back into the connection pool. Go back to step 1. The wait_timeout countdown now runs for that connection while it sits in the pool.
9. If the connection's wait_timeout expires, php removes it from the connection pool.
If step 9 happens a lot, then step 5 also must happen a lot and php will respond more slowly to requests. You can make step 9 happen less often by increasing wait_timeout.
(Note: this is simplified: there's also provision for a maximum number of connections in the connection pool.)
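For what it's worth, plain php does not pool connections exactly as the steps above describe; the closest built-in analogue is persistent connections, e.g. with PDO. A sketch with placeholder DSN and credentials:

<?php
// Persistent connections: PHP keeps the underlying MySQL connection open after
// the script ends so a later request can reuse it (placeholder credentials).
$pdo = new PDO(
    'mysql:host=localhost;dbname=app;charset=utf8mb4',
    'app_user',
    'secret',
    [
        PDO::ATTR_PERSISTENT => true,                // reuse the connection across requests
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
    ]
);

// If the reused connection sat idle longer than wait_timeout, MySQL has already
// closed it and the next query fails with "MySQL server has gone away", which is
// why wait_timeout should comfortably exceed the expected idle time.
$stmt = $pdo->query('SELECT 1');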
MySQL also has an interactive_timeout variable. It's like wait_timeout but used for interactive sessions via the mysql command line program.
What happens when a web user makes a request and then abandons it before completion? For example, a user might stop waiting for a report and go to another page. In some cases, the host language processor detects the closing of the user connection, kills the MySQL query, and returns the connection to the pool. In other cases the query either completes or hits the max_execution_time barrier. Then the connection is returned to the pool. In all cases the wait_timeout countdown is only active when a connection is open but has no query active on it.
A MySQL server timeout can occur for many reasons, but happens most often when a command is sent to MySQL over a closed connection. The connection could have been closed by the MySQL server because of an idle-timeout; however, in most cases it is caused by either an application bug, a network timeout issue (on a firewall, router, etc.), or due to the MySQL server restarting.
It is clear from the documentation that it does not include the query execution time. It is basically the maximum idle time allowed between two activities. If the connection crosses that limit, the server closes it automatically.
The number of seconds the server waits for activity on a noninteractive connection before closing it.
From https://www.digitalocean.com/community/questions/how-to-set-no-timeout-to-mysql :
Configure the wait_timeout to be slightly longer than the application connection pool's expected connection lifetime. This is a good safety check.
Consider changing the wait_timeout value online. This does not require a MySQL restart, and wait_timeout can be adjusted on the running server without incurring downtime. You would issue SET GLOBAL wait_timeout = 60 and any new sessions created would inherit this value. Be sure to preserve the setting in my.cnf. Any existing connections will still be subject to the old value of wait_timeout if the application has abandoned the connection. If you do have reporting jobs that do longer local processing while in a transaction, you might consider having such jobs issue SET SESSION wait_timeout = 3600 upon connecting.
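A sketch of that last suggestion for a long-running report job (connection details and table name are placeholders):

<?php
// Raise wait_timeout for this session only, so long local processing between
// queries does not get the connection reaped (placeholder credentials).
$pdo = new PDO('mysql:host=localhost;dbname=reports', 'report_user', 'secret');
$pdo->exec('SET SESSION wait_timeout = 3600'); // affects only this connection

// ... long-running report work with idle gaps between queries goes here ...
$pdo->query('SELECT COUNT(*) FROM some_report_table'); // hypothetical table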
From the reference manual,
Time in seconds that the server waits for a connection to become active before closing it.
In elementary terms: how many SECONDS will you tolerate someone reserving your resources and doing nothing? It is likely that your 'think time' before you decide which action to take is frequently more than 5 seconds. Be liberal. For a web application, some processes take more than 5 seconds to run a query, and you are causing them to be terminated at 5 seconds. The default is 28800 seconds, which is not reasonable either. 60 seconds could be a reasonable time in which to expect any web-based process to complete. If your application is also used in a traditional workplace environment, allowing for a 15-minute break is not unreasonable. Be liberal to avoid 'bad feedback'.
I'm trying to figure out what is causing my system to open a large number of PHP threads. This issue has occurred 3 times over the last 2 weeks, and is capable of crashing our application if it goes undetected for several hours, as once it opens up 300 database connections it prevents anyone else from connecting.
The application is based on CakePHP 2.x and runs across multiple EC2 instances, which share an RDS database.
The primary indicator that something is going wrong is a high number of database connections, as shown by the connection-count graph.
We have CloudWatch monitoring set up to notify us on Slack when average connections stay above 40 for more than 5 minutes (normally connections don't go much above 10).
Looking at New Relic, I can also see that the number of PHP processes steadily increased by 1 per minute. This is on our operations server, which just handles background processing and tasks and does not serve any web traffic.
Over the same time the graphs on the web servers appear normal.
Looking at New Relic's information on long-running processes, there is nothing to suggest any PHP processes ran for 20+ minutes; however, these processes were killed manually, which may be why they're not visible within New Relic - I believe it may not record processes which are killed.
While this issue has now occurred 3 times, I'm still unsure what is causing the problem or how to debug what a particular running php thread is doing.
The last time this happened, I could see all the PHP threads running and that they had been running for some time, but I had no idea what they were doing or how to find out, and to prevent the database from becoming overloaded I had to kill them all.
Are there any tools, or other information I am overlooking here, which may help me determine which particular process is causing this issue?
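One thing that would at least preserve evidence for next time is snapshotting the processlist before killing anything; a rough sketch (the endpoint, credentials, threshold, and log path are placeholders):

<?php
// Snapshot long-running connections before killing them (placeholder values).
// The HOST column is "client-host:port", so the port can later be matched to
// the owning PHP process on that EC2 instance.
$pdo = new PDO('mysql:host=my-rds-endpoint;dbname=app', 'monitor', 'secret');

$rows = $pdo->query(
    'SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO
       FROM INFORMATION_SCHEMA.PROCESSLIST
      WHERE TIME > 300'
)->fetchAll(PDO::FETCH_ASSOC);

// Keep the evidence in a file, since killed processes may never show up in New Relic.
file_put_contents(
    '/var/log/app/processlist-' . date('Ymd-His') . '.json',
    json_encode($rows, JSON_PRETTY_PRINT)
);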
Recently I started experiencing performance issues with my online application hosted on Bluehost.
I have an online form that takes a company name, with an "onKeyUp" event handler tied to that field. Every time you type a character into the field, it sends a request to the server, which runs multiple MySQL queries to get the data. The MySQL queries all together take about 1-2 seconds. But since a request is sent after every character typed, it easily overloads the server.
The solution for this problem was to cancel the previous XHR request before sending a new one. And it seemed to work fine for me (for about a year) until today. I'm not sure whether Bluehost changed any configuration on the server (I have a VPS) or any PHP/Apache settings, but right now my application is very slow due to the number of users I have.
And I would understand a gradual decrease in performance caused by database growth, but this happened suddenly over the weekend, and speeds went down by a factor of about 10. A usual request that took about 1-2 seconds before now takes 10-16 seconds.
I connected to the server via SSH and ran some stress tests sending lots of queries to see what the process monitor (top) would show. As I expected, for every new request a PHP process was created and put in a queue for processing. This queue waiting, apparently, took up most of the wait time.
Now I'm confused: is it possible that before (the hypothetical changes on the server) every XHR abort was actually causing the PHP process to quit, reducing the additional load on the server and therefore making it work faster? And now, for some reason, this doesn't happen anymore?
I have WAMP installed on Windows 7 as my test environment, and when I export the same database and run the stress test locally, it works fast - just like it used to on the server before. But on Windows I don't have a handy process monitor like top, so I cannot see whether PHP processes are actually created and killed accordingly.
I'm not sure how to do the troubleshooting at this point.
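A small probe along these lines (a sketch, not a fix) would at least show whether PHP notices the aborted XHR at all; by default PHP only detects a client disconnect when it tries to send output, and ignore_user_abort() decides whether the script then stops:

<?php
// Probe whether PHP notices when the browser aborts the XHR.
ignore_user_abort(false);  // stop the script once the disconnect is noticed
ob_implicit_flush(true);   // push output immediately so the check can trigger

for ($i = 0; $i < 20; $i++) {
    // ... one of the MySQL queries from the autocomplete handler would run here ...
    sleep(1);

    echo ' ';                     // writing output is what makes PHP check the socket
    if (connection_aborted()) {
        error_log('Client aborted the request; stopping at step ' . $i);
        exit;
    }
}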
I'm having trouble investigating an issue with many sleeping MySQL connections.
Once every one or two days I notice that all (151) MySQL connections are taken, and all of them seem to be sleeping.
I investigated this, and one of the most reasonable explanations is that the PHP script was simply killed, leaving a MySQL connection behind. We log visits at the beginning of the request and update that log when the request finishes, so we can tell that some requests do start but never finish, which indicates that the script was indeed killed somehow.
Now, the worrying thing is, that this only happens for 1 specific user, and only on 1 specific page. The page works for everyone else, and when I log in as this user on the Production environment, and perform the exact same action, everything works fine.
Now, I have two questions:
I'd like to find out why the PHP script is killed. Could this possibly have anything to do with the client? Can a client do 'something' to end the request and kill the php script? If so, why don't I see any evidence of that in the Apache logs? Or maybe I don't know what to look for? How do I find out if the script was indeed killed or what caused it?
How do I prevent this? Can I somehow set a limit to the number of MySQL connections per PHP session? Or can I somehow detect long-running and sleeping MySQL connections and kill them? It isn't an option for me to set the connection timeout to a shorter time, because there are processes which run considerably longer, and the 151 connections are used up in less than 2 minutes. Also, increasing the number of connections is no solution. So, basically: how do I kill connections which have been sleeping for more than, say, 1 minute?
The best solution would be to find out why the requests of 1 user can eat up all the database connections and basically bring down the whole application, and how to prevent this.
Any help greatly appreciated.
You can decrease the wait_timeout variable of the MySQL server. This specifies the number of seconds MySQL waits for activity on a non-interactive connection before it aborts the connection. The default value is 28800 seconds, which seems quite high. You can set this dynamically by executing SET GLOBAL wait_timeout = X; once.
You can still increase it for cronjobs again. Just execute the query SET SESSION wait_timeout = 28800; at the beginning of the cronjob. This only affects the current connection.
Please note that this might cause problems too if you set it too low, although I do not see that many problems arising. Most scripts should finish in less than a second, so setting wait_timeout=5 should cause no harm…
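If you also want a safety net for connections that are already sleeping (the "kill anything sleeping for more than a minute" part of the question), a cron-driven reaper roughly like this would work; a sketch with placeholder credentials, and the reaping user needs PROCESS plus SUPER or CONNECTION_ADMIN to kill other users' threads:

<?php
// Kill connections that have been idle ("Sleep") for more than 60 seconds
// (placeholder credentials; run from cron every minute or so).
$pdo = new PDO('mysql:host=localhost', 'reaper', 'secret');

$sleepers = $pdo->query(
    "SELECT ID, USER, HOST, TIME
       FROM INFORMATION_SCHEMA.PROCESSLIST
      WHERE COMMAND = 'Sleep' AND TIME > 60"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($sleepers as $s) {
    error_log("Killing idle connection {$s['ID']} ({$s['USER']}@{$s['HOST']}, idle {$s['TIME']}s)");
    $pdo->exec('KILL ' . (int) $s['ID']); // KILL (CONNECTION) closes the whole connection
}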