Limiting mysql use per process - php

I have Debian VPS configured with a standard LAMP.
On this server, there is only one site (shop) which has a few cron jobs - mostly PHP scripts. One of them is update script executed by Lynx browser, which sends tons of queries.
When this script runs (it takes 3-4 minutes to complete) it consumes all MySQL resources, and the site almost doesn't work (page generates in 30-60 seconds instead of 1-2s).
How can I limit this script (i.e. extending its execution time limiting available resources) to allow other services to run properly? I believe there is a simple solution to the problem but can't find it. Seems my Google superpowers are limited last two days.

You don't have access to modify the offending script, so fixing this requires database administrator work, not programming work. Your task is called tuning the MySQL databse.
(I guess you already asked your vendor for help with this, and they said no.)
Ron top or htop while the script runs. Is CPU pinned at 100%? Is RAM exhausted?
1) Just live with it, and run the update script at a time of day when your web site doesn't have many visitors. Fairly easy, but not a real solution.
2) As an experiment, add RAM to your VPS instance. It may let MySQL do things all-in-RAM that it's presently putting on the hard drive in temporary tables. If it helps, that may be a way to solve your problem with a small amount of work, and a larger server rental fee.
3) Add some indexes to speed up the queries in your script, so each query gets done faster. The question is, what indexes will help? (Just adding indexes randomly generally doesn't help much.)
First, figure out which queries are slow. Give the command SHOW FULL PROCESSLIST repeatedluy while your script runs. The Info column in that result shows all the running queries. Copy them into a text file to keep them. (Or you can use MySQL's slow query log, about which you can read online.)
Second, analyze the worst offending queries to see whether there's an obvious index to add. Telling you how to do that generally is beyond the scope of a Stack Overflow answer. You might ask another question about a specific query. Before you do, please
reead this note about asking good SQL questions, and pay attention to the section on query performance.
3) It's possible your script is SELECTing many rows, or using SELECT to summarize many rows, from tables that also need to be updated when users visit your web site. In that case your visitors may be waiting for those SELECTs to finish. If you could change the script, you could put this statement right before long-running SELECTS.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
This allows the SELECT statement after it to do a "dirty read", in which it might get an earlier version of an updated row. See here.
Or, if you can figure out how to insert one statement into your obscured script, put this one right after it opens a database session.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Without access to the source code, though, you only have one way to see if this is the problem. That is, access the MySQL server from a privileged account, right before your script runs, and give these SQL commands.
SHOW VARIABLES LIKE 'tx_isolation';
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITED;
then see if the performance problem is improved. Set it back after your script finishes, probably like this (depending on the tx_isolation value retrieved above)
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITED;
Warning a permanent global change to the isolation level might foul up your application if it relies on transaction consistency. This is just an experiment.
4) Harass the script's author to fix this problem.

Slow queries? High CPU? High I/O? Then you must look at the queries. You cannot "tune your way out of a performance problem". Tuning might give you a few percent improvement; fixing the indexes and queries is likely to give you a lot more improvement.
See this for finding the 'worst' queries; then come back with SELECTs, EXPLAINs, and SHOW CREATE TABLEs for help.

Related

How to find bottleneck in Laravel?

I have a large application written using Laravel 5.2. The application seems to be running good for a while "day or two" and then it starts slowing down "each request table 15+ seconds."
I am trying to figure out what could be causing the speed degrade. To get started, I listed the top 4 categories "below" that I should review in order.
SQL Server problems. Locks, long running queries.
PHP problem which could be causing extra/not needed work like long running loops
Web Server Issues like memory leaks or slow response time.
Network Issues.
For the first category "i.e. SQL Problems" I evaluated all of the queries and everything seems to be light and fairly fast. There are no long running queries, and I find no SQL locks. Although I did not eliminate this as a possible issue, but for now it is fair to look elsewhere. One thing to note that the application is generating lots of queries which suggests that I may be running into the N+1 case.
While categories 3 and 4 are important, I like to focus some time on the second category "i.e. Code problem" which is where I need help. I need to be able to figure out couple of things to help me make an educated judgement if there is an code issue or not. Here are few thing that I like to know/start logging.
How long each class takes to execute to see if one takes longer time to run.
How many/list of all queries that being generated from each class where I can identify the source of the N+1 case.
I am using Clockwork extension in Google Chrome which is helping me a log. But, I am unable to break down the result at the class level which will give me a deep understanding of what is going on.
How can gather the 2 items listed above? Is it possible to hook into Clockwork and add that info as a filter to it where I just see all that into Google Extension?

Best practice to record large amount of hits into MySQL database

Well, this is the thing. Let's say that my future PHP CMS need to drive 500k visitors daily and I need to record them all in MySQL database (referrer, ip address, time etc.). This way I need to insert 300-500 rows per minute and update 50 more. The main problem is that script would call database every time I want to insert new row, which is every time someone hits a page.
My question, is there any way to locally cache incoming hits first (and what is the best solution for that apc, csv...?) and periodically send them to database every 10 minutes for example? Is this good solution and what is the best practice for this situation?
500k daily it's just 5-7 queries per second. If each request will be served for 0.2 sec, then you will have almost 0 simultaneous queries, so there is nothing to worry about.
Even if you will have 5 times more users - all should work fine.
You can just use INSERT DELAYED and tune your mysql.
About tuning: http://www.day32.com/MySQL/ - there is very useful script (will change nothing, just show you the tips how to optimize settings).
You can use memcache or APC to write log there first, but with using INSERT DELAYED MySQL will do almost same work, and will do it better :)
Do not use files for this. DB will serve locks much better, than PHP. It's not so trivial to write effective mutexes, so let DB (or memcache, APC) do this work.
A frequently used solution:
You could implement an counter in memcached which you increment on an visit, and push an update to the database for every 100 (or 1000) hits.
We do this by storing locally on each server to CSV, then having a minutely cron job to push the entries into the database. This is to avoid needing a highly available MySQL database more than anything - the database should be able to cope with that volume of inserts without a problem.
Save them to a directory-based database (or flat file, depends) somewhere and at a certain time, use a PHP code to insert/update them into your MySQL database. Your php code can be executed periodically using Cron, so check if your server has Cron so that you can set the schedule for that, say every 10 minutes.
Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/. Some codes have been written in the cloud and are ready for you to use :)
One way would be to use Apache access.log. You can get a quite fine logging by using cronolog utility with apache . Cronolog will handle the storage of a very big number of rows in files, and can rotate it based on volume day, year, etc. Using this utility will prevent your Apache from suffering of log writes.
Then as said by others, use a cron-based job to analyse these log and push whatever summarized or raw data you want in MySQL.
You may think of using a dedicated database (or even database server) for write-intensive jobs, with specific settings. For example you may not need InnoDB storage and keep a simple MyIsam. And you could even think of another database storage (as said by #Riccardo Galli)
If you absolutely HAVE to log directly to MySQL, consider using two databases. One optimized for quick inserts, which means no keys other than possibly an auto_increment primary key. And another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only to the read-only database on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
I have also previously seen a system which records the data into a flat file on the local disc on each web server (be careful to do only atomic appends if using multiple proceses), and periodically asynchronously write them into the database using a daemon process or cron job.
This appears to be the prevailing optimium solution; your web app remains available if the audit database is down and users don't suffer poor performance if the database is slow for any reason.
The only thing I can say, is be sure that you have monitoring on these locally-generated files - a build-up definitely indicates a problem and your Ops engineers might not otherwise notice.
For an high number of write operations and this kind of data you might find more suitable mongodb or couchdb
Because INSERT DELAYED is only supported by MyISAM, it is not an option for many users.
We use MySQL Proxy to defer the execution of queries matching a certain signature.
This will require a custom Lua script; example scripts are here, and some tutorials are here.
The script will implement a Queue data structure for storage of query strings, and pattern matching to determine what queries to defer. Once the queue reaches a certain size, or a certain amount of time has elapsed, or whatever event X occurs, the query queue is emptied as each query is sent to the server.
you can use a Queue strategy using beanstalk or IronQ

Detect how much stress the mysql database is currently under with PHP

I am developing a large web app and want it to alter itself dependent on a factor that relates to the stress the database is currently under.
I am not sure what would me most accurate/effective/easiest. I am considering maybe the number of current connections or server response time or CPU useage?
What would be best suited and possible?
Thanks
The MySQL Query Profiler does what you are looking for.
http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html
If you would rather pay money to get a graphical profiler then try this out:
http://www.jetprofiler.com/
The amount of "stress" the database is under is not very real metric. The important thing is to identify how scalable the application is, and the extenet to which the database is contributing to unacceptable performance. This sounds like a bit of a get-out but there's not much point in spending time and effort on something without a clear objective of what you intend to achieve.
Important things are to start recording microsecond level response times in your webserver logs and enable slow query logging in mysql. Then test your DBMS to see what's slow, what's slow AND getting hit often, and what slows down as demand increases.
Certainly if you have performance problems then by all means start looking at CPU, memory usage and I/O but these are primarily symptoms of a performance problem - not true indicators. You might have 10% CPU usage and your system could be running like a dog, or 95% usage and running like a greyhound ;).
System load (i.e. the average length of the run queue is a better indicator than CPU - but still measuring a symtpom. In general database related slowness is usually primarily about I/O issues, and usually resolved by SQL tuning.
C.
Interesting question. What you REALLY want is a way for PHP to ask the mySQL server two questions:
server, are you using almost all your cpu capacity?
server, are you using almost all your disk IO capacity?
Based on the answers, I suppose you want to simplify the work your PHP web app does ... perhaps by eliminating some kind of search capability, or caching some data more aggressively.
If you have a shell to your (linux or bsd) mysql server, your two questions can be answered by eyeballing the output from these two commands.
sar -u 1 10 # the %idle column tells you about unused cpu cycles
sar -d 1 10 # the %util column tells you which disks are busy and how busy.
But, there's no sweet little query which fetches this data from mySQL to your app.
Edit: one possibility is to write a little PERL hack or other simple program that runs on your server, connects to the local data base, and once every so often (once a minute, maybe) determines %idle and %util, and updates a little one-row table in your data base. You could, without too much trouble, also add stuff like how full your disks are, to this table, if you care. Then your PHP app can query this table. This is an ideal use of the MEMORY access method. At any rate, keep it simple: you don't want your monitoring to weigh down your server.
A second-best trick, that you CAN do from your client.
Issue the command SHOW PROCESSLIST FULL, count the number of rows (mySQL processes) for which the Command is "Query", and if you have a lot of them consider it to be a high workload.
You might also add up the Time values for the processes which have status Query, and use a high value of that time as a threshold.
EDIT: if you're running on a mySQL 5 server, and your server account has access to the mySql-furnished information_schema, you can use a query directly to get the process data I mentioned:
SELECT (COUNT(*)-1) P.QUERYCOUNT, SUM(P.TIME) QUERYTIME
FROM information_schema.PROCESSLIST P
WHERE P.COMMAND = 'Query'
COUNT(*) - 1: because the above query itself counts as a query.
You will need to fiddle with the threshold values to make this work right in production.
It's a good idea to have your PHP web app shed load when the data base server can't keep up. Still, a better idea is to identify your long-running queries and optimize them.

How to make a javascript/php chatroom more efficient in terms of load time and sql communication

Right now the setup for my javascript chat works so it's like
function getNewMessage()
{
//code would go here to get new messages
getNewMessages();
}
getNewMessages();
And within the function I would use JQuery to make a get post to retrieve the messages from a php scrip which would
1. Start SQL connection
2. Validate that it's a legit user through SQL
3. retrieve only new message since the last user visit
4. close SQL
This works great and the chat works perfectly. My concern is that this is opening and closing a LOT of SQL connections. It's quite fast, but I'd like to make a small javascript multiplayer game now, and transferring user coordinates as well as the tens of other variables 3 times a second in which I'm opening and closing the sql connection each time and pulling information from numerous tables each time might not be efficient enough to run smoothly, and might be too much strain on the server too.
Is there any better more efficient way of communicating all these variables that I should know about which isn't so hard on my server/database?
Don't use persistent connections unless it's the only solution available to you!
When MySQL detects the connection has been dropped, any temporary tables are dropped, any active transaction is rolled back, and any locked tables are unlocked. Persistent connections only drop when the Apache child exits, not when your script ends, even if the script crashes! You could inherit a connection in the middle of a transaction. Worse, other requests could block, waiting for those tables to unlock, which may take quite a long time.
Unless you have measured how long it takes to connect and identified it as a very large percentage of your script's run time, you should not consider using persistent connections. In fact, that should be what you do here, if you're worried about performance. Check out xhprof or xdebug, profile your code, then start optimizing.
Maybe try to use a different approach to get the new messages from the server: Comet.
Using this technique you do not have to open that much new connections.
http://www.php.net/manual/en/features.persistent-connections.php
and
http://www.php.net/manual/en/function.mysql-pconnect.php
A couple of dozen players at the same time won't hurt the database or cause noticeable lag if you have efficient SQL statements. Likely your database will be hosted on the same server or at least the same network as your game or site, so no worries. If your DB happens to be hosted on a separate server running an 8-bit 16mz board loaded with MSDOS, located in the remote Amazon, connected by radio waves hooked up to a crank-powered generator operatated by a drunk monkey, you're on your own with this one.
Otherwise, really you should be more worried about exactly how much data you're passing back and forth to your players. If you're passing back and forth coordinates for all objects in an entire world, page load could take a painfully long time, even though the DB query takes a fraction of a second. This is sometimes overcome in games by a "fog of war" feature which doesn't bother notifying the user of every single object in the entire map, only those which are in immediate range of the player. This can easily be done with a single SQL query where object coordinates are in proximity to a player. Though, if you have a stingy host, they will care about the number of connects and queries.
If you're concerned about attracting even more players than that, consider exploring cache methods like pre-building short files storing commonly fetched records or values using fopen(), fgets(), fclose(), etc. Or, use php extensions like apc to store values in memory which persist from page load to page load. memcache or memcached also act similarly, but in a way which acts like a separate server you can connect to, store values which can be shared with other page hits, and query.
To update cached pages or values when you think they might become stale, you can run a cron job every so often to update these files or values. If your host doesn't allow cron jobs, consider making your guests do that legwork: a line of script on a certain page will refresh the cache with new values from a database query after a certain number of page hits. Or cache a date value to check against on every page hit, and if so much time has passed, refresh the cache.
Again, unless you're under the oppressive thumb of a stingy host, or unless you're getting a hundred or more page hits at a time, no need to even be concerned about your database. Databases are not that fragile. If they crashed in a hysterical fit of tears anytime more than one query came their way, the engineers who made it wouldn't have a job for very long.
I know this is quite an annoying "answer" but perhaps you should be thinking about this a different way, after all this is really not the strongest use of a relational database. Have you considered an XMPP solution? IMO this would be the best tool for the job and both ejabberd and openfire are trivial to set up these days. The excellent Strophe library can make the front end story easy, and as an added bonus you get HTTP binding (like commet) so you won't need to poll the server, your latency will go down and you'll be generating less HTTP traffic.
I know it's highly unlikely you're going to change your whole approach just cos I said so, but wanted to provide an alternative perspective.
http://www.ejabberd.im/
http://code.stanziq.com/strophe/

PHP/MYSQL - Pushing it to the limit?

I've been coding php for a while now and have a pretty firm grip on it, MySQL, well, lets just say I can make it work.
I'd like to make a stats script to track the stats of other websites similar to the obvious statcounter, google analytics, mint, etc.
I, of course, would like to code this properly and I don't see MySQL liking 20,000,000 to 80,000,000 inserts ( 925 inserts per second "roughly**" ) daily.
I've been doing some research and it looks like I should store each visit, "entry", into a csv or some other form of flat file and then import the data I need from it.
Am I on the right track here? I just need a push in the right direction, the direction being a way to inhale 1,000 psuedo "MySQL" inserts per second and the proper way of doing it.
Example Insert: IP, time(), http_referer, etc.
I need to collect this data for the day, and then at the end of the day, or in certain intervals, update ONE row in the database with, for example, how many extra unique hits we got. I know how to do that of course, just trying to give a visualization since I'm horrible at explaining things.
If anyone can help me, I'm a great coder, I would be more than willing to return the favor.
We tackled this at the place I've been working the last year so over summer. We didn't require much granularity in the information, so what worked very well for us was coalescing data by different time periods. For example, we'd have a single day's worth of real time stats, after that it'd be pushed into some daily sums, and then off into a monthly table.
This obviously has some huge drawbacks, namely a loss of granularity. We considered a lot of different approaches at the time. For example, as you said, CSV or some similar format could potentially serve as a way to handle a month of data at a time. The big problem is inserts however.
Start by setting out some sample schema in terms of EXACTLY what information you need to keep, and in doing so, you'll guide yourself (through revisions) to what will work for you.
Another note for the vast number of inserts: we had potentially talked through the idea of dumping realtime statistics into a little daemon which would serve to store up to an hours worth of data, then non-realtime, inject that into the database before the next hour was up. Just a thought.
For the kind of activity you're looking at, you need to look at the problem from a new point of view: decoupling. That is, you need to figure out how to decouple the data-recording steps so that delays and problems don't propogate back up the line.
You have the right idea in logging hits to a database table, insofar as that guarantees in-order, non-contended access. This is something the database provides. Unfortunately, it comes at a price, one of which is that the database completes the INSERT before getting back to you. Thus the recording of the hit is coupled with the invocation of the hit. Any delay in recording the hit will slow the invocation.
MySQL offers a way to decouple that; it's called INSERT DELAYED. In effect, you tell the database "insert this row, but I can't stick around while you do it" and the database says "okay, I got your row, I'll insert it when I have a minute". It is conceivable that this reduces locking issues because it lets one thread in MySQL do the insert, not whichever you connect to. Unfortuantely, it only works with MyISAM tables.
Another solution, which is a more general solution to the problem, is to have a logging daemon that accepts your logging information and just en-queues it to wherever it has to go. The trick to making this fast is the en-queueing step. This the sort of solution syslogd would provide.
In my opinion it's a good thing to stick to MySQL for registering the visits, because it provides tools to analyze your data. To decrease the load I would have the following suggestions.
Make a fast collecting table, with no indixes except primary key, myisam, one row per hit
Make a normalized data structure for the hits and move the records once a day to that database.
This gives you a smaller performance hit for logging and a well indexed normalized structure for querying/analyzing.
Presuming that your MySQL server is on a different physical machine to your web server, then yes it probably would be a bit more efficient to log the hit to a file on the local filesystem and then push those to the database periodically.
That would add some complexity though. Have you tested or considered testing it with regular queries? Ie, increment a counter using an UPDATE query (because you don't need each entry in a separate row). You may find that this doesn't slow things down as much as you had thought, though obviously if you are pushing 80,000,000 page views a day you probably don't have much wiggle room at all.
You should be able to get that kind of volume quite easily, provided that you do some stuff sensibly. Here are some ideas.
You will need to partition your audit table on a regular (hourly, daily?) basis, if nothing else only so you can drop old partitions to manage space sensibly. DELETEing 10M rows is not cool.
Your web servers (as you will be running quite a large farm, right?) will probably want to do the inserts in large batches, asynchronously. You'll have a daemon process which reads flat-file logs on a per-web-server machine and batches them up. This is important for InnoDB performance and to avoid auditing slowing down the web servers. Moreover, if your database is unavailable, your web servers need to continue servicing web requests and still have them audited (eventually)
As you're collecting large volumes of data, some summarisation is going to be required in order to report on it at a sensible speed - how you do this is very much a matter of taste. Make sensible summaries.
InnoDB engine tuning - you will need to tune the InnoDB engine quite significantly - in particular, have a look at the variables controlling its use of disc flushing. Writing out the log on each commit is not going to be cool (maybe unless it's on a SSD - if you need performance AND durability, consider a SSD for the logs) :) Ensure your buffer pool is big enough. Personally I'd use the InnoDB plugin and the file per table option, but you could also use MyISAM if you fully understand its characteristics and limitations.
I'm not going to further explain any of the above as if you have the developer skills on your team to build an application of that scale anyway, you'll either know what it means or be capable of finding it out.
Provided you don't have too many indexes, 1000 rows/sec is not unrealistic with your data sizes on modern hardware; we insert that many sometimes (and probably have a lot more indexes).
Remember to performance test it all on production-spec hardware (I don't really need to tell you this, right?).
I think that using MySQL is an overkill for the task of collecting the logs and summarizing them. I'd stick to plain log files in your case. It does not provide the full power of relational database management but it's quite enough to generate summaries. A simple lock-append-unlock file operation on a modern OS is seamless and instant. On the contrary, using MySQL for the same simple operation loads the CPU and may lead to swapping and other hell of scalability.
Mind the storage as well. With plain text file you'll be able to store years of logs of a highly loaded website taking into account current HDD price/capacity ratio and compressability of plain text logs

Categories