Short:
Is there a way to get the number of queries that were executed within a certain timespan (via PHP) in an efficient way?
Full:
I'm currently running an API for a frontend web application that will be used by a large number of users.
I use my own custom framework that uses models to do all the data magic, and they execute mostly INSERTs and SELECTs. One function of a model can execute 5 to 10 queries per request, and another function can execute 50 or more.
Currently, I don't have a way to check if I'm "killing" my server by executing (for example) 500 queries every second.
I also don't want any surprises when the number of users increases to 200, 500, 1000, .. within the first week and maybe 10,000 by the end of the month.
I want to pull some sort of statistics, per hour, so that I have an idea of the average load and can work on performance and efficiency before everything fails, for example by merging some queries into one "bigger" one, or things like that.
Posts I've read suggest just keeping a counter within my code, but that would require more queries, just to have a number. The preferred way would be a query in my hourly statistics script that returns the number of queries that have been executed for the x amount of processed requests.
To conclude.
Are there any other options for keeping track of this number?
Extra: should I be worried and concerned about the number of queries? They are all small ones, meant for fast execution without bottlenecks or heavy calculations, and I'm currently quite impressed by how blazingly fast everything is running!
Extra extra: it's on our own VPS server, so I have full access and I'm not limited to "basic" functions or commands or anything like that.
Short Answer: Use the slowlog.
Full Answer:
At the start and end of the time period, perform
SELECT VARIABLE_VALUE AS Questions
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Questions';
Then take the difference.
If the timing is not precise, also fetch ... WHERE VARIABLE_NAME = 'Uptime' in order to get the elapsed time (to the second).
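For illustration, here is a minimal PHP sketch of taking that difference (the PDO connection details are placeholders; on MySQL 5.7+ the same data lives in performance_schema.global_status instead):

// Sample the Questions and Uptime counters twice and diff them.
$pdo = new PDO('mysql:host=localhost', 'user', 'pass');

function globalStatus(PDO $pdo, $name) {
    $stmt = $pdo->prepare(
        "SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
         WHERE VARIABLE_NAME = ?"
    );
    $stmt->execute([$name]);
    return (int) $stmt->fetchColumn();
}

$qStart = globalStatus($pdo, 'Questions');
$tStart = globalStatus($pdo, 'Uptime');
// ... one measurement interval later (e.g. the next run of the hourly statistics script) ...
$qEnd = globalStatus($pdo, 'Questions');
$tEnd = globalStatus($pdo, 'Uptime');

printf("%d queries in %d seconds (%.1f/sec)\n",
    $qEnd - $qStart, $tEnd - $tStart, ($qEnd - $qStart) / max($tEnd - $tStart, 1));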
But there is a problem: 500 very fast queries may not be as problematic as 5 very slow, complex queries. I suggest that elapsed time might be a better metric for deciding whether to kill a query.
And... killing the process may lead to a puzzling situation wherein the naughty statement remains in the "Killing" state for a long time. (See SHOW PROCESSLIST.) The reason this can happen is that the statement's changes need to be undone to preserve the integrity of the data. An example is a single UPDATE statement that modifies all rows of a million-row table.
If you do issue a Kill in such a situation, it is probably best to let the rollback run to completion.
In a different direction: if you have, say, a one-row UPDATE that does not use an index but needs a table scan, the query will take a long time and may well be more of a burden on the system than "500 queries". The 'cure' is likely to be adding an INDEX.
What to do about all this? Use the slowlog. Set long_query_time to some small value. The default is 10 (seconds); this is almost useless. Change it to 1 or even something smaller. Then keep an eye on the slowlog. I find it to be the best way to watch out for the system getting out of hand and to tell you what to work on fixing. More discussion: http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog
Note that the best metric in the slowlog is neither the number of times a query is run, nor how long it runs, but the product of the two. This is the default sort order for pt-query-digest. For mysqldumpslow, adding -s t sorts the results that way.
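As a concrete sketch, the relevant settings look something like this (they can also go in my.cnf under [mysqld]; the log file path is just an example, and changing them at runtime requires the SUPER privilege):

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';  -- example path
SET GLOBAL long_query_time = 1;  -- seconds; the global value only applies to connections opened after the change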
Related
I have this code that never finishes executing.
Here is what happens:
We make an API call that returns a large data set, and for each row we need to check whether it differs from our database; if it does, we need to update our DB for that specific row. The number of rows will increase as the project grows and could go over 1 billion in some cases.
The issue is making this scalable, so that it works even for a 1-billion-row update.
To simulate it, I ran a for loop with 9,000 iterations:
<?php
ini_set("memory_limit", "-1");   // remove the PHP memory limit
ignore_user_abort(true);         // keep running even if the client disconnects

for ($i = 0; $i < 9000; $i++) {
    // Complex SQL UPDATE query that requires joining tables,
    // and doing a search and update if it matches several variables
}

// Here I have a log function to see if the for loop has finished
If I loop it 10 times it still takes a while, but it works and logs; with 9,000 iterations it never finishes the loop and never logs anything.
Note: I added ini_set("memory_limit","-1"); and ignore_user_abort(true); to prevent memory errors.
Is there any way to make this scalable?
Details: I run this job two times a day.
Without knowing the specifics of the API, how often you call it, how much data it's returning at a time, and how much information you actually have to store, it's hard to give you specific answers. In general, though, I'd approach it like this:
Have a "producer" script query the API on whatever basis you need, but instead of doing your complex SQL update, have it simply store the data locally (presumably in a table, let's call it tempTbl). That should ensure it runs relatively fast. Implement some sort of timestamp on this table, so you know when records were inserted. In the ideal world, the next time this "producer" script runs, if it encounters any data from the API that already exists in tempTbl, it will overwrite it with the new data (and update the last updated timestamp). This ensures tempTbl always contains the latest cached updates from the API.
You'll also have a "consumer" script which runs on a regular basis and processes the data from tempTbl (presumably in LIFO order, but it could be any order you want). This "consumer" script will take a chunk of, say, 100 records from tempTbl, do your complex SQL UPDATE on them, and delete them from tempTbl.
The idea is that one script ("producer") is constantly filling tempTbl while the other script ("consumer") is constantly processing items in that queue. Presumably "consumer" is faster than "producer", otherwise tempTbl will grow too large. But with an intelligent schema, and careful throttling of how often each script runs, you can hopefully maintain stasis.
I'm also assuming these two scripts will be run as cron jobs, which means you just need to tweak how many records they process at a time, as well as how often they run. Theoretically there's no reason why "consumer" can't simply process all outstanding records, although in practice that may put too heavy a load on your DB so you may want to limit it to a few (dozen, hundred, thousand, or million?) records at a time.
I am doing a pagination system for about 100 items.
My question is:
Should I just load all 100 of them and then use jQuery to switch pages without reloading? Or should I use a MySQL query with LIMIT 5 and then, each time the user presses Next Page or Previous Page, run another MySQL query with LIMIT 5?
For every item I would have to load a thumbnail picture, but I could keep those in the cache to avoid using my server's bandwidth.
Which one is the best option from a server resource perspective?
Try connecting directly to your MySQL instance via the command-line interface. Execute the query with 100 rows at a time, and then with LIMIT 5. Look at the millisecond timings. That will tell you which is more efficient or less resource-demanding.
100 records at a time from MySql (depending on dataset) really is nothing. The performance hit wouldn't be noticeable for a properly written query/database schema.
That said, I vote for fetching only the results you need at a time. Use the LIMIT clause together with your jQuery pagination method to keep it efficient.
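For example, a minimal sketch of the per-page query behind the AJAX endpoint (the items table and column names are made up):

// Return one page of 5 items as JSON for the jQuery pager.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$perPage = 5;
$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

// Safe to interpolate here because both values have been cast to integers.
$sql  = sprintf("SELECT id, title, thumbnail FROM items ORDER BY id LIMIT %d OFFSET %d",
                $perPage, $offset);
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($rows);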
For the server, the most efficient way would be to grab all 100 items once, send them to the client once, and have the client page through them locally. That's one possibly expensive query, but that's cheaper overall than having the client go back and forth for each additional five items.
Having said that, whether that's feasible is a different topic. You do not want to be pushing a huge amount of data to your client at once, since it'll slow down page loads and client-side processing. In fact, it's usually desirable to keep the bandwidth consumed by the client to a minimum. From that POV, making small AJAX requests with five results at a time when and only when necessary is much preferable. Unless even 100 results are so small overall that it doesn't make much of a difference.
You need to figure out which one works best for you.
Depends significantly on your query. If it is a simple SELECT from a well-designed table (indexes set, etc.), then unless you're running on a very underpowered server, there will be no noticeable difference between requesting 100 rows and 5 rows. If it is a complicated query, then you should probably limit the number of queries you run.
Other considerations are how long it takes to load a page, as in the actual round-trip time for the client to receive the data from the server. I'm going to make the wild guess that you are in America or Europe, where internet speeds are nice and fast; not the entire world is that lucky. How often your site has to request data from the server is a much better metric than how much load your server has.
This is moving rapidly into UX territory, but your users don't care about your server load; they don't care if this way means your load average is 0.01 instead of 0.02. They will care if you have almost instantaneous transitions between sections of your site.
Personally, I'd go with the "load all data, then page locally" method. Also remember that AJAX is your friend: if you have to, load the results page first, then request the data. You can split the request in two: first page, then the rest of the pages. There are a lot of behind-the-scenes tweaks you can do to make your site seem incredibly fast, and that is something people notice.
I'd say, load 5 at a time and paginate. My considerations:
It is indeed much lighter to load 5 at a time
Not all of your users will navigate through all 100, so those loaded might not even be used
A slight load time between pages of 5 records is expected (i.e. most users won't complain just because they have to wait 500 ms to 1 s)
You can also give the user options for how many items to display per page, including an option to show all items on one page. Over time, you can monitor which page size most of your users prefer and make that the default LIMIT.
I made a logger that shows me all the db queries onscreen. Average is about 30 queries a page. Is that a lot? Each call takes about 0.001 seconds to complete, some longer, some shorter. Here are some totals for a few pages: 0.9 secs, 0.09 secs, 0.8 secs. (Note: these are ONLY the times for the database queries and not image loading etc).
Are these acceptable times? What is ideal? What is the industry standard?
If you ask me, I would say a page should be able to load within 0.5 seconds. Almost 1 second just for queries is way too long.
But if it is a huge page with loads of information, a user will probably be willing to wait for it.
You should probably take a look at those queries and find out why they take so long (0.9/0.8 s).
Prefix your query with EXPLAIN (or EXPLAIN EXTENDED) and see whether any indexes are used.
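For example (the query and table are made up):

EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- In the output, key = NULL together with a large rows estimate means no index was used
-- and the whole table is being scanned; an index on customer_id would fix that here.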
That depends on the query, and the level of interactivity of your application. If you have to provide a web application, you are not going to accept a query that completes in 10 seconds. If you can't avoid it, you may have to use some tricks to make it faster, or to build and return the data progressively as new results are found.
Depends on the type of query and the ORM you use.
I have a table of more than 15,000 feeds and it's expected to grow. What I am trying to do is fetch new articles using SimplePie, synchronously, and store them in a DB.
Now I have run into a problem: since the number of feeds is high, my server stops responding and I am not able to fetch feeds any longer. I have also implemented some caching, and I fetch odd and even feeds at different time intervals.
What I want to know is: is there any way to improve this process? Maybe fetch the feeds in parallel? Or maybe someone can give me a pseudo-algorithm for it.
15,000 Feeds? You must be mad!
Anyway, a few ideas:
Increase the Script Execution time-limit - set_time_limit()
Don't go overboard, but ensuring you have a decent amount of time to work in is a start.
Track Last Check against Feed URLs
Maybe add a field for each feed, last_check and have that field set to the date/time of the last successful pull for that feed.
Process Smaller Batches
Better to run smaller batches more often. Think of it as being the PHP equivalent of "all of your eggs in more than one basket". With the last_check field above, it would be easy to identify those with the longest period since the last update, and also set a threshold for how often to process them.
Run More Often
Set a cronjob and process, say 100 records every 2 minutes or something like that.
Log and Review your Performance
Keep logfiles and record stats: how many records were processed, how long it had been since they were last processed, and how long the script took. These metrics will allow you to tweak the batch sizes, cronjob settings, time limits, etc. to ensure that the maximum number of checks is performed in a stable fashion.
Setting all this up may sound like a lot of work compared to a single process, but it will allow you to handle increased volumes, and it forms a strong foundation for any further maintenance tasks you might be looking at down the track.
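As a rough sketch of what one such cron run could look like (the feeds table, its columns, and the storeItems() helper are assumptions; the SimplePie calls follow its standard API):

// One cron run: check the 100 feeds that have waited longest since their last successful pull.
set_time_limit(300);   // a decent, but not unlimited, budget
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$feeds = $pdo->query(
    "SELECT id, url FROM feeds ORDER BY last_check ASC LIMIT 100"
)->fetchAll(PDO::FETCH_ASSOC);

$done = $pdo->prepare("UPDATE feeds SET last_check = NOW() WHERE id = ?");

foreach ($feeds as $feed) {
    $sp = new SimplePie();             // assumes SimplePie is loaded/autoloaded
    $sp->set_feed_url($feed['url']);
    $sp->enable_cache(false);          // we store items ourselves
    if ($sp->init()) {
        storeItems($pdo, $feed['id'], $sp->get_items());   // hypothetical helper that INSERTs new articles
        $done->execute([$feed['id']]); // only bump last_check on success
    }
    // log item counts and elapsed time here for the "Log and Review" step
}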
fetch new articles using simplepie, synchronously
What do you mean by "synchronously"? Do you mean consecutively in the same process? If so, this is a very dumb approach.
You need a way of sharding the data to run across multiple processes. Doing this statically (based on, say, the modulus of the feed id or the hash of the URL) is not a good solution - one slow URL would cause multiple feeds to be held up.
A better solution would be to start up multiple threads/processes which would each:
lock list of URL feeds
identify the feed with the oldest expiry date in the past which is not flagged as reserved
flag this record as reserved
unlock the list of URL feeds
fetch the feed and store it
remove the reserved flag on the list for this feed and update the expiry time
Note that if there are no expired records at step 2, the table should still be unlocked; what happens next depends on whether you run the threads as daemons (in which case implement an exponential back-off, e.g. sleep for 10 seconds, doubling up to 320 seconds on consecutive empty iterations) or as batches (in which case just exit).
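Here is a sketch of one such worker, using an InnoDB row lock (SELECT ... FOR UPDATE) in place of an explicit table lock for steps 1-4; the feeds table, its reserved/expires_at columns, and fetchAndStoreFeed() are assumptions:

// Daemon-style worker loop. Assumes $pdo is an open PDO connection.
$backoff = 10;
while (true) {
    // Reserve the most overdue, unreserved feed under a short transaction.
    $pdo->beginTransaction();
    $row = $pdo->query(
        "SELECT id, url FROM feeds
         WHERE reserved = 0 AND expires_at < NOW()
         ORDER BY expires_at ASC
         LIMIT 1
         FOR UPDATE"
    )->fetch(PDO::FETCH_ASSOC);
    if ($row) {
        $pdo->prepare("UPDATE feeds SET reserved = 1 WHERE id = ?")->execute([$row['id']]);
    }
    $pdo->commit();   // the list is only locked for these few statements

    if ($row === false) {
        sleep($backoff);                     // nothing expired: exponential back-off
        $backoff = min($backoff * 2, 320);   // (or simply exit here if running as a batch)
        continue;
    }
    $backoff = 10;

    // Fetch and store, then release the reservation and push the expiry forward.
    fetchAndStoreFeed($row['url']);          // hypothetical: SimplePie fetch + INSERTs
    $pdo->prepare(
        "UPDATE feeds SET reserved = 0, expires_at = NOW() + INTERVAL 30 MINUTE WHERE id = ?"
    )->execute([$row['id']]);
}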
Thank you for your responses. I apologize for replying a little late; I got busy with this problem and later forgot about this post.
I have been researching this a lot and have faced a lot of problems. You see, 15,000 feeds every day is not easy.
Maybe I am MAD! :) But I did solve it.
How?
I wrote my own algorithm. And YES! It's written in PHP/MySQL. I basically implemented a simple weighted machine-learning algorithm that learns the posting times of a feed and then estimates the next polling time for that feed. I save this in my DB.
And since it's a learning algorithm, it improves over time. Of course, there are 'misses', but those misses are at least better than crashing servers. :)
I have also written a paper on this, which was published in a local computer science journal.
Also, regarding the performance gain: I am seeing a 500% to 700% improvement in speed compared to sequential polling.
How is it going so far?
I have a DB that has grown to terabytes in size. I am using MySQL, and yes, I am facing performance issues with it, but nothing major. Most probably I will move to some other DB or implement sharding on my existing one.
Why I chose PHP?
Simple, because I wanted to show people that PHP and MySQL are capable of such things! :)
I have a PHP application that currently has 5k users and will keep growing for the foreseeable future. Once a week I run a script that:
fetches all the users from the database
loops through the users, and performs some upkeep for each one (this includes adding new DB records)
The last time this script ran, it only processed 1,400 users before dying due to the 30-second maximum execution time. One solution I thought of was to have the main script still fetch all the users, but instead of performing the upkeep itself, it would make an asynchronous cURL call (one per user) to a new script that performs the upkeep for that particular user.
My concern here is that 5k+ cURL calls could bring down the server. Is this something that could be remedied by using a messaging queue instead of cURL calls? I have no experience using one, but from what I've read it seems like this might help. If so, which message queuing system would you recommend?
Some background info:
this is a Symfony project, using Doctrine as my ORM and MySQL as my DB
the server is a Windows machine, and I'm using the Windows Task Scheduler and wget to run this script automatically once per week.
Any advice and help is greatly appreciated.
If possible, I would create a scheduled task (cron job) that runs more often and uses LIMIT 100 (or some other number) to process a limited number of users at a time.
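A sketch of what each scheduled run might then do, assuming a last_processed column on the users table and a performUpkeep() function standing in for your existing per-user logic:

// One run: take the 100 users that have waited longest, do the upkeep, stamp them.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$userIds = $pdo->query(
    "SELECT id FROM users ORDER BY last_processed ASC LIMIT 100"
)->fetchAll(PDO::FETCH_COLUMN);

$stamp = $pdo->prepare("UPDATE users SET last_processed = NOW() WHERE id = ?");

foreach ($userIds as $userId) {
    performUpkeep($pdo, $userId);   // your existing upkeep (adds new DB records, etc.)
    $stamp->execute([$userId]);     // so the next run picks up a different batch
}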
A few ideas:
Increase the Script Execution time-limit - set_time_limit()
Don't go overboard, but more than 30 seconds would be a start.
Track Upkeep against Users
Maybe add a field for each user, last_check and have that field set to the date/time of the last successful "Upkeep" action performed against that user.
Process Smaller Batches
Better to run smaller batches more often. Think of it as being the PHP equivalent of "all of your eggs in more than one basket". With the last_check field above, it would be easy to identify those with the longest period since the last update, and also set a threshold for how often to process them.
Run More Often
Set a cronjob and process, say 100 records every 2 minutes or something like that.
Log and Review your Performance
Keep logfiles and record stats: how many records were processed, how long it had been since they were last processed, and how long the script took. These metrics will allow you to tweak the batch sizes, cronjob settings, time limits, etc. to ensure that the maximum number of checks is performed in a stable fashion.
Setting all this up may sound like a lot of work compared to a single process, but it will allow you to handle increased user volumes, and it forms a strong foundation for any further maintenance tasks you might be looking at down the track.
Why not still use the cURL idea, but instead of processing only one user per call, send a batch of users to each call by splitting them into groups of 1,000 or so?
Have you considered changing your logic to commit changes as you process each user? It sounds like you may be running a single transaction to process all users, which may not be necessary.
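For instance, a common Doctrine pattern is to flush and clear in small batches instead of building up one huge unit of work; treat the entity name and upkeepFor() below as stand-ins for your own code:

// Batch the writes: flush every 50 users and detach processed entities.
// $em is your Doctrine EntityManager; upkeepFor() is your per-user logic.
$i = 0;
foreach ($em->createQuery('SELECT u FROM User u')->iterate() as $row) {
    $user = $row[0];
    upkeepFor($em, $user);           // persists whatever new records this user needs

    if (++$i % 50 === 0) {
        $em->flush();                // write this batch's changes to MySQL
        $em->clear();                // free the identity map so memory doesn't balloon
    }
}
$em->flush();                        // flush the remainder
$em->clear();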
How about just increasing the execution time limit of PHP?
Also, looking into whether you can make your upkeep procedure faster would help too. Depending on what exactly you are doing, you could also spread it out a bit: do a few every so often rather than everyone at once. But that depends on what exactly you're doing, of course.