I made a logger that shows me all the DB queries on screen. The average is about 30 queries a page. Is that a lot? Each call takes about 0.001 seconds to complete, some longer, some shorter. Here are the totals for a few pages: 0.9 s, 0.09 s, 0.8 s. (Note: these are ONLY the times for the database queries, not image loading etc.)
Are these acceptable times? What is ideal? What is the industry standard?
If you ask me, I would say a page has to be able to load within 0.5 seconds. Almost 1 second for queries alone is way too long.
But if it is a huge page with loads of information, a user will probably be willing to wait for it.
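You should take a look at the queries and find out why those pages take so long (0.9 s / 0.8 s). At roughly 0.001 s per query, 30 queries should add up to only about 0.03 s, so a few of them must be much slower than the rest.
Prefix the slow queries with EXPLAIN EXTENDED and check whether any indexes are used (the key column of the output shows the index chosen). On MySQL 5.7 and later, plain EXPLAIN already includes the extended information, and MySQL 8.0 no longer accepts the EXTENDED keyword.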
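A per-page query logger like the one described in the question can be sketched as a wrapper that times each call and then reports which queries dominate the page total. This is a minimal JavaScript sketch, not the asker's actual logger; `runQuery` is a hypothetical stand-in for whatever async function really sends SQL to the database.

```javascript
// Wrap an async query function so every call is timed and recorded.
// `runQuery(sql)` is a hypothetical stand-in for the real database call.
function createQueryLogger(runQuery) {
  const entries = [];

  async function query(sql) {
    const start = performance.now();
    try {
      return await runQuery(sql);
    } finally {
      // Record the duration even if the query throws.
      entries.push({ sql, ms: performance.now() - start });
    }
  }

  // Summarise the page: number of queries, total time, slowest queries first.
  function report(top = 5) {
    const totalMs = entries.reduce((sum, e) => sum + e.ms, 0);
    const slowest = [...entries].sort((a, b) => b.ms - a.ms).slice(0, top);
    return { count: entries.length, totalMs, slowest };
  }

  return { query, report };
}
```

If one or two entries account for most of the 0.8 to 0.9 seconds, those are the queries worth running through EXPLAIN.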
That depends on the query and on how interactive your application is. If you are building a web application, you are not going to accept a query that takes 10 seconds to complete. If you can't avoid it, you may have to use some tricks to make it faster, or build and return the data progressively as new results are found.
It depends on the type of query and on the ORM you use.
Short:
Is there an efficient way (via PHP) to get the number of queries that were executed within a certain timespan?
Full:
I'm currently running an API for a frontend web application that will be used by a large number of users.
I use my own custom framework that uses models to do all the data magic, and they mostly execute INSERTs and SELECTs. One model function can execute 5 to 10 queries per request, while another might execute 50 or more.
Currently, I have no way to check whether I'm "killing" my server by executing, for example, 500 queries every second.
I also don't want any surprises when the number of users grows to 200, 500, 1,000, ... within the first week and maybe 10,000 by the end of the month.
I want to collect some statistics per hour, so that I have an idea of the average and can work on performance and efficiency before everything fails, for example by merging some queries into one bigger one.
Posts I've read suggested simply keeping a counter in my code, but that would require more queries just to store a number. My preferred way would be to add a SELECT to my hourly statistics script that returns the number of queries executed for the x processed requests.
To conclude:
Are there any other options for keeping track of this number?
Extra: should I be worried about the number of queries? They are all small ones, built for fast execution without bottlenecks or heavy calculations, and I'm currently quite impressed by how blazingly fast everything is running!
Extra extra: it's on our own VPS, so I have full access and I'm not limited to "basic" functions or commands.
Short Answer: Use the slowlog.
Full Answer:
At the start and end of the time period, perform
SELECT VARIABLE_VALUE AS Questions
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Questions';
Then take the difference.
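If your timing is not precise, also fetch ... WHERE VARIABLE_NAME = 'Uptime' to get the elapsed time (to the second). (On MySQL 5.7 and later, query performance_schema.global_status instead of information_schema.GLOBAL_STATUS; SHOW GLOBAL STATUS LIKE 'Questions' also works.)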
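The arithmetic is just a rate over two snapshots. A JavaScript sketch, assuming each snapshot is an object holding the Questions and Uptime values read with the query above (the `{ questions, uptime }` shape is hypothetical):

```javascript
// Each snapshot holds two global status counters read at one moment:
// Questions = statements sent by clients since startup, Uptime = seconds since startup.
function queriesPerSecond(before, after) {
  const queries = after.questions - before.questions;
  const seconds = after.uptime - before.uptime;
  if (seconds <= 0) {
    // Counters reset on restart, so Uptime going backwards means the server restarted.
    throw new Error("invalid interval; was the server restarted?");
  }
  return queries / seconds;
}
```

For example, 180,000 more Questions over 3,600 more seconds of Uptime is 50 queries per second.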
But the problem... 500 very fast queries may not be as problematic as 5 very slow and complex queries. I suggest that elapsed time might be a better metric for deciding whether to kill someone.
And... killing the process may lead to a puzzling situation in which the naughty statement remains in the "Killing" state for a long time (see SHOW PROCESSLIST). This can happen because the statement needs to be rolled back to preserve the integrity of the data. An example is a single UPDATE statement that modifies all rows of a million-row table.
If you issue a KILL in such a situation, it is probably best to let the rollback finish.
In a different direction: if you have, say, a one-row UPDATE that cannot use an index and needs a table scan, the query will take a long time and may be more of a burden on the system than "500 queries". The 'cure' is likely to be adding an INDEX.
What to do about all this? Use the slowlog (make sure slow_query_log is ON; it is off by default). Set long_query_time to a small value. The default is 10 (seconds), which is almost useless; change it to 1 or even something smaller. Then keep an eye on the slowlog. I find it the best way to watch for the system getting out of hand and to learn what to work on fixing. More discussion: http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog
Note that the best metric in the slowlog is neither the number of times a query is run nor how long it runs, but the product of the two. This is the default ordering in pt-query-digest. For mysqldumpslow, adding -s t sorts the results in that order.
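To see why the product matters, here is a rough JavaScript sketch of the same ranking: entries are grouped by a crude query fingerprint (literals replaced by `?`) and sorted by total time, i.e. count × average time. The `{ sql, seconds }` entry shape and the fingerprinting regexes are assumptions for illustration; pt-query-digest does this far more thoroughly.

```javascript
// Reduce a statement to a rough "fingerprint" so that queries differing
// only in their literal values are grouped together.
function fingerprint(sql) {
  return sql
    .replace(/'(?:[^'\\]|\\.)*'/g, "?") // quoted strings
    .replace(/\b\d+(?:\.\d+)?\b/g, "?") // numbers
    .replace(/\s+/g, " ")
    .trim();
}

// Group log entries ({ sql, seconds }) and rank by total time = count x average.
function rankByTotalTime(entries) {
  const groups = new Map();
  for (const { sql, seconds } of entries) {
    const key = fingerprint(sql);
    const g = groups.get(key) || { query: key, count: 0, totalSeconds: 0 };
    g.count += 1;
    g.totalSeconds += seconds;
    groups.set(key, g);
  }
  return [...groups.values()]
    .map((g) => ({ ...g, avgSeconds: g.totalSeconds / g.count }))
    .sort((a, b) => b.totalSeconds - a.totalSeconds);
}
```

A 20-second query that runs once ranks below a 1.5-second query that runs 100 times (150 seconds in total), even though the single slow one looks scarier.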
I'm puzzled; I assume a slow query.
Note: all my queries are tested and run great when fewer people are using my app/website (less than 0.01 s each).
So I have high CPU usage with my current setup, and I was wondering why. Is it possible it's an index issue?
Our possible solution: we thought we could use an XML cache file to store the information each hour and so reduce the load on MySQL (updating the files each hour).
Will it be good for us to do such things, since we have an SSD drive? Or will it be slower than before?
Currently, at high-traffic times, our website/app can take up to 30 seconds to return the first byte. My website runs on a Plesk 12 server.
UPDATE
Here's more information about my MySQL setup:
http://pastebin.com/KqvFYy8y
Is it possible it's an index issue?
Perhaps, but not necessarily. First you need to identify which query is slow; you will find that in the slow query log. Then analyze the query. This is explained in the literature, or you can contact a consultant / tutor for that.
We thought we could use an XML cache file to store the information each hour and then reduce the load on our MySQL query?
Well, cache invalidation is not the easiest thing to do, but with a fixed rhythm of once every hour it seems easy enough. But note that it will only help if the query you cache was actually slow. MySQL before 8.0 has a built-in query cache, so check whether it is enabled first (it was deprecated in 5.7 and removed in 8.0).
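A fixed-rhythm cache like the hourly one proposed in the question boils down to "reuse the stored result until it is older than the TTL, then rebuild it". A minimal in-memory JavaScript sketch, assuming `loadFn` is the (hypothetical) slow query you want to cache; writing the result to an XML file instead would follow the same expiry logic:

```javascript
// Cache the result of an expensive loader for `ttlMs` milliseconds.
// `now` is injectable so the expiry logic can be tested without waiting an hour.
function cacheFor(ttlMs, loadFn, now = Date.now) {
  let value;
  let loadedAt = -Infinity; // forces a load on the first call

  return function get() {
    if (now() - loadedAt >= ttlMs) {
      value = loadFn(); // only hit the database when the cached copy has expired
      loadedAt = now();
    }
    return value;
  };
}

const ONE_HOUR_MS = 60 * 60 * 1000;
```

This only helps if `loadFn` is the expensive part; if the slow query is a different one, the cache changes nothing.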
Will it be good for us to do such things?
Normally if the things to do are good, the results will be good, too. Sometimes even bad things will result in good results, so such a general question is hard to answer. Instead I suggest you gain more concrete information first before you continue to ask around. Sounds more like guessing. Stop guessing. Really, that's only for the first two minutes, after that, just stop guessing.
Since we have an SSD drive? Or will it be slower than before?
You can try to throw hardware at it. Again, literature and a consultant / tutor can help you greatly with that. But just stop guessing. Really.
I assume the query is not slow all the time. If this is true, the query is not very likely the problem.
You need to know what is using the CPU. Likely a runaway script with an infinite loop.
Try this:
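<?php
// Send plain text so the column layout of ps is preserved
header('Content-Type: text/plain; charset=utf-8');
// system() already prints the command's full output; echoing its return
// value (the last line) would print that line twice
system('ps auxww');
?>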
This should return a list in this format:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Scan down the %CPU column and look for your user name in the USER column.
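If you would rather filter the listing than scan it by eye, the columns can be split programmatically. A JavaScript sketch, assuming the standard `ps auxww` layout shown above (ten whitespace-separated fields followed by the command, which may itself contain spaces); the 50% threshold is an arbitrary choice:

```javascript
// Parse `ps auxww` output and return processes at or above a CPU threshold.
function busyProcesses(psOutput, minCpu = 50) {
  const lines = psOutput.trim().split("\n").slice(1); // skip the header row
  return lines
    .map((line) => {
      const fields = line.trim().split(/\s+/);
      return {
        user: fields[0],
        pid: Number(fields[1]),
        cpu: Number(fields[2]), // can exceed 100 for multi-threaded processes
        command: fields.slice(10).join(" "), // everything after TIME
      };
    })
    .filter((p) => p.cpu >= minCpu)
    .sort((a, b) => b.cpu - a.cpu);
}
```

The `pid` of each returned entry is the number you would pass to kill.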
If you see a process taking 100% CPU, you may want to note its PID and run:
system('kill 1234');
where 1234 is the PID.
The mysql processes running at 441% and 218% seem very problematic.
Assuming this is a shared server, another user may be running queries that hog the CPU; you may need to take that up with your provider.
I've been watching one of my shared servers, and the CPU usage of the mysql process has not gone over 16%.
MySQLTuner
From the link, it appears you have heavy traffic.
The server had been up for 23.5 minutes when the tuner ran:
Joins performed without indexes: 69863
69863 in 23.5 min. comes out to almost 50 queries per second.
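Spelled out, the rate is:

```latex
\frac{69863\ \text{joins}}{23.5\ \text{min} \times 60\ \text{s/min}}
  = \frac{69863\ \text{joins}}{1410\ \text{s}}
  \approx 49.5\ \text{joins/s}
```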
Does this sound correct? Running a query with a JOIN about 50 times per second?
Index JOIN Table
You have a query with a JOIN.
The tables are joined by column(s).
On the joined table, add an index on the column(s) that join the two tables together.
I am building a pagination system for about 100 items.
My question is:
Should I just load all 100 of them and then use jQuery to switch pages without reloading? Or should I use a MySQL query with "LIMIT 5" and then, each time the user presses Next Page or Previous Page, run another MySQL query with LIMIT 5?
For every item I would have to load a thumbnail picture, but I could keep it cached to avoid using my server bandwidth.
Which one is the best option from a server resource perspective?
Thanks in advance. Regards
Try connecting directly to your MySQL instance via the command-line client. Execute the query fetching 100 rows at a time, and then with LIMIT 5, and compare the reported execution times. This will tell you which is more efficient or less resource-demanding.
100 records at a time from MySQL (depending on the dataset) really is nothing. The performance hit wouldn't be noticeable for a properly written query and database schema.
That said, I vote for fetching only the results you need at a time. Use the LIMIT clause together with your jQuery pagination method to make it efficient.
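With the LIMIT approach, each click only needs the page number turned into a LIMIT/OFFSET pair. A JavaScript sketch of that mapping; the `items` table, its columns and the `?` placeholder style are assumptions (use whatever your driver expects), and the page size of 5 comes from the question:

```javascript
// Translate a 1-based page number into a parameterised LIMIT/OFFSET query.
// Table and column names are hypothetical.
function pageQuery(page, perPage = 5) {
  const safePage = Math.max(1, Math.floor(page)); // clamp 0, negatives and fractions
  return {
    sql: "SELECT id, title, thumbnail FROM items ORDER BY id LIMIT ? OFFSET ?",
    params: [perPage, (safePage - 1) * perPage],
  };
}

// Total number of pages, e.g. for disabling the Next button on the last page.
function pageCount(totalItems, perPage = 5) {
  return Math.ceil(totalItems / perPage);
}
```

Keep a deterministic ORDER BY in the query: without one, MySQL does not guarantee row order, so consecutive pages could overlap or skip items.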
For the server, the most efficient way would be to grab all 100 items once, send them to the client once, and have the client page through them locally. That's one possibly expensive query, but that's cheaper overall than having the client go back and forth for each additional five items.
Having said that, whether that's feasible is a different topic. You do not want to push a huge amount of data to your client at once, since it'll slow down page loads and client-side processing. In fact, it's usually desirable to keep the bandwidth consumed by the client to a minimum. From that point of view, making small AJAX requests for five results at a time, when and only when necessary, is much preferable, unless even 100 results are so small overall that it doesn't make much of a difference.
Which one works best for you is something you need to figure out.
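For the "grab all 100 once, page locally" option, paging becomes plain array slicing on the client. A JavaScript sketch, assuming `items` is the array of records returned by the single request:

```javascript
// Client-side pager over an array that was fetched once.
function createPager(items, perPage = 5) {
  const pages = Math.max(1, Math.ceil(items.length / perPage));
  let page = 1;

  const current = () => items.slice((page - 1) * perPage, page * perPage);

  return {
    current,
    // Next/previous are clamped so they never run past the first or last page.
    next() { page = Math.min(page + 1, pages); return current(); },
    prev() { page = Math.max(page - 1, 1); return current(); },
    get page() { return page; },
    pages,
  };
}
```

No further requests are made after the initial load; the trade-off is that all the records arrive up front.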
It depends significantly on your query. If it is a simple SELECT from a well-designed table (indexes set, etc.), then unless you're running on a very underpowered server there will be no noticeable difference between requesting 100 rows and 5 rows. If it is a complicated query, then you should probably limit the number of queries.
Another consideration is how long it takes to load a page, as in the actual round-trip time for the client to receive the data from the server. I'm going to make the wild guess that you are in America or Europe, where internet speeds are nice and fast, but not the entire world is that lucky. Limiting the number of times your site has to request data from the server is a much better metric than how much load your server has.
This is moving rapidly into UX territory, but your users don't care about your server load; they don't care if this approach means your load average is 0.01 instead of 0.02. They will care whether they get almost instantaneous transitions between sections of your site.
Personally, I'd go with the "load all data, then page locally" method. Also remember that Ajax is your friend: if you have to, load the results page first, then request the data. You can split the request in two: the first page and the rest of the pages. There are a lot of behind-the-scenes tweaks you can do to make your site seem incredibly fast, and that is something people notice.
I'd say load 5 at a time and paginate. My considerations:
It is indeed much lighter to load 5 at a time.
Not all of your users will navigate through all 100, so the extra items loaded might never be used.
A slight delay when loading the next 5 records is expected (i.e. most users won't complain just because they have to wait 500 ms to 1 s).
You can also give users the option to display x items per page, including an option to see all items on one page. Over time, you can monitor which number of items per page most of your users prefer and use that as the default LIMIT.
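Picking the default LIMIT from what users actually choose amounts to taking the most common value. A JavaScript sketch, assuming you have logged each user's chosen page size somewhere (that log is hypothetical):

```javascript
// Pick the most frequently chosen page size; fall back to `fallback` if there is no data.
// On a tie, the size that appeared first in `choices` wins (Map keeps insertion order).
function mostCommonPageSize(choices, fallback = 5) {
  const counts = new Map();
  for (const size of choices) counts.set(size, (counts.get(size) || 0) + 1);

  let best = fallback;
  let bestCount = 0;
  for (const [size, count] of counts) {
    if (count > bestCount) {
      best = size;
      bestCount = count;
    }
  }
  return best;
}
```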
I have several users accessing my page at the same time, with each client polling (using setInterval) another PHP script that reads a value from a database and prints it.
setInterval( "printData();", 300 );
I'm relatively new to jQuery and javascript, and I'm a bit skeptical of the viability of constantly running this php script and constantly making database queries.
Can someone calm my nerves or provide an alternative to my current method?
You are updating it every 0.3 seconds, which is more than 3 times a second. Way too much. Depending on how smooth it needs to be, update it at most once every 5 seconds (5000 ms) instead.
Also, pass the function itself instead of a string: drop the quotes and parentheses, so the browser doesn't have to evaluate a string of code on every tick:
setInterval(printData,5000);
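One caveat with setInterval for AJAX polling is that it fires on schedule even if the previous request has not come back yet, so a slow server can pile up overlapping requests. A common alternative is to schedule the next poll only after the current one finishes. A JavaScript sketch; `fetchValue` and `onValue` are hypothetical callbacks (for example a fetch() of the PHP script and a function that writes the value into the page):

```javascript
// Poll without overlapping requests: the next tick is scheduled only after
// the previous request has finished (successfully or not).
function startPolling(fetchValue, onValue, intervalMs) {
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      const value = await fetchValue();
      if (!stopped) onValue(value);
    } catch (err) {
      console.error("poll failed:", err); // keep polling after a failed request
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  // Return a function that cancels any pending tick.
  return function stop() {
    stopped = true;
    if (timer !== null) clearTimeout(timer);
  };
}

// Browser usage (the endpoint name and element id are hypothetical):
// const stop = startPolling(
//   () => fetch("printData.php").then((r) => r.text()),
//   (text) => { document.getElementById("value").textContent = text; },
//   5000
// );
```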
I'm currently building a user panel which will scrape daily information using curl. For each URL it will INSERT a new row to the database. Every user can add multiple URLs to scrape. For example: the database might contain 1,000 users, and every user might have 5 URLs to scrape on average.
How should I run the curl scraping: by a cron job once a day at a specific time? Will a single dedicated server handle this without lag? Are there any techniques to reduce the server load? And about the MySQL database: with 5,000 new rows a day, the database will be huge after a single month.
In case you're wondering, I'm building a statistics service that will show the daily growth of their pages (not talking about traffic), so as I understand it, I need to insert a new value per user per day.
Any suggestions will be appreciated.
5,000 x 365 is only about 1.8 million rows a year... nothing to worry about for the database. If you want, you can put the data into MongoDB (needs a 64-bit OS). This will make it easier to expand and spread the load across multiple machines when you need to.
If you want to run curl from a cron job non-stop until it is finished, just "nice" the process so it doesn't use too many system resources. Otherwise, you can run a script that sleeps a few seconds between each curl pull. If each scrape takes 2 seconds, that would allow you to scrape 43,200 pages per 24-hour period. If you slept 4 seconds after each 2-second pull, that would let you do 14,400 pages per day (5,000 is about 35% of 14,400, so with a 4-second sleep between 2-second scrapes you should be done in roughly 8.3 hours, about a third of a day).
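The capacity figures and the "sleep between pulls" loop can both be sketched in a few lines of JavaScript. `scrapeOne` is a hypothetical callback standing in for one curl fetch plus its INSERT; the timings are the 2-second scrape and 4-second sleep used above:

```javascript
// Capacity of a single sequential scraper that pauses between requests.
const SECONDS_PER_DAY = 24 * 60 * 60;

function pagesPerDay(scrapeSeconds, sleepSeconds) {
  return Math.floor(SECONDS_PER_DAY / (scrapeSeconds + sleepSeconds));
}

function hoursToFinish(pageCount, scrapeSeconds, sleepSeconds) {
  return (pageCount * (scrapeSeconds + sleepSeconds)) / 3600;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch URLs one at a time, pausing between them so the box is never saturated.
// `scrapeOne(url)` is hypothetical (e.g. fetch the page and store the result).
async function scrapeAll(urls, scrapeOne, sleepMs) {
  const results = [];
  for (const url of urls) {
    results.push(await scrapeOne(url));
    await sleep(sleepMs);
  }
  return results;
}
```

pagesPerDay(2, 0) gives 43,200, pagesPerDay(2, 4) gives 14,400, and hoursToFinish(5000, 2, 4) is about 8.33.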
This seems very doable on a minimal VPS for the first year, or at least the first 6 months. Then you can think about using more machines.
(Edit: also, if you're worried about space, you can store the scraped page source gzipped.)
I understand that each customer's pages need to be checked at the same time each day to make the growth stats accurate. But do all customers need to be checked at the same time? I would divide my customers into chunks based on their IDs. That way, you could update each customer at the same time every day without having to do them all at once.
For the database size problem I would do two things. First, use partitions to break up the data into manageable pieces. Second, if the value did not change from one day to the next, I would not insert a new row for the page; when processing the data for presentation, I would then fill in the missing days from the last stored value. That is, unless all you are storing is small bits of text, in which case I'm not sure the number of rows is going to be that big a problem if you use proper indexing and pagination for queries.
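If unchanged values are not stored, the gaps have to be filled back in when the data is read. A JavaScript sketch of carrying the last stored value forward, assuming rows come back as `{ day, value }` with days as `YYYY-MM-DD` strings (both assumptions about how you query the table):

```javascript
// Expand sparse rows (only days where the value changed) into one value per day,
// carrying the last known value forward over the missing days.
function fillDaily(rows, days) {
  const byDay = new Map(rows.map((r) => [r.day, r.value]));
  let last = null; // no value before the first stored row
  return days.map((day) => {
    if (byDay.has(day)) last = byDay.get(day);
    return { day, value: last };
  });
}

// List the YYYY-MM-DD strings from `start` to `end` inclusive (UTC).
function dayRange(start, end) {
  const out = [];
  const last = new Date(end + "T00:00:00Z");
  for (let d = new Date(start + "T00:00:00Z"); d <= last; d.setUTCDate(d.getUTCDate() + 1)) {
    out.push(d.toISOString().slice(0, 10));
  }
  return out;
}
```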
Edit: adding a bit of an example
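function do_curl($start_index, $stop_index) {
    // Cast to int so the values are safe to interpolate into the SQL below
    $start_index = (int) $start_index;
    $stop_index = (int) $stop_index;
    // Do query here to get all pages with ids between start index and stop index
    $query = "SELECT * FROM db_table WHERE id >= $start_index AND id <= $stop_index";
    for ($i = $start_index; $i <= $stop_index; $i++) {
        // do curl here
    }
}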
The URLs would look roughly like:
http://xxx.example.com/do_curl?start_index=1&stop_index=10
http://xxx.example.com/do_curl?start_index=11&stop_index=20
The best way to deal with the growing workload is perhaps to write a single cron script that generates the start_index and stop_index values based on the number of pages you need to fetch and how often you intend to run the script.
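Such a cron script mostly has to split the id range into batches. A JavaScript sketch that produces the start_index/stop_index pairs and the matching URLs; the batch size is up to you and the `http://xxx.example.com/do_curl` base URL is the placeholder from the example above. It assumes ids run contiguously from 1; with gaps, some batches simply contain fewer pages.

```javascript
// Split ids 1..totalPages into consecutive [start, stop] batches of `batchSize`.
function batchRanges(totalPages, batchSize) {
  const ranges = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    ranges.push([start, Math.min(start + batchSize - 1, totalPages)]);
  }
  return ranges;
}

// Turn each range into a do_curl URL with start_index/stop_index parameters.
function batchUrls(baseUrl, totalPages, batchSize) {
  return batchRanges(totalPages, batchSize).map(
    ([start, stop]) => `${baseUrl}?start_index=${start}&stop_index=${stop}`
  );
}
```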
Use multi curl (PHP's curl_multi_* functions) and properly optimise your database design, not simply normalise it. If I were to run this cron job, I would spend time studying whether it is possible to do this in chunks. Regarding hardware, start with an average configuration, keep monitoring it, and upgrade the CPU or memory as needed. Remember, there is no silver bullet.
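PHP's curl_multi functions let one script keep several transfers in flight at once instead of waiting for each in turn. The same idea, with a cap on how many run at a time so neither your server nor the scraped sites get flooded, looks like this as a JavaScript sketch. It is an analogue of the technique, not the curl_multi API; `task` is a hypothetical async function (e.g. fetch one URL and store the result).

```javascript
// Run `task` over every item with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  // This is safe because JavaScript runs the `next++` claims on a single thread.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```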