I have over 6,000 results, which come to more than 235 pages with pagination. When I click the first page, it loads really fast (~300 ms), and it stays that way up to around the 40th page. After that it really goes downhill, with page load times of 30-40+ seconds. The table is indexed. I tried the MySQL query cache, but did not like it. Can someone help me out?
php:
$sql = mysql_query("SELECT * FROM data WHERE (car = '$cars') AND (color = '$color') AND (price BETWEEN '".$min."' AND '".$max."')
ORDER BY price LIMIT {$startpoint}, {$limit}");
Index:
Table  Non_unique  Key_name   Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Index_type
data   0           PRIMARY    1             id           A          106199       NULL      NULL    BTREE
data   1           car_index  1             car          A          1799         NULL      NULL    BTREE
data   1           car_index  2             color        A          2870         NULL      NULL    BTREE
data   1           car_index  3             price        A          6247         NULL      NULL    BTREE
data   1           car_index  4             location     A          106199       NULL      NULL    BTREE
This is a common issue with MySQL (and other database systems). Using LIMIT + OFFSET (which is what you are doing implicitly with LIMIT x, y) works great at first, but it slows down dramatically as the offset grows, because the amount of data that has to be read grows with every page.
Adding an index is definitely a good first step, as you should always query data based on an index, to avoid full table scans.
Only having an index on price won't be enough as you have other WHERE attributes. Basically, this is what MySQL is doing:
Assuming that $limit = 25 and $startPoint = 0, MySQL will start reading the table from the beginning, stop after it finds 25 matching rows, and return them. Let's assume it reads 500 rows for this first iteration. On the next iteration, because it does not have an index on car + color + price, it does not know how to jump directly past the 25th matching row (the 500th row in the table), so it starts reading from the beginning again, skips the first 25 matching rows and returns the next 25 matching rows. Let's assume this iteration also requires 500 extra rows to be read.
Now you see what's going wrong: for every iteration, MySQL has to read all the rows from the beginning again, so the time it takes to return a page keeps growing as the offset grows.
In my example, to fetch 100 rows (25 * 4 iterations), MySQL has to read 500 + 1000 + 1500 + 2000 = 5,000 rows, while you might expect it to only read 500 * 4 = 2,000 rows. To fetch 1,000 rows (25 * 40 iterations), MySQL has to read 500 + 1000 + 1500 + ... + 20000 = 410,000 rows! That's far more than the 500 * 40 = 20,000 rows you might expect; the total work grows quadratically with the number of pages.
To optimize your query, first select only the columns you need (no SELECT *). Then the trick is to remember the last fetched id:
$lastFetchedId = 0;
do {
    // Keyset pagination: continue from the last fetched id instead of using an OFFSET.
    // Ordering by id keeps the "id > $lastFetchedId" condition and the ordering consistent.
    $sql = mysql_query("SELECT * FROM data WHERE id > $lastFetchedId AND (car = '$cars' AND color = '$color' AND price BETWEEN '".$min."' AND '".$max."')
    ORDER BY id LIMIT {$limit}");
    $hasFoundRows = false;
    while ($row = mysql_fetch_assoc($sql)) {
        $hasFoundRows = true;
        $lastFetchedId = $row['id'];
        // do something with the row
    }
} while ($hasFoundRows === true); // keep looping as long as the last query returned rows
Having MySQL take care of the ordering works well only if you have an index on all the columns you are using in the WHERE clause. Think about it this way: if the data is not sorted, how would MySQL know which rows match and where the matching rows are? To be able to sort the results and only return a subset, MySQL needs to build a sorted list of ALL the rows that actually match. This means going through the entire table to first find all the matching rows, then sort them, and finally return only a few of them.
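For reference, if such a composite index did not already exist (the car_index shown in the question already starts with car, color, price), a sketch of creating one could look like this (the index name is arbitrary):
ALTER TABLE data
    ADD INDEX car_color_price (car, color, price);
-- car and color are equality conditions, price is used both for the BETWEEN range
-- and the ORDER BY, so MySQL can filter and sort from this single index.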
Hope that helps you understand what is going on and what you can improve here :)
It would be a good idea to post here the table structure to see what indexes you have.
Please add an index on the column price; it should improve the query performance.
Cheers
Related
I have a database with more than 600 rows but I can only retrieve/display 100 every hour. So I use
select * from table ORDER BY id DESC LIMIT 100
to retrieve the first 100. How do I write a script that will retrieve the data in batches of 100 every 1hr so that I can use it in a cron job?
Possible solution.
Add a field to mark whether the record was already shown.
ALTER TABLE tablename
ADD COLUMN shown TINYINT NULL DEFAULT NULL;
NULL will mean that the record has not been selected yet, 1 that the record is marked for selection, and 0 that the record has already been selected.
When you need to select up to 100 records, you:
2.1. Mark records to be shown
UPDATE tablename
SET shown = 1
WHERE shown = 1
OR shown IS NULL
ORDER BY shown = 1 DESC, id ASC
LIMIT 100;
The shown = 1 condition in the WHERE clause accounts for records that were marked earlier but never actually selected because of some error; shown = 1 DESC in the ORDER BY re-marks such records ahead of the non-marked ones.
If there are 100 or fewer records that have not been selected yet, all of them will be marked; otherwise only the 100 records with the lowest id (the most ancient) will be marked.
2.2. Select marked records.
SELECT *
FROM tablename
WHERE shown = 1
ORDER BY id
LIMIT 100;
2.3. Mark selected records.
UPDATE tablename
SET shown = 0
WHERE shown = 1
ORDER BY id
LIMIT 100;
This is applicable when only one client selects the records.
If many clients may work in parallel, and each record must be selected by only one client, then use some client number (unique across all clients) to mark a record for selection instead of 1, as in the sketch below.
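A sketch of that variant (":client_id" is a made-up placeholder for the caller's unique client number):
-- 2.1, multi-client: claim up to 100 records for this client
UPDATE tablename
SET shown = :client_id
WHERE shown = :client_id
   OR shown IS NULL
ORDER BY shown = :client_id DESC, id ASC
LIMIT 100;
-- 2.2, multi-client: read back only the records claimed by this client
SELECT *
FROM tablename
WHERE shown = :client_id
ORDER BY id
LIMIT 100;
-- 2.3, multi-client: mark them as already selected
UPDATE tablename
SET shown = 0
WHERE shown = :client_id
ORDER BY id
LIMIT 100;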
Of course, if there is only one client and you can guarantee that the selection will not fail, you may simply store the last shown ID somewhere (on the client side, or in some service table on the MySQL side) and select the "next 100" starting from this stored ID:
SELECT *
FROM tablename
WHERE id > #stored_id
ORDER BY id
LIMIT 100;
and
SELECT MAX(id)
FROM (
    SELECT id
    FROM tablename
    WHERE id > #stored_id
    ORDER BY id
    LIMIT 100
) batch;
to obtain the value to store as the new #stored_id. (The MAX has to be taken over the limited subquery; an ORDER BY and LIMIT applied directly to SELECT MAX(id) would not restrict the aggregation to the first 100 rows.)
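A minimal sketch of such a service table on the MySQL side (the table name batch_progress, the job_name key 'hourly_batch' and the :new_last_id placeholder are all made up for illustration):
CREATE TABLE batch_progress (
    job_name VARCHAR(64) NOT NULL PRIMARY KEY,
    last_id  INT NOT NULL DEFAULT 0
);
-- before fetching a batch: read the stored id
SELECT last_id FROM batch_progress WHERE job_name = 'hourly_batch';
-- after the batch has been processed: remember the highest id that was returned
UPDATE batch_progress
SET last_id = :new_last_id
WHERE job_name = 'hourly_batch';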
Thank you @Akina and @Vivek_23 for your contributions. I was able to figure out an easier way to go about it.
Add a new field to the table, e.g. shownstatus
Create a cron job to display 100 (LIMIT 100) records from the table every hour, taking only records whose shownstatus is not marked as shown, and then update each record's shownstatus to shown. NB: if I create a cron job to run every hour for the whole day, I can get all records displayed and their shownstatus updated to shown by close of day.
Create a second cron job to update all records' shownstatus to notshown.
The downside to this is that you can only display a total of 2,400 records a day, i.e. 100 records every hour times 24 hours. So if your records grow to about 10,000, you will need to let the cron job run for at least 5 days to display all of them.
Still open to a better approach if there's any, but till then, I will have to just stick to this for now.
Let's say you made a cron that hits a URL something like
http://yourdomain.com/fetch-rows
or a script for instance, like
your_project_folder/fetch-rows.php
Let's say you have a DB table in place that looks something like this:
| id | offset | created_at |
|----|--------|---------------------|
| 1 | 100 | 2019-01-08 03:15:00 |
| 2 | 200 | 2019-01-08 04:15:00 |
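That table is assumed to have been created with something along these lines (a sketch; the exact types are guesses):
CREATE TABLE cron_hit_table (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    `offset`   INT NOT NULL,  -- backticked because OFFSET is a keyword in some MySQL/MariaDB versions
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);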
Your script:
<?php
define('FETCH_LIMIT', 100);
$conn = mysqli_connect(....); // connect to DB
// select the last record to get the latest offset
$result = mysqli_query($conn, "select * from cron_hit_table where id = (select max(id) from cron_hit_table)");
$offset = 0; // initial default offset
if (mysqli_num_rows($result) > 0) {
    $offset = intval(mysqli_fetch_assoc($result)['offset']);
}
// Now, hit your query with $offset included
$result = mysqli_query($conn, "select * from table ORDER BY id DESC LIMIT $offset," . FETCH_LIMIT);
while ($row = mysqli_fetch_assoc($result)) {
    // your data processing
}
// insert new row to store next offset for next cron hit
$offset += FETCH_LIMIT; // increment current offset
mysqli_query($conn, "insert into cron_hit_table(offset) values($offset)"); // id is auto increment and created_at defaults to current_timestamp
mysqli_close($conn);
Whenever the cron hits, you fetch the last row from your hit table to get the latest offset, run the query with that offset, and store the next offset for the next hit.
Update:
As pointed out by @Dharman in the comments, you can use PDO for a more abstracted way of dealing with different types of databases (but make sure you have the appropriate driver for it; see the list of drivers PDO supports to be sure), along with minor checks of query syntax.
Hi, I have a 7 million record DB table for testing query speed.
I tested two queries, which are the same query with different LIMIT parameters:
query 1 -
SELECT *
FROM table
LIMIT 20, 50;
query 2 -
SELECT *
FROM table
LIMIT 6000000, 6000030;
query exec times are:
query 1 - 0.006 sec
query 2 - 5.500 sec
In both of these queries, I am fetching same number of records, but in the second case it's taking more time. Can someone please explain the reasons behind this?
Without looking into it too closely, my assumption is that this occurs because the first query only has to read a few dozen records to return results, whereas the second query has to read past six million before returning results. Basically, the first query just finishes sooner.
I would assume that this has a great deal to do with the makeup of the table - field types and keys, etc.
If a record is made up of fixed-length fields (e.g. CHAR vs. VARCHAR), then the DBMS can simply calculate where the nth record starts and jump there. If it is variable-length, then it has to read the records to determine where the nth record starts. Similarly, I'd further assume that tables which have appropriate primary keys would be quicker to query than those without such keys.
I think the slowdown is tied to the fact that you are using LIMIT with an offset and are querying the table with no additional context for indexing. It's possible the first is faster simply because it can get to the offset quicker.
It's the difference between returning 50 rows and 6,000,030 rows (or ~1 million rows, since you said there are only 7 million rows).
With two arguments, the first argument specifies the offset of the
first row to return, and the second specifies the maximum number of
rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
http://dev.mysql.com/doc/refman/5.0/en/select.html
Also, I think you're looking for 30-row pages, so your queries should be using 30 as the second parameter in the LIMIT clause:
SELECT *
FROM table
LIMIT 20, 30;
SELECT *
FROM table
LIMIT 6000000, 30;
Let's say I have a table with, say, 1 million rows, with the first column being a primary key.
Then, if I run the following:
SELECT * FROM table WHERE id='tomato117' LIMIT 1
Does the table ALL get put into the cache (thereby causing the query to slow as more and more rows get added), or does the number of rows in the table not matter, since the query uses the primary key?
edit: (added limit 1)
If id is defined as the primary key, there is only one record with the value tomato117, so the LIMIT is not really needed.
Using SELECT * will make MySQL read from disk, because it is unlikely that all columns are stored in the index (MySQL cannot fetch everything from the index alone). In theory, this will affect performance.
However, your SQL matches the query cache conditions, so MySQL will store the result in the query cache for subsequent use.
If your query cache size is huge, MySQL will keep storing SQL results in the query cache until the memory is full.
This comes with a cost: if there is an update on your table, query cache invalidation will be more expensive for MySQL.
http://www.mysqlperformanceblog.com/2007/03/23/beware-large-query_cache-sizes/
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/
Nothing of the sort.
It will only fetch the row you selected and perhaps a few other blocks. They will remain in the cache until something pushes them out.
By cache, I refer to the InnoDB buffer pool, not the query cache, which should probably be off anyway.
SELECT * FROM table WHERE id = 'tomato117' LIMIT 1
When tomato117 is found, it stops searching because of the LIMIT 1. If you don't set LIMIT 1, it will keep searching until the end of the table: tomato117 could be the second row, and it would still scan the remaining 1,000,000 rows looking for another tomato117.
http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Showing rows 0 - 0 (1 total, Query took 0.0159 sec)
SELECT *
FROM `forum_posts`
WHERE pid = 643154
LIMIT 0 , 30
Showing rows 0 - 0 (1 total, Query took 0.0003 sec)
SELECT *
FROM `forum_posts`
WHERE pid = 643154
LIMIT 1
Table is about 1GB, 600 000+ rows.
If you add the word EXPLAIN before the word SELECT, it will show you a table summarising how many rows it is reading, instead of the normal results.
If your table has an index on the id column (including if it's set as primary key), the engine will be able to jump straight to the exact row (or rows, for a non-unique index) and only read the minimal amount of data. If there's no index, it will need to read the whole table.
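For example (a sketch using the query from above; the exact output columns vary between MySQL versions):
EXPLAIN SELECT * FROM `forum_posts` WHERE pid = 643154 LIMIT 1;
-- With an index on pid, the key column names that index and rows is close to 1;
-- without one, key is NULL, type is ALL and rows is roughly the size of the table.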
A table with about 70K records is displayed on a site, showing 50 records per page.
Pagination is done with limit offset,50 on the query, and the records can be ordered on different columns.
Browsing the latest pages (so the offset is around 60,000) makes the queries much slower than when browsing the first pages (about 10x)
Is this an issue of using the limit command?
Are there other ways to get the same results?
With large offsets, MySQL needs to examine more records.
Even if the plan uses filesort (which means that all matching records need to be examined), MySQL optimizes it so that only the top $offset + $limit records are sorted, which makes it much more efficient for lower values of $offset.
The typical solution is to index the columns you are ordering on, record the last value of the columns and reuse it in the subsequent queries, like this:
SELECT *
FROM mytable
ORDER BY
value, id
LIMIT 0, 10
which outputs:
value id
1 234
3 57
4 186
5 457
6 367
8 681
10 366
13 26
15 765
17 345 -- this is the last one
To get to the next page, you would use:
SELECT *
FROM mytable
WHERE (value, id) > (17, 345)
ORDER BY
value, id
LIMIT 0, 10
This query uses the index on (value, id).
Of course this won't help with jumping to arbitrary pages, but it helps with sequential browsing.
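This approach assumes a composite index covering both ordering columns, something like this (a sketch; the index name is arbitrary):
ALTER TABLE mytable ADD INDEX value_id (value, id);
-- The (value, id) > (17, 345) comparison and the ORDER BY value, id both follow
-- the column order of this index, so MySQL can seek into it and read rows in order.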
Also, MySQL has certain issues with late row lookup. If the columns are indexed, it may be worth trying to rewrite your query like this:
SELECT *
FROM (
SELECT id
FROM mytable
ORDER BY
value, id
LIMIT $offset, $limit
) q
JOIN mytable m
ON m.id = q.id
See this article for more detailed explanations:
MySQL ORDER BY / LIMIT performance: late row lookups
It's how MySQL deals with limits. If it can sort on an index (and the query is simple enough) it can stop searching after finding the first offset + limit rows. So LIMIT 0,10 means that if the query is simple enough, it may only need to scan 10 rows. But LIMIT 1000,10 means that at a minimum it needs to scan 1010 rows. Of course, the actual number of rows that need to be scanned depends on a host of other factors, but the point is that the lower the offset + limit, the lower the lower bound on the number of rows that need to be scanned.
As for workarounds, I would optimize your queries so that the query itself, without the LIMIT clause, is as efficient as possible. EXPLAIN is your friend in this case...
I have a dataset of rows, each with an 'odds' number between 1 and 100, and I want to pick a random row weighted by those odds, in the most efficient way possible. The odds do not necessarily add up to 100.
I have had a few ideas.
a)
Select the whole dataset, add all the odds up, and generate a random number between 1 and that total. Then loop through the dataset, deducting each row's odds from the number until it reaches 0; that row is the one to pick.
I was hoping to minimize the impact on the database so I considered if I could only select the rows I needed.
b)
SELECT * FROM table WHERE (100*RAND()) < odds
I considered LIMIT 0,1
But then if several items have the same probability, only one of them will ever be returned.
Alternatively, take the whole result set and pick a random row from it... but then the odds are skewed, because it becomes a random draw with odds followed by a random draw without odds, so the selection becomes tilted in favour of the higher odds (even more so than intended).
I guess I could order by odds ASC, take the whole result set, and then with PHP pick a random row from among the rows that have the same odds as the first record (the lowest).
Seems like a clumsy solution.
Does anyone have a superior solution? If not which one of the above is best?
Do some up-front work: add some columns to your table that help the selection. For example, suppose you have these rows:
X 2
Y 3
Z 1
We add some cumulative values
Key Odds Start End
X 2 0 1 // range 0->1, 2 values == odds
Y 3 2 4 // range 2->4, 3 values == odds
Z 1 5 5 // range 5->5, 1 value == odds
Start and End are chosen as follows: the first row has a Start of zero, each subsequent row has a Start one more than the previous row's End, and End is (Start + Odds - 1).
Now pick a random number R in the range 0 to Max(End)
Select * from T where R >= T.Start and R <= T.End
If the database is sufficiently clever, we may be able to use
Select * from T where R >= T.Start and R <= (T.Start + T.Odds - 1)
I'm speculating that having an End column with an index may give better performance. Also, Max(End) could perhaps be stashed somewhere and updated by a trigger when necessary.
Clearly there's some hassle in updating the Start/End. This may not be too bad if either
The table contents are stable
or insertions are in some way naturally ordered, so that each new row just continues from the previous highest End.
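If you do need to rebuild Start/End in bulk, a sketch like this could work (it assumes MySQL 8.0+ for window functions, the table and column names from the example above, and that rows are ranked by Key):
UPDATE T
JOIN (
    SELECT `Key`,
           SUM(Odds) OVER (ORDER BY `Key`) - Odds AS new_start,
           SUM(Odds) OVER (ORDER BY `Key`) - 1    AS new_end
    FROM T
) c ON c.`Key` = T.`Key`
SET T.Start = c.new_start,
    T.End = c.new_end;
-- compute R once, then fetch the single matching row
SET @r := FLOOR(RAND() * (SELECT MAX(`End`) + 1 FROM T));
SELECT * FROM T WHERE @r BETWEEN T.Start AND T.End;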
What if you took your code, and added an ORDER BY RAND() and LIMIT 1?
SELECT * FROM table WHERE (100*RAND()) < odds ORDER BY RAND() LIMIT 1
This way, even if you have multiple rows with the same probability, they will always come back in a random order, and you just take the first entry.
select * from table
where id between 1 and 100 and ((id % 2) <> 0)
order by NewId()
Hmm. Not entirely clear what result you want, so bear with me if this is a bit crazy. That being said, how about:
Make a new table. The table is a fixed data table, and looks like this:
Odds
====
1
2
2
3
3
3
4
4
4
4
etc,
etc.
Then join from your dataset to that table on the odds column. You'll get as many rows back for each row in your table as the given odds of that row.
Then just pick one of that set at random.
If you have an index on the odds column, and a primary key, this would be very efficient:
SELECT id, odds FROM table WHERE odds > 0
The database wouldn't even have to read from the table, it would get everything it needed from the odds index.
Then, you'll select a random value between 1 and the number of rows returned.
Then select that row from the array of rows returned.
Then, finally, select the whole target row:
SELECT * FROM table WHERE id = ?
This assures an even distribution between all rows with an odds value.
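A sketch of the index that makes the first query index-only (`table` here is just the question's placeholder name; this assumes InnoDB, where every secondary index implicitly carries the primary key):
ALTER TABLE `table` ADD INDEX idx_odds (odds);
-- SELECT id, odds FROM `table` WHERE odds > 0 can then be answered from idx_odds alone
-- (EXPLAIN shows "Using index"), without touching the clustered row data.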
Alternatively, put the odds in a different table, with an autoincrement primary key.
Odds
ID odds
1 4
2 9
3 56
4 12
Store the ID foreign key in the main table instead of the odds value, and index it.
First, get the max value. This barely touches the data: MySQL can read the maximum straight off the end of the primary key index:
SELECT MAX(ID) FROM Odds
Get a random value between 1 and the max.
Then select the record.
SELECT * FROM table
JOIN Odds ON Odds.ID = table.ID
WHERE Odds.ID >= ?
ORDER BY Odds.ID
LIMIT 1
This will require some maintenance if you tend to delete Odds values or roll back inserts, in order to keep the distribution even.
There is a whole chapter on random selection in the book SQL Antipatterns.
I didn't try it, but maybe something like this (with ? being a random number from 0 to SUM(odds) - 1)?
SET @prob := 0;
SELECT
    T.*,
    (@prob := @prob + T.odds) AS prob
FROM table T
HAVING prob > ?  -- HAVING, not WHERE: the prob alias is not visible in the WHERE clause
LIMIT 1
This is basically the same as your idea a), but done entirely within one (well, technically two, if you count the variable set-up) SQL statement.
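The ? placeholder could be produced by something along these lines, run once before the SELECT so the threshold is fixed (`table` is again the question's placeholder name):
-- random integer in the range 0 .. SUM(odds) - 1
SELECT FLOOR(RAND() * SUM(odds)) INTO @threshold FROM `table`;
-- @threshold can then be used in place of ? in the query above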
A general solution, suitable for O(log(n)) updates, is something like this:
Store objects as leaves of a (balanced) tree.
At each branch node, store the weights of all objects under it.
When adding, removing, or modifying nodes, update weights of their parents.
Then pick a number between 0 and (total weight - 1) and navigate down the tree until you find the right object: at each branch, descend into the child whose cumulative weight range contains the number (subtracting the weights of the subtrees you skip over).
Since you don't care about the order of things in the tree, you can store them as an array of N pointers and N-1 numbers.