select count(*) from mytable;
select count(table_id) from mytable; -- table_id is the primary key
Both queries were running slow on a table with 10 million rows.
I am wondering why, since wouldn't it be easy for MySQL to keep a counter that gets updated on every insert, update, and delete?
And is there a way to improve this query? I used EXPLAIN but it didn't help much.
Take a look at the following blog posts:
1) COUNT(*) vs COUNT(col)
2) Easy MySQL Performance Tips
3) Fast count(*) for InnoDB
btw, which engine do you use?
EDITED: about a technique to speed up COUNT when you only need to know whether a certain number of rows exist. Sorry, my earlier query was just wrong. So, when you only need to know whether there are, e.g., at least 300 rows matching a specific condition, you can try a subquery:
select count(*) FROM
( select 1 FROM _table_ WHERE _conditions_ LIMIT 300 ) AS result
First you shrink the result set, and then count it. The subquery will still scan rows, but you can cap how many (once more, this works when the question to the DB is "are there more or fewer than 300 rows?"), and if the table contains more than 300 rows satisfying the condition, this query is faster.
Testing results (my table has 6.7 million rows):
1) SELECT count(*) FROM _table_ WHERE START_DATE > '2011-02-01'
returns 4.2 million in 65.4 seconds
2) SELECT count(*) FROM ( select 1 FROM _table_ WHERE START_DATE > '2011-02-01' LIMIT 100 ) AS result
returns 100 in 0.03 seconds
Below is the EXPLAIN for the second query, to see what is going on there:
EXPLAIN SELECT count(*) FROM ( select 1 FROM _table_ WHERE START_DATE > '2011-02-01' LIMIT 100 ) AS result
As cherouvim pointed out in the comments, it depends on the storage engine.
MyISAM does keep a count of the table rows, and it can keep it accurate because the only locking MyISAM supports is a table-level lock.
InnoDB, however, supports transactions, and needs to do a table scan to count the rows.
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
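If an exact number isn't required, a common workaround for InnoDB is to read the optimizer's row estimate from the table statistics instead of scanning. A minimal sketch (assuming the table lives in a schema called mydb; note that TABLE_ROWS is only an approximation for InnoDB):

-- Approximate row count from table statistics (no full scan).
-- For InnoDB this is an estimate; for MyISAM it is exact.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'     -- assumed schema name
  AND TABLE_NAME   = 'mytable';

-- SHOW TABLE STATUS exposes the same estimate in its Rows column:
SHOW TABLE STATUS LIKE 'mytable';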
I would like to construct a query that displays all the results in a table, but is offset by 5 from the start of the table. As far as I can tell, MySQL's LIMIT requires a limit as well as an offset. Is there any way to do this?
From the MySQL Manual on LIMIT:
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95, 18446744073709551615;
As you mentioned, LIMIT is required, so you need to use the biggest limit possible, which is 18446744073709551615 (the maximum of an unsigned BIGINT):
SELECT * FROM somewhere LIMIT 18446744073709551615 OFFSET 5
As noted in other answers, MySQL suggests using 18446744073709551615 as the number of records in the limit, but consider this: What would you do if you got 18,446,744,073,709,551,615 records back? In fact, what would you do if you got 1,000,000,000 records?
Maybe you do want more than one billion records, but my point is that there is some limit on the number you want, and it is less than 18 quintillion. For the sake of stability, optimization, and possibly usability, I would suggest putting some meaningful limit on the query. This would also reduce confusion for anyone who has never seen that magical-looking number, and it has the added benefit of communicating at least how many records you are willing to handle at once.
If you really must get all 18 quintillion records from your database, maybe what you really want is to grab them in increments of 100 million and loop 184 billion times.
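A rough sketch of that chunked approach (an assumption on my part: the table has an auto-increment id, so we can walk it keyset-style instead of using huge offsets):

SET @last_id := 0;

-- Fetch one chunk; repeat, setting @last_id to the largest id returned,
-- until a chunk comes back with fewer than 100000000 rows.
SELECT *
FROM somewhere
WHERE id > @last_id
ORDER BY id
LIMIT 100000000;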
Another approach would be to generate an incrementing counter column and then filter it using HAVING.
SET @a := 0;
SELECT @a := @a + 1 AS counter, `table`.* FROM `table`
HAVING counter > 4;
But I would probably stick with the high limit approach.
As others mentioned, from the MySQL manual: in order to achieve that, you can use the maximum value of an unsigned BIGINT, which is that awful number (18446744073709551615). But to make it a little bit less messy you can use the tilde "~" bitwise operator:
LIMIT 95, ~0
It works as a bitwise negation: the result of ~0 is 18446744073709551615.
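You can check the value yourself:

SELECT ~0;   -- 18446744073709551615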
You can use a MySQL statement with LIMIT:
START TRANSACTION;
SET @my_offset = 5;
SET @rows = (SELECT COUNT(*) FROM my_table);
PREPARE statement FROM 'SELECT * FROM my_table LIMIT ? OFFSET ?';
EXECUTE statement USING @rows, @my_offset;
COMMIT;
Tested in MySQL 5.5.44. Thus, we can avoid the insertion of the number 18446744073709551615.
Note: the transaction makes sure that the variable @rows is in agreement with the table considered in the execution of the statement.
I ran into a very similar issue when practicing LC#1321, in which I have to select all the dates, with the first 6 dates skipped.
I achieved this in MySQL with the help of the ROW_NUMBER() window function and a subquery. For example, the following query returns all the results with the first five rows skipped:
SELECT
    fieldname1,
    fieldname2
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER () AS row_num
    FROM
        mytable
) tmp
WHERE
    row_num > 5;
You may need to add some more logic in the subquery, especially in OVER(), to fit your needs. In addition, the RANK()/DENSE_RANK() window functions may be used instead of ROW_NUMBER(), depending on your actual offset logic.
Reference:
MySQL 8.0 Reference Manual - ROW_NUMBER()
Just today I was reading about the best way to get huge amounts of data (more than a million rows) from a MySQL table. One way, as suggested, is to use LIMIT x,y, where x is the offset and y is the number of rows you want returned. However, as I found out, it isn't the most efficient way to do so. If you have an auto-increment column, you can just as easily use a SELECT statement with a WHERE clause saying from which record you'd like to start.
For example,
SELECT * FROM table_name WHERE id > x;
It seems that MySQL walks through all the rows up to the offset when you use LIMIT and only then returns the records that fall after it: not the best for performance.
Source: an answer to this question on the MySQL Forums. Just take note that the question is about 6 years old.
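For paging, this keyset approach is usually combined with ORDER BY and LIMIT. A sketch, assuming an auto-increment id column and a page size of 20:

-- First page:
SELECT * FROM table_name ORDER BY id LIMIT 20;

-- Next page: filter on the largest id seen on the previous page (say 1020):
SELECT * FROM table_name WHERE id > 1020 ORDER BY id LIMIT 20;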
I know that this is old, but I didn't see a similar response, so this is the solution I would use.
First, I would execute a count query on the table to see how many records exist. This query is fast and normally the execution time is negligible. Something like:
SELECT COUNT(*) FROM table_name;
Then I would build my query using the result I got from count as my limit (since that is the maximum number of rows the table could possibly return). Something like:
SELECT * FROM table_name LIMIT count_result OFFSET desired_offset;
Or possibly something like:
SELECT * FROM table_name LIMIT desired_offset, count_result;
Of course, if necessary, you could subtract desired_offset from count_result to get an actual, accurate value to supply as the limit. Passing the "18446744073709551615" value just doesn't make sense if I can actually determine an appropriate limit to provide.
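For example, with hypothetical numbers (the count comes back as 1000 and we want to skip the first 5 rows):

SELECT COUNT(*) FROM table_name;              -- suppose this returns 1000
SELECT * FROM table_name LIMIT 995 OFFSET 5;  -- 1000 - 5 = 995 rows at most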
WHERE .... AND id > <YOUROFFSET>
id can be any autoincremented or unique numerical column you have...
I have this table (PERSONS) with 25M rows:
ID int(10) PK
points int(6) INDEX
some other columns
I want to show the user 4 random rows which are somewhat close to each other in points. I found this query after some searching and tuning; it generates random rows and is impressively fast:
SELECT person_id, points
FROM persons AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(person_id)
FROM persons)) AS id)
AS r2
WHERE r1.person_id>= r2.id and points > 0
ORDER BY r1.person_id ASC
LIMIT 4
So I run this query from PHP, which gives me great and fast results (below 0.05 seconds when warmed up). But these rows are really just random (with at least 1 point, because of the points > 0). I would like to show some rows that are a little bit close in points. It doesn't have to be every time, but let's say I run this query with LIMIT 50 and then select a random row in PHP plus the 3 rows closest to it (based on points). I would think you would need to sort the result, pick a random row, and show the rows before/after it. But I have no idea how to do this, since I am quite new to PHP.
Any suggestions? All feedback is welcome :)
Build an index on your points column (if it does not already exist), then perform your randomisation logic on that:
ALTER TABLE persons ADD INDEX (points);
SELECT person_id, points
FROM persons JOIN (
SELECT RAND() * MAX(points) AS pivot
FROM persons
WHERE points > 0
) t ON t.pivot <= points
ORDER BY points
LIMIT 4
Note that this approach will select the pivot using a uniform probability distribution over the range of points values; if points are very non-uniform, you can end up pivoting on some values a lot more often than others (thereby resulting in seemingly "non-random" outcomes).
To resolve that, you can select a random record by a more uniformly distributed column (maybe person_id?) and then use the points value of that random record as the pivot; that is, substitute the following for the subquery in the above statement:
SELECT points AS pivot
FROM persons JOIN (
SELECT FLOOR(
MIN(person_id)
+ RAND() * (MAX(person_id)-MIN(person_id))
) AS random
FROM persons
WHERE points > 0
) r ON r.random <= person_id
WHERE points > 0
ORDER BY person_id
LIMIT 1
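Putting the two pieces together, the full statement would look roughly like this (same table and column names as above, nothing new added):

SELECT person_id, points
FROM persons JOIN (
    SELECT points AS pivot
    FROM persons JOIN (
        SELECT FLOOR(
            MIN(person_id)
            + RAND() * (MAX(person_id) - MIN(person_id))
        ) AS random
        FROM persons
        WHERE points > 0
    ) r ON r.random <= person_id
    WHERE points > 0
    ORDER BY person_id
    LIMIT 1
) t ON t.pivot <= points
ORDER BY points
LIMIT 4;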
Removing the subquery will drastically improve performance and caching. You could, for example, get the list of IDs, put it in a file, and then pick randomly from it (for example by reading random lines from the file). This improves things by a whole lot, as you can see if you run EXPLAIN on this query and compare it with a query that simply loads the data for the 4 (still random) IDs.
I would suggest doing two separate SQL queries in PHP and not joining/subquerying them. In many cases the optimizer cannot simplify your query and has to perform each part separately. So, in your case, if you have 1000 persons the optimizer will do the following queries in the worst case:
Get 1000 persons rows
Do a sub-select for each person, which gets 1000 persons rows
Join the 1000 persons with the joined rows, resulting in 1,000,000 rows
Filter all of them
In short:
1001 queries with 1,000,000 rows
My advice?
Perform two queries with NO joins or sub-selects, as both (especially in combination) cause dramatic performance drops in most cases.
SELECT person_id, points
FROM persons
ORDER BY RAND() LIMIT 1
Now use the found points for your second query
SELECT person_id, points, ABS(points - <POINTS FROM ABOVE>) AS distance
FROM persons
ORDER BY distance ASC LIMIT 4
Suppose I have a table t, and table t has 15000 entries.
Suppose the query
SELECT * FROM t WHERE t.nid <1000
returns 1000 rows,
but then I only want the first 10 rows, so I do a LIMIT:
SELECT * FROM t WHERE t.nid <1000 LIMIT 10
Is it possible to construct a single query which, in addition to returning the 10 rows of information with the LIMIT clause above, also returns the total count of the rows that satisfy the conditions set in the WHERE clause? That is, in addition to the 10 rows above, it would also return 1000, since there are a total of 1000 rows satisfying the WHERE clause, and both would come back from a single query.
Preferred solution
First of all, the found_rows() function is not portable (it is a MySQL extension) and is going to be removed. As user @Zveddochka pointed out, it has already been deprecated in MySQL 8.0.17.
But more importantly, it turns out that if you use proper indexing, then running two queries is actually faster. The SQL_CALC_FOUND_ROWS directive is achieved through a "virtual scan" that incurs an additional recovery cost. When the query is not indexed, this cost is the same as that of a COUNT(), so running two queries would cost double - i.e., using SQL_CALC_FOUND_ROWS would make things run 50% faster.
But what happens when the query is properly indexed? The guys at Percona checked it out. It turns out that not only is the COUNT() blazing fast, since it only accesses metadata and indexes, but the query without SQL_CALC_FOUND_ROWS is also faster because it doesn't incur any additional cost; the cost of the two queries combined is less than the cost of the enhanced single query:
Results with SQL_CALC_FOUND_ROWS are following: for each b value it takes 20-100 sec to execute uncached and 2-5 sec after warmup. Such difference could be explained by the I/O which required for this query – mysql accesses all 10k rows this query could produce without LIMIT clause.
The results are following: it takes 0.01-0.11 sec to run this query first time and 0.00-0.02 sec for all consecutive runs.
So, as we can see, total time for SELECT+COUNT (0.00-0.15 sec) is much less than execution time for original query (2-100 sec). Let's take a look at EXPLAINs...
So, what to do?
// Run two queries ensuring they satisfy exactly the same conditions
$field1 = "Field1, Field2, blah blah blah";
$field2 = "COUNT(*) AS rows";
$joins  = "TableName JOIN OtherTable ON blah blah"; // same FROM/JOIN part for both queries
$where  = "Field5 = 'X' AND Field6 = 'Y' AND blah blah";
$limit  = 20;                                       // page size

$cntQuery = "SELECT {$field2} FROM {$joins} WHERE {$where}";
$rowQuery = "SELECT {$field1} FROM {$joins} WHERE {$where} LIMIT {$limit}";
Now the first query returns the count, the second query returns the actual data.
Old answer (useful just for non-indexed tables)
Don't do this. If you find out this section of the answer works for you better than the section above, it's almost certainly a signal that something else is not optimal in your setup - most likely you're not using the indexes properly, or you need to update your MySQL server, or run an analyze/optimize of the database to update cardinality statistics.
You can, but I think it would be a performance killer.
Your best option would be to use the SQL_CALC_FOUND_ROWS MySQL extension and issue a second query to recover the full number of rows using FOUND_ROWS().
SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10;
SELECT FOUND_ROWS();
See e.g. http://www.arraystudio.com/as-workshop/mysql-get-total-number-of-rows-when-using-limit.html
Or you could simply run the full query without the LIMIT clause and retrieve only the first ten rows. Then you can use one query as you wanted, and also get the row count through mysql_num_rows(). This is not ideal, but also not so catastrophic for most queries.
If you do this last approach, though, be very careful to close the query and free its resources: I have found that retrieving less than the full result set and forgetting to free the result handle is one outstanding cause of "metadata locking".
You can try SQL_CALC_FOUND_ROWS, which can get a count of total records without running the statement again.
SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10; -- get records
SELECT FOUND_ROWS(); -- get count
Reference: http://dev.mysql.com/doc/refman/5.0/en/information-functions.html
"is it possible to construct a single query in which in addition to returning the 10 rows information with the LIMIT clause above, it also returns the total count of the rows that satisfy the conditions set in the WHERE clause"
Yes, it is possible to do both in a single query, by using a window function, i.e. COUNT(*) OVER() (MySQL 8.0+):
SELECT t.*, COUNT(*) OVER() AS cnt
FROM t
WHERE t.nid <1000
LIMIT 10;
db<>fiddle demo
Sidenote:
LIMIT without explicit ORDER BY is non-deterministic. It could return different results between multiple runs.
There are many things that need discussing.
A LIMIT without an ORDER BY is somewhat unpredictable, hence somewhat meaningless.
But if you add an ORDER BY, it may need to find all the rows, sort them then deliver only the 10 you want.
Or, the ORDER BY may be handled adequately by an INDEX.
Your particular query, if turned into 2 queries (as needed after 8.0.17), would be
SELECT * FROM t WHERE t.nid < 1000 LIMIT 10;
SELECT COUNT(*) FROM t WHERE t.nid < 1000;
Note that each of those would benefit from INDEX(nid). The first would pick 10 items from the index's BTree, then look them up in the data's BTree -- only 10 rows touched in each. The second would scan the INDEX until it hits 1000, and not touch the data BTree.
If you add an ORDER BY as advised, then the first query:
SELECT * FROM t WHERE t.nid < 1000 ORDER BY t.nid LIMIT 10;
will work identically as above. But
SELECT * FROM t WHERE t.nid < 1000 ORDER BY t.abcd LIMIT 10;
will need to scan lots of rows, and be quite slow. And probably use a temp table and filesort. (Check EXPLAIN for details.) INDEX(nid, abcd) would help, but only a little.
And there are other variants, such as when the index can be "covering".
What is the goal of having "one query"?
Speed? -- as discussed above, there are other factors that are more pertinent.
Consistency? -- You may need a transaction to avoid, for example, getting N rows from the first query and a smaller number from the COUNT.
BEGIN;
SELECT * ...
SELECT COUNT(*) ...
COMMIT;
Single command? -- Consider a stored procedure that combines the 2 statements. Or
SELECT * FROM t WHERE t.nid < 1000 LIMIT 10
UNION ALL
SELECT COUNT(*) FROM t WHERE t.nid < 1000;
but that gets tricky because the number of columns is different, so some kludge would be needed to make the second query have the same number of columns. Another variant involves GROUP BY WITH ROLLUP. (But it may be even harder to fabricate.)
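One such kludge is to pad the count row with NULLs so both sides of the UNION have the same number of columns. A sketch, with some_col standing in for whatever columns you actually need (hypothetical name):

(SELECT nid, some_col, NULL AS total_rows
 FROM t WHERE t.nid < 1000 LIMIT 10)
UNION ALL
(SELECT NULL, NULL, COUNT(*)
 FROM t WHERE t.nid < 1000);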
Lukasz's Answer looks promising. However, it gives an extra column (which might be good) and its performance needs to be tested. If you are on 8.0 and their answer works well for you, accept that Answer.
COUNT(*) on a condition covered by an index is cheap, so you can use a subquery:
SELECT *, (SELECT COUNT(*) FROM t WHERE t.nid <1000) AS cnt
FROM t
WHERE t.nid <1000
LIMIT 10
Sounds like you want FOUND_ROWS()
SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10;
SELECT FOUND_ROWS();
I would like to be able to pull back 15 or so records from a database. I've seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I've seen are geared towards selecting a single random record. I would like to get multiples.
Does anyone know of an efficient way to do this for large databases?
edit:
Further Edit and Testing:
I made a fairly simple table in a new database using MyISAM. I gave it 3 fields: autokey (an unsigned auto-increment key), bigdata (a large BLOB) and somemore (a MEDIUMINT). I then filled the table with random data and ran a series of queries using Navicat. Here are the results:
Query 1: select * from test order by rand() limit 15
Query 2:
select *
from test
join
    (select round(rand()*(select max(autokey) from test)) as val from test limit 15) as rnd
on
    rnd.val = test.autokey;
(I tried both select and select distinct and it made no discernible difference)
and:
Query 3 (I only ran this on the second test):
SELECT *
FROM (
    SELECT @cnt := COUNT(*) + 1,
           @lim := 10
    FROM test
) vars
STRAIGHT_JOIN
(
    SELECT r.*,
           @lim := @lim - 1
    FROM test r
    WHERE (@cnt := @cnt - 1)
      AND RAND(20090301) < @lim / @cnt
) i
ROWS         QUERY 1:   QUERY 2:   QUERY 3:
2,060,922    2.977s     0.002s     N/A
3,043,406    5.334s     0.001s     1.260s
I would like to do more rows so I can see how query 3 scales, but at the moment, it seems as though the clear winner is query 2.
Before I wrap up this testing and declare an answer, and while I have all this data and the test environment set up, can anyone recommend any further testing?
Try:
select * from table order by rand() limit 15
Another (and possibly more efficient) way would be to join against a set of random values. This should work if there's some contiguous integer key in the table. Here is how I would do it in Postgres (my MySQL is a bit rusty):
select * from table join
    (select (random()*maxid)::integer as val from generate_series(1,15)) as rnd
on rnd.val = table.id;
where maxid is the highest id in the table. If id has an index, then this would mean only 15 index lookups, so it's very fast.
UPDATE:
Looks like there is no such thing as generate_series in MySQL. My fault. We don't actually need it:
select *
from
table
join
-- this just returns 15 random numbers.
-- I need `table` here only to produce rows for rand()
(select round(rand()*(select max(id) from table)) as val from table limit 15) as rnd
on
rnd.val=table.id;
P.S. If I don't want duplicates returned, I can use (select distinct [...]) in the random generator expression.
Update: Check out the accepted answer in this question. It's pure mySQL and even deals with even distribution.
The problem with id = rand() or anything comparable in PHP is that you can't be sure whether that particular ID still exists. Therefore, you need to work with LIMIT, and that can become slow for large amounts of data.
As an alternative to that, you could try using a loop in PHP.
What the loop does is
Create a random integer using rand(), in the range between 0 and the number of records in the database
Query the database whether a record with that ID exists
If it exists, add the number to an array
If it doesn't, go back to step 1
End the loop when the array of random numbers contains the desired number of elements
This method could cause a lot of queries on a fragmented table, but they should be pretty fast to execute. It may be faster than the ORDER BY rand() LIMIT approach in certain situations.
The LIMIT method, as outlined by @Luther, is certainly the simplest code-wise.
You could do a query that returns all the results (or however many you limit it to), then use mysqli_fetch_all followed by:
shuffle($a);
$a = array_slice($a, 0, 15);
For a large dataset doing
select * from table order by rand() limit 15
can be quite time and memory consuming.
If your data records happen to be numbered, you can put an index on the numbering column and do a
select * from table where no >= rand() * (select max(no) from table) limit 15
Or, even better, do the random number generation in your application and do
select * from table where no >= $rand and no <= $rand+15
If your data doesn't change too often, it might be worth adding such a numbering column to make the selection efficient.
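If you go down that route, adding and filling the numbering column might look roughly like this (a sketch; `no` is the hypothetical numbering column used above and `id` is assumed to be the existing primary key):

ALTER TABLE `table`
    ADD COLUMN `no` INT UNSIGNED NOT NULL DEFAULT 0,
    ADD INDEX (`no`);

-- Number the rows once, in primary-key order:
SET @n := 0;
UPDATE `table` SET `no` = (@n := @n + 1) ORDER BY id;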
Assuming MySQL supports nested queries and that operations on the primary key are fast, I'd try something like
select * from table where id in (select id from table order by rand() limit 15)
OK, here is the situation (using PHP/MySQL): you are getting results from a large MySQL table.
Let's say your MySQL query returns 10,000 matching results and you have a paging script to show 20 results per page; your query might look like this.
So page 1 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse=something else
LIMIT 0,20
So page 2 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse=something else
LIMIT 20,40
Now you are grabbing 20 results at a time, but there are a total of 10,000 rows that match your search criteria.
How can you return only 3,000 of the 10,000 results and still do your paging of 20 per page with a LIMIT 20 in your query?
I thought this was impossible, but Myspace does it on their browse page somehow. I know they aren't using PHP/MySQL, but how can it be achieved?
UPDATE
I see some people have replied with a couple of methods; it seems none of these would actually improve the performance by limiting the number to 3,000?
Program your PHP so that when it finds itself ready to issue a query that ends with LIMIT 3000, 20 or higher, it just stops and doesn't issue the query.
Or am I missing something?
Update:
MySQL treats the LIMIT clause nicely.
Unless you have SQL_CALC_FOUND_ROWS in your query, MySQL just stops parsing results, sorting, etc. as soon as it finds enough records to satisfy your query.
When you have something like this:
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 0, 20
MySQL will fetch the first 20 records that satisfy the criteria and stop.
It doesn't matter how many records match the criteria: 50 or 1,000,000, the performance will be the same.
If you add an ORDER BY to your query and don't have an index, then MySQL will of course need to browse all the records to find out the first 20.
However, even in this case it will not sort all 10,000: it will keep a "running window" of the top 20 records and sort only within this window whenever it finds a record with a value large (or small) enough to get into the window.
This is much faster than sorting the whole myriad.
MySQL, however, is not good at pipelining recordsets. This means that this query:
SELECT column
FROM (
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 3000
) q
LIMIT 0, 20
is worse performance-wise than the first one.
MySQL will fetch 3,000 records, cache them in a temporary table (or in memory) and apply the outer LIMIT only after that.
Firstly, the LIMIT parameters are the offset and the number of records, so the second parameter should always be 20 - you don't need to increment this.
Surely if you know the upper limit of rows you want to retrieve, you can just put this into your logic which runs the query, i.e. check that Offset + Limit <= 3000
As Sohnee said, or (depending on your requirements) you can get all 3,000 records via SQL and then use array_slice in PHP to get chunks of the array.
You could achieve this with a subquery...
SELECT name FROM (
SELECT name FROM tblname LIMIT 0, 3000
) `Results` LIMIT 20, 40
Or with a temporary table, whereby you select all 3000 rows into a temp table then page by the temporary row id, which will be sequential.
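A rough sketch of the temporary-table variant (the sequential row id comes from an AUTO_INCREMENT column defined on the temp table; table and column names are taken from the example above):

CREATE TEMPORARY TABLE page_results (
    row_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
)
SELECT name
FROM tblname
LIMIT 3000;

-- Page 2 (rows 21-40):
SELECT name FROM page_results WHERE row_id BETWEEN 21 AND 40;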
You can specify the limit as a function of the page number p in your PHP code (e.g. LIMIT 20*p, 20) and cap the page number at 150.
Or you could get the 3,000 records and then, using jQuery tabs, split the records into 20 per page.