I have more than 400,000 IDs in a NOT IN statement. Will it execute or not?
$query = "
SELECT
*
FROM
table_name
WHERE
my_field_id NOT IN(
34535345,3453451234,234242345,3465465,12234234,23435465,122343,345435,3453454,
34535345,3453451234,234242345,3465465,12234234,23435465,122343,345435,3453454,
34535345,3453451234,234242345,3465465,12234234,23435465,122343,345435,3453454,
34535345,3453451234,234242345,3465465,12234234,23435465,122343,345435,3453454,
34535345,3453451234,234242345,3465465,12234234,23435465,122343,345435,3453454,
34535345,3453451234,234242345,3465465,12234234,23435465,122343,345435,3453454
)
";
Yes, it will (you could have tried that yourself instead of asking here, actually). There is no fixed limit on the length of an SQL query string.
But keep in mind that the more IDs you add, the slower the query will become.
PS: the only MySQL setting you may be interested in is max_allowed_packet. From what I remember, it is the only parameter that can cause issues with extra-large queries.
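For reference, you can inspect the current value and, with sufficient privileges, raise it; the 64 MB figure below is only an illustrative choice:
SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 67108864; -- 64 MB; requires SUPER / SYSTEM_VARIABLES_ADMIN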
As far as I know, the IN clause in MySQL has no limit on the number of values; you may write as many IDs as you want. I agree with zerkms about the performance concern and the max_allowed_packet variable.
As a workaround, try populating another table (possibly a temporary one) with an indexed id column holding your values, and then join the two tables using a JOIN clause, as sketched below.
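A minimal sketch of that workaround, using the table and column names from the question (the exclude_ids name is made up); the LEFT JOIN ... IS NULL anti-join does the same job as NOT IN:
CREATE TEMPORARY TABLE exclude_ids (
    id BIGINT UNSIGNED NOT NULL PRIMARY KEY
);
INSERT INTO exclude_ids (id) VALUES (34535345), (3453451234), (234242345); -- ... and so on, ideally in batches
SELECT t.*
FROM table_name t
LEFT JOIN exclude_ids e ON e.id = t.my_field_id
WHERE e.id IS NULL;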
AFAIK the limit is 1024 characters for NOT IN. If you use a sub-select there is no limit. You could populate a MEMORY table with the IDs, reference them with a sub-select in the query, and drop the MEMORY table afterwards. This may even be faster.
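For what it's worth, a sketch of that MEMORY-table variant (the tmp_ids name is illustrative):
CREATE TABLE tmp_ids (
    id BIGINT UNSIGNED NOT NULL PRIMARY KEY
) ENGINE=MEMORY;
INSERT INTO tmp_ids (id) VALUES (34535345), (3453451234); -- ... etc.
SELECT * FROM table_name
WHERE my_field_id NOT IN (SELECT id FROM tmp_ids);
DROP TABLE tmp_ids;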
Related
I would like to construct a query that displays all the results in a table, but is offset by 5 from the start of the table. As far as I can tell, MySQL's LIMIT requires a limit as well as an offset. Is there any way to do this?
From the MySQL Manual on LIMIT:
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95, 18446744073709551615;
As you mentioned, LIMIT is required, so you need to use the biggest limit possible, which is 18446744073709551615 (the maximum of an unsigned BIGINT):
SELECT * FROM somewhere LIMIT 18446744073709551615 OFFSET 5
As noted in other answers, MySQL suggests using 18446744073709551615 as the number of records in the limit, but consider this: What would you do if you got 18,446,744,073,709,551,615 records back? In fact, what would you do if you got 1,000,000,000 records?
Maybe you do want more than one billion records, but my point is that there is some limit on the number you want, and it is less than 18 quintillion. For the sake of stability, optimization, and possibly usability, I would suggest putting some meaningful limit on the query. This would also reduce confusion for anyone who has never seen that magical-looking number, and would have the added benefit of communicating at least how many records you are willing to handle at once.
If you really must get all 18 quintillion records from your database, maybe what you really want is to grab them in increments of 100 million and loop 184 billion times.
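A sketch of that chunked pattern, assuming a table named somewhere and a chunk size of 100 million; in practice you would loop in application code and stop once a chunk comes back short:
SELECT * FROM somewhere LIMIT 5, 100000000;         -- rows 6 .. 100,000,005
SELECT * FROM somewhere LIMIT 100000005, 100000000; -- next chunk, and so on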
Another approach would be to select an auto-incremented counter column and then filter on it using HAVING.
SET @a := 0;
SELECT @a := @a + 1 AS counter, table.* FROM table
HAVING counter > 4;
But I would probably stick with the high limit approach.
As others mentioned, this comes from the MySQL manual. To achieve it, you can use the maximum value of an unsigned BIGINT, which is that awful number (18446744073709551615). But to make it a little bit less messy you can use the tilde "~" bitwise operator:
LIMIT 95, ~0
It works as a bitwise negation: the result of ~0 is 18446744073709551615.
You can use a MySQL statement with LIMIT:
START TRANSACTION;
SET @my_offset = 5;
SET @rows = (SELECT COUNT(*) FROM my_table);
PREPARE statement FROM 'SELECT * FROM my_table LIMIT ? OFFSET ?';
EXECUTE statement USING @rows, @my_offset;
COMMIT;
Tested in MySQL 5.5.44. Thus, we can avoid the insertion of the number 18446744073709551615.
Note: the transaction makes sure that the variable @rows is in agreement with the state of the table at the moment the statement is executed.
I ran into a very similar issue when practicing LC #1321, in which I had to select all the dates but skip the first 6.
I achieved this in MySQL with the help of ROW_NUMBER() window function and subquery. For example, the following query returns all the results with the first five rows skipped:
SELECT
fieldname1,
fieldname2
FROM(
SELECT
*,
ROW_NUMBER() OVER() row_num
FROM
mytable
) tmp
WHERE
row_num > 5;
You may need to add some more logic in the subquery, especially inside OVER(), to fit your needs. In addition, the RANK()/DENSE_RANK() window functions may be used instead of ROW_NUMBER(), depending on your actual offset logic.
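For example, if the offset should be applied in id order, the window can be ordered explicitly (the id column is an assumption):
SELECT
    fieldname1,
    fieldname2
FROM(
    SELECT
        *,
        ROW_NUMBER() OVER(ORDER BY id) row_num
    FROM
        mytable
) tmp
WHERE
    row_num > 5;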
Reference:
MySQL 8.0 Reference Manual - ROW_NUMBER()
Just today I was reading about the best way to get huge amounts of data (more than a million rows) from a MySQL table. One way is, as suggested, using LIMIT x,y where x is the offset and y the number of rows to return. However, as I found out, it isn't the most efficient way to do so. If you have an auto-increment column, you can just as easily use a SELECT statement with a WHERE clause saying from which record you'd like to start.
For example,
SELECT * FROM table_name WHERE id > x;
It seems that MySQL still reads all the rows up to the offset when you use LIMIT and only then returns the records you asked for: not the best for performance.
Source: an answer to this question on the MySQL Forums. Just take note, the question is about six years old.
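Putting the two ideas together, a hedged sketch of that keyset-style pagination (the id column and the page size are assumptions):
SELECT * FROM table_name
WHERE id > 1000   -- last id seen on the previous page
ORDER BY id
LIMIT 100;        -- page size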
I know that this is old, but I didn't see a similar response, so this is the solution I would use.
First, I would execute a count query on the table to see how many records exist. This query is fast and normally the execution time is negligible. Something like:
SELECT COUNT(*) FROM table_name;
Then I would build my query using the result I got from count as my limit (since that is the maximum number of rows the table could possibly return). Something like:
SELECT * FROM table_name LIMIT count_result OFFSET desired_offset;
Or possibly something like:
SELECT * FROM table_name LIMIT desired_offset, count_result;
Of course, if necessary, you could subtract desired_offset from count_result to get an actual, accurate value to supply as the limit. Passing the "18446744073709551615" value just doesn't make sense if I can actually determine an appropriate limit to provide.
WHERE .... AND id > <YOUROFFSET>
id can be any auto-incremented or unique numerical column you have...
I am working on a project where a user can add comments and also hit any post.
Now I have to display the total number of comments and the total number of hits, and also show whether the user has already hit that post or not.
So basically I need to run three SQL queries for this action:
one counting comments,
one for counting hits and
one for checking whether the user has hit the post or not.
I wanted to know if it's possible to reduce these three SQL queries to one and lower the database load.
Any help is appreciated.
$checkifrated=mysql_query("select id from fk_views where (onid='$postid' and hit='hit' and email='$email')");//counting hits
$checkiffollowing=mysql_query("select id from fk_views where (onid='$postid' and hit='hit' and email='$email')");
$hitcheck=mysql_num_rows($checkifrated);//checking if already hited or not
$checkifrated=mysql_query("select id from fk_views where (onid='$postid' and comment !='' and email='$email')");//counting comments
This query returns the number of hits and number of nonempty comments.
select ifnull(sum(hit='hit'),0) as hits, ifnull(sum(comment !=''),0) as comments
from fk_views where onid='$postid' and email='$email'
Based on the queries you provided, I don't think you need a separate query to check whether the user has hit the post; just check in your code whether the number of hits is > 0.
Yes, it may be possible to combine the three queries into a single query. That may (or may not) "reduce the database load". The key here is going to be an efficient execution plan, which is going to primarily depend on the availability of suitable indexes.
Combining three inefficient queries into one isn't going to magically make the query more efficient. The key is getting each of the queries to be as efficient as they can be.
If each of the queries is processing rows from the same table, then it may be possible to have a single SELECT statement process the entire set to obtain the specified result. But if each of the queries references a different table, then the most efficient approach is likely to combine them with a UNION ALL set operator.
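Purely as an illustration of that UNION ALL shape (all table and column names here are hypothetical):
SELECT 'hits' AS metric, COUNT(*) AS cnt FROM post_hits     WHERE post_id = 42
UNION ALL
SELECT 'comments',       COUNT(*)        FROM post_comments WHERE post_id = 42;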
Absent the schema definition, the queries that you are currently using, and the EXPLAIN output of each query, it's not practical to attempt to provide you with usable advice.
UPDATE
Based on the update to the question, providing sample queries... we note that two of the queries appear to be identical.
It would be much more efficient to have a query return a COUNT() aggregate, than pulling back all of the individual rows to the client and counting them on the client, e.g.
SELECT COUNT(1) AS count_hits
FROM fk_views v
WHERE v.onid = '42'
AND v.hit = 'hit'
AND v.email = 'someone@email.address'
To combine processing of the three queries, we can use conditional expressions in the SELECT list. For example, we could use the equality predicates on the onid and email columns in the WHERE clause, and do the check of the hit column with an expression...
For example:
SELECT SUM(IF(v.hit='hit',1,0)) AS count_hits
, SUM(1) AS count_all
FROM fk_views v
WHERE v.onid = '42'
AND v.email='someone@email.address'
The "trick" to getting three separate queries combined would be to use a common set of equality predicates (the parts of the WHERE clause that match in all three queries).
SELECT SUM(IF(v.hit='hit' ,1,0)) AS count_hits
, SUM(IF(v.comment!='',1,0)) AS count_comments
, SUM(1) AS count_all
FROM fk_views v
WHERE v.onid = '42'
AND v.email ='someone@email.address'
If we are going to insist on using the deprecated mysql_* interface (over PDO or mysqli), it's important that we use the mysql_real_escape_string function to avoid SQL injection vulnerabilities:
$sql = "SELECT SUM(IF(v.hit='hit' ,1,0)) AS count_hits
, SUM(IF(v.comment!='',1,0)) AS count_comments
, SUM(1) AS count_all
FROM fk_views v
WHERE v.onid = '" . mysql_real_escape_string($postid) . "'
AND v.email = '" . mysql_real_escape_string($email) ;
# for debugging
#echo $sql
$result=mysql_query($sql);
if (!$result) die(mysql_error());
while ($row = mysql_fetch_assoc($result)) {
echo $row['count_hits'];
echo $row['count_comments'];
}
For performance, we'd likely want an index with leading columns of onid and email, e.g.
... ON fk_views (onid,email)
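Spelled out as a complete statement (the index name is just an illustration):
CREATE INDEX idx_fk_views_onid_email -- index name is illustrative
    ON fk_views (onid, email);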
The output from EXPLAIN will show the execution plan.
I'm using PDO, and I need to know how many rows are returned by a SELECT statement. My question is: is the following slower, the same, or faster than doing it in two queries? phpMyAdmin will tell me how long just the SELECT statement takes, but not just the COUNT statement, so I'm having trouble telling how long each query takes.
Query in question:
SELECT *, (SELECT COUNT(*) from table) AS count FROM table
Faster, same or slower than splitting it into two queries?
Thanks.
You can write this query as:
SELECT t.*, const.totalcount
FROM table t cross join
(select count(*) as totalcount from table) const;
This may or may not be faster than running two queries. Two queries involve "query running" overhead -- compiling the query, transmitting the data back and forth. This adds another column, so it increases the total amount of data in the result set.
Two queries are going to be faster. What you have is a dependent subquery; it's going to run for every record in the parent. If it's a MyISAM table, the subquery will be very fast, and you may not notice it with a small number of records.
Do an EXPLAIN on it and see what MySQL reports back.
I have to select 4 rows randomly from a table.
Is it better to randomly generate 4 IDs and perform 4 requests of the form 'select column from table where id = ...'?
Or to select all the rows in one request and choose from them afterwards?
If you are able to generate random existing IDs, I think the best approach is to use a clause like where id in (id1, id2, id3, id4). This results in getting 4 records in one query, so no unnecessary queries or records are fetched.
As said before, where id in (id1, id2, id3, id4) is the fastest way from the MySQL perspective. However, you will need some logic in the application to generate those IDs: all 4 IDs must exist, they should be randomly distributed, and you want to avoid duplicates. In the worst case you would be retrieving a list of all existing IDs with a huge query, extracting 4 random values, and querying again.
With all that logic to be done, it can be wise to move selection into MySQL:
SELECT * FROM foobar
ORDER BY RAND()
LIMIT 4;
You must understand that this is slow in MySQL, but you gain speed in the application logic and can be sure to get random values spread evenly over your table.
EDIT:
The comment asks whether PHP is faster at this task than MySQL. The answer is no.
It is not done by just "using rand". You need an array containing all those IDs in PHP. That means a huge query, lots of TCP traffic, a huge array to be built in PHP, and a huge btree to be built by the Zend engine. Then, with the IDs, you must fire a second query to get the rows for those IDs.
Although the RAND() function may be slow, so far I have not had significant problems with speed. My strategy is to join the table back to a subquery of itself that returns a list of random IDs with a limit.
SELECT *
FROM table AS t1
JOIN (
SELECT rowID
FROM table
ORDER BY RAND()
LIMIT 4
) AS t2
WHERE t1.rowID = t2.rowID
There is also a more robust solution that exists - try checking out this question (asked in 2010).
suppose I have a table t and table t has 15000 entries
suppose the query
SELECT * FROM t WHERE t.nid <1000
returns 1000 rows
but then I only want the first 10 rows so I do a LIMIT
SELECT * FROM t WHERE t.nid <1000 LIMIT 10
is it possible to construct a single query that, in addition to returning the 10 rows via the LIMIT clause above, also returns the total count of rows satisfying the conditions in the WHERE clause? That is, in addition to the 10 rows above, it would also return 1000, since there are a total of 1000 rows satisfying the WHERE clause, with both results delivered by a single query.
Preferred solution
First of all, the found_rows() function is not portable (it is a MySQL extension) and is going to be removed. As user @Zveddochka pointed out, it has already been deprecated in MySQL 8.0.17.
But more importantly, it turns out that if you use proper indexing, then running two queries is actually faster. The SQL_CALC_FOUND_ROWS directive is achieved through a "virtual scan" that incurs an additional recovery cost. When the query is not indexed, this cost would be the same as that of a COUNT(), and therefore running two queries would cost double - i.e., using SQL_CALC_FOUND_ROWS would make things run 50% faster.
But what happens when the query is properly indexed? The guys at Percona checked it out. It turns out that not only is the COUNT() blazing fast, since it only accesses metadata and indexes, but the query without SQL_CALC_FOUND_ROWS is also faster because it doesn't incur any additional cost; the cost of the two queries combined is less than the cost of the enhanced single query:
Results with SQL_CALC_FOUND_ROWS are the following: for each b value it takes 20-100 sec to execute uncached and 2-5 sec after warmup. Such a difference can be explained by the I/O required for this query - mysql accesses all 10k rows this query could produce without the LIMIT clause.
The results are the following: it takes 0.01-0.11 sec to run this query the first time and 0.00-0.02 sec for all consecutive runs.
So, as we can see, the total time for SELECT+COUNT (0.00-0.15 sec) is much less than the execution time for the original query (2-100 sec). Let's take a look at the EXPLAINs...
So, what to do?
// Run two queries, ensuring they satisfy exactly the same conditions.
// $joins and $limit are assumed to be defined elsewhere in your code.
$field1 = "Field1, Field2, blah blah blah"; // the columns you actually need
$field2 = "COUNT(*) AS `rows`"; // `rows` is reserved in MySQL 8.0+, hence the backticks
$where = "Field5 = 'X' AND Field6 = 'Y' AND blah blah";
$cntQuery = "SELECT {$field2} FROM {$joins} WHERE {$where}";
$rowQuery = "SELECT {$field1} FROM {$joins} WHERE {$where} LIMIT {$limit}";
Now the first query returns the count, the second query returns the actual data.
Old answer (useful just for non-indexed tables)
Don't do this. If you find out this section of the answer works for you better than the section above, it's almost certainly a signal that something else is not optimal in your setup - most likely you're not using the indexes properly, or you need to update your MySQL server, or run an analyze/optimize of the database to update cardinality statistics.
You can, but I think it would be a performance killer.
Your best option would be to use the SQL_CALC_FOUND_ROWS MySQL extension and issue a second query to recover the full number of rows using FOUND_ROWS().
SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10;
SELECT FOUND_ROWS();
See e.g. http://www.arraystudio.com/as-workshop/mysql-get-total-number-of-rows-when-using-limit.html
Or you could simply run the full query without the LIMIT clause and retrieve only the first ten rows. Then you can use one query as you wanted, and also get the row count through mysql_num_rows(). This is not ideal, but also not so catastrophic for most queries.
If you take this last approach, though, be very careful to close the query and free its resources: I have found that retrieving less than the full result set and forgetting to free the result handle is one outstanding cause of "metadata locking".
You can try SQL_CALC_FOUND_ROWS, which can get a count of total records without running the statement again.
SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10; -- get records
SELECT FOUND_ROWS(); -- get count
Reference: http://dev.mysql.com/doc/refman/5.0/en/information-functions.html
"is it possible to construct a single query in which in addition to returning the 10 rows information with the LIMIT clause above, it also returns the total count of the rows that satisfy the conditions set in the WHERE clause"
Yes, it is possible to do both in a single query, by using a window function, i.e. COUNT(*) OVER() (MySQL 8.0+):
SELECT t.*, COUNT(*) OVER() AS cnt
FROM t
WHERE t.nid <1000
LIMIT 10;
db<>fiddle demo
Sidenote:
LIMIT without explicit ORDER BY is non-deterministic. It could return different results between multiple runs.
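For a deterministic page, add an explicit ORDER BY; here is a variant of the query above, assuming nid is a sensible ordering key:
SELECT t.*, COUNT(*) OVER() AS cnt
FROM t
WHERE t.nid < 1000
ORDER BY t.nid
LIMIT 10;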
There are many things that need discussing.
A LIMIT without an ORDER BY is somewhat unpredictable, hence somewhat meaningless.
But if you add an ORDER BY, it may need to find all the rows, sort them then deliver only the 10 you want.
Or, the ORDER BY may be handled adequately by an INDEX.
Your particular query, if turned into 2 queries (as needed after 8.0.17), would be
SELECT * FROM t WHERE t.nid < 1000 LIMIT 10;
SELECT COUNT(*) FROM t WHERE t.nid < 1000;
Note that each of those would benefit from INDEX(nid). The first would pick 10 items from the index's BTree, then look them up in the data's BTree -- only 10 rows touched in each. The second would scan the INDEX until it hits 1000, and not touch the data BTree.
If you add an ORDER BY as advised, then, the first query:
SELECT * FROM t WHERE t.nid < 1000 ORDER BY t.nid LIMIT 10;
will work identically as above. But
SELECT * FROM t WHERE t.nid < 1000 ORDER BY t.abcd LIMIT 10;
will need to scan lots of rows, and be quite slow. And probably use a temp table and filesort. (Check EXPLAIN for details.) INDEX(nid, abcd) would help, but only a little.
And there are other variants, such as when the index can be "covering".
What is the goal of having "one query"?
Speed? -- as discussed above, there are other factors that are more pertinent.
Consistency? -- You may need a transaction to avoid, for example, getting N rows from the first query and a smaller number from the COUNT.
BEGIN;
SELECT * ...
SELECT COUNT(*) ...
COMMIT;
Single command? -- Consider a stored procedure that combines the 2 statements. Or
(SELECT * FROM t WHERE t.nid < 1000 LIMIT 10)
UNION ALL
(SELECT COUNT(*) FROM t WHERE t.nid < 1000);
but that gets tricky because the number of columns is different, so some kludge would be needed to make the second query have the same number of columns. Another variant involves GROUP BY WITH ROLLUP. (But it may be even harder to fabricate.)
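One such kludge, assuming for the sake of the sketch that t has only the two columns nid and abcd: pad the count row with NULLs so both sides of the UNION have the same column count.
(SELECT nid, abcd FROM t WHERE t.nid < 1000 LIMIT 10)
UNION ALL
(SELECT COUNT(*), NULL FROM t WHERE t.nid < 1000);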
Lukasz's Answer looks promising. However, it gives an extra column (which might be good) and its performance needs to be tested. If you are on 8.0 and their answer works well for you, accept that Answer.
COUNT(*) is cheap in MySQL (on MyISAM tables without a WHERE clause it is even O(1)), so you can use a subquery:
SELECT *, (SELECT COUNT(*) FROM t WHERE t.nid <1000) AS cnt
FROM t
WHERE t.nid <1000
LIMIT 10
Sounds like you want FOUND_ROWS()
SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10;
SELECT FOUND_ROWS();