Check if row exists, The most efficient way? - php

SQL Queries /P1/
SELECT EXISTS(SELECT /p2/ FROM table WHERE id = 1)
SELECT /p2/ FROM table WHERE id = 1 LIMIT 1
SQL SELECT /P2/
COUNT(id)
id
PHP PDO Function /P3/
fetchColumn()
rowCount()
From the following 3 Parts, What is the best method to check if a row exists or not with and without the ability to retrieve data like.
Retrievable:
/Query/ SELECT id FROM table WHERE id = 1 LIMIT 1
/Function/ rowCount()
Irretrievable
/Query/ SELECT EXISTS(SELECT COUNT(id) FROM table WHERE id = 1)
/Function/ fetchColumn()
In your opinion, What is the best way to do that?

By best I guess you mean consuming the least resources on both MySQL server and client.
That is this:
SELECT COUNT(*) count FROM table WHERE id=1
You get a one-row, one-column result set. If that column is zero, the row was not found. If the column is one, a row was found. If the column is greater that one, multiple rows were found.
This is a good solution for a few reasons.
COUNT(*) is decently efficient, especially if id is indexed.
It has a simple code path in your client software, because it always returns just one row. You don't have to sweat edge cases like no rows or multiple rows.
The SQL is as clear as it can be about what you're trying to do. That's helpful to the next person to work on your code.
Adding LIMIT 1 will do nothing if added to this query. It is already a one-row result set, inherently. You can add it, but then you'll make the next person looking at your code wonder what you were trying to do, and wonder whether you made some kind of mistake.
COUNT(*) counts all rows that match the WHERE statement. COUNT(id) is slightly slower because it counts all rows unless their id values are null. It has to make that check. For that reason, people usually use COUNT(*) unless there's some chance they want to ignore null values. If you put COUNT(id) in your code, the next person to work on it will have to spend some time figuring out whether you meant anything special by counting id rather than *.
You can use either; they give the same result.

Related

INSERT INTO table SELECT not giving correct last_id

I have 2 tables with similar columns in MYSQL. I am copying data from one to another with INSERT INTO table2 SELECT * FROM table1 WHERE column1=smth. I have different columns as autoincrement and KEY in tables. When I use mysqli_insert_id i get the first one rather then last one inserted. Is there any way to get the last one?
Thanks
There is no inherit ordering of data in a relational database. You have to specify which field it is that you wish to order by like:
INSERT INTO table2
SELECT *
FROM table1
WHERE column1=smth
ORDER BY <field to sort by here>
LIMIT 1;
Relying on the order a record is written to a table is a very bad idea. If you have an auto-numbered id on table1 then just use ORDER BY id DESC LIMIT 1 to sort the result set by ID in descending order and pick the last one.
Updated to address OP's question about mysqli_insert_id
According to the Mysql reference the function called here is last_insert_id() where it states:
Important If you insert multiple rows using a single INSERT statement,
LAST_INSERT_ID() returns the value generated for the first inserted
row only. The reason for this is to make it possible to reproduce
easily the same INSERT statement against some other server.
Unfortunately, you'll have to do a second query to get the true "Last inserted id". Your best bet might be to run a SELECT COUNT(*) FROM table1 WHERE column1=smth; and then use that count(*) return to add to the mysqli_insert_id value. That's not great, but if you have high volume where this one function is getting hit a lot, this is probably the safest route.
The less safe route would be SELECT max(id) FROM table2 or SELECT max(id) FROM table2 Where column1=smth. But... again, depending on your keys and the number of times this insert is getting hit, this might be risky.

Cleanest way to check if a record exists

I am using the following code to check if a row exists in my database:
$sql = "SELECT COUNT(1) FROM myTable WHERE user_id = :id_var";
$stmt = $conn->prepare($sql);
$stmt->bindParam(':id_var', $id_var);
$stmt->execute();
if ($stmt->fetch()[0]>0)
{
//... many lines of code
}
All of the code works and the doubts I have are concerning if the previous code is clean and efficient or if there is room for improvement.
Currently there are two questions bugging me with my previous code:
Should I have a LIMIT 1 at the end of my SQL statement? Does COUNT(1) already limit the amount of rows found by 1 or does the server keep searching for more records even after finding the first one?
The if ($stmt->fetch()[0]>0). Would this be the cleanest way to fetch the information from the SQL Query and execute the "if conditional"?
Of course if anyone spots anything else that can improve my code, I would love your feedback.
Q: Should I have a LIMIT 1 at the end of my SQL statement? Does COUNT(1) already limit the amount of rows found by 1 or does the server keep searching for more records even after finding the first one?
Your SELECT COUNT() FROM query will return one row, if the execution is successful, because there is no GROUP BY clause. There's no need to add a LIMIT 1 clause, it wouldn't have any affect.
The database will search for all rows that satisfy the conditions in the WHERE clause. If the user_id column is UNIQUE, and there is an index with that as the leading column, or, if that column is the PRIMARY KEY of the table... then the search for all matching rows will be efficient, using the index. If there isn't an index, then MySQL will need to search all the rows in the table.
It's the index that buys you good performance. You could write the query differently, to get a usable result. But what you have is fine.
Q: Is this the cleanest...
if ($stmt->fetch()[0]>0)
My personal preference would be to avoid that construct, and break that up into two or more statements. The normal pattern...separate statement to fetch the row, and then do a test.
Personally, I would tend to avoid the COUNT() and just get a row, and test whether there was a row to fetch...
$sql = "SELECT 1 AS `row_exists` FROM myTable WHERE user_id = :id_var";
$stmt = $conn->prepare($sql);
$stmt->bindParam(':id_var', $id_var);
$stmt->execute();
if($stmt->fetch()) {
// row found
} else {
// row not found
}
$stmt->closeCursor();

limit vs exists vs count(*) vs count(id) in MySQL [duplicate]

This question already has answers here:
Is EXISTS more efficient than COUNT(*)>0?
(5 answers)
Closed 1 year ago.
I just want to know which one is the fastest.
What I'm trying to do is to just check if the data is existing on the table.
I've been using "LIMIT" most of the time but in your opinion or if you have basis, which one is the fastest to check if data is existing.
Example:
limit 1:
SELECT ID
FROM TABLE
WHERE ID=1 LIMIT 1;
exists:
SELECT EXISTS(
SELECT *
FROM TABLE
WHERE ID=1);
count(*):
SELECT (*)
FROM TABLE;
count(ID):
SELECT (ID)
FROM TABLE;"
Additional: I'm using InnoDB.
Limit is always the fastest, because it iterate one line of the table.
Exists has little difference with Limit because you just add another select statement, we can say it has the same efficiency as the first one.
Count will iterate all the table and count the result. When you use count(), by default, mysql count the primary key of the table. I've done some tests of count(id), count(), count(field) and count(1) in big table, there is no big difference. In my opinion, 'count' will always try to count the index unless the field you count is not an index, but many people said that we should use count(id) rather than use count(*).
In a small table, the four ways all work fine. But if you join with some big table, count will take a very very long time.
So in all, the time used is count(*) > count(id) >> exists > limit
I think they are all fine; except I would remove the WHERE ID = 1 clauses. If you ever clear your table and start re-inserting then ID 1 will not exist. Just LIMIT 1 will do the trick. Personally I don't favour the exists and count(*) options. I would prefer count(ID) then, as you would normally have an index on ID so I would expect that to run fairly quickly. To be sure, you would have to time them (on a really big table) - I expect them to come out something like exists, limit 1, count(id), count(*) from fastest to slowest. (I am in doubt about the exists though - if it actually evaluates the whole SELECT * it may come out worst).

Multiple 'Select column where ...' vs Select all the column

I have to select 4 rows randomly from a column.
Is is better to generate randomly 4 id and to perform 4 requests 'select column from database where id = ... '
Or to select all the rows in one request and to choose after?
If you are capable of generating random existing id's, I think the best approach is to use a clause like where id in (id1, id2, id3, id4). This will result in getting 4 records in one query, so no unnecessary query's or records are fetched.
As told before, where id in (id1, id2, id3, id4) is the fastest way from the MySQL perspective. How ever, you will need some logic in the application generating those IDs : All 4 IDs shall exist, be randomly distributed, and you want to avoid duplicates. In worst case you will be retrieving a list of all existend IDs with a huge query, extracting 4 random values, and querying again.
With all that logic to be done, it can be wise to move selection into MySQL:
SELECT * FROM foobar
ORDER BY RAND()
LIMIT 4;
You must understand that this is slow in mysql, but you have a speed gain in the application logic and can be sure to get random values equally seed all over your table.
EDIT:
The comment asks if PHP is fasten in this task then MySQL. Answer is no.
It is not done by "using rand". You need to have an array containing all those IDs in PHP. That is a huge query, lots of TCP traffic, huge array to be buildt in php, huge btree to be buildt by zend engine. Then, with the IDs, you must fire a second query to get the rows for those IDs.
Although the RAND() function may be slow, so far I have not had significant problems with speed. MY strategy is actually to join the database back to a query of itself returning a list of random IDs with a limit.
SELECT *
FROM table AS t1
JOIN (
SELECT rowID
FROM table
ORDER BY RAND()
LIMIT 4
) AS t2
WHERE t1.rowID = t2.rowID
There is also a more robust solution that exist - try checking out this question (asked in 2010).

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 - 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get less than 4 if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.
Are there any better ways I can do this? Only way I can think of would be to store the record id's for each category in another table and then select random results from that and then select only those record ID's from the main table in a secondary query; but I'm sure there is a better way!?
You could of course use the RAND() function on a query using a LIMIT and WHERE (for the category). That however as you pointed out, entails a scan of the database which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster but again there has to be a LIMIT and WHERE on that table which will also contain the same amount of records as the master table.
A different approach (if applicable) would be to have a table per category and store in that the IDs. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you will effectively remove the WHERE from the clause and getting a RAND() with a LIMIT on each category table would be faster since each category table will contain a subset of records from your main table.
Some other alternatives would be to use a key/value pair database just for that operation. MongoDb or Google AppEngine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate sql with a union to do the query in one step.
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
Note: my sql may not be valid, as I'm not a mySql guy, but the theory should be sound
First you need to get number of rows ... something like this
select count(1) from tbl where category = ?
then select a random number
$offset = rand(1,$rowsNum);
and select a row with offset
select * FROM tbl LIMIT $offset, 1
in this way you avoid missing ids. The only problem is you need to run second query several times. Union may help in this case.
For MySQl you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

Categories