I tried to compare two zipcode columns between two tables to see if values were missing in the second one.
I first wanted to do it in MySQL. My query was something like
'SELECT code FROM t1 WHERE code NOT IN (SELECT code FROM t2)'
But it was really slow, so I tried another way:
I ran two selects, and then compared the results with array_diff().
With MySQL: a few minutes, and it sometimes crashes
With PHP: less than 1 second.
Can someone explain these differences ?
Is my SQL query wrong ?
If your main table has 50k rows, a subselect in your query can result in 1 + 50k executions of selects: one for the first table and 50k more, one for each row. The server compares each row against the subselect, which is re-run on every iteration over the main table. This is why your SQL takes so long, and it can also be a significant memory problem.
See serjoscha's information about joins to fix it in SQL; it should be even faster than your PHP solution.
Checking which values are missing from one table (compared to another) can easily be done with a LEFT or RIGHT JOIN; they are made for exactly this. Alternatively, take a look at this: How to Find Missing Value Between Two Mysql Tables – serjoscha
One solution to:
SELECT code FROM t1
WHERE code NOT IN ( SELECT code FROM t2 )
will be:
SELECT t1.code
FROM t1
LEFT JOIN t2
ON t1.code = t2.code
WHERE t2.code is null
Give it a try. Also have a look at indexing, as Cyclone suggests:
If you don't have an index you should definitely add one, since this will speed up your query. You could add an index like this: ALTER TABLE t1 ADD INDEX code_idx (code); this should be done for both tables (t1 and t2). If you then run EXPLAIN on the query you should see something like Using where; Using index; Using join buffer, which is good – Cyclone
Indexing speeds up your query. If the table has only the one column, the index holds the same content as the table itself, so it is largely redundant. Otherwise I strongly recommend indexing the code column of t2, which gives a large performance gain and lower memory consumption.
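As a rough illustration of the two query shapes discussed above, here is a small Python sketch that runs both against an in-memory SQLite database. SQLite is only a stand-in for MySQL here, so it says nothing about MySQL timings; the sample codes and the index names t1_code_idx and t2_code_idx are made up for the example. It only shows that the NOT IN form and the LEFT JOIN ... IS NULL form return the same missing codes when the column contains no NULLs.

```python
import sqlite3

# SQLite stands in for MySQL; sample data and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (code TEXT NOT NULL);
    CREATE TABLE t2 (code TEXT NOT NULL);
    CREATE INDEX t1_code_idx ON t1 (code);
    CREATE INDEX t2_code_idx ON t2 (code);
    INSERT INTO t1 (code) VALUES ('10001'), ('20002'), ('30003'), ('40004');
    INSERT INTO t2 (code) VALUES ('10001'), ('30003');
""")

# Subquery form from the question (with the column name in the WHERE clause).
not_in_sql = """
    SELECT code FROM t1
    WHERE code NOT IN (SELECT code FROM t2)
    ORDER BY code
"""

# Anti-join form from the answer: keep t1 rows that found no match in t2.
left_join_sql = """
    SELECT t1.code
    FROM t1
    LEFT JOIN t2 ON t1.code = t2.code
    WHERE t2.code IS NULL
    ORDER BY t1.code
"""

missing_not_in = [row[0] for row in conn.execute(not_in_sql)]
missing_left_join = [row[0] for row in conn.execute(left_join_sql)]

print(missing_not_in)
print(missing_left_join)
```

The columns are declared NOT NULL on purpose: if t2.code could contain NULL, the NOT IN form would return no rows at all, while the LEFT JOIN form would still work.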
I would like to count the number of rows in a statement returned by a query. The only solutions I found were:
sqlsrv_num_rows(): This seems a bit too complicated for such a simple task, and I read that using it slows down execution quite a bit.
Executing a query with SELECT COUNT: This seems unnecessary and also slows down execution. If you already have a statement, why bother with another query?
Counting the rows while generating a table: As I have to generate an HTML table from the statement, I could put a variable in the table-generating loop and increment it by one, but this only works when you already have to loop through the entire statement.
Am I missing some fundamental function and/or knowledge or is there no simpler way?
Any help or guidance is appreciated.
EDIT: The statement returned is only a small portion of the original table so it wouldn't be practical to execute another query for this purpose.
In SQL Server, row counts for tables are stored in the catalog views and dynamic management views, and you can query them to get the count.
This method only works for physical tables, so you can store the records in a temp table and drop it afterwards.
SELECT Sum(p.rows)
FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
WHERE p.index_id IN ( 0, 1 ) -- heap or clustered index
AND t.NAME = N'tablename'
AND s.NAME = N'dbo';
For more info check this article
If you don't want to execute another query, use SELECT @@ROWCOUNT right after the query. It returns the number of rows returned by the previous SELECT.
select * from query_you_want_to_find_count
select @@rowcount
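The third option from the question, counting rows while building the HTML table, can be sketched as follows. This is a Python illustration against an in-memory SQLite database rather than PHP with sqlsrv, and the orders table, its columns, and the sample rows are all hypothetical. It shows that the count falls out of the loop you already need, with no second query.

```python
import sqlite3

# Hypothetical table and data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT);
    INSERT INTO orders (customer, status) VALUES
        ('alice', 'open'), ('bob', 'closed'), ('carol', 'open');
""")

# The statement returns only a small portion of the table.
cursor = conn.execute(
    "SELECT id, customer FROM orders WHERE status = ? ORDER BY id",
    ("open",),
)

row_count = 0
html_rows = []
for order_id, customer in cursor:
    # Build one table row per result row and count it in the same pass.
    html_rows.append(f"<tr><td>{order_id}</td><td>{customer}</td></tr>")
    row_count += 1

html_table = "<table>" + "".join(html_rows) + "</table>"
print(row_count)
```

The drawback noted in the question still applies: the count is only known once the loop has finished.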
I'm using PDO, and I need to know how many rows are returned by a SELECT statement. My question is: is the following slower than, the same as, or faster than doing it in two queries? phpMyAdmin will tell me how long the SELECT statement takes, but not how long the COUNT takes on its own, so I'm having trouble telling how long each query takes.
Query in question:
SELECT *, (SELECT COUNT(*) from table) AS count FROM table
Faster, same or slower than splitting it into two queries?
Thanks.
You can write this query as:
SELECT t.*, const.totalcount
FROM table t cross join
(select count(*) as totalcount from table) const;
This may or may not be faster than running two queries. Two queries involve "query running" overhead -- compiling the query, transmitting the data back and forth. This adds another column, so it increases the total amount of data in the result set.
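To make the comparison concrete, here is a small Python sketch that runs the cross join form and the two-query form against an in-memory SQLite database. SQLite is a stand-in for MySQL, and the items table with its columns is hypothetical, so this shows only that both approaches yield the same information, not which one is faster on a given server.

```python
import sqlite3

# Hypothetical table; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO items (name) VALUES ('a'), ('b'), ('c');
""")

# One query: every row carries the total as an extra column.
combined = conn.execute("""
    SELECT t.id, t.name, const.totalcount
    FROM items t
    CROSS JOIN (SELECT COUNT(*) AS totalcount FROM items) const
    ORDER BY t.id
""").fetchall()

# Two queries: the rows, then the count, each with its own round trip.
rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

print(combined)
print(rows, total)
```

In the single-query form the count is repeated on every row, which is the extra result-set data mentioned above.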
Two queries are likely to be faster. If MySQL treats this as a dependent subquery, it will run once for every record in the parent. If it's a MyISAM table, the subquery will be very fast and you may not notice it with a small number of records.
Do an EXPLAIN on it and see what MySQL reports back.
I have a tableA with the following structure
I changed this structure into tableB, shown below, to reduce the number of rows; the number of categories is fixed
Assume tableA holds 21 lakh (2.1 million) rows; after converting to the new structure, tableB contains only 70k rows
In some cases I want to SUM all the values in the table:
QUERY1: SELECT SUM(val) AS total FROM tableA;
vs
QUERY2: SELECT SUM(cate1+cate2+cate3) AS total FROM tableB;
QUERY1 executes faster than QUERY2,
even though tableB contains fewer rows than tableA.
I expected QUERY2 to be faster, but QUERY1 is the faster one.
Can you help me understand why QUERY2 performs worse?
MySQL is optimized to speed up relational operations. Much less effort goes into speeding up the other kinds of operations MySQL can perform. cate1+cate2+cate3 is a perfectly legitimate operation, but there's nothing particularly relational about it.
tableA is actually simpler in terms of the relational model of data than tableB, even though tableA has more rows. It's worth noting in passing that tableA conforms to first normal form but tableB does not. Those three columns are really a repeating group, even though it's been made to look like they are not.
So first normal form is good for you in terms of performance (most of the time).
In your first query, MySQL only needs to do the summation (one step).
In your second query, MySQL first needs to add the three columns for each row, and then sum those results (two steps).
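The two query shapes can be compared on a tiny data set with the Python sketch below, using an in-memory SQLite database as a stand-in for MySQL. The original table layouts are not shown in the question, so the item and category columns and the sample values are assumptions; only val, cate1, cate2, and cate3 come from the queries themselves. The sketch also includes a third, hypothetical variant that sums each column separately, which may be worth timing on the real data.

```python
import sqlite3

# Assumed layouts: only val and cate1..cate3 appear in the original queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (item INTEGER, category TEXT, val INTEGER);
    CREATE TABLE tableB (item INTEGER, cate1 INTEGER, cate2 INTEGER, cate3 INTEGER);
    INSERT INTO tableA VALUES
        (1, 'cate1', 10), (1, 'cate2', 20), (1, 'cate3', 30),
        (2, 'cate1', 1),  (2, 'cate2', 2),  (2, 'cate3', 3);
    INSERT INTO tableB VALUES (1, 10, 20, 30), (2, 1, 2, 3);
""")

# QUERY1: a plain aggregate over one column.
total_a = conn.execute("SELECT SUM(val) AS total FROM tableA").fetchone()[0]

# QUERY2: a per-row addition, then the aggregate over those results.
total_b = conn.execute(
    "SELECT SUM(cate1 + cate2 + cate3) AS total FROM tableB"
).fetchone()[0]

# Variant (not from the thread): aggregate each column, add the three sums once.
total_b_split = conn.execute(
    "SELECT SUM(cate1) + SUM(cate2) + SUM(cate3) AS total FROM tableB"
).fetchone()[0]

print(total_a, total_b, total_b_split)
```

Note that the QUERY2 form and the split form only agree when no category column is NULL, because cate1 + cate2 + cate3 is NULL for any row with a NULL in it.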
I have a SQL query that has 4 UNIONs and 4 LEFT JOINs. It is laid out as such:
SELECT ... FROM table1
LEFT JOIN other_table1
UNION SELECT ... FROM table2
LEFT JOIN other_table2
UNION SELECT ... FROM table3
LEFT JOIN other_table3
UNION SELECT ... FROM table4
LEFT JOIN other_table4
Would it be better to run 4 separate queries and then merge the results with PHP after the fact? Or should I keep them together? Which would provide the fastest execution?
The most definitive answer is to test each method. However, the UNION is most likely to be faster, as MySQL runs only one query as opposed to 4, one for each part of the union.
You also remove the overhead of reading the data into memory in PHP and concatenating it. Instead, you can just do a while() or foreach() or whatever on one result.
In this case, it depends on the number of records you are going to get back. Since you are using a LEFT JOIN in every part of the union, I suggest running separate fetches to avoid a bottleneck in SQL, and merging the results in PHP.
When a query is executed from a programming language, the following steps occur:
A connection is created between the application and the database (or an existing connection is taken from the pool)
Query is sent to database
Database sends the result back
Connection is released to pool
If you run N queries, the steps above happen N times, which will slow the process down. So ideally we should keep the number of queries to a minimum.
It makes sense to break a query into multiple parts if a single query becomes complex, hard to maintain, and slow to execute. Even then, the better approach is usually to optimize the query itself.
In your case the query is pretty simple, and as someone has pointed out, UNION also removes duplicate rows, so the best way is to go for the SQL query rather than PHP code. Try optimization techniques like creating proper indexes on the tables.
The UNION clause can be faster because it returns distinct records at once (duplicate records won't be returned); otherwise you would need to do that in the application. In this case it may also help to reduce traffic.
From the documentation:
The default behavior for UNION is that duplicate rows are removed from
the result. The optional DISTINCT keyword has no effect other than the
default because it also specifies duplicate-row removal. With the
optional ALL keyword, duplicate-row removal does not occur and the
result includes all matching rows from all the SELECT statements.
You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION
types are treated such that a DISTINCT union overrides any ALL union
to its left. A DISTINCT union can be produced explicitly by using
UNION DISTINCT or implicitly by using UNION with no following DISTINCT
or ALL keyword.
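The difference described in the quoted documentation can be seen in a small Python sketch against an in-memory SQLite database, which handles plain UNION and UNION ALL the same way as described above. The table contents and the name column are made up for the example.

```python
import sqlite3

# Hypothetical data with one value ('y') present in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 (name) VALUES ('x'), ('y');
    INSERT INTO table2 (name) VALUES ('y'), ('z');
""")

# Plain UNION removes duplicate rows from the combined result.
union_rows = [row[0] for row in conn.execute("""
    SELECT name FROM table1
    UNION
    SELECT name FROM table2
    ORDER BY name
""")]

# UNION ALL keeps every row from every SELECT.
union_all_rows = [row[0] for row in conn.execute("""
    SELECT name FROM table1
    UNION ALL
    SELECT name FROM table2
    ORDER BY name
""")]

print(union_rows)
print(union_all_rows)
```

If the parts of the union cannot produce overlapping rows, UNION ALL skips the duplicate-removal work, which may be worth testing on the real query.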
At the moment, I select rows from 'table01 and table02' using:
SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.UUID = 'whatever';
The UUID column has a unique index, type char(15), with alphanumeric input. I know this isn't the fastest way to select data from the database, but the UUID is the only row identifier that is available to the front-end.
Since I have to select by UUID, and not ID, I need to know which of these two options I should go for if, say, the table consists of 100,000 rows. What speed differences would I be looking at, and would the index for the UUID grow too large and slow down the DB?
Get the ID before doing the "big" select
1. $id = SELECT ID FROM table01 WHERE UUID = '{alphanumeric character}';
2. SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.ID = $id;
Or keep it the way it is now, using the UUID.
2. SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.UUID = 'whatever';
Side note: All new rows are created by checking if the system generated uniqueid exists before trying to insert a new row. Keeping the column always unique.
Why not just try it out? Create a new db with those tables. Write a quick php script to populate the tables with more records than you can imagine being stored (if you're expecting 100k rows, insert 10 million). Then experiment with different indexes and queries (remember, EXPLAIN is your friend)...
When you finally get something you think works, put the query into a script on a webserver and hit it with ab (Apache Bench). You can watch what happens as you increase the concurrency of the requests (1 at a time, 2 at a time, 10 at a time, etc).
All this shouldn't take too long (maybe a few hours at most), but it will give you a FAR better answer than anyone at SO could for your specific problem (as we don't know your DB server config, exact schema, memory limits, etc)...
The second solution has the best performance. You need to look up the row by UUID in both solutions. In the first solution you look it up by UUID and then do a faster lookup by primary key, but by then you have already found the right row by UUID, so the speed of the second lookup doesn't matter: it is unnecessary altogether.
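The two options can be placed side by side in a small Python sketch using an in-memory SQLite database as a stand-in for MySQL. The name and note columns and the sample UUID values are hypothetical; table01, table02, ID, UUID, and t1ID follow the question. The sketch shows that both options return the same rows, with option one paying for an extra query first.

```python
import sqlite3

# Hypothetical columns (name, note) and data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table01 (
        ID INTEGER PRIMARY KEY,
        UUID CHAR(15) NOT NULL UNIQUE,  -- UNIQUE creates the unique index
        name TEXT
    );
    CREATE TABLE table02 (
        ID INTEGER PRIMARY KEY,
        t1ID INTEGER NOT NULL REFERENCES table01 (ID),
        note TEXT
    );
    INSERT INTO table01 (ID, UUID, name) VALUES
        (1, 'abc123def456ghi', 'first'),
        (2, 'zzz999yyy888xxx', 'second');
    INSERT INTO table02 (t1ID, note) VALUES (1, 'n1'), (1, 'n2'), (2, 'n3');
""")

uuid = "abc123def456ghi"

# Option one: resolve the UUID to an ID, then join by ID (two round trips).
row_id = conn.execute(
    "SELECT ID FROM table01 WHERE UUID = ?", (uuid,)
).fetchone()[0]
rows_by_id = conn.execute("""
    SELECT t1.ID, t1.name, t2.note
    FROM table01 AS t1
    INNER JOIN table02 AS t2 ON t1.ID = t2.t1ID
    WHERE t1.ID = ?
    ORDER BY t2.ID
""", (row_id,)).fetchall()

# Option two: join and filter by UUID directly (one round trip).
rows_by_uuid = conn.execute("""
    SELECT t1.ID, t1.name, t2.note
    FROM table01 AS t1
    INNER JOIN table02 AS t2 ON t1.ID = t2.t1ID
    WHERE t1.UUID = ?
    ORDER BY t2.ID
""", (uuid,)).fetchall()

print(rows_by_id)
print(rows_by_uuid)
```

Both options perform the same index lookup on UUID; option one simply adds a second statement on top of it.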