Optimizing and indexing query with random component


I have the following query in the code I've inherited:
SELECT a.row2, a.row3
FROM table1 a
JOIN table2 b ON a.row1 = b.row1
WHERE b.row2 IN (
SELECT id
FROM table3
WHERE id IN ($table3_ids)
)
ORDER BY RAND();
[a.row1 is the primary key for table1]
Several questions:
Is there a more efficient way to structure this query?
I already have an index in table1 on (row1, row2, row4); is it redundant to make a separate index for (row1, row2, row3), or should I just replace the former with an index on (row1, row2, row3, row4)?
From the opposite end, I already have an index in table2 on (row1, row2, row3); since it would seem I need an index in table2 for (row1, row2) to optimize this query, would it be redundant to include an index that simply excludes a single element from a different index in the same table?
This is where I'm unclear on how the query engine can know which index is appropriate; when it parses the query, does it first check for matching indices in the table?
Lastly (and probably most simply answered), I'm adding indices with this syntax:
ALTER TABLE table_name ADD KEY (row1, row2, row3);
After creating the index, I'm then manually renaming each index descriptively. Is it possible to include the name of the index in the command?
Many thanks!

This is your query:
SELECT a.row2, a.row3
FROM table1 a JOIN
table2 b
ON a.row1 = b.row1
WHERE b.row2 IN (SELECT id FROM table3 WHERE id IN ($table3_ids))
ORDER BY RAND();
I think the best indexes are: table2(row2, row1) and table1(row1, row2, row3), and table3(id). You can add row4 to the table1 index, but it doesn't make a difference. Also, it is really odd that you named your columns "row" -- for me it results in cognitive dissonance.
Actually, unless you have a typo in your query, you can leave out table3 and just do:
WHERE b.row2 IN ($table3_ids)
Note that IN ($table3_ids) relies on string substitution. A single placeholder cannot stand in for the whole list, so unless you generate one placeholder per value, the query is open to SQL injection.
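A list of ids can still be bound safely by generating one placeholder per value. The following is only a minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for MySQL; the miniature table2 and its sample rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical miniature of table2 from the question; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (row1 INTEGER, row2 INTEGER)")
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

table3_ids = [10, 30, 99]  # would normally come from user input or an earlier query

# One "?" per value: the SQL text contains only placeholders, never the values.
placeholders = ", ".join("?" for _ in table3_ids)
sql = f"SELECT row1 FROM table2 WHERE row2 IN ({placeholders}) ORDER BY row1"

matching_row1 = [r[0] for r in conn.execute(sql, table3_ids)]
print(matching_row1)
```

The same pattern applies to prepared statements in PHP. An empty list needs separate handling, because MySQL rejects an empty IN ().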
If your result set is more than a few hundred, maybe a few thousand rows, then the order by will be significant. If this is the case, you might want to try a different approach to getting the results you want.

Some additions to Gordon's answer:
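The ALTER TABLE reference shows an optional index_name in the syntax, so you can name the index when you create it: ALTER TABLE table_name ADD KEY index_name (row1, row2, row3);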
IN ( SELECT ... ) is grossly inefficient; turn it into a JOIN:
SELECT a.row2, a.row3
FROM table1 a
JOIN table2 b ON a.row1 = b.row1
JOIN table3 c ON b.row2 = c.id
WHERE c.id IN ($table3_ids)
ORDER BY RAND();
or...
SELECT a.row2, a.row3
FROM table1 a
JOIN table2 b ON a.row1 = b.row1
WHERE b.row2 IN ($table3_ids)
ORDER BY RAND();
(One possible reason for needing c: are you filtering out ids that are missing from table3?)
ORDER BY RAND() is costly. It essentially cannot be optimized unless you also have a LIMIT.

Related

SELECT emails FROM table 1 only if table 2 doesn't have value [duplicate]

table1 (id, name)
table2 (id, name)
Query:
SELECT name
FROM table2
-- that are not in table1 already

SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Q: What is happening here?
A: Conceptually, we select all rows from table1 and, for each row, attempt to find a row in table2 with the same value in the name column. If there is no such row, we leave the table2 portion of the result empty (NULL) for that row. Then we constrain the selection by picking only those rows where the matching row does not exist. Finally, we ignore all fields from the result except the name column (the one we are sure exists, from table1).
While it may not be the most performant method in all cases, it should work in basically every database engine that attempts to implement ANSI 92 SQL.
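The LEFT JOIN ... IS NULL pattern described above can be checked against a NOT EXISTS formulation on a tiny data set. This is a sketch only: it uses Python's built-in sqlite3 module rather than any particular server, and the sample names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 (name) VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO table2 (name) VALUES ('bob'), ('dave');
""")

# Anti-join: keep table1 rows for which the outer join found no partner.
left_join_sql = """
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    WHERE t2.name IS NULL
    ORDER BY t1.name
"""

# Same question asked with a correlated subquery.
not_exists_sql = """
    SELECT t1.name
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.name = t1.name)
    ORDER BY t1.name
"""

left_join_names = [r[0] for r in conn.execute(left_join_sql)]
not_exists_names = [r[0] for r in conn.execute(not_exists_sql)]
print(left_join_names, not_exists_names)
```

Both queries return the names that exist in table1 but not in table2. Swap the table names to get the opposite direction.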

You can either do
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
or
SELECT name
FROM table2
WHERE NOT EXISTS
(SELECT *
FROM table1
WHERE table1.name = table2.name)
See this question for 3 techniques to accomplish this

I don't have enough rep points to vote up froadie's answer. But I have to disagree with the comments on Kris's answer. The following answer:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
Is FAR more efficient in practice. I don't know why, but I'm running it against 800k+ records and the difference is tremendous with the advantage given to the 2nd answer posted above. Just my $0.02.

SELECT <column_list>
FROM TABLEA a
LEFT JOIN TABLEB b
ON a.Key = b.Key
WHERE b.Key IS NULL;
https://www.cloudways.com/blog/how-to-join-two-tables-mysql/

This is pure set theory which you can achieve with the minus operation.
select id, name from table1
minus
select id, name from table2

Here's what worked best for me.
SELECT *
FROM #T1
EXCEPT
SELECT a.*
FROM #T1 a
JOIN #T2 b ON a.ID = b.ID
This was more than twice as fast as any other method I tried.

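Watch out for pitfalls. If the name column in table1 contains NULLs, you are in for surprises: NOT IN never evaluates to true against a set that contains a NULL, so the query returns no rows at all.
A safer version is: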
SELECT name
FROM table2
WHERE name NOT IN
(SELECT ISNULL(name ,'')
FROM table1)
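The NULL pitfall is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with made-up sample rows. Instead of ISNULL (which is SQL Server syntax), it filters the NULLs out of the subquery and also shows NOT EXISTS, which is an alternative to the fix above rather than the same technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('bob'), (NULL);
    INSERT INTO table2 VALUES ('bob'), ('dave');
""")


def names(sql):
    """Run a single-column query and return its values as a list."""
    return [r[0] for r in conn.execute(sql)]


# 'dave' NOT IN ('bob', NULL) evaluates to NULL, not true, so the row is dropped.
naive = names(
    "SELECT name FROM table2 WHERE name NOT IN (SELECT name FROM table1)"
)

# Removing NULLs from the subquery restores the expected result.
filtered = names(
    "SELECT name FROM table2 WHERE name NOT IN "
    "(SELECT name FROM table1 WHERE name IS NOT NULL)"
)

# NOT EXISTS compares row by row and is not affected by the NULL.
not_exists = names(
    "SELECT name FROM table2 t2 WHERE NOT EXISTS "
    "(SELECT 1 FROM table1 t1 WHERE t1.name = t2.name)"
)
print(naive, filtered, not_exists)
```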

You can use EXCEPT in SQL Server or MINUS in Oracle; they are identical according to:
http://blog.sqlauthority.com/2008/08/07/sql-server-except-clause-in-sql-server-is-similar-to-minus-clause-in-oracle/

This worked well for me:
SELECT *
FROM [dbo].[table1] t1
LEFT JOIN [dbo].[table2] t2 ON t1.[t1_ID] = t2.[t2_ID]
WHERE t2.[t2_ID] IS NULL

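You can use the following query structure:
SELECT t1.name FROM table1 t1 JOIN table2 t2 ON t2.fk_id != t1.id;
Note that this only works while table2 holds a single row: with more rows, each table1 row is returned once for every table2 row whose fk_id differs from its id.
table1:
id  name
1   Amit
2   Sagar
table2:
id  fk_id  email
1   1      amit#ma.com
Output:
name
Sagar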

All of the queries above are incredibly slow on big tables, so a change of strategy is needed. Here is the code I used for a DB of mine; you can adapt it by changing the field and table names.
The strategy is to build two implicit temporary tables and take their union.
The first temporary table selects all the rows of the first original table, whose column you want to check for values that are NOT present in the second original table.
The second implicit temporary table contains the rows of the two original tables that match on identical values of the column you want to check.
The union therefore has more than one row per checked value whenever that value matches in both original tables (one row from the first select, another from the second), and just one row when the value from the first original table matches nothing in the second.
You then group and count: a count of 1 means there is no match, so you select just the rows with a count equal to 1.
It does not look elegant, but it is orders of magnitude faster than all the above solutions.
IMPORTANT NOTE: create an INDEX on the columns to be checked.
SELECT name, source, id
FROM
(
SELECT name, "active_ingredients" as source, active_ingredients.id as id
FROM active_ingredients
UNION ALL
SELECT active_ingredients.name as name, "UNII_database" as source, temp_active_ingredients_aliases.id as id
FROM active_ingredients
INNER JOIN temp_active_ingredients_aliases ON temp_active_ingredients_aliases.alias_name = active_ingredients.name
) tbl
GROUP BY name
HAVING count(*) = 1
ORDER BY name
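To make the union-and-count strategy easier to follow, here is a reduced sketch of it on two tiny tables, run through Python's built-in sqlite3 module. The names table_a, table_b and the sample rows are hypothetical stand-ins for the tables in the query above, and the speed claim is not something this toy example can confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, alias_name TEXT);
    CREATE INDEX idx_a_name ON table_a (name);
    CREATE INDEX idx_b_alias ON table_b (alias_name);
    INSERT INTO table_a (name) VALUES ('aspirin'), ('ibuprofen'), ('caffeine');
    INSERT INTO table_b (alias_name) VALUES ('ibuprofen');
""")

sql = """
    SELECT name
    FROM (
        -- every row of the first table, once
        SELECT name FROM table_a
        UNION ALL
        -- plus one extra row for every match in the second table
        SELECT a.name
        FROM table_a a
        INNER JOIN table_b b ON b.alias_name = a.name
    ) AS combined
    GROUP BY name
    HAVING COUNT(*) = 1   -- counted once only: no match was found
    ORDER BY name
"""
unmatched = [r[0] for r in conn.execute(sql)]
print(unmatched)
```

The count test relies on the checked column being unique within the first table. A name that appears twice in table_a would be counted twice and wrongly treated as matched.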

See query:
SELECT * FROM Table1 WHERE
id NOT IN (SELECT
e.id
FROM
Table1 e
INNER JOIN
Table2 s ON e.id = s.id);
Conceptually: the subquery fetches the matching records, and the main query then fetches the records that are not in the subquery's result.

First, give the tables aliases such as t1 and t2.
Then select the rows of the second table.
Finally, use the WHERE condition to keep only the rows that have no match in the first table:
SELECT name FROM table2 as t2
WHERE NOT EXISTS (SELECT * FROM table1 as t1 WHERE t1.name = t2.name)

I'm going to repost the correct answer (since I'm not cool enough yet to comment), in case anyone else thought it needed better explaining.
SELECT temp_table_1.name
FROM original_table_1 temp_table_1
LEFT JOIN original_table_2 temp_table_2 ON temp_table_2.name = temp_table_1.name
WHERE temp_table_2.name IS NULL
I've also seen syntax in FROM needing commas between table names in MySQL, but SQLite seemed to prefer the space.
The bottom line is that bad variable names leave questions, so my names should make more sense. And someone should explain why we need a comma or no comma.

I tried all solutions above but they did not work in my case. The following query worked for me.
SELECT NAME
FROM table_1
WHERE NAME NOT IN
(SELECT a.NAME
FROM table_1 AS a
LEFT JOIN table_2 AS b
ON a.NAME = b.NAME
WHERE any further condition);

Slow IN clause in a MySQL SELECT query that contains a large number of values

I have written a query to fetch details from table1, which has this condition clause:
IN(number1,number2......
There are 323 entries so far. These numbers are primary keys of table1, which have been extracted from table2 and passed into the IN condition.
Because of this, my query slows down and takes 13 seconds to run. Is there any way to overcome this? If I give a few constant values (like PK ids), the query runs in the usual time.

You can also do it using LEFT JOIN:
For example:
SELECT T1.*
FROM Table1 T1 LEFT JOIN
Table2 T2 ON T1.numberfield = T2.numberfield
WHERE T2.someotherfield IS NOT NULL
This does the exact job of the query with IN.

Try the following:
select a.* from table1 a join table2 b
on a.parent_id=b.id;
Note: parent_id should be indexed in table1; id is assumed to be the primary key of table2, which means it is already indexed.

How to join two large tables with different strings

I have two tables
table1
customer_id
101
102
103
and table2
customer_id country_id
AO-101 1
AO-102 2
AO-103 3
Both tables are very large. I have used CONCAT(table1.customer_id) to join with table2.
All the fields stated above are indexed.
Joining them and getting all the customers of country 1 takes a lot of time.
Can anyone help me, please?

You can try this mate:
SELECT * FROM table1
JOIN table2 ON CONCAT('AO-', table1.customer_id) = table2.customer_id
WHERE table2.country_id = 1;
or this one:
SELECT * FROM table2
JOIN (
SELECT CONCAT('AO-', customer_id) AS in_customer_id, table1.* FROM table1
) AS table1 ON table1.in_customer_id = table2.customer_id
WHERE table2.country_id = 1;

I believe the problem you are running into is HOW an index is stored.
The way to understand this is to literally think of a PHYSICAL index that sits NEXT to the table as a lookup.
If you do something like "create index index_1 on table1(column_1)", what this does is store the index right next to the table. Before you run a query referencing that table, the DBMS looks over the tables and your query and determines the best way to query the tables based on indexes, table sizes, etc.
Now, the index stores literally the exact value in the exact DATATYPE of the field, unless you cast the index to a different datatype.
Right now you are joining an integer field to a character field, and right there you are not going to get the same performance from the index, because it cannot be used purely as such: the values have to be translated on the fly, so to speak.
So what I would do is type something like:
create index on table2(cast(replace(customer_id,'AO-','') as integer));
This should store an integer value as the INDEX so when joined to the integer primary key, the index should run fine.
Also, why don't you just store the same integer value instead of adding this 'AO-' thing?
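As a rough illustration of indexing the converted value, the sketch below builds an index on an expression and joins on that same expression. It uses Python's built-in sqlite3 module, whose expression-index syntax is not the same as MySQL's or PostgreSQL's, so treat the CREATE INDEX line as an assumption about SQLite only. The index name idx_table2_customer_num and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (customer_id TEXT, country_id INTEGER);
    INSERT INTO table1 VALUES (101), (102), (103);
    INSERT INTO table2 VALUES ('AO-101', 1), ('AO-102', 2), ('AO-103', 3);

    -- Index the integer form of the key, so the stored index value has the
    -- same type as table1.customer_id (SQLite expression-index syntax).
    CREATE INDEX idx_table2_customer_num
        ON table2 (CAST(replace(customer_id, 'AO-', '') AS INTEGER));
""")

# The join condition repeats the indexed expression exactly.
sql = """
    SELECT t1.customer_id
    FROM table1 t1
    JOIN table2 t2
      ON CAST(replace(t2.customer_id, 'AO-', '') AS INTEGER) = t1.customer_id
    WHERE t2.country_id = 1
"""
country_1_customers = [r[0] for r in conn.execute(sql)]
print(country_1_customers)
```

Whether the optimizer actually picks the expression index depends on the engine and the data, so check the execution plan on the real tables.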

MySQL uses CONCAT() to concatenate strings.
So we use the following join condition:
ON tableTwo.query = concat('category_id=',tableOne.category_id)
Hope this helps.

You can write a subquery like this:
SELECT * FROM table1 JOIN
(SELECT SUBSTRING_INDEX(customer_id, '-', -1) AS customer_id, country_id
FROM table2) t2 USING (customer_id);
I haven't tried it, but you might also be able to join directly:
ON SUBSTRING_INDEX(table2.customer_id, '-', -1) = table1.customer_id

Try this code; it works.
select * from table4 t4, table3 t3 where t4.cus_id in (CONCAT('A0-', t3.cus_id)) && t4.country=1 ;

MySQL error 1242 - Subquery returns more than 1 row

I have two tables in a DB with the following structure:
table 1: 3 columns - category_id, product_id and position
table 2: 3 columns - category_id, product_id and position
I am trying to set the table 1 position to the table 2 position wherever the category and product ids are the same in both tables.
Below is the SQL I have tried, but it returns MySQL error 1242 - Subquery returns more than 1 row.
UPDATE table1
SET position = (
SELECT position
FROM table2
WHERE table1.product_id = table2.product_id AND table1.category_id = table2.category_id
)

The solution is very simple and it can be done in two simple steps. The first step is just a preview of what will be changed, to avoid destroying data. It can be skipped if you are confident of your WHERE clause.
Step 1: preview the changes
Join the tables using the fields you want to match, select everything for visual validation of the match.
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2
ON t1.category_id = t2.category_id
AND t1.product_id = t2.product_id
You can also add a WHERE clause if only some of the rows must be modified.
Step 2: do the actual update
Replace the SELECT clause and the FROM keyword with UPDATE, add the SET clause where it belongs. Keep the WHERE clause:
UPDATE table1 t1
INNER JOIN table2 t2
ON t1.category_id = t2.category_id
AND t1.product_id = t2.product_id
SET t1.position = t2.position
That's all.
Technical considerations
Indexes on the columns used in the JOIN clause on both tables are a must when the tables have more than several hundred rows. If the query doesn't have WHERE conditions then MySQL will use indexes only for the biggest table. Indexes on the fields used in the WHERE condition will speed up the query. Prepend EXPLAIN to the SELECT query to check the execution plan and decide which indexes you need.
You can add ORDER BY and LIMIT to the SELECT to narrow the preview using criteria that cannot be expressed with WHERE (for example, only the 100 most recent or oldest rows). Note, however, that MySQL accepts ORDER BY and LIMIT only in single-table UPDATE statements, not in the multiple-table (JOIN) form shown above, so such a restriction has to be moved into the WHERE clause or a subquery before the SELECT is turned into an UPDATE.
Of course, indexes on the columns used in the ORDER BY clause are a must.

You can run this query to see what is happening:
SELECT product_id, category_id, count(*), min(position), max(position)
FROM table2
GROUP BY product_id, category_id
HAVING COUNT(*) > 1;
This will give you the list of product_id, category_id pairs that appear multiple times in table2. Then you can decide what to do. Do you want an arbitrary value of position? Is the value of position always the same? Do you need to fix the table?
It is easy enough to fix the particular problem by using limit 1 or an aggregation function. However, you may really need to fix the data in the table. A fix looks like:
UPDATE table1 t1
SET t1.position = (SELECT t2.position
FROM table2 t2
WHERE t2.product_id = t1.product_id AND t2.category_id = t1.category_id
LIMIT 1
);
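The duplicate check above can be tried on a small sample before touching real data. This sketch runs the same GROUP BY ... HAVING query through Python's built-in sqlite3 module; the sample rows are invented so that exactly one (product_id, category_id) pair appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (category_id INTEGER, product_id INTEGER, position INTEGER);
    INSERT INTO table2 VALUES
        (1, 100, 5),
        (1, 100, 7),   -- same category/product pair as the row above
        (1, 101, 2),
        (2, 100, 9);
""")

# Pairs that occur more than once are the ones that make the scalar
# subquery in the UPDATE return several rows.
dupes = conn.execute("""
    SELECT product_id, category_id, COUNT(*), MIN(position), MAX(position)
    FROM table2
    GROUP BY product_id, category_id
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)
```

Here the MIN and MAX positions differ, which is the case where picking an arbitrary row with LIMIT 1 hides a data problem rather than fixing it.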

Mysql Query to check 3 tables for an existing row

What I want to do is to query three separate tables into one row which is identified by a unique reference. I don't really have full understanding of the Join clause as it seems to require some sort of related data from each table.
I know I can go about this the long way round, but can not afford to lose even a little efficiency. Any help would be greatly appreciated.
Table Structure
package_id int(8),
client_id int(8),
unique reference varchar (40)
Each of the tables have essentially the same structure. I just need to know how to query all three, for 1 row.

If you have a few tables that share the same or a similar definition, you can use UNION or UNION ALL to treat them as one. This query returns the rows from each table that have the requested reference. I've included an OriginTable column in case your code needs to refer back to the original table for an update or something else.
select 'TableA' OriginTable,
package_id,
client_id
from TableA
where reference = ?
union all
select 'TableB' OriginTable,
package_id,
client_id
from TableB
where reference = ?
union all
select 'TableC' OriginTable,
package_id,
client_id
from TableC
where reference = ?
You might extend select list with other columns, provided that they have the same data type, or are implicitly convertible to data type from first select.
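A small end-to-end run of the UNION ALL lookup might look like the following. It is a sketch using Python's built-in sqlite3 module; the helper name find_reference and the sample rows are hypothetical, and the reference value is bound once per SELECT branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("TableA", "TableB", "TableC"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(package_id INTEGER, client_id INTEGER, reference TEXT UNIQUE)"
    )
conn.execute("INSERT INTO TableA VALUES (1, 10, 'ref-a')")
conn.execute("INSERT INTO TableB VALUES (2, 20, 'ref-b')")
conn.execute("INSERT INTO TableC VALUES (3, 30, 'ref-c')")

sql = """
    SELECT 'TableA' AS OriginTable, package_id, client_id
    FROM TableA WHERE reference = ?
    UNION ALL
    SELECT 'TableB', package_id, client_id
    FROM TableB WHERE reference = ?
    UNION ALL
    SELECT 'TableC', package_id, client_id
    FROM TableC WHERE reference = ?
"""


def find_reference(reference):
    """Return (origin table, package_id, client_id) rows for one reference."""
    return conn.execute(sql, (reference, reference, reference)).fetchall()


found = find_reference("ref-b")
missing = find_reference("nope")
print(found, missing)
```

With an index (here the UNIQUE constraint) on reference in each table, every branch is a single index lookup.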

Let's say you have 3 tables :
table1, table2 and table3 with structure
package_id int(8),
client_id int(8),
unique reference varchar (40)
Let's assume that column reference is unique key.
Then you can use this:
SELECT t1.exists_row AS in_table1, t2.exists_row AS in_table2, t3.exists_row AS in_table3
FROM
(SELECT COUNT(1) AS exists_row FROM table1 WHERE
reference = #reference ) t1,
(SELECT COUNT(1) AS exists_row FROM table2 WHERE
reference = #reference ) t2,
(SELECT COUNT(1) AS exists_row FROM table3 WHERE
reference = #reference ) t3
;
Replace #reference with the actual value of the unique key.
Alternatively, if you provide the output of
SHOW CREATE TABLE
I can rewrite the SQL against your actual tables.

It is entirely possible to create a join between tables using a where clause. In fact this is often what I do as I find it leads to clearer information of what you are actually doing, and if you don't get the results you expect you can debug it bit by bit.
That said however a join is certainly a lot quicker to write!
Please bear in mind I'm a bit rusty on SQL, so I may have misremembered something, and I'm not going to include any code since you haven't said which DBMS you are using and they all have slightly different syntax.
The thing to remember is that the join functions on a column with the same data (and type) within it.
It is much easier if each table has the 'joining' field named the same, then it should be a matter of
join on <nameOfField>
However if you wish to use field that have different names in the different tables you will need to list the fully qualified names. ie tableName.FieldName
If you are having trouble with natural, inner and outer, left and right, you need to think of a venn diagram with the natural being the point of commonality between the tables. If you are using only 2 tables inner and outer are equivalent to left and right (with each table being a single circle in the venn diagram) and left and right being the order of the tables in your list in the main part of your select (the first being the left and the second being the right).
When you add a third table this is where you can select any of the cross over section using these keywords.
Again however I have always found it easier to do a primary select and create a temp table, then perform my next join using this temp table (so effectively only need to use natural or left and right again). Again I find this easier to debug.
The best thing is to experiment and see what you get in return. Without a diagram of your tables this is the best I can offer.
in brief...
nested selects where field = (select from table where field = )
and temp tables
are (I think) easier to debug... but do take more writing!
David.

$array_of_tables = ['table1', 'table2', 'table3']; // name of each table (placeholder names)
$result_row = [];
foreach ($array_of_tables as $val)
{
    // $condition is your own filter, e.g. a match on the unique reference
    $query = "select * from `$val` where $condition";
    $result = mysqli_query($connection, $query);
    $result_row[] = mysqli_fetch_assoc($result); // only one row is expected from each table
    // check the resulting array for your row
}

SELECT * FROM table1 t1 JOIN table2 t2 ON (t2.unique = t1.unique) JOIN table3 t3 ON (t3.unique = t1.unique) WHERE t1.unique = ?;
You could use a JOIN like this, assuming all three tables have the same unique column.
