I need to update several columns in one table, based on columns in another. To start with, I am just updating one of them. I have tried two ways of doing this, which both work, but they take about 4 minutes when run as MySQL commands and over 20 minutes when run from PHP. Both tables are about 20,000 rows long.
My question is, is there a better or more efficient way of doing this?
Method 1:
UPDATE table_a,table_b
SET table_a.price = table_b.price
WHERE table_a.product_code=table_b.product_code
Method 2:
UPDATE table_a INNER JOIN table_b
ON table_a.product_code = table_b.product_code
SET table_a.price=table_b.price
I guess that these basically work in the same way, but I thought that the join would be more efficient. The product_code column is random text, albeit unique, and every row matches one in the other table.
Anything else I can try?
Thanks
UPDATE: This was resolved by creating a unique index on product_code in each table, e.g.:
CREATE UNIQUE INDEX index_code ON table_a (product_code);
CREATE UNIQUE INDEX index_code ON table_b (product_code);
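Here is a minimal sketch of the same fix. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which brings some assumptions:
- SQLite does not accept MySQL's multi-table UPDATE ... JOIN syntax, so the join is rewritten as a correlated subquery.
- The index names idx_a_code and idx_b_code are hypothetical. SQLite index names must be unique per database, whereas MySQL only requires uniqueness per table.
- The sample rows are made up.

```python
import sqlite3


def update_prices(conn: sqlite3.Connection) -> None:
    """Copy table_b.price into table_a.price wherever product_code matches."""
    # SQLite index names are global to the schema, so the two indexes
    # need different names (MySQL allows index_code on both tables).
    conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_a_code ON table_a (product_code)")
    conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_b_code ON table_b (product_code)")
    # Correlated-subquery form of MySQL's UPDATE ... INNER JOIN ... SET;
    # each lookup on table_b.product_code can now use idx_b_code.
    conn.execute(
        """
        UPDATE table_a
        SET price = (SELECT b.price FROM table_b AS b
                     WHERE b.product_code = table_a.product_code)
        WHERE EXISTS (SELECT 1 FROM table_b AS b
                      WHERE b.product_code = table_a.product_code)
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (product_code TEXT, price REAL)")
    conn.execute("CREATE TABLE table_b (product_code TEXT, price REAL)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?)", [("x9f", 1.0), ("k2q", 2.0)])
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", [("x9f", 10.0), ("k2q", 20.0)])
    update_prices(conn)
    print(conn.execute("SELECT product_code, price FROM table_a ORDER BY product_code").fetchall())
```

Rows of table_a with no partner in table_b keep their price because of the WHERE EXISTS guard.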
If your queries are running slowly, you'll have to examine how the query accesses the data it uses.
Your query looks like this:
UPDATE table_a INNER JOIN table_b
ON table_a.product_code = table_b.product_code
SET table_a.price=table_b.price
To see where the delay is, you can run:
EXPLAIN SELECT a.price, b.price FROM table_b b
INNER JOIN table_a a ON (a.product_code = b.product_code)
This will tell you whether indexes are being used; see the MySQL documentation on EXPLAIN for details.
In your case you don't have any indexes (possible_keys = NULL), which forces MySQL to do a full table scan.
You should always run EXPLAIN on your queries when slowness is an issue. You'll have to convert non-SELECT queries to a SELECT, but that's not difficult: list all the changed fields in the select clause and copy the join and where clauses over as-is. (MySQL 5.6.3 and later also accept EXPLAIN directly on UPDATE and DELETE statements.)
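Below is a sketch of the "convert the UPDATE to a SELECT and explain it" step. It again uses sqlite3 as a stand-in for MySQL (an assumption). SQLite's EXPLAIN QUERY PLAN has no possible_keys column. Instead, each detail line reads either SCAN (a full pass over a table) or SEARCH ... USING INDEX (an index lookup).

```python
import sqlite3
from typing import List

# SELECT form of the UPDATE: the changed fields go in the select list,
# and the join clause is copied over unchanged.
SELECT_FORM = (
    "SELECT a.price, b.price FROM table_b b "
    "INNER JOIN table_a a ON a.product_code = b.product_code"
)


def query_plan(conn: sqlite3.Connection, sql: str) -> List[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    # Every EXPLAIN QUERY PLAN row has four columns; the last is the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (product_code TEXT, price REAL)")
    conn.execute("CREATE TABLE table_b (product_code TEXT, price REAL)")
    conn.execute("CREATE UNIQUE INDEX idx_a_code ON table_a (product_code)")
    conn.execute("CREATE UNIQUE INDEX idx_b_code ON table_b (product_code)")
    for detail in query_plan(conn, SELECT_FORM):
        print(detail)
```

With both indexes in place, the plan typically scans one table and SEARCHes the other through its product_code index.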
Related
I have a PHP system that runs a MySQL query like the one below:
select
`order`.id,
`order`.name,
`order`.date,
customer.name,
items.coupon_code
from `order`
left join customer on `order`.custid = customer.id
left join items on items.coupon_code = `order`.coupon_code
where items.coupon_new_code is null
and `order`.status = 1000
AND `order`.promo_code in (1,2)
The order table has 800K records and the items table has 300K records. When I run this, the query takes about 9 hours to finish!
If I comment out the left join to the items table, the query runs in a few seconds! I am not very experienced with MySQL joins and would really appreciate it if someone could tell me how to optimise this query to run in an acceptable time frame.
Try changing
LEFT JOIN to INNER JOIN (or just JOIN)
This will work to speed things up assuming that you only want to see orders that have both customers and items associated with them. Currently your query is trying to return all data from the order table but that's not needed. It's possible other changes to the database structure could improve things as well.
The top answer on the linked question provides a useful diagram that shows the difference between these join types.
At the very least you need an index on coupon_code in both the order and items tables. Also consider a compound index that includes the other field you join on (custid) as well as the columns in your WHERE conditions: items.coupon_new_code, order.status and order.promo_code. Knowing next to nothing about your data, I can only speculate about what the DBMS will use. Try various combinations in a compound key and run EXPLAIN to see what's being used; it's really going to depend on the selectivity of the data in your columns.
Posting the output of EXPLAIN along with the tables' schema will help us improve these answers.
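The following sketch indexes the columns this query joins and filters on, then checks the plan. It runs in sqlite3 rather than MySQL (an assumption). The index names and column combinations are hypothetical starting points, not a recommendation for your data, and the sample rows are invented.

```python
import sqlite3
from typing import List

# "order" is a reserved word, so it is quoted (double quotes in SQLite,
# backticks in MySQL). Index names below are hypothetical.
SCHEMA = """
CREATE TABLE "order" (id INTEGER PRIMARY KEY, name TEXT, date TEXT, custid INTEGER,
                      coupon_code TEXT, status INTEGER, promo_code INTEGER);
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (coupon_code TEXT, coupon_new_code TEXT);
-- join column first, then the column tested in WHERE
CREATE INDEX idx_items_coupon ON items (coupon_code, coupon_new_code);
-- compound index over the WHERE conditions on "order"
CREATE INDEX idx_order_status_promo ON "order" (status, promo_code);
"""

QUERY = """
SELECT o.id, o.name, o.date, c.name, i.coupon_code
FROM "order" o
LEFT JOIN customer c ON o.custid = c.id
LEFT JOIN items i ON i.coupon_code = o.coupon_code
WHERE i.coupon_new_code IS NULL
  AND o.status = 1000
  AND o.promo_code IN (1, 2)
ORDER BY o.id
"""


def plan_details(conn: sqlite3.Connection) -> List[str]:
    """Detail text of each EXPLAIN QUERY PLAN row for QUERY."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
```

With sample data, an order whose coupon has no row in items is still returned: under LEFT JOIN the missing items row makes coupon_new_code NULL. Switching to INNER JOIN would drop such orders, so that change is only safe if you want orders that actually have items.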
I have two tables, tableOne (90K rows) and tableTwo (100K rows). I am looking for the duplicate numbers in both tables under the given conditions, and the matching must be 1:1: if a row has multiple matches in the other table, only one of them should be tagged as a match (given that both tables contain matching data).
I have the SELECT statement below, but when I run it on my local XAMPP, or even from CMD, the screen freezes after I press Enter, and hours later it returns an out-of-memory error. I hope you can help me with this.
SELECT rNum,
cDate,
cTime,
aNumber,
bNumber,
duration,
tag,
aNumber2,
bNumber2,
'hasMatch',
concatDate,
timeMinutes
FROM tableOne a
LEFT JOIN
tableTwo b ON a.aNumber2 = b.aNumber2
AND a.bNumber2 = b.bNumber2
WHERE a.hasMatch = 'valid'
AND (a.duration - b.duration) <= 3
AND (a.duration - b.duration) >= -3
AND TIMEDIFF(a.concatDate,b.concatDate) <= 3
AND TIMEDIFF(a.concatDate,b.concatDate) >= -3
Thank you in advance.
If you're matching two tables 1:1, you should probably use INNER JOIN rather than LEFT JOIN; your WHERE conditions on b.duration and b.concatDate already discard the unmatched rows that a LEFT JOIN would keep.
Second, your query doesn't seem to be indexed properly. Use EXPLAIN SELECT ... to see how the query is executed, and create indexes for the columns you filter and join on.
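Here is a sketch of the indexed INNER JOIN with every column qualified, using sqlite3 as a stand-in for MySQL. It rests on these assumptions:
- The column subset and sample rows are made up, and idx_two_numbers is a hypothetical index name.
- ABS(a.duration - b.duration) <= 3 replaces the pair of <= 3 / >= -3 tests; the two forms are equivalent.
- The concatDate condition is left out because SQLite has no TIMEDIFF().

It does not enforce the 1:1 tagging on its own.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE tableOne (rNum INTEGER PRIMARY KEY, aNumber2 TEXT, bNumber2 TEXT,
                       duration INTEGER, hasMatch TEXT);
CREATE TABLE tableTwo (rNum INTEGER PRIMARY KEY, aNumber2 TEXT, bNumber2 TEXT,
                       duration INTEGER);
-- hypothetical composite index on the inner table's join columns
CREATE INDEX idx_two_numbers ON tableTwo (aNumber2, bNumber2);
"""

# INNER JOIN, every column qualified with its table alias, and the
# two-sided duration window written as a single ABS() test.
MATCH_SQL = """
SELECT a.rNum AS a_rnum, b.rNum AS b_rnum,
       a.duration AS a_duration, b.duration AS b_duration
FROM tableOne a
INNER JOIN tableTwo b
        ON a.aNumber2 = b.aNumber2
       AND a.bNumber2 = b.bNumber2
WHERE a.hasMatch = 'valid'
  AND ABS(a.duration - b.duration) <= 3
ORDER BY a.rNum, b.rNum
"""


def matches(conn: sqlite3.Connection) -> List[Tuple[int, int, int, int]]:
    """Candidate (a, b) pairs that satisfy the join and duration window."""
    return conn.execute(MATCH_SQL).fetchall()


def plan(conn: sqlite3.Connection) -> List[str]:
    """EXPLAIN QUERY PLAN detail lines for MATCH_SQL."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + MATCH_SQL)]
```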
In your SELECT you list aNumber2, but according to your join condition both table a and table b have an aNumber2 column. That's a problem: if two tables have a column with the same name, you must specify the table in the SELECT.
For example:
SELECT a.aNumber2 AS a_number2,....
The same problem exists in your query for other columns, such as duration and concatDate.
Another thing: in your case you should use INNER JOIN instead of LEFT JOIN.
If your final result has many rows (thousands), fetch them step by step: add LIMIT to your query and take 100 results at a time.
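The sketch below fetches the result in batches of 100. It swaps plain LIMIT/OFFSET for keyset pagination, which resumes after the last rNum seen. That avoids re-reading the skipped rows as the offset grows. Two further assumptions: sqlite3 stands in for MySQL, and the query is simplified to tableOne alone.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple

BATCH_SQL = """
SELECT rNum, aNumber2, bNumber2, duration
FROM tableOne
WHERE hasMatch = 'valid'
  AND (? IS NULL OR rNum > ?)
ORDER BY rNum
LIMIT ?
"""


def valid_rows_in_batches(conn: sqlite3.Connection,
                          batch_size: int = 100) -> Iterator[List[Tuple]]:
    """Yield the valid rows of tableOne in batches of at most batch_size."""
    last_rnum: Optional[int] = None  # no batch fetched yet
    while True:
        batch = conn.execute(BATCH_SQL, (last_rnum, last_rnum, batch_size)).fetchall()
        if not batch:
            return
        yield batch
        # Resume after the last key seen instead of using a growing OFFSET.
        last_rnum = batch[-1][0]
```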
There are many questions on how to find duplicates in a database, but not with the specific problem that I have.
I have a table with approx. 120000 entries. I need to find duplicates. To find them, I use a PHP script that is structured like the following:
//get all entries from database
//loop through them
//get entries with greater id
//compare all of them with the original one
//update database (delete duplicate, update information in linked tables, etc.)
It is not possible to sort out all duplicates already in the initial query, because I have to loop through all entries since my duplicate search is sensitive not only to entries that are 100% alike, but also entries that are 90% alike. I use similar_text() for that.
I think the first loop is okay, but looping through all other entries within the loop is just too much. With 120000 entries this would be close to (120000^2)/2 iterations.
So instead of using a loop within the loop, there must be a better way to do it. Do you have any ideas? I thought about using in_array(), but it is not sensitive to something like 90% string similarity, and also doesn't give me the array's fields it found the duplicates in - I would need those to get the entries' ids to update the database correctly.
Any ideas?
Thank you very much!
Charles
UPDATE 1
The query I am using right now is the following:
SELECT a.host_id
FROM host_webs a
JOIN host_webs b ON a.host_id != b.host_id AND a.web = b.web
GROUP BY a.host_id
It shows originals and duplicates perfectly, but I need to get rid of the originals, i.e. the first ones found with the associated data. How can I accomplish that?
You can JOIN the table onto itself and do it all in SQL (I know you say you don't think you can, but I would be surprised if this is the case). All you need to do is put all the columns you use to test for duplicates into the ON clause of the JOIN.
SELECT a.id
FROM tablename a
JOIN tablename b ON a.id != b.id AND a.col1 = b.col1 AND a.col2 = b.col2
GROUP BY a.id
This will return just the ids of the rows where col1 and col2 are duplicated. You can incorporate whatever string comparisons you need into this; the ON clause can be as complicated as you need it to be. For example:
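Here is a runnable check of the self-join. It uses sqlite3 in place of MySQL and made-up rows (both assumptions). The selected id is qualified as a.id, since both aliases have an id column.

```python
import sqlite3
from typing import List

# Self-join: every column used to test for duplicates goes into the ON clause.
DUPLICATE_IDS_SQL = """
SELECT a.id
FROM tablename a
JOIN tablename b ON a.id != b.id AND a.col1 = b.col1 AND a.col2 = b.col2
GROUP BY a.id
ORDER BY a.id
"""


def duplicate_ids(conn: sqlite3.Connection) -> List[int]:
    """Ids of rows whose (col1, col2) pair occurs more than once."""
    return [row[0] for row in conn.execute(DUPLICATE_IDS_SQL)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)",
                     [(1, "x", "y"), (2, "x", "y"), (3, "x", "z"), (4, "p", "q"), (5, "x", "y")])
    print(duplicate_ids(conn))  # [1, 2, 5]
```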
SELECT a.id
FROM tablename a
JOIN tablename b ON a.id != b.id AND (
(a.col1 = b.col1 AND (a.col2 = b.col2 OR a.col3 = b.col3))
OR ((a.col1 = b.col1 OR a.col2 = b.col2) AND a.col3 = b.col3)
OR (SOUNDEX(a.col1) = SOUNDEX(b.col1) AND SOUNDEX(a.col2) = SOUNDEX(b.col2) AND SOUNDEX(a.col3) = SOUNDEX(b.col3))
)
GROUP BY a.id
EDIT
Since all you are actually doing with your query is looking for rows where the web column is identical, this would do the job of finding only the duplicates and not the original "good" records - assuming host_id is numeric and that the "good" record would be the one with the lowest host_id:
SELECT b.host_id
FROM host_webs a
INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
GROUP BY b.host_id
I imagine the end game here would be to remove the duplicates, so if you are feeling brave you could actually delete them in one go:
DELETE b.*
FROM host_webs a
INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
The GROUP BY is not necessary in the DELETE statement because it doesn't matter if you try and delete the same row more than once in a single statement.
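This sketch implements the keep-the-lowest-host_id approach in sqlite3 (an assumption; the sample data is invented). SQLite has no multi-table DELETE, so the DELETE is rewritten as a correlated EXISTS. It removes a row whenever another row has the same web and a smaller host_id.

```python
import sqlite3
from typing import List

# Later copies of each web value; the lowest host_id is the "good" record.
DUPLICATES_SQL = """
SELECT b.host_id
FROM host_webs a
INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
GROUP BY b.host_id
ORDER BY b.host_id
"""

# Correlated-EXISTS replacement for MySQL's DELETE b.* FROM ... JOIN ...
DELETE_SQL = """
DELETE FROM host_webs
WHERE EXISTS (SELECT 1 FROM host_webs a
              WHERE a.web = host_webs.web AND a.host_id < host_webs.host_id)
"""


def duplicate_host_ids(conn: sqlite3.Connection) -> List[int]:
    """host_ids that would be deleted."""
    return [row[0] for row in conn.execute(DUPLICATES_SQL)]


def delete_duplicates(conn: sqlite3.Connection) -> int:
    """Delete the duplicates and return how many rows were removed."""
    cur = conn.execute(DELETE_SQL)
    conn.commit()
    return cur.rowcount
```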
If you're doing a 1-time removal of duplicate items, I wouldn't bother writing a php script - it's cleaner to do it in sql.
The general algorithm for removing duplicates that I find works the best is:
1. duplicate the table (copy all its rows into duplicate_table)
2. truncate the original table
3. set a unique index on whichever columns need to be unique
4. reinsert the rows using either INSERT IGNORE INTO original_table SELECT * FROM duplicate_table or REPLACE INTO original_table SELECT * FROM duplicate_table (INSERT IGNORE keeps the first row it meets for each unique key; REPLACE keeps the last)
5. fix linked tables: remove orphaned rows (DELETE x FROM x LEFT JOIN original_table ON (...) WHERE original_table.id IS NULL)
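The five steps might look like this in sqlite3, under these assumptions:
- The table names original_table, duplicate_table and linked_table, the column names, and the index name are hypothetical.
- SQLite has no TRUNCATE, so step 2 is a DELETE.
- INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE, and ORDER BY id makes the lowest id the survivor.

```python
import sqlite3


def dedupe_via_unique_index(conn: sqlite3.Connection) -> None:
    """Steps 1-5, translated to SQLite (hypothetical table and column names)."""
    conn.executescript("""
        -- 1. duplicate the table
        CREATE TABLE duplicate_table AS SELECT * FROM original_table;
        -- 2. empty the original (SQLite has no TRUNCATE)
        DELETE FROM original_table;
        -- 3. unique index on the column(s) that must be unique
        CREATE UNIQUE INDEX idx_original_web ON original_table (web);
        -- 4. reinsert; INSERT OR IGNORE keeps the first row per web,
        --    so ORDER BY id makes that the lowest id
        INSERT OR IGNORE INTO original_table SELECT * FROM duplicate_table ORDER BY id;
        -- 5. remove orphaned rows from the linked table
        DELETE FROM linked_table
        WHERE NOT EXISTS (SELECT 1 FROM original_table o
                          WHERE o.id = linked_table.original_id);
        -- clean up the working copy
        DROP TABLE duplicate_table;
    """)
```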
What I want to do is query three separate tables into one row, identified by a unique reference. I don't fully understand the JOIN clause, as it seems to require some sort of related data in each table.
I know I can go about this the long way round, but I cannot afford to lose even a little efficiency. Any help would be greatly appreciated.
Table Structure
package_id int(8),
client_id int(8),
unique reference varchar (40)
Each of the tables have essentially the same structure. I just need to know how to query all three, for 1 row.
If you have a few tables that share the same or a similar definition, you can use UNION or UNION ALL to treat them as one. This query returns the rows from each table that have the requested reference. I've included OriginTable in case your code needs to refer back to the original table for an update or something else.
select 'TableA' OriginTable,
package_id,
client_id
from TableA
where reference = ?
union all
select 'TableB' OriginTable,
package_id,
client_id
from TableB
where reference = ?
union all
select 'TableC' OriginTable,
package_id,
client_id
from TableC
where reference = ?
You might extend select list with other columns, provided that they have the same data type, or are implicitly convertible to data type from first select.
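Here is a sketch of running the UNION ALL query with bound parameters. sqlite3 stands in for whatever driver you use, and the table contents are made up. Each ? is a separate positional placeholder, so the reference value has to be supplied three times.

```python
import sqlite3
from typing import List, Tuple

UNION_SQL = """
SELECT 'TableA' AS OriginTable, package_id, client_id FROM TableA WHERE reference = ?
UNION ALL
SELECT 'TableB', package_id, client_id FROM TableB WHERE reference = ?
UNION ALL
SELECT 'TableC', package_id, client_id FROM TableC WHERE reference = ?
"""


def rows_for_reference(conn: sqlite3.Connection, reference: str) -> List[Tuple[str, int, int]]:
    """All rows carrying `reference`, tagged with the table they came from."""
    # Three positional placeholders, so the same value is bound three times.
    return conn.execute(UNION_SQL, (reference,) * 3).fetchall()
```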
Let's say you have three tables,
table1, table2 and table3, with the structure
package_id int(8),
client_id int(8),
unique reference varchar (40)
Let's assume that the reference column is a unique key.
Then you can use this:
SELECT t1.exists_row, t2.exists_row, t3.exists_row FROM
(SELECT COUNT(1) AS exists_row FROM table1 WHERE
reference = #reference) t1,
(SELECT COUNT(1) AS exists_row FROM table2 WHERE
reference = #reference) t2,
(SELECT COUNT(1) AS exists_row FROM table3 WHERE
reference = #reference) t3
;
Replace #reference with the actual value of the unique key. Alternatively, post the output of
SHOW CREATE TABLE
and I can rewrite the SQL as an actual query.
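Below is a sketch of the existence check. Each subquery reads its own table, and the #reference placeholder is replaced by bound parameters. sqlite3 stands in for MySQL, and the rows are invented (both assumptions).

```python
import sqlite3
from typing import Tuple

EXISTS_SQL = """
SELECT t1.exists_row, t2.exists_row, t3.exists_row FROM
  (SELECT COUNT(1) AS exists_row FROM table1 WHERE reference = ?) t1,
  (SELECT COUNT(1) AS exists_row FROM table2 WHERE reference = ?) t2,
  (SELECT COUNT(1) AS exists_row FROM table3 WHERE reference = ?) t3
"""


def reference_presence(conn: sqlite3.Connection, reference: str) -> Tuple[int, int, int]:
    """1 or 0 per table, given that reference is a unique key."""
    return conn.execute(EXISTS_SQL, (reference,) * 3).fetchone()
```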
It is entirely possible to create a join between tables using a WHERE clause. In fact this is often what I do, as I find it makes it clearer what you are actually doing, and if you don't get the results you expect you can debug it bit by bit.
That said however a join is certainly a lot quicker to write!
Please bear in mind I'm a bit rusty on SQL, so I may have misremembered, and I'm not going to include any code as you haven't said which DBMS you are using, and they all have slightly different syntax.
The thing to remember is that a join matches rows on columns that contain the same data (and type).
It is much easier if the 'joining' field has the same name in each table; then it should be a matter of
join <otherTable> using (<nameOfField>)
However, if you want to join on fields that have different names in the different tables, you will need to list the fully qualified names, i.e. tableName.FieldName.
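To illustrate, here are three equivalent forms: a WHERE-clause join, a JOIN ... ON with fully qualified names, and a JOIN ... USING on a shared column name. All three return the same rows. This is a sketch in sqlite3, and the tables t_left and t_right and their rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Three ways to write the same inner join on the shared column "reference".
USING_SQL = ("SELECT l.package_id, r.client_id FROM t_left l "
             "JOIN t_right r USING (reference) ORDER BY 1")
ON_SQL = ("SELECT l.package_id, r.client_id FROM t_left l "
          "JOIN t_right r ON l.reference = r.reference ORDER BY 1")
WHERE_SQL = ("SELECT l.package_id, r.client_id FROM t_left l, t_right r "
             "WHERE l.reference = r.reference ORDER BY 1")


def run_all(conn: sqlite3.Connection) -> List[List[Tuple[int, int]]]:
    """Results of the USING, ON and WHERE forms, in that order."""
    return [conn.execute(sql).fetchall() for sql in (USING_SQL, ON_SQL, WHERE_SQL)]
```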
If you are having trouble with natural, inner and outer, left and right joins, think of a Venn diagram in which each table is a circle. An inner join returns only the overlap, the rows that match in both tables (a natural join is an inner join on every column the tables share by name). A left outer join returns every row of the left table plus the matching rows of the right; a right outer join is the reverse. Left and right refer to the order of the tables in the main part of your select (the first is the left, the second is the right).
When you add a third table, you can use these keywords to pick whichever overlapping section you need.
Again, however, I have always found it easier to do a primary select into a temp table and then perform my next join against that temp table, so each step is again a simple two-table join. Again, I find this easier to debug.
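Here is a sketch of the temp-table approach. It joins two tables into a temporary table, then joins that to the third, so each step is a two-table join you can inspect separately. sqlite3 stands in for your DBMS. The table and column names (table1 to table3, package_id, reference, step1) are assumptions.

```python
import sqlite3
from typing import List, Tuple


def join_in_steps(conn: sqlite3.Connection, reference: str) -> List[Tuple]:
    """Join table1 and table2 into a temp table, then join that to table3."""
    conn.execute("DROP TABLE IF EXISTS temp.step1")
    # Step 1: first two-table join, materialised so it can be inspected alone.
    conn.execute("CREATE TEMP TABLE step1 (reference TEXT, p1 INTEGER, p2 INTEGER)")
    conn.execute(
        """INSERT INTO step1
           SELECT t1.reference, t1.package_id, t2.package_id
           FROM table1 t1 JOIN table2 t2 ON t2.reference = t1.reference
           WHERE t1.reference = ?""",
        (reference,),
    )
    # Step 2: another two-table join, this time against the temp table.
    return conn.execute(
        """SELECT s.reference, s.p1, s.p2, t3.package_id
           FROM step1 s JOIN table3 t3 ON t3.reference = s.reference"""
    ).fetchall()
```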
The best thing is to experiment and see what you get in return. Without a diagram of your tables this is the best I can offer.
In brief...
nested selects (where field = (select from table where field = ))
and temp tables
are (I think) easier to debug... but they do take more writing!
David.
$array_of_tables = array('table1', 'table2', 'table3'); // name of each table (example names)
$result_row = array();
foreach ($array_of_tables as $val)
{
    $query = "SELECT * FROM `$val` WHERE $condition"; // $condition must be built and escaped beforehand
    $result = mysqli_query($connection, $query);
    $result_row[] = mysqli_fetch_assoc($result); // assumes at most one row is returned from each table
    // check the resulting array for your row
}
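SELECT * FROM table1 t1 JOIN table2 t2 ON (t2.reference = t1.reference) JOIN table3 t3 ON (t3.reference = t1.reference) WHERE t1.reference = ?;
You could use a JOIN like this, assuming all three tables share the same unique column (reference, in the question). Bind the value to the ? placeholder rather than quoting it: '?' in quotes is just the literal string ?.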
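Below is a sketch of the three-table JOIN with a bound parameter, in sqlite3 with invented rows (assumptions). Because these are inner joins, no row comes back when any one table lacks the reference. Keeping partial matches would require LEFT JOINs from table1.

```python
import sqlite3
from typing import Optional, Tuple

THREE_WAY_SQL = """
SELECT t1.package_id, t1.client_id,
       t2.package_id, t2.client_id,
       t3.package_id, t3.client_id
FROM table1 t1
JOIN table2 t2 ON t2.reference = t1.reference
JOIN table3 t3 ON t3.reference = t1.reference
WHERE t1.reference = ?
"""


def combined_row(conn: sqlite3.Connection, reference: str) -> Optional[Tuple]:
    """One row combining all three tables, or None if any table lacks it."""
    # The value is bound to ?, not embedded as the quoted string '?'.
    return conn.execute(THREE_WAY_SQL, (reference,)).fetchone()
```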
I read rows from an MSSQL table via PHP's PDO.
Some rows are returned twice: exactly the same rows, with exactly the same id values.
This happens to specific rows. Each time I run my import script, the issue occurs on the very same rows. For example, after some 16,000 rows are fetched correctly, one row, the same one each time, is returned twice.
The duplicates are consecutive: the row is fetched, and the next fetch() call returns the very same row.
When I run:
select * from MY_TABLE where id='the problematic id'
only one row is returned, not two
Any ideas what (the hell) can go on here?
Thank you very much guys
edit:
The query that is being run:
select o.accountid, c.contactid, o.opportunityid, o.createdate, o.modifydate, o.createuser, o.modifyuser, o.description, o.projclosedate, o.notes, o.accountmanagerid
from sysdba.opportunity o
left join sysdba.opportunity_contact oc on o.opportunityid = oc.opportunityid and oc.salesrole = 'speaker'
left join sysdba.contact c on c.contactid = oc.contactid
where o.status <> 'Inactive'
order by o.opportunityid asc;
I think you need to join your contact table to your opportunity table. It seems that you might not have a 1-to-1 mapping between those tables the way you have it set up. See below:
-- This should reference the "o" table but it doesn't.
left join sysdba.contact c on c.contactid = oc.contactid
If that's not the case, then you should really build the joins around the opportunity_contact table instead (make it your FROM table).
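If the mapping is not 1-to-1, one quick way to confirm it is to count how many rows each opportunity produces through the same LEFT JOINs. Here is a sketch in sqlite3 standing in for MSSQL. It drops the sysdba schema prefix, the status filter and most columns, and the rows are made up.

```python
import sqlite3
from typing import List, Tuple

# Opportunities that come back more than once from the LEFT JOINs,
# i.e. that have several 'speaker' rows in opportunity_contact.
FAN_OUT_SQL = """
SELECT o.opportunityid, COUNT(*) AS row_count
FROM opportunity o
LEFT JOIN opportunity_contact oc
       ON o.opportunityid = oc.opportunityid AND oc.salesrole = 'speaker'
LEFT JOIN contact c ON c.contactid = oc.contactid
GROUP BY o.opportunityid
HAVING COUNT(*) > 1
ORDER BY o.opportunityid
"""


def fanned_out(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """(opportunityid, number of rows the join produces) for repeated ids."""
    return conn.execute(FAN_OUT_SQL).fetchall()
```

Any opportunityid listed has several matching speaker rows, so the LEFT JOIN returns it once per contact.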