I have this problem that's been killing me for a couple days now.
So we have a table of all processed orders.
We have a table for all orders that come in.
We need to effectively cross-reference the orders in the new table, which is continually updating, against the orders already completed in the primary table so that we don't complete the same order multiple times.
After we get a batch of new orders, this is the query that I currently run in an attempt to cross-reference it with the table of completed orders:
$sql = "DELETE
FROM
`orders_new`
WHERE
`order` IN (
SELECT DISTINCT
`order`
FROM
`orders_all`
)
AND `name` IN (
SELECT DISTINCT
`name`
FROM
`orders_all`
)
AND `jurisdiction` IN (
SELECT DISTINCT
`jurisdiction`
FROM
`orders_all`
)";
As you can probably tell, I want to delete rows from the "orders_new" table where a row with the same order, name, and jurisdiction already exists in the "orders_all" table.
Is this the right way to handle this sort of query?
Well, the right way depends on many things.
But first, I do not like your division into two tables. Instead, I would introduce a column identifying state, which would reference a table of possible states: "new", "in process", "completed". That way each order is stored as only one record, as it should be.
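A minimal sketch of that single-table design (all names here are illustrative, not taken from your schema):
CREATE TABLE order_state (
  id   TINYINT UNSIGNED PRIMARY KEY,
  name VARCHAR(20) NOT NULL UNIQUE
);
INSERT INTO order_state (id, name)
VALUES (1, 'new'), (2, 'in process'), (3, 'completed');
CREATE TABLE orders (
  `order`        VARCHAR(50)      NOT NULL,
  `name`         VARCHAR(100)     NOT NULL,
  `jurisdiction` VARCHAR(50)      NOT NULL,
  state_id       TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (`order`, `name`, `jurisdiction`),
  FOREIGN KEY (state_id) REFERENCES order_state (id)
);
-- Completing an order is then an UPDATE of state_id, not a move between tables.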
Your query might be OK, but you should check the performance.
Take a look at: https://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join
Not exactly your case but very similar.
Another thing: why do you use DISTINCT? That implies that "order" is not a unique identifier.
Based on your edit, you identify an order with the composite key ("order", "name", "jurisdiction"). Is this really the key, the whole key, and nothing but the key, so help you Codd? If not, you could delete a bunch of records. But even so, your query would delete all orders for which the order, name, and jurisdiction can each be found in `orders_all` IN DIFFERENT RECORDS. So your query is incorrect.
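To see why, a tiny hypothetical example:
-- Suppose orders_all contains just these two rows:
--   ('A1', 'Alice', 'North')
--   ('B2', 'Bob',   'South')
-- Then a row ('B2', 'Alice', 'South') in orders_new would be deleted by the
-- original query: 'B2' appears in orders_all.order, 'Alice' in orders_all.name,
-- and 'South' in orders_all.jurisdiction, but never together in a single row.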
That said, a variant of your query might be:
DELETE `orders_new`
FROM `orders_new`
INNER JOIN `orders_all` ON `orders_all`.`order` = `orders_new`.`order`
    AND `orders_all`.`name` = `orders_new`.`name`
    AND `orders_all`.`jurisdiction` = `orders_new`.`jurisdiction`;
But the real problem is your ER model.
No, your query will delete any record for which the same order, name, and jurisdiction exist somewhere in orders_all, even if those values come from different rows. In other words, a row in orders_new will be deleted if one row in orders_all has the same order, a different one has the same name, and a third one has the same jurisdiction. You are very likely to delete way more than you want to. Instead, this would be more appropriate:
DELETE FROM `orders_new`
WHERE (`order`, `name`, `jurisdiction`) IN (
SELECT `order`, `name`, `jurisdiction`
FROM `orders_all`
)
or maybe
DELETE FROM `orders_new`
WHERE EXISTS (
SELECT 1
FROM `orders_all` AS oa
WHERE oa.`order` = `orders_new`.`order`
AND oa.`name` = `orders_new`.`name`
AND oa.`jurisdiction` = `orders_new`.`jurisdiction`
)
You should convert that to a DELETE ... JOIN construct like:
DELETE `orders_new`
FROM `orders_new`
INNER JOIN `orders_all` ON `orders_new`.`order` = `orders_all`.`order`
AND `orders_new`.`name` = `orders_all`.`name`
AND `orders_new`.`jurisdiction` = `orders_all`.`jurisdiction`;
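If this cleanup runs on every batch, a compound index on the three columns in `orders_all` should keep the join fast (a sketch; the index name is arbitrary):
ALTER TABLE `orders_all`
  ADD KEY `idx_order_name_jurisdiction` (`order`, `name`, `jurisdiction`);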
What is wrong with this MySQL query?
SELECT EXISTS
(SELECT * FROM table WHERE deleted_at IS NULL AND the_date = '$the_date' AND company_name = '$company_name' AND purchase_country = '$p_country' AND lot = '$lot_no') AS numofrecords
It is still allowing duplicate inserts (1 out of 1000 records). Around 100 users are making entries, so the traffic is not that big, I assume. I do not have access to the database metrics, so I cannot be sure.
The EXISTS condition is used in a WHERE clause. In your case, the outer SELECT doesn't specify a table or a condition.
One example:
SELECT *
FROM customers
WHERE EXISTS (SELECT *
FROM order_details
WHERE customers.customer_id = order_details.customer_id);
Try to put your statement like this, and if it returns the data duplicated, just use a DISTINCT (SELECT DISTINCT * ...).
Another approach for you:
INSERT INTO your_table
SELECT * FROM `table` GROUP BY your_column_you_want_to_deduplicate;
The answer from @Nick gave the clues to solve the issue. A separate EXISTS check followed by an INSERT was not the best way: two users could both get 0 from the check and then both INSERT. A single-statement query with INSERT ... ON DUPLICATE KEY UPDATE ... was the way to go.
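A minimal sketch of that pattern, assuming a unique key over the columns that define a duplicate (the table name your_table and the sample values are placeholders; the columns come from the query above):
-- One-time setup: make the duplicate-defining columns a unique key.
ALTER TABLE your_table
  ADD UNIQUE KEY uq_entry (the_date, company_name, purchase_country, lot);
-- A single atomic statement replaces the separate EXISTS check and INSERT;
-- a duplicate turns into a no-op update instead of a second row.
INSERT INTO your_table (the_date, company_name, purchase_country, lot)
VALUES ('2016-01-01', 'Acme Ltd', 'SE', 'LOT-1')
ON DUPLICATE KEY UPDATE lot = VALUES(lot);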
I need help with this MySQL query that takes too long to execute, or does not execute at all.
(What I am trying to do is part of a more complex problem, where I want to create a PHP cron script that will execute a few heavy queries, calculate data from the results returned, and then store that data in the database for further, more convenient use. Most likely I will ask a question here about that process.)
First, let's try to solve one of the problems with these heavy queries.
Here is the thing:
I have a table: users_bonitet. This table has the fields: id, user_id, bonitet, tstamp.
First important note: when I say user, please understand that users are actually companies, not people. So user.id is the id of some company, but for other reasons the table that I am using here is called "users".
Three key fields in the users_bonitet table are: user_id (referencing user.id), bonitet (represents the strength of a user; it can have the values 1, 2, or 3, where 3 is the best), and tstamp (stores the time of the bonitet insert; every time the bonitet value changes for some user, a new row is inserted with the tstamp of that insert and, of course, the new bonitet value). So basically some user can have a bonitet of 1, indicating that he is in a bad situation, but after some time it can change to 3, indicating that he is doing great, and the time of that change is stored in tstamp.
Now, I will just list the other tables that we need in the query, and then I will explain why. The tables are: user, club, club_offer and club_territories.
Some users (companies) are members of a club. A member of the club can have some club offers (he is presenting his products to the public and to other club members) and operates on some territory.
What I need to do is get the bonitet value for every club offer (made by some user who is a member of a club), but only for the specific territory with id 1100000. Since bonitet values change over time for each user, I need to get only the latest one. So if some user had a bonitet of 1 on 21.01.2012, but it later changed to 2 on 26.05.2012, I need to get only the 2, since that is the current value.
I made an SQL Fiddle with an example db schema and the query that I am using right now. On this small database the query does what I want and is fast, but on the real database it is very slow, and sometimes does not finish at all.
See it here: http://sqlfiddle.com/#!9/b0d98/2
My question is: am I using the wrong query to get all this data? I am getting the right result, but maybe my query is bad and that is why it executes so slowly? How can I speed it up? I have tried adding indexes using phpMyAdmin, but it didn't help very much.
Here is my query:
SELECT users_bonitet.user_id, users_bonitet.bonitet, users_bonitet.tstamp,
       club_offer.id AS offerId, club_offer.rank
FROM users_bonitet
INNER JOIN (
    SELECT MAX(tstamp) AS lastDate, user_id
    FROM users_bonitet
    GROUP BY user_id
) lastDate ON users_bonitet.tstamp = lastDate.lastDate
          AND users_bonitet.user_id = lastDate.user_id
JOIN users ON users_bonitet.user_id = users.id
JOIN club ON users.id = club.user_id
JOIN club_offer ON club.id = club_offer.club_id
JOIN club_territories ON club.id = club_territories.club_id
WHERE club_territories.territory_id = 1100000
So I am selecting bonitet values for all club offers made by users that are members of a club and operate on the territory with id 1100000. The important thing is that I am selecting club_offer.id AS offerId, because I need that offerId in my application code to do some calculations based on the bonitet values returned for each offer, and to insert the calculated data into the "club_offer.rank" field for each row with the id of offerId.
Your query looks fine. I suspect your query performance may be improved if you add a compound index to help the subquery that finds the latest entry from users_bonitet for each user.
The subquery is:
SELECT max( tstamp ) AS lastDate, user_id
FROM users_bonitet
GROUP BY user_id
If you add (user_id, tstamp) as an index to this table, that subquery can be satisfied with a very efficient loose index scan.
ALTER TABLE users_bonitet ADD KEY maxfinder (user_id, tstamp);
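To confirm that the index is actually used, you can EXPLAIN the subquery on its own; with a loose index scan you would expect to see "Using index for group-by" in the Extra column:
EXPLAIN
SELECT max(tstamp) AS lastDate, user_id
FROM users_bonitet
GROUP BY user_id;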
Notice that if this users_bonitet table had an autoincrementing id number in it, your subquery could be refactored to use that instead of tstamp. That would eliminate the possibility of duplicates and be even more efficient, because there's a unique id for joining. Like so:
FROM users_bonitet
INNER JOIN (
    SELECT MAX(id) AS id
    FROM users_bonitet
    GROUP BY user_id
) ubmax ON users_bonitet.id = ubmax.id
In this case your compound index would be (user_id, id).
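Created the same way as before (a sketch, assuming the autoincrementing column is simply named id):
ALTER TABLE users_bonitet ADD KEY maxfinder_by_id (user_id, id);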
Pro tip: Don't add lots of indexes unless you know you need them. It's a good idea to read up on how indexes can help you. For example: http://use-the-index-luke.com/
Ok, I'm drawing a blank here and am in dire need of your help!
3 tables:
matches (id, goals_slot_1, goals_slot_2, won, draw)
teams (id, name, score_for, score_against, won, lost, draw, points)
team-match (junction table) (team_id, match_id)
So what I want to achieve is to update the 'draw' column in the teams table, setting it to the SUM(draw) from the matches table for the corresponding teams.
The value of 'draw' in the matches table is '1' when the match is a draw, '0' when not.
I just can't figure it out anymore. Stuck on it for days...
Can someone put me on the right track?
You would need to use a correlated subquery to get the values from the other tables. Something like:
UPDATE `teams`
SET `draw`=(SELECT SUM(`draw`)
FROM `matches`
WHERE `id` IN (SELECT `match_id`
FROM `team-match`
WHERE `team_id`=`teams`.`id`))
Or even a single subquery with a join:
UPDATE `teams`
SET `draw`=(SELECT SUM(`draw`)
FROM `matches`
JOIN `team-match`
ON `team-match`.`match_id`=`matches`.`id`
WHERE `team-match`.`team_id`=`teams`.`id`)
Both should do the job. I would assume the first is better for performance, but I haven't tested, and really they should be within a few milliseconds of each other. Other than this, you would need to use PHP to query the values and update the individual rows. Really though, the won/lost/draw columns could be calculated on the fly with similar performance, and then you wouldn't have to update the values after every match.
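For example, the draw count per team could be computed at read time from the tables in the question instead of being stored (a sketch; the junction table needs backticks because of the hyphen in its name):
SELECT t.id, t.name, COALESCE(SUM(m.draw), 0) AS draws
FROM teams t
LEFT JOIN `team-match` tm ON tm.team_id = t.id
LEFT JOIN matches m ON m.id = tm.match_id
GROUP BY t.id, t.name;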
I have a table which has several thousand records.
I want to update all the records which have a duplicate firstname.
How can I achieve this with a single query?
Sample table structure:
Fname varchar(100)
Lname varchar(100)
Duplicates int
This Duplicates column must be updated with the total number of duplicates, in a single query.
Is this possible without running in a loop?
update `table` as t1
inner join (
    select fname, count(fname) as total
    from `table`
    group by fname
) as t2 on t1.fname = t2.fname
set t1.duplicates = t2.total
Are you absolutely sure you want to store the number of these so-called duplicates? If not, it's a rather simple query:
SELECT fname, COUNT(1) AS number FROM yourtable GROUP BY fname;
I don't see why you would want to store that number, though. What if another record is inserted? What if records are deleted? The "number of duplicates" will remain the same, and therefore will become incorrect at the first mutation.
Create the column first, then write a query like:
UPDATE table SET table.duplicates = (SELECT COUNT(*) FROM table r GROUP BY Fname/Lname/some_id)
Maybe this other SO will help?
How do I UPDATE from a SELECT in SQL Server?
You might not be able to do this. You can't update the same table that you are selecting from in the same query.
I have a table with 8 columns, but over time I have picked up numerous duplicates. I have looked at the other question with a similar topic, but it does not solve the issue I am currently having.
+---------------------------------------------------------------------------------------+
| id | market | agent | report_name | producer_code | report_date | entered_date | sync |
+---------------------------------------------------------------------------------------+
What defines a unique entry is based on the market, agent, report_name, producer_code, and report_date fields. What I am looking for is a way to list all the duplicate entries and delete them. Or to just delete the duplicate entries.
I have thought about doing it with a script, but the table contains 2.5 million entries, and the time it would take would be unfeasible.
Could anybody suggest any alternatives? I have seen people get a list of duplicates using the following query, but I'm not sure how to adapt it to my situation:
SELECT id, count(*) AS n
FROM table_name
GROUP BY id
HAVING n > 1
Here are two strategies you might think about. You will have to adjust the columns used to select duplicates based upon what you actually consider a duplicate. I just included all of your listed columns other than the id column.
The first simply creates a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table. Of course, if you have any foreign key constraints you'll have to deal with those as well.
create table table_copy like table_name;
insert into table_copy
(id, market, agent, report_name, producer_code, report_date, entered_date, sync)
select min(id), market, agent, report_name, producer_code, report_date,
entered_date, sync
from table_name
group by market, agent, report_name, producer_code, report_date,
entered_date, sync;
rename table table_name to table_old, table_copy to table_name;
drop table table_old;
The second strategy, which just deletes the duplicates, uses a temporary table to hold the information about what rows have duplicates since MySQL won't allow you to select from the same table you are deleting from in a subquery. Simply create a temporary table with the columns that identify the duplicates plus an id column that will actually hold the id to keep and then you can do a multi-table delete where you join the two tables to select just the duplicates.
create temporary table dups
select min(id) as id, market, agent, report_name, producer_code, report_date,
entered_date, sync
from table_name
group by market, agent, report_name, producer_code, report_date,
entered_date, sync
having count(*) > 1;
delete t
from table_name t, dups d
where t.id != d.id
and t.market = d.market
and t.agent = d.agent
and t.report_name = d.report_name
and t.producer_code = d.producer_code
and t.report_date = d.report_date
and t.entered_date = d.entered_date
and t.sync = d.sync;
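With either strategy, you can verify afterwards that no duplicate groups remain (same grouping columns as above):
select market, agent, report_name, producer_code, report_date,
       entered_date, sync, count(*) as n
from table_name
group by market, agent, report_name, producer_code, report_date,
         entered_date, sync
having n > 1;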
You can find the dupes, based on your "key" fields, by doing:
select min(id) as keep_id, count(*) as row_count
from table_name
group by market, agent, report_name, producer_code, report_date
having row_count > 1
which you could then use in a delete script. Of course, you'd have to be very careful doing this, as it identifies every duplicated grouping, and you'd want to keep at least ONE of the rows from each grouping (min(id) being a convenient one to keep).
Another easy way would be to (see the sketch after this list):
1. create a new table
2. put a UNIQUE index on the fields you need to be unique (a primary key is a special kind of unique index)
3. use INSERT IGNORE INTO newtable SELECT * FROM oldtable (with an ORDER BY if you want the last/first records to remain, should there be a difference in the other columns)
4. DROP the old table and RENAME the new table to the old table
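A sketch of those steps against the table from the question (assuming the combined unique key fits within MySQL's index-length limits):
create table table_new like table_name;
alter table table_new
  add unique key uq_report (market, agent, report_name, producer_code, report_date);
-- The first row inserted per duplicate group wins, so ordering by id keeps the lowest id.
insert ignore into table_new
select * from table_name
order by id;
rename table table_name to table_old, table_new to table_name;
-- Once you are satisfied everything worked:
drop table table_old;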
You may also put a PRIMARY KEY (or another unique index) on the columns the unique entries are based on; this will prevent new records with duplicate details from being added.