PHP / MySQL delete DUPLICATE rows and PREVENT duplicates in the future - php

I know this has been asked before, but I'm not a coder and cannot figure it out from other similar posts. I've spent over 5 hours trying to figure this out without success :( So I ask for your help.
1) Prevent Duplicates
I have a PHP script that writes to DB. Here is the code:
$sql = "INSERT INTO results (total, size, persq, strip, material, region)
VALUES ('$total', '$size', '$persq', '$strip', '$material', '$region')";
I want to prevent duplicate rows based on TOTAL and SIZE columns. So if a new entry matches value in TOTAL and SIZE, do not enter new row.
2) Delete Duplicates
I want to delete ALL existing duplicate rows from the DB, also based on the TOTAL and SIZE columns.
If a row duplicates another row in both TOTAL and SIZE, delete the entire row.
How do I do this?
PS - I've read that I can use the SQL IGNORE keyword to prevent future duplicates. Example (I've tried to structure it to work for my situation):
INSERT IGNORE INTO results ...;
Would something like this work? If so, please help me structure it (I'm new to PHP and MySQL).
Big thanks in advance.

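I think the easiest way to remove the duplicates is to use a CTAS (Create Table As Select) statement to create a temporary table for your data. Using GROUP BY, you can remove the duplicates. With the ONLY_FULL_GROUP_BY SQL mode disabled, MySQL just picks a value for the other fields from one of the rows that match the group. With that mode enabled (the default since MySQL 5.7.5) the SELECT * below is rejected, so either disable the mode for the session or wrap the other columns in ANY_VALUE().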
/* De-duplicate and copy all the data to a temporary table. */
CREATE TABLE Temp AS
SELECT * FROM results
GROUP BY total, size;
/* Delete all data from your current table. Truncate is faster but more dangerous. */
DELETE FROM results; /* TRUNCATE results; */
/* Insert the de-duplicated data back into your table. */
INSERT INTO results
SELECT * FROM Temp;
/* Drop the temporary table. */
DROP TABLE Temp;
After that you can add a unique constraint for total,size to prevent new duplicates.
ALTER TABLE results
ADD UNIQUE results_uni_total_size (total, size);
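For anyone who wants to try the whole flow end to end, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL so it runs without a server, and a cut-down results table with only three of the columns. SQLite spells the insert as INSERT OR IGNORE where MySQL uses INSERT IGNORE; the function names and the temp_results table name are made up for the example.

```python
import sqlite3


def dedupe_and_constrain(conn):
    """Keep one row per (total, size), then forbid new duplicates."""
    conn.executescript("""
        -- Copy one row per (total, size) group into a scratch table.
        CREATE TABLE temp_results AS
            SELECT * FROM results GROUP BY total, size;
        -- Empty the original table and load the de-duplicated rows back.
        DELETE FROM results;
        INSERT INTO results SELECT * FROM temp_results;
        DROP TABLE temp_results;
        -- From now on the database itself rejects duplicate pairs.
        CREATE UNIQUE INDEX results_uni_total_size ON results (total, size);
    """)


def insert_result(conn, total, size, persq):
    """Insert a row unless (total, size) already exists. True if inserted."""
    cur = conn.execute(
        # SQLite syntax; the MySQL equivalent is INSERT IGNORE INTO ...
        "INSERT OR IGNORE INTO results (total, size, persq) VALUES (?, ?, ?)",
        (total, size, persq),
    )
    conn.commit()
    return cur.rowcount == 1


def demo():
    """Build a small table with one duplicated pair and clean it up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (total INTEGER, size INTEGER, persq REAL)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [(100, 10, 10.0), (100, 10, 10.0), (200, 20, 10.0)],
    )
    conn.commit()
    dedupe_and_constrain(conn)
    return conn


if __name__ == "__main__":
    conn = demo()
    print(conn.execute("SELECT * FROM results").fetchall())
    print(insert_result(conn, 100, 10, 99.0))
```

Using placeholders (the ? marks) instead of pasting variables into the SQL string also avoids the quoting problems that the original PHP string interpolation can run into.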

If you have duplicate rows where EVERY column has a duplicate value, the easiest thing to do is create a new table and import all of the rows using a GROUP BY on every column. First create a new table with a unique key spanning all of the columns:
CREATE TABLE newresults (total INT NOT NULL, size ...,
UNIQUE KEY (total, size, persq, strip, material, region));
Then push clean values into the new table:
INSERT INTO newresults (total, size, persq, strip, material, region) SELECT total, size, persq, strip, material, region FROM results GROUP BY total, size, persq, strip, material, region;
That will give you a clean data set. The final thing you'll have to do is drop the old table and rename newresults to results:
DROP TABLE results;
RENAME TABLE mydatabase.newresults TO mydatabase.results;
Hope that helps...

Related

Transfer data from memory table to production one by criteria

I wonder whether I can transfer the data gathered in a memory table to the actual table using an SQL query only.
The two tables have the same structure, and the pk is products_id (int, AI).
The problem is that the matching criteria are completely different from the pk. The products are identified by 2 columns - barcode and company.
So, ignoring the pk entirely, I need to update the data if the actual table has a row with the same barcode and company, and insert a new record if there is none.
Tried this:
INSERT INTO products (products_sku, ...)
SELECT products_sku... FROM temp_products
WHERE (temp_products.products_barcode = products.products_barcode) AND (temp_products.products_comp = products.products_comp)
But I don't have access to the products table inside the SELECT, so I can't do the filtering.
I think you need to add a unique key on products_barcode and products_comp:
ALTER TABLE products ADD UNIQUE KEY (products_barcode, products_comp);
Once you have it you can perform insert-or-update in one statement:
INSERT INTO products (/* all columns except the id */)
SELECT /* all columns except the id */
FROM temp_products
ON DUPLICATE KEY UPDATE some_field = VALUES(some_field), ...
/* list all columns except the id / barcode / comp */;
So when it meets a duplicate barcode/comp pair it will fall into the ON DUPLICATE KEY UPDATE branch and update the existing row instead of inserting. Read more about how it works: https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
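A runnable sketch of the same insert-or-update idea follows. It assumes SQLite (via Python's sqlite3) in place of MySQL, so the statement uses SQLite's ON CONFLICT ... DO UPDATE with excluded.column rather than MySQL's ON DUPLICATE KEY UPDATE with VALUES(column). Only three data columns are shown, and merge_products and demo are illustrative names, not anything from the question.

```python
import sqlite3


def merge_products(conn):
    """Copy temp_products into products, updating rows whose key pair exists."""
    conn.execute("""
        INSERT INTO products (products_sku, products_barcode, products_comp)
        SELECT products_sku, products_barcode, products_comp
        FROM temp_products
        WHERE true  -- SQLite needs a WHERE clause here to parse the upsert
        ON CONFLICT (products_barcode, products_comp)
        DO UPDATE SET products_sku = excluded.products_sku
    """)
    conn.commit()


def demo():
    """Set up both tables with one overlapping and one new product."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (
            products_id INTEGER PRIMARY KEY AUTOINCREMENT,
            products_sku TEXT,
            products_barcode TEXT,
            products_comp TEXT,
            UNIQUE (products_barcode, products_comp)
        );
        CREATE TABLE temp_products (
            products_id INTEGER PRIMARY KEY AUTOINCREMENT,
            products_sku TEXT,
            products_barcode TEXT,
            products_comp TEXT
        );
        INSERT INTO products (products_sku, products_barcode, products_comp)
            VALUES ('old-sku', '111', 'acme');
        INSERT INTO temp_products (products_sku, products_barcode, products_comp)
            VALUES ('new-sku', '111', 'acme'), ('sku-2', '222', 'acme');
    """)
    merge_products(conn)
    return conn


if __name__ == "__main__":
    conn = demo()
    print(conn.execute("SELECT * FROM products ORDER BY products_id").fetchall())
```

The id column is left out of both the column list and the SELECT, so the existing row keeps its products_id and only the non-key data is overwritten.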

Mysql Insert .... select, obtain last insert ID

I have this query in php. It's an insert select copying from table2, but I need to get the IDs of the newly created rows and store them into an array. Here is my code:
$sql = "INSERT INTO table1 SELECT distinct * from table2";
$db->query($sql);
I could reverse the flow, starting with a select on table2 and doing single inserts one by one, but that would slow down the script on a big table. Ideas?
You could lock the table, insert the rows, and get the ID of the last item inserted, and then unlock; that way you know that the IDs will be contiguous as no other concurrent user could have changed them. Locking and unlocking is something you want to use with caution though.
An alternative approach could be to use one of the columns in the table - either an 'updated' datetime column, or an insert-id column (for which you put in a value that will be the same across all of your rows.)
That way you can do a subsequent SELECT of the IDs back out of the database matching either the updated time or your chosen insert ID.
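The second approach could look roughly like the sketch below. It is written against SQLite through Python's sqlite3 only so that it runs standalone; the batch_id column, the name column and the function name are assumptions made for the example, and the same two statements should carry over to MySQL with the placeholder style of your driver.

```python
import sqlite3
import uuid


def copy_with_batch_tag(conn):
    """Copy distinct rows from table2 into table1 and return the new IDs."""
    # One random tag shared by every row inserted in this run.
    batch_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO table1 (name, batch_id) "
        "SELECT DISTINCT name, ? FROM table2",
        (batch_id,),
    )
    conn.commit()
    # Read the generated IDs back by filtering on the tag.
    rows = conn.execute(
        "SELECT id FROM table1 WHERE batch_id = ? ORDER BY id", (batch_id,)
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Create both tables, with one row already present in table1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT,
            batch_id TEXT
        );
        CREATE TABLE table2 (name TEXT);
        INSERT INTO table1 (name, batch_id) VALUES ('existing', 'earlier-run');
        INSERT INTO table2 (name) VALUES ('a'), ('b'), ('a');
    """)
    return conn


if __name__ == "__main__":
    print(copy_with_batch_tag(demo()))
```

Because the IDs are fetched by tag rather than inferred from a range, this does not rely on the inserted IDs being contiguous and needs no table lock.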

Controlled table querying based on existing rows

I have a product_group table with the following fields: group_id, product_id, order. The table will be queried against a lot: a single-form view will make it possible to insert new records and/or update existing ones with one submit.
I'm trying to figure out optimal solution to cover the following 3 cases:
User tries to insert an existing row: do nothing. Here a unique index of the 3 columns can be useful.
User changes only the order column: perform an update.
User inserts a completely new set of values: perform an insert.
Is there a way to put all of this together in one MySQL query? If not, what would be the best approach here? The goal is to limit database queries as much as possible.
Does this do what you want?
insert into product_group(group_id, product_id, `order`)
values (@group_id, @product_id, @order)
on duplicate key update `order` = values(`order`);
Along with a unique index on group_id, product_id:
create unique index idx_product_group_2 on product_group(group_id, product_id);
This handles your three cases:
Inserting an existing row changes nothing, because the value assignment is a no-op if the values are the same.
The order column will be updated if the other two have the same value.
A new row that has a different group_id or product_id will be inserted.
As a note, order is a lousy name for a column, because it is a SQL key word.
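To see the three cases play out, here is a small sketch. It assumes SQLite via Python's sqlite3 instead of MySQL, so the upsert is written as ON CONFLICT ... DO UPDATE (SQLite's counterpart to ON DUPLICATE KEY UPDATE) and the reserved word is quoted with double quotes instead of backticks. The save helper is a made-up name.

```python
import sqlite3

# SQLite form of the upsert; MySQL would use ON DUPLICATE KEY UPDATE instead.
UPSERT = """
    INSERT INTO product_group (group_id, product_id, "order")
    VALUES (?, ?, ?)
    ON CONFLICT (group_id, product_id)
    DO UPDATE SET "order" = excluded."order"
"""


def save(conn, group_id, product_id, order):
    """Insert the row, or update its order if the pair already exists."""
    conn.execute(UPSERT, (group_id, product_id, order))
    conn.commit()


def demo():
    """Run the three cases from the question against an empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE product_group '
        '(group_id INTEGER, product_id INTEGER, "order" INTEGER)'
    )
    conn.execute(
        "CREATE UNIQUE INDEX idx_product_group_2 "
        "ON product_group (group_id, product_id)"
    )
    save(conn, 1, 10, 1)  # completely new values: inserted
    save(conn, 1, 10, 1)  # identical row: nothing changes
    save(conn, 1, 10, 5)  # only the order differs: updated in place
    save(conn, 2, 10, 1)  # different group_id: inserted
    return conn.execute(
        'SELECT group_id, product_id, "order" FROM product_group '
        "ORDER BY group_id, product_id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```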

how to check whether value exists or not in mysql database column while inserting

I have a contactnumber column in a MySQL database. The contactnumber column has more than 20,000 entries. Now when I upload new numbers through a .csv file, I don't want duplicate numbers in the database.
How can I avoid duplicate numbers while inserting into the database?
I initially implemented logic that checks each number in the .csv file against each of the numbers in the database.
This works, but it takes a lot of time to upload a .csv file containing 1000 numbers.
Please suggest how to minimize the time required to upload the .csv file while not uploading duplicate values.
Simply add a UNIQUE constraint to the contactnumber column:
ALTER TABLE `mytable` ADD UNIQUE (`contactnumber`);
From there you can use the IGNORE option to ignore the error you'd usually be shown when inserting a duplicate:
INSERT IGNORE INTO `mytable` VALUES ('0123456789');
Alternatively, you could use the ON DUPLICATE KEY UPDATE to do something with the dupe, as detailed in this question: MySQL - ignore insert error: duplicate entry
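As a rough illustration of the upload path, the sketch below reads numbers from a CSV and lets the unique constraint do the duplicate check. It assumes SQLite through Python's sqlite3 as a stand-in for MySQL (so the statement is INSERT OR IGNORE rather than INSERT IGNORE), that the number is the first field on each CSV line, and the function names are invented for the example.

```python
import csv
import io
import sqlite3


def import_numbers(conn, csv_file):
    """Insert numbers from a CSV file object, skipping existing ones.

    Returns how many rows were actually inserted.
    """
    before = conn.total_changes
    # First field of every non-empty line, trimmed of surrounding spaces.
    rows = ((row[0].strip(),) for row in csv.reader(csv_file) if row)
    conn.executemany(
        # SQLite syntax; the MySQL equivalent is INSERT IGNORE INTO ...
        "INSERT OR IGNORE INTO mytable (contactnumber) VALUES (?)", rows
    )
    conn.commit()
    return conn.total_changes - before


def demo():
    """Create the table with a UNIQUE column and one stored number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (contactnumber TEXT UNIQUE)")
    conn.execute("INSERT INTO mytable VALUES ('0123456789')")
    conn.commit()
    return conn


if __name__ == "__main__":
    upload = io.StringIO("0123456789\n0987654321\n0987654321\n")
    print(import_numbers(demo(), upload))
```

The database looks each number up in the unique index, so the script never has to compare the file against all existing rows itself, and duplicates inside the file are skipped the same way.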
If your contactnumber should not be repeated then make it PRIMARY or at least a UNIQUE key. That way when a value is being inserted as a duplicate, insert will fail automatically and you won't have to check beforehand.
The way I would do it is to create a temporary table.
create table my_dateasyyyymmddhhiiss as select * from mytable where 1=0;
Do your inserts into that table.
Then query out the orphans between mytable and the temp table based on contactnumber (the numbers that exist only in the temp table) and insert those.
Then run an inner join query between the two tables to fetch the duplicates for your telecaller tracking.
Finally, drop the temporary table.
One thing this does not address is duplicates within the supplied file (I don't know if that would be an issue in this problem).
Hope this helps.
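If you don't want to insert duplicate values into the table and would rather keep those values in a different table, you can create a trigger on the table. Note that the trigger as written only records the duplicate in table2; it does not by itself block the insert into table1.
Like this: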
DELIMITER $$
CREATE TRIGGER unique_key BEFORE INSERT ON table1
FOR EACH ROW BEGIN
DECLARE c INT;
SELECT COUNT(*) INTO c FROM table1 WHERE itemid = NEW.itemid;
IF (c > 0) THEN
insert into table2 (column_name) values (NEW.itemid);
END IF;
END$$
DELIMITER ;
I would recommend this approach:
Make the contactnumber column a UNIQUE KEY.
Using phpMyAdmin, import the .csv file and check the option 'Do not abort on INSERT error' under Format-Specific Options before submitting.

group by mysql option

I am writing a converter to transfer data from old systems to new systems. I am using php+mysql.
I have one table that contains millions of records with duplicate entries. I want to transfer that data to a new table and remove all duplicate entries. I am using the following queries and pseudo code to perform this task:
select *
from table1
insert into table2
ON DUPLICATE KEY UPDATE customer_information = concat('$firstName',',','$lastName')
It takes ages to process one table :(
I am wondering whether it is possible to use GROUP BY and get all grouped records automatically,
rather than going through each record and checking for duplicates.
For example
select *
from table1
group by firstName, lastName
insert into table 2 only one record and add all users'
first last name into column ALL_NAMES with comma
EDIT
There are different records for each customers with different information. Each row is called duplicated if first and last name of user is same. In new table, we will just add one customer and their bought product in different columns (we have only 4 products).
I don't know what you are trying to do with customer_information, but if you just want to transfer the non-duplicated set of data from one table to another, this will work:
INSERT IGNORE INTO table2(field1, field2, ... fieldx)
SELECT DISTINCT field1, field2, ... fieldx
FROM table1;
DISTINCT will take care of rows that are exact duplicates. But if you have rows that are only partial duplicates (like the same last and first names but a different email) then IGNORE can help. If you put a unique index on table2(lastname,firstname) then IGNORE will make sure that only the first record with lastnameX, firstnameY from table1 is inserted. Of course, you might not like which record of a pair of partial duplicates is chosen.
ETA
Now that you've updated your question, it appears that you want to put the values of multiple rows into one field. This is, generally speaking, a bad idea because when you denormalize your data this way you make it much less accessible. Also, if you are grouping by (lastname, firstname), each group contains only one name, so there would be nothing to combine in allnames. Because of this, my example uses allemails instead. In any event, if you really need to do this, here's how:
INSERT INTO table2(lastname, firstname, allemails)
SELECT lastname, firstname, GROUP_CONCAT(email) as allemails
FROM table1
GROUP BY lastname, firstname;
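A small runnable version of that query is sketched below, assuming SQLite via Python's sqlite3 rather than MySQL. SQLite happens to ship a group_concat aggregate as well, so the statement is nearly identical. The sample rows and function names are invented, and the order of the emails inside the concatenated string is not guaranteed.

```python
import sqlite3


def merge_customers(conn):
    """Collapse table1 into one row per customer with all emails joined."""
    conn.execute("""
        INSERT INTO table2 (lastname, firstname, allemails)
        SELECT lastname, firstname, group_concat(email)
        FROM table1
        GROUP BY lastname, firstname
    """)
    conn.commit()


def demo():
    """Two rows for one customer and a single row for another."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (lastname TEXT, firstname TEXT, email TEXT);
        CREATE TABLE table2 (lastname TEXT, firstname TEXT, allemails TEXT);
        INSERT INTO table1 VALUES
            ('Smith', 'John', 'john@example.com'),
            ('Smith', 'John', 'jsmith@example.com'),
            ('Jones', 'Amy', 'amy@example.com');
    """)
    merge_customers(conn)
    return conn


if __name__ == "__main__":
    conn = demo()
    print(conn.execute("SELECT * FROM table2 ORDER BY lastname").fetchall())
```

This runs as a single set-based statement, which is usually much faster than looping over the source rows from the application one at a time.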
If they are really duplicate rows (every field is the same) then you can use:
select DISTINCT * from table1
instead of :
select * from table1
