MySQL: fix fully duplicate row - php

I have a table tags and I forgot to set the id column as the primary key when I created it.
Now I am facing a problem with duplicate rows.
tags table:
id  text
1   man
2   ball
2   ball
2   ball
3   love
3   love
4   heart
4   heart
How do I remove the duplicates and then set id as the primary key?
Expected result (the new tags table):
id  text
1   man
2   ball
3   love
4   heart

I think the easiest way is to create a temporary table with the deduplicated data and then reload it:
create temporary table tags_temp as
select distinct id, text
from tags;
truncate table tags;
alter table tags add primary key (id);
insert into tags(id, text)
select id, text
from tags_temp;
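One caveat (my addition): select distinct only collapses rows that are identical in every column, so if the same id ever appeared with two different text values, the final insert would still violate the new primary key. A defensive variant of the first step, assuming it is acceptable to keep an arbitrary text per id:
create temporary table tags_temp as
select id, min(text) as text  -- keep exactly one text value per id
from tags
group by id;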

What I would do is create a new table, add the key, insert the data from the old table, then drop tags and rename the temporary table:
/* Make a copy of the database table (including indexes) */
create table tags_tmp like tags;
/* Add the primary key to the table */
alter table tags_tmp add primary key (id);
/* Insert the data from the bad table and ignore any duplicates */
insert ignore into tags_tmp (id, text)
select id, text from tags;
/* Drop the bad table */
drop table tags;
/* Rename the temporary table to the original name */
rename table tags_tmp to tags;
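If the table is in use, a variant of the last two steps (just a sketch) swaps both names in a single atomic statement instead of dropping first, so there is no moment where tags does not exist:
/* Swap old and new atomically, then discard the old data once verified */
rename table tags to tags_old, tags_tmp to tags;
drop table tags_old;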

First, I created your table and inserted the data:
mysql> select * from tags;
+----+-------+
| id | text  |
+----+-------+
|  1 | man   |
|  2 | ball  |
|  2 | ball  |
|  2 | ball  |
|  3 | love  |
|  3 | love  |
|  4 | heart |
|  4 | heart |
+----+-------+
8 rows in set (0.00 sec)
I back up only the distinct entries:
mysql> create table T as select distinct * from tags;
Query OK, 4 rows affected (0.27 sec)
Records: 4 Duplicates: 0 Warnings: 0
I no longer need the original table, so I drop it from the database:
mysql> drop table tags;
Query OK, 0 rows affected (0.12 sec)
I rename the previous backup table:
mysql> rename table T to tags;
Query OK, 0 rows affected (0.08 sec)
Now it is time to add the PRIMARY KEY constraint to our table:
mysql> alter table tags add primary key(id);
Query OK, 0 rows affected (0.48 sec)
Records: 0 Duplicates: 0 Warnings: 0
Now, let us test if what we did is correct. First, let us display the data:
mysql> select * from tags;
+----+-------+
| id | text  |
+----+-------+
|  1 | man   |
|  2 | ball  |
|  3 | love  |
|  4 | heart |
+----+-------+
4 rows in set (0.00 sec)
Let's try to add a row with id = 4:
mysql> insert into tags values(4,'proof');
ERROR 1062 (23000): Duplicate entry '4' for key 'PRIMARY'
Conclusion: what we did is correct.
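As an aside, on MySQL versions before 5.7 the whole clean-up could be collapsed into one statement, because ALTER IGNORE silently discarded rows that violated the new key. The syntax was removed in MySQL 5.7, so treat this as a legacy sketch only:
/* Pre-5.7 only: keeps the first row per id and drops the rest */
alter ignore table tags add primary key (id);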


load data infile order by line

Hi :)
I need to insert 13,500 rows with 500+ columns from a CSV file.
So I use LOAD DATA INFILE and it works.
But I need the rows in my MySQL database to be in exactly the same order as in my CSV.
Currently, for example, line 1000 of the CSV can end up at position 800 in my database.
I need something like "ORDER BY column1", but I can't find the trick.
Thanks for your help.
PS: I have a composite primary key of two columns (product references), and the values are not in numerical order (e.g. 1, 8, 4, etc.)
EDIT : My code
$dataload = 'LOAD DATA LOCAL INFILE "'.dirname(__FILE__).'/../../../../bo/csv/'.$nomfichier.'"
REPLACE
INTO TABLE gc_csv CHARACTER SET "latin1"
FIELDS TERMINATED BY "\t"
IGNORE 1 LINES
';
I just take the CSV and use LOAD DATA LOCAL INFILE with it... and the order isn't perfectly respected; I don't know why...
My table design:
CREATE TABLE `gc_csv` (
`RefCatSYS` int(20) unsigned NOT NULL,
`IdProduit` int(15) unsigned NOT NULL,
`example` varchar(10) default NULL,
[...]
`example` varchar(4) default NULL,
PRIMARY KEY (`RefCatSYS`,`IdProduit`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Add an auto_increment column to your table, with DEFAULT NULL. When you load data with LOAD DATA INFILE, there will be no value for the column, so each row gets an automatically generated id. Then select the data ordered by that column.
kostja@annie:~$ sudo cat /var/lib/mysql/test/foo.csv
10
9
8
7
6
5
4
3
2
1
mysql> create table tmp (example int primary key, id int unique auto_increment default null);
Query OK, 0 rows affected (0.11 sec)
mysql> load data infile "foo.csv" into table tmp;
Query OK, 10 rows affected, 10 warnings (0.03 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 10
mysql> select * from tmp;
+---------+----+
| example | id |
+---------+----+
|      10 |  1 |
|       9 |  2 |
|       8 |  3 |
|       7 |  4 |
|       6 |  5 |
|       5 |  6 |
|       4 |  7 |
|       3 |  8 |
|       2 |  9 |
|       1 | 10 |
+---------+----+
10 rows in set (0.00 sec)
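Reading the rows back in the original file order is then just a matter of ordering by the surrogate column:
select example from tmp order by id;  -- returns 10, 9, 8, ..., 1: the order of the lines in foo.csv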
The tables in relational databases are unordered collections of records. You can get the rows in a particular order when you run a query by using ORDER BY. If the query does not contain an ORDER BY clause, the server returns the rows in whatever order they sit on the storage medium, and this order can change as records are updated. You must never rely on it; always use ORDER BY (and indexes) to get a deterministic order of the rows in the result set.
LOAD DATA INFILE reads the lines from the CSV file and inserts them in the same order they are in the file. Apart from setting the value of an auto-incremented column (if there is one in the table), the order of the lines in the CSV file does not matter.
I have solved the problem. In fact, the web service sent me a CSV with a duplicated pair of the two primary key columns: on lines 2365 and 9798, RefCatSYS and IdProduit were the same. So LOAD DATA INFILE ... REPLACE replaced line 2365 with line 9798, and that changed the order.
I asked them to send a third, unique key column.
Thanks for your help, and sorry for the noise.

Improving an UPDATE query on large MySQL databases

I'm trying to update my quite large table (nearly 3 million rows) with the following query:
$length = strlen($this);
$query = "UPDATE database
SET row_to_update='1'
WHERE row='{$this}'
AND row_length='{$length}'
LIMIT 1";
It gets words ($this) from a file (quite a lot of them) and then searches for a match. If one is found, it sets row_to_update to 1 (the default is none).
Every row_length already contains the length of the value in the corresponding cell, which I thought might speed things up significantly. Sadly, it didn't.
It manages only ~30k queries in 8 hours. That's slow, to say the least!
Is there any way, I could improve this bit of inefficient code?
Try to collect a bunch of values you're looking for and use
UPDATE table SET row_to_update='1' WHERE row IN ({$my_values});
You can use EXPLAIN <your_query> (and EXPLAIN EXTENDED ...) to check whether it uses indexes or not, and adjust the query or create indexes to speed it up. Experiment with a SELECT using the same WHERE conditions that way.
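For instance, a minimal sketch (assuming the placeholder names `database`, row and row_length from the question):
EXPLAIN SELECT * FROM `database`
WHERE `row` = 'someword' AND `row_length` = 8;
-- if possible_keys comes back empty, add an index on the filtered columns:
ALTER TABLE `database` ADD INDEX idx_row_length (`row`, `row_length`);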
You can get much more using:
SET profiling = 1;
<your query goes here>
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;
Be careful with it if you're not in a dev environment.
Consider as well filling a temp table with the values you're interested in and using it like this:
UPDATE table SET row_to_update='1' WHERE row in (SELECT values FROM my_temp_table);
Once that works, you can improve it to:
UPDATE table INNER JOIN temp_table ON table.row = temp_table.row SET row_to_update = '1';
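Filling such a temp table might look like this (a sketch; the names temp_table and row are carried over from the snippets above, not from the asker's schema):
CREATE TEMPORARY TABLE temp_table (
  `row` VARCHAR(255) NOT NULL,
  INDEX (`row`)  -- index the join column so the UPDATE can use it
);
INSERT INTO temp_table (`row`) VALUES ('word1'), ('word2'), ('word3');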
EXAMPLES:
As you asked for examples, let's say the example table represents your original one with a lot of data in it. In this example I'll use just 4 rows:
mysql> select * from example;
+----+------+
| id | data |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
+----+------+
4 rows in set (0.00 sec)
Let's say you're looking for the ids of rows that have data = 'a', 'b', or 'c'.
You can do this in 3 ways:
1) SELECT ... IN (list)
mysql> select id from example where data in ('a', 'b', 'c');
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)
2) SELECT ... IN (SELECT ... FROM temp_table)
mysql> select * from temp_table;
+----+------+
| id | data |
+----+------+
| 10 | foo~ |
| 11 | a    |
| 12 | bar  |
| 13 | baz  |
| 14 | b    |
| 15 | c    |
+----+------+
6 rows in set (0.00 sec)
mysql> select id from example where data in (SELECT data from temp_table);
[..]
3 rows in set (0.00 sec)
3) SELECT ... INNER JOIN temp_table ...
mysql> select example.id from example inner join temp_table on example.data = temp_table.data;
[..]
3 rows in set (0.01 sec)
And when you're ready, use UPDATE with the same conditions to apply the changes you like.
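For example, the JOIN variant from above as an UPDATE might look like this (a sketch, assuming the example table also carries the question's row_to_update flag):
UPDATE example
INNER JOIN temp_table ON example.data = temp_table.data
SET example.row_to_update = '1';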

updating a row with duplicate values

If I have a table with 2 columns and I am going to update a column in a way that creates duplicate rows, and this table has a unique constraint as well, is there any way that, if a duplicate row would be created while I am updating, I can process that row instead?
Usually an inline SQL IF statement with some criteria, optional processing, and a self-join to detect duplication will do what you're looking for. The exact answer will be specific to your structure, but I will give an example for a table called user with a column called id, which is the primary key, and SSN, which has a unique constraint on it. We'll populate it with 2 users and then update one of them to duplicate the first one in the unique SSN column:
CREATE TABLE `test`.`user` (
`id` INT NOT NULL,
`SSN` VARCHAR(45) NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `ssn_UNIQUE` (`SSN` ASC));
INSERT INTO user VALUES (1, "1234567"), (2, "0123456");
As you may have noticed, if I run the following update while another user (id=1) already has SSN="1234567", then no update is made:
UPDATE user SET SSN="1234567" WHERE id=2;
ERROR 1062 (23000): Duplicate entry '1234567' for key 'ssn_UNIQUE'
However, consider the following instead:
UPDATE user u
LEFT JOIN user AS u2
ON u2.SSN="1234567"
SET u.SSN=IF(
u2.id IS NOT NULL,
CONCAT(u2.SSN, "duplicates", u2.id, "onto", u.id),
"1234567")
WHERE u.id=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
In the above example, the following scenarios could play out:
If user id=1 already has SSN="1234567", and I run the above update, the result will be:
SELECT * FROM test.user;
+----+-------------------------+
| id | SSN                     |
+----+-------------------------+
|  2 | 1234567duplicates1onto2 |
|  1 | 1234567                 |
+----+-------------------------+
2 rows in set (0.00 sec)
If I instead try to set the SSN to "01234567" and run the same update (with that value substituted in), the result will be:
SELECT * FROM test.user;
+----+----------+
| id | SSN      |
+----+----------+
|  2 | 01234567 |
|  1 | 1234567  |
+----+----------+
2 rows in set (0.00 sec)
If I had a 3rd user, that user might end up with the value "1234567duplicates1onto3" if two other users had similarly attempted to set their value to "1234567":
SELECT * FROM test.user;
+----+-------------------------+
| id | SSN                     |
+----+-------------------------+
|  1 | 1234567                 |
|  2 | 1234567duplicates1onto2 |
|  3 | 1234567duplicates1onto3 |
+----+-------------------------+
3 rows in set (0.00 sec)
As you can see, the "onto" part allows me to have many duplicates in the same update batch.
To adapt this technique, change the output of the inline IF to the formula you would use for processing, and make the JOIN criteria whatever detects duplication in your schema.
http://dev.mysql.com/doc/refman/5.1/en/control-flow-functions.html
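For instance, if "processing" should simply leave the conflicting row untouched, an adapted query might look like this (a sketch; the u2.id <> u.id self-exclusion is my addition, not part of the answer above):
UPDATE user u
LEFT JOIN user AS u2
  ON u2.SSN = "1234567" AND u2.id <> u.id  -- duplication detection, excluding the row being updated
SET u.SSN = IF(u2.id IS NOT NULL, u.SSN, "1234567")  -- on conflict, keep the current value
WHERE u.id = 2;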

delete duplicate rows that have blob text / mediumtext mysql

I have seen lots of posts on deleting duplicate rows using SQL commands, but I need to filter on a column that is MEDIUMTEXT.
I keep getting the error Error Code: 1170. BLOB/TEXT column used in key specification without a key length from solutions such as:
ALTER IGNORE TABLE foobar ADD UNIQUE (title, SID)
My table is simple; I need to check for duplicates in mytext. The id column is unique and AUTO_INCREMENT.
As a note, the table has about a million rows and all attempts keep timing out, so I need a solution that performs its actions in batches, such as WHERE id > 0 AND id < 100.
Also, I am using MySQL Workbench on Amazon RDS.
From a table like this:
+----+-------+-------+--------+---------+
| id | fname | lname | mytext | morevar |
+----+-------+-------+--------+---------+
|  1 | joe   | min   | abc    | 123     |
|  2 | joe   | min   | abc    | 123     |
|  3 | mar   | kam   | def    | 789     |
|  4 | kel   | smi   | ghi    | 456     |
+----+-------+-------+--------+---------+
I would like to end up with a table like this:
+----+-------+-------+--------+---------+
| id | fname | lname | mytext | morevar |
+----+-------+-------+--------+---------+
|  1 | joe   | min   | abc    | 123     |
|  3 | mar   | kam   | def    | 789     |
|  4 | kel   | smi   | ghi    | 456     |
+----+-------+-------+--------+---------+
Update: my table is very large and I keep getting the error Error Code: 1205. Lock wait timeout exceeded from this SQL command:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
Also, if anyone else is having issues with MySQL Workbench timing out, the fix is to go to Preferences -> SQL Editor and set a bigger value for this parameter:
DBMS connection read time out (in seconds)
OPTION #1: Delete all duplicate records, leaving one of each (e.g. the one with max(id)):
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAX(id)
FROM yourTable
GROUP BY mytext
)
You may prefer to use min(id) instead.
Depending on the engine used this won't work and, as it did here, will give you Error Code: 1093. You can't specify target table 'yourTable' for update in FROM clause. Why? Because deleting a record may change the result of the subquery, i.e. the max(id) value.
In this case, you could try using another subquery as a temporary table:
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAXID FROM
(
SELECT MAX(id) as MAXID
FROM yourTable
GROUP BY mytext
) as temp_table
)
OPTION #2: Use a temporary table, like in this example, or:
First, create a temp table with the max ids:
CREATE TEMPORARY TABLE tmpTable AS
SELECT MAX(id) AS MAXID
FROM yourTable
GROUP BY mytext;
Then execute the delete:
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAXID FROM tmpTable
);
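Since the question asks for batches to avoid the lock wait timeout, you can bound each delete with an id window and repeat, sliding the window forward (a sketch; the window size is arbitrary):
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAXID FROM tmpTable
)
AND id > 0 AND id < 100;  -- rerun with the next id range until the whole table is covered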
How about this? It deletes all the duplicate records from the table, keeping the lowest id of each group (the t1.id > t2.id condition is essential; without it every row matches itself and the whole table is deleted):
DELETE t1 FROM foobar t1, foobar t2 WHERE t1.id > t2.id AND t1.mytext = t2.mytext
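If you also want to keep future duplicates out despite the key-length error quoted in the question, one option (my suggestion, not from the thread) is a unique prefix index; note it only guards against duplicates in the first N characters of the column:
ALTER TABLE foobar ADD UNIQUE INDEX uniq_mytext (mytext(200));  -- 200 is an arbitrary prefix length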

PHP - Doctrine ORM not able to handle bit(1) types correctly?

UPDATE: I have filed a bug in Doctrine about this: http://www.doctrine-project.org/jira/browse/DC-400
I have the following Doctrine schema:
---
TestTable:
columns:
bitty: bit(1)
I have created the database and table for this. I then have the following PHP code:
$obj1 = new TestTable();
$obj1['bitty'] = b'0';
$obj1->save();
$obj2 = new TestTable();
$obj2['bitty'] = 0;
$obj2->save();
Clearly my intent is to save the bit value 0 in the bitty column.
However after running this PHP code I get the following odd results:
mysql> select * from test_table;
+----+-------+
| id | bitty |
+----+-------+
|  1 |       |
|  2 |       |
+----+-------+
2 rows in set (0.00 sec)
mysql> select * from test_table where bitty = 1;
+----+-------+
| id | bitty |
+----+-------+
|  1 |       |
|  2 |       |
+----+-------+
2 rows in set (0.00 sec)
mysql> select * from test_table where bitty = 0;
Empty set (0.00 sec)
Those boxes are the 0x01 character, i.e. Doctrine has set the value to 1, not 0.
However, I can insert 0's into that table directly from MySQL:
mysql> insert into test_table values (4, b'0');
Query OK, 1 row affected (0.00 sec)
mysql> select * from test_table where bitty = 0;
+----+-------+
| id | bitty |
+----+-------+
|  4 |       |
+----+-------+
1 row in set (0.00 sec)
What's going on? Is this a bug in Doctrine?
There is nothing in the Doctrine documentation that says bit is a legal type.
Doctrine does know the bit type - at least if you're using MySQL and generate Doctrine models from the existing tables.
I tried to read a few bit columns and dump the resulting objects. Basically the bit value returned is either \0 or \1, instead of 0 and 1 as I expected.
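A common workaround (my suggestion, not something stated in this thread) is to avoid BIT(1) altogether and use TINYINT(1), MySQL's BOOL alias, which comes back as a plain integer 0 or 1:
/* TINYINT(1) is what MySQL's BOOL maps to; values round-trip as ordinary integers */
ALTER TABLE test_table MODIFY bitty TINYINT(1) NOT NULL DEFAULT 0;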
