MySQL, how to delete duplicate data? - PHP

I have a table with some duplicate values and I want to remove them:
table1:
id | access | num
12 | 1144712030 | 101
13 | 1144712030 | 101
14 | 1154512035 | 102
15 | 1154512035 | 102
I would like to remove the duplicates so that I am left with:
id | access | num
12 | 1144712030 | 101
14 | 1154512035 | 102
Any idea how to do this with a MySQL command?
Thanks.

The simplest solution, I think, would be:
-- keep one row per (access, num): the one with the lowest id
CREATE TABLE new_table AS SELECT MIN(id) AS id, access, num FROM original_table GROUP BY access, num;
TRUNCATE TABLE original_table;
INSERT INTO original_table SELECT * FROM new_table;
DROP TABLE new_table;
Note:
I think some kind of cursor could be used, and maybe a temporary table. But you should be really careful.
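A quick way to check the keep-the-lowest-id idea before touching a real table is an in-memory run. This is a sketch that uses Python's built-in sqlite3 as a stand-in for MySQL, with the sample rows from the question. One assumption to note: SQLite has no TRUNCATE TABLE, so it uses an unqualified DELETE FROM instead. GROUP BY access, num together with MIN(id) keeps exactly one row per pair.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id INTEGER PRIMARY KEY, access INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?, ?)",
    [(12, 1144712030, 101), (13, 1144712030, 101),
     (14, 1154512035, 102), (15, 1154512035, 102)],
)

# Keep one row per (access, num): the one with the lowest id.
conn.execute(
    "CREATE TABLE new_table AS "
    "SELECT MIN(id) AS id, access, num FROM original_table GROUP BY access, num"
)
# SQLite has no TRUNCATE TABLE; an unqualified DELETE empties the table.
conn.execute("DELETE FROM original_table")
conn.execute("INSERT INTO original_table SELECT id, access, num FROM new_table")
conn.execute("DROP TABLE new_table")
conn.commit()

rows = conn.execute("SELECT id, access, num FROM original_table ORDER BY id").fetchall()
print(rows)  # [(12, 1144712030, 101), (14, 1154512035, 102)]
```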

If your table is called foo, rename it to foo_old and re-create foo with a structure identical to foo_old.
Then run a query with the DISTINCT operator on foo_old and insert the results into foo.

Do a quick search here for "DELETE DUPLICATE ROWS";
you'll find plenty of examples.

Related

SQL finding specific character in table

I have a table like this:
d_id | d_name | d_desc | sid
1 | flu | ... | 4,13,19
where sid is a VARCHAR. What I want is that when the user enters 4, 13 or 19, the query displays flu. However, my query only works when the user selects all of those values. Here is my query:
SELECT * FROM diseases where sid LIKE '%sid1++%'
I work with PHP and use a for loop to put the sid values inside the LIKE pattern; above I just wrote sid1++ to keep it simple. My query only works when all of the values are present. If, say, the user selects 4 and 19, the pattern becomes '%4,19%' and nothing is displayed. Thanks, all.
If you must do it this way, you can try FIND_IN_SET().
SELECT d_id, d_name, d_desc
FROM diseases
WHERE FIND_IN_SET(13,sid)<>0
But this query is not sargable: MySQL cannot use an index for it and has to scan and parse every row, so it gets slow quickly as the table grows. And the ICD10 list of disease codes contains almost 92,000 rows. You don't want your patient to die or get well before you finish looking up her disease. :-)
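To see why FIND_IN_SET() avoids the substring problem behind '%4,19%', here is a sketch that re-implements its documented behaviour in Python (1-based position in a comma-separated list, 0 if absent, NULL if either argument is NULL) and registers it as a SQL function on an in-memory SQLite database. The find_in_set and lookup helpers and the SQLite setup are illustrative assumptions; SQLite has no built-in FIND_IN_SET.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Hypothetical Python re-implementation of MySQL's FIND_IN_SET():
    1-based position of needle in a comma-separated list, 0 if absent,
    None (SQL NULL) if either argument is NULL."""
    if needle is None or haystack is None:
        return None
    if haystack == "":
        return 0
    items = haystack.split(",")
    needle = str(needle)  # MySQL converts a numeric first argument to a string
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
# Register the helper so the answer's query shape runs on SQLite.
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute("CREATE TABLE diseases (d_id INTEGER, d_name TEXT, d_desc TEXT, sid TEXT)")
conn.execute("INSERT INTO diseases VALUES (1, 'flu', '...', '4,13,19')")

def lookup(sid):
    return conn.execute(
        "SELECT d_name FROM diseases WHERE FIND_IN_SET(?, sid) <> 0", (sid,)
    ).fetchall()

print(lookup(13))               # [('flu',)]
print(lookup(1))                # [] : '1' is not a whole element of the list
print("4,19" in "4,13,19")      # False: why LIKE '%4,19%' finds nothing
```

Each element is compared whole, so 1 does not match inside 13 or 19, which is exactly what a LIKE substring pattern cannot guarantee.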
So you should create a separate table. Let's call it diseases_sid.
It will contain two columns. For your example, the contents will be:
d_id sid
1 4
1 13
1 19
If you want to find a row in your diseases table by sid, do this:
SELECT d.d_id, d.d_name, d.d_desc
FROM diseases d
JOIN diseases_sid ds ON d.d_id = ds.d_id
WHERE ds.sid = 13
That's what my colleagues are talking about in the comments when they mention normalization.
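A sketch of the normalized diseases_sid layout, run on in-memory SQLite as a stand-in for MySQL. The second disease (cold, sid 5) is made-up data and the diseases_for helper is hypothetical; it builds one placeholder per selected sid, so picking 4 and 19 matches diseases linked to either value, and DISTINCT keeps a disease from appearing twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diseases (d_id INTEGER PRIMARY KEY, d_name TEXT, d_desc TEXT);
-- One row per (disease, sid) pair instead of a comma-separated list.
CREATE TABLE diseases_sid (d_id INTEGER, sid INTEGER, PRIMARY KEY (d_id, sid));
INSERT INTO diseases VALUES (1, 'flu', '...'), (2, 'cold', '...');
INSERT INTO diseases_sid VALUES (1, 4), (1, 13), (1, 19), (2, 5);
""")

def diseases_for(*sids):
    # Match diseases linked to ANY of the selected sids.
    placeholders = ", ".join("?" * len(sids))  # e.g. "?, ?" for two sids
    sql = (
        "SELECT DISTINCT d.d_id, d.d_name FROM diseases d "
        "JOIN diseases_sid ds ON d.d_id = ds.d_id "
        f"WHERE ds.sid IN ({placeholders}) ORDER BY d.d_id"
    )
    return conn.execute(sql, sids).fetchall()

print(diseases_for(13))     # [(1, 'flu')]
print(diseases_for(4, 19))  # [(1, 'flu')]
```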

MySQL LIKE query: exclude numbers

I have a small problem with a PHP/MySQL query and I am looking for help.
I have a family tree table where, for each person, I store his/her ancestors' ids separated by commas, like so:
id ancestors
10 1,3,4,5
So the person with id 10 was fathered by id 5, who was fathered by id 4, who was fathered by id 3, etc.
Now I wish to select all the people who have id x among their ancestors, so the query will be something like:
select * from people where ancestors like '%x%'
Now this would work fine, except that if id x is, let's say, 2, and a record has an ancestor with id 32, this LIKE query will retrieve it, because 32 contains 2. And if I use '%,x,%' (including the commas), the query will miss records where x is at either edge (left or right) of the column. It will also miss records where x is the only ancestor, since no commas are present.
So, in short, I need a LIKE query that matches x either surrounded by commas or with nothing on that side. Or a regular expression that matches x only when no other digits are next to it. And I need it to be as efficient as possible (I'm bad at writing regular expressions).
Thank you.
Edit: Okay guys, help me come up with a better schema.
You are not storing your data properly. Anyway, if you still want to use this schema, you should use FIND_IN_SET instead of LIKE to avoid undesired results.
SELECT *
FROM mytable
WHERE FIND_IN_SET(2, ancestors) <> 0
You should consider redesigning your database structure. Add a new table "ancestors" to the database with the columns:
id id_person ancestor
1 10 1
2 10 3
3 10 4
Then use a JOIN, or a WHERE ... IN subquery, to select the right rows.
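A sketch of the suggested ancestors table with the WHERE ... IN variant, on in-memory SQLite as a stand-in for MySQL. The people table, the extra person 11 with ancestor 32, the index, and the people_descended_from helper are hypothetical; the point is that ancestor = 2 compares whole integers, so 32 can no longer match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ancestors (id INTEGER PRIMARY KEY, id_person INTEGER, ancestor INTEGER);
-- Lets "who has ancestor x?" be answered from the index.
CREATE INDEX ancestors_by_ancestor ON ancestors (ancestor, id_person);
INSERT INTO people VALUES (10, 'ten'), (11, 'eleven');
-- Person 10 has ancestors 1, 3, 4, 5; person 11 has ancestor 32.
INSERT INTO ancestors (id_person, ancestor)
  VALUES (10, 1), (10, 3), (10, 4), (10, 5), (11, 32);
""")

def people_descended_from(x):
    return conn.execute(
        "SELECT id FROM people WHERE id IN "
        "(SELECT id_person FROM ancestors WHERE ancestor = ?) ORDER BY id",
        (x,),
    ).fetchall()

print(people_descended_from(3))  # [(10,)]
print(people_descended_from(2))  # [] : ancestor 32 is not mistaken for 2
```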
You're having this issue because of the database design. Relational databases aren't a natural fit for this kind of data; graph databases are better suited to it.
If it contains only a small amount of data you could use MySQL, but the design is still wrong. If you only care about each person's father, just add a column to the person table (or whatever you call it): NULL means no father/unknown; otherwise it contains the (int) id of the parent.
In case you need more than just the father relationship, you could use a pivot table that stores the relationship between two persons, but that's not a simple task.
There are a few established ways of storing hierarchical data in RDBMS. I've found this slideshow to be very helpful in the past:
Models for Hierarchical Design
Since the data deals with ancestry - and therefore you wouldn't expect it to change that often - a closure table could fit the bill.
Whatever model you choose, be sure to look around and see if someone else has already implemented it.
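A minimal closure-table sketch for the family tree, assuming an in-memory SQLite stand-in and a hypothetical family_closure table and add_person helper. Each person gets a depth-0 self row, and adding a child copies every ancestor row of the parent one level deeper, so "everyone who has ancestor x" becomes a single lookup on the ancestor column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Closure table: one row for every (ancestor, descendant) pair at any depth,
# plus a self row at depth 0 for each person.
conn.execute(
    "CREATE TABLE family_closure (ancestor INTEGER, descendant INTEGER, depth INTEGER, "
    "PRIMARY KEY (ancestor, descendant))"
)

def add_person(person, parent=None):
    conn.execute("INSERT INTO family_closure VALUES (?, ?, 0)", (person, person))
    if parent is not None:
        # Every ancestor of the parent (including the parent itself) becomes
        # an ancestor of the new person, one level deeper.
        conn.execute(
            "INSERT INTO family_closure (ancestor, descendant, depth) "
            "SELECT ancestor, ?, depth + 1 FROM family_closure WHERE descendant = ?",
            (person, parent),
        )

# The chain from the question: 1 -> 3 -> 4 -> 5 -> 10.
add_person(1)
add_person(3, parent=1)
add_person(4, parent=3)
add_person(5, parent=4)
add_person(10, parent=5)

descendants_of_3 = conn.execute(
    "SELECT descendant FROM family_closure WHERE ancestor = ? AND depth > 0 "
    "ORDER BY descendant",
    (3,),
).fetchall()
print(descendants_of_3)  # [(4,), (5,), (10,)]
```

The trade-off is storage: a person at depth d contributes d + 1 rows, which is usually acceptable for genealogy data that rarely changes.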
You could store your values as a JSON array:
id | ancestors
10 | ["1","3","4","5"]
and then query as follows:
$query = 'select * from people where ancestors like \'%"x"%\'';
Better, of course, is to use a mapping table for your many-to-many relation.
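A sketch of the JSON-array idea, assuming in-memory SQLite as a stand-in for MySQL and a hypothetical has_ancestor helper. Storing each id as a quoted string lets the quotes delimit the value, so the pattern %"2"% does not match "32". Note that this is still a pattern scan over every row, not an indexed lookup.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, ancestors TEXT)")
# Each id is stored as a quoted JSON string, so the quotes act as delimiters.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(10, json.dumps(["1", "3", "4", "5"])), (11, json.dumps(["32"]))],
)

def has_ancestor(x):
    pattern = '%"' + str(x) + '"%'  # e.g. %"2"%
    return conn.execute(
        "SELECT id FROM people WHERE ancestors LIKE ? ORDER BY id", (pattern,)
    ).fetchall()

print(has_ancestor(3))  # [(10,)]
print(has_ancestor(2))  # [] : "32" is not matched
```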
You can do this with REGEXP, as long as each side of x is anchored to either a comma or the start/end of the string:
SELECT * FROM mytable WHERE ancestors REGEXP '(^|,)x(,|$)'
where x is your searched value.
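A pattern whose commas are merely optional, such as ',?2,?', matches any value containing the digit 2, including 32; anchoring each side to either a comma or the start/end of the string fixes that. Here is a quick check with Python's re module, assuming its behaviour matches MySQL's REGEXP for a pattern this simple (worth verifying on your server). The sample values are made up.

```python
import re

values = ["2", "1,2", "2,5", "1,2,5", "1,32,4", "12,20"]

loose = re.compile(r",?2,?")           # commas optional: any '2' anywhere matches
anchored = re.compile(r"(^|,)2(,|$)")  # a comma or the string edge on each side

loose_hits = [v for v in values if loose.search(v)]
anchored_hits = [v for v in values if anchored.search(v)]

print(loose_hits)     # ['2', '1,2', '2,5', '1,2,5', '1,32,4', '12,20']
print(anchored_hits)  # ['2', '1,2', '2,5', '1,2,5']
```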
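Another option is to store the list with a leading comma and append a trailing comma at query time, so that every id is delimited by commas on both sides:
DROP TABLE IF EXISTS my_table;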
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,ancestors VARCHAR(250) NOT NULL
);
INSERT INTO my_table VALUES(10,',1,3,4,5');
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,5,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,4,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+
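A variation on the stored-leading-comma example is to wrap the column in commas on both sides at query time, so plain lists such as 1,3,4,5 work without changing the stored data. The sketch uses in-memory SQLite, which concatenates with || where MySQL would use CONCAT(',', ancestors, ','); the table contents and the rows_with_ancestor helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, ancestors TEXT NOT NULL)")
# Plain comma-separated lists: no stored leading comma.
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(10, "1,3,4,5"), (11, "32")])

def rows_with_ancestor(x):
    # SQLite concatenates with ||; in MySQL: CONCAT(',', ancestors, ',') LIKE ...
    return conn.execute(
        "SELECT id FROM my_table WHERE ',' || ancestors || ',' LIKE ? ORDER BY id",
        ("%," + str(x) + ",%",),
    ).fetchall()

print(rows_with_ancestor(1))  # [(10,)]  first element
print(rows_with_ancestor(5))  # [(10,)]  last element
print(rows_with_ancestor(2))  # []       32 does not match
```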

MySQL query for reassigning tags in pictures and avoiding duplicates

I have the following structure in MySQL, table 'images_tags':
id | image_id | tag_id
----------------------
1 | 243 | 52
2 | 94 | 52
3 | 56 | 52
4 | 56 | 53
5 | 56 | 54
Table 'tags':
id | tag
---------------
52 | fashion
53 | cars
54 | sports
55 | bikes
I'm building a function in my CMS to delete a tag. For that, I need to reassign all the pictures that have that tag to another tag. The problem is that a picture may already have the new tag assigned, and I want to avoid duplicate records.
I couldn't find the right way to do it directly in SQL, so I tried it in PHP as follows:
$result=mysql_query("select image_id from images_tags where tag_id='".$oldtag."'");
while($row=mysql_fetch_assoc($result)){
$result2=mysql_query("select id from images_tags
where image_id='".$row['image_id']."' and tag_id='".$newtag."'");
if(mysql_num_rows($result2)==0){
mysql_query("update images_tags set tag_id='".$newtag."'
where image_id='".$row['image_id']."' and tag_id='".$oldtag."'");
}
}
As you can see, my code is very bad and inefficient, since I'm running queries inside a loop. Do you know a better way to do this, preferably in just one SQL query? Thanks.
When I think of this problem, I find it easier to think of it as "insert the new image tags, if appropriate, then delete the old ones".
The following code takes this approach:
create unique index images_tags_unique on images_tags(image_id, tag_id);
insert into images_tags (image_id, tag_id)
select src.image_id, <newtagid>
from images_tags src
where src.tag_id = <oldtagid>
on duplicate key update images_tags.tag_id = images_tags.tag_id;
delete from images_tags
where tag_id = <oldtagid>;
The first step creates a unique index on images_tags, so duplicates are not allowed in the table.
The second inserts the new records; the no-op ON DUPLICATE KEY UPDATE makes MySQL skip any (image_id, tag_id) pair that already exists instead of raising an error.
The third deletes the old records.
To be honest, you can also do this with the IGNORE keyword on the UPDATE instead of the insert step. However, IGNORE is very general, so, in theory, another error could be ignored incorrectly. ON DUPLICATE KEY UPDATE is much more specific about what is allowed: it only handles duplicate-key conflicts.
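A runnable sketch of the insert-then-delete idea, using SQLite as a stand-in for MySQL and the sample images_tags rows from the question. In SQLite (3.24 or later) the duplicate-skipping clause is spelled ON CONFLICT (...) DO NOTHING. The retag helper is hypothetical, and wrapping both steps in one transaction is a design choice, so a failure cannot leave the old tag half-removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images_tags (id INTEGER PRIMARY KEY, image_id INTEGER, tag_id INTEGER);
INSERT INTO images_tags VALUES
  (1, 243, 52), (2, 94, 52), (3, 56, 52), (4, 56, 53), (5, 56, 54);
CREATE UNIQUE INDEX images_tags_unique ON images_tags (image_id, tag_id);
""")

def retag(old_tag, new_tag):
    with conn:  # one transaction: both steps apply, or neither does
        # Add the new tag to every image that has the old one,
        # skipping images that already carry the new tag.
        conn.execute(
            "INSERT INTO images_tags (image_id, tag_id) "
            "SELECT image_id, ? FROM images_tags WHERE tag_id = ? "
            "ON CONFLICT (image_id, tag_id) DO NOTHING",
            (new_tag, old_tag),
        )
        conn.execute("DELETE FROM images_tags WHERE tag_id = ?", (old_tag,))

retag(52, 53)
result = conn.execute(
    "SELECT image_id, tag_id FROM images_tags ORDER BY image_id, tag_id"
).fetchall()
print(result)  # [(56, 53), (56, 54), (94, 53), (243, 53)]
```

Image 56 already had tag 53, so it ends up with a single 53 row rather than a duplicate.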
I think this will add new rows to images_tags that meet your criteria.
insert into images_tags (image_id, tag_id)
select image_id, min(tag_id)
from (select i.image_id image_id, t.id tag_id
from images_tags i
cross join tags t
where i.tag_id = $oldtag and t.id != $oldtag) crossp
left join images_tags existing
using (image_id, tag_id)
where existing.id is null
group by image_id;
The crossp subquery creates a full cross product between all the image_ids that currently have the old tag and all the tags other than the old tag. Then we do a left join with the existing images_tags and use the null check to filter out all the pairs that already exist. This generates a list of image_id and tag_id pairs that do not match anything in the database. Finally, we group by image_id and take MIN(tag_id), so we add just one new row per image.
After you do this, you can delete the rows with tag_id = $oldtag.
The only problem with this is that it changes the IDs of the images_tags rows. There may be a way to do it all in one step with an UPDATE query, which wouldn't have that problem, but I'm not sure how to turn my query into that.

Deleting Duplicate Rows from MySql Table

I have a script to find duplicate rows in my MySQL table, which contains 40,000,000 rows, but it is very slow going. Is there an easier way to find the duplicate records without going in and out of PHP?
This is the script I currently use:
$find = mysql_query("SELECT * FROM pst_nw WHERE ID < '1000'");
while ($row = mysql_fetch_assoc($find))
{
// look for another row with the same address, excluding the row itself
$find_1 = mysql_query("SELECT ID FROM pst_nw WHERE ID <> '$row[ID]' AND add1 = '$row[add1]' AND add2 = '$row[add2]' AND add3 = '$row[add3]' AND add4 = '$row[add4]'");
if (mysql_num_rows($find_1) > 0) {
mysql_query("DELETE FROM pst_nw WHERE ID = '$row[ID]'");
}
}
You have a number of options.
Let the DB do the work
Create a copy of your table with a unique index - and then insert the data into it from your source table:
CREATE TABLE clean LIKE pst_nw;
ALTER TABLE clean ADD UNIQUE INDEX (add1, add2, add3, add4);
INSERT IGNORE INTO clean SELECT * FROM pst_nw;
DROP TABLE pst_nw;
RENAME TABLE clean TO pst_nw;
The advantage of doing things this way is you can verify that your new table is correct before dropping your source table. The disadvantage is it takes up twice as much space and is (relatively) slow to execute.
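The copy-into-a-uniquely-indexed-table approach can be tried on a few made-up rows first. This sketch assumes SQLite as a stand-in: INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE, and ALTER TABLE ... RENAME TO replaces RENAME TABLE. Adding ORDER BY ID is an assumption about which copy you want to keep (the lowest ID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pst_nw (ID INTEGER PRIMARY KEY, add1 TEXT, add2 TEXT, add3 TEXT, add4 TEXT);
INSERT INTO pst_nw VALUES
  (1, 'a', 'b', 'c', 'd'), (2, 'a', 'b', 'c', 'd'),
  (3, 'x', 'y', 'z', 'w'), (4, 'a', 'b', 'c', 'd');
""")

with conn:
    # Same columns as pst_nw, plus a unique key on the four address columns.
    conn.execute(
        "CREATE TABLE clean (ID INTEGER PRIMARY KEY, add1 TEXT, add2 TEXT, "
        "add3 TEXT, add4 TEXT, UNIQUE (add1, add2, add3, add4))"
    )
    # OR IGNORE skips rows that would violate the unique key; ORDER BY ID
    # makes the lowest ID of each duplicate group the one that is kept.
    conn.execute("INSERT OR IGNORE INTO clean SELECT * FROM pst_nw ORDER BY ID")

kept = conn.execute("SELECT ID FROM clean ORDER BY ID").fetchall()
print(kept)  # [(1,), (3,)]

# After checking `clean`, swap it in.
conn.execute("DROP TABLE pst_nw")
conn.execute("ALTER TABLE clean RENAME TO pst_nw")
```

One caveat that applies to both databases: a UNIQUE index does not treat NULLs as equal, so rows whose address columns contain NULL are not de-duplicated this way.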
Let the DB do the work #2
You can also achieve the result you want by doing:
set session old_alter_table=1;
ALTER IGNORE TABLE pst_nw ADD UNIQUE INDEX (add1, add2, add3, add4);
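The first command is needed as a workaround for the IGNORE flag otherwise being silently ignored. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this approach only works on older servers.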
The advantage here is there's no messing about with a temporary table - the disadvantage is you don't get to check that your update does exactly what you expect before you run it.
Example:
CREATE TABLE `foo` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`one` int(10) DEFAULT NULL,
`two` int(10) DEFAULT NULL,
PRIMARY KEY (`id`)
);
insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);
select * from foo;
+----+------+------+
| id | one | two |
+----+------+------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
+----+------+------+
3 rows in set (0.00 sec)
set session old_alter_table=1;
ALTER IGNORE TABLE foo ADD UNIQUE INDEX (one, two);
select * from foo;
+----+------+------+
| id | one | two |
+----+------+------+
| 1 | 1 | 1 |
+----+------+------+
1 row in set (0.00 sec)
Don't do this kind of thing outside the DB
Especially with 40 million rows doing something like this outside the db is likely to take a huge amount of time, and may not complete at all. Any solution that stays in the db will be faster, and more robust.
Usually in questions like this the problem is "I have duplicate rows, want to keep only one row, any one".
But judging from the code, what you want is: "if a set of add1, add2, add3, add4 is duplicated, DELETE ALL COPIES WITH ID < 1000". In this case, copying from the table to another with INSERT IGNORE won't do what you want - might even keep rows with lower IDs and discard subsequent ones.
I believe you need to run something like this to gather all the "bad IDs" (IDs below 1000 that have a duplicate with a higher ID). Because this code uses "AND bad.ID < good.ID", if ID 777 duplicates ID 888, ID 777 will still get deleted. If that is not what you want, change the condition to something like "AND bad.ID < 1000 AND good.ID >= 1000".
CREATE TABLE bad_ids AS
SELECT bad.ID FROM pst_nw AS bad JOIN pst_nw AS good
ON ( bad.ID < 1000 AND bad.ID < good.ID
AND bad.add1 = good.add1
AND bad.add2 = good.add2
AND bad.add3 = good.add3
AND bad.add4 = good.add4 );
Then, once you have all the bad IDs in a table, delete them:
DELETE pst_nw.* FROM pst_nw JOIN bad_ids ON (pst_nw.ID = bad_ids.ID);
Performance will benefit greatly from a (non-unique, possibly temporary) index on add1, add2, add3, add4 and ID, in that order.
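The bad-ID self-join can be checked on a handful of made-up rows. This sketch assumes SQLite as a stand-in; since SQLite has no multi-table DELETE ... JOIN, the final step uses WHERE ID IN (SELECT ...) instead. With these rows, 5 (duplicated by 1500) and 777 (duplicated by 888) are deleted, while 888 survives because no copy of it has a higher ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pst_nw (ID INTEGER PRIMARY KEY, add1 TEXT, add2 TEXT, add3 TEXT, add4 TEXT);
INSERT INTO pst_nw VALUES
  (5, 'a', 'b', 'c', 'd'), (1500, 'a', 'b', 'c', 'd'),
  (777, 'x', 'y', 'z', 'w'), (888, 'x', 'y', 'z', 'w'),
  (10, 'p', 'q', 'r', 's');
-- The index suggested above, to speed up the self-join.
CREATE INDEX pst_nw_dupes ON pst_nw (add1, add2, add3, add4, ID);
CREATE TABLE bad_ids AS
  SELECT bad.ID FROM pst_nw AS bad JOIN pst_nw AS good
    ON bad.ID < 1000 AND bad.ID < good.ID
   AND bad.add1 = good.add1 AND bad.add2 = good.add2
   AND bad.add3 = good.add3 AND bad.add4 = good.add4;
""")

# SQLite has no DELETE ... JOIN, so delete via an IN subquery instead.
with conn:
    conn.execute("DELETE FROM pst_nw WHERE ID IN (SELECT ID FROM bad_ids)")

remaining = conn.execute("SELECT ID FROM pst_nw ORDER BY ID").fetchall()
print(remaining)  # [(10,), (888,), (1500,)]
```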
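Find the duplicates using the GROUP BY operator. Here is a sample you can try:
select min(id) as keep_id, count(*) as copies, matching_field1, matching_field2, ...
from your_table
group by matching_field1, matching_field2, ...
having count(*) > 1
This returns one row per duplicated combination (not every duplicate id), together with the lowest id, which you can keep. Now delete the other rows of each group using a DELETE query.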
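Instead of a long chain of ORs, prefer a constant IN (...) list: when all the values are constants, MySQL sorts the list and searches it with a binary search, so IN is usually at least as fast as OR.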
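Because GROUP BY returns one row per duplicated combination rather than every duplicate id, a common pattern is to keep MIN(id) of each group and delete the rest. Here is a sketch on in-memory SQLite with a hypothetical contacts table, where f1 and f2 stand in for the matching fields. The extra derived-table layer around the subquery is not needed by SQLite, but it is the usual workaround for MySQL's refusal to select from the table being deleted from (error 1093).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT);
INSERT INTO contacts VALUES (1, 'a', 'b'), (2, 'a', 'b'), (3, 'a', 'b'), (4, 'c', 'd');
""")

# One row per duplicated combination, with its size and the id to keep.
groups = conn.execute(
    "SELECT f1, f2, COUNT(*) AS copies, MIN(id) AS keep_id "
    "FROM contacts GROUP BY f1, f2 HAVING COUNT(*) > 1"
).fetchall()
print(groups)  # [('a', 'b', 3, 1)]

# Delete every row that is not the lowest id of its group.
with conn:
    conn.execute(
        "DELETE FROM contacts WHERE id NOT IN "
        "(SELECT keep_id FROM (SELECT MIN(id) AS keep_id FROM contacts GROUP BY f1, f2) AS k)"
    )

remaining = conn.execute("SELECT id FROM contacts ORDER BY id").fetchall()
print(remaining)  # [(1,), (4,)]
```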
Sure there is. Note, however, that with 40 million records you will most probably exceed PHP's max execution time. Try the following:
CREATE TABLE temp_pst_nw LIKE pst_nw;
-- copy one row per address group (the lowest ID); also works with ONLY_FULL_GROUP_BY
INSERT INTO temp_pst_nw
SELECT p.* FROM pst_nw p
JOIN (SELECT MIN(ID) AS ID FROM pst_nw GROUP BY add1, add2, add3, add4) k ON p.ID = k.ID;
Confirm that everything is OK first!
DROP TABLE pst_nw;
RENAME TABLE temp_pst_nw TO pst_nw;
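Try creating a new table with the same definition, e.g. "my_table_two", then do:
INSERT INTO my_table_two (col1, col2, col3 [...])
SELECT DISTINCT col1, col2, col3 [...] FROM my_table;
List only the columns that define a duplicate; if you include a unique id column, DISTINCT will not remove anything.
Maybe that'll sort it out.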
Your code will be better if you don't use SELECT * and only select the columns you want to compare (the four address columns). It should also use a LIMIT clause in the SQL, so that the script doesn't become unresponsive with that many rows.

How can I insert a row if it doesn't already exist while updating multiple rows?

I have a MySQL query that looks like this:
UPDATE `Table` SET `Column` =
CASE
WHEN `Option Id` = '1' THEN 'Apple'
WHEN `Option Id` = '2' THEN 'Banana'
WHEN `Option Id` = '3' THEN 'Q-Tip'
END
And my table currently looks like this:
Option Id | Column
1 | x
2 | x
I'd like it to result in:
Option Id | Column
1 | Apple
2 | Banana
3 | Q-Tip
But it doesn't insert the Q-Tip row. I've looked up and read a bit about INSERT ON DUPLICATE KEY UPDATE and REPLACE, but I can't find a way to get those to work with this multiple row update using CASE. Do I have to write a separate query for each row to get this to work, or is there a nice way to do this in MySQL?
Option Id is not a Key itself, but it is part of the Primary Key.
EDIT: Some more info:
I'm programming in PHP, and essentially I'm storing an array for the user. Option Id is the key, and Column is the value. So, for simplicity's sake, my table could look like:
User Id | Option Id | Value
10 | 1 | Apple
10 | 2 | Shoe
11 | 1 | Czar
...
That user can easily update the elements in the array and add new ones, then POST the array to the server, in which case I'd like to store it in the table. My query above updates any array elements that they've edited, but it doesn't insert the new ones. I'm wondering if there is a query that can take my array from POST and insert it into the table without me having to write a loop and have a query for every array element.
This should work if `Option Id` is the primary key:
REPLACE INTO `Table` (`Option Id`, `Column`) VALUES
(1, 'Apple'),
(2, 'Banana'),
(3, 'Q-Tip');
The statement inserts the given rows, or replaces the existing row when one with the same primary key already exists. Note that REPLACE does this by deleting the old row and inserting a new one.
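Here is a sketch of a single-statement upsert for the whole POSTed array, assuming SQLite as a stand-in and a hypothetical user_options table keyed on (user_id, option_id). SQLite spells it ON CONFLICT (...) DO UPDATE SET ... = excluded...., while the MySQL counterpart is INSERT ... ON DUPLICATE KEY UPDATE. Unlike REPLACE, an upsert updates the existing row in place instead of deleting and re-inserting it. The save_options helper is also hypothetical; it builds one VALUES tuple per array element.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_options (
  user_id INTEGER, option_id INTEGER, opt_value TEXT,
  PRIMARY KEY (user_id, option_id)
);
INSERT INTO user_options VALUES (10, 1, 'x'), (10, 2, 'x'), (11, 1, 'Czar');
""")

def save_options(user_id, options):
    """Upsert a whole {option_id: value} dict in a single statement."""
    if not options:
        return
    tuples = ", ".join(["(?, ?, ?)"] * len(options))
    params = [p for option_id, value in options.items() for p in (user_id, option_id, value)]
    with conn:
        conn.execute(
            f"INSERT INTO user_options (user_id, option_id, opt_value) VALUES {tuples} "
            "ON CONFLICT (user_id, option_id) DO UPDATE SET opt_value = excluded.opt_value",
            params,
        )

save_options(10, {1: "Apple", 2: "Banana", 3: "Q-Tip"})
result = conn.execute(
    "SELECT user_id, option_id, opt_value FROM user_options ORDER BY user_id, option_id"
).fetchall()
print(result)
# [(10, 1, 'Apple'), (10, 2, 'Banana'), (10, 3, 'Q-Tip'), (11, 1, 'Czar')]
```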
Of course it does not insert: UPDATE only changes rows that already exist, and since there is no row with Option Id 3, there is nothing to update.
I suppose you are normalizing a database by putting in the values already present and now want to add the required mapping for every valid value.
So it would be better to start from scratch and just do INSERTs.
You could always query the database for existing entries and then choose between UPDATE and INSERT based on the results.
I see no point in such updating.
Why don't you have a separate table with option ids and corresponding values, leaving only option ids linked to user ids in this one?
