How to Remove duplicate rows having minimal set of information?

My MySQL table (company) contains duplicate records: the same company appears several times, and some rows have values in most columns while others have very few. I want to remove the duplicate companies that carry the least information. Any ideas?
Id Company_name column column2 column3 column4
-------------------------------------------------
1 A xyz
2 B pqr abc tcv aaa
3 A bnm xyz ccc
4 A bnm xyz
5 B aaa
I need to get my table as follows
Id Company_name column column2 column3 column4
-------------------------------------------------
2 B pqr abc tcv aaa
3 A bnm xyz ccc

You can write a PHP method for this: retrieve all the records grouped by the column that identifies a duplicate, which in your case is Company_name, and reduce each group to one row. Note that rows sharing a Company_name may hold different values in the other columns, so you have to decide how the algorithm should treat such rows.
It is good practice to check the data before inserting it so that duplicates never occur. If you already have such records, the following query may help.
DELETE FROM TABLENAME WHERE Id
NOT IN
(
SELECT Id FROM
(
SELECT MIN(Id) AS Id FROM TABLENAME GROUP BY Company_name
)
X
);
This removes the duplicates for one column (Company_name), keeping the row with the lowest Id in each group rather than the most complete one. You can combine several queries like this to remove duplicates on other columns.

It's possible to get a "score" of each row and base the decision on that. Here is a quick example that shows where to start.
SELECT id,
name,
length(concat_ws('', col1, col2, col3, col4)) AS score
FROM company
ORDER BY score DESC;
See it on sqlfiddle
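To turn the score into an actual cleanup, one option is to keep the highest-scoring row per company and delete the rest. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 instead of MySQL so it can run anywhere, the column names col1 to col4 follow the query above, and the placement of the sample values into columns is a guess because the layout of the original table is not recoverable. The helper name keep_most_complete is made up for this example.

```python
import sqlite3

# Score = total length of all non-NULL column values, as in the query above.
# SQLite has no CONCAT_WS, so COALESCE and || are used instead.
SCORE_SQL = """
SELECT id, name,
       LENGTH(COALESCE(col1, '') || COALESCE(col2, '') ||
              COALESCE(col3, '') || COALESCE(col4, '')) AS score
FROM company
ORDER BY name, score DESC, id
"""

def keep_most_complete(conn):
    """Keep the highest-scoring row per company name and delete the others."""
    keep = {}
    for row_id, name, _score in conn.execute(SCORE_SQL):
        # Rows arrive best-first within each name, so the first one wins.
        keep.setdefault(name, row_id)
    kept = set(keep.values())
    all_ids = [r[0] for r in conn.execute("SELECT id FROM company")]
    doomed = [(i,) for i in all_ids if i not in kept]
    conn.executemany("DELETE FROM company WHERE id = ?", doomed)
    return sorted(kept)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, "
    "col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)"
)
# Hypothetical column placement for the sample rows from the question.
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "xyz", None, None, None),
        (2, "B", "pqr", "abc", "tcv", "aaa"),
        (3, "A", "bnm", "xyz", "ccc", None),
        (4, "A", "bnm", "xyz", None, None),
        (5, "B", "aaa", None, None, None),
    ],
)
kept_ids = keep_most_complete(conn)
remaining = conn.execute("SELECT id, name FROM company ORDER BY id").fetchall()
print(remaining)
```

Ties are broken by the lowest id here; whether string length is the right measure of "most information" depends on your data, and counting non-empty columns may suit some tables better.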

Related

update column based on two columns

I have a weird problem.
I have a rather large database with two tables. I need to change a column's contents from a name to an ID that already exists in another table.
Example:
I have a table that contains a column "Name".
The Name column holds the person's name as "lastname, firstname", as shown:
Name | othercolumn
Smith, John |
I would like to change the contents of the Name column to the staffID associated with the person's name.
The staff table is
staffID | firstName | lastName
1 john smith
My end result should be
Name | othercolumn
1 |
I've tried all sorts of joins and concats, but can't seem to get it right with my limited MySQL knowledge. Is there a way to do this without having to do it manually? The comma seems to give me a lot of grief. Thanks!

You need to be very careful about this. First, I assume that StaffId is a number. So, add a column to the table:
alter table t add StaffId int;
Then, update this column:
update t join
staff s
on t.name = concat_ws(', ', s.lastname, s.firstname)
set t.StaffId = s.StaffId;
Note that after you have done this, you may still have StaffId values that are NULL:
select t.*
from t
where t.StaffId is null;
These are the names that are not in the staff table. They require more work. When you are done, you can drop the name column.
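The same idea can be tried out on a throwaway database before touching the real one. This sketch uses Python's sqlite3 rather than MySQL, so two things differ and are assumptions of the example: SQLite has no UPDATE ... JOIN, so a correlated subquery is used, and SQLite compares strings case-sensitively, so LOWER() is applied on both sides (MySQL's default collations are typically case-insensitive). The row 'Doe, Jane' is invented to show an unmatched name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staffID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE t (Name TEXT, othercolumn TEXT, StaffId INTEGER);
INSERT INTO staff VALUES (1, 'john', 'smith');
INSERT INTO t (Name) VALUES ('Smith, John'), ('Doe, Jane');
""")

# Build "lastname, firstname" from the staff table (note the space after
# the comma) and copy the matching staffID into the new column.
conn.execute("""
UPDATE t
SET StaffId = (
    SELECT s.staffID
    FROM staff AS s
    WHERE LOWER(t.Name) = LOWER(s.lastName || ', ' || s.firstName)
)
""")

mapped = conn.execute("SELECT Name, StaffId FROM t ORDER BY Name").fetchall()
# Names with no staff record keep a NULL StaffId and need manual attention.
unmatched = [r[0] for r in conn.execute("SELECT Name FROM t WHERE StaffId IS NULL")]
print(mapped, unmatched)
```

If two staff members share the same first and last name, the subquery picks one of them arbitrarily, so it is worth checking for such collisions first.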

MYSQL Duplicate entries

I am trying to find duplicate entries in my MySQL table. I would like to compare different fields with each other. Here is the structure of my table:
ID FirstName LastName Street ZIP City IpAddress
1 Jack Smith 2nd 12345 Sample1 12.21.24.212
2 Paul Miller 3rd 45685 Sample2 78.54.85.654
3 Jenny Smith 3rd 77273 Sample3 84.91.67.311
4 Frank Jackson 1st 27819 Sample1 78.54.85.654
5 Jack Smith 3rd 72891 Sample2 94.79.99.465
I would like to compare the street and IP columns individually, and then find duplicates of the combination of first and last name. My table actually has a few more columns that I would like to check, but the example above should give you an idea of what I am planning.
I need the ID numbers of the entries that could potentially be duplicates.
In the example above, the output should be ID numbers 1 and 5 when I compare the combination of first and last name.
The output should be ID numbers 2, 3 and 5 if I compare the street names.
And the output for the IP addresses should be ID numbers 2 and 4.
Does anyone have ideas about how I should do this? What is the best way to compare those different columns? I don't mind if I have to run several queries.

Use GROUP_CONCAT() to get all the IDs within a group, and GROUP BY to specify the columns that you're looking for duplicates of. Add HAVING COUNT(*) > 1 so that only groups that actually contain duplicates are returned.
For streets:
SELECT street, GROUP_CONCAT(id)
FROM yourTable
GROUP BY street
HAVING COUNT(*) > 1
For names:
SELECT firstname, lastname, GROUP_CONCAT(id)
FROM yourTable
GROUP BY firstname, lastname
HAVING COUNT(*) > 1
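Since the same query shape repeats for every column (or column combination) you want to check, it can be wrapped in a small helper. This is a sketch using Python's sqlite3 with the sample rows from the question; the function name duplicate_ids and the whitelist of column names are assumptions of the example, not part of any library.

```python
import sqlite3

ALLOWED = {"FirstName", "LastName", "Street", "ZIP", "City", "IpAddress"}

def duplicate_ids(conn, columns):
    """Map each duplicated value combination to the sorted IDs sharing it."""
    # Column names cannot be bound as parameters, so check them against
    # a whitelist before putting them into the SQL text.
    if not set(columns) <= ALLOWED:
        raise ValueError("unknown column in %r" % (columns,))
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, GROUP_CONCAT(ID) FROM yourTable "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    result = {}
    for row in conn.execute(sql):
        # The last field is the comma-separated ID list; its order is not
        # guaranteed, so sort it.
        result[row[:-1]] = sorted(int(i) for i in row[-1].split(","))
    return result

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (ID INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, Street TEXT, ZIP TEXT, City TEXT, IpAddress TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Jack", "Smith", "2nd", "12345", "Sample1", "12.21.24.212"),
        (2, "Paul", "Miller", "3rd", "45685", "Sample2", "78.54.85.654"),
        (3, "Jenny", "Smith", "3rd", "77273", "Sample3", "84.91.67.311"),
        (4, "Frank", "Jackson", "1st", "27819", "Sample1", "78.54.85.654"),
        (5, "Jack", "Smith", "3rd", "72891", "Sample2", "94.79.99.465"),
    ],
)
by_name = duplicate_ids(conn, ["FirstName", "LastName"])
by_street = duplicate_ids(conn, ["Street"])
by_ip = duplicate_ids(conn, ["IpAddress"])
print(by_name, by_street, by_ip)
```

The IP address check from the question is just the street query with a different column, which is why one helper covers all three cases.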

how to get user name based on id separated by comma in single table

user_id user_name user_friend_list
1 dharmendra 2,3,4,5,6,7
2 jitendra 1,3,4,5,6,7
3 xyz 1,2,6,7
4 pqr 1,3,4
Now I want to extract the user_id and user_name of every user whose user_friend_list contains id 6, i.e. it should return user_id 1, 2, 3 and user names dharmendra, jitendra and xyz.
I currently use PHP's split function, but that is complicated. Please suggest a shorter method.
Thanks & regards

You can use FIND_IN_SET()
select user_id, user_name
from your_table
where find_in_set(6, user_friend_list) > 0
But it would actually be better to change your table design. Never store multiple values in one column!

First of all, the table is not normalized, which is why the problem exists.
You should split the table like this (or better):
table 1
user_id
user_name
table 2
user_id
friend_id
table 2 will still contain some redundant data, which can be removed by adding a third table as a mapping between table1 and table2.
Now you can use the following query to get the result:
select a.user_id, a.user_name from table1 as a join table2 as b on a.user_id=b.user_id where b.friend_id=6;
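A rough sketch of that normalized layout, using Python's sqlite3 so it can be run without a MySQL server. The table names users and friends stand in for table1 and table2, and the helper users_with_friend is made up for the example; the loop shows one way the existing comma-separated lists could be migrated into the new table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE friends (
    user_id INTEGER,
    friend_id INTEGER,
    PRIMARY KEY (user_id, friend_id)
);
""")

# Rows in the old, comma-separated format from the question.
raw = [
    (1, "dharmendra", "2,3,4,5,6,7"),
    (2, "jitendra", "1,3,4,5,6,7"),
    (3, "xyz", "1,2,6,7"),
    (4, "pqr", "1,3,4"),
]
for user_id, user_name, friend_list in raw:
    conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, user_name))
    # One row per (user, friend) pair replaces the comma-separated string.
    conn.executemany(
        "INSERT INTO friends VALUES (?, ?)",
        [(user_id, int(f)) for f in friend_list.split(",")],
    )

def users_with_friend(conn, friend_id):
    """Return (user_id, user_name) for every user listing friend_id."""
    return conn.execute(
        "SELECT a.user_id, a.user_name "
        "FROM users AS a JOIN friends AS b ON a.user_id = b.user_id "
        "WHERE b.friend_id = ? ORDER BY a.user_id",
        (friend_id,),
    ).fetchall()

result = users_with_friend(conn, 6)
print(result)
```

With this layout the lookup is an ordinary indexed join instead of a string search inside every row.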

"SELECT user_id, user_name FROM your_table WHERE user_id IN (2,3,4,5,6,7)";

PHP/MySQL code to process associated entries

I have table:
Name GroupID etc...
ABC
ABC
DEF
DEF
DEF
KKK
LLL
III
III
I'd like a PHP/MYSQL mix to process into this:
Name GroupID etc...
ABC 1
ABC 1
DEF 2
DEF 2
DEF 2
KKK 0
LLL 0
III 3
III 3
I.e. if several entries have the same Name (exact string match), assign a GroupID (incremented automatically) to all entries with that Name. If the Name is unique, assign 0 to the GroupID.
My table has 250,000 entries. What is the fastest way to achieve this? Working code would be nice, but a high-level algorithm is good enough to get me going.
Thanks!

This could be done with a quick PHP script, but I like the idea of letting the database handle it by itself.
You could probably do this with a clever UPDATE join, but because I can't test it I'll use a temporary table instead. The idea is to select all values for Name having counts > 1 and assign a row number to them into a temporary table. Then use an update join to modify the GroupID in the original table.
SET @rownum=0;
CREATE TEMPORARY TABLE groupnums (groupid INT, Name VARCHAR(16), numgroups INT)
SELECT
@rownum := @rownum + 1 AS groupid,
Name,
COUNT(*) AS numgroups
FROM original_table
GROUP BY Name
HAVING COUNT(*) > 1;
UPDATE
original_table
JOIN groupnums ON original_table.Name = groupnums.Name
SET original_table.GroupID = groupnums.groupid;
Then set the remaining ones to 0
UPDATE original_table SET GroupID = 0 WHERE GroupID IS NULL;
And get rid of the temporary table.
DROP TABLE groupnums;
Update:
After testing this quickly myself, I find that although it works, you won't get directly incremental values for groupid. The @rownum is incremented for each row rather than for each group, so you'll end up with groups like the following, with gaps in between.
/* Sample results - groups work, but have gaps between GroupID */
Name GroupID etc...
ABC 1
ABC 1
DEF 3
DEF 3
DEF 3
KKK 0
LLL 0
III 6
III 6
Update 2 I overcomplicated this a bit.
On deeper thought, the @rownum isn't needed at all. Just use an auto-increment id in the temporary table. This should produce incremental GroupID values without the gaps in between. Use the same UPDATE statement as above to join against it.
CREATE TEMPORARY TABLE groupnums (groupid INT NOT NULL AUTO_INCREMENT PRIMARY KEY, Name VARCHAR(16), numgroups INT)
SELECT
NULL AS groupid,
Name,
COUNT(*) AS numgroups
FROM original_table
GROUP BY Name
HAVING COUNT(*) > 1;
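For comparison, here is a sketch of the same grouping done from a script, using Python's sqlite3 as a stand-in for MySQL. The helper name assign_group_ids is invented for the example, and numbering the groups in order of first appearance (via SQLite's implicit rowid) is an assumption made so that the output matches the table in the question.

```python
import sqlite3

def assign_group_ids(conn):
    """Give each duplicated Name a gap-free GroupID; unique names get 0."""
    # Names that occur more than once, in order of first appearance.
    dup_names = [
        r[0]
        for r in conn.execute(
            "SELECT Name FROM original_table "
            "GROUP BY Name HAVING COUNT(*) > 1 ORDER BY MIN(rowid)"
        )
    ]
    # Start everything at 0, then overwrite the duplicated names.
    conn.execute("UPDATE original_table SET GroupID = 0")
    conn.executemany(
        "UPDATE original_table SET GroupID = ? WHERE Name = ?",
        [(gid, name) for gid, name in enumerate(dup_names, start=1)],
    )
    return len(dup_names)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (Name TEXT, GroupID INTEGER)")
conn.executemany(
    "INSERT INTO original_table (Name) VALUES (?)",
    [(n,) for n in ["ABC", "ABC", "DEF", "DEF", "DEF", "KKK", "LLL", "III", "III"]],
)
group_count = assign_group_ids(conn)
rows = conn.execute(
    "SELECT Name, GroupID FROM original_table ORDER BY rowid"
).fetchall()
print(rows)
```

On a table with 250,000 rows, an index on Name would likely matter for the per-name updates; the temporary-table join above keeps all of the work inside the database in a single statement instead.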

mysql, how to delete duplicate data?

I have a table with some duplicate values and I want to remove them:
table1:
id | access | num
12 1144712030 101
13 1144712030 101
14 1154512035 102
15 1154512035 102
I would like to remove the duplicates so that I am left with:
id | access | num
12 1144712030 101
14 1154512035 102
Any idea how to do this in a MySQL command?
Thanks

The simpler solution, I think, would be:
CREATE TABLE new_table AS SELECT MIN(id) AS id, access, num FROM original_table GROUP BY access, num;
TRUNCATE TABLE original_table;
INSERT INTO original_table SELECT * FROM new_table;
DROP TABLE new_table;
Note:
I think some kind of cursor could be used, and maybe a temporary table. But you should be really careful.
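An in-place alternative is to delete every row that is not the lowest id of its (access, num) group. The sketch below runs the idea against Python's sqlite3 with the sample rows from the question. One caveat for MySQL: a DELETE whose subquery reads the same table is normally rejected with error 1093, so there the inner SELECT usually has to be wrapped in a derived table (SELECT id FROM (...) AS x).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, access INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (12, 1144712030, 101),
        (13, 1144712030, 101),
        (14, 1154512035, 102),
        (15, 1154512035, 102),
    ],
)

# Keep the lowest id of each (access, num) pair and delete everything else.
conn.execute(
    "DELETE FROM table1 WHERE id NOT IN ("
    "SELECT MIN(id) FROM table1 GROUP BY access, num)"
)

rows = conn.execute("SELECT id, access, num FROM table1 ORDER BY id").fetchall()
print(rows)
```

Running the inner SELECT on its own first is a cheap way to confirm which rows would survive before deleting anything.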

If your table is called foo, rename it to foo_old and re-create foo with the same structure as foo_old.
Then run a query with the DISTINCT operator on foo_old and insert the results into foo.

Do a quick search here for DELETE DUPLICATE ROWS.
You'll find a ton of examples.
