Removing duplicate field entries in SQL

Is there any way I can erase all the duplicate entries from a certain table (users)? Here is a sample of the type of entries I have. The users table consists of 3 fields: ID, user, and pass.
mysql_query("DELETE FROM users WHERE ???") or die(mysql_error());
randomtest
randomtest
randomtest
nextfile
baby
randomtest
dog
anothertest
randomtest
baby
nextfile
dog
anothertest
randomtest
randomtest
I want to be able to find the duplicate entries, and then delete all of the duplicates, and leave one.

You can solve it with only one query.
If your table has the following structure:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`username` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
you could do something like this (it deletes every duplicate user, based on username, whose ID is greater than the smallest ID for that username):
DELETE users
FROM users INNER JOIN
(SELECT MIN(id) as id, username FROM users GROUP BY username) AS t
ON users.username = t.username AND users.id > t.id
It works, and I've already used something similar to delete duplicates.

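You can do it with three SQL statements (note that the new table keeps only the columns listed in the SELECT, so list every column you need):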
create table tmp as select distinct name from users;
drop table users;
alter table tmp rename users;

This delete script (SQL Server syntax) should work:
DELETE FROM Users
WHERE ID NOT IN (
SELECT MIN(ID)
FROM Users
GROUP BY [User]
)
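The keep-the-lowest-ID idea can be tried out without a MySQL server. What follows is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in database. The table layout and sample names are assumptions, and SQLite (unlike MySQL) accepts a subquery on the DELETE's own target table.

```python
import sqlite3

def delete_duplicate_users(conn):
    """Delete every row whose id is not the lowest id for its username."""
    cur = conn.execute(
        "DELETE FROM users "
        "WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY username)"
    )
    conn.commit()
    return cur.rowcount  # number of duplicate rows removed

# Assumed sample data: three distinct names, three redundant rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [("joe",), ("bob",), ("jane",), ("bob",), ("bob",), ("jane",)],
)
removed = delete_duplicate_users(conn)
remaining = conn.execute("SELECT id, username FROM users ORDER BY id").fetchall()
```

After the call, one row per username is left, and it is always the one with the smallest id.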

I assume that you have a structure like the following:
users
-----------------
| id | username |
-----------------
| 1 | joe |
| 2 | bob |
| 3 | jane |
| 4 | bob |
| 5 | bob |
| 6 | jane |
-----------------
The temporary table is required because MySQL cannot use a subquery in a DELETE when that subquery reads from the DELETE's target table.
CREATE TEMPORARY TABLE IF NOT EXISTS users_to_delete (id INTEGER);
INSERT INTO users_to_delete (id)
SELECT MIN(u1.id) as id
FROM users u1
INNER JOIN users u2 ON u1.username = u2.username
GROUP BY u1.username;
DELETE FROM users WHERE id NOT IN (SELECT id FROM users_to_delete);
I know the query is a bit hairy but it does the work, even if the users table has more than 2 columns.

You need to be a bit careful about how the data in your table is used. If this really is a users table, there are likely other tables with FKs pointing to the ID column. In that case you need to update those tables to use the ID you have selected to keep.
If it's just a standalone table (no other table references it):
CREATE TEMPORARY TABLE Tmp (ID int);
INSERT INTO Tmp SELECT MIN(ID) FROM Users GROUP BY User;
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Tmp);
Users table linked from other tables
Create the temporary tables, including a link table that holds each old ID and the new ID that other tables should reference instead.
CREATE TEMPORARY TABLE Keep (ID int, User varchar(45));
CREATE TEMPORARY TABLE Remove (OldID int, NewID int);
INSERT INTO Keep SELECT MIN(ID), User FROM Users GROUP BY User;
INSERT INTO Remove SELECT u1.ID, u2.ID FROM Users u1 INNER JOIN Keep u2 ON u2.User = u1.User WHERE u1.ID <> u2.ID;
Go through any tables which reference your users table and update their FK column (likely called UserID) to point to the New unique ID which you have selected, like so...
UPDATE MYTABLE t INNER JOIN Remove r ON t.UserID = r.OldID
SET t.UserID = r.NewID;
Finally go back to your users table and remove the no longer referenced duplicates:
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Keep);
Clean up the temporary tables:
DROP TEMPORARY TABLE Keep;
DROP TEMPORARY TABLE Remove;
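The remap-then-delete sequence can be sketched end to end as follows. This is only an illustration in Python with the standard-library sqlite3 module standing in for MySQL; the orders table, its user_id column, and the username column are hypothetical names chosen for the example.

```python
import sqlite3

def merge_duplicate_users(conn):
    """Repoint orders at the surviving user row, then delete the duplicates."""
    # Pair every duplicate id with the lowest id that shares its username.
    mapping = conn.execute(
        "SELECT u.id, k.keep_id FROM users u "
        "JOIN (SELECT username, MIN(id) AS keep_id FROM users GROUP BY username) k "
        "ON k.username = u.username WHERE u.id <> k.keep_id"
    ).fetchall()
    with conn:  # commit both steps together, or roll both back on error
        conn.executemany(
            "UPDATE orders SET user_id = ? WHERE user_id = ?",
            [(keep_id, old_id) for old_id, keep_id in mapping],
        )
        conn.executemany(
            "DELETE FROM users WHERE id = ?", [(old_id,) for old_id, _ in mapping]
        )
    return dict(mapping)  # old id -> id that replaced it

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    INSERT INTO users (id, username) VALUES (1, 'bob'), (2, 'bob'), (3, 'jane');
    INSERT INTO orders (id, user_id) VALUES (1, 2), (2, 3);
""")
remapped = merge_duplicate_users(conn)
```

Doing the update and the delete in one transaction means a failure halfway through cannot leave child rows pointing at a user that no longer exists.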

A very simple solution would be to set a UNIQUE index on the column you wish to hold unique values. Note that you subsequently cannot insert the same key twice.
Edit: My mistake, I hadn't read that last line: "I want to be able to find the duplicate entries".
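To show what the UNIQUE index does once the duplicates are gone, here is a small sketch in Python with the standard-library sqlite3 module in place of MySQL. The index name and the helper function are made up for the example.

```python
import sqlite3

def try_insert(conn, username):
    """Return True if the row was inserted, False if the UNIQUE index rejected it."""
    try:
        conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.execute("INSERT INTO users (username) VALUES ('bob')")
# Creating the index fails while duplicates are still present, so deduplicate first.
conn.execute("CREATE UNIQUE INDEX idx_users_username ON users (username)")

inserted_new = try_insert(conn, "jane")   # new name: accepted
inserted_dupe = try_insert(conn, "bob")   # existing name: rejected
```

The index prevents new duplicates but does nothing about existing ones, which is why it complements the delete queries rather than replacing them.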

I would get all the results, put them in an array of IDs and VALUES. Use a PHP function to work out the dupes, log all the IDs in an array, and use those values to delete the records.
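The application-side approach described above might look like the following. It is written in Python rather than PHP purely as an illustration; the function name and sample rows are assumptions.

```python
def ids_to_delete(rows):
    """Given (id, value) pairs, return the ids of every repeat of a value.

    The first row seen for each value is kept; later ones are reported.
    Pass the rows ordered by id if the lowest id should survive.
    """
    seen = set()
    duplicates = []
    for row_id, value in rows:
        if value in seen:
            duplicates.append(row_id)
        else:
            seen.add(value)
    return duplicates

rows = [(1, "randomtest"), (2, "randomtest"), (3, "nextfile"), (4, "baby"), (5, "randomtest")]
dupes = ids_to_delete(rows)
# One placeholder per id for a parameterized DELETE; skip the query when dupes is empty.
delete_sql = "DELETE FROM users WHERE id IN (%s)" % ", ".join("?" for _ in dupes)
```

This pulls the whole table into memory, so it suits small tables; for large ones a single SQL statement is usually the better fit.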

I don't know your db schema, but the simplest solution seems to be to do a SELECT DISTINCT on that table, keep the result in a variable (e.g. an array), delete all records from the table, and then reinsert the list returned by the SELECT DISTINCT.

The temporary table is an excellent solution, but as an alternative here is a SELECT query that returns the duplicate rows from the table:
SELECT * FROM `users` INNER JOIN (
SELECT `name`, COUNT(`name`) AS `count`
FROM `users` GROUP BY `name`
) AS `grouped`
ON `grouped`.`name` = `users`.`name`
WHERE `grouped`.`count` > 1
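A quick way to check a duplicate-finding query of this shape is to run it against a throwaway database. The sketch below uses Python's standard-library sqlite3 module instead of MySQL, with assumed sample data.

```python
import sqlite3

def find_duplicate_rows(conn):
    """Return (id, name, count) for every row whose name occurs more than once."""
    return conn.execute(
        "SELECT u.id, u.name, g.cnt "
        "FROM users u "
        "JOIN (SELECT name, COUNT(*) AS cnt FROM users GROUP BY name) g "
        "ON g.name = u.name "
        "WHERE g.cnt > 1 "
        "ORDER BY u.name, u.id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("joe",), ("bob",), ("jane",), ("bob",)]
)
duplicates = find_duplicate_rows(conn)
```

Every copy of a repeated name is returned, including the one you would keep, so the result is for inspection rather than a ready-made delete list.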

Group by the column that defines a duplicate and apply the condition as per your requirements. Grouping by userId as well would put every row in its own group, so no count would ever exceed 1.
SELECT u.username, COUNT(*) AS occurrences
FROM user AS u
GROUP BY u.username
HAVING COUNT(*) > 1;

None of the other answers worked for me, so I decided to write my own little script. It's not the best, but it gets the job done.
Comments are included throughout, but this script is customized for my needs, and I hope the idea helps you.
I basically wrote the database contents to a temp file, called the temp file, applied the function to the called file to remove the duplicates, truncated the table, and then input the data right back into the SQL. Sounds like a lot, I know.
If you're confused as to what $setprofile is, it's a session that's created upon logging into my script (to establish a profile), and is cleared upon logging out.
<?php
// session and includes, you know the drill.
session_start();
include_once('connect/config.php');
// create a temp file with session id and current date
$datefile = date("m-j-Y");
$file = "temp/$setprofile-$datefile.txt";
$f = fopen($file, 'w'); // Open in write mode
// call the user and pass via SQL and write them to $file
$sql = mysql_query("SELECT * FROM _$setprofile ORDER BY user DESC");
while($row = mysql_fetch_array($sql))
{
$user = $row['user'];
$pass = $row['pass'];
$accounts = "$user:$pass "; // the white space right here is important, it defines the separator for the dupe check function
fwrite($f, $accounts);
}
fclose($f);
// **** Dupe Function **** //
// removes duplicate substrings between the separator
function uniqueStrs($separator, $str) {
// convert string to an array using the separator
$str_arr = explode($separator, $str);
// remove duplicate array values
$result = array_unique($str_arr);
// convert array back to string, using the separator to glue it back
$unique_str = implode($separator, $result);
// return the unique string
return $unique_str;
}
// **** END Dupe Function **** //
// call the list we made earlier, so we can use the function above to remove dupes
$str = file_get_contents($file);
// separator
$separator = ' ';
// use the function to save a unique string
$new_str = uniqueStrs($separator, $str);
// empty the table
mysql_query("TRUNCATE TABLE _$setprofile") or die(mysql_error());
// prep for SQL by replacing test:test with ('test','test'), etc.
// this isn't a robust way of converting (values containing quotes, spaces, or colons will break it), but as I said, it works for me.
$patterns = array("/([^\s:]+):([^\s:]+)/", "/\s++\(/");
$replacements = array("('$1', '$2')", ", (");
// insert the values into your table, and presto! no more dupes.
$sql = 'INSERT INTO `_'.$setprofile.'` (`user`, `pass`) VALUES ' . preg_replace($patterns, $replacements, $new_str) . ';';
$product = mysql_query($sql) or die(mysql_error()); // put $new_str here so it will replace new list with SQL formatting
// if all goes well.... OR wrong? :)
if($product){ echo "Completed!";
} else {
echo "Failed!";
}
unlink($file); // delete the temp file/list we made earlier
?>

This will work (list every column you want to keep in both the INSERT and the SELECT):
create table tmp like users;
insert into tmp (name) select distinct name from users;
drop table users;
alter table tmp rename users;

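If you have a unique ID / primary key on the table (generic SQL; MySQL rejects a subquery that reads from the DELETE's target table, so use a join or a temporary table there):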
DELETE FROM MyTable AS T1
WHERE MyID <
(
SELECT MAX(MyID)
FROM MyTable AS T2
WHERE T2.Col1 = T1.Col1
AND T2.Col2 = T1.Col2
-- ... repeat for all columns to consider duplicates ...
)
If you don't have a unique key, select all distinct values into a temporary table, delete all original rows, and copy them back from the temporary table. This will be problematic if you have foreign keys referring to this table.

Related

Add dynamic column to existing MySQL table?

Just wondering how to add a dynamic column to an existing MySQL table. For example, I already have "sampletable" and I want to make input fields that can add dynamic columns to the existing table, e.g. column1, column2, column3. How can I do that with dynamic numbering?

I would agree with @Barmar that your SQL table structure is wrong if you are trying to do this. What you are trying to model in this case is what's called a "one-to-many" relationship. This is usually achieved by doing something like the following.
Table 1: Contains columns for all the usual data (non-"dynamic" columns in your terms), and a unique ID column which all good database tables should have
Table 2: An ID column, and column that refers to the ID column on table one and a column for the data that goes in the dynamic column.
Now you can store your values that you would normally store in "dynamic columns" in individual rows on the second table.
Example
-- sample:
--
-- | id | name |
--
-- dynamic_values:
--
-- | id | sample_id | value |
-- Selecting data
SELECT * FROM sample WHERE id = 1;
SELECT * FROM dynamic_values WHERE sample_id = 1;
-- Querying on "dynamic columns"
SELECT * FROM sample s LEFT JOIN dynamic_values d ON d.sample_id = s.id WHERE d.value = 'something';
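The two-table layout can be exercised with a short script. The following is a sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL; the helper names and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dynamic_values (
        id INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL REFERENCES sample(id),
        value TEXT NOT NULL
    );
    INSERT INTO sample (id, name) VALUES (1, 'first');
""")

def add_value(conn, sample_id, value):
    """Store one more 'dynamic column' value as a row instead of altering the table."""
    conn.execute(
        "INSERT INTO dynamic_values (sample_id, value) VALUES (?, ?)",
        (sample_id, value),
    )

def values_for(conn, sample_id):
    """Return all values attached to one sample row, in insertion order."""
    rows = conn.execute(
        "SELECT value FROM dynamic_values WHERE sample_id = ? ORDER BY id",
        (sample_id,),
    )
    return [value for (value,) in rows]

for v in ("column1 data", "column2 data", "column3 data"):
    add_value(conn, 1, v)
stored = values_for(conn, 1)
```

Adding a fourth or fortieth value needs no schema change, which is the point of the design.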

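Try this set of code for dynamic column creation on an existing table. Run it inside a stored procedure that declares Count_1 and Var_1, since WHILE is only valid in stored programs.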
SET SQL_SAFE_UPDATES = 0;
Drop TEMPORARY table if exists Temp_Report;
CREATE TEMPORARY TABLE Temp_Report (Report_Date Date);
Drop TEMPORARY table if exists Temp_Product_Tax;
CREATE TEMPORARY TABLE Temp_Product_Tax as SELECT concat(REPLACE(Tax_category,' ','_'),'|',Taxvalue) as 'Tax_category',Taxvalue FROM tax_category c left join taxmaster t on c.id=t.catid ; -- where c.is_Product =1
select * from Temp_Product_Tax;
set Count_1=(SELECT COUNT(*) FROM Temp_Product_Tax);
set Var_1=0;
While(Count_1>Var_1) do
set @Col_Name=Concat( Var_1+1,'_',REPLACE((select Tax_category from Temp_Product_Tax limit Var_1,1),'.','_'),' Double(15,2)');
set @Col_Name=Concat('ALTER TABLE Temp_Report ADD COLUMN ', @Col_Name) ;
PREPARE stmt FROM @Col_Name;
EXECUTE stmt;
set Var_1=Var_1+1;
END While;
select * from Temp_Report;
SET SQL_SAFE_UPDATES = 1;

In fact, I don't think what you intend to do, i.e. adding dynamic columns, is good practice at all. Anyway, you can do it using ALTER TABLE:
for ($i = 1; $i < 4; $i++) {
mysqli_query($db_con, "ALTER TABLE mytable ADD COLUMN `input.$i` VARCHAR(40)");
}
But I would suggest the approach that Barmar mentioned in the comments above.

Accessing Row from Mysql DB by chaining query of foreign key from another table

I'm having difficulty finding the best way to get my results from a table. I want to get the targeted row from one table by following a foreign key that starts from the primary key of another.
The tables would be set up similar to this (minus a lot of other attributes for space):
user Table:
user_Id(pk)
name
type
venue_Id(unique/indexed)
venue Table:
venue_Id(fk)
rating
Logic flow is: user_Id is provided by a session variable. Query DB table 'user' to find that user. Go to type of user to identify if user is person or venue. Assuming user is venue, go to DB table 'venue' and query table for rating using foreign key from unique/indexed venue_Id from user table.
The query looks like
SELECT rating FROM `venue` WHERE `user_Id` = '$user_Id' AND `type` = 'venue'
Is this possible, and if so, what is the correct way to go about it?

You have a few ways to retrieve this information.
Using JOIN:
SELECT v.rating
FROM venue v INNER JOIN user u
ON v.venue_id= u.venue_id
AND u.`user_Id` = '$user_Id' AND u.`type` = 'venue'
Using an IN sub-query
SELECT rating
FROM venue
WHERE venue_id IN (SELECT venue_id FROM user
WHERE `user_Id` = '$user_Id' AND `type` = 'venue')
BTW, you should consider protecting your code from potential SQL injection.

It's a bit unclear the way you explained it.
From what I understand, there are 2 tables, User and Venue.
In the User table you have: user_id, venue_id, name, type.
In the Venue table you have: venue_id, rating.
You expect to get rating (Venue table) while filtering in the WHERE clause on user_id and type, which are both stored in the User table.
Your query:
SELECT rating FROM venue WHERE user_Id = '$user_Id' AND type = 'venue'
It cannot work as written, because you are selecting from the venue table while user_id and type are not columns of venue. Those columns are unknown to the query even though the tables are linked by the FK, because an FK only enforces a constraint between the parent and child tables; it does not join them for you.
The query should be something like this:
SELECT rating FROM venue v JOIN user u on v.venue_id = u.venue_id WHERE u.user_Id = '$user_Id' AND u.type = 'venue'
Correct me if I am wrong.

Combining rows from two tables based on the tables having columns with equal values is called an equi-join operation. It's the pattern we typically use to "follow" foreign key relationships.
As an example:
$sql = "SELECT v.rating
FROM `venue` v
JOIN `user` s
ON s.venue_Id = v.venue_Id
AND s.`type` = 'venue'
WHERE s.`user_Id` = '" . mysqli_real_escape_string($con, $user_Id) . "'";
This isn't the only pattern, there are several other query forms that will return an equivalent result.
As an example of using an EXISTS predicate:
$sql = "SELECT v.rating
FROM `venue` v
WHERE EXISTS
( SELECT 1
FROM `user` s
WHERE s.venue_Id = v.venue_Id
AND s.`type` = 'venue'
AND s.`user_Id` = '"
. mysqli_real_escape_string($con, $user_Id)
. "'
)";
The original query appears to be vulnerable to SQL injection; the example queries demonstrate the use of the mysqli_real_escape_string function to "escape" unsafe values and make them safe to include in SQL text. (That function is only appropriate if you are using the mysqli interface.) Using prepared statements with bind placeholders is another approach.
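The bind-placeholder approach mentioned above can be sketched as follows. This is an illustration in Python with the standard-library sqlite3 module rather than PHP and mysqli; the function name and the sample rows are assumptions, while the table and column names follow the question.

```python
import sqlite3

def venue_rating(conn, user_id):
    """Return the rating of the venue linked to user_id, or None if there is none."""
    row = conn.execute(
        "SELECT v.rating FROM venue v "
        "JOIN user u ON u.venue_Id = v.venue_Id "
        "WHERE u.user_Id = ? AND u.type = 'venue'",
        (user_id,),  # bound as data, never spliced into the SQL text
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venue (venue_Id INTEGER PRIMARY KEY, rating REAL);
    CREATE TABLE user (
        user_Id INTEGER PRIMARY KEY, name TEXT, type TEXT, venue_Id INTEGER UNIQUE
    );
    INSERT INTO venue (venue_Id, rating) VALUES (10, 4.5);
    INSERT INTO user (user_Id, name, type, venue_Id) VALUES
        (1, 'The Blue Room', 'venue', 10),
        (2, 'Alice', 'person', NULL);
""")
rating = venue_rating(conn, 1)
no_rating = venue_rating(conn, 2)
injected = venue_rating(conn, "1' OR '1'='1")
```

Because the value travels separately from the SQL text, the injection attempt in the last call is compared as a plain string and matches nothing.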

Mysql update one table column based on another table Large amount of data

I am stuck trying to update one column of a table by comparing it with another table in PHP/MySQL. I have tried to speed up the process by indexing the table columns, optimizing the query, etc., but have been unable to speed it up.
In my PHP-based application there are two tables (table A and table B). I want to update one column of table A by comparing it with table B (on two columns, name and sku).
Previously this process took at most 15 minutes to update 28k products. But now both tables have 60k rows, and it takes more than two hours. I used the queries below:
mysql_query("UPDATE tableA a
JOIN tableB b ON a.product_code_sku = b.sku
SET a.is_existing_product = '1'") or die(mysql_error());
mysql_query("UPDATE tableA a
JOIN tableB b ON a.product_name = b.product_name
SET a.is_existing_product = '1'") or die(mysql_error());
The queries above were very slow, so I changed the update process as shown below:
$query_result = mysql_query("SELECT t1.`id`,t2.`product_id` FROM `tableA` t1,
`tableB` t2 where (t1.product_code_sku = t2.sku
or t1.product_name = t2.product_name)") or die (mysql_error());
while($result_row = mysql_fetch_array($query_result))
{
mysql_query("UPDATE `tableA` SET is_existing_product = '1'
where id = '".$result_row['id']."' ") or die (mysql_error());
}
But all of my efforts were in vain.
Please advise me on how to make the process faster.

Your first update query and your second update query are doing two different things. The second query is slower because you are using an OR for the comparison.
You can consider creating a temporary table to compare and insert into, then updating tableA from it.
First of all, you should examine the execution plan for the two join queries, like
desc select a.id
from tableA a
join tableB b ON a.product_code_sku = b.sku;
If this is the reason why the update is slow, you should optimize the query.
Otherwise, you can try the below:
For instance (assuming id is the primary key),
-- make sure the columns are of the same data type
create table tmp_sku (
id .. -- just the primary key, make sure it uses the same data type as in tableA
);
-- do an insert into this temporary table
insert into tmp_sku select a.id
from tableA a
join tableB b ON a.product_code_sku = b.sku;
-- now we have the list of matches,
-- then do an insert .. on duplicate key update
-- by comparing the primary id
insert into tableA (id, is_existing_product)
select tmp_sku.id, 1 from tmp_sku
on duplicate key update is_existing_product = 1;
-- repeat for the product name
truncate tmp_sku;
insert into tmp_sku
select a.id
from tableA a
join tableB b ON a.product_name = b.product_name;
-- repeat the insert .. on duplicate key update
insert into tableA (id, is_existing_product)
select tmp_sku.id, 1 from tmp_sku
on duplicate key update is_existing_product = 1;
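As a point of comparison with the row-by-row PHP loop in the question, the flagging can also be expressed as one set-based statement. The sketch below uses Python's standard-library sqlite3 module as a stand-in for MySQL; the index names and sample rows are made up, and whether this form is faster on a real MySQL server depends on the indexes and on matching column types and collations, so treat it as something to measure rather than a guaranteed fix.

```python
import sqlite3

def flag_existing_products(conn):
    """Flag every tableA row whose SKU or name also appears in tableB."""
    with conn:  # a single statement in a single transaction
        cur = conn.execute(
            "UPDATE tableA SET is_existing_product = 1 "
            "WHERE product_code_sku IN (SELECT sku FROM tableB) "
            "   OR product_name IN (SELECT product_name FROM tableB)"
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (
        id INTEGER PRIMARY KEY, product_code_sku TEXT, product_name TEXT,
        is_existing_product INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE tableB (product_id INTEGER PRIMARY KEY, sku TEXT, product_name TEXT);
    -- Indexes on the lookup columns let each IN probe avoid a full scan.
    CREATE INDEX idx_b_sku ON tableB (sku);
    CREATE INDEX idx_b_name ON tableB (product_name);
    INSERT INTO tableA (id, product_code_sku, product_name) VALUES
        (1, 'S1', 'Apple'), (2, 'S9', 'Pear'), (3, 'S7', 'Kiwi');
    INSERT INTO tableB (sku, product_name) VALUES ('S1', 'Apple X'), ('S2', 'Pear');
""")
flagged = flag_existing_products(conn)
flags = conn.execute("SELECT id, is_existing_product FROM tableA ORDER BY id").fetchall()
```

Row 1 matches on SKU and row 2 on name, so both are flagged while row 3 is left alone.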

A complicated mysql join

OK, I have this first table which has, among other things:
table 1: id | depID (every id has one depID)
Then I have a second table, table 2: userID | depID (where a userID is associated with multiple depIDs in separate rows). I also have table 3 with userID | rankID (where a userID is associated with one rankID).
I need to get all id and depID values from table 1, then check which userIDs in table 2 share the same depID (table1.depID = table2.depID), and then check which of those userIDs from table 2 have rankID = $rID.
Thanks guys.

I think this SQL should get you what you want, but I'm not 100% clear from the wording of the question:
SELECT table2.userID
FROM table1
JOIN table2
ON table1.depID = table2.depID
JOIN table3
ON table2.userID = table3.userID
AND table3.rankID = $rID;
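A three-way join of this kind is easy to verify on a few rows. Here is a sketch in Python with the standard-library sqlite3 module standing in for MySQL; the sample data is invented, and the query also returns id and depID from table 1, which the question asked for.

```python
import sqlite3

def users_with_rank(conn, rank_id):
    """Return (id, depID, userID) for users in each department who hold rank_id."""
    return conn.execute(
        "SELECT t1.id, t1.depID, t2.userID "
        "FROM table1 t1 "
        "JOIN table2 t2 ON t2.depID = t1.depID "
        "JOIN table3 t3 ON t3.userID = t2.userID "
        "WHERE t3.rankID = ? "
        "ORDER BY t1.id, t2.userID",
        (rank_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, depID INTEGER);
    CREATE TABLE table2 (userID INTEGER, depID INTEGER);
    CREATE TABLE table3 (userID INTEGER PRIMARY KEY, rankID INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20);
    INSERT INTO table2 VALUES (100, 10), (100, 20), (101, 10);
    INSERT INTO table3 VALUES (100, 5), (101, 6);
""")
matches = users_with_rank(conn, 5)
```

User 100 belongs to both departments and holds rank 5, so it appears once per matching table 1 row; user 101 holds a different rank and is filtered out.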

SQL command to copy selected content from selected rows into other rows?

Any idea how to copy name and content from rows where language_id = 1 to rows where language_id = 2?
What should the SQL command look like?
I want to achieve:

http://dev.mysql.com/doc/refman/5.0/en/insert-select.html is what you need to do

Assuming it is the productid that links the language 1 row to the language 2 row (this is the T-SQL form; MySQL puts the JOIN before SET):
update a set
a.name = b.name,
a.content = b.content
from tablea a
join tablea b on a.productid = b.productid
where a.language_id = 2
and b.language_id = 1
Of course, this will do it for every row in the table, so if you want to restrict it, make sure to filter by the productids.

Did you mean copying all language_id=1 rows to language_id=2 ones?
My knowledge of MySQL syntax is very poor, so I dare not give you all the codez, but at least you may find the following approach useful:
1. Create a temp table with a structure like this:
product_id int,
name (varchar?)
content (varchar?)
That is, include product_id and all the columns you need to copy.
2. Populate the temp table with the language_id=1 data. Probably like this:
INSERT INTO temp_table
SELECT product_id, name, content
FROM orig_table
WHERE language_id = 1
3. Update those rows in the original table where language_id=2 with the corresponding data in the temp table. It may look like this:
UPDATE orig_table
SET
name = temp_table.name,
content = temp_table.content
FROM temp_table
WHERE orig_table.product_id = temp_table.product_id
AND orig_table.language_id = 2
4. Insert the rows from the temp table into the original table for the products that don't have language_id=2. Something like this:
INSERT INTO orig_table (product_id, language_id, name, content)
SELECT product_id, 2, name, content
FROM temp_table
WHERE NOT EXISTS (
SELECT 1 FROM orig_table
WHERE orig_table.product_id = temp_table.product_id
AND orig_table.language_id = 2
)
If you didn't mean to change the already existing language_id=2 data, then step #3 should be omitted, and you might further want to modify step #2 so that it selects language_id=1 data only for the products lacking language_id=2.
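The update-then-insert sequence above can be tried out as a single function. This is a sketch in Python with the standard-library sqlite3 module in place of MySQL; the products table name and sample rows are hypothetical, and correlated subqueries are used instead of a temp table because they run unchanged in SQLite.

```python
import sqlite3

def copy_language(conn, src=1, dst=2):
    """Copy name and content from src-language rows to dst-language rows per product."""
    params = {"src": src, "dst": dst}
    with conn:
        # Overwrite dst rows that already exist and have a src counterpart.
        conn.execute(
            "UPDATE products SET "
            "name = (SELECT s.name FROM products s "
            "        WHERE s.product_id = products.product_id AND s.language_id = :src), "
            "content = (SELECT s.content FROM products s "
            "           WHERE s.product_id = products.product_id AND s.language_id = :src) "
            "WHERE language_id = :dst AND EXISTS ("
            "  SELECT 1 FROM products s "
            "  WHERE s.product_id = products.product_id AND s.language_id = :src)",
            params,
        )
        # Add dst rows for products that have none yet.
        conn.execute(
            "INSERT INTO products (product_id, language_id, name, content) "
            "SELECT s.product_id, :dst, s.name, s.content FROM products s "
            "WHERE s.language_id = :src AND NOT EXISTS ("
            "  SELECT 1 FROM products d "
            "  WHERE d.product_id = s.product_id AND d.language_id = :dst)",
            params,
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER, language_id INTEGER, name TEXT, content TEXT,
        PRIMARY KEY (product_id, language_id)
    );
    INSERT INTO products VALUES
        (1, 1, 'Chair', 'A chair'), (1, 2, 'old', 'old'), (2, 1, 'Table', 'A table');
""")
copy_language(conn)
lang2 = conn.execute(
    "SELECT product_id, name, content FROM products WHERE language_id = 2 ORDER BY product_id"
).fetchall()
```

Dropping the UPDATE statement from the function gives the variant that leaves existing language_id=2 rows untouched and only fills in the missing ones.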
