MySQL Remove/Combine Similar Rows and its references


I have 2 tables: Tags and Post_Tags_relationship.
The Tags table has 3 columns: ID (primary key), Title and URL.
The Post_Tags_relationship table has 2 columns: Tag_ID and Post_ID (the primary key is the combination of both).
The Tags table contains many rows with the same title and URL. I want to delete the duplicate rows and update Post_Tags_relationship so that references to a deleted tag ID point to the remaining one; if that update would cause a duplicate-key error, the row should be removed instead.
So if Tag table has:
ID= 20, Title = News Section, URL = news-section
ID= 68, Title = News Section, URL = news-section
Post_Tags_relationship has:
Post_ID = 56, Tag_ID = 20
Post_ID = 80, Tag_ID = 20
Post_ID = 500, Tag_ID = 68
Post_ID = 584, Tag_ID = 20
Post_ID = 695, Tag_ID = 20
Post_ID = 695, Tag_ID = 68
If we delete ID 20 from the Tags table, Post_Tags_relationship will look like:
Post_ID = 56, Tag_ID = 68
Post_ID = 80, Tag_ID = 68
Post_ID = 500, Tag_ID = 68
Post_ID = 584, Tag_ID = 68
Post_ID = 695, Tag_ID = 68 // duplicate primary key; I want this row to be removed, please.
Post_ID = 695, Tag_ID = 68
I hope this makes sense; please let me know if you have any questions.

Find tag duplicates and store them in a "temporary" table:
drop table if exists tmp_tags_duplicates;
create table tmp_tags_duplicates
select t1.id, min(t0.id) as duplicate_of
from tags t1
join tags t0 using(title, url)
where t1.id > t0.id
group by t1.id;
Find already inserted duplicates in posts_tags table (which need to be deleted). Store them in another "temporary" table:
drop table if exists tmp_to_delete;
create table tmp_to_delete
select pt1.*, d.duplicate_of
from posts_tags pt1
join tmp_tags_duplicates d on d.id = pt1.tag_id
join posts_tags pt0
on pt0.post_id = pt1.post_id
and pt0.tag_id = d.duplicate_of;
Find entries in posts_tags which need to be updated. Store them in a third "temporary" table:
drop table if exists tmp_to_update;
create table tmp_to_update
select pt1.*, d.duplicate_of
from posts_tags pt1
join tmp_tags_duplicates d on d.id = pt1.tag_id
left join posts_tags pt0
on pt0.post_id = pt1.post_id
and pt0.tag_id = d.duplicate_of
where pt0.tag_id is null;
Delete duplicates in posts_tags:
delete pt
from posts_tags pt
join tmp_to_delete t using(post_id, tag_id);
Update tag_id in posts_tags:
update posts_tags pt
join tmp_to_update t using(post_id, tag_id)
set pt.tag_id = t.duplicate_of;
Delete duplicates in the tags table:
delete t
from tags t
join tmp_tags_duplicates using(id);
Delete the "temporary" tables.
drop table tmp_tags_duplicates;
drop table tmp_to_delete;
drop table tmp_to_update;
Demo: http://rextester.com/FUWZG89399
Now define proper UNIQUE and FOREIGN keys, so you won't need to fix it ever again.
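The steps above can be tried end to end on the question's sample data. The following is only a sketch: it uses Python's built-in sqlite3 module in place of MySQL, and it swaps the separate tmp_to_delete / tmp_to_update tables for SQLite's UPDATE OR IGNORE followed by a clean-up DELETE. The function name merge_duplicate_tags and the temporary table tag_map are made up for this illustration; the tags / posts_tags names follow the answer.

```python
import sqlite3


def merge_duplicate_tags(conn):
    """Collapse tags that share (title, url) onto the lowest id (hypothetical helper)."""
    cur = conn.cursor()
    # Map each duplicate tag id to the id that is kept (the smallest one).
    cur.execute("""
        CREATE TEMP TABLE tag_map AS
        SELECT t1.id AS id, MIN(t0.id) AS duplicate_of
        FROM tags t1
        JOIN tags t0 ON t0.title = t1.title AND t0.url = t1.url
        WHERE t1.id > t0.id
        GROUP BY t1.id
    """)
    # Repoint links; rows that would violate the primary key are skipped.
    cur.execute("""
        UPDATE OR IGNORE posts_tags
        SET tag_id = (SELECT duplicate_of FROM tag_map
                      WHERE tag_map.id = posts_tags.tag_id)
        WHERE tag_id IN (SELECT id FROM tag_map)
    """)
    # Anything still pointing at a duplicate tag was skipped above: remove it.
    cur.execute("DELETE FROM posts_tags WHERE tag_id IN (SELECT id FROM tag_map)")
    # Finally remove the duplicate tags themselves.
    cur.execute("DELETE FROM tags WHERE id IN (SELECT id FROM tag_map)")
    cur.execute("DROP TABLE tag_map")
    conn.commit()


# Sample data taken from the question.
tag_conn = sqlite3.connect(":memory:")
tag_conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
    CREATE TABLE posts_tags (post_id INTEGER, tag_id INTEGER,
                             PRIMARY KEY (post_id, tag_id));
    INSERT INTO tags VALUES (20, 'News Section', 'news-section'),
                            (68, 'News Section', 'news-section');
    INSERT INTO posts_tags VALUES (56, 20), (80, 20), (500, 68),
                                  (584, 20), (695, 20), (695, 68);
""")
merge_duplicate_tags(tag_conn)
links = tag_conn.execute(
    "SELECT post_id, tag_id FROM posts_tags ORDER BY post_id").fetchall()
tags_left = tag_conn.execute("SELECT id FROM tags").fetchall()
print(links)
print(tags_left)
```

Like the answer, this keeps the lowest ID (20) rather than 68 as in the question's example. MySQL has a comparable UPDATE IGNORE form, but test it against a copy of the data before relying on it.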

Here is an outline of how I would approach this problem, assuming your table isn't large and the queries aren't expensive:
Select all distinct titles from the Tag table using the DISTINCT keyword. This gives you each title once:
SELECT DISTINCT Title FROM Tag
Loop over the resulting titles and, for each one, query the tags table for all rows with that title. This gives you the set of duplicate rows for that title.
Loop over those rows, replacing each one's ID with the ID you want to keep and, at the same time, replacing that ID in Post_Tags_relationship. All of this can be done with UPDATE statements.
To avoid problems like this in the future use foreign keys. https://www.w3schools.com/sql/sql_foreignkey.asp
Update:
To avoid the duplicate primary key error, keep an array of the post IDs seen so far in the loop; if a post ID is already in the array, delete that record from the table instead of updating it. Something like this:
$post_ids = array();
//...
// Duplicate fields loop
if ( in_array( $pid, $post_ids ) ) {
// The post has the tag already
// Delete this record from table
// ..
} else {
$post_ids[] = $pid;
// Update fields
// ..
}

Related

How to query for many to many relationship between products and filters in MySQL?

I have three tables viz. tb_filters, tb_products, and tb_products_to_filters. The structure of these tables along with some dummy data is given by:
tb_filters:
CREATE TABLE IF NOT EXISTS `tb_filters`
(
`filter_id` INT (11) AUTO_INCREMENT PRIMARY KEY,
`filter_name` VARCHAR (255)
);
INSERT INTO `tb_filters`
(`filter_name`)
VALUES ('USB'),
('High Speed'),
('Wireless'),
('Ethernet');
tb_products:
CREATE TABLE IF NOT EXISTS `tb_products`
(
`product_id` INT (11) AUTO_INCREMENT PRIMARY KEY,
`product_name` VARCHAR (255)
);
INSERT INTO `tb_products`
(`product_name`)
VALUES ('Ohm precision shunt resistor'),
('Orchestrator Libraries'),
('5cm scanner connection'),
('Channel isolated digital'),
('Network Interface Module');
tb_products_to_filters:
CREATE TABLE IF NOT EXISTS `tb_products_to_filters`
(
`id` INT (11) AUTO_INCREMENT PRIMARY KEY,
`product_id` INT (11),
`filter_id` INT (11)
);
INSERT INTO `tb_products_to_filters`
(`product_id`, `filter_id`)
VALUES (1, 1),
(2, 2),
(3, 3),
(4, 3),
(1, 3);
By looking into above "tb_products_to_filters" table, my required queries are:
When filter id = 1 and 3 are selected via checkbox on the page, all those products which belong to filter id 1 as well as filter id 3 must be fetched from the database. In this case, the product with id 1 should come.
Second, when only one filter (say id = 3) is checked, then all those products which fall under this id should be fetched. In this condition, the products id 1, 3 and 4 will come.
If filter id 2 is selected, then only one product with id = 2 will come.
If combination of filter (2 and 3) is selected, then no product will come because there is no product which belongs to both of them.
What is the way of writing queries to obtain above goal?
Please note that I want to include columns: product_id, product_name, filter_id and filter_name to display data in table result set.
EDIT:
The output should match below when filter ids 1 and 3 were checked:
EDIT 2:
I'm trying below query to fetch results when filter 1 and 3 were checked:
SELECT `p`.`product_id`, `p`.`product_name`,
GROUP_CONCAT(DISTINCT `f`.`filter_id` ORDER BY `f`.`filter_id` SEPARATOR ', ') AS filter_id, GROUP_CONCAT(DISTINCT `f`.`filter_name` ORDER BY `f`.`filter_name` SEPARATOR ', ') AS filter_name
FROM `tb_products` AS `p` INNER JOIN `tb_products_to_filters` AS `ptf`
ON `p`.`product_id` = `ptf`.`product_id` INNER JOIN `tb_filters` AS `f`
ON `ptf`.`filter_id` = `f`.`filter_id` GROUP BY `p`.`product_id`
HAVING GROUP_CONCAT(DISTINCT `ptf`.`filter_id` SEPARATOR ', ') = ('1,3')
ORDER BY `p`.`product_id`
But unfortunately, it returns an empty set. Why?

You can use a HAVING clause with COUNT(DISTINCT ...), and GROUP_CONCAT to list the matched filters:
SELECT t.product_id,tp.product_name,
GROUP_CONCAT(t.filter_id) as filter_id,
GROUP_CONCAT(tb.filter_name) as filter_name
FROM tb_products_to_filters t
INNER JOIN tb_filters tb ON(t.filter_id = tb.filter_id)
INNER JOIN tb_products tp ON(t.product_id = tp.product_id)
WHERE t.filter_id IN(1,3)
GROUP BY t.product_id
HAVING COUNT(distinct t.filter_id) = 2
You can adjust this as needed. Note that the number of ids listed inside IN() must equal the X in COUNT(..) = X.
EDIT:
A DISTINCT keyword is required in GROUP_CONCAT while fetching those columns otherwise all the filters would come in the list. I tried it by doing
SELECT t.product_id,tp.product_name,
GROUP_CONCAT(DISTINCT t.filter_id ORDER BY `t`.`filter_id` SEPARATOR ', ') as filter_id,
GROUP_CONCAT(DISTINCT tb.filter_name ORDER BY tb.filter_name SEPARATOR ', ') as filter_name
FROM tb_products_to_filters t
INNER JOIN tb_filters tb ON(t.filter_id = tb.filter_id)
INNER JOIN tb_products tp ON(t.product_id = tp.product_id)
WHERE t.filter_id IN(1,3)
GROUP BY t.product_id
HAVING COUNT(distinct t.filter_id) = 2
But still all the filter names (Ethernet, High Speed, USB, Wireless) are coming in the list. How to list only those filter names whose corresponding filter id (1, 3) are in the string?
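One likely reason the question's own HAVING comparison returns an empty set is that SEPARATOR ', ' produces the string '1, 3', which is not equal to '1,3'. Comparing counts avoids string matching altogether. The sketch below replays the COUNT(DISTINCT ...) approach on the question's sample data using Python's sqlite3 module rather than MySQL; the function name products_with_all_filters is hypothetical, and for brevity it returns only the product columns (filter columns can be added with GROUP_CONCAT as in the answer).

```python
import sqlite3


def products_with_all_filters(conn, filter_ids):
    """Return (product_id, product_name) for products linked to every given filter."""
    wanted = sorted(set(filter_ids))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.product_id, p.product_name
        FROM tb_products p
        JOIN tb_products_to_filters ptf ON ptf.product_id = p.product_id
        WHERE ptf.filter_id IN ({placeholders})
        GROUP BY p.product_id, p.product_name
        HAVING COUNT(DISTINCT ptf.filter_id) = ?
        ORDER BY p.product_id
    """
    # The last parameter is the number of filters that must all be present.
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()


# Sample data taken from the question.
filter_conn = sqlite3.connect(":memory:")
filter_conn.executescript("""
    CREATE TABLE tb_filters (filter_id INTEGER PRIMARY KEY, filter_name TEXT);
    CREATE TABLE tb_products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE tb_products_to_filters (
        id INTEGER PRIMARY KEY, product_id INTEGER, filter_id INTEGER);
    INSERT INTO tb_filters VALUES
        (1, 'USB'), (2, 'High Speed'), (3, 'Wireless'), (4, 'Ethernet');
    INSERT INTO tb_products VALUES
        (1, 'Ohm precision shunt resistor'), (2, 'Orchestrator Libraries'),
        (3, '5cm scanner connection'), (4, 'Channel isolated digital'),
        (5, 'Network Interface Module');
    INSERT INTO tb_products_to_filters (product_id, filter_id) VALUES
        (1, 1), (2, 2), (3, 3), (4, 3), (1, 3);
""")
print(products_with_all_filters(filter_conn, [1, 3]))
print(products_with_all_filters(filter_conn, [2, 3]))
```

Passing the ids as bound parameters also keeps the checkbox values out of the SQL string itself.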

SQL Query: do I need two queries or can I use a nested subquery

This question may have been asked before but I don't really know what verbiage to search with.
I have a MySQL DB that has a table with 3 columns [ID, fieldName and fieldValue] that is used to describe attributes of objects in another table. The ID field stores the foreign key of the object in the other table, and fieldName and fieldValue store things like title, description, file size and summary.
I am trying to write a query that returns rows where a fieldName and fieldValue pair match known values and the returned row's ID has another distinct fieldValue in another row. Right now I am accomplishing it with two queries and an if statement. Here is the pseudocode:
$result = SELECT * FROM table_a WHERE fieldName = 'title' and fieldValue = 'someTitle'
$test = SELECT * FROM table_a WHERE fieldValue = 'someValue' and id = '{$result['id']}'
if ($test) {
/* Result Found */
}

You can self-join the table:
SELECT * FROM table_a AS s1
JOIN table_a AS s2 USING (id)
WHERE
s1.fieldName = 'title' AND s1.fieldValue = 'someTitle'
AND s2.fieldValue = 'someValue'

What you said translated in sql would be:
SELECT b.*
FROM table_a a
INNER JOIN table_a b ON a.id = b.id
WHERE a.fieldName = 'title'
AND a.fieldValue = 'someTitle'
AND a.fieldValue <> b.fieldValue
This gets you the rows in table_a that have the same id as the row with your predefined values, but with a different fieldValue. This assumes that id is not the primary key (otherwise there would be no other row with the same id), which appears to be the case from your question. (If you want to check for a specific value, you can use AND b.fieldValue = 'someValue' as the last line.)
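The self-join from the answers above can be checked on a few made-up rows. This is a sketch using Python's sqlite3 module instead of MySQL; the sample rows and the function name ids_with_both are invented for illustration, while table_a, fieldName and fieldValue come from the question.

```python
import sqlite3


def ids_with_both(conn, field_name, field_value, other_value):
    """Ids that have the (field_name, field_value) pair and, in another row, other_value."""
    sql = """
        SELECT DISTINCT s1.id
        FROM table_a AS s1
        JOIN table_a AS s2 ON s2.id = s1.id
        WHERE s1.fieldName = ? AND s1.fieldValue = ?
          AND s2.fieldValue = ?
        ORDER BY s1.id
    """
    return [row[0] for row in conn.execute(sql, (field_name, field_value, other_value))]


# Invented sample rows: only id 1 has both the title and the other value.
eav_conn = sqlite3.connect(":memory:")
eav_conn.executescript("""
    CREATE TABLE table_a (id INTEGER, fieldName TEXT, fieldValue TEXT);
    INSERT INTO table_a VALUES
        (1, 'title', 'someTitle'), (1, 'summary', 'someValue'),
        (2, 'title', 'someTitle'), (2, 'summary', 'somethingElse'),
        (3, 'title', 'otherTitle'), (3, 'summary', 'someValue');
""")
matching_ids = ids_with_both(eav_conn, 'title', 'someTitle', 'someValue')
print(matching_ids)
```

This replaces the two round trips and the if statement from the question with a single query.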

MYSQL working slowly as query with subquery than 2 queries (+php)

I have a table (about 80,000 rows) that looks like this:
id, parentId, col1, col2, col3...
1, null, 'A', 'B', 'C'
2, 1, ...
3, 1, ...
4, null, ...
5, 4, ...
(only one level of parent-child nesting)
and I need to get all dependent rows:
SELECT ...
FROM table
WHERE id = :id OR parentId = :id OR id IN (
SELECT parentId
FROM table
WHERE id = :id
)
But why is this query slower than two queries, where I fetch parentId in PHP first?
$t = executeQuery('SELECT parentId FROM table WHERE id = :id;', $id);
if ($t) {
$id = $t;
}
$t = executeQuery('SELECT * FROM table WHERE id = :id OR parentId = :id ORDER BY id;', $id);
PS: the maximum number of dependent rows is < 70
PPS: EXPLAIN output for the single query:
| id | select_type        | table   | type  | possible_keys    | key     | key_len | ref   | rows  | Extra       |
| 1  | PRIMARY            | product | ALL   | PRIMARY,parentId | NULL    | NULL    | NULL  | 73415 | Using where |
| 2  | DEPENDENT SUBQUERY | product | const | PRIMARY,parentId | PRIMARY | 4       | const | 1     |             |

Replace the IN with an equals sign (=):
SELECT ...
FROM table
WHERE id = :id OR parentId = :id OR id = (
SELECT parentId
FROM table
WHERE id = :id
)
or change it to a join:
SELECT ...
FROM table
inner join (
SELECT parentId
FROM table
WHERE id = :id
) s on s.parentID = table.id or s.parentID = table.parentID

In the first case, MySQL needs to create an intermediate result, store it in memory, and then iterate over it to find all the relevant ids in the table. In the second case, assuming you have created indexes on id and parentId, it simply goes straight to the index, finds the relevant rows, and sends the result back immediately.

UNION works faster for this case.
This allows the first query to use an index-merge union, while the second just uses an inner join; the results are then merged.
SELECT *
FROM `table`
WHERE id = :id OR parentId = :id
UNION
SELECT t1.*
FROM `table` t1 JOIN `table` t2 ON t2.parentId = t1.id AND t2.id = :id
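To see which rows the UNION form returns, here is a small sketch on the question's sample layout, run with Python's sqlite3 module rather than MySQL (so it says nothing about MySQL's query plan or timing). The table name product is borrowed from the EXPLAIN output in the question; the function name family_rows is made up.

```python
import sqlite3


def family_rows(conn, row_id):
    """Rows matching id or parentId, plus the parent of the given row, via UNION."""
    sql = """
        SELECT id, parentId
        FROM product
        WHERE id = :id OR parentId = :id
        UNION
        SELECT t1.id, t1.parentId
        FROM product t1
        JOIN product t2 ON t2.parentId = t1.id
        WHERE t2.id = :id
        ORDER BY id
    """
    return conn.execute(sql, {"id": row_id}).fetchall()


# One-level parent/child data shaped like the question's example.
tree_conn = sqlite3.connect(":memory:")
tree_conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, parentId INTEGER);
    CREATE INDEX idx_product_parent ON product (parentId);
    INSERT INTO product VALUES (1, NULL), (2, 1), (3, 1), (4, NULL), (5, 4);
""")
print(family_rows(tree_conn, 1))
print(family_rows(tree_conn, 2))
```

Note that, like the original single query, this returns a child together with its parent but not the child's siblings; the two-step PHP version in the question does return siblings because it re-queries using the parent's id.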

An EXPLAIN might shed some more light on the problem for you.
Look into EXISTS, or rewriting your query as a JOIN.

It's a long shot, but in the first case you have an IN in the WHERE part of the query. Maybe MySQL tries to optimize the query as if there could be multiple values, whereas in the second case there is no IN, so the compiled query is more straightforward for the database and uses the indexes better.
For two queries on the same connection, the overhead of running them separately should be minimal and irrelevant in this case. Also, subqueries in general are not optimized very well by the query planner. Try using a JOIN instead (if possible).

SQL command to copy selected content from selected rows into other rows?

Any idea how to copy name and content from rows where language_id = 1 to rows where language_id = 2?
What should the SQL command look like?
I want to achieve:

http://dev.mysql.com/doc/refman/5.0/en/insert-select.html is what you need to do

Assuming productid is the column that links the language 1 and language 2 rows:
update a set
a.name = b.name,
a.content = b.content
from tablea a
join tablea b on a.productid = b.productid
where a.language_id = 2
and b.language_id = 1
Of course, this will do it for every row in the table, so if you want to restrict it, make sure to filter by the productids.

Did you mean copying all language_id=1 rows to language_id=2 ones?
My knowledge of MySQL syntax is very poor, so I dare not give you all the codez, but at least you may find the following approach useful:
Create a temp table with the structure like this:
product_id int,
name (varchar?)
content (varchar?)
That is, include product_id and all the columns you need to copy.
Populate the temp table with the language_id=1 data. Probably like this:
INSERT INTO temp_table
SELECT product_id, name, content
FROM orig_table
WHERE language_id = 1
Update those rows in the original table where language_id=2 with the corresponding data in the temp table. It may look like this:
UPDATE orig_table
SET
name = temp_table.name,
content = temp_table.content
FROM temp_table
WHERE orig_table.product_id = temp_table.product_id
AND orig_table.language_id = 2
Insert the rows from the temp table into the original table, where the products don't have language_id=2. Something like this:
INSERT INTO orig_table (product_id, language_id, name, content)
SELECT product_id, 2, name, content
FROM temp_table
WHERE NOT EXISTS (
SELECT 1 FROM orig_table
WHERE product_id = temp_table.product_id
AND language_id = 2
)
If you didn't mean to change the already existing language_id=2 data, then step #3 should be omitted and you might further want to modify step #2 in such a way that it selected language_id=1 data only for the products lacking language_id=2.
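The update-then-insert approach described above can be sketched without a temporary table. Note that UPDATE ... FROM is SQL Server / PostgreSQL style; MySQL expresses the same thing as a multi-table UPDATE (UPDATE t a JOIN t b ON ... SET ...). The sketch below sidesteps the dialect difference by using correlated subqueries, run through Python's sqlite3 module. The function name copy_language and the sample rows are invented; orig_table and its columns follow the answer, and a (product_id, language_id) primary key is assumed.

```python
import sqlite3


def copy_language(conn, src_lang, dst_lang):
    """Copy name/content from src_lang rows to dst_lang rows, per product."""
    params = {"src": src_lang, "dst": dst_lang}
    # Overwrite existing destination rows that have a source counterpart.
    conn.execute("""
        UPDATE orig_table
        SET name = (SELECT s.name FROM orig_table s
                    WHERE s.product_id = orig_table.product_id
                      AND s.language_id = :src),
            content = (SELECT s.content FROM orig_table s
                       WHERE s.product_id = orig_table.product_id
                         AND s.language_id = :src)
        WHERE language_id = :dst
          AND EXISTS (SELECT 1 FROM orig_table s
                      WHERE s.product_id = orig_table.product_id
                        AND s.language_id = :src)
    """, params)
    # Create destination rows for products that do not have one yet.
    conn.execute("""
        INSERT INTO orig_table (product_id, language_id, name, content)
        SELECT s.product_id, :dst, s.name, s.content
        FROM orig_table s
        WHERE s.language_id = :src
          AND NOT EXISTS (SELECT 1 FROM orig_table d
                          WHERE d.product_id = s.product_id
                            AND d.language_id = :dst)
    """, params)
    conn.commit()


lang_conn = sqlite3.connect(":memory:")
lang_conn.executescript("""
    CREATE TABLE orig_table (product_id INTEGER, language_id INTEGER,
                             name TEXT, content TEXT,
                             PRIMARY KEY (product_id, language_id));
    INSERT INTO orig_table VALUES
        (1, 1, 'Apple', 'Fruit'), (1, 2, '', ''),
        (2, 1, 'Pear', 'Also fruit'),
        (3, 2, 'Keep', 'Me');
""")
copy_language(lang_conn, 1, 2)
lang2_rows = lang_conn.execute(
    "SELECT product_id, name, content FROM orig_table "
    "WHERE language_id = 2 ORDER BY product_id").fetchall()
print(lang2_rows)
```

Product 3 has no language 1 row, so its language 2 row is left untouched.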

Removing duplicate field entries in SQL

Is there any way I can erase all the duplicate entries from a certain table (users)? Here is a sample of the type of entries I have. The users table consists of 3 fields: ID, user, and pass.
mysql_query("DELETE FROM users WHERE ???") or die(mysql_error());
randomtest
randomtest
randomtest
nextfile
baby
randomtest
dog
anothertest
randomtest
baby
nextfile
dog
anothertest
randomtest
randomtest
I want to be able to find the duplicate entries, and then delete all of the duplicates, and leave one.

You can solve it with only one query.
If your table has the following structure:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`username` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
you could do something like this (it deletes all duplicate users based on username, removing every row whose ID is greater than the smallest ID for that username):
DELETE users
FROM users INNER JOIN
(SELECT MIN(id) as id, username FROM users GROUP BY username) AS t
ON users.username = t.username AND users.id > t.id
It works, and I've already used something similar to delete duplicates.
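The keep-the-smallest-ID rule can be verified on a handful of rows. This sketch uses Python's sqlite3 module, where a NOT IN subquery on the table being deleted from is accepted; MySQL rejects that form for the target table, which is why the answer above uses a join against a derived table. The function name delete_duplicate_users and the sample usernames are made up.

```python
import sqlite3


def delete_duplicate_users(conn):
    """Delete every row except the lowest id per username; return rows removed."""
    cur = conn.execute("""
        DELETE FROM users
        WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY username)
    """)
    conn.commit()
    return cur.rowcount


dedupe_conn = sqlite3.connect(":memory:")
dedupe_conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    INSERT INTO users (username) VALUES
        ('joe'), ('bob'), ('jane'), ('bob'), ('bob'), ('jane');
""")
removed = delete_duplicate_users(dedupe_conn)
survivors = dedupe_conn.execute(
    "SELECT id, username FROM users ORDER BY id").fetchall()
print(removed, survivors)
```

Running a SELECT with the same condition first is a cheap way to preview which rows would be deleted.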

You can do it with three SQL statements:
create table tmp as select distinct name from users;
drop table users;
alter table tmp rename users;

This delete script (SQL Server syntax) should work:
DELETE FROM Users
WHERE ID NOT IN (
SELECT MIN(ID)
FROM Users
GROUP BY User
)

I assume that you have a structure like the following:
users
-----------------
| id | username |
-----------------
| 1 | joe |
| 2 | bob |
| 3 | jane |
| 4 | bob |
| 5 | bob |
| 6 | jane |
-----------------
The temporary table is needed because MySQL cannot use a subquery on the DELETE statement's own target table.
CREATE TEMPORARY TABLE IF NOT EXISTS users_to_delete (id INTEGER);
INSERT INTO users_to_delete (id)
SELECT MIN(u1.id) as id
FROM users u1
INNER JOIN users u2 ON u1.username = u2.username
GROUP BY u1.username;
DELETE FROM users WHERE id NOT IN (SELECT id FROM users_to_delete);
I know the query is a bit hairy but it does the work, even if the users table has more than 2 columns.

You need to be careful about how the data in your table is used. If this really is a users table, there are likely other tables with FKs pointing to the ID column, in which case you need to update those tables to use the ID you have chosen to keep.
If it's just a standalone table (no other table references it):
CREATE TEMPORARY TABLE Tmp (ID int);
INSERT INTO Tmp SELECT MIN(ID) FROM Users GROUP BY User;
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Tmp);
If the users table is linked from other tables:
Create temporary tables, including a link table that maps each old ID to the new ID that other tables should reference instead.
CREATE TEMPORARY TABLE Keep (ID int, User varchar(45));
CREATE TEMPORARY TABLE Remove (OldID int, NewID int);
INSERT INTO Keep SELECT MIN(ID), User FROM Users GROUP BY User;
INSERT INTO Remove SELECT u1.ID, u2.ID FROM Users u1 INNER JOIN Keep u2 ON u2.User = u1.User WHERE u1.ID <> u2.ID;
Go through any tables which reference your users table and update their FK column (likely called UserID) to point to the New unique ID which you have selected, like so...
UPDATE MYTABLE t INNER JOIN Remove r ON t.UserID = r.OldID
SET t.UserID = r.NewID;
Finally go back to your users table and remove the no longer referenced duplicates:
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Keep);
Clean up the temporary tables:
DROP TABLE Keep;
DROP TABLE Remove;

A very simple solution would be to set a UNIQUE index on the column that should hold unique values. Note that you subsequently cannot insert the same key twice.
Edit: My mistake, I hadn't read that last line: "I want to be able to find the duplicate entries".
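Once the existing duplicates are gone, a UNIQUE index stops new ones from being inserted (creating the index while duplicates still exist fails). The following sketch shows the effect using Python's sqlite3 module rather than MySQL; the index name ux_users_username and the helper try_insert are invented for illustration.

```python
import sqlite3

unique_conn = sqlite3.connect(":memory:")
unique_conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
# The index can only be created while the column holds no duplicates.
unique_conn.execute(
    "CREATE UNIQUE INDEX ux_users_username ON users (username)")
unique_conn.execute("INSERT INTO users (username) VALUES ('bob')")


def try_insert(conn, username):
    """Insert a username; return False if the UNIQUE index rejects it."""
    try:
        conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False


second_bob = try_insert(unique_conn, 'bob')
first_jane = try_insert(unique_conn, 'jane')
user_count = unique_conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(second_bob, first_jane, user_count)
```

In application code the rejected insert surfaces as an error, so the caller has to decide whether to ignore it or report it to the user.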

I would get all the results, put them in an array of IDs and VALUES. Use a PHP function to work out the dupes, log all the IDs in an array, and use those values to delete the records.

I don't know your db schema, but the simplest solution seems to be to do a SELECT DISTINCT on that table, keep the result in a variable (e.g. an array), delete all records from the table, and then reinsert the list returned by the SELECT DISTINCT.

The temporary table is an excellent solution, but I'd like to provide a SELECT query that grabs duplicate rows from the table as an alternative:
SELECT * FROM `users` INNER JOIN (
SELECT `name`, COUNT(`name`) AS `count`
FROM `users` GROUP BY `name`
) AS `grouped`
ON `grouped`.`name` = `users`.`name`
WHERE `grouped`.`count` > 1

Select the columns that define a duplicate in your table structure, group by them, and keep only the groups that occur more than once; adjust the condition to your requirements.
SELECT user.username, COUNT(*) AS occurrences
FROM user AS user
GROUP BY user.username
HAVING COUNT(*) > 1;

Every answer above and/or below didn't work for me, therefore I decided to write my own little script. It's not the best, but it gets the job done.
Comments are included throughout, but this script is customized for my needs, and I hope the idea helps you.
I basically wrote the database contents to a temp file, called the temp file, applied the function to the called file to remove the duplicates, truncated the table, and then input the data right back into the SQL. Sounds like a lot, I know.
If you're confused as to what $setprofile is, it's a session that's created upon logging into my script (to establish a profile), and is cleared upon logging out.
<?php
// session and includes, you know the drill.
session_start();
include_once('connect/config.php');
// create a temp file with session id and current date
$datefile = date("m-j-Y");
$file = "temp/$setprofile-$datefile.txt";
$f = fopen($file, 'w'); // Open in write mode
// call the user and pass via SQL and write them to $file
$sql = mysql_query("SELECT * FROM _$setprofile ORDER BY user DESC");
while($row = mysql_fetch_array($sql))
{
$user = $row['user'];
$pass = $row['pass'];
$accounts = "$user:$pass "; // the white space right here is important, it defines the separator for the dupe check function
fwrite($f, $accounts);
}
fclose($f);
// **** Dupe Function **** //
// removes duplicate substrings between the separator
function uniqueStrs($separator, $str) {
// convert string to an array using $separator as the delimiter
$str_arr = explode($separator, $str);
// remove duplicate array values
$result = array_unique($str_arr);
// convert array back to string, using the same separator to glue it back
$unique_str = implode($separator, $result);
// return the unique string
return $unique_str;
}
// **** END Dupe Function **** //
// call the list we made earlier, so we can use the function above to remove dupes
$str = file_get_contents($file);
// separator
$separator = ' ';
// use the function to save a unique string
$new_str = uniqueStrs($separator, $str);
// empty the table
mysql_query("TRUNCATE TABLE _$setprofile") or die(mysql_error());
// prep for SQL by replacing test:test with ('test','test'), etc.
// this isn't a sufficient way of converting, as i said, it works for me.
$patterns = array("/([^\s:]+):([^\s:]+)/", "/\s++\(/");
$replacements = array("('$1', '$2')", ", (");
// insert the values into your table, and presto! no more dupes.
$sql = 'INSERT INTO `_'.$setprofile.'` (`user`, `pass`) VALUES ' . preg_replace($patterns, $replacements, $new_str) . ';';
$product = mysql_query($sql) or die(mysql_error()); // put $new_str here so it will replace new list with SQL formatting
// if all goes well.... OR wrong? :)
if($product){ echo "Completed!";
} else {
echo "Failed!";
}
unlink($file); // delete the temp file/list we made earlier
?>

This will work:
create table tmp like users;
insert into tmp select distinct name from users;
drop table users;
alter table tmp rename users;

If you have a Unique ID / Primary key on the table then:
DELETE FROM MyTable AS T1
WHERE MyID <
(
SELECT MAX(MyID)
FROM MyTable AS T2
WHERE T2.Col1 = T1.Col1
AND T2.Col2 = T1.Col2
... repeat for all columns to consider duplicates ...
)
If you don't have a unique key, select all distinct rows into a temporary table, delete all original rows, and copy them back from the temporary table. This will be problematic if you have foreign keys referring to this table.
