Find duplicate rows in mysql simplified example - php

Confession: a MySQL newb needs a simple example to locate duplicate rows in a somewhat large table. I have searched for and read many other threads with similar titles, but the examples are so complex that I cannot apply them to my basic situation.
A MySQL table has only 5 fields, but there are hundreds of rows. I wish to locate duplicate rows -- I know there is one for sure and wonder if there are others.
Example Row: (rel_id is auto-incrementing, primary key field)
'rel_id' => 1
'host' => 17
'host_type' => 'client'
'rep' => 7
'rep_type' => 'cli_mgr'
My approach was to:
1. Read entire table into mysql query
2. row-by-row compare the 4 data fields to those of previous ("done") rows
3. after comparing a "new" row, append it to array of "done" rows
Here is what I have tried. I am sure that there must be a much simpler solution. You will see that I am bogged down in trying to append the "new" row to the array of "done" rows:
$rRels = mysql_query("SELECT * FROM `rels`");
$a = array();
$e = array();
$c1 = 0;
$c2 = 0;
While ($r = mysql_fetch_assoc($rRels)) {
    $i = $r['rel_id'];
    $h = $r['host'];
    $ht = $r['host_type'];
    $r = $r['rep'];
    $rt = $r['rep_type'];
    foreach($a as $row) {
        $xh = $row['host'];
        $xht = $row['host_type'];
        $xr = $row['rel'];
        $xrt = $row['rel_type'];
        if (($h==$xh) && ($ht==$xht) && ($r==$xr) && ($rt==$xrt)) {
            echo 'Found one<br>';
            $e[] = $r;
        }
        $c2++;
    }
    $a = array_merge(array('rel_id'=>$i, 'host'=>$h, 'host_type'=>$ht, 'rep'=>$r, 'rep_type'=>$rt), $a);
    $c1++;
}
echo '<h3>Duplicate Rows:</h3>';
foreach ($e as $row) {
    print_r($row);
    echo '<br>';
}
echo '<br><br>';
echo 'Counter 1: ' . $c1 . '<br>';
echo 'Counter 2: ' . $c2 . '<br>';

This should do the trick:
SELECT COUNT(*) as cnt, GROUP_CONCAT(rel_id) AS ids
FROM rels
GROUP BY host, host_type, rep, rep_type
HAVING cnt > 1
Any group of "duplicate" records will have a cnt > 1, and GROUP_CONCAT will give you the rel_ids of the duplicated records.
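If the rows are already loaded in PHP, the same grouping can be done there without the nested loop from the question. This is only a minimal sketch, assuming each row is an associative array with the five fields shown in the question; the function name find_duplicate_groups and the sample rows are made up for illustration:

```php
<?php
// Hypothetical helper (not from the thread): groups rows by the four data
// fields and returns the rel_ids of every group that occurs more than once.
function find_duplicate_groups(array $rows): array
{
    $groups = [];
    foreach ($rows as $row) {
        // json_encode of the four values gives an unambiguous composite key.
        $key = json_encode([$row['host'], $row['host_type'], $row['rep'], $row['rep_type']]);
        $groups[$key][] = $row['rel_id'];
    }
    // Keep only the groups with more than one rel_id, i.e. the duplicates.
    return array_values(array_filter($groups, function (array $ids): bool {
        return count($ids) > 1;
    }));
}

// Made-up sample data: rows 1 and 3 share the same four data fields.
$rows = [
    ['rel_id' => 1, 'host' => 17, 'host_type' => 'client', 'rep' => 7, 'rep_type' => 'cli_mgr'],
    ['rel_id' => 2, 'host' => 18, 'host_type' => 'client', 'rep' => 7, 'rep_type' => 'cli_mgr'],
    ['rel_id' => 3, 'host' => 17, 'host_type' => 'client', 'rep' => 7, 'rep_type' => 'cli_mgr'],
];
$duplicates = find_duplicate_groups($rows);
print_r($duplicates);
```

Each row is visited once, so the work grows linearly with the table size. For a table that lives in MySQL anyway, the GROUP BY query is still the simpler route.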

Pure SQL (no PHP) solution: first make an empty copy of the old table (here named oldTable):
create table newTable like oldTable;
Then add a unique key over the four data columns to prevent duplicates. Leave rel_id out: it is the auto-increment primary key, so including it would make every row unique.
alter table newTable add unique index(host,host_type,rep,rep_type);
Then copy the rows from oldTable with an SQL query; IGNORE skips rows that would violate the unique key:
insert IGNORE into newTable select * from oldTable;
newTable now holds only the unique data.
Another option is GROUP BY. To get the number of rows in each group of duplicates, use:
select concat_ws('_',host,host_type,rep,rep_type) as str, count(*) as cnt
from oldTable
group by str
having cnt > 1;

You can use this query to find all the duplicate rows. Hopefully, it should be easy to integrate into your PHP code.
-- This will give you all the duplicates
-- Self join where all the columns have the same values but different primary keys
SELECT *
FROM rels t1, rels t2
WHERE t1.rel_id != t2.rel_id
AND t1.host = t2.host
AND t1.host_type = t2.host_type
AND t1.rep = t2.rep
AND t1.rep_type = t2.rep_type

Finding duplicates is more easily done in SQL than in PHP.
SELECT GROUP_CONCAT(rel_id)
FROM rels
GROUP BY host, host_type, rep, rep_type HAVING COUNT(rel_id)>1;
This will show the groups of rel_id that point to identical records. The HAVING COUNT(rel_id)>1 clause allows us to skip unduplicated records.

Related

php mysql updating a column only where there are duplicates in an if statement

I have a problem I can't seem to solve. I'm fairly new to programming, so it could be really easy, but I couldn't find anything. I have a big array that I loop through, and if there are duplicates, they need to be set to false in a column called address_correct. Here is my code:
$sql1 = "SELECT street, entity_id FROM adresfix";
$arraycorrect = [];
$arrayincorrect = [];
$i = 0;
$db_query = $db->query($sql1);
$address_rows = $db_query->fetch_all();
var_dump($address_rows);
foreach ($address_rows as $address) {
    if (in_array($address_rows[$i][0], $arraycorrect)) {
        echo "Good";
        array_push($arrayincorrect, $address_rows[$i][0]);
        $sql = "UPDATE adresfix SET address_correct = 'false' WHERE";
    } else {
        array_push($arraycorrect, $address_rows[$i][0]);
        echo "False";
    }
    $i++;
}
I have no idea what I should put after WHERE in the SQL query to make sure only the duplicates are updated. Can anybody help me with this problem?
There are multiple ways to achieve this, but I am not sure what your end goal is. The easiest way to remove duplicates is to create a new table and insert all unique records into it, like this:
CREATE TABLE address_unique LIKE adresfix;
INSERT INTO address_unique
SELECT * FROM adresfix
GROUP BY street,number,city; -- Combination of columns that contains duplicate values.
Eventually, after reviewing, you could remove the old table and rename the new one:
DROP TABLE adresfix;
ALTER TABLE address_unique RENAME TO adresfix;
Okay, so the select query to get all duplicate rows, based on multiple columns (an assumption on my part), is below. The entity_id comparison keeps a row from matching itself:
SELECT * FROM adresfix AS a1
INNER JOIN adresfix AS a2
WHERE (
    a1.street = a2.street
    AND a1.number = a2.number
    AND a1.city = a2.city
    AND a1.entity_id <> a2.entity_id
)
Now, if you want to update those rows, you should add this query to the update query, as a subquery or join query. The entity_id comparison keeps a row from matching itself, so only real duplicates are updated. Example:
UPDATE adresfix
SET address_correct = false
WHERE entity_id IN (
    SELECT entity_id FROM (
        SELECT a1.entity_id
        FROM adresfix AS a1
        INNER JOIN adresfix a2
        WHERE (
            a1.street = a2.street
            AND a1.number = a2.number
            AND a1.city = a2.city
            AND a1.entity_id <> a2.entity_id
        )
    ) AS foo
)
See this fiddle: http://sqlfiddle.com/#!9/8899eb/2
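If you would rather keep the detection in PHP, as in the question, one option is to collect the entity_ids of the duplicated streets first and then issue a single UPDATE. The sketch below assumes rows shaped like the [street, entity_id] pairs that fetch_all() returns for the question's SELECT; the helper name duplicate_entity_ids and the sample data are hypothetical. Like the UPDATE above, it flags every row of a duplicated group, including the first one.

```php
<?php
// Hypothetical helper: returns the entity_ids of all rows whose street
// value appears more than once. Each row is a [street, entity_id] pair.
function duplicate_entity_ids(array $rows): array
{
    $idsByStreet = [];
    foreach ($rows as [$street, $entityId]) {
        $idsByStreet[$street][] = $entityId;
    }
    $duplicates = [];
    foreach ($idsByStreet as $ids) {
        if (count($ids) > 1) {
            $duplicates = array_merge($duplicates, $ids);
        }
    }
    return $duplicates;
}

// Made-up sample data: "Main St" occurs twice.
$rows = [['Main St', 1], ['Oak Ave', 2], ['Main St', 3]];
$ids = duplicate_entity_ids($rows);

// One "?" placeholder per id, for use with a prepared statement.
$placeholders = implode(', ', array_fill(0, count($ids), '?'));
$sql = "UPDATE adresfix SET address_correct = 'false' WHERE entity_id IN ($placeholders)";
echo $sql, "\n";
```

The ids would then be bound to the placeholders when the statement is executed. If $ids is empty, skip the UPDATE, because IN () with no values is not valid SQL.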

PHP/SQL: Faster Way to Combine Query Results

I'm joining data from two SQL queries and I'm wondering if there is a faster way to do this as a single SQL query because there is a lot of looping involved. I've got two queries that look for different string values in the "option_name" field:
$sql01= "SELECT user_id, option_value FROM wp_wlm_user_options WHERE option_name = 'wpm_login_date' ORDER BY user_id";
$sql02 = "SELECT user_id, option_value FROM wp_wlm_user_options WHERE option_name ='stripe_cust_id' ORDER BY user_id ";
Then I create two arrays:
//Process the 1st SQL query data into an Array
$result_array01 = array();
$j = 0;
while($r = mysql_fetch_assoc($result01)) {
    if(!empty($r['option_value'])){
        //User Id and Last Login
        $result_array01[$j]['user_id'] = $r['user_id'];
        $result_array01[$j]['last_login'] = $r['option_value'];
        $j++;
    }
}
//Process the 2nd SQL query data into an Array
$result_array02 = array();
$k = 0;
while($s = mysql_fetch_assoc($result02)) {
    if(!empty($s['option_value'])){
        //User Id and Stripe Customer Id
        $result_array02[$k]['user_id'] = $s['user_id'];
        $result_array02[$k]['cust_id'] = $s['option_value'];
        $k++;
    }
}
And finally, I combine the arrays:
//Combine the SQL query data in single Array
$combined_array = array();
$l = 0;
foreach($result_array01 as $arr01){
    // Check type
    if (is_array($arr01)) {
        //mgc_account_print("hello: " . $arr01['user_id'] . "\r\n");
        foreach($result_array02 as $arr02){
            // Check type
            if (is_array($arr02)) {
                //Check if User Id matches
                if($arr01['user_id'] == $arr02['user_id']){
                    //Create Array with User Id, Cust Id and Last Login
                    $combined_array[$l]['user_id'] = $arr01['user_id'];
                    $combined_array[$l]['last_login'] = $arr01['last_login'];
                    $combined_array[$l]['cust_id'] = $arr02['cust_id'];
                    $l++;
                }
            }
        }
    }
}
Why are you doing this in two different queries?
Use MySQL's IN ('val', 'val2'), or combine both queries with UNION ALL:
$sql01 = "SELECT tbl1.user_id, tbl1.option_value FROM wp_wlm_user_options AS tbl1 WHERE tbl1.option_name = 'wpm_login_date'
UNION ALL
SELECT tbl2.user_id, tbl2.option_value FROM wp_wlm_user_options AS tbl2 WHERE tbl2.option_name = 'stripe_cust_id'";
Using OR/AND in the WHERE clause would also help in your case; I didn't see at first that you want to combine rows from the same table. I'm leaving my answer up in case it helps as another solution.
You should also use DISTINCT to avoid duplicate records:
SELECT DISTINCT user_id, option_value FROM wp_wlm_user_options
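Whichever query is used, the nested loop from the question can be replaced by a lookup table keyed by user_id. The following is only a sketch: the function name combine_by_user_id and the sample rows are invented, and it assumes each user has at most one stripe_cust_id row (with several, only the last one would be kept).

```php
<?php
// Hypothetical helper: joins login rows and customer rows on user_id using
// one lookup table instead of a nested loop.
function combine_by_user_id(array $logins, array $customers): array
{
    // Index the second result set by user_id for direct lookups.
    $custByUser = [];
    foreach ($customers as $customer) {
        $custByUser[$customer['user_id']] = $customer['cust_id'];
    }
    $combined = [];
    foreach ($logins as $login) {
        $userId = $login['user_id'];
        if (isset($custByUser[$userId])) {
            $combined[] = [
                'user_id'    => $userId,
                'last_login' => $login['last_login'],
                'cust_id'    => $custByUser[$userId],
            ];
        }
    }
    return $combined;
}

// Made-up sample data: only user 2 appears in both result sets.
$logins = [
    ['user_id' => 1, 'last_login' => '2015-01-01'],
    ['user_id' => 2, 'last_login' => '2015-02-01'],
];
$customers = [
    ['user_id' => 2, 'cust_id' => 'cus_abc'],
    ['user_id' => 3, 'cust_id' => 'cus_xyz'],
];
$combined = combine_by_user_id($logins, $customers);
print_r($combined);
```

Each array is walked once, so the cost grows with the sum of the two row counts and not with their product. Joining the table to itself on user_id would move the same matching into SQL.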

How to ignore duplicate rows in foreach loop when retrieve the data from database by MYSQL

I have the following code, which finds keywords in the body of a text and displays the related links that contain the same keyword.
The problem is that if, for example, two keywords appear in the body of row 2, row 2 is displayed twice, but I need it to be displayed only once. I tried SELECT DISTINCT, but it does not work correctly inside the foreach loop.
$tags2=explode(",",$tags);
foreach ($tags2 as $i) {
    $cat_sqlii="SELECT DISTINCT id, source,title,summary,newsText,photo,mainphoto,link,Date,tags FROM newxtext WHERE (newsText LIKE '%$i%')";
    $cat_sql_queryii=mysqli_query($con,$cat_sqlii);
    $cat_sql_rowii=mysqli_fetch_assoc($cat_sql_queryii);
    do{
        echo $cat_sql_rowii['id'].'<br/>';
    }while($cat_sql_rowii=mysqli_fetch_assoc($cat_sql_queryii));
}
Just do one query that tests for any of the tags using OR.
$patterns = array();
foreach (explode(',', $tags) as $tag) {
    $patterns[] = "newsText like '%$tag%'";
}
$where = implode(' OR ', $patterns);
$cat_sqlii="SELECT id, source,title,summary,newsText,photo,mainphoto,link,Date,tags
    FROM newxtext
    WHERE ($where)";
$cat_sql_queryii=mysqli_query($con,$cat_sqlii);
while ($cat_sql_rowii = mysqli_fetch_assoc($cat_sql_queryii)) {
    echo $cat_sql_rowii['id'].'<br/>';
}
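Because the tags end up inside the SQL string, it may be worth building the OR clause with placeholders. The sketch below is one possible way to do that, not part of the original answer; the helper name build_like_clause is hypothetical, and it also escapes the LIKE wildcards % and _ so they match literally.

```php
<?php
// Hypothetical helper: builds "col LIKE ? OR col LIKE ?" plus the matching
// parameter list, so tag values never end up inside the SQL string.
function build_like_clause(string $column, string $tags): array
{
    $clauses = [];
    $params = [];
    foreach (explode(',', $tags) as $tag) {
        $tag = trim($tag);
        if ($tag === '') {
            continue; // skip empty entries such as a trailing comma
        }
        $clauses[] = "$column LIKE ?";
        // Escape the LIKE wildcards % and _ (and the backslash itself).
        $params[] = '%' . addcslashes($tag, '%_\\') . '%';
    }
    return [implode(' OR ', $clauses), $params];
}

[$where, $params] = build_like_clause('newsText', 'php, mysql,');
echo $where, "\n";
print_r($params);
```

The clause can then be placed inside WHERE (...) of a statement created with mysqli_prepare(), with the parameters bound as strings. If no tags remain, $where is empty and the query should be skipped.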
Another approach could be using a temporary table receiving the results for each iteration and querying that table in the end:
mysqli_query($con, "CREATE TEMPORARY TABLE tmpSearchResults(id int primary key) ENGINE=Memory");
$tags2=explode(",",$tags);
foreach ($tags2 as $i) {
    // INSERT IGNORE skips ids that an earlier tag has already stored
    $insertToTemp ="INSERT IGNORE INTO tmpSearchResults
        SELECT id
        FROM newxtext
        WHERE (newsText LIKE '%$i%')";
    mysqli_query($con,$insertToTemp);
}
$queryFromTemp = "SELECT DISTINCT n.id, n.source,n.title,n.summary,n.newsText,n.photo,n.mainphoto,n.link,n.`Date`,n.tags
    FROM tmpSearchResults r
    JOIN newxtext n
    ON r.id = n.id";
$resultSet = mysqli_query($con,$queryFromTemp);
while($data = mysqli_fetch_assoc($resultSet)){
// ... process here
}
mysqli_free_result($resultSet);
When you close the connection, the temporary table will be dropped automatically.
If you expect huge search results, consider using a storage engine other than MEMORY for the temporary table.

Php for loop and mysql query

I'm running a second database search using MySQL inside a for loop, but I can't get it to show the correct number of rows:
Original search:
$topicemailsql = "select se.id as id, se.users as users, se.topic as topic, se.body as body, se.postID as postID, DATE_FORMAT(se.sent, '%d.%m.%Y %H:%i:%s' ) as sent, (SELECT u.email from users u where u.users_id in (se.users)) as emails from sentEmail se LEFT OUTER JOIN topics t on (t.ID = se.theader_) where t.ID = '$topicID'";
$topicemailsqlquery = mysql_query($topicemailsql)or die(mysql_error());
$numrows = mysql_num_rows($topicemailsqlquery);
php for loop:
for ($i=0; $i < $numrows ; $i++){
    $sqlarray = mysql_fetch_array($topicemailsqlquery);
    $users = $sqlarray['users'];
    $sqlemail = ('select email from users where users_id in ("'.$users.'")');
    //echo $sqlemail;
    $emailsqlquery = mysql_query($sqlemail)or die(mysql_error());
    $amountofusers = mysql_num_rows($emailsqlquery);
    $sqlarrayemail = mysql_fetch_array($emailsqlquery);
    echo $amountofusers;
    //echo $sqlarrayemail['email'];
    for ($a=0; $a < $amountofusers ; $a++){
        if($a == 0){
            $email = $sqlarrayemail['email'];
        }
        else if($a < $amountofusers){
            $email = $sqlarrayemail['email'].','.$sqlarrayemail['email'];
        }
    }
}
So based on this, $amountofusers should be more than 1, but it is always 1.
When I echo $sqlemail, the query looks like this, so it should return 2 rows:
select email from users where users_id in ("4,82") --> this should return 2 rows and a count of 2, but it only returns one row.
Where does it go wrong?
Br,
Toby
I think it's a very bad idea to store multiple values in one field. Furthermore, I wouldn't fire off queries in a loop if I can avoid it. Better to fetch the data in one go and let PHP do the rest. If you tried your query in, e.g., phpMyAdmin, you would use
SELECT email
FROM users
WHERE users_id IN (4, 82)
The IN operator needs a comma-separated list of arguments. You give it one single value:
"4,82"
That's a huge difference. MySQL would accept ("4","82") too (other DBMSs are not as tolerant) and handle the unneeded conversion for you.
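As long as the ids stay stored as one comma-separated string, that string has to be split before it goes into IN (...). A minimal sketch of that step, with a hypothetical helper named parse_id_list that keeps only plain integers:

```php
<?php
// Hypothetical helper: turns a stored string such as "4,82" into a list of
// integers that can be placed safely inside IN (...).
function parse_id_list(string $csv): array
{
    $ids = [];
    foreach (explode(',', $csv) as $part) {
        $part = trim($part);
        if (ctype_digit($part)) { // keep only plain non-negative integers
            $ids[] = (int) $part;
        }
    }
    return $ids;
}

$ids = parse_id_list('4,82');
$sql = 'SELECT email FROM users WHERE users_id IN (' . implode(', ', $ids) . ')';
echo $sql, "\n";
```

Because every element is cast to an integer, the list can be concatenated into the query without quoting. An empty list should be handled separately, since IN () is not valid SQL.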

MYSQL Joins - Where Unique ID lies in 1 table

I have 2 tables I want to connect in a strange / dangerous / non-dynamic way. I don't have control over these tables. I'm trying to pull the summary from one table that contains event id but not category id but I need to reference another table to make sure that said event is in said category. This table contains both event id and cat id. I'm trying to join them but I keep getting returned nothing.
I know this is dangerous, but I also have control over the categories so I know that my category ID's will not change unless I specify. Since it auto-increments - my categories will be 1, 2, 3.
The Gist of my Tables
*events_vevent*
- ev_id
- catid
---
*events_vevdetail*
- evdet_id
- summary
My Code
$data = array();
for($i = 0; $i < 3; $i++){
    $summary = array();
    $query_summary = mysql_query("SELECT events_vevdetail.summary FROM
        events_vevdetail, events_vevent
        WHERE 'events_vevent.evdet_id = $i' LIMIT 5")
        or die(mysql_error());
    while(($row = mysql_fetch_array($query_summary)))
        $summary[] = $row[0];
    switch($i){
        case 0:
            $data['cat1'] = $summary;
            break;
        case 1:
            $data['cat2'] = $summary;
            break;
        case 2:
            $data['cat3'] = $summary;
            break;
    }
}
echo json_encode($data);
Explanation
So what I'm trying to do is this: since I know category 1 will always have an ID of 0, I want to pull the most recent 5 posts, but only posts in category ID 0. Same for categories 2 and 3. Right now I'm getting empty arrays. I feel like I need 2 MySQL queries (one for each table) and then compare, but I'm not 100% sure and would rather do this the right way than the long way.
tl;dr: is my MySQL right?
This query will return the top 5 records (highest ev_id) from each category.
SELECT e1.*
FROM events_vevent e1
LEFT OUTER JOIN events_vevent e2 ON
    (e1.catid = e2.catid AND e1.ev_id < e2.ev_id)
GROUP BY e1.ev_id
HAVING COUNT(*) < 5
ORDER BY e1.catid, e1.ev_id DESC
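To see what the query is meant to select, the same "top N per category" rule can be written in plain PHP. This is only an illustrative sketch: the helper name top_per_category and the sample rows are invented, and it assumes, as the query does, that a higher ev_id means a more recent event.

```php
<?php
// Hypothetical helper: keeps the $limit highest ev_ids in each category,
// mirroring what the query above selects.
function top_per_category(array $rows, int $limit): array
{
    // Sort by ev_id descending so the newest events come first.
    usort($rows, function (array $a, array $b): int {
        return $b['ev_id'] <=> $a['ev_id'];
    });
    $byCategory = [];
    foreach ($rows as $row) {
        $cat = $row['catid'];
        if (!isset($byCategory[$cat]) || count($byCategory[$cat]) < $limit) {
            $byCategory[$cat][] = $row['ev_id'];
        }
    }
    ksort($byCategory); // order the result by category id
    return $byCategory;
}

// Made-up sample data: category 1 has three events, category 2 has one.
$rows = [
    ['ev_id' => 1, 'catid' => 1],
    ['ev_id' => 2, 'catid' => 2],
    ['ev_id' => 3, 'catid' => 1],
    ['ev_id' => 4, 'catid' => 1],
];
$top = top_per_category($rows, 2);
print_r($top);
```

With a limit of 2, the oldest event of category 1 is dropped while category 2 keeps its single event, which is the behaviour the HAVING COUNT(*) < 5 condition gives with a limit of 5.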
