Find duplicate entries in a table - php

I have a messcuts table with the following structure:
id, student_rollno, date.
The problem is that some records are duplicated, i.e. there are two records with the same student_rollno on the same date. How do I remove them? E.g.:
SELECT *
FROM `messcuts`
WHERE student_rollno = 'b070226'
|id |student_rollno|date
|259|B070226|2011-08-06
|260|B070226|2011-08-07
|1485|B070226|2011-08-12
|1486|B070226|2011-08-13
|1487|B070226|2011-08-14
|1488|B070226|2011-08-15
|2372|B070226|2011-08-27
|2369|B070226|2011-08-24
|2368|B070226|2011-08-23
|2371|B070226|2011-08-26
|2374|B070226|2011-08-29
|2373|B070226|2011-08-28
|2370|B070226|2011-08-25
|2367|B070226|2011-08-22
|2375|B070226|2011-08-30
|2376|B070226|2011-08-31
|2938|b070226|2011-08-06
Note that there are two records for 2011-08-06.

select student_rollno, date
from messcuts
group by student_rollno, date
having count(*) > 1
and to delete:
delete from messcuts d where d.id in (
select max(s.id)
from messcuts as s
group by s.student_rollno, s.date
having count(*) > 1)
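If that does not work in MySQL (it rejects a subquery that selects from the table being deleted from), use a multi-table delete:
delete from messcuts
using messcuts, messcuts as v_messcuts
-- use > rather than <> so that the row with the lowest id in each group is kept
where messcuts.id > v_messcuts.id
and messcuts.student_rollno = v_messcuts.student_rollno
and messcuts.date = v_messcuts.date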
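The "keep one row per group, delete the rest" idea can be tried out without touching the real table. The following is only a minimal sketch: it uses Python's bundled sqlite3 module and an in-memory table instead of MySQL, the function name dedupe_messcuts is made up for illustration, and the column is declared COLLATE NOCASE as an assumption to imitate MySQL's usual case-insensitive comparison of 'B070226' and 'b070226'.

```python
import sqlite3

def dedupe_messcuts(rows):
    """Load rows into an in-memory table, delete duplicates, return what is left."""
    conn = sqlite3.connect(":memory:")
    # COLLATE NOCASE is an assumption standing in for a case-insensitive MySQL collation.
    conn.execute(
        "CREATE TABLE messcuts (id INTEGER PRIMARY KEY, "
        "student_rollno TEXT COLLATE NOCASE, date TEXT)"
    )
    conn.executemany("INSERT INTO messcuts VALUES (?, ?, ?)", rows)
    # Keep the lowest id of every (student_rollno, date) group and delete the others.
    conn.execute(
        "DELETE FROM messcuts WHERE id NOT IN ("
        "SELECT MIN(id) FROM messcuts GROUP BY student_rollno, date)"
    )
    remaining = conn.execute(
        "SELECT id, student_rollno, date FROM messcuts ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining

sample = [
    (259, "B070226", "2011-08-06"),
    (260, "B070226", "2011-08-07"),
    (2938, "b070226", "2011-08-06"),  # duplicate of id 259, differing only in case
]
print(dedupe_messcuts(sample))
```

Unlike MySQL, SQLite accepts a subquery on the table being deleted from, which is why the NOT IN form can be used directly here.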

This will show the duplicates
SELECT *
FROM messcuts
WHERE (student_rollno,date) IN
(
SELECT student_rollno,date
FROM messcuts
GROUP BY student_rollno,date
HAVING count(*)>1
)
and to delete:
CREATE VIEW dups AS
SELECT MAX(id) as id
FROM messcuts
GROUP BY student_rollno,date
HAVING count(*)>1
You will need to run this several times to get rid of all duplicates, because each run removes only one row per duplicated group:
DELETE FROM messcuts
WHERE id IN (SELECT id FROM dups)
Once you have removed the duplicates, it may be a good idea to add a unique constraint to prevent further problems.
ALTER TABLE messcuts ADD UNIQUE std_dte (student_rollno,date)
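To see what the unique constraint buys you, here is a small sketch using Python's sqlite3 module with an in-memory table rather than MySQL. SQLite has no ALTER TABLE ... ADD UNIQUE, so CREATE UNIQUE INDEX is used as a stand-in; the helper name insert_is_rejected is hypothetical.

```python
import sqlite3

def insert_is_rejected(conn, rollno, date):
    """Return True if the unique index refuses the insert, False if it succeeds."""
    try:
        conn.execute(
            "INSERT INTO messcuts (student_rollno, date) VALUES (?, ?)",
            (rollno, date),
        )
        return False
    except sqlite3.IntegrityError:
        return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messcuts (id INTEGER PRIMARY KEY, student_rollno TEXT, date TEXT)"
)
# SQLite stand-in for: ALTER TABLE messcuts ADD UNIQUE std_dte (student_rollno, date)
conn.execute("CREATE UNIQUE INDEX std_dte ON messcuts (student_rollno, date)")

first = insert_is_rejected(conn, "B070226", "2011-08-06")   # new pair
second = insert_is_rejected(conn, "B070226", "2011-08-06")  # same pair again
third = insert_is_rejected(conn, "B070226", "2011-08-07")   # same student, other date
print(first, second, third)
```

With the index in place the application has to handle the rejected insert, but the table can no longer accumulate duplicate (student_rollno, date) pairs.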

Related

How to choose oldest row from the table of similar rows?

I have a table. Table has structure of id, name, color, product_id.
And the table has multiple rows with the same product_id.
With SQL query from PHP file - I would like to choose only one, the oldest, row. (The first one that was added to the current table).
What query should I use or approach?
Thank you!
Here is some mock-up data; note the comments I put in. I trust you are on MySQL 8.0 or later, as older versions do not support ROW_NUMBER() OVER().
Here goes:
WITH
-- input ... you *need* a timestamp to identify the oldest ---
indata(id, name, color, product_id,ts) AS (
SELECT 1,'Arthur','blue' ,42,TIMESTAMP'2021-01-31 17:45:00'
UNION ALL SELECT 1,'Arthur','blue' ,42,TIMESTAMP'2021-01-31 17:50:00'
UNION ALL SELECT 1,'Arthur','blue' ,42,TIMESTAMP'2021-01-31 17:55:00'
UNION ALL SELECT 1,'Arthur','blue' ,42,TIMESTAMP'2021-01-31 18:00:00'
UNION ALL SELECT 2,'Ford' ,'red' ,42,TIMESTAMP'2021-01-31 17:45:00'
UNION ALL SELECT 2,'Ford' ,'blue', 42,TIMESTAMP'2021-01-31 17:50:00'
UNION ALL SELECT 2,'Ford' ,'green',42,TIMESTAMP'2021-01-31 17:55:00'
UNION ALL SELECT 2,'Ford' ,'cyan' ,42,TIMESTAMP'2021-01-31 18:00:00'
)
,
-- select all, plus a rank, on which you will filter outside ..
with_rank AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY ts) AS rnk
FROM indata
)
SELECT
id
, name
, color
, product_id
, ts
FROM with_rank
WHERE rnk = 1
id|name |color|product_id|ts
1|Arthur|blue |42 |2021-01-31 17:45:00
2|Ford |red |42 |2021-01-31 17:45:00
One method is a correlated subquery:
select t.*
from t
where t.id = (select min(t2.id)
from t t2
where t2.product_id = t.product_id
);
This assumes that id is incrementing with each insertion. If not, you have no way of knowing what the "oldest" row is. SQL tables represent unordered sets, so there is no "oldest" row unless a column contains that information.
SELECT * FROM TableName WHERE product_id = ProductID ORDER BY id LIMIT 1;
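The correlated-subquery approach above can be checked on a tiny data set. This is only a sketch under the same assumption the answer states, namely that id grows with every insertion; it runs against an in-memory SQLite table through Python's sqlite3 module, and the function name and the sample rows are invented for illustration.

```python
import sqlite3

def oldest_per_product(rows):
    """Return, for each product_id, the row with the smallest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, color TEXT, product_id INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # Assumption: the lowest id within a product_id is the oldest row.
    result = conn.execute(
        "SELECT t.id, t.name, t.color, t.product_id FROM t "
        "WHERE t.id = (SELECT MIN(t2.id) FROM t AS t2 "
        "              WHERE t2.product_id = t.product_id) "
        "ORDER BY t.product_id"
    ).fetchall()
    conn.close()
    return result

rows = [
    (1, "Arthur", "blue", 42),
    (2, "Ford", "red", 42),
    (3, "Zaphod", "green", 43),
    (4, "Trillian", "cyan", 43),
]
print(oldest_per_product(rows))
```

If the table has a real creation timestamp, the same pattern applies with MIN on that column instead of id.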

How to retrieve the last record in each group

I am trying to get data from my table using GROUP BY. The grouping itself works correctly, but I need only the last inserted item of every group, and that does not work: my query always returns the first item of each group.
My query:
SELECT id,type,userId,performDate,eventId FROM
`user_marker` where `eventId`='842' and DATE_FORMAT(from_unixtime(performDate),'%Y%c%d')
=DATE_FORMAT(now(),'%Y%c%d')
and `visibility`='1'GROUP BY type ORDER BY id DESC
Please try
SELECT a.* FROM ( SELECT id,type,userId,performDate,eventId FROM
`user_marker` where `eventId`='842' and DATE_FORMAT(from_unixtime(performDate),'%Y%c%d')
=DATE_FORMAT(now(),'%Y%c%d') and `visibility`='1' ORDER BY id DESC ) as a GROUP BY a.type
You can try the following:
SELECT b.id,b.type,b.userId,b.performDate,b.eventId
FROM user_marker b
-- the aggregate needs an alias so that a.performDate exists in the join condition
JOIN (SELECT `type`,MAX(performDate) AS performDate
FROM user_marker
WHERE `eventId`='842' AND DATE(FROM_UNIXTIME(performDate))=CURDATE() AND `visibility`='1'
GROUP BY `type`) a ON a.type=b.type AND a.performDate=b.performDate
ORDER BY b.`type`;
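The join-on-MAX pattern can be exercised on a few rows. The sketch below uses Python's sqlite3 module with an in-memory table instead of MySQL and leaves out the "today" filter to stay deterministic. The function name and the sample rows are made up, and the extra filter on the outer query is my own addition so that a row from another event with the same type and timestamp is not pulled in.

```python
import sqlite3

def latest_per_type(rows, event_id):
    """Return (id, type, performDate) of the newest visible row per type for one event."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_marker (id INTEGER PRIMARY KEY, type TEXT, userId INTEGER, "
        "performDate INTEGER, eventId INTEGER, visibility INTEGER)"
    )
    conn.executemany("INSERT INTO user_marker VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT b.id, b.type, b.performDate
        FROM user_marker AS b
        JOIN (SELECT type, MAX(performDate) AS performDate
              FROM user_marker
              WHERE eventId = ? AND visibility = 1
              GROUP BY type) AS a
          ON a.type = b.type AND a.performDate = b.performDate
        WHERE b.eventId = ? AND b.visibility = 1  -- added assumption, see lead-in
        ORDER BY b.type
        """,
        (event_id, event_id),
    ).fetchall()
    conn.close()
    return result

rows = [
    (1, "like", 7, 100, 842, 1),
    (2, "like", 7, 200, 842, 1),
    (3, "share", 8, 150, 842, 1),
    (4, "share", 8, 120, 842, 1),
    (5, "like", 9, 300, 842, 0),  # hidden row, must be ignored
    (6, "like", 9, 400, 999, 1),  # other event, must be ignored
]
print(latest_per_type(rows, 842))
```

Note that two rows of the same type sharing the maximum performDate would both be returned; joining on MAX(id) instead avoids that if id is unique.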

Delete duplicate rows in mysql, keep XX rows

I want to keep the 10 latest duplicate rows and delete all the others.
I'm using the below code, but it deletes all the records except for one.
DELETE FROM `history` WHERE id NOT IN (SELECT * FROM (SELECT MIN(n.id) FROM history n GROUP BY n.url) x)
Please try this query.
It is intended to remove duplicate rows for URLs that have more than 10 duplicates.
DELETE FROM `history` WHERE id NOT IN (
SELECT * FROM (
SELECT MIN(n.id) FROM history n GROUP BY n.url having count(n.url) > 10
) x
)
Try this code. It is meant to number the rows of each url from newest to oldest and keep only the first 10:
DELETE
FROM
history
WHERE
id NOT IN
(SELECT
id
FROM
(
SELECT
@num:=IF(@current=url, @num+1, 1) AS row_num,
@current:=url AS group_url,
id
FROM
history
ORDER BY
url, id DESC
) AS `internal`
WHERE
row_num<=10
)
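The same "keep the N newest rows per url" rule can also be written without user variables, by counting how many newer rows each row has. This is a different technique from the answer above, shown only as a sketch: it runs on an in-memory SQLite table via Python's sqlite3 module, assumes that a higher id means a newer row, and the function name keep_latest is hypothetical.

```python
import sqlite3

def keep_latest(rows, keep):
    """Delete all but the `keep` highest-id rows of every url; return the survivors."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany("INSERT INTO history VALUES (?, ?)", rows)
    # A row is deleted when at least `keep` rows with the same url have a higher id.
    conn.execute(
        "DELETE FROM history WHERE ("
        "  SELECT COUNT(*) FROM history AS newer"
        "  WHERE newer.url = history.url AND newer.id > history.id"
        ") >= ?",
        (keep,),
    )
    remaining = conn.execute("SELECT id, url FROM history ORDER BY id").fetchall()
    conn.close()
    return remaining

rows = [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "a"), (6, "b"), (7, "b")]
print(keep_latest(rows, 2))
```

In MySQL the subquery on the target table would again need to be wrapped in a derived table, as in the queries above.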

MySQL query that checks in another table

I've got a problem with two mysql tables. I've done some code and I think I am close to the solution, but I'm not sure if this is right.
So here are the two tables:
Table 1: Blogs
Columns: ID, agp_name, agp_url, agp_username, agp_password
Table 2: Keywords
Columns: ID, agp_user_id, agp_order_id, agp_blog_id, agp_keywords, agp_keywords_date
What I want is to get one random row from Table 1 based on the following condition: if agp_keywords matches one of the keywords used in the last 5 days, the row should not be included in the result.
So far I did this:
SELECT
t1.agp_user_id, t1.agp_order_id, t1.agp_blog_id, t1.agp_keywords, t1.agp_keywords_date, t2.agp_name, t2.agp_url, t2.agp_username, t2.agp_password
FROM table1 AS t1
INNER JOIN (
SELECT ID, agp_name, agp_url, agp_username, agp_password, agp_blogposts
FROM table2
) AS t2 ON t1.agp_blog_id = t2.ID
WHERE
t1.agp_keywords NOT LIKE "%keyword1%" AND
t1.agp_keywords NOT LIKE "%keyword2%" AND
t1.agp_keywords_date BETWEEN (1369440000 AND 1369932432)
ORDER BY RAND() LIMIT 1
However this does not work correctly. Any help will be appreciated.
Try this, your original specifications were a bit confusing :(
SELECT keywords.agp_user_id,
keywords.agp_order_id,
keywords.agp_blog_id,
keywords.agp_keywords,
keywords.agp_keywords_date,
blogs.agp_name,
blogs.agp_url,
blogs.agp_username,
blogs.agp_password
FROM blogs
LEFT JOIN keywords
ON keywords.agp_blog_id = blogs.ID
AND keywords.agp_keywords NOT LIKE "%keyword1%"
AND keywords.agp_keywords NOT LIKE "%keyword2%"
AND FROM_UNIXTIME(keywords.agp_keywords_date) > (DATE_SUB(CURDATE(), INTERVAL 5 DAY))
ORDER BY RAND() LIMIT 1
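Since the requirement is ambiguous, here is one possible reading as a sketch: pick a random blog that has no keyword row from the last 5 days containing the given keyword. It uses NOT EXISTS rather than a join, runs on in-memory SQLite tables through Python's sqlite3 module, and takes "now" as a parameter so the result is reproducible. The function name pick_blog and the sample rows are assumptions; only the table and column names come from the question.

```python
import sqlite3

DAY = 24 * 60 * 60
NOW = 1369932432  # Unix timestamp taken from the question's query

def pick_blog(conn, keyword, now):
    """Return the ID of a random blog without a recent use of `keyword`, or None."""
    row = conn.execute(
        """
        SELECT b.ID
        FROM blogs AS b
        WHERE NOT EXISTS (
            SELECT 1 FROM keywords AS k
            WHERE k.agp_blog_id = b.ID
              AND k.agp_keywords LIKE '%' || ? || '%'
              AND k.agp_keywords_date > ?
        )
        ORDER BY RANDOM() LIMIT 1
        """,
        (keyword, now - 5 * DAY),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blogs (ID INTEGER PRIMARY KEY, agp_name TEXT)")
conn.execute(
    "CREATE TABLE keywords (ID INTEGER PRIMARY KEY, agp_blog_id INTEGER, "
    "agp_keywords TEXT, agp_keywords_date INTEGER)"
)
conn.executemany("INSERT INTO blogs VALUES (?, ?)", [(1, "one"), (2, "two"), (3, "three")])
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?, ?, ?)",
    [
        (1, 1, "keyword1, foo", NOW - 1 * DAY),  # recent use: blog 1 is excluded
        (2, 2, "keyword1", NOW - 10 * DAY),      # too old to matter
    ],
)
print(pick_blog(conn, "keyword1", NOW))
```

Blog 3 has no keyword rows at all and is still eligible, which is the main difference from an inner join on the keywords table.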

Mysql ordering and then grouping a query in mysql

I need to order my query by date first...
So I used this:
SELECT * FROM `mfw_navnode` order by `id` DESC
I wanted to order my results from last to first.
Then I am trying to wrap another query around it that groups my results by node_name.
The result should be all the top nodes grouped by category (node name), where the node shown for each category is the one that was ordered highest in the first query.
I thought to do something like this:
SELECT * FROM(
SELECT * FROM `mfw_navnode` order by `id` DESC) AS DD
WHERE (node_name='Eby' OR node_name='Laa' OR node_name='MIF' OR node_name='Amaur' OR node_name='Asn' )
GROUP BY DD.node_name
I get no result or any response from phpMyAdmin when I run that query.
Where am I going wrong?
Note: I don't want to group my results and then order them.
I want them to be ordered and then grouped. After grouping, I want each group's row to be the one with the highest value among the rows in that group.
It is not sufficient to perform the ordering first, as even then MySQL makes no guarantee over which record it will select for each group. From the manual:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
You must instead identify the records of interest with a subquery, then join the result with your table again in order to obtain the related values:
SELECT *
FROM mfw_navnode NATURAL JOIN (
SELECT node_name, MAX(id) AS id FROM mfw_navnode GROUP BY node_name
) AS DD
WHERE node_name IN ('Eby', 'Laa', 'MIF', 'Amaur', 'Asn')
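The subquery-plus-join answer above can be tried on a handful of rows. This is a minimal sketch on an in-memory SQLite table through Python's sqlite3 module, not the asker's real schema: the title column, the function name top_nodes and the sample rows are invented for illustration.

```python
import sqlite3

def top_nodes(rows, names):
    """Return the highest-id row of each requested node_name, newest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mfw_navnode (id INTEGER PRIMARY KEY, node_name TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO mfw_navnode VALUES (?, ?, ?)", rows)
    placeholders = ", ".join("?" for _ in names)
    # The derived table holds one (node_name, id) pair per group; NATURAL JOIN
    # matches on both shared columns and so fetches the rest of that exact row.
    result = conn.execute(
        "SELECT id, node_name, title "
        "FROM mfw_navnode NATURAL JOIN ("
        "  SELECT node_name, MAX(id) AS id FROM mfw_navnode GROUP BY node_name"
        ") AS DD "
        f"WHERE node_name IN ({placeholders}) "
        "ORDER BY id DESC",
        list(names),
    ).fetchall()
    conn.close()
    return result

rows = [
    (1, "Eby", "old entry"),
    (2, "Laa", "only entry"),
    (3, "Eby", "new entry"),
    (4, "Zzz", "not requested"),
]
print(top_nodes(rows, ["Eby", "Laa"]))
```

The row returned for each group is fully determined by MAX(id), so it does not depend on which row the server happens to pick for a GROUP BY.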
Ordered by id and grouped by node_name:
SELECT * FROM `mfw_navnode`
WHERE (node_name='Eby' OR node_name='Laa' OR node_name='MIF' OR node_name='Amaur' OR node_name='Asn' )
GROUP BY node_name
ORDER BY `id` DESC
Grouping is commonly used together with an aggregate function (SUM, MAX, MIN, COUNT, etc.). If you don't use such a function in your query, why do you want to group the results?
Anyway, this should do the trick:
SELECT *
FROM mfw_navnode
WHERE id IN (SELECT MAX(id)
FROM mfw_navnode
WHERE node_name IN ('Eby', 'Laa', 'MIF', 'Amaur', 'Asn')
GROUP BY node_name)
ORDER BY id
The following SQL may yield you the required output:
SELECT node_name, MAX(id)
FROM mfw_navnode
GROUP BY node_name
ORDER BY node_name
I see two problems with your SQL.
1) placing the order by in the inline select does nothing (and is probably causing an error)
2) you are grouping on node_name but you are not aggregating anything
SELECT COUNT(id) as row_count, node_name FROM( SELECT * FROM mfw_navnode ) AS DD
WHERE (node_name='Eby' OR node_name='Laa' OR node_name='MIF' OR node_name='Amaur' OR node_name='Asn' )
GROUP BY DD.node_name
order by node_name desc
Further, I am not sure why you need the inline select, as the WHERE clause could simply be applied to the original select (perhaps you have something more complex going on that you didn't show):
SELECT COUNT(id) as row_count, node_name
from mfw_navnode
WHERE node_name='Eby' OR node_name='Laa' OR node_name='MIF' OR node_name='Amaur' OR node_name='Asn'
GROUP BY node_name
order by node_name desc
