SQL: Delete duplicated rows? (PHP) - php

I have the following database and want to delete the red ones because they are duplicated. So I have to check, for every row, whether another row matches it by pid, price, price_old, link and shop.
But how can I check that and how can I delete it then?
Maybe an easier way would be to generate an id from the values inside each row. If the values in two rows were equal, their ids would be equal too, and I would have only one value to compare with the other ids.
Is that a better way? - If yes, how can I do that?
Greetings!

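Since you have no way to tell the duplicate rows apart, you could add a unique id using
ALTER TABLE my_table
ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;
Once that is done, you can delete every row whose id is not the minimum id of its group, grouping by the columns that define a duplicate. MySQL does not allow the target table of a DELETE to be read directly in a subquery, so the subquery is wrapped in a derived table:
DELETE FROM my_table
WHERE id NOT IN (
SELECT min_id FROM (
SELECT MIN(id) AS min_id FROM my_table
GROUP BY pid, price, price_old, link, shop
) AS keep_rows
);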
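To try the keep-the-lowest-id idea without touching real data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and URLs are made up; the column names follow the question. SQLite accepts the subquery directly, whereas MySQL needs it wrapped in a derived table.

```python
import sqlite3

# Hypothetical sample data; the second row is an exact duplicate of the first.
sample_rows = [
    (1, 9.99, 12.99, "http://a.example/1", "shopA"),
    (1, 9.99, 12.99, "http://a.example/1", "shopA"),
    (2, 5.00, 6.00, "http://b.example/2", "shopB"),
]

dedup_conn = sqlite3.connect(":memory:")
dedup_conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "pid INTEGER, price REAL, price_old REAL, link TEXT, shop TEXT)"
)
dedup_conn.executemany(
    "INSERT INTO my_table (pid, price, price_old, link, shop) "
    "VALUES (?, ?, ?, ?, ?)",
    sample_rows,
)

# Keep the lowest id in each group of identical rows and delete the rest.
dedup_conn.execute(
    "DELETE FROM my_table WHERE id NOT IN ("
    "  SELECT MIN(id) FROM my_table"
    "  GROUP BY pid, price, price_old, link, shop)"
)
dedup_conn.commit()

remaining = dedup_conn.execute(
    "SELECT id, pid, shop FROM my_table ORDER BY id"
).fetchall()
print(remaining)
```

Grouping by all five columns means a row only counts as a duplicate when every value matches.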

The simplest way is to run a distinct query:
select distinct pid, price, price_old, link, shop
from t;
You can create a new table from this result with CREATE TABLE ... AS SELECT. That is the simplest way. Because all columns are the same, MySQL doesn't offer a simple method to delete duplicate rows in place (while leaving one of them).
However, it is possible that your current results are generated by a query. If so, you can just add select distinct to the query. However, it would be better to fix the query so it doesn't generate duplicates. If this is the case, then ask another question with sample data, desired results (as text, not an image), and the query you are currently using.
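The copy-into-a-new-table approach can be sketched as follows. This is only an illustration run against SQLite through Python's sqlite3 module; the table name t_dedup and the sample rows are assumptions. In MySQL the same idea would use CREATE TABLE ... AS SELECT DISTINCT ... followed by RENAME TABLE.

```python
import sqlite3

copy_conn = sqlite3.connect(":memory:")
copy_conn.execute(
    "CREATE TABLE t (pid INTEGER, price REAL, price_old REAL, link TEXT, shop TEXT)"
)
copy_conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, 9.99, 12.99, "http://a.example/1", "shopA"),
        (1, 9.99, 12.99, "http://a.example/1", "shopA"),  # duplicate
        (2, 5.00, 6.00, "http://b.example/2", "shopB"),
    ],
)
count_before = copy_conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Build a duplicate-free copy (t_dedup is a hypothetical name).
copy_conn.execute(
    "CREATE TABLE t_dedup AS "
    "SELECT DISTINCT pid, price, price_old, link, shop FROM t"
)
# Swap the copy in for the original table.
copy_conn.execute("DROP TABLE t")
copy_conn.execute("ALTER TABLE t_dedup RENAME TO t")
copy_conn.commit()

count_after = copy_conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count_before, count_after)
```

A table created this way does not inherit indexes or constraints from the original, so those would have to be recreated on the copy.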

Test this first on a test table:
DELETE t1
FROM t t1, t t2
WHERE t1.id > t2.id AND t1.pid = t2.pid AND t1.price = t2.price
AND t1.link = t2.link AND t1.shop = t2.shop
AND t1.price_old = t2.price_old;
Basically, you are removing the row with the higher id whenever all of those columns are equal, so only the lowest id in each group of duplicates survives.

select * from
(select pid, price, price_old, link, shop,
row_number() over(partition by pid, price, price_old, link, shop order by pid) as rn
from my_table) temp
where temp.rn = 1
This query partitions the rows by all the columns and numbers the rows within each partition. Duplicate rows get rn > 1. It does not matter whether we take the first or the second row, as both are copies of each other, so we just take the rows with rn = 1. Rows that are not duplicated also have rn = 1 and hence won't be dropped. Window functions such as row_number() need MySQL 8.0 or later.
One more way to do this is to use UNION, which removes duplicate rows from the combined result.
select * from my_table UNION select * from my_table

Related

How to show repeated row one time mysql?

I want to show rows with the same stop name only one time.
How do I do this with a query and a while loop?
I see you have an id column. Assuming that it is unique, you can do this entirely in the SQL query; there is no need for a while loop.
You will need 2 queries; first will get the maximum (could be minimum also) available id of only one distinct stop name, the second is a join query with the first results and the main table. Something like this:
select * from tablename
inner join
(
select stop, max(id) as id from tablename
group by stop
)
as uniqueIDs
on tablename.id=uniqueIDs.id
You may try this; it will help you fetch the duplicates from the table:
SELECT tablename.stop FROM tablename INNER JOIN
(SELECT stop FROM tablename GROUP BY stop HAVING COUNT(id) > 1) dup
ON tablename.stop = dup.stop;
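As a rough check of the first answer's join on MAX(id), the sketch below runs it against SQLite via Python's sqlite3 module. The route column and the sample stops are invented for illustration; only id and stop come from the question.

```python
import sqlite3

stops_conn = sqlite3.connect(":memory:")
stops_conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, stop TEXT, route TEXT)"
)
stops_conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "Central", "A"), (2, "Central", "B"), (3, "Harbour", "A")],
)

# The inner query picks one id per stop; the join fetches the full row for it.
one_row_per_stop = stops_conn.execute(
    "SELECT t.id, t.stop, t.route FROM tablename AS t "
    "INNER JOIN ("
    "  SELECT stop, MAX(id) AS id FROM tablename GROUP BY stop"
    ") AS unique_ids ON t.id = unique_ids.id "
    "ORDER BY t.id"
).fetchall()
print(one_row_per_stop)
```

Each stop name comes back exactly once, so the calling code can loop over the result without any duplicate handling of its own.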

MySQL NOT IN with condition

I need to write a query that will pull all pieces of hardware that are unassigned to a user. My tables that look like this:
table: hardware
ID, brand, date_of_purchase, purchase_price, serial_number, invoice_location
table: assigned_equipment
ID, user_id, object_id, object_type, is_assigned, date_assigned
Once a piece of hardware is checked out to a user, a new entry in assigned_equipment is made, and the column is_assigned is set to 1. It can be 0 if it is later unassigned.
That being said, my query looks like this:
SELECT * FROM hardware WHERE ID NOT IN (SELECT object_id FROM assigned_equipment);
I need a conditional statement that would add WHERE is_assigned = 0 otherwise if there's an entry it will not list. Ideas?
Simply extend the subquery to contain only assigned items:
SELECT * FROM hardware
WHERE ID NOT IN
(SELECT object_id FROM assigned_equipment WHERE is_assigned = 1);
So, every matching id is NOT in the subselect - therefore unassigned.
Rows in the assignment table with is_assigned=0 are no longer part of the subresult, and their hardware therefore appears in your outer result.
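A small sketch of the filtered NOT IN, run against SQLite through Python's sqlite3 module with made-up rows and a trimmed-down set of columns. The NOT EXISTS variant is not from the answer; it is shown as an alternative that behaves the same here and is not affected if object_id ever contains NULL.

```python
import sqlite3

hw_conn = sqlite3.connect(":memory:")
hw_conn.executescript("""
CREATE TABLE hardware (ID INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE assigned_equipment (
    ID INTEGER PRIMARY KEY, user_id INTEGER,
    object_id INTEGER, is_assigned INTEGER);
-- Hypothetical data: item 1 is checked out, item 2 was returned,
-- item 3 was never assigned.
INSERT INTO hardware VALUES (1, 'Dell'), (2, 'HP'), (3, 'Lenovo');
INSERT INTO assigned_equipment VALUES (1, 7, 1, 1), (2, 8, 2, 0);
""")

unassigned_not_in = hw_conn.execute(
    "SELECT ID FROM hardware WHERE ID NOT IN ("
    "  SELECT object_id FROM assigned_equipment WHERE is_assigned = 1) "
    "ORDER BY ID"
).fetchall()

unassigned_not_exists = hw_conn.execute(
    "SELECT h.ID FROM hardware AS h WHERE NOT EXISTS ("
    "  SELECT 1 FROM assigned_equipment AS e"
    "  WHERE e.object_id = h.ID AND e.is_assigned = 1) "
    "ORDER BY h.ID"
).fetchall()

print(unassigned_not_in, unassigned_not_exists)
```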
You can't do this without a JOIN so you should ditch the subselect.
SELECT
h.*
FROM
hardware h
LEFT JOIN
assigned_equipment e
ON (e.object_id = h.id)
WHERE
e.id IS NULL
OR
(e.is_assigned = 0 AND e.user_id = ?);
If you take a semantic approach then the is_assigned column should not be required - as only assigned items should appear in the assigned_equipment table.
Which would make your query:
SELECT *
FROM `hardware`
WHERE `id` NOT IN (
SELECT `object_id`
FROM `assigned_equipment`
);
This of course means that when an item becomes unassigned you DELETE the row from the assigned_equipment table.
In my opinion this is better as it means you're not storing unnecessary data.

Mysql - How to get a row number after Order by?

Let's say I have a table with the following columns:
p_id
userid
points
Let's say these columns have over 5000 records. So we actually have users with points. Each user has an unique row for their point record. Imagine that every user can get points on the website by clicking somewhere. When they click I update the database with the points they get.
So we have a table with over 5000 records of people who have points, right? Now I would like to order them by their points (descending), so the user with the most points will be at the top of the page if I run a MySQL query.
I could do that by simply running a query like this:
SELECT `p_id` FROM `point_table` ORDER BY `points` DESC
This query would give me all the records in a descending order by points.
Okay, here is my problem: once the table is ordered, I would like to show each user which place they are in. So I'd like to give each user something like this: "You are 623 of 5374 users". The problem is that I cannot work out that "623" number.
I would like to run a query that orders the table by points, finds the row number of the user's record, and then returns that value to me.
Can anyone help me how to build a query for this? It would be a really big help. Thank you.
This answer should work for you:
SET @rank=0;
SELECT @rank:=@rank+1 AS rank, p_id FROM point_table ORDER BY points DESC;
Update: You might also want to consider to calculate the rank when updating the points and saving it to an additional column in the same table. That way you can also select a single user and know his rank. It depends on your use cases what makes more sense and performs better.
Update: The final solution we worked out in the comments looked like this:
SELECT
rank, p_id
FROM
(SELECT
@rank:=@rank+1 AS rank, p_id, userid
FROM
point_table, (SELECT @rank := 0) r
ORDER BY points DESC
) t
WHERE userid = intval($sessionuserid);
Row number after order by
SELECT ( @rank:=@rank + 1) AS rank, m.* from
(
SELECT a.p_id, a.userid
FROM (SELECT @rank := 0) r, point_table a
ORDER BY a.points DESC
) m
For some reason the accepted answer doesn't work for me properly - it completely ignores "ORDER BY" statement, sorting by id (primary key)
What I did instead is:
SET @rn=0;
CREATE TEMPORARY TABLE tmp SELECT * FROM point_table ORDER BY points DESC;
SELECT @rn:=@rn+1 AS rank, tmp.* FROM tmp;
Add a new column for position to the table. Run a cron job regularly which gets all the table rows ordered by points and then update the table with the positions in a while loop.
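For the "You are 623 of 5374 users" message, a different technique from the answers above is to count how many users have more points than the current one. The sketch below is an assumption-laden illustration using Python's sqlite3 module; rank_of is a hypothetical helper and the sample points are invented. Users with equal points share a rank.

```python
import sqlite3

rank_conn = sqlite3.connect(":memory:")
rank_conn.execute(
    "CREATE TABLE point_table (p_id INTEGER PRIMARY KEY, userid INTEGER, points INTEGER)"
)
rank_conn.executemany(
    "INSERT INTO point_table (userid, points) VALUES (?, ?)",
    [(10, 50), (11, 80), (12, 65), (13, 80)],
)


def rank_of(conn, userid):
    """Return (rank, total_users) for userid, or None if the user has no row.

    rank is 1 plus the number of users with strictly more points.
    """
    return conn.execute(
        "SELECT"
        "  (SELECT COUNT(*) FROM point_table WHERE points > p.points) + 1,"
        "  (SELECT COUNT(*) FROM point_table) "
        "FROM point_table AS p WHERE p.userid = ?",
        (userid,),
    ).fetchone()


position = rank_of(rank_conn, 12)
print("You are %d of %d users" % position)
```

This avoids session variables and does not depend on the order in which rows are scanned; an index on points would likely help on larger tables.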

SQL find same column data

I have a table with around 15 columns. What I would like to be able to do, is select a range of IDs and have all column data that is the same, presented to me.
At the minute, I have it structured as the following:
SELECT id, col_a, col_b ... count(id)
FROM table
GROUP BY col_a, col_b ...
Which returns rows grouped together that have identical data within all the rows - which is half what I want, but ideally I would like to be able to get a single row with either the value (if it's the same for every row id) or NULL if there is a single difference.
I'm not sure that it is possible, but I would rather see if it's doable in an SQL query than write some looping logic for PHP to go through and check each row's similarity.
Thanks,
Dan
UPDATE:
Just to keep this up-to-date, I worked through the problem by writing a PHP function that would find which were duplicates and then display the differences. However I have now since made a table for each column, and made the columns as references to the other tables.
E.G. In MainTable, ColA now refers to the table ColA
I'm still solving the problem with the PHP for the time being, mainly as I think it still leaves the problem mentioned above, but at least now I'm not storing duplicate information.
It's a hairy thing to do, but you could do it similarly to how David Martensson suggested. I would write it like this, however:
Select a.id, a.col1, a.col2, a.col3
FROM myTable a, myTable b
WHERE a.id != b.id
and a.col1 = b.col1
and a.col2 = b.col2
and a.col3 = b.col3
That would give you the ids that are unique, but each result would have the same values for columns 1, 2, and 3. However, I agree with some of the commenters to your question that you should consider an alternative data structure, as this could better take advantage of an RDBMS model. In that case you would want to have 2 tables:
Table name: MyTableIds
Fields: id, attrId
Table name: MyTableAttrs
Fields: attrId, attr1, attr2, attr3, etc.
In general, if you have data that is going to be duplicated for multiple records, you should pull it into a second table and create a relationship so that you only have to store the duplicated data 1 time and then reference it multiple times.
Make a join to a subquery with the group by:
SELECT a.id, b.col_a, b.col_b ... b.count
FROM table a
LEFT JOIN (
SELECT id, col_a, col_b ... count(id) "count"
FROM table GROUP BY col_a, col_b ...
)b on a.id = b.id
That way the outer will select all rows.
If you still want to group answers you could use a UNION instead
SELECT id, col_a ...
WHERE id NOT IN ("SUBQUERY WITH GROUP BY")
UNION
"SUBQUERY WITH GROUP BY"
Not the nicest solution but it should work
It seems doable from how I have understood your question.
And here's a possible pattern:
SELECT
/* GROUP BY columns */
col_A,
col_B,
...
/* aggregated columns */
CASE MIN(col_M) WHEN MAX(col_M) THEN MIN(col_M) ELSE NULL END,
CASE MIN(col_N) WHEN MAX(col_N) THEN MIN(col_N) ELSE NULL END,
...
COUNT(...),
SUM(...),
WHATEVER(...),
...
FROM ...
GROUP BY col_A, col_B, ...
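The MIN/MAX comparison in that pattern can be tried out with a tiny data set. This is a sketch on SQLite via Python's sqlite3 module; the items table and its values are invented, and the column names only mirror the placeholders in the pattern.

```python
import sqlite3

agg_conn = sqlite3.connect(":memory:")
agg_conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, col_a TEXT, col_m INTEGER, col_n TEXT)"
)
agg_conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, "x", 5, "red"),
        (2, "x", 5, "blue"),   # same col_m as row 1, different col_n
        (3, "y", 7, "green"),
    ],
)

# A column yields its value when MIN and MAX agree within the group,
# and NULL as soon as the group contains two different values.
summary = agg_conn.execute(
    "SELECT col_a,"
    "  CASE MIN(col_m) WHEN MAX(col_m) THEN MIN(col_m) ELSE NULL END,"
    "  CASE MIN(col_n) WHEN MAX(col_n) THEN MIN(col_n) ELSE NULL END,"
    "  COUNT(*) "
    "FROM items GROUP BY col_a ORDER BY col_a"
).fetchall()
print(summary)
```

MIN and MAX skip NULL inputs, so a group that mixes one real value with NULLs would still report that value; whether that counts as "the same" depends on the data.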

How do I delete duplicate rows in this (mildly) complicated MySQL database situation?

Ok. Please bear with me, I suck at explaining things.
I have a database of contact information that is gathered through a form on a website. Obviously, people press submit more than once accidentally (or on purpose, but fixing is a different issue) so there are a LOT of duplicate rows in this database.
So, table1 holds contact information as such:
ID | date | unique ID code | first name, blah blah
1 stuff 20110101ba78b joe
And table2 holds related data joined by the unique ID code field, as such:
ID | data | unique ID code
1 a 20110101ba78b
2 b 20110101ba78b
So, table2 holds multiple values for each person. That's the structure of the table (and there are about a million rows in table2, so I'd rather not change the structure right now).
So my dilemma is this: I know it's easy to make a temporary table and SELECT DISTINCT(all fields), but I want to keep the unique ID field for at least 1 of the duplicate rows. If I keep the unique ID field though, it is unique for each row, even if the other data is exactly the same so SELECT DISTINCT(all fields) will not work, it will keep every row. Hopefully I explained this thoroughly. Please ask me for more information if needed.
EDIT: I'm sure I could get rid of the ID field for each table, but as far as I'm concerned it's just .... there to be there.
With the first clarification and a little reading between the lines, we can guess that it will be satisfactory to keep just the first or last entry for a given 'Unique ID Code' in Table1, where first or last means oldest or newest entry. The queries are the same except for MAX vs MIN. I'm assuming the 'date' column contains a fine enough (1 second or smaller) granularity that you don't get the same Unique ID Code twice in a time quantum; this is unlikely to be the case if the 'date' column really only contains a DATE (year, month, day) value, but probably is the case if you have a TIMESTAMP(3) and might well be the case with TIMESTAMP.
As always with SQL, build the query up in stages, nice and gently.
Find the newest entry for each Unique ID Code with multiple entries
SELECT Unique_ID_Code, MAX(date) AS Newest
FROM Table1
GROUP BY Unique_ID_Code
HAVING COUNT(*) > 1
Find the details for the Unique ID Code matching the newest entry
SELECT T1.*
FROM Table1 AS T1
JOIN (SELECT Unique_ID_Code, MAX(date) AS Newest
FROM Table1
GROUP BY Unique_ID_Code
HAVING COUNT(*) > 1
) AS M
ON M.Unique_ID_Code = T1.Unique_ID_Code AND M.Newest = T1.Date
Now the tricky stuff
What you do next depends on how much you trust the transaction support in your DBMS and how big the Table1 is, and on whether you have ON DELETE CASCADE constraints on your foreign keys, and ...
You could create a temporary table with the rows selected by the second query above (MySQL syntax, I believe; other DBMS use different notations for this).
CREATE TEMPORARY TABLE KeepTheseRows
SELECT T1.*
FROM Table1 AS T1
JOIN (SELECT Unique_ID_Code, MAX(date) AS Newest
FROM Table1
GROUP BY Unique_ID_Code
HAVING COUNT(*) > 1
) AS M
ON M.Unique_ID_Code = T1.Unique_ID_Code AND M.Newest = T1.Date;
then delete all the rows from Table1 that match the duplicate unique ID codes:
DELETE FROM Table1
WHERE Unique_ID_Code IN (SELECT Unique_ID_Code FROM KeepTheseRows);
and then reinstate the rows to be kept:
INSERT INTO Table1
SELECT * FROM KeepTheseRows;
You may need to defer constraint checking while this happens, or you may need to drop the foreign key constraints while this occurs. You need to worry about activity while this operation occurs; it would be best if people were not inserting rows into Table1 while this is running. If they are modifying the table as you run, you may find that you have to do the processing several times. You should add a unique constraint to Table1.Unique_ID_Code just as soon as possible so you don't get into the mess again. (And don't forget to re-enable any deferred constraints or recreate any dropped foreign keys.)
There probably are other equivalent ways to do this; this relies only on standard (SQL-92) SQL apart from the temporary table notation.
Experiment with a copy of your production database.
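The three steps above (save the newest rows, delete the duplicated codes, reinstate the saved rows) can be rehearsed on a throwaway database first. The sketch below uses Python's sqlite3 module rather than MySQL; the lower-case table and column names and the sample rows are assumptions made for the demo.

```python
import sqlite3

keep_conn = sqlite3.connect(":memory:")
keep_conn.execute(
    "CREATE TABLE table1 ("
    "id INTEGER PRIMARY KEY, date TEXT, unique_id_code TEXT, first_name TEXT)"
)
keep_conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "2011-01-01 10:00:00", "20110101ba78b", "joe"),
        (2, "2011-01-01 10:00:05", "20110101ba78b", "joe"),  # resubmission
        (3, "2011-01-02 09:00:00", "20110102cc11d", "ann"),
    ],
)
keep_conn.commit()

# Step 1: remember the newest row for every code that occurs more than once.
keep_conn.execute(
    "CREATE TEMPORARY TABLE keep_these_rows AS "
    "SELECT t1.* FROM table1 AS t1 "
    "JOIN (SELECT unique_id_code, MAX(date) AS newest FROM table1"
    "      GROUP BY unique_id_code HAVING COUNT(*) > 1) AS m "
    "ON m.unique_id_code = t1.unique_id_code AND m.newest = t1.date"
)

# Steps 2 and 3 run in one transaction: the with block commits on success
# and rolls back if either statement raises.
with keep_conn:
    keep_conn.execute(
        "DELETE FROM table1 WHERE unique_id_code IN "
        "(SELECT unique_id_code FROM keep_these_rows)"
    )
    keep_conn.execute("INSERT INTO table1 SELECT * FROM keep_these_rows")

kept = keep_conn.execute(
    "SELECT id, unique_id_code FROM table1 ORDER BY id"
).fetchall()
print(kept)
```

Running the delete and the reinsert inside a single transaction means a failure partway through leaves the table as it was.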
This updates Table2 to use the lowest uniqueID for identical contact info:
UPDATE Table2
SET Table2.uniqueID = (
SELECT T1.UniqueID
FROM Table1 T1, Table1 T2
WHERE T1.unique ID < T2.unique ID
AND T1.firstname = T2.firstname
AND T1.date = T2.date
AND T1.blah, blah = T2.blah, blah
)
WHERE Table2.uniqueID = (
Select T1.UniqueID
from Table1 T1, CopyOfTable1 T2
where T1.firstname = T2.firstname
and T1.date = T2.date
and T1.blah, blah = T2.blah, blah
);
This removes all except ONE (the one with the lowest uniqueID) of the duplicate contact info records:
delete T1
from Table1 T1, CopyOfTable1 T2
where T1.unique ID > T2.unique ID
and T1.firstname = T2.firstname
and T1.date = T2.date
and T1.blah, blah = T2.blah, blah
