Simple Design of friends list database - php

I have a small issue with designing the database for a friendship system.
Right now I have a table called friends, for example:
table friends:
-----------------------------------------
| you      | friend  | approve | since     |
-----------------------------------------
| wetube   | youtube | 1       | 4-12-2012 |
| facebook | wetube  | 1       | 4-12-2012 |
-----------------------------------------
And I have this query to fetch the friends of the user called wetube:
mysql_query("SELECT f.* FROM friends f
    INNER JOIN users u ON u.username = f.you
    WHERE f.friend = 'wetube'
    UNION ALL
    SELECT f.* FROM friends f
    INNER JOIN users u ON u.username = f.friend
    WHERE f.you = 'wetube'");
Now what I want exactly is to fetch the friends of wetube and show them to him on his page.
Fixed:
Finally I fixed the problem. This is the table:
CREATE TABLE IF NOT EXISTS `friends` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`you` varchar(255) NOT NULL,
`friend` varchar(255) NOT NULL,
`type` varchar(255) NOT NULL,
`since` date NOT NULL,
`message` text NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
And this is the PHP code to fetch the user's friends:
<?php
$username = $_SESSION['username'];
$friends_sql = mysql_query("SELECT * FROM friends WHERE (friend = '$username' OR you = '$username') ");
while($friends_row = mysql_fetch_assoc($friends_sql)){
if ($username == $friends_row['you']) {
echo $friends_row['friend']."<br/>";
} elseif($username == $friends_row['friend']) {
echo $friends_row['you']."<br/>";
}
}
?>
Try it yourself, it works 100%.

$result = mysql_query("SELECT * FROM friends WHERE friend='wetube'");
// loop over every matching row; the friend's name is in the `you` column
while ($row = mysql_fetch_array($result)) {
    echo $row['you'], ' - ', $row['since'], '<br/>';
}

Not a direct answer to your question, but a completely normalised database would have two tables for this system.
Since this is a many-to-many relationship, I would do something like this:
UsersTable
------------------
id: int, serial
name: varchar
email: varchar
...
And
RelationshipTable
------------------
id: int, serial
user1_id: int, foreign key on UsersTable.id
user2_id: int, foreign key on UsersTable.id
approved: boolean
since: date
With a properly designed and normalised database, it will be much easier to manage the data and write your queries.
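A minimal sketch of the two-table design, using SQLite through Python's sqlite3 module so it runs without a MySQL server. The table names users and relationships stand in for UsersTable and RelationshipTable, serial/boolean become SQLite INTEGER columns, and the sample rows are the wetube/youtube/facebook pairs from the question. Because a friendship may be stored in either direction, the query checks both user columns and joins back to users for the other side's name.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema outlined above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE,
    email TEXT
);
CREATE TABLE relationships (
    id       INTEGER PRIMARY KEY,
    user1_id INTEGER NOT NULL REFERENCES users(id),
    user2_id INTEGER NOT NULL REFERENCES users(id),
    approved INTEGER NOT NULL DEFAULT 0,   -- boolean stored as 0/1
    since    TEXT
);
INSERT INTO users (id, name) VALUES (1, 'wetube'), (2, 'youtube'), (3, 'facebook');
INSERT INTO relationships (user1_id, user2_id, approved, since) VALUES
    (1, 2, 1, '4-12-2012'),
    (3, 1, 1, '4-12-2012');
""")

def friends_of(conn, user_id):
    """Names of approved friends, whichever column the user appears in."""
    rows = conn.execute("""
        SELECT u.name
          FROM relationships r
          JOIN users u
            ON u.id = CASE WHEN r.user1_id = ? THEN r.user2_id ELSE r.user1_id END
         WHERE (r.user1_id = ? OR r.user2_id = ?) AND r.approved = 1
         ORDER BY u.name
    """, (user_id, user_id, user_id))
    return [name for (name,) in rows]

print(friends_of(conn, 1))  # ['facebook', 'youtube']
```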

You mean just the rows that have wetube in the friend column? You might just be over-thinking the joins:
SELECT * FROM friends WHERE friend = 'wetube'
If you want the rows where wetube is in either column:
SELECT * FROM friends WHERE friend = 'wetube' OR you = 'wetube'
Or:
SELECT * FROM friends WHERE friend = 'wetube'
UNION ALL
SELECT * FROM friends WHERE you = 'wetube'

All of the wetube's friends are:
SELECT * FROM friends WHERE (friend = 'wetube' OR you = 'wetube') AND approve = 1
I would suggest removing the column approve and instead keeping a table of requests. This will mean that all approved requests are in the friends table and all pending approvals are in the friend_request (or whatever) table.
This two-table design is more efficient because you don't have to check approve = 1 every time you show friends (which is probably pretty often).
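A sketch of the friends plus friend_requests split, again in SQLite via Python's sqlite3; the friend_requests table and its requester/requested columns are hypothetical names. Approving moves the row from one table to the other inside a single transaction, so the friends query never needs an approve filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (
    you    TEXT NOT NULL,
    friend TEXT NOT NULL,
    since  TEXT NOT NULL,
    PRIMARY KEY (you, friend)
);
-- hypothetical table of pending requests
CREATE TABLE friend_requests (
    requester TEXT NOT NULL,
    requested TEXT NOT NULL,
    PRIMARY KEY (requester, requested)
);
""")

def send_request(conn, requester, requested):
    conn.execute("INSERT INTO friend_requests VALUES (?, ?)", (requester, requested))

def approve_request(conn, requester, requested, since):
    # Move the pending request into friends inside one transaction.
    with conn:
        cur = conn.execute(
            "DELETE FROM friend_requests WHERE requester = ? AND requested = ?",
            (requester, requested))
        if cur.rowcount == 0:
            raise ValueError("no such pending request")
        conn.execute("INSERT INTO friends VALUES (?, ?, ?)", (requester, requested, since))

def friends_of(conn, user):
    # No approve = 1 check needed: every row in friends is already approved.
    rows = conn.execute("""
        SELECT friend FROM friends WHERE you = ?
        UNION
        SELECT you FROM friends WHERE friend = ?
        ORDER BY 1
    """, (user, user))
    return [r[0] for r in rows]

send_request(conn, "wetube", "youtube")
send_request(conn, "facebook", "wetube")
approve_request(conn, "wetube", "youtube", "4-12-2012")
print(friends_of(conn, "wetube"))  # ['youtube']; facebook's request is still pending
```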

Maybe I'm not understanding this exactly.
Select * from Friends f
where f.you = 'wetube' or f.friend = 'wetube'
I guess you're looking for user info as well, hence the inner joins.
Maybe remove the approval column and store two records per friendship instead; the second record is added when the friend approves.
Then you can easily see who YOU are friends with, and who in return is friends with you.
wetube --> youtube
facebook --> youtube
youtube --> wetube   (wetube would approve a friendship request for youtube, adding a record)
wetube --> facebook
Then you could much more easily ask who wetube's friends are.
Just an idea, probably not the answer you were looking for.
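A sketch of the two-record idea in SQLite via Python's sqlite3, loading the four directed records listed above. A pair counts as friends only when both directions exist, which a self-join on the friends table expresses directly; the table and column names follow the question, and mutual_friends is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (
    you    TEXT NOT NULL,
    friend TEXT NOT NULL,
    PRIMARY KEY (you, friend)
);
-- the four directed records from the example above
INSERT INTO friends VALUES
    ('wetube', 'youtube'),
    ('facebook', 'youtube'),
    ('youtube', 'wetube'),
    ('wetube', 'facebook');
""")

def mutual_friends(conn, user):
    """A friendship counts only when both directed records exist."""
    rows = conn.execute("""
        SELECT a.friend
          FROM friends a
          JOIN friends b ON b.you = a.friend AND b.friend = a.you
         WHERE a.you = ?
         ORDER BY a.friend
    """, (user,))
    return [r[0] for r in rows]

print(mutual_friends(conn, "wetube"))  # ['youtube']
```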

Related

Accessing Row from Mysql DB by chaining query of foreign key from another table

I'm having difficulty trying to find the best way to get my results from a table. I want to get the targeted row from a table by one using the primary key from another using a foreign key.
The tables are set up similar to this (minus a lot of other attributes, for space):
user Table:
user_Id(pk)
name
type
venue_Id(unique/indexed)
venue Table:
venue_Id(fk)
rating
The logic flow is: user_Id is provided by a session variable. Query the 'user' table to find that user. Check the user's type to identify whether the user is a person or a venue. Assuming the user is a venue, query the 'venue' table for the rating, using the unique/indexed venue_Id from the user table as the foreign key.
The query looks like this:
SELECT rating FROM `venue` WHERE `user_Id` = '$user_Id' AND `type` = 'venue'
Is this possible, and if so, what is the correct way to go about it?
You have a few ways to retrieve this information.
Using JOIN:
SELECT v.rating
FROM venue v INNER JOIN user u
ON v.venue_id= u.venue_id
AND u.`user_Id` = '$user_Id' AND u.`type` = 'venue'
Using an IN sub-query
SELECT rating
FROM venue
WHERE venue_id IN (SELECT venue_id FROM user
WHERE `user_Id` = '$user_Id' AND `type` = 'venue')
BTW, you should consider protecting your code from potential SQL injection.
It's a bit unclear the way you explained it.
From what I understand, there are two tables, User and Venue.
In the User table you have: user_id, venue_id, name, type.
In the Venue table you have: venue_id, rating.
You are expecting to get rating (from the Venue table) while your WHERE clause uses user_id and type, which are both stored in the User table.
Your query:
SELECT rating FROM venue WHERE user_Id = '$user_Id' AND type = 'venue'
That can't work as written: you are selecting from the venue table, but user_Id and type are not columns of venue, so MySQL will report them as unknown columns even though the tables are linked by a foreign key. A foreign key only defines a constraint between the parent and child tables; it doesn't make the parent's columns available in a query on the child. You still have to join the tables.
The query should be something like this:
SELECT rating FROM venue v JOIN user u on v.venue_id = u.venue_id WHERE u.user_Id = '$user_Id' AND u.type = 'venue'
Correct me if I am wrong.
Combining rows from two tables based on the tables having columns with equal values is called an equi-join operation, it's the pattern we typically use to "follow" foreign key relationships.
As an example:
$sql = "SELECT v.rating
  FROM `venue` v
  JOIN `user` s
    ON s.venue_Id = v.venue_Id
   AND s.`type` = 'venue'
 WHERE s.`user_Id` = '" . mysqli_real_escape_string($con, $user_Id) . "'";
This isn't the only pattern, there are several other query forms that will return an equivalent result.
As an example of using an EXISTS predicate:
$sql = "SELECT v.rating
  FROM `venue` v
 WHERE EXISTS
       ( SELECT 1
           FROM `user` s
          WHERE s.venue_Id = v.venue_Id
            AND s.`type` = 'venue'
            AND s.`user_Id` = '"
       . mysqli_real_escape_string($con, $user_Id)
       . "'
       )";
The original query appears to be vulnerable to SQL injection; the example queries use the mysqli_real_escape_string function to "escape" unsafe values and make them safe to include in SQL text. (That function is only appropriate if you are using the mysqli interface.) Using prepared statements with bind placeholders is another approach.
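A runnable sketch (SQLite via Python's sqlite3, with hypothetical sample rows) checking that the JOIN and EXISTS forms return the same rating, with the user ID passed as a bound placeholder instead of being escaped into the SQL text. In PHP the equivalent would be a mysqli or PDO prepared statement; SQLite's `?` placeholder plays the same role here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_Id  INTEGER PRIMARY KEY,
    name     TEXT,
    type     TEXT,
    venue_Id INTEGER UNIQUE
);
CREATE TABLE venue (
    venue_Id INTEGER PRIMARY KEY,
    rating   INTEGER
);
-- hypothetical sample rows
INSERT INTO user VALUES (1, 'Alice', 'person', NULL), (2, 'The Club', 'venue', 10);
INSERT INTO venue VALUES (10, 4);
""")

JOIN_SQL = """
    SELECT v.rating
      FROM venue v
      JOIN user s ON s.venue_Id = v.venue_Id AND s.type = 'venue'
     WHERE s.user_Id = ?
"""
EXISTS_SQL = """
    SELECT v.rating
      FROM venue v
     WHERE EXISTS (SELECT 1 FROM user s
                    WHERE s.venue_Id = v.venue_Id
                      AND s.type = 'venue'
                      AND s.user_Id = ?)
"""

def rating_for(conn, sql, user_id):
    # user_id is passed as a bound parameter, never spliced into the SQL text
    row = conn.execute(sql, (user_id,)).fetchone()
    return row[0] if row else None

for uid in (1, 2):
    print(uid, rating_for(conn, JOIN_SQL, uid), rating_for(conn, EXISTS_SQL, uid))
# 1 None None
# 2 4 4
```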

SQL fetch all in a self-join construction

My current database looks like this:
CREATE TABLE sites(
siteId INT(11) AUTO_INCREMENT,
siteType VARCHAR(255),
siteName VARCHAR(255),
siteDomain VARCHAR(255),
PRIMARY KEY(siteId)
);
CREATE TABLE siteSites(
parentId INT(11),
childId INT(11)
);
I'm trying to join all the tables and fetch all the data, like this:
<?php
$q=mysql_query("SELECT * FROM sites s1, siteSites, sites s2 WHERE s1.siteId=parentId AND s2.siteId=childId");
$row=mysql_fetch_array($q);
?>
and then I want to get the info from both 's1' and 's2' out of the $row variable.
Is this possible, and if so, how do I do it?
Thank you.
SELECT s1.siteId as ParentSiteId, s1.siteType as ParentType, s1.siteName as ParentName, s1.siteDomain as ParentDomain,
s2.siteId as ChildSiteId, s2.siteType as ChildType, s2.siteName as ChildName, s2.siteDomain as ChildDomain
FROM sites s1
INNER JOIN siteSites ss
ON s1.siteId = ss.parentId
INNER JOIN sites s2
ON ss.childId = s2.siteId
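A small sketch of why the aliases matter, using SQLite via Python's sqlite3 with made-up sites. With sqlite3.Row each column is reachable by name, much like the associative array returned by mysql_fetch_array, so giving the parent and child columns distinct aliases keeps one from overwriting the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be indexed by column name
conn.executescript("""
CREATE TABLE sites (
    siteId     INTEGER PRIMARY KEY,
    siteType   TEXT,
    siteName   TEXT,
    siteDomain TEXT
);
CREATE TABLE siteSites (parentId INTEGER, childId INTEGER);
-- made-up sample data
INSERT INTO sites VALUES
    (1, 'portal', 'Main', 'example.com'),
    (2, 'blog',   'Blog', 'blog.example.com');
INSERT INTO siteSites VALUES (1, 2);
""")

rows = conn.execute("""
    SELECT s1.siteName AS ParentName, s1.siteDomain AS ParentDomain,
           s2.siteName AS ChildName,  s2.siteDomain AS ChildDomain
      FROM sites s1
      JOIN siteSites ss ON s1.siteId = ss.parentId
      JOIN sites s2     ON ss.childId = s2.siteId
""").fetchall()

for row in rows:
    # Distinct aliases mean the parent's and child's values no longer collide.
    print(row["ParentName"], "->", row["ChildName"])  # Main -> Blog
```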
Your question isn't very clear...
If I'm making any sense of it, you'd like to pull the entire graph in a single query. That requires a recursive WITH statement (a recursive common table expression), which MySQL only supports from version 8.0; on older versions it is not possible in a single query.
If not and your current query is correct, you need to alias the column names rather than select *, i.e. something like:
select s1.siteid as parent_id,
s1.sitename as parent_name,
...,
s2.siteid as child_id,
s2.sitename as child_name,
...
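For the full-graph case, here is a sketch of a recursive WITH query over the siteSites table, run in SQLite through Python's sqlite3 (SQLite supports WITH RECURSIVE, and MySQL 8.0 accepts the same shape). The sample hierarchy is invented, and the UNION ALL form assumes the parent/child links contain no cycles; with a cycle the recursion would not terminate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (siteId INTEGER PRIMARY KEY, siteName TEXT);
CREATE TABLE siteSites (parentId INTEGER, childId INTEGER);
-- invented hierarchy: root -> a -> a1, root -> b
INSERT INTO sites VALUES (1, 'root'), (2, 'a'), (3, 'b'), (4, 'a1');
INSERT INTO siteSites VALUES (1, 2), (1, 3), (2, 4);
""")

def descendants(conn, root_id):
    """All sites below root_id, with their depth in the hierarchy."""
    return conn.execute("""
        WITH RECURSIVE tree(siteId, depth) AS (
            SELECT childId, 1 FROM siteSites WHERE parentId = ?
            UNION ALL
            SELECT ss.childId, t.depth + 1
              FROM siteSites ss JOIN tree t ON ss.parentId = t.siteId
        )
        SELECT s.siteName, t.depth
          FROM tree t JOIN sites s ON s.siteId = t.siteId
         ORDER BY t.depth, s.siteName
    """, (root_id,)).fetchall()

print(descendants(conn, 1))  # [('a', 1), ('b', 1), ('a1', 2)]
```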
I don't know if I understood your question.
Try this:
SELECT s1.*, s2.*
FROM sites s1 JOIN siteSites ss
ON s1.siteId = ss.parentId
JOIN sites s2
ON ss.childId = s2.siteId
You're only missing the while loop:
while($row=mysql_fetch_array($q))
{
echo $row['siteName'];
echo $row['siteDomain'];
echo $row['parentId'];
// etc ..., access to values by field name
}

MySQL inclusion/exclusion of posts

This post is taking a substantial amount of time to type because I'm trying to be as clear as possible, so please bear with me if it is still unclear.
Basically, what I have is a table of posts in the database, to which users can add privacy settings.
ID | owner_id | post | other_info | privacy_level (int value)
From there, users can set the privacy level, making the post viewable by everyone (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For privacy level 4 (specific people), the query references the table "post_privacy_includes_for" in a subquery to check whether the user (or a filter the user belongs to) exists in a row of that table.
ID | post_id | user_id | list_id
Users can also prevent some people from viewing a post within a larger group by excluding them (e.g., setting it viewable by everyone but hiding it from a stalker). For this, another reference table, "post_privacy_exclude_from", is added; its structure is identical to "post_privacy_includes_for".
My problem is that this does not scale. At all. At the moment there are about 1-2 million posts, the majority of them viewable by everyone. For each post on the page, the query must check whether there is a row excluding the post from being shown to the user, which is really slow on a page that can be filled with 100-200 posts. It can take up to 2-4 seconds, especially when additional constraints are added to the query.
This also creates extremely large and complex queries that are just... awkward.
SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
( SELECT i.id
FROM PostPrivacyIncludeFor i
WHERE i.user_id = ?
AND i.thought_id = t.id)
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
(SELECT i2.id
FROM PostPrivacyIncludeFor i2
WHERE i2.thought_id = t.id
AND EXISTS
(SELECT r.id
FROM FriendFilterIds r
WHERE r.list_id = i2.list_id
AND r.friend_id = ?))
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 1
AND EXISTS
(SELECT G.id
FROM Following G
WHERE follower_id = t.owner_id
AND following_id = ?
AND friend = 1)
OR t.privacy_level = 1
AND t.owner_id = ?)
OR (NOT EXISTS
(SELECT e.id
FROM PostPrivacyExcludeFrom e
WHERE e.thought_id = t.id
AND e.user_id = ?
AND NOT EXISTS
(SELECT e2.id
FROM PostPrivacyExcludeFrom e2
WHERE e2.thought_id = t.id
AND EXISTS
(SELECT l.id
FROM FriendFilterIds l
WHERE l.list_id = e2.list_id
AND l.friend_id = ?)))
AND t.privacy_level IN (0, 1, 4))
AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100
(mock up query, similar to the query I use now in Doctrine ORM. It's a mess, but you get what I am saying.)
I guess my question is, how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to completely scrap the method I have currently built up, but I wouldn't know what to move onto.
Thanks guys.
Updated: Fix the query to reflect the values I defined for privacy level above (I forgot to update it because I simplified the values)
Your query is too long to give a definitive solution for, but the approach I would follow is to simplify the data lookups by converting the sub-queries into joins, and then build the logic into the WHERE clause and column list of the SELECT statement:
select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ?
(This might need expanding: I couldn't follow the logic of the final clause.)
If you can get the simple select working fast AND including all the information needed, then all you need to do is build up the logic in the select list and where clause.
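A sketch of the join rewrite for the exclusion rule only, in SQLite via Python's sqlite3 with invented rows: the NOT EXISTS subquery becomes a LEFT JOIN that keeps posts where no exclusion row matched (e.id IS NULL). Both forms return the same posts; which one is faster depends on the optimizer, so compare them with EXPLAIN on your own data, and make sure an index covers (thought_id, user_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, owner_id INTEGER, privacy_level INTEGER);
CREATE TABLE PostPrivacyExcludeFrom (id INTEGER PRIMARY KEY, thought_id INTEGER, user_id INTEGER);
CREATE INDEX exclude_post_user ON PostPrivacyExcludeFrom (thought_id, user_id);
-- invented rows: post 2 is hidden from user 99
INSERT INTO posts VALUES (1, 10, 0), (2, 10, 0), (3, 11, 0);
INSERT INTO PostPrivacyExcludeFrom (thought_id, user_id) VALUES (2, 99);
""")

NOT_EXISTS_SQL = """
    SELECT t.id FROM posts t
     WHERE t.privacy_level = 0
       AND NOT EXISTS (SELECT 1 FROM PostPrivacyExcludeFrom e
                        WHERE e.thought_id = t.id AND e.user_id = ?)
     ORDER BY t.id
"""
# Same filter as an anti-join: keep rows where the LEFT JOIN found no match.
ANTI_JOIN_SQL = """
    SELECT t.id FROM posts t
      LEFT JOIN PostPrivacyExcludeFrom e
        ON e.thought_id = t.id AND e.user_id = ?
     WHERE t.privacy_level = 0 AND e.id IS NULL
     ORDER BY t.id
"""

viewer = 99
print([r[0] for r in conn.execute(NOT_EXISTS_SQL, (viewer,))])  # [1, 3]
print([r[0] for r in conn.execute(ANTI_JOIN_SQL, (viewer,))])   # [1, 3]
```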
Had a quick stab at simplifying this without re-working your original design too much.
Using this solution your web page can now simply call the following stored procedure to get a list of filtered posts for a given user within a specified period.
call list_user_filtered_posts( <user_id>, <day_interval> );
The whole script can be found here : http://pastie.org/1212812
I haven't fully tested all of this and you may find this solution isn't performant enough for your needs but it may help you in fine tuning/modifying your existing design.
Tables
Dropped your post_privacy_exclude_from table and added a user_stalkers table, which works pretty much like the inverse of user_friends. Kept the original post_privacy_includes_for table as per your design, as this allows a user to restrict a specific post to a subset of people.
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists user_friends;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
primary key (user_id, friend_user_id)
)
engine=innodb;
drop table if exists user_stalkers;
create table user_stalkers
(
user_id int unsigned not null,
stalker_user_id int unsigned not null,
primary key (user_id, stalker_user_id)
)
engine=innodb;
drop table if exists posts;
create table posts
(
post_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
privacy_level tinyint unsigned not null default 0,
post_date datetime not null,
key user_idx(user_id),
key post_date_user_idx(post_date, user_id)
)
engine=innodb;
drop table if exists post_privacy_includes_for;
create table post_privacy_includes_for
(
post_id int unsigned not null,
user_id int unsigned not null,
primary key (post_id, user_id)
)
engine=innodb;
Stored Procedures
The stored procedure is relatively simple - it initially selects ALL posts within the specified period and then filters out posts as per your original requirements. I have not performance tested this sproc with large volumes but as the initial selection is relatively small it should be performant enough as well as simplifying your application/middle tier code.
drop procedure if exists list_user_filtered_posts;
delimiter #
create procedure list_user_filtered_posts
(
in p_user_id int unsigned,
in p_day_interval tinyint unsigned
)
proc_main:begin
drop temporary table if exists tmp_posts;
drop temporary table if exists tmp_priv_posts;
-- select ALL posts in the required date range (or whatever selection criteria you require)
create temporary table tmp_posts engine=memory
select
p.post_id, p.user_id, p.privacy_level, 0 as deleted
from
posts p
where
p.post_date between now() - interval p_day_interval day and now()
order by
p.user_id;
-- purge stalker posts (0,1,3,4)
update tmp_posts
inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
set
tmp_posts.deleted = 1
where
tmp_posts.user_id != p_user_id;
-- purge other users private posts (3)
update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;
-- purge friend only posts (1) i.e where p_user_id is not a friend of the poster
/*
requires another temp table due to mysql temp table problem/bug
http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
*/
-- the private posts (1) this user can see
create temporary table tmp_priv_posts engine=memory
select
tp.post_id
from
tmp_posts tp
inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 1;
-- remove friends-only posts this user can't see
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
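tpp.post_id is null and tmp_posts.privacy_level = 1
and tmp_posts.user_id != p_user_id; -- the viewer's own posts are never in tmp_priv_posts, so keep them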
-- purge filtered (4)
truncate table tmp_priv_posts; -- reuse tmp table
insert into tmp_priv_posts
select
tp.post_id
from
tmp_posts tp
inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 4;
-- remove filtered posts this user can't see
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
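tpp.post_id is null and tmp_posts.privacy_level = 4
and tmp_posts.user_id != p_user_id; -- the viewer's own posts are never in tmp_priv_posts, so keep them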
drop temporary table if exists tmp_priv_posts;
-- output filtered posts (display ALL of these on web page)
select
p.*
from
posts p
inner join tmp_posts tp on p.post_id = tp.post_id
where
tp.deleted = 0
order by
p.post_id desc;
-- clean up
drop temporary table if exists tmp_posts;
end proc_main #
delimiter ;
Test Data
Some basic test data.
insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');
insert into user_friends values
(1,2),(1,3),(1,5),
(2,1),(2,3),(2,4),
(3,1),(3,2),
(4,5),
(5,1),(5,4);
insert into user_stalkers values (4,1);
insert into posts (user_id, privacy_level, post_date) values
-- public (0)
(1,0,now() - interval 8 day),
(1,0,now() - interval 8 day),
(2,0,now() - interval 7 day),
(2,0,now() - interval 7 day),
(3,0,now() - interval 6 day),
(4,0,now() - interval 6 day),
(5,0,now() - interval 5 day),
-- friends only (1)
(1,1,now() - interval 5 day),
(2,1,now() - interval 4 day),
(4,1,now() - interval 4 day),
(5,1,now() - interval 3 day),
-- private (3)
(1,3,now() - interval 3 day),
(2,3,now() - interval 2 day),
(4,3,now() - interval 2 day),
-- filtered (4)
(1,4,now() - interval 1 day),
(4,4,now() - interval 1 day),
(5,4,now());
insert into post_privacy_includes_for values (15,4), (16,1), (17,6);
Testing
As I mentioned before I've not fully tested this but on the surface it seems to be working.
select * from posts;
call list_user_filtered_posts(1,14);
call list_user_filtered_posts(6,14);
call list_user_filtered_posts(1,7);
call list_user_filtered_posts(6,7);
Hope you find some of this of use.
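The same four rules (stalker, private, friends-only, filtered) can also be sketched as one query with correlated EXISTS checks instead of temporary tables. This version runs in SQLite via Python's sqlite3 against the test data above, with post_date left out and posts numbered 1-17 in insertion order; it treats a user's own posts as always visible and, unlike the procedure, hides other users' posts with any privacy level not covered by these rules. It is an alternative to try, not a measured improvement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_friends (user_id INTEGER, friend_user_id INTEGER, PRIMARY KEY (user_id, friend_user_id));
CREATE TABLE user_stalkers (user_id INTEGER, stalker_user_id INTEGER, PRIMARY KEY (user_id, stalker_user_id));
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, privacy_level INTEGER);
CREATE TABLE post_privacy_includes_for (post_id INTEGER, user_id INTEGER, PRIMARY KEY (post_id, user_id));
INSERT INTO user_friends VALUES (1,2),(1,3),(1,5),(2,1),(2,3),(2,4),(3,1),(3,2),(4,5),(5,1),(5,4);
INSERT INTO user_stalkers VALUES (4,1);
-- same owners and levels as the test data above, post_date omitted
INSERT INTO posts (user_id, privacy_level) VALUES
    (1,0),(1,0),(2,0),(2,0),(3,0),(4,0),(5,0),
    (1,1),(2,1),(4,1),(5,1),
    (1,3),(2,3),(4,3),
    (1,4),(4,4),(5,4);
INSERT INTO post_privacy_includes_for VALUES (15,4),(16,1),(17,6);
""")

VISIBLE_POSTS_SQL = """
    SELECT p.post_id
      FROM posts p
     WHERE p.user_id = :viewer                     -- own posts are always visible
        OR (NOT EXISTS (SELECT 1 FROM user_stalkers s
                         WHERE s.user_id = p.user_id AND s.stalker_user_id = :viewer)
            AND (p.privacy_level = 0
                 OR (p.privacy_level = 1 AND EXISTS (
                        SELECT 1 FROM user_friends f
                         WHERE f.user_id = p.user_id AND f.friend_user_id = :viewer))
                 OR (p.privacy_level = 4 AND EXISTS (
                        SELECT 1 FROM post_privacy_includes_for i
                         WHERE i.post_id = p.post_id AND i.user_id = :viewer))))
     ORDER BY p.post_id DESC
"""

def visible_posts(conn, viewer):
    return [r[0] for r in conn.execute(VISIBLE_POSTS_SQL, {"viewer": viewer})]

print(visible_posts(conn, 1))  # [15, 12, 11, 9, 8, 7, 5, 4, 3, 2, 1]
print(visible_posts(conn, 6))  # [17, 7, 6, 5, 4, 3, 2, 1]
```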

PHP/Mysql: read data field value from lookup tables (split array)

I have one MySQL database with two tables:
DOCUMENTS
...
- staffID
.....
STAFF
- ID
- Name
The DOCUMENTS table assigns each document to one or more users from the STAFF table, so the staffID column in DOCUMENTS holds a comma-separated list of staff IDs, for example (2, 14).
I managed to split the array into individual values:
2
14
but rather than the ID numbers I would like to show the actual names from the STAFF table. How can I achieve this? Any help would be greatly appreciated; please see my current code below.
$result = mysql_query("SELECT
organizations.orgName,
documents.docName,
documents.docEntry,
documents.staffID,
staff.Name,
staff.ID
FROM
documents
INNER JOIN organizations ON (documents.IDorg = organizations.IDorg)
INNER JOIN staff ON (documents.staffID = staff.ID)
")
or die(mysql_error());
while($row = mysql_fetch_array($result)){
$splitA = $row['staffID'];
$resultName = explode(',', $splitA );
$i=0;
for($i=0;$i<count($resultName);$i++)
{
echo "<a href='staffview.php?ID=".$row['docName'].
"'>". $resultName[$i]."</a><br>";
}
echo '<hr>';
}
It looks like your existing code might work where documents.staffID = staff.ID - that is where there is just a single staffID associated with the document?
You'd be better off adding a table to model the relationships between documents and staff separately from either, and removing or deprecating the staffID field in the documents table. You'd need something like
CREATE TABLE document_staff (
document_id <type>,
staff_id <type>
)
You can include compound indexes with ( document_id, staff_id ) and ( staff_id, document_id ) if you have lots of data and/or you want to traverse the relationship efficiently in both directions.
(You don't mention data types for your identity fields, but documents.staffID appears to be some sort of varchar based on what you say - perhaps you could use an integer type for these instead?)
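A sketch of the document_staff link table in SQLite via Python's sqlite3, with invented staff names and documents. Each document/staff pair is one row, so getting names is an ordinary pair of joins, and the second index covers the staff-to-document direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE documents (docEntry INTEGER PRIMARY KEY, docName TEXT NOT NULL);
CREATE TABLE document_staff (
    document_id INTEGER NOT NULL REFERENCES documents(docEntry),
    staff_id    INTEGER NOT NULL REFERENCES staff(ID),
    PRIMARY KEY (document_id, staff_id)
);
CREATE INDEX staff_document ON document_staff (staff_id, document_id);
-- invented sample data
INSERT INTO staff VALUES (2, 'Ann'), (14, 'Bob'), (7, 'Cy');
INSERT INTO documents VALUES (1, 'report.pdf'), (2, 'memo.doc');
INSERT INTO document_staff VALUES (1, 2), (1, 14), (2, 7);
""")

rows = conn.execute("""
    SELECT d.docName, s.Name
      FROM documents d
      JOIN document_staff ds ON ds.document_id = d.docEntry
      JOIN staff s           ON s.ID = ds.staff_id
     ORDER BY d.docEntry, s.Name
""").fetchall()
print(rows)  # [('report.pdf', 'Ann'), ('report.pdf', 'Bob'), ('memo.doc', 'Cy')]
```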
But you can probably achieve what you want using the existing schema and the MySQL FIND_IN_SET function:
SELECT
organizations.orgName,
documents.docName,
documents.docEntry,
documents.staffID,
staff.Name,
staff.ID
FROM
documents
INNER JOIN organizations ON (documents.IDorg = organizations.IDorg)
INNER JOIN staff ON ( FIND_IN_SET( staff.ID, documents.staffID ) > 0 )
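FIND_IN_SET has limitations: it matches list items exactly, so the IDs must be stored without spaces after the commas (a value like '2, 14' will not match 14), and the join condition cannot use an index on staff.ID. It may still be sufficient for your needs.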
If it was me though, I'd change the model rather than use FIND_IN_SET.
Thank you so much for your answer, greatly appreciated!
My table setup is:
DOCUMENTS:
CREATE TABLE documents (
docID int NOT NULL,
docTitle mediumblob NOT NULL,
staffID varchar(120) NOT NULL,
Author2 int,
IDorg int,
docName varchar(150) NOT NULL,
docEntry int AUTO_INCREMENT NOT NULL,
/* Keys */
PRIMARY KEY (docEntry)
) ENGINE = MyISAM;
STAFF:
CREATE TABLE staff (
ID int AUTO_INCREMENT NOT NULL,
Name varchar(60) NOT NULL,
Organization varchar(20),
documents varchar(150),
Photo mediumblob,
/* Keys */
PRIMARY KEY (ID)
) ENGINE = MyISAM;
The DOCUMENTS table reads from the STAFF table via a lookup (dropdown), so that I can assign multiple staff members to a document. I can access the staffID list in the DOCUMENTS table and split it, and I wonder if there is a way to then associate each staffID with staff.Name and print the staff name rather than the ID in the query results. Thanks again!
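If you keep the comma-separated column for now, one way to print names is to load the staff table once into a lookup keyed by ID and map each split value through it. The sketch below is in Python (in PHP the lookup would be an associative array built before the while loop); staff_names is a hypothetical helper and the sample names are invented. It trims spaces around each ID, since values such as '2, 14' contain them.

```python
def staff_names(staff_id_list, staff_by_id):
    """Map a comma-separated ID string such as '2, 14' to staff names."""
    names = []
    for part in staff_id_list.split(","):
        part = part.strip()          # the stored value may contain spaces after commas
        if part:                     # skip empty items, e.g. from a trailing comma
            names.append(staff_by_id.get(int(part), f"unknown ({part})"))
    return names

# In the real page, staff_by_id would come from one query: SELECT ID, Name FROM staff
staff_by_id = {2: "Ann", 14: "Bob"}
print(staff_names("2, 14", staff_by_id))  # ['Ann', 'Bob']
```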

Removing duplicate field entries in SQL

Is there any way I can delete all the duplicate entries from a certain table (users)? Here is a sample of the type of entries I have. The users table consists of three fields: ID, user, and pass.
mysql_query("DELETE FROM users WHERE ???") or die(mysql_error());
randomtest
randomtest
randomtest
nextfile
baby
randomtest
dog
anothertest
randomtest
baby
nextfile
dog
anothertest
randomtest
randomtest
I want to be able to find the duplicate entries, and then delete all of the duplicates, and leave one.
You can solve it with only one query.
If your table has the following structure:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`username` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
you could do something like this (it deletes every row whose username is duplicated and whose ID is greater than the smallest ID for that username):
DELETE users
FROM users INNER JOIN
(SELECT MIN(id) as id, username FROM users GROUP BY username) AS t
ON users.username = t.username AND users.id > t.id
It works; I've already used something similar to delete duplicates.
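A runnable sketch of the keep-the-lowest-ID rule on the sample names from the question, using SQLite via Python's sqlite3. SQLite has no multi-table DELETE ... JOIN, so this uses the equivalent NOT IN form; MySQL rejects that form when the subquery reads the table being deleted from, which is why the JOIN version above is the one to use there.

```python
import sqlite3

# Sample names from the question, in the order listed (ids 1..15).
names = ["randomtest", "randomtest", "randomtest", "nextfile", "baby", "randomtest",
         "dog", "anothertest", "randomtest", "baby", "nextfile", "dog",
         "anothertest", "randomtest", "randomtest"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [(n,) for n in names])

# Keep the lowest id per username and delete every other row.
conn.execute("""
    DELETE FROM users
     WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY username)
""")
conn.commit()

print(conn.execute("SELECT id, username FROM users ORDER BY id").fetchall())
# [(1, 'randomtest'), (4, 'nextfile'), (5, 'baby'), (7, 'dog'), (8, 'anothertest')]
```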
You can do it with three SQL statements (note that the new table keeps only the name column):
create table tmp as select distinct name from users;
drop table users;
alter table tmp rename users;
This delete script (SQL Server syntax) should work:
DELETE FROM Users
WHERE ID NOT IN (
SELECT MIN(ID)
FROM Users
GROUP BY User
)
I assume that you have a structure like the following:
users
-----------------
| id | username |
-----------------
| 1 | joe |
| 2 | bob |
| 3 | jane |
| 4 | bob |
| 5 | bob |
| 6 | jane |
-----------------
A temporary table is required because MySQL cannot use a subquery that reads the DELETE's target table in the same DELETE statement.
CREATE TEMPORARY TABLE IF NOT EXISTS users_to_delete (id INTEGER);
INSERT INTO users_to_delete (id)
SELECT MIN(u1.id) as id
FROM users u1
INNER JOIN users u2 ON u1.username = u2.username
GROUP BY u1.username;
DELETE FROM users WHERE id NOT IN (SELECT id FROM users_to_delete);
I know the query is a bit hairy, but it does the job, even if the users table has more than two columns.
You need to be a bit careful about how the data in your table is used. If this really is a users table, there are likely other tables with FKs pointing to the ID column, in which case you need to update those tables to use the ID you have chosen to keep.
If it's just a standalone table (no other table references it):
CREATE TEMPORARY TABLE Tmp (ID int);
INSERT INTO Tmp SELECT MIN(ID) FROM Users GROUP BY User;
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Tmp);
Users table linked from other tables
Create the temporary tables, including a mapping table that holds each old ID and the new ID that other tables should reference instead.
CREATE TEMPORARY TABLE Keep (ID int, User varchar(45));
CREATE TEMPORARY TABLE Remove (OldID int, NewID int);
INSERT INTO Keep SELECT MIN(ID), User FROM Users GROUP BY User;
INSERT INTO Remove SELECT u1.ID, k.ID FROM Users u1 INNER JOIN Keep k ON k.User = u1.User WHERE u1.ID <> k.ID;
Go through every table that references your users table and update its FK column (likely called UserID) to point to the new unique ID you have kept, like so:
UPDATE MYTABLE t INNER JOIN Remove r ON t.UserID = r.OldID
SET t.UserID = r.NewID;
Finally go back to your users table and remove the no longer referenced duplicates:
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Keep);
Clean up the temporary tables:
DROP TEMPORARY TABLE Keep;
DROP TEMPORARY TABLE Remove;
A very simple solution would be to set a UNIQUE index on the column you want to have unique values. Note that afterwards you cannot insert the same key twice, and that adding the index fails while duplicates still exist.
Edit: My mistake, I hadn't read that last line: "I want to be able to find the duplicate entries".
I would get all the results, put them in an array of IDs and VALUES. Use a PHP function to work out the dupes, log all the IDs in an array, and use those values to delete the records.
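A sketch of the application-side approach in Python (the same logic ports directly to a PHP array): keep the lowest ID for each value and collect the rest for deletion. duplicate_ids is a hypothetical helper; the returned IDs would then go into a DELETE ... WHERE ID IN (...) statement built with placeholders.

```python
def duplicate_ids(rows):
    """Given (id, value) rows, return the ids to delete, keeping the lowest id per value."""
    keep = {}                        # value -> lowest id seen so far
    for row_id, value in sorted(rows):
        keep.setdefault(value, row_id)
    kept = set(keep.values())
    return sorted(row_id for row_id, _ in rows if row_id not in kept)

rows = [(1, "randomtest"), (2, "randomtest"), (3, "nextfile"), (4, "baby"), (5, "nextfile")]
print(duplicate_ids(rows))  # [2, 5]
```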
I don't know your DB schema, but the simplest solution seems to be to do a SELECT DISTINCT on that table, keep the result in a variable (e.g. an array), delete all records from the table, and then reinsert the list returned by SELECT DISTINCT.
The temporary table is an excellent solution, but I'd like to provide a SELECT query that grabs duplicate rows from the table as an alternative:
SELECT `users`.* FROM `users` INNER JOIN (
SELECT `name`, COUNT(`name`) AS `count`
FROM `users` GROUP BY `name`
) AS `grouped`
ON `grouped`.`name` = `users`.`name`
WHERE `grouped`.`count` > 1
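Select your three columns as per your table structure and apply conditions as per your requirements. Group by the value columns (not by userId, which is unique) to list each duplicated username/password pair and how many copies exist:
SELECT u.username, u.password, COUNT(*) AS copies
FROM user AS u
GROUP BY u.username, u.password
HAVING COUNT(*) > 1;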
None of the answers above or below worked for me, so I decided to write my own little script. It's not the best, but it gets the job done.
Comments are included throughout, but this script is customized for my needs, and I hope the idea helps you.
I basically wrote the database contents to a temp file, read the temp file back, applied the function to its contents to remove the duplicates, truncated the table, and then inserted the data right back into the table. Sounds like a lot, I know.
If you're confused as to what $setprofile is, it's a session variable that's created upon logging into my script (to establish a profile) and is cleared upon logging out.
<?php
// session and includes, you know the drill.
session_start();
include_once('connect/config.php');
// create a temp file with session id and current date
$datefile = date("m-j-Y");
$file = "temp/$setprofile-$datefile.txt";
$f = fopen($file, 'w'); // Open in write mode
// call the user and pass via SQL and write them to $file
$sql = mysql_query("SELECT * FROM _$setprofile ORDER BY user DESC");
while($row = mysql_fetch_array($sql))
{
$user = $row['user'];
$pass = $row['pass'];
$accounts = "$user:$pass "; // the white space right here is important, it defines the separator for the dupe check function
fwrite($f, $accounts);
}
fclose($f);
// **** Dupe Function **** //
// removes duplicate substrings between the seperator
function uniqueStrs($seperator, $str) {
// convert string to an array using ' ' as the seperator
$str_arr = explode($seperator, $str);
// remove duplicate array values
$result = array_unique($str_arr);
// convert array back to string, using ' ' to glue it back
$unique_str = implode(' ', $result);
// return the unique string
return $unique_str;
}
// **** END Dupe Function **** //
// call the list we made earlier, so we can use the function above to remove dupes
$str = file_get_contents($file);
// seperator
$seperator = ' ';
// use the function to save a unique string
$new_str = uniqueStrs($seperator, $str);
// empty the table
mysql_query("TRUNCATE TABLE _$setprofile") or die(mysql_error());
// prep for SQL by replacing test:test with ('test','test'), etc.
// this isn't a sufficient way of converting, as i said, it works for me.
$patterns = array("/([^\s:]+):([^\s:]+)/", "/\s++\(/");
$replacements = array("('$1', '$2')", ", (");
// insert the values into your table, and presto! no more dupes.
$sql = 'INSERT INTO `_'.$setprofile.'` (`user`, `pass`) VALUES ' . preg_replace($patterns, $replacements, $new_str) . ';';
$product = mysql_query($sql) or die(mysql_error()); // put $new_str here so it will replace new list with SQL formatting
// if all goes well.... OR wrong? :)
if($product){ echo "Completed!";
} else {
echo "Failed!";
}
unlink($file); // delete the temp file/list we made earlier
?>
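This will work (it keeps the row with the lowest id for each name):
create table tmp like users;
insert into tmp select u.* from users u inner join (select min(id) as id from users group by name) k on k.id = u.id;
drop table users;
alter table tmp rename users;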
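If you have a unique ID / primary key on the table, then the following works in databases that allow it (MySQL rejects a DELETE whose subquery reads the target table, with error 1093, so there use one of the JOIN or temporary-table forms instead):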
DELETE FROM MyTable AS T1
WHERE MyID <
(
SELECT MAX(MyID)
FROM MyTable AS T2
WHERE T2.Col1 = T1.Col1
AND T2.Col2 = T1.Col2
... repeat for all columns to consider duplicates ...
)
If you don't have a unique key, select all distinct values into a temporary table, delete all the original rows, and copy them back from the temporary table; but this will be problematic if you have foreign keys referring to this table.
