How to find duplicates in mysql table using PHP? - php

I have a table with customer info. Normally, the PHP checks for duplicates before new rows are inserted. However, I had to dump a lot of older rows in manually, and now that they are all in my table, I need to check for duplicates.
Example rows:
id, name, email, phone, fax
I would like to run a MySQL query that will show all IDs with matching emails. I can modify the query later for phone, fax, etc.
I have a feeling I will be using DISTINCT, but I am not quite sure how it's done.

You can GROUP BY email with HAVING COUNT(*) > 1 to find all duplicate email addresses, then join the resulting duplicate emails with your table to fetch the ids:
SELECT id FROM my_table NATURAL JOIN (
SELECT email FROM my_table GROUP BY email HAVING COUNT(*) > 1
) t
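The GROUP BY / HAVING approach above can be tried out end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database in place of MySQL, the sample rows and the helper names `demo_connection` and `duplicate_email_ids` are made up for illustration, and it writes the join condition out explicitly instead of relying on NATURAL JOIN.

```python
import sqlite3


def demo_connection():
    """Build an in-memory table shaped like the one in the question (sample data is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT, fax TEXT)"
    )
    conn.executemany(
        "INSERT INTO my_table (id, name, email) VALUES (?, ?, ?)",
        [
            (1, "Ann", "a@example.com"),
            (2, "Bob", "b@example.com"),
            (3, "Ann B.", "a@example.com"),
        ],
    )
    return conn


def duplicate_email_ids(conn):
    """Return (id, email) for every row whose email occurs more than once."""
    sql = """
        SELECT t.id, t.email
        FROM my_table AS t
        JOIN (
            SELECT email FROM my_table
            GROUP BY email HAVING COUNT(*) > 1
        ) AS d ON d.email = t.email
        ORDER BY t.email, t.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(duplicate_email_ids(demo_connection()))
```

Swapping `email` for `phone` or `fax` in both places gives the same check for the other columns.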

Related

delete multiple entry in mysql but keep one php and mysql

I am having a problem cleaning up the data in my database.
I have a table "users" with the fields (id, name, email, phone).
I need to delete duplicates based on the phone number. I have around 30k rows, and each row in the table must have a different phone number. Right now I have, for example,
3 records for the same person with the same phone number:
name: Phone No:
john 1234
john 1234
john 1234
I only need to keep one record per phone number.
Is there a PHP script that can handle this case quickly? I hope you can help me.
You can delete with a join. In this example you keep the lowest users.id.
DELETE t1 FROM users AS t1
LEFT JOIN
(SELECT MIN(id) AS min_id FROM users GROUP BY phone) AS t2
ON t1.id = t2.min_id WHERE t2.min_id IS NULL;
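To see the "keep the lowest id per phone" rule in action, here is a small sketch using Python's sqlite3 module and an in-memory database rather than MySQL. SQLite has no multi-table DELETE, so the sketch uses a `NOT IN` subquery instead of the join shown above; as far as I know MySQL rejects that subquery form with error 1093 when it reads from the table being deleted, which is why the join against a derived table is the form to use there. The helper names and sample rows are invented.

```python
import sqlite3


def build_users():
    """In-memory stand-in for the users table, with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, name, phone) VALUES (?, ?, ?)",
        [(1, "john", "1234"), (2, "john", "1234"), (3, "john", "1234"), (4, "mary", "5678")],
    )
    return conn


def delete_duplicate_phones(conn):
    """Delete every row except the lowest id for each phone; return the number removed."""
    cur = conn.execute(
        "DELETE FROM users "
        "WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY phone)"
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_users()
    print(delete_duplicate_phones(conn))
    print(conn.execute("SELECT id, name, phone FROM users ORDER BY id").fetchall())
```

It is worth running the inner SELECT on its own first, or working on a copy of the table, before deleting anything from 30k rows.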
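Alternatively, you can use ALTER TABLE with the IGNORE modifier:
ALTER IGNORE TABLE users
ADD UNIQUE INDEX p (phone);
This drops all the duplicate rows and prevents future INSERTs with the same phone value. Note that the IGNORE modifier for ALTER TABLE was removed in MySQL 5.7.4, so this only works on older versions.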
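The "no future duplicates" half of this can be illustrated with a short sketch. It assumes the table has already been de-duplicated and uses Python's sqlite3 module with an in-memory database in place of MySQL; the index name `p` comes from the answer, while the helper names and sample rows are invented.

```python
import sqlite3


def build_clean_users():
    """In-memory users table that already has one row per phone (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO users (name, phone) VALUES (?, ?)",
        [("john", "1234"), ("mary", "5678")],
    )
    return conn


def add_unique_phone_index(conn):
    """Create the unique index; the database then enforces one row per phone."""
    conn.execute("CREATE UNIQUE INDEX p ON users (phone)")


def try_insert(conn, name, phone):
    """Insert a user; return False if the unique index rejects the phone."""
    try:
        conn.execute("INSERT INTO users (name, phone) VALUES (?, ?)", (name, phone))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_clean_users()
    add_unique_phone_index(conn)
    print(try_insert(conn, "jane", "1234"))  # phone already taken
    print(try_insert(conn, "jane", "9999"))  # new phone
```

With the index in place, the PHP side can rely on the database to reject duplicates rather than checking before each INSERT.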

Issue with SQL Join & Exclude Query

I have a set of queries that I am trying to run but I am having issues getting them to run together.
My setup is as follows, with column names in parentheses:
Table 1 (Email / Date)
Table 2 (Email / Date_Submitted)
I have written 3 queries which each work perfectly, independent of each other, but I cannot seem to figure out how to connect them.
Query 1 - Distinct Emails from Table 1 (rfi_log)
SELECT DISTINCT email, date_submitted
FROM rfi_log
WHERE date_submitted BETWEEN '[start_date]' AND '[end_date]'
Query 2 - Distinct Emails from Table 2 (mastersstudies)
SELECT DISTINCT email
FROM orutrimdb.mastersstudies
WHERE date BETWEEN '[start_date]' AND '[end_date]'
Query 3 - Join Query looking for duplicate emails from Table 1 & Table 2
SELECT rfi_log.email as emails, orutrimdb.mastersstudies.email
FROM rfi_log
CROSS JOIN orutrimdb.mastersstudies
ON orutrimdb.mastersstudies.email=rfi_log.email
WHERE date_submitted BETWEEN '[start_date]' AND '[end_date]';
My issue now is that I need to combine these queries by some fashion so that I can get a count of DISTINCT emails from both tables during the date range while EXCLUDING the emails identified from Query 3.
I need the following:
Query 3 = Count of Distinct Emails
Query 2 = Count of Distinct Emails (not identified in Query 3)
Query 1 = Count of Distinct Emails (not identified in Query 3)
Ultimately I need to get a total count of distinct emails during the date range that is "de-duplicated" since there are duplicates located in both tables.
How can this be accomplished?
One method for doing this is union all with aggregation. The following gets duplication information about each email:
select email, sum(isrfi) as numrfi, sum(isms) as numms
from ((select email, 1 as isrfi, 0 as isms
from rfi_log
) union all
(select email, 0, 1
from orutrimdb.mastersstudies
)
) e
group by email;
An aggregation on top gives you the information you are looking for:
select numrfi, numms, count(*), min(email), max(email)
from (select email, sum(isrfi) as numrfi, sum(isms) as numms
from ((select email, 1 as isrfi, 0 as isms
from rfi_log
) union all
(select email, 0, 1
from orutrimdb.mastersstudies
)
) e
group by email
) e
group by numrfi, numms;
Note that this also finds duplicates within a single table.
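As a rough illustration of how the union all / aggregation idea yields the counts asked for in the question, here is a sketch using Python's sqlite3 module with an in-memory database instead of MySQL. It adds the date-range filters from the question's own queries, drops the `orutrimdb.` schema prefix, and collapses the per-table sums into "in this table or not" flags; the sample rows and the names `OVERLAP_SQL`, `build_demo` and `count_distinct_emails` are assumptions made for the example.

```python
import sqlite3

# One row per (in rfi_log?, in mastersstudies?) combination, with a count of emails.
OVERLAP_SQL = """
SELECT numrfi > 0 AS in_rfi, numms > 0 AS in_ms, COUNT(*) AS emails
FROM (
    SELECT email, SUM(isrfi) AS numrfi, SUM(isms) AS numms
    FROM (
        SELECT email, 1 AS isrfi, 0 AS isms FROM rfi_log
        WHERE date_submitted BETWEEN :start AND :end
        UNION ALL
        SELECT email, 0, 1 FROM mastersstudies
        WHERE date BETWEEN :start AND :end
    ) AS e
    GROUP BY email
) AS per_email
GROUP BY in_rfi, in_ms
"""


def build_demo():
    """Two in-memory tables with invented rows; dates are ISO strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rfi_log (email TEXT, date_submitted TEXT)")
    conn.execute("CREATE TABLE mastersstudies (email TEXT, date TEXT)")
    conn.executemany("INSERT INTO rfi_log VALUES (?, ?)", [
        ("a@example.com", "2024-01-05"), ("a@example.com", "2024-01-06"),
        ("b@example.com", "2024-01-10"), ("z@example.com", "2023-12-01"),
    ])
    conn.executemany("INSERT INTO mastersstudies VALUES (?, ?)", [
        ("b@example.com", "2024-01-11"), ("c@example.com", "2024-01-12"),
    ])
    return conn


def count_distinct_emails(conn, start, end):
    """Count distinct emails only in rfi_log, only in mastersstudies, and in both."""
    counts = {"rfi_only": 0, "ms_only": 0, "both": 0}
    for in_rfi, in_ms, n in conn.execute(OVERLAP_SQL, {"start": start, "end": end}):
        key = "both" if in_rfi and in_ms else ("rfi_only" if in_rfi else "ms_only")
        counts[key] = n
    counts["total"] = counts["rfi_only"] + counts["ms_only"] + counts["both"]
    return counts


if __name__ == "__main__":
    print(count_distinct_emails(build_demo(), "2024-01-01", "2024-01-31"))
```

Here "both" corresponds to Query 3, the two "only" counts to Queries 1 and 2 with the Query 3 emails excluded, and "total" is the de-duplicated count across both tables.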

Comparing sql tables and

I want to know how to compare two tables, and if they have the same values show them.
My original table is user_information, and there are 30 other tables with different names, but all of them have
the same columns which are email, name and website.
How to compare user_information table with all the other 30 tables automatically and that includes any new table I will add later.
What you need is basically this (sql-server):
Select email, name, website from table1
intersect
Select email, name, website from table2 ....
And so on.
Should do the trick.
If you're running the query against a MySQL database:
SELECT DISTINCT email, name, website FROM table1
INNER JOIN table2
USING (email, name, website);
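For the "automatically, including tables added later" part of the question, one option is to read the table names from the database catalog and run the comparison in a loop. The sketch below does that with Python's sqlite3 module and an in-memory database, so it reads names from `sqlite_master` and uses INTERSECT; on MySQL the table list would presumably come from `information_schema.tables` and the comparison would use the join form above. The table names `site_one` and `site_two`, the sample rows, and the helper names are invented.

```python
import sqlite3


def build_demo():
    """user_information plus two other tables with the same three columns (invented data)."""
    conn = sqlite3.connect(":memory:")
    for table in ("user_information", "site_one", "site_two"):
        conn.execute(f'CREATE TABLE "{table}" (email TEXT, name TEXT, website TEXT)')
    conn.executemany("INSERT INTO user_information VALUES (?, ?, ?)", [
        ("a@example.com", "Ann", "ann.example"),
        ("b@example.com", "Bob", "bob.example"),
    ])
    conn.executemany("INSERT INTO site_one VALUES (?, ?, ?)", [
        ("a@example.com", "Ann", "ann.example"),
        ("q@example.com", "Quinn", "quinn.example"),
    ])
    conn.execute("INSERT INTO site_two VALUES ('x@example.com', 'Xi', 'xi.example')")
    return conn


def shared_rows(conn, base="user_information"):
    """Map each other table to the rows it has in common with the base table."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name <> ? AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name",
        (base,),
    )]
    matches = {}
    for table in tables:
        # Table names come from the catalog, not from user input, so they are
        # interpolated (quoted) here; values should still use placeholders.
        sql = (
            f'SELECT email, name, website FROM "{base}" '
            f'INTERSECT SELECT email, name, website FROM "{table}"'
        )
        rows = sorted(conn.execute(sql).fetchall())
        if rows:
            matches[table] = rows
    return matches


if __name__ == "__main__":
    print(shared_rows(build_demo()))
```

Because the table list is read at run time, a table created later is picked up without changing the code, as long as it has the same three columns.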

PHP/MySQL: Sorting Results a Different Way

I have two tables, contacts and groups.
Inside contacts I have 3 columns: id, first_name, last_name
Inside groups I have 2 columns: id, linked
I want to put contacts into groups. So I thought the best way would be that when you added a contact to a group it would add that contacts ID to a list of contact ID's separated by commas in the linked column.
So if you added contact 1, 4, 14, and 24 to a specific group the linked column would have this value 1,4,14,24
Now I want to display the contacts in that group. So I explode() the linked value and then use a foreach() to cycle through the exploded array and on each loop select through MySQL the first name and last name of that contact and display it.
All of that works fine. My problem is I realized I want to sort the contacts being displayed in that group. But they display in the order they were placed in the group.
I want to be able to sort by first name ascending or descending but I have no idea how because I'm selecting each contact in a foreach() and echoing it.
P.S. I thought about adding a column to contacts called groups and storing all the groups that contact is in, the same way I stored all the contacts that were in a group. So if the contact was in groups 1, 2, 4, and 6, then that contact's groups column would have a value of 1,2,4,6. The only problem is, let's say I'm searching for all contacts in group 6: how do I search through that list in a MySQL statement? I thought the best way would be SELECT first_name FROM contacts WHERE groups LIKE '%6%', but then what if a contact is in group 60? Wouldn't it select that contact too, even though it isn't in the right group?
Don't store multiple values in the same field in a database! As you have observed, you will get problems when you want to handle each value separately.
In your case, insert one row in the groups table for each group the person belongs to. Then join the two tables to get the information in the way you want.
E.g. to find which groups Santa Claus belongs to:
SELECT groups.id
FROM contacts
INNER JOIN groups
ON groups.linked = contacts.id
WHERE firstname = 'Santa' AND lastname = 'Claus'
Or to find who is in group 1:
SELECT firstname, lastname
FROM contacts
INNER JOIN groups
ON groups.linked = contacts.id
WHERE groups.id = 1
ORDER BY lastname, firstname
To get the results as comma separated lists, you can use the GROUP_CONCAT function:
SELECT groups.id,
GROUP_CONCAT(firstname, ' ', lastname
ORDER BY lastname SEPARATOR ', ') AS groups
FROM contacts
INNER JOIN groups
ON groups.linked = contacts.id
GROUP BY groups.id
In order to store group names, you may benefit from using 3 tables
Contact (id, firstname, lastname)
Group (id, groupname)
Group_membership (contact_id, group_id)
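The three-table layout also answers the original sorting question, because the database can do the ordering once the membership is a join instead of a comma-separated list. The following is a sketch only: it uses Python's sqlite3 module with an in-memory database in place of MySQL, keeps the question's `first_name` / `last_name` column names, and names the group table `contact_groups` (an invented name) to stay clear of keyword clashes. The sample rows and helper names are also made up.

```python
import sqlite3


def build_demo():
    """Contacts, groups and a membership table linking them (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
        CREATE TABLE contact_groups (id INTEGER PRIMARY KEY, groupname TEXT);
        CREATE TABLE group_membership (
            contact_id INTEGER REFERENCES contacts (id),
            group_id   INTEGER REFERENCES contact_groups (id),
            PRIMARY KEY (contact_id, group_id)
        );
        INSERT INTO contacts VALUES
            (1, 'Zed', 'Young'), (4, 'Amy', 'Adams'), (14, 'Bob', 'Brown'), (24, 'Cat', 'Clark');
        INSERT INTO contact_groups VALUES (6, 'Friends'), (60, 'Work');
        INSERT INTO group_membership VALUES (1, 6), (4, 6), (14, 6), (24, 60);
    """)
    return conn


def contacts_in_group(conn, group_id, descending=False):
    """Return (first_name, last_name) for one group, sorted by first name."""
    order = "DESC" if descending else "ASC"  # only ever one of two fixed keywords
    sql = f"""
        SELECT c.first_name, c.last_name
        FROM contacts AS c
        JOIN group_membership AS m ON m.contact_id = c.id
        WHERE m.group_id = ?
        ORDER BY c.first_name {order}
    """
    return conn.execute(sql, (group_id,)).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(contacts_in_group(conn, 6))
    print(contacts_in_group(conn, 6, descending=True))
```

Since the group id is compared as a number, asking for group 6 can no longer match group 60, which was the concern with the LIKE '%6%' idea.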

"Conditional" Join Based on Field Value?

The site I'm working on has 3 different types of users: admin, applicants, reviewers. Each of these groups will have some basic info that will need to be stored (name, id, email, etc) as well as some data that is unique to each. I have created a users table as well as a table for each of the specific groups to store their unique data.
users: id, f_name, l_name, email, user_type
users_admin: id, user_id, office, emp_id
users_applicant: id, user_id, dob, address
users_reviewer: id, user_id, active_status, address, phone
If a user with a user_type of "1" (applicant) logs in, I will need to JOIN to the users_applicant table to retrieve their full record. I tried using a UNION, but my tables have vastly different columns.
Is there a way to, based on a user's type, write a conditional query that will JOIN to the correct table? Am I going about this completely the wrong way?
Thanks in advance for your help!
Well, in the end your tables are already flawed. Why even have a table for each type? Why not put all those fields into the users table, or maybe a user_details table (if you really want an extra table for non-general data fields)? Currently, you're actually creating 4 independent user tables from a relational point of view.
So why do the type-tables have a surrogate key? Why isn't the user_id already the (only) primary key?
If you changed that, all you would need is the user id to retrieve the data you want, and you've already got that (or you wouldn't even be able to retrieve the user type).
Either you do it programmatically, or you can do this with a series of CASEs and LEFT JOINs.
For simplicity's sake let's do this with a table users where you can have a user of type 1 (normal user), 2 (power user) or 3 (administrator). Normal users have an email but no telephone, power users have an address and a field dubbed "superpower", and administrators have a telephone number and nothing else.
Since you want to use the same SELECT for all, of course you need to place all these in your SELECT:
SELECT user.id, user.type, email, address, superpower, telephone
and you will then need to LEFT JOIN to recover these
FROM user
LEFT JOIN users_data ON (user.id = users_data.user_id)
LEFT JOIN power_data ON (user.id = power_data.user_id)
LEFT JOIN admin_info ON (user.id = admin_info.user_id)
Now the "unused" fields will be NULL, but you can supply defaults:
SELECT
CASE WHEN user.type = 1 THEN email ELSE 'nobody@nowhere.com' END AS email,
CASE WHEN user.type = 1 OR user.type = 2 THEN ... ELSE ... END as whatever,
...
Specific WHERE conditions you can put in the JOIN itself, e.g. if you only want administrators from the J sector, you can use
LEFT JOIN admin_info ON (user.id = admin_info.user_id AND admin_info.sector = 'J')
The total query time should not be too bad, seeing as most of the JOINs will return little (and, if you specify a user ID, they will actually return nothing very quickly).
You could also do the same using a UNION, which would be even faster:
SELECT user.id, 'default' AS email, 'othermissingfield' AS missingfieldinthistable,
... FROM user JOIN users_data ON (user.id = users_data.user_id)
WHERE ...
UNION
SELECT user.id, email, 'othermissingfield' AS missingfieldinthistable,
... FROM user JOIN power_data ON (user.id = power_data.user_id)
WHERE ...
UNION
...
Now, if you specify the user ID, all queries except one will return no rows very quickly. Each query has the same WHERE repeated plus any table-specific conditions. The UNION version is less maintainable (unless you generate it programmatically), but ought to be marginally faster.
In all cases, you would be well advised to keep indexes on the appropriate fields.
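Applied to the tables from the question, the LEFT JOIN variant might look like the sketch below. It uses Python's sqlite3 module with an in-memory database in place of MySQL; the query text, the sample rows, and the names `FULL_USER_SQL`, `build_demo` and `load_user` are assumptions for illustration. Columns from the detail tables that do not apply to a user simply come back as NULL (None in Python).

```python
import sqlite3

# One query for every user type: the detail tables that do not apply yield NULLs.
FULL_USER_SQL = """
SELECT u.id, u.f_name, u.user_type,
       a.office, ap.dob,
       COALESCE(ap.address, r.address) AS address,
       r.phone
FROM users AS u
LEFT JOIN users_admin     AS a  ON a.user_id  = u.id
LEFT JOIN users_applicant AS ap ON ap.user_id = u.id
LEFT JOIN users_reviewer  AS r  ON r.user_id  = u.id
WHERE u.id = ?
"""


def build_demo():
    """The four tables from the question, with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, f_name TEXT, l_name TEXT,
                            email TEXT, user_type INTEGER);
        CREATE TABLE users_admin (id INTEGER PRIMARY KEY, user_id INTEGER,
                                  office TEXT, emp_id TEXT);
        CREATE TABLE users_applicant (id INTEGER PRIMARY KEY, user_id INTEGER,
                                      dob TEXT, address TEXT);
        CREATE TABLE users_reviewer (id INTEGER PRIMARY KEY, user_id INTEGER,
                                     active_status INTEGER, address TEXT, phone TEXT);
        INSERT INTO users VALUES (1, 'Ada', 'Lee', 'ada@example.com', 1),
                                 (2, 'Rex', 'Roe', 'rex@example.com', 3);
        INSERT INTO users_applicant VALUES (1, 1, '1990-01-01', '1 Main St');
        INSERT INTO users_reviewer VALUES (1, 2, 1, '2 Side St', '555-0100');
    """)
    return conn


def load_user(conn, user_id):
    """Return the full record for one user as a dict, or None if the id is unknown."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(FULL_USER_SQL, (user_id,)).fetchone()
    return dict(row) if row else None


if __name__ == "__main__":
    print(load_user(build_demo(), 1))
```

The programmatic alternative mentioned above would read `user_type` first and then query only the matching detail table, which costs a second round trip but keeps each query narrow.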
Instead, I would suggest you restructure your tables like this.
Create a table
users_types :
id
type
Then create another table users with a foreign key
users :
id
f_name
l_name
email
office
emp_id
dob
address
active_status
phone
users_types_id
Now, when you need to insert data, insert NULL into the fields that are not required for a particular user. You can then simply fetch records by id, and a LEFT JOIN to users_types will give you the name of the user type.
