I have a MySQL database with thousands of personnel records, often with duplicates.
For each record that has at least one duplicate, I want to delete all of the duplicates but one, then update any foreign keys that referenced the deleted rows so that they point at the row I kept.
For example, we see two instances of Star Lord below:
+------------------+
| `users`          |
+------+-----------+
| id   | name      |
+------+-----------+
| 1    | Star Lord |
| 2    | Star Lord |
| 3    | Iron Man  |
+------+-----------+
+------------------------+
| `messages`             |
+------+----+------------+
| from | to | text       |
+------+----+------------+
| 1    | 5  | hi         |
| 2    | 5  | how r u    |
| 5    | 2  | Good, u?   |
+------+----+------------+
Those two tables should become:
+------------------+
| `users`          |
+------+-----------+
| id   | name      |
+------+-----------+
| 1    | Star Lord |
| 3    | Iron Man  |
+------+-----------+
+------------------------+
| `messages`             |
+------+----+------------+
| from | to | text       |
+------+----+------------+
| 1    | 5  | hi         |
| 1    | 5  | how r u    |
| 5    | 1  | Good, u?   |
+------+----+------------+
Can this be done? I'm happy to use PHP as needed.
I found the following, but it's only for finding foreign key usage, not replacing instances for specific key values: MySQL: How to I find all tables that have foreign keys that reference particular table.column AND have values for those foreign keys?
Bonus Points
There may be additional data which needs to be merged in the users table. For example, Star Lord with ID #1 might have a phone field filled in, but Star Lord with ID #2 has an email field.
Worst case: they both have a field, with conflicting data.
I suggest:
Create a table of correct data. A good starting point might be:
CREATE TABLE users_new LIKE users;
ALTER TABLE users_new ADD UNIQUE (name);
INSERT INTO users_new
(id, name, phone, email)
SELECT MIN(id), name, GROUP_CONCAT(phone), GROUP_CONCAT(email)
FROM users
GROUP BY name;
Note that, due to your "worst case" observation under "Bonus Points", you may well want to manually verify the contents of this table before archiving the underlying users data (I advise against permanent deletion, just in case).
Update existing foreign relationships:
UPDATE messages
JOIN (users uf JOIN users_new unf USING (name)) ON uf.id = messages.from
JOIN (users ut JOIN users_new unt USING (name)) ON ut.id = messages.to
SET messages.from = unf.id,
messages.to = unt.id
If you have a lot of tables to update, you could cache the results of the join between users and users_new—either:
in a new_id column within the old users table:
ALTER TABLE users ADD new_id BIGINT UNSIGNED;
UPDATE users JOIN users_new USING (name)
SET users.new_id = users_new.id;
UPDATE messages
JOIN users uf ON uf.id = messages.from
JOIN users ut ON ut.id = messages.to
SET messages.from = uf.new_id,
messages.to = ut.new_id;
or else in a new (temporary) table:
CREATE TEMPORARY TABLE newid_cache (
PRIMARY KEY(old_id),
KEY(old_id, new_id)
) ENGINE=MEMORY
SELECT users.id AS old_id, users_new.id AS new_id
FROM users JOIN users_new USING (name);
UPDATE messages
JOIN newid_cache nf ON nf.old_id = messages.from
JOIN newid_cache nt ON nt.old_id = messages.to
SET messages.from = nf.new_id,
messages.to = nt.new_id;
Either replace users with users_new, or else modify your application to use the new table in place of the old one.
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
Update any foreign key constraints as appropriate.
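The steps above can be tried end to end on the question's sample data. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, so the syntax differs slightly (for example, SQLite has no multi-table UPDATE, so correlated subqueries stand in for the joins). The mapping table reuses the newid_cache name from above, and ids with no entry in users (such as 5) are left untouched.

```python
import sqlite3

# In-memory stand-in for the question's tables (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages ("from" INTEGER, "to" INTEGER, text TEXT);
    INSERT INTO users VALUES (1, 'Star Lord'), (2, 'Star Lord'), (3, 'Iron Man');
    INSERT INTO messages VALUES (1, 5, 'hi'), (2, 5, 'how r u'), (5, 2, 'Good, u?');

    -- Map every user id to the lowest id that shares its name.
    CREATE TEMP TABLE newid_cache AS
    SELECT u.id AS old_id, k.new_id
    FROM users u
    JOIN (SELECT name, MIN(id) AS new_id FROM users GROUP BY name) k
      ON k.name = u.name;

    -- Repoint both foreign key columns; ids without a mapping stay as they are.
    UPDATE messages
    SET "from" = COALESCE(
            (SELECT new_id FROM newid_cache WHERE old_id = messages."from"),
            messages."from"),
        "to" = COALESCE(
            (SELECT new_id FROM newid_cache WHERE old_id = messages."to"),
            messages."to");

    -- Remove the users that were merged into another row.
    DELETE FROM users
    WHERE id IN (SELECT old_id FROM newid_cache WHERE old_id <> new_id);
""")

users_after = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
messages_after = conn.execute(
    'SELECT "from", "to", text FROM messages ORDER BY rowid'
).fetchall()
print(users_after)
print(messages_after)
```

On a real database you would probably archive the duplicate rows rather than delete them, as advised above.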
I like to be really methodical about this. While you could write it all in one complex query, that's an optimisation and, unless it's obviously needed, an unnecessary one.
First, back up your database :)
Create a table that maps every user id to the id of the user you are going to keep.
Fill it with, say:
INSERT INTO keepers (id, keeper_id)
SELECT u.id, k.keeper_id
FROM `users` u
JOIN (SELECT MIN(id) AS keeper_id, `name` FROM `users` GROUP BY `name`) k
ON k.`name` = u.`name`;
After that it's just some updates with joins, e.g.
UPDATE
`messages` m JOIN
keepers k
ON k.id = m.`from`
SET m.`from` = k.keeper_id;
UPDATE
`messages` m JOIN
keepers k
ON k.id = m.`to`
SET m.`to` = k.keeper_id;
Then get rid of the users you don't want:
DELETE u
FROM `users` u
JOIN keepers k ON k.id = u.id
WHERE k.keeper_id <> u.id;
When all is good (e.g. you have the same number of messages as you started with, no one is talking to themselves, etc.), drop the keepers table.
Syntax not checked, but it should be close.
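The "when all is good" checks can be automated. Below is a rough sketch of what that might look like, using Python's sqlite3 module on an in-memory copy of the question's already-merged tables. The merge_checks helper and its result keys are hypothetical names invented for this illustration.

```python
import sqlite3

def merge_checks(conn, messages_before):
    """Collect a few sanity checks to inspect after ids have been remapped."""
    count_after = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    self_talk = conn.execute(
        'SELECT COUNT(*) FROM messages WHERE "from" = "to"'
    ).fetchone()[0]
    dupes_left = conn.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT name FROM users GROUP BY name HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {
        "same_message_count": count_after == messages_before,
        "self_addressed_messages": self_talk,
        "duplicate_names_left": dupes_left,
    }

# Sample data in the state the question expects after the merge.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages ("from" INTEGER, "to" INTEGER, text TEXT);
    INSERT INTO users VALUES (1, 'Star Lord'), (3, 'Iron Man');
    INSERT INTO messages VALUES (1, 5, 'hi'), (1, 5, 'how r u'), (5, 1, 'Good, u?');
""")
checks = merge_checks(conn, messages_before=3)
print(checks)
```

A non-zero self_addressed_messages count is not necessarily an error, since two duplicates may have messaged each other before the merge, but it is worth a look.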
Related
I have two tables like this:
First Table (infos):
------------------------------
| ID | User1 | User2 | User3 |
------------------------------
| 1 | 20 | 30 | 12 |
------------------------------
Second Table (Users):
---------------------
| ID | Name | Email |
---------------------
| 12 | Test | Test# |
---------------------
| 20 | Bla | Test# |
---------------------
| 30 | Bate | Test# |
---------------------
I want to get the users' information in one row, using the IDs from the first table.
So far I fetch the row from the first table and then look up each user separately, but I want to optimize this into a single query.
SELECT * FROM infos;
SELECT * FROM Infos i,Users u WHERE u.ID = i.User1 (or 2 ...)
Is there any solution?
You could join the Users table three times, once for each user ID whose name (or other values) you want to show:
select a.id
, a.user1
, b.Name as user1name
, a.user2
, c.name as user2name
, a.user3
, d.name as user3name
from infos a
inner join Users b on a.user1 = b.id
inner join Users c on a.user2 = c.id
inner join Users d on a.user3 = d.id;
As already suggested, you should avoid the old implicit join syntax (comma-separated table names plus a WHERE clause) and use explicit joins, which have been standard since 1992. Both forms perform the same query, but the explicit one is clearer.
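To see the triple join return a single row, here is a small sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module (an assumption made for convenience; the same explicit JOIN syntax also works in MySQL). Each alias is joined on a different UserN column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
    CREATE TABLE infos (ID INTEGER PRIMARY KEY,
                        User1 INTEGER, User2 INTEGER, User3 INTEGER);
    INSERT INTO Users VALUES (12, 'Test', 'Test#'), (20, 'Bla', 'Test#'),
                             (30, 'Bate', 'Test#');
    INSERT INTO infos VALUES (1, 20, 30, 12);
""")

# One alias of Users per foreign key column, each with its own join condition.
row = conn.execute("""
    SELECT a.ID, b.Name AS user1name, c.Name AS user2name, d.Name AS user3name
    FROM infos a
    INNER JOIN Users b ON a.User1 = b.ID
    INNER JOIN Users c ON a.User2 = c.ID
    INNER JOIN Users d ON a.User3 = d.ID
""").fetchone()
print(row)
```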
This is a design error. Use an N:N relation (an additional table) to allow any number of users per row of the first table. With the relation in place, other queries become easier too.
A relation table looks like this:
create table relation
(
table1_id int unsigned not NULL,
table2_id int unsigned not NULL,
primary key(table1_id,table2_id)
);
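As an illustration of how such a relation table could replace the User1..User3 columns, here is a sketch using Python's sqlite3 module and the sample ids from the question. The column names info_id and user_id are hypothetical, chosen for readability instead of table1_id and table2_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE infos (id INTEGER PRIMARY KEY);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (info, user) pair instead of a fixed set of UserN columns.
    CREATE TABLE relation (
        info_id INTEGER NOT NULL REFERENCES infos(id),
        user_id INTEGER NOT NULL REFERENCES users(id),
        PRIMARY KEY (info_id, user_id)
    );
    INSERT INTO infos VALUES (1);
    INSERT INTO users VALUES (12, 'Test'), (20, 'Bla'), (30, 'Bate');
    INSERT INTO relation VALUES (1, 20), (1, 30), (1, 12);
""")

# All users attached to info row 1, however many there are.
names = [r[0] for r in conn.execute("""
    SELECT u.name
    FROM relation r
    JOIN users u ON u.id = r.user_id
    WHERE r.info_id = ?
    ORDER BY u.id
""", (1,))]
print(names)
```

Adding a fourth user to an info row is then an INSERT into relation rather than a schema change.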
A typical query (and I dislike a.* generally):
select a.*, b.*
from table1 a, table2 b, relation r
where r.table1_id = a.id
and r.table2_id = b.id;
Let me start by saying this should be a relatively simple problem which is being made unnecessarily complicated by bad database design (not mine). That said, I'm also no expert in MySQL.
Consider the following
Table Schedule
Note how the columns homeID and visitorID contain the names of the teams and not the actual team IDs.
In a bid to fix this I created a new table with columns containing teamID and teamName, as can be seen in the image below.
Table Teams
My Problem(s)
I must get the teamID from table Teams for BOTH home team AND away team
So I created the Teams table and this simple script:
SELECT schedule.*, teams.*
FROM schedule
JOIN teams ON schedule.homeID = teams.teamName OR schedule.visitorID = teams.teamName
WHERE schedule.gameID = 411
LIMIT 1 # added LIMIT 1, otherwise the query returns two rows
Output of mysql Script
Limit 1
Notice above how teamID is only generated for 1 team with Limit 1
No Limit Statement (Double Iteration)
Notice above how teamID is retrieved for BOTH teams. The problem is that it's doing a double iteration.
TL;DR: the above presents the following problems:
Firstly, the script generates two rows of output, one for the home team and one for the away team. That is to be expected, but I can't have that.
As a workaround for the first problem I added LIMIT 1. The problem with LIMIT, though, is that it only gives back a single teamID (as is to be expected, I guess).
Question
How can I get BOTH team IDs from table teams with a single iteration? I hope this makes sense.
Extra
A demo of application with hard coded team names looks like this (just to give an idea of what they are trying to achieve)
Sounds like you want to join teams twice to schedule.
SELECT s.*,
th.*,
ta.*
FROM schedule s
INNER JOIN teams th
ON s.homeid = th.teamname
INNER JOIN teams ta
ON s.visitorid = ta.teamname
WHERE s.gameid = 411;
I guess that you want to show both teams in one row instead of two rows.
If yes, then you need to join the table teams twice.
Consider this demo: http://www.sqlfiddle.com/#!9/bb5e61/1
This join will collect both teams into one row:
SELECT s.*,
t1.teamId as homeId_teamId,
t1.teamCode as homeId_teamCode,
t1.teamName as homeId_teamName,
t2.teamId as visitorId_teamId,
t2.teamCode as visitorId_teamCode,
t2.teamName as visitorId_teamName
FROM Schedule s
JOIN Teams t1 ON s.homeId = t1.teamName
JOIN Teams t2 ON s.visitorId = t2.teamName;
| id | homeId | visitorId | homeId_teamId | homeId_teamCode | homeId_teamName | visitorId_teamId | visitorId_teamCode | visitorId_teamName |
|----|--------|-----------|---------------|-----------------|-----------------|------------------|--------------------|--------------------|
| 1 | Poland | Colombia | 1 | PL | Poland | 2 | CO | Colombia |
However, you can also consider LEFT joins instead of INNER joins, which will still return a row when there is no matching data in the Teams table:
SELECT s.*,
t1.teamId as homeId_teamId,
t1.teamCode as homeId_teamCode,
t1.teamName as homeId_teamName,
t2.teamId as visitorId_teamId,
t2.teamCode as visitorId_teamCode,
t2.teamName as visitorId_teamName
FROM Schedule s
LEFT JOIN Teams t1 ON s.homeId = t1.teamName
LEFT JOIN Teams t2 ON s.visitorId = t2.teamName;
| id | homeId | visitorId | homeId_teamId | homeId_teamCode | homeId_teamName | visitorId_teamId | visitorId_teamCode | visitorId_teamName |
|----|----------|-----------|---------------|-----------------|-----------------|------------------|--------------------|--------------------|
| 1 | Poland | Colombia | 1 | PL | Poland | 2 | CO | Colombia |
| 3 | Ya Majka | Poland | (null) | (null) | (null) | 1 | PL | Poland |
| 2 | Ya Majka | Rossija | (null) | (null) | (null) | (null) | (null) | (null) |
Here are the scripts that make up the tables from the examples
CREATE TABLE Schedule(
id int, homeId varchar(20),visitorId varchar(20)
);
INSERT INTO Schedule VALUES
(1, 'Poland', 'Colombia' ),(2,'Ya Majka','Rossija'),
(3,'Ya Majka','Poland');
CREATE TABLE Teams(
teamId int, teamCode varchar(10), teamName varchar(20)
);
INSERT INTO Teams VALUES
(1, 'PL', 'Poland' ),(2,'CO','Colombia'),(3,'US','United States');
You can use a subquery (two of them in the same query) to solve this:
select
gameID,
weekNum,
gameTimeEastern,
(select teamName from teams where teamID = schedule.homeID) as homeName,
homeScore,
(select teamName from teams where teamID = schedule.visitorID) as visitorName,
visitorScore from schedule;
This doesn't get all the columns from schedule, just an example to show how it works. If you need various queries (including select *, though this isn't a good practice except for testing), you could create a view based on a query like the above (with ALL columns from schedule, except homeID and visitorID that get replaced with sub-queries from the teams table). Then you can place queries against that view - and they will work like the original table where you had team names directly in it.
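A minimal sketch of the view idea follows, run against an in-memory SQLite database via Python's sqlite3 module. It assumes, as the subqueries above do, that homeID and visitorID already hold numeric team IDs. The view name schedule_named and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (teamID INTEGER PRIMARY KEY, teamName TEXT);
    CREATE TABLE schedule (gameID INTEGER PRIMARY KEY,
                           homeID INTEGER, visitorID INTEGER);
    INSERT INTO teams VALUES (1, 'Poland'), (2, 'Colombia');
    INSERT INTO schedule VALUES (411, 1, 2);

    -- The view hides the two lookups, so callers can query it like a table.
    CREATE VIEW schedule_named AS
    SELECT s.gameID,
           (SELECT teamName FROM teams WHERE teamID = s.homeID) AS homeName,
           (SELECT teamName FROM teams WHERE teamID = s.visitorID) AS visitorName
    FROM schedule s;
""")

game = conn.execute(
    "SELECT gameID, homeName, visitorName FROM schedule_named WHERE gameID = ?",
    (411,),
).fetchone()
print(game)
```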
I have two tables named users and users_images. Both tables have a userId column, like this:
My user table
| userId | userName | user_address |
| 2 | John | CN-2, UK |
| 3 | Amit | India |
| 4 | David | Us |
| 5 | Shan | Canada |
.
.
...... and so on
| 125000 | Naved | Ukran |
My images table contains the userId and the image name.
Now I want to merge the ImageName field into the users table without using any loop. I want to do it with a single query, because I have millions of records and will have to do it many times to create a temporary table.
update users u
set
u.imageName = (
select imageName
from users_images i
where i.userid = u.userid GROUP BY u.userId )
you could use ON DUPLICATE KEY
for instance:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
I think you can use Update for this like:
UPDATE Users
SET ImageName =
(SELECT ImageName
FROM UserImages
WHERE UserImages.UserID = Users.UserID);
Please take a backup of your database first
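For a quick feel of how the single-statement update behaves, here is a sketch on a tiny in-memory SQLite database using Python's sqlite3 module. The image file names are invented, and the extra WHERE EXISTS clause is an addition that simply leaves users without an image untouched. It assumes at most one image row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userId INTEGER PRIMARY KEY, userName TEXT, imageName TEXT);
    CREATE TABLE users_images (userId INTEGER, imageName TEXT);
    INSERT INTO users (userId, userName) VALUES (2, 'John'), (3, 'Amit'), (4, 'David');
    INSERT INTO users_images VALUES (2, 'john.png'), (3, 'amit.png');

    -- One statement, no application-side loop.
    UPDATE users
    SET imageName = (SELECT i.imageName
                     FROM users_images i
                     WHERE i.userId = users.userId)
    WHERE EXISTS (SELECT 1 FROM users_images i WHERE i.userId = users.userId);
""")

merged = conn.execute(
    "SELECT userId, imageName FROM users ORDER BY userId"
).fetchall()
print(merged)
```

With millions of rows, an index on users_images.userId would likely matter a great deal for the correlated subquery.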
I need to select a list of entries, but need to skip those that have matching fields in 2 different tables.
Here's my DB structure
orders:
| orders_id | customers_id |
| 100 | 01 |
| 101 | 20 |
| 102 | 32 |
| 103 | 48 |
| 104 | 99 |
customers (for reference only):
| firstname | lastname | customers_id |
| John | Doe | 20 |
| Fred | Flinty | 22 |
| Mark | Smith | 32 |
testimonials:
| customers_id | testimonial |
| 20 | aaa |
| 32 | bbb |
| 38 | ccc |
| 49 | ddd |
| 55 | eee |
So, I need to select all customers who are in my Orders table, but need to skip them if they are in my Testimonials table. In the example shown above, I would need to select only customers 01, 48 and 99 because they don't exist in Testimonials table.
This is what I tried, but am obviously missing something:
SELECT c.firstname, c.lastname, c.customers_id, o.orders_id,
o.customers_id, s.date_added as status_date
FROM (orders o, testimonials t )
JOIN customers c
ON c.customers_id = o.customers_id
JOIN status_history s
ON s.orders_id = o.orders_id
and s.orders_status_id = o.orders_status
and o.customers_id != t.customer_id
order by o.orders_id ASC;
Can somebody please tell me what I'm doing wrong and how to skip customers that are found in both tables (orders and testimonials)?
I feel I'm on the right track because, if I change the and o.customers_id != t.customer_id to and o.customers_id = t.customer_id I get only the customers that are in both tables (in this case, 20 and 32).
You can LEFT JOIN on this.
The reason for using LEFT JOIN is that it returns all records from the table on the left-hand side, whether or not they have a matching record in the table on the right-hand side. When the orders table is joined with testimonials, the records that have no match will have NULL in the testimonials columns, and those are the ones you are looking for. To keep only those records, check the column with IS NULL.
SELECT a.*, b.*
FROM orders a
LEFT JOIN testimonials c
ON a.customers_ID = c.customers_ID
LEFT JOIN customers b
ON a.customers_ID = b.customers_ID
WHERE c.customers_ID IS NULL
SQLFiddle Demo
SQLFiddle Demo (added some info on the mismatched customer)
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
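The LEFT JOIN ... IS NULL pattern described above can be checked against the question's sample rows. The sketch below uses Python's sqlite3 module with an in-memory database (the same query shape works in MySQL) and leaves out the customers table, since the filter only needs orders and testimonials.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orders_id INTEGER, customers_id INTEGER);
    CREATE TABLE testimonials (customers_id INTEGER, testimonial TEXT);
    INSERT INTO orders VALUES (100, 1), (101, 20), (102, 32), (103, 48), (104, 99);
    INSERT INTO testimonials VALUES (20, 'aaa'), (32, 'bbb'), (38, 'ccc'),
                                    (49, 'ddd'), (55, 'eee');
""")

# Unmatched orders get NULL in every testimonials column; keep only those.
rows = conn.execute("""
    SELECT o.orders_id, o.customers_id
    FROM orders o
    LEFT JOIN testimonials t ON t.customers_id = o.customers_id
    WHERE t.customers_id IS NULL
    ORDER BY o.orders_id
""").fetchall()
print(rows)
```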
Adding up INDEX.
If, in the real database, the Orders and Testimonials tables always depend on the Customers table, then a FOREIGN KEY constraint should be enforced to preserve referential integrity.
Here's how:
ALTER TABLE Orders ADD CONSTRAINT tb_fk1
FOREIGN KEY (Customers_ID) REFERENCES Customers(Customers_ID);
ALTER TABLE Testimonials ADD CONSTRAINT tb_fk2
FOREIGN KEY (Customers_ID) REFERENCES Customers(Customers_ID);
This is an easy way:
select c.* from orders as o
join customers as c on o.customers_id = c.customers_id
where o.customers_id not in(select customers_id from testimonials)
I have two tables, one for new pictures and one for new users. I want to create a wall that mixes the latest actions, showing new users and pictures ordered by date.
What I want is a single query, plus a way to know inside the loop whether the current entry is a photo or a user.
TABLE: users
Columns: id,username,fullname,country,date
TABLE: photos
Columns: id,picurl,author,date
Desired Output:
Daniel from California has just registered 5 mins ago
New Picture By David ( click to view ) 15mins ago
And so on...
I'm begging you not to give me just the query syntax; I'm no pro and can't figure out how to deal with that inside the loop (I only know how to fetch regular SQL queries).
Thanks
You could use a union:
SELECT concat(username, " from ", country, " has just registered") txt, date FROM users
UNION
SELECT concat("New picture By ", username, " (click to view)") txt, photos.date FROM photos INNER JOIN users ON author=users.id
ORDER BY date DESC
LIMIT 10
This assumes that author column in photos corresponds to the users table id. If author actually is a string containing the user name (which is a bad design), you'll have to do this instead:
SELECT concat(username, " from ", country, " has just registered") txt, date FROM users
UNION
SELECT concat("New picture By ", author, " (click to view)") txt, date FROM photos
ORDER BY date DESC
LIMIT 10
Make sure you have an index on date in both tables, or this will be very inefficient.
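Since the question also asks how to tell, inside the loop, whether a row is a user or a photo, one option is to add a constant discriminator column to each half of the union. The sketch below shows the idea with Python's sqlite3 module and made-up sample rows. The kind column and its 'user' / 'photo' values are hypothetical names, and author is assumed to hold a user id. The same query shape should carry over to MySQL and a PHP fetch loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT,
                        country TEXT, date TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, picurl TEXT,
                         author INTEGER, date TEXT);
    INSERT INTO users VALUES (1, 'David', 'Texas', '2010-11-01 09:00:00'),
                             (2, 'Daniel', 'California', '2010-11-07 12:10:00');
    INSERT INTO photos VALUES (1, 'a.jpg', 1, '2010-11-07 12:00:00');
""")

# Each half of the union tags its rows, so the loop can branch on `kind`.
rows = conn.execute("""
    SELECT 'user' AS kind, u.username AS who, u.country AS detail, u.date AS date
    FROM users u
    UNION ALL
    SELECT 'photo', u.username, p.picurl, p.date
    FROM photos p JOIN users u ON u.id = p.author
    ORDER BY date DESC
    LIMIT 10
""").fetchall()

wall = []
for kind, who, detail, date in rows:
    if kind == "user":
        wall.append(f"{who} from {detail} has just registered")
    else:
        wall.append(f"New picture by {who} (click to view {detail})")
print(wall)
```

Turning the date into "5 mins ago" is left to the presentation code.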
I've put together this little example for you to look at - you might find it helpful.
Full script can be found here : http://pastie.org/1279954
So it starts with 3 simple tables countries, users and user_photos.
Tables
Note: I've only included the minimum number of columns for this demo to work!
drop table if exists countries;
create table countries
(
country_id tinyint unsigned not null auto_increment primary key,
iso_code varchar(3) unique not null,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
country_id tinyint unsigned not null,
username varbinary(32) unique not null
-- all other detail omitted
)
engine=innodb;
drop table if exists user_photos;
create table user_photos
(
photo_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
-- all other detail omitted
key (user_id)
)
engine=innodb;
The important thing to note is that the primary keys of users and photos are unsigned integers and auto_increment (1,2,3..n) so I can find the latest 10 users and 10 photos by ordering by their primary keys (PK) descending and add a limit clause to restrict the number of rows returned.
-- change limit to increase rows returned
select * from users order by user_id desc limit 2;
select * from user_photos order by photo_id desc limit 2;
Test Data
insert into countries (iso_code, name) values ('GB','Great Britain'),('US','United States'),('DE','Germany');
insert into users (username, country_id) values ('f00',1),('bar',2),('stack',1),('overflow',3);
insert into user_photos (user_id) values (1),(1),(2),(3),(1),(4),(2),(1),(4),(2),(1);
So now we need a convenient way (a single call) of selecting the latest 10 users and photos. The two tables are completely different, so a union isn't going to be the best approach. Instead, we'll write a stored procedure that returns two result sets and handle generating the wall (merging the result sets) in our PHP script.
Stored procedure
Just a wrapper around some SQL code - think of it like SQL's version of a function call
drop procedure if exists list_latest_users_and_photos;
delimiter #
create procedure list_latest_users_and_photos()
begin
-- last 10 users
select
'U' as type_id, -- integer might be better
u.user_id,
u.country_id,
u.username,
-- other user columns...
c.name as country_name
from
users u
inner join countries c on u.country_id = c.country_id
order by
u.user_id desc limit 10;
-- last 10 photos
select
'P' as type_id,
up.photo_id,
up.user_id,
-- other photo columns...
u.username
-- other user columns...
from
user_photos up
inner join users u on up.user_id = u.user_id
order by
up.photo_id desc limit 10;
end #
delimiter ;
Testing
To test our stored procedure all we need to do is call it and look at the results.
mysql> call list_latest_users_and_photos();
+---------+---------+------------+----------+---------------+
| type_id | user_id | country_id | username | country_name |
+---------+---------+------------+----------+---------------+
| U | 4 | 3 | overflow | Germany |
| U | 3 | 1 | stack | Great Britain |
| U | 2 | 2 | bar | United States |
| U | 1 | 1 | f00 | Great Britain |
+---------+---------+------------+----------+---------------+
4 rows in set (0.00 sec)
+---------+----------+---------+----------+
| type_id | photo_id | user_id | username |
+---------+----------+---------+----------+
| P | 11 | 1 | f00 |
| P | 10 | 2 | bar |
| P | 9 | 4 | overflow |
| P | 8 | 1 | f00 |
| P | 7 | 2 | bar |
| P | 6 | 4 | overflow |
| P | 5 | 1 | f00 |
| P | 4 | 3 | stack |
| P | 3 | 2 | bar |
| P | 2 | 1 | f00 |
+---------+----------+---------+----------+
10 rows in set (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Now we know that works we can call it from php and generate the wall.
PHP Script
<?php
$conn = new Mysqli("localhost", "foo_dbo", "pass", "foo_db");
$result = $conn->query("call list_latest_users_and_photos()");
$users = array();
while($row = $result->fetch_assoc()) $users[] = $row;
$conn->next_result();
$result = $conn->use_result();
$photos = array();
while($row = $result->fetch_assoc()) $photos[] = $row;
$result->close();
$conn->close();
$wall = array_merge($users, $photos);
echo "<pre>", print_r($wall, true), "</pre>";
?>
Hope you find some of this helpful :)