MySQL - How to normalize column containing delimiter-separated IDs - php

I'm trying to normalize a table which a previous developer designed to have a column containing pipe-separated IDs which link to other rows in the same table.
Customers Table
id | aliases (VARCHAR)
----------------------------
1 | |4|58|76
2 |
3 |
4 | |1|58|76
... |
58 | |1|4|76
... |
76 | |1|4|58
So customers 1, 4, 58 and 76 are all "aliases" of each other. Customers 2 and 3 have no aliases, so the field contains an empty string.
I want to do away with the entire "alias" system, and normalise the data so I can map those other customers all to the one record. So I want related table data for customer 1, 4, 58, and 76 all to be mapped just to customer 1.
I figured I would populate a new table which later I can then join and perform updates on other tables.
Join Table
id | customer_id | alias_id
-------------------------------
1 | 1 | 4
2 | 1 | 58
3 | 1 | 76
How can I get the data from that first table, into the above format? If this is going to be an absolute nightmare in pure SQL, I will just write a PHP script which attempts to do this work and insert the data.

When I started to answer this question, I thought it would be quick and easy because I'd done something very similar once in SQL Server, but proving out the concept in translation burgeoned into this full solution.
One caveat that wasn't clear from your question is whether you have a condition for declaring the primary id vs the alias id. For instance, this solution will allow 1 to have an alias of 4 as well as 4 to have an alias of 1, which is consistent with the provided data in your simplified example question.
To set up the data for this example, I used this structure:
CREATE TABLE notnormal_customers (
id INT NOT NULL PRIMARY KEY,
aliases VARCHAR(10)
);
INSERT INTO notnormal_customers (id,aliases)
VALUES
(1,'|4|58|76'),
(2,''),
(3,''),
(4,'|1|58|76'),
(58,'|1|4|76'),
(76,'|1|4|58');
First, in order to represent the one-to-many relationship for one-customer to many-aliases, I created this table:
CREATE TABLE customer_aliases (
primary_id INT NOT NULL,
alias_id INT NOT NULL,
FOREIGN KEY (primary_id) REFERENCES notnormal_customers(id),
FOREIGN KEY (alias_id) REFERENCES notnormal_customers(id),
/* composite primary key prevents duplicate pairs */
PRIMARY KEY (primary_id,alias_id)
);
Most importantly, we'll use a custom SPLIT_STR function:
CREATE FUNCTION SPLIT_STR(
x VARCHAR(255),
delim VARCHAR(12),
pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
delim, '');
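For readers who want to check what SPLIT_STR is expected to return before wiring it into a procedure, here is a rough Python equivalent. It is only a sketch of the behaviour (1-based position, empty string when the position is out of range); the name `split_str` is a hypothetical stand-in, not part of MySQL.

```python
def split_str(x, delim, pos):
    """Return the pos-th (1-based) field of x, or '' when pos is out of range."""
    parts = x.split(delim)
    if 1 <= pos <= len(parts):
        return parts[pos - 1]
    return ''

alias_str = '|4|58|76'[1:]  # drop the leading pipe, as the procedure below does
print([split_str(alias_str, '|', i) for i in (1, 2, 3)])  # ['4', '58', '76']
```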
Then we'll create a stored procedure to do all the work. Code is annotated with comments to source references.
DELIMITER $$
CREATE PROCEDURE normalize_customers()
BEGIN
DECLARE cust_id INT DEFAULT 0;
DECLARE al_id INT UNSIGNED DEFAULT 0;
DECLARE alias_str VARCHAR(10) DEFAULT '';
/* set the value of the string delimiter */
DECLARE string_delim CHAR(1) DEFAULT '|';
DECLARE count_aliases INT DEFAULT 0;
DECLARE i INT DEFAULT 1;
/*
use cursor to iterate through all customer records
http://burnignorance.com/mysql-tips/how-to-loop-through-a-result-set-in-mysql-strored-procedure/
*/
DECLARE done INT DEFAULT 0;
DECLARE cur CURSOR FOR
SELECT `id`, `aliases`
FROM `notnormal_customers`;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
read_loop: LOOP
/*
Fetch one record from CURSOR and set to customer id and alias string.
If not found then `done` will be set to 1 by continue handler.
*/
FETCH cur INTO cust_id, alias_str;
IF done THEN
/* If done set to 1 then exit the loop, else continue. */
LEAVE read_loop;
END IF;
/* skip to next record if no aliases */
IF alias_str = '' THEN
ITERATE read_loop;
END IF;
/*
get number of aliases
https://pisceansheart.wordpress.com/2008/04/15/count-occurrence-of-character-in-a-string-using-mysql/
*/
SET count_aliases = LENGTH(alias_str) - LENGTH(REPLACE(alias_str, string_delim, ''));
/* strip off the first pipe to make it compatible with our SPLIT_STR function */
SET alias_str = SUBSTR(alias_str, 2);
/*
iterate and get each alias from custom split string function
https://stackoverflow.com/questions/18304857/split-delimited-string-value-into-rows
*/
WHILE i <= count_aliases DO
/* get the next alias id */
SET al_id = CAST(SPLIT_STR(alias_str, string_delim, i) AS UNSIGNED);
/* REPLACE existing values instead of insert to prevent errors on primary key */
REPLACE INTO customer_aliases (primary_id,alias_id) VALUES (cust_id,al_id);
SET i = i+1;
END WHILE;
SET i = 1;
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
Finally you can simply run it by calling:
CALL normalize_customers();
Then you can check the data in console:
mysql> select * from customer_aliases;
+------------+----------+
| primary_id | alias_id |
+------------+----------+
| 4 | 1 |
| 58 | 1 |
| 76 | 1 |
| 1 | 4 |
| 58 | 4 |
| 76 | 4 |
| 1 | 58 |
| 4 | 58 |
| 76 | 58 |
| 1 | 76 |
| 4 | 76 |
| 58 | 76 |
+------------+----------+
12 rows in set (0.00 sec)
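The output above keeps every pair in both directions. If only the lowest id of each group should act as the primary record (as in the join table the question asks for), the same strings can also be reduced outside of SQL. A minimal Python sketch, assuming the rows have already been fetched as (id, aliases) tuples; the function name `primary_pairs` is hypothetical:

```python
def primary_pairs(rows):
    """Return (primary_id, alias_id) pairs, using the lowest id of each group as primary."""
    pairs = set()
    for cust_id, alias_str in rows:
        # '|4|58|76' -> [4, 58, 76]; an empty string yields no aliases
        alias_ids = [int(part) for part in alias_str.split('|') if part]
        if not alias_ids:
            continue
        group = alias_ids + [cust_id]
        primary = min(group)
        for member in group:
            if member != primary:
                pairs.add((primary, member))
    return sorted(pairs)

rows = [(1, '|4|58|76'), (2, ''), (3, ''),
        (4, '|1|58|76'), (58, '|1|4|76'), (76, '|1|4|58')]
print(primary_pairs(rows))  # [(1, 4), (1, 58), (1, 76)]
```

Using a set means the four rows of one alias group collapse into the same three pairs, so no separate de-duplication step is needed.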

Update 2 (One-Query-Solution)
Assuming that the aliases list is always sorted, you can achieve the result with only one query:
CREATE TABLE aliases (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
customer_id INT UNSIGNED NOT NULL,
alias_id INT UNSIGNED NOT NULL
) AS
SELECT NULL AS id, c1.id AS customer_id, c2.id AS alias_id
FROM customers c1
JOIN customers c2
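ON c2.aliases LIKE CONCAT('|', c1.id , '|%') -- c1.id is the first alias of c2.id
WHERE c1.id < (SUBSTRING(c1.aliases,2)+0); -- c1.id is smaller than its own first alias, so c1 is the lowest id in its group
It will also be much faster if the aliases column is indexed, because the LIKE pattern has a fixed prefix and the JOIN can be supported by a range search. Note that the pattern expects a pipe after the first alias, so a customer with exactly one alias (e.g. '|1') would not match.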
Original answer
If you replace the pipes with commas, you can use the FIND_IN_SET function.
I would first create a temporary table (it does not need to be technically temporary) to store comma-separated alias lists:
CREATE TABLE tmp (`id` int, `aliases` varchar(50));
INSERT INTO tmp(`id`, `aliases`)
SELECT id, REPLACE(aliases, '|', ',') AS aliases
FROM customers;
Then populate your normalized table using FIND_IN_SET in the JOIN's ON clause:
CREATE TABLE aliases (`customer_id` int, `alias_id` int) AS
SELECT t.id as customer_id, c.id AS alias_id
FROM tmp t
JOIN customers c ON find_in_set(c.id, t.aliases);
If needed - delete duplicates with higher customer_id (only keep lowest):
DELETE FROM aliases
WHERE customer_id IN (SELECT * FROM(
SELECT DISTINCT a1.customer_id
FROM aliases a1
JOIN aliases a2
ON a2.customer_id = a1.alias_id
AND a1.customer_id = a2.alias_id
AND a1.customer_id > a1.alias_id
)derived);
If needed - create AUTO_INCREMENT id:
ALTER TABLE aliases ADD column id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
The aliases table will now look like this:
| id | customer_id | alias_id |
|----|-------------|----------|
| 1 | 1 | 4 |
| 2 | 1 | 58 |
| 3 | 1 | 76 |
Don't forget to define proper indexes.
Update 1
You can skip creating a temporary table and populate the aliases table using LIKE instead of FIND_IN_SET:
CREATE TABLE aliases (`customer_id` int, `alias_id` int) AS
SELECT c2.id as customer_id, c1.id AS alias_id
FROM customers c1
JOIN customers c2
ON CONCAT(c1.aliases, '|') LIKE CONCAT('%|', c2.id , '|%');

Using a table of integers (0-9), although you can achieve the same thing with (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3...etc.)...
SELECT DISTINCT id old_id /* the technique below inevitably creates duplicates. */
/* DISTINCT discards them. */
, SUBSTRING_INDEX(
SUBSTRING_INDEX(SUBSTR(aliases,2),'|',i+1) /* isolate text between */
,'|',-1) x /* each pipe and the next */
FROM customers
, ints /* do this for the first 10 pipes in each string */
ORDER
BY id,x+0 /* implicit CASTING */
+--------+------+
| old_id | x |
+--------+------+
| 1 | 4 |
| 1 | 58 |
| 1 | 76 |
| 2 | NULL |
| 3 | NULL |
| 4 | 1 |
| 4 | 58 |
| 4 | 76 |
| 58 | 1 |
| 58 | 4 |
| 58 | 76 |
| 76 | 1 |
| 76 | 4 |
| 76 | 58 |
+--------+------+
(Edit: In line comments added)

Related

Optimize SQL query to fetch file names

I've two tables, the first table contains information on the ideas submitted by user and the second table contains information on the file attachments that are part of the idea. An idea submitted by the user can have 0 or any number of attachments.
Table 1:
-------------------------------------
Id Title Content Originator
-------------------------------------
1 aaa bbb John
2 ccc ddd Peter
--------------------------------------
Table 2:
---------------------------------------------
Id Idea_id Attachment_name
---------------------------------------------
1 1 file1.doc
2 1 file2.doc
3 1 file3.doc
4 2 user2.doc
---------------------------------------------
Table 1 primary key is Id and table 2 primary key is Id as well. Idea_id is the foreign key in table 2 mapping to table 1 Id.
I'm trying to display all the ideas, along with their attachments, in an HTML page. So what I've been doing is: get all the ideas from Table 1 and then, for each idea record, retrieve the attachment records from Table 2. This seems to be extremely inefficient. Could this be optimized so that I can retrieve idea records and their corresponding attachment records in one query?
I tried a left outer join (Table 1 left outer join Table 2), but that gives me three records for Id = 1 in Table 1. I'm looking for a SQL query that combines the idea details and attachment names in one row to make HTML page processing efficient. Otherwise, what would be the best solution for this?
If you want to get all attachments along with all ideas, you may use GROUP_CONCAT, such as:
SELECT *, (SELECT GROUP_CONCAT(attachment_name separator ', ') FROM TABLE2 WHERE idea_id = TABLE1.id) attachments FROM TABLE1
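To see the one-row-per-idea shape without a MySQL server, the same idea can be tried against an in-memory SQLite database from Python. This is only a sketch under assumptions: the table names `ideas` and `attachments` stand in for Table 1 and Table 2, a third idea without attachments is added for illustration, and SQLite spells the aggregate `group_concat(x, ', ')` where MySQL uses `GROUP_CONCAT(x SEPARATOR ', ')`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE ideas (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE attachments (id INTEGER PRIMARY KEY, idea_id INTEGER, attachment_name TEXT);
    INSERT INTO ideas VALUES (1, 'aaa'), (2, 'ccc'), (3, 'eee');
    INSERT INTO attachments VALUES
        (1, 1, 'file1.doc'), (2, 1, 'file2.doc'), (3, 1, 'file3.doc'), (4, 2, 'user2.doc');
""")

# LEFT JOIN keeps ideas that have no attachments; GROUP BY folds the rest into one row each.
rows = conn.execute("""
    SELECT i.id, i.title, group_concat(a.attachment_name, ', ') AS attachments
    FROM ideas i
    LEFT JOIN attachments a ON a.idea_id = i.id
    GROUP BY i.id, i.title
    ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

The idea with no attachments comes back with a NULL (Python `None`) in the attachments column, so the page code should handle that case.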
I probably missed the point but a left join should bring back all the records
create table `ideas` (
`id` int(10) unsigned not null auto_increment,
`title` varchar(50) not null,
`content` varchar(50) not null,
`originator` varchar(50) not null,
primary key (`id`)
)
engine=innodb
auto_increment=3;
create table `attachments` (
`id` int(10) unsigned not null auto_increment,
`idea_id` int(10) unsigned not null default '0',
`attachment` varchar(50) not null default '0',
primary key (`id`),
index `idea_id` (`idea_id`),
constraint `fk_ideas` foreign key (`idea_id`) references `ideas` (`id`) on update cascade on delete cascade
)
engine=innodb
auto_increment=5;
mysql> select * from ideas;
+----+----------------+-----------+-----------------+
| id | title | content | originator |
+----+----------------+-----------+-----------------+
| 1 | Flux capacitor | Rubbish | Doc |
| 2 | Star Drive | Plutonium | Professor Frink |
+----+----------------+-----------+-----------------+
mysql> select * from attachments;
+----+---------+------------------------------+
| id | idea_id | attachment |
+----+---------+------------------------------+
| 1 | 1 | Flux capacitor schematic.jpg |
| 2 | 1 | Sensors.docx |
| 3 | 1 | fuel.docx |
| 4 | 2 | plans.jpg |
+----+---------+------------------------------+
mysql> select * from ideas i
-> left outer join attachments a on a.idea_id=i.id;
+----+----------------+-----------+-----------------+------+---------+------------------------------+
| id | title | content | originator | id | idea_id | attachment |
+----+----------------+-----------+-----------------+------+---------+------------------------------+
| 1 | Flux capacitor | Rubbish | Doc | 1 | 1 | Flux capacitor schematic.jpg |
| 1 | Flux capacitor | Rubbish | Doc | 2 | 1 | Sensors.docx |
| 1 | Flux capacitor | Rubbish | Doc | 3 | 1 | fuel.docx |
| 2 | Star Drive | Plutonium | Professor Frink | 4 | 2 | plans.jpg |
+----+----------------+-----------+-----------------+------+---------+------------------------------+

create mysql row with not really unique keys based on some other rows

Database example:
| country | animal | size | x_id* |
|---------+--------+--------+-------|
| 777 | 1001 | small | 1 |
| 777 | 2002 | medium | 2 |
| 777 | 7007 | medium | 3 |
| 777 | 7007 | large | 4 |
| 42 | 1001 | small | 1 |
| 42 | 2002 | medium | 2 |
| 42 | 7007 | large | 4 |
I need to generate the x_id continuously based on the entries in (animal, size), and if an x_id for that combination already exists, use it again.
Currently I use the following PHP script for this, but on a large db table it is very slow.
query("UPDATE myTable SET x_id = -1");
$i = $j;
$c = array();
$q = query("
SELECT animal, size
FROM myTable
WHERE x_id = -1
GROUP BY animal, size");
while($r = fetch_array($q)) {
$hk = $r['animal'] . '-' . $r['size'];
if( !isset( $c[$hk] ) ) $c[$hk] = $i++;
query("
UPDATE myTable
SET x_id = {$c[$hk]}
WHERE animal = '".$r['animal']."'
AND size = '".$r['size']."'
AND x_id = -1");
}
Is there a way to convert the PHP script to one or two mysql commands?
edit:
CREATE TABLE `myTable` (
`country` int(10) unsigned NOT NULL DEFAULT '1', -- country
`animal` int(3) NOT NULL,
`size` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`lang_id` tinyint(4) NOT NULL DEFAULT '1',
`x_id` int(10) NOT NULL,
KEY `country` (`country`),
KEY `x_id` (`x_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
UPDATE myTable m
JOIN (
SELECT animal, size, @newid := @newid + 1 AS x_id
FROM myTable a
CROSS JOIN (SELECT @newid := 0) b
WHERE x_id = -1
GROUP BY animal, size
) t ON m.animal = t.animal AND m.size = t.size
SET m.x_id = t.x_id
;
http://sqlfiddle.com/#!9/5525ba/1
The group by in the subquery is not needed. It generates useless overhead. If it's fast enough, leave it like this, otherwise we can use distinct+another subquery instead.
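The rule being implemented is "one id per distinct (animal, size), reused across countries". As a plain illustration of that rule, independent of MySQL user variables, here is a small Python sketch; `assign_x_ids` is a hypothetical helper and the rows are the sample data from the question.

```python
def assign_x_ids(rows, start=1):
    """Give every distinct (animal, size) pair one id, in order of first appearance."""
    ids = {}
    result = []
    for country, animal, size in rows:
        key = (animal, size)
        if key not in ids:
            ids[key] = start + len(ids)  # next unused id
        result.append((country, animal, size, ids[key]))
    return result

rows = [
    (777, 1001, 'small'), (777, 2002, 'medium'), (777, 7007, 'medium'), (777, 7007, 'large'),
    (42, 1001, 'small'), (42, 2002, 'medium'), (42, 7007, 'large'),
]
print([x_id for *_, x_id in assign_x_ids(rows)])  # [1, 2, 3, 4, 1, 2, 4]
```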
User variables are awkward but should do the trick; tested on my machine.
CREATE TABLE t
( animal VARCHAR(20),
size VARCHAR(20),
x_id INT);
INSERT INTO t(animal,size) VALUES('crocodile','small'),
('elephant','medium'),
('giraffe','medium'),
('giraffe','large'),
('crocodile','small'),
('elephant','medium'),
('giraffe','large');
UPDATE t RIGHT JOIN
(SELECT animal,size,
MIN(CASE WHEN @var:=CONCAT(animal,size) THEN @id ELSE @id:=@id+1 END)id
FROM t,
(SELECT @var:=CONCAT(animal,size) FROM t)x ,
(SELECT @id:=0)y
GROUP BY animal,size)q
ON t.animal=q.animal AND t.size=q.size
SET x_id=q.id
Results
"animal" "size" "x_id"
"crocodile" "small" "1"
"elephant" "medium" "2"
"giraffe" "medium" "3"
"giraffe" "large" "4"
"crocodile" "small" "1"
"elephant" "medium" "2"
"giraffe" "large" "4"
You want these indexes added for (a lot) faster access
ALTER TABLE `yourtable` ADD INDEX `as_idx` (`animal`,`size`);
ALTER TABLE `yourtable` ADD INDEX `id_idx` (`x_id`);
This is conceptual. Worm it into your world if useful.
Schema
create table AnimalSize
( id int auto_increment primary key,
animal varchar(100) not null,
size varchar(100) not null,
unique key(animal,size) -- this is critical, no dupes
);
create table CountryAnimalSize
( id int auto_increment primary key,
country varchar(100) not null,
animal varchar(100) not null,
size varchar(100) not null,
xid int not null -- USE THE id achieved thru use of AnimalSize table
);
Some queries
-- truncate table animalsize; -- clobber and reset auto_increment back to 1
insert ignore AnimalSize(animal,size) values ('snake','small'); -- id=1
select last_insert_id(); -- 1
insert ignore AnimalSize(animal,size) values ('snake','small'); -- no real insert but creates id GAP (ie blows slot 2)
select last_insert_id(); -- 1
insert ignore AnimalSize(animal,size) values ('snake','small'); -- no real insert but creates id GAP (ie blows slot 3)
select last_insert_id(); -- 1
insert ignore AnimalSize(animal,size) values ('frog','medium'); -- id=4
select last_insert_id(); -- 4
insert ignore AnimalSize(animal,size) values ('snake','small'); -- no real insert but creates id GAP (ie blows slot 5)
select last_insert_id(); -- 4
Note: insert ignore says do it, and ignore the fact that it may die. In our case, it would fail due to unique key (which is fine). In general, do not use insert ignore unless you know what you are doing.
It is often thought of in connection with an insert on duplicate key update (IODKU) call. Or should I say thought about, as in, How can I solve this current predicament. But, that (IODKU) would be a stretch in this case. Yet, keep both in your toolchest for solutions.
After insert ignore fires off, you know, one way or the other, that the row is there.
Forgetting the INNODB GAP aspect, what the above suggests is that if the row already exists prior to insert ignore, you cannot rely on last_insert_id() for the id.
So after firing off insert ignore, go and fetch the id that you know has to be there. Use that in subsequent calls against CountryAnimalSize
continue along this line of reasoning for your CountryAnimalSize table inserts where the row may or may not already be there.
There is no reason to formalize the solution here because, as you say, these aren't even your tables anyway in the Question.
Also, back to INNODB GAP. Google that. Figure out whether or not you can live with gaps created.
Most people have bigger fish to fry than keeping id's tight and gapless.
Other people (read: OCD) are so consumed by the perceived gap problem that they blow days on it.
So, these are general comments meant to help a broader audience, than to answer your question, which, as you say, isn't even your schema.
You can use x_id as this:
CONCAT(`animal`, '_', `size`) AS `x_id`
And then compare it with x_id, so that you will get something like:
+---------+-----------+--------+------------------+
| country | animal | size | x_id* |
+---------+-----------+--------+------------------+
| africa | crocodile | small | crocodile_small |
| africa | elephant | medium | elephant_medium |
| africa | giraffe | medium | giraffe_medium |
| africa | giraffe | large | giraffe_large |
| europe | crocodile | small | crocodile_small |
| europe | elephant | medium | elephant_medium |
| europe | giraffe | large | giraffe_large |
+---------+-----------+--------+------------------+
As I see, you are already using MyISAM engine type, You can just define both country and x_id field as PRIMARY KEY (jointly) and you can set the AUTO_INCREMENT for x_id field. Now MySQL will do the rest for you! BINGO!
Here is the SQL Fiddle for you!
CREATE TABLE `myTable` (
`country` int(10) unsigned NOT NULL DEFAULT '1', -- country
`animal` int(4) NOT NULL,
`size` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`lang_id` tinyint(4) NOT NULL DEFAULT '1',
`x_id` int(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (country,x_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `myTable` (`country`, `animal`, `size`) VALUES
(777, 1001, 'small'),
(777, 2002, 'medium'),
(777, 7007, 'medium'),
(777, 7007, 'large'),
(42, 1001, 'small'),
(42, 2002, 'medium'),
(42, 7007, 'large')
The result will be like this (MyISAM numbers x_id separately per country, so the last row gets 3 rather than the 4 the question asks for):
| country | animal | size |lang_id | x_id |
|---------+--------+--------+--------+-------|
| 777 | 1001 | small | 1 | 1 |
| 777 | 2002 | medium | 1 | 2 |
| 777 | 7007 | medium | 1 | 3 |
| 777 | 7007 | large | 1 | 4 |
| 42 | 1001 | small | 1 | 1 |
| 42 | 2002 | medium | 1 | 2 |
| 42 | 7007 | large | 1 | 3 |
NOTE: This will only work for MyISAM and BDB tables, for other engine types you will get error saying "Incorrect table definition; there can be only one auto column and it must be defined as a key!" See this answer for more on this : https://stackoverflow.com/a/5416667/5645769.

MYSQL: UPDATE rows based on SELECT from another table

I have a 'users' table with 100 entries, each having an empty 'first_name' column. I wish to update each of these with names from another table. They do not need to correspond, they can be random, I just need data from one table into the other. I have found other people asking similar questions, but they all seem to have corresponding columns, like "username" being the same in either table and can get it working using a JOIN ON. As there are no corresponding columns I cannot do this.
I currently have tried the following which does not work:
UPDATE users
SET first_name =
(
SELECT `user_firstname`
FROM past_users
WHERE `active` = '1' LIMIT 100
)
This gives the error:
Subquery returns more than 1 row
The only way it works is using LIMIT 1, which updates each entry with the same data. I want them each to be unique.
Ok, maybe this concept. The below is just an illustration. Uses random, and limit 1.
Schema
create table user
( userId int auto_increment primary key,
firstName varchar(50) not null
-- etc
);
create table prevUser
( userId int auto_increment primary key,
firstName varchar(50) not null,
active int not null
);
-- truncate table user;
-- truncate table prevuser;
insert user(firstName) values (''),(''),(''),(''),(''),(''),(''),(''),('');
insert prevUser(firstName,active) values
('user1prev',0),('snickers bar',1),('Stanley',1),('user4prev',0),('zinc',1),
('pluto',1),('us7545rev',0),('uffallfev',0),('user4prev',0),('tuna',1),
('Monty Python',1),('us4 tprev',0),('mouse',1),('user4prev',0),('Sir Robin',1),
('lizard',1),('Knights that says, Nee!',0),('mayo',1),('656user4prev',0),('kiwi',1);
Query (similar to yours)
UPDATE user
SET firstName =
(
SELECT firstName
FROM prevUser
WHERE `active` = '1'
order by rand()
limit 1
)
Results
select * from user;
+--------+--------------+
| userId | firstName |
+--------+--------------+
| 1 | snickers bar |
| 2 | tuna |
| 3 | mouse |
| 4 | Sir Robin |
| 5 | mouse |
| 6 | mayo |
| 7 | lizard |
| 8 | snickers bar |
| 9 | pluto |
+--------+--------------+
You need something like this:
UPDATE users
JOIN past_users ON past_users.user_id = users.id AND past_users.`active` = '1'
SET users.first_name = past_users.user_firstname
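The join above only helps when the two tables share an id. Since the question says there is no corresponding column and each user should get a different name, another option is to pair rows by position after drawing names without replacement. The following Python sketch shows just that pairing; the function name, the ids and the names are hypothetical, and in SQL the same idea would need a row number on each side.

```python
import random

def assign_unique_names(user_ids, candidate_names, seed=None):
    """Map every user id to a distinct name drawn at random from candidate_names."""
    if len(candidate_names) < len(user_ids):
        raise ValueError("not enough names for every user")
    rng = random.Random(seed)
    chosen = rng.sample(candidate_names, len(user_ids))  # sampling without replacement
    return dict(zip(user_ids, chosen))

names = ['Stanley', 'zinc', 'pluto', 'tuna', 'mouse', 'lizard']
mapping = assign_unique_names([1, 2, 3, 4], names, seed=0)
for user_id, first_name in mapping.items():
    print(user_id, first_name)
```

Unlike ORDER BY RAND() LIMIT 1 evaluated per row, sampling without replacement cannot hand the same name to two users.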

MySQL Update if column value is greater than previous or insert if one doesnt exist

I take in a userId, leaderboardId, and Score and need to insert if that user doesn't have a score for that leaderboard or update if it does have a score and the new score is larger.
My question is what is the SQL statement needed to accomplish the above.
I've looked into insert and on duplicate but that only seems to work for unique keys where in this example there can be multiple of the same userIds as long as they are in different leaderboards and vice versa.
Thanks
Solved!
Edit:
thanks everyone here is what I did to make it work!
UNIQUE KEY `newKey` (userId, leaderboardId)
insert into score (UserId, LeaderboardId, Score) values(1,5,100)
ON DUPLICATE KEY UPDATE score = greatest(Score, values(Score))
Try this, if you got null result (= no rows) from this:
select score from table
where userid = 42
and leaderboardId = 2001
and score is not null;
then there is no score yet and you can insert your data.
Otherwise you have to check whether your new score is greater than the stored value; if so, you can update.
Otherwise you have nothing to do.
this is a cut and paste from one of my other answers, I will tweak it for your category thing, but bear with it until then:
Schema:
CREATE TABLE leaderBoard
( id int AUTO_INCREMENT primary key,
userID int not null,
leaderBoardID int not null,
score int not null,
UNIQUE KEY `combo_thingie1` (userID,leaderBoardID) -- unique composite
) ENGINE=InnoDB auto_increment=150;
Tests:
insert leaderBoard (userID,leaderBoardID,score) values (113,1,0)
on duplicate key update score=greatest(0,score);
insert leaderBoard (userID,leaderBoardID,score) values (113,2,0)
on duplicate key update score=greatest(0,score);
select * from leaderBoard;
+----+--------+---------------+-------+
| id | userID | leaderBoardID | score |
+----+--------+---------------+-------+
| 1 | 113 | 1 | 0 |
| 2 | 113 | 2 | 0 |
+----+--------+---------------+-------+
insert leaderBoard (userID,leaderBoardID,score) values (113,2,555)
on duplicate key update score=greatest(555,score);
select * from leaderBoard;
+----+--------+---------------+-------+
| id | userID | leaderBoardID | score |
+----+--------+---------------+-------+
| 1 | 113 | 1 | 0 |
| 2 | 113 | 2 | 555 |
+----+--------+---------------+-------+
insert leaderBoard (userID,leaderBoardID,score) values (113,2,444)
on duplicate key update score=greatest(444,score); -- ignores lower score
select * from leaderBoard;
+----+--------+---------------+-------+
| id | userID | leaderBoardID | score |
+----+--------+---------------+-------+
| 1 | 113 | 1 | 0 |
| 2 | 113 | 2 | 555 |
+----+--------+---------------+-------+
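The behaviour the tests above demonstrate (insert when the pair is new, otherwise keep the larger score) can be summarised in a few lines of Python. This is only a model of the rule, with a dict keyed by (userID, leaderBoardID) standing in for the unique composite key; `submit_score` is a hypothetical name.

```python
def submit_score(board, user_id, leaderboard_id, score):
    """Insert the score if the pair is new, otherwise keep the larger of old and new."""
    key = (user_id, leaderboard_id)  # plays the role of the unique composite key
    board[key] = max(score, board.get(key, score))
    return board[key]

board = {}
submit_score(board, 113, 1, 0)
submit_score(board, 113, 2, 0)
submit_score(board, 113, 2, 555)
submit_score(board, 113, 2, 444)  # lower score is ignored
print(board)  # {(113, 1): 0, (113, 2): 555}
```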

MYSQL query with 3 INNER JOINs

I'm trying to create a book-catalogue. I have 3 basic tables - books, authors, books_authors;
books
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| book_id | int(11) | NO | PRI | NULL | auto_increment |
| book_title | varchar(250) | NO | | NULL | |
+------------+--------------+------+-----+---------+----------------+
authors
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| author_id | int(11) | NO | PRI | NULL | auto_increment |
| author_name | varchar(250) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
books_authors
+-----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| book_id | int(11) | NO | MUL | NULL | |
| author_id | int(11) | NO | MUL | NULL | |
+-----------+---------+------+-----+---------+-------+
I have a query that takes the book name and all authors for each book and displays the result:
$booksAndAuthors = mysqli_query($connection, 'SELECT * FROM books LEFT JOIN books_authors ON books.book_id=books_authors.book_id LEFT JOIN authors ON authors.author_id=books_authors.author_id');
It returns:
Book Name -> Author 1, Author 2
Book Name 2 -> Author 3, Author 2
And so on.
And I have another query that it's:
$booksAndAuthors = mysqli_query($connection, 'SELECT * FROM books_authors as ba
INNER JOIN books as b ON ba.book_id=b.book_id
INNER JOIN books_authors as booaut ON booaut.book_id=ba.book_id
INNER JOIN authors as a ON booaut.author_id=a.author_id
WHERE ba.author_id=' . $author_id);
When I click on an author (authors are links), the query does the opposite and returns all books by that author. Both queries work.
My Question is:
Could someone explain to me why I'm comparing a table with itself? Explain it as you would to a dummy like me. I want to understand what this query does, in words or some other way.
*If my question isn't properly asked! Edit me!
*Regards!
A book can have more than one author. The point of the self-join is to find the other authors for the book.
FROM books_authors as ba
...
INNER JOIN books_authors as booaut ON booaut.book_id=ba.book_id
...
WHERE ba.author_id=42
The join picks up any author who co-authored a book with author 42.
Another way to write the query:
FROM books_authors as ba
...
WHERE EXISTS
(
SELECT *
FROM books_authors ba2
WHERE ba2.book_id = ba.book_id
and ba2.author_id = 42
)
This says, select all rows where a matching books_authors entry exists for author 42.
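To make the self-join concrete, here is a small Python sketch that walks the same two passes over books_authors: the first pass (`ba`) finds the books of the chosen author, the second pass (`booaut`) collects every author row for those books. The sample pairs and the function name are hypothetical.

```python
def self_join_rows(books_authors, author_id):
    """Mimic: books_authors ba JOIN books_authors booaut ON same book_id WHERE ba.author_id = ?"""
    return [
        (ba_book, booaut_author)
        for ba_book, ba_author in books_authors        # first copy of the table: ba
        if ba_author == author_id
        for booaut_book, booaut_author in books_authors  # second copy of the table: booaut
        if booaut_book == ba_book
    ]

books_authors = [(1, 42), (1, 7), (2, 42), (3, 7), (3, 9)]  # (book_id, author_id)
print(self_join_rows(books_authors, 42))  # [(1, 42), (1, 7), (2, 42)]
```

Book 1 appears twice because it has two authors, which is how the query can list co-authors next to each book of author 42.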
It seems a self join ( joining the table with itself ) is unnecessary here since you are picking the same rows.
Usually self joins are performed to join two different rows in a table. For example, suppose you have a table with monthly account balances:
account_id |as_of_date | balance_amount
-----------|---------------------------
12213 |2014-01-01 | 10000
12213 |2014-02-01 | 20000
12213 |2014-03-01 | 25000
Let's say the table name is monthly_account_balances
Now you want to compute the difference between monthly balances
For instance, between February and January the difference is 20000 - 10000 = 10000
And between March and February the difference is 25000 - 20000 = 5000
And the output you need is
account_id |as_of_date | balance_amount|difference
-----------|-----------|---------------|-------------
12213 |2014-01-01 | 10000 | null
12213 |2014-02-01 | 20000 | 10000
12213 |2014-03-01 | 25000 | 5000
Here you do a self join as follows:
select b.*, b.balance_amount - a.balance_amount as difference
from monthly_account_balances b
left join monthly_account_balances a on a.account_id = b.account_id
and a.as_of_date + interval 1 month = b.as_of_date
Notice the date condition. It's comparing two different records with same id but different as_of_date. Self join is useful in such situations.
However, in your case you are just joining on id, and I see no point in doing that unless I am missing something.
