MySQL Query problem (duplicate results)

I'm having a problem finding duplicate entries in a MySQL database (a cocktail recipe website). Here is the setup:
Table 1: 'cocktail'
[cid,c_name] (cid = unique cocktail id, c_name = cocktail name)
Table 2: 'ingredients':
[iid,i_name] (iid = unique ingredient id, i_name = ingredient name)
Table 3: 'cocktail_ingredients' (the linking table)
[ciid,cid,iid] (ciid = unique row identifier, cid = cocktail cid, iid = ingredient iid)
So one cocktail can have multiple rows in the 'cocktail_ingredients' table (1 to many).
The setup is fine. The problem I'm having now is finding out whether there are duplicate cocktails in my database.
For instance if the cocktail_ingredients table had these entries:
cid | iid
1 | 56
1 | 78
1 | 101
.
.
.
9 | 56
9 | 78
9 | 101
The cocktail is the same (for theoretical purposes here anyway).
If the 'cocktail_ingredients' table had one more row ...
9 | 103
Then it wouldn't be the same, as cocktail number 9 includes an extra ingredient.
So the MySQL query has to do two checks: first, that the ingredient count is the same, and second, that every ingredient id (iid) is the same for the corresponding cocktails (cid).
I'm stumped on this one; any help is much appreciated. I'm thinking I might have to head down the PHP route as well to code something more complex, but I'm struggling there too, so I thought this would be a good place to stop and ask.
Thanks a ton
Nick

You may recall from a distant math class that two sets A and B are equal exactly when each is a (non-strict) subset of the other. So create a view or procedure that checks whether everything in A is also in B, then check that the two cocktails are subsets of one another. This is far from a complete answer, but it may be enough to get you going ;)
It will probably be easier to check the negation: look for an ingredient in A that is not in B. If none exists, then A must be a (non-strict) subset of B.
Alternatively, count the ingredients in A, the ingredients in B, and the ingredients common to A and B; if all three counts are equal, the cocktails are equivalent.
-- Number of ingredients per cocktail (assumes each cid/iid pair appears only once)
CREATE VIEW ingredient_count AS
SELECT cid, COUNT(*) AS ingredients
FROM cocktail_ingredients
GROUP BY cid;
-- Number of ingredients shared by each pair of distinct cocktails
CREATE VIEW shared_ingredients AS
SELECT c1.cid AS cid1, c2.cid AS cid2, COUNT(*) AS ingredients
FROM cocktail_ingredients AS c1 INNER JOIN cocktail_ingredients AS c2
ON (c1.cid != c2.cid AND c1.iid = c2.iid)
GROUP BY c1.cid, c2.cid;
-- Pairs whose two ingredient counts and shared count all agree
CREATE VIEW duplicates AS
SELECT si.cid1, si.cid2
FROM (ingredient_count AS ic1 INNER JOIN shared_ingredients AS si
ON ic1.cid = si.cid1) INNER JOIN ingredient_count AS ic2
ON ic2.cid = si.cid2
WHERE ic1.ingredients = ic2.ingredients
AND si.ingredients = ic1.ingredients;
Note that in MySQL this may be much faster written as subselects with sensible WHERE clauses rather than views, but the views are easier to understand.
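The "negation" idea above (no ingredient of A is missing from B, and vice versa) can also be written as a single query with nested NOT EXISTS. The sketch below is only an illustration: it runs against an in-memory SQLite database from Python as a stand-in for MySQL, and cocktail 12 is made-up sample data that is not part of the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cocktail_ingredients (cid INTEGER, iid INTEGER)")
conn.executemany(
    "INSERT INTO cocktail_ingredients (cid, iid) VALUES (?, ?)",
    [(1, 56), (1, 78), (1, 101),
     (9, 56), (9, 78), (9, 101),
     (12, 56), (12, 78), (12, 101), (12, 103)],  # cid 12 is hypothetical sample data
)

# A pair (a, b) is a duplicate when no ingredient of a is missing from b
# and no ingredient of b is missing from a.
DUPLICATE_PAIRS_SQL = """
SELECT a.cid, b.cid
FROM (SELECT DISTINCT cid FROM cocktail_ingredients) AS a
JOIN (SELECT DISTINCT cid FROM cocktail_ingredients) AS b
  ON a.cid < b.cid
WHERE NOT EXISTS (
        SELECT 1 FROM cocktail_ingredients AS x
        WHERE x.cid = a.cid
          AND NOT EXISTS (SELECT 1 FROM cocktail_ingredients AS y
                          WHERE y.cid = b.cid AND y.iid = x.iid))
  AND NOT EXISTS (
        SELECT 1 FROM cocktail_ingredients AS x
        WHERE x.cid = b.cid
          AND NOT EXISTS (SELECT 1 FROM cocktail_ingredients AS y
                          WHERE y.cid = a.cid AND y.iid = x.iid))
ORDER BY a.cid, b.cid
"""

duplicate_pairs = conn.execute(DUPLICATE_PAIRS_SQL).fetchall()
print(duplicate_pairs)
```

Joining on a.cid < b.cid reports each duplicate pair once rather than in both orders. Unlike the counting approach, this form does not depend on each cid/iid pair being stored only once.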

You can impose such a check using a TRIGGER.
However, there is a conceptual problem.
Say you have two cocktails, {1 | 56, 78, 101} and {9 | 56, 78, 101, 103}, and assume that you have implemented the check.
Now you insert the data for cocktail 1:
cid | iid
----------
1 | 56
Then add the remaining two ingredients:
cid | iid
----------
1 | 56
1 | 78
1 | 101
Fine, now you start adding cocktail 9:
cid | iid
----------
1 | 56
1 | 78
1 | 101
9 | 56
You have three more ingredients, so continue adding them:
cid | iid
----------
1 | 56
1 | 78
1 | 101
9 | 56
9 | 78
Two more remain (101, 103).
But alas, you cannot add 101! If you try, cocktail 9 becomes identical to cocktail 1, and your trigger will reject the insert.
When one cocktail is a subset of another, you would have to add the subset cocktail later. I hope that makes the problem clear.
You should not put this restriction in the database. What I would do in the web application is:
In the cocktail entry/update interface, take the user's input without yet inserting or updating anything in the DB.
When the user clicks the save button (I would add one), check whether the new or updated cocktail is a copy of another. (I might write a stored procedure, but a plain SELECT query is enough.)
If the new or updated cocktail is not a duplicate of another, insert or update the database. If
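A minimal sketch of the "check on save" flow described in the steps above, written in Python against an in-memory SQLite database as a stand-in for the PHP/MySQL application. The function names find_duplicate_of and save_cocktail are hypothetical, and the check assumes the whole ingredient list is known at save time.

```python
import sqlite3

def find_duplicate_of(conn, ingredient_ids, ignore_cid=None):
    """Return the cid of an existing cocktail with exactly these ingredients, or None."""
    wanted = sorted(set(ingredient_ids))
    if not wanted:
        return None
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT cid
        FROM cocktail_ingredients
        WHERE cid IS NOT ?
        GROUP BY cid
        HAVING COUNT(DISTINCT iid) = ?
           AND COUNT(DISTINCT CASE WHEN iid IN ({placeholders}) THEN iid END) = ?
        ORDER BY cid
        LIMIT 1
    """
    params = [ignore_cid, len(wanted), *wanted, len(wanted)]
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None

def save_cocktail(conn, cid, ingredient_ids):
    """Write the ingredient list only if it does not duplicate another cocktail."""
    clash = find_duplicate_of(conn, ingredient_ids, ignore_cid=cid)
    if clash is not None:
        return clash  # nothing is written; the caller reports the clash
    with conn:  # one transaction: replace the cocktail's rows as a unit
        conn.execute("DELETE FROM cocktail_ingredients WHERE cid = ?", (cid,))
        conn.executemany(
            "INSERT INTO cocktail_ingredients (cid, iid) VALUES (?, ?)",
            [(cid, iid) for iid in sorted(set(ingredient_ids))],
        )
    return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cocktail_ingredients (cid INTEGER, iid INTEGER)")

first_save = save_cocktail(conn, 1, [56, 78, 101])       # accepted
clash = save_cocktail(conn, 9, [56, 78, 101])            # rejected: same set as cocktail 1
second_try = save_cocktail(conn, 9, [56, 78, 101, 103])  # accepted: one extra ingredient
print(first_save, clash, second_try)
```

Because the complete set is compared in one step, the partially entered cocktail 9 never passes through the state {56, 78, 101} that a row-level trigger would reject.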

Related

Adding values into a new mySQL db table column depending on an old column

ID | PID | NAME | VALUE |
-------------------------------------
60 | 1 | Test1 | 9999 |
21 | 2 | Test2 | 9999 |
44 | 1 | Test3 | 9999 |
37 | 4 | Test4 | 9999 |
24 | 1 | Test5 | 9999 |
Hey all!
I am kind of new to PHP and DBs, so I really don't know how to start with this.
I want to add an ordering inside a DB table whose IDs are not consecutive.
(That means the first ID starts at 34, the next one is something like 43, the next is 55, etc.)
I have a table which looks like the one above.
What I would like to do is change the values in the VALUE column depending on the values in PID.
This means that if PID equals 1, the VALUE on the same row should become 1001, then 1002 for the next such row, then 1003.
If PID = 2 then VALUE should be changed to 2001, then 2002, then 2003, etc.
This would be for an already existing table, but I would also like to set VALUE every time I add a new row to that table.
So a simple check in pseudocode:
If PID equals 1
then check VALUE column for the highest number that starts with "1"
make it +1 and add it into the column of that row
Is that possible to do?
What would you suggest I do instead (to make things easier)?
If you need further info, please tell me and I will try to explain things better; I don't know whether my explanation makes clear what I'm trying to do.
Thank you in advance.
Cheers,
K.

You can use UPDATE .. JOIN and join to a derived table containing the "rank" of each ID, and update accordingly:
UPDATE YourTable t
JOIN (SELECT s.ID, s.PID, COUNT(*) AS cnt
FROM YourTable s
JOIN YourTable s2
ON (s.pid = s2.pid AND s.id >= s2.id)
GROUP BY s.ID, s.PID) p -- one row per ID; cnt is its rank within the PID
ON (t.id = p.id)
SET t.value = (1000 * t.pid) + p.cnt;
The inner query "ranks" the rows with a self join on the condition s.pid = s2.pid AND s.id >= s2.id. In words: every row with the same PID and an ID less than or equal to mine, including me. The row with the lowest ID in a PID therefore joins to one row, the second to two, and so on. Then you just set the VALUE column to pid * 1000 plus the rank.
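To see the ranking logic on the question's sample rows, here is a small Python sketch using an in-memory SQLite database as a stand-in for MySQL. SQLite has no UPDATE .. JOIN, so the same rank is computed with a correlated subquery instead; next_value is a hypothetical helper for the "new row" case and does not address concurrent inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, PID INTEGER, NAME TEXT, VALUE INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable (ID, PID, NAME, VALUE) VALUES (?, ?, ?, ?)",
    [(60, 1, "Test1", 9999), (21, 2, "Test2", 9999), (44, 1, "Test3", 9999),
     (37, 4, "Test4", 9999), (24, 1, "Test5", 9999)],
)

# Rank = number of rows with the same PID and an ID <= this row's ID.
conn.execute("""
    UPDATE YourTable
    SET VALUE = 1000 * PID + (
        SELECT COUNT(*)
        FROM YourTable AS s2
        WHERE s2.PID = YourTable.PID AND s2.ID <= YourTable.ID)
""")

def next_value(conn, pid):
    """Next VALUE for a new row with this PID: highest existing VALUE plus one."""
    row = conn.execute(
        "SELECT COALESCE(MAX(VALUE), ? * 1000) + 1 FROM YourTable WHERE PID = ?",
        (pid, pid),
    ).fetchone()
    return row[0]

updated = conn.execute("SELECT ID, VALUE FROM YourTable ORDER BY ID").fetchall()
print(updated)
print(next_value(conn, 1), next_value(conn, 3))
```

Within PID 1 the IDs 24, 44 and 60 receive 1001, 1002 and 1003 in ID order, which is the numbering the question asks for.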

Select Query not work with where in

I have two tables.
The first table is alldata (info_id is a text field whose data is inserted using PHP):
=================
id | info_id
=================
1 | 2, 3, 5, 9
2 |
=================
The second table is info:
=================
id | name
=================
1 | one
2 | two
3 | three
4 | four
5 | five
6 | six
7 | seven
8 | eight
9 | nine
=================
Now I want to select the rows from the second table whose id matches one of the values in info_id of the first row of the first table.
My query is:
SELECT i.* FROM `info` as i,`alldata` as a where i.id IN(a.info_id) and a.id=1
My query works but selects only one row from the second table, although there are multiple matches.

You have a very poor database design. First, storing numeric ids as strings is a bad idea -- numbers should be stored as numbers. Second, SQL offers this great data structure for storing lists. It is called a table, not a string.
You should really have a junction table, with one row per id and info_id.
That said, sometimes we are stuck with a substandard data structure. MySQL offers support for this. You can use:
SELECT i.*
FROM `info` i JOIN
`alldata` a
ON FIND_IN_SET(i.id, REPLACE(a.info_id, ', ', ',') ) > 0
WHERE a.id = 1;
You should also learn to use proper, explicit join syntax. If you use this method, instead of fixing the database design, you are not allowed to complain about performance. MySQL cannot take advantage of things like indexes to improve the performance of this type of query.
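For comparison, here is a rough sketch of the junction-table design recommended above, using Python with an in-memory SQLite database in place of MySQL. The table name alldata_info and its columns are hypothetical, and the sample data assumes the row for "eight" has id 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alldata (id INTEGER PRIMARY KEY, info_id TEXT);
    CREATE TABLE info (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical junction table: one row per (alldata row, info row) pair.
    CREATE TABLE alldata_info (
        alldata_id INTEGER NOT NULL,
        info_id INTEGER NOT NULL,
        PRIMARY KEY (alldata_id, info_id)
    );
""")
conn.executemany("INSERT INTO alldata VALUES (?, ?)", [(1, "2, 3, 5, 9"), (2, "")])
conn.executemany(
    "INSERT INTO info VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three"), (4, "four"), (5, "five"),
     (6, "six"), (7, "seven"), (8, "eight"), (9, "nine")],
)

# One-off migration: split each comma-separated string into junction rows.
for alldata_id, csv in conn.execute("SELECT id, info_id FROM alldata").fetchall():
    ids = [int(part) for part in (csv or "").split(",") if part.strip()]
    conn.executemany(
        "INSERT INTO alldata_info (alldata_id, info_id) VALUES (?, ?)",
        [(alldata_id, info_id) for info_id in ids],
    )

# With the junction table in place, the lookup is an ordinary indexed join.
names = [row[0] for row in conn.execute("""
    SELECT i.name
    FROM info AS i
    JOIN alldata_info AS ai ON ai.info_id = i.id
    WHERE ai.alldata_id = ?
    ORDER BY i.id
""", (1,))]
print(names)
```

After the migration the text column info_id is no longer needed for lookups, and the join can use the primary keys on both sides.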

MySql db structure to store a list of items in sequence

I need to store and retrieve items of a course plan in sequence. I also need to be able to add or remove items at any point.
The data looks like this:
-- chapter 1
--- section 1
----- lesson a
----- lesson b
----- drill b
...
I need to be able to identify the sequence so that when the student completes lesson a, I know that he needs to move to lesson b. I also need to be able to insert items in the sequence, like say drill a, and of course now the student goes from lesson a to drill a instead of going to lesson b.
I understand relational databases are not intended for sequences. Originally, I thought about using a simple autoincrement column and use that to handle the sequence, but the insert requirement makes it unworkable.
I have seen this question and the first answer is interesting:
items table
item_id | item
1 | section 1
2 | lesson a
3 | lesson b
4 | drill a
sequence table
item_id | sequence
1 | 1
2 | 2
3 | 4
4 | 3
That way, I would keep adding items in the items table with whatever id and work out the sequence in the sequence table. The only problem with that system is that I need to change the sequence numbers for all items in the sequence table after an insertion. For instance, if I want to insert quiz a before drill a I need to update the sequence numbers.
Not a huge deal, but the solution seems a little overcomplicated. Is there an easier, smarter way to handle this?

Just relate records to the parent and use a sequence flag. You will still need to update all the records when you insert in the middle but I can't really think of a simple way around that without leaving yourself space to begin with.
items table:
id | name | parent_id | sequence
--------------------------------------
1 | chapter 1 | null | 1
2 | section 1 | 1 | 2
3 | lesson a | 2 | 3
4 | lesson b | 2 | 5
5 | drill a | 2 | 4
When you need to insert a record in the middle a query like this will work:
UPDATE items SET sequence=sequence+1 WHERE sequence > 3;
insert into items (name, parent_id, sequence) values('quiz a', 2, 4);
To select the data in order your query will look like:
select * from items order by sequence;
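The shift-then-insert step can be wrapped in one transaction so that a reader never sees a half-renumbered list. The following Python sketch uses an in-memory SQLite database as a stand-in for MySQL; insert_after and next_item are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER, sequence INTEGER)"
)
conn.executemany(
    "INSERT INTO items (id, name, parent_id, sequence) VALUES (?, ?, ?, ?)",
    [(1, "chapter 1", None, 1), (2, "section 1", 1, 2), (3, "lesson a", 2, 3),
     (4, "lesson b", 2, 5), (5, "drill a", 2, 4)],
)

def insert_after(conn, after_sequence, name, parent_id):
    """Open a gap after the given position and place the new item in it."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE items SET sequence = sequence + 1 WHERE sequence > ?",
            (after_sequence,),
        )
        conn.execute(
            "INSERT INTO items (name, parent_id, sequence) VALUES (?, ?, ?)",
            (name, parent_id, after_sequence + 1),
        )

def next_item(conn, current_sequence):
    """Name of the item that follows the given position, or None at the end."""
    row = conn.execute(
        "SELECT name FROM items WHERE sequence > ? ORDER BY sequence LIMIT 1",
        (current_sequence,),
    ).fetchone()
    return row[0] if row else None

insert_after(conn, 3, "quiz a", 2)  # place "quiz a" right after "lesson a"
ordered = [row[0] for row in conn.execute("SELECT name FROM items ORDER BY sequence")]
print(ordered)
print(next_item(conn, 3))
```

A student who has finished lesson a (sequence 3) is now sent to quiz a, and drill a and lesson b follow it in order.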

Safely auto increment MySQL field based on MAX() subquery upon insert

I have a table which contains a standard auto-incrementing ID, a type identifier, a number, and some other irrelevant fields. When I insert a new object into this table, the number should auto-increment based on the type identifier.
Here is an example of how the output should look:
id type_id number
1 1 1
2 1 2
3 2 1
4 1 3
5 3 1
6 3 2
7 1 4
8 2 2
As you can see, every time I insert a new object, the number increments according to the type_id (i.e. if I insert an object with type_id of 1 and there are 5 objects matching this type_id already, the number on the new object should be 6).
I'm trying to find a performant way of doing this with huge concurrency. For example, there might be 300 inserts within the same second for the same type_id and they need to be handled sequentially.
Methods I've tried already:
PHP
This was a bad idea but I've added it for completeness. A request was made to get the MAX() number for the item type and then add the number + 1 as part of an insert. This is quick but doesn't work concurrently as there could be 200 inserts between the request for MAX() and that particular insert leading to multiple objects with the same number and type_id.
Locking
Manually locking and unlocking the table before and after each insert in order to maintain the increment. This caused performance issues due to the number of concurrent inserts and because the table is constantly read from throughout the app.
Transaction with Subquery
This is how I'm currently doing it but it still causes massive performance issues:
START TRANSACTION;
INSERT INTO objects (type_id,number) VALUES ($type_id, (SELECT COALESCE(MAX(number),0)+1 FROM objects WHERE type_id = $type_id FOR UPDATE));
COMMIT;
Another negative thing about this approach is that I need to do a follow up query in order to get the number that was added (i.e. searching for an object with the $type_id ordered by number desc so I can see the number that was created - this is done based on a $user_id so it works but adds an extra query which I'd like to avoid)
Triggers
I looked into using a trigger in order to dynamically add the number upon insert but this wasn't performant as I need to perform a query on the table I'm inserting into (which isn't allowed so has to be within a subquery causing performance issues).
Grouped Auto-Increment
I've had a look at grouped auto-increment (so that the number would auto-increment based on type_id) but then I lose my auto-increment ID.
Does anybody have any ideas on how I can make this performant at the level of concurrent inserts that I need? My table is currently InnoDB on MySQL 5.5
Appreciate any help!
Update: Just in case it is relevant, the objects table has several million objects in it. Some of the type_id can have around 500,000 objects assigned to them.

Use transaction and select ... for update. This will solve concurrency conflicts.
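One way to apply this is to keep the last number per type_id in a small counter table and lock only that row inside the transaction. This is a swapped-in variant, not something stated in the answer: the table type_counters and the function insert_object are hypothetical. The sketch runs on SQLite from Python, where BEGIN IMMEDIATE locks the whole database; on InnoDB, a SELECT ... FOR UPDATE or the UPDATE on the counter row would be expected to lock just that row until COMMIT.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE objects (
        id INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL,
        number INTEGER NOT NULL,
        UNIQUE (type_id, number)
    );
    -- Hypothetical counter table: one row per type_id.
    CREATE TABLE type_counters (
        type_id INTEGER PRIMARY KEY,
        last_number INTEGER NOT NULL
    );
""")

def insert_object(conn, type_id):
    """Insert one object and return the per-type number it was given."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the counter
    try:
        conn.execute(
            "INSERT OR IGNORE INTO type_counters (type_id, last_number) VALUES (?, 0)",
            (type_id,),
        )
        conn.execute(
            "UPDATE type_counters SET last_number = last_number + 1 WHERE type_id = ?",
            (type_id,),
        )
        number = conn.execute(
            "SELECT last_number FROM type_counters WHERE type_id = ?", (type_id,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO objects (type_id, number) VALUES (?, ?)", (type_id, number)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return number

numbers = [insert_object(conn, t) for t in (1, 1, 2, 1, 3, 3, 1, 2)]
print(numbers)
```

The function returns the assigned number directly, so no follow-up query is needed, and the UNIQUE constraint on (type_id, number) acts as a safety net against duplicates.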

For the "Transaction with Subquery" approach, try adding an index on the type_id column.
I think an index on type_id will speed up your subquery.

DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,type_id INT NOT NULL
);
INSERT INTO my_table VALUES
(1,1),(2,1),(3,2),(4,1),(5,3),(6,3),(7,1),(8,2);
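-- Rank each row within its type_id: count the rows of the same type with an id <= its own
SELECT x.*
, COUNT(*) rank
FROM my_table x
JOIN my_table y
ON y.type_id = x.type_id
AND y.id <= x.id
GROUP BY x.id, x.type_id
ORDER BY x.type_id, rank;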
+----+---------+------+
| id | type_id | rank |
+----+---------+------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 4 | 1 | 3 |
| 7 | 1 | 4 |
| 3 | 2 | 1 |
| 8 | 2 | 2 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
+----+---------+------+
or, if performance is an issue, just do the same thing with a couple of user-defined @variables.
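The idea behind the variable-based version is a single pass over the rows in id order with one running counter per type_id, instead of a self join. A rough Python equivalent of that pass, using the sample rows above (rank_within_type is a hypothetical name):

```python
from collections import defaultdict

# (id, type_id) pairs, the same sample data as in my_table above.
rows = [(1, 1), (2, 1), (3, 2), (4, 1), (5, 3), (6, 3), (7, 1), (8, 2)]

def rank_within_type(rows):
    """Walk the rows in id order, keeping one running counter per type_id."""
    counters = defaultdict(int)
    ranked = []
    for row_id, type_id in sorted(rows):
        counters[type_id] += 1
        ranked.append((row_id, type_id, counters[type_id]))
    return ranked

ranked = rank_within_type(rows)
print(ranked)
```

Each row is visited once, whereas the self join compares every row with all earlier rows of the same type.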

One idea is to create a (temporary) table for all rows with a common type_id.
In that table you can use auto-increment for your number column.
Then your number should be fully reliable.
Then you can select that data and update your first table.

MySQL: SELECT a Winner, returning their rank

Earlier I asked this question, which basically asked how to list 10 winners in a table with many winners, according to their points.
This was answered.
Now I'm looking to search for a given winner X in the table, and find out what position he is in, when the table is ordered by points.
For example, if this is the table:
Winners:
NAME:____|__POINTS:
Winner1 | 1241
Winner2 | 1199
Sally | 1000
Winner4 | 900
Winner5 | 889
Winner6 | 700
Winner7 | 667
Jacob | 623
Winner9 | 622
Winner10 | 605
Winner11 | 600
Winner12 | 586
Thomas | 455
Pamela | 434
Winner15 | 411
Winner16 | 410
These are possible inputs and outputs for what I want to do:
Query: "Sally", "Winner12", "Pamela", "Jacob"
Output: 3 12 14 8
How can I do this? Is it possible, using only a MySQL statement? Or do I need PHP as well?
This is the kind of thing I want:
WHEREIS FROM Winners WHERE Name='Sally' LIMIT 1
Ideas?
Edit - NOTE: You do not have to deal with the situation where two Winners have the same Points (assume for simplicity's sake that this does not happen).

I think this will get you the desired result. Note that it properly handles cases where the targeted winner is tied on points with another winner (both get the same position).
SELECT COUNT(*) + 1 AS Position
FROM Winners
WHERE Points > (SELECT Points FROM Winners WHERE Name = 'Sally');
Edit:
I'd like to "plug" Ignacio Vazquez-Abrams' answer which, in several ways, is better than the above.
For example, it allows listing all (or several) winners and their current position.
Another advantage is that it allows a more complicated condition for deciding that a given player is ahead of another (see below). Reading incrediman's comment that there will not be ties prompted me to look into this. The query can be modified slightly as follows to handle players with the same number of points: such players would formerly have been given the same Position value, whereas now the position also depends on their relative Start values.
SELECT w1.name, (
SELECT COUNT(*)
FROM winners AS w2
WHERE (w2.points > w1.points)
OR (w2.points = w1.points AND w2.Start < w1.Start) -- Extra condition to break ties.
)+1 AS rank
FROM winners AS w1
-- WHERE w1.name = 'Sally' -- optional WHERE clause

SELECT w1.name, (
SELECT COUNT(*)
FROM winners AS w2
WHERE w2.points > w1.points
)+1 AS rank
FROM winners AS w1
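As a quick check of the counting approach against the question's table, the sketch below loads the sample data into an in-memory SQLite database from Python (a stand-in for MySQL) and looks up single positions. position_of is a hypothetical helper; it returns None for an unknown name instead of a misleading position of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (name TEXT PRIMARY KEY, points INTEGER)")
conn.executemany(
    "INSERT INTO winners (name, points) VALUES (?, ?)",
    [("Winner1", 1241), ("Winner2", 1199), ("Sally", 1000), ("Winner4", 900),
     ("Winner5", 889), ("Winner6", 700), ("Winner7", 667), ("Jacob", 623),
     ("Winner9", 622), ("Winner10", 605), ("Winner11", 600), ("Winner12", 586),
     ("Thomas", 455), ("Pamela", 434), ("Winner15", 411), ("Winner16", 410)],
)

def position_of(conn, name):
    """Position = 1 + number of winners with strictly more points."""
    found = conn.execute(
        "SELECT points FROM winners WHERE name = ?", (name,)
    ).fetchone()
    if found is None:
        return None  # unknown name
    return conn.execute(
        "SELECT COUNT(*) + 1 FROM winners WHERE points > ?", (found[0],)
    ).fetchone()[0]

positions = [position_of(conn, n) for n in ("Sally", "Winner12", "Pamela", "Jacob")]
print(positions)
```

So the lookup needs only SQL; the surrounding Python (or PHP) just passes in the name.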
