I have a dataset of "collections" (call them groups or wishlists).
A collection is a list of items:
collectionId | itemId
---------------------------------
123 | 2345
123 | 3465
123 | 876
123 | 567
123 | 980
777 | 980
777 | 332
777 | 3465
777 | 876
777 | 678
777 | 567
As you can see, items 876 and 980 are included in both collections (777 and 123), so they are a popular couple/pair.
My users generate these collections, and I'm curious to extract two insights:
Which are the most common items (this is easy).
Which are the most common pairs/couples (or groups of more than 2) of items (this is my question).
E.g.
say many wishlists contain iPhones and pink iPhone covers
among other accessories, but I want to extract the fact that iPhone +
that pink iPhone cover is a common recurring "couple".
All in all, I'm basically trying to do what Amazon does: if you look at an iPhone, I want to suggest a pink iPhone cover, because many other users have favorited both.
Do I have to compare the similarity between collections first, to see how many items they have in common, and then rate the similarity with an index?
What is the best approach to this with MySQL?
Do I need PHP as well?
UPDATE:
In PHP I would probably do something loopy, like this pseudocode:
for each collection:
select all items from collection 1
select all items from collection 2
do array_intersect(c1, c2)
store the matching items
repeat...
select all items from collection 3
do array_intersect(c1, c3)
store the matching items
repeat...
...then select all items from collection 2 and repeat all the iterations.
For two collections you can do a self-join:
select a.itemId
from my_table a
join my_table b on a.itemId = b.itemId
where a.collectionId = 123
and b.collectionId = 777
For all collections you can try a Cartesian product of the table with itself (two copies for pairs of collections, three copies for triples, and so on):
select a.itemId
from my_table a
cross join my_table b
where a.itemId = b.itemId
and a.collectionId <> b.collectionId
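The queries above find items shared between collections; the question also asks for the most common pairs. A minimal sketch of that counting step, done in application code rather than SQL, could look like this. The rows are the ones from the question; the function name `count_pairs` is made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Rows copied from the question's example table: (collectionId, itemId).
rows = [
    (123, 2345), (123, 3465), (123, 876), (123, 567), (123, 980),
    (777, 980), (777, 332), (777, 3465), (777, 876), (777, 678), (777, 567),
]

def count_pairs(rows):
    """Count how many collections contain each unordered pair of items."""
    collections = {}
    for collection_id, item_id in rows:
        collections.setdefault(collection_id, set()).add(item_id)
    pair_counts = Counter()
    for items in collections.values():
        # sorted() gives each pair a canonical order, so (876, 980) and
        # (980, 876) are counted as the same pair.
        pair_counts.update(combinations(sorted(items), 2))
    return pair_counts

pair_counts = count_pairs(rows)
# Keep only pairs that occur in more than one collection, most common first.
popular = [(pair, n) for pair, n in pair_counts.most_common() if n > 1]
```

The number of pairs grows quadratically with the size of each collection, so for large wishlists it is probably worth capping the collection size or counting only pairs that involve already-popular items.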
I have a problem trying to apply rules about direct matches in a football (soccer) app. I have read this thread and it was very helpful for creating the standings table ordered by points, goal difference and goals scored.
But I would like to know if it is possible to order the teams' positions by direct matches.
Look at this standings table:
Pos Team Pld W D L F A GD Pts
1 FC Barcelona 5 2 3 0 8 5 3 9
2 **Inter Milan** 6 2 2 2 11 10 1 8
3 *Real Madrid* 6 2 2 2 8 8 0 8
4 AC Milan 5 0 3 2 8 12 -4 3
As you can see, Inter Milan and Real Madrid are tied on points, and Inter is ahead of Real Madrid because of its goal difference. The result that I want to get is this:
Pos Team Pld W D L F A GD Pts
1 FC Barcelona 5 2 3 0 8 5 3 9
2 **Real Madrid** 6 2 2 2 8 8 0 8
3 *Inter Milan* 6 2 2 2 11 10 1 8
4 AC Milan 5 0 3 2 8 12 -4 3
Notice that this time Real Madrid is ahead of Inter Milan because it won the two direct matches between them.
I have one table for teams and another for the results.
I would like to achieve this using a MySQL query if possible. Or maybe it would be better to do this ordering on the server side (PHP).
Thanks, any help would be appreciated.
It is impossible to efficiently do what you request in a single query that would return the results you ask for and sort the ties in points with that criteria.
The reasoning is simple: let's assume that you could get a column in your query that would provide or help with the kind of sorting you want. That is to say, it would order teams that are tied on points according to which one has more victories over the others (a tie can easily involve more than 2 teams). To make that calculation by hand you would need a double-entry table that shows the number of matches won between those teams, as follows:
| TeamA | TeamB | TeamC
------------------------------
TeamA | 0 | XAB | XAC
TeamB | XBA | 0 | XBC
TeamC | XCA | XCB | 0
So you would just add up each row, and sorting those totals in descending order would give you the needed data.
The problem is that you don't know which teams are tied before you actually get the data. So creating that column for the general case would mean you need to create the whole table of every team against every team (which is no small task); and then you need to add the logic to the query to only add up the columns of a team against those that are tied with it in points... for which you need the original result set (that you should be creating with the same query anyhow).
It may be possible to get that information in a single query, but it will surely be way too heavy on the DB. You're better off adding that logic in code after fetching the data you know you will need (getting the number of games won by TeamA against TeamB or TeamC is not too complicated). You would still need to be careful about how you build that query and how many queries you run; after all, during the first few games of a league you will have lots of teams tied with each other, so getting the data will effectively be the same as building the whole double-entry table I used as an example above, for all teams against all teams.
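As a rough sketch of doing the tie-break in code after fetching the data: group teams by points, and inside each tied group count wins only against the other tied teams. The standings are the ones from the question; the two match scores are invented for illustration (the question only says Real Madrid won both direct matches), and the function name is hypothetical.

```python
from itertools import groupby

def rank_with_head_to_head(standings, matches):
    """standings: dicts with 'team', 'pts', 'gd'.
    matches: tuples of (home, away, home_goals, away_goals)."""
    by_points = sorted(standings, key=lambda r: r["pts"], reverse=True)
    ranked = []
    for _, group in groupby(by_points, key=lambda r: r["pts"]):
        tied = list(group)
        names = {r["team"] for r in tied}
        # Wins counted only in matches between teams of this tied group.
        wins = {name: 0 for name in names}
        for home, away, hg, ag in matches:
            if home in names and away in names and hg != ag:
                wins[home if hg > ag else away] += 1
        # Head-to-head wins first, goal difference as the fallback.
        tied.sort(key=lambda r: (wins[r["team"]], r["gd"]), reverse=True)
        ranked.extend(tied)
    return ranked

standings = [
    {"team": "FC Barcelona", "pts": 9, "gd": 3},
    {"team": "Inter Milan", "pts": 8, "gd": 1},
    {"team": "Real Madrid", "pts": 8, "gd": 0},
    {"team": "AC Milan", "pts": 3, "gd": -4},
]
# Invented scores: Real Madrid wins both direct matches against Inter.
matches = [
    ("Real Madrid", "Inter Milan", 2, 1),
    ("Inter Milan", "Real Madrid", 0, 1),
]
ranked = rank_with_head_to_head(standings, matches)
```

Only the matches between tied teams are needed, so the per-group query stays small except early in the season, when most teams are level on points.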
Create a temporary table in a stored procedure and call the procedure...
create temporary table tab1 (position int not null auto_increment ,
team_name varchar(200),
points int,
goal_pt int,
primary key(position));
insert into tab1(team_name,
points,
goal_pt)
select team_name,
points,
goal_pt
from team
order by points desc,
goal_pt desc ;
I'm storing categories using a hierarchical model like so:
CATEGORIES
id | parent_id | name
---------------------
1 | 0 | Cars
2 | 0 | Planes
3 | 1 | Hatchbacks
4 | 1 | Convertibles
5 | 2 | Jets
6 | 3 | Peugeot
7 | 3 | BMW
8 | 6 | 206
9 | 6 | 306
I then store actual data with one of these category ids like so:
CARS
vehicle_id | category_id | name
-------------------------------
1 | 8 | Really fast silver Peugeot 206
2 | 9 | Really fast silver Peugeot 306
3 | 5 | Really fast Boeing 747
4 | 3 | Another Peugeot but only in Hatchbacks category
When searching for any of this data, I would like to find all child / grandchild / great grandchild etc. etc. nodes. So if someone wants to see all "Cars", they see everything with a parent_id of "Hatchbacks", and so everything with a parent_id of "Peugeot", and so on, to an arbitrary level.
So if I list a "really fast Peugeot 206" with a category_id of either 1, 3, 6, or 8, my query should be able to "travel up" the tree and find any higher categories which are parents/grandparents of that child category. E.g. a user searching for Peugeots in category "8" should find any Peugeots listed with categories 6, 3, or 1, all of which are category 8's ancestors.
E.g. using the above data, searching for "Peugeot" in category 3 should actually find vehicles 1, 2 and 4, because vehicles 1 and 2 have a category ancestor trail which leads back up to category 3. See?
Sorry if I haven't explained this well. It's difficult! Thank you, though.
Note: I have read the MySQL dev article on hierarchies.
Normalized models are great, but not when you actually have to query them.
Just store the "path" to your category in the category table, like this: path = /1/3/4. Then query your database with something like "select .... where path like '/1/3/%'". It will be much simpler and faster than multiple hierarchical queries...
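To make the path idea concrete, here is a small sketch using Python's built-in sqlite3 module so it can be run anywhere; the SQL should carry over to MySQL largely unchanged. The categories and vehicles are the ones from the question. The `path` column, its trailing slash and the helper name `vehicles_under` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER,
                         name TEXT, path TEXT);
CREATE TABLE cars (vehicle_id INTEGER PRIMARY KEY, category_id INTEGER,
                   name TEXT);
""")
# Each path lists the ancestors and ends with the category's own id.
conn.executemany("INSERT INTO categories VALUES (?, ?, ?, ?)", [
    (1, 0, "Cars", "/1/"), (2, 0, "Planes", "/2/"),
    (3, 1, "Hatchbacks", "/1/3/"), (4, 1, "Convertibles", "/1/4/"),
    (5, 2, "Jets", "/2/5/"), (6, 3, "Peugeot", "/1/3/6/"),
    (7, 3, "BMW", "/1/3/7/"), (8, 6, "206", "/1/3/6/8/"),
    (9, 6, "306", "/1/3/6/9/"),
])
conn.executemany("INSERT INTO cars VALUES (?, ?, ?)", [
    (1, 8, "Really fast silver Peugeot 206"),
    (2, 9, "Really fast silver Peugeot 306"),
    (3, 5, "Really fast Boeing 747"),
    (4, 3, "Another Peugeot but only in Hatchbacks category"),
])

def vehicles_under(category_id):
    """Ids of vehicles in the category or in any of its descendants."""
    (path,) = conn.execute(
        "SELECT path FROM categories WHERE id = ?", (category_id,)).fetchone()
    rows = conn.execute(
        """SELECT v.vehicle_id FROM cars v
           JOIN categories c ON c.id = v.category_id
           WHERE c.path LIKE ? ORDER BY v.vehicle_id""",
        (path + "%",))
    return [vehicle_id for (vehicle_id,) in rows]
```

Because the pattern is anchored at the start of the string (no leading %), an index on the path column can usually be used for the lookup.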
This article can help you http://www.phpro.org/tutorials/Managing-Hierarchical-Data-with-PHP-and-MySQL.html
I like the explanation provided by SitePoint. It gives you code and explains the theory behind it.
http://blogs.sitepoint.com/hierarchical-data-database/
Note: this method is better for reads than for writes. If you're constantly writing to the tree, I'd use a different algorithm. This method is optimized for reads (lookups).
You've represented your data as an Adjacency List model, whose querying in MySQL is best done using session variables. Now, this is not the only way you can represent a hierarchy in a relational database. For your particular problem, I would probably use a materialized path approach instead, where you do away with the actual categories table and instead have a column on your cars table that looks like Cars/Hatchbacks/Peugeot on a per-record basis and use LIKE queries. Unfortunately that would be slow as the number of records grew. Now, if you know the maximum depth of your hierarchy (e.g. four levels) you could break that out into separate columns instead, which would allow you to take advantage of indexing.
I'm having a problem finding duplicate results in a MySQL database (a cocktail recipe website). Here's the setup:
Table 1: 'cocktail'
[cid,c_name] (cid = unique cocktail id, c_name = cocktail name)
Table 2: 'ingredients':
[iid,i_name] (iid = unique ingredient id, i_name = ingredient name)
Table 3: 'cocktail_ingredients' (the linking table)
[ciid,cid,iid] (ciid = unique row identifier, cid = cocktail cid, iid = ingredient iid)
So one cocktail can have multiple rows in the 'cocktail_ingredients' table (1 to many).
Setup is fine. The problem I'm having now is finding if there are duplicate cocktails in my database.
For instance if the cocktail_ingredients table had these entries:
cid | iid
1 | 56
1 | 78
1 | 101
.
.
.
9 | 56
9 | 78
9 | 101
The cocktail is the same (for theoretical purposes here anyway).
If the 'cocktail_ingredients' table had one more row ...
9 | 103
Then it wouldn't be the same, as cocktail number 9 includes an extra ingredient.
So the mysql has to do 2 checks, firstly that the ingredient count is the same, and secondly that every ingredient id (iid) is the same for corresponding cocktails (cid).
I'm stumped on this one, any help much appreciated. I'm thinking I might have to head down the PHP route as well to code in something more complex, but I'm struggling there as well so thought this would be a good place to stop and ask.
Thanks a ton
Nick
You may recall from a distant math class that two sets A and B are equal when each is a (non-strict) subset of the other. So just create a view or procedure that checks whether everything that is in A is also in B, then check that the two cocktails are both subsets of one another. This is far from a complete answer, but it may be enough to get you going ;)
It will probably be easier to do the negation: look for an ingredient in A that is not in B. If none exists, then A must be a (non-strict) subset of B (assuming a cocktail can't be empty).
Alternatively, count the ingredients in A, the ingredients in B, and the ingredients that are in both A and B; if all three counts are equal, they are equivalent cocktails.
CREATE VIEW ingredient_count AS
SELECT cid, count(*) AS ingredients
FROM cocktail_ingredients
GROUP BY cid;
CREATE VIEW shared_ingredients AS
SELECT c1.cid AS cid1, c2.cid AS cid2, count(*) AS ingredients
FROM cocktail_ingredients AS c1 INNER JOIN cocktail_ingredients AS c2
ON (c1.cid != c2.cid AND c1.iid = c2.iid)
GROUP BY c1.cid, c2.cid;
CREATE VIEW duplicates AS
SELECT si.cid1, si.cid2
FROM ingredient_count AS ic1
INNER JOIN shared_ingredients AS si ON ic1.cid = si.cid1
INNER JOIN ingredient_count AS ic2 ON ic2.cid = si.cid2
WHERE ic1.ingredients = ic2.ingredients
AND si.ingredients = ic1.ingredients;
Note this may be much faster in mysql with subselects with sensible where clauses rather than views, but this is easier to understand
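For comparison, the same three-counts idea can be folded into a single query. The sketch below runs it through Python's sqlite3 module purely so that it is easy to try; it assumes each (cid, iid) pair appears only once in the linking table. Cocktails 1 and 9 are from the question, cocktail 4 is an invented extra, and `find_duplicates` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cocktail_ingredients (cid INTEGER, iid INTEGER)")
conn.executemany("INSERT INTO cocktail_ingredients VALUES (?, ?)", [
    (1, 56), (1, 78), (1, 101),
    (9, 56), (9, 78), (9, 101),
    (4, 56), (4, 78),  # invented cocktail that is only a subset of 1 and 9
])

# Two cocktails are duplicates when the number of shared ingredients equals
# the ingredient count of each of them. c1.cid < c2.cid reports each pair once.
DUPLICATES_SQL = """
SELECT c1.cid, c2.cid
FROM cocktail_ingredients AS c1
JOIN cocktail_ingredients AS c2
  ON c1.iid = c2.iid AND c1.cid < c2.cid
GROUP BY c1.cid, c2.cid
HAVING COUNT(*) = (SELECT COUNT(*) FROM cocktail_ingredients WHERE cid = c1.cid)
   AND COUNT(*) = (SELECT COUNT(*) FROM cocktail_ingredients WHERE cid = c2.cid)
ORDER BY c1.cid, c2.cid
"""

def find_duplicates(conn):
    """Return (cid, cid) pairs of cocktails with identical ingredient sets."""
    return conn.execute(DUPLICATES_SQL).fetchall()
```

With the data above this reports only the pair (1, 9); adding the extra row 9 | 103 from the question makes the result empty again.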
You can impose such a check using a TRIGGER.
But there is a conceptual problem.
Say, you have two cocktails {1 | 56, 78, 101} and {9 | 56, 78, 101, 103} and also assume that you have implemented the check.
Now, you are inserting data for 1:
cid | iid
----------
1 | 56
Then, add the remaining two ingredients...
cid | iid
----------
1 | 56
1 | 78
1 | 101
Fine, now you start adding 9:
cid | iid
----------
1 | 56
1 | 78
1 | 101
9 | 56
You have three more ingredients, so continue adding them:
cid | iid
----------
1 | 56
1 | 78
1 | 101
9 | 56
9 | 78
Two more remaining (101,103)
But alas! You cannot add 101! If you try to add 101, then 9 would become identical to 1, which your trigger will prevent.
When one cocktail is a subset of another, you would have to add the subset later. I hope this makes the problem clear.
You should not put any such restriction in the database. What I would do in my web application is:
In the cocktail entry/update interface, I would take the user's input (and not yet insert/update in the DB).
When the user clicks the save button (I would add a save button), check whether the new/updated cocktail becomes a copy of another (maybe I would write a stored procedure, but it can be done using a select query only).
If the new/updated cocktail is not duplicate of another, insert/update database. If
Earlier I asked this question, which basically asked how to list 10 winners in a table with many winners, according to their points.
This was answered.
Now I'm looking to search for a given winner X in the table, and find out what position he is in, when the table is ordered by points.
For example, if this is the table:
Winners:
NAME:____|__POINTS:
Winner1 | 1241
Winner2 | 1199
Sally | 1000
Winner4 | 900
Winner5 | 889
Winner6 | 700
Winner7 | 667
Jacob | 623
Winner9 | 622
Winner10 | 605
Winner11 | 600
Winner12 | 586
Thomas | 455
Pamela | 434
Winner15 | 411
Winner16 | 410
These are possible inputs and outputs for what I want to do:
Query: "Sally", "Winner12", "Pamela", "Jacob"
Output: 3 12 14 8
How can I do this? Is it possible, using only a MySQL statement? Or do I need PHP as well?
This is the kind of thing I want:
WHEREIS FROM Winners WHERE Name='Sally' LIMIT 1
Ideas?
Edit - NOTE: You do not have to deal with the situation where two Winners have the same Points (assume for simplicity's sake that this does not happen).
I think this will get you the desired result. Note that it properly handles cases where the targeted winner is tied on points with another winner (both get the same position).
SELECT COUNT(*) + 1 AS Position
FROM Winners
WHERE Points > (SELECT Points FROM Winners WHERE Name = 'Sally')
Edit:
I'd like to "plug" Ignacio Vazquez-Abrams' answer which, in several ways, is better than the above.
For example, it allows listing all (or several) winners and their current position.
Another advantage is that it allows expressing a more complicated condition to indicate that a given player is ahead of another (see below). Reading incrediman's comment to the effect that there will not be "ties" prompted me to look into this; the query can be slightly modified as follows to handle the situation where players have the same number of points (such players would formerly have been given the same Position value; now the position value is further tied to their relative Start values).
SELECT w1.name, (
SELECT COUNT(*)
FROM winners AS w2
WHERE (w2.points > w1.points)
OR (w2.points = w1.points AND w2.Start < w1.Start) -- Extra cond. to avoid ties.
)+1 AS position -- RANK is a reserved word in MySQL 8.0+, so it is avoided as an alias
FROM winners AS w1
-- WHERE w1.name = 'Sally' -- optional where clause
SELECT w1.name, (
SELECT COUNT(*)
FROM winners AS w2
WHERE w2.points > w1.points
)+1 AS position
FROM winners AS w1
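The counting logic behind these queries (position = 1 + number of winners with strictly more points) is easy to check by hand. Here is a small sketch in Python using the table from the question; the `position` helper is a made-up name.

```python
# Points table copied from the question.
winners = {
    "Winner1": 1241, "Winner2": 1199, "Sally": 1000, "Winner4": 900,
    "Winner5": 889, "Winner6": 700, "Winner7": 667, "Jacob": 623,
    "Winner9": 622, "Winner10": 605, "Winner11": 600, "Winner12": 586,
    "Thomas": 455, "Pamela": 434, "Winner15": 411, "Winner16": 410,
}

def position(name):
    """1 + the number of winners with strictly more points than `name`."""
    own_points = winners[name]
    return 1 + sum(1 for points in winners.values() if points > own_points)

positions = [position(n) for n in ("Sally", "Winner12", "Pamela", "Jacob")]
```

Two winners with equal points would get the same position here, which matches the behaviour of the simpler query without the extra tie-breaking condition.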
I am not sure if this is possible in MySQL. Here are my tables:
Categories table:
id
name
parent_id (which points to Categories.id)
I use the above table to map all the categories and sub-categories.
Products table:
id
name
category_id
The category_id in the Products table points to the sub-category id in which it belongs.
e.g. If I have Toys > Educational > ABC where ABC is product, Toys is Category and Educational is sub Category, then ABC will have category_id as 2.
Now the problem is that I want to use a SQL query to display all the products (in all the sub-categories and their sub-categories.. n level) for a particular category.
e.g.:
select * from categories,products where category.name = 'Toys' and ....
The above query should display the products from Educational also and all other sub categories and their subcategories.
Is this possible using a MySQL query? If not, what options do I have? I would like to avoid PHP recursion.
Update: Basically I want to display the top 10 products in the main category which I will be doing by adding a hits column to products table.
In previous projects where I needed to do the same thing, I added two new columns:
i_depth: int value of how deep the category is
nvc_breadcrumb: complete path of the category in a breadcrumb type of format
And then I added a trigger to the table that houses the category information to do the following (all three updates are in the same trigger)...
-- Reset all branches
UPDATE t_org_branches
SET nvc_breadcrumb = NULL,
i_depth = NULL
-- Update the root branches first
UPDATE t_org_branches
SET nvc_breadcrumb = '/',
i_depth = 0
WHERE guid_branch_parent_id IS NULL
-- Update the child branches on a loop
WHILE EXISTS (SELECT * FROM t_org_branches WHERE i_depth IS NULL)
UPDATE tobA
SET tobA.i_depth = tobB.i_depth + 1,
tobA.nvc_breadcrumb = tobB.nvc_breadcrumb + Ltrim(tobA.guid_branch_parent_id) + '/'
FROM t_org_branches AS tobA
INNER JOIN t_org_branches AS tobB ON (tobA.guid_branch_parent_id = tobB.guid_branch_id)
WHERE tobB.i_depth >= 0
AND tobB.nvc_breadcrumb IS NOT NULL
AND tobA.i_depth IS NULL
And then just do a join with your products table on the category ID and do a "LIKE '%/[CATEGORYID]/%' ". Keep in mind that this was done in MS SQL, but it should be easy enough to translate into a MySQL version.
It might just be compatible enough for a cut and paste (after table and column name change).
Expansion of explanation...
t_categories (as it stands now)...
Cat Parent CategoryName
1 NULL MyStore
2 1 Electronics
3 1 Clothing
4 1 Books
5 2 Televisions
6 2 Stereos
7 5 Plasma
8 5 LCD
t_categories (after modification)...
Cat Parent CategoryName Depth Breadcrumb
1 NULL MyStore NULL NULL
2 1 Electronics NULL NULL
3 1 Clothing NULL NULL
4 1 Books NULL NULL
5 2 Televisions NULL NULL
6 2 Stereos NULL NULL
7 5 Plasma NULL NULL
8 5 LCD NULL NULL
t_categories (after use of the script I gave)
Cat Parent CategoryName Depth Breadcrumb
1 NULL MyStore 0 /
2 1 Electronics 1 /1/
3 1 Clothing 1 /1/
4 1 Books 1 /1/
5 2 Televisions 2 /1/2/
6 2 Stereos 2 /1/2/
7 5 LCD 3 /1/2/5/
8 7 Samsung 4 /1/2/5/7/
t_products (as you have it now, no modifications)...
ID Cat Name
1 8 Samsung LNT5271F
2 7 LCD TV mount, up to 36"
3 7 LCD TV mount, up to 52"
4 5 HDMI Cable, 6ft
Join categories and products (where categories is C, products is P)
C.Cat Parent CategoryName Depth Breadcrumb ID p.Cat Name
1 NULL MyStore 0 / NULL NULL NULL
2 1 Electronics 1 /1/ NULL NULL NULL
3 1 Clothing 1 /1/ NULL NULL NULL
4 1 Books 1 /1/ NULL NULL NULL
5 2 Televisions 2 /1/2/ 4 5 HDMI Cable, 6ft
6 2 Stereos 2 /1/2/ NULL NULL NULL
7 5 LCD 3 /1/2/5/ 2 7 LCD TV mount, up to 36"
7 5 LCD 3 /1/2/5/ 3 7 LCD TV mount, up to 52"
8 7 Samsung 4 /1/2/5/7/ 1 8 Samsung LNT5271F
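Now assuming that the products table was more complete so that there is stuff in each category and no NULLs, you could do a "Breadcrumb LIKE '%/5/%'" to get the last three items of the last table I provided. Notice that it includes items in the category's direct children and in deeper descendants (like the Samsung TV). Items assigned directly to category 5 (the HDMI cable) are not matched, because a row's breadcrumb does not contain its own ID; add an "OR c.cat = 5" if you want those as well. If you want ONLY the specific category's items, just do a "c.cat = 5".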
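The trigger's three steps (reset, seed the roots, then fill in children level by level) can be sketched outside the database as well. The version below is a plain Python illustration, not a translation of the trigger; the parent IDs come from the final category table above and the function name is invented.

```python
# cat id -> parent id, taken from the "after use of the script" table.
parents = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 5, 8: 7}

def build_breadcrumbs(parents):
    """Return (depth, breadcrumb) dicts computed level by level."""
    depth, crumb = {}, {}
    # Roots first: depth 0 and breadcrumb '/'.
    for cat, parent in parents.items():
        if parent is None:
            depth[cat], crumb[cat] = 0, "/"
    # Then keep resolving children whose parent is already resolved.
    progressed = True
    while len(depth) < len(parents) and progressed:
        progressed = False
        for cat, parent in parents.items():
            if cat not in depth and parent in depth:
                depth[cat] = depth[parent] + 1
                crumb[cat] = f"{crumb[parent]}{parent}/"
                progressed = True
    return depth, crumb

depth, crumb = build_breadcrumbs(parents)
# Equivalent of Breadcrumb LIKE '%/5/%': categories somewhere below 5.
below_5 = sorted(cat for cat, c in crumb.items() if "/5/" in c)
```

The `progressed` flag stops the loop if some rows can never be resolved (for example a category whose parent ID does not exist), a case in which a bare WHILE EXISTS loop would spin forever.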
I think the cleanest way to achieve this would be to use the nested set model. It's a bit complicated to implement, but powerful to use. MySQL has a tutorial named Managing Hierarchical Data in MySQL. One of the big SQL gurus, Joe Celko, wrote about the same thing here. If you need even more information, have a look at Troels' links on storing hierarchical data.
In my case I would stay away from using a RDBMS to store this kind of data and use a graph database instead, as the data in this case actually is a directed graph.
Add a column to the Categories table that will contain the complete comma-delimited tree for each group. Using your example, sub-category Educational would have this as the tree '1,2', where 1 = Toys, 2 = Educational (it includes itself). The next nested level of categories would keep adding to the tree.
To get all products in a group, you use MySQL's FIND_IN_SET function, like so
SELECT p.ID
FROM Products p INNER JOIN Categories c ON p.category_ID = c.ID
WHERE FIND_IN_SET(your_category_id, c.tree)
I wouldn't use this method for big tables, as I don't think this query can use an index.
One way is to maintain a table that contains the ancestor to descendant relationships. You can query this particular table and get the list of all dependents.
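A minimal sketch of such an ancestor/descendant table (often called a closure table), run through Python's sqlite3 module for convenience. Toys, Educational and the product ABC come from the question; the other rows, the table name `category_paths` and the helper `products_in` are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
-- One row per (ancestor, descendant) pair, including each category itself.
CREATE TABLE category_paths (ancestor INTEGER, descendant INTEGER);

INSERT INTO categories VALUES
  (1, 'Toys', NULL), (2, 'Educational', 1), (3, 'Puzzles', 2), (4, 'Books', NULL);
INSERT INTO products VALUES
  (1, 'ABC', 2), (2, 'Jigsaw', 3), (3, 'Novel', 4);
INSERT INTO category_paths VALUES
  (1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (1, 3), (2, 3);
""")

def products_in(category_name):
    """Names of products in the category and in all of its sub-categories."""
    rows = conn.execute(
        """SELECT p.name
           FROM categories c
           JOIN category_paths cp ON cp.ancestor = c.id
           JOIN products p ON p.category_id = cp.descendant
           WHERE c.name = ?
           ORDER BY p.id""",
        (category_name,))
    return [name for (name,) in rows]
```

The price is write-time bookkeeping: inserting or moving a category means adding or rewriting its rows in the path table.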
Assuming MySQL, it'll be difficult to avoid recursion in PHP.
Your question is, essentially, how to mimic Oracle's CONNECT BY PRIOR syntax in MySQL. People ask this question repeatedly, but it's a feature that's never made it into MySQL, and implementing it via stored procedures probably won't work because (for now) stored functions cannot be recursive.
Beware of the database kludges offered so far.
The best information so far are the three links from nawroth:
Managing Hierarchical Data in MySQL, including the Nested Set Model.
Trees in SQL - nested set model by Joe Celko
Troels' links: Relational database systems: Hierarchical data in RDBMSs
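On newer servers there is one more option worth knowing about: MySQL 8.0 and later accept WITH RECURSIVE common table expressions, which cover the typical CONNECT BY use case. The sketch below runs the query through Python's sqlite3 module (SQLite supports the same construct), so treat it as an illustration rather than tested MySQL. Apart from Toys, Educational and ABC, the rows and the helper name `products_under` are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
INSERT INTO categories VALUES
  (1, 'Toys', NULL), (2, 'Educational', 1), (3, 'Puzzles', 2), (4, 'Books', NULL);
INSERT INTO products VALUES
  (1, 'ABC', 2), (2, 'Jigsaw', 3), (3, 'Novel', 4);
""")

# Start from the named category, then repeatedly add its children.
SUBTREE_SQL = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM categories WHERE name = ?
    UNION ALL
    SELECT c.id FROM categories c JOIN subtree s ON c.parent_id = s.id
)
SELECT p.name
FROM products p
JOIN subtree s ON p.category_id = s.id
ORDER BY p.id
"""

def products_under(category_name):
    """Product names in the category and every level of sub-category."""
    rows = conn.execute(SUBTREE_SQL, (category_name,))
    return [name for (name,) in rows]
```

This keeps the plain adjacency list (id, parent_id) and needs no extra path or tree columns, at the cost of walking the tree on every read.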
How big is the table Categories? You may need to cache this on the application level and construct the appropriate query: ... where id in (2, 3, 6, 7)
Also, it's best if you fetch categories by id which is their unique ID, indexed and fast as opposed to finding by name.
Bear with me, because I have never done something like this.
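DELIMITER //
-- products_in_category is a made-up name for this sketch
CREATE PROCEDURE products_in_category(IN root_id INT)
BEGIN
  DECLARE cat TEXT;
  DECLARE temp TEXT DEFAULT '';
  SET cat = CAST(root_id AS CHAR);
  -- Keep appending child ids until a pass adds nothing new.
  WHILE temp <> cat DO
    SET temp = cat;
    SELECT CONCAT_WS(',', cat, GROUP_CONCAT(id)) INTO cat
    FROM Categories
    WHERE FIND_IN_SET(parent_id, cat) AND NOT FIND_IN_SET(id, cat);
  END WHILE;
  SELECT * FROM products WHERE FIND_IN_SET(category_id, cat);
END //
DELIMITER ;
CALL products_in_category(5);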
I can almost guarantee the above won't work, but you can see what I'm trying to do. I got this far and I just decided to not finish the end of the query (select the top N from each category), sorry. :P