How to get a list of similar items

I have 4 tables:
items
+----+------+---------+-----+
| id | name | city_id | ... |
+----+------+---------+-----+
attributes
+----+------+-----+
| id | name | ... |
+----+------+-----+
item_attribute
+----+---------+--------------+
| id | item_id | attribute_id |
+----+---------+--------------+
city
+----+------+-----+
| id | name | ... |
+----+------+-----+
Items and attributes have a many-to-many relation.
Each item is located in exactly one city (one-to-many).
Question:
I'm using PHP (Laravel). For a given item, how can I get a list of items (with a LIMIT) that have similar attributes and are in the same city? No two items ever have identical attribute lists.
Is it possible to do this with a MySQL query?
Example:
| ItemName | Attributes | City |
+----------+-----------------------+------+
| Alpha | one, two, three, four | NY |
| Beta | five, six, seven | NY |
| Gamma | one, three, seven | NY |
| Delta | one, six, eight | CA |
| Epsilon | two, three, four | NY |
| Zeta | ten, nine | NY |
I want to find items similar to Alpha. They would be Gamma and Epsilon, because they share attributes with Alpha.
Delta won't be chosen, because it's located in another city.

If you have both the item_id and the city_id to pass in:
SELECT i.name,
GROUP_CONCAT(a.name) attributes,
c.name
FROM items i
JOIN city c
ON c.id = i.city_id
JOIN item_attribute ia
ON ia.item_id = i.id
AND EXISTS (
SELECT 1
FROM item_attribute ia1
JOIN item_attribute ia2
ON ia2.attribute_id = ia1.attribute_id
AND ia2.item_id = ia.item_id
WHERE ia1.item_id = :item_id /* Pass in item id variable */
)
JOIN attributes a
ON a.id = ia.attribute_id
WHERE i.city_id = :city_id /* Pass in city id variable */
GROUP BY i.name, c.name
If you just want to pass the example item id: (A little bit sloppy, but should work)
SELECT i.name,
GROUP_CONCAT(a.name) attributes,
c.name
FROM items base
JOIN items i
ON i.city_id = base.city_id
JOIN city c
ON c.id = i.city_id
JOIN item_attribute ia
ON ia.item_id = i.id
AND EXISTS (
SELECT 1
FROM item_attribute ia1
JOIN item_attribute ia2
ON ia2.attribute_id = ia1.attribute_id
AND ia2.item_id = ia.item_id
WHERE ia1.item_id = base.id
)
JOIN attributes a
ON a.id = ia.attribute_id
WHERE base.id = :item_id /* Pass in item id variable */
GROUP BY i.name, c.name
** UPDATE **
Ordering:
...
JOIN (
SELECT ia2.item_id, COUNT(*) count
FROM item_attribute ia1
JOIN item_attribute ia2
ON ia2.attribute_id = ia1.attribute_id
/* AND ia2.item_id != ia1.item_id -- add this if you don't want the original item */
WHERE ia1.item_id = :item_id /* Pass in item id variable */
GROUP BY ia2.item_id
) similar
ON similar.item_id = ia.item_id
...
ORDER BY similar.count DESC
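A minimal sketch of the "count the shared attributes and order by that count" idea, run against the question's sample rows. It uses Python's built-in sqlite3 module only so the example is self-contained; the SQL is plain joins and should carry over to MySQL, though that is not tested here. The numeric ids and the helper names build_db and similar_items are assumptions made for the illustration.

```python
import sqlite3

# Sample data from the question; ids are generated, not taken from the post.
ITEMS = {
    "Alpha": ("NY", ["one", "two", "three", "four"]),
    "Beta": ("NY", ["five", "six", "seven"]),
    "Gamma": ("NY", ["one", "three", "seven"]),
    "Delta": ("CA", ["one", "six", "eight"]),
    "Epsilon": ("NY", ["two", "three", "four"]),
    "Zeta": ("NY", ["ten", "nine"]),
}

def build_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
        CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE item_attribute (id INTEGER PRIMARY KEY, item_id INTEGER, attribute_id INTEGER);
    """)
    for name, (city, attrs) in ITEMS.items():
        db.execute("INSERT OR IGNORE INTO city (name) VALUES (?)", (city,))
        city_id = db.execute("SELECT id FROM city WHERE name = ?", (city,)).fetchone()[0]
        item_id = db.execute(
            "INSERT INTO items (name, city_id) VALUES (?, ?)", (name, city_id)
        ).lastrowid
        for attr in attrs:
            db.execute("INSERT OR IGNORE INTO attributes (name) VALUES (?)", (attr,))
            attr_id = db.execute("SELECT id FROM attributes WHERE name = ?", (attr,)).fetchone()[0]
            db.execute(
                "INSERT INTO item_attribute (item_id, attribute_id) VALUES (?, ?)",
                (item_id, attr_id),
            )
    return db

def similar_items(db, item_name, limit=10):
    # Self-join item_attribute on attribute_id: each joined row is one shared attribute.
    sql = """
        SELECT i.name, COUNT(*) AS shared
        FROM items base
        JOIN item_attribute ia1 ON ia1.item_id = base.id
        JOIN item_attribute ia2 ON ia2.attribute_id = ia1.attribute_id
                               AND ia2.item_id != base.id
        JOIN items i ON i.id = ia2.item_id AND i.city_id = base.city_id
        WHERE base.name = ?
        GROUP BY i.id, i.name
        ORDER BY shared DESC, i.name
        LIMIT ?
    """
    return db.execute(sql, (item_name, limit)).fetchall()

print(similar_items(build_db(), "Alpha"))
```

For Alpha this lists Epsilon (three shared attributes) ahead of Gamma (two); Delta shares an attribute but is filtered out by the city condition.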

You can use INNER JOINs across all the tables:
SELECT I.name, A.name, city.name FROM attributes as A
INNER JOIN item_attribute as I_A ON I_A.attribute_id = A.id
INNER JOIN items as I ON I.id = I_A.item_id
INNER JOIN city ON city.id = I.city_id
WHERE <Your condition>
To get comma-separated values, you can use GROUP_CONCAT.
Let me know if I have misunderstood your question.

Related

count total volume by category

I have 3 tables
Company_categories
companies
daily_rates
I want to compute the total traded volume per category, summed over all the companies in that category.
For example, category A contains 3 companies and category B contains 5 companies, so I want the sum of the volume of all 3 companies in category A, and so on for every category. I can easily count the companies in each category, but I am not sure how to bring in the third table to total the volume of all the companies in a category.
my table structure
company_categories
id name
+------------+----------+
| 1 | A |
|------------|----------|
| 2 | B |
|------------|----------|
companies
id name category
+------------+----------+-----------+
| 1 | co 1 | 1 |
|------------|----------|-----------|
| 2 | co 2 | 2 |
|------------|----------|-----------|
| 3 | co 3 | 1 |
|------------|----------|-----------|
daily_stock_rates
id traded_volume company_id
+------------+------------------+---------------+
| 1 | 40 | 1 |
|------------|------------------|---------------|
| 2 | 80 | 2 |
|------------|------------------|---------------|
| 3 | 30 | 3 |
|------------|------------------|---------------|
here is my code
$sql = mysqli_query($connect, "SELECT c.id category_id, c.name category_name, com.id, com.category count( dsr.total_traded_volume ) total_volume
FROM company_categories c
INNER JOIN companies com ON c.id = com.category
LEFT JOIN daily_stock_rates dsr ON com.id = dsr.company_id
GROUP BY com.category
ORDER BY total_volume DESC LIMIT 10");
while($data = mysqli_fetch_assoc($sql)) {
echo $data['category_name'] . ": .".$data['total_volume'];
echo "<br />";
}
Can anyone help me out?

Join twice to get all the rates related to each category:
SELECT cat.name, SUM(rat.traded_volume) volume
FROM company_categories cat
JOIN companies comp ON comp.category = cat.id
JOIN daily_stock_rates rat ON rat.company_id = comp.id
GROUP BY cat.name
ORDER BY volume DESC
LIMIT 10
The most important differences from your query:
you need SUM(), not COUNT()
select only what you asked for: volume by category. You cannot select
company names alongside it (which company would you want to see
there anyway?)
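To see the SUM() versus COUNT() difference concretely, here is a small sketch that runs the query above on the question's three sample tables. It uses Python's built-in sqlite3 module purely to keep the example self-contained; the helper name volume_by_category is an assumption for the illustration, and the column name traded_volume follows the table layout shown in the question.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE company_categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, category INTEGER);
    CREATE TABLE daily_stock_rates (id INTEGER PRIMARY KEY, traded_volume INTEGER, company_id INTEGER);
    INSERT INTO company_categories VALUES (1, 'A'), (2, 'B');
    INSERT INTO companies VALUES (1, 'co 1', 1), (2, 'co 2', 2), (3, 'co 3', 1);
    INSERT INTO daily_stock_rates VALUES (1, 40, 1), (2, 80, 2), (3, 30, 3);
""")

def volume_by_category(agg):
    # Only the two fixed aggregate names are ever interpolated into the SQL.
    assert agg in ("SUM", "COUNT")
    sql = f"""
        SELECT cat.name, {agg}(rat.traded_volume) AS volume
        FROM company_categories cat
        JOIN companies comp ON comp.category = cat.id
        JOIN daily_stock_rates rat ON rat.company_id = comp.id
        GROUP BY cat.name
        ORDER BY volume DESC
        LIMIT 10
    """
    return db.execute(sql).fetchall()

print(volume_by_category("SUM"))    # totals per category
print(volume_by_category("COUNT"))  # number of rate rows per category
```

With SUM, category B totals 80 and category A totals 70 (40 + 30); with COUNT the same query only reports how many rate rows each category has.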

I don't think it's the join you got wrong, it's the select list and the group by clause.
First of all, if you want the total of the volume, then use sum(), not count(). Also, do not include so many fields in the select list if you want the total by category:
SELECT c.id category_id, c.name category_name, sum( dsr.total_traded_volume ) total_volume
FROM company_categories c
LEFT JOIN companies com ON c.id = com.category
LEFT JOIN daily_stock_rates dsr ON com.id = dsr.company_id
GROUP BY c.id
ORDER BY total_volume DESC LIMIT 10

MySQL Join three tables and display 0 if null

I have three tables:
person_table
id| name | gender
1 | Joe | male
2 | Jane |female
3 | Janet | female
4| Jay | male
etc...
product_table
id| name
1 | magazine
2 | book
3 |paper
4 | novel
etc...
person_product
person_id| product_id | quantity
1 | 1 | 1
1 | 3 | 3
2 | 3 | 1
4 | 4 | 2
etc...
I have tried to make a query that will return a table like this:
person_id| person_name | product_name| quantity
but I can't make it so that if, let's say, John has no books, it displays
(John's id) | John | book | 0
instead of just skipping that row.
Where did I go wrong?
Here is what I managed to come up with:
SELECT p.*, f.name, l.quantity
FROM person_product AS l
INNER JOIN people_table AS p ON l.person_id=p.id
INNER JOIN product_table AS f ON l.product_id=f.id
ORDER BY id

It seems that you're generating a report of all people, against all products with the relevant quantity; on a large data set this could take a while as you're not specifically joining product to person for anything other than quantity:
SELECT
p.id,
p.name,
p.gender,
f.name,
IFNULL(l.quantity,0) AS quantity
FROM person_table AS p
JOIN product_table AS f
LEFT JOIN person_product AS l
ON l.person_id = p.id
AND l.product_id = f.id
ORDER BY p.id, f.name
Which results in:
Is that more-or-less what you're after?
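A small sketch of the same query shape on the question's sample rows, to show that every person/product pair comes back and missing quantities become 0. It runs on Python's built-in sqlite3 module only for self-containment (SQLite also provides IFNULL); the join is written as an explicit CROSS JOIN here, which is an editorial choice for readability rather than something from the answer.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person_table (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
    CREATE TABLE product_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_product (person_id INTEGER, product_id INTEGER, quantity INTEGER);
    INSERT INTO person_table VALUES
        (1, 'Joe', 'male'), (2, 'Jane', 'female'), (3, 'Janet', 'female'), (4, 'Jay', 'male');
    INSERT INTO product_table VALUES (1, 'magazine'), (2, 'book'), (3, 'paper'), (4, 'novel');
    INSERT INTO person_product VALUES (1, 1, 1), (1, 3, 3), (2, 3, 1), (4, 4, 2);
""")

rows = db.execute("""
    SELECT p.id, p.name, f.name, IFNULL(l.quantity, 0) AS quantity
    FROM person_table AS p
    CROSS JOIN product_table AS f          -- every person paired with every product
    LEFT JOIN person_product AS l          -- quantity only where a link row exists
           ON l.person_id = p.id AND l.product_id = f.id
    ORDER BY p.id, f.name
""").fetchall()

for row in rows:
    print(row)
```

Four people times four products gives sixteen rows; Joe's row for book carries quantity 0 because no person_product row links them.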

You need to start from people_table and then use LEFT JOINs to bring in the data from the other tables.
Since you want 0 when the value is NULL, you can use the IFNULL function:
SELECT p.*, f.name, IFNULL(l.quantity,0)
FROM people_table AS p
LEFT JOIN person_product AS l ON l.person_id=p.id
LEFT JOIN product_table AS f ON l.product_id=f.id
ORDER BY p.id

If a person who has no products should still appear in the result, try this (easy to understand):
SELECT NAME
,'0'
,'0'
FROM person_table
WHERE id NOT IN (
SELECT person_id
FROM person_product
)
UNION
SELECT person_id
,product_id
,quantity
FROM person_product;

Mysql query inside a query

First, apologies if the title doesn't match the question. Well, the problem is how to build this query...
I have a table called category It contains categories of my stuff(movies). It's like this...
--------------------------------
ID | name | parent_category
--------------------------------
1 | love | 0
2 | action | 0
3 | fear | 0
4 | passion| 1
5 | danger | 2
6 | death | 3
--------------------------------
So, as you see, each category has a parent category. Except the first 3. They're parents.
And movies table is like this...
--------------------------------
ID | name | category
--------------------------------
1 | aaaa | 1
2 | bbbbbb | 2
3 | cccc | 2
4 | ddddddd| 1
5 | eeeeee | 3
6 | fffff | 3
--------------------------------
So, what I want to do is select movies by parent category. That is, if I click the category love, it should select all the movies in categories that have love as their parent category.
So, how can I write this as a single query?

If the parents are only one level deep, then you can use joins:
select m.*,
coalesce(cp.id, c.id) as parent_id,
coalesce(cp.name, c.name) as parent_name
from movies m left join
categories c
on m.category = c.id left join
categories cp
on c.parent_category = cp.id;
Actually, if you only want the id, you don't need two joins:
select m.*,
(case when c.parent_category > 0 then c.parent_category else c.id end) as parent_id
from movies m left join
categories c
on m.category = c.id ;
Or, more simply:
select m.*, greatest(c.parent_category, c.id) as parent_id
. . .

To select rows filtered by a condition on a second table, use a join in the FROM clause, or a subquery in the condition with IN or EXISTS. To compare a field with a string, you can use the LIKE operator.

If you are filtering based on parent_category -
SELECT b.*, a.name FROM movies b
LEFT JOIN categories a ON a.id = b.category
WHERE a.parent_category = 1;
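A hedged sketch of the parent-category filter on the question's data. Filtering on parent_category alone would skip movies stored directly under the parent (in the sample, every movie is), so this variant also accepts the parent's own id; that extra condition, the helper name movies_under, and the two extra movie rows are assumptions added for the illustration. It handles one level of nesting only and runs on Python's built-in sqlite3 module for self-containment.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_category INTEGER);
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, category INTEGER);
    INSERT INTO categories VALUES
        (1, 'love', 0), (2, 'action', 0), (3, 'fear', 0),
        (4, 'passion', 1), (5, 'danger', 2), (6, 'death', 3);
    INSERT INTO movies VALUES
        (1, 'aaaa', 1), (2, 'bbbbbb', 2), (3, 'cccc', 2),
        (4, 'ddddddd', 1), (5, 'eeeeee', 3), (6, 'fffff', 3),
        (7, 'ggg', 4), (8, 'hhh', 5);  -- rows 7 and 8 are hypothetical additions
""")

def movies_under(parent_id):
    # One level deep: the movie's category is the parent itself or a direct child of it.
    sql = """
        SELECT m.name
        FROM movies m
        JOIN categories c ON c.id = m.category
        WHERE c.id = ? OR c.parent_category = ?
        ORDER BY m.id
    """
    return [name for (name,) in db.execute(sql, (parent_id, parent_id))]

print(movies_under(1))
```

Clicking love (id 1) then returns the two movies filed directly under love plus the hypothetical movie filed under its child category passion.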

MySQL / PHP: Find similar / related items by tag / taxonomy

I have a cities table which looks like this.
|id| Name |
|1 | Paris |
|2 | London |
|3 | New York|
I have a tags table which looks like this.
|id| tag |
|1 | Europe |
|2 | North America |
|3 | River |
and a cities_tags table:
|id| city_id | tag_id |
|1 | 1 | 1 |
|2 | 1 | 3 |
|3 | 2 | 1 |
|4 | 2 | 3 |
|5 | 3 | 2 |
|6 | 3 | 3 |
How do I calculate which are the most closely related city? For example. If I were looking at city 1 (Paris), the results should be: London (2), New York (3)
I have found the Jaccard index but I'm unsure as how best to implement this.

You asked how to calculate which cities are most closely related; for example, for city 1 (Paris) the results should be London (2), then New York (3). In the data set you provided, the only thing that relates cities is the tags they have in common, so the cities that share tags are the closest. Below is a query with subqueries that finds the cities, other than the one you start from, that share at least one tag with it:
SELECT * FROM `cities` WHERE id IN (
SELECT city_id FROM `cities_tags` WHERE tag_id IN (
SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1 )
How it works:
I assume you will supply a city id or name to find its closest cities; in my case "Paris" has id 1.
SELECT tag_id FROM `cities_tags` WHERE city_id=1
This finds all the tag ids that Paris has. Then
SELECT city_id FROM `cities_tags` WHERE tag_id IN (
SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1
fetches all the cities except Paris that have at least one of the tags Paris has.
While reading about the Jaccard similarity (index), I found a simple way to understand the term. Take this example with two sets, A and B:
Set A = {A, B, C, D, E}
Set B = {I, H, G, F, E, D}
The formula for the Jaccard similarity is JS = |A intersect B| / |A union B|
A intersect B = {D, E}, size 2
A union B = {A, B, C, D, E, I, H, G, F}, size 9
JS = 2/9 = 0.2222
Now to your scenario:
Paris has tag_ids 1 and 3, so we call its set P = {Europe, River}
London has tag_ids 1 and 3, so we call its set L = {Europe, River}
New York has tag_ids 2 and 3, so we call its set NW = {North America, River}
Jaccard similarity of Paris and London: JSPL = |P intersect L| / |P union L| = 2/2 = 1
Jaccard similarity of Paris and New York: JSPNW = |P intersect NW| / |P union NW| = 1/3 = 0.3333
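The arithmetic above can be checked with a few lines of Python using built-in sets. This is only a sketch of the formula, not of any SQL implementation; the function name jaccard and the choice to return 0.0 for two empty sets are assumptions made here.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch; the ratio is undefined here
    return len(a & b) / len(a | b)

paris = {"Europe", "River"}
london = {"Europe", "River"}
new_york = {"North America", "River"}

print(jaccard(paris, london))      # identical tag sets
print(jaccard(paris, new_york))    # one shared tag out of three distinct tags
print(jaccard("ABCDE", "IHGFED"))  # the A/B example: 2 shared out of 9 distinct
```

Sorting the other cities by this value in descending order gives the "most closely related" ranking the question asks for.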
Here is the query I have so far to calculate the Jaccard index:
SELECT a.*,
( (CASE WHEN a.`intersect` =0 THEN a.`union` ELSE a.`intersect` END ) /a.`union`) AS jaccard_index
FROM (
SELECT q.* ,(q.sets + q.parisset) AS `union` ,
(q.sets - q.parisset) AS `intersect`
FROM (
SELECT cities.`id`, cities.`name` , GROUP_CONCAT(tag_id SEPARATOR ',') sets ,
(SELECT GROUP_CONCAT(tag_id SEPARATOR ',') FROM `cities_tags` WHERE city_id= 1)AS parisset
FROM `cities_tags`
LEFT JOIN `cities` ON (cities_tags.`city_id` = cities.`id`)
GROUP BY city_id ) q
) a ORDER BY jaccard_index DESC
In the query above, I wrapped the result set in two subselects in order to compute my custom aliases.
You can add a filter to the query above so that it does not calculate the city's similarity with itself:
SELECT a.*,
( (CASE WHEN a.`intersect` =0 THEN a.`union` ELSE a.`intersect` END ) /a.`union`) AS jaccard_index
FROM (
SELECT q.* ,(q.sets + q.parisset) AS `union` ,
(q.sets - q.parisset) AS `intersect`
FROM (
SELECT cities.`id`, cities.`name` , GROUP_CONCAT(tag_id SEPARATOR ',') sets ,
(SELECT GROUP_CONCAT(tag_id SEPARATOR ',') FROM `cities_tags` WHERE city_id= 1)AS parisset
FROM `cities_tags`
LEFT JOIN `cities` ON (cities_tags.`city_id` = cities.`id`) WHERE cities.`id` !=1
GROUP BY city_id ) q
) a ORDER BY jaccard_index DESC
So the result shows that Paris is most closely related to London, and then to New York.

select c.name, cnt.val/(select count(*) from cities) as jaccard_index
from cities c
inner join
(
select city_id, count(*) as val
from cities_tags
where tag_id in (select tag_id from cities_tags where city_id=1)
and not city_id in (1)
group by city_id
) as cnt
on c.id=cnt.city_id
order by jaccard_index desc
This query is statically referring to city_id=1, so you'll have to make that a variable in both the where tag_id in clause, and the not city_id in clause.
If I understood the Jaccard index properly, then it also returns that value ordered by the 'most closely related'. The results in our example look like this:
|name |jaccard_index |
|London |0.6667 |
|New York |0.3333 |
Edit
With a better understanding of how to implement Jaccard Index:
After reading a bit more on Wikipedia about the Jaccard index, I've come up with a better way to implement a query for our example dataset. Essentially, we compare the chosen city with each other city independently, and divide the count of common tags by the count of distinct tags across the two cities.
select c.name,
case -- when this city's tags are a subset of the chosen city's tags
when not_in.cnt is null
then -- then the union count is the chosen city's tag count
intersection.cnt/(select count(tag_id) from cities_tags where city_id=1)
else -- otherwise the union count is the chosen city's tag count plus everything not in the chosen city's tag list
intersection.cnt/(not_in.cnt+(select count(tag_id) from cities_tags where city_id=1))
end as jaccard_index
-- Jaccard index is defined as the size of the intersection of a dataset, divided by the size of the union of a dataset
from cities c
inner join
(
-- select the count of tags for each city that match our chosen city
select city_id, count(*) as cnt
from cities_tags
where tag_id in (select tag_id from cities_tags where city_id=1)
and city_id!=1
group by city_id
) as intersection
on c.id=intersection.city_id
left join
(
-- select the count of tags for each city that are not in our chosen city's tag list
select city_id, count(tag_id) as cnt
from cities_tags
where city_id!=1
and not tag_id in (select tag_id from cities_tags where city_id=1)
group by city_id
) as not_in
on c.id=not_in.city_id
order by jaccard_index desc
The query is a bit lengthy, and I don't know how well it will scale, but it does implement a true Jaccard Index, as requested in the question. Here are the results with the new query:
+----------+---------------+
| name | jaccard_index |
+----------+---------------+
| London | 1.0000 |
| New York | 0.3333 |
+----------+---------------+
Edited again to add comments to the query, and take into account when the current city's tags are a subset of the chosen city's tags

Late to the party, but I don't think any of the answers is fully correct. I took the best part of each one and put them together into my own answer:
The Jaccard index explanation by @m-khalid-junaid is very interesting and correct, but the implementation using (q.sets + q.parisset) AS union and (q.sets - q.parisset) AS intersect is wrong: it does arithmetic on GROUP_CONCAT strings rather than on sets.
The version by @n-lx is on the right track, but it needs the Jaccard index. This matters: if a city with 2 tags matches two tags of a city with 3 tags, it scores the same as a match with a city that has only those same two tags. I think the full match should rank as the most related.
My answer:
cities table like this.
| id | Name |
| 1 | Paris |
| 2 | Florence |
| 3 | New York |
| 4 | São Paulo |
| 5 | London |
cities_tag table like this.
| city_id | tag_id |
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 1 |
| 5 | 2 |
| 5 | 3 |
With this sample data, Florence fully matches Paris, New York matches one tag, São Paulo matches no tags, and London matches two tags and has one more. I think the Jaccard index values for this sample are:
Florence: 1.000 (2/2)
London: 0.666 (2/3)
New York: 0.333 (1/3)
São Paulo: 0.000 (0/3)
My query is like this:
select jaccard.city,
jaccard.intersect,
jaccard.union,
jaccard.intersect/jaccard.union as 'jaccard index'
from
(select
c2.name as city
,count(ct2.tag_id) as 'intersect'
,(select count(distinct ct3.tag_id)
from cities_tags ct3
where ct3.city_id in(c1.id, c2.id)) as 'union'
from
cities as c1
inner join cities as c2 on c1.id != c2.id
left join cities_tags as ct1 on ct1.city_id = c1.id
left join cities_tags as ct2 on ct2.city_id = c2.id and ct1.tag_id = ct2.tag_id
where c1.id = 1
group by c1.id, c2.id) as jaccard
order by jaccard.intersect/jaccard.union desc
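The same join shape can be tried on this answer's sample rows with a short script. This is a sketch under assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, renames the aliases to shared and total to avoid quoting reserved words, and does the division in Python so the result is a float regardless of how the SQL dialect divides integers. The helper name jaccard_ranking is made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities_tags (city_id INTEGER, tag_id INTEGER);
    INSERT INTO cities VALUES
        (1, 'Paris'), (2, 'Florence'), (3, 'New York'), (4, 'São Paulo'), (5, 'London');
    INSERT INTO cities_tags VALUES
        (1, 1), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (4, 2), (5, 1), (5, 2), (5, 3);
""")

def jaccard_ranking(city_id):
    sql = """
        SELECT c2.name,
               COUNT(ct2.tag_id) AS shared,          -- tags both cities have
               (SELECT COUNT(DISTINCT ct3.tag_id)
                  FROM cities_tags ct3
                 WHERE ct3.city_id IN (c1.id, c2.id)) AS total  -- distinct tags across both
        FROM cities c1
        JOIN cities c2 ON c2.id != c1.id
        LEFT JOIN cities_tags ct1 ON ct1.city_id = c1.id
        LEFT JOIN cities_tags ct2 ON ct2.city_id = c2.id AND ct2.tag_id = ct1.tag_id
        WHERE c1.id = ?
        GROUP BY c1.id, c2.id, c2.name
    """
    scored = [
        (name, shared / total if total else 0.0)
        for name, shared, total in db.execute(sql, (city_id,))
    ]
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

for name, score in jaccard_ranking(1):
    print(f"{name}: {score:.3f}")
```

For Paris this reproduces the ordering listed above: Florence, London, New York, São Paulo.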

This query uses no fancy functions and no subqueries. It is fast. Just make sure cities.id, cities_tags.id, cities_tags.city_id and cities_tags.tag_id are indexed.
The query returns city1, city2 and the count of how many tags city1 and city2 have in common.
select
c1.name as city1
,c2.name as city2
,count(ct2.tag_id) as match_count
from
cities as c1
inner join cities as c2 on
c1.id != c2.id -- change != into > if you dont want duplicates
left join cities_tags as ct1 on -- use inner join to filter cities with no match
ct1.city_id = c1.id
left join cities_tags as ct2 on -- use inner join to filter cities with no match
ct2.city_id = c2.id
and ct1.tag_id = ct2.tag_id
group by
c1.id
,c2.id
order by
c1.id
,match_count desc
,c2.id
Change != into > to avoid each pair of cities being returned twice; a pair then no longer appears once in each column order.
Change the two left joins into inner joins if you don't want to see city combinations that have no matching tags.

Could this be a push in the right direction?
SELECT c.name, (
SELECT COUNT(*)
FROM cities_tags ct
WHERE ct.city_id = c.id
AND ct.tag_id IN (
SELECT ct0.tag_id
FROM cities_tags ct0
WHERE ct0.city_id = 1 /* the original city, Paris in the question */
)
) AS matchCount
FROM cities c
WHERE c.id != 1
HAVING matchCount > 0
What I tried was this:
// Find the citynames:
Get city.names (SUBQUERY) as matchCount FROM cities WHERE matchCount >0
// the subquery:
select the amount of tags cities have which (SUBSUBQUERY) also has
// the subsubquery
select the id of the tags the original name has

Counting the most amount of items assigned to a user using greatest

I am trying to query 5 separate tables in my mysql database, and display the actor with the most amount of items assigned to them. The table structures are as follows;
item
itemid | item | description | brand | date | time | path |
actor
actorid | name | actorthumb | bio |
brand
brandid | brandname | description | image |
movie
movieid | title | genre | year | moviethumb | synopsis|
request
requestid | userid | itemid | brandid | movieid | actorid | content | requestdate |
I presume I need to join the request, actor and items table together, using COUNT function count how many items are assigned to an actor, then use GREATEST to display the actor with the highest amount of items assigned to them?
The query to join all the tables is
$query = "SELECT greatest i.*, a.*, b.*, m.*, r.* FROM item AS i, actor AS a, brand AS b, movie AS m, request AS r
WHERE r.itemid = i.itemid
AND r.actorid = a.actorid
AND r.brandid = b.brandid
AND r.movieid = m.movieid";
Please confirm the best way to do the above?

SELECT a.actorid, a.name
FROM request r
INNER JOIN actor a
ON r.actorid = a.actorid
GROUP BY a.actorid, a.name
ORDER BY COUNT(DISTINCT r.itemid) DESC
LIMIT 5
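A short sketch of this approach with made-up rows, since the question gives only column names. Note that GREATEST compares its arguments within a single row, so it does not pick a maximum across rows; ordering by the per-actor count and taking the first row does. The sample data is hypothetical, only the columns needed for the count are created, and the script runs on Python's built-in sqlite3 module for self-containment.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE actor (actorid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE request (requestid INTEGER PRIMARY KEY, itemid INTEGER, actorid INTEGER);
    -- Hypothetical rows, only to exercise the query.
    INSERT INTO actor VALUES (1, 'Actor A'), (2, 'Actor B'), (3, 'Actor C');
    INSERT INTO request (itemid, actorid) VALUES
        (10, 1), (11, 1), (11, 1),   -- Actor A: 2 distinct items (item 11 requested twice)
        (10, 2), (11, 2), (12, 2),   -- Actor B: 3 distinct items
        (13, 3);                     -- Actor C: 1 item
""")

top = db.execute("""
    SELECT a.actorid, a.name, COUNT(DISTINCT r.itemid) AS item_count
    FROM request r
    JOIN actor a ON a.actorid = r.actorid
    GROUP BY a.actorid, a.name
    ORDER BY item_count DESC
    LIMIT 1
""").fetchone()

print(top)
```

COUNT(DISTINCT r.itemid) keeps a repeated request for the same item from inflating an actor's total; raise the LIMIT to list the top few actors instead of only the first.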

The query to join all the tables is
$query = "SELECT i.*, a.*, b.*, m.*, r.* FROM item AS i, actor AS a, brand AS b, movie AS m, request AS r
WHERE r.itemid = i.itemid
AND r.actorid = a.actorid
AND r.brandid = b.brandid
AND r.movieid = m.movieid
ORDER BY count(i.*) DESC";
