SQL greatest-n-per-group with relational table joins - php

I have 3 tables. image, categories, image_category.
image: id | title | imageURL
categories: cat_id | cat_name
image_category: image_id | cat_id
My current query to select all images in order newest to oldest is:
SELECT image.id as ID, image.title as title, categories.cat_name as CAT
FROM image_category
LEFT JOIN image
ON image_category.image_id = image.id
INNER JOIN categories
ON image_category.cat_id = categories.cat_id
ORDER BY ID DESC
I would like to show the newest 4 images per category. The images with the largest image.id are the newest.
For example, if I had 3 categories with 40 images in each, I would want to show the newest 4 images from each category. Later I will want to show the next 4 per category, and then the next 4 after that, until there are no images left.
This solution seems like what I'm looking for.
SELECT i1.*
FROM item i1
LEFT OUTER JOIN item i2
ON (i1.category_id = i2.category_id AND i1.item_id < i2.item_id)
GROUP BY i1.item_id
HAVING COUNT(*) < 4
ORDER BY category_id, date_listed;
but I have a relational table connecting my image_id and category_id. I can't figure out how to implement this with that extra table join.
Would appreciate help from an SQL guru.

You're almost there; you just need to do the grouping using your image_category table, since that's where the cat_id values are.
SELECT ...
FROM image_category AS c1
LEFT OUTER JOIN image_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
GROUP BY c1.cat_id, c1.image_id
HAVING COUNT(*) < 4
Then once you've got that, you know that c1 contains the top four images per category. You can then join c1 to the image table to get other attributes:
SELECT i.id, i.title, c.cat_name AS CAT
FROM image_category AS c1
LEFT OUTER JOIN image_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.cat_id
GROUP BY c1.cat_id, c1.image_id
HAVING COUNT(*) < 4;
Although this isn't strictly legal SQL due to the single-value rule, MySQL will permit it.
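To see the self-join on the junction table in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and category names are made up, and it counts c2.image_id rather than COUNT(*) so that the unmatched NULL row from the LEFT JOIN is not counted; treat it as an illustration under those assumptions, not as the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT, imageURL TEXT);
    CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
    CREATE TABLE image_category (image_id INTEGER, cat_id INTEGER);
""")
conn.executemany("INSERT INTO categories VALUES (?, ?)", [(1, "cats"), (2, "dogs")])
# Hypothetical sample data: images 1-6 in category 1, images 7-9 in category 2.
for image_id in range(1, 10):
    cat_id = 1 if image_id <= 6 else 2
    conn.execute("INSERT INTO image VALUES (?, ?, ?)",
                 (image_id, f"img{image_id}", f"/img/{image_id}.jpg"))
    conn.execute("INSERT INTO image_category VALUES (?, ?)", (image_id, cat_id))

TOP_N_SQL = """
    SELECT c.cat_name, i.id
    FROM image_category AS c1
    LEFT OUTER JOIN image_category AS c2
      ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
    INNER JOIN image AS i ON c1.image_id = i.id
    INNER JOIN categories AS c ON c1.cat_id = c.cat_id
    GROUP BY c1.cat_id, c1.image_id
    HAVING COUNT(c2.image_id) < 4   -- fewer than 4 newer images in the same category
    ORDER BY c1.cat_id, c1.image_id DESC
"""
top_rows = conn.execute(TOP_N_SQL).fetchall()
print(top_rows)
```

With this data the first category keeps images 6, 5, 4 and 3, and the second category, which has only three images, keeps all of them.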
Copied from comments thread:
I would fetch the full result, store it in a cache, and then iterate over it however I want, using application code. That would be far simpler and have better performance. SQL is powerful, but another solution may be easier to develop, debug, and maintain.
You can certainly use LIMIT to iterate through the result set:
SELECT i.id, i.title, c.cat_name AS CAT
FROM image_category AS c1
LEFT OUTER JOIN image_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.cat_id
GROUP BY c1.cat_id, c1.image_id
HAVING COUNT(*) < 4
ORDER BY c.cat_id
LIMIT 4 OFFSET 16;
But keep in mind that doing an OFFSET means that it has to run the query over again each time you view another set of them. There are optimizations in MySQL so that it quits a query once it has found enough rows, but it's still expensive if you iterate frequently, and advance far into the series of pages.
There are two possible optimizations. One is to cache part of the result, on the theory that few users will want to advance through every page of a large paginated result. For example, fetch enough rows to populate ten pages of results, and cache that. It reduces the number of queries a lot, and perhaps only 1% of the time will a user advance into the next set of ten pages.
SELECT i.id, i.title, c.cat_name AS CAT
FROM image_category AS c1
LEFT OUTER JOIN image_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.cat_id
GROUP BY c1.cat_id, c1.image_id
HAVING COUNT(*) < 4
ORDER BY c.cat_id
LIMIT 40 OFFSET 40; /* second set of ten pages */
Another optimization, if you can assume that any view of page N will be coming from a view of page N-1, is for the request to filter the categories based on the greatest category id seen in the N-1st page. You need to do it this way because OFFSET works by row number in the result set, but indexed offsets work by values found on those rows. These aren't the same offset if there may be gaps or unused cat_id values.
SELECT i.id, i.title, c.cat_name AS CAT
FROM image_category AS c1
LEFT OUTER JOIN image_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.cat_id
WHERE c1.cat_id > 47 /* this value is the largest seen in previous page */
GROUP BY c1.cat_id, c1.image_id
HAVING COUNT(*) < 4
ORDER BY c.cat_id
LIMIT 40; /* no offset needed */
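The "filter on the largest value seen so far" idea can be sketched on its own. The following is a simplified illustration with Python's sqlite3 module that pages through the categories table only; the ids (with gaps), the page size and the next_page helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT)")
# Hypothetical ids with gaps, as left behind by deleted categories.
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(n, f"cat{n}") for n in (3, 8, 15, 47, 52, 90)])

def next_page(last_seen_cat_id, page_size=2):
    """Return the cat_ids of the page that follows last_seen_cat_id."""
    rows = conn.execute(
        "SELECT cat_id FROM categories WHERE cat_id > ? ORDER BY cat_id LIMIT ?",
        (last_seen_cat_id, page_size),
    ).fetchall()
    return [r[0] for r in rows]

pages = []
last_seen = 0  # smaller than any real cat_id, so the first call returns page 1
while True:
    page = next_page(last_seen)
    if not page:
        break
    pages.append(page)
    last_seen = page[-1]  # remember the largest cat_id shown so far
print(pages)
```

Each request seeks straight to the remembered value through the primary key, so the gaps in cat_id do not matter and no rows are scanned and thrown away the way they are with OFFSET.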
Re your comments:
... using LIMIT and OFFSET will only trim those results and not move me down the list of rows.
LIMIT is working as intended; it applies to the resulting rows after GROUP BY and HAVING have done their work.
The way I was doing it before the greatest N per category query is by
1. pulling in x amount of images,
2. Remembering which was the last image, and then
3. using a subquery on my subsequent queries to get the next x amount of images with ids smaller than the last image. Is something like that possible with greatest N per group?
That's what my WHERE clause does in the last example above, without using a subquery. And I'm assuming you're advancing to the next higher set of cat_id's. This solution works only if you're advancing one page at a time, and in the positive direction.
All right, there's another solution for greatest-n-per-group that works with MySQL, but it relies on the user variables feature. SQLite doesn't have this feature.
SELECT * FROM (
SELECT
p.id as image_ID, p.imageURL as URL, c.cat_name as CAT, ic.cat_id,
IF(@cat=ic.cat_id, @row:=@row+1, @row:=1) AS _row, @cat:=ic.cat_id AS _cat
FROM (SELECT @cat:=null, @row:=0) AS _init
CROSS JOIN image_category AS ic
INNER JOIN portfolio AS p ON ic.image_id = p.id
INNER JOIN categories AS c on ic.cat_id = c.cat_id
ORDER BY ic.cat_id, ic.image_id
) AS x
WHERE _row BETWEEN 4 AND 6; /* or choose any range you want */
This is similar to using ROW_NUMBER() OVER (PARTITION BY cat_id) that is supported by standard SQL and most RDBMS, but SQLite doesn't support that either yet.
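If neither user variables nor ROW_NUMBER() is available, a per-category row number can also be computed with a correlated subquery. This is a swapped-in technique, not the query above: a rough sketch using Python's sqlite3 module, with made-up data and a hypothetical page_per_category helper, numbering rows newest first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_category (image_id INTEGER, cat_id INTEGER)")
# Hypothetical data: images 1-10 in category 1, images 11-15 in category 2.
conn.executemany("INSERT INTO image_category VALUES (?, ?)",
                 [(i, 1 if i <= 10 else 2) for i in range(1, 16)])

def page_per_category(page, per_page=4):
    """Rows (cat_id, image_id) for the given 1-based page within each category."""
    first = (page - 1) * per_page + 1
    last = page * per_page
    return conn.execute(
        """
        SELECT cat_id, image_id FROM (
            SELECT ic.cat_id, ic.image_id,
                   -- row number = how many images in this category are as new or newer
                   (SELECT COUNT(*) FROM image_category AS newer
                     WHERE newer.cat_id = ic.cat_id
                       AND newer.image_id >= ic.image_id) AS _row
            FROM image_category AS ic
        ) AS x
        WHERE _row BETWEEN ? AND ?
        ORDER BY cat_id, image_id DESC
        """,
        (first, last),
    ).fetchall()

second_page = page_per_category(2)
print(second_page)
```

The subquery runs once per row, so this is likely to be slower than the other approaches on large tables; it is mainly useful as a portable fallback.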

SELECT *
FROM (
SELECT a.id AS ID, a.title AS title, b.cat_name AS CAT, b.cat_id AS cat_id,
ROW_NUMBER() OVER (PARTITION BY b.cat_id ORDER BY a.id DESC) AS n
FROM image a
INNER JOIN image_category c ON a.id = c.image_id
INNER JOIN categories b ON b.cat_id = c.cat_id
) x
WHERE n <= 4
ORDER BY cat_id, ID DESC;

Related

Mysql multiple join between two tables

I have two tables: one for news and one for images. Each news item can have 0-3 images (the columns image_1, image_2 and image_3 in the news table hold the photo ids). I am trying to get all the matching rows from the images table, but I only get one back.
Like this (but it is not working):
select news.id as nid, image_1, image_2, image_3, photos.id as pid, big, small
from news
left join photos
on image_1=photos.id, image_2=photos.id, image_3=photos.id
order by nid desc
@juergen has already suggested a better option and explained how to solve the problem your way, but if you are still stuck, you can use the query below:
SELECT p.id AS pid, n1.image_1, n2.image_2, n3.image_3, big, small
FROM photos AS p
LEFT JOIN news AS n1 ON n1.image_1=p.id
LEFT JOIN news AS n2 ON n2.image_2=p.id
LEFT JOIN news AS n3 ON n3.image_3=p.id
ORDER BY COALESCE(n1.id, n2.id, n3.id) DESC;
You have to join the photos table 3 times with different aliases.
But you actually should rather change your table design. Add another table called news_photos
news_photos table
-----------------
news_id
photo_id
Then you can remove the image columns from the news table.
After the changes you can select a news item with all of its photos like this:
select n.*, p.big, p.small
from news n
left join news_photos np on n.id = np.news_id
left join photos p on p.id = np.photo_id
where n.id = 1234
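Here is a small sketch of the suggested news_photos design, run against an in-memory SQLite database from Python. The title column and all sample rows are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, big TEXT, small TEXT);
    CREATE TABLE news_photos (news_id INTEGER, photo_id INTEGER);
    INSERT INTO news VALUES (1, 'first'), (2, 'second');
    INSERT INTO photos VALUES (10, 'a_big.jpg', 'a_small.jpg'),
                              (11, 'b_big.jpg', 'b_small.jpg');
    -- News item 1 has two photos; news item 2 has none.
    INSERT INTO news_photos VALUES (1, 10), (1, 11);
""")
rows = conn.execute("""
    SELECT n.id, p.id
    FROM news AS n
    LEFT JOIN news_photos AS np ON n.id = np.news_id
    LEFT JOIN photos AS p ON p.id = np.photo_id
    ORDER BY n.id DESC, p.id
""").fetchall()
print(rows)
```

A news item gets one row per photo, and thanks to the LEFT JOINs an item without photos still shows up once with an empty photo id. The junction table also removes the limit of three photos per news item.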

How to get multiple counts from multiple tables?

On a webpage, I am displaying a number of picture collections (I show the thumbnails for each collection). Each picture has five relevant tables:
likes (id, user_id, picture_id),
views (id, user_id, picture_id),
comments (id, user_id, picture_id, comment),
pictures (id (which equals the "picture_id" in the previous tables), collection_id, picture_url and several other columns),
collections (id (equal to collection_id in the previous table), and several other columns).
When loading my page, I need to aggregate the number of likes, views and comments for all pictures in each collection, so as to show those numbers under each collection.
So basically: count the likes for each picture, count them all up, display number. Count the views for each picture, count them all up, display number. Count the comments for each picture, count them all up, display number. And then rinse and repeat for all collections.
I'm pretty new at mysql, and I'm struggling between selects, multiple joins, counts, php vs mysql, etc etc. I'm sure there's many ways I can do this that would be very inefficient, so I'm hoping you can tell me the best/fastest/most efficient way to do this.
Thanks in advance!
You can solve this with selects and left joins.
Since you'll count entries on each table for every pictureId, your pictures table will be the left side of each relation. So:
select
p.id as pictureId,
count(distinct l.id) as count_likes,
count(distinct v.id) as count_views,
count(distinct c.id) as count_comments
from
pictures as p
left join likes as l on p.id = l.picture_id
left join views as v on p.id = v.picture_id
left join comments as c on p.id = c.picture_id
group by
p.id
Basically, you are counting every record in each table for each record in the pictures table; if there are no records in likes, views or comments, the count will be zero, respectively.
Of course, you can expand this idea for collections:
select
c.id as collection_id,
p.id as picture_id,
count(distinct l.id) as count_likes,
count(distinct v.id) as count_views,
count(distinct cm.id) as count_comments
from
collections as c
left join pictures as p on c.id = p.collection_id
left join likes as l on p.id = l.picture_id
left join views as v on p.id = v.picture_id
left join comments as cm on p.id = cm.picture_id
group by
c.id,
p.id
If you want to filter your results for each collection, you only need to add where c.id = aValue before the group by (where aValue is the collection Id you want to retrieve)
Hope this helps you.
If you only need the aggregate data for each collection:
select
c.id as collection_id,
count(distinct l.id) as count_likes,
count(distinct v.id) as count_views,
count(distinct cm.id) as count_comments
from
collections as c
left join pictures as p on c.id = p.collection_id
left join likes as l on p.id = l.picture_id
left join views as v on p.id = v.picture_id
left join comments as cm on p.id = cm.picture_id
group by
c.id
This should do the trick ;-)
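It may help to see why the DISTINCT matters in these queries. Joining likes and views to the same picture multiplies the rows (every like is paired with every view), so a plain COUNT overcounts. A minimal sketch with Python's sqlite3 module and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, collection_id INTEGER);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER, picture_id INTEGER);
    CREATE TABLE views (id INTEGER PRIMARY KEY, user_id INTEGER, picture_id INTEGER);
    -- One picture with 2 likes and 3 views (hypothetical data).
    INSERT INTO pictures VALUES (1, 100);
    INSERT INTO likes VALUES (1, 7, 1), (2, 8, 1);
    INSERT INTO views VALUES (1, 7, 1), (2, 8, 1), (3, 9, 1);
""")
naive, distinct = conn.execute("""
    SELECT COUNT(l.id), COUNT(DISTINCT l.id)
    FROM pictures AS p
    LEFT JOIN likes AS l ON p.id = l.picture_id
    LEFT JOIN views AS v ON p.id = v.picture_id
    GROUP BY p.id
""").fetchone()
print(naive, distinct)
```

The join produces 2 x 3 = 6 rows for the picture, so the plain count reports 6 likes while COUNT(DISTINCT l.id) reports the real 2.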
You could do this with subselects:
SELECT
collections.*,
( SELECT COUNT(*) FROM pictures, likes
WHERE pictures.id = likes.picture_id
AND pictures.collection_id = collections.id
) AS like_count,
( SELECT COUNT(*) FROM pictures, views
WHERE pictures.id = views.picture_id
AND pictures.collection_id = collections.id
) AS view_count,
( SELECT COUNT(*) FROM pictures, comments
WHERE pictures.id = comments.picture_id
AND pictures.collection_id = collections.id
) AS comment_count
FROM collections
WHERE ...
This looks like it's going over the pictures table thrice, but I suspect that MySQL might be able to optimize that using the join buffer. I should note that I haven't actually tested this query, however. I also have no idea how this compares performance-wise with Barranka's LEFT JOIN solution. (Both would be pretty horrible if implemented naïvely, so it comes down to how smart MySQL's query optimizer is in each case.)

Wrapping my head around what I assume is a complicated MySQL query

I have a table called categories and a table called business_categories_coupling. In Categories, you have the usual id, name, parent. In the Coupling table, you have business_id and category_id. Each business can have multiple categories, so I store them in that table. It kinda looks like this:
business_id category_id
73 80
73 81
73 90
74 4
74 10
Right now, my query is just selecting all the categories, doing a foreach and doing a db query in each loop to find how many businesses are in that category. Obviously not the right way to go about it.
Is there a way to do a SQL query that basically selects all the categories, gets the number of times it comes up in the coupling table, and add a count to each category?
SELECT
C.*
FROM
CATEGORIES AS C
LEFT JOIN
BUSINESS_CATEGORIES_COUPLING AS B
ON
C.id = B.category_id;
Kinda like that, but with a count somewhere. I've tried various setups but nothing works like I want. Any suggestions?
EDIT 1
Solution as provided by @phani-rahul, but I added a WHERE clause:
SELECT cat.id AS id, cat.name AS name, cat.slug AS slug, COUNT(cat.id) AS business_count
FROM categories AS cat
LEFT JOIN business_categories_coupling AS coupling ON cat.id=coupling.category_id
WHERE coupling.category_id IS NOT NULL
GROUP BY cat.id
Yes, there is.
You can use a GROUP BY clause:
select a.id as category, count(b.category_id) as count_of_category
from categories a
left join business_categories_coupling b on a.id=b.category_id
group by a.id
your result would be something like:
category count_of_category
80 2
81 5
90 1
. .
. .
. .
You also need to GROUP by the fields of C table.
SELECT C.id, C.field1, C.field2, COUNT(B.category_id)
FROM CATEGORIES AS C
LEFT JOIN BUSINESS_CATEGORIES_COUPLING AS B
ON (C.id = B.category_id)
GROUP BY C.id, C.field1, ...
(In MySQL you can GROUP BY the single value C.id; in other SQL dialects you can express the concept of "grouping by rows of C table" by grouping by "C.*"; in some others you need to specify all non-aggregate columns of your query, in this case all columns you select from C, one by one).
What you're looking for is a GROUP BY clause.
SELECT
C.*, COUNT(B.category_id)
FROM
CATEGORIES AS C
LEFT JOIN
BUSINESS_CATEGORIES_COUPLING AS B
ON
C.id = B.category_id
GROUP BY C.id;
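One detail is easy to miss with LEFT JOIN plus GROUP BY: which expression you count. COUNT(*) counts the NULL-padded row that the LEFT JOIN produces for a category with no businesses, while counting a column from the coupling table does not. A quick sketch with Python's sqlite3 module; the category names and rows are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE business_categories_coupling (business_id INTEGER, category_id INTEGER);
    -- Category 90 has no businesses attached (hypothetical data).
    INSERT INTO categories VALUES (80, 'food'), (81, 'retail'), (90, 'empty');
    INSERT INTO business_categories_coupling VALUES (73, 80), (74, 80), (73, 81);
""")
counts = conn.execute("""
    SELECT c.id, COUNT(*) AS count_star, COUNT(b.category_id) AS count_matches
    FROM categories AS c
    LEFT JOIN business_categories_coupling AS b ON c.id = b.category_id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
print(counts)
```

For the empty category, COUNT(*) reports 1 and COUNT(b.category_id) reports 0, which is usually the number you want to display.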

Category post count

I am building a blog with Codeigniter and MySQL. The question I have is this, I have a table with posts and one with categories. I also have a cross reference table with post_categories. What I am trying to do is get all the categories with their names and the number of posts they have under their name.
Example output would be: Hello World(1) Test(0) etc.
What I am having a hard time finding is a SQL query that will join the three tables and get me the counts, and I am also having a hard time wrapping my head around how to make that query.
Here is my table schema:
blgpost
====
id
*Other schema unimportant
blgpostcategories
=================
postid
categoryid
blgcategories
==========
id
name
*Other schema unimportant
This should give you the output you want....
SELECT c.name, COUNT(p.id) FROM
blgcategories c
INNER JOIN blgpostcategories pc ON c.id = pc.categoryid
INNER JOIN blgpost p ON pc.postid = p.id
GROUP BY c.id
You don't need to join the three tables - the blgpost table doesn't have any information in it that you need.
SELECT COUNT(*), blgcategories.name
FROM blgcategories INNER JOIN blgpostcategories
ON blgcategories.id=blgpostcategories.categoryid
GROUP BY blgcategories.id;
SELECT c.name, COUNT(pc.postid)
FROM blgcategories c
LEFT JOIN
blgpostcategories pc
ON pc.categoryid = c.id
GROUP BY
c.id
Using LEFT JOIN will show 0 for empty categories (those without posts linked to them) rather than omitting them.
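The difference between the two join types is easy to check with the schema from the question. This is a small sketch using Python's sqlite3 module; the sample rows and the category_counts helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blgpost (id INTEGER PRIMARY KEY);
    CREATE TABLE blgcategories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blgpostcategories (postid INTEGER, categoryid INTEGER);
    -- One post, filed under 'Hello World'; 'Test' has no posts.
    INSERT INTO blgpost VALUES (1);
    INSERT INTO blgcategories VALUES (1, 'Hello World'), (2, 'Test');
    INSERT INTO blgpostcategories VALUES (1, 1);
""")

def category_counts(join_type):
    """Return labels like 'Name(count)'; join_type is 'INNER' or 'LEFT'."""
    # join_type comes only from the two fixed strings used below, never from user input.
    sql = f"""
        SELECT c.name, COUNT(pc.postid)
        FROM blgcategories AS c
        {join_type} JOIN blgpostcategories AS pc ON pc.categoryid = c.id
        GROUP BY c.id
        ORDER BY c.id
    """
    return [f"{name}({n})" for name, n in conn.execute(sql)]

inner_labels = category_counts("INNER")
left_labels = category_counts("LEFT")
print(inner_labels, left_labels)
```

Only the LEFT JOIN version produces the Test(0) entry asked for in the question; the INNER JOIN drops the empty category entirely.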

SQL Query over three different tables

i got three tables
CATS
id name
------------------------------
1 category1
2 category2
3 category3
4 category4
PRODUCT
id name
------------------------------
1 product1
2 product2
ZW-CAT-PRODUCT
id_cats id_product
------------------------------
1 1
3 1
4 2
now i want to get my products and their categories
product1 => category1,category3
product2 => category4
is there a way to get this array (or object or something) with one mysql query?
i tried a bit with JOINS, but it seems thats this is not exactly what i need, or?
currently i'm using 3 querys (i think thats too much).
any suggestions?
edit
and on the other way, what if i want to get ALL products of a specific category?
can this also be done in one query?
You can use GROUP_CONCAT to get a separated list in your results.
SELECT p.*,
GROUP_CONCAT(c.name SEPARATOR ',') as cats
FROM PRODUCT p
LEFT JOIN `ZW-CAT-PRODUCT` l
ON l.id_product=p.id
LEFT JOIN CATS c
ON c.id=l.id_cats
GROUP BY p.id
So basically, this first does some joins to get all the data. If you were to replace the GROUP_CONCAT line with just c.name, you would see a row for each product_id/category pair. The GROUP BY tells it to group results based on product ID, and then GROUP_CONCAT(c.name..) is telling to it take all the different c.name values that occur in a group (so for each product ID, since you're grouping by product ID) and concatenate those values into one string, using , as the separator.
So to get all products for each category in the same style, it would be like this:
SELECT c.*,
GROUP_CONCAT(p.name SEPARATOR ',') as products
FROM CATS c
LEFT JOIN `ZW-CAT-PRODUCT` l
ON l.id_cats=c.id
LEFT JOIN PRODUCT p
ON p.id=l.id_product
GROUP BY c.id
EDIT: To get just the product rows for a particular category (as requested in comment), it's this.
SELECT p.*
FROM PRODUCT p
LEFT JOIN `ZW-CAT-PRODUCT` l
ON l.id_product=p.id
LEFT JOIN CATS c
ON c.id=l.id_cats
WHERE c.name='xyz';
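To try the grouping and concatenation on the sample data from the question, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. Note two assumptions: SQLite spells the separator as a second argument, GROUP_CONCAT(c.name, ','), instead of MySQL's SEPARATOR keyword, and the hyphenated table name is quoted with double quotes here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CATS (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE PRODUCT (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "ZW-CAT-PRODUCT" (id_cats INTEGER, id_product INTEGER);
    INSERT INTO CATS VALUES (1, 'category1'), (2, 'category2'),
                            (3, 'category3'), (4, 'category4');
    INSERT INTO PRODUCT VALUES (1, 'product1'), (2, 'product2');
    INSERT INTO "ZW-CAT-PRODUCT" VALUES (1, 1), (3, 1), (4, 2);
""")
rows = conn.execute("""
    SELECT p.name, GROUP_CONCAT(c.name, ',') AS cats
    FROM PRODUCT AS p
    LEFT JOIN "ZW-CAT-PRODUCT" AS l ON l.id_product = p.id
    LEFT JOIN CATS AS c ON c.id = l.id_cats
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
# Concatenation order within a group is not guaranteed, so split and sort.
product_cats = {name: sorted(cats.split(",")) for name, cats in rows}
print(product_cats)
```

Splitting the string back into a list in application code gives the product => categories mapping from the question with a single query.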
If you need just comma-separated list of categories for every product, look at MySQL's GROUP_CONCAT() aggregate function:
SELECT p.*, GROUP_CONCAT(c.name) AS categories
FROM PRODUCT p
LEFT JOIN `ZW-CAT-PRODUCT` cp ON p.id = cp.id_product
LEFT JOIN CATS c ON cp.id_cats = c.id
GROUP BY p.id
To get all products of a specific category (by category ID):
SELECT p.*
FROM PRODUCT p
INNER JOIN `ZW-CAT-PRODUCT` cp ON p.id = cp.id_product
WHERE cp.id_cats = 42
The same, but by category name:
SELECT p.*
FROM PRODUCT p
INNER JOIN `ZW-CAT-PRODUCT` cp ON p.id = cp.id_product
INNER JOIN CATS c ON cp.id_cats = c.id
WHERE c.name = 'category1'
You could fit all 3 into 1 query, yes, but if you have 3 huge tables, I'd rather process them one by one than get the whole bulk back at once.
This takes longer, but is (in my opinion) more data friendly.
