This is the query. Im mostly interested if there is a better way to grab the stuff I use GROUP_CONCAT for, or if thats a fairy good way of grabbing this data. I then explode it, and put the ids/names into an array, and then use a for loop to echo them out.
SELECT
mov_id,
mov_title,
GROUP_CONCAT(DISTINCT categories.cat_name) as all_genres,
GROUP_CONCAT(DISTINCT cat_id) as all_genres_ids,
GROUP_CONCAT(DISTINCT case when gen_dominant = 1 then gen_catid else 0 end) as dominant_genre_ids,
GROUP_CONCAT(DISTINCT actors.act_name) as all_actors,
GROUP_CONCAT(DISTINCT actors.act_id) as all_actor_ids,
mov_desc,
mov_added,
mov_thumb,
mov_hits,
mov_numvotes,
mov_totalvote,
mov_imdb,
mov_release,
mov_html,
mov_type,
mov_buytickets,
ep_summary,
ep_airdate,
ep_id,
ep_hits,
ep_totalNs,
ep_totalRs,
mov_rating,
mov_rating_reason,
mrate_name,
dir_id,
dir_name
FROM movies
LEFT JOIN _genres
ON movies.mov_id = _genres.gen_movieid
LEFT JOIN categories
ON _genres.gen_catid = categories.cat_id
LEFT JOIN _actors
ON (movies.mov_id = _actors.ac_movid)
LEFT JOIN actors
ON (_actors.ac_actorid = actors.act_id AND act_famous = 1)
LEFT JOIN directors
ON movies.mov_director = directors.dir_id
LEFT JOIN movie_ratings
ON movies.mov_rating = movie_ratings.mrate_id
LEFT JOIN episodes
ON mov_id = ep_showid AND ep_season = 0 AND ep_num = 0
WHERE mov_id = *MOVIE_ID* AND mov_status = 1
GROUP BY mov_id
EXPLAIN of the query is here
alt text http://www.krayvee.com/o2/explain.gif
Personally, I would try to break the query up into multiple queries. Mostly I would recommend removing the Actor and Genre Joins so that you can get rid of all those group_concat functions. Then do separate queries to pull this data out. Not sure if it would speed things up, but it's probably worth a shot.
You've basically done a Cartesian product between genres, actors, directors, movie_ratings and episodes. That's why you have to use DISTINCT inside your GROUP_CONCAT(), because the pre-grouped result set has a number of rows equal to the product of the number of matching rows in each related table.
Note that this query wouldn't work at all in SQL, except that you're using MySQL which is permissive about the single-value rule.
Like #Kibbee, I usually recommend to run separate queries in cases like this. It's not always better to run a single query. Try breaking up the query and doing some profiling to be sure.
PS: What? No _directors table? So you can't represent a move with more than one director? :-)
Related
I am trying to get the following query to return the right information but it is returning different number of values for fields from the same table.
My data schema is 3 table:
airlines
id|descript|userid
travelers
id|airid|ftrav|ltrav|active
travair
id|travid|airid
Here is the query:
SELECT a.id as `aid`,a.descript,userid,
group_concat(t.id) as `tids`,
group_concat(t.ftrav) as `ftravs`,
group_concat(IFNULL(t.ltrav,'')) as `ltravs`,
group_concat(t.active) as `tactives`,
ta.airid
FROM airlines `a`
LEFT JOIN travair `ta`
ON a.id = ta.airid
LEFT JOIN travelers `t`
ON ta.travid = t.id
WHERE a.userid='$userid'
GROUP BY a.id
Basically, I am trying to query the airlines table to get airlines but also pull the travelers for each of the airlines by way of the ta table which joins the two.
However, the group_concat fields all have different numbers of values in them. In the actual table, I have largely eliminated missing values so that would not account for the differences in number of elements. There seems to be something wrong with query.
Can anyone spot my error? Have been struggling with this for a couple days.
In general, the problem with aggregation and joins is that the joins produce Cartesian products for matching keys. In English, this means that you are joining along multiple dimensions and getting all combinations of different items for the same user.
What can you do? The quick-and-dirty solution is to use the distinct keyword:
SELECT a.id as `aid`,a.descript,userid,
group_concat(distinct t.id) as `tids`,
group_concat(distinct t.ftrav) as `ftravs`,
group_concat(distinct coalesce(t.ltrav, '')) as `ltravs`,
group_concat(distinct t.active) as `tactives`
A more scalable solution is to pre-aggregate each dimension to get the list along each dimension.
Note: It is possible that distinct will not work in your case, if you happen to want all lists to be the same length.
GROUP_CONCAT, like most aggregation functions, ignores NULL values. Since you are using left joins, any GROUP_CONCATs on fields from tables on the right side of those joins may have null values for some of the pre-aggregation result rows.
Edit: If you want to synthesize "results" for the lacking data, you can aggregate calculated values instead; you've actually already done so once with this bit...
group_concat(IFNULL(t.ltrav,'')) as `ltravs`
.... you can just take it a bit further (make the lack of that data a bit more obvious) with something like this:
GROUP_CONCAT(IFNULL(theField, '[Not Recorded]')) AS theList
I have this database format below (taken from phpmyadmin, the tables are relational already):
I'm trying to get all "videos.Video_Name, videos.Video_URL" with a certain "tags.Tag_Name" via the "tagmap" relational mapping. I've never really used MySQL before for anything more than SELECT's and DELETE's and the syntax of JOIN is proving too much to bear, and at this point it'd just be faster to ask for help than to keep bashing my head against it.
I know I should be using JOIN but I have no idea of the syntax to accomplish what I want.
The completely invalid query I tried was:
SELECT videos.Video_URL, videos.Video_Name
FROM tagmap
INNER JOIN videos ON videos.Video_ID = tagmap.Video_ID
INNER JOIN tagmap ON tagmap.Tag_ID = tags.Tag_ID
WHERE tags.Tag_Name = '$_GET[tag]'
But it returned no rows.
If your query returned no raws and did not a return an error then it's not "completely invalid".
Indeed, looking at the code, it should do exactly what you say you are trying to achieve. Hence if it's returning no rows then the reason must be that there is no matching data.
Break it down to find out where the data is missing:
SELECT COUNT(*)
FROM tags
WHERE tags.Tag_Name = '$_GET[tag]';
If you get a non-zero value then try....
SELECT COUNT(DISTINCT tagmap.Video_ID), COUNT(*)
FROM tags INNER JOIN tagmp
ON tags.tag_ID=tagmap.tag_ID
WHERE tags.Tag_Name = '$_GET[tag]';
(BTW you might want to read up on SQL Injection).
Now Try this one.
SELECT videos.Video_Name, videos.Video_URL FROM videos,tags,tagmap
WHERE videos.Video_ID = tagmap.Video_ID AND tags.Tag_ID = tagmap.Tag_ID AND
tags.Tag_Name='$_GET[tag]'
Same Result with Joins
SELECT videos.Video_Name, videos.Video_URL FROM tagmap
RIGHT JOIN videos ON videos.Video_ID = tagmap.Video_ID
LEFT JOIN tags ON tags.Tag_ID = tagmap.Tag_ID
WHERE tags.Tag_Name = '$_GET[tag]'
Hope it will not give any error.
Thanks.
I've been racking my brain for hours trying work out how to join these two queries..
My goal is to return multiple venue rows (from venues) based on certain criteria... which is what my current query does....
SELECT venues.id AS ven_id,
venues.venue_name,
venues.sub_category_id,
venues.score,
venues.lat,
venues.lng,
venues.short_description,
sub_categories.id,
sub_categories.sub_cat_name,
sub_categories.category_id,
categories.id,
categories.category_name,
((ACOS( SIN(51.44*PI()/180)*SIN(lat*PI()/180) + COS(51.44*PI()/180)*COS(lat*PI()/180)*COS((-2.60796 - lng)*PI()/180)) * 180/PI())*60 * 1.1515) AS dist
FROM venues,
sub_categories,
categories
WHERE
venues.sub_category_id = sub_categories.id
AND sub_categories.category_id = categories.id
HAVING
dist < 5
ORDER BY score DESC
LIMIT 0, 100
However, I need to include another field in this query (thumbnail), which comes from another table (venue_images). The idea is to extract one image row based on which venue it's related to and it's order. Only one image needs to be extracted however. So LIMIT 1.
I basically need to insert this query:
SELECT
venue_images.thumb_image_filename,
venue_images.image_venue_id,
venue_images.image_order
FROM venue_images
WHERE venue_images.image_venue_id = ven_id //id from above query
ORDER BY venue_images.image_order
LIMIT 1
Into my first query, and label this new field as "thumbnail".
Any help would really be appreciated. Thanks!
First of all, you could write the first query using INNER JOIN:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
HAVING
...
the result should be identical, but i like this one more.
What I'd like to do next is to JOIN a subquery, something like this:
...
INNER JOIN (SELECT ... FROM venue_images
WHERE venue_images.image_venue_id = ven_id //id from above query
ORDER BY venue_images.image_order
LIMIT 1) first_image
but unfortunately this subquery can't see ven_id because it is evaluated first, before the outer query (I think it's a limitation of MySql), so we can't use that and we have to find another solution. And since you are using LIMIT 1, it's not easy to rewrite the condition you need using just JOINS.
It would be easier if MySql provided a FIRST() aggregate function, but since it doesn't, we have to simulate it, see for example this question: How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?
So using this trick, you can write a query that extracts first image_id for every image_venue_id:
SELECT
image_venue_id,
SUBSTRING_INDEX(
GROUP_CONCAT(image_id order by venue_images.image_order),',',1) as first_image_id
FROM venue_images
GROUP BY image_venue_id
and this query could be integrated in your query above:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
INNER JOIN (the query above) first_image on first_image.image_venue_id = venues.id
INNER JOIN venue_images on first_image.first_image_id = venue_images.image_id
HAVING
...
I also added one more JOIN, to join the first image id with the actual image. I couldn't check your query but the idea is to procede like this.
Since the query is now becoming more complicated and difficult to mantain, i think it would be better to create a view that extracts the first image for every venue, and then join just the view in your query. This is just an idea. Let me know if it works or if you need any help!
I'm not too sure about your data but a JOIN with the thumbnails table and a group by on your large query would probably work.
GROUP BY venues.id
Can you let me know if my interpretation is correct (the last AND part)?
$q = "SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND id IN (
SELECT registrar_id
FROM registrations_industry
WHERE industry_id = '$industryid'
)";
Below was really where I am not sure:
... AND id IN (select registrar_id from registrations_industry where industry_id='$industryid')
Interpretation: Get any match on id(registrations id field) equals registrar_id(field) from the join table registrations_industry where industry_id equals the set $industryid
Is this select statement considered a sub routine since it's a query within the main query?
So an example would be with the register table id search to 23 would look like:
registrations(table)
id=23,title=owner,name=mike,company=nono,address1=1234 s walker lane,address2
registrations_industry(table)
id=256, registrar_id=23, industry_id=400<br>
id=159, registrar_id=23, industry_id=284<br>
id=227, registrar_id=23, industry_id=357
I assume this would return 3 records with the same registration table data And of course varying registrations_industry returns.
For a given test data set your query will return one record. This one:
id=23,title=owner,name=mike,company=nono,address1=1234 s walker lane,address2
To get three records with the same registration table data and varying registrations_industry you need to use JOIN.
Something like this:
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations AS r
LEFT OUTER JOIN registrations_industry AS ri
ON ri.registrar_id=r.id
WHERE r.title!=0 AND ri.industry_id={$industry_id}
Sorry for the essay, I didn't realize it was as long as it is until looking at it now. And although you've checked an answer, I hope you read this gain some insight into why this solution is preferred and how it evolved out of your original query.
First things first
Your query
$q = "SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND id IN (
SELECT registrar_id
FROM registrations_industry
WHERE industry_id = '$industryid'
)";
seems fine. The IN syntax is equivalent to a number of OR matches. For example
WHERE field_id IN (101,102,103,105)
is functionally equivalent to
WHERE (field_id = 101
OR field_id = 102
OR field_id = 103
OR field_id = 105)
You complicate it a bit by introducing a subquery, no problem. As long as your subquery returns one column (and yours does), passing it to IN will be fine.
In your case, you're comparing registrations.id to registrations_industry.registrar_id. (Note: This is just <table>.<field> syntax, nothing special, but helpful to disambiguate what tables your fields are in.)
This seems fine.
What happens
SQL would first run the subquery, generating a result set of registrar_ids where the industry_id was set as specified.
SQL would then run the outer query, replacing the subquery with its results and you would get rows from registrations where registrations.id matched one of the registrar_ids returned from the subquery.
Subqueries are helpful to debug your code, because you can pull out the subquery and run it separately, ensuring its output is as you expect.
Optimization
While subqueries are good for debugging, they're slow, at least slower than using optmized JOIN statements.
And in this case, you can convert your query to a single-level query (without subqueries) by using a JOIN.
First, you'd start with basically the exact same outer query:
SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND ...
But you're also interested in data from the registrations_industry table, so you need to include that. Giving us
SELECT title,name,company,address1,address2
FROM registrations, registrations_industry
WHERE title != 0 AND ...
We need to fix the ... and now that we have the registrations_industry table we can:
SELECT title,name,company,address1,address2
FROM registrations, registrations_industry
WHERE title != 0
AND id = registrar_id
AND industry_id = '$industryid'
Now a problem might arise if both tables have an id column -- since just saying id is ambiguous. We can disambiguate this by using the <table>.<field> syntax. As in
SELECT registrations.title, registrations.name,
registrations.company, registrations.address1, registrations.address2
FROM registrations, registrations_industry
WHERE registrations.title != 0
AND registrations_industry.industry_id = '$industryid'
We didn't have to use this syntax for all the field references, but we chose to for clarity. The query now is unnecessarily complex because of all the table names. We can shorten them while still providing disambiguation and clarity. We do this by creating table aliases.
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND ri.industry_id = '$industryid'
By placing r and ri after the two tables in the FROM clause, we're able to refer to them using these shortcuts. This cleans up the query but still gives us the ability to clearly specify which tables the fields are coming from.
Sidenote: We could be more explicit about the table aliases by including the optional AS e.g. FROM registrationsASr rather than just FROM registrations r, but I typically reserve AS for field aliases.
If you run the query now you will get what is called a "Cartesian product" or in SQL lingo, a CROSS JOIN. This is because we didn't define any relationship between the two tables when, in fact, there is one. To fix this we need to reintroduce part of the original query that was lost: the relationship between the two tables
r.id = ri.registrar_id
so that our query now looks like
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND r.id = ri.registrar_id
AND ri.industry_id = '$industryid'
And this should work perfectly.
Nitpicking -- implicit vs. explicit joins
But the nitpicker in me needs to point out that this is called an "implicit join". Basically you're joining tables but not using the JOIN syntax.
A simpler example of an implicit join is
SELECT *
FROM foo f, bar b
WHERE f.id = b.foo_id
The corresponding explicit syntax is
SELECT *
FROM foo f
JOIN bar b ON f.id = b.foo_id
The result will be identical but it is using proper (and clearer) syntax. (Its clearer because it explicitly stats that there is a relationship between the foo and bar tables and it is defined by f.id = b.foo_id.)
We could similarly express your implicit query
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND r.id = ri.registrar_id
AND ri.industry_id = '$industryid'
explicitly as follows
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r
JOIN registrations_industry ri ON r.id = ri.registrar_id
WHERE r.title != 0
AND ri.industry_id = '$industryid'
As you can see, the relationship between the tables is now in the JOIN clause, so that the WHERE and subsequent AND and OR clauses are free to express any restrictions. Another way to look at this is if you took out the WHERE + AND/OR clauses, the relationship between tables would still hold and the results would still "make sense" whereas if you used the implicit method and removed the WHERE + AND/OR clauses, your result set would contain rows that were misleading.
Lastly, the JOIN syntax by itself will cause rows that are in registrations, but do not have any corresponding rows in registrations_industry to not be returned.
Depending on your use case, you may want rows from registrations to appear in the results even if there are no corresponding entries in registrations_industry. To do this you would use what's called an OUTER JOIN. In this case, we want what is called a LEFT OUTER JOIN because we want all of the rows of the table on the left (registrations). We could have alternatively used RIGHT OUTER JOIN for the right table or simply OUTER JOIN for the outer join of both tables.
Therefore our query becomes
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r
LEFT OUTER JOIN registrations_industry ri ON r.id = ri.registrar_id
WHERE r.title != 0
AND ri.industry_id = '$industryid'
And we're done.
The end result is we have a query that is
faster in terms of runtime
more compact / concise
more explicit about what tables the fields are coming from
more explicit about the relationship between the tables
A simpler version of this query would be:
SELECT title, name, company, address1, address2
FROM registrations, registrations_industry
WHERE title != 0
AND id = registrar_id
AND industry_id = '$industryid'
Your version was a subquery, this version is a simple join. Your assumptions about your query are generally correct, but it is harder for SQL to optimize and a little harder to unravel to anyone trying to read the code. Also, you won't be able to extract the data from the registrations_industry table in that parent SELECT statement because it's not technically joining and the subtable is not a part of the parent query.
So I've got a little forum I'm trying to get data for, there are 4 tables, forum, forum_posts, forum_threads and users. What i'm trying to do is to get the latest post for each forum and giving the user a sneak peek of that post, i want to get the number of posts and number of threads in each forum aswell. Also, i want to do this in one query. So here's what i came up with:
SELECT lfx_forum_posts.*, lfx_forum.*, COUNT(lfx_forum_posts.pid) as posts_count,
lfx_users.username,
lfx_users.uid,
lfx_forum_threads.tid, lfx_forum_threads.parent_forum as t_parent,
lfx_forum_threads.text as t_text, COUNT(lfx_forum_threads.tid) as thread_count
FROM
lfx_forum
LEFT JOIN
(lfx_forum_threads
INNER JOIN
(lfx_forum_posts
INNER JOIN lfx_users
ON lfx_users.uid = lfx_forum_posts.author)
ON lfx_forum_threads.tid = lfx_forum_posts.parent_thread AND lfx_forum_posts.pid =
(SELECT MAX(lfx_forum_posts.pid)
FROM lfx_forum_posts
WHERE lfx_forum_posts.parent_forum = lfx_forum.fid
GROUP BY lfx_forum_posts.parent_forum)
)
ON lfx_forum.fid = lfx_forum_posts.parent_forum
GROUP BY lfx_forum.fid
ORDER BY lfx_forum.fid ASC
This get the latest post in each forum and gives me a sneakpeek of it, the problem is that
lfx_forum_posts.pid =
(SELECT MAX(lfx_forum_posts.pid)
FROM lfx_forum_posts
WHERE lfx_forum_posts.parent_forum = lfx_forum.fid
GROUP BY lfx_forum_posts.parent_forum)
Makes my COUNT(lfx_forum_posts.pid) go to one (aswell as the COUNT(lfx_forum_threads.tid) which isn't how i would like it to work. My question is: is there some somewhat easy way to make it show the correct number and at the same time fetch the correct post info (the latest one that is)?
If something is unclear please tell and i'll try to explain my issue further, it's my first time posting something here.
Hard to get an overview of the structure of your tables with only one big query like that.
Have you considered making a view to make it easier and faster to run the query?
Why do you have to keep it in one query? Personally I find that you can often gain both performance and code-readability by splitting overly complicated queries into more parts.
But hard to get an overview so can't really give a good answer to your question:)
Just add num_posts column to your table. Don't count posts with COUNT().
Can we get some...
Show Tables;
Desc Table lfx_forum_posts;
Desc Table lfx_forum_threads;
Desc Table lfx_forum_users;
Desc Table lfx_forum;
Here's some pseudo code
select f.*, (select count(*) from forum_posts fp where fp.forum_id = f.id) as num_posts,
(select count(*) from forum_threads ft where ft.forum_id = f.id) as num_threads,
(select max(fp.id) from forum_posts fp
where fp.id = f.id
) as latest_post_id,
from forums f;
Then go on to use latest_post_id in a seperate query to get it's information.
if it doesn't like using f before it's declared then make a temporary table for this then you update every time the query is ran.