MySQL Join returning 1 row - php

I have a problem with an SQL query. This is my first time using advanced SQL operations like this so it could be that I'm missing something basic. I am running this query:
SELECT countries.id,
countries.name,
AVG(users.points) AS average
FROM countries
LEFT JOIN users
ON countries.id = users.country
ORDER BY average DESC
This query is only returning 1 row and it's not following the ORDER BY because the returned value is . My aim with this is to get all the records in the Countries table and get the average of the points awarded to the users from each country. I want it to return those countries which do not have users assigned to them as well. I have done this in 2 queries and it worked but I thought that maybe I could do only one query. What am I missing?

It is only returning one row because it is an aggregation query without a group by. Perhaps you mean:
SELECT c.id, c.name, AVG(u.points) AS average
FROM countries c LEFT JOIN
users u
ON c.id = u.country
GROUP BY c.id, c.name
ORDER BY average DESC;
The AVG() makes this an aggregation query. Without the GROUP BY, SQL interprets it as returning one row summarizing all the rows. MySQL supports an extension to the SQL standard where columns in the SELECT do not have to be in the GROUP BY. In most databases, your query would return an error.
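The difference can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the sample rows are made up; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, country INTEGER, points INTEGER);
    INSERT INTO countries VALUES (1, 'Malta'), (2, 'Italy'), (3, 'Spain');
    INSERT INTO users VALUES (1, 1, 10), (2, 1, 30), (3, 2, 50);
""")

# Aggregate with no GROUP BY: the whole join collapses into a single row.
ungrouped = conn.execute("""
    SELECT AVG(u.points) AS average
    FROM countries c LEFT JOIN users u ON c.id = u.country
""").fetchall()

# Aggregate with GROUP BY: one row per country, including countries
# that have no users (their average is NULL, shown as None).
grouped = conn.execute("""
    SELECT c.id, c.name, AVG(u.points) AS average
    FROM countries c LEFT JOIN users u ON c.id = u.country
    GROUP BY c.id, c.name
    ORDER BY average DESC
""").fetchall()

print(ungrouped)
print(grouped)
```

Countries without users still appear because of the LEFT JOIN; AVG() ignores the NULL points, so their average is NULL.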

Related

MySQL Join - Sorting data, grouping data

I have two tables:
twitterusers table
twittergrowth Table
I am trying to JOIN these two tables, get all fields from twitterusers and selected fields from twittergrowth, then fetch only the last 3 rows from this data.
Expected Output:
Current Output:
That is, the rows are repeating. I want rows unique by ID or username, each with its latest timestamp. So it would be the last 3 rows, the ones with the most recent timestamps.
The code I have managed to write so far is:
SELECT
t1.*,
t2.new_followers_count,
t2.new_friends_count,
t2.new_timestamp
FROM twitterusers t1
JOIN twittergrowth t2 on (t1.username=t2.username)
I have searched quite a few pages and sites, but can't really figure out how to do it. I would appreciate any help. :)
Additionally, I would like to get a LIMIT parameter added to the final result, so that I can paginate the full result.
First you need to find a maximum new_timestamp (latest) within groups of the same user_id and username in twittergrowth table. This is a classic group-wise maximum problem and the subquery tgmax does that. Then you need to join back the same table (tg this time) to get other columns that aren't in the group by clause of subquery and are not used in aggregate functions (like max()). These columns are new_followers_count and new_friends_count.
If you tried to put them in the SELECT list of the subquery, MySQL would return values from an unspecified row of the same group, not necessarily the row with the latest timestamp. This is explained here.
Once you get desired output for twittergrowth table the only thing left is to join twitterusers table to get all other columns.
SELECT tu.*, tg.new_followers_count, tg.new_friends_count, tg.new_timestamp
FROM twitterusers tu
JOIN twittergrowth tg
ON tu.user_id = tg.user_id AND tu.username = tg.username
JOIN
( SELECT tgg.user_id, tgg.username, max(tgg.new_timestamp) as latest_timestamp
FROM twittergrowth tgg
GROUP BY tgg.user_id, tgg.username ) tgmax
ON tg.user_id = tgmax.user_id AND tg.username = tgmax.username
AND tg.new_timestamp = tgmax.latest_timestamp
Note that this query would benefit from a composite index on (user_id,username,new_timestamp) in the twittergrowth table.
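To see the group-wise maximum pattern in action, here is a minimal sketch using Python's sqlite3 module instead of MySQL. The sample rows and the extra `name` column are invented for illustration; the other names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE twitterusers (user_id INTEGER, username TEXT, name TEXT);
    CREATE TABLE twittergrowth (
        user_id INTEGER, username TEXT,
        new_followers_count INTEGER, new_friends_count INTEGER,
        new_timestamp TEXT
    );
    -- The composite index suggested in the answer.
    CREATE INDEX idx_growth ON twittergrowth (user_id, username, new_timestamp);
    INSERT INTO twitterusers VALUES (1, 'alice', 'Alice'), (2, 'bob', 'Bob');
    INSERT INTO twittergrowth VALUES
        (1, 'alice', 100, 10, '2013-01-01 10:00:00'),
        (1, 'alice', 120, 12, '2013-01-02 10:00:00'),
        (2, 'bob',   200, 20, '2013-01-01 09:00:00'),
        (2, 'bob',   210, 25, '2013-01-03 09:00:00');
""")

# The derived table tgmax finds the latest timestamp per user; joining it
# back to twittergrowth picks up the other columns of that same row.
latest = conn.execute("""
    SELECT tu.username, tg.new_followers_count, tg.new_friends_count, tg.new_timestamp
    FROM twitterusers tu
    JOIN twittergrowth tg
      ON tu.user_id = tg.user_id AND tu.username = tg.username
    JOIN (SELECT user_id, username, MAX(new_timestamp) AS latest_timestamp
          FROM twittergrowth
          GROUP BY user_id, username) tgmax
      ON tg.user_id = tgmax.user_id AND tg.username = tgmax.username
     AND tg.new_timestamp = tgmax.latest_timestamp
    ORDER BY tu.user_id
""").fetchall()

print(latest)
```

Each user appears once, with the counts taken from the row that carries the latest timestamp.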
You need to group by to achieve your expected output.
GROUP BY id
To limit results, or split them into pages, you can simply add LIMIT X,Y where X is the offset of the first row to return (starting at 0) and Y is the maximum number of rows to return.
So a query to pull the expected results you want, but only the first 10 would be like so:
SELECT
t1.*,
t2.new_followers_count,
t2.new_friends_count,
t2.new_timestamp
FROM twitterusers t1
JOIN twittergrowth t2 on t1.username=t2.username
GROUP BY t1.id
LIMIT 0,10
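For pagination, the offset is usually computed from a page number. The sketch below is one way to do that, using Python's sqlite3 module (which also accepts the `LIMIT offset, count` form); the `items` table and the `fetch_page` helper are hypothetical. An ORDER BY is included because paging is only stable when the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 26)])

def fetch_page(page, page_size=10):
    """Return the ids on a 1-based page number."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ?, ?", (offset, page_size)
    ).fetchall()
    return [r[0] for r in rows]

print(fetch_page(1))  # first ten ids
print(fetch_page(3))  # the last, partial page
```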

MySQL GROUP BY taking so much time to fetch records

I want to query the database to fetch the last visit time of every user. Here is the query:
SELECT
u.user_id,
u.firstname,
u.lastname,
u.email,
pv.visit_time
FROM
users u
LEFT OUTER JOIN pageviews pv
ON u.user_id = pv.user_id
GROUP BY pv.user_id
LIMIT 0, 12
This query takes 30 to 40 seconds to execute on the live server; if I remove the GROUP BY clause it takes 3 to 6 seconds, but returns duplicate records. Any idea what's wrong with this query?
I have also tried DISTINCT but ran into the same issue.
Thanks, any help would be appreciated.
What are your indexes?
Do you really want a left join? It seems irrelevant here: because you group by pv.user_id, the LEFT OUTER JOIN just gives you one row for a user_id of NULL covering all users without page views.
Further, you are using GROUP BY to return a single row for each user. However, which row is returned is not defined, so the visit_time brought back for a user could come from any of that user's page views.
Also, you have only a single column in the GROUP BY clause but other non-aggregate columns in the SELECT. With default options in MySQL this will work, but it will not work in most flavours of SQL and will also not work when MySQL is performing the GROUP BY in strict mode (see this manual page).
Add an index on u.user_id and a compound index on (pv.user_id, pv.visit_time). Then, assuming you want the latest visit time for each user, try the query as:
SELECT u.user_id,
u.firstname,
u.lastname,
u.email,
MAX(pv.visit_time)
FROM users u
INNER JOIN pageviews pv
ON u.user_id = pv.user_id
GROUP BY u.user_id, u.firstname, u.lastname, u.email
ORDER BY u.user_id
LIMIT 0, 12
(Strictly speaking, in MySQL versions before 8.0 the ORDER BY clause is not required, because GROUP BY sorts implicitly. That implicit sorting was removed in MySQL 8.0, and the explicit clause also makes it clear to anyone reading the code in future what is expected.)
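A small reproduction of this approach, sketched with Python's sqlite3 module rather than MySQL. The sample users and timestamps are invented, and the index name is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT
    );
    CREATE TABLE pageviews (user_id INTEGER, visit_time TEXT);
    -- Compound index suggested in the answer.
    CREATE INDEX idx_pageviews_user_time ON pageviews (user_id, visit_time);
    INSERT INTO users VALUES
        (1, 'Ann', 'Lee', 'ann@example.com'),
        (2, 'Ben', 'Ray', 'ben@example.com'),
        (3, 'Cy',  'Fox', 'cy@example.com');
    INSERT INTO pageviews VALUES
        (1, '2014-05-01 08:00:00'),
        (1, '2014-05-03 09:30:00'),
        (2, '2014-05-02 12:00:00');
""")

# MAX() makes the choice of visit_time explicit: the latest one per user.
last_visits = conn.execute("""
    SELECT u.user_id, u.firstname, MAX(pv.visit_time) AS last_visit
    FROM users u
    INNER JOIN pageviews pv ON u.user_id = pv.user_id
    GROUP BY u.user_id, u.firstname, u.lastname, u.email
    ORDER BY u.user_id
    LIMIT 0, 12
""").fetchall()

print(last_visits)
```

User 3 has no page views and is dropped by the INNER JOIN; switching to a LEFT JOIN (and still grouping by the users columns) would keep that user with a NULL last visit.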
The GROUP BY clause and DISTINCT both require a full scan of the table.
Maybe the query without the GROUP BY clause is just faster at returning the first rows. Have you checked how long it takes to retrieve the whole result set?
If it really takes only 3-6 seconds, I would refresh the statistics; maybe the optimiser is not making the best choices for the join (I imagine that the pageviews table is a large one).
Select t1.x, t1.y, t1.z from table1 t1 Group by t1.x,t1.y,t1.z....
It will give better performance.
The GROUP BY fields (x, y, z) should also appear in the SELECT list to get better performance.
Try it (for the query above, the GROUP BY operation is applied within the result set).

How can I get this database to order before the GROUP BY [duplicate]

I made a website for golf scorecards. The page I am working on is the player's profile. When you access a player's profile, it shows each course in order of last played (DESC). However, the order of last played is jumbled by the query below: when it GROUPs, it takes the earliest date rather than the most recent.
After the grouping is done, it correctly shows them in order (DESC)... just the wrong order due to the courses grouping by date_of_game ASC, rather than DESC. Hope this isn't too confusing.. Thank you.
$query_patrol321 = "SELECT t1.*,t2.* FROM games t1 LEFT JOIN scorecards t2 ON t1.game_id=t2.game_id WHERE t2.player_id='$player_id' GROUP BY t1.course_id ORDER BY t1.date_of_game DESC";
$result_patrol321 = mysql_query($query_patrol321) or die ("<br /><br />There's an error in the MySQL-query: ".mysql_error());
while ($row_patrol321 = mysql_fetch_array($result_patrol321)) {
$player_id_rank = $row_patrol321["player_id"];
$course_id = $row_patrol321["course_id"];
$game_id = $row_patrol321["game_id"];
$top_score = $row_patrol321["total_score"];
Try to remove the GROUP BY-clause from the query. You should use GROUP BY only when you have both normal columns and aggregate functions (min, max, sum, avg, count) in your SELECT. You have just normal columns.
The fact that it shows the grouping result in ASC order is a coincidence because that is the order of their insertion. In contrast to other RDBMS like MS SQL Server, MySQL allows you to add non-aggregated columns to a GROUPed query. This non-standard behavior creates the confusion you're seeing. If this were not MySQL, you'd need to define the aggregation for all your selected columns given the grouping.
MySQL's behavior is (I believe) to take the first row matching the GROUP for non-aggregated columns. I would advise against doing this.
Even though you're aggregating, you're not ORDERing by the aggregated column.
So what you want to do is ORDER BY the MAX date DESC.
In this way, you are ordering by the latest date per course (your grouping criteria).
SELECT
t1.* -- It would be better if you actually listed the aggregations you wanted
,t2.* -- Which columns do you really want?
FROM
games t1
LEFT JOIN
scorecards t2
ON t2.game_id = t1.game_id
WHERE
t2.player_id='$player_id'
GROUP BY
t1.course_id
ORDER BY
MAX(t1.date_of_game) DESC
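The effect of ordering by the aggregated date can be checked on toy data. This is a sketch using Python's sqlite3 module, not MySQL; the rows are invented, and the player id is passed as a bound parameter instead of being interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, course_id INTEGER, date_of_game TEXT);
    CREATE TABLE scorecards (game_id INTEGER, player_id INTEGER, total_score INTEGER);
    INSERT INTO games VALUES
        (1, 10, '2013-01-01'), (2, 20, '2013-02-01'), (3, 10, '2013-03-01');
    INSERT INTO scorecards VALUES (1, 7, 80), (2, 7, 90), (3, 7, 75);
""")

player_id = 7  # hypothetical player
# Course 10 was first played before course 20 but last played after it,
# so ordering by the latest date per course puts course 10 first.
courses = conn.execute("""
    SELECT t1.course_id, MAX(t1.date_of_game) AS last_played
    FROM games t1
    JOIN scorecards t2 ON t2.game_id = t1.game_id
    WHERE t2.player_id = ?
    GROUP BY t1.course_id
    ORDER BY last_played DESC
""", (player_id,)).fetchall()

print(courses)
```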
If you want the maximum date, then insert logic to get it. Don't depend on the ordering of columns or on undocumented MySQL features. MySQL explicitly discourages the use of non-aggregated columns in the group by when the values are not identical:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. (see [here][1])
How do you do what you want? The following query finds the most recent date on each course and just uses that -- and no group by:
SELECT t1.*, t2.*
FROM games t1 LEFT JOIN
scorecards t2
ON t1.game_id=t2.game_id
WHERE t2.player_id='$player_id' and
t1.date_of_game in (select MAX(date_of_game)
from games g join
scorecards ss
on g.game_id = ss.game_id and
ss.player_id = '$player_id'
where t1.course_id = g.course_id
)
ORDER BY t1.date_of_game DESC
If game_id is auto incrementing, you can use that instead of date_of_game. This is particularly important if two games can be on the same course on the same date.
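The same-date case is easy to demonstrate. The sketch below (Python's sqlite3 module, invented rows, no GROUP BY) runs the correlated subquery once on date_of_game and once on game_id; the `SQL` template and its `{col}` placeholder are just a convenience for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, course_id INTEGER, date_of_game TEXT);
    CREATE TABLE scorecards (game_id INTEGER, player_id INTEGER, total_score INTEGER);
    -- Games 1 and 2 are on the same course on the same date.
    INSERT INTO games VALUES
        (1, 10, '2013-03-01'), (2, 10, '2013-03-01'), (3, 20, '2013-02-01');
    INSERT INTO scorecards VALUES (1, 7, 80), (2, 7, 75), (3, 7, 90);
""")

# {col} is filled in from a fixed choice of column names below,
# never from user input; the player id is a bound parameter.
SQL = """
    SELECT t1.game_id, t1.course_id, t2.total_score
    FROM games t1
    JOIN scorecards t2 ON t1.game_id = t2.game_id
    WHERE t2.player_id = :player
      AND t1.{col} = (SELECT MAX(g.{col})
                      FROM games g
                      JOIN scorecards ss
                        ON g.game_id = ss.game_id AND ss.player_id = :player
                      WHERE g.course_id = t1.course_id)
    ORDER BY t1.game_id
"""

by_date = conn.execute(SQL.format(col="date_of_game"), {"player": 7}).fetchall()
by_id = conn.execute(SQL.format(col="game_id"), {"player": 7}).fetchall()

print(by_date)  # course 10 appears twice: two games share the latest date
print(by_id)    # one row per course
```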

MySql - Joining another table with multiple rows, inserting a query into a another query?

I've been racking my brain for hours trying to work out how to join these two queries.
My goal is to return multiple venue rows (from venues) based on certain criteria... which is what my current query does....
SELECT venues.id AS ven_id,
venues.venue_name,
venues.sub_category_id,
venues.score,
venues.lat,
venues.lng,
venues.short_description,
sub_categories.id,
sub_categories.sub_cat_name,
sub_categories.category_id,
categories.id,
categories.category_name,
((ACOS( SIN(51.44*PI()/180)*SIN(lat*PI()/180) + COS(51.44*PI()/180)*COS(lat*PI()/180)*COS((-2.60796 - lng)*PI()/180)) * 180/PI())*60 * 1.1515) AS dist
FROM venues,
sub_categories,
categories
WHERE
venues.sub_category_id = sub_categories.id
AND sub_categories.category_id = categories.id
HAVING
dist < 5
ORDER BY score DESC
LIMIT 0, 100
However, I need to include another field in this query (thumbnail), which comes from another table (venue_images). The idea is to extract one image row based on which venue it's related to and its order. Only one image needs to be extracted, however, so LIMIT 1.
I basically need to insert this query:
SELECT
venue_images.thumb_image_filename,
venue_images.image_venue_id,
venue_images.image_order
FROM venue_images
WHERE venue_images.image_venue_id = ven_id -- id from above query
ORDER BY venue_images.image_order
LIMIT 1
Into my first query, and label this new field as "thumbnail".
Any help would really be appreciated. Thanks!
First of all, you could write the first query using INNER JOIN:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
HAVING
...
The result should be identical, but I like this one more.
What I'd like to do next is to JOIN a subquery, something like this:
...
INNER JOIN (SELECT ... FROM venue_images
WHERE venue_images.image_venue_id = ven_id -- id from above query
ORDER BY venue_images.image_order
LIMIT 1) first_image
Unfortunately, this subquery can't see ven_id because it is evaluated first, before the outer query (I think it's a limitation of MySQL), so we can't use that and we have to find another solution. And since you are using LIMIT 1, it's not easy to rewrite the condition you need using just JOINs.
It would be easier if MySQL provided a FIRST() aggregate function, but since it doesn't, we have to simulate it; see for example this question: How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?
So using this trick, you can write a query that extracts first image_id for every image_venue_id:
SELECT
image_venue_id,
SUBSTRING_INDEX(
GROUP_CONCAT(image_id order by venue_images.image_order),',',1) as first_image_id
FROM venue_images
GROUP BY image_venue_id
and this query could be integrated in your query above:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
INNER JOIN (the query above) first_image on first_image.image_venue_id = venues.id
INNER JOIN venue_images on first_image.first_image_id = venue_images.image_id
HAVING
...
I also added one more JOIN, to join the first image id with the actual image. I couldn't check your query, but the idea is to proceed like this.
Since the query is now becoming more complicated and difficult to maintain, I think it would be better to create a view that extracts the first image for every venue, and then join just the view in your query. This is just an idea. Let me know if it works or if you need any help!
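The "derived table per venue, then join back to venue_images" shape can be sketched as follows. Note the swapped-in technique: instead of the GROUP_CONCAT / SUBSTRING_INDEX trick, this version uses MIN(image_order) per venue, which assumes image_order is unique within a venue. It runs on Python's sqlite3 module with invented rows, trims venues down to two columns, and uses LEFT JOINs so that venues without images are kept with a NULL thumbnail (the answer's INNER JOINs would drop them).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venues (id INTEGER PRIMARY KEY, venue_name TEXT);
    CREATE TABLE venue_images (
        image_id INTEGER PRIMARY KEY, image_venue_id INTEGER,
        image_order INTEGER, thumb_image_filename TEXT
    );
    INSERT INTO venues VALUES (1, 'Cafe'), (2, 'Bar'), (3, 'Park');
    INSERT INTO venue_images VALUES
        (1, 1, 2, 'a2.jpg'), (2, 1, 1, 'a1.jpg'), (3, 2, 1, 'b1.jpg');
""")

# first_image holds the lowest image_order per venue; the second join
# fetches the image row that carries that order.
thumbs = conn.execute("""
    SELECT v.id, v.venue_name, vi.thumb_image_filename AS thumbnail
    FROM venues v
    LEFT JOIN (SELECT image_venue_id, MIN(image_order) AS first_order
               FROM venue_images
               GROUP BY image_venue_id) first_image
           ON first_image.image_venue_id = v.id
    LEFT JOIN venue_images vi
           ON vi.image_venue_id = first_image.image_venue_id
          AND vi.image_order = first_image.first_order
    ORDER BY v.id
""").fetchall()

print(thumbs)
```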
I'm not too sure about your data but a JOIN with the thumbnails table and a group by on your large query would probably work.
GROUP BY venues.id

Get values from database where column is unique

I need help with an advanced SQL-query (MSSQL 2000).
I have a table called Result that lists athletics 100 meter race-times. A runner can have several racetimes but I want to show only the best time from each runner.
The Result-table contains three columns, Result_id, athlete_id, result_time. So athlete_id must be unique when I list the values and result_time must be the fastest (lowest) value.
Any ideas?
In SQL Server 2000, you can't use window functions. You can do this as follows:
select r.*
from result r join
(select athlete_id, min(result_time) as mintime
from result
group by athlete_id
) rsum
on rsum.athlete_id = r.athlete_id and r.result_time = rsum.mintime
In more recent versions of SQL Server, you would use row_number().
If you simply need the fastest time for each athlete_id, do this:
select athlete_id, min(result_time) as FastestTime
from result
group by athlete_id
To show additional columns from the result table, you can join back to it like this:
select r.*
from result r
inner join (
select athlete_id, min(result_time) as FastestTime
from result
group by athlete_id
) rm on r.athlete_id = rm.athlete_id and r.result_time = rm.FastestTime
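One thing worth knowing about the join-back form is how it behaves on ties. The sketch below uses Python's sqlite3 module (not SQL Server 2000) with invented times, where athlete 2 has run the same best time twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (result_id INTEGER PRIMARY KEY, athlete_id INTEGER, result_time REAL);
    INSERT INTO result VALUES
        (1, 1, 10.5), (2, 1, 10.2),
        (3, 2, 11.0), (4, 2, 11.3), (5, 2, 11.0);
""")

# Plain aggregate: exactly one row per athlete.
fastest = conn.execute("""
    SELECT athlete_id, MIN(result_time) AS FastestTime
    FROM result
    GROUP BY athlete_id
    ORDER BY athlete_id
""").fetchall()

# Join back for the full row: every row that matches the best time is returned.
full_rows = conn.execute("""
    SELECT r.result_id, r.athlete_id, r.result_time
    FROM result r
    INNER JOIN (SELECT athlete_id, MIN(result_time) AS FastestTime
                FROM result
                GROUP BY athlete_id) rm
            ON r.athlete_id = rm.athlete_id AND r.result_time = rm.FastestTime
    ORDER BY r.result_id
""").fetchall()

print(fastest)
print(full_rows)
```

If athlete_id must be strictly unique in the output, the tied rows need a tie-breaker, for example keeping only the lowest result_id per athlete.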
What you want is to use an aggregate function, in this case min(), which will select the minimum value from all the rows that have the same data in the other selected columns. This means you also have to use the GROUP BY clause. The query below should give you the results you want.
Edit: If you need other columns, just bring them into the select clause, then add them to the group by clause like below:
select athlete_id, result_id, min(result_time) as result_time from result group by athlete_id, result_id
select athlete_id, result_id, min(result_time) as result_time, race_date from result group by athlete_id, race_date, result_id
Edit: You need to add all the columns into the group by that aren't part of an aggregate function. Aggregate functions are ones like min(), max(), avg() and so on.
Short answer: If a column isn't wrapped in an aggregate function, it probably has to be in the group by.
