MySQL Join - Sorting data, grouping data - php

I have two tables:
twitterusers table
twittergrowth Table
I am trying to JOIN these two tables, getting all fields from twitterusers and selected fields from twittergrowth, and then fetch only the last 3 rows from the result.
Expected Output:
Current Output:
That is, the rows are repeated. I want rows that are unique by ID or username, each with its latest timestamp. So the result would be the last 3 rows, the ones with the most recent timestamps.
The code I have managed to scribble out is:
SELECT
t1.*,
t2.new_followers_count,
t2.new_friends_count,
t2.new_timestamp
FROM twitterusers t1
JOIN twittergrowth t2 on (t1.username=t2.username)
I have searched quite a few pages and sites, but can't really figure out how to do it. I would appreciate any help. :)
Additionally, I would like to get a LIMIT parameter added to the final result, so that I can paginate the full result.

First you need to find a maximum new_timestamp (latest) within groups of the same user_id and username in twittergrowth table. This is a classic group-wise maximum problem and the subquery tgmax does that. Then you need to join back the same table (tg this time) to get other columns that aren't in the group by clause of subquery and are not used in aggregate functions (like max()). These columns are new_followers_count and new_friends_count.
If you tried to put them in the SELECT list of the subquery, MySQL would return values from an unspecified row of the same group (or reject the query when ONLY_FULL_GROUP_BY is enabled), not necessarily from the row with the latest timestamp. This is explained here.
Once you get desired output for twittergrowth table the only thing left is to join twitterusers table to get all other columns.
SELECT tu.*, tg.new_followers_count, tg.new_friends_count, tg.new_timestamp
FROM twitterusers tu
JOIN twittergrowth tg
ON tu.user_id = tg.user_id AND tu.username = tg.username
JOIN
( SELECT tgg.user_id, tgg.username, max(tgg.new_timestamp) as latest_timestamp
FROM twittergrowth tgg
GROUP BY tgg.user_id, tgg.username ) tgmax
ON tg.user_id = tgmax.user_id AND tg.username = tgmax.username
AND tg.new_timestamp = tgmax.latest_timestamp
Note that this query would benefit from a composite index on (user_id,username,new_timestamp) in the twittergrowth table.
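To see the group-wise maximum in action, here is a minimal sketch that runs the same query shape from Python. It assumes an in-memory SQLite database as a stand-in for MySQL, and the sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; all sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE twitterusers (user_id INTEGER, username TEXT);
CREATE TABLE twittergrowth (
    user_id INTEGER, username TEXT,
    new_followers_count INTEGER, new_friends_count INTEGER,
    new_timestamp TEXT
);
INSERT INTO twitterusers VALUES (1, 'alice'), (2, 'bob');
INSERT INTO twittergrowth VALUES
    (1, 'alice', 10, 5, '2013-01-01 10:00:00'),
    (1, 'alice', 12, 6, '2013-01-02 10:00:00'),
    (2, 'bob',   20, 7, '2013-01-01 10:00:00'),
    (2, 'bob',   25, 9, '2013-01-03 10:00:00');
""")

# Same shape as the answer's query: join each user to the growth row whose
# timestamp equals the per-user maximum computed in the derived table tgmax.
LATEST_PER_USER = """
SELECT tu.user_id, tu.username,
       tg.new_followers_count, tg.new_friends_count, tg.new_timestamp
FROM twitterusers tu
JOIN twittergrowth tg
  ON tu.user_id = tg.user_id AND tu.username = tg.username
JOIN (SELECT user_id, username, MAX(new_timestamp) AS latest_timestamp
      FROM twittergrowth
      GROUP BY user_id, username) tgmax
  ON tg.user_id = tgmax.user_id AND tg.username = tgmax.username
 AND tg.new_timestamp = tgmax.latest_timestamp
ORDER BY tu.user_id
"""

latest_rows = conn.execute(LATEST_PER_USER).fetchall()
for row in latest_rows:
    print(row)
```

Each user appears once, paired with the growth row that carries the most recent timestamp. If two growth rows for one user shared the same latest timestamp, both would be returned.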

You need to group by to achieve your expected output.
GROUP BY id
To limit, or split results into pages, you can simply add LIMIT X,Y where X is the offset of the first row to return (starting at 0) and Y is the maximum number of rows to return.
So a query to pull the expected results you want, but only the first 10 would be like so:
SELECT
t1.*,
t2.new_followers_count,
t2.new_friends_count,
t2.new_timestamp
FROM twitterusers t1
JOIN twittergrowth t2 on t1.username=t2.username
GROUP BY t1.id
LIMIT 0,10
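As a rough illustration of paging with LIMIT offset,count, the sketch below computes the offset from a page number. It assumes SQLite (which accepts the same comma form) in place of MySQL; the items table and the fetch_page helper are hypothetical names, not part of the question. An ORDER BY is included because without one the row order, and therefore the page contents, is not guaranteed.

```python
import sqlite3

# Hypothetical table with 25 rows, used only to demonstrate paging.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in range(1, 26)])


def fetch_page(page, page_size=10):
    """Return the ids on a 1-based page; LIMIT takes (offset, row count)."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ?, ?",
        (offset, page_size),
    ).fetchall()
    return [r[0] for r in rows]


first_page = fetch_page(1)   # rows 1..10
last_page = fetch_page(3)    # the final, shorter page
```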

Related

MySQL error 1242 - Subquery returns more than 1 row

I have two tables in a DB with the following structure:
table 1: 3 columns - category_id, product_id and position
table 2: 3 columns - category_id, product_id and position
I am trying to set table 1's position to table 2's position where the category and product IDs are the same in both tables.
Below is the SQL I have tried, but it returns MySQL error 1242 - Subquery returns more than 1 row
UPDATE table1
SET position = (
SELECT position
FROM table2
WHERE table1.product_id = table2.product_id AND table1.category_id = table2.category_id
)
The solution is very simple and it can be done in two simple steps. The first step is just a preview of what will be changed, to avoid destroying data. It can be skipped if you are confident of your WHERE clause.
Step 1: preview the changes
Join the tables using the fields you want to match, select everything for visual validation of the match.
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2
ON t1.category_id = t2.category_id
AND t1.product_id = t2.product_id
You can also add a WHERE clause if only some of the rows must be modified.
Step 2: do the actual update
Replace the SELECT clause and the FROM keyword with UPDATE, add the SET clause where it belongs. Keep the WHERE clause:
UPDATE table1 t1
INNER JOIN table2 t2
ON t1.category_id = t2.category_id
AND t1.product_id = t2.product_id
SET t1.position = t2.position
That's all.
Technical considerations
Indexes on the columns used in the JOIN clause on both tables are a must when the tables have more than several hundred rows. If the query doesn't have WHERE conditions then MySQL will use indexes only for the biggest table. Indexes on the fields used in the WHERE condition will speed up the query. Prepend EXPLAIN to the SELECT query to check the execution plan and decide which indexes you need.
You can add ORDER BY and LIMIT to further reduce the set of changed rows using criteria that cannot be expressed with WHERE (for example, only the most recent or oldest 100 rows). Put them on the SELECT query first to validate the outcome, then morph the SELECT into an UPDATE as described. Note that MySQL accepts ORDER BY and LIMIT only in a single-table UPDATE, not in the multiple-table (JOIN) form shown above, so there the restriction has to be expressed through a subquery or a WHERE condition.
Of course, indexes on the columns used in the ORDER BY clause are a must.
You can run this query to see what is happening:
SELECT product_id, category_id, count(*), min(position), max(position)
FROM table2
GROUP BY product_id, category_id
HAVING COUNT(*) > 1;
This will give you the list of product_id, category_id pairs that appear multiple times in table2. Then you can decide what to do. Do you want an arbitrary value of position? Is the value of position always the same? Do you need to fix the table?
It is easy enough to fix the particular problem by using limit 1 or an aggregation function. However, you may really need to fix the data in the table. A fix looks like:
UPDATE table1 t1
SET t1.position = (SELECT t2.position
FROM table2 t2
WHERE t2.product_id = t1.product_id AND t2.category_id = t1.category_id
LIMIT 1
);
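The diagnosis and one possible fix can be tried end to end with the sketch below. It assumes SQLite in place of MySQL and invented sample rows. Instead of LIMIT 1, which picks an arbitrary duplicate, it uses the aggregate MIN() so the chosen position is deterministic; that choice is an assumption for the example, not something the question specifies.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (category_id INTEGER, product_id INTEGER, position INTEGER);
CREATE TABLE table2 (category_id INTEGER, product_id INTEGER, position INTEGER);
INSERT INTO table1 VALUES (1, 100, 0), (1, 101, 0);
INSERT INTO table2 VALUES (1, 100, 3), (1, 100, 7), (1, 101, 2);
""")

# Pairs that occur more than once in table2 are what make the original
# scalar subquery return more than one row.
duplicates = conn.execute("""
    SELECT product_id, category_id, COUNT(*), MIN(position), MAX(position)
    FROM table2
    GROUP BY product_id, category_id
    HAVING COUNT(*) > 1
""").fetchall()

# An aggregate guarantees exactly one value per pair. Rows of table1 with
# no match in table2 would be set to NULL; the sample data has none.
conn.execute("""
    UPDATE table1
    SET position = (SELECT MIN(t2.position)
                    FROM table2 t2
                    WHERE t2.product_id = table1.product_id
                      AND t2.category_id = table1.category_id)
""")

updated = conn.execute(
    "SELECT category_id, product_id, position FROM table1 ORDER BY product_id"
).fetchall()
```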

Multiple table select grouped query

We need to grab the last and newest 20 entries from different tables. However, the GROUP BY statement skips records because we are working with LEFT JOIN on tables.
All these records are linked to unique persons in another table. We store these persons' IDs in an array for later queries.
We have a few tables (in which all those person IDs are stored) and we want to get them sorted and grouped.
The tables are like this:
SELECT lastRecord+personID FROM t1
SELECT lastRecord+personID FROM t2
SELECT lastRecord+personID FROM t3
SELECT lastRecord+personID FROM t4
WHERE t5.Essential_Column_Name = '1'
GROUP BY personID
ORDER BY 'all the latest entries'
LIMIT 20
With that, the relevance of all the latest entries should be equal.
We do have a timestamp column as well. Perhaps that might work better.
Any input is highly appreciated!
For people looking for an answer on this; this is the right post, answer and update to this Q:
UNION mysql gives weird numbered results
With thanks to all for the ideas and providing the paths to the right solution.

MSSQL Joining 3 Tables with more than 1 row to a single line result over multiple records

Ok, I'll try to explain this as best I can here.
I have multiple tables that are to be connected through a JOIN where certain reference points meet.
In one of the tables, there are 2 or more results over several rows that I need to bring back to separate columns.
In the diagram below (I hope that explains it better), T2.ColA is connected to T3.ColA and T1.ColA is connected to T2.ColB.
ColC of T1 identifies the latest record. Only the latest records are required in the results. Note that ColC could hold different dates in rows 1 and 2, for example, but I need the latest for each ColB based on ColA.
But in T1, there are two rows whose ColB values need to be returned in separate columns.
By the way, this is just one entry - there will be thousands of rows that need to return a result - not just 1.
Let me know if you need any more info.
Try this query
select t1.cola, t2.cola, t1.colb, t2.colb, t3.colb FROM table2 t2 INNER JOIN table1 t1 ON t1.cola = t2.colb INNER JOIN table3 t3 ON t3.cola = t2.cola
Possibly this could solve your problem:
SELECT T1.COLA, T1.COLB, T2.COLA, T2.COLA
FROM
TABLE1 T1 INNER JOIN TABLE2 T2 ON T1.COLA = T2.COLB
INNER JOIN TABLE3 T3 ON T2.COLA = T3.COLA
WHERE
T1.COLC = (
SELECT MAX(T4.COLC) FROM TABLE1 T4 WHERE T4.COLA = T1.COLA
)
Note that, unlike in your diagram, it is not possible to display 1123 and 3211 in a single row. They have to be in two different rows, because the number of values changes dynamically with the row count. You can reshape the result as you want in your front-end application.
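The correlated-subquery filter used in the WHERE clause above can be tried in isolation. This is only a sketch: it assumes SQLite in place of SQL Server, leaves out TABLE2 and TABLE3, and uses invented rows (the ColB values 1123 and 3211 are borrowed from the answer's text).

```python
import sqlite3

# SQLite stands in for SQL Server; only TABLE1 is modelled, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (cola INTEGER, colb INTEGER, colc TEXT);
INSERT INTO table1 VALUES
    (1, 1123, '2013-01-01'),
    (1, 3211, '2013-02-01'),
    (2, 4001, '2013-01-15');
""")

# For every outer row, the subquery finds the newest colc among rows that
# share its cola; only rows matching that maximum survive the filter.
latest = conn.execute("""
    SELECT t1.cola, t1.colb, t1.colc
    FROM table1 t1
    WHERE t1.colc = (SELECT MAX(t4.colc)
                     FROM table1 t4
                     WHERE t4.cola = t1.cola)
    ORDER BY t1.cola
""").fetchall()
```

With this filter only one row per ColA remains unless several rows share the same latest ColC, in which case all of them are kept.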

MySql - Joining another table with multiple rows, inserting a query into a another query?

I've been racking my brain for hours trying to work out how to join these two queries.
My goal is to return multiple venue rows (from venues) based on certain criteria... which is what my current query does....
SELECT venues.id AS ven_id,
venues.venue_name,
venues.sub_category_id,
venues.score,
venues.lat,
venues.lng,
venues.short_description,
sub_categories.id,
sub_categories.sub_cat_name,
sub_categories.category_id,
categories.id,
categories.category_name,
((ACOS( SIN(51.44*PI()/180)*SIN(lat*PI()/180) + COS(51.44*PI()/180)*COS(lat*PI()/180)*COS((-2.60796 - lng)*PI()/180)) * 180/PI())*60 * 1.1515) AS dist
FROM venues,
sub_categories,
categories
WHERE
venues.sub_category_id = sub_categories.id
AND sub_categories.category_id = categories.id
HAVING
dist < 5
ORDER BY score DESC
LIMIT 0, 100
However, I need to include another field in this query (thumbnail), which comes from another table (venue_images). The idea is to extract one image row based on which venue it's related to and its order. Only one image needs to be extracted, hence LIMIT 1.
I basically need to insert this query:
SELECT
venue_images.thumb_image_filename,
venue_images.image_venue_id,
venue_images.image_order
FROM venue_images
WHERE venue_images.image_venue_id = ven_id -- id from above query
ORDER BY venue_images.image_order
LIMIT 1
Into my first query, and label this new field as "thumbnail".
Any help would really be appreciated. Thanks!
First of all, you could write the first query using INNER JOIN:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
HAVING
...
The result should be identical, but I like this one more.
What I'd like to do next is to JOIN a subquery, something like this:
...
INNER JOIN (SELECT ... FROM venue_images
WHERE venue_images.image_venue_id = ven_id -- id from above query
ORDER BY venue_images.image_order
LIMIT 1) first_image
but unfortunately this subquery can't see ven_id, because a derived table is evaluated on its own, before the outer query (I think it's a limitation of MySQL), so we can't use that and we have to find another solution. And since you are using LIMIT 1, it's not easy to rewrite the condition you need using just JOINs.
It would be easier if MySQL provided a FIRST() aggregate function, but since it doesn't, we have to simulate it; see for example this question: How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?
So using this trick, you can write a query that extracts first image_id for every image_venue_id:
SELECT
image_venue_id,
SUBSTRING_INDEX(
GROUP_CONCAT(image_id order by venue_images.image_order),',',1) as first_image_id
FROM venue_images
GROUP BY image_venue_id
and this query could be integrated in your query above:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
INNER JOIN (the query above) first_image on first_image.image_venue_id = venues.id
INNER JOIN venue_images on first_image.first_image_id = venue_images.image_id
HAVING
...
I also added one more JOIN, to join the first image id with the actual image. I couldn't check your query, but the idea is to proceed like this.
Since the query is now becoming more complicated and difficult to maintain, I think it would be better to create a view that extracts the first image for every venue, and then join just the view in your query. This is just an idea. Let me know if it works or if you need any help!
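As an alternative to the GROUP_CONCAT trick, a correlated scalar subquery in the SELECT list can see the outer venue row, so ORDER BY ... LIMIT 1 can be used there directly. The sketch below is a swapped-in technique, not the method described above; it assumes SQLite in place of MySQL, a reduced venues table, invented rows, and an image_id key column as used in the answer.

```python
import sqlite3

# SQLite stands in for MySQL; tables are trimmed down and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venues (id INTEGER PRIMARY KEY, venue_name TEXT, score INTEGER);
CREATE TABLE venue_images (
    image_id INTEGER PRIMARY KEY, image_venue_id INTEGER,
    image_order INTEGER, thumb_image_filename TEXT
);
INSERT INTO venues VALUES (1, 'Cafe', 9), (2, 'Bar', 7), (3, 'Club', 5);
INSERT INTO venue_images VALUES
    (10, 1, 2, 'cafe_b.jpg'),
    (11, 1, 1, 'cafe_a.jpg'),
    (12, 2, 1, 'bar_a.jpg');
""")

# The scalar subquery runs per venue and returns the filename of the image
# with the lowest image_order, or NULL when the venue has no images.
venues_with_thumbs = conn.execute("""
    SELECT v.id, v.venue_name,
           (SELECT vi.thumb_image_filename
            FROM venue_images vi
            WHERE vi.image_venue_id = v.id
            ORDER BY vi.image_order
            LIMIT 1) AS thumbnail
    FROM venues v
    ORDER BY v.score DESC
""").fetchall()
```

Unlike the INNER JOIN version, this keeps venues that have no image at all, with a NULL thumbnail. It only returns one column per subquery, so it fits best when just the thumbnail filename is needed.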
I'm not too sure about your data but a JOIN with the thumbnails table and a group by on your large query would probably work.
GROUP BY venues.id

Get values from database where column is unique

I need help with an advanced SQL-query (MSSQL 2000).
I have a table called Result that lists athletics 100 meter race-times. A runner can have several racetimes but I want to show only the best time from each runner.
The Result-table contains three columns, Result_id, athlete_id, result_time. So athlete_id must be unique when I list the values and result_time must be the fastest (lowest) value.
Any ideas?
In SQL Server 2000, you can't use window functions. You can do this as follows:
select r.*
from result r join
(select athlete_id, min(result_time) as mintime
from result
group by athlete_id
) rsum
on rsum.athlete_id = r.athlete_id and r.result_time = rsum.mintime
In more recent versions of SQL Server, you would use row_number().
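For reference, a row_number() version might look like the sketch below. It assumes SQLite 3.25 or newer (which supports window functions) as a stand-in for a recent SQL Server, and the sample times are invented.

```python
import sqlite3

# Requires SQLite 3.25+ for window functions; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result (
    result_id INTEGER PRIMARY KEY, athlete_id INTEGER, result_time REAL
);
INSERT INTO result VALUES
    (1, 1, 10.9), (2, 1, 10.5), (3, 2, 11.2), (4, 2, 11.4);
""")

# Number each athlete's results from fastest to slowest, then keep rank 1.
best_times = conn.execute("""
    SELECT result_id, athlete_id, result_time
    FROM (SELECT r.*,
                 ROW_NUMBER() OVER (PARTITION BY athlete_id
                                    ORDER BY result_time) AS rn
          FROM result r) ranked
    WHERE rn = 1
    ORDER BY athlete_id
""").fetchall()
```

One difference from the join on min(result_time): when an athlete has two results with the same best time, row_number() keeps exactly one of them, while the join returns both.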
If you simply need the fastest time for each athlete_id, do this:
select athlete_id, min(result_time) as FastestTime
from result
group by athlete_id
To show additional columns from the result table, you can join back to it like this:
select r.*
from result r
inner join (
select athlete_id, min(result_time) as FastestTime
from result
group by athlete_id
) rm on r.athlete_id = rm.athlete_id and r.result_time = rm.FastestTime
What you want is to use an aggregate function, in this case min(), which will select the minimum value from all the rows that have the same data in the other selected columns. This means you also have to use the group by clause. The query below should give you the results you want.
Edit: If you need other columns, just bring them into the select clause, then add them to the group by clause like below:
select athlete_id, result_id, min(result_time) as result_time from result group by athlete_id, result_id
select athlete_id, result_id, min(result_time) as result_time, race_date from result group by athlete_id, race_date, result_id
Be aware that if a grouped column is unique per row, as result_id presumably is, every row ends up in its own group, so these queries return every result rather than one row per athlete.
Edit: You need to add all the columns into the group by that aren't part of an aggregate function. Aggregate functions are ones like min(), max(), avg() and so on.
Short answer: If you aren't putting a column in brackets, it probably has to be in the group by.
