I have a table tbl_user, which contains information about users, and a table tbl_article, which contains articles plus the ID of the user from tbl_user.
This is a parent-child relation: every user may have many articles, which is why I included user_id in the articles table.
I'd like to list the 10 users that have the most articles. I've searched everywhere and thought about it, but in vain; I'm not good at SQL queries.
Thank you in advance
SELECT TOP(10)
tbl_user.id,
COUNT(tbl_article.user_id)
FROM
tbl_user
LEFT JOIN
tbl_article
ON tbl_user.id = tbl_article.user_id
GROUP BY
tbl_user.id
ORDER BY
COUNT(tbl_article.user_id) DESC
LIMIT
10
Depending on which RDBMS you use, you may need TOP(10) or LIMIT 10, etc. I included both so you can see, but only use the one that is used by your RDBMS ;)
SELECT TOP 10
u.UserID, COUNT(a.Article)
FROM tbl_User u
INNER JOIN tbl_Article a
ON a.UserID = u.UserID
GROUP BY u.UserID
ORDER BY COUNT(a.Article) DESC
All you need is a GROUP BY and a JOIN.
If there is a potential for users with 0 articles that you want to include, you should use a LEFT JOIN.
Optionally you can also COUNT(DISTINCT Article) if there is a concern about duplicates.
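To make the LEFT JOIN variant concrete, here is a minimal sketch run against an in-memory SQLite database from Python. The sample rows, the name column and the top_authors helper are assumptions made up for illustration; SQLite uses LIMIT, so swap in TOP if your RDBMS needs it.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl_article (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO tbl_user (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO tbl_article (user_id, title) VALUES
        (1, 'a1'), (1, 'a2'), (1, 'a3'), (2, 'b1');
""")


def top_authors(conn, n=10):
    """Return (user id, article count) pairs, most articles first."""
    return conn.execute(
        """
        SELECT u.id, COUNT(a.user_id) AS article_count
        FROM tbl_user AS u
        LEFT JOIN tbl_article AS a ON a.user_id = u.id
        GROUP BY u.id
        ORDER BY article_count DESC, u.id ASC
        LIMIT ?
        """,
        (n,),
    ).fetchall()


top = top_authors(conn)
print(top)
```

COUNT(a.user_id) skips the NULLs produced by the LEFT JOIN, so a user without articles is listed with a count of 0 instead of being dropped.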
I have been looking for a solution for this for over an hour now and decided to resort to asking here.
I am creating a Twitter-like following system for the users of my website. I want to display every user that a specific user follows or is followed by, ordered by the timestamp in the follows table, descending, so that the latest follower is at the top.
The solutions I have come across seem to use inner joins etc. which is all well and good, but I was wondering whether there is a logical solution for my current query to do this.
Table structures:
users:
id | username
follows:
id | follower_id | following_id | timestamp
My current query:
SELECT * FROM users WHERE id IN (SELECT follower_id FROM follows WHERE following_id = $user_id) ORDER BY id ASC
Of course, this simply orders by the user ID. How could I (keeping the current query structure) order the list by the follows timestamp instead?
MySQL INNER JOIN
"SELECT users.* FROM users
INNER JOIN follows ON follows.follower_id = users.id
WHERE follows.following_id = $user_id
ORDER BY follows.timestamp DESC";
You can sort using multiple columns like this:
ORDER BY [column1] [ASC|DESC], [column2] [ASC|DESC], ...
Therefore, edit your query's order by clause to include the second column and sort it descending.
You must use a join to add the column; here's the basic syntax of a join:
SELECT [table_name].[column_name], ...
FROM [table1]
JOIN [table2] ON [join condition]
...
Your code should look somewhat like this:
SELECT users.*
FROM users
JOIN follows ON users.id = follows.follower_id
WHERE follows.following_id = $user_id
ORDER BY users.id ASC, follows.timestamp DESC
As far as I know, there is no way to do this without joining the tables. Perhaps it's possible to sort the returned list in the subquery, but there are no guarantees:
SELECT * FROM users
WHERE id IN (SELECT follower_id FROM follows
WHERE following_id = $user_id
ORDER BY timestamp DESC)
ORDER BY id ASC;
The above may or may not work (I didn't test it); if it does not, you must use a join query.
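For comparison, here is a small runnable sketch of the join approach, using SQLite from Python as a stand-in for MySQL. The sample rows and the followers_of helper are invented for the example, and the user id is passed as a bound parameter rather than interpolated like $user_id.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE follows (
        id INTEGER PRIMARY KEY,
        follower_id INTEGER,
        following_id INTEGER,
        timestamp INTEGER
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO follows (follower_id, following_id, timestamp) VALUES
        (2, 1, 100), (3, 1, 200), (1, 2, 150);
""")


def followers_of(conn, user_id):
    """Usernames of everyone following user_id, newest follow first."""
    rows = conn.execute(
        """
        SELECT users.username
        FROM users
        INNER JOIN follows ON follows.follower_id = users.id
        WHERE follows.following_id = ?
        ORDER BY follows.timestamp DESC
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


followers = followers_of(conn, 1)
print(followers)
```

The join makes follows.timestamp available to the outer ORDER BY, which the IN (...) form cannot do because the subquery only hands back ids.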
Let there be two tables, one holding user information and one holding user records of some sort, say receipts. There is a one-to-many relationship between the users and receipts.
What would be the best SQL method of retrieving users, sorted by the greatest number of receipts?
The best way I can think of is using a join and count(?) to return an array of users and their number of associated receipts.
Is there a way to make use of the count function in this instance?
select * from `users` inner join `receipts` on `users`.`id` = `receipts`.`uId`
If OP wishes to include additional information (additional aggregations, etc...) utilizing data from users table:
SELECT `users`.`id`,
count(`receipts`.`uId`)
FROM `users`
INNER JOIN `receipts` ON `users`.`id` = `receipts`.`uId`
GROUP BY `users`.`id`
ORDER BY count(`receipts`.`uId`) DESC
Otherwise, only the receipts table is required...
SELECT `receipts`.`uId`,
count(`receipts`.`uId`)
FROM `receipts`
GROUP BY `receipts`.`uId`
ORDER BY count(`receipts`.`uId`) DESC
Two answers provided by Dave and meewoK will accomplish what you need. I'm providing an alternative, which should provide better performance and allow you to show more user information because in the case with Dave's answer you can only SELECT columns that are used by an aggregate function or in the group clause.
SELECT u.id, u.name, r.numReceipts
FROM users u
INNER JOIN (
SELECT uId, count(*) as numReceipts
FROM receipts
GROUP BY uId
) as r ON r.uId = u.id
ORDER BY r.numReceipts DESC
This creates an inline view that returns only the count of receipts for each user, and then joins that inline view on the user's ID.
Someone correct me if I'm wrong, but I've been told that the planner isn't as efficient when you do a scalar subquery in the SELECT clause. It's better to join on a temporary table this way. There are multiple ways to write this query, and it all depends on how you want to use the information. Cheers!
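Here is a minimal sketch of the inline-view idea, run on SQLite from Python. The sample data and the amount column are assumptions for illustration, and it says nothing about how any particular planner performs.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE receipts (id INTEGER PRIMARY KEY, uId INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO receipts (uId, amount) VALUES
        (2, 5.0), (2, 7.5), (2, 1.0), (1, 3.0);
""")

# Aggregate receipts once per user in the inline view, then join the
# small aggregated result to users to pick up any extra user columns.
ranked = conn.execute(
    """
    SELECT u.id, u.name, r.numReceipts
    FROM users AS u
    INNER JOIN (
        SELECT uId, COUNT(*) AS numReceipts
        FROM receipts
        GROUP BY uId
    ) AS r ON r.uId = u.id
    ORDER BY r.numReceipts DESC
    """
).fetchall()
print(ranked)
```

Because this is an INNER JOIN, users with no receipts (cy in the sample data) do not appear; a LEFT JOIN with COALESCE(r.numReceipts, 0) would keep them.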
try this
SELECT a.`id`, count(b.`receipts`) as total_receipts
FROM `users` a
INNER JOIN `receipts` b
ON a.`id` = b.`uId`
GROUP BY a.`id`
ORDER BY count(b.`receipts`) desc
SELECT users.*, (SELECT COUNT(*) FROM tblreceipts WHERE tblreceipts.uId = users.id) AS counter FROM users ORDER BY counter DESC
Something like this may work (not sure about the speed, though, if the tables are big).
If you want to include all users, even those with no receipts, then a good way is a left outer join:
SELECT u.*, count(r.uid) as NumReceipts
FROM `users` u left outer join
`receipts` r
ON u.id = r.`uId`
GROUP BY u.id
ORDER BY NumReceipts DESC;
If you only want the id for users that have receipts, then the join is not even necessary:
SELECT r.uid, count(*) as NumReceipts
FROM receipts r
GROUP BY r.uid
ORDER BY NumReceipts DESC
I have two tables:
users: user_id, user_zip
settings: user_id, pref_ex_loc
I need to find the single most popular 'pref_ex_loc' from the settings table based on a particular user_zip, which will be specified as the variable $userzip.
Here is the query that I have now and obviously it doesn't work.
$popularexloc = "SELECT pref_ex_loc, user_id COUNT(pref_ex_loc) AS countloc
FROM settings FULL OUTER JOIN users ON settings.user_id = users.user_id
WHERE users.user_zip='$userzip'
GROUP BY settings.pref_ex_loc
ORDER BY countloc LIMIT 1";
$popexloc = mysql_query($popularexloc) or die('SQL Error :: '.mysql_error());
$exlocrow = mysql_fetch_array($popexloc);
$mostpopexloc=$exlocrow[0];
echo '<option value="'.$mostpopexloc.'">'.$mostpopexloc.'</option>';
What am I doing wrong here? I'm not getting any kind of error from this either.
Give this a try:
select s.pref_ex_loc from settings s
join users u on (u.user_id = s.user_id)
where user_zip = $userzip
group by s.pref_ex_loc
order by count(*) desc
limit 1
As you said, this will give you the "single most popular 'pref_ex_loc' from the settings table based on a particular user_zip"
Well, for one thing you are missing a comma before the COUNT():
SELECT pref_ex_loc, user_id COUNT(...
You should have a comma between each field in your select-list:
SELECT pref_ex_loc, user_id, COUNT(...
I would recommend using COUNT(*) instead of COUNT(pref_ex_loc). In this case, either should give the right answer, but in MySQL COUNT(*) usually performs slightly better.
You're using outer join, but then in the WHERE clause you're testing one of the columns of users so it's effectively not an outer join anymore. In this query, I believe you simply need an INNER JOIN, unless you need to handle the possibility that none of the users reference any of your pref_ex_loc values. Read A Visual Explanation of SQL Joins.
Also, MySQL does not support FULL OUTER JOIN.
Your user_id in the select-list, when it is neither in the GROUP BY clause nor in an aggregate function, is an ambiguous field, taking its value from one arbitrary row in the group. You should remove user_id from the select-list.
Sort by the countloc DESC to get the greatest value first.
So here's what I see as a better query:
SELECT pref_ex_loc, COUNT(*) AS countloc
FROM settings INNER JOIN users ON settings.user_id = users.user_id
WHERE users.user_zip='$userzip' GROUP BY settings.pref_ex_loc
ORDER BY countloc DESC LIMIT 1
The following query shows every pref_ex_loc that ties for the highest count (duplicate most popular values).
It doesn't use LIMIT, because LIMIT caps the number of rows returned. What if two or more rows tie for the most popular pref_ex_loc?
SELECT b.pref_ex_loc
FROM users a
INNER JOIN settings b
ON a.user_ID = b.user_ID
WHERE a.user_zip = 1 -- change the value here
GROUP BY b.pref_ex_loc
HAVING COUNT(*) =
(
SELECT MAX(totalCount)
FROM
(
SELECT b.pref_ex_loc, COUNT(*) totalCount
FROM users a
INNER JOIN settings b
ON a.user_ID = b.user_ID
WHERE a.user_zip = 1 -- change the value here
GROUP BY b.pref_ex_loc
) s
)
SQLFiddle Demo
SQLFiddle Demo (with duplicate most popular)
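The tie-handling query can be tried out with a small sketch on SQLite from Python. The zip codes, locations and the most_popular_locations helper are made up for the example; the zip is bound as a named parameter so it can be used in both the outer query and the subquery.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_zip TEXT);
    CREATE TABLE settings (user_id INTEGER, pref_ex_loc TEXT);
    INSERT INTO users VALUES
        (1, '10001'), (2, '10001'), (3, '10001'), (4, '10001'), (5, '20002');
    INSERT INTO settings VALUES
        (1, 'gym'), (2, 'gym'), (3, 'park'), (4, 'park'), (5, 'pool');
""")


def most_popular_locations(conn, user_zip):
    """Every pref_ex_loc whose count equals the maximum count for the zip."""
    rows = conn.execute(
        """
        SELECT s.pref_ex_loc
        FROM users AS u
        INNER JOIN settings AS s ON s.user_id = u.user_id
        WHERE u.user_zip = :zip
        GROUP BY s.pref_ex_loc
        HAVING COUNT(*) = (
            SELECT MAX(total)
            FROM (
                SELECT COUNT(*) AS total
                FROM users AS u2
                INNER JOIN settings AS s2 ON s2.user_id = u2.user_id
                WHERE u2.user_zip = :zip
                GROUP BY s2.pref_ex_loc
            ) AS counts
        )
        ORDER BY s.pref_ex_loc
        """,
        {"zip": user_zip},
    ).fetchall()
    return [row[0] for row in rows]


tied = most_popular_locations(conn, "10001")
print(tied)
```

With ORDER BY ... LIMIT 1 only one of the two tied locations would come back, and which one is not defined.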
Try with this query:
SELECT users.user_id, COUNT(settings.pref_ex_loc) AS countloc
FROM users LEFT JOIN settings ON users.user_id = settings.user_id
WHERE users.user_zip='$userzip' GROUP BY users.user_id ORDER BY countloc LIMIT 1
Dear PHP and MySQL experts,
I have two tables: a large one for posts/articles with 200,000 records (index column: sid) and a small one for topics with 20 records (index column: topicid). They share the same topicid.
Currently I am using the following, which takes around 0.4s:
+ Get the last 50 records from the table:
SELECT sid, aid, title, time, topic, informant, ihome, alanguage, counter, type, images, chainid FROM veryzoo_stories ORDER BY sid DESC LIMIT 0,50
+ Then loop over the records in a while loop to find the matching topic name for each post:
while ( .. ) {
SELECT topicname FROM veryzoo_topics WHERE topicid='$topic'
....
}
+ Now
I tried to use an INNER JOIN to speed up the process, but in my tests it took much longer: from 1.5s up to 3.5s.
SELECT a.sid, a.aid, a.title, a.time, a.topic, a.informant, a.ihome, a.alanguage, a.counter, a.type, a.images, a.chainid, t.topicname FROM veryzoo_stories a INNER JOIN veryzoo_topics t ON a.topic = t.topicid ORDER BY sid DESC LIMIT 0,50
It looks like the inner join first joins all 200k records from the two tables and only then limits the result to 50, which takes a long time.
Please point me to the right way of doing this,
e.g. take the last 50 records from table one, then join them to table two, etc.
Be careful with an inner join when the join column is not unique in the joined table: each row is then repeated once per match, so you get duplicate values (and of course a slower query).
Please try this :
SELECT *
FROM (
SELECT a.sid, a.aid, a.title, a.time, a.topic, a.informant, a.ihome, a.alanguage, a.counter, a.type, a.images, a.chainid
FROM veryzoo_stories a
ORDER BY sid DESC
LIMIT 0 , 50
)b
INNER JOIN veryzoo_topics t ON b.topic = t.topicid
I made a small test and it seems to be faster. It uses a subquery (nested query) to first select the 50 records and then join.
Also make sure that veryzoo_stories.sid, veryzoo_stories.topic and veryzoo_topics.topicid are indexes (and that the relation exists if you use InnoDB). It should improve the performance.
Now it leaves the problem of the ORDER BY LIMIT. It is heavy because it orders the 200,000 records before selecting. I guess it's necessary. The indexes are very important when using ORDER BY.
Here is an article on the problem : ORDER BY … LIMIT Performance Optimization
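A small sketch of the limit-first-then-join pattern, run on SQLite from Python with generated rows. The row counts, titles and topic names are made up, and the timings from the question are not reproduced here.

```python
import sqlite3

# In-memory database with generated sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE veryzoo_topics (topicid INTEGER PRIMARY KEY, topicname TEXT);
    CREATE TABLE veryzoo_stories (sid INTEGER PRIMARY KEY, title TEXT, topic INTEGER);
    CREATE INDEX idx_stories_topic ON veryzoo_stories (topic);
""")
conn.executemany(
    "INSERT INTO veryzoo_topics VALUES (?, ?)",
    [(i, f"topic-{i}") for i in range(1, 5)],
)
conn.executemany(
    "INSERT INTO veryzoo_stories VALUES (?, ?, ?)",
    [(sid, f"story {sid}", sid % 4 + 1) for sid in range(1, 201)],
)

# The derived table picks the newest 50 stories first; only those 50 rows
# are joined to the topics table.
latest = conn.execute(
    """
    SELECT b.sid, b.title, t.topicname
    FROM (
        SELECT sid, title, topic
        FROM veryzoo_stories
        ORDER BY sid DESC
        LIMIT 50
    ) AS b
    INNER JOIN veryzoo_topics AS t ON b.topic = t.topicid
    ORDER BY b.sid DESC
    """
).fetchall()
print(len(latest), latest[0])
```

The outer ORDER BY is repeated on purpose: the order of rows coming out of a join is not guaranteed to follow the order inside the derived table.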
I just tested the nested query + inner join and was surprised that performance improved so much: it now takes only 0.22s. Here is my query:
SELECT a.*, t.topicname
FROM (SELECT sid, aid, title, TIME, topic, informant, ihome, alanguage, counter, TYPE, images, chainid
FROM veryzoo_stories
ORDER BY sid DESC
LIMIT 0, 50) a
INNER JOIN veryzoo_topics t ON a.topic = t.topicid
If no other solution comes up, I may use this one. Thanks to anyone who looks at this post.
I'm working on building a forum with kohana. I know there is already good, free, forum software out there, but it's for a family site, so I thought I'd use it as a learning experience. I'm also not using the ORM that is built into Kohana, as I would like to learn more about SQL in the process of building the forum.
For my forum I have 4 main tables:
USERS
TOPICS
POSTS
COMMENTS
TOPICS table: id (auto incremented), topic column.
USERS table: username, email, first and last name and a few other unrelated columns
POSTS table: id (auto incremented), post-title, post-body, topic-id, user-id, post-date, updated-date, updated-by(which will contain the user-id of the person who made the most recent comment)
COMMENTS table: id (auto incremented), post-id, user-id and comment
On the main forum page I would like to have:
a list of all of the topics
the number of posts for each topic
the last updated post, and who updated it
the most recently updated topic to be on top, most likely an "ORDER BY updated-date"
Here is the query I have so far:
SELECT topics.id AS topic-id,
topics.topic,
post-user.id AS user-id,
CONCAT_WS(' ', post-user.first-name, post-user.last-name) AS name,
recent-post.id AS post-id,
post-num.post-total,
recent-post.title AS post-title,
recent-post.update_date AS updated-date,
recent-post.updated-by AS updated-by
FROM topics
JOIN (SELECT posts.topic-id,
COUNT(*) AS post-total
FROM POSTS
WHERE posts.topic-id = topic-id
GROUP BY posts.topic-id) AS post-num ON topics.id = post-num.topic-id
JOIN (SELECT posts.*
FROM posts
ORDER BY posts.update-date DESC) AS recent-post ON topics.id = recent-post.topic-id
JOIN (SELECT users.*,
posts.user-id
FROM users, posts
WHERE posts.user-id = users.id) as post-user ON recent-post.user_id = post-user.id
GROUP BY topics.id
This query almost works as it will get all of information for topics that have posts. But it doesn't return the topics that don't have any posts.
I'm sure that the query is inefficient and wrong since it makes two sub-selects to the posts table, but it was the only way I could get to the point I'm at.
A dash is not a valid character in unquoted SQL identifiers (it is parsed as a minus sign), but you can use "_" instead.
You don't necessarily have to get everything from a single SQL query. In fact, trying to do so makes it harder to code, and also sometimes makes it harder for the SQL optimizer to execute.
ORDER BY in a subquery has no reliable effect unless it is combined with LIMIT.
Name your primary key columns topic_id, user_id, and so on (instead of "id" in every table), and you won't have to alias them in the select-list.
Here's how I would solve this:
First get the most recent post per topic, with associated user information:
SELECT t.topic_id, t.topic,
u.user_id, CONCAT_WS(' ', u.first_name, u.last_name) AS full_name,
p.post_id, p.title, p.update_date, p.updated_by
FROM topics t
INNER JOIN
(posts p INNER JOIN users u ON (p.updated_by = u.user_id))
ON (t.topic_id = p.topic_id)
LEFT OUTER JOIN posts p2
ON (p.topic_id = p2.topic_id AND p.update_date < p2.update_date)
WHERE p2.post_id IS NULL;
Then get the counts of posts per topic in a separate, simpler query.
SELECT t.topic_id, COUNT(p.post_id) AS post_total
FROM topics t LEFT OUTER JOIN posts p USING (topic_id)
GROUP BY t.topic_id;
Merge the two data sets in your application.
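As a rough sketch of that merge step, here are the two queries run on SQLite from Python and combined in a dictionary. The sample rows and the overview structure are assumptions for illustration; SQLite's || operator stands in for CONCAT_WS, and the count uses p.post_id so that a topic with no posts reports 0.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, topic TEXT);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        topic_id INTEGER,
        title TEXT,
        update_date TEXT,
        updated_by INTEGER
    );
    INSERT INTO topics VALUES (1, 'News'), (2, 'Recipes'), (3, 'Empty');
    INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO posts VALUES
        (1, 1, 'Hello', '2024-01-01', 1),
        (2, 1, 'Update', '2024-02-01', 2),
        (3, 2, 'Soup', '2024-01-15', 1);
""")

# Query 1: most recent post per topic. A post is the latest one when no
# other post (p2) in the same topic has a later update_date.
latest = conn.execute(
    """
    SELECT t.topic_id, p.title, p.update_date,
           u.first_name || ' ' || u.last_name AS full_name
    FROM topics AS t
    INNER JOIN posts AS p ON t.topic_id = p.topic_id
    INNER JOIN users AS u ON p.updated_by = u.user_id
    LEFT OUTER JOIN posts AS p2
        ON p.topic_id = p2.topic_id AND p.update_date < p2.update_date
    WHERE p2.post_id IS NULL
    """
).fetchall()

# Query 2: number of posts per topic, including topics with none.
counts = conn.execute(
    """
    SELECT t.topic_id, t.topic, COUNT(p.post_id) AS post_total
    FROM topics AS t
    LEFT OUTER JOIN posts AS p ON p.topic_id = t.topic_id
    GROUP BY t.topic_id, t.topic
    """
).fetchall()

# Merge in the application: index the latest posts by topic_id, then
# walk the counts so that every topic appears exactly once.
latest_by_topic = {row[0]: row[1:] for row in latest}
overview = []
for topic_id, topic, post_total in counts:
    title, update_date, updated_by = latest_by_topic.get(topic_id, (None, None, None))
    overview.append({
        "topic": topic,
        "post_total": post_total,
        "title": title,
        "update_date": update_date,
        "updated_by": updated_by,
    })

# Most recently updated topic first; topics without posts sort last.
overview.sort(key=lambda row: row["update_date"] or "", reverse=True)
for row in overview:
    print(row)
```

If two posts in a topic share the same update_date, the first query returns both, and this merge simply keeps whichever comes last; a tie-breaker on post_id in the join condition would make that deterministic.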
To ensure you get results for topics without posts, you'll need to use LEFT JOIN instead of JOIN for the first join between topics and the next table. LEFT JOIN means "always return a result set row for every row in the left table, even if there's no match with the right table."
Gotta go now, but I'll try to look at the efficiency issues later.
This is a very complicated query. You should note that JOIN statements will limit your topics to those that have posts. If a topic does not have a post, a JOIN statement will filter it out.
Try the following query.
SELECT *
FROM
(
SELECT T.Topic,
COUNT(AllTopicPosts.ID) NumberOfPosts,
MAX(IFNULL(MostRecentPost.Post_Title, '')) MostRecentPostTitle,
MAX(IFNULL(MostRecentPostUser.UserName, '')) MostRecentPostUser,
MAX(IFNULL(MostRecentPost.Updated_Date, '')) MostRecentPostDate
FROM TOPICS T
LEFT JOIN POSTS AllTopicPosts ON AllTopicPosts.Topic_Id = T.ID
LEFT JOIN POSTS MostRecentPost ON MostRecentPost.ID =
(
SELECT P.ID
FROM POSTS P
WHERE P.Topic_Id = T.ID
ORDER BY P.Updated_Date DESC
LIMIT 1
)
LEFT JOIN USERS MostRecentPostUser ON MostRecentPostUser.ID = MostRecentPost.User_Id
GROUP BY T.Topic
) TopicSummary
ORDER BY MostRecentPostDate DESC
I'd use a left join inside a subquery to pull back the correct topic, and then you can do a little legwork outside of that to get some of the user info.
select
s.topic_id,
s.topic,
u.user_id as last_updated_by_id,
u.user_name as last_updated_by,
s.last_post,
s.post_count
from
(
select
t.id as topic_id,
t.topic,
t.user_id as orig_poster,
max(coalesce(p.post_date, t.post_date)) as last_post,
count(*) as post_count -- would be p.post_id if you don't want to count the topic
from
topics t
left join posts p on
t.id = p.topic_id
group by
t.id,
t.topic,
t.user_id
) s
left join posts p on
s.topic_id = p.topic_id
and s.last_post = p.post_date
and s.post_count > 1 -- 0 if you're using p.post_id up top
inner join users u on
u.id = coalesce(p.user_id, s.orig_poster)
order by
s.last_post desc
This query introduces coalesce and left join, both very good concepts to look into. For two arguments (as used here), you can also use ifnull in MySQL, since it is functionally equivalent.
Keep in mind that ifnull is not standard SQL (if you need to port this code). Other databases have their own functions for that (isnull in SQL Server, nvl in Oracle, etc.). I used coalesce so that I could keep this query all ANSI-fied.
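To see what coalesce does together with a left join, here is a minimal sketch on SQLite from Python. The table contents are made up, and the query is a cut-down version of the inner subquery above, not the full forum query.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, topic TEXT, post_date TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, post_date TEXT);
    INSERT INTO topics VALUES
        (1, 'News', '2024-01-01'),
        (2, 'Quiet', '2024-01-05');
    INSERT INTO posts VALUES
        (1, 1, '2024-01-10'),
        (2, 1, '2024-01-20');
""")

# For a topic without posts the LEFT JOIN yields p.post_date = NULL, and
# COALESCE falls back to the topic's own post_date.
summary = conn.execute(
    """
    SELECT t.id, t.topic,
           MAX(COALESCE(p.post_date, t.post_date)) AS last_post,
           COUNT(p.id) AS post_count
    FROM topics AS t
    LEFT JOIN posts AS p ON t.id = p.topic_id
    GROUP BY t.id, t.topic
    ORDER BY last_post DESC
    """
).fetchall()
print(summary)
```

The 'Quiet' topic has no posts, so its last_post is its own creation date and its post_count is 0, while 'News' reports the date of its newest post.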