MySQL query giving duplicate results - php

I am querying four tables (activities, notes, categories, user_entries) looking for matches that are EITHER activities OR notes. In my development database, 8 activities and 2 notes match, so I expect 10 results; instead I get 16. Each activity is duplicated: one result includes the first note, the other includes the second.
This is the query as it stands:
SELECT a.`aid`, a.`activityname`, a.`date`, u.`points`, u.`enid`, c.`catname`, n.`nid`, n.`notename`, n.`dates`
FROM activities a
INNER JOIN user_entries u ON a.`aid` = u.`aid_FK`
INNER JOIN categories c ON a.`category` = c.`cat`
INNER JOIN notes n ON u.`mem_no_FK` = n.`mem_no_FK`
WHERE u.`mem_no_FK` = 1995
GROUP BY a.`aid`, a.`activityname`, a.`date`, u.`points`, u.`enid`, c.`catname`, n.`nid`, n.`notename`, n.`dates`
ORDER BY `date`
I have looked at quite a few similar questions here (especially question 7696248), but nothing I've tried has worked. Despite trying many suggestions (with/without DISTINCT, with/without GROUP BY, many join types, etc.) I get the same result each time, so there is something basically wrong with the query. My knowledge of SQL is not good and I don't know how to fix it.
Can anyone help?
EDIT: FOR CLARITY
The database is a record of continuing professional development activities. Notes (confusingly named, but not by me!) are not linked to activities: they are there for users to record significant events, such as "No CPD done for 3 months due to road accident".
I need to list the notes and the activities on the same page: if it's possible, I'd prefer to run one DB query; if it isn't, I'll have to run two queries and amalgamate the two results arrays.
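Since notes are not linked to activities, joining them multiplies the rows (every activity pairs with every note). One way to get both lists from a single query is UNION ALL. The sketch below is only an illustration: it uses Python's sqlite3 as a stand-in for MySQL, the sample rows are made up, and column types and aliases such as kind and entry_date are assumptions, not part of the original schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (aid INTEGER PRIMARY KEY, activityname TEXT, date TEXT, category INTEGER);
CREATE TABLE categories (cat INTEGER PRIMARY KEY, catname TEXT);
CREATE TABLE user_entries (enid INTEGER PRIMARY KEY, aid_FK INTEGER, mem_no_FK INTEGER, points INTEGER);
CREATE TABLE notes (nid INTEGER PRIMARY KEY, notename TEXT, dates TEXT, mem_no_FK INTEGER);
-- Made-up sample data: 3 activities and 2 notes for member 1995.
INSERT INTO categories VALUES (1, 'Course');
INSERT INTO activities VALUES (1, 'First aid', '2015-01-10', 1),
                              (2, 'Ethics seminar', '2015-03-02', 1),
                              (3, 'Audit workshop', '2015-05-20', 1);
INSERT INTO user_entries VALUES (1, 1, 1995, 5), (2, 2, 1995, 3), (3, 3, 1995, 2);
INSERT INTO notes VALUES (1, 'Road accident', '2015-02-01', 1995),
                         (2, 'Maternity leave', '2015-04-01', 1995);
""")

# Joining notes on the member number pairs every activity with every note.
joined_rows = conn.execute("""
SELECT a.aid, n.nid
FROM activities a
JOIN user_entries u ON a.aid = u.aid_FK
JOIN categories c ON a.category = c.cat
JOIN notes n ON u.mem_no_FK = n.mem_no_FK
WHERE u.mem_no_FK = 1995
""").fetchall()

# UNION ALL stacks the two lists instead; columns a note lacks are NULL.
union_rows = conn.execute("""
SELECT 'activity' AS kind, a.aid AS id, a.activityname AS title,
       a.date AS entry_date, u.points, c.catname
FROM activities a
JOIN user_entries u ON a.aid = u.aid_FK
JOIN categories c ON a.category = c.cat
WHERE u.mem_no_FK = ?
UNION ALL
SELECT 'note', n.nid, n.notename, n.dates, NULL, NULL
FROM notes n
WHERE n.mem_no_FK = ?
ORDER BY entry_date
""", (1995, 1995)).fetchall()

print(len(joined_rows), len(union_rows))
for row in union_rows:
    print(row)
```

The kind column lets the page tell the two row types apart while still sorting everything by date in one pass.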

Please try DISTINCT for aid
SELECT DISTINCT(a.`aid`), a.`activityname`, a.`date`, u.`points`, u.`enid`, c.`catname`, n.`nid`, n.`notename`, n.`dates`
FROM activities a
INNER JOIN user_entries u ON a.`aid` = u.`aid_FK`
INNER JOIN categories c ON a.`category` = c.`cat`
INNER JOIN notes n ON u.`mem_no_FK` = n.`mem_no_FK`
WHERE u.`mem_no_FK` = 1995
GROUP BY a.`aid`, a.`activityname`, a.`date`, u.`points`, u.`enid`, c.`catname`, n.`nid`, n.`notename`, n.`dates`
ORDER BY `date`

If you want all the notes in one column, you can use GROUP_CONCAT or another aggregate function:
SELECT a.`aid`, a.`activityname`, a.`date`, u.`points`, u.`enid`, c.`catname`, GROUP_CONCAT(n.`notename` SEPARATOR ' ') as notes
FROM activities a
JOIN user_entries u ON a.`aid` = u.`aid_FK`
JOIN categories c ON a.`category` = c.`cat`
JOIN notes n ON u.`mem_no_FK` = n.`mem_no_FK`
WHERE u.`mem_no_FK` = 1995
GROUP BY a.`aid`
ORDER BY `date`

Related

Left join and count in same query return incorrect result

The problem is that when a blog has 0 comments or 1 comment, the count shows 1 in both cases. The rest works fine: 2, 3, etc. are counted correctly.
$sql = "SELECT blog.*,count(blog.id) as Total FROM blog left JOIN comment on comment.id = blog.id GROUP BY date desc";
Your query should look like this:
SELECT b.date, count(c.id) as Total
FROM blog b LEFT JOIN
comment c
ON c.id = b.id
GROUP BY b.date DESC;
This assumes that date comes from blog (which should be the case if your current query is working). The difference is that you are counting from the second table, not the first.
This does not use * for columns from blog. That is usually a very, very bad idea when using GROUP BY. The best practice (enforced by almost all SQL engines) is to only include unaggregated columns in the SELECT when they are in the GROUP BY.
Note: It seems very awkward that the same column id is used for the JOIN between two very different entities (blogs and comments).
I just changed count(blog.id) to count(comment.id).
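The difference between counting the left table's column and the right table's column can be seen on a tiny data set. This is a rough sketch using Python's sqlite3 rather than MySQL; the comment.blog_id column and the sample rows are assumptions made for the example (the original joins on comment.id = blog.id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT);
-- blog_id is a hypothetical foreign key, used here for clarity.
CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, blog_id INTEGER, body TEXT);
INSERT INTO blog VALUES (1, 'no comments'), (2, 'one comment'), (3, 'two comments');
INSERT INTO comment VALUES (1, 2, 'a'), (2, 3, 'b'), (3, 3, 'c');
""")

# A LEFT JOIN keeps a blog row even when no comment matches, so COUNT(b.id)
# sees one row for blog 1. COUNT(c.comment_id) skips the NULL and gives 0.
counts = conn.execute("""
SELECT b.id,
       COUNT(b.id)         AS counted_from_blog,
       COUNT(c.comment_id) AS counted_from_comment
FROM blog b
LEFT JOIN comment c ON c.blog_id = b.id
GROUP BY b.id
ORDER BY b.id
""").fetchall()

print(counts)
```

COUNT(column) ignores NULLs, which is why counting a column from the optional side of the join reports zero for blogs without comments.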

Mysql query inner join on most recent date (today, yesterday, or before)

I'm attempting to pull the latest pricing data from a table on an Inner Join. Prices get updated throughout the day but aren't necessarily updated at midnight.
The following query works great when the price data has been updated by the end of the day. But how do I get yesterday's data if today's data is blank?
I'm indexing off of a column that is formatted like this date_itemnumber => 2015-05-22_12341234
SELECT h.*, collection.*, history.price
FROM collection
INNER JOIN h ON collection.itemid=h.id
INNER JOIN history ON collection.itemid=history.itemid
AND concat('2015-05-23_',collection.itemid)=history.date_itemid
WHERE h.description LIKE '%Awesome%'
Production Query time: .046 sec
To be clear, I want it to check for the most up-to-date record for that item, regardless of whether it is from today, yesterday or before that.
SQLFiddle1
The following query gives me the desired results but with my production dataset it takes over 3 minutes to return results. As my dataset gets larger, it would take longer. So this can't be the most efficient way to do this.
SELECT h.*, collection.*, history.price
FROM collection
INNER JOIN h ON collection.itemid=h.id
INNER JOIN history ON collection.itemid=history.itemid
AND (select history.date_itemid from history WHERE itemid=collection.itemid GROUP BY date_itemid DESC LIMIT 1)=history.date_itemid
WHERE h.description LIKE '%Awesome%'
Production Query time: 181.140 sec
SQLFiddle2
SELECT x.*
FROM history x
JOIN
( SELECT itemid
, MAX(date_itemid) max_date_itemid
FROM history
-- optional JOINS and WHERE here --
GROUP
BY itemid
) y
ON y.itemid = x.itemid
AND y.max_date_itemid = x.date_itemid;
http://sqlfiddle.com/#!9/975f5/13
This should work:
SELECT h.*, collection.*, history.price
FROM collection
INNER JOIN h ON collection.itemid=h.id
INNER JOIN(
SELECT a.*
FROM history a
INNER JOIN
( SELECT itemid,MAX(date_itemid) max_date_itemid
FROM history
GROUP BY itemid
) b ON b.itemid = a.itemid AND b.max_date_itemid = a.date_itemid
) AS history ON history.itemid = collection.itemid
WHERE h.description LIKE '%Awesome%'
I don't know whether this takes a lot of execution time. Please do try it: since you probably have more data in your tables, it will be a good test of the query's execution time.
This is actually a fairly common problem in SQL, at least I feel like I run into it a lot. What you want to do is join a one to many table, but only join to the latest or oldest record in that table.
The trick to this is to do a self LEFT join on the table with many records, specifying the foreign key and also that the id should be greater or less than the other records' ids (or dates or whatever you're using). Then in the WHERE conditions, you just add a condition that the left joined table has a NULL id - it wasn't able to be joined with a more recent record because it was the latest.
In your case the SQL should look something like this:
SELECT h.*, collection.*, history.price
FROM collection
INNER JOIN h ON collection.itemid=h.id
INNER JOIN history ON collection.itemid=history.itemid
-- left join history table again
LEFT JOIN history AS history2 ON history.itemid = history2.itemid AND history2.id > history.id
-- filter left join results to the most recent record
WHERE history2.id IS NULL
AND h.description LIKE '%Awesome%'
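To make the self LEFT JOIN trick concrete, here is a small sketch that runs it next to the MAX()-per-group join on the same data. It uses Python's sqlite3 as a stand-in for MySQL, and the history rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (id INTEGER PRIMARY KEY, itemid INTEGER, date_itemid TEXT, price REAL);
-- Made-up rows: item 10 has two prices, item 20 only an older one.
INSERT INTO history VALUES (1, 10, '2015-05-21_10', 1.0),
                           (2, 10, '2015-05-22_10', 1.5),
                           (3, 20, '2015-05-20_20', 7.0);
""")

# Self LEFT JOIN: keep a row only if no newer row exists for the same item.
latest_by_antijoin = conn.execute("""
SELECT h1.itemid, h1.price
FROM history h1
LEFT JOIN history h2
       ON h2.itemid = h1.itemid AND h2.id > h1.id
WHERE h2.id IS NULL
ORDER BY h1.itemid
""").fetchall()

# Same result via a derived table holding the newest date_itemid per item.
latest_by_max = conn.execute("""
SELECT x.itemid, x.price
FROM history x
JOIN (SELECT itemid, MAX(date_itemid) AS max_date_itemid
      FROM history
      GROUP BY itemid) y
  ON y.itemid = x.itemid AND y.max_date_itemid = x.date_itemid
ORDER BY x.itemid
""").fetchall()

print(latest_by_antijoin)
print(latest_by_max)
```

Both forms agree here because id and date_itemid increase together in the sample; on real data, which one is faster depends on the available indexes, so it is worth checking with EXPLAIN.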
This is another approach, which cuts out one inner join:
select h.*,his.date_itemid, his.price from history his
INNER JOIN h ON his.itemid=h.id
WHERE his.itemid IN (select itemid from collection) AND h.description LIKE '%Awesome%' and his.id IN (select max(id) from history group by history.itemid)
You can try it here: http://sqlfiddle.com/#!9/837a8/1
I am not sure if this is what you want, but I'll give it a try.
EDIT: modified
CREATE VIEW LatestDatesforIds
AS
SELECT
MAX(`history`.`date_itemid`) AS `lastPriceDate`,
MAX(`history`.`id`) AS `matchingId`
FROM `history`
GROUP BY `history`.`itemid`;
CREATE VIEW MatchDatesToPrices
AS
SELECT
`ldi`.`lastPriceDate` AS `lastPriceDate`,
`ldi`.`matchingId` AS `matchingId`,
`h`.`id` AS `id`,
`h`.`itemid` AS `itemid`,
`h`.`price` AS `price`,
`h`.`date_itemid` AS `date_itemid`
FROM (`LatestDatesforIds` `ldi`
JOIN `history` `h`
ON ((`ldi`.`matchingId` = `h`.`id`)));
SELECT c.itemid,price,lastpriceDate,description
FROM collection c
INNER JOIN MatchDatesToPrices mp
ON c.itemid = mp.itemid
INNER JOIN h ON c.itemid = h.id
Difficult to test the speed on such a small dataset but avoiding 'Group By' might speed things up. You could try conditionally joining the history table to itself instead of Grouping?
e.g.
SELECT h.*, c.*, h1.price
FROM h
INNER JOIN history h1 ON h1.itemid = h.id
LEFT OUTER JOIN history h2 ON h2.itemid = h.id
AND h1.date_itemid < h2.date_itemid
INNER JOIN collection c ON c.itemid = h.id
WHERE h2.id IS NULL
AND h.description LIKE '%Awesome%'
Changing this line
AND h1.date_itemid < h2.date_itemid
to compare on a sequential indexed field (preferably a unique one, e.g. id) should speed things up too.

Multiple joins in a MySQL query giving incorrect results

I've got a large MySQL query with 5 joins, which may not seem efficient, but I'm struggling to find a different solution that would work.
The views table is the main table here, because both the clicks and conversions tables rely on it via the token column (which is indexed and set as a foreign key in all tables).
The query:
SELECT
var.id,
var.disabled,
var.name,
var.updated,
var.cid,
var.outdated,
IF(var.type <> 0,'DL','LP') AS `type`,
COUNT(DISTINCT v.id) AS `views`,
COUNT(DISTINCT c.id) AS `clicks`,
COUNT(DISTINCT co.id) AS `conversions`,
SUM(tc.cost) AS `cost`,
SUM(cp.value) AS `revenue`
FROM variants AS var
LEFT JOIN views AS v ON v.vid = var.id
LEFT JOIN traffic_cost AS tc ON tc.id = v.source
LEFT JOIN clicks AS c ON c.token = v.token
LEFT JOIN conversions AS co ON co.token = v.token
LEFT JOIN c_profiles AS cp ON cp.id = co.profile
WHERE var.cid = 28
GROUP BY var.id
The results I'm getting are:
The problem is that the revenue and cost results are too high: for views, clicks and conversions only the distinct rows are counted, but for revenue and cost, for some reason (I would really appreciate an explanation here), all rows in all tables are taken into the result set.
I know this is a large query, but both the clicks and conversions tables rely on the views table, which is used for filtering the results, e.g. views.country = 'uk'. I've tried doing 3 queries and merging them, but that didn't work (it gave me wrong results).
One more thing that I find weird is that if I remove the joins with clicks, conversions and c_profiles, the cost column shows correct results.
Any help would be appreciated.
In the end I had to use 3 different queries and do a merge on them. Seemed like an overhead, but worked for me.
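A likely explanation for the inflated sums is join fan-out: each view row is repeated once per matching click and once per matching conversion, so SUM() adds the same cost and value several times, while COUNT(DISTINCT ...) hides the repetition. The sketch below reproduces this on invented data with Python's sqlite3 (standing in for MySQL) and shows one possible single-query alternative that aggregates clicks and conversions per token before joining. It assumes each token belongs to exactly one view; the table contents and the aliases ck and cv are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variants (id INTEGER PRIMARY KEY);
CREATE TABLE views (id INTEGER PRIMARY KEY, vid INTEGER, token TEXT, source INTEGER);
CREATE TABLE traffic_cost (id INTEGER PRIMARY KEY, cost REAL);
CREATE TABLE clicks (id INTEGER PRIMARY KEY, token TEXT);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, token TEXT, profile INTEGER);
CREATE TABLE c_profiles (id INTEGER PRIMARY KEY, value REAL);
-- One view costing 2.0, with 3 clicks and 2 conversions worth 10.0 each.
INSERT INTO variants VALUES (1);
INSERT INTO views VALUES (1, 1, 't1', 1);
INSERT INTO traffic_cost VALUES (1, 2.0);
INSERT INTO clicks VALUES (1, 't1'), (2, 't1'), (3, 't1');
INSERT INTO c_profiles VALUES (1, 10.0);
INSERT INTO conversions VALUES (1, 't1', 1), (2, 't1', 1);
""")

# Original shape: 1 view x 3 clicks x 2 conversions = 6 joined rows,
# so the single cost is summed 6 times and each conversion value 3 times.
naive = conn.execute("""
SELECT var.id,
       COUNT(DISTINCT v.id), COUNT(DISTINCT c.id), COUNT(DISTINCT co.id),
       SUM(tc.cost), SUM(cp.value)
FROM variants var
LEFT JOIN views v ON v.vid = var.id
LEFT JOIN traffic_cost tc ON tc.id = v.source
LEFT JOIN clicks c ON c.token = v.token
LEFT JOIN conversions co ON co.token = v.token
LEFT JOIN c_profiles cp ON cp.id = co.profile
GROUP BY var.id
""").fetchone()

# Pre-aggregating per token keeps one joined row per view.
preaggregated = conn.execute("""
SELECT var.id,
       COUNT(v.id),
       COALESCE(SUM(ck.n), 0),
       COALESCE(SUM(cv.n), 0),
       SUM(tc.cost),
       COALESCE(SUM(cv.revenue), 0)
FROM variants var
LEFT JOIN views v ON v.vid = var.id
LEFT JOIN traffic_cost tc ON tc.id = v.source
LEFT JOIN (SELECT token, COUNT(*) AS n
           FROM clicks GROUP BY token) ck ON ck.token = v.token
LEFT JOIN (SELECT co.token, COUNT(*) AS n, SUM(cp.value) AS revenue
           FROM conversions co
           LEFT JOIN c_profiles cp ON cp.id = co.profile
           GROUP BY co.token) cv ON cv.token = v.token
GROUP BY var.id
""").fetchone()

print(naive)
print(preaggregated)
```

This also fits the observation in the question that the cost column becomes correct once the clicks, conversions and c_profiles joins are removed: without them there is nothing left to repeat the view rows.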

how to do this complex fetch in 1 query?

I have an application with tutors and courses and subscribers and ratings. These are the tables I am using:
tbl_tutors:
id
name
tbl_subscribers:
id
user_id
course_id
tbl_courses:
id
name
tutor_id
tbl_ratings:
id
user_id
course_id
rating
I need to get 1 tutor with the number of courses he has, the number of total subscribers for those courses and the average course rating for all his courses. This is a lot of data; can it be done in 1 sql query or do I need to code foreach statements in php to get the average ratings and the total subscribers for those courses?
Well do you need totals per tutor-course combination or a total (and average) at the tutor level?
And what is the rating table adding over the subscriber table? Aren't they both unique user-course combinations?
If one user attends multiple courses by the same tutor, how many subscribers do they count as?
The SQL provided by @alfasin is easily extended to all tutors. The syntax below is for SQL Server; you may need to adapt it for MySQL.
Select t.name, count(distinct c.id) courseCount, count(s.id) subscribers, avg(r.rating) subRating
From tbl_tutors t
Inner join tbl_courses c on c.tutor_id = t.id
Inner join tbl_subscribers s on s.course_id = c.id
Inner join tbl_ratings r on r.user_id = s.user_id and r.course_id = c.id
Group by t.name
Note that when building queries like this, it's usually best to write them without grouping first, so you can inspect which rows contribute to the counts, check that everything you expect is included, and make sure no rows are duplicated.
select t.name "Tutor", count(distinct c.id) "# courses", count(distinct s.id) "# subscribers"
from tbl_tutors t, tbl_subscribers s, tbl_courses c, tbl_ratings r
where t.id = XXX
and c.tutor_id = t.id
and s.course_id = c.id
and r.user_id = s.user_id
and r.course_id = c.id -- match the rating to the same course, not just the same user
group by t.name
This SQL will get you everything you need except the course average (substitute XXX with the id of the tutor you want to find). For the course average you can run a separate select.
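If all three figures are wanted from a single statement, one option is to compute each one in its own scalar subquery, so that the subscriber and rating rows never multiply each other. This is only a sketch under assumptions: it runs on Python's sqlite3 rather than MySQL, the rows are invented, and it counts subscriptions (a user on two courses counts twice), which may or may not be the definition the question intends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_tutors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tbl_courses (id INTEGER PRIMARY KEY, name TEXT, tutor_id INTEGER);
CREATE TABLE tbl_subscribers (id INTEGER PRIMARY KEY, user_id INTEGER, course_id INTEGER);
CREATE TABLE tbl_ratings (id INTEGER PRIMARY KEY, user_id INTEGER, course_id INTEGER, rating INTEGER);
-- Made-up data: Ann teaches two courses, Bob one.
INSERT INTO tbl_tutors VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO tbl_courses VALUES (1, 'SQL', 1), (2, 'PHP', 1), (3, 'Go', 2);
INSERT INTO tbl_subscribers VALUES (1, 100, 1), (2, 101, 1), (3, 100, 2), (4, 102, 3);
INSERT INTO tbl_ratings VALUES (1, 100, 1, 4), (2, 101, 1, 5), (3, 100, 2, 3), (4, 102, 3, 1);
""")

# Each figure is aggregated independently, so no join fan-out can inflate it.
tutor_summary = conn.execute("""
SELECT t.name,
       (SELECT COUNT(*) FROM tbl_courses c
         WHERE c.tutor_id = t.id) AS course_count,
       (SELECT COUNT(*) FROM tbl_subscribers s
          JOIN tbl_courses c ON c.id = s.course_id
         WHERE c.tutor_id = t.id) AS subscriber_count,
       (SELECT AVG(r.rating) FROM tbl_ratings r
          JOIN tbl_courses c ON c.id = r.course_id
         WHERE c.tutor_id = t.id) AS avg_rating
FROM tbl_tutors t
WHERE t.id = ?
""", (1,)).fetchone()

print(tutor_summary)
```

Dropping the WHERE clause would return the same summary for every tutor, which avoids looping over tutors in PHP.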

Need help with a multiple table query in mysql

I'm working on building a forum with kohana. I know there is already good, free, forum software out there, but it's for a family site, so I thought I'd use it as a learning experience. I'm also not using the ORM that is built into Kohana, as I would like to learn more about SQL in the process of building the forum.
For my forum I have 4 main tables:
USERS
TOPICS
POSTS
COMMENTS
TOPICS table: id (auto incremented), topic column.
USERS table: username, email, first and last name and a few other unrelated columns
POSTS table: id (auto incremented), post-title, post-body, topic-id, user-id, post-date, updated-date, updated-by(which will contain the user-id of the person who made the most recent comment)
COMMENTS table: id (auto incremented), post-id, user-id and comment
On the main forum page I would like to have:
a list of all of the topics
the number of posts for each topic
the last updated post, and who updated it
the most recently updated topic to be on top, most likely an "ORDER BY updated-date"
Here is the query I have so far:
SELECT topics.id AS topic-id,
topics.topic,
post-user.id AS user-id,
CONCAT_WS(' ', post-user.first-name, post-user.last-name) AS name,
recent-post.id AS post-id,
post-num.post-total,
recent-post.title AS post-title,
recent-post.update_date AS updated-date,
recent-post.updated-by AS updated-by
FROM topics
JOIN (SELECT posts.topic-id,
COUNT(*) AS post-total
FROM POSTS
WHERE posts.topic-id = topic-id
GROUP BY posts.topic-id) AS post-num ON topics.id = post-num.topic-id
JOIN (SELECT posts.*
FROM posts
ORDER BY posts.update-date DESC) AS recent-post ON topics.id = recent-post.topic-id
JOIN (SELECT users.*,
posts.user-id
FROM users, posts
WHERE posts.user-id = users.id) as post-user ON recent-post.user_id = post-user.id
GROUP BY topics.id
This query almost works as it will get all of information for topics that have posts. But it doesn't return the topics that don't have any posts.
I'm sure that the query is inefficient and wrong since it makes two sub-selects to the posts table, but it was the only way I could get to the point I'm at.
Dash is not a valid character in SQL identifiers, but you can use "_" instead.
You don't necessarily have to get everything from a single SQL query. In fact, trying to do so makes it harder to code, and also sometimes makes it harder for the SQL optimizer to execute.
It makes no sense to use ORDER BY in a subquery.
Name your primary key columns topic_id, user_id, and so on (instead of "id" in every table), and you won't have to alias them in the select-list.
Here's how I would solve this:
First get the most recent post per topic, with associated user information:
SELECT t.topic_id, t.topic,
u.user_id, CONCAT_WS(' ', u.first_name, u.last_name) AS full_name,
p.post_id, p.title, p.update_date, p.updated_by
FROM topics t
INNER JOIN
(posts p INNER JOIN users u ON (p.updated_by = u.user_id))
ON (t.topic_id = p.topic_id)
LEFT OUTER JOIN posts p2
ON (p.topic_id = p2.topic_id AND p.update_date < p2.update_date)
WHERE p2.post_id IS NULL;
Then get the counts of posts per topic in a separate, simpler query.
SELECT t.topic_id, COUNT(p.post_id) AS post_total
FROM topics t LEFT OUTER JOIN posts p USING (topic_id)
GROUP BY t.topic_id;
Merge the two data sets in your application.
To ensure you get results for topics without posts, you'll need to use LEFT JOIN instead of JOIN for the first join between topics and the next table. LEFT JOIN means "always return a result set row for every row in the left table, even if there's no match with the right table."
Gotta go now, but I'll try to look at the efficiency issues later.
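The effect of LEFT JOIN on topics without posts can be checked with a few rows. The sketch below uses Python's sqlite3 as a stand-in for MySQL; the column names follow the underscore naming suggested in the earlier answer (topic_id, post_id, update_date) and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, topic TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, topic_id INTEGER, update_date TEXT);
-- Made-up data: the 'Empty' topic has no posts at all.
INSERT INTO topics VALUES (1, 'General'), (2, 'Recipes'), (3, 'Empty');
INSERT INTO posts VALUES (1, 1, '2020-01-01'), (2, 1, '2020-03-01'), (3, 2, '2020-02-01');
""")

# LEFT JOIN keeps every topic; COUNT(p.post_id) ignores the NULL produced
# for a topic with no posts, so that topic reports 0 rather than 1.
topic_list = conn.execute("""
SELECT t.topic_id, t.topic,
       COUNT(p.post_id)   AS post_total,
       MAX(p.update_date) AS last_update
FROM topics t
LEFT JOIN posts p ON p.topic_id = t.topic_id
GROUP BY t.topic_id, t.topic
ORDER BY last_update DESC
""").fetchall()

for row in topic_list:
    print(row)
```

With a plain JOIN in place of LEFT JOIN, the 'Empty' topic would disappear from the list, which is the behaviour described in the question.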
This is a very complicated query. You should note that JOIN statements will limit your topics to those that have posts. If a topic does not have a post, a JOIN statement will filter it out.
Try the following query.
SELECT *
FROM
(
SELECT T.Topic,
COUNT(AllTopicPosts.ID) NumberOfPosts,
MAX(IFNULL(MostRecentPost.Post_Title, '')) MostRecentPostTitle,
MAX(IFNULL(MostRecentPostUser.UserName, '')) MostRecentPostUser,
MAX(IFNULL(MostRecentPost.Updated_Date, '')) MostRecentPostDate
FROM TOPICS T
LEFT JOIN POSTS AllTopicPosts ON AllTopicPosts.Topic_Id = T.ID
-- a derived table cannot refer to T, so the newest post is matched in the ON clause instead
LEFT JOIN POSTS MostRecentPost ON MostRecentPost.Topic_Id = T.ID
AND MostRecentPost.Updated_Date = (SELECT MAX(P.Updated_Date) FROM POSTS P WHERE P.Topic_Id = T.ID)
LEFT JOIN USERS MostRecentPostUser ON MostRecentPostUser.ID = MostRecentPost.User_Id
GROUP BY T.Topic
) TopicSummary
ORDER BY MostRecentPostDate DESC
I'd use a left join inside a subquery to pull back the correct topic, and then you can do a little legwork outside of that to get some of the user info.
select
s.topic_id,
s.topic,
u.id as last_updated_by_id,
u.username as last_updated_by,
s.last_post,
s.post_count
from
(
select
t.id as topic_id,
t.topic,
t.user_id as orig_poster,
max(coalesce(p.post_date, t.post_date)) as last_post,
count(*) as post_count -- use count(p.post_id) if you don't want to count the topic itself
from
topics t
left join posts p on
t.id = p.topic_id
group by
t.id,
t.topic,
t.user_id
) s
left join posts p on
s.topic_id = p.topic_id
and s.last_post = p.post_date
and s.post_count > 1 -- 0 if you're using p.post_id up top
inner join users u on
u.id = coalesce(p.user_id, s.orig_poster)
order by
s.last_post desc
This query does introduce coalesce and left join, and they are very good concepts to look into. For two arguments (as used here), you can also use ifnull in MySQL, since it is functionally equivalent.
Keep in mind that ifnull is not standard SQL (if you need to port this code). Other databases have their own functions for that (isnull in SQL Server, nvl in Oracle, etc.). I used coalesce so that I could keep this query all ANSI-fied.
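As a quick check of how the two functions behave, the snippet below evaluates them through Python's sqlite3. SQLite happens to provide both coalesce and a two-argument ifnull, so it serves as a stand-in for MySQL here; the literal values are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# coalesce returns its first non-NULL argument and accepts any number of them;
# ifnull is the two-argument form.
null_handling = conn.execute("""
SELECT COALESCE(NULL, 'fallback'),
       IFNULL(NULL, 'fallback'),
       COALESCE(NULL, NULL, 'third')
""").fetchone()

print(null_handling)
```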
