Matching interests (nearest neighbour) search in SQL - php

I'm trying to find users with a similar set of interests, with the following schema:
USERS - ID name etc
Interests - ID UID PID
where ID is the unique ID for Interests, UID is the user ID and PID is a product ID. I have looked at other similar questions on SO, but none of them had an exact answer.
Example: let's say I'm interested in getting users with interests similar to John's, and this is how the two tables look:
USERS:
ID  Name
11  John
12  Mary
13  Scott
14  Tim
Interests:
ID  UID  PID
3   12   123
4   12   231
5   12   612
6   13   123
7   13   612
8   14   931
9   14   214
10  11   123
11  11   231
12  11   781
13  11   612
I would like a result with in that order.
I was thinking of doing a set intersection of the user I'm interested in with all other users. It doesn't sound like a very good solution, because it will have to be redone every time a user adds an interest or another user is added. It's a small project, and as of now I'll be limiting users to 100. I still think that the above approach will not be efficient at all, as it will take 100^2 time.
Can someone guide me in the right direction? What are the possible solutions, and which one will be the best with above given constraints. I'm looking at ANN to see if I can use that.

This starts by counting the number of interests that each user has in common with John. The approach is to take all of John's interests, join back to the interests table and aggregate to get the count of common interests. Here is the SQL for that:
select i.uid, COUNT(*) as cnt
from (select i.*
      from interests i join
           users u
           on i.uid = u.id
      where u.name = 'John'
     ) ilist join
     interests i
     on ilist.pid = i.pid and
        ilist.uid <> i.uid -- forget about John
group by i.uid
But, you actually want the list of products, rather than just the count. So, you have to join back to the interests table:
select i.*
from (select i.uid, COUNT(*) as cnt
      from (select i.*
            from interests i join
                 users u
                 on i.uid = u.id
            where u.name = 'John'
           ) ilist join
           interests i
           on ilist.pid = i.pid and
              ilist.uid <> i.uid -- forget about John
      group by i.uid
     ) t join
     interests i
     on t.uid = i.uid
order by t.cnt desc, i.uid
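To try the counting approach on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the function name similar_users and the extra join to users (to show names instead of IDs) are assumptions made for illustration, not part of the original answer. It also assumes each (uid, pid) pair appears at most once in interests.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, uid INTEGER, pid INTEGER);
    INSERT INTO users VALUES (11,'John'),(12,'Mary'),(13,'Scott'),(14,'Tim');
    INSERT INTO interests VALUES
        (3,12,123),(4,12,231),(5,12,612),
        (6,13,123),(7,13,612),
        (8,14,931),(9,14,214),
        (10,11,123),(11,11,231),(12,11,781),(13,11,612);
""")

def similar_users(conn, name):
    """Rank other users by the number of products they share with `name`."""
    sql = """
        SELECT u2.name, COUNT(*) AS cnt
        FROM users u
        JOIN interests mine   ON mine.uid = u.id
        JOIN interests theirs ON theirs.pid = mine.pid
                             AND theirs.uid <> u.id   -- skip the user themselves
        JOIN users u2         ON u2.id = theirs.uid
        WHERE u.name = ?
        GROUP BY u2.id, u2.name
        ORDER BY cnt DESC, u2.name
    """
    return conn.execute(sql, (name,)).fetchall()

ranked = similar_users(conn, "John")
print(ranked)  # [('Mary', 3), ('Scott', 2)]
```

Tim shares no product with John, so he does not appear at all. If users with zero overlap should be listed too, a LEFT JOIN from users would be needed instead.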

The following query finds other users who share at least 2 interests with user 11.
SELECT in2.UID FROM users u
INNER JOIN interests in1 ON (in1.UID = u.ID)
INNER JOIN interests in2 ON (in2.PID = in1.PID AND in2.UID <> u.ID)
WHERE u.ID = 11
GROUP BY in2.UID
HAVING COUNT(in2.UID) >= 2
ORDER BY COUNT(in2.UID) DESC
The ORDER BY ensures that the users with the most shared interests end up first. The HAVING COUNT(in2.UID) >= 2 makes sure that only users with at least 2 shared interests are returned.

Related

Mysql display random 4 users with more than 5 articles

I have a users table and an articles table. The articles table contains articles submitted by users. I am working on a SQL query to display 4 random users with more than 5 articles. user_id is stored in the articles table. I have searched around on Stack Overflow and Google, and even though there are some similar questions, I couldn't find anything specific to mine.
Can anyone let me know if this question has been answered before and give me a link if so? Otherwise, I have the following query:
SELECT *
FROM users WHERE type = 3
INNER JOIN articles ON
users.user_id = articles.user_id HAVING COUNT(user_id) > 5
This doesn't seem to work. I will appreciate any help to improve this query.
Database table is as follows:
USERS:
user_id
username
email
type
ARTICLES:
id
user_id
title
For example, total user count is 100. User with user_id 49 has 10 articles, and another user with user_id 50 has 20 articles and the rest of the users have less than 5 articles. So the query should return only the user 49 and 50.
Hope this makes sense.
regards
I've mocked up some table data to test my query. WHERE clauses must be positioned after JOINs. You are also a little ambiguous about the comparison of COUNT and 5: if you want more than 5, use >5; if you want 5 or more, use >=5.
SQL: (SQLFiddle Demo)
SELECT a.user_id,a.username,COUNT(b.user_id)
FROM users a
INNER JOIN articles b ON a.user_id=b.user_id
WHERE a.type=3
GROUP BY a.user_id
HAVING COUNT(b.user_id)>5
ORDER BY RAND()
LIMIT 4
You have the WHERE, JOIN and HAVING clauses in the wrong positions, and you are missing the GROUP BY that HAVING needs in order to work correctly. You also need ORDER BY RAND() and LIMIT 4 to pick 4 random users:
SELECT u.user_id
FROM users u
INNER JOIN articles a ON u.user_id = a.user_id
WHERE u.type = 3
GROUP BY u.user_id
HAVING COUNT(a.user_id) > 5
ORDER BY RAND() LIMIT 4
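The scenario from the question (100 users, of which only users 49 and 50 have more than 5 articles) can be reproduced with a small sketch using Python's sqlite3 module. The generated usernames, e-mail addresses, article titles and the function name random_prolific_users are made up for illustration. Note that SQLite spells the random function RANDOM(), whereas MySQL uses RAND().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT,
                        email TEXT, type INTEGER);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
""")

# 100 hypothetical users, all of type 3.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, 3)",
    [(uid, f"user{uid}", f"user{uid}@example.com") for uid in range(1, 101)],
)

# Everyone has 2 articles, except user 49 (10 articles) and user 50 (20).
article_counts = {uid: 2 for uid in range(1, 101)}
article_counts[49] = 10
article_counts[50] = 20
conn.executemany(
    "INSERT INTO articles (user_id, title) VALUES (?, ?)",
    [(uid, f"article {n}")
     for uid, count in article_counts.items()
     for n in range(count)],
)

def random_prolific_users(conn, min_articles=5, limit=4):
    """Return up to `limit` random type-3 users with more than `min_articles` articles."""
    sql = """
        SELECT u.user_id, u.username, COUNT(a.id) AS article_count
        FROM users u
        JOIN articles a ON a.user_id = u.user_id
        WHERE u.type = 3
        GROUP BY u.user_id, u.username
        HAVING COUNT(a.id) > ?
        ORDER BY RANDOM()   -- RAND() in MySQL
        LIMIT ?
    """
    return conn.execute(sql, (min_articles, limit)).fetchall()

picked = random_prolific_users(conn)
print(sorted(picked))
```

Only two users qualify here, so both are always returned; the random ordering only affects which of them comes first.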

CodeIgniter query to exclude a subset from results

I'm having some trouble figuring out how to write the proper query after doing a JOIN. I need to get all users in Group 1 while excluding a subset of these results.
Table users:
id name
1 John Smith
2 Joe Blow
3 Mary Jane
Table users_groups:
user_id group_id
1 1
1 3
1 4
2 1
2 4
2 5
3 1
3 6
Everyone in Group 6 will also be in Group 1, however, not everyone in Group 1 will be in Group 6. In other words, Group 6 is a sub-set of Group 1.
I need a query that will give me a list of all users who are in Group 1 (while excluding the users in Group 6). For the example above, I should get two results, John Smith and Joe Blow.
I'm using CodeIgniter v3
Here is my attempt (I removed the cache code for clarity)...
$this->db->from('users');
$this->db->select('
users.id AS `id`,
users.name AS `name`,
users_groups.group_id AS `group_id`
', FALSE);
$this->db->join('users_groups', 'users_groups.user_id = users.id', 'LEFT');
$this->db->group_by('users.email'); // remove duplication caused by JOIN
$this->db->where('users_groups.group_id = 1'); // get all users in Group 1
$this->db->where('users_groups.group_id <> 6'); // ignore all users in Group 6
return $this->db->get()->result_array();
The problem I'm having here is that I always get the full list of users in Group 1. The JOIN produces a list of all users and all groups, where the same user is listed multiple times, one entry for every Group that person belongs to. My query is removing the Group 6 entries, but this is no good since the same users are also in Group 1.
I just explained why my query is failing, but I still cannot figure out how to achieve success. How do I get the Group 1 users and then remove the subset of users that are in Groups 1 & 6? These users can also be in other Groups, but these should be ignored... I just want to exclude users who are in Groups 1 & 6 from the list of users in Group 1.
Each user in the result:
must be in Group 1
must not be in Group 6
may or may not be in any other Group
Any suggestions appreciated.
You need a "not exists" clause in there as a filter.
AND NOT EXISTS (SELECT 1 FROM users_groups x
WHERE x.user_id = users_groups.user_id AND x.group_id = 6)
I'm not familiar with CodeIgniter, but I'm sure this is doable.
Thanks to Philip's answer, it's working. This is how to do it within CodeIgniter...
$this->db->where('users_groups.group_id = 1'); // get all users in Group 1
$this->db->where('
NOT EXISTS (
SELECT 1 FROM users_groups x
WHERE x.user_id = users_groups.user_id AND group_id = 6
)
'); // exclude users in Group 6
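The NOT EXISTS filter can be checked against the sample tables from the question outside of CodeIgniter. The following is a rough sketch using Python's sqlite3 module; the function name members_excluding and its parameters are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_groups (user_id INTEGER, group_id INTEGER);
    INSERT INTO users VALUES (1,'John Smith'),(2,'Joe Blow'),(3,'Mary Jane');
    INSERT INTO users_groups VALUES
        (1,1),(1,3),(1,4),
        (2,1),(2,4),(2,5),
        (3,1),(3,6);
""")

def members_excluding(conn, include_group, exclude_group):
    """Users who are in `include_group` but have no row for `exclude_group`."""
    sql = """
        SELECT u.id, u.name
        FROM users u
        JOIN users_groups ug ON ug.user_id = u.id
        WHERE ug.group_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM users_groups x
              WHERE x.user_id = u.id AND x.group_id = ?
          )
        ORDER BY u.id
    """
    return conn.execute(sql, (include_group, exclude_group)).fetchall()

result = members_excluding(conn, 1, 6)
print(result)  # [(1, 'John Smith'), (2, 'Joe Blow')]
```

Because the outer WHERE keeps only the Group 1 row for each user, no GROUP BY is needed to remove duplicates in this version.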

Selecting latest entries for distinct entry

I'm having a brain fart as to how I would do this.
I need to select only the latest entry in each group of entries with the same id.
I have records in an appointment table.
lead_id app_id
4 42
3 43
1 44
2 45
2 46 (want this one)
1 48
3 49 (this one)
4 50 (this one)
1 51 (this one)
The results I require are app_id 46,49,50,51
Only the latest entries in the appointment table, based on duplicate lead_id identifiers.
Here is the query you're looking for:
SELECT A.lead_id
      ,MAX(A.app_id) AS last_app_id
FROM appointment A
GROUP BY A.lead_id
If you want every column of these expected rows:
SELECT A.*
FROM appointment A
INNER JOIN (SELECT A2.lead_id
                  ,MAX(A2.app_id) AS last_app_id
            FROM appointment A2
            GROUP BY A2.lead_id) M ON M.lead_id = A.lead_id
                                  AND M.last_app_id = A.app_id
ORDER BY A.lead_id
Here I simply use the previous query in a join in order to get only the desired rows.
Hope this will help you.
The accepted answer by George Garchagudashvili is not a good answer, because it uses GROUP BY with unaggregated columns in the SELECT. SELECT * with GROUP BY is simply something that should not be allowed in SQL, and it isn't in almost all databases. Happily, the default configuration of more recent versions of MySQL (5.7.5 and later, where ONLY_FULL_GROUP_BY is enabled by default) also rejects this syntax.
An efficient solution is:
select a.*
from appointment a
where a.app_id = (select max(a2.app_id)
from appointment a2
where a2.lead_id = a.lead_id
);
With an index on appointment(lead_id, app_id), this should be as fast or faster than George's query.
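As a quick check of the correlated-subquery approach, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module. The index name idx_appointment_lead_app is made up, and SQLite stands in for whatever database the asker is actually using.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appointment (lead_id INTEGER, app_id INTEGER PRIMARY KEY);
    INSERT INTO appointment VALUES
        (4,42),(3,43),(1,44),(2,45),(2,46),(1,48),(3,49),(4,50),(1,51);
    -- Composite index suggested in the answer (the name is hypothetical).
    CREATE INDEX idx_appointment_lead_app ON appointment (lead_id, app_id);
""")

# For each row, keep it only if its app_id is the maximum for its lead_id.
latest = conn.execute("""
    SELECT a.lead_id, a.app_id
    FROM appointment a
    WHERE a.app_id = (SELECT MAX(a2.app_id)
                      FROM appointment a2
                      WHERE a2.lead_id = a.lead_id)
    ORDER BY a.lead_id
""").fetchall()

print(latest)  # [(1, 51), (2, 46), (3, 49), (4, 50)]
```

The result matches the app_ids the question asks for (46, 49, 50, 51), one row per lead_id.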
I think this is a much more optimal and efficient way of doing it (sorting, then grouping):
SELECT * FROM (
SELECT * FROM appointment
ORDER BY lead_id, app_id DESC
) AS ord
GROUP BY lead_id
this will be useful when you need all other fields too from the table without complicated queries
Result:
lead_id app_id
1 51
2 46
3 49
4 50

Selecting N rows from each group in MYSQL

I need to search for the updates sent by the friends of a given user.
There is a table called friendship. It has a column called profile1 and another one called profile2. It represents the friendship between two users in this web system, and a friendship is the presence of two given ids, no matter in what position. So the profile with id 1 may have 2 friends, the profiles with id 2 and id 3, as follows:
friendship
profile1 profile2
1 2 <--
3 1 <--
2 5
...
Now I want to search for the updates sent by some user's friends. There is this table update
update
id content time profile
1 A text ... 2
2 A text ... 2
3 A text ... 3
4 A text ... 2
5 A text ... 3
6 A text ... 2
7 A text ... 10
8 A text ... 11
If my profile/user is identified by the id 1, and it has only 2 friends (the profiles identified by id 2 and 3) and also I need my search to return only 2 results by each user, my SELECT has to return updates 1,2,3 and 5.
Preferably updates should be grouped by its author and it would be great if I could set the number of different profiles to be considered in this search (for example, if profile 1 had 10 friends and I wanted only updates from 3 profiles, the most recent must appear first).
Do you know how can I achieve this??
thank you very much!
#EDIT
This returns all updates sent by friends of profile 1, but I'm not sure whether or not I'm heading in the right direction.
SELECT u.*
FROM `update` u
INNER JOIN friendship f1 ON f1.profile1 = u.author
WHERE f1.profile2 =1
UNION
SELECT u.*
FROM `update` u
INNER JOIN friendship f2 ON f2.profile2 = u.author
WHERE f2.profile1 =1
If you are willing to do it in two queries, you can do it like this. First, get three profiles who have most recently posted based on your constraints:
-- Get the three latest updated profiles from here.
-- (we can't use a CTE because MySQL versions before 8.0
-- don't support them).
SELECT DISTINCT p.profile FROM
(
SELECT ui.profile, ui.time FROM
(
SELECT u.profile, u.time
FROM `update` u
INNER JOIN `friendship` f ON f.profile2 = u.profile
WHERE f.profile1 = 1
UNION ALL
SELECT u.profile, u.time
FROM `update` u
INNER JOIN `friendship` f ON f.profile1 = u.profile
WHERE f.profile2 = 1
) ui ORDER BY ui.time DESC
) p LIMIT 0, 3;
From that query, get the three profile IDs out and put them in place of <id1>, <id2> and <id3> in the following query
-- Use a union to get the result set back
(SELECT a.content, a.time, a.profile FROM `update` a
WHERE a.profile = <id1>
ORDER BY a.time DESC
LIMIT 0, 2)
UNION ALL
(SELECT a.content, a.time, a.profile FROM `update` a
WHERE a.profile = <id2>
ORDER BY a.time DESC
LIMIT 0, 2)
UNION ALL
(SELECT a.content, a.time, a.profile FROM `update` a
WHERE a.profile = <id3>
ORDER BY a.time DESC
LIMIT 0, 2);
If you get fewer than three profiles back, either remove parts of the query in your PHP code, or set the WHERE clause to something like 0 so it always evaluates to false (assuming you don't have a profile ID of zero).
The 2 in the limit clauses above can be changed if you want more or fewer results per profile.
Sample SQL fiddle: http://sqlfiddle.com/#!2/22e57/1 (updated fiddle to make the content more meaningful and to use times)
I would suggest doing a series of queries for each author within one transaction, that way there would not be a need for grouping - you could simply append results together outside of your SQL.
SELECT * FROM `update` WHERE
profile IN (SELECT profile2 FROM `friendship` WHERE profile1=1) OR
profile IN (SELECT profile1 FROM `friendship` WHERE profile2=1);
try this sqlFiddle
SELECT T1.profile,T1.content,T1.time
FROM
  (SELECT UPD.profile,UPD.content,UPD.time,
          IF (@prevProfile != UPD.profile,@timeRank:=1,@timeRank:=@timeRank+1) as timeRank,
          @prevProfile := UPD.profile
   FROM
     (SELECT UP.profile,UP.content,UP.time
      FROM
        (SELECT profile,max(time) as latestUpdateTime
         FROM friendship F INNER JOIN updates U
         ON (F.profile1 = 1 AND U.profile = profile2) /* <-- specify profile on this line */
         OR (F.profile2 = 1 AND U.profile = profile1) /* <-- specify profile on this line */
         GROUP BY profile
         ORDER BY latestUpdateTime DESC
         LIMIT 3 /* limit to 3 friends profiles that have the most recent updates */
        ) as LU
      INNER JOIN updates UP
      ON (UP.profile = LU.profile)
      ORDER BY profile,time DESC
     ) as UPD,(SELECT @prevProfile:=0,@timeRank:=0) variables
  ) T1
WHERE T1.timeRank BETWEEN 1 AND 2 /* grab 2 latest updates for each profile */
ORDER BY T1.time DESC
In my example, profile id 1 has more than 3 friends, but I am only grabbing the 3 friends that made the most recent updates.
Explanation of the above query:
LU grabs the 3 profiles that are friends with profile id 1 and made the latest updates.
UPD grabs all content that belongs to these 3 friends.
T1 returns the content along with a timeRank number for each row, counting upward from 1 in descending time order within each profile.
The WHERE clause then keeps only 2 content updates for each profile.
Finally, we ORDER these updates by time, starting from the most recent.
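The user-variable ranking above is MySQL-specific. As a portable illustration of the same "N rows per group" idea, the sketch below swaps in a different technique: a correlated COUNT(*) that keeps a row only if fewer than N newer rows exist for the same profile. It uses Python's sqlite3 module, names the table updates (as the answer does), and invents integer time values equal to the row ids, since the question elides the real timestamps. It assumes times are distinct within a profile.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendship (profile1 INTEGER, profile2 INTEGER);
    CREATE TABLE updates (id INTEGER PRIMARY KEY, content TEXT,
                          time INTEGER, profile INTEGER);
    INSERT INTO friendship VALUES (1,2),(3,1),(2,5);
    -- Hypothetical times: each update's time equals its id.
    INSERT INTO updates VALUES
        (1,'A text',1,2),(2,'A text',2,2),(3,'A text',3,3),(4,'A text',4,2),
        (5,'A text',5,3),(6,'A text',6,2),(7,'A text',7,10),(8,'A text',8,11);
""")

def latest_friend_updates(conn, me, per_friend=2):
    """Return (id, profile) of the `per_friend` newest updates of each friend of `me`."""
    sql = """
        SELECT u.id, u.profile
        FROM updates u
        WHERE u.profile IN (
                SELECT profile2 FROM friendship WHERE profile1 = :me
                UNION
                SELECT profile1 FROM friendship WHERE profile2 = :me
              )
          -- keep a row only if fewer than :per_friend newer rows exist
          AND (SELECT COUNT(*) FROM updates newer
               WHERE newer.profile = u.profile
                 AND newer.time > u.time) < :per_friend
        ORDER BY u.profile, u.time DESC
    """
    return conn.execute(sql, {"me": me, "per_friend": per_friend}).fetchall()

rows = latest_friend_updates(conn, me=1)
print(rows)  # [(6, 2), (4, 2), (5, 3), (3, 3)]
```

This returns the two most recent updates per friend (ids 6 and 4 for profile 2, ids 5 and 3 for profile 3). It does not limit the number of friends considered; that part would still need something like the LU subquery. On MySQL 8.0 or later, ROW_NUMBER() OVER (PARTITION BY profile ORDER BY time DESC) is another way to express the ranking.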

Mysql facebook style message join 3 tables

//USER TABLE
user_id name
1 ben
2 alex
3 john
//CONVERSION TABLE
c_id user_one user_2
1 2(alex) 1(ben)
2 2(alex) 3(john)
3 1(ben) 3(john)
//MESSAGE TABLE
m_id c_id send receive message
1 1 2(alex) 1(ben) hi ben
2 1 2(alex) 1(ben) ben, u there?
3 2 1(ben) 3(john) whatever...
//QUERY 1
SELECT * FROM conversion WHERE user_one=1(ben)
OR user_two=1(ben)
So now I know Ben has 2 conversations (one with Alex, another with John).
My question is: how do I join the 3 tables and fetch results like this?
conversation_1 - Alex(id=2) - Last message in cv_1(ben, u there?)
conversation_3 - John(id=3) - Last message in cv_3(whatever...)
Like Facebook messages.
The main idea is that you have to use joins. The join condition needs an OR, because a user can appear in either column of the conversation table, so the query below lists the tables in FROM and puts the conditions in WHERE. Something like this will do the trick:
SELECT c.c_id, u.user_id, u.name, MAX(m_id), message FROM message m, conversation c, user u
WHERE m.c_id = c.c_id
AND
(
c.user_one = u.user_id
OR
c.user_2 = u.user_id
)
GROUP BY c.c_id
Here we join the 3 tables together, get the maximum message ID (I assume the ID is auto-incremented, so the higher the ID, the newer the message) and group by conversation ID. This gives us the latest message and the conversation details for the conversations in which Ben (the logged-in user, for instance) was involved.
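For comparison, here is a sketch of one way to get "the other participant plus the last message" per conversation for a given user, using Python's sqlite3 module. It is not the answer's query: it picks the last message with a correlated MAX(m_id) subquery instead of GROUP BY. The column name user_two, the function name inbox, and filing message 3 under conversation 3 (as the question's expected output implies, although the sample table lists c_id 2) are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conversation (c_id INTEGER PRIMARY KEY,
                               user_one INTEGER, user_two INTEGER);
    CREATE TABLE message (m_id INTEGER PRIMARY KEY, c_id INTEGER,
                          send INTEGER, receive INTEGER, message TEXT);
    INSERT INTO user VALUES (1,'ben'),(2,'alex'),(3,'john');
    INSERT INTO conversation VALUES (1,2,1),(2,2,3),(3,1,3);
    -- Assumption: message 3 belongs to conversation 3 (ben and john).
    INSERT INTO message VALUES
        (1,1,2,1,'hi ben'),
        (2,1,2,1,'ben, u there?'),
        (3,3,1,3,'whatever...');
""")

def inbox(conn, me):
    """One row per conversation of `me`: (c_id, other user id, other name, last message)."""
    sql = """
        SELECT c.c_id, other.user_id, other.name, m.message
        FROM conversation c
        JOIN user other
          ON other.user_id = CASE WHEN c.user_one = :me
                                  THEN c.user_two ELSE c.user_one END
        JOIN message m ON m.c_id = c.c_id
        WHERE :me IN (c.user_one, c.user_two)
          -- highest m_id is taken to be the newest message
          AND m.m_id = (SELECT MAX(m2.m_id) FROM message m2
                        WHERE m2.c_id = c.c_id)
        ORDER BY c.c_id
    """
    return conn.execute(sql, {"me": me}).fetchall()

rows = inbox(conn, me=1)
print(rows)  # [(1, 2, 'alex', 'ben, u there?'), (3, 3, 'john', 'whatever...')]
```

Conversations without any message are left out by the inner join on message; a LEFT JOIN would keep them with a NULL last message.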
A nice article I saw somewhere on stack overflow before is http://www.khankennels.com/blog/index.php/archives/2007/04/20/getting-joins/. The current approach is INNER JOIN.
SQL Fiddle - http://sqlfiddle.com/#!2/ae6e6/14
