I have users who can "like" categories. For instance, we may have 2 users:
John likes apples, oranges, pears
Bob likes apples, oranges, pie, cake
They both like apples, oranges
This isn't an issue with two users, but when I imagine scaling it to thousands of users, with thousands of likes, there will be major efficiency concerns.
I need to be able to compare a user with all other users, and determine which likes they have in common.
I have tried array_intersect, but it does not scale. I need a MySQL solution.
How would I efficiently return users who share the same likes, and the likes that are shared?
users
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(16) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
categories
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(32) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
likes
+-------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| user_id | int(11) | NO | MUL | NULL | |
| category_id | int(11) | NO | MUL | NULL | |
+-------------+---------+------+-----+---------+-------+
function find_intersect($likes1, $likes2){
    sort($likes1);
    sort($likes2);
    $intersect = array();
    $i = 0;
    $j = 0;
    while ($i < count($likes1) and $j < count($likes2)){
        if ($likes1[$i] == $likes2[$j]){
            array_push($intersect, $likes1[$i]);
            $i++;
            $j++;
        }
        else if ($likes1[$i] < $likes2[$j])
            $i++;
        else
            $j++;
    }
    return $intersect;
}
Above is what I drummed up, and it should be an efficient way of finding the intersection of two arrays: sort both, then walk them in step. I do agree with @DanFarrell, though, that MySQL or some other database will be far more efficient at managing this information when it comes to thousands of users.
I was able to solve my problem with the following:
SELECT user_id, count(category_id) AS count, group_concat(category_id separator "|")
FROM likes
WHERE category_id IN (
SELECT category_id
FROM likes
WHERE user_id=1
)
AND user_id != 1
GROUP BY user_id;
This will return the user id, the number of categories shared, and the shared category ids separated by the pipe character. Obviously it will need joins to get the username and category name, but for simplicity/readability, I left them out.
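For what it's worth, here is a rough sketch of the same idea with the joins added, so you get names back instead of ids. It is written in Python against an in-memory SQLite database purely so it can be run as-is; the SELECT itself should carry over to MySQL. The helper names (build_demo_db, shared_likes), the composite primary key and the extra index are my own assumptions, not part of the original schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database mirroring the users/categories/likes tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Assumption: one row per (user, category), enforced by a composite key.
        CREATE TABLE likes (
            user_id INTEGER NOT NULL,
            category_id INTEGER NOT NULL,
            PRIMARY KEY (user_id, category_id)
        );
        -- Assumption: a second index so lookups by category avoid a table scan.
        CREATE INDEX likes_by_category ON likes (category_id, user_id);
        """
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "John"), (2, "Bob")])
    names = ["apples", "oranges", "pears", "pie", "cake"]
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?)", list(enumerate(names, start=1))
    )
    # John likes apples, oranges, pears; Bob likes apples, oranges, pie, cake.
    likes = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5)]
    conn.executemany("INSERT INTO likes VALUES (?, ?)", likes)
    return conn


def shared_likes(conn, user_id):
    """Map every other user's id to their name and the categories shared with user_id."""
    rows = conn.execute(
        """
        SELECT theirs.user_id, u.name, c.name
        FROM likes AS mine
        JOIN likes AS theirs ON theirs.category_id = mine.category_id
        JOIN users AS u ON u.id = theirs.user_id
        JOIN categories AS c ON c.id = theirs.category_id
        WHERE mine.user_id = ? AND theirs.user_id <> ?
        ORDER BY theirs.user_id, c.name
        """,
        (user_id, user_id),
    ).fetchall()
    shared = {}
    for other_id, user_name, category_name in rows:
        entry = shared.setdefault(other_id, {"name": user_name, "categories": []})
        entry["categories"].append(category_name)
    return shared
```

Calling shared_likes(build_demo_db(), 1) gives John's overlap with every other user. The self-join on likes does the same job as the IN subquery above, just written as a join.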
Related
I am trying to display information from my SQL tables on my web blog. I have two tables blog_posts and blog_members which look like
Blog_members
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| memberID | int(11) unsigned | NO | PRI | NULL | auto_increment |
| username | varchar(255) | YES | | NULL | |
| password | varchar(255) | YES | | NULL | |
| email | varchar(255) | YES | | NULL | |
+----------+------------------+------+-----+---------+----------------+
and
blog_posts
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| postID | int(11) unsigned | NO | PRI | NULL | auto_increment |
| postTitle | varchar(255) | YES | | NULL | |
| postDesc | text | YES | | NULL | |
| postCont | text | YES | | NULL | |
| postDate | datetime | YES | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
I am able to add the information from one table, but I want to display the memberID when I post an article. Do I need an additional column in the blog_posts table? If so, how would I go about this? Would I need to use a join?
I am displaying the information on my blog using the PHP below.
$stmt = $db->query('SELECT postID, postTitle, postDesc, postDate FROM blog_posts ORDER BY postID DESC');
// $stmt = $db->query('SELECT memberID FROM blog_members');
while($row = $stmt->fetch()){
    echo '<div>';
    echo '<h1>'.$row['postTitle'].'</h1>';
    echo '<p>Posted on '.date('jS M Y H:i:s', strtotime($row['postDate'])).'</p>';
    // echo '<p> by'.$row['memberID'].'</p>';
    echo '<p>'.$row['postDesc'].'</p>';
    echo '<p1>Read More</p1>';
    echo '</div>';
    echo '<hr />';
}
That displays the posts but not the memberID. I would like each post to show the member that created it as well.
Think about a row in your posts table. How do you know which member that row belongs to? As mentioned in the comments, you can add a memberId to your posts table so that you can join the two tables and find both the member that belongs to a post, as well as the posts for a particular member. One convention for these columns is to prefix them with something like fk (foreign key) to indicate their role on the table. The table might look like this:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| postID | int(11) unsigned | NO | PRI | NULL | auto_increment |
| postTitle | varchar(255) | YES | | NULL | |
| postDesc | text | YES | | NULL | |
| postCont | text | YES | | NULL | |
| postDate | datetime | YES | | NULL | |
| fkMemberID | int(11) unsigned | NO | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
Then after you have retrieved a post, you will have the memberId that is the owner of that post, and you can retrieve the member details using that id.
(You would also create a separate foreign key object in the database which maintains the integrity of the foreign key columns. i.e. it makes sure you don't put a memberId of say 53 into the posts table fkMemberId column unless there is a memberId of 53 in the member table. You probably already know that but just thought to mention. :) )
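To make the join and the integrity check concrete, here is a small sketch. It uses Python with an in-memory SQLite database only so it runs without a server; the table and column names follow the post, but the trimmed column list and the function names (build_blog_db, posts_with_authors) are assumptions for illustration. In MySQL the constraint would be declared with FOREIGN KEY ... REFERENCES on an InnoDB table.

```python
import sqlite3


def build_blog_db():
    """Create trimmed-down blog_members/blog_posts tables linked by fkMemberID."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE blog_members (
            memberID INTEGER PRIMARY KEY,
            username TEXT
        );
        CREATE TABLE blog_posts (
            postID INTEGER PRIMARY KEY,
            postTitle TEXT,
            fkMemberID INTEGER NOT NULL REFERENCES blog_members (memberID)
        );
        """
    )
    return conn


def posts_with_authors(conn):
    """Return (postID, postTitle, username) rows, newest post first."""
    return conn.execute(
        """
        SELECT p.postID, p.postTitle, m.username
        FROM blog_posts AS p
        JOIN blog_members AS m ON m.memberID = p.fkMemberID
        ORDER BY p.postID DESC
        """
    ).fetchall()
```

With the constraint in place, inserting a post whose fkMemberID has no matching member is rejected by the database rather than silently stored.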
I have a big mysql table ('d_operations') with more than 2 million records (and more to come). I have written a PHP webpage that shows a chart with the number of operations in a day for each half an hour (0:00 0:30 1:00 1:30 ... 23:59).
It works great but takes too much time to get the results so I am wondering if my table and queries could be optimized.
For each half an hour in a day I do a select query asking MySQL for the number of operations done in that period of time.
This takes more than a minute to finish!
This is the table schema:
mysql> describe d_operations;
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| idx | int(11) unsigned | NO | PRI | NULL | auto_increment |
| system_id | int(11) | YES | | NULL | |
| dev_id | varchar(17) | YES | | NULL | |
| name | varchar(17) | YES | | NULL | |
| nond | smallint(6) | YES | | NULL | |
| is_new | smallint(6) | YES | | NULL | |
| tstamp | int(10) unsigned | YES | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
I have an auto_increment primary key, but that doesn't seem to help in the queries. The values in the rest of the fields can be repeated (a device can do several operations in that period of time, and there can be rows with the same tstamp).
tstamp is UNIX timestamp
This is how I do the queries in PHP:
for($i=$GLOBALS['init_hour'];$i<=($GLOBALS['end_hour']-1800);$i+=1800){
    $n=$i+1800;
    $sql="SELECT count(*) as num from d_operations where (tstamp >= $i and tstamp < $n);";
    $r=mysqli_query($GLOBALS['con'],$sql);
    $row = mysqli_fetch_row($r);
    $values = ($i == $GLOBALS['init_hour']) ? $row[0] : $values.",".$row[0];
    $GLOBALS['a_average'][$i]=$row[0];
}
In the worst case, I loop through every half an hour in that day, that is 48 queries.
This is the MySQL explain command:
mysql> explain select count(*) as num from d_operations where (tstamp >= 1464739200 and tstamp < 1464825599);
+----+-------------+--------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | d_operations | ALL | NULL | NULL | NULL | NULL | 2215384 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
Is there a more efficient way for doing this? (table definition, MySQL query optimization...)
Thanks
As Jon Stirling and Mark Baker suggested, the solution was as simple as creating an index for the tstamp column:
ALTER TABLE d_operations ADD INDEX ts_index(tstamp);
Thanks!
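Beyond the index, the 48 round trips could probably be collapsed into one grouped query. This is only a sketch of that idea and not what the answer above describes: it runs against SQLite from Python so it is self-contained, and the function name half_hour_counts is made up. In SQLite, integer / integer truncates; in MySQL you would write (tstamp - start) DIV 1800 instead, since / returns a decimal there.

```python
import sqlite3

HALF_HOUR = 1800  # seconds per slot


def half_hour_counts(conn, day_start, day_end):
    """Return one count per half-hour slot in [day_start, day_end), zeros included."""
    rows = conn.execute(
        """
        SELECT (tstamp - ?) / ? AS slot, COUNT(*)
        FROM d_operations
        WHERE tstamp >= ? AND tstamp < ?
        GROUP BY slot
        """,
        (day_start, HALF_HOUR, day_start, day_end),
    ).fetchall()
    # Round the slot count up so a partial last slot still gets an entry.
    slots = -(-(day_end - day_start) // HALF_HOUR)
    counts = [0] * slots
    for slot, num in rows:
        counts[slot] = num  # slots with no rows keep their 0
    return counts
```

The WHERE clause is still a plain range on tstamp, so the ts_index from the answer is what keeps the scan small; the grouping just saves the repeated queries.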
I'm not a SQL veteran so please excuse me if this is obvious; I'm learning.
I have two tables in a database for a wedding invite system; guests and invites. One invite (invitation) can contain many guests.
For purposes of creating a mail merge for the invites, I'm trying to select the firstname and lastname from the guest table for all guests that share the same inviteID, effectively returning one row of data containing the invite's data and a column each for the names of the guests.
My problem is I can return the data, but across multiple rows which won't work for the mail-merge. I can create a PHP script to do a work-around, but I would like to learn how this could be achieved in pure SQL.
Can anybody shed some light? Can this be done? Is this sheer madness?
Hoping to achieve:
*************************** 1. row ***************************
inviteID: 39
inviteURLSlug: thewinnetts
....
guestFirstName1: Sid
guestSurname1: Winnett
guestFirstName2: Claire
guestSurname2: Winnett
'invite' table:
+---------------------+--------------+
| Field | Type |
+---------------------+--------------+
| inviteID | int(11) |
| inviteURLSlug | varchar(64) |
| inviteQRValue | varchar(255) |
| inviteQRImageURL | varchar(255) |
| inviteAddress1 | varchar(32) |
| inviteAddress2 | varchar(32) |
| inviteAddress3 | varchar(32) |
| inviteCity | varchar(32) |
| inviteCounty | varchar(32) |
| inviteCountry | varchar(32) |
| invitePostcode | varchar(16) |
| inviteDateSend | datetime |
| inviteDateResponded | datetime |
| inviteCreated | datetime |
| inviteUpdated | timestamp |
+---------------------+--------------+
'guest' table:
+-------------------+--------------+
| Field | Type |
+-------------------+--------------+
| guestID | int(11) |
| inviteID | int(11) |
| guestFirstName | varchar(32) |
| guestSurname | varchar(32) |
| guestSide | varchar(8) |
| guestAttending | tinyint(1) |
| guestEmail | varchar(255) |
| guestPhone | varchar(32) |
| guestMobile | varchar(16) |
| guestAddress1 | varchar(32) |
| guestAddress2 | varchar(32) |
| guestAddress3 | varchar(32) |
| guestCity | varchar(32) |
| guestCounty | varchar(32) |
| guestCountry | varchar(32) |
| guestPostCode | varchar(16) |
| guestProfilePhoto | varchar(64) |
| guestFoodVeg | tinyint(1) |
| guestFoodReq | varchar(255) |
| guestTwitter | varchar(15) |
| guestFacebook | varchar(32) |
| guestPlusone | int(1) |
| guestCreated | datetime |
| guestUpdated | timestamp |
+-------------------+--------------+
Failed Join attempt and cropped results sample:
SELECT * FROM guest INNER JOIN invite on guest.inviteID = invite.inviteID \G
*************************** 64. row ***************************
guestID: 72
inviteID: 39
guestFirstName: Claire
guestSurname: Winnett
.......
*************************** 65. row ***************************
guestID: 73
inviteID: 39
guestFirstName: Sid
guestSurname: Winnett
.......
To achieve those results will require additional processing in PHP.
To do so, modify your query to this:
SELECT i.*,
GROUP_CONCAT(DISTINCT g.guestFirstName, '|', g.guestSurname ORDER BY g.guestSurname DESC) AS names
FROM invite i
INNER JOIN guest g ON i.inviteID = g.inviteID
GROUP BY i.inviteID;
making sure to return your results as an array. From there, convert the concatenated results into an array then iterate through them to set the custom key name and corresponding value (assumes your query results are in array format with the variable name $queryresults):
$names = explode(',', $queryresults['names']);
unset($queryresults['names']);
$i = 1;
foreach ($names as $name) {
    // each entry looks like "FirstName|Surname"
    $split_name = explode('|', $name);
    $queryresults['guestFirstName' . $i] = $split_name[0];
    $queryresults['guestSurname' . $i] = $split_name[1];
    $i++;
}
This should give you your desired results.
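If it helps to see the whole round trip in one place, here is a runnable sketch of the same concatenate-then-split approach. It uses Python and SQLite rather than PHP and MySQL, so the details differ: SQLite's GROUP_CONCAT takes a single expression plus a separator and does not promise an order, which is why the names are sorted after splitting. The function name invites_with_numbered_guests is hypothetical, and the sketch assumes names never contain ',' or '|'.

```python
import sqlite3


def invites_with_numbered_guests(conn):
    """Return one dict per invite with guestFirstNameN / guestSurnameN keys."""
    rows = conn.execute(
        """
        SELECT i.inviteID, i.inviteURLSlug,
               GROUP_CONCAT(g.guestFirstName || '|' || g.guestSurname, ',') AS names
        FROM invite AS i
        JOIN guest AS g ON g.inviteID = i.inviteID
        GROUP BY i.inviteID
        ORDER BY i.inviteID
        """
    ).fetchall()
    result = []
    for invite_id, slug, names in rows:
        record = {"inviteID": invite_id, "inviteURLSlug": slug}
        # Sort in application code because the concatenation order is unspecified.
        for n, full_name in enumerate(sorted(names.split(",")), start=1):
            first, surname = full_name.split("|")
            record[f"guestFirstName{n}"] = first
            record[f"guestSurname{n}"] = surname
        result.append(record)
    return result
```

Each returned dict has the shape of one mail-merge row, with as many numbered name pairs as the invite has guests.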
SQL is not good at this sort of thing. It wants to work in sets, where all the items in the set are the same. You are asking for two items within the set to be smooshed together. Here is a possible solution:
SELECT
t1.inviteID,
t1.guestID,
t1.guestFirstName,
t1.guestSurname,
t2.guestFirstName AS guestFirstName2,
t2.guestSurname AS guestSurname2
FROM
guests as t1
LEFT OUTER JOIN
guests as t2
ON t1.inviteID = t2.inviteID
AND t1.guestID <> t2.guestID
WHERE
t1.guestID = (select min(t3.guestID) from guests as t3 where t3.inviteID = t1.inviteID)
;
The guests table is used twice, once to provide the first guest on each invite, and a second time, using a LEFT OUTER JOIN to provide the second. The LEFT OUTER ensures you still get the first even if there isn't a second. The other criteria are there to ensure you don't join a row to itself, and you only output the firsts, with the seconds attached (and not the other way around).
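Here is a quick way to try that self-join without a MySQL server: a sketch in Python over in-memory SQLite. The function name first_two_guests and the trimmed column list are assumptions for the demo. One thing to keep in mind is that an invite with three or more guests comes back as one row per additional guest, because t2 matches every guest other than the first.

```python
import sqlite3


def first_two_guests(conn):
    """Pair the lowest-guestID guest on each invite with every other guest on it."""
    return conn.execute(
        """
        SELECT t1.inviteID,
               t1.guestFirstName, t1.guestSurname,
               t2.guestFirstName, t2.guestSurname
        FROM guests AS t1
        LEFT OUTER JOIN guests AS t2
               ON t1.inviteID = t2.inviteID
              AND t1.guestID <> t2.guestID
        WHERE t1.guestID = (
            SELECT MIN(t3.guestID) FROM guests AS t3 WHERE t3.inviteID = t1.inviteID
        )
        ORDER BY t1.inviteID, t2.guestID
        """
    ).fetchall()
```

An invite with a single guest still appears, with NULL (None in Python) in the second pair of name columns.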
Here is a sample fiddle (in MySQL)
Is something like this what you are looking for?
Updated after your comment.
SELECT GROUP_CONCAT( DISTINCT `guest`.`guestFirstName`,`guest`.`guestSurname`, `invite`.`inviteURLSlug` ) FROM `guest`
LEFT JOIN `invite` ON `invite`.`inviteID` = `guest`.`inviteID`
WHERE `invite`.`inviteID` = 5
I just made two simple tables with minimal information for testing, but is easily expandable to whatever you have/want.
You are going to have some fields repeated since the number of rows is going to be related to the number of guests.
I have created a voting system in php and mysql. When a user votes on an id, a record is inserted in "votes" referencing the FK media_id. When I then display the entries I use this query to get the number of votes for each entry:
$sql = "SELECT COUNT(*) FROM insta_votes WHERE media_id ='".$mediaid."'";
if ($res = $db->query($sql)) {
return $res->fetchColumn();
}
return 0;
This works fine, but I want to be able to sort the results by the number of votes they have. Preferably using just one query. How can I achieve this?
The tables are structured like this:
votes table
+-----------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| media_id | varchar(255) | NO | | NULL | |
| ip | varchar(20) | NO | | NULL | |
| c_time | timestamp | NO | | CURRENT_TIMESTAMP | |
| sessionid | varchar(30) | NO | | NULL | |
+-----------+--------------+------+-----+-------------------+----------------+
entries table
+---------------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+-------------------+-------+
| page_id | int(11) | NO | MUL | NULL | |
| media_id | varchar(255) | NO | PRI | NULL | |
| url | varchar(255) | NO | | NULL | |
| c_time | datetime | NO | | NULL | |
| likes | int(11) | YES | | NULL | |
| deleted | tinyint(1) | NO | | 0 | |
| inserted_time | timestamp | YES | | CURRENT_TIMESTAMP | |
| numReports | int(11) | NO | | 0 | |
+---------------+--------------+------+-----+-------------------+-------+
Thank you!
If I understand the tables correctly (and I may not), each entries row may reference multiple votes rows. In that case, the query you need will go something like this:
SELECT
entries.page_id,
COUNT(*) AS VoteCount
FROM entries
INNER JOIN votes ON entries.media_id = votes.media_id
GROUP BY entries.page_id
ORDER BY VoteCount
If you add additional entries columns to the SELECT list, be sure to add them to the GROUP BY list as well.
Addendum: @JuanPabloCalifano pointed out, correctly, that this query won't include entries with zero votes. Here's how to include them:
SELECT
entries.page_id,
COALESCE(COUNT(votes.id), 0) AS VoteCount
FROM entries
LEFT JOIN votes ON entries.media_id = votes.media_id
GROUP BY entries.page_id
ORDER BY VoteCount
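A small runnable check of the LEFT JOIN version, sketched in Python with SQLite standing in for MySQL. Two things here are my assumptions rather than part of the answer: the sketch groups by media_id (the primary key of entries in the posted schema) instead of page_id, and it sorts most-voted first. The function name entries_by_votes is made up.

```python
import sqlite3


def entries_by_votes(conn):
    """Return (media_id, vote_count) for every entry, most-voted first."""
    return conn.execute(
        """
        SELECT e.media_id, COUNT(v.id) AS vote_count
        FROM entries AS e
        LEFT JOIN votes AS v ON v.media_id = e.media_id
        GROUP BY e.media_id
        ORDER BY vote_count DESC, e.media_id
        """
    ).fetchall()
```

COUNT(v.id) only counts rows where the join found a vote, so entries without votes come back with 0 rather than 1.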
SELECT COUNT(*) as CNT, `media_id` FROM `insta_votes` GROUP BY `media_id` order by 1;
SELECT COUNT(*), media_id FROM insta_votes
GROUP BY media_id
ORDER BY COUNT(*);
The first is the place table where the general information is kept and the second is the wait table where users sign up (like a waiting list)
+---------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(30) | YES | | NULL | |
| userid | int(3) | YES | | NULL | |
| address | varchar(300) | YES | | NULL | |
| desc | varchar(550) | YES | | NULL | |
| phone | int(15) | YES | | NULL | |
| image | varchar(50) | YES | | NULL | |
| website | varchar(100) | YES | | NULL | |
| cat | varchar(25) | YES | | NULL | |
| date | timestamp | NO | | CURRENT_TIMESTAMP | |
+---------+--------------+------+-----+-------------------+----------------+
+----------+-----------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-----------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| userid | int(11) | YES | | NULL | |
| place_id | int(11) | YES | | NULL | |
| date | timestamp | NO | | CURRENT_TIMESTAMP | |
+----------+-----------+------+-----+-------------------+----------------+
For now I'm doing a SELECT * FROM place; and displaying the data on the home page. Something like this:
<? foreach($places as $place): ?>
<? echo $place->name; ?>; <? echo $place->userid; ?> etc ...
Click this to insert your userid and $place->id into wait table
<? endforeach ?>
This is where I got lost. I would like to do something like:
<? if($current_user_id == $userid_from_wait_that_matches_place_id): ?>
<p>You already registered for this!</p>
<? else: ?>
Click this to insert your userid and $place->id into wait table
<? endif; ?>
Not sure if it's better to check for the user's id in the model that adds data to the wait table or to check in the model that grabs data for the home page. From what I've read, the second option would be better. Or should I use two separate queries?
I think your database design is wrong: you should create a separate users table with user-specific data (name, image, ...) plus a user_id, and another table with the "general" information (as you said): name, desc, map, etc. That second table shouldn't hold any user-specific information, only the user_id.
And if your database isn't too large you can use a select tag with valid user_ids so you don't need validation.
EDIT: if you want to know which user_ids aren't in the wait table, use a query similar to this:
SELECT user.userid
FROM user
LEFT JOIN wait ON user.userid=wait.userid
WHERE ISNULL(wait.place_id)
These userids can be put into a select list.
Please read up on joins in select queries. Looks like you need to use a left outer join between your master table and your temporary table: http://www.tizag.com/mysqlTutorial/mysqlleftjoin.php.
You could use a query like this one:
select *
from wait_table
left join general_info_table on wait_table.user_id = general_info_table.user_id
where wait_table.user_id = 1;
This way, if the user_id is in wait_table, the query returns the info on the client. If it isn't in the table, the query returns no rows, because the WHERE clause filters on wait_table itself.
I would restrict the query to the fields I really need, though.
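Applied to the place and wait tables from the question, the left join idea could look roughly like the sketch below: one query for the home page that returns every place plus a flag saying whether the current user is already on its waiting list. It is Python over SQLite so it can be run directly; the function name places_for_user is hypothetical, and it assumes a user has at most one wait row per place.

```python
import sqlite3


def places_for_user(conn, user_id):
    """Return (place id, place name, already_registered) for every place."""
    rows = conn.execute(
        """
        SELECT p.id, p.name, w.id IS NOT NULL AS registered
        FROM place AS p
        LEFT JOIN wait AS w
               ON w.place_id = p.id
              AND w.userid = ?
        ORDER BY p.id
        """,
        (user_id,),
    ).fetchall()
    # SQLite returns 0/1 for the flag, so convert it to a real boolean.
    return [(pid, name, bool(registered)) for pid, name, registered in rows]
```

The user filter sits in the ON clause rather than in WHERE, so places the user has not signed up for still come back, with the flag set to false. The template can then switch between the "already registered" message and the sign-up link on that flag.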