I have a social networking site where users update their moods, profiles and add photos.
I'm currently logging all updates in a table called "update_log" with the following structure:
update_id int (auto),
userid int,
update_type int (1=profile, 2=photo, 3=mood),
pictureid int,
mood_a int,
mood_b int,
mood_c int,
update_time int
Profile update record:
(auto), 1, 1, 0, 0, 0, 0, 1239003781
Photo update record:
(auto), 1, 2, 11544, 0, 0, 0, 1239003781
Mood update record:
(auto), 1, 3, 0, 1, 490, 70, 1239003781
For the photo record, there's a corresponding table userphotos which holds the caption and filename/location data.
For moods, there is a mood lookup table that holds the mood descriptions (i.e., I'm lazy =\ )
What I need to do is query this data to show a feed on a user's profile page; it will show activity from any of their favorite users for the last x hours.
The problem I'm running into is that if a user uploads five photos over the course of a half hour or something, I just want that to be one line in the feed, not an entry for each photo upload.
Same goes for profile updates.
I need to query the data so the user will see something like this:
user x updated their mood! (I'm tired) on Apr 4, 2009 10:35 pm
user y uploaded x new photos on April 4, 2009 10:20 pm
user x updated their profile on April 4, 2009 10:15 pm
How do I group the photo updates into one record returned in a query based on all records being within let's say an hour of each other?
Is there a way to do this with one query?
Thanks!
You want something like
SELECT * FROM update_log WHERE update_time > UNIX_TIMESTAMP() - 30 * 60;
With 30 minutes being the period of time you're looking back.
I'm assuming you just needed to know how to return in a single query the updates of the last 30 minutes.
If you're trying to group all of the photos together into 30 minute blocks, say for the last two days, you'd be better off changing your database structure and creating a photo_group table [containing a primary key, userid, and time of creation] and adding a group_id column to the update_log table.
When adding a new photo, check for an existing group created by that user in the last 30 minutes. SELECT * FROM photo_group WHERE user_id = XXX AND created > NOW() - INTERVAL 30 MINUTE;
If one does not exist, create it. Link each photo to its group by storing the photo_group table's primary key as the group_id in update_log.
When you retrieve rows later, you can group them by group_id using your scripting language.
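If you would rather collapse the groups in the query itself instead of in your scripting language, something along these lines would return one feed row per photo group (a sketch only; the 6-hour window and the aliases are illustrative):
-- Sketch: one feed row per photo group, assuming update_log has gained the
-- group_id column described above.
SELECT userid,
       group_id,
       COUNT(*)         AS new_photos,
       MAX(update_time) AS latest_update
FROM   update_log
WHERE  update_type = 2                               -- photo updates only
  AND  update_time > UNIX_TIMESTAMP() - 6 * 60 * 60  -- last x hours
GROUP  BY userid, group_id
ORDER  BY latest_update DESC;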
The disadvantage of this method is your grouping structure will be difficult to modify later, as previous entries will be grouped by their old groups when you change the rules for creating new groups.
If you want to do this without storing the groups, you'll have to handle the logic in your scripting language: loop over the photos, check each photo's creation time, collect the following photos into the same group if they were created within a specific time period of it, and otherwise start a new group from the most recent photo that did not fit the previous one. This is more overhead than adding a new table, but it would be easier to modify later.
Have you considered trying to do this with PHP rather than SQL queries? It might be less complex to query the results you need (All updates between these times) and then use PHP to compare the timestamps in order to determine how they should be grouped.
I'm building an app with an activity feed, just like Facebook or Instagram. When, for instance, a user uploads a picture, that picture automatically creates an entry in the activities table. There is a polymorphic relationship between the uploaded picture and the activity. The activity model will handle follows, likes and other things. The Activity model looks like this:
Activity
model_id 12
model_type picture
user_id 1313
activity_type picture_upload
If the user uploads another picture within a short period of time, I would like to merge these similar activities related to the same model type into one, just like Instagram does.
This activity would have a relation with both pictures uploaded.
For this I created an additional field in the activities table called multiple_ids, where I insert the subsequent picture IDs:
Activity
model_id 12
model_type picture
user_id 1313
activity_type picture_upload
multiple_ids 13,14,15
This is working; I created methods for the activity in order to get those "multiple models". But if, for example, the user decides to remove a picture included in the multiple_ids field, I would have to run a LIKE query against multiple_ids. I think there should be better ways of doing this. How would you do it? Would a scope that merges "similar activities" when retrieving activities be good practice?
Personally, I'd have a cross-reference table that simply stores the relationships between IDs.
parent | related
----------------
12 | 13
12 | 14
12 | 15
Then if the user deleted 13, 14 or 15, you'd simply delete this relationship from the table.
The tricky part comes if someone deletes 12. Here, you'll have to make a decision on how you want to handle it. You could delete all rows with a parent of 12, essentially leaving you with orphan activities. Or, with a little bit of work, you could associate 14 and 15 with activity 13 instead.
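For that second option, a couple of statements along these lines would do the re-parenting (the table name activity_relations is hypothetical, standing in for the cross-reference table above):
-- Hypothetical sketch: activity 12 was deleted, so promote 13 to be the new
-- parent of the remaining related activities, then drop the stale 12 -> 13 row.
UPDATE activity_relations SET parent = 13 WHERE parent = 12 AND related <> 13;
DELETE FROM activity_relations WHERE parent = 12 AND related = 13;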
I think it's a good idea to generate a group ID for the activity, and if the activity repeats you can assign the same group ID to it.
Activity
model_id 12
model_type picture
user_id 1313
activity_type picture_upload
group_id picture_upload_1313_14505049334
created_at 2017-02-01 11:55:00
When an activity is being created, you can subscribe to the creating event and check whether the user performed this activity in the last five minutes; if so, assign it the same group ID.
In Laravel 4 it could look like this:
use Carbon\Carbon;

Activity::creating(function ($activity) {
    // Look for an activity of the same type by the same user in the last five minutes.
    $originatingActivity = Activity::where('created_at', '>', Carbon::now()->subMinutes(5))
        ->where('activity_type', $activity->activity_type)
        ->where('user_id', $activity->user_id)
        ->first();

    if ($originatingActivity) {
        // Reuse the existing group so the activities get merged in the feed.
        $activity->group_id = $originatingActivity->group_id;
    } else {
        // Start a new group, e.g. picture_upload_1313_1450504933.
        $activity->group_id = $activity->activity_type . '_' . $activity->user_id . '_' . time();
    }
});
Then in your code you can fetch activities by group_id, so deleting one activity doesn't break anything, and you can write methods with a groupBy statement to count them easily.
Also, this way, with some code changes, you can group different relation types if needed.
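For example, once group_id is stored, fetching one feed row per group is a plain aggregate (a sketch; the activities table and columns follow the example above):
-- One row per activity group, newest groups first.
SELECT group_id,
       COUNT(*)        AS items_in_group,
       MAX(created_at) AS latest_item
FROM   activities
GROUP  BY group_id
ORDER  BY latest_item DESC;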
My MySQL table stores the date and ID of a particular product. When a user accesses that product, a row is written to the table with the details. The table structure is below:
id is auto-increment, productid is the ID of the product (not unique, it can appear several times), and time is the timestamp at which the product was accessed by a user.
id productid time
1 33 3443434333
2 444 334344444
3 44 445435434
I want to break the results down by date for any productid. Suppose productid 33 was accessed 1000 times in one month; I want to plot a graph by date, just like Google Webmaster Tools where it shows 30 days of details, for example 30 views on 1.08.12, 50 views on 2.08.12, etc.
Which query should I use, and which graph, so that I can show the details just like Google Webmaster Tools?
You can create a grouping key with concat from the product id and the time's date part for the per-day-counts like this (see this fiddle):
select
count(*) value,
concat(date(from_unixtime(time)), '_', productid) prod
from
productlog
group by
prod
order by
prod
Once you get the data, there are numerous charting libraries for displaying it. Since you mentioned Google's Webmaster Tools, you can try Google's own charting API, or gRaphaël, or Highcharts (it's not free for commercial use), and probably countless other options as well.
I'm building a classifieds website of adverts where I want to store a count of the number of views of each advert, which I want to be able to display in a graph at a later date, by day and month etc., for each user and each of their adverts. I'm just struggling with deciding how best to implement the MySQL database to store a potentially large amount of data for each advert.
I am going to create a table for the page views as follows, which would store a record for each view of each advert; for example, if advert (id 1) has 200 views, the table will store 200 records:
Advert_id (unique id of advert)
date_time (date and time of view)
ip_address ( unique ip address of person viewing advert)
page_referrer (url of referrer page)
As mentioned, I am going to create the functionality for each member of the site to view a graph of the view statistics for each of their adverts, so they can see how many total views each of their adverts has had, how many views each advert has had per day (between two given dates), and how many views per month. I'll do this by grouping on the date_time field.
If my site grows quite large and, for example, has 40,000 adverts and each advert has on average 3,000 page views, that would mean the table has 120 million records. Is this too large, and would the MySQL queries to produce the graphs be very slow?
Do you think the table and method above is the best way to store these advert view statistics or is there a better way to do this?
Unless you really need to store all that data it would probably be better to just increment the count when the advert is viewed. So you just have one row for each advert (or even a column in the row for the advert).
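A minimal sketch of that counter approach, assuming a views column on the adverts table (the column name is an assumption):
-- Increment the stored view count each time the advert page is served.
UPDATE adverts SET views = views + 1 WHERE advert_id = 1;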
Another option is to save this into a text file and then process it offline but it's generally better to process data as you get it and incorporate that into your applications process.
If you really need to save all of that data, then rotating the log table weekly (after processing it) would reduce the overhead of storing all of that information indefinitely.
I was working with a website with 50,000 unique visitors per day, and I had the same table as you.
The table was growing by ~200-500 MB/day, but I was able to clean it every day.
The best option is to make a second table, count visitors every day, add the result to the second table, and flush the first table.
first table example:
advert_id
date & time
ip address
page referrer
second table example (for graph):
advert_id
date
visitors
unique visitors
Example SQL query to count unique visitors:
SELECT
    advert_id,
    COUNT(DISTINCT ip_address) AS unique_visitors,
    SUBSTRING(date_time, 1, 10) AS view_date
FROM
    adverts
GROUP BY
    advert_id,
    view_date
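Put together, the nightly rollup could look roughly like this (the summary table name advert_stats is an assumption; the raw table and columns follow the query above):
-- Aggregate the raw rows into the summary table, then flush the raw log.
INSERT INTO advert_stats (advert_id, date, visitors, unique_visitors)
SELECT advert_id,
       SUBSTRING(date_time, 1, 10),
       COUNT(*),
       COUNT(DISTINCT ip_address)
FROM   adverts
GROUP  BY advert_id, SUBSTRING(date_time, 1, 10);

TRUNCATE TABLE adverts;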
The problem is not really performance (the MySQL MyISAM engine is quite smart and fast); the problem is storing such a big amount of data.
90% of statistics tools (even Google Analytics or Webalizer) generate graphs only once per day, not in real time.
Also, it's quite a good idea to store the IP as an INT using the function ip2long().
I am building a database in MySQL that will be accessed by PHP scripts. I have a table that is the activity stream. This includes everything that goes on on the website (following of many different things, liking, upvoting etc.). From this activity stream I am going to run an algorithm for each user depending on their activity and display relevant activity. Should I create another table that stores the activity for each user once the algorithm has been run on the activity or should I run the algorithm on the activity table every time the user accesses the site?
UPDATE: (this is what is above, rephrased in a hopefully easier-to-understand way)
I have a database table called activity. This table creates a new row every time an action is performed by a user on the website.
Every time a user logs in, I am going to run an algorithm on the new rows in the activity table (since the user's last login) that apply to them. For example, if the user is following a user who upvoted a post in the activity stream, that post will be displayed when the user logs in. I want the user to be able to access previous content applying to them. Would it be easiest to create another table that saves the rows that have already been run through the algorithm, attached to individual user names? (A row can apply to multiple different users.)
I would start with a single table and appropriate indexes. Using a union statement, you can perform several queries (using different indexes) and then mash all the results together.
As an example, let's assume that you are friends with users 37, 42, and 56, and you are interested in basketball and knitting. And let's assume you have an index on user_id and an index on subject. This query should be quite performant.
SELECT * FROM activity WHERE user_id IN (37, 42, 56)
UNION DISTINCT
SELECT * FROM activity WHERE subject IN ("basketball", "knitting")
ORDER BY created
LIMIT 50
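For reference, the two single-column indexes assumed above could be declared like this (a sketch; the index names are illustrative):
CREATE INDEX idx_activity_user_id ON activity (user_id);
CREATE INDEX idx_activity_subject ON activity (subject);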
I would recommend tracking your user-specific activities in a separate table; then upon login you could more easily show all activities that relate to them. I.e., if a user is, say, big into baseball and hockey, you could retrieve that from their recent activity, then go to your everything-activities table and grab relevant items from it.
I want to build a system on the website that allows users to do some things depending on their rating. For example, I have a rule for rating value X:
1 post in 3 days
10 comments in 1 day
20 votes in 2 days
For rating value Y, the rule may be the following:
3 posts in 1 day
50 comments in 1 day
30 votes in 1 day
Each night I recalculate users' ratings, so I know what each user is able to do.
Possibilities don't accumulate or reset when ratings are recalculated.
One more important thing is that an admin can fill a concrete user's possibilities at any time.
What is the optimal database (MySQL) structure for this?
I can count what concrete user has done:
SELECT COUNT(*) FROM posts WHERE UserID=XXX AND DateOfPost >= 'YYY'
SELECT COUNT(*) FROM comments WHERE UserID=XXX AND DateOfComment >= 'YYY'
But how can I handle the admin filling in possibilities in this case?
I would log the number of actions of each user each day and use that table to compare.
This table would contain the following fields:
date: the day when the action took place
count: the number of actions taken that day
userId: who did this action
action: which action post/comment/vote/...
ignore: boolean, if this is set, admin has reset the values
Checking a rule: SELECT SUM(`count`) FROM log WHERE userId = XXX AND action = YYY AND `ignore` = 0 AND DATEDIFF(NOW(), `date`) <= DAYS
Resetting a rule: UPDATE log SET `ignore` = 1 WHERE userId = XXX
If his rating changes, the result is still valid (you'll just compare against another total).
When you create a rules table:
action
limits
days
rating_min
rating_max
You can query for permissions like this:
SELECT
    log.action,
    IF(SUM(`count`) < MIN(limits), 1, 0) AS can_do_action
FROM
    log
    LEFT JOIN rules ON rules.action = log.action
WHERE
    userId = XXX
    AND rating_min <= RATING
    AND rating_max >= RATING
    AND `ignore` = 0
    AND DATEDIFF(NOW(), `date`) <= days
GROUP BY
    log.action
So you get a result looking like this:
- comment => 1
- votes => 0
You do have to update this table on every action (create a new row for the first action of the day, or update the count of the existing row).
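One way to do that per-action update in a single statement, assuming a unique key on (userId, date, action) (the key is an assumption not stated above):
-- Upsert today's row for this user/action; `ignore` is quoted because it
-- collides with a MySQL keyword.
INSERT INTO log (userId, `date`, action, `count`, `ignore`)
VALUES (123, CURDATE(), 'comment', 1, 0)
ON DUPLICATE KEY UPDATE `count` = `count` + 1;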
The absence of a rule means no actions have been made, so we can ignore it.
If I understand you correctly you have a user that can post 1 blog, and comment 10 times. Now he/she has commented 5 times and posted a blog. You want the admin to click a button, and now the user can again post a blog and comment 10 times?
It might be a bit of a hack, but you could count the actions that are being reset/ignored, and subtract that from the current actions.
e.g.: user has 1 blog and 5 comments. Admin presses "reset", and you save those values.
Now as the user posts another blog, and you check if that's allowed, you'll get
SELECT COUNT(*) FROM posts WHERE UserID=XXX AND DateOfPost >= 'YYY'
And you do something like this
SELECT changes FROM adminTable WHERE UserID=XXX AND type = 'post'
And if count - changes is ok, you're set.
What about having, in the user table, three columns called remainingPosts, remainingComments and remainingVotes? You'll decrease the columns when the user has performed a specific action, and in this way the admin can always "refill" those columns, even above the original limit.
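A sketch of how those columns could be used (the users table name and the values are assumptions; the column names follow the suggestion above):
-- Consume one post credit, but only if the user has any left.
UPDATE users SET remainingPosts = remainingPosts - 1
WHERE  user_id = 123 AND remainingPosts > 0;

-- Admin refill, which may go above the original limit.
UPDATE users SET remainingPosts = remainingPosts + 5 WHERE user_id = 123;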
===
Another option is to store the expiration of the permissions in those columns, so that you can reset the permissions just by setting the expiry for a certain column to the day before. You can then use your queries to get the remaining number of posts/comments/votes for the user.
I suggest separating the two concerns entirely:
The process of enabling features/possibilities to users
The data model of user features
For example, you could have a simple many-to-many table representing user features:
user_features(
user_id
,feature_id
,source (admin|earned)
,primary key(user_id, feature_id)
);
This makes it really easy for an administrator to disable/enable parts or all of the feature set.
Your nightly job would query relevant tables and grant/revoke features by inserting/deleting from this table.
If you go with this approach, you can actually give the features either based on a rating or specific actions.
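The nightly grant/revoke against that table could be as simple as this sketch (the IDs are illustrative; the primary key makes the grant idempotent):
-- Grant a feature the user has just earned.
INSERT IGNORE INTO user_features (user_id, feature_id, source)
VALUES (123, 7, 'earned');

-- Revoke only earned features, never ones an admin granted by hand.
DELETE FROM user_features
WHERE  user_id = 123 AND feature_id = 7 AND source = 'earned';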
A rule such as "3 posts in 3 days" can be implemented like this:
when a user posts, check if the previous post was made within 24 hours.
if yes then
    increment counter by 1
    record current timestamp
    if counter = 3 then
        grant feature to user
else
    reset counter to 1
    record current timestamp
You would need two columns (post_count:int, last_post:date) in some table keyed by user_id.
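If those two columns live on the users table (an assumption; the text only says "some table keyed by user_id"), the schema change is small:
-- Track the current streak and when the last post was made.
ALTER TABLE users
    ADD COLUMN post_count INT NOT NULL DEFAULT 0,
    ADD COLUMN last_post  DATETIME NULL;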