Laravel 5.5 - Notifications/Subnotifications for 1 million followers?

I use a notifications table and a subnotifications table, and I also use queues so the work runs in the background when a user posts something. When a user with 10 followers creates a post, the notifications table gets a single entry holding the post data for the notification, and the subnotifications table gets 10 entries: one subnotification per follower, each referencing the notification's id (so we don't repeat the notification data 10 times), with a read_at column to track whether that follower has read it.
This is quick and works great without any issues. However, when testing with 1 million followers, it takes about ~6 hours to insert the subnotifications for one post! That is not acceptable; it simply takes too long to insert 1 million subnotifications, one per follower. Imagine the same user publishes 10 posts: that would be ~60 hours of inserting and 10 million subnotification rows.
I just want followers to know there is a new post if they haven't read it yet. Is there a better, more efficient way that scales?
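For reference, a minimal sketch of the queued fan-out job described above (the exact subnotifications/followers column names and the 1,000-row chunk size are illustrative). Even with chunked bulk inserts instead of one insert per row, it still has to write one row per follower, which is exactly the part that doesn't scale:
// Queued job body: fan one notification out to every follower
$followerIds = DB::table('followers')
    ->where('leader_id', $post->user_id)
    ->pluck('follower_id');

foreach ($followerIds->chunk(1000) as $chunk) {
    $rows = $chunk->map(function ($followerId) use ($notification) {
        return [
            'notification_id' => $notification->id,
            'user_id'         => $followerId,
            'read_at'         => null,
        ];
    })->values()->all();

    DB::table('subnotifications')->insert($rows); // one bulk INSERT per 1,000 followers
}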
UPDATE: Sticking with the current approach for now; see below...
If a follower $user follows 100 leaders (each followed at a different created_at timestamp in the followers table, of course), what would the correct query be to fetch each leader's new posts from the time the follower followed that leader? I get stuck at created_at with this pseudo code:
// Assume `leader_id` is a column in the notifications table
DB::table('notifications')
    ->whereIn('leader_id', $leaderIds)
    ->where('created_at', '>', $whatTimestampsGoHere) // each leader needs a different cutoff
    ->paginate(20);
There are 100 different timestamps, and I am stuck on how to solve this correctly and efficiently. Any ideas?

As stated in the comments, you can avoid the mass inserts by only writing to the child table (subnotifications) when a user actually reads a notification, rather than creating one row per follower when the notification is created.
To check whether a user has seen a notification, just check whether a subnotifications row exists for that user and that notification.
Likewise, when fetching notifications to show a user, fetch them from the notifications table, but limit the results to notifications created after the user started following, so that new followers don't get flooded with old notifications.
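A minimal sketch of that combined query in Laravel's query builder, assuming a followers pivot table with follower_id, leader_id and a created_at that records when the follow began (those names, and $userId, are assumptions). The join gives every leader their own cutoff timestamp, and the NOT EXISTS treats a missing subnotifications row as unread:
$notifications = DB::table('notifications')
    ->join('followers', 'followers.leader_id', '=', 'notifications.leader_id')
    ->where('followers.follower_id', $userId)
    // per-leader cutoff: only posts created after this follow started
    ->whereColumn('notifications.created_at', '>', 'followers.created_at')
    // unread = no subnotifications row has been written for this user yet
    ->whereNotExists(function ($query) use ($userId) {
        $query->select(DB::raw(1))
              ->from('subnotifications')
              ->where('subnotifications.user_id', $userId)
              ->whereColumn('subnotifications.notification_id', 'notifications.id');
    })
    ->orderByDesc('notifications.created_at')
    ->paginate(20);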

Related

what is the most efficient way of looping through a list in Redis

I am trying to create an activity feed for each user in my application. I want to use Redis to store two lists per user. One for storing the ids of user's followers and one for storing the ids of the activities from every user he/she is following.
So, when a user adds a post/creates an activity, I want to push the id of that post/activity onto the list of every user whose id is in the poster's followers list.
I am assuming I would have to loop through the followers list to do this, but I have no idea how to accomplish that in Redis.
I am using PHP as my server-side language, but I don't think it would make sense to do the looping there. Or is it OK to do it that way?
I believe Redis has a much more efficient and faster way of doing this, especially if the followers list contains millions of records.
E.g.:
// to add followers to user with id of 1
lpush user.1.followers 4 90 3 48 8 2 45
// to get all user with id 1 followers
lrange user.1.followers 0 -1
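For illustration, a sketch of that fan-out in PHP with the phpredis extension (the user.N.feed key naming and $activityId are assumptions); pipelining sends all the pushes in one round trip instead of one per follower:
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// ids of everyone following user 1
$followerIds = $redis->lRange('user.1.followers', 0, -1);

// push the new activity id onto each follower's feed in a single round trip
$pipe = $redis->multi(Redis::PIPELINE);
foreach ($followerIds as $followerId) {
    $pipe->lPush("user.$followerId.feed", $activityId);
}
$pipe->exec();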

Mysql keep track of user views for each post in timeline

I have a screen that looks very much like the Facebook timeline; users can view posts of other users, etc.
To get these posts I do something like:
select users.id, users.name, posts.title, posts.body
from posts
left join users on users.id = posts.user_id; -- assuming posts.user_id references users.id
Now the data I need to collect is "who saw this post".
Is there any elegant way to do it?
Right now, all I can think of is: every time I fetch posts, I loop over them, collect the ids of all the posts the query returned, and push them into another table:
user_views [pk: user_id + post_id]
user_id, post_id
1, 1
Then, when fetching posts the next time, I can count from user_views:
select posts.*, count(user_views.user_id) as views
from posts
left join user_views on user_views.post_id = posts.id
group by posts.id;
But this sounds like a lot of work for each view, especially since a user will most probably see the same post multiple times.
Are there any known patterns for such a need?
This is a design question and the answer really depends on your needs.
If you want to know exactly who viewed which post and how many times, then you need to collect the data at the user-post level.
However, you may decide that you do not really care who viewed which post how many times; you just want to know how many times a post was viewed. In this case you may only have a table with post id and view count fields, and you just increment the view count every time a post is viewed.
Obviously, you can apply a mixed approach and have a detailed user-post table (perhaps even with a timestamp) alongside an aggregate table with post id and view count fields. The detailed table can be used to analyse your users' behaviour in greater detail, or to present them a track of their own activities, while the aggregate table can be used to quickly fetch overall view counts for a post. The aggregate table can be updated by a trigger.
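A minimal sketch of the mixed approach in PHP/PDO, updating the aggregate from the application rather than a trigger (the post_views and post_view_counts table names are assumptions):
function recordView(PDO $pdo, $userId, $postId)
{
    $pdo->beginTransaction();

    // detailed row: who viewed what, and when
    $pdo->prepare('INSERT INTO post_views (user_id, post_id, viewed_at) VALUES (?, ?, NOW())')
        ->execute([$userId, $postId]);

    // aggregate row: one counter per post, bumped on every view
    $pdo->prepare('INSERT INTO post_view_counts (post_id, view_count) VALUES (?, 1)
                   ON DUPLICATE KEY UPDATE view_count = view_count + 1')
        ->execute([$postId]);

    $pdo->commit();
}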

Database Design - "Push" Model, or Fan-out-on-write

Background Info:
I'm trying to retrieve images from people I follow, sorted by latest time. It's like the Twitter news feed, which shows the latest posts by your friends.
Plans:
Currently there is only one item I need to keep in consideration, which is the images. In the future I'm planning to analyse users' behaviour and add other images they might like into their feeds, etc.
http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed
I personally feel that the "Pull" model, or fan-out-on-load, where I pull all the info in real time, would be worse than the push model, because if I have 100 followings I would have to fetch all their posts and sort them by time on every load. (Let me know if I'm wrong, e.g. if a read is 100x cheaper than a write in the push model.)
The current design of the push model i have in mind is as follows
Table users_feed(ID, User_ID, Image_ID, datetime)
Option 1: store a list of Image_IDs in a single row
Option 2: store one Image_ID per row and duplicate rows (more rows with the same User_ID but different Image_IDs)
The plan is to limit the rows each user can have in this feed, which means there would always be a max of 50 images. If they want more items beyond the 50 images in their news feed, they can't (I might code an alternative to store more so they can view more in the future).
Question 1
When the users someone follows add an item to their "collection", I have to push it into each of their followers' feeds. Won't there be a problem with writes? 200 followers = 200 writes?
Question 2
Which method would be better for me, keeping in consideration that I only have one type of data, which is images? Feeds of images.
Question 3
If I choose to store the feed in advance (push method), how do I actually write it into all my friends' feeds?
Insert xxx into feeds whereIn (array of friend IDs)?
Any form of advice would be greatly appreciated. Thanks in advance!
I would recommend you follow the pull method over the push method, for the following reasons:
It gives you more freedom for extensibility in the future.
Fewer writes (imagine 10M followers: that would mean 10M writes for just one post).
You can get the whole feed of a user simply with a query similar to:
SELECT * FROM users_feed AS a
WHERE a.user_id IN ( <select all user_ids the logged-in user follows> )
(Exact syntax not given, as the structure of the followers table is not known.)
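In PDO the same pull query could look like this (assuming a followers table with follower_id and leader_id columns, since the real structure is not known):
$stmt = $pdo->prepare(
    'SELECT f.* FROM users_feed AS f
     WHERE f.user_id IN (SELECT leader_id FROM followers WHERE follower_id = ?)
     ORDER BY f.datetime DESC
     LIMIT 50'
);
$stmt->execute([$userId]);
$feed = $stmt->fetchAll(PDO::FETCH_ASSOC);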

php/mysql classifieds view counter by date

I'm building a classifieds website of adverts where I want to store a count of the number of views of each advert, to display later in a graph by day, month, etc. for each user and each of their adverts. I'm just struggling to decide how best to design the MySQL database to store a potentially large amount of data for each advert.
I am going to create a table for the page views as follows, which stores one record for each view of each advert; for example, if advert id 1 has 200 views, the table will store 200 records:
Advert_id (unique id of advert)
date_time (date and time of view)
ip_address (ip address of the person viewing the advert)
page_referrer (url of referrer page)
As mentioned, I am going to build functionality for each member of the site to view a graph of the view statistics for each of their adverts, so they can see how many total views each advert has had, how many views it has had each day (between two given dates), and how many views per month. I'll do this by grouping on the date_time field.
If my site grows quite large and, for example, has 40,000 adverts, each with 3,000 page views on average, the table would have 120 million records. Is this too large? Would the MySQL queries to produce the graphs be very slow?
Do you think the table and method above are the best way to store these advert view statistics, or is there a better way?
Unless you really need to store all that data, it would probably be better to just increment a count when the advert is viewed. Then you have only one row per advert (or even a column in the advert's own row).
Another option is to write the views to a log file and process it offline, but it's generally better to process data as you get it and incorporate that into your application's normal flow.
If you really do need to save all of that data, then rotating the log table weekly (after processing it) would reduce the overhead of storing all of that information indefinitely.
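A minimal example of the increment-only option, assuming a view_count column on the adverts table:
// bump the counter on every page view of an advert
$pdo->prepare('UPDATE adverts SET view_count = view_count + 1 WHERE id = ?')
    ->execute([$advertId]);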
I worked on a website with 50,000 unique visitors per day, and I had the same table as you.
The table was growing by ~200-500 MB/day, but I was able to clean it every day.
The best option is to make a second table: count the visitors every day, add the result to the second table, and flush the first table.
first table example:
advert_id
date & time
ip address
page referrer
second table example (for graph):
advert_id
date
visitors
unique visitors
Example SQL query to count unique visitors:
SELECT
    advert_id,
    COUNT(DISTINCT ip_address) AS unique_visitors,
    SUBSTRING(date_time, 1, 10) AS view_date -- first 10 chars of the DATETIME = the date
FROM advert_views -- the detail (first) table
GROUP BY
    advert_id,
    view_date
The problem is not even performance (the MySQL MyISAM engine is quite smart and fast); the problem is storing such a large amount of data.
90% of statistics tools (even Google Analytics or Webalizer) build their graphs only once per day, not in real time.
It is also quite a good idea to store the IP as an INT using the ip2long() function.
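For example (ip2long() handles IPv4 only, and the value can exceed a signed 32-bit INT, so an UNSIGNED INT column is the usual choice):
$asInt = ip2long('203.0.113.42'); // 3405803818, fits in an UNSIGNED INT
$ip    = long2ip($asInt);         // back to '203.0.113.42' for display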

Which of these methods provides for the fastest page loading?

I am building a database in MySQL that will be accessed by PHP scripts. I have a table that is the activity stream. This includes everything that goes on on the website (following of many different things, liking, upvoting, etc.). From this activity stream I am going to run an algorithm for each user, depending on their activity, and display the relevant activity. Should I create another table that stores the activity for each user once the algorithm has been run on it, or should I run the algorithm on the activity table every time the user accesses the site?
UPDATE: (this is the question above, rephrased in a hopefully easier-to-understand way)
I have a database table called activity. This table gets a new row every time a user performs an action on the website.
Every time a user logs in, I am going to run an algorithm over the new rows of the activity table (those added since the user's last login) that apply to them. For example, if the user follows someone who upvoted a post in the activity stream, that post will be displayed when the user logs in. I also want the user to be able to access previous content that applies to them. Would it be easiest to create another table that saves the rows that have already been processed by the algorithm, attached to individual user names? (a row can apply to multiple different users)
I would start with a single table and appropriate indexes. Using a UNION statement, you can perform several queries (using different indexes) and then mash all the results together.
As an example, let's assume that you are friends with users 37, 42, and 56, and you are interested in basketball and knitting. And let's assume you have an index on user_id and an index on subject. This query should be quite performant:
SELECT * FROM activity WHERE user_id IN (37, 42, 56)
UNION DISTINCT
SELECT * FROM activity WHERE subject IN ("basketball", "knitting")
ORDER BY created
LIMIT 50
I would recommend tracking your user-specific activities in a separate table; then, upon login, you can more easily show all the activities that relate to that user. I.e., if a user is, say, big into baseball and hockey, you could infer that from their recent activity, then go to your everything-activities table and grab the relevant items from it.
