Database Design - "Push" Model, or Fan-out-on-write - php

Background info:
I'm trying to retrieve images from the people I follow, sorted by latest time. It's like a Twitter news feed, which shows the latest posts from your friends.
Plans:
Currently there is only one type of item I need to take into consideration, which is images. In future I'm planning to analyse users' behavior and add other images they might like into their feed, etc.
http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed
I personally feel that the "Pull" model, or fan-out-on-load, where I pull all the info in real time, would be worse than the push model: if I follow 100 people, I would have to fetch the images of all 100 and sort them by time on every load. (Let me know if I'm wrong, e.g. if reading is 100x better than writing in the push model.)
The current design of the push model I have in mind is as follows:
Table users_feed(ID, User_ID, Image_ID, datetime)
Option 1: Store a list of Image_IDs in one row
Option 2: Store one Image_ID per row and duplicate rows (more rows with the same User_ID but a different Image_ID)
The plan is to limit the number of rows each user can have in this feed, so there would always be a maximum of 50 images. If they want more than the 50 images in their news feed, they can't have them. (I might code an alternative that stores more so they can view more in future.)
Question 1
When someone I follow adds an item to their "collection", I have to push it into each of their followers' feeds. Won't the writes be a problem? 200 followers = 200 writes?
Question 2
Which method would be better for me, keeping in mind that I only have one type of data, which is images? Feeds of images.
Question 3
If I choose to store the feed in advance (push method), how do I actually write it into all my friends' feeds?
Insert xxx into feeds whereIn (array of FriendsID)?
Any form of advice would be greatly appreciated. Thanks in advance!

I would recommend the pull method over the push method for the following reasons:
It gives you more freedom for extensibility in the future.
Fewer writes (imagine 10M followers: there would have to be 10M writes for just 1 post).
You can get the whole feed of a user with a query similar to:
SELECT * FROM users_feed AS a WHERE a.user_id IN ( <select the user_ids of everyone the logged-in user follows> )
(Exact syntax not given, as the table structure for followers is not known.)
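To make the trade-off concrete, here is a minimal sketch of both models side by side. It uses Python's built-in sqlite3 in place of PHP/MySQL so it can be run as-is; the `follows` and `images` tables and the `created_at` column are assumptions for illustration, not the asker's actual schema. The push write is a single `INSERT ... SELECT` (which also addresses Question 3), and the pull read is a join against the follow table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: follower_id follows followee_id.
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    CREATE TABLE images (image_id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    -- Precomputed feed used only by the push model.
    CREATE TABLE users_feed (user_id INTEGER, image_id INTEGER, created_at TEXT);
""")
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3), (4, 2)])


def post_image(image_id, author_id, created_at):
    """Store the image, then fan it out: one feed row per follower (push model)."""
    conn.execute("INSERT INTO images VALUES (?, ?, ?)",
                 (image_id, author_id, created_at))
    conn.execute(
        """INSERT INTO users_feed (user_id, image_id, created_at)
           SELECT follower_id, ?, ? FROM follows WHERE followee_id = ?""",
        (image_id, created_at, author_id))


def feed_push(user_id, limit=50):
    """Read the precomputed feed: a single-table lookup."""
    rows = conn.execute(
        """SELECT image_id FROM users_feed
           WHERE user_id = ? ORDER BY created_at DESC LIMIT ?""",
        (user_id, limit)).fetchall()
    return [r[0] for r in rows]


def feed_pull(user_id, limit=50):
    """Build the feed at read time by joining images to the follow table."""
    rows = conn.execute(
        """SELECT i.image_id
           FROM images AS i
           JOIN follows AS f ON f.followee_id = i.user_id
           WHERE f.follower_id = ?
           ORDER BY i.created_at DESC LIMIT ?""",
        (user_id, limit)).fetchall()
    return [r[0] for r in rows]


post_image(10, 2, "2024-01-01 10:00:00")
post_image(11, 3, "2024-01-01 11:00:00")
post_image(12, 2, "2024-01-01 12:00:00")
print(feed_push(1), feed_pull(1))
```

Both functions return the same feed; the difference is where the cost lands. In this sketch the push model wrote five feed rows for three posts, while the pull model wrote none and paid with a join on every read.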

Related

PHP user personalization

There is a news site with about 50,000 news items in a MySQL DB for now. I need to create a list of the most interesting and relevant news for each news page and remove the items the current user has already viewed (the actual personalization).
I already keep a list of viewed news in cookies. So all I need is the best architectural approach for filtering out viewed news.
I see only two options:
Keep the already calculated full list of most popular news (20-30k items) in memory, and for each customer request remove the viewed ones.
Each time the user opens the page, create the list of popular items for them again.
In option 1 we can use caching with APC, Redis, etc., but we always have big arrays of data copied into each request, which eats a lot of memory. In option 2 we would have to query the DB each time, so it would be slow and would consume CPU and DB resources.
So is there any way I can avoid using so many resources and make it fast?
You can do something like
SELECT ... article data ... FROM Articles
LEFT JOIN ViewedArticles
ON ViewedArticles.articleId = Articles.articleId AND ViewedArticles.userId = :id
WHERE ViewedArticles.articleId IS NULL
That should select only the articles that don't have a matching articleId in the ViewedArticles table for the given userId. The userId condition belongs in the ON clause: with a LEFT JOIN, putting it in WHERE would discard exactly the unmatched rows you want to keep.
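A runnable sketch of this anti-join, using Python's sqlite3 in place of MySQL. The `popularity` column and the sample rows are made up for illustration; only the Articles / ViewedArticles names come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- popularity is a hypothetical ranking column.
    CREATE TABLE Articles (articleId INTEGER PRIMARY KEY, title TEXT, popularity INTEGER);
    CREATE TABLE ViewedArticles (userId INTEGER, articleId INTEGER,
                                 PRIMARY KEY (userId, articleId));
    INSERT INTO Articles VALUES (1, 'A', 90), (2, 'B', 80), (3, 'C', 70);
    INSERT INTO ViewedArticles VALUES (7, 1), (8, 2);
""")


def unseen_articles(user_id, limit=10):
    """Most popular articles that this user has no ViewedArticles row for."""
    rows = conn.execute(
        """SELECT a.articleId
           FROM Articles AS a
           LEFT JOIN ViewedArticles AS v
                  ON v.articleId = a.articleId AND v.userId = ?
           WHERE v.articleId IS NULL
           ORDER BY a.popularity DESC
           LIMIT ?""",
        (user_id, limit)).fetchall()
    return [r[0] for r in rows]


print(unseen_articles(7))  # user 7 has viewed article 1 only
```

With a composite key on (userId, articleId) the join is an index lookup per article, so the database only has to walk the popular list until it has collected `limit` unseen rows.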

MySQL: keep track of user views for each post in a timeline

I have a screen that looks very much like the Facebook timeline:
users can view posts of other users, etc.
To get these posts I do something like
select user.id,user.name,posts.title,posts.body from posts left join users;
Now the data I need to collect is "who saw this post".
Is there any elegant way to do it?
Right now all I can think of is that every time I fetch posts, I loop over them, collect the IDs of all the posts the query returned, and push them into another table
user_views [pk:user_id+postId]
userId,postId
1 , 1
Then when I'm fetching posts next time I can do a count on user_views.
select posts.*, count(user_views.userId) from posts join user_views on user_views.postId = posts.id group by posts.id
But this sounds like a lot of work for each view, especially since a user will most probably see a post multiple times.
Are there any known patterns for such a need?
This is a design question and the answer really depends on your needs.
If you want to know exactly who viewed what post and how many times, then you need to collect the data on user - post level.
However, you may decide that you do not really care who viewed which post how many times, you just want to know how many times a post was viewed. In this case you may only have a table with post id and view count fields and you just increment the view count every time a post is being viewed.
Obviously, you can apply a mixed approach and have a detailed user - post table (perhaps even with a timestamp) plus an aggregate table with post id and view count fields. The detailed table can be used to analyse your users' behaviour in greater detail, or to present them with a track of their own activities, while the aggregate table can be used to quickly fetch the overall view count for a post. The aggregate table can be updated by a trigger.
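The mixed approach could look roughly like the sketch below: a detail table, an aggregate table, and a trigger that keeps the aggregate in step. It is written against SQLite (via Python's sqlite3) so it runs as-is; the `post_view_counts` table, the `viewed_at` column and the trigger name are assumptions, and MySQL trigger syntax differs slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detail table: one row per view (who, what, when).
    CREATE TABLE user_views (userId INTEGER, postId INTEGER, viewed_at TEXT);
    -- Aggregate table: one row per post.
    CREATE TABLE post_view_counts (postId INTEGER PRIMARY KEY,
                                   view_count INTEGER NOT NULL);

    -- Keep the aggregate up to date whenever a view is logged.
    CREATE TRIGGER bump_view_count AFTER INSERT ON user_views
    BEGIN
        INSERT OR IGNORE INTO post_view_counts (postId, view_count)
        VALUES (NEW.postId, 0);
        UPDATE post_view_counts
        SET view_count = view_count + 1
        WHERE postId = NEW.postId;
    END;
""")


def record_view(user_id, post_id, viewed_at):
    """Log a single view; the trigger updates the counter."""
    conn.execute("INSERT INTO user_views VALUES (?, ?, ?)",
                 (user_id, post_id, viewed_at))


def view_count(post_id):
    """Cheap read from the aggregate table; 0 if the post was never viewed."""
    row = conn.execute(
        "SELECT view_count FROM post_view_counts WHERE postId = ?",
        (post_id,)).fetchone()
    return row[0] if row else 0


record_view(1, 100, "2024-01-01 10:00:00")
record_view(2, 100, "2024-01-01 10:05:00")
record_view(1, 100, "2024-01-01 10:09:00")
record_view(1, 200, "2024-01-01 10:10:00")
print(view_count(100), view_count(200))
```

In MySQL the two statements inside the trigger would more commonly be a single `INSERT ... ON DUPLICATE KEY UPDATE view_count = view_count + 1`.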

php/mysql classifieds view counter by date

I'm building a classifieds website of adverts where I want to store a count of the number of views of each advert, which I want to be able to display in a graph at a later date, by day, by month, etc., for each user and each of their adverts. I'm just struggling to decide how best to implement the MySQL database to store a potentially large amount of data for each advert.
I am going to create a table for the page views as follows which would store a record for each view for each advert, for example if advert (id 1) has 200 views the table will store 200 records:
Advert_id (unique id of advert)
date_time (date and time of view)
ip_address ( unique ip address of person viewing advert)
page_referrer (url of referrer page)
As mentioned I am going to create the functionality for each member of the site to view a graph for the view statistics for each of their adverts so they can see how many total views each of their adverts have had, and also how many views their advert has had each day (between 2 given dates) and also how many views per month each advert has had. I'll do this by grouping by the date_time field.
If my site grows quite large and, for example, has 40,000 adverts, each with 3,000 page views on average, the table would hold 120 million records. Is this too large? And would the MySQL queries that produce the graphs be very slow?
Do you think the table and method above is the best way to store these advert view statistics or is there a better way to do this?
Unless you really need to store all that data it would probably be better to just increment the count when the advert is viewed. So you just have one row for each advert (or even a column in the row for the advert).
Another option is to save this into a text file and then process it offline, but it's generally better to process data as you get it and incorporate that into your application's process.
If you really need to save all of that data then rotating the log table weekly maybe (after processing it) would reduce the overhead of storing all of that information indefinitely.
I was working on a website with 50,000 unique visitors per day, and I had the same table as you.
The table was growing by ~200-500 MB/day, but I was able to clean the table every day.
The best option is to make a second table, count the visitors every day, add the result to the second table, and flush the first table.
first table example:
advert_id
date & time
ip address
page referrer
second table example (for graph):
advert_id
date
visitors
unique visitors
Example SQL query to count unique visitors:
SELECT
    advert_id,
    COUNT(DISTINCT ip_address) AS unique_visitors,
    DATE(date_time) AS view_date
FROM
    adverts
GROUP BY
    advert_id,
    view_date
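The daily "count, store, flush" job described above might look like this sketch. It runs on SQLite through Python's sqlite3 so it can be tried directly; the table names `advert_views` and `advert_daily_stats` are placeholders, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- First table: raw log, one row per view (hypothetical name).
    CREATE TABLE advert_views (advert_id INTEGER, date_time TEXT,
                               ip_address TEXT, page_referrer TEXT);
    -- Second table: one row per advert per day, used for the graphs.
    CREATE TABLE advert_daily_stats (advert_id INTEGER, view_date TEXT,
                                     visitors INTEGER, unique_visitors INTEGER,
                                     PRIMARY KEY (advert_id, view_date));
""")
conn.executemany("INSERT INTO advert_views VALUES (?, ?, ?, ?)", [
    (1, "2024-03-01 09:00:00", "10.0.0.1", ""),
    (1, "2024-03-01 10:00:00", "10.0.0.1", ""),
    (1, "2024-03-01 11:00:00", "10.0.0.2", ""),
    (2, "2024-03-01 12:00:00", "10.0.0.1", ""),
    (1, "2024-03-02 08:00:00", "10.0.0.3", ""),
])


def roll_up(day):
    """Summarise one day's raw views into the stats table, then delete them."""
    with conn:  # both statements are committed together, or rolled back together
        conn.execute(
            """INSERT INTO advert_daily_stats
                   (advert_id, view_date, visitors, unique_visitors)
               SELECT advert_id, DATE(date_time), COUNT(*),
                      COUNT(DISTINCT ip_address)
               FROM advert_views
               WHERE DATE(date_time) = ?
               GROUP BY advert_id, DATE(date_time)""",
            (day,))
        conn.execute("DELETE FROM advert_views WHERE DATE(date_time) = ?", (day,))


roll_up("2024-03-01")
print(conn.execute("SELECT * FROM advert_daily_stats ORDER BY advert_id").fetchall())
```

Running the summary and the delete in one transaction means a failure halfway through leaves the raw rows in place, so the day can simply be rolled up again.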
The problem is not even performance (MySQL's MyISAM engine is quite smart and fast); the problem is storing that much data.
90% of statistics tools (even Google Analytics or Webalizer) build their graphs only once per day, not in real time.
It is also quite a good idea to store the IP as an INT using the ip2long() function (IPv4 only; the column should be an unsigned INT).
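For reference, this is what the IP-to-integer conversion does, sketched with Python's standard ipaddress module rather than PHP's ip2long(). The helper names are made up for the example.

```python
import ipaddress


def ip_to_int(ip):
    """Dotted-quad IPv4 string -> 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(number):
    """32-bit unsigned integer -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(number))


print(ip_to_int("192.168.0.1"))  # 3232235521
print(int_to_ip(3232235521))     # 192.168.0.1
```

The integer needs 4 bytes instead of up to 15 characters, which adds up over millions of log rows. The largest value, 255.255.255.255, is 4294967295, which is why a signed 32-bit column is not enough.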

Thumb rating for wordpress - top user

I am currently using gd star rating (thumb rating) to rate posts (articles - no rating on comments). What I really want to do is show a table of the top 5 users and their number of votes based on the total number of thumbs up they get for all their posts. For instance
| user | No of votes |
If this is not possible with this plugin, is there any other plugin that can do it? Or is there a manual way of achieving what I want? I don't mind manual coding with the right nudge.
Many thanks guys
The easiest way to answer your question is to look at the database and see how the plugin is storing the "votes". I don't know that plugin, but it has to track the up/down votes somewhere, either in its own table or by adding a column to one of the default WP tables.
Once you track down the table that's tracking the votes, you can write a basic query that returns a limited set of users, ordered by their up/down votes. It'll look something like this:
$wpdb->get_results("SELECT id, name FROM up_down_votes_table ORDER BY whatever_the_column_is_named DESC LIMIT 5");
The full documentation for querying the database is in the WordPress reference for the wpdb class.
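Since the votes are per post and the ranking is per user, the real query will probably need to sum votes across each author's posts. Here is a rough sketch of that shape, run on SQLite through Python's sqlite3. Every table and column name below is hypothetical; the plugin's actual schema has to be looked up as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- All names here are placeholders, not the plugin's real tables.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER);
    CREATE TABLE post_votes (post_id INTEGER, thumbs_up INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO posts VALUES (1, 1), (2, 1), (3, 2), (4, 3);
    INSERT INTO post_votes VALUES (1, 4), (2, 3), (3, 10), (4, 1);
""")

# Sum thumbs-up over every post of each author, then keep the top 5 authors.
TOP_USERS_SQL = """
    SELECT u.name, SUM(v.thumbs_up) AS votes
    FROM post_votes AS v
    JOIN posts AS p ON p.id = v.post_id
    JOIN users AS u ON u.id = p.author_id
    GROUP BY u.id, u.name
    ORDER BY votes DESC
    LIMIT 5
"""

top_users = conn.execute(TOP_USERS_SQL).fetchall()
print(top_users)
```

In WordPress the same SELECT would be passed to `$wpdb->get_results()` once the placeholder names are swapped for the real ones.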

Optimal MySQL design for user-specific activity feeds

I'm building a website that constructs both site-wide and user-specific activity feeds. I hope that you can look at the structure below and share your insight as to whether my solution is doing the job. This is complicated by the fact that I have multiple types of users that right now are not stored in one master table. This is because the types of users are quite different, and constructing multiple different tables for user meta-data would, I think, be too much trouble. In addition, there are multiple types of content that can be acted upon, and multiple types of activity (following, submitting, commenting, etc.).
Constructing a site-wide activity feed is simple because everything is logged to the main feed table and I just build out a list. I have a master feed table in MySQL that simply logs:
type of activity;
type of target entity;
id of target entity;
type of source entity (i.e., user or organization);
id of source entity.
(This is just a big reference table that points the script generating the feed to the appropriate table(s) for each feed entry).
In generating the user-specific feed, I'm trying to figure out some way to join the relationship table with the feed table, and using that to parse results. I have a relationships table, comprised of 'following' relationships, that is similar to the feed table. It is simpler though b/c only one type of user is allowed to follow other content types/users.
user/source id;
type of target entity;
id of target entity.
Columns 2 & 3 in the feed and follow tables are the same, and I have been trying to use various JOIN methodologies to match them up, and then limit them by any relationships in the follow table that the user has. This has not been very successful.
The basic query I am using is:
SELECT *
FROM `feed` AS fe
LEFT OUTER JOIN `follow` AS fo
    ON fe.feed_target_type = fo.follow_e_type
    AND fo.follow_e_id = fe.feed_target_id
WHERE fo.follow_u_id = 1
    OR (fe.feed_e_id = 1 AND fe.feed_e_type = 'user')
ORDER BY fe.feed_timestamp DESC LIMIT 10
This query also attempts to grab any content that the user has created (which data is logged in the feed table) that the user is, in effect, following by default.
This query seems to work, but it took me some time to get to it and I'm pretty sure I'm missing a more elegant solution. Any ideas?
The first site I made with an activity feed had a notifications table where activities were logged, and friends' actions were then pulled from that. However, a few months down the line this hit millions of records.
The solution I am programming now pulls the latest "friends" activities from separate tables and then orders them by date. The query is at home; I can post the example later if interested.
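The answerer's query was not posted, so the following is only a guess at the general shape: one SELECT per activity table, glued together with UNION ALL and sorted by date. The `friends`, `comments` and `submissions` tables are invented for the example, and it runs on SQLite through Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one per activity type, plus a friends list.
    CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
    CREATE TABLE comments (user_id INTEGER, body TEXT, created_at TEXT);
    CREATE TABLE submissions (user_id INTEGER, title TEXT, created_at TEXT);
    INSERT INTO friends VALUES (1, 2), (1, 3);
    INSERT INTO comments VALUES (2, 'hi', '2024-01-01 10:00:00'),
                                (9, 'not a friend', '2024-01-01 13:00:00');
    INSERT INTO submissions VALUES (3, 'first', '2024-01-01 11:00:00'),
                                   (2, 'second', '2024-01-01 12:00:00');
""")


def friend_activity(user_id, limit=10):
    """Latest activity of a user's friends, merged from separate tables."""
    return conn.execute(
        """SELECT 'comment' AS kind, c.user_id AS user_id,
                  c.created_at AS created_at
           FROM comments AS c
           JOIN friends AS f ON f.friend_id = c.user_id
           WHERE f.user_id = :uid
           UNION ALL
           SELECT 'submission', s.user_id, s.created_at
           FROM submissions AS s
           JOIN friends AS f ON f.friend_id = s.user_id
           WHERE f.user_id = :uid
           ORDER BY created_at DESC
           LIMIT :lim""",
        {"uid": user_id, "lim": limit}).fetchall()


for row in friend_activity(1):
    print(row)
```

Each branch can use an index on its own table's user_id and date columns, so no single log table has to grow to millions of rows just to serve the feed.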
