Likes on posts on a heavy-traffic site - php

I have a website that lets people post content that other people can like. All the main pages show small thumbnails with the number of likes each post has, and the site also has a 'hot' page which shows the most-liked posts of the last 24 hours.
I am currently thinking about making the table like this:
CREATE TABLE likes (
`id` int(11) NOT NULL AUTO_INCREMENT,
`post` int(11) NOT NULL,
`user` int(11) NOT NULL,
`liked` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
but I am worried it will cause a lot of performance issues, since the table will get really big really fast, and querying all of it will slow my database down a lot. Would there be a good way to index this to help with performance? Also, would it be a good idea to make a second table just for the 'hot' page, containing only the likes of the last 24 hours, from which I remove the expired rows (older than 24 hours) with a cron job every day?
I'm far from an expert on databases, so some explanation with the answer would be much appreciated. Thanks in advance.

Your method would work if it is important to you to log who made which "likes". The table will grow linearly with use, which should not be a performance problem unless the site becomes very popular. Just be sure to use InnoDB tables, as MyISAM locks the entire table on writes and this is a write-heavy workload.
If simply knowing the number of likes is sufficient, then do something like this:
CREATE TABLE IF NOT EXISTS posts (
`id` int(11) NOT NULL AUTO_INCREMENT,
`likes` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
And on each "Like" run this query:
UPDATE posts SET `likes` = `likes` + 1 WHERE id = {$id}
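If you need both the who-liked-what log and a fast counter, the two approaches combine naturally. A minimal sketch, assuming the likes and posts tables above (both as InnoDB) and illustrative id values:
START TRANSACTION;
-- record who liked which post (the `liked` timestamp fills itself in)
INSERT INTO likes (post, user) VALUES (42, 7);
-- keep the denormalized counter on the post row in sync
UPDATE posts SET `likes` = `likes` + 1 WHERE id = 42;
COMMIT;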

OK, you can do one thing: first add the "like" to your likes table, and also create another table, named "hotlikes" or whatever you want to call it.
Give hotlikes a DATE field and store the date of each like in it; delete the rows from previous days automatically with a cron job.
Show only the last 24 hours of likes from the "hotlikes" table.
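A rough sketch of what that could look like; the table, column and index names here are illustrative, not taken from the question:
CREATE TABLE hotlikes (
post int(11) NOT NULL,
user int(11) NOT NULL,
liked TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY idx_hotlikes_liked (liked),
KEY idx_hotlikes_post (post)
) ENGINE=InnoDB;
-- daily cron job: remove likes older than 24 hours
DELETE FROM hotlikes WHERE liked < NOW() - INTERVAL 1 DAY;
-- the 'hot' page then only has to scan the small table
SELECT post, COUNT(*) AS likes_24h
FROM hotlikes
GROUP BY post
ORDER BY likes_24h DESC
LIMIT 20;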

Let's say you have a fairly large MySQL server (say an AWS EC2 large instance) which has been tuned properly, indexed properly, uses InnoDB as the storage engine and uses caching; then you wouldn't have a problem serving 100-200 simultaneous users. You will also have to consider what the other tables or databases on the same database machine are doing.
I suppose the post and user in the schema refer to ids. If so, I would create the following indexes (see the sketch below):
Primary key (obviously)
post + timestamp
user + timestamp (I am guessing you might want to query by user; if not, this is not required)
Every 24 hours I would run a script to build the top likes and store the result in some kind of cache, so that it can be retrieved without hitting the db server, with an optional write to the db as a fail-safe in case the cache fails.
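A sketch of those indexes and the daily aggregation, assuming the likes table from the question (index names and the LIMIT are arbitrary):
ALTER TABLE likes
ADD INDEX idx_post_liked (post, liked),
ADD INDEX idx_user_liked (user, liked);
-- run every 24 hours and store the result in a cache (memcached, Redis, a summary table, ...)
SELECT post, COUNT(*) AS likes_24h
FROM likes
WHERE liked >= NOW() - INTERVAL 1 DAY
GROUP BY post
ORDER BY likes_24h DESC
LIMIT 10;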
Another option is to use a key-value store (Redis, maybe), but then again it depends on what your use case is and how many users you will have.

You can do what @Abhilash Shukla suggested, plus you can cache the number of likes on the post in the posts table. You can update that cache once per X minutes, for example, like vBulletin and some other forums do for thread views. For this, you can create a table to store just the ids of the posts for which the number of likes needs to be recalculated, and do the recalculation on cron.
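A sketch of that recalculation queue and the cron-side update; all names here are illustrative:
CREATE TABLE posts_to_recount (
post_id int(11) NOT NULL,
PRIMARY KEY (post_id)
) ENGINE=InnoDB;
-- on each like (or unlike), queue the post for a recount
INSERT IGNORE INTO posts_to_recount (post_id) VALUES (42);
-- cron job every X minutes: refresh the cached counter, then empty the queue
UPDATE posts p
JOIN (
SELECT post, COUNT(*) AS cnt
FROM likes
WHERE post IN (SELECT post_id FROM posts_to_recount)
GROUP BY post
) fresh ON fresh.post = p.id
SET p.likes = fresh.cnt;
DELETE FROM posts_to_recount;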
You can even partition the likes table by post id range, if you find that suitable for the task.

Related

Database Designing for Vote Buttons

I am working on a project in which I have up and down voting options, similar to Stack Overflow. I am not very experienced in DB design, and thus I ran into the following issue.
First of all, here is my table structure for voting :
voteId-----AUTO_INCREMENT with PRIMARY KEY.
mediaId----The Media for which user gives an up/down vote.
userId-----The User who Voted.
voteMode---1 for Up Vote and 0 for Down Vote. This is an Integer Field.
In this case, if I have 100 users and 100 media items, then I will have 100x100 = 10,000 records in total in this table.
The problem is that the DB is filling up with a lot of records, and the vote button is dead slow to react now. This is making my client unhappy and putting me in trouble.
Can anyone suggest a better model to avoid this huge table?
I am using jQuery.ajax to post my vote to the server. The project is based on PHP and Zend Framework 1.11. When I click on the up icon, it takes some time to respond; Mozilla even used to crash at times. I tested by inserting lots of junk records (around 15,000) in a loop.
You can try these upgrades to your table schema:
-- All ids are now UNSIGNED, as you do not need negative values
ALTER TABLE `voting`
CHANGE `voteid` `voteid` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
CHANGE `mediaId` `mediaId` INT(11) UNSIGNED NOT NULL,
CHANGE `userId` `userId` INT(11) UNSIGNED NOT NULL,
-- ENUM datatype, as you only need two possible values (1 = up vote, 0 = down vote)
CHANGE `voteMode` `voteMode` ENUM('0', '1') NOT NULL ;
-- Adding an index will surely give some speedup.
-- Do not add indexes on columns you do not need them on.
ALTER TABLE `voting` ADD INDEX ( `mediaId` , `userId` ) ;
Go through Mysql Index to know more about indexing.
If you are using the MyISAM storage engine, then I suggest you go for the InnoDB storage engine; comparing the two may help you decide which engine you should use.
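If you do switch, the conversion itself is a single statement (the table gets rebuilt, so run it during a quiet period):
ALTER TABLE `voting` ENGINE=InnoDB;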
And some other hacks that may help you are:
MySQL Query Cache
Prepared Statements in php
COLUMNS Partitioning
Some resources about MySQL database optimization:
MySQL Tuning
Mysql Optimization
Real World Scalability MySQL.
OK, two things. 15k records is nothing, so that can hardly be the problem. I'm using tables with 150M rows and queries still perform well under .005s.
I suspect you're using MyISAM and not InnoDB. With MyISAM, each insert (or update) locks the entire table. So while someone is voting, the table is locked and others can't read from it. This might become a problem if you have thousands of users.
Make sure you have the right indexes. I'm not sure which queries are slow (and how slow!), but make sure you have an index on the columns you are searching on (probably mediaId).
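For example, the typical per-media count query might look like this (a sketch; adjust names to your schema):
-- per-media vote totals; an index whose first column is mediaId keeps this from being a full table scan
SELECT voteMode, COUNT(*) AS votes
FROM voting
WHERE mediaId = ?
GROUP BY voteMode;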
If you want better advice, post the queries that are slow.
If you want to keep track of which user has voted for which media, and every user votes, then your minimal amount of data is users * media.
If you want to have less data, you have to make a concession. Perhaps let a user register and vote anonymously? Most users are not very happy if their personal preferences can be distilled from their voting behavior.

way to manage audit trail of changing record in mysql

I am building an application. It is basically an e-commerce order fulfillment application, in which audit trails, i.e. who changed what and how many times it was changed, are an important aspect. How should I maintain this at the database / table level? Say a record is altered by 3 people; how will I maintain all the changes and keep track of who changed what?
First, for every table you want to track, you need to create a log table with a structure like this:
create table user_track_logs (
id bigint(20) primary key auto_increment,
key_id int(11), -- primary key of the changed row
user_id int(11),
created timestamp default current_timestamp,
field varchar(128), -- make it bigger if you have long column names (like this_column_is_very_important_for_me_and_my_employers...)
field_value_was text null,
field_value_new text null
);
Second, you need to set the current user's ID in a variable on the MySQL connection, or you can use the MySQL user itself: you can create a separate MySQL login for every user.
Third, create insert/update/delete triggers which store the changes in user_track_logs.
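A minimal sketch of one such trigger, for a hypothetical orders table tracking a single status column (the session variable @current_user_id is assumed to be set by the application right after connecting):
DELIMITER //
CREATE TRIGGER orders_track_status_update
AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
-- only log when the value actually changed (NULL-safe comparison)
IF NOT (NEW.status <=> OLD.status) THEN
INSERT INTO user_track_logs (key_id, user_id, field, field_value_was, field_value_new)
VALUES (OLD.id, @current_user_id, 'status', OLD.status, NEW.status);
END IF;
END//
DELIMITER ;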
Or you can emulate this process in PHP, but in PHP it will be more difficult.

Proper unique news article view counter approach

I have looked at different ways to approach this, but I would like a method which does not allow people to get around it. I just need a simple, lightweight method to count the number of views of different news articles which are stored in a database:
id | title          | body | date       | views
1  | Stack Overflow | ...  | 2010-01-01 | 23
Session
- Could they not just clear browser data and reload page for another view? Any way to stop this?
Database table of ip addresses
- Tons of entries, may hinder performance
Log file
- Same issue as database however I've seen lots of examples
For a performance critical system and for ensuring accuracy, which method should I look into further?
Thanks.
If you're looking to figure out how many unique visitors you have to a given page, then you need to keep information that is unique to each visitor somewhere in your application to reference.
IP addresses are definitely the "safest" way to go, as a user would have to jump through a good many hoops to manually change their IP address. That being said, you would have to store a pretty massive amount of data for each and every page if this is a commercial web site.
What is more reasonable to do is to store the information in a cookie on the client's machine. Sure, if your client doesn't allow cookies you will have a skewed number, and sure, the user can wipe their browser history and you will have a skewed number, but overall your number should be relatively accurate.
You could potentially keep this information cached or in session-level variables, but then if your application crashes or restarts you're SOL.
If you REALLY need to have nearly 100% accurate numbers then your best bet is to log the IP addresses of each page's unique visitors. This will ensure you the most accurate count. This is pretty extreme though and if you can take a ~5+% hit in accuracy then I would definitely go for the cookies.
I think that to keep it lightweight you should use someone else's processing power, so for that reason you should sign up for Google Analytics and insert their tracking code into the pages that you want to track.
If you want more accuracy, then track each page request in the database itself; or employ a log-reading tool that then drops summaries of page reads into a database or file system each day.
Another suggestion:
When a user visits your website, log their IP address in a table and drop a cookie with a unique ID. Store this unique ID in a table, along with a reference to the IP address record. This way you are able to figure out a more accurate count (and make adjustments to your final number).
Set up an automated task to create summary tables, making querying the data much faster. This will also allow you to prune the data on a regular basis.
If you're happy to sacrifice a bit of accuracy, then this might be a solution:
This would be the "holding" table, which contains the raw data. This is not the table you'd query data from; it is just for writing to. You'd run through this whole table on a daily/weekly/monthly basis. Yet again, you may need indexes depending on how you wish to prune this.
CREATE TABLE `article_views` (
`article_id` int(10) unsigned NOT NULL,
`doy` smallint(5) unsigned NOT NULL,
`ip_address` int(10) unsigned NOT NULL
) ENGINE=InnoDB
You'd then have a summary table, which you would update on a daily/weekly or monthly basis which would be super fast to query.
CREATE TABLE `summary_article_uniques_2011` (
`article_id` int(10) unsigned NOT NULL,
`doy` smallint(5) unsigned NOT NULL,
`unique_count` int(10) unsigned NOT NULL,
PRIMARY KEY (`article_id`,`doy`),
KEY(`doy`)
) ENGINE=InnoDB
Example queries:
Unique count for a specific article on a day:
SELECT unique_count FROM summary_article_uniques_2011 WHERE article_id=? AND doy=" . date('z') . "
Counts per day for a specific article:
SELECT unique_count FROM summary_article_uniques_2011 WHERE article_id=?
Counts across the entire site, most popular articles today:
SELECT article_id FROM summary_article_uniques_2011 WHERE doy=? ORDER BY unique_count DESC LIMIT 10 -- note: this query will not hit an index; if you are going to have a lot of articles, your best bet is to add another summary table or an index on unique_count
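The summary table itself can be (re)built from the holding table with one aggregate query, for example (a sketch against the two tables above; the pruning condition is up to you):
-- daily/weekly rollup: one row per article per day with the distinct-IP count
INSERT INTO summary_article_uniques_2011 (article_id, doy, unique_count)
SELECT article_id, doy, COUNT(DISTINCT ip_address)
FROM article_views
GROUP BY article_id, doy
ON DUPLICATE KEY UPDATE unique_count = VALUES(unique_count);
-- once rolled up, the raw rows can be pruned
DELETE FROM article_views WHERE doy < ?;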

php/mysql - logging users activities & huge database load

Assuming we have to log all the user activities of a community, I guess that in a short time our database will become very huge; so my question is:
is it an acceptable compromise (to have a huge DB table) in order to offer this kind of service, or can we do this in a more efficient way?
EDIT:
the kind of activity to be logged is a "classic" social-networking activity log, where people can see what others are doing or have done and vice versa, so it will track, for example, when a user edits their profile, posts something, logs in, logs out, etc.
EDIT 2:
my table is already optimized in order to store only ids:
CREATE TABLE log_activity_table (
id int,
user int,
ip varchar(45), # lengths here are illustrative
event varchar(64), # event-name
time varchar(32),
callbacks text # some-info-from-the-triggered-event
);
I'm actually working on a similar system, so I'm interested in the answers you get.
For my project, having a full historical record was not important, so we chose to keep the table fairly lean, much like what you're doing. Our table looks something like this:
CREATE TABLE `activity_log_entry` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event` varchar(50) NOT NULL,
`subject` text,
`publisher_id` bigint(20) NOT NULL,
`created_at` datetime NOT NULL,
`expires_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `event_log_entry_event_idx` (`event`),
KEY `event_log_entry_publisher_id_idx` (`publisher_id`),
CONSTRAINT `event_log_entry_publisher_id_user_id`
FOREIGN KEY (`publisher_id`)
REFERENCES `user` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
We decided that we don't want to store history forever, so we will have a cron job that kills history after a certain time period. We have both created_at and expires_at columns simply out of convenience. When an event is logged, these columns are filled in automatically by the model; we use a simple strftime('%F %T', strtotime($expr)), where $expr is a string like '+30 days' that we pull from configuration.
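The cron side of that can be a single statement against the table above:
-- purge expired activity log entries; run periodically from cron
DELETE FROM activity_log_entry WHERE expires_at < NOW();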
Our subject column is similar to your callbacks one. We also chose not to directly relate the subject of the activity to other tables, because there is a possibility that not all event subjects will have a table; additionally, it is not even important to hold this relationship, because the only thing we do with this event log is display activity feed messages. We store a serialized value object of data pertinent to the event for use in predetermined message templates. We also directly encode what the event pertained to (i.e. profile, comment, status, etc.).
Our events (aka activities) are simple strings like 'update', 'create', etc. These are used in some queries and, of course, to help determine which message to display to a user.
We are still in the early stages, so this may change quite a bit (possibly based on comments and answers to this question), but given our requirements it seemed like a good approach.
Case: all user activities have different tables, e.g. like, comment, post, become a member.
Then these tables should each have a key associating the entry with a user. Given a user, you can get recent activities by querying each table by that user key (a sketch follows after these cases).
Hence, if you don't have a schema yet, or you are free to change it, go with having different tables for the different activities and search across them.
Case: some activities are, say, generic and don't have an individual table.
Then have a table for generic activities and search it along with the other activity tables.
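A recent-activity query across several such per-activity tables can then be stitched together with UNION ALL, for example (table and column names here are hypothetical):
(SELECT 'like' AS activity, created_at FROM likes WHERE user_id = ?)
UNION ALL
(SELECT 'comment' AS activity, created_at FROM comments WHERE user_id = ?)
UNION ALL
(SELECT 'post' AS activity, created_at FROM posts WHERE user_id = ?)
ORDER BY created_at DESC
LIMIT 20;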
Do you need to store the specific activity of each user, or do you just want to log the kind of activity that is happening over time? If the latter, then you might consider something like RRDtool (or a similar approach) and store the amount of activity over different timesteps in a circular buffer, whose size stays constant over time. See http://en.wikipedia.org/wiki/RRDtool.

Best way to store views / stats in MySQL

I'm working on a site which stores individual page views in a 'views' table:
CREATE TABLE `views` (
`view_id` bigint(16) NOT NULL auto_increment,
`user_id` int(10) NOT NULL,
`user_ip` varchar(15) NOT NULL,
`view_url` varchar(255) NOT NULL,
`view_referrer` varchar(255) NOT NULL,
`view_date` date NOT NULL,
`view_created` int(10) NOT NULL,
PRIMARY KEY (`view_id`),
KEY `view_url` (`view_url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
It's pretty basic: it stores the user_id (the user's id on the site), their IP address, the URL (without the domain, to reduce the size of the table a little), the referring URL (not really used right now, and I might get rid of it), the date (YYYY-MM-DD format, of course), and the Unix timestamp of when the view occurred.
The table, of course, is getting rather big (4 million rows at the moment, and it's a rather young site) and running queries on it is slow.
For some basic optimization I've now created a 'views_archive' table:
CREATE TABLE `views_archive` (
`archive_id` bigint(16) NOT NULL auto_increment,
`view_url` varchar(255) NOT NULL,
`view_count` smallint(5) NOT NULL,
`view_date` date NOT NULL,
PRIMARY KEY (`archive_id`),
KEY `view_url` (`view_url`),
KEY `view_date` (`view_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
This ignores the user info (and referring URL) and stores how many times a URL was viewed per day. This is probably how we'll generally want to use the data (how many times a page was viewed on a per-day basis), so it should make querying pretty quick. Even if I use it mainly to replace the 'views' table (right now I imagine I could show page views by hour for the last week/month or so and then show daily views beyond that, so the 'views' table would only need to contain data from the last week/month), it's still a large table.
Anyway, long story short, I'm wondering if you can give me any tips on how best to handle the storage of stats/page views on a MySQL site. The goal is to keep the size of the table(s) in the db as small as possible while still being able to easily (and at least relatively quickly) query the info. I've looked at partitioned tables a little, but the site doesn't have MySQL 5.1 installed. Any other tips or thoughts you could offer would be much appreciated.
You probably want to have a table just for pages, and have the view records reference that table. Another possible optimization would be to store the user IP in a different table, perhaps along with some session information. That should reduce your query times somewhat. You're on the right track with the archive table; the same optimizations should help there as well.
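A sketch of that normalization (names are illustrative):
-- store each distinct URL once and reference it by id from the views table
CREATE TABLE pages (
page_id int(10) unsigned NOT NULL AUTO_INCREMENT,
page_url varchar(255) NOT NULL,
PRIMARY KEY (page_id),
UNIQUE KEY uniq_page_url (page_url)
) ENGINE=InnoDB;
-- view rows then carry a 4-byte page_id instead of repeating the full URL
ALTER TABLE views ADD COLUMN page_id int(10) unsigned NOT NULL;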
MySQL's Archive Storage Engine
http://dev.mysql.com/tech-resources/articles/storage-engine.html
It is great for logs: it is quick to write, and the one downside is that reading is a bit slower, which is usually acceptable for log tables.
Assuming your application is a blog and you want to keep track of views for your blog posts, you will probably have a table called blog_posts. In this table, I suggest you create a column called "views" and in this column, you will store a static value of how many views this post has. You will still use the views table, but that will only be utilized to keep track of all the views (and to do checks if they are "unique" or not).
Basically, when a user visits a blog post, the code will check the views table to see if a new view should be added. If so, it will also increment the "views" field in the corresponding row for the blog post in blog_posts. That way, you can just refer to the "views" field of each post to get a quick peek at how many views it has. You can take this a step further and add redundancy by setting up a CRON job to re-count and verify all the views and update each blog_posts row accordingly at the end of the day. Or, if you prefer, you can also perform a re-count on each update if to-the-second accuracy is key.
This solution works well if your site is read-intensive and you are constantly having to get a count of how many views each blog post has (again, assuming that is your application :-))
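A sketch of the cached counter and the nightly re-count, assuming a blog_posts table and some way to join the views table back to posts (the post_id column here is an assumption; adjust to your schema):
ALTER TABLE blog_posts ADD COLUMN views int(10) unsigned NOT NULL DEFAULT 0;
-- when a view is judged unique, bump the cached counter on the post row
UPDATE blog_posts SET views = views + 1 WHERE id = ?;
-- nightly cron re-count to correct any drift
UPDATE blog_posts p
SET p.views = (SELECT COUNT(*) FROM views v WHERE v.post_id = p.id);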
