PHP/MySQL - logging user activities & huge database load

Assuming we have to log all the user activities of a community, I guess that in a short time our database will become very large; so my question is:
is this an acceptable compromise (having a huge DB table) in order to offer this kind of service, or can we do this in a more efficient way?
EDIT:
the kind of activity to be logged is a "classic" social-networking activity log where people can see what others are doing or have done and vice versa, so it will track, for example, when a user edits their profile, posts something, logs in, logs out, etc.
EDIT 2:
my table is already optimized in order to store only IDs:
CREATE TABLE log_activity_table (
  id INT,
  user INT,
  ip VARCHAR(45),        -- column lengths are illustrative
  event VARCHAR(50),     -- event name
  time VARCHAR(20),
  callbacks TEXT         -- some info from the triggered event
);

I'm actually working on a similar system, so I'm interested in the answers you get.
For my project, having a full historical accounting was not important, so we chose to keep the table fairly lean, much like what you're doing. Our tables look something like this:
CREATE TABLE `activity_log_entry` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event` varchar(50) NOT NULL,
`subject` text,
`publisher_id` bigint(20) NOT NULL,
`created_at` datetime NOT NULL,
`expires_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `event_log_entry_event_idx` (`event`),
KEY `event_log_entry_publisher_id_idx` (`publisher_id`),
CONSTRAINT `event_log_entry_publisher_id_user_id`
FOREIGN KEY (`publisher_id`)
REFERENCES `user` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
We decided that we don't want to store history forever, so we will have a cron job that purges history after a certain time period. We have both created_at and expires_at columns simply out of convenience. When an event is logged, these columns are set automatically by the model, and the expiry is computed with a simple strftime('%F %T', strtotime($expr)), where $expr is a string like '+30 days' that we pull from configuration.
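As a minimal sketch of what that model code might look like (the logActivity() helper and the $ttl default are hypothetical names, not part of the original; the table matches activity_log_entry above):
<?php
// Hypothetical helper: inserts one activity row with computed timestamps.
function logActivity(PDO $db, int $publisherId, string $event, string $subject, string $ttl = '+30 days'): void
{
    $createdAt = strftime('%F %T');                  // now, as 'YYYY-MM-DD HH:MM:SS'
    $expiresAt = strftime('%F %T', strtotime($ttl)); // e.g. 30 days from now, pulled from config

    $stmt = $db->prepare(
        'INSERT INTO activity_log_entry (event, subject, publisher_id, created_at, expires_at)
         VALUES (?, ?, ?, ?, ?)'
    );
    $stmt->execute([$event, $subject, $publisherId, $createdAt, $expiresAt]);
}
// The cron job can then simply purge expired rows:
//   DELETE FROM activity_log_entry WHERE expires_at < NOW();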
Our subject column is similar to your callbacks one. We also chose not to directly relate the subject of the activity to other tables, because there is a possibility that not all event subjects will have a table; additionally, it's not even important to hold this relationship, because the only thing we do with this event log is display activity feed messages. We store a serialized value object of data pertinent to the event for use in predetermined message templates. We also directly encode what the event pertained to (i.e. profile, comment, status, etc.).
Our events (aka activities) are simple strings like 'update', 'create', etc. These are used in some queries and, of course, to help determine which message to display to a user.
We are still in the early stages, so this may change quite a bit (possibly based on comments and answers to this question), but given our requirements it seemed like a good approach.

Case: all user activities have different tables, e.g. like, comment, post, become a member.
Then these tables should have a key associating each entry with a user. Given a user, you can get recent activities by querying each table by the user key.
Hence, if you don't have a schema yet, or you are free to change it, go with having different tables for different activities and search across multiple activity tables, as sketched below.
Case: some activities are, say, generic and don't have an individual table.
Then have a table for generic activities and search it along with the other activity tables.
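As a rough illustration (the table and column names here are hypothetical), querying each activity table by the user key and merging the results could look like this:
SELECT 'post' AS activity, id, created_at FROM posts WHERE user_id = ?
UNION ALL
SELECT 'comment' AS activity, id, created_at FROM comments WHERE user_id = ?
UNION ALL
SELECT 'like' AS activity, id, created_at FROM likes WHERE user_id = ?
ORDER BY created_at DESC
LIMIT 20;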

Do you need to store the specific activity of each user, or do you just want to log the kind of activity that is happening over time? If the latter, then you might consider something like RRDtool (or a similar approach) and store the amount of activity over different timesteps in a circular buffer, whose size stays constant over time. See http://en.wikipedia.org/wiki/RRDtool.

Related

way to manage audit trail of changing record in mysql

I am building an application. It is basically an e-commerce order fulfillment application. In this, audit trails, i.e. who changed what and how many times it was changed, are an important aspect. How should I maintain this at the database / table level? Say a record is altered by 3 people; how will I keep all the changes and track who changed what?
First, for every table you want to track, you need to create a log table with a structure like this:
CREATE TABLE user_track_logs (
  id BIGINT(20) PRIMARY KEY AUTO_INCREMENT,
  key_id INT(11),                  -- id of the changed row
  user_id INT(11),
  created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  field VARCHAR(128),              -- set it bigger if you have long-named columns (like this_columns_is_very_important_for_me_and_my_employers...)
  field_value_was TEXT NULL,
  field_value_new TEXT NULL
);
Second, you need to set the current user ID in a variable on MySQL's connection, or you can use MySQL's own user (you can create a separate MySQL login for every user).
Third, create INSERT/UPDATE/DELETE triggers which will store the changes in user_track_logs.
Or you can emulate this process in PHP, but in PHP it will be more difficult.
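As a rough sketch of the trigger idea (assuming a hypothetical orders table with a status column, and that the application sets @current_user_id on every connection), an UPDATE trigger could look like this:
DELIMITER $$
CREATE TRIGGER orders_track_update
AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
  -- log only when the tracked field actually changed
  IF NOT (NEW.status <=> OLD.status) THEN
    INSERT INTO user_track_logs (key_id, user_id, field, field_value_was, field_value_new)
    VALUES (OLD.id, @current_user_id, 'status', OLD.status, NEW.status);
  END IF;
END$$
DELIMITER ;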

Likes on posts on a heavy traffic site

I have this website that lets people post stuff that other people can like; on all the main pages it shows a certain number of small thumbs with the number of likes each post has. The website also has a 'hot' page which shows the most liked posts of the last 24 hours.
I am currently thinking about structuring the database like this:
CREATE TABLE likes (
`id` int(11) NOT NULL AUTO_INCREMENT,
`post` int(11) NOT NULL,
`user` int(11) NOT NULL,
`liked` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
but I'm scared it will cause a lot of performance issues since the table will get really big really fast, so querying it all will slow my database down a lot. Would there be a good way to index this to help with performance? Also, would it be a good idea to make a second table just for the 'hot' page, containing only the likes of the last 24 hours, from which I remove the expired ones (older than 24 hours) with a cron job every day?
I'm far from an expert on databases, so some explanation with the answer would be much appreciated. Thanks in advance.
Your method would work if it is important to you to log who made which "likes". The table will grow linearly with use, which should not be a performance problem unless the site becomes very popular. Just be sure to use InnoDB tables, as MyISAM will lock the entire table on writes, and this is a write-heavy workload.
If simply knowing the amount of likes is sufficient, then do something like this:
CREATE TABLE IF NOT EXISTS posts (
`id` int(11) NOT NULL AUTO_INCREMENT,
`likes` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET utf8 AUTO_INCREMENT=1 ;
And on each "Like" run this query:
UPDATE posts SET `likes` = `likes` + 1 WHERE id = {$id};
OK, you can do one thing: first add the "like" to your likes table, then create another table named "hotlikes" or whatever you want to call it.
Keep a DATE field in hotlikes and store the date in it; automatically delete the rows from previous days with a cron job.
Show only the last 24 hours of likes from the "hotlikes" table.
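A minimal sketch of that idea, with a hypothetical hotlikes table and the clean-up query a daily cron job could run:
CREATE TABLE hotlikes (
  id INT(11) NOT NULL AUTO_INCREMENT,
  post INT(11) NOT NULL,
  user INT(11) NOT NULL,
  liked_date DATE NOT NULL,
  PRIMARY KEY (id),
  KEY post_idx (post)
) ENGINE=InnoDB;

-- run from cron: drop everything that is not from today
DELETE FROM hotlikes WHERE liked_date < CURDATE();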
Let's say you have a fairly large MySQL server (say, an AWS EC2 large instance) which has been tuned properly, indexed properly, uses InnoDB as the storage engine and uses caching; then you wouldn't have a problem serving 100-200 simultaneous users. You will also have to consider what the other tables or databases on the same database machine are doing.
I suppose post and user in the schema refer to the IDs; if yes, then I would create the following indexes (a sketch follows the list):
Primary key (obviously)
post + timestamp
user + timestamp (I am guessing you might want to query by user; if not, then this is not required)
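A minimal sketch of those two secondary indexes, using the column names from the likes table in the question:
ALTER TABLE likes
  ADD INDEX idx_post_liked (`post`, `liked`),
  ADD INDEX idx_user_liked (`user`, `liked`);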
Every 24 hours I would run a script to build the top likes and store them in some kind of cache so that they can be retrieved without hitting the DB server, with an optional write to the DB (a fail-safe just in case the cache fails).
Another option is to use a key-value store (Redis, maybe), but then again it depends on what your use case is and how many users you will have.
You can do what @Abhilash Shukla suggested, plus you can cache the number of likes on the post in the posts table. You can update the cache once per X minutes, for example, like vBulletin and some other forums do for thread views. For this, you can create a table that stores just the IDs of the posts for which you need to recalculate the number of likes, and do the recalculation on cron.
You can even partition the likes table by post ID range, if you find that suitable for the task.
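A rough sketch of that recalculation step, assuming a hypothetical posts_to_recalc queue table holding the post IDs that need an updated count:
-- refresh the cached counters for the queued posts
UPDATE posts p
JOIN posts_to_recalc q ON q.post_id = p.id
SET p.likes = (SELECT COUNT(*) FROM likes l WHERE l.post = p.id);

-- empty the queue once the counts are refreshed
DELETE FROM posts_to_recalc;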

Dynamic project data management with forms and MySQL

I am currently responsible for creating a web-based project management application for the department I work in. Each project has a bunch of different data points that describe it, and my first plan was to just set up a table in the MySQL database and an HTML form to manage the data in that table.
My managers just let me know they will need to be able to add/delete data points for the projects in case their workflow and project tracking changes. (This would be something that happens MAYBE a few times a year, if at all.)
So I am attempting to figure out the best way to go about storing this data in MySQL. The first approach that came to mind was to give them an interface that allows them to add columns to the 'projects' table, and to have a 'master' table that tracks all the column names and data types. But that feels like a REALLY bad idea and a bit of a nightmare to maintain.
Another possible option would be to have the interface add a new table that stores all the information for that data point AND the ID of the project that is using the data.
I understand that both of these could be really screwy ways of doing things. If there is a better way I would love to hear about it. If I need to clarify something, let me know.
Thank you for your time.
CREATE TABLE projects (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL
);
CREATE TABLE datapoints (
id INT PRIMARY KEY AUTO_INCREMENT,
projectid INT NOT NULL,
name VARCHAR(50) NOT NULL,
value VARCHAR(250) NOT NULL,
INDEX(projectid),
INDEX(name)
);
If you want to get fancier, do one or more of the following (a sketch of the first option follows):
Put the datapoint names in a table of their own and reference them instead of naming them in the datapoints table.
Give datapoints a field for each of numeric, date, text, longtext, OR use different tables.
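A minimal sketch of the first of those options, with a hypothetical datapoint_names table referenced from datapoints:
CREATE TABLE datapoint_names (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE datapoints (
  id INT PRIMARY KEY AUTO_INCREMENT,
  projectid INT NOT NULL,
  nameid INT NOT NULL,           -- references datapoint_names.id
  value VARCHAR(250) NOT NULL,
  INDEX (projectid),
  INDEX (nameid)
);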

Best way to store views / stats in MySQL

I'm working on a site which stores individual page views in a 'views' table:
CREATE TABLE `views` (
`view_id` bigint(16) NOT NULL auto_increment,
`user_id` int(10) NOT NULL,
`user_ip` varchar(15) NOT NULL,
`view_url` varchar(255) NOT NULL,
`view_referrer` varchar(255) NOT NULL,
`view_date` date NOT NULL,
`view_created` int(10) NOT NULL,
PRIMARY KEY (`view_id`),
KEY `view_url` (`view_url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
It's pretty basic: it stores user_id (the user's ID on the site), their IP address, the URL (without the domain, to reduce the size of the table a little), the referral URL (not really using that right now and might get rid of it), the date (YYYY-MM-DD format of course), and the Unix timestamp of when the view occurred.
The table, of course, is getting rather big (4 million rows at the moment, and it's a rather young site), and running queries on it is slow.
For some basic optimization I've now created a 'views_archive' table:
CREATE TABLE `views_archive` (
`archive_id` bigint(16) NOT NULL auto_increment,
`view_url` varchar(255) NOT NULL,
`view_count` smallint(5) NOT NULL,
`view_date` date NOT NULL,
PRIMARY KEY (`archive_id`),
KEY `view_url` (`view_url`),
KEY `view_date` (`view_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
This ignores the user info (and referral URL) and stores how many times a URL was viewed per day. This is probably how we'll generally want to use the data (how many times a page was viewed on a per-day basis), so it should make querying pretty quick. But even if I use it to mainly replace the 'views' table (right now I imagine I could show page views by hour for the last week/month or so and then show daily views beyond that, and so would only need the 'views' table to contain data from the last week/month), it's still a large table.
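A sketch of the kind of daily roll-up that would populate views_archive from views (columns as in the two schemas above):
INSERT INTO views_archive (view_url, view_count, view_date)
SELECT view_url, COUNT(*), view_date
FROM views
WHERE view_date = CURDATE() - INTERVAL 1 DAY
GROUP BY view_url, view_date;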
Anyway, long story short, I'm wondering if you can give me any tips on how to best handle the storage of stats/page views on a MySQL site, the goal being both to keep the size of the table(s) in the DB as small as possible and to still be able to easily (and at least relatively quickly) query the info. I've looked at partitioned tables a little, but the site doesn't have MySQL 5.1 installed. Any other tips or thoughts you could offer would be much appreciated.
You probably want to have a table just for pages, and have the user views reference that table. Another possible optimization would be to store the user IP in a different table, perhaps in some session information table. That should reduce your query times somewhat. You're on the right track with the archive table; the same optimizations should help it as well.
MySQL's Archive Storage Engine
http://dev.mysql.com/tech-resources/articles/storage-engine.html
It is great for logs: it is quick to write; the one downside is that reading is a bit slower, but that trade-off suits log tables well.
Assuming your application is a blog and you want to keep track of views for your blog posts, you will probably have a table called blog_posts. In this table, I suggest you create a column called "views", and in this column you will store a static count of how many views the post has. You will still use the views table, but it will only be used to keep track of all the views (and to check whether they are "unique" or not).
Basically, when a user visits a blog post, the code will check the views table to see if a new view should be added. If so, it will also increment the "views" field in the corresponding row for the blog post in blog_posts. That way, you can just refer to the "views" field of each post to get a quick peek at how many views it has. You can take this a step further and add redundancy by setting up a cron job to re-count and verify all the views and update each blog_posts row accordingly at the end of the day. Or, if you prefer, you can perform a re-count on each update if to-the-second accuracy is key.
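A rough sketch of that flow; here the uniqueness rule is simply "this user has not viewed this URL before", and the blog_posts.id / blog_posts.views columns are assumptions:
-- has this user already viewed this page?
SELECT COUNT(*) FROM views WHERE user_id = ? AND view_url = ?;

-- if not, record the view...
INSERT INTO views (user_id, user_ip, view_url, view_referrer, view_date, view_created)
VALUES (?, ?, ?, ?, CURDATE(), UNIX_TIMESTAMP());

-- ...and bump the cached counter on the post (id and views columns assumed)
UPDATE blog_posts SET views = views + 1 WHERE id = ?;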
This solution works well if your site is read-intensive and you are constantly having to get a count of how many views each blog post has (again, assuming that is your application :-))

How to manage an AJAX based 'like/dislike' feature?

I'd like to add a like/dislike, upvote/downvote-type feature to each of the posts in the forum script I'm writing (much like the one here on SO). I'm having two difficulties trying to figure out how it can be done:
1) I can't figure out a DB schema that'd do it efficiently. I could use a separate `likeordislike` table to make a relation between user and post (xyz likes post #123), or I could use a column of type 'text' in the `posts` table listing out all the users who have liked (or disliked) the post. The latter of course means I'd have to parse the field for user IDs to make any use of it.
2) Make sure the user doesn't get to like/dislike a post twice.
It's probably trivial, but I can only think of ways that make a lot of MySQL calls in server-side processing. Thanks.
Make a separate table in which you keep track of who likes something and who doesn't. That table will be used to check if a user already did something, so you can prevent him doing it twice.
Then add another field (if you will have votes) or two (if you will have likes/dislikes) in which you will store the total number of likes/dislikes or the score, so you don't have to calculate this on the fly every time you display the post. And you will, of course, update this column (or columns) when somebody votes on the post.
And don't bother disabling the vote link. Just check if the user has already voted when he clicks on the link and deny him the vote if he already cast one.
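A minimal sketch of such a table (the names are illustrative); the composite primary key is what prevents a user from voting twice on the same post:
CREATE TABLE post_votes (
  user_id INT NOT NULL,
  post_id INT NOT NULL,
  vote TINYINT NOT NULL,              -- 1 = like, -1 = dislike
  PRIMARY KEY (user_id, post_id)      -- one vote per user per post
) ENGINE=InnoDB;

-- does nothing if the user already voted on this post
INSERT IGNORE INTO post_votes (user_id, post_id, vote) VALUES (?, ?, 1);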
(Similar answer to Jan Hančič here, but I decided my take on the ratings was different enough...)
Your initial thought of a separate table to store likes/dislikes is absolutely the way I would go. I'd put indexes on the two main columns (user ID and post ID), which is critical to the next bit.
For example:
create table users (
userId varchar(254) not null,
-- ...
primary key (userId)
)
ENGINE=...;
create table posts (
postId int(11) not null,
title varchar(254) not null,
content text not null,
-- ...
primary key (postId)
)
ENGINE=...;
create table userPostRatings (
userId varchar(254) not null,
postId int(11) not null,
rating int(2) not null,
-- ...
)
ENGINE=...;
create index userPostRatings_userId on userPostRatings(userId);
create index userPostRatings_postId on userPostRatings(postId);
I'd then use a joined query (whether in a stored procedure, in the code, or in a view) to get the post information, e.g.:
select p.postId, p.title, p.content, avg(r.rating)
from posts p
left outer join userPostRatings r on p.postId = r.postId
where p.postId = ?
group by p.postId;
(That will return NULL for the average for any posts that don't have any ratings yet.)
Although this is a join, because of the indexes on the userPostRatings table it's a fairly efficient one.
If you find that the join is killing you (very high concurrency sites do), then you may want to de-normalize a bit in the way Jan suggested, by adding an average rating column to posts and keeping it updated. You'd just change your query to use it. But I wouldn't start out denormalized; it's more code and arguably premature optimisation. Your database is there to keep track of this information; as soon as you duplicate the information it's tracking, you're introducing maintenance and synchronisation issues. Those may be justified in the end, but I wouldn't start out assuming my DB couldn't help me with this. Making the adjustment (if you plan ahead for it) if the join turns out to be a problem in your particular situation isn't a big deal.
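If you do end up denormalizing, a rough sketch of what "keeping it updated" might look like (the avgRating column is an assumption, not part of the schema above):
ALTER TABLE posts ADD COLUMN avgRating DECIMAL(4,2) NULL;

-- re-run whenever a row in userPostRatings changes for this post
UPDATE posts p
SET p.avgRating = (SELECT AVG(r.rating) FROM userPostRatings r WHERE r.postId = p.postId)
WHERE p.postId = ?;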
