Storing comments for users? - php

I'm making a site where users can make posts and comments, and the number of comments made by a single user could exceed 1000. Their profile would show a list of all comments made by that user (latest first, split into pages of 20 comments each).
Considering that the table used to store comments would get extremely large, I'm wondering what the best way to go about this would be, since people with more comments would likely be more popular, and I'm not sure that running a query that searches the whole list of comments for the user's ID is the best approach.
I was thinking an alternative could be adding a separate column to the users table that stores all of the user's comment IDs; whenever someone visits the profile, it would look up the comments with those IDs (limited to 20 at a time or so).
I'm unsure which method would be faster, whether the second method is practical, and whether there's a better method altogether. This is my first time doing something like this, and I would appreciate any guidance.

If you are using SQL Server 2012, new syntax (OFFSET ... FETCH NEXT) was added to make this really easy. See Implement paging (skip / take) functionality with this query.
Skip 20 * page rows (with page counted from 0), depending on the page you're looking for, and take the next 20.
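A minimal sketch of the first approach from the question (one comments table, filtered by user ID), using Python's built-in sqlite3 module as a stand-in for MySQL or SQL Server. SQLite spells skip/take as LIMIT ... OFFSET; the table, index and function names are hypothetical. The composite index on (user_id, id) is the assumption that makes this cheap: the database can jump straight to one user's comments, already in id order, instead of scanning every comment.

```python
import sqlite3

PAGE_SIZE = 20  # comments per profile page, as in the question


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per comment, tagged with its author.
    conn.execute(
        "CREATE TABLE comments ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " body TEXT NOT NULL)"
    )
    # Composite index: lets the database find one user's comments
    # and read them in id order without scanning the whole table.
    conn.execute("CREATE INDEX comments_by_user ON comments (user_id, id)")


def comments_page(conn: sqlite3.Connection, user_id: int, page: int) -> list:
    """Return one page (zero-based) of a user's comments, newest first."""
    return conn.execute(
        "SELECT id, body FROM comments"
        " WHERE user_id = ?"
        " ORDER BY id DESC"
        " LIMIT ? OFFSET ?",  # take PAGE_SIZE rows after skipping PAGE_SIZE * page
        (user_id, PAGE_SIZE, PAGE_SIZE * page),
    ).fetchall()


# Demo data: user 1 writes 45 comments (ids 1-45), user 2 writes 10 (ids 46-55).
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO comments (user_id, body) VALUES (?, ?)",
    [(1, f"comment {i}") for i in range(45)] + [(2, f"reply {i}") for i in range(10)],
)

first_page = comments_page(conn, 1, 0)  # ids 45 down to 26
last_page = comments_page(conn, 1, 2)   # ids 5 down to 1
```

With an index like this, a separate column listing every comment ID on the user's row is usually unnecessary. Large OFFSET values still make the database step over the skipped rows, so very deep pages get slower; a common alternative is to remember the last id shown and query WHERE user_id = ? AND id < ? instead.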

Related

Creating forum software - looking for the best way to go about 1 thing [closed]

Alright, so I enjoy making forum software with PHP and MySQL, though there's one thing that has always troubled me, and one thing only:
the main page of the forums, where you view the list of forums. Each forum shows the forum name, the number of posts made in that forum, the number of discussions made in that forum, and the last poster in the forum. That's where the problem lies: getting all of that data when it's stored in different tables. Getting it isn't really a problem at all; doing it EFFICIENTLY is what I'm after.
My current approach is this:
Store the current number of posts, the number of discussions, and the last poster statically in the forum table itself, instead of going out and grabbing the data from the different tables ("posts", "discussions", "forums", etc.). Then, whenever a user posts, update that "forums" row, incrementing the number of posts by 1 and updating the last poster, and also incrementing the discussions by 1 if they're starting a new discussion. This just seems inefficient and dirty to me for some reason, but maybe it's just me.
And here's another approach that I fear would be horribly inefficient:
Actually going out to each table ("posts", "discussions", "forums") and grabbing the data. The problem with this is that there can be hundreds of forums on one page, and I'd have to use COUNT to fetch the number of posts and discussions, meaning I'd have to use subqueries, not to mention a third subquery to fetch the last poster. So the query would be something like this:
SELECT f.foruminfo,  -- foruminfo / postinfo stand for the columns you actually need
  (
    SELECT COUNT(*)
    FROM posts p
    WHERE p.forumId = f.id
  ) AS post_count,
  (
    SELECT COUNT(*)
    FROM discussions d
    WHERE d.forumId = f.id
  ) AS discussion_count,
  (
    SELECT p.postinfo
    FROM posts p
    WHERE p.forumId = f.id
    ORDER BY p.postdate DESC
    LIMIT 1
  ) AS last_post
FROM forums f
ORDER BY f.position DESC;
So basically those subqueries could run hundreds of times if I have hundreds of forums listed. And with hundreds of users viewing the page every second, wouldn't this put quite a bit of strain on the server? I'm not entirely sure whether subqueries cause the same amount of load as normal queries, but if they do, then it seems like it would certainly be horribly inefficient.
Any ideas? :(
I've built large-scale forum systems before, and the key to making them performant is to de-normalize anything and everything you can.
You cannot realistically use JOIN on really popular pages. You must keep the number of queries you issue to the absolute minimum. You should never use sub-selects. Always be sure your indexes cover your exact use cases and no more. A query that takes longer than 1-5ms to run is probably way too slow to work on a site that's running at scale. When things suddenly take ten times longer under severe load, a 15ms query will take a crippling 150ms or more, while your optimized 1ms queries will take an acceptable 10ms. You're aiming for them to be 0.00s all the time, and it's possible to do this.
Remember that any time you're executing a query and waiting for a response, you're not able to do anything else. If you get a little careless, you'll have requests coming in faster than you can process them and the whole system will buckle.
Keep your schema simple, even stupidly simple, and by that I mean think about the layout of your page, the information you're showing, and make the schema match that as exactly as possible. Strip it down to the bare essentials. Represent it in a format that's as close as possible to the final output without making needless compromises.
If you're showing username, avatar, post title, number of posts, and date of posting, then those are the fields you have in your table. Yes, you will still have a separate users table, but transpose anything and everything you can into a straightforward structure that makes your query as simple as this:
SELECT id, username, user_avatar, post_title, post_count, post_time FROM posts
WHERE forum_id=?
ORDER BY id DESC
Normally you'd have to join against users to get their name, maybe another table to get their particular avatar, and the discussions table to get the post count. You can avoid all that by changing your storage strategy.
In the case I was working with, it was a requirement to be able to post things in the future as well as in the past, so I had to create a specific "sort key" independent of ID, like your position. If this is not the case for you, just use the id primary key for ordering, something like this:
CREATE INDEX post_order ON posts (forum_id, id);
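A minimal sketch of this storage strategy, using Python's sqlite3 module in place of MySQL. The author's username and avatar are copied into the posts row when the post is written, so the forum listing becomes a trimmed version of the single-table query shown earlier, served by the post_order index. The users/posts columns and the add_post helper are hypothetical, and the sketch ignores what happens when a user later changes their avatar (that would need its own update job).

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema for illustration.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        avatar TEXT NOT NULL
    );
    -- De-normalized: username and avatar are copied in at write time,
    -- so reading a forum page needs no JOIN against users.
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        forum_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        username TEXT NOT NULL,
        user_avatar TEXT NOT NULL,
        post_title TEXT NOT NULL,
        post_time INTEGER NOT NULL
    );
    CREATE INDEX post_order ON posts (forum_id, id);
    """
)


def add_post(conn: sqlite3.Connection, forum_id: int, user_id: int, title: str) -> None:
    """Hypothetical helper: write a post, copying the author's display fields."""
    username, avatar = conn.execute(
        "SELECT username, avatar FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO posts (forum_id, user_id, username, user_avatar, post_title, post_time)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (forum_id, user_id, username, avatar, title, int(time.time())),
    )


def forum_page(conn: sqlite3.Connection, forum_id: int) -> list:
    # Single-table read, newest first; post_order matches both the WHERE and the ORDER BY.
    return conn.execute(
        "SELECT id, username, user_avatar, post_title, post_time FROM posts"
        " WHERE forum_id = ? ORDER BY id DESC",
        (forum_id,),
    ).fetchall()


conn.execute("INSERT INTO users (id, username, avatar) VALUES (1, 'alice', 'alice.png')")
conn.execute("INSERT INTO users (id, username, avatar) VALUES (2, 'bob', 'bob.png')")
add_post(conn, 10, 1, "Hello")
add_post(conn, 10, 2, "Re: Hello")
add_post(conn, 11, 2, "Other forum")
```

The cost moves to write time (one extra lookup per post), which is usually the right trade when the listing page is read far more often than posts are written.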
Using SUM or COUNT is completely out of the question. You need counter-cache columns: columns that store counts, such as how many messages are in a particular forum. Yes, they will drift out of sync once in a while, like any de-normalized data, so you will need tools to keep them in check and to rebuild them completely if required. Usually you can do this with a cron job that runs once daily to repair any minor corruption that might've occurred. Most of the time, if you get your implementation correct, they will be perfectly in sync.
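A minimal sketch of counter-cache columns plus a repair job, using Python's sqlite3 in place of MySQL. The counters are bumped in the same transaction as the insert, so a failure cannot record the post without the count (or vice versa); rebuild_counters is the kind of job the answer suggests running from cron. The table and column names (post_count, discussion_count, last_poster) and both helper functions are hypothetical and only mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema for illustration.
    CREATE TABLE forums (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        post_count INTEGER NOT NULL DEFAULT 0,        -- counter cache
        discussion_count INTEGER NOT NULL DEFAULT 0,  -- counter cache
        last_poster TEXT                              -- cached "last poster"
    );
    CREATE TABLE discussions (id INTEGER PRIMARY KEY, forum_id INTEGER NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        forum_id INTEGER NOT NULL,
        discussion_id INTEGER NOT NULL,
        poster TEXT NOT NULL
    );
    INSERT INTO forums (id, name) VALUES (1, 'General'), (2, 'Help');
    """
)


def record_post(conn, forum_id, poster, discussion_id=None):
    """Add a post (starting a new discussion if discussion_id is None)
    and update the forum's cached counters in the same transaction."""
    with conn:  # commits on success, rolls back on exception
        new_discussion = discussion_id is None
        if new_discussion:
            discussion_id = conn.execute(
                "INSERT INTO discussions (forum_id) VALUES (?)", (forum_id,)
            ).lastrowid
        conn.execute(
            "INSERT INTO posts (forum_id, discussion_id, poster) VALUES (?, ?, ?)",
            (forum_id, discussion_id, poster),
        )
        conn.execute(
            "UPDATE forums SET post_count = post_count + 1,"
            " discussion_count = discussion_count + ?,"
            " last_poster = ? WHERE id = ?",
            (1 if new_discussion else 0, poster, forum_id),
        )
    return discussion_id


def rebuild_counters(conn):
    """Recompute every counter from the base tables (the periodic repair job)."""
    with conn:
        conn.execute(
            """
            UPDATE forums SET
              post_count = (SELECT COUNT(*) FROM posts p WHERE p.forum_id = forums.id),
              discussion_count = (SELECT COUNT(*) FROM discussions d WHERE d.forum_id = forums.id),
              last_poster = (SELECT p.poster FROM posts p WHERE p.forum_id = forums.id
                             ORDER BY p.id DESC LIMIT 1)
            """
        )


d = record_post(conn, 1, "alice")  # new discussion in forum 1
record_post(conn, 1, "bob", d)     # reply in the same discussion
conn.execute("UPDATE forums SET post_count = 99 WHERE id = 1")  # simulate drift
conn.commit()
rebuild_counters(conn)
```

The forum index page then reads name, post_count, discussion_count and last_poster straight from the forums table, with no COUNT or subquery on the hot path; the expensive subqueries only run inside the occasional repair job.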
One other thing to note: split posts up into threads if you can. The smaller your tables are, the faster they'll be. Sifting through all posts to find the top-level post of each thread is brutally slow, especially on popular systems.
Also, cache anything you can get away with in something like Memcached if that's an option. For example, a user's friends listing won't change unless a friend is added or removed, so you don't need to select that list constantly from the database. The fastest database query is the one you never make, right?
To do this properly, you'll need to know the layout of each page and what information is going on it. Pages that aren't too popular need less optimization, but anything in the main line of fire will have to be carefully examined. Like a lot of things, there's probably an 80/20 rule going on, where 80% of your traffic hits only 20% of your code-base. That's where you'll want to be at your best.

current status of mysql table displayed every x seconds

OK. I didn't know how to phrase this question in the title, so here is a quick description.
Let's say I have a site with some promotional stuff to give away for free (or not :)).
When I have something to give away, I announce it on Facebook, Twitter, etc., and people can come to the website and fill in a quick form: a couple of quick questions and, of course, their name and address.
But the catch is that I have, for example, only 20 pieces of this thing to give away.
When you submit the form, it goes automatically into a database table.
I know how to display the current status of the offer with some PHP (like: "There are only 12 items left, hurry up!"), and there is also no problem with refreshing this every couple of seconds with AJAX. The problem I see is what happens if this becomes more popular and I have many offers in a short time.
I don't want the database to be overloaded with queries from hundreds of people every two seconds.
Is there any way to send just one query every two seconds (somewhere on the server?) and just update the value from this query in every browser currently visiting the website?
I'm not sure if this is a clear question, but what I'm asking is what the best practice would be for this kind of situation.
Is my concern about overloading the database even reasonable?
And an extra problem...
In this particular situation, with a limit on the number of people who can participate, is there any risk of strange behavior when two people submit the form at exactly the same time while there is only one item left?
I would love any pointers on this subject. Even general ones will do :)
PS: No, English is not my first language :)
Thx
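Can't you have one script load the data from the database and save it somewhere that isn't the database (preferably a file), so that requests can read it from there? This would need a scheduled job running that script every 5 seconds (standard cron only goes down to once a minute, so a short loop or another scheduler would be needed for that interval).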
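A minimal sketch of this idea: one scheduled job runs the COUNT query and writes the result to a small JSON file, and every AJAX request reads the file instead of touching the database. It uses Python's sqlite3 with a hypothetical entries table and a fixed stock of 20; the file name and both function names are also hypothetical. Writing to a temporary file in the same directory and then calling os.replace means readers see either the old file or the new one, never a half-written file (os.replace is an atomic rename on POSIX systems).

```python
import json
import os
import sqlite3
import tempfile

STOCK = 20  # items available in the offer (the question's example)


def refresh_status(conn: sqlite3.Connection, path: str) -> None:
    """The scheduled job: run the one query and publish the result to a file."""
    (taken,) = conn.execute("SELECT COUNT(*) FROM entries").fetchone()
    status = {"remaining": max(STOCK - taken, 0)}
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(status, f)
    os.replace(tmp, path)


def read_status(path: str) -> dict:
    """What each AJAX request would do: read the file, no database query."""
    with open(path) as f:
        return json.load(f)


# Demo: 8 of the 20 items have been claimed (hypothetical entries table).
workdir = tempfile.mkdtemp()
status_path = os.path.join(workdir, "offer_status.json")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO entries (name) VALUES (?)", [(f"person {i}",) for i in range(8)])
refresh_status(conn, status_path)
```

However many visitors poll the status, the database sees one query per refresh interval; the visitors only ever read a few bytes from disk (or from the OS file cache).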

PHP idea for storing unread posts and notifications

So I have been playing around with a forum I am building and have been stuck on one aspect of it for a while: how to track unread posts and notifications without storing loads of data in the database. After looking at some solutions, I believe I've come up with one that may suit my needs, but I need a fresh set of eyes to point out what I haven't thought of. Here is the architecture of my idea.
1) When a user logs in, check for posts made between the current time() and the last login time().
2) If any posts are found, add them to an array, then serialize() the array and save it to the member's row in the database.
3) Output the array to the user if it is not empty.
This way it will only check for unread posts and store them for users who actually log in to the forum, instead of taking up unnecessary space holding unread IDs for inactive users. I'm still wondering whether this is a good idea, since if the user doesn't read the posts, the serialized array in the database might become too large to manage.
Does anyone see a problem in my way of thinking? If so please let me know.
Don't worry about the space until there's actually a problem. A table storing the post ID (integer) and the user ID (another integer) will be small. Even if you have thousands of posts and thousands of users, you can safely assume that:
a large part of the users will be inactive (one-time registrations to post something and forget the whole issue)
even the active members will not read all the posts, but rather only a (relatively small) part of the ones that are in topics that interest them.
One other thing: if you really want to minimise space, don't store unread posts. Store only the last read post in each thread. That's one record per thread per user, and only if the user has ever opened the thread.
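A minimal sketch of the "last read post per thread" idea, using Python's sqlite3 in place of MySQL. Each user has at most one row per thread they have opened, holding the highest post ID they have seen; a thread counts as unread if it has a newer post, or if the user has never opened it. The table, column and function names are hypothetical, and the upsert syntax (ON CONFLICT ... DO UPDATE) is SQLite's, available from SQLite 3.24; MySQL would use INSERT ... ON DUPLICATE KEY UPDATE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER NOT NULL
    );
    -- Hypothetical: one row per (user, thread) the user has opened.
    CREATE TABLE thread_reads (
        user_id INTEGER NOT NULL,
        thread_id INTEGER NOT NULL,
        last_read_post_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, thread_id)
    );
    """
)


def mark_read(conn, user_id, thread_id, post_id):
    """Record that the user has seen the thread up to post_id (never moves backwards)."""
    conn.execute(
        "INSERT INTO thread_reads (user_id, thread_id, last_read_post_id) VALUES (?, ?, ?)"
        " ON CONFLICT (user_id, thread_id) DO UPDATE SET"
        " last_read_post_id = MAX(last_read_post_id, excluded.last_read_post_id)",
        (user_id, thread_id, post_id),
    )


def unread_threads(conn, user_id):
    """Threads with a post newer than the user's marker, or never opened at all."""
    rows = conn.execute(
        "SELECT p.thread_id FROM posts p"
        " LEFT JOIN thread_reads r ON r.thread_id = p.thread_id AND r.user_id = ?"
        " GROUP BY p.thread_id"
        " HAVING MAX(p.id) > COALESCE(MAX(r.last_read_post_id), 0)"
        " ORDER BY p.thread_id",
        (user_id,),
    )
    return [thread_id for (thread_id,) in rows]


# Thread 1 has posts 1-3, thread 2 has posts 4-5, thread 3 has post 6.
conn.executemany(
    "INSERT INTO posts (id, thread_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)],
)
mark_read(conn, 7, 1, 3)  # user 7 read all of thread 1
mark_read(conn, 7, 2, 4)  # ...and thread 2 up to post 4
mark_read(conn, 7, 2, 2)  # a stale, lower marker is ignored
```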
If the user logs in, but does not read posts, your scheme still marks them as read.
If the user logs in twice at once (as from a desktop computer and an iPad), what will happen?
What is the problem with keeping each user's view of the forum with a flag to indicate whether they read each one? Such a mechanism is obviously useful to expand into upvoting, favorites, etc.

In a social network environment, what would be the easiest way to check for friendships globally?

For reference, here is a question on SO that I asked recently that is relevant to this question: How to model Friendship relationships
On that question, we figured out a way to display news feed items only if posted by friends. However, what I need is advice on how to check for friendship in a more dynamic way so it can be used throughout several site functions.
For example, I've just installed a comment system, allowing users to post comments on news posts. However, this isn't restricted to friends, and it should be (and later should be made optional by the author of the post).
Posting news items only by friends was slightly different because I was getting data directly from the database, and using SELECT subqueries to only get posts by friends of the current user. However, in the example of the commenting, I only want to display the comment post form if the person is a friend. I'm not actually pulling anything from the database to pick, sort, and display.
Knowing that issues like this will arise many times throughout the site, what would be the simplest way to check for friendship? Could I somehow pull all the friends' user IDs from the database into a session array of some kind and then do an if (in_array($id, $friends)), where $id is the other person's user ID, whenever I need to determine whether the person in question is a friend of the currently logged-in user? Off the top of my head, this sounds like it would work fine, but I'd like your input first.
The question I linked to above explains how my friendship table works, in case that helps you help me with this.
Actually, it is a very bad idea to store the friend array in the session. What happens if you add a new friend after the session variable is created? It does not get updated in the session.
Since you will be checking the list of friends many times on the same page, why not just query it once and store it in a local array that you can keep using on that page?
When the page finishes executing, the array will be discarded.
So basically, you will only query the list once per page.
A recommended implementation would be to follow the advice of "pst" in the comment above and, at first, just query every time you need to check the relationship, as that is simple to implement. Later, when the speed of the query starts to become an issue, you can change the internals of that method to cache the friend list in a local array the first time it is called (trading memory usage for processor usage).
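A minimal sketch of that recommendation: hide the check behind one method, query the database the first time it is called, and keep the result in a per-request object that is thrown away when the page finishes. It uses Python's sqlite3 as a stand-in for the real database; the friendships table (assumed here to hold one row per direction of a friendship) and the FriendChecker class are hypothetical, not the schema from the linked question.

```python
import sqlite3
from typing import Optional, Set


class FriendChecker:
    """Hypothetical per-request helper: created at the start of a page, discarded at the end."""

    def __init__(self, conn: sqlite3.Connection, current_user_id: int) -> None:
        self.conn = conn
        self.current_user_id = current_user_id
        self._friend_ids: Optional[Set[int]] = None  # loaded lazily on first use
        self.queries = 0  # for illustration: how many times we hit the database

    def is_friend(self, other_user_id: int) -> bool:
        if self._friend_ids is None:
            # First call on this page: one query, then reuse the set.
            rows = self.conn.execute(
                "SELECT friend_id FROM friendships WHERE user_id = ?",
                (self.current_user_id,),
            )
            self._friend_ids = {friend_id for (friend_id,) in rows}
            self.queries += 1
        return other_user_id in self._friend_ids


conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per direction of each friendship.
conn.execute("CREATE TABLE friendships (user_id INTEGER NOT NULL, friend_id INTEGER NOT NULL)")
conn.executemany("INSERT INTO friendships VALUES (?, ?)", [(1, 2), (2, 1), (1, 3), (3, 1)])

checker = FriendChecker(conn, current_user_id=1)
show_comment_form = checker.is_friend(2)  # e.g. is the news post's author a friend?
```

Starting with a plain query on every call and adding the cached set later, as the answer suggests, only changes the body of is_friend; the callers stay the same.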

showing threads or posts unread by a user

I am in the process of writing my own basic forum to plug into a CodeIgniter site. I'm a little stuck on how to display the threads/latest posts unread by a user.
I was thinking of a table that holds each thread_id visited, but this table has the potential to get rather large.
What are some ways to approach this requirement?
A simple idea: record the last datetime that a user visits the site/forum/subforum. This could be as granular as the thread or subforum, as you like. Perhaps create/update this key-value pair of thread_id and last_visit_date in a cookie, rather than in your RDBMS. Ask: is this mission-critical data, or an important feature that can/cannot withstand a loss of data?
When the user returns, find all the threads/posts whose create_date is greater than the last_visit_date for the forum.
I'm assuming that the act of visiting the forum (the list of threads) is the same as 'viewing': if the info was presented, you'd have 'viewed' the thread title, regardless of whether you actually drilled into the thread.
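A minimal sketch of the last-visit approach, using Python's sqlite3. Where the timestamp lives (cookie, session or a users column) is left out; the function simply takes it as an argument. Timestamps are stored as fixed-width ISO-8601 UTC strings, which sort correctly as plain text; the threads table, its index and both helper functions are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def iso(dt: datetime) -> str:
    # Fixed-width UTC strings compare correctly as plain text.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration.
conn.execute(
    "CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT NOT NULL, create_date TEXT NOT NULL)"
)
conn.execute("CREATE INDEX threads_by_date ON threads (create_date)")


def threads_since(conn: sqlite3.Connection, last_visit_date: str) -> list:
    """Threads created after the user's previous visit are shown as unread."""
    return conn.execute(
        "SELECT id, title FROM threads WHERE create_date > ? ORDER BY create_date",
        (last_visit_date,),
    ).fetchall()


conn.executemany(
    "INSERT INTO threads (title, create_date) VALUES (?, ?)",
    [
        ("Old news", iso(datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc))),
        ("Fresh topic", iso(datetime(2024, 1, 3, 18, 30, tzinfo=timezone.utc))),
    ],
)
last_visit = iso(datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc))
unread = threads_since(conn, last_visit)
```

Nothing per thread is stored for the user, so storage stays constant per user; the price is that a thread the user opened but didn't finish isn't tracked.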
The easiest way out would probably be just to keep a cookie with the time of the user's last visit and query posts posted/edited after that. You don't get exactly all read threads, but most forums seem to work this way; otherwise you have to save all read threads somewhere.
I don't think you really need to create a table to log thread IDs as you have thought, because it's going to grow with the number of your users and the number of threads/posts created. You can just show threads or posts that were created after the user's last visit as unread. I think that's what I am going to do.
