I'm making an application where users are able to comment on a lot of things, like blog posts, uploaded songs, pictures and so on.
Is it better to store ALL comments in one table with a column that indicates what the comment was posted on, e.g. blog, picture, etc.?
Or is it better to store them in separate tables, like a "blogcomments" table, a "picturecomments" table, etc.? Say for a site with 10,000-plus users?
Thanks
If all comments have the same data being stored (e.g. comment content, user who posted the comment, etc.), then it would make the most sense to keep them all in a single table. If they have different formats, then put them in separate tables.
In my opinion the first approach is the better one. In fact, this way you can add (or remove) types of comment (I mean post comment, blog comment and so on) without adding or deleting a table every time. I think this is a more scalable solution.
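As a rough illustration of that single-table approach, here is a minimal sketch (assuming MySQL; every table and column name here is illustrative, not from the question):

-- One table for every kind of comment. item_type records what the
-- comment was posted on; item_id points at the row in that table.
CREATE TABLE comments (
    comment_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT UNSIGNED NOT NULL,
    item_type  ENUM('blog', 'song', 'picture') NOT NULL,
    item_id    INT UNSIGNED NOT NULL,
    body       TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    KEY idx_item (item_type, item_id)
);

Fetching all comments for one picture is then a single indexed lookup: SELECT * FROM comments WHERE item_type = 'picture' AND item_id = ?. Adding a new commentable type only means adding a value to item_type, not a new table.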
I'm trying to create a Like/Unlike system akin to Facebook's for an existing comments section of a website, and I need help in designing the system.
Currently, every product on the website has a comments section and members can post and like comments. I need to know how many comments each member has posted and how many likes each of their comments has received. Of course, I also need to know who liked which comments (partly so that I can prevent a user from liking a comment more than once) for analytical purposes.
The naive way of implementing a Like system on top of the current comments module is to create a new table in the database that has foreign keys to the CommentID and UserID. Then, for every "like" given to a comment by a user, I would insert a row into this new table with the target comment ID and user ID.
While this might work, the massive number of comments and users is going to cause this table to grow quickly, and retrieving records from and doing counts on this huge table will become slow and inefficient. I can index either one of the columns, but I don't know how effective that would be. The website has over a million comments.
I'm using PHP and MySQL. For a system like this with a huge database, how should I design a Like system so that it is more optimised and stable?
For scalability, do not include the count column in the same table with other things. This is a rare case where "vertical partitioning" is beneficial. Why? The LIKEs/UNLIKEs will come fast and furious. If the code to do the increment/decrement hits a table used for other things (such as the text of the Comment), there will be an unacceptable amount of contention between the two.
This tip is the first of many steps toward being able to scale to Facebook levels. The other tips will come, not from a free forum, but from the team of smart engineers you will have to hire to get to that level. (Hints: Sharding, Buffering, Showing Estimates, etc.)
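To make the vertical-partitioning tip concrete, here is a sketch of the idea with assumed names (my illustration, not the answerer's schema). The hot counter lives in its own narrow table, so the constant increments never contend with rows holding the comment text:

-- Only LIKE/UNLIKE traffic touches this table.
CREATE TABLE comment_like_counts (
    comment_id INT UNSIGNED NOT NULL PRIMARY KEY,
    like_count INT UNSIGNED NOT NULL DEFAULT 0
);

-- A like becomes one atomic statement, creating the row on first like:
INSERT INTO comment_like_counts (comment_id, like_count)
VALUES (?, 1)
ON DUPLICATE KEY UPDATE like_count = like_count + 1;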
Your main concern will be a lot of counts, so the easy thing to do is to keep a separate count in your comments table.
Then you can create a TRIGGER that increments/decrements the count based on a like/unlike.
That way you only use the big table to figure out if a user has already voted.
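A minimal sketch of that design in MySQL, assuming the comments table has gained a like_count column (all names are illustrative). The composite primary key on the likes table is also what stops a user from liking the same comment twice:

CREATE TABLE comment_likes (
    comment_id INT UNSIGNED NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    PRIMARY KEY (comment_id, user_id)  -- one like per user per comment
);

CREATE TRIGGER like_added AFTER INSERT ON comment_likes
FOR EACH ROW
    UPDATE comments SET like_count = like_count + 1
    WHERE comment_id = NEW.comment_id;

CREATE TRIGGER like_removed AFTER DELETE ON comment_likes
FOR EACH ROW
    UPDATE comments SET like_count = like_count - 1
    WHERE comment_id = OLD.comment_id;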
Here goes a very basic question.
Here is the story of my adventure in trying to create a form for work:
I created the HTML form. Added some JavaScript to make it do some things I needed. Styled it with CSS, wrote PHP code, and created a database (I had no idea how to do it at first) for the entered data to be saved.
I didn't know how to do any of that, but in the past two weeks I've managed to make it exactly the way I needed, and I'm very pleased with myself. After a lot of work, the form sends the data to the database perfectly and displays it on the page after you hit submit, and it looks really good too.
The thing is... what I am creating is an Activities Bank for us to use here at work (I teach English), and the page (base) I have created is only ONE of MANY that are needed in this data bank. Let me explain... Let's say the page I've created is the post-and-display page for, say, Book 3 Chapter 1 Activities; I need to have many other pages (which will be exact copies of this one).
My question is... will I have to create (actually, copy and paste) new databases/tables manually (which would mean more than one hundred of them), or is there a way to automate this process?
I mean, all the pages will share the same variables and the same form... the only different thing will be the title and the entered data, of course.
Will I have to create a database for each page? Or a new table for each page in the same database?
If you still don't understand what I need, here is how this is supposed to be:
Book1 has 40 chapters, so, 40 copies of the same form (which already works fine);
PLUS
Book2 that has 40 more chapters, etc.
Thanks in advance for any clarification.
Sorry if this is such a basic question. And if it isn't basic at all, and what I want to do is actually very complicated, I don't mind that I don't know much about all this; I'll take on the challenge, like I did when I made this form from scratch without ever having heard of "databases". Any words of help are appreciated.
That isn't how databases or tables work. You should be creating a new row in one or more tables for each form submission. You should almost never be dynamically creating tables, and even less often databases.
It sounds like you want a books table, and a chapters table. Each row in the books table will have many rows in the chapters table "pointing" to it via foreign keys.
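A sketch of those two tables (the column names are my assumptions), plus a third showing where each form submission would land as a row rather than a new table:

CREATE TABLE books (
    book_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(255) NOT NULL
);

CREATE TABLE chapters (
    chapter_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    book_id    INT UNSIGNED NOT NULL,           -- "points" at a book
    chapter_no INT UNSIGNED NOT NULL,
    FOREIGN KEY (book_id) REFERENCES books (book_id)
);

-- One row per submitted activity; the same form and the same tables
-- serve Book 1 Chapter 1 through Book N Chapter 40.
CREATE TABLE activities (
    activity_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    chapter_id  INT UNSIGNED NOT NULL,
    title       VARCHAR(255) NOT NULL,
    content     TEXT NOT NULL,
    FOREIGN KEY (chapter_id) REFERENCES chapters (chapter_id)
);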
I think in your case two tables in total will do.
As you have already created one, you need another: the first will contain the common data with a primary key column, and the second will hold the primary key of the first table alongside the data that occurs multiple times. Later you can join the two tables (SQL JOIN) to get your data back.
I am making a website with a large pool of images added by users.
I want to choose randomly one image out of this pool, and display it to the user, but I want to make sure that this user has never seen this image before.
So I was thinking: when a user views an image, I make a row INSERT in MySQL that would say "This USER has watched THIS IMAGE at (TIME)" for every entry.
But the thing is, since there might be a lot of users and a lot of images, this table can easily grow to tens of thousands of entries quite rapidly.
So alternatively, it might be done like that:
I was thinking of making one row INSERT per USER, and in ONE field inserting an array of all the IDs of images that user has watched.
I can even do that to the array:
base64_encode(gzcompress(serialize($array)))
And then:
unserialize(gzuncompress(base64_decode($array)))
What do you think I should do?
Are the encoding/decoding functions fast enough, or at least faster than the conventional way I described at the beginning of the post?
Is that compression good enough to store large chunks of data in only ONE database field? (Imagine the user has viewed thousands of images.)
Thanks a lot
in ONE field inserting an array of all the IDs
In almost all cases, serializing values like this is bad practice. Let the database do what it's designed to do -- efficiently handle large amounts of data. As long as you ensure that your cross table has an index on the user field, retrieving the list of images that a user has seen will not be an expensive operation, regardless of the number of rows in the table. Tens of thousands of entries is nothing.
You should create a new table UserImageViews with columns user_id and image_id (additionally, you could add more information on the view, such as Date/Time, IP and Browser).
That will make queries like "What images the user has (not) seen" much faster.
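A rough sketch of that table and the lookup it enables (everything beyond the names given in the answer is an assumption; ORDER BY RAND() is the simple approach and is fine at this scale):

CREATE TABLE UserImageViews (
    user_id   INT UNSIGNED NOT NULL,
    image_id  INT UNSIGNED NOT NULL,
    viewed_at DATETIME NOT NULL,  -- optional extra detail
    PRIMARY KEY (user_id, image_id)
);

-- Pick one random image this user has never seen (an anti-join):
SELECT i.image_id
FROM images i
LEFT JOIN UserImageViews v
       ON v.image_id = i.image_id AND v.user_id = ?
WHERE v.user_id IS NULL
ORDER BY RAND()
LIMIT 1;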
You should use a table. Serializing data into a single field in a database is bad practice: the DBMS has no clue what that data represents, so the data cannot be used in ANY queries. For example, if you wanted to see which users had viewed an image, you wouldn't be able to do it in SQL alone.
Tens of thousands of entries isn't much, BTW. The main application we develop has multiple tables with hundreds of thousands of records, and we're not that big. Some web applications have tables with millions of rows. Don't worry about having "too much data" unless it starts becoming a problem - the solutions for that problem will be complex and might even slow down your queries until you get to that amount of data.
EDIT: Oh yeah, and joins against those 100k+ tables happen in under a second. Just some perspective for ya...
I don't really think that tens of thousands of rows will be a problem for a database lookup. I will recommend using the first approach over the second.
I want to choose randomly one image out of this pool, and display it to the user, but I want to make sure that this user has never seen this image before.
For what it's worth, that's not a random algorithm; that's a shuffle algorithm. (Knowing that will make it easier to Google when you need more details about it.) But that's not your biggest problem.
So I was thinking: when a user views an image, I make a row INSERT in MySQL that would say "This USER has watched THIS IMAGE at (TIME)" for every entry.
Good thought. Using a table that stores the fact that a user has seen a specific image makes sense in your case. Unless I've missed something, you don't need to store the time. (And you probably shouldn't. It doesn't seem to serve any useful business purpose.) Something along these lines should work well.
-- Predicate: User identified by [user_id] has seen image identified by
-- [image_filename] at least once.
create table images_seen (
  user_id integer not null references users (user_id),
  -- the column type below is assumed; the original omitted it
  image_filename varchar(255) not null references images (image_filename),
  primary key (user_id, image_filename)
);
Test that and look at the output of EXPLAIN. If you need a secondary index on image_filename . . .
create index images_seen_img_filename on images_seen (image_filename);
This still isn't your biggest problem.
The biggest problem is that you didn't test this yourself. If you know any scripting language, you should be able to generate 10,000 rows for testing in a matter of a couple of minutes. If you'd done that, you'd find that a table like that will perform well even with several million rows.
I sometimes generate millions of rows to test my ideas before I answer a question on Stack Overflow.
Learning to generate large amounts of random(ish) data for testing is a fundamental skill for database and application developers.
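For instance, one way to fabricate rows (my own sketch, assuming MySQL 8+ for recursive CTEs; it is not the answerer's script) is to cross two number generators:

-- Fabricate 100 users x 1,000 images = 100,000 test rows.
-- Raise cte_max_recursion_depth before going beyond 1,000 per CTE.
INSERT INTO images_seen (user_id, image_filename)
WITH RECURSIVE
    u AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM u WHERE n < 100),
    i AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM i WHERE n < 1000)
SELECT u.n, CONCAT('img_', i.n, '.jpg')
FROM u CROSS JOIN i;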
I'm programming my own simple blog with comments in PHP and MySQL. I have one database, and I have one table for posts (called posts); for each post I make, I manually create a new table called comments1, comments2, comments3, etc. Each blog post has an id, and when I retrieve the comments for a post I use a query like:
"SELECT * FROM `comments" . $id . "`"
When I add user text to a comment I use:
htmlspecialchars(mysql_real_escape_string($_POST['name']));
Is this structure ok or is there a better way I'm missing?
Also, would creating a different database for comments be better practice than grouping it with the posts database? Does having 2 dbconnect functions in one file slow down performance by a lot? And one last worry: how do I make absolutely sure that my PHP files aren't served to the user as plaintext, because then someone could see DB login info and such.
Thanks for the help.
No, this is TERRIBLE structure.
Make one comments table and store all comments in there, along with corresponding post_id.
Add an index on post_id and you'll be able to quickly get all comments for a given post.
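A minimal sketch of that structure (column names other than post_id are assumptions):

CREATE TABLE comments (
    comment_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    post_id    INT UNSIGNED NOT NULL,
    author     VARCHAR(100) NOT NULL,
    body       TEXT NOT NULL,
    KEY idx_post_id (post_id)
);

-- The same query serves every post; no table name is ever built
-- from $id:
SELECT * FROM comments WHERE post_id = ?;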
I am working on a small project where a client would like a custom comment system to be shared within the internal network of their company. The logic is something like Google+ or Facebook (others?), where a user makes a Post and can choose people to share it with; the default (no one chosen) goes to everyone in that person's list.
My question is: what is the best way to build a table to store posts where all or only selected people are the allowed viewers of a given post? I guess my biggest issue is wrapping my head around the logic of it at the moment. Do I have multiple rows per post, each with the id of a user able to see the post? Should I have a column on a single row for the post where I store an array or object of people able to view it? I am open to suggestions; I haven't started working on it yet. Ultimately I'm looking for advice on a good way to build the table so that it supports sound query logic and won't cost me overhead on multiple queries or multiple rows I don't need. I don't want to begin without figuring something out, as I don't want to box myself into something that will be hard to back out of in the long run.
What you are proposing is a many-to-many relationship: each Post can have many people allowed to view it, and each person can view many Posts. There is a ton of information about DB relationships on the internet. You would have a posts table, a users table, and a users_post table. The users_post table would contain a post_id and a user_id. You would then check whether the user can view the post through this relationship.
You could also put the users in groups, which would simplify this.
You should never store multiple values in an array in one column of the db.
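A sketch of that one-row-per-allowed-viewer design (the junction table is the users_post table from the answer; other names, and the is_public shortcut for the "share with everyone in my list" default, are my assumptions):

CREATE TABLE posts (
    post_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    author_id INT UNSIGNED NOT NULL,
    body      TEXT NOT NULL,
    is_public TINYINT(1) NOT NULL DEFAULT 1  -- default: whole list sees it
);

CREATE TABLE users_post (
    post_id INT UNSIGNED NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (post_id, user_id)
);

-- Posts a given user may see: public ones plus ones shared with them.
SELECT p.*
FROM posts p
LEFT JOIN users_post up
       ON up.post_id = p.post_id AND up.user_id = ?
WHERE p.is_public = 1 OR up.user_id IS NOT NULL;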