I'm dealing with a case where I allow every topic to keep only its last 100 comments.
If a topic already has 100 comments and a new comment comes in, I want to delete the very first comment in the 100-comment chain and add the new one. Here is an example:
1,2,3 .... 99 , 100
2,3,4 ....100 , 101
As you can see, the very first comment, the one far behind the others, got deleted and the new one got into the 100-comment chain.
Here is where the problem starts: my case is a forum, and this forum has hundreds of thousands of topics, so it can reach millions of comments. That means that if I check how many comments there are on every new incoming comment, it will slow the site down with every comment added. How can I minimize the database queries? Is there any system / known way of facing this kind of thing?
Why would you delete old comments? If you want to show the last 100 comments, then just SELECT id, thread_id, user_id, comment_body FROM comments WHERE thread_id = #thread_id ORDER BY id DESC LIMIT 100. Also be sure to have an index on the foreign key column so that it gets queried fast, and query only the columns you need.
And no, if you are going to be wise with queries and apply indexing where needed, then you don't need to worry about each comment slowing down the database. Will you have millions upon millions of comments? If so, you can think about partitioning the database, say every 1000000* threads, based on thread_id.
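For example, a minimal sketch of that "just read the last 100" approach, assuming a comments table with an auto-increment id primary key and a thread_id foreign key (names taken from the query above):

ALTER TABLE comments ADD INDEX idx_thread_id (thread_id, id);

-- The composite index lets MySQL jump to the thread and walk its newest
-- 100 rows directly; nothing ever has to be counted or deleted.
SELECT id, thread_id, user_id, comment_body
FROM comments
WHERE thread_id = 123
ORDER BY id DESC
LIMIT 100;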
You may be interested in this question: Database architecture for millions of new rows per day
* For anyone who reads this: don't take this number as advice or a suggestion born of experience. Never say "someone on SO mentioned this number x, so...". I have no experience or benchmarks to say that it would be good to do it at this number; I'm only creating my first partition myself. Evaluate for yourself what is good for you.
What's your database structure for this stuff? Presumably you have a "posts" table, and a "comments" table, which links back to the posts table. Assuming decent design, the comments will have an auto_increment primary key, so your logic would be:
1. insert new comment
2. count the comments attached to the post. If it's > 100, then
3. find min(comments.id) and delete that ID.
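A minimal SQL sketch of that logic, assuming a comments table with an auto_increment id and hypothetical post_id / comment_body columns (step 3 collapses "find min(id) and delete it" into a single statement):

-- 1. insert the new comment
INSERT INTO comments (post_id, user_id, comment_body) VALUES (123, 45, 'hello');

-- 2. count the comments attached to the post
SELECT COUNT(*) FROM comments WHERE post_id = 123;

-- 3. if the count is > 100, delete the oldest comment for that post
DELETE FROM comments
WHERE post_id = 123
ORDER BY id ASC
LIMIT 1;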
I'm using PHP 7, MySQL, and a small custom-built forum, with a query that grabs 7 columns through 2 SQL JOINs for a "latest posts" page. When the time comes that I hit 1 million rows, will the LIMIT 30 stop at 30 rows, or will it have to sort the entire DB on each run?
The reason I'm asking is that I'm trying to wrap my head around how to paginate this custom forum I've built, and whether that pagination will be "ok" once it has to (theoretically) read through a million rows.
EDIT: My current query is a LIMIT 30, sorted descending.
EDIT2: Currently I'm getting about 500-600 posts, give or take 50, a day. It's quickly adding up, so I'm trying to monitor this before I get to 1 million. That being said, I'm only looking up one table right now, tblTopics, selecting topic_id, topic_name, and topic_author (an FK). Then I'm doing another lookup after that using the topic's own foreign keys, topic_rating and topic_category. The original lookup is where I have the sort and limit.
Sort is applied on the complete set, limit is applied after the sort, so adding a limit to an ORDER BY query does not make it a lot faster.
It depends.
SELECT ... FROM tbl ORDER BY x LIMIT 30;
INDEX(x)
will probably use the index and stop after 30 rows, not 1 million.
SELECT ... FROM tbl GROUP BY zz ORDER BY x LIMIT 30;
will scan all million rows, do the grouping, write to a tmp table, sort that tmp table, and only then deliver 30 rows.
SELECT ... FROM tbl WHERE yy = 123 ORDER BY x LIMIT 30;
INDEX(yy)
will probably prefer INDEX(yy), and it is hard to say how efficient it will be.
SELECT ... FROM tbl WHERE yy = 123 ORDER BY x LIMIT 30;
INDEX(yy, x)
will be very efficient -- not only can it use the index for filtering, but also for the ORDER BY and the LIMIT. Only 30 rows will be touched.
SELECT ... FROM tbl LIMIT 30;
is of dubious use. You will get some 30 rows, but who knows which 30? But it will be fast.
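For reference, a sketch of how the efficient composite index from the WHERE + ORDER BY case above might be declared (tbl, yy, and x are just the placeholder names already used):

-- yy comes first for the equality filter; x comes second so the rows are
-- already in ORDER BY order and the LIMIT can stop after 30 of them.
ALTER TABLE tbl ADD INDEX idx_yy_x (yy, x);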
Well, this is still not answering your question. Your question involves a JOIN. Can you guess how much more complex the question becomes once a JOIN is involved?
If you would like to discuss your specific query, please provide the query and SHOW CREATE TABLE for each table and how many rows in each table.
If you are joining a 1-row table to a million row table, the 1-row table probably does not add any complexity.
If you are joining two million-row tables together without any indexes, then you are looking at a trillion intermediate 'rows' to work with!
Oh, and then you will want the 'second' 30 rows? That adds another dimension of complexity. I could spend a few more paragraphs on what can go wrong with OFFSET.
If this forum is somewhat open-ended, where anyone can post "topics" and be the originating author, you probably want at a minimum a topics table with a PKID, Name, and Author as you have, but also a date added, the most recent post, and a count of posts against it. Too many times people build web sites that want counters all over the place and try to compute aggregates, or the most recent item, on the fly. Speaking of the most recent post, hold the ID of the most recent post too, so you don't have to find the max date; then do the join based on that.
Then a secondary table would hold the details associated with a given post.
Then, via a trigger on your detail table for whatever you are posting against, you can update the parent topic row and stamp it with count + 1, a most-recent date of now, and the last ID set to the ID of the newest record just created.
So now, joining to get that most recent context entry is a simple join and not overly complex.
Index your topics table on the most-recent-post date so you are getting, say, the most recent 30 topics, not necessarily the most recent 30 posts (where, for example, 3 topics with a lot of activity could account for all 30). Get 30 distinct topics, then let the user see the details as they select the topic of interest. Your query at the top level never goes against the underlying details.
Obviously this is brief given the true context of your website, but hopefully the suggestions make sense for you to run with.
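A rough sketch of that counter + trigger idea, with assumed table and column names (tblTopics for the parent, tblPosts for the detail rows; adjust to your own schema):

-- The parent table carries the aggregates, so the "latest topics" page
-- never has to scan the detail table.
ALTER TABLE tblTopics
    ADD COLUMN post_count   INT UNSIGNED NOT NULL DEFAULT 0,
    ADD COLUMN last_post_id INT UNSIGNED NULL,
    ADD COLUMN last_post_at DATETIME NULL;

DELIMITER //
CREATE TRIGGER trg_posts_after_insert
AFTER INSERT ON tblPosts
FOR EACH ROW
BEGIN
    -- Bump the counter and remember the newest post for this topic.
    UPDATE tblTopics
    SET post_count   = post_count + 1,
        last_post_at = NOW(),
        last_post_id = NEW.post_id
    WHERE topic_id = NEW.topic_id;
END//
DELIMITER ;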
I need to display the number of comments a user has posted. I can think of two different ways of doing it, and I would like to know which one is better.
METHOD ONE: Each time I need to display the number of comments, query the comments table to select all comments with user_id x, and count the number of results.
METHOD TWO: Add a new column to the user table to store the number of comments a particular user has posted. This value will be updated each time the user enters a new comment. This way, every time I need to show the number of comments, I just need to query this value from the database.
I think the second method is more efficient, but I would like to know other opinions.
Any comment will be appreciated.
Thanks in advance,
Sonia
Well, it depends. I suppose you use SQL. Counting is pretty fast if you have the correct indexes (e.g. SELECT COUNT(1) FROM articles WHERE user_id = ?). If this became a bottleneck, then I would consider caching those results.
At scale, option #2 is the only one that is viable. Counts may eventually be skewed some and you may need to rebuild the stats but this is a relatively low cost compared to trying to count the number of rows matching a secondary index.
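To make the two methods concrete, a sketch with assumed table and column names (comments, users, user_id, comment_count):

-- Method one: count on demand; needs an index on user_id to stay fast.
ALTER TABLE comments ADD INDEX idx_user_id (user_id);
SELECT COUNT(1) FROM comments WHERE user_id = 42;

-- Method two: cache the count on the user row and bump it with each new comment.
ALTER TABLE users ADD COLUMN comment_count INT UNSIGNED NOT NULL DEFAULT 0;
UPDATE users SET comment_count = comment_count + 1 WHERE id = 42;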
Hello again Stackoverflow!
I'm currently working on custom forum software, and one of the things you like to see on a forum is a view counter.
All the approaches for a view counter that I found would just select the topic from the database, retrieve the number from a "views" column, add one, and update it.
But here's my thought: if, let's say, 400 people open a topic at the exact same time, the MySQL database probably won't count all the views, because it takes time for the queries to complete, and so the last person (of the 400) might overwrite the first person's (of the 400) view.
Of course one could argue that on a normal site this is never going to happen, but if you have ~7 people opening that topic in the exact same second and the server is struggling at that moment, you could have the same problem.
Is there any other good approach to count views?
EDIT
Woah, could the one who voted down specify why?
By "retrieving the number of views and adding one" I meant that I would use SELECT to retrieve the number, add one using PHP (note the tags) and update it using UPDATE. I had no idea about the other methods specified below; that's why I asked.
If, let's say, 400 people open a topic at the exact same time, the MySQL database apparently would count all the views, because this is exactly what databases were invented for.
All the approaches for a view counter that you have found are wrong. To update a field you don't need to retrieve it first; you can just update it in place:
UPDATE forum SET views = views + 1 WHERE id = ?
So something like that will work:
UPDATE tbl SET cnt = cnt+1 WHERE ...
UPDATE is guaranteed to be atomic. That means no one will be able to alter cnt between the time it is read and the time it is replaced. If you have several concurrent UPDATEs for the same row (InnoDB) or table (MyISAM), they have to wait their turn to update the data.
See Is incrementing a field in MySQL atomic?
and http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-transactions.html
I have recently written a survey application that has done its job, and all the data is gathered. Now I have to analyze the data, and I'm having some time issues.
I have to find out how many people selected what option and display it all.
I'm using this query, which does do its job:
SELECT COUNT(*)
FROM survey
WHERE users = ? AND table = ? AND col = ? AND row = ? AND selected = ?
GROUP BY users,table,col,row,selected
As evidenced by the "?", I'm using MySQLi (in PHP) to fetch the data when needed, but I fear this is what's causing it to be so slow.
The table consists of all the elements above (+ a unique ID), and all of them are integers.
To explain some of the fields:
Each survey was divided into 3 or 4 tables (sized from 2x3 to 5x5) with a 1-to-10 happiness grade to select from. (Questions are on the right and top of the table; you then answer where the questions intersect.)
users - age groups
table, row, col - explained above
selected - dooooh explained above
Now, with the surveys complete and around 1 million entries in the table, the query is getting very slow. Sometimes it takes something like 3 minutes; sometimes (I guess) the time limit expires and you get no data at all. I also don't have access to the full database, just my empty "testing" one, since the customer is kinda paranoid :S (and his server seems to be a bit slow).
Now (after the initial essay) my questions are: I left indexing out intentionally because, with a lot of data being written during the survey, it would have been a bad idea. But since no new data is coming in at this point, would it make sense to index all the fields of the table? How much sense does it make to index integers that never go above 10? (As you can guess, I haven't got a clue about indexes.) Do I need the primary unique ID in this table?
I read somewhere that indexing may help GROUP BY, but only if you group by the first columns in a table (and since my ID is first and, from my point of view, useless, can I remove it and gain anything by it?).
Is there another way to write my query that would basically do the same thing but in a shorter period of time?
Thanks for all your suggestions in advance!
Add an index on the columns that you GROUP BY or use in the WHERE. So that's ONE index incorporating users, table, col, row and selected in your case.
Some quick rules:
Combine the fields so that the WHERE columns come first and the GROUP BY columns come last.
If you have other queries that only use part of it (e.g. users, table, col and selected), then put the missing value (row, in this example) last.
Don't use too many indexes, as each one slows table updates marginally - so on a really large system you need to balance queries against indexes.
Edit: do you need the GROUP BY users, col, row, since these are already fixed by the WHERE? If the WHERE has already filtered them out, you only need to GROUP BY selected.
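A sketch of that single composite index on the survey table described above (column names are backticked because table, and in newer MySQL versions row, are reserved words):

-- One composite index covering every column the query filters and groups on.
ALTER TABLE survey
    ADD INDEX idx_survey_filter (`users`, `table`, `col`, `row`, `selected`);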
I'm trying to write a commenting system where people can comment on other comments, and these are displayed as recursive threads on the page. (Reddit's commenting system is an example of what I'm trying to achieve.) However, I am confused about how to implement such a system without it being very slow and computationally expensive.
I imagine that each comment would be stored in a comment table and contain a parent_id, which would be a foreign key to another comment. My problem comes with how to get all of this data without a ton of queries, and then how to efficiently organize the comments into the order they belong in. Does anyone have any ideas on how best to implement this?
Try using a nested set model. It is described in Managing Hierarchical Data in MySQL.
The big benefit is that you don't have to use recursion to retrieve child nodes, and the queries are pretty straightforward. The downside is that inserting and deleting takes a little more work.
It also scales really well. I know of one extremely huge system which stores discussion hierarchies using this method.
Here's another site providing information on that method + some source code.
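To give a flavor of why retrieval is cheap, here is a sketch of the classic nested set subtree query, assuming each comment row carries the usual lft / rgt boundary columns (names assumed, not taken from the question):

-- All descendants of comment 42 in display order, no recursion needed.
SELECT child.*
FROM comments AS parent
JOIN comments AS child
  ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE parent.id = 42
ORDER BY child.lft;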
It's just a suggestion, but since I'm facing the same problem right now:
How about adding a sequence field (int) and a depth field to the comments table, and updating them as new comments are inserted?
The sequence field would serve the purpose of ordering the comments,
and the depth field would indicate the recursion level of the comment.
The hard part would then be doing the right updates as users insert new comments.
I don't know yet how hard this is to implement,
but I'm pretty sure that once implemented, we will have a performance gain over nested-set-based solutions.
I created a small tutorial explaining the basic concepts behind the recursive approach. As people have said above, the recursive function doesn't scale as well; however, inserts are far more efficient.
Here are the links:
http://www.evanpetersen.com/index.php/item/php-and-mysql-recursion.html
and
http://www.evanpetersen.com/index.php/item/php-mysql-revisited.html
I normally work with a parent-child system.
For example, consider the following comments table:
CREATE TABLE comments (
    commentID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    pageID    INT UNSIGNED NOT NULL,
    userID    INT UNSIGNED NOT NULL,
    comment   TEXT NOT NULL,
    parentID  INT UNSIGNED NULL,
    FOREIGN KEY (parentID) REFERENCES comments (commentID)
);
parentID is a foreign key to commentID (from the same table) which is optional (can be NULL).
For selecting comments use this for a 'root' comment:
SELECT * FROM comments WHERE pageID=:pageid AND parentID IS NULL
And this for a child:
SELECT * FROM comments WHERE pageID=:pageid AND parentID=:parentid
I had to implement recursive comments too.
I broke my head over the nested set model; let me explain why.
Let's say you want comments for an article.
Let's call root comments the comments directly attached to this article.
Let's call reply comments the comments that are an answer to another comment.
I noticed (unfortunately) that I wanted the root comments to be ordered by date descending,
BUT I wanted the reply comments to be ordered by date ascending!
Paradoxical!
So the nested set model didn't help me reduce the number of queries.
Here is my solution:
Create a comment table with the following fields:
id
article_id
parent_id (nullable)
date_creation
email
whateverYouLike
sequence
depth
The 3 key fields of this implementation are parent_id, sequence and depth.
parent_id and depth help to insert new nodes.
sequence is the real key field; it's a kind of nested set emulation.
Each time you insert a new root comment, its sequence is a multiple of x.
I chose x = 1000, which basically means that I can have at most 1000 nested comments under a root comment (that's the only drawback I found with this system,
but the limit can easily be changed; it's enough for my needs for now).
The most recent root comment has to be the one with the greatest sequence number.
Now for reply comments.
We have two cases:
a reply to a root comment, or a reply to a reply comment.
In both cases the algorithm is the same:
take the parent's sequence and subtract one to get your own sequence number.
Then you have to update the sequence numbers which are below the parent's sequence and above the base sequence,
which is the sequence of the root comment just below the root comment concerned.
I don't expect you to understand all of this, since I'm not a very good explainer,
but I hope it may give you some new ideas.
(At least it worked better for me than the nested set model would have: fewer queries, which is the real goal.)
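If I've read the scheme correctly, reading back an article's whole thread in display order then becomes a single query. A sketch using the field names listed above (42 is just an example article_id):

-- The newest root comment has the highest sequence, and every reply sits
-- just below the comment it answers, so one descending sort gives the
-- display order; depth drives the indentation in the template.
SELECT id, parent_id, depth, date_creation, email
FROM comment
WHERE article_id = 42
ORDER BY sequence DESC;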
I’m taking a simple approach.
Save root id (if it’s comments then post_id)
Save parent_id
Then fetch all comments with that post_id and recursively order them on the client.
I don't care if there are 1000 comments; this happens in memory.
It's one database call, and that's the expensive part.