Store row count in database or get dynamically - php

Say I have a table full of comments, each from different users, and I want to count how many comments each user has. Should I have a separate table with the count, and update that on creation/deletion of comments, or should I query the count every time?
I feel like the latter is better, but I want some more experienced input on the matter. Thanks.

Following the good old YAGNI principle, I would suggest you go with the simplest solution for now, which is just counting the number of comments as needed. This is just pragmatic coding.
If, down the line, you find this is causing even small performance problems, then you should replace it with a cached value using a stored procedure or similar, but chances are it will serve you just fine.
So, I realise this probably isn't the clear answer you want, but: if you're making something small, go with the easy solution (counting); if you're making something bigger, go with the easy solution (counting) then upgrade to the harder solution (storing a value) if you find you need it. If you know what you're making is guaranteed to be big (lucky you!) then fine, go straight for the harder solution.
Note: I've said "harder solution" but as you probably know it's only fractionally harder than the easy solution.
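A minimal sketch of the simple "count as needed" option, assuming PDO and a hypothetical `comments(id, user_id, body)` table. It uses an in-memory SQLite database only so it runs on its own; the SQL is the same in MySQL. One grouped query returns the count for every user, and there is nothing to keep in sync.

```php
<?php
// Hypothetical schema; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)');
$pdo->exec('CREATE INDEX idx_comments_user_id ON comments (user_id)');

$insert = $pdo->prepare('INSERT INTO comments (user_id, body) VALUES (?, ?)');
foreach ([[1, 'first'], [1, 'second'], [2, 'hello']] as [$userId, $body]) {
    $insert->execute([$userId, $body]);
}

// Count on demand: one grouped query gives the count for every user with comments.
$counts = [];
foreach ($pdo->query('SELECT user_id, COUNT(*) AS total FROM comments GROUP BY user_id') as $row) {
    $counts[(int) $row['user_id']] = (int) $row['total'];
}
echo "User 1 has {$counts[1]} comments\n";
```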

After a comment is created or deleted, you should update the count right away.
If you count every time, you will make unnecessary queries.
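A sketch of one way to do this, under assumptions: a hypothetical `comment_counts(user_id, total)` table is recomputed for the affected user right after each insert or delete, in the same transaction. `addComment()`, `deleteComment()` and `refreshCount()` are illustrative names, and SQLite stands in for MySQL (`REPLACE INTO ... SELECT` works in both).

```php
<?php
// Hypothetical schema; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)');
$pdo->exec('CREATE TABLE comment_counts (user_id INTEGER PRIMARY KEY, total INTEGER NOT NULL)');

// Recompute the stored count for one user from the comments table.
function refreshCount(PDO $pdo, int $userId): void
{
    $pdo->prepare('REPLACE INTO comment_counts (user_id, total) SELECT ?, COUNT(*) FROM comments WHERE user_id = ?')
        ->execute([$userId, $userId]);
}

function addComment(PDO $pdo, int $userId, string $body): void
{
    $pdo->beginTransaction();
    $pdo->prepare('INSERT INTO comments (user_id, body) VALUES (?, ?)')->execute([$userId, $body]);
    refreshCount($pdo, $userId);
    $pdo->commit();
}

function deleteComment(PDO $pdo, int $commentId): void
{
    $pdo->beginTransaction();
    $stmt = $pdo->prepare('SELECT user_id FROM comments WHERE id = ?');
    $stmt->execute([$commentId]);
    $userId = $stmt->fetchColumn();
    $stmt->closeCursor();
    if ($userId !== false) {
        $pdo->prepare('DELETE FROM comments WHERE id = ?')->execute([$commentId]);
        refreshCount($pdo, (int) $userId);
    }
    $pdo->commit();
}

addComment($pdo, 7, 'first');
addComment($pdo, 7, 'second');
deleteComment($pdo, 1);

// Reading the count is now a primary-key lookup instead of a COUNT(*).
$stmt = $pdo->prepare('SELECT total FROM comment_counts WHERE user_id = ?');
$stmt->execute([7]);
$total = (int) $stmt->fetchColumn();
```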

Doing a MySQL COUNT of each user's comments would be the easiest way to go about this:
// $pdo is an existing PDO connection and $userid holds the user's ID
$stmt = $pdo->prepare('SELECT COUNT(*) FROM comments WHERE userid = ?');
$stmt->execute([$userid]);
$count = (int) $stmt->fetchColumn();
Updating a count table every time will result in multiple MySQL calls whenever the comments change.
Alternatively, you could create a comment-count column in the users table and increment it for that specific user each time he/she adds a comment; then just by querying the user you will have that user's comment count.

Related

How to arrange delete process in project

I have a big project where I need to handle deleting several things like clients, orders, products, etc. But there's a catch: I need to keep an archive so deleted items can be restored.
So which solution is best?
I have researched this, and some of my ideas were not successful.
1. The first was to mark all deleted rows with a "deleted" status. But it causes problems in selections and makes the program slow.
2. The second idea was to make separate tables for deleted items, but it caused problems with orders, as I couldn't manage the relations between non-deleted orders and deleted clients and products.
If there are any ideas on how this can be solved, please ping me.
Well, your first idea is kind of the best here. It shouldn't cause that many problems in selection, as you always have the WHERE clause available (though this will take some time and work). It's also the easiest way to restore all of your data.
And it shouldn't really slow down your program, as databases are built to handle large amounts of data. You should look at optimizing your queries first; that's usually the problem. But like I said, your first idea was a good direction.
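A minimal sketch of the status-flag idea, assuming hypothetical `clients` and `orders` tables with a `deleted` column (0 = active, 1 = deleted) on `clients`, and SQLite in memory in place of MySQL. Because the client row is never physically removed, an order that points at a deleted client still resolves through its JOIN, which was the relation problem with separate archive tables.

```php
<?php
// Hypothetical schema; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL, deleted INTEGER NOT NULL DEFAULT 0)');
$pdo->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL REFERENCES clients(id))');
$pdo->exec("INSERT INTO clients (id, name) VALUES (1, 'Acme'), (2, 'Globex')");
$pdo->exec('INSERT INTO orders (id, client_id) VALUES (10, 1), (11, 2)');

// "Delete" a client by flagging it; the row and its relations stay intact.
$pdo->exec('UPDATE clients SET deleted = 1 WHERE id = 2');

// Everyday screens filter on the flag...
$active = $pdo->query('SELECT name FROM clients WHERE deleted = 0 ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);

// ...while existing orders can still be joined to the deleted client.
$orderClient = $pdo->query(
    'SELECT c.name FROM orders o JOIN clients c ON c.id = o.client_id WHERE o.id = 11'
)->fetchColumn();
```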
You can add a field to your table, call it "visible", that has two values: 0 or 1.
Then you can use a query like this:
SELECT * FROM tablename WHERE visible=1
This shows all records that have this value. Now if you want a record deleted (hidden, actually), change its visible field to 0 in your table. It won't show on your page but still exists in your table. To show the record again, change visible back to 1. Hope this helps.
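The same approach wrapped in small helpers, as a sketch: `hideRecord()`, `showRecord()` and `visibleTitles()` are hypothetical names, `tablename` and `visible` follow the answer, and the `title` column plus the in-memory SQLite database are only there to make it runnable.

```php
<?php
// Hypothetical schema; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tablename (id INTEGER PRIMARY KEY, title TEXT, visible INTEGER NOT NULL DEFAULT 1)');
$pdo->exec("INSERT INTO tablename (title) VALUES ('a'), ('b'), ('c')");

// Hide a record instead of deleting it.
function hideRecord(PDO $pdo, int $id): void
{
    $pdo->prepare('UPDATE tablename SET visible = 0 WHERE id = ?')->execute([$id]);
}

// Bring a hidden record back.
function showRecord(PDO $pdo, int $id): void
{
    $pdo->prepare('UPDATE tablename SET visible = 1 WHERE id = ?')->execute([$id]);
}

// What the page would list.
function visibleTitles(PDO $pdo): array
{
    return $pdo->query('SELECT title FROM tablename WHERE visible = 1 ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
}

hideRecord($pdo, 2);
$afterHide = visibleTitles($pdo);
showRecord($pdo, 2);
$afterShow = visibleTitles($pdo);
```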

Checking if two ids are identical

I've added a feature to a website that shows which visitors have visited a user profile. The table representing this holds the ID of the user profile and the ID of the user visiting the profile.
Obviously, it's pointless showing that someone has visited their own profile so I modified the PHP code to detect this. In the meantime, a bit of data was written. This isn't a problem because it represents only a handful of users and I can edit the information by hand.
My question is as follows. In the hypothetical case where I'd have to do the same thing for more data, what would be a good approach to finding rows where id1 = id2 and removing them?
DELETE
FROM table
WHERE id1 = id2;
DELETE FROM `profiletracking` WHERE `visitor_id` = `profile_id`;
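To run the cleanup from PHP and see how many rows it removed, something like this sketch works; the table and column names come from the query above, and the in-memory SQLite database with sample rows is only for a self-contained demo. `PDO::exec()` returns the number of affected rows.

```php
<?php
// In-memory SQLite stands in for MySQL; the rows are sample data.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE profiletracking (visitor_id INTEGER NOT NULL, profile_id INTEGER NOT NULL)');
$pdo->exec('INSERT INTO profiletracking VALUES (1, 1), (1, 2), (3, 3), (2, 1)');

// Remove every row where someone "visited" their own profile.
$removed = $pdo->exec('DELETE FROM profiletracking WHERE visitor_id = profile_id');
$remaining = (int) $pdo->query('SELECT COUNT(*) FROM profiletracking')->fetchColumn();
echo "Removed $removed self-visits, $remaining rows left\n";
```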
If you need to delete them, harakiri's query is good, but I have a question: why add the record in the first place? In time your website could grow bigger and things might get complicated.
I would suggest not recording it in the database in the first place. Otherwise you just do more actions and queries when there is a shorter way.
<?php
// Get ID of profile owner;
/* do your query here */
if ($_SESSION['id'] != $profileOwner['user_id']) {
// add it to your database
}
?>
I believe this approach is more elegant and useful, considering that in the future your website might grow bigger and you might need to revisit your code.
Please don't forget that such things can become a headache. This is a fatal mistake for a programmer: in the beginning, many think, "OK, for now this does the trick, why bother coding more?" In time you add more and more code, and later you might get lost in it. It will be too late once your visitors or customers start to complain about slow-loading pages, which eventually come down to bad coding.
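A self-contained sketch of "don't record it in the first place". `recordVisit()` is a hypothetical helper, and the IDs are passed in as arguments rather than read from `$_SESSION` so the example runs on its own.

```php
<?php
// Hypothetical table; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE profiletracking (visitor_id INTEGER NOT NULL, profile_id INTEGER NOT NULL)');

// Returns true if a row was written, false if the visit was skipped.
function recordVisit(PDO $pdo, int $visitorId, int $profileOwnerId): bool
{
    if ($visitorId === $profileOwnerId) {
        return false; // Viewing your own profile is not worth recording.
    }
    $pdo->prepare('INSERT INTO profiletracking (visitor_id, profile_id) VALUES (?, ?)')
        ->execute([$visitorId, $profileOwnerId]);
    return true;
}

$ownVisit = recordVisit($pdo, 5, 5);
$otherVisit = recordVisit($pdo, 5, 9);
$rows = (int) $pdo->query('SELECT COUNT(*) FROM profiletracking')->fetchColumn();
```

In a page it would be called as `recordVisit($pdo, (int) $_SESSION['id'], (int) $profileOwner['user_id'])`.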

Use count or have a field that tallies?

Fairly simple concept: I'm making an extremely basic message board system and I want users to have a post count. I was debating whether to keep a tally in their row that is incremented each time they create a post and decremented each time one of their posts is deleted. However, I'm sure that performing a COUNT query when the post count is requested would be more accurate due to unforeseen circumstances (say a thread gets deleted and it doesn't lower their tally properly). On the other hand, it seems less efficient to run a query EVERY time their post count is loaded, especially when they have 10 posts on the same page and each post lists their post count.
Thoughts/Advice?
Thanks
post_count should definitely be a column in the user table. The little extra effort to get this right is minimal compared to the additional database load you produce by running a few COUNT queries on every thread view.
If you use some sort of ORM or database abstraction, it should be quite simple to add the counting to its create/delete filters.
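Without an ORM, the same "filters" can live in the code that creates and deletes posts. A sketch under assumptions: hypothetical `users(id, post_count)` and `posts(id, user_id, body)` tables, a hypothetical `PostStore` class, and SQLite in memory in place of MySQL. Each counter change runs in the same transaction as the write it belongs to.

```php
<?php
// Hypothetical data-access class standing in for ORM create/delete hooks.
final class PostStore
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function create(int $userId, string $body): int
    {
        $this->pdo->beginTransaction();
        $this->pdo->prepare('INSERT INTO posts (user_id, body) VALUES (?, ?)')->execute([$userId, $body]);
        $postId = (int) $this->pdo->lastInsertId();
        $this->pdo->prepare('UPDATE users SET post_count = post_count + 1 WHERE id = ?')->execute([$userId]);
        $this->pdo->commit();
        return $postId;
    }

    public function delete(int $postId): void
    {
        $this->pdo->beginTransaction();
        $stmt = $this->pdo->prepare('SELECT user_id FROM posts WHERE id = ?');
        $stmt->execute([$postId]);
        $userId = $stmt->fetchColumn();
        $stmt->closeCursor();
        if ($userId !== false) {
            $this->pdo->prepare('DELETE FROM posts WHERE id = ?')->execute([$postId]);
            $this->pdo->prepare('UPDATE users SET post_count = post_count - 1 WHERE id = ?')->execute([(int) $userId]);
        }
        $this->pdo->commit();
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL DEFAULT 0)');
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)');
$pdo->exec('INSERT INTO users (id) VALUES (1)');

$store = new PostStore($pdo);
$first = $store->create(1, 'hello');
$store->create(1, 'again');
$store->delete($first);
$postCount = (int) $pdo->query('SELECT post_count FROM users WHERE id = 1')->fetchColumn();
```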
Just go for count each time. Unless your load is going to be astronomical, COUNT shouldn't be a problem, and reduces the amount of effort involved in saving and updating data.
Just make sure you put an index on your user_id column, so that you can filter the data with a WHERE clause efficiently.
If you get to the point where this doesn't do it for you, you can implement caching strategies, but given that it's a simple message board, you shouldn't encounter that problem for a while.
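A sketch of the index this answer recommends, assuming a hypothetical `posts(id, user_id, body)` table in in-memory SQLite. SQLite's `EXPLAIN QUERY PLAN` is used to check that the count is served from the index; in MySQL, `EXPLAIN` gives the equivalent information.

```php
<?php
// Hypothetical schema; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)');
// The index lets COUNT(*) ... WHERE user_id = ? read only that user's index entries.
$pdo->exec('CREATE INDEX idx_posts_user_id ON posts (user_id)');

$insert = $pdo->prepare('INSERT INTO posts (user_id, body) VALUES (?, ?)');
for ($i = 0; $i < 100; $i++) {
    $insert->execute([$i % 10, "post $i"]);
}

$count = $pdo->prepare('SELECT COUNT(*) FROM posts WHERE user_id = ?');
$count->execute([3]);
$postsByUser3 = (int) $count->fetchColumn();

// The last column of each plan row ("detail") describes how SQLite will run the query.
$plan = $pdo->query('EXPLAIN QUERY PLAN SELECT COUNT(*) FROM posts WHERE user_id = 3')->fetchAll(PDO::FETCH_NUM);
$details = implode(' ', array_map(fn(array $row) => end($row), $plan));
echo $details, "\n";
```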
EDIT:
Just saw your second concern about the same query repeating 10 times on a page. Don't do that :) Just pull the data once and store it in a variable. No need to repeat the same query multiple times.
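One way to "pull the data once", sketched with hypothetical data: collect the distinct author IDs shown on the page, fetch all their post counts in a single grouped query, and read them from a PHP array while rendering. `$pagePosts` stands in for whatever your page query returns.

```php
<?php
// Hypothetical schema and sample rows; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)');
$pdo->exec('INSERT INTO posts (user_id) VALUES (1), (1), (1), (2), (3)');

// Posts shown on the current page (author IDs repeat; assumed non-empty).
$pagePosts = [['id' => 1, 'user_id' => 1], ['id' => 2, 'user_id' => 1], ['id' => 4, 'user_id' => 2]];

// One query for every distinct author on the page.
$authorIds = array_values(array_unique(array_column($pagePosts, 'user_id')));
$placeholders = implode(',', array_fill(0, count($authorIds), '?'));
$stmt = $pdo->prepare("SELECT user_id, COUNT(*) FROM posts WHERE user_id IN ($placeholders) GROUP BY user_id");
$stmt->execute($authorIds);
$postCounts = array_map('intval', $stmt->fetchAll(PDO::FETCH_KEY_PAIR));

// Rendering reuses the array; no further queries.
foreach ($pagePosts as $post) {
    echo "Post {$post['id']} by user {$post['user_id']} ({$postCounts[$post['user_id']]} posts)\n";
}
```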
Just use COUNT. It will be more accurate and will avoid any possible missed cases.
The case you mention of displaying the post count multiple times on a page won't be a problem unless you have an extremely high traffic site.
Otherwise, if your database server has a query cache, it will execute the query once, then keep the cached response until any of the tables that the query relies on change. In the course of a single page load, nothing else should change, so you will only be executing the query once. (MySQL's query cache was deprecated in 5.7 and removed in 8.0, so on current MySQL versions you should not rely on this.)
If you really need to worry about it, you can just cache it yourself in a variable and just execute the query once.
Generally speaking, your database queries will be very efficient compared to your app logic. As such, the time you save by maintaining a post_count in the user table will most probably be far, far less than the cost of running an extra query to update the user table whenever a comment is posted.
Also, a field such as the one you are describing is usually considered bad database structure, because it duplicates data that can be derived from the posts table.
There are arguments for both, so ultimately it depends on the volume of traffic you expect. If your code is solid and properly layered, you can confidently keep a row count in your users' records without worrying about losing accuracy. Over time, count() can get heavy, but updating a row count also adds overhead.
For a small site, it makes next to no difference, so if (and only if) you're a stickler for efficiency, the only way to get a useful answer is to run some benchmarks and find out for yourself. One way or another, it's going to be 3/10ths of 2/8ths of diddley squat, so do whatever feels right :)
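If you do want numbers, a rough harness like this is a starting point. It is only a sketch on a hypothetical schema in in-memory SQLite, timing repeated `COUNT(*)` lookups against repeated reads of a stored counter; no timings are claimed here, and for meaningful figures you would run it against your own MySQL server and data.

```php
<?php
// Hypothetical schema; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL DEFAULT 0)');
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)');
$pdo->exec('CREATE INDEX idx_posts_user_id ON posts (user_id)');

// 20,000 posts spread over 100 users, plus matching stored counters.
$pdo->beginTransaction();
$insert = $pdo->prepare('INSERT INTO posts (user_id) VALUES (?)');
for ($i = 0; $i < 20000; $i++) {
    $insert->execute([$i % 100]);
}
$pdo->exec('INSERT INTO users (id, post_count) SELECT user_id, COUNT(*) FROM posts GROUP BY user_id');
$pdo->commit();

// Time $runs executions of one prepared query and return the elapsed seconds.
function timeQuery(PDO $pdo, string $sql, int $runs): float
{
    $stmt = $pdo->prepare($sql);
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $stmt->execute([$i % 100]);
        $stmt->fetchColumn();
        $stmt->closeCursor();
    }
    return microtime(true) - $start;
}

$countTime = timeQuery($pdo, 'SELECT COUNT(*) FROM posts WHERE user_id = ?', 1000);
$storedTime = timeQuery($pdo, 'SELECT post_count FROM users WHERE id = ?', 1000);
printf("COUNT(*): %.4fs, stored column: %.4fs\n", $countTime, $storedTime);
```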
It's totally reasonable to store the post counts in a column in your Users table. Then, to ensure that your post counts don't become increasingly inaccurate over time, run a scheduled task (e.g. nightly) to update them based on your Posts table.
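The nightly job can be a single UPDATE with a correlated subquery. A sketch assuming hypothetical `users(id, post_count)` and `posts(id, user_id)` tables; the statement also runs in MySQL, and scheduling it (cron or similar) is left out.

```php
<?php
// Hypothetical schema; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL DEFAULT 0)');
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)');
$pdo->exec('INSERT INTO users (id, post_count) VALUES (1, 5), (2, 0)'); // stored counts have drifted
$pdo->exec('INSERT INTO posts (user_id) VALUES (1), (1), (2)');

// Recompute every stored count from the posts table; returns the number of user rows updated.
function reconcilePostCounts(PDO $pdo): int
{
    return $pdo->exec(
        'UPDATE users SET post_count = (SELECT COUNT(*) FROM posts WHERE posts.user_id = users.id)'
    );
}

$updated = reconcilePostCounts($pdo);
$fixed = array_map('intval', $pdo->query('SELECT id, post_count FROM users')->fetchAll(PDO::FETCH_KEY_PAIR));
```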

Performance question sql

I'm making a forum.
I'm wondering if I should store the number of replies in the topic table or count the posts of the topic.
How much slower will it be if I use SQL to count them? Let's say I have a billion posts.
Will it be much slower? I'm not planning on being that big, but what if? How much slower would it be compared to storing the number in the topics table?
Thanks
It will be slower as your db grows in size. If you are planning on having a large post table, store the value in the topic table
I just ran some tests on a MySQL 4.0 box we have using a table with over 1 million records.
SELECT COUNT(*) FROM MyTable; ~1 million took 22ms
SELECT COUNT(*) FROM MyTable WHERE Role=1; ~800,000 took 3.2s
SELECT COUNT(*) FROM MyTable WHERE Role=2; ~20 took 12ms
The Role column in this case was indexed and this was connecting to the MySQL remotely.
I think your posts table will have to get very large for the query times to really become an issue. I also think it is premature optimization to cache the count in your topics table. Build it without the cache for now; if it becomes a problem, it's a pretty easy change to make.
Do not store the value in a table.
Cache the value in the application for some time so the COUNT(*) query won't be executed too often.
Choose the cache time depending on the server load: higher for a very busy server and zero for a couple of users.
COUNT(*) in SQL Server is pretty fast (assuming you have an index on the field you are counting on), so you just need to reduce the number of hits under heavy load.
If you store the value in a table, you will have a lot of hassle maintaining it.
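A minimal sketch of caching the count in the application with a configurable lifetime. `TtlCache` is a hypothetical class that keeps entries in a PHP array, which only lives for one request; in practice the storage would be a shared cache such as APCu or memcached. The current time is passed in explicitly so the example is deterministic (in real code you would pass `time()`).

```php
<?php
// Hypothetical time-based cache; real deployments would back this with a shared cache.
final class TtlCache
{
    private array $entries = [];

    public function remember(string $key, int $ttlSeconds, callable $compute, int $now)
    {
        $entry = $this->entries[$key] ?? null;
        if ($entry !== null && $entry['expires'] > $now) {
            return $entry['value']; // still fresh: no database hit
        }
        $value = $compute();
        $this->entries[$key] = ['value' => $value, 'expires' => $now + $ttlSeconds];
        return $value;
    }
}

// Hypothetical schema; in-memory SQLite stands in for the real server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL)');
$pdo->exec('INSERT INTO posts (topic_id) VALUES (1), (1)');

$queries = 0;
$countReplies = function () use ($pdo, &$queries): int {
    $queries++;
    return (int) $pdo->query('SELECT COUNT(*) FROM posts WHERE topic_id = 1')->fetchColumn();
};

$cache = new TtlCache();
$ttl = 30; // higher on a busy server, 0 effectively disables caching
$first = $cache->remember('topic:1:replies', $ttl, $countReplies, 1000);
$pdo->exec('INSERT INTO posts (topic_id) VALUES (1)');
$stillCached = $cache->remember('topic:1:replies', $ttl, $countReplies, 1010);
$afterExpiry = $cache->remember('topic:1:replies', $ttl, $countReplies, 1031);
```

The trade-off is that readers may see a slightly stale count for up to `$ttl` seconds, in exchange for at most one COUNT(*) per topic per interval.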
This is going to affect scaling and is an issue of normalization. Hardcore normalization nerds will tell you that you shouldn't keep the number of posts on the topic because it creates redundant data. But keep in mind that if you don't store it there, you need to do an extra query on every load to fetch the number. The alternative is to do an extra query on every update/insert instead, which will almost always happen much less often than SELECTs. As you scale a site to support a lot of traffic, it becomes almost inevitable that you eventually have to de-normalize some of your data, especially in cases like this.
Redundant data isn't inherently bad. Poorly managed redundancy is. As long as you have the proper checks in place to prevent the data from getting out of sync then the potential benefit of storing the number of posts on the thread is worth the extra bit of code IMO.
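One form the "proper checks" could take, as a sketch: a query that lists topics whose stored reply count disagrees with the posts table, which could run from a test suite or a monitoring job. The `topics(id, reply_count)` and `posts(id, topic_id)` names are hypothetical.

```php
<?php
// Hypothetical schema and sample rows; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE topics (id INTEGER PRIMARY KEY, reply_count INTEGER NOT NULL DEFAULT 0)');
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL)');
$pdo->exec('INSERT INTO topics (id, reply_count) VALUES (1, 2), (2, 4), (3, 0)');
$pdo->exec('INSERT INTO posts (topic_id) VALUES (1), (1), (2)');

// Topics whose denormalized count no longer matches the real number of posts.
// LEFT JOIN keeps topics with no posts, where COUNT(p.id) is 0.
function findDriftedTopics(PDO $pdo): array
{
    $sql = 'SELECT t.id, t.reply_count, COUNT(p.id) AS actual
            FROM topics t LEFT JOIN posts p ON p.topic_id = t.id
            GROUP BY t.id, t.reply_count
            HAVING t.reply_count <> COUNT(p.id)
            ORDER BY t.id';
    return $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

$drifted = findDriftedTopics($pdo);
```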
I think a lot of this will depend on how rapidly you're pushing data in. If you store the value in a topic table, then you may find that you're needing to increment (or decrement if you delete records) very frequently too.
Indexes (indices?) may be a nicer option, as you can store a tiny subset of the data, and be able to access richer information. Consider the fact that it can be quite quick to count how many Farleys there are in the phone-book, because I can go straight there and easily count them.
So, as is often the case, the answer is probably 'It depends'.
I like storing counts in the table rather than counting them every time. It's such an easy operation and you never have to think about the expense of showing it when you're retrieving it. With a forum you're going to be displaying it more often than you're going to be changing it anyway so it makes sense to make that as cheap as possible. It might be a bit premature but it might save you some headaches later.

Faster to query in MYSQL or to use PHP logic

I have a page that will pull many headlines from multiple categories based off a category id.
I'm wondering if it makes more sense to pull all the headlines and then sort them out via PHP if/elseif statements, or whether it is better to run multiple queries that each fetch the headlines for one category.
Why not do it in one query? Something like:
SELECT headline FROM headlines WHERE category_id IN (1, 2, 3, ...);
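With PDO, the `IN (...)` list can be built from placeholders so the category IDs are still bound as parameters. A sketch, assuming a hypothetical `headlines(id, headline, category_id)` table and SQLite in memory for a runnable demo; `headlinesFor()` is an illustrative name.

```php
<?php
// Hypothetical schema and sample rows; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE headlines (id INTEGER PRIMARY KEY, headline TEXT NOT NULL, category_id INTEGER NOT NULL)');
$pdo->exec("INSERT INTO headlines (headline, category_id) VALUES ('A', 1), ('B', 2), ('C', 3), ('D', 4)");

// Fetch headlines for several categories in one round trip.
function headlinesFor(PDO $pdo, array $categoryIds): array
{
    if ($categoryIds === []) {
        return []; // "IN ()" is not valid SQL, so skip the query entirely.
    }
    $placeholders = implode(', ', array_fill(0, count($categoryIds), '?'));
    $stmt = $pdo->prepare("SELECT headline FROM headlines WHERE category_id IN ($placeholders) ORDER BY id");
    $stmt->execute(array_values($categoryIds));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$selected = headlinesFor($pdo, [1, 2, 3]);
```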
If you filter your headlines in PHP, think how many you'll be throwing away. If you end up with removing just 10% of the headlines, it won't matter as much as when you'd be throwing away 90% of the results.
These kinds of questions are always hard to answer because the situation determines the best course. There is never a truly correct answer, only better ways. In my experience it doesn't really matter whether you attempt to do the work in PHP or in the database, because you should always try to cache the results of any expensive operation using a caching engine such as memcached. That way you are not going to spend a lot of time in the DB or in PHP itself, since the results will be cached and ready instantaneously for use. When it comes down to it, unless you profile your application using a tool like Xdebug, what you think are your performance bottlenecks are just guesses.
It's usually better not to overload the DB, because you might cause a bottleneck if you have many simultaneous queries.
However, handling your processing in PHP is usually better, as Apache will fork threads as it needs to handle multiple requests.
As usual, it all comes down to: "How much traffic is there?"
MySQL can already do the selecting and ordering of the data for you. I suggest being lazy and using this.
Also, I'd look for a single query that fetches all the categories and their headlines at once. Would an ORDER BY category, publishdate or something similar do?
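A sketch of the single-query version: fetch the headlines for all wanted categories ordered by category and date, then group them in one PHP pass. The `story(category, headline, publishdate)` table echoes the examples in these answers but is an assumption, as is the sample data.

```php
<?php
// Hypothetical schema and sample rows; in-memory SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE story (id INTEGER PRIMARY KEY, category TEXT NOT NULL, headline TEXT NOT NULL, publishdate TEXT NOT NULL)');
$pdo->exec("INSERT INTO story (category, headline, publishdate) VALUES
    ('Sports', 'Match report', '2024-05-02'),
    ('Politics', 'Budget vote', '2024-05-01'),
    ('Sports', 'Transfer news', '2024-05-03'),
    ('Health', 'Sleep study', '2024-05-01')");

// One round trip; the database does the filtering and ordering.
$rows = $pdo->query(
    "SELECT category, headline FROM story
     WHERE category IN ('Health', 'Sports', 'Politics')
     ORDER BY category, publishdate DESC"
)->fetchAll(PDO::FETCH_ASSOC);

// Group into category => [headlines] for the template.
$byCategory = [];
foreach ($rows as $row) {
    $byCategory[$row['category']][] = $row['headline'];
}
```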
Every trip to the database costs you something. Returning extra data that you then decide to ignore costs you something. So you're almost certainly better to let the database do your pruning.
I'm sure one could come up with some case where deciding what data you need makes the query hugely complex and thus difficult for the database to optimize, while you could do it in your code easily. But if we're talking about "select headline from story where category='Sports'" followed by "select headline from story where category='Politics'" then "select headline from story where category='Health'" etc, versus "select category, headline from story where category in ('Health','Sports','Politics')", the latter is clearly better.
On the topic of "Faster to query in MYSQL or to use PHP logic", which is how I ended up on this question 10 years later. I have determined that the correct answer is "it depends". There are just too many examples where using the DB saves processing time over writing PHP Code.... but there are just as many examples where writing PHP Code saves time on excessively complex MySQL queries.
There is no right answer here. If you end up here, like I did, then the best I can suggest is try to solve your problem with the skills that you have. Start first with the Query and try to solve it, if you run into issues then start thinking about just gathering the data and running the logic through PHP code to come up with a solution.
At the end of the day, you need to solve a problem. If you solve it but it's not fast enough, then that's another problem: work on optimizing, which may end up meaning that you go back to writing more MySQL logic.
Use the 80/20 rule and try to get things 80% of the way there as quickly as possible. You can go back and optimize once it's workable. Spending all your effort on making it perfect the first time will surely mean you miss your deadline.
That's my $0.02
