I'm building a website using PHP/MySQL where there will be Posts and Comments.
Posts need to show the number of comments they have. I have a count_comments column in the Posts table and update it every time a comment is created or deleted.
Someone recently advised me that denormalizing this way is a bad idea and that I should be using caching instead.
My take is: You are doing the right thing. Here is why:
See the field count_comments as not being part of your data model - this is easily provable: you could delete the contents of this field and it would be trivial to recreate it.
Instead, see it as a cache whose storage just happens to be co-located with the post - perfectly smart, as you get it for free whenever you have to query for the post(s).
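For concreteness, a minimal sketch of keeping that cached counter in sync might look like this (the count_comments column and the comments.post_id column come from the question; $pdo being an open PDO connection and a posts.id primary key are my assumptions):

// after inserting a comment for $postId
$pdo->prepare("UPDATE posts SET count_comments = count_comments + 1 WHERE id = ?")
    ->execute([$postId]);

// after deleting a comment for $postId
$pdo->prepare("UPDATE posts SET count_comments = count_comments - 1 WHERE id = ?")
    ->execute([$postId]);

// if the counter ever drifts, it can always be rebuilt from the comments table
$pdo->exec("UPDATE posts SET count_comments =
            (SELECT COUNT(*) FROM comments WHERE comments.post_id = posts.id)");

Wrapping the comment write and the counter update in one transaction keeps the cache from drifting in the first place.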
I do not think this is a bad approach.
One thing I do recognize is that it's very easy to introduce side effects as the code base expands without a more rigid approach to updating the counter. The nice part is that at some point the number of rows in the database will have to be calculated or kept track of anyway; there is not really a way of getting out of this.
I would not advise against this. There are other solutions for getting comment counts. Check out "Which is fastest? SELECT SQL_CALC_FOUND_ROWS FROM `table`, or SELECT COUNT(*)".
That solution is slower on selects, but requires less code to keep track of the comment count.
I will say that your approach avoids LIMIT de-optimization, which is a plus.
This is an optimization that is almost never needed for two reasons:
1) Proper indexing will make simple counts extremely fast. Ensure that your comments.post_id column has an index.
2) By the time you need to cache this value, you will need to cache much more. If your site has so many posts, comments, users and traffic that you need to cache the comments total, then you will almost definitely need to employ caching strategies for much of your data/output (saving built pages as static files, memcache, etc.). Those strategies will, no doubt, encompass your comments total, making the table-field approach moot.
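To make the first point concrete, a rough sketch (the index name and the $pdo connection are assumptions on my part):

// one-time: index the foreign key so the count is a cheap index lookup
$pdo->exec("ALTER TABLE comments ADD INDEX idx_comments_post_id (post_id)");

// per request: a simple, index-backed count
$stmt = $pdo->prepare("SELECT COUNT(*) FROM comments WHERE post_id = ?");
$stmt->execute([$postId]);
$commentCount = (int) $stmt->fetchColumn();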
I have no idea what was meant by "caching", and I'll be interested in other answers than the one I have to offer:
Removing redundant information from your database is important, and (speculatively - I haven't really tested it) I think letting the database count the comments for you is a better way to go.
Assuming that all your comments have a post_id, all you need is something like:
SELECT COUNT(*) FROM comments WHERE post_id = {post_id_variation_here}
That way, you avoid a constant write just to keep track of how many comments there are, and you should gain performance.
Unless you have hundreds or thousands of hits per second on your application, there's nothing wrong with using a SQL statement like this:
select posts_field1, ..., (select count(*) from comments where comments_parent = posts_id) as commentNumber from posts
You can go with caching the HTML output of your page anyway; then no database query has to be done at all.
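A minimal sketch of that kind of whole-page output caching, purely as an illustration (the cache path, the 5-minute lifetime and the render_post_page() function are all hypothetical):

$cacheFile = __DIR__ . '/cache/post_' . (int) $postId . '.html';

// serve the cached copy if it is fresh enough (here: 5 minutes)
if (is_file($cacheFile) && time() - filemtime($cacheFile) < 300) {
    readfile($cacheFile);
    exit;
}

// otherwise build the page as usual, capturing the output for next time
ob_start();
render_post_page($postId);                     // hypothetical page-rendering function
file_put_contents($cacheFile, ob_get_contents());
ob_end_flush();                                // send the freshly built page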
Maybe you could relate the post and comment tables to each other and count the comment rows with the PHP function mysql_num_rows(). Like so:
Post table
postid*
postcontent
Comment table
commentid
postid*
comment
And then count the comments like:
$link = mysql_connect("localhost", "mysql_user", "mysql_password");
mysql_select_db("database", $link);
$result = mysql_query("SELECT * FROM commenttable WHERE postid = '1'", $link);
$num_rows = mysql_num_rows($result);
Related
For example, I have a table "tbl_book" with 100 records or more, with several columns like book_name, book_publisher, book_author and book_rate, in a MySQL database "db_bookshop". Now I would like to fetch them all with one query, without iterating 100 times - looping only one or two times instead. Is it possible? Is there any tricky way to do that? Generally we do this:
$result = mysql_query("SELECT desire_column_name FROM table_name WHERE clause");
while( $row = mysql_fetch_array($result) ) {
    $row['book_name'];
    $row['book_publisher'];
    $row['book_author'];
    ..........
    $row['book_rate'];
}
// Or we may use mysqli_query() with mysqli_fetch_row(), mysqli_fetch_array(), mysqli_fetch_assoc();
My question is: is there any idea or any tricky way to avoid iterating 100 times to fetch 100 records? It may seem weird to some, but one of the most experienced programmers I knew told me it is possible. Unfortunately I was not able to learn it from him; sadly, he is no longer with us. Thanks in advance for sharing your ideas.
You should not use mysql_query(); the mysql extension is deprecated:
This extension is deprecated as of PHP 5.5.0, and has been removed as of PHP 7.0.0.
-- https://secure.php.net/manual/en/intro.mysql.php
When you use PDO you can fetch all items without writing an explicit fetch loop, like this:
$connection = new PDO('mysql:host=localhost;dbname=testdb', 'dbuser', 'dbpass');
$statement = $connection->query('SELECT ...');
$rows = $statement->fetchAll();
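You still iterate over the result to use it, of course; fetchAll() only moves the fetching loop inside the driver. For example (using the column names from the question):

foreach ($rows as $row) {
    echo $row['book_name'], ' by ', $row['book_author'], "\n";
}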
The short answer - NO, it's impossible to fetch more than one record from a database without a loop.
But the point here is that you don't want it.
There is no point in "just fetching" the data - you're always going to do something with it. With each row. Obviously, a loop is a natural way to do something with each row. Therefore, there is no point in trying to avoid a loop.
Which renders your question rather meaningless.
Regarding performance: the truth is that you will not experience a single performance problem related to fetching just 100 records from a database, which renders your problem an imaginary one.
The only plausible question I can take from your post is about your performance as a programmer, as a lack of education makes you write a lot of unnecessary code. If you manage to ask a specific question regarding that matter, you'll be shown a way to avoid the useless repetitive typing.
Have you tried using mysql_fetch_assoc?
$result = mysql_query("SELECT desire_column_name FROM table_name WHERE clause");
while ($row = mysql_fetch_assoc($result)) {
    // do stuff here, like..
    if (!empty($row['some_field'])) {
        echo $row["some_field"];
    }
}
It is possible to read all 100 records without a loop by hardcoding the main column values, but that would involve listing 100 × the number of columns, and there could be a limit on the number of columns you can return in MySQL.
e.g.,
select
    case when book_name='abc' then book_name end as Name,
    case when book_name='abc' then book_publisher end as Publisher,
    case when book_name='abc' then book_author end as Author,
    case when book_name='xyz' then book_name end as Name,
    case when book_name='xyz' then book_publisher end as Publisher,
    case when book_name='xyz' then book_author end as Author,
    ...
    ...
from
    tbl_book;
It's not practical, but if you have fewer rows to query you might find it useful.
The time taken to ask the MySQL server for something is far greater than one iteration through a client-side WHILE loop. So, to improve performance, the goal is to have the SELECT go to the server in one round trip. Different API calls do this or don't do this; read their details.
I have written a lot of UIs with MySQL under the covers. I think nothing of fetching a few dozen rows at once and then building a <table> (or something) with the results. I rarely fetch more than 100, not because of performance, but because 100 is (usually) too much for the user to take in on a single web page.
Also, I think nothing of issuing several, maybe dozens, of queries in support of a single web page. The delay is insignificant, especially when compared to the user's time for reading, digesting, and moving to the next page. So, I try to give the user a digestible amount of info without having to click to another page to get more. There are tradeoffs.
When it is practical to have SQL do the 'digesting', do so. It is faster for MySQL to do a SUM() and return just the total, rather than return dozens of rows for the client to add up. This is mostly a 'bandwidth' issue. Either way, MySQL will fetch (internally) all the needed rows.
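As a hedged illustration of letting the server do the digesting, something like the following returns one small, already-aggregated result set in a single round trip (the per-author report is just an invented example against the tbl_book table from the earlier question; $pdo is an assumed PDO connection):

// one round trip, one small result set: the server does the counting
$stmt = $pdo->query(
    "SELECT book_author, COUNT(*) AS books, AVG(book_rate) AS avg_rate
     FROM tbl_book
     GROUP BY book_author"
);
foreach ($stmt as $row) {
    echo $row['book_author'], ': ', $row['books'], " book(s)\n";
}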
When dealing with a table of at least 1 million rows, in terms of performance, is it better to:
Select the whole table, e.g. SELECT * FROM tbl, then paginate the result in PHP using array_chunk() or array_slice(),
or
Select only the part of the table needed for each page, e.g. SELECT * FROM tbl LIMIT x?
I think it depends. You can keep the whole response in memory using memcache if your table is not too big, which avoids the more time-consuming disk reads; but since you don't know whether your users will look at many pages, it is usually better to limit it with SQL.
It depends.
Does data change often in this table?
Yes -> you need to query the DB.
Is the database big and does it change often?
Then use some kind of search engine like Elasticsearch and don't query the DB directly; just populate the search engine.
Is the database small but queries take a long time?
Use some kind of cache like Redis/memcache.
It really depends on your needs.
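For the cache option, a minimal sketch assuming the Memcached PHP extension is available (the key name, the 60-second lifetime and the $pageNo/$pageSize variables are arbitrary choices, and this only makes sense if the table is small enough to hold in memory):

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$rows = $memcached->get('tbl:all');
if ($rows === false) {                         // cache miss: query once, then reuse
    $rows = $pdo->query("SELECT * FROM tbl")->fetchAll();
    $memcached->set('tbl:all', $rows, 60);     // keep for 60 seconds
}
$page = array_slice($rows, ($pageNo - 1) * $pageSize, $pageSize);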
The best method will depend on your context. If you choose to use the database directly, beware of this issue:
The naive LIMIT method will give you problems when you get into later pages. ORDER BY some_key LIMIT offset, page_size works like this: go through the key, throw away the first offset records, then return page_size records. So offset + page_size records are examined; if offset is high, you have a problem.
Better: remember the last key value of the current page. When fetching the next page, use it like this:
SELECT * FROM tbl WHERE the_key > $last_key ORDER BY the_key ASC LIMIT $page_size
If your key is not unique, make it unique by adding an extra unique ID column at the end.
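In PHP that might look roughly like this (the_key is the column name from the example above; $lastKey would come from the previous request, e.g. a query-string parameter, and $pageSize is assumed):

$stmt = $pdo->prepare(
    "SELECT * FROM tbl WHERE the_key > ? ORDER BY the_key ASC LIMIT " . (int) $pageSize
);
$stmt->execute([$lastKey]);
$rows = $stmt->fetchAll();

// remember the last key of this page; it becomes $lastKey for the next request
$lastKey = $rows ? end($rows)['the_key'] : $lastKey;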
It REALLY depends on context.
In general you want to make heavy use of indexes to select the content that you want out of a large dataset with fast results. It's also often faster to paginate in the programming language than to keep going back to the database, because the database is often the bottleneck. We had to do it this way for an application that had hundreds of queries a minute. Hits to the database needed to be capped, so we returned datasets that we knew might not need another query to the DB (around 100 results) and then paginated by 25 in the application.
In general, index and narrow your results with these indexes and if performance is key with lots of activity on the db, tune your db and your code to decrease I/O and DB hits by paginating in the application. You'll know why when your server is bleeding with a load of 12 and your I/O is showing 20 utilization. You'll need to hit the operating table stat!
It is better to use LIMIT. Think about it: the first approach will fetch everything even if you have 1,000,000 rows, versus LIMIT, which will only fetch your set number of rows each time.
You will then want to make sure you have your offsets set correctly to get the next set of items from the table.
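A minimal sketch of computing that offset (the ORDER BY id column is an assumption; the integer casts make the interpolated values safe to embed):

$pageSize = 25;
$page     = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$offset   = ($page - 1) * $pageSize;

// fetch only the rows for the requested page
$rows = $pdo->query("SELECT * FROM tbl ORDER BY id LIMIT $offset, $pageSize")->fetchAll();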
I'm coding a site in PHP, and the site will contain a large number of messages (like 100,000, 200,000 or more) which users will post on the site. The problem is, messages will be stored in a table called 'site_messages' keyed by their ID. This means all messages are grouped by their ID, not by their poster. If I want to fetch the messages posted by user 'foo', I have to scan a lot of rows, and I think it will get really slow. Or if I want to fetch the messages by post subject (yes, it will contain a post subject column too, and maybe more columns), I must scan the whole table again, and unfortunately that will be even less efficient. Are there any speedy solutions for this? I'm using PHP and MySQL (and phpMyAdmin).
Edit: For example, my table would look like this:
MessageID: 1
MessageContent(Varchar, this is the message that user posts): Hi I like this site. Bye!
MessagePoster(Varchar): crazyuser
MessagePostDate: 12/12/09
MessagePostedIn(Varchar, this is the post subject): How to make a pizza
MessageID: 2
MessageContent(Varchar): This site reallllly sucks.
MessagePoster(Varchar): top_lel
MessagePostDate: 12/12/09
MessagePostedIn(Varchar): Hello, I have a question!
MessageID: 3
MessageContent(Varchar): Who is the admin of this site?
MessagePoster(Varchar): creepy2000
MessagePostDate: 1/13/10
MessagePostedIn(Varchar): This site is boring.
etc...
This is what DBs (especially relational DBs) were built for! MySQL and other DBs use things like indexes to give you access to the rows you need in the most efficient way. You will be able to write queries like select * from site_messages where subject like "News%" order by entryDateTime desc limit 10 to find the latest ten messages starting with "News", or select * from site_messages, user where user.userid='foo' and site_messages.fk_user=user.id to find all posts for a certain user, and you'll find they perform pretty well. For these, you'd probably have (amongst others) an index on the subject column and an index on the fk_user column.
Work on having a good table structure (data model). Of course if you have issues you can research DB performance and the topic of explain plans to help.
Yes, for each set of columns you want, you will query the table again. Think of a query as a set of rows. Avoid sending large numbers of rows over connections. As the other commenters have suggested, we can't help much more without more details about your tables.
Two candidates for indexing that jump right out are (Poster, PostDate) and (PostDate, Poster) to help queries in the form:
select ...
from ...
where Poster = #PID and PostDate > #Yesterday;
and
select Poster, count(*) as Postings, ...
from ...
where PostDate > #Yesterday
group by Poster;
and
select Poster, ...
from ...
where PostDate between #DayBeforeYesterday and #Yesterday;
Just keep in mind that indexing improves queries at the expense of the DML operations (insert, update, delete). If the query/DML ratio is very low, you just may want to live with the slower queries.
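In MySQL terms, those two composite indexes might be created like this, mapping them onto the column names from the example table (the index names and the $pdo connection are my own choices):

// one-time DDL: composite indexes for (Poster, PostDate) and (PostDate, Poster)
$pdo->exec("ALTER TABLE site_messages
              ADD INDEX idx_poster_date (MessagePoster, MessagePostDate),
              ADD INDEX idx_date_poster (MessagePostDate, MessagePoster)");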
Let's say I have a forum. A small forum with maybe 100 visitors a day.
Would the best way be to store the number of posts a topic has by just creating a column num_posts, and each time a user makes a post in that topic I increase that number by one (and decrease it when a user deletes a post)? Or should I just make a query?
SELECT COUNT(*)
FROM posts
WHERE topic_id = thetopicid
I prefer the second, but of course I guess it affects performance. But how much? Is this bad practice?
Use count(*). Having that extra column requires you to maintain it yourself, i.e. update it on new and deleted posts. You need to add something extra to do this, which definitely requires extra resources, whereas with count(*) you are using something already built into the DBMS.
I'm currently working on a medium-sized web project, and I've run into a problem.
What I want to do is display a question, together with an image. I have a (global) list of questions, and a (global) list of images, all questions should be asked for all images.
As far as the user can see, the question and image should be chosen at random. However, the statistics from the answers (question/image pairs) will be used for research purposes. This means that all the question/image pairs must be chosen such that the answers are distributed evenly across all questions and across all images.
A user should only be able to answer a specific question/image-pair one time.
I am using a mysql database and php. Currently, i have three database tables:
tbl_images (image_id)
tbl_questions (question_id)
tbl_answers (answer_id, image_id, question_id, user_id)
The other columns are not related to this specific problem.
Solution 1:
Track how many times each image/question has been used (add a column in each table). Always choose the image and question that has been asked the least.
Problem:
What I'm actually interested in is distribution among questions for an image and vice versa, not that each question is even globally.
Solution 2:
Add another table, containing all question/image-pairs along with how many times it has been asked. Choose the lowest combination (first row if count column is sorted by ascending order).
Problem:
Does not enforce that the user can only answer a question once. Also does not give the appearance that the choice is random to the user.
Solution 3:
Same as #2, but store question/image/user_id in table.
Problem:
Performance issues (?), and a lot of space wasted for each user. There will probably be semi-large amounts of data (thousands of questions/images and at least hundreds of users).
Solution 4:
Choose a question and image at true random from all available. With a large enough amount of answers they will be distributed evenly.
Problem:
If I add a new question or image, it will not get more answers than the others and will therefore never catch up. I want an even amount of statistics for all question/image pairs.
Solution 5:
Weighted random. Choose a number of question/image pairs (say about 10-100) at true random and pick the best (as in, lowest global count) of these that the user has not answered.
Problem:
Does not guarantee that a recently added question or image gets a lot of answers quickly.
Solution #5 is probably the best one I've come up with so far.
Your input is very much appreciated, thank you for your time.
From what I understand of your problem, I would go with #1. However, you do not need a new column. I would create an SQL view instead, because it sounds like you'll need to report on things like that anyway. A view is basically a stored SELECT that acts similar to a table. Thus you would create a view that keeps the total number of times each question has been answered for each image:
DROP VIEW IF EXISTS "main"."view_image_question_count";
CREATE VIEW "view_image_question_count" AS
SELECT image_id, question_id, COUNT(*) AS total
FROM answer
GROUP BY image_id, question_id;
Then, you need a quick and easy way to get the next best image/question combo to ask:
DROP VIEW IF EXISTS "main"."view_next_best_question";
CREATE VIEW "view_next_best_question" AS
SELECT a.*, user_id
FROM view_image_question_count a
JOIN answer USING( image_id, question_id )
JOIN question USING(question_id)
JOIN image USING(image_id)
ORDER BY total ASC;
Now, if you need to report on your image-to-question performance, you can do so with:
SELECT * FROM view_image_question_count
If you need the next best image+question to ask for a user, you would call:
SELECT * FROM view_next_best_question WHERE user_id != {USERID} LIMIT 1
The != {USERID} part is to prevent getting a question the user has already answered. The LIMIT optimizes to only get one.
Disclaimer: There is probably a lot that could be done to optimize this. I just wanted to post something for thought.
Also, here is the database dump I used for testing. http://pastebin.com/yutyV2GU