PHP Mysqli - SELECT * or SELECT COUNT(*) when result is 0?

Is there any difference in performance between SELECT * and SELECT COUNT(*) when no rows will be found?
My chat script checks every second for new messages, so it would be good to know whether I need COUNT(*).
What would be faster and better for the server:
$result = mysqli_query($con, "blablabla");
if(mysqli_num_rows($result) > 0)
{
}
OR
$result = mysqli_query($con, "blablabla");
$menge = mysqli_fetch_row($result);
$menge = $menge[0];
if($menge > 0)
{
}
When no rows will be found?! I'm asking for an AJAX chat, and most of the time there will be no new rows.

With a basic query the time difference between the two would be negligible (well within the natural variation you see when executing a query repeatedly). With a complex query there could be a significant difference, since with just a COUNT you could likely use a simplified query.
For example, if the full query needs a subquery to get the latest details for a grouped set, that subquery could make the full query quite slow. The count, however, might only need to know whether a row exists (rather than the details of the latest row) and so would not require the subquery. This could make a major difference.
Assuming you have an index on the id and the value you are filtering on is at or close to the max value (i.e., you are not expecting to return the full 1m records, rather a couple at most from the last few seconds), I wouldn't expect a significant difference caused by the amount of data returned.
The big constant performance hits are making the connection to MySQL and the basic processing of a query by the database (i.e., handling the request, not executing the query). These impose a fairly constant overhead on any query, and quite a significant one relative to a simple query that is fast once executed. Because of this, the overhead of occasionally performing 2 queries (when new rows have been added) is likely to dwarf the negligible saving of fetching just a count when there are no new records.
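To make that concrete, here is a minimal sketch of the single-query approach for the chat poll. The messages table, its id and message columns, and $lastId are hypothetical names invented for illustration, and mysqli_stmt_get_result() assumes the mysqlnd driver:
// Hypothetical schema: messages(id INT PRIMARY KEY AUTO_INCREMENT, message TEXT).
// $lastId holds the id of the newest message this client has already seen.
$stmt = mysqli_prepare($con, "SELECT id, message FROM messages WHERE id > ? ORDER BY id");
mysqli_stmt_bind_param($stmt, "i", $lastId);
mysqli_stmt_execute($stmt);
$result = mysqli_stmt_get_result($stmt);

// One roundtrip either way: when there are no new rows this is a single cheap
// query, and when there are new rows we already have them, with no second query.
if (mysqli_num_rows($result) > 0) {
    while ($row = mysqli_fetch_assoc($result)) {
        echo $row['message']; // hand the new message to the AJAX response
    }
}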

Related

What's faster, db calls or resorting an array?

In a site I maintain I have a need to query the same table (articles) twice, once for each category of article. AFAICT there are basically two ways of doing this (maybe someone can suggest a better, third way?):
Perform the db query twice, meaning the db server has to sort through the entire table twice. After each query, I iterate over the cursor to generate HTML for a list entry on the page.
Perform the query just once and pull out all the records, then sort them into two separate arrays. After this, I have to iterate over each array separately in order to generate the HTML.
So it's this:
$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news' ");
while($newRow = $newsQuery->fetch_assoc()){
    // generate article summary in HTML
}
// repeat for informational articles
vs this:
$query = $mysqli->query("SELECT * FROM articles ");
$news = array();
$info = array();
while($row = $query->fetch_assoc()){
    if($row['type'] == "news"){
        $news[] = $row;
    }else{
        $info[] = $row;
    }
}
// iterate over each array separately to generate article summaries
The recordset is not very large, currently <200 rows, and will probably grow to 1000-2000. Is there a significant difference in the times between the two approaches, and if so, which one is faster?
(I know this whole thing seems awfully inefficient, but it's a poorly coded site I inherited and have to take care of without a budget for refactoring the whole thing...)
I'm writing in PHP, no framework :( , on a MySQL db.
Edit
I just realized I left out one major detail. On a given page in the site, we will display (and thus retrieve from the db) no more than 30 records at once - but here's the catch: 15 info articles, and 15 news articles. On each page we pull the next 15 of each kind.
You know you can sort in the DB, right?
SELECT * FROM articles ORDER BY type
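As a sketch of how that helps (not the poster's code; the section-header markup is invented for illustration), one ordered query can feed both lists without any PHP-side re-sorting:
// Rows arrive grouped by type, so we just watch for the type to change.
$query = $mysqli->query("SELECT * FROM articles ORDER BY type");
$currentType = null;
while ($row = $query->fetch_assoc()) {
    if ($row['type'] !== $currentType) {
        $currentType = $row['type'];
        echo "<h2>" . htmlspecialchars($currentType) . "</h2>"; // start a new section
    }
    // generate the article summary in HTML
}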
EDIT
Due to the change made to the question, I'm updating my answer to address the newly revealed requirement: 15 rows for 'news' and 15 rows for not-'news'.
The gist of the question is the same: "which is faster... one query or two separate queries". The gist of the answer remains the same: each database roundtrip incurs overhead (extra time, especially over a network connection to a separate database server), so with all else being equal, reducing the number of database roundtrips can improve performance.
The new requirement really doesn't impact that. What it really impacts is the actual query needed to return the specified resultset.
For example:
( SELECT n.*
FROM articles n
WHERE n.type='news'
LIMIT 15
)
UNION ALL
( SELECT o.*
FROM articles o
WHERE NOT (o.type<=>'news')
LIMIT 15
)
Running that statement as a single query is going to require fewer database resources, and be faster, than running two separate statements and retrieving two disparate resultsets.
We weren't provided any indication of what the other values for type can be, so the statement offered here simply addresses two general categories of rows: rows that have type='news', and all other rows that have some other value for type.
That query assumes that type allows for NULL values, and we want to return rows that have a NULL for type. If that's not the case, we can adjust the predicate to be just
WHERE o.type <> 'news'
Or, if there are specific values for type we're interested in, we can specify that in the predicate instead
WHERE o.type IN ('alert','info','weather')
If "paging" is a requirement... "next 15", the typical pattern we see applied, LIMIT 30,15 can be inefficient. But this question isn't asking about improving efficiency of "paging" queries, it's asking whether running a single statement or running two separate statements is faster.
And the answer to that question is still the same.
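Purely as an illustration of that point (the ordering column n.id is an assumption; without an ORDER BY in each part, LIMIT/OFFSET returns rows in no guaranteed order), the "next 15 of each kind" would add an offset to each part of the UNION:
( SELECT n.*
    FROM articles n
   WHERE n.type='news'
   ORDER BY n.id DESC
   LIMIT 15 OFFSET 15   -- page 2: skip the first 15, take the next 15
)
UNION ALL
( SELECT o.*
    FROM articles o
   WHERE NOT (o.type<=>'news')
   ORDER BY o.id DESC
   LIMIT 15 OFFSET 15
)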
ORIGINAL ANSWER below
There's overhead for every database roundtrip. In terms of database performance, for small sets (like you describe) you're better off with a single database query.
The downside is that you're fetching all of those rows and materializing an array. (But it looks like that's the approach you're using in either case.)
Given the choice between the two options you've shown, go with the single query. That's going to be faster.
As far as a different approach, it really depends on what you are doing with those arrays.
You could actually have the database return the rows in a specified sequence, using an ORDER BY clause.
To get all of the 'news' rows first, followed by everything that isn't 'news', you could
ORDER BY type<=>'news' DESC
That's MySQL shorthand for the more ANSI-standards-compliant:
ORDER BY CASE WHEN type = 'news' THEN 1 ELSE 0 END DESC
Rather than fetching every single row and storing it in an array, you could just fetch from the cursor as you output each row, e.g.
while($row = $query->fetch_assoc()) {
    echo "<br>Title: " . htmlspecialchars($row['title']);
    echo "<br>byline: " . htmlspecialchars($row['byline']);
    echo "<hr>";
}
The best way of dealing with a situation like this is to test it for yourself. It doesn't matter how many records you have at the moment; you can simulate whatever amount you'd like, that's never a problem. Also, 1000-2000 is really a small set of data.
I somewhat don't understand why you'd have to iterate over all the records twice. You should never retrieve all the records in a query anyway, only the small subset you need to be working with. On a typical site where you manage articles it's usually about 10 records per page MAX. No user will ever go through 2000 articles in a way that forces you to pull all the records at once. Utilize paging and smart querying.
// iterate over each array separate to generate article summaries
Not sure what you mean by this, but something tells me this data should be stored in the database as well. I really hope you're not generating article excerpts on the fly for every page hit.
It all sounds to me more like a bad architecture design than anything else...
PS: I believe sorting/ordering/filtering of database data should be done on the database server, not in the application itself. You may save some traffic by doing a single query, but it won't help much if you transfer too much data at once that you won't be using anyway.

How does a SQL query with LIMIT perform in PHP, in terms of RAM and execution time?

I'm writing a PHP script, but I'm facing a little question.
What is the concrete difference between:
$sql = "SELECT * FROM $table ORDER BY date DESC LIMIT $limit OFFSET $offset";
$result = mysql_query($sql) or die(mysql_error());
and :
$sql = "SELECT * FROM $table ORDER BY date DESC";
$result = mysql_query($sql) or die(mysql_error());
In terms of RAM and execution time used on the server?
Edit: Actually I was opting for the first SQL query, but for a precise need I have to recheck a certain amount of data with my PHP script to build another, smaller amount of data, and I cannot do this with pure SQL. (I use fetch_array on all the results until I have the amount I want.) So I want to know (before I do anything wrong): which solution is faster for the client (SQL + PHP), and which solution is safer in terms of load on the server?
Edit 2: Re-pasted the ORDER BY clause.
Unfortunately, that does not change anything in the most common cases.
Excellent response on which cases are improved: Does limiting a query to one record improve performance
And for your case, this article is about offest/limit performances : http://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
Well, isn't the answer obvious?
If you limit the result set of a SELECT query, then the amount of data inside that result set is reduced. What this means for PHP's memory usage depends mostly on one central thing:
if you retrieve the whole result set in a single go, for example by using something like fetchAll(), then the whole result set is read into memory.
if you process the result set in sequential order, then only one element of the result set is read into memory at a given time. Thus, unless you copy it, the memory footprint is limited.
Another thing is performance inside PHP: obviously, the more elements of the result set you process, the more load you generate. So limiting the size of the result set will probably reduce load, unless you have a good way to pick only a certain range of elements to process from the result set.
As a general rule of thumb: always retrieve only the data you really require and want to process.
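A sketch of the two access patterns above with PDO (the connection details and the items table are placeholders; PDO::MYSQL_ATTR_USE_BUFFERED_QUERY applies only to the MySQL driver):
$pdo = new PDO("mysql:host=localhost;dbname=test", "user", "pass");

// With the MySQL driver, disable client-side buffering so sequential fetching
// really streams rows one at a time instead of buffering them in the driver.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// Pattern 1: fetchAll() reads the whole result set into a PHP array at once,
// so memory use grows with the number of rows.
// $all = $pdo->query("SELECT * FROM items ORDER BY date DESC")->fetchAll();

// Pattern 2: sequential fetch() holds only one row in PHP at a time.
$stmt = $pdo->query("SELECT * FROM items ORDER BY date DESC LIMIT 100");
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // process $row here; the memory footprint stays bounded
}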

Using count() instead of mysql_num_rows?

What is the difference (performance-wise) between
$var = mysql_query("select * from tbl where id='something'");
$count = mysql_num_rows($var);
if($count > 1){
    // do something
}
and
$var = mysql_query("select count(*) from tbl where id='somthing'");
P.S: I know mysql_* are deprecated.
The first version returns the entire result set. This can be a large volume of data if your table is wide. Even if there is an index on id, it still needs to read the original data to fill in the values.
The second version returns only the count. If there is an index on id, then it will use the index and not need to read in any data from the data pages.
If you only want the count and not the data, the second version is clearer and should perform better.
select * asks MySQL to fetch all the data from that table (given the conditions) and send it to you. This is not a very optimizable operation, and it results in a lot of data being organised and sent over the socket to PHP.
Since you then do nothing with this data, you have asked MySQL to do a whole lot of data processing for nothing.
Instead, just asking MySQL to count() the number of rows that fit the conditions will not make it send you all that data, and will result in a faster query, especially if the id field is indexed.
Overall though, if your PHP application is still small, this might be regarded as a micro-optimization, but it is good practice all the same.
I would use the second, for 2 reasons:
As you stated, mysql_* is deprecated.
If your table is huge, you're putting a large amount of data in $var only to count it.
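For reference, a mysqli version of that second query (a sketch only; the connection details are placeholders, and bind_result() avoids needing the mysqlnd driver):
$mysqli = new mysqli("localhost", "user", "pass", "db"); // placeholder credentials

$stmt = $mysqli->prepare("SELECT COUNT(*) FROM tbl WHERE id = ?");
$stmt->bind_param("s", $id);
$stmt->execute();
$stmt->bind_result($count); // only a single integer travels back from MySQL
$stmt->fetch();
$stmt->close();

if ($count > 1) {
    // do something
}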
SELECT * FROM tbl_university_master;
2785 row(s) returned
Execution Time : 0.071 sec
Transfer Time : 7.032 sec
Total Time : 8.004 sec
SELECT COUNT(*) FROM tbl_university;
1 row(s) returned
Execution Time : 0.038 sec
Transfer Time : 0 sec
Total Time : 0.039 sec
The first collects all the data and counts the number of rows in the resultset, which is performance-intensive. The latter just does a quick count, which is way faster.
However, if you need both the data and the count, it's more sensible to execute the first query and use mysql_num_rows (or something similar in PDO) than to execute yet another query just to do the counting.
And indeed, mysql_* is deprecated. But the same reasoning applies when using MySQLi or PDO.
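A sketch of that "need both" case with mysqli: one buffered query provides the rows and, via num_rows, the count, so no second COUNT(*) query is needed:
// query() returns a buffered result by default, so num_rows is available for free.
$result = $mysqli->query("SELECT * FROM tbl WHERE id = 'something'");
$count = $result->num_rows; // row count without a separate COUNT(*) query

if ($count > 1) {
    while ($row = $result->fetch_assoc()) {
        // use the row data
    }
}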
I think using
$var = mysql_query("select count(*) from tbl where id='something'");
is more efficient, because you aren't allocating memory based on the number of rows returned from MySQL.
select * from tbl where id='something' selects all the data of the rows matching the ID condition, whereas the COUNT() function returns only the number of rows that match the specified criteria.
For more reading, practice, and demonstrations, visit w3schools.

Using count() to count results from PDO in PHP?

I'm in the process of converting an old mysql_* site to work with PDO, and so far, so good (I think). I have a question regarding the reliability of counting the results of queries, however.
Currently, I've got everything going like this (I've removed the try / catch error code to make this easier to read):
$stm = $db->prepare("SELECT COUNT(*) FROM table WHERE somevar = '1'");
$stm->execute();
$count = $stm->fetchColumn();
if ($count > 0){
$stm = $db->prepare("SELECT * FROM table WHERE somevar = '1'");
$stm->execute();
$result = $stm->fetchAll();
}
There might be stupid problems with doing it this way, and I invite you to tell me if there are, but my question is really about cutting down on database queries. I've noticed that if I cut the first statement out, run the second by itself, and then use PHP's count() to count the results, I still seem to get a reliable row count, with only one query, like this:
$stm = $db->prepare("SELECT * FROM table WHERE somevar = '1'");
$stm->execute();
$result = $stm->fetchAll();
$count = count($result);
if ($count > 0){
//do whatever
}
Are there any pitfalls to doing it this way instead? Is it reliable? And are there any glaring, stupid mistakes in my PDO here? Thanks for the help!
Doing the count in MySQL is preferable, especially if the count value is the only result you're interested in. Compare your versions to the equivalent question "how many chocolate bars does the grocery store have in stock?"
1) count in the db: SELECT count(*) .... Drive to the store, count the chocolate bars, write down the number, drive home, read the number off your slip of paper
2) count in PHP: SELECT * .... Drive to the store. Buy all the chocolate bars. Truck them home. Count them on your living room floor. Write the results on a piece of paper. Throw away the chocolate bars. Read number off the paper.
Which one is more efficient/less costly? Not a big deal if your db/table only has a few records. When you start reaching thousands/millions of records, version 2) is absolutely ludicrous and likely to burn through your bandwidth, blow up your PHP memory limit, and drive your CPU usage into the stratosphere.
That being said, there's no point in running two queries, one just to count how many records you MAY get. Such a system is vulnerable to race conditions: e.g., you do your count and get (say) 1 record; by the time you run the second query and fetch that record, some OTHER parallel process has inserted another record, or deleted the one you wanted.
In the first case you are counting using MySQL, and in the second case you are counting using PHP. Both give essentially the same result.
Your usage of the queries is correct. The only problem appears when you use LIMIT, because COUNT(*) and count($result) will then differ:
COUNT(*) will count all the rows that the query would have returned (given that the counting query is otherwise the same and does not use LIMIT).
count($result) will count just the returned rows, so if you use LIMIT, you will only get a count of the results up to the given limit.
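To make the LIMIT caveat concrete, a sketch using the question's own table (assuming more than 10 matching rows exist):
// count($result) only sees the rows actually returned by this statement...
$stm = $db->prepare("SELECT * FROM table WHERE somevar = '1' LIMIT 10");
$stm->execute();
echo count($stm->fetchAll()); // at most 10

// ...whereas COUNT(*) counts every matching row in the table.
$stm = $db->prepare("SELECT COUNT(*) FROM table WHERE somevar = '1'");
$stm->execute();
echo $stm->fetchColumn(); // the full match count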
Yes it's reliable in this use case!

Getting total number of records from mysql table - too slow

I have a file that goes through a large data set and splits out the rows in a paginated manner. The dataset contains about 210k rows, which isn't even that much; it will grow to 3Mil+ in a few weeks, but it's already slow.
I have a first query that gets the total number of items in the DB for a particular WHERE clause combination; the most basic one looks like this:
SELECT count(v_id) as num_items FROM versions
WHERE v_status = 1
It takes 0.9 seconds to run.
The 2nd query is a LIMIT query that gets the actual data for that page. This query is really quick (less than 0.001 s).
SELECT
v_id,
v_title,
v_desc
FROM versions
WHERE v_status = 1
ORDER BY v_dateadded DESC
LIMIT 0, 25
There is an index on (v_status, v_dateadded).
I use PHP. I cache the result into memcache, so subsequent requests are really fast, but the first request is laggy. Especially once I throw a fulltext search in there, the two queries start taking 2-3 seconds.
I'm not sure this is it, but try making it count(*): I think count(x) has to go through every row and count only the ones that don't have a NULL value (so it has to examine all the rows).
Given that v_id is a PRIMARY KEY it should not have any NULLs, so try count(*) instead...
But I don't think it will help, since you have a WHERE clause.
Not sure if this is the same for MySQL, but in MS SQL Server COUNT(*) is almost always faster than COUNT(column). The parser determines the fastest column to count and uses that.
Run an explain plan to see how the optimizer is executing your queries.
That'll probably tell you what Andreas Rehm told you: you'll want to add indices that cover your WHERE clauses.
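For example, checking how MySQL executes the count (a sketch; exact EXPLAIN output varies by version):
-- If the (v_status, v_dateadded) index can satisfy the query, the Extra
-- column should report "Using index", i.e. no table rows are read at all.
EXPLAIN SELECT COUNT(v_id) AS num_items
FROM versions
WHERE v_status = 1;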
EDIT: For me FOUND_ROWS() was the fastest way of doing this:
SELECT
SQL_CALC_FOUND_ROWS
v_id,
v_title,
v_desc
FROM versions
WHERE v_status = 1
ORDER BY v_dateadded DESC
LIMIT 0, 25;
Then in a secondary query just do:
SELECT FOUND_ROWS();
If you are outputting to PHP you do this:
$totalnumber = mysql_result(mysql_query($secondquery), 0, 0);
I was previously trying the same thing as the OP, putting COUNT(column) in the first query, but it took about three times longer than even the slowest WHERE and ORDER BY query that I could run (with a LIMIT set). I tried changing to COUNT(*) and it improved a lot. But results in my case were even better using MySQL's FOUND_ROWS().
I am testing in PHP with microtime() and repeating the query. In the OP's case, if he ran COUNT(*) I think he would save some time, but it is not the fastest way of doing this. I ran some tests of COUNT(*) vs. FOUND_ROWS(), and FOUND_ROWS() is quite a bit faster.
Using FOUND_ROWS() was nearly twice as fast in my case.
I first ran EXPLAIN on the COUNT(*) query. In the OP's case you would see that MySQL still checks a total of 210k rows in the first query. It checks every row before even starting the LIMIT query and doesn't seem to get any performance benefit from doing so.
If you run EXPLAIN on the LIMIT query it will probably check fewer than 100 rows, as you have limited the results to 25. But there is still overlap, and there will be some cases where you can't afford it; at the very least you should compare performance with FOUND_ROWS().
I thought this might only save time on large LIMIT requests, but when I ran EXPLAIN on my LIMIT query it was actually only checking 25 rows to get 15 values. However, there was still a very noticeable difference in query time: on average I got down from .25 to .14 seconds while achieving the same results.
