Get size of max possible result set - php

For my application most of my SQL queries return a specified number of rows. I'd also like to get the maximum possible number of results i.e. how many rows would be returned if I wasn't setting a LIMIT.
Is there a more efficient way to do this (using just SQL) than returning all the results, getting the size of the result set, and then splicing the set to return just the first N rows?

You can use SELECT COUNT(*), but this isn't ideal for large data sets.
A more efficient solution is to use SQL_CALC_FOUND_ROWS and FOUND_ROWS():
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_found-rows
First query:
SELECT SQL_CALC_FOUND_ROWS id, name, etc FROM table LIMIT 10;
Second query:
SELECT FOUND_ROWS();
You'll still need two queries, but you run the main query once, saving resources.
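A minimal PHP sketch of that two-query pattern, assuming a mysqli connection in $conn (the table and column names are placeholders):
$result = mysqli_query($conn, "SELECT SQL_CALC_FOUND_ROWS id, name FROM items LIMIT 10");
while ($row = mysqli_fetch_assoc($result)) {
    // render the current page's rows...
}
// must run on the same connection, right after the main query
$total = mysqli_fetch_row(mysqli_query($conn, "SELECT FOUND_ROWS()"))[0];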

I'd also like to get the maximum possible number of results i.e. how many rows would be returned if I wasn't setting a LIMIT.
Use:
SELECT COUNT(*)
FROM YOUR_TABLE
...to get the number of rows that currently exist in YOUR_TABLE.
Is there a more efficient way to do this (using just SQL) than returning all the results, getting the size of the result set, and then splicing the set to return just the first N rows?
Only fetch the rows/information you need.
Getting everything means a lot of data goes over the wire that will likely never be used. It also means the data ends up cached in your application, where it can become stale because it is no longer fresh from the database.
This sounds like pagination...
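For instance, a hedged sketch of the usual two-query pagination pattern (table and column names are placeholders):
-- rows for page 3 at 10 rows per page
SELECT id, name FROM items ORDER BY id LIMIT 10 OFFSET 20;
-- total matching rows, used to render the page links
SELECT COUNT(*) FROM items;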

To get the specified number of rows you could use SELECT COUNT(*). Is that what you are looking for?

Related

Adding count(*) to existing mysql query causes unwanted action in php

$results_ts = mysqli_query($connection, "SELECT id, title, description, content
FROM prose WHERE title LIKE '%$search_title%' LIMIT 10");
if (isset($_GET['search_title'])) {
    while ($title_arr = mysqli_fetch_array($results_ts)) {
        echo $title_arr['id']; // and so on...
And that works as I wanted.
But when I try to add , count(*) after where content is, the while() loop echoes only one result, even when $title_arr[4] (which stands for the count) shows many.
Is it possible to do it this sort of way, or should I be running two separate queries?
You shouldn't do that in one query.
Think about the result you're expecting: when you do a count(*) query, what you get back is a result with only one row and one value. If you select multiple rows, what should it do? Inject the count into each row of the data?
From a logical perspective, those are two different things you're looking for.
EDIT: counting the rows in your result after the fact is probably the best option, but that won't work if you're only looking for the first 10 entries, and there's no reason to SELECT the whole table if you don't need all the data. That's why count(*) is fulfilling such a different role.
COUNT() is an aggregate function, so you can't use it like this. But if all you want is the number of rows this query would have returned, you can use SQL_CALC_FOUND_ROWS to get the number of rows that would have been returned if there were no LIMIT clause. You can then get that count by running SELECT FOUND_ROWS() after your query.
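Applied to the query from the question, a sketch of that two-statement version (using the same table and search variable as above):
SELECT SQL_CALC_FOUND_ROWS id, title, description, content
FROM prose WHERE title LIKE '%$search_title%' LIMIT 10;
SELECT FOUND_ROWS();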

PDO::rowCount VS COUNT(*)

I have a query using PDO: count the rows first, and if there are any, fetch the data.
SELECT * FROM ... WHERE id = :id
$row = $SQL->rowCount();
if ($row > 0) {
    while ($data = $SQL->fetch(PDO::FETCH_ASSOC)) { ...
    }
} else {
    echo "no result";
}
or
SELECT COUNT(*), * FROM ... WHERE id = :id
$data = $SQL->fetch(PDO::FETCH_NUM);
$row = $data[0];
if ($row > 0) {
    // fetch data
} else {
    echo "no result";
}
Which gives better performance?
2nd question: if I have set up an index on id, which one is better, COUNT(id) or COUNT(*)?
1st question:
With COUNT(), the server (MySQL) processes the request differently internally.
When doing COUNT(), the server only has to allocate memory to store the result of the count.
When using $row = $SQL->rowCount();, the server (Apache/PHP) has to process the entire result set, allocate memory for all those rows, and put the connection into fetching mode, which involves a lot of extra detail, such as locking.
Take note that PDOStatement::rowCount() returns the number of rows affected by the last statement, not the number of rows returned. If the last SQL statement executed by the associated PDOStatement was a SELECT statement, some databases may return the number of rows returned by that statement. However, this behaviour is not guaranteed for all databases and should not be relied on for portable applications.
In my analysis, if you use COUNT() the work is divided between MySQL and PHP, while with $row = $SQL->rowCount(); the processing falls mostly on PHP.
Therefore COUNT() in MySQL is faster.
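For illustration, a minimal PDO sketch of the COUNT() approach (the connection $pdo and table name are placeholders):
$stmt = $pdo->prepare("SELECT COUNT(*) FROM some_table WHERE id = :id");
$stmt->execute([':id' => $id]);
$count = (int) $stmt->fetchColumn(); // a single scalar comes back; no result set to buffer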
2nd question:
COUNT(*) is better than COUNT(id).
Explanation:
The COUNT(*) function in MySQL is optimized for finding the count. Using the wildcard means it does not have to fetch the value of every row; it only finds the count. So use COUNT(*) wherever possible.
Sources:
PDOStatement::rowCount
MySQL COUNT(*)
As a matter of fact, neither PDO's rowCount nor COUNT(*) is ever required here.
count the rows first, then fetch the data if any
is a faulty approach.
In a sanely designed web application (I know it sounds like a joke for PHP), one doesn't have to do it this way.
The most sensible way would be:
to fetch first
to use the fetched data
and, if needed, to use this very fetched data to see whether anything was returned:
$data = $stmt->fetch();
if($data){
//use it whatever
} else {
echo "No record";
}
Easy, straightforward, and no questions like "which useless feature is better" at all.
In your case, assuming id is a unique index, only one row can be returned. Therefore, you don't need a while statement at all. Just use the snippet above both to fetch the data and to tell whether anything has been fetched.
In case many rows are expected, just change fetch() to fetchAll() and then use foreach to iterate over the returned array:
$data = $stmt->fetchAll();
if($data){
foreach ($data as $row) {
//use it whatever
}
} else {
echo "No records";
}
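For context, a minimal sketch of the prepared-statement setup around these snippets, assuming a PDO connection in $pdo (the table name is a placeholder):
$stmt = $pdo->prepare("SELECT * FROM some_table WHERE id = ?");
$stmt->execute([$id]);
$data = $stmt->fetch(PDO::FETCH_ASSOC); // or $stmt->fetchAll() when many rows are expected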
Note that you should never select more rows than needed. That means a query on a regular web page should never return more rows than will be displayed.
Speaking of the question itself - it makes no sense. One cannot compare rowCount vs COUNT(*), because they are incomparable things. These two features have absolutely different purposes and cannot be interchanged:
COUNT(*) returns one single row with the count, and should be used ONLY if you need the count of records but not the records themselves.
If you need the records, COUNT(whatever) is neither faster nor slower - it's pointless.
rowCount() returns the number of already-selected rows, and therefore you scarcely need it, as shown above.
Not to mention that the second example will fetch no rows at all.
COUNT(id) or COUNT(*) will use an index scan, so it will be better for performance. rowCount returns only the affected rows and is useful for INSERT/UPDATE/DELETE.
EDIT:
Since the question was edited to compare COUNT(id) and COUNT(*), it makes a slight difference. COUNT(*) returns the row count matched by the SELECT. COUNT(column) returns the count of non-NULL values, but since the column is id, there won't be any NULLs. So it makes no difference in this case.
The performance difference should be negligible to none, since you are issuing only one query in both cases. The 2nd query has to fetch an extra column with the same value for every row matching id, hence it might have a larger memory footprint. Even without the COUNT(*) the row count should be available, hence you should go with the 1st solution.
About your 2nd question: AFAIK either COUNT(id) or COUNT(*) will be fast with the index on id, since the db engine has to perform a range scan to retrieve the rows in question, and range scans are faster with indexes when filtering on the indexed column (in your case id = SOME_ID).
Count(*) will be faster.
PDOStatement::rowCount() is not guaranteed to work according to the PDO documentation:
"not guaranteed for all databases and should not be relied on for portable applications."
So in your case I'd suggest using count(*).
See reference:
pdostatement.rowcount Manual

Using count(*) vs num_rows

To get the number of rows in a result set there are two ways:
One is to use a query to get the count:
$query = "SELECT COUNT(*) AS count FROM some_table WHERE type = 't1'";
and then retrieve the value of count.
The other is to get the count via num_rows() in PHP.
So which one is better performance-wise?
If your goal is to actually count the rows, use COUNT(*). num_rows is ordinarily (in my experience) only used to confirm that more than zero rows were returned so the code can continue in that case. It will probably also take MySQL longer to send many selected rows back than to perform the COUNT aggregation, even if the query itself takes the same amount of time.
There are a few differences between the two:
num_rows is the number of result rows (records) received.
count(*) is the number of records in the database matching the query.
The database may be configured to limit the number of returned results (MySQL allows this for instance), in which case the two may differ in value if the limit is lower than the number of matching records. Note that limits may be configured by the DBA, so it may not be obvious from the SQL query code itself what limits apply.
Using num_rows to count records implies "transmitting" each record, so if you only want a total number (which would be a single record/row) you are far better off getting the count instead.
Additionally, COUNT can be used in more complex query scenarios to do things like sub-totals, which is not easily done with num_rows; see the sketch below.
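For example, sub-totals per type in one query (a sketch reusing the table from the question):
SELECT type, COUNT(*) AS count
FROM some_table
GROUP BY type;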
COUNT is much more efficient, both performance-wise and memory-wise, as you don't have to retrieve as much data from the database server. If you count a single column such as a unique id, you can make it a little more efficient still.
It depends on your implementation. If you're dealing with a lot of rows, count(*) is better because it doesn't have to pass all of those rows to PHP. If, on the other hand, you're dealing with a small amount of rows, the difference is negligible.
num_rows() may be fine if you have a small number of rows, but COUNT(*) will give you better performance when there is a large number of rows, since only a single value has to be selected and sent to PHP.

Getting total number of records from mysql table - too slow

I have a file that goes through a large data set and splits out the rows in a paginated manner. The dataset contains about 210k rows, which isn't even that much; it will grow to 3M+ in a few weeks, but it's already slow.
I have a first query that gets the total number of items in the DB for a particular WHERE clause combination, the most basic one looks like this:
SELECT count(v_id) as num_items FROM versions
WHERE v_status = 1
It takes 0.9 seconds to run.
The 2nd query is a LIMIT query that gets the actual data for that page. This query is really quick. (less than 0.001 s).
SELECT
v_id,
v_title,
v_desc
FROM versions
WHERE v_status = 1
ORDER BY v_dateadded DESC
LIMIT 0, 25
There is an index on v_status, v_dateadded
I use PHP. I cache the result in memcache, so subsequent requests are really fast, but the first request is laggy. Especially once I throw a fulltext search in there, it starts taking 2-3 seconds for the two queries.
I'm not sure this is the cause, but try making it COUNT(*); I think COUNT(x) has to go through every row and count only the ones that don't have a NULL value (so it has to visit all the rows).
Given that v_id is a PRIMARY KEY it should not have any NULLs, so try COUNT(*) instead...
But I don't think it will help, since you have a WHERE clause.
Not sure if this is the same for MySQL, but in MS SQL Server COUNT(*) is almost always faster than COUNT(column). The parser determines the fastest column to count and uses that.
Run an explain plan to see how the optimizer is running your queries.
That'll probably tell you what Andreas Rehm told you: you'll want to add indices that cover your where clauses.
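For instance, a sketch using the count query from the question:
EXPLAIN SELECT count(v_id) AS num_items
FROM versions
WHERE v_status = 1;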
EDIT: For me FOUND_ROWS() was the fastest way of doing this:
SELECT
SQL_CALC_FOUND_ROWS
v_id,
v_title,
v_desc
FROM versions
WHERE v_status = 1
ORDER BY v_dateadded DESC
LIMIT 0, 25;
Then in a secondary query just do:
SELECT FOUND_ROWS();
If you are outputting to PHP you do this:
$totalnumber = mysql_result(mysql_query($secondquery), 0, 0);
I was previously trying to do the same thing as the OP, putting COUNT(column) in the first query, but it took about three times longer than even the slowest WHERE and ORDER BY query I could write (with a LIMIT set). I tried changing to COUNT(*) and it improved a lot. But results in my case were even better using MySQL's FOUND_ROWS().
I am testing in PHP with microtime() and repeating the query. In the OP's case, if he ran COUNT(*) I think he would save some time, but it is not the fastest way of doing this. I ran some tests of COUNT(*) vs. FOUND_ROWS(), and FOUND_ROWS() is quite a bit faster.
Using FOUND_ROWS() was nearly twice as fast in my case.
I first ran EXPLAIN on the COUNT(*) query. In the OP's case you would see that MySQL still examines all 210k rows in the first query. It checks every row before even starting the LIMIT query and doesn't seem to get any performance benefit from doing this.
If you run EXPLAIN on the LIMIT query it will probably check fewer than 100 rows, as you have limited the results to 25. But this is still overlap, and there will be some cases where you can't afford it, or at the least you should still compare performance with FOUND_ROWS().
I thought this might only save time on large LIMIT requests, but when I ran EXPLAIN on my LIMIT query it was actually only checking 25 rows to get 15 values. However, there was still a very noticeable difference in query time: on average I got down from .25 to .14 seconds and achieved the same results.

Best way to get result count before LIMIT was applied

When paging through data that comes from a DB, you need to know how many pages there will be to render the page jump controls.
Currently I do that by running the query twice, once wrapped in a count() to determine the total results, and a second time with a limit applied to get back just the results I need for the current page.
This seems inefficient. Is there a better way to determine how many results would have been returned before LIMIT was applied?
I am using PHP and Postgres.
Pure SQL
Things have changed since 2008. You can use a window function to get the full count and the limited result in one query. Introduced with PostgreSQL 8.4 in 2009.
SELECT foo
, count(*) OVER() AS full_count
FROM bar
WHERE <some condition>
ORDER BY <some col>
LIMIT <pagesize>
OFFSET <offset>;
Note that this can be considerably more expensive than without the total count. All rows have to be counted, and a possible shortcut taking just the top rows from a matching index may not be helpful any more.
Doesn't matter much with small tables or full_count <= OFFSET + LIMIT. Matters for a substantially bigger full_count.
Corner case: when OFFSET is at least as great as the number of rows from the base query, no row is returned. So you also get no full_count. Possible alternative:
Run a query with a LIMIT/OFFSET and also get the total number of rows
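Consuming that window-function query from PHP might look like this (a hedged sketch with PDO; the connection $pdo and the table/column names are placeholders):
$stmt = $pdo->prepare(
    "SELECT foo, count(*) OVER() AS full_count
     FROM   bar
     WHERE  some_col = :val
     ORDER  BY other_col
     LIMIT  :pagesize OFFSET :offset"
);
$stmt->bindValue(':val', $val);
$stmt->bindValue(':pagesize', $pagesize, PDO::PARAM_INT);
$stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
$fullCount = $rows ? (int) $rows[0]['full_count'] : 0; // 0 rows: see the corner case above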
Sequence of events in a SELECT query
0. (CTEs are evaluated and materialized separately. In Postgres 12 or later the planner may inline those like subqueries before going to work.) Not here.
1. The WHERE clause (and JOIN conditions, though none in your example) filters qualifying rows from the base table(s). The rest is based on the filtered subset.
2. (GROUP BY and aggregate functions would go here.) Not here.
3. (Other SELECT list expressions are evaluated, based on grouped / aggregated columns.) Not here.
4. Window functions are applied depending on the OVER clause and the frame specification of the function. The simple count(*) OVER() is based on all qualifying rows.
5. ORDER BY is applied.
6. (DISTINCT or DISTINCT ON would go here.) Not here.
7. LIMIT / OFFSET are applied based on the established order to select rows to return.
LIMIT / OFFSET becomes increasingly inefficient with a growing number of rows in the table. Consider alternative approaches if you need better performance:
Optimize query with OFFSET on large table
Alternatives to get final count
There are completely different approaches to get the count of affected rows (not the full count before OFFSET & LIMIT were applied). Postgres has internal bookkeeping of how many rows were affected by the last SQL command. Some clients can access that information or count rows themselves (like psql).
For instance, you can retrieve the number of affected rows in plpgsql immediately after executing an SQL command with:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Details in the manual.
Or you can use pg_num_rows in PHP. Or similar functions in other clients.
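For example (a hedged PHP sketch; the connection $conn and the query are placeholders):
$result = pg_query($conn, "SELECT foo FROM bar LIMIT 10");
$returned = pg_num_rows($result); // rows in this result only, not the count before LIMIT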
Related:
Calculate number of rows affected by batch query in PostgreSQL
As I describe on my blog, MySQL has a feature called SQL_CALC_FOUND_ROWS. This removes the need to run the query twice, but it still needs to execute the query in its entirety, even if the LIMIT clause would have allowed it to stop early.
As far as I know, there is no similar feature in PostgreSQL. One thing to watch out for when doing pagination (the most common thing LIMIT is used for, IMHO): doing an "OFFSET 1000 LIMIT 10" means that the DB has to fetch at least 1010 rows, even if it only gives you 10. A more performant way is to remember the value of the row you are ordering by for the previous page's last row (the 1000th in this case) and rewrite the query like this: "... WHERE order_row > value_of_1000_th LIMIT 10". The advantage is that "order_row" is most probably indexed (if not, you've got a problem). The disadvantage is that if new elements are added between page views, this can get a little out of sync (but then again, it may not be observable by visitors, and it can be a big performance gain).
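A sketch of that keyset rewrite (table, column, and placeholder names are illustrative):
-- instead of: ... ORDER BY order_row OFFSET 1000 LIMIT 10
SELECT *
FROM   items
WHERE  order_row > :value_of_1000th_row
ORDER  BY order_row
LIMIT  10;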
You could mitigate the performance penalty by not running the COUNT() query every time. Cache the number of pages for, say, 5 minutes before the query is run again. Unless you're seeing a huge number of INSERTs, that should work just fine.
Since Postgres already does a certain amount of caching, this type of method isn't as inefficient as it seems. It's definitely not doubling execution time. We have timers built into our DB layer, so I have seen the evidence.
Seeing as you need to know for the purpose of paging, I'd suggest running the full query once, writing the data to disk as a server-side cache, then feeding that through your paging mechanism.
If you're running the COUNT query for the purpose of deciding whether to provide the data to the user or not (i.e. if there are > X records, give back an error), you need to stick with the COUNT approach.
