In a nutshell: Is it faster to use PHP's count() on a number of arrays, or to run an SQL row count multiple times?
I'm having an issue with a slow query that I attribute to the COUNT(*) function. I will explain what I am doing and then what I'm anticipating might be significantly faster.
Currently I'm looping a function that does a count of about 20,000 rows each iteration. It returns the number of rows for each month in a year:
// CREATE MONTHLY LINKS
public function monthly_links() {
    $months = array('', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December');
    for ($i = 1; $i <= 12; $i++) {
        $array[] = "<a href='monthly.php?month=" . $i . "&status=3'>" . $months[$i] . " " . $this->chosen_year()
                 . " (" . $this->number_positions_month_year($i, $this->chosen_year()) . ")</a>";
    }
    return $array;
}

// SHOW NUMBER OF POSITIONS FOR EACH MONTH/YEAR
public function number_positions_month_year($month, $year) {
    $sql = $this->client_positions_sql() . " AND MONTH(`exp_date`) = " . $month . " AND YEAR(`exp_date`) = " . $year;
    $res = $this->conn->query($sql);
    $count = $res->num_rows;
    return $count;
}
The code above isn't that important; essentially what I am asking is: would it be faster to do the following...?
Query the table once, dumping each month's corresponding IDs into its own array (so there will be 12 arrays)
Then use PHP's count() on each array to get the number of entries for each month
You can use SQL's GROUP BY clause to group by month.
SELECT COUNT(*), MONTH(exp_date)
FROM theTableYouAreUsing
GROUP BY MONTH(exp_date)
Then, in PHP, you look up the count for the month you need in the returned result.
Speed-wise, this is a lot quicker than running a separate query for each month.
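For example, here is a minimal sketch of how you might run that grouped query once and index the counts by month in PHP (it assumes a mysqli connection in $conn and the table name theTableYouAreUsing from the query above):

// Run the grouped query once and index the counts by month number.
$sql = "SELECT MONTH(exp_date) AS m, COUNT(*) AS cnt
        FROM theTableYouAreUsing
        GROUP BY MONTH(exp_date)";
$res = $conn->query($sql);

$countsByMonth = array_fill(1, 12, 0);   // months with no rows default to 0
while ($row = $res->fetch_assoc()) {
    $countsByMonth[(int)$row['m']] = (int)$row['cnt'];
}
// $countsByMonth[3] now holds the count for March, and so on.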
In short: premature optimization is the root of all evil :) So usually, in the end, it doesn't really matter. However, keep in mind that depending on when and where you need the number of rows, and on how you fetch and handle the result, you may not have the full data available in a single array, so there is no array to call count() on. Building such an array just to be able to call count() on it doesn't seem like a valid reason, because it unnecessarily consumes memory.
I am going to have to agree with KingCrunch on this. What you really need to look at is the type of application you have. If it has a low volume of users or something like that, then doing it in the database will be faster; if you have a lot of traffic and want to keep the database from being overloaded with something PHP can do, then PHP will be faster once you get to that kind of scale. Also keep in mind that if you send the result set over to PHP, it has to receive the data over the network and then count it, which means more data transfer and more network latency, though that assumes a remote database. But try not to over-optimize.
Related
I've worked with Postgresql some, but I'm still a novice. I usually default to creating way too many queries and hacking my way through to get the result I need from a query. This time I'd like to write some more streamlined code since I'll be dealing with a large database, and the code needs to be as concise as possible.
So I have a lot of point data, and then I have many counties. I have two tables, "counties" and "ltg_data" (the many points). My goal is to read through a specified number of counties (as given in an array) and determine how many points fall in each county. My novice, repetitive and inefficient way of doing this is by writing queries like this:
$klamath_40_days = pg_query($conn, "SELECT countyname, time from counties, ltg_data where st_contains(counties.the_geom, ltg_data.ltg_geom) and countyname");
$klamath_rows = pg_num_rows($klamath_40_days);
If I run a separate query like the above for each county, it gives me nice output, but it's repetitive and inefficient. I'd much rather use a loop. And eventually I'll need to pass params into the query via the URL. When I try to run a for loop in PHP, I get errors saying "query failed: ERROR: column "jackson" does not exist", etc. Here's the loop:
$counties = array('Jackson', 'Klamath');
foreach ($counties as $i) {
    echo "$i<br>";
    $jackson_24 = pg_query($conn, "SELECT countyname, time from counties, ltg_data where st_contains(counties.the_geom, ltg_data.ltg_geom) and countyname = " . $i . " and time >= (NOW() - '40 DAY'::INTERVAL)");
    $jackson_rows = pg_num_rows($result);
}
echo "$jackson_rows";
So then I researched the pg_query_params feature in PHP, and I thought this would help. But I run this script:
$counties = array('Jackson', 'Josephine', 'Curry', 'Siskiyou', 'Modoc', 'Coos', 'Douglas', 'Klamath', 'Lake');
$query = "SELECT countyname, time from counties, ltg_data where st_contains(counties.the_geom, ltg_data.ltg_geom) and countyname = $1 and time >= (NOW() - '40 DAY'::INTERVAL)";
$result = pg_query_params($conn, $query, $counties);
And I get this error: Query failed: ERROR: bind message supplies 9 parameters, but prepared statement "" requires 1 in
So I'm basically wondering what the best way to pass parameters (either individual from perhaps a URL passed param or multiple elements in an array) to a postgresql query is? And then I'd like to echo out the summary results in an organized manner.
Thanks for any help with this.
If you just need to know how many points fall into each county specified in an array, then you can do the following in a single call to the database:
SELECT countyname, count(*)
FROM counties
JOIN ltg_data ON ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE countyname = ANY ($counties)
AND time >= now() - interval '40 days'
GROUP BY countyname;
This is much more efficient than making individual calls, and you return only a single instance of the county name rather than one for every record retrieved. If you have, say, 1,000 points in the county Klamath, you return the string "Klamath" just once instead of 1,000 times. Also, PHP doesn't have to count the length of the query result. All in all, much cleaner and faster.
Note also the JOIN syntax in combination with the PostGIS function call.
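As a rough sketch (not a drop-in implementation), here is one way you might run that single grouped query from PHP with pg_query_params(), passing the county list as a PostgreSQL array literal; it assumes plain county names with no commas or quotes, which would otherwise need escaping:

$counties = array('Jackson', 'Josephine', 'Curry', 'Klamath');

// Build a PostgreSQL array literal, e.g. {Jackson,Josephine,Curry,Klamath}
$countyList = '{' . implode(',', $counties) . '}';

$query = "SELECT countyname, count(*)
          FROM counties
          JOIN ltg_data ON ST_contains(counties.the_geom, ltg_data.ltg_geom)
          WHERE countyname = ANY ($1::text[])
            AND time >= now() - interval '40 days'
          GROUP BY countyname";

$result = pg_query_params($conn, $query, array($countyList));
while ($row = pg_fetch_assoc($result)) {
    echo $row['countyname'] . ': ' . $row['count'] . "<br>\n";
}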
To execute a query with a parameter in a loop for several values you can use the following pattern:
$counties = array('Jackson', 'Josephine', 'Curry');
$query = "SELECT countyname, time from counties where countyname = $1";
foreach ($counties as $county) {
    $result = pg_query_params($conn, $query, array($county));
    $row = pg_fetch_row($result);
    echo "$row[0] $row[1] \n";
}
Note that the third parameter of pg_query_params() is an array, hence you must put array($county) even though there is only one parameter.
You can also execute one query with an array as a parameter.
In this case you should use Postgres array syntax and interpolate it into the query as text.
$counties = "array['Jackson', 'Josephine', 'Curry']";
$query = "SELECT countyname, time from counties where countyname = any ($counties)";
echo "$query\n\n";
$result = pg_query($conn, $query);
while ($row = pg_fetch_row($result)) {
    echo "$row[0] $row[1] \n";
}
The following code is a mock-up of my real code. I'm getting a big performance hit when myFunction is called. myTable has no more than a few hundred rows, but calling myFunction adds roughly 10 seconds of execution time. Is there something inherently wrong with trying to access a row of a table inside a loop that is already accessing that table?
<select>
<?php
$stmt = SQLout ("SELECT ID,Title FROM myTable WHERE LEFT(Title,2) = ? ORDER BY Title DESC",
array ('s', $co), array (&$id, &$co_title));
while ($stmt->fetch()) {
if (myFunction($id)) // skip this function call and save 10 seconds
echo '<option value="' . $co_title . '">' . $co_title . '</option>';
}
$stmt->close();
function myFunction ($id) {
$stmt = SQLout ("SELECT Info FROM myTable WHERE ID = ?",
array ('i', $id), array (&$info));
if ($stmt->fetch()) {
$stmt->close();
if ($info == $something)
return true;
}
return false;
}
?>
SQLout is basically:
$sqli_db->prepare($query);
$stmt->bind_param;
$stmt->execute();
$stmt->bind_result;
return $stmt;
What you're doing is sometimes called the "N+1 queries" problem. You run the first (outer) query once, and it returns N rows. Then you run N subordinate queries, one for each row returned by the first query. Thus N+1 queries. It causes a lot of overhead.
This would have far better performance if you could apply the "something" condition in SQL:
$stmt = SQLout ("SELECT ID,Title FROM myTable
WHERE LEFT(Title,2) = ? AND Info = ... ORDER BY Title DESC",
array ('s', $co), array (&$id, &$co_title));
In general, it's not a good idea to run queries in a loop that depends on how many rows match the outer query. What if the outer query matches 1000000 rows? That means a million queries inside the loop will hit your database for this single PHP request.
Even if today the outer query only matches 3 rows, the fact that you've architected the code in this way means that six months from now, at some unpredictable time, there will be some search that results in a vast overhead, even if your code does not change. The number of queries is driven by the data, not the code.
Sometimes it's necessary to do what you're doing, for instance if the "something" condition is complex and can't be represented by an SQL expression. But in all other cases you should try to avoid this pattern of N+1 queries.
So, if you have a "few hundred rows" in the table, you might be calling myFunction a few hundred times, depending on how many rows are returned in the first query.
Check the number of rows that first query is returning to make sure it meets your expectations.
After that, make sure you have an index on myTable.ID.
After that, I would start looking into system/server level issues. On slower systems, say a laptop hard drive, 10 queries per second might be all you can get.
Try something like this:
$stmt = SQLout ("SELECT ID,Title, Info FROM myTable WHERE LEFT(Title,2) = ? ORDER BY Title DESC",
array ('s', $co), array (&$id, &$co_title, &$info));
while ($stmt->fetch()) {
if (myFunction($info)) // skip this function call and save 10 seconds
echo '<option value="' . $co_title . '">' . $co_title . '</option>';
}
$stmt->close();
function myFunction ($info) {
if ($info == $something)
return true;
}
return false;
}
Ever since developing my first MySQL project about 7 years ago, I've been using the same set of simple functions for accessing the database (although, have recently put these into a Database class).
As the projects I develop have become more complex, there are many more records in the database and, as a result, greater likelihood of memory issues.
I'm getting the PHP error Allowed memory size of 67108864 bytes exhausted when looping through a MySQL result set and was wondering whether there was a better way to achieve the flexibility I have without the high memory usage.
My function looks like this:
function get_resultset($query) {
    $resultset = array();
    if (!($result = mysql_unbuffered_query($query))) {
        $men = mysql_errno();
        $mem = mysql_error();
        echo ('<h4>' . $query . ' ' . $men . ' ' . $mem . '</h4>');
        exit;
    } else {
        $xx = 0;
        while ($row = mysql_fetch_array($result)) {
            $resultset[$xx] = $row;
            $xx++;
        }
        mysql_free_result($result);
        return $resultset;
    }
}
I can then write a query and use the function to get all results, e.g.
$query = 'SELECT * FROM `members`';
$resultset = get_resultset($query);
I can then loop through the $resultset and display the results, e.g.
$total_results = count($resultset);
for ($i = 0; $i < $total_results; $i++) {
    $record = $resultset[$i];
    $firstname = $record['firstname'];
    $lastname = $record['lastname'];
    // etc, etc display in a table, or whatever
}
Is there a better way of looping through results while still having access to each record's properties for displaying the result list?
I've been searching around for people having similar issues and the answers given don't seem to suit my situation or are a little vague.
Your problem is that you're creating an array and filling it up with every row in your result set, then returning this huge array from the function. I suppose the reason no mysql_* function does this for you is that it's extremely inefficient.
You should not fill up the array with everything you get. You should step through the results, just like you do when filling up the array, but instead of filling anything, you should process the result and get to the next one, so that the memory for this one gets a chance to be freed.
If you use the mysql_* or mysqli_* functions, you should return the resource, then step through it right there where you're using it, the same way you're stepping through it to fill the array. If you use PDO, then you can return the PDOStatement and use PDOStatement::fetch() to step through it.
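Here is a minimal sketch of that idea, keeping the old mysql_* API used in the question; the helper takes a callback, so the per-record display code is just an illustrative assumption (requires PHP 5.3+ for the anonymous function):

function for_each_result($query, $callback) {
    // Step through rows one at a time instead of accumulating them into one big array.
    if (!($result = mysql_unbuffered_query($query))) {
        echo '<h4>' . $query . ' ' . mysql_errno() . ' ' . mysql_error() . '</h4>';
        exit;
    }
    while ($row = mysql_fetch_array($result)) {
        $callback($row); // handle one record, then let its memory be reused
    }
    mysql_free_result($result);
}

for_each_result('SELECT * FROM `members`', function ($record) {
    echo $record['firstname'] . ' ' . $record['lastname'] . "<br>\n";
});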
Looking for some ideas here... I have a MySQL table that has 100K rows of test data in it.
I am using a PHP script to fetch rows out of that table, and in this test case, the script is fetching all 100,000 rows (doing some profiling and optimization for large datasets).
I connect to the DB, and execute an unbuffered query:
$result = mysql_unbuffered_query("SELECT * FROM TestTable", $connection) or die('Errant query: ' . $query);
Then I iterate over the results with:
if ($result) {
    while ($tweet = mysql_fetch_assoc($result)) {
        $ctr++;
        if ($ctr > $kMAX_RECORDS) {
            $masterCount += $ctr;
            processResults($results);
            $results = array();
            $ctr = 1;
        }
        $results[] = array('tweet' => $tweet);
    }
    echo "<p/>FINISHED GATHERING RESULTS";
}

function processResults($resultSet) {
    echo "<br/>PROCESSED " . count($resultSet) . " RECORDS";
}
$kMAX_RECORDS = 40000 right now, so I would expect to see output like:
PROCESSED 40000 RECORDS
PROCESSED 40000 RECORDS
PROCESSED 20000 RECORDS
FINISHED GATHERING RESULTS
However, I am consistently seeing:
PROCESSED 39999 RECORDS
PROCESSED 40000 RECORDS
FINISHED GATHERING RESULTS
If I add the output of $ctr right after $ctr++, I get the full 100K records, so it seems to me to be some sort of timing issue or problem with fetching the data from the back-end with MYSQL_FETCH_ASSOC.
On a related note, the code in the while loop is there because prior to breaking up the $results array like this the while loop would just fall over at around 45000 records (same place every time). Is this due to a setting somewhere that I have missed?
Thanks for any input... just need some thoughts on where to look for the answer to this.
Cheers!
You're building an array of results, and counting that new array's members. So yes, it is expected behavior that after fetching the first row, you'll get "1 result", then "2 results", etc...
If you want to get the total number of rows expected, you'll need to use mysql_num_rows()
When you start going through your result set, $ctr has no value; the first increment takes it from an undefined (null) value to 1. But when you reach $kMAX_RECORDS you reset it to 1 instead of 0. I don't know, however, why you see one row less the first time processResults() is called; I would expect one more.
As for the missing last 20,000 rows: notice that you only run processResults() once $ctr exceeds $kMAX_RECORDS, so the final partial batch gathered after the last reset is never processed before the loop ends.
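As a rough sketch of the loop with both issues addressed (counter initialized explicitly, and the final partial batch flushed after the loop), under the same assumptions as the code in the question:

$ctr = 0;
$masterCount = 0;
$results = array();

while ($tweet = mysql_fetch_assoc($result)) {
    $results[] = array('tweet' => $tweet);
    $ctr++;
    if ($ctr >= $kMAX_RECORDS) {   // flush a full batch
        $masterCount += $ctr;
        processResults($results);
        $results = array();
        $ctr = 0;
    }
}
if ($ctr > 0) {                    // flush whatever is left over
    $masterCount += $ctr;
    processResults($results);
}
echo "<p/>FINISHED GATHERING RESULTS";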
I have a string that I need to search using PHP to see if any of 2-3 million terms are present. The strings could be 1,000 words long and the search terms may be up to 5 words long.
I have tried strpos and strstr but execution time is more than 60 seconds.
Can anyone suggest an alternative?
So far I have the following:
$query = "SELECT City FROM cities";
$result = mysql_query($query);
if ($row = mysql_fetch_array($result)) {
    do {
        $city = " " . $row['City'] . " ";
        if (strpos($string, $city) !== false) {
            echo $city . "<br />\n";
        }
    } while ($row = mysql_fetch_array($result));
}
Take the load off PHP and give it to MySQL.
Instead of doing the search with PHP inside a loop, you can use MySQL's LIKE operator to search for a string.
eg: SELECT City FROM cities WHERE City LIKE '%search text%'
If you have more search text, you can either chain conditions with OR or combine queries to find those.
This will be quicker, and you won't need 60 seconds for execution.
Good luck
-- Sajith
I would suggest the following:
Using associative arrays, create two lookup tables, words and terms, where each entry represents a single word/term.
Now you can check those lookup tables with O(1) lookups, since PHP associative arrays are hash tables.
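A minimal sketch of that idea (the term list and the word splitting are simplified assumptions; real text would need consistent normalization of case and punctuation on both sides):

// Build the lookup table once: every search term becomes a key.
// With 2-3 million terms this costs memory, but each check is O(1).
$lookup = array();
foreach ($terms as $term) {              // $terms: your array of search phrases, up to 5 words each
    $lookup[strtolower($term)] = true;
}

// Check every 1..5-word phrase of the string against the table.
$words = preg_split('/\s+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);
$found = array();
$n = count($words);
for ($i = 0; $i < $n; $i++) {
    $phrase = '';
    for ($len = 1; $len <= 5 && $i + $len <= $n; $len++) {
        $phrase = ($phrase === '') ? $words[$i] : $phrase . ' ' . $words[$i + $len - 1];
        if (isset($lookup[$phrase])) {
            $found[$phrase] = true;      // this term is present in the string
        }
    }
}

echo implode("<br />\n", array_keys($found));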