MySQL performance hit accessing table simultaneously - php

Following code is a mock-up of my real code. I'm getting a big performance hit when myFunction is called. myTable is no more than a few hundred rows, but calling myFunction adds ~10 seconds execution time. Is there something inherently wrong with trying to access a row of a table inside a loop already accessing that table?
<select>
<?php
$stmt = SQLout ("SELECT ID,Title FROM myTable WHERE LEFT(Title,2) = ? ORDER BY Title DESC",
array ('s', $co), array (&$id, &$co_title));
while ($stmt->fetch()) {
if (myFunction($id)) // skip this function call and save 10 seconds
echo '<option value="' . $co_title . '">' . $co_title . '</option>';
}
$stmt->close();
function myFunction ($id) {
$stmt = SQLout ("SELECT Info FROM myTable WHERE ID = ?",
array ('i', $id), array (&$info));
if ($stmt->fetch()) {
$stmt->close();
if ($info == $something)
return true;
}
return false;
}
?>
SQLout is basically:
$sqli_db->prepare($query);
$stmt->bind_param;
$stmt->execute();
$stmt->bind_result;
return $stmt;
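For reference, here is a minimal, purely illustrative sketch of what such an SQLout() helper might look like on top of mysqli. The $sqli_db handle and the array conventions come from the question; everything else is an assumption, and the real implementation may well differ:

function SQLout ($query, array $params, array $results) {
    global $sqli_db;
    $stmt = $sqli_db->prepare($query);
    // $params = array('s', $co): the type string first, then the values to bind
    $stmt->bind_param(...$params);      // argument unpacking (PHP 5.6+) satisfies the by-reference parameters
    $stmt->execute();
    // $results = array(&$id, &$co_title): output columns bound by reference to the caller's variables
    $stmt->bind_result(...$results);
    return $stmt;
}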

What you're doing is sometimes called the "N+1 queries" problem. You run the first (outer) query once, and it returns N rows. Then you run N subordinate queries, one for each row returned by the first query. Thus N+1 queries. It causes a lot of overhead.
This would have far better performance if you could apply the "something" condition in SQL:
$stmt = SQLout ("SELECT ID,Title FROM myTable
WHERE LEFT(Title,2) = ? AND Info = ... ORDER BY Title DESC",
array ('s', $co), array (&$id, &$co_title));
In general, it's not a good idea to run queries in a loop that depends on how many rows match the outer query. What if the outer query matches 1000000 rows? That means a million queries inside the loop will hit your database for this single PHP request.
Even if today the outer query only matches 3 rows, the fact that you've architected the code in this way means that six months from now, at some unpredictable time, there will be some search that results in a vast overhead, even if your code does not change. The number of queries is driven by the data, not the code.
Sometimes it's necessary to do what you're doing, for instance if the "something" condition is complex and can't be expressed in SQL. But in all other cases you should try to avoid this pattern of N+1 queries.

So, if you have a "few hundred rows" in the table, you might be calling myFunction a few hundred times, depending on how many rows are returned in the first query.
Check the number of rows that first query is returning to make sure it meets your expectations.
After that, make sure you have an index on myTable.ID.
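For example (a hedged sketch, not from the original answer, using the $sqli_db mysqli handle from the question; if ID is already the PRIMARY KEY it is indexed automatically and nothing needs to change):

// check which indexes already exist
$res = $sqli_db->query("SHOW INDEX FROM myTable");
while ($row = $res->fetch_assoc()) {
    echo $row['Key_name'] . ' on ' . $row['Column_name'] . "\n";
}

// add one if ID is not indexed yet (index name is illustrative)
$sqli_db->query("ALTER TABLE myTable ADD INDEX idx_mytable_id (ID)");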
After that, I would start looking into system/server level issues. On slower systems, say a laptop hard drive, 10 queries per second might be all you can get.

Try something like this:
$stmt = SQLout ("SELECT ID,Title, Info FROM myTable WHERE LEFT(Title,2) = ? ORDER BY Title DESC",
array ('s', $co), array (&$id, &$co_title, &$info));
while ($stmt->fetch()) {
if (myFunction($info)) // skip this function call and save 10 seconds
echo '<option value="' . $co_title . '">' . $co_title . '</option>';
}
$stmt->close();
function myFunction ($info) {
if ($info == $something)
return true;
}
return false;
}

Related

PHP PDO sqlsrv large result set inconsistency

I am using PDO to execute a query for which I am expecting ~500K results. This is my query:
SELECT Email FROM mytable WHERE flag = 1
When I run the query in Microsoft SQL Server Management Studio I consistently get 544838 results. I wanted to write a small script in PHP that would fetch these results for me. My original implementation used fetchAll(), but this was exhausting the memory available to PHP, so I decided to fetch the results one at a time like so:
$q = <<<QUERY
SELECT Email FROM mytable WHERE flag = 1
QUERY;
$stmt = $conn->prepare($q);
$stmt->execute();
$c = 0;
while ($email = $stmt->fetch()[0]) {
    echo $email." $c\n";
    $c++;
}
but each time I run the query, I get a different number of results! Typical results are:
445664
445836
445979
The number of results seems to be short 100K +/- 200 ish. Any help would be greatly appreciated.
The fetch() method fetches one row at a time from the current result set, and $stmt->fetch()[0] is the first column of that row.
Your SQL query has no ordering, and the Email column can (probably) contain NULL or empty values.
Since your while condition tests that column value, the loop exits as soon as a row whose first column is NULL or empty turns up, even though more rows remain.
Therefore, you should test only the return value of fetch(), not fetch()[0] or something like that.
(Note that sqlsrv_get_field() belongs to the sqlsrv extension and cannot be called on a PDO statement; with PDO, read the column from the row array that fetch() returns, as in the corrected loop below.)
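As a quick aside (not from the original answer), this is how PHP treats the values that can terminate the original loop early when they are used as a boolean condition:

var_dump((bool) null);  // bool(false) -- a NULL Email ends the while loop
var_dump((bool) '');    // bool(false) -- so does an empty string
var_dump((bool) '0');   // bool(false) -- even the literal string "0" would stop it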
$c = 0;
while ($row = $stmt->fetch(PDO::FETCH_NUM)) { // fetch() returns false only when no rows are left (or on error)
    $email = $row[0]; // first column value; may legitimately be NULL or ''
    echo $email . " $c\n";
    $c++;
}
PDOStatement::fetch (PHP manual)

Database overload in long task using Laravel

I'm currently struggling with an issue that is overloading my database which makes all page requests being delayed significantly.
Current scenario
- A certain Artisan command is scheduled to run every 8 minutes
- This command has to update a whole table with more than 30000 rows
- Every row will have a new value, which means 30000 queries will have to be executed
- For about 14 seconds the server doesn't answer due to database overload (I guess)
Here's the handle() method of the command:
public function handle()
{
    $thingies = /* Insert big query here */
    foreach ($thingies as $thing)
    {
        $resource = Resource::find($thing->id);
        if (!$resource)
        {
            continue;
        }
        $resource->update(['column' => $thing->value]);
    }
}
Is there any other approach to do this without my page requests being delayed?
Your process is really inefficient and I'm not surprised it takes a long time to complete. To process 30,000 rows, you're making 60,000 queries (half to find out if the id exists, and the other half to update the row). You could be making just 1.
I have no experience with Laravel, so I'll leave it up to you to find out what functions in Laravel can be used to apply my recommendation. I just want to get you to understand the concepts.
MySQL allows you to submit a multi query: one command that executes many queries. It is drastically faster than executing individual queries in a loop. Here is an example that uses MySQLi directly (no third-party framework such as Laravel):
// the 30,000 new values and the record IDs they belong to. These values
// MUST be escaped or known to be safe
$values = [
    ['id'=>145, 'fieldName'=>'a'], ['id'=>2, 'fieldName'=>'b']...
];

// %s and %d will be replaced with column value and id to look for
$qry_template = "UPDATE myTable SET fieldName = '%s' WHERE id = %d";

$queries = []; // array of all queries to be run
foreach ($values as $row) { // build and add queries
    $q = sprintf($qry_template, $row['fieldName'], $row['id']);
    array_push($queries, $q);
}

// combine all into one query
$combined = implode("; ", $queries);

// execute all queries at once
$mysqli->multi_query($combined);
I would look into how Laravel does multi queries and start there. The last time I implemented something like this, it took about 7 milliseconds to insert 3,000 rows. So updating 30,000 will definitely not take 14 seconds.
As an added bonus, there is no need to first run a query to figure out whether the ID exists. If it doesn't, nothing will be updated.
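For what it's worth, here is a hedged, Laravel-flavoured sketch of the same idea (not from the original answer). It assumes the Resource model and the 'column' attribute from the question, and Eloquent's upsert(), which is available from Laravel 8:

$rows = [];
foreach ($thingies as $thing) {
    $rows[] = ['id' => $thing->id, 'column' => $thing->value];
}

// update 'column' for each existing id, in chunks to keep the packets small;
// note: unlike the original loop, upsert() will insert rows whose id does not exist yet
foreach (array_chunk($rows, 1000) as $chunk) {
    Resource::upsert($chunk, ['id'], ['column']);
}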
Thanks to #cyclone's comment I was able to update all the values in one single query.
It's not a perfect solution, but the query now takes roughly 8 seconds to execute and only one connection is required, which means page requests are still handled while the query is running.
I'm not marking this as the definitive answer, since there may still be improvements to make.
$ids = [];
$caseQuery = '';
foreach ($thingies as $thing)
{
    if (strlen($caseQuery) == 0)
    {
        $caseQuery = '(CASE WHEN id = '. $thing->id . ' THEN \''. $thing->rank .'\' ';
    }
    else
    {
        $caseQuery .= ' WHEN id = '. $thing->id . ' THEN \''. $thing->rank .'\' ';
    }
    array_push($ids, $thing->id);
}
$caseQuery .= ' END)';

// Execute query
DB::update('UPDATE <table> SET <value> = '. $caseQuery . ' WHERE id IN ('. implode( ',' , $ids) .')');
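A hedged variant of the same CASE update, built with parameter bindings so the rank values are never concatenated straight into the SQL string (the <table> and <value> placeholders are kept from the snippet above; nothing here comes from the original answer):

$ids = [];
$bindings = [];
$caseQuery = 'CASE';
foreach ($thingies as $thing) {
    $caseQuery .= ' WHEN id = ? THEN ?';
    $bindings[] = $thing->id;
    $bindings[] = $thing->rank;
    $ids[] = $thing->id;
}
$caseQuery .= ' END';

// one ? placeholder per id in the IN (...) list
$placeholders = implode(',', array_fill(0, count($ids), '?'));
DB::update(
    'UPDATE <table> SET <value> = ' . $caseQuery . ' WHERE id IN (' . $placeholders . ')',
    array_merge($bindings, $ids)
);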

Trying to create and echo array via pg_query_params

I've worked with Postgresql some, but I'm still a novice. I usually default to creating way too many queries and hacking my way through to get the result I need from a query. This time I'd like to write some more streamlined code since I'll be dealing with a large database, and the code needs to be as concise as possible.
So I have a lot of point data, and then I have many counties. I have two tables, "counties" and "ltg_data" (the many points). My goal is to read through a specified number of counties (as given in an array) and determine how many points fall in each county. My novice, repetitive and inefficient way of doing this is by writing queries like this:
$klamath_40_days = pg_query($conn, "SELECT countyname, time from counties, ltg_data where st_contains(counties.the_geom, ltg_data.ltg_geom) and countyname");
$klamath_rows = pg_num_rows($klamath_40_days);
If I run a separate query like the above for each county, it gives me nice output, but it's repetitive and inefficient. I'd much rather use a loop. And eventually I'll need to pass params into the query via the URL. When I try to run a for loop in PHP, I get errors saying "query failed: ERROR: column "jackson" does not exist", etc. Here's the loop:
$counties = array ('Jackson', 'Klamath');
foreach ($counties as $i) {
    echo "$i<br>";
    $jackson_24 = pg_query($conn, "SELECT countyname, time from counties, ltg_data where st_contains(counties.the_geom, ltg_data.ltg_geom) and countyname = ".$i." and time >= (NOW() - '40 DAY'::INTERVAL)");
    $jackson_rows = pg_num_rows($result);
}
echo "$jackson_rows";
echo "$jackson_rows";
So then I researched the pg_query_params feature in PHP, and I thought this would help. But I run this script:
$counties = array('Jackson', 'Josephine', 'Curry', 'Siskiyou', 'Modoc', 'Coos', 'Douglas', 'Klamath', 'Lake');
$query = "SELECT countyname, time from counties, ltg_data where st_contains(counties.the_geom, ltg_data.ltg_geom) and countyname = $1 and time >= (NOW() - '40 DAY'::INTERVAL)";
$result = pg_query_params($conn, $query, $counties);
And I get this error: Query failed: ERROR: bind message supplies 9 parameters, but prepared statement "" requires 1 in
So I'm basically wondering: what is the best way to pass parameters (either an individual value, perhaps from a URL parameter, or multiple elements in an array) to a PostgreSQL query? And then I'd like to echo out the summary results in an organized manner.
Thanks for any help with this.
If you just need to know how many points fall into each county specified in an array, then you can do the following in a single call to the database:
SELECT countyname, count(*)
FROM counties
JOIN ltg_data ON ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE countyname = ANY ($counties)
AND time >= now() - interval '40 days'
GROUP BY countyname;
This is much more efficient than making individual calls, and you return only a single instance of the county name, rather than one for every record that is retrieved. If you have, say, 1,000 points in the county Klamath, you return the string "Klamath" just once, instead of 1,000 times. Also, PHP doesn't have to count the length of the query result. All in all, much cleaner and faster.
Note also the JOIN syntax in combination with the PostGIS function call.
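As a rough sketch of how that single call could look from PHP (not part of the original answer; it assumes the $conn connection from the question and passes the county list as one text[] parameter):

$counties = array('Jackson', 'Josephine', 'Curry', 'Siskiyou');

$query = "SELECT countyname, count(*)
          FROM counties
          JOIN ltg_data ON ST_Contains(counties.the_geom, ltg_data.ltg_geom)
          WHERE countyname = ANY ($1::text[])
            AND time >= now() - interval '40 days'
          GROUP BY countyname";

// build a Postgres array literal such as {"Jackson","Josephine",...}
$parts = array();
foreach ($counties as $c) {
    $parts[] = '"' . str_replace('"', '\\"', $c) . '"';
}
$countyList = '{' . implode(',', $parts) . '}';

$result = pg_query_params($conn, $query, array($countyList));
while ($row = pg_fetch_row($result)) {
    echo "$row[0]: $row[1] points\n";
}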
To execute a query with a parameter in a loop for several values you can use the following pattern:
$counties = array('Jackson', 'Josephine', 'Curry');
$query = "SELECT countyname, time from counties where countyname = $1";
foreach ($counties as $county) {
    $result = pg_query_params($conn, $query, array($county));
    $row = pg_fetch_row($result);
    echo "$row[0] $row[1] \n";
}
Note that the third parameter of pg_query_params() is an array, hence you must put array($county) even though there is only one parameter.
You can also execute one query with an array as parameter.
In this case you should use postgres syntax for an array and pass it to the query as a text variable.
$counties = "array['Jackson', 'Josephine', 'Curry']";
$query = "SELECT countyname, time from counties where countyname = any ($counties)";
echo "$query\n\n";
$result = pg_query($conn, $query);
while ($row = pg_fetch_row($result)) {
    echo "$row[0] $row[1] \n";
}

Is it faster to use php array count() than SQL row count?

In a nutshell: is it faster to use PHP's array count() on a number of arrays vs. using SQL row count multiple times?
I'm having an issue with a slow query that I attribute to the COUNT(*) function. I will explain what I am doing and then what I'm anticipating might be significantly faster.
Currently I'm looping a function that does a count of about 20,000 rows each iteration. It returns the number of rows for each month in a year:
// CREATE MONTHLY LINKS
public function monthly_links() {
    $months = array('','January','February','March','April','May','June','July','August', 'September','October','November','December');
    for ($i=1 ; $i <= 12 ; $i++) {
        $array[] = "<a href='monthly.php?month=" . $i . "&status=3'>" . $months[$i] . " " . $this->chosen_year() . " (" . $this->number_positions_month_year($i, $this->chosen_year()) . ")</a>";
    }
    return $array;
}

// SHOW NUMBER OF POSITIONS FOR EACH MONTH/YEAR
public function number_positions_month_year($month, $year) {
    $sql = $this->client_positions_sql() . " AND MONTH(`exp_date`) = " . $month . " AND YEAR(`exp_date`) = " . $year;
    $res = $this->conn->query($sql);
    $count = $res->num_rows;
    return $count;
}
The code is not that important in the example above because essentially what I am asking is: Is it faster to do the following...?
- Query the table once while dumping each month's corresponding IDs to an array (there will be 12 arrays)
- Use PHP's count() on each array to get the number of entries for each month
You can use SQL's GROUP BY clause to group by month.
SELECT COUNT(*), MONTH(exp_date)
FROM theTableYouAreUsing
GROUP BY MONTH(exp_date)
Then in PHP, from the result set that's returned, you get the count for the month you need.
Speed-wise, this is a lot quicker than a separate query for each month.
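A hedged sketch of how that might be wired into the class from the question (the monthly_counts() method name and the positions table are assumptions; $this->conn and exp_date come from the question):

// SHOW NUMBER OF POSITIONS FOR EVERY MONTH OF A YEAR IN ONE QUERY
public function monthly_counts($year) {
    $sql = "SELECT MONTH(`exp_date`) AS m, COUNT(*) AS n
            FROM positions
            WHERE YEAR(`exp_date`) = " . (int) $year . "
            GROUP BY MONTH(`exp_date`)";
    $counts = array_fill(1, 12, 0);           // default every month to 0
    $res = $this->conn->query($sql);
    while ($row = $res->fetch_assoc()) {
        $counts[(int) $row['m']] = (int) $row['n'];
    }
    return $counts;                            // $counts[1] = January ... $counts[12] = December
}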
In short: premature optimization is the root of all evil :) So usually, in the end, it doesn't really matter. However, keep in mind that depending on when you need the number of rows and how you fetch and handle the result, you may not have the full data available in a single array, so there is no array to call count() on. Creating such an array just to be able to call count() is not a good reason, because it unnecessarily consumes memory.
I am going to have to agree with KingCrunch on this. What you really need to look at is the type of application you have. If it serves a low volume of users or something like that, then doing the count in the database will be faster. If you have lots of traffic and want to avoid the DB getting overloaded by something PHP can do itself, then PHP can come out ahead at that kind of scale. Also keep in mind that if you send the whole result set over to PHP, it has to receive the data over the network and then count it, which means more data transfer and network latency, though that assumes a remote DB. But try not to over-optimize.

Odd behavior when fetching 100K rows from MySQL via PHP

Looking for some ideas here... I have a MySQL table that has 100K rows of test data in it.
I am using a PHP script to fetch rows out of that table, and in this test case, the script is fetching all 100,000 rows (doing some profiling and optimization for large datasets).
I connect to the DB, and execute an unbuffered query:
$result = mysql_unbuffered_query("SELECT * FROM TestTable", $connection) or die('Errant query: ' . $query);
Then I iterate over the results with:
if ($result) {
    while ($tweet = mysql_fetch_assoc($result)) {
        $ctr++;
        if ($ctr > $kMAX_RECORDS) {
            $masterCount += $ctr;
            processResults($results);
            $results = array();
            $ctr = 1;
        }
        $results[] = array('tweet' => $tweet);
    }
    echo "<p/>FINISHED GATHERING RESULTS";
}

function processResults($resultSet) {
    echo "<br/>PROCESSED " . count($resultSet) . " RECORDS";
}
$kMAX_RECORDS = 40000 right now, so I would expect to see output like:
PROCESSED 40000 RECORDS
PROCESSED 40000 RECORDS
PROCESSED 20000 RECORDS
FINISHED GATHERING RESULTS
However, I am consistently seeing:
PROCESSED 39999 RECORDS
PROCESSED 40000 RECORDS
FINISHED GATHERING RESULTS
If I add the output of $ctr right after $ctr++, I get the full 100K records, so it seems to me to be some sort of timing issue or problem with fetching the data from the back-end with MYSQL_FETCH_ASSOC.
On a related note, the code in the while loop is there because prior to breaking up the $results array like this the while loop would just fall over at around 45000 records (same place every time). Is this due to a setting somewhere that I have missed?
Thanks for any input... just need some thoughts on where to look for the answer to this.
Cheers!
You're building an array of results, and counting that new array's members. So yes, it is expected behavior that after fetching the first row, you'll get "1 result", then "2 results", etc...
If you want to get the total number of rows expected, you'll need to use mysql_num_rows()
When you start going through your result set, $ctr has no value, so the first $ctr++ effectively starts counting from nothing. But when reaching $kMAX_RECORDS you reset it to 1 instead of 0. I don't know, however, why you see one row less the first time processResults() is called; I would expect one more.
As for the missing last 20,000 rows: notice that you only run processResults() after $ctr exceeds $kMAX_RECORDS, so whatever is left in $results when the while loop ends is never processed.
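A hedged sketch of the batching loop with that fixed: append first, flush full batches, and flush whatever remains after the loop ($kMAX_RECORDS, $masterCount and processResults() are taken from the question; the rest is illustrative):

$ctr = 0;
$results = array();
while ($tweet = mysql_fetch_assoc($result)) {
    $results[] = array('tweet' => $tweet);
    $ctr++;
    if ($ctr >= $kMAX_RECORDS) {       // a full batch: hand it off and start a new one
        $masterCount += $ctr;
        processResults($results);
        $results = array();
        $ctr = 0;
    }
}
if ($ctr > 0) {                        // flush the final partial batch (the "missing" 20,000 rows)
    $masterCount += $ctr;
    processResults($results);
}
echo "<p/>FINISHED GATHERING RESULTS";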
