Reduce memory usage for big database calls in PHP

I am using this script to cross-reference data so that I can retrieve the right data from the database. It requires 3 checks, 2 of which run within a for loop:
$state = str_replace("<span>", "", $state);
$state = str_replace("</span>", "", $state);
$getStateGroups = $wpdb->get_results("SELECT * FROM wp_gmw_locations WHERE region_code='$state'");
$groupIdList = array();
for ($i = 0; $i < count($getStateGroups); $i++) {
    array_push($groupIdList, $getStateGroups[$i]->object_id);
}
$groupMemberList = array();
for ($i = 0; $i < count($groupIdList); $i++) {
    $ggm = $wpdb->get_results("SELECT * FROM wp_bp_groups_members WHERE group_id=" . $groupIdList[$i]);
    for ($x = 0; $x < count($ggm); $i++) {
        array_push($groupMemberList, $ggm[$x]->user_id);
    }
}
$whereList = implode(',', $groupMemberList);
$userGameData = $wpdb->get_results("SELECT * FROM user_game_data WHERE uid IN ($whereList) ORDER BY lifetime_keys_spent DESC");
return implode(',', $groupIdList);
Using this, I am running out of memory.
I have tried increasing my memory limit from 256M to 3GB and it still runs out of memory.
I am using WordPress, so $wpdb is the class that WordPress uses to interact with the database.
I am required to cross-reference the data pulled to get the information needed for the functionality we want. Is there any way I can reduce the memory this is trying to use, or is there some bug in the code that I am missing?
EDIT:
The script runs out of memory on the line array_push($groupMemberList,$ggm[$x]->user_id);

SELECT * FROM wp_gmw_locations WHERE region_code='$state'
If you only need the IDs, why are you selecting everything? (Not to mention that interpolating $state in there makes you prone to SQL injection attacks; read up on prepared statements.)
Your first improvement would be to select only the ID:
SELECT object_id FROM wp_gmw_locations WHERE region_code='$state'
Next thing: you're only using the IDs for further queries, you're not using them in your code otherwise, right?
Then why not let the database do the heavy lifting? It's much better at it:
SELECT * FROM wp_bp_groups_members WHERE group_id IN
(SELECT object_id FROM wp_gmw_locations WHERE region_code='$state')
which not only gets rid of all that memory overhead, but also turns n+1 queries to the database into one.
Again, you only need the ID? Then refine your SELECT further!
SELECT user_id FROM wp_bp_groups_members WHERE group_id IN
(SELECT object_id FROM wp_gmw_locations WHERE region_code='$state')
That should drastically reduce your memory footprint and your execution time. You should be able to take it from here and improve the rest of your code. I recommend spinning up an SQL console and playing around a bit so you get a feel for what the database can do for you - it's quite a lot! I haven't encountered a data retrieval task in the past few years that I couldn't solve with just one query.
By the way: take a look at foreach loops; they make code quite elegant compared to for loops.
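Putting all of that together, here is a minimal sketch of what the original block could collapse to (untested; it assumes the stripped $state value is the only user-supplied input, and it keeps the final user_game_data query from the question as-is):

$state = str_replace(array('<span>', '</span>'), '', $state);

// One prepared query replaces the n+1 loop: the subquery resolves the
// group ids, and get_col() returns just the user_id column as a flat array.
$groupMemberList = $wpdb->get_col($wpdb->prepare(
    "SELECT user_id FROM wp_bp_groups_members
     WHERE group_id IN (SELECT object_id FROM wp_gmw_locations WHERE region_code = %s)",
    $state
));

if (!empty($groupMemberList)) {
    // The ids come from our own tables, but casting to int is cheap insurance.
    $whereList = implode(',', array_map('intval', $groupMemberList));
    $userGameData = $wpdb->get_results(
        "SELECT * FROM user_game_data WHERE uid IN ($whereList) ORDER BY lifetime_keys_spent DESC"
    );
}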

Related

SQL select using LIMIT, or using the program to fetch specific records

I am using PHP to get specific records from the database.
Which one is better?
1.
Select * From [table] Limit 50000, 10;

while ($row = $stmt->fetch()) {
    // save in array, total 10 times
}

or
2.
Select * From [table];

$start = 50000;
$length = 10;
$i = 0;
while ($row = $stmt->fetch()) {
    if ($i >= $start && $i < $start + $length) {
        // save in array, total 50010 times
    }
    $i++;
}
In this case, which one should I use?
Which one uses fewer DB resources?
which one is better?
Too vague: what is "better"?
Which one uses fewer DB resources?
You're much better off with the first approach. It's efficient to select as little data as you need and no more. Selecting the whole table forces your script to use a lot more memory, because all of that data needs to be kept live.
The best answer you'll get is: test! You can run your queries multiple times in multiple ways and see for yourself. Just use SELECT SQL_NO_CACHE ... instead of the generic SELECT ... to force the DB to redo the work from scratch. Measure how long it takes to run the query and process the results:
function wayOne(){
// execute your 1st query and loop through results
}
function wayTwo(){
// execute 2nd query and loop through results
}
// Measures the number of milliseconds it takes to execute another function
function timeThis(callable $callback){
    $start_time = microtime(true); // true => seconds as a float
    call_user_func($callback);
    $seconds = microtime(true) - $start_time; // duration in seconds
    return round($seconds * 1000); // duration in milliseconds
}
$wayOneTime = timeThis('wayOne');
$wayTwoTime = timeThis('wayTwo');
You can then compare the two times. Generally (though not always), a process that takes significantly less time uses fewer resources.
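For completeness, here is one way the two stubs above might be filled in. This is only a sketch of the measurement harness: the connection details and the items table name are placeholders, not part of the original question:

// Hypothetical connection; substitute your own credentials and table name.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=test', 'user', 'pass',
    array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

function wayOne(PDO $pdo) {
    // Let the database skip ahead and hand back only the 10 rows we want.
    $stmt = $pdo->query('SELECT SQL_NO_CACHE * FROM items LIMIT 50000, 10');
    $rows = $stmt->fetchAll();
}

function wayTwo(PDO $pdo) {
    // Fetch everything and filter in PHP -- included only for comparison.
    $stmt = $pdo->query('SELECT SQL_NO_CACHE * FROM items');
    $rows = array();
    $i = 0;
    while ($row = $stmt->fetch()) {
        if ($i >= 50000 && $i < 50010) {
            $rows[] = $row;
        }
        $i++;
    }
}

$wayOneTime = timeThis(function () use ($pdo) { wayOne($pdo); });
$wayTwoTime = timeThis(function () use ($pdo) { wayTwo($pdo); });
echo "way one: {$wayOneTime} ms, way two: {$wayTwoTime} ms\n";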

Reading from a big table

I want to read data from a table (or view) on SQL Server 2008 R2 using PHP 5.4.24 and FreeTDS 0.91.
In PHP, I write:
$ret = mssql_query( 'SELECT * FROM mytable', $Conn ) ;
Then I read rows one at a time and process them; all is OK. Except when the table is really big, I get an error:
Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4625408 bytes)
in /home/prove/test.php on line 43
The error happens in mssql_query(), so it does not matter how I query.
Altering the query to return fewer rows or fewer columns is not viable, because I must read a lot of data from many tables in a limited time.
What can I do to convince PHP to read one row at a time into memory, or a reasonable number at a time?
I agree with @James in the question comments: if you need to read a table so big that it exhausts your memory just stashing the results before even handing back to PHP, it probably means you need to figure out a better way. However, here is a possible solution (untested, and I've only used MSSQL a couple of times, but if not perfect, I tried my best):
$ret = mssql_query('SELECT COUNT(*) AS TotalRows FROM mytable', $Conn);
$row = mssql_fetch_assoc($ret); // fetch_assoc, so $row['TotalRows'] works below

$offset = 0;
$increment = 50;
while ($offset < $row['TotalRows'])
{
    $ret_2 = mssql_query("SELECT * FROM mytable ORDER BY Id ASC OFFSET {$offset} ROWS FETCH NEXT {$increment} ROWS ONLY", $Conn);
    //
    // loop over those 50 rows, do your thing...
    //
    $offset += $increment;
}
//
// No separate remainder handling is needed: the final FETCH NEXT simply
// returns fewer rows when less than a full chunk remains. Making the
// interior loop a function or method would still be wise. Also note that
// OFFSET ... FETCH requires SQL Server 2012 or later; on 2008 R2 you would
// need a ROW_NUMBER()-based equivalent.
//

Efficient query to just return row count

I have five different queries running on my About page showing basic data, like the number of news stories we have on the site. I am using queries like this:
$sql4 = "SELECT `ride_id` FROM `tpf_rides` WHERE `type` LIKE '%Roller Coaster%'" ;
$result4 = $pdo->query($sql4);
$coasters = $result4->rowCount();
but I wonder if there is a more efficient way. I've tried to minimize the load by only pulling IDs, but because I only need the count, can the load be lightened even more?
Also, these queries only really need to run once or twice per day, not every time the page is loaded. Can someone point me in the direction of setting this up? I've never had to do this before. Thanks.
Yes there is a more efficient way. Let the database do the counting for you:
SELECT count(*) as cnt
FROM `tpf_rides`
WHERE `type` LIKE '%Roller Coaster%';
If all the counts you are looking for are from the tpf_rides table, then you can do them in one query:
SELECT sum(`type` LIKE '%Roller Coaster%') as RollerCoaster,
sum(`type` LIKE '%Haunted House%') as HauntedHouse,
sum(`type` LIKE '%Ferris Wheel%') as FerrisWheel
FROM `tpf_rides`;
That would be even faster than running three different queries.
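As a quick sketch (untested), that single query could be consumed from PHP with the same $pdo handle used in the question:

$sql = "SELECT sum(`type` LIKE '%Roller Coaster%') as RollerCoaster,
               sum(`type` LIKE '%Haunted House%') as HauntedHouse,
               sum(`type` LIKE '%Ferris Wheel%') as FerrisWheel
        FROM `tpf_rides`";

// One row comes back carrying all three totals.
$counts = $pdo->query($sql)->fetch(PDO::FETCH_ASSOC);
echo $counts['RollerCoaster'] . " roller coasters";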
If you want to run those queries only every now and then, you need to store the result somewhere. This can take the form of a pre-calculated sum you manage yourself, or a simple cache.
Below is a very simple and naive cache implementation that should work reliably on Linux. Many things could be improved here, but maybe it will give you an idea of what you could do.
Note that the code below is not compatible with the query suggested by Gordon Linoff, which returns multiple counts, and it has not been tested.
$cache_directory = "/tmp/";
$cache_lifetime = 86400; // time to keep the cache, in seconds. 24 hours = 86400 sec

$sql4 = "SELECT count(*) FROM `tpf_rides` WHERE `type` LIKE '%Roller Coaster%'";

$cache_key = md5($sql4); // generate a semi-unique identifier for the query
$cache_file = $cache_directory . $cache_key; // generate the full cache file path

if (!file_exists($cache_file) || time() > filemtime($cache_file) + $cache_lifetime)
{
    // cache file doesn't exist or has expired
    // (filemtime() already returns a Unix timestamp, so compare it directly)
    $result4 = $pdo->query($sql4);
    $coasters = $result4->fetchColumn();
    file_put_contents($cache_file, $coasters); // store the result in a cache file
} else {
    // file exists and the data is up to date
    $coasters = file_get_contents($cache_file);
}
I would strongly suggest you break this down into functions that take care of different aspects of the problem.
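For instance, the cache handling could be wrapped in a small helper like the one below (untested; the function name is just for illustration):

// Returns a cached scalar result for $sql, re-running the query only when
// the cache file is missing or older than $lifetime seconds.
function cachedScalarQuery(PDO $pdo, $sql, $lifetime = 86400, $dir = '/tmp/')
{
    $cache_file = $dir . md5($sql);
    if (file_exists($cache_file) && time() <= filemtime($cache_file) + $lifetime) {
        return file_get_contents($cache_file); // still fresh, serve from the cache
    }
    $value = $pdo->query($sql)->fetchColumn();
    file_put_contents($cache_file, $value);
    return $value;
}

$coasters = cachedScalarQuery($pdo,
    "SELECT count(*) FROM `tpf_rides` WHERE `type` LIKE '%Roller Coaster%'");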

Timing out while updating MySQL with PHP from a CSV

I need to come up with a way to make a large task fast enough to beat the timeout.
I have very limited access to the server due to the restrictions of the hosting company.
I have a system set up where a cron job visits a PHP file that grabs a CSV containing data on some products. The CSV does not contain all of the fields that a product would have, just a handful of essential ones.
I've read a fair number of articles on timeouts and handling CSVs, and currently (in an attempt to shave time) I have made a table (let's call it csv_data) to hold the CSV data. I have a script that truncates the csv_data table and then inserts data from the CSV, so each night the latest recordset from the CSV is in that table (the CSV file gets updated nightly). So far, no timeout problems: that task only takes about 4-5 seconds.
The timeouts occur when I have to sift through the data to make updates to the products table. The steps it runs right now are like this:
1. Get the sku from the csv_data table (which holds thousands of records).
2. SELECT * FROM products WHERE products.sku = csv.sku (the products table also holds thousands of records to loop through).
3. Get numrows.
   If numrows < 1: no record in products, so skip.
   If numrows > 1: duplicate entries; don't change anything, but report the sku later on.
   If numrows == 1: update the selected fields in the products table with the CSV data.
4. Go to the next record in csv_data and start all over again.
(I figured outlining the process is shorter and easier than dropping in the code.)
I looked into MySQL views and stored procedures, but I am not skilled enough with them to know whether they can handle the 'if' statement portion.
Is there anything I can do to make this faster and avoid the timeouts?
Edit:
I should mention that set_time_limit(0); isn't doing it. And if it helps, the server uses IIS7 and FastCGI.
Thanks for your help.
Update after using suggestions from Jakob and Shawn:
I'm doing something wrong. The speed is definitely faster and the CSV sku is incrementing, but when I tried to implement Shawn's solution, the query gives me a PHP Warning: mysql_result() expects parameter 1 to be resource, boolean error.
Can you help me spot what I am doing wrong?
Here is the section of code:
$csvdata = "SELECT * FROM csv_update";
$csvdata_result = mysql_query($csvdata);
mysql_query($csvdata);
$csvdata_num = mysql_num_rows($csvdata_result);
$i = 0;
while ($i < $csvdata_num) {
    $csv_code = @mysql_result($csvdata_result, $i, "skucode");
    $datacheck = NULL;
    $datacheck = substr($csv_code, 0, 1);
    if ($datacheck >= '0' && $datacheck <= '9') {
        $csv_price        = @mysql_result($csvdata_result, $i, "price");
        $csv_retail       = @mysql_result($csvdata_result, $i, "retail");
        $csv_stock        = @mysql_result($csvdata_result, $i, "stock");
        $csv_weight       = @mysql_result($csvdata_result, $i, "weight");
        $csv_manufacturer = @mysql_result($csvdata_result, $i, "manufacturer");
        $csv_misc1        = @mysql_result($csvdata_result, $i, "misc1");
        $csv_misc2        = @mysql_result($csvdata_result, $i, "misc2");
        $csv_selectlist   = @mysql_result($csvdata_result, $i, "selectlist");
        $csv_level5       = @mysql_result($csvdata_result, $i, "level5");
        $csv_frontpage    = @mysql_result($csvdata_result, $i, "frontpage");
        $csv_level3       = @mysql_result($csvdata_result, $i, "level3");
        $csv_minquantity  = @mysql_result($csvdata_result, $i, "minquantity");
        $csv_quantity1    = @mysql_result($csvdata_result, $i, "quantity1");
        $csv_discount1    = @mysql_result($csvdata_result, $i, "discount1");
        $csv_quantity2    = @mysql_result($csvdata_result, $i, "quantity2");
        $csv_discount2    = @mysql_result($csvdata_result, $i, "discount2");
        $csv_quantity3    = @mysql_result($csvdata_result, $i, "quantity3");
        $csv_discount3    = @mysql_result($csvdata_result, $i, "discount3");

        $count_check = "SELECT COUNT(*) AS totalCount FROM products WHERE skucode = '$csv_code'";
        $count_result = mysql_query($count_check);
        mysql_query($count_check);
        $totalCount = @mysql_result($count_result, 0, 'totalCount');
        $loopCount = ceil($totalCount / 25);
        for ($j = 0; $j < $loopCount; $j++) {
            $prod_check = "SELECT skucode FROM products WHERE skucode = '$csv_code' LIMIT ($loopCount*25), 25;";
            $prodresult = mysql_query($prod_check);
            mysql_query($prod_check);
            $prodnum = @mysql_num_rows($prodresult);
            $prod_id = @mysql_result($prodresult, 0, "catalogid");
            if ($prodnum < 1) {
                echo "NOT FOUND:$csv_code<br>";
                $count_sku_not_found = $count_sku_not_found + 1;
                $list_sku_not_found = $list_sku_not_found . " $csv_code";
            }
            if ($prodnum > 1) {
                echo "DUPLICATE:$csv_ccode<br>";
                $count_duplicate_skus = $count_duplicate_skus + 1;
                $list_duplicate_skus = $list_duplicate_skus . " $csv_code";
            }
            if ($prodnum == 1) {
                // This prevents an overwrite from happening if the csv file doesn't produce properly
                if ($csv_price != "" OR $csv_price != NULL)
                    { $sql_price = 'price="' . $csv_price . '"'; }
                if ($csv_retail != "" OR $csv_retail != NULL)
                    { $sql_retail = ',retail="' . $csv_retail . '"'; }
                if ($csv_stock != "" OR $csv_stock != NULL)
                    { $sql_stock = ',stock="' . $csv_stock . '"'; }
                if ($csv_weight != "" OR $csv_weight != NULL)
                    { $sql_weight = ',weight="' . $csv_weight . '"'; }
                if ($csv_manufacturer != "" OR $csv_manufacturer != NULL)
                    { $sql_manufacturer = ',manufacturer="' . $csv_manufacturer . '"'; }
                if ($csv_misc1 != "" OR $csv_misc1 != NULL)
                    { $sql_misc1 = ',misc1="' . $csv_misc1 . '"'; }
                if ($csv_misc2 != "" OR $csv_misc2 != NULL)
                    { $sql_pother2 = ',pother2="' . $csv_misc2 . '"'; }
                if ($csv_selectlist != "" OR $csv_selectlist != NULL)
                    { $sql_selectlist = ',selectlist="' . $csv_selectlist . '"'; }
                if ($csv_level5 != "" OR $csv_level5 != NULL)
                    { $sql_level5 = ',level5="' . $csv_level5 . '"'; }
                if ($csv_frontpage != "" OR $csv_frontpage != NULL)
                    { $sql_frontpage = ',frontpage="' . $csv_frontpage . '"'; }

                $import = "UPDATE products SET $sql_price $sql_retail $sql_stock $sql_weight $sql_manufacturer $sql_misc1 $sql_misc2 $sql_selectlist $sql_level5 $sql_frontpage $sql_in_stock WHERE skucode='$csv_code'";
                mysql_query($import) or die(mysql_error("error updating in products table"));
                echo "Update " . $csv_code . " successful ($i)<br>";
                $count_success_update_skus = $count_success_update_skus + 1;
                $list_success_update_skus = $list_success_update_skus . " $csv_code";

                // empty out variables
                $sql_price = '';
                $sql_retail = '';
                $sql_stock = '';
                $sql_weight = '';
                $sql_manufacturer = '';
                $sql_misc1 = '';
                $sql_misc2 = '';
                $sql_selectlist = '';
                $sql_level5 = '';
                $sql_frontpage = '';
                $sql_in_stock = '';
                $prodnum = 0;
            }
        }
    }
    $i++;
}
Is it timing out before the first row is returned, or between rows during the read? One good practice would be to handle your query in chunks: do a count first to see how many records you are dealing with for the SKU, then loop through smaller chunks (the size of these chunks depends on how much you have to do with each row). Your updated workflow would look more like this:
Get next SKU from CSV
Get a total count: SELECT COUNT(*) AS totalCount FROM products WHERE products.sku = csv.sku
Determine chunk size (using 25 for this demo)
loopCount = ceil(totalCount / 25)
Loop through all results using a loop like this: for($i = 0; $i < loopCount; $i++)
Inside your loop you should be running a query like this: SELECT * FROM products WHERE products.sku = csv.sku LIMIT (i*25), 25 (compute the i*25 offset in PHP before building the query; MySQL will not evaluate an expression inside LIMIT)
You will want to use a constant order for your SELECT chunks; your unique ID would probably be best. A sketch of the loop follows below.
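Here is a sketch of that inner loop (untested, and still using the legacy mysql_* API from the question for consistency). The expression inside LIMIT is what makes mysql_query() return false and triggers the "expects parameter 1 to be resource, boolean" warning quoted above, so the offset is computed in PHP first:

$chunkSize = 25;
$loopCount = ceil($totalCount / $chunkSize);

for ($j = 0; $j < $loopCount; $j++) {
    $offset = $j * $chunkSize; // advance the window; computed here, not inside the SQL
    $prod_check = "SELECT skucode FROM products
                   WHERE skucode = '$csv_code'
                   ORDER BY skucode
                   LIMIT $offset, $chunkSize";
    $prodresult = mysql_query($prod_check);
    if ($prodresult === false) {
        die(mysql_error()); // surface SQL errors instead of suppressing them
    }
    while ($row = mysql_fetch_assoc($prodresult)) {
        // ... process this chunk of rows ...
    }
}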
I think you can solve this problem with cron (http://en.wikipedia.org/wiki/Cron). Scripts run from cron are not subject to the web server's timeout.

PDO/MySQL memory consumption with large result set

I'm having a strange time dealing with selecting from a table with about 30,000 rows.
It seems my script is using an outrageous amount of memory for what is a simple, forward-only walk over a query result.
Please note that this example is a somewhat contrived, absolute bare-minimum one which bears very little resemblance to the real code, and it cannot be replaced with a simple database aggregation. It is intended to illustrate the point that each row does not need to be retained on each iteration.
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));
$stmt = $pdo->prepare('SELECT * FROM round');
$stmt->execute();

function do_stuff($row) {}

$c = 0;
while ($row = $stmt->fetch()) {
    // do something with the object that doesn't involve keeping
    // it around and can't be done in SQL
    do_stuff($row);
    $row = null;
    ++$c;
}
var_dump($c);
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(43005064)
int(43018120)
I don't understand why 40 meg of memory is used when hardly any data needs to be held at any one time. I have already worked out I can reduce the memory by a factor of about 6 by replacing "SELECT *" with "SELECT home, away", however I consider even this usage to be insanely high and the table is only going to get bigger.
Is there a setting I'm missing, or is there some limitation in PDO that I should be aware of? I'm happy to get rid of PDO in favour of mysqli if it can not support this, so if that's my only option, how would I perform this using mysqli instead?
After creating the connection, you need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false:
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// snip

var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(653920)
int(668136)
Regardless of the result size, the memory usage remains pretty much static. One caveat with unbuffered queries: you cannot issue another query on the same connection until all rows have been fetched or the statement has been closed with closeCursor().
Another option would be to do something like:
$i = $c = 0;
$query = 'SELECT home, away FROM round LIMIT 2048 OFFSET %u;';
while (($n = count($rows = codeThatFetches(sprintf($query, $i++ * 2048)))) > 0)
{
    $c += $n;
    foreach ($rows as $row)
    {
        do_stuff($row);
    }
}
The whole result set (all 30,000 rows) is buffered into memory before you can start looking at it.
You should be letting the database do the aggregation and only asking it for the two numbers you need.
SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round
The reality of the situation is that if you fetch all rows and expect to be able to iterate over all of them in PHP, at once, they will exist in memory.
If you really don't think SQL-powered expressions and aggregation are the solution, you could consider limiting/chunking your data processing. Instead of fetching all rows at once, do something like:
1) Fetch 5,000 rows
2) Aggregate/calculate intermediary results
3) Unset variables to free memory
4) Back to step 1 (fetch the next set of rows)
Just an idea...
I haven't done this before in PHP, but you may consider fetching the rows using a scrollable cursor; see the fetch documentation for an example.
Instead of returning all the results of your query at once to your PHP script, it holds the results on the server side, and you use a cursor to iterate through them, getting one at a time.
While I have not tested this, it is bound to have other drawbacks, such as using more server resources, and most likely reduced performance due to the additional communication with the server.
Altering the fetch style may also have an impact: the documentation indicates that by default each row is stored as both an associative array and a numerically indexed array, which is bound to increase memory usage.
As others have suggested, reducing the number of results in the first place is most likely the better option, if possible.
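Here is a minimal sketch of that approach, adapted from the cursor example in the PDO fetch documentation (untested here; whether the driver actually honours CURSOR_SCROLL varies, and pdo_mysql in particular may ignore it):

// Ask the driver for a scrollable cursor instead of a buffered result set.
$stmt = $pdo->prepare('SELECT home, away FROM round',
    array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL));
$stmt->execute();

// Walk the result one row at a time. FETCH_ASSOC also avoids the duplicate
// numeric keys of the default FETCH_BOTH style mentioned above.
while ($row = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT)) {
    do_stuff($row);
}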
