I want to read data from a table (or view) on SQL server 2008R2 using PHP 5.4.24 and freetds 0.91.
In PHP, I write:
$ret = mssql_query( 'SELECT * FROM mytable', $Conn ) ;
Then I read the rows one at a time and process them, and all is OK. Except when the table is really big, I get an error:
Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4625408 bytes)
in /home/prove/test.php on line 43
The error happens in the mssql_query() call itself, so it does not matter how I read the rows afterwards.
Altering the query to return less rows or less columns is not viable, because I must read a lot of data from many tables in a limited time.
What can I do to convince PHP to read one row into memory at a time, or a reasonable number at a time?
I agree with #James in the question comments: if you need to read a table so big that it exhausts your memory just to stash the results before they are even handed back to PHP, you probably need to find a better approach. That said, here is a possible solution (untested, and I've only used MSSQL a couple of times, but I tried my best):
$ret = mssql_query('SELECT COUNT(*) AS TotalRows FROM mytable', $Conn);
$row = mssql_fetch_assoc($ret); // fetch_assoc so that $row['TotalRows'] works below
$offset = 0;
$increment = 50;
while ($offset < $row['TotalRows'])
{
    $ret_2 = mssql_query("SELECT * FROM mytable ORDER BY Id ASC OFFSET {$offset} ROWS FETCH NEXT {$increment} ROWS ONLY", $Conn);
    //
    // loop over those 50 rows, do your thing...
    //
    $offset += $increment;
}
//
// the final batch will simply contain fewer than $increment rows, so no special
// post-loop handling is needed; extracting the interior loop into a function or
// method would still be wise. Note that OFFSET ... FETCH NEXT requires SQL Server
// 2012 or later, so on 2008R2 you would need a ROW_NUMBER()-based equivalent.
//
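As an alternative that avoids re-issuing the query for every page, the legacy mssql extension (which is what PHP 5.4 with FreeTDS uses here) also accepts a batch size as the third argument to mssql_query(), so only that many rows are buffered at a time, and mssql_fetch_batch() pulls in the next batch. A rough, untested sketch; the 10,000 batch size is an arbitrary choice:
// buffer at most 10,000 rows at a time instead of the whole result set
$result = mssql_query('SELECT * FROM mytable', $Conn, 10000);

$rows = mssql_num_rows($result);        // rows available in the current batch
while ($rows > 0) {
    while ($row = mssql_fetch_assoc($result)) {
        // process one row
    }
    $rows = mssql_fetch_batch($result); // load the next batch into the buffer
}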
Related
I am using this script to cross-reference data so that I can retrieve the right data from the database. This requires three checks, two of which are run within a for loop:
$state = str_replace("<span>","",$state);
$state = str_replace("</span>","",$state);
$getStateGroups = $wpdb->get_results("SELECT * FROM wp_gmw_locations WHERE region_code='$state'");
$groupIdList = array();
for($i = 0; $i < count($getStateGroups); $i++){
    array_push($groupIdList, $getStateGroups[$i]->object_id);
}
$groupMemberList = array();
for($i = 0; $i < count($groupIdList); $i++){
    $ggm = $wpdb->get_results("SELECT * FROM wp_bp_groups_members WHERE group_id=" . $groupIdList[$i]);
    for($x = 0; $x < count($ggm); $i++){
        array_push($groupMemberList, $ggm[$x]->user_id);
    }
}
$whereList = implode(',',$groupMemberList);
$userGameData = $wpdb->get_results("SELECT * FROM user_game_data WHERE uid IN ($whereList) ORDER BY lifetime_keys_spent DESC");
return implode(',',$groupIdList);
Using this, I am running out of memory.
I have tried increasing my memory limit from 256M to 3GB and it still runs out of memory.
I am using WordPress, so $wpdb is the class that WordPress uses to interact with the database.
I am required to cross-reference the pulled data to get the information needed for the functionality we want. Is there any way I can reduce the memory this is trying to use, or is there some bug in the code that I am missing?
EDIT:
The script runs out of memory on the line array_push($groupMemberList,$ggm[$x]->user_id);
SELECT * FROM wp_gmw_locations WHERE region_code='$state'
If you only need the IDs, why are you selecting everything? (Not to mention that interpolating $state directly into the query leaves you open to SQL injection attacks; read up on prepared statements.)
Your first improvement would be to select only the ID:
SELECT object_id FROM wp_gmw_locations WHERE region_code='$state'
Next thing: you're only using the IDs for further queries, not anywhere else in your code, right?
Then why not let the database do the heavy lifting? It's much better at it:
SELECT * FROM wp_bp_groups_members WHERE group_id IN
(SELECT object_id FROM wp_gmw_locations WHERE region_code='$state')
This not only gets rid of all that memory overhead, but also turns n+1 queries to the database into one.
Again, you only need the ID? Then refine your SELECT further:
SELECT user_id FROM wp_bp_groups_members WHERE group_id IN
(SELECT object_id FROM wp_gmw_locations WHERE region_code='$state')
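In WordPress terms, a rough sketch of running that single query through $wpdb might look like this (it assumes the table and column names shown above, and uses $wpdb->prepare for the $state value to avoid the injection issue mentioned earlier):
// one round trip: let MySQL do the filtering and return only the user IDs
$groupMemberList = $wpdb->get_col( $wpdb->prepare(
    "SELECT user_id FROM wp_bp_groups_members
     WHERE group_id IN (SELECT object_id FROM wp_gmw_locations WHERE region_code = %s)",
    $state
) );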
That should drastically reduce your memory footprint and your execution time. You should be able to take it from here and improve the rest of your code. I recommend spinning up an SQL console and just playing around a bit so you get a feel for what the database can do for you; it's quite a lot! I haven't run into a data retrieval task in the past few years that I couldn't solve with a single query.
By the way: take a look at foreach loops; they make code quite a bit more elegant than for loops.
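For instance, the first loop in the question could be written as (a small illustrative rewrite, untested against your data):
$groupIdList = array();
foreach ($getStateGroups as $group) {
    $groupIdList[] = $group->object_id;
}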
The company I work for uses Kayako to manage its support tickets. I was assigned to build an external web app that takes all the tickets from one company and displays their history.
I'm using mysqli_query to query the DB.
$link = mysqli_connect($serverName, $userName, $password, $dbName);
$sql = "SELECT ticketid, fullname, creator, contents FROM swticketposts";
$result = mysqli_query($link, $sql) or die(mysqli_error($link));
The problem is that the "contents" table in mySQL uses the datatype LONGTEXT.
Trying to read this data with php gives me either timeout or max memory usage errors.
Line 49 is the $result = mysqli_query etc line.
EDIT: Posted wrong error msg in original post
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 8192 bytes)
Trying to work around the memory problem, I added:
ini_set('memory_limit', '-1');
which gave me this timeout error instead:
Fatal error: Maximum execution time of 30 seconds exceeded in C:\xampp\htdocs\test\php\TicketCall.php on line 49
The thing is, when we view the tickets on the Kayako platform, it reads the same contents column and does so instantly, so there must be a way to read these LONGTEXT values faster. How is beyond me.
Solutions I can't use:
Changing the data type to something smaller (that would break our Kayako system)
TL;DR: Is there a way to read LONGTEXT data with PHP and MySQL without exhausting memory or hitting timeout errors?
First of all, you might consider changing your query to something like this:
SELECT ticketid, fullname, creator, contents FROM swticketposts WHERE customer = 'something'
That will make your query return less data by filtering out the information from customers your report doesn't care about. It may save execution time on your query.
Second, you may wish to use
SUBSTRING(contents, 1, 1000) AS contents
in place of contents in your query. This will get you back part of the contents column (the first thousand characters); it may (or may not) be good enough for what you're trying to do.
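For example, the query from the question might become something like this (the 1,000-character cut-off is arbitrary; adjust it to what you actually need to display):
$sql = "SELECT ticketid, fullname, creator,
               SUBSTRING(contents, 1, 1000) AS contents
        FROM swticketposts";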
Third, the mysqli_ functions use buffered queries by default. That means the entire result set gets slurped into your program's memory when you run the query, which is probably why your memory is blowing out. Read this: http://php.net/manual/en/mysqlinfo.concepts.buffering.php
You can arrange to handle your query row-by-row doing this:
$unbufResult = mysqli_query($link, $sql, MYSQLI_USE_RESULT) or trigger_error($link->error);
if ($unbufResult) {
    while ($row = $unbufResult->fetch_assoc()) {
        /* deal with the data from one row */
    }
    $unbufResult->close();
}
You need to be careful to use close() when you're done with these unbuffered result sets, because they use resources on your MySQL server until you do. This will probably prevent your memory blowout. (There's nothing special about using fetch_assoc() here; you can fetch each row with any method you wish.)
Fourth, PHP's memory and time limits are usually tuned for interactive web-site operation. Sometimes report generation takes a long time. Put calls to set_time_limit() in and around your loop, something like this:
set_time_limit(300); /* five minutes */
$unbufResult = mysqli_query($link, $sql, MYSQLI_USE_RESULT) or trigger_error($link->error);
if ($unbufResult) {
    while ($row = $unbufResult->fetch_assoc()) {
        set_time_limit(30); /* reset the time limit on each row */
        /* deal with the data from one row */
    }
    $unbufResult->close();
}
set_time_limit(300); /* set the time limit back to five minutes when done reading */
I was doing bulk inserts into a Sphinx real-time index using PHP with AUTOCOMMIT disabled, e.g.:
// sphinx connection
$sphinxql = mysqli_connect($sphinxql_host.':'.$sphinxql_port,'','');
//do some other time consuming work
//sphinx start transaction
mysqli_begin_transaction($sphinxql);
//do 50k updates or inserts
// Commit transaction
mysqli_commit($sphinxql);
I kept the script running overnight, and in the morning I saw:
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate
212334 bytes) in
When I checked the nohup.out file closely, I noticed these lines:
PHP Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502
Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502
Memory usage before these lines was normal, but after them it started to increase until it hit the PHP memory limit, produced the fatal error, and the script died.
In script.php, line 502 is:
mysqli_query($sphinxql,$update_query_sphinx);
So my guess is that the Sphinx server closed the connection (or died) after a few hours or minutes of inactivity.
I have tried setting this in sphinx.conf:
client_timeout = 3600
and restarted searchd with:
systemctl restart searchd
but I am still facing the same issue.
So how can I keep the Sphinx server from dying on me when there is no activity for a longer time?
More info:
I am getting data from MySQL in 50k chunks at a time and using a while loop to fetch each row and update it in the Sphinx RT index, like this:
//6mil rows update in mysql, so it takes around 18-20 minutes to complete this then comes this following part.

$subset_count = 50000;

$total_count_query = "SELECT COUNT(*) as total_count FROM content WHERE enabled = '1'";
$total_count = mysqli_query($conn, $total_count_query);
$total_count = mysqli_fetch_assoc($total_count);
$total_count = $total_count['total_count'];

$current_count = 0;

while ($current_count <= $total_count) {
    $get_mysql_data_query = "SELECT record_num, views, comments, votes FROM content WHERE enabled = 1 ORDER BY record_num ASC LIMIT $current_count, $subset_count";

    // sphinx start transaction
    mysqli_begin_transaction($sphinxql);

    if ($result = mysqli_query($conn, $get_mysql_data_query)) {
        /* fetch associative array */
        while ($row = mysqli_fetch_assoc($result)) {
            // sphinx escape whole array
            $escaped_sphinx = mysqli_real_escape_array($sphinxql, $row);

            // update data in sphinx index
            $update_query_sphinx = "UPDATE $sphinx_index
                                    SET
                                        views    = " . $escaped_sphinx['views'] . ",
                                        comments = " . $escaped_sphinx['comments'] . ",
                                        votes    = " . $escaped_sphinx['votes'] . "
                                    WHERE
                                        id = " . $escaped_sphinx['record_num'];
            mysqli_query($sphinxql, $update_query_sphinx);
        }
        /* free result set */
        mysqli_free_result($result);
    }

    // Commit transaction
    mysqli_commit($sphinxql);

    $current_count = $current_count + $subset_count;
}
So there are a couple of issues here, both related to running big processes.
MySQL server has gone away - This usually means that MySQL has timed out, but it could also mean that the MySQL process crashed due to running out of memory. In short, it means that MySQL has stopped responding, and didn't tell the client why (i.e. no direct query error). Seeing as you said that you're running 50k updates in a single transaction, it's likely that MySQL just ran out of memory.
Allowed memory size of 134217728 bytes exhausted - this means that PHP ran out of memory. It also lends credence to the idea that MySQL ran out of memory.
So what to do about this?
The initial stop-gap solution is to increase the memory limits for PHP and MySQL. That doesn't really address the root cause, and depending on the amount of control (and knowledge) you have of your deployment stack, it may not be possible.
As a few people mentioned, batching the process may help. It's hard to say the best way to do this without knowing the actual problem you're working on solving. If you can process, say, 10,000 or 20,000 records per batch instead of 50,000, that may solve your problems. If that's going to take too long in a single process, you could also look into using a message queue (RabbitMQ is a good one that I've used on a number of projects), so that you can run multiple processes at the same time, each handling smaller batches.
If you're doing something that requires knowledge of all 6 million+ records to perform the calculation, you could potentially split the process up into a number of smaller steps, cache the work done "to date" (as such), and then pick up the next step in the next process. How to do this cleanly is difficult (again, something like RabbitMQ could simplify that by firing an event when each process is finished, so that the next one can start up).
So, in short, these are your two best options:
Throw more resources/memory at the problem everywhere that you can.
Break the problem down into smaller, self-contained chunks.
You need to reconnect or restart the DB session just before mysqli_begin_transaction($sphinxql), something like this:
<?php
// reconnect to sphinx if it is disconnected due to timeout or whatever, or force a reconnect
function sphinxReconnect($force = false) {
    global $sphinxql_host;
    global $sphinxql_port;
    global $sphinxql;
    if ($force) {
        mysqli_close($sphinxql);
        $sphinxql = @mysqli_connect($sphinxql_host.':'.$sphinxql_port, '', '') or die('ERROR');
    } else {
        if (!mysqli_ping($sphinxql)) {
            mysqli_close($sphinxql);
            $sphinxql = @mysqli_connect($sphinxql_host.':'.$sphinxql_port, '', '') or die('ERROR');
        }
    }
}
//10mil+ rows update in mysql, so it takes around 18-20 minutes to complete this then comes this following part.
//reconnect to sphinx
sphinxReconnect(true);
//sphinx start transaction
mysqli_begin_transaction($sphinxql);
//do your otherstuff
// Commit transaction
mysqli_commit($sphinxql);
I need to come up with a way to make a large task faster to beat the timeout.
I have very limited access to the server due to the restrictions of the hosting company.
I have a system set up where a cron job visits a PHP file that grabs a CSV containing data on some products. The CSV does not contain all of the fields a product would have, just a handful of essential ones.
I've read a fair number of articles on timeouts and handling CSVs, and currently (in an attempt to shave time) I have made a table (let's call it csv_data) to hold the CSV data. I have a script that truncates the csv_data table and then inserts data from the CSV, so each night the latest record set from the CSV is in that table (the CSV file gets updated nightly). So far, no timeout problems; the task only takes about 4-5 seconds.
The timeouts occur when I have to sift through the data to make updates to the products table. The steps it runs right now are like this:
1. Get the sku from csv_data table (that holds thousands of records)
2. Select * from Products where products.sku = csv.sku (products table also holds thousands of records to loop through)
3. Get numrows.
If numrows < 1 {no record in products, so skip}.
If numrows > 1 {duplicate entries, don't change anything, but report the sku later on}.
If numrows == 1 {update selected fields in the products table with the csv data}.
4. Go to the next record in csv_data all over again
(I figured outlining the process is shorter and easier than dropping in the code.)
I looked into MySQL views and stored procedures, but I am not skilled enough with them to know whether they can handle the 'if' statement portion.
Is there anything I can do to make this faster to avoid the timeouts?
edit:
I should mention that set_time_limit(0); isn't doing it. If it helps, the server uses IIS7 and FastCGI.
Thanks for your help.
Update after using suggestions from Jakob and Shawn:
I'm doing something wrong. The speed is definitely faster and the CSV sku is incrementing, but when I tried to implement Shawn's solution, the query gives me a "PHP Warning: mysql_result() expects parameter 1 to be resource, boolean given" error.
Can you help me spot what I am doing wrong?
Here is the section of code:
$csvdata="SELECT * FROM csv_update";
$csvdata_result=mysql_query($csvdata);
mysql_query($csvdata);
$csvdata_num = mysql_num_rows($csvdata_result);
$i=0;
while($i<$csvdata_num){
$csv_code=#mysql_result($csvdata_result,$i,"skucode");
$datacheck=NULL;
$datacheck=substr($csv_code,0,1);
if($datacheck>='0' && $datacheck<='9'){
$csv_price=#mysql_result($csvdata_result,$i,"price");
$csv_retail=#mysql_result($csvdata_result,$i,"retail");
$csv_stock=#mysql_result($csvdata_result,$i,"stock");
$csv_weight=#mysql_result($csvdata_result,$i,"weight");
$csv_manufacturer=#mysql_result($csvdata_result,$i,"manufacturer");
$csv_misc1=#mysql_result($csvdata_result,$i,"misc1");
$csv_misc2=#mysql_result($csvdata_result,$i,"misc2");
$csv_selectlist=#mysql_result($csvdata_result,$i,"selectlist");
$csv_level5=#mysql_result($csvdata_result,$i,"level5");
$csv_frontpage=#mysql_result($csvdata_result,$i,"frontpage");
$csv_level3=#mysql_result($csvdata_result,$i,"level3");
$csv_minquantity=#mysql_result($csvdata_result,$i,"minquantity");
$csv_quantity1=#mysql_result($csvdata_result,$i,"quantity1");
$csv_discount1=#mysql_result($csvdata_result,$i,"discount1");
$csv_quantity2=#mysql_result($csvdata_result,$i,"quantity2");
$csv_discount2=#mysql_result($csvdata_result,$i,"discount2");
$csv_quantity3=#mysql_result($csvdata_result,$i,"quantity3");
$csv_discount3=#mysql_result($csvdata_result,$i,"discount3");
$count_check="SELECT COUNT(*) AS totalCount FROM products WHERE skucode = '$csv_code'";
$count_result=mysql_query($count_check);
mysql_query($count_check);
$totalCount=#mysql_result($count_result,0,'totalCount');
$loopCount = ceil($totalCount / 25);
for($j = 0; $j < $loopCount; $j++){
$prod_check="SELECT skucode FROM products WHERE skucode = '$csv_code' LIMIT ($loopCount*25), 25;";
$prodresult=mysql_query($prod_check);
mysql_query($prod_check);
$prodnum =#mysql_num_rows($prodresult);
$prod_id=#mysql_result($prodresult,0,"catalogid");
if($prodnum<1){
echo "NOT FOUND:$csv_code<br>";
$count_sku_not_found=$count_sku_not_found+1;
$list_sku_not_found=$list_sku_not_found." $csv_code";}
if($prodnum>1){
echo "DUPLICATE:$csv_ccode<br>";
$count_duplicate_skus=$count_duplicate_skus+1;
$list_duplicate_skus=$list_duplicate_skus." $csv_code";}
if ($prodnum==1){
///This prevents an overwrite from happening if the csv file doesn't produce properly
if ($csv_price!="" OR $csv_price!=NULL)
{$sql_price='price="'.$csv_price.'"';}
if ($csv_retail!="" OR $csv_retail!=NULL)
{$sql_retail=',retail="'.$csv_retail.'"';}
if ($csv_stock!="" OR $csv_stock!=NULL)
{$sql_stock=',stock="'.$csv_stock.'"';}
if ($csv_weight!="" OR $csv_weight!=NULL)
{$sql_weight=',weight="'.$csv_weight.'"';}
if ($csv_manufacturer!="" OR $csv_manufacturer!=NULL)
{$sql_manufacturer=',manufacturer="'.$csv_manufacturer.'"';}
if ($csv_misc1!="" OR $csv_misc1!=NULL)
{$sql_misc1=',misc1="'.$csv_misc1.'"';}
if ($csv_misc2!="" OR $csv_misc2!=NULL)
{$sql_pother2=',pother2="'.$csv_misc2.'"';}
if ($csv_selectlist!="" OR $csv_selectlist!=NULL)
{$sql_selectlist=',selectlist="'.$csv_selectlist.'"';}
if ($csv_level5!="" OR $csv_level5!=NULL)
{$sql_level5=',level5="'.$csv_level5.'"';}
if ($csv_frontpage!="" OR $csv_frontpage!=NULL)
{$sql_frontpage=',frontpage="'.$csv_frontpage.'"';}
$import="UPDATE products SET $sql_price $sql_retail $sql_stock $sql_weight $sql_manufacturer $sql_misc1 $sql_misc2 $sql_selectlist $sql_level5 $sql_frontpage $sql_in_stock WHERE skucode='$csv_code'";
mysql_query($import) or die(mysql_error("error updating in products table"));
echo "Update ".$csv_code." successful ($i)<br>";
$count_success_update_skus=$count_success_update_skus+1;
$list_success_update_skus=$list_success_update_skus." $csv_code";
//empty out variables
$sql_price='';
$sql_retail='';
$sql_stock='';
$sql_weight='';
$sql_manufacturer='';
$sql_misc1='';
$sql_misc2='';
$sql_selectlist='';
$sql_level5='';
$sql_frontpage='';
$sql_in_stock='';
$prodnum=0;
}
}
$i++;
}
Is it timing out before the first row is returned, or between rows during the read? One good practice would be to handle your query in chunks: do a count first to see how many records you are dealing with for the SKU, then loop through smaller chunks (the size of these chunks would depend on how much you have to do with each row). Your updated workflow would look more like this (a sketch in code follows the list):
Get next SKU from CSV
Get a total count: SELECT COUNT(*) AS totalCount FROM products WHERE products.sku = csv.sku
Determine chunk size (using 25 for this demo)
loopCount = ceil(totalCount / 25)
Loop through all results using a loop like this: for($i = 0; $i < loopCount; $i++)
Inside your loop you should be running a query like this: SELECT * FROM products WHERE products.sku = csv.sku LIMIT (i*25), 25, where the offset (i*25) is computed in PHP from the loop counter on each pass
You will want to use a constant order for your SELECT chunks; your unique ID would probably be best.
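A rough sketch of that chunked loop, using the same legacy mysql_* API as the question ($csv_code, skucode, and catalogid are taken from the question's code; treat this as illustrative, not drop-in):
$chunkSize = 25;

$count_result = mysql_query("SELECT COUNT(*) AS totalCount FROM products WHERE skucode = '$csv_code'");
$totalCount   = mysql_result($count_result, 0, 'totalCount');
$loopCount    = ceil($totalCount / $chunkSize);

for ($j = 0; $j < $loopCount; $j++) {
    $offset = $j * $chunkSize; // the offset advances with the loop counter
    $chunk  = mysql_query("SELECT skucode, catalogid FROM products
                           WHERE skucode = '$csv_code'
                           ORDER BY catalogid
                           LIMIT $offset, $chunkSize"); // plain numbers: no parentheses, no trailing semicolon
    while ($row = mysql_fetch_assoc($chunk)) {
        // update / report on this product row
    }
}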
I think you can solve this problem with cron (http://en.wikipedia.org/wiki/Cron). A script run from cron is not subject to the web server's timeout.
I'm having a strange time dealing with selecting from a table with about 30,000 rows.
It seems my script is using an outrageous amount of memory for what is a simple, forward only walk over a query result.
Please note that this example is a somewhat contrived, absolute bare-minimum example that bears very little resemblance to the real code, and it cannot be replaced with a simple database aggregation. It is intended to illustrate the point that each row does not need to be retained on each iteration.
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$stmt = $pdo->prepare('SELECT * FROM round');
$stmt->execute();
function do_stuff($row) {}
$c = 0;
while ($row = $stmt->fetch()) {
    // do something with the object that doesn't involve keeping
    // it around and can't be done in SQL
    do_stuff($row);
    $row = null;
    ++$c;
}
var_dump($c);
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(43005064)
int(43018120)
I don't understand why 40 MB of memory is used when hardly any data needs to be held at any one time. I have already worked out that I can reduce the memory by a factor of about 6 by replacing "SELECT *" with "SELECT home, away"; however, I consider even that usage insanely high, and the table is only going to get bigger.
Is there a setting I'm missing, or is there some limitation in PDO that I should be aware of? I'm happy to get rid of PDO in favour of mysqli if it cannot support this, so if that's my only option, how would I perform this using mysqli instead?
After creating the connection, you need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false:
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
// snip
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(653920)
int(668136)
Regardless of the result size, the memory usage remains pretty much static.
Another option would be to do something like:
$i = $c = 0;
$query = 'SELECT home, away FROM round LIMIT 2048 OFFSET %u;';

while (count($rows = codeThatFetches(sprintf($query, $i++ * 2048))) > 0)
{
    $c += count($rows);

    foreach ($rows as $row)
    {
        do_stuff($row);
    }
}
The whole result set (all 30,000 rows) is buffered into memory before you can start looking at it.
You should be letting the database do the aggregation and only asking it for the two numbers you need.
SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round
The reality of the situation is that if you fetch all rows and expect to be able to iterate over all of them in PHP, at once, they will exist in memory.
If you really don't think SQL-powered expressions and aggregation are the solution, you could consider limiting/chunking your data processing. Instead of fetching all rows at once, do something like this (see the sketch after the list):
1) Fetch 5,000 rows
2) Aggregate/Calculate intermediary results
3) unset variables to free memory
4) Back to step 1 (fetch next set of rows)
Just an idea...
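A minimal, untested sketch of that idea with PDO, assuming the same round table and do_stuff() from the question (the 5,000-row chunk size matches step 1):
$chunk  = 5000;
$offset = 0;
$c      = 0;

do {
    $stmt = $pdo->prepare('SELECT home, away FROM round LIMIT :limit OFFSET :offset');
    $stmt->bindValue(':limit',  $chunk,  PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();

    $rows    = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $fetched = count($rows);

    foreach ($rows as $row) {
        do_stuff($row);          // aggregate / calculate intermediary results
        ++$c;
    }

    unset($rows, $stmt);         // free memory before fetching the next chunk
    $offset += $chunk;
} while ($fetched === $chunk);   // a short (or empty) final chunk means we're done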
I haven't done this before in PHP, but you may consider fetching the rows using a scrollable cursor - see the fetch documentation for an example.
Instead of returning all the results of your query at once back to your PHP script, it holds the results on the server side and you use a cursor to iterate through them getting one at a time.
Whilst I have not tested this, it is bound to have other drawbacks such as utilising more server resources and most likely reduced performance due to additional communication with the server.
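A hedged sketch of what that could look like (PDO::ATTR_CURSOR, PDO::CURSOR_SCROLL, and PDO::FETCH_ORI_NEXT are standard PDO constants, but whether they are honoured depends on the driver; the MySQL driver may simply ignore them):
$stmt = $pdo->prepare('SELECT home, away FROM round',
                      array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL));
$stmt->execute();

// walk the cursor forward one row at a time
while ($row = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT)) {
    do_stuff($row);
}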
Altering the fetch style may also have an impact: by default, the documentation indicates each row is stored both as an associative array and as a numerically indexed array, which is bound to increase memory usage.
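For example, asking PDO for associative arrays only avoids storing each row twice (a small tweak, assuming the same $pdo and $stmt as above):
// set it as the default for the connection...
$pdo->setAttribute(PDO::ATTR_DEFAULT_FETCH_MODE, PDO::FETCH_ASSOC);

// ...or request it per fetch call
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    do_stuff($row);
}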
As others have suggested, reducing the number of results in the first place is most likely a better option if possible.