Running an intensive batch process in PHP, and avoiding memory exhaustion - php

I have several thousand records (stored in a MySQL table) that I need to "batch process." Each record contains a large JSON string; in some cases, the JSON is over 1 MB (yes, my DB is well over 1 GB).
I have a function that grabs a record, decodes the JSON, changes some data, re-encodes the PHP array back to JSON, and saves it back to the DB. Pretty simple. FWIW, this is within the context of a CakePHP app.
Given an array of IDs, I'm attempting to do something like this (very simple mock code):
foreach ($ids as $id) {
    $this->Model->id = $id;
    $data = $this->Model->read();
    $newData = processData($data);
    $this->Model->save($newData);
}
The issue is that, very quickly, PHP runs out of memory. When running a foreach like this, it's almost as if PHP moves from one record to the next without releasing the memory required for the preceding operations.
Is there any way to run a loop such that memory is freed before moving on to the next iteration, so that I can actually process this massive amount of data?
Edit: adding more code. This function takes my JSON, converts it to a PHP array, does some manipulation (namely, reconfiguring data based on what's present in another array), and replaces values in the original array. The JSON is many layers deep, hence the extremely long foreach loops.
function processData($theData) {
    $toConvert = json_decode($theData['Program']['data'], true);
    foreach ($toConvert['cycles'] as $cycle => $val) {
        foreach ($toConvert['cycles'][$cycle]['days'] as $day => $val) {
            foreach ($toConvert['cycles'][$cycle]['days'][$day]['sections'] as $section => $val) {
                foreach ($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'] as $exercise => $val) {
                    if (isset($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder'])) {
                        $folderName = $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder']['folderName'];
                        if (isset($newFolderList['Folders'][$folderName])) {
                            $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder'] = $newFolderList['Folders'][$folderName]['id'];
                        }
                    }
                    if (isset($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile'])) {
                        $fileName = basename($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile']['fileURL']);
                        if (isset($newFolderList['Exercises'][$fileName])) {
                            $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile'] = $newFolderList['Exercises'][$fileName]['id'];
                        }
                    }
                }
            }
        }
    }
    return $toConvert;
}
Model->read() essentially just tells Cake to pull a record from the DB and returns it in an array. There's plenty of stuff happening behind the scenes; someone more knowledgeable would have to explain that.

The first step I would take is to make sure everything is passed by reference.
E.g.,
foreach ($ids as $id) {
    processData($data);
}

function processData(&$d) {}
http://php.net/manual/en/language.references.pass.php
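Beyond references, a common pattern is to release each record's data explicitly before the next iteration. A minimal, plain-PHP sketch of the idea (no CakePHP here; the array is a stand-in for one decoded ~1 MB JSON record, and gc_collect_cycles() requires PHP 5.3+):

```php
<?php
// Plain-PHP sketch: keep memory flat across a big batch by dropping
// references to each record's data before the next iteration.
function processPayload(array $payload)
{
    $payload['processed'] = true; // placeholder for the real manipulation
    return $payload;
}

$memBefore = memory_get_usage();
for ($i = 0; $i < 50; $i++) {
    $data    = array_fill(0, 100000, 'x'); // stand-in for Model->read()
    $newData = processPayload($data);
    // ...save $newData back to the DB here...
    unset($data, $newData);       // release both big arrays now
    if ($i % 10 === 0) {
        gc_collect_cycles();      // PHP 5.3+: reclaim cyclic garbage
    }
}
$memAfter = memory_get_usage();
// $memAfter stays close to $memBefore because nothing accumulates.
```

In CakePHP specifically, Model::$cacheQueries is also worth checking: setting $this->Model->cacheQueries = false stops the model from accumulating query results across iterations, which is a frequent source of this kind of leak.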

Related

Efficient way of creating a multi-dim array from a cursor

I'm executing a cursor. I have cut off the code showing how the procedure is called and executed; that part is efficient.
In the end I have a fairly small cursor. I'm calling the procedure which returns this cursor many times on the page, and I need to create a multidimensional array from it. This array should look like the following:
$ret = oci_execute($outrefc);
while ($row = @oci_fetch_array($outrefc)) {
    foreach (array_keys($row) as $key) {
        $res[$i][$key] = $row[$key];
    }
    $i++;
}
Is there any way to make the above snippet faster?
The multidimensional array should stay as it is. I only wonder if I could create it in any more efficient way.
Thank you!
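For what it's worth, the inner loop over array_keys can be replaced by a single append, since $row is already the associative row you want. A minimal sketch with hard-coded rows standing in for @oci_fetch_array() output:

```php
<?php
// Sketch: append each fetched row directly instead of copying key by key.
// The $rows array is a hard-coded stand-in for what @oci_fetch_array()
// yields one call at a time.
$rows = array(
    array('ID' => 1, 'NAME' => 'a'),
    array('ID' => 2, 'NAME' => 'b'),
);

$res = array();
foreach ($rows as $row) {
    $res[] = $row; // one copy-on-write assignment, no per-key inner loop
}
```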

PHP - json_encode a generator object (using yield)

I have a very large array in PHP (5.6), generated dynamically, which I want to convert to JSON. The problem is that the array is so large that it doesn't fit in memory; I get a fatal error when I try to process it (memory exhausted). So I figured that, using generators, the memory problem would disappear.
This is the code I've tried so far (this reduced example obviously doesn't produce the memory error):
<?php
function arrayGenerator() // new way, using a generator
{
    for ($i = 0; $i < 100; $i++) {
        yield $i;
    }
}

function getArray() // old way, generating and returning the full array
{
    $array = [];
    for ($i = 0; $i < 100; $i++) {
        $array[] = $i;
    }
    return $array;
}

$object = [
    'id' => 'foo',
    'type' => 'blah',
    'data' => getArray(),
    'gen' => arrayGenerator(),
];

echo json_encode($object);
But PHP doesn't seem to JSON-encode the values from the generator. This is the output I get from the previous script:
{
    "id": "foo",
    "type": "blah",
    "data": [ // old way - OK
        0,
        1,
        2,
        3,
        //...
    ],
    "gen": {} // using the generator - empty object!
}
Is it possible to JSON-encode an array produced by a generator without generating the full sequence before I call to json_encode?
Unfortunately, json_encode cannot generate a result from a generator function. Using iterator_to_array will still try to create the whole array, which will still cause memory issues.
You will need to create your function that will generate the json string from the generator function. Here's an example of how that could look:
function json_encode_generator(\Generator $generator) {
    $result = '[';
    foreach ($generator as $value) {
        $result .= json_encode($value) . ',';
    }
    return rtrim($result, ',') . ']';
}
Instead of encoding the whole array at once, it encodes only one object at a time and concatenates the results into one string.
The above example only takes care of encoding an array, but it can be easily extended to recursively encoding whole objects.
If the created string is still too big to fit in the memory, then your only remaining option is to directly use an output stream. Here's how that could look:
function json_encode_generator(\Generator $generator, $outputStream) {
    fwrite($outputStream, '[');
    foreach ($generator as $key => $value) {
        if ($key !== 0) {
            fwrite($outputStream, ',');
        }
        fwrite($outputStream, json_encode($value));
    }
    fwrite($outputStream, ']');
}
As you can see, the only difference is that we now use fwrite to write to the passed in stream instead of concatenating strings, and we also need to take care of the trailing comma in a different way.
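A runnable sketch of the stream-based approach, using php://memory only so the result can be read back for display (in practice you would pass php://output or a file handle):

```php
<?php
// Runnable sketch of the stream-based encoder. php://memory is used only
// so the result can be read back; in practice pass php://output or a file.
function arrayGenerator()
{
    for ($i = 0; $i < 3; $i++) {
        yield $i;
    }
}

function json_encode_generator($generator, $outputStream)
{
    fwrite($outputStream, '[');
    foreach ($generator as $key => $value) {
        if ($key !== 0) {
            fwrite($outputStream, ',');    // comma before every item but the first
        }
        fwrite($outputStream, json_encode($value));
    }
    fwrite($outputStream, ']');
}

$stream = fopen('php://memory', 'r+');
json_encode_generator(arrayGenerator(), $stream);
rewind($stream);
$json = stream_get_contents($stream);
echo $json; // [0,1,2]
```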
What is a generator function?
A generator function is effectively a more compact and efficient way to write an Iterator. It allows you to define a function that will calculate and return values while you are looping over it:
Also as per document from http://php.net/manual/en/language.generators.overview.php
Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface.
A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.
What is yield?
The yield keyword returns data from a generator function:
The heart of a generator function is the yield keyword. In its simplest form, a yield statement looks much like a return statement, except that instead of stopping execution of the function and returning, yield instead provides a value to the code looping over the generator and pauses execution of the generator function.
So, in your case, to generate the expected output you need to iterate over the result of arrayGenerator() using a foreach loop or an iterator before encoding it to JSON (as suggested by @apokryfos).
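A minimal sketch of that suggestion, materializing the generator with iterator_to_array before encoding (this only works when the generated data does fit in memory, so it addresses the empty-object symptom rather than the original memory constraint):

```php
<?php
// Sketch: materialize the generator with iterator_to_array() before
// handing the structure to json_encode().
function arrayGenerator()
{
    for ($i = 0; $i < 3; $i++) {
        yield $i;
    }
}

$object = array(
    'id'  => 'foo',
    'gen' => iterator_to_array(arrayGenerator()), // now a real array
);

$json = json_encode($object);
echo $json; // {"id":"foo","gen":[0,1,2]}
```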

Checking if value in slave array exists in master array Quickly without loop via PHP

In my PHP application, I am taking values from the user, and all these user values are stored in an array. Just for validation, I am comparing each user input value with my array:
<?php
// Current Code
$masterArray = array(......); // ..... represents some 60-100 different values.
foreach ($_POST as $key => $value) {
    if (in_array($value, $masterArray)) {
        $insertQuery = $mysqli->query("INSERTION stuff or Updating Stuff");
    } else {
        echo "Are you tampering html-form-data ?";
    }
}
?>
But this code is pretty worthless, as it takes quite a lot of time to update or insert.
Is there a better, much faster way to check whether a value in the slave array exists in the master array?
By slave array I mean the list/array of user input values.
By master array I mean the list of my array values stored in the page.
Thanks
I think I got a better option with array_diff.
Please let me know if I am doing anything wrong below, before I put this code in a production page. Thanks a lot for your efforts, @J.David Smith & @grossvogel
<?php
$masterArray = array(.......); // My Master Array List
$result = array_diff($_POST['checkBox'], $masterArray);
if (count($result) > 0) {
    // If they are trying to do some tampering, let them submit it all again.
    echo 'Something is not Right';
} else {
    // If the person is genuine, no waiting: just insert them all
    $total = count($_POST['checkBox']);
    $insertQuery = "INSERT INTO notes(user,quote) VALUES ";
    $values = array();
    for ($i = 0; $i < $total; $i++) {
        array_push($values, "('someuser','" . $mysqli->real_escape_string($_POST['checkBox'][$i]) . "')");
    }
    $finalQuery = $mysqli->query($insertQuery . implode(',', $values));
}
?>
Is my code better? I am testing it on localhost and I don't see much of a difference; I just want to know expert views on whether I am messing around with something, before I put this code on a production page.
Update: this looks better and faster than the code in the question.
Thanks
The only other way to do this is to use an associative array with your values as keys (well, you could custom-implement another storage container specifically for this, but that'd be overkill IMO). Then you can check with isset, which is a hash lookup. For example:
$masterArray = array(....); // same thing, but with values as keys instead of values
foreach ($_POST as $key => $value) {
    if (isset($masterArray[$value])) {
        // do stuff
    } else {
        // do stuff
    }
}
I'm kind of curious what the point of doing this is anyway, especially given the statement printed by your echo call. There may be an even better way to accomplish your goal than this.
EDIT: Another method, suggested by grossvogel: loop over $masterArray instead of $_POST. If you expect $_POST to be a large set of data consistently (i.e., most of the time people will select 50+ items), this could be faster. Hashing is already very fast, so you'll have to benchmark it on your code in order to make a decision.
$masterArray = array(...); // either style of definition will work; I'll use yours for simplicity
foreach ($masterArray as $value) {
    if (isset($_POST[$value])) {
        // do stuff
    }
}
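If the master list already exists as a plain value list, array_flip is a quick way to get the keyed version used above (a minimal sketch; the colour names are placeholder values):

```php
<?php
// Sketch: build the keyed master array from an existing value list.
$masterList  = array('red', 'green', 'blue'); // original value list
$masterArray = array_flip($masterList);       // values become keys

// isset() is a hash lookup, so each check is O(1):
var_dump(isset($masterArray['green'])); // bool(true)
var_dump(isset($masterArray['mauve'])); // bool(false)
```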

Query being run multiple times

I am trying to run the following code in PHP to query a MongoDB:
<?php
$m = new Mongo(); // connect
$dogs = $m->dogs;
$races = $dogs->newdogs;
$js = "function() {
    return this.location == 'SHEFFIELD'
}";
$dataSet = $races->find(array('$where' => $js));
foreach ($dataSet as $r) {
}
?>
When I run this and watch the console, I see the query being run once.
When I change the foreach loop to be nested within another one like this:
foreach (range(1, 5) as $test) {
    foreach ($dataSet as $r) {
    }
}
I see the query being run 7 times in the console?
Is this something stupid I am doing? A scoping issue? Or am I just misunderstanding how MongoDB is supposed to work?
Thanks
AH
This happens because $dataSet is a MongoCursor, not an array. A MongoCursor is a representation of a query; it is evaluated on demand, which means that when you use foreach on it, the query is actually executed to produce the results.
Since you do it within another loop, the MongoCursor is executed every time it encounters the foreach. If you don't want that behaviour, you can use iterator_to_array, since a MongoCursor is just an iterator:
$executed = iterator_to_array($dataSet); // Actual query execution
foreach ($executed as $r) { // Iterate the array, not the iterator
    // Hic sunt ponies
}
EDIT: Keep in mind that iterator_to_array converts the entire result set into an array in memory. If you have a very big result set, this can cause huge and unnecessary memory consumption. It's advisable to stick with a single foreach call, since it will only load one row into memory at a time.
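The single-pass idea can be sketched with a generator standing in for the cursor (the stand-in makes the snippet runnable without MongoDB; like a MongoCursor, it can only be consumed once per execution):

```php
<?php
// Sketch: do ALL per-row work in one pass over the cursor instead of
// looping over it several times (each outer pass would re-run the query).
function fakeCursor()
{
    foreach (array(array('name' => 'rex'), array('name' => 'fido')) as $row) {
        yield $row;
    }
}

$names = array();
$count = 0;
foreach (fakeCursor() as $r) { // single foreach: one "query execution"
    $names[] = $r['name'];
    $count++;
}
```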

Speed Up While loop In php

Right, the problem is a smallish one, but I hope someone can help.
Basically, we use memcache to cache some of our MySQL queries and store the results in an array in memcache. Then, when the query runs again, it grabs the array results from memcache and processes them as normal.
So the code looks like this.
$query = "SELECT x,y,z FROM a WHERE b = 3";
$result = mysql_query_run($query, 'memcache_name');
while ($row = mysql_mem_fetch_array($result, 'memcache_name')) {
    // do processing
}
mysql_query_run basically just either runs the query or returns the array from memcache.
The mysql_mem_fetch_array either processes the results from MySQL or traverses the array.
The traversing part uses this code.
if (is_array($result)) {
    // Get the current result, based on the internal pointer
    $return = current($result);
    next($result); // increment pointer
    return $return; // return result
} else {
    // MySQL result: get the row from mysql_fetch_array
    $array = mysql_fetch_array($result);
    if (is_array($MEMCACHE_SERVER_RESULTS[$memcache]) == false) {
        // If there are no results yet, initialize as an empty array
        $MEMCACHE_SERVER_RESULTS[$memcache] = array();
    }
    // Push the row onto the end of the array being built for memcache
    if ($single_row == 1) {
        // Only one row, so combine the two steps and store
        array_push($MEMCACHE_SERVER_RESULTS[$memcache], $array);
        $MEMCACHE_SERVER->set($memcache, $MEMCACHE_SERVER_RESULTS[$memcache], 0, $memcache_time);
    } else if ($array !== false) {
        array_push($MEMCACHE_SERVER_RESULTS[$memcache], $array);
    } else {
        // End of results: store the array that has been built
        $MEMCACHE_SERVER->set($memcache, $MEMCACHE_SERVER_RESULTS[$memcache], 0, $memcache_time);
    }
    // Return the row
    return $array;
}
The problem is the current and next commands: on large arrays (which are very rare) they cause some hanging.
Currently we are on PHP version 5.1.6 and we are moving to 5.3. Will the problem be solved?
Or is there a better way to handle the array?
Thanks for your help.
Richard
in large arrays (which are very rare) it causes some hanging.
Easy. Just avoid storing large arrays in memcache.
If you do:
if (is_array($result)) {
    $return = each($result); // get the current result and advance the array pointer
    return $return['value']; // return result
} else {
    // ...and so on
}
...is it any better?
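A hypothetical refactor, not the original API: if the cached branch simply returned the whole array, the caller could foreach over it and skip internal-pointer bookkeeping entirely (worth noting for future readers: each() was deprecated in PHP 7.2 and removed in 8.0):

```php
<?php
// Hypothetical helper: return the cached rows as a whole array and let
// the caller iterate, so there is no current()/next()/each() at all.
function mysql_mem_fetch_all($result)
{
    // Cached branch only, for illustration; the live-MySQL branch would
    // build the array with mysql_fetch_array() as in the question.
    return is_array($result) ? $result : array();
}

$cached = array(
    array('x' => 1),
    array('x' => 2),
);

$sum = 0;
foreach (mysql_mem_fetch_all($cached) as $row) {
    $sum += $row['x']; // do processing
}
```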
