My PHP nested loops use way too much memory

I'm trying to program a utility that handles a large volume of data, and memory is a factor. Unfortunately, each time this set of loops runs it eats approximately 14MB of memory, because it is executed thousands of times, even with the unset() calls (and yes, I'm aware they do not clean up memory entirely, which is partly why I'm asking the question). I'm wondering if there is an easier way to do this. Current working code:
$qr = array();
foreach ($XML->row as $row) {
    $ra = array();
    foreach ($row as $key => $value) {
        $ra[$key] = $value[0];
        unset($key, $value);
    }
    $qr[] = $ra;
    unset($row, $ra);
}
unset($XML);
return $qr;
Another attempt was to do this, but it lags out. Anybody know what I'm doing wrong?
$qr = array();
// Note: each() is deprecated as of PHP 7.2 and removed in PHP 8.0.
while (list(, $row) = each($XML->row)) {
    $ra = array();
    while (list($key, $value) = each($row)) {
        $ra[$key] = $value[0];
        unset($key, $value);
    }
    $qr[] = $ra;
    unset($row, $ra);
}
unset($XML);
return $qr;
Basically, in the first loop I'm just doing a basic array/object iteration. In the second (inner) loop, I'm going through each value and taking its first element while maintaining the key association. It seems I originally wrote it like this because it was the only thing that worked (since it's looping through a SimpleXML object). Any tips on speeding this thing up, or on making it not eat memory, would be appreciated.
I'm looking for solutions for garbage collection or more efficient code. I do not plan on replacing SimpleXML as there is no need for it. More clearly, I'm looking for:
A way to iterate the SimpleXML object without needing the inner loop (which only exists because I do $value[0]). Why is that necessary?
A way which is more efficient (either speed or memory-wise) for iterating through the data

If you want to use less memory, I recommend you start looking at a SAX parser. Here is an example. It is more difficult to develop a parser with SAX, but it's more efficient than SimpleXML, and you can parse big XML files with it.
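For illustration, a minimal SAX sketch using PHP's XML Parser extension; the file name big.xml and the <row> element structure are assumptions carried over from the question, so adjust them to your document:
$rows = array();
$current = array();
$field = null;

$parser = xml_parser_create();
// Element names arrive uppercased because case folding is on by default.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$current, &$field) {
        if ($name === 'ROW') {
            $current = array();
        } else {
            $field = strtolower($name);
        }
    },
    function ($p, $name) use (&$rows, &$current, &$field) {
        if ($name === 'ROW') {
            $rows[] = $current;
        }
        $field = null;
    }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$current, &$field) {
    if ($field !== null) {
        $current[$field] = (isset($current[$field]) ? $current[$field] : '') . $data;
    }
});

// Feed the file in small chunks so the whole document is never in memory.
$fp = fopen('big.xml', 'r');
while (!feof($fp)) {
    xml_parse($parser, fread($fp, 8192), feof($fp));
}
fclose($fp);
xml_parser_free($parser);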

Your memory load is high because SimpleXML loads the entire document into memory when parsing. Your unset() calls just decrement the reference count, and because the data still persists in memory, nothing is freed. This is a consequence of working with SimpleXML: the benefit is that the document lives in memory as a convenient PHP object.
If you want to reduce your memory usage, you need to use something else, like XMLReader or XML Parser. These are SAX-based (event-driven) parsers, which don't load the whole XML file into memory but walk it one node at a time. Since you don't appear to be using anything like XPath, this is your better choice.
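As a hedged sketch, the XMLReader version of the question's loop could look like this (big.xml is a placeholder file name; the <row> structure follows the question):
$reader = new XMLReader();
$reader->open('big.xml');

$qr = array();
while ($reader->read()) {
    // Act only when positioned on the start of a <row> element.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        // Parse just this one row; the rest of the document is never in memory.
        $row = new SimpleXMLElement($reader->readOuterXml());
        $ra = array();
        foreach ($row as $key => $value) {
            $ra[$key] = (string)$value; // a string cast replaces the $value[0] trick
        }
        $qr[] = $ra;
    }
}
$reader->close();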

That's not how you access data from a SimpleXML object. You are using index [0] to get the string contents of each part of the object, treating it as an array. It's not an array; it's an object. This is how you should access string data... Example: http://php.net/manual/en/simplexml.examples-basic.php#example-5095
Something like this will do the trick:
$qr = array();
foreach ($XML->row as $row) {
    $ra = array();
    $ra['name']  = (string)$row->name;  // cast to string to get the element's text content
    $ra['name2'] = (string)$row->name2;
    // Add a line for each element name, etc...
    $qr[] = $ra;
    unset($row, $ra);
}
unset($XML);
return $qr;
It will also get rid of your inner loop and save you memory.

Related

How do I pre-allocate memory for an array in PHP?

How do I pre-allocate memory for an array in PHP? I want to pre-allocate space for 351k longs. The function works when I don't use the array, but if I try to save long values in the array, then it fails. If I try a simple test loop to fill up 351k values with a range(), it works. I suspect that the array is causing memory fragmentation and then running out of memory.
In Java, I can use ArrayList al = new ArrayList(351000);.
I saw array_fill and array_pad but those initialize the array to specific values.
Solution:
I used a combination of answers. Kevin's answer worked alone, but I was hoping to prevent problems in the future too as the size grows.
ini_set('memory_limit', '512M');
$foundAdIds = new \SplFixedArray(100000); // Google doesn't return deleted ads; must keep track and assume everything else was deleted.
$foundAdIdsIndex = 0;
// $foundAdIds = array();
$result = $gaw->getAds(function ($googleAd) use ($adTemplates, &$foundAdIds, &$foundAdIdsIndex) { // use a callback to avoid keeping everything in memory
    if ($foundAdIdsIndex >= $foundAdIds->count()) {
        $foundAdIds->setSize((int)($foundAdIds->count() * 1.10)); // grow the array by 10%
    }
    $foundAdIds[$foundAdIdsIndex++] = $googleAd->ad->id; // save ids to know which to not mark deleted
    // $foundAdIds[] = $googleAd->ad->id;
});
PHP has an array class for this: SplFixedArray.
$array = new SplFixedArray(3);
$array[1] = 'test1';
$array[0] = 'test2';
$array[2] = 'test3';
foreach ($array as $k => $v) {
    echo "$k => $v\n";
}
$array[] = 'fails'; // throws a RuntimeException: SplFixedArray does not support push
gives
0 => test2
1 => test1
2 => test3
As other people have pointed out, you can't do this in PHP (well, you can create an array of fixed length, but that's not really what you need). What you can do, however, is increase the amount of memory for the process.
ini_set('memory_limit', '1024M');
Put that at the top of your PHP script and you should be OK. You can also set this in the php.ini file. This does not allocate 1GB of memory to PHP, but rather allows PHP to expand its memory usage up to that point.
A couple of things to point out though:
This might not be allowed on some shared hosts
If you're using this much memory, you might need to have a look at how you're doing things and see if they can be done more efficiently
Look out for opportunities to clear out unneeded resources (do you really need to keep hold of $x that contains a huge object you've already used?) using unset($x);
The quick answer is: you can't.
PHP is quite different from Java.
You can make an array with specific values, as you said, but then you already know the values. You can 'fake' preallocation by filling the array with null values, but that amounts to about the same thing, to be honest.
So unless you want to create one with array_fill() and null (which is a hack, to my mind), you just can't.
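For reference, that array_fill() workaround is a one-liner (351000 matches the size mentioned in the question):
$arr = array_fill(0, 351000, null); // "fake" preallocation: 351k slots, all initialized to null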
(You might want to check your reasoning about the memory, though. Are you sure this isn't an XY problem? Since memory is capped by a single number (max usage), I don't think fragmentation would have much effect. Check what is actually consuming your memory rather than going down this road.)
The closest you will get is using SplFixedArray. It doesn't preallocate the memory needed to store the values (because you can't pre-specify the type of values used), but it preallocates the array slots and doesn't need to resize the array itself as you add values.
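If you want to see the difference yourself, here is a rough measuring sketch (the absolute numbers vary by PHP version):
// Compare memory for 351k integers: SplFixedArray vs. a plain array.
$before = memory_get_usage();
$fixed = new SplFixedArray(351000);
for ($i = 0; $i < 351000; $i++) {
    $fixed[$i] = $i;
}
echo 'SplFixedArray: ' . (memory_get_usage() - $before) . " bytes\n";

$before = memory_get_usage();
$plain = array();
for ($i = 0; $i < 351000; $i++) {
    $plain[$i] = $i;
}
echo 'plain array:   ' . (memory_get_usage() - $before) . " bytes\n";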

Does unsetting array values during iteration save on memory?

This is a simple programming question, coming from my lack of knowledge of how PHP handles array copying and unsetting during a foreach loop. It's like this: I have an array that comes to me from an outside source, formatted in a way I want to change. A simple example would be:
$myData = array('Key1' => array('value1', 'value2'));
But what I want would be something like:
$myData = array(0 => array('MyKey' => array('Key1' => array('value1', 'value2'))));
So I take the first $myData and format it like the second $myData. I'm totally fine with my formatting algorithm. My question lies in finding a way to conserve memory since these arrays might get a little unwieldy. So, during my foreach loop I copy the current array value(s) into the new format, then I unset the value I'm working with from the original array. E.g.:
$formattedData = array();
foreach ($myData as $key => $val) {
    // do some formatting here, copy to $reformattedVal
    $formattedData[] = $reformattedVal;
    unset($myData[$key]);
}
Is the call to unset() a good idea here? I.e., does it conserve memory since I have copied the data and no longer need the original value? Or, does PHP automatically garbage collect the data since I don't reference it in any subsequent code?
The code runs fine, and so far my datasets have been too negligible in size to test for performance differences. I just don't know if I'm setting myself up for some weird bugs or CPU hits later on.
Thanks for any insights.
-sR
Use a reference to the value in the foreach loop using the & operator. This avoids making a copy of the array in memory for foreach to iterate over.
Edit: as pointed out by Artefacto, unsetting the variable only decreases the number of references to the original value, so the memory saved is only that of the pointers rather than of the value itself. Bizarrely, using a reference actually increases the total memory usage, presumably because the value is copied to a new memory location instead of being referenced.
Unless the array is referenced, foreach operates on a copy of the specified array and not the array itself. foreach has some side effects on the array pointer. Don't rely on the array pointer during or after the foreach without resetting it.
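As a sketch, the by-reference iteration suggested above looks like this:
foreach ($array as $key => &$value) {
    // ... work with $value; foreach does not copy each element ...
}
unset($value); // break the lingering reference so later code can't clobber the last element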
Use memory_get_usage() to identify how much memory you are using.
There is a good write-up on memory usage and allocation here.
This is useful test code to see memory allocation - try uncommenting the commented lines to see total memory usage in different scenarios.
echo memory_get_usage() . PHP_EOL;

$test = $testCopy = array();
$i = 0;
while ($i++ < 100000) {
    $test[] = $i;
}

echo memory_get_usage() . PHP_EOL;

foreach ($test as $k => $v) {
//foreach ($test as $k => &$v) {
    $testCopy[$k] = $v;
    //unset($test[$k]);
}

echo memory_get_usage() . PHP_EOL;
I was running out of memory while processing the lines of a text (XML) file within a loop. For anyone in a similar situation, this worked for me (note that array_pop() consumes the array from the end, so items are processed in reverse order):
while ($data = array_pop($xml_data)) {
    // process $data
}
Please remember the rules of Optimization Club:
The first rule of Optimization Club is, you do not Optimize.
The second rule of Optimization Club is, you do not Optimize without measuring.
If your app is running faster than the underlying transport protocol, the optimization is over.
One factor at a time.
No marketroids, no marketroid schedules.
Testing will go on as long as it has to.
If this is your first night at Optimization Club, you have to write a test case.
Rules #1 and #2 are especially relevant here. Unless you know that you need to optimize, and unless you have measured that need, don't do it. Adding the unset adds a run-time hit and will make future programmers wonder why you are doing it.
Leave it alone.
If at any point in the "formatting" you do something like:
$reformattedVal['a']['b'] = $myData[$key];
Then doing unset($myData[$key]); is irrelevant memory-wise because you are only decreasing the reference count of the variable, which now exists in two places (inside $myData[$key] and $reformattedVal['a']['b']). Actually, you save the memory of indexing the variable inside the original array, but that's almost nothing.
Unless you're accessing the element by reference, unsetting will do nothing whatsoever, as you can't alter the array from within the iterator.
That said, it's generally considered bad practice to modify the collection you're iterating over - a better approach would be to break the source array down into smaller chunks (by only loading a portion of the source data at a time) and process those, unsetting each entire array "chunk" as you go; see the sketch below.
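A minimal sketch of that chunked approach; process() is a hypothetical stand-in for whatever formatting you do per item:
$formattedData = array();
while ($chunk = array_splice($myData, 0, 1000)) { // remove 1000 items at a time from the source
    foreach ($chunk as $val) {
        $formattedData[] = process($val);
    }
    unset($chunk); // each consumed chunk can be freed before the next is spliced off
}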

Recursive MySQL function call eats up too much memory and dies

I have the following recursive function which works... up until a point. Then the script asks for more memory once the queries exceed about 100, and when I add more memory, the script typically just dies (I end up with a white screen on my browser).
public function returnPArray($parent = 0, $depth = 0, $orderBy = 'showOrder ASC') {
    $query = mysql_query("SELECT *, UNIX_TIMESTAMP(lastDate) AS whenTime
        FROM these_pages
        WHERE parent = '".$parent."' AND deleted = 'N' ORDER BY ".$orderBy."");
    $rows = mysql_num_rows($query);
    while ($row = mysql_fetch_assoc($query)) {
        // This uses my class and places the content in an array.
        MyClass::$_navArray[] = array(
            'id' => $row['id'],
            'parent' => $row['parent']
        );
        MyClass::returnPArray($row['id'], ($depth + 1));
    }
    $i++;
}
Can anyone help me make this query less resource intensive? Or find a way to free up memory between calls... somehow.
The white screen is likely because of a stack overflow. Do you have a row where the parent_id is its own id? Try adding AND id != '".(int)$parent."' to the WHERE clause to prevent that kind of bug from creeping in...
EDIT: To account for circular references, try modifying the assignment to something like:
while ($row = mysql_fetch_assoc($query)) {
    if (isset(MyClass::$_navArray[$row['id']])) continue;
    MyClass::$_navArray[$row['id']] = array(
        'id' => $row['id'],
        'parent' => $row['parent']
    );
    MyClass::returnPArray($row['id'], ($depth + 1));
}
Shouldn't you stop the recursion at some point? (I guess you need to return from the method when the number of rows is 0.) From the code you posted, I see endless recursive calls to returnPArray.
Let me ask you this... are you just trying to build out a tree of pages? If so, is there some point along the hierarchy that you can call an ultimate parent? I've found that when storing trees in a db, storing the ultimate parent id in addition to the immediate parent makes it much faster to get back, as you don't need any recursion or iteration against the db.
It is a bit of denormalization, but just a small bit, and it's better to denormalize than to recurse or iterate against the db.
If your needs are more complex, it may be better to retrieve more of the tree than you need and use app code to iterate through to get just the nodes/rows you need. Most application code is far superior to any DB at iteration/recursion.
Most likely you're overloading on active query result sets. If, as you say, you're getting about 100 iterations deep into the recursion, that means you've got 100 queries/resultsets open. Even if each query only returns one row, the whole resultset is kept open until the second fetch call (which would return false). You never get back to any particular level to do that second call, so you just keep firing off new queries and opening new result sets.
If you're going for a simple breadcrumb trail, with a single result needed per tree level, then I'd suggest not doing a while() loop to iterate over the result set. Fetch the record for each particular level, then close the resultset with mysql_free_result(), THEN do the recursive call.
Otherwise, try switching to a breadth-first query method, and again, free the result set after building each tree level; a sketch follows.
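A hedged sketch of that free-then-recurse idea for the one-row-per-level (breadcrumb) case, reusing the table and class names from the question:
public function returnPArray($parent = 0, $depth = 0) {
    $result = mysql_query("SELECT id, parent FROM these_pages
        WHERE parent = '" . (int)$parent . "' AND deleted = 'N' LIMIT 1");
    $row = mysql_fetch_assoc($result);
    mysql_free_result($result); // close this level's result set BEFORE recursing
    if ($row) {
        MyClass::$_navArray[] = array('id' => $row['id'], 'parent' => $row['parent']);
        MyClass::returnPArray($row['id'], $depth + 1); // at most one result set is ever open
    }
}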
Why are you using a recursive function? When I look at the code, it looks as though you're simply creating a table which will contain both the child and parent ID of all records. If that's what you want as a result then you don't even need recursion. A simple select, not filtering on parent_id (but probably ordering on it) will do, and you only iterate over it once.
The following will probably return the same results as your current recursive function :
public function returnPArray($orderBy = 'showOrder ASC') {
    $query = mysql_query("SELECT *, UNIX_TIMESTAMP(lastDate) AS whenTime
        FROM these_pages
        WHERE deleted = 'N' ORDER BY parent ASC, ".$orderBy."");
    $rows = mysql_num_rows($query);
    while ($row = mysql_fetch_assoc($query)) {
        // This uses my class and places the content in an array.
        MyClass::$_navArray[] = array(
            'id' => $row['id'],
            'parent' => $row['parent']
        );
    }
}
I'd suggest getting all rows in one query and building up the tree structure in pure PHP:
$nodeList = array();
$tree = array();

$query = mysql_query("SELECT *, UNIX_TIMESTAMP(lastDate) AS whenTime
    FROM these_pages WHERE deleted = 'N' ORDER BY ".$orderBy."");
while ($row = mysql_fetch_assoc($query)) {
    $nodeList[$row['id']] = array_merge($row, array('children' => array()));
}
mysql_free_result($query);

foreach ($nodeList as $nodeId => &$node) {
    if (!$node['parent_id'] || !array_key_exists($node['parent_id'], $nodeList)) {
        $tree[] = &$node;
    } else {
        $nodeList[$node['parent_id']]['children'][] = &$node;
    }
}
unset($node);
unset($nodeList);
Adjust as needed.
There are a few problems.
You already noticed the memory problem. You can set unlimited memory by using ini_set('memory_limit', -1).
The reason you get a white screen is because the script exceeds the max execution time and you either have display_errors turned off or error_reporting set to suppress errors (i.e. error_reporting(0)). You can set unlimited execution time by using set_time_limit(0).
Even with "unlimited" memory and "unlimited" time, you are still obviously constrained by the limits of your server and your own precious time. The algorithm and data model that you have selected will not scale well, and if this is meant for a production website, then you have already blown your time and memory budget.
The solution to #3 is to use a better data model which supports a more efficient algorithm.
Your function is named poorly, but I'm guessing it means to "return an array of all parents of a particular page".
If that's what you want to do, then check out Modified Pre-order Tree Traversal as a strategy for more efficient querying. This behavior is already built into some frameworks, such as Doctrine ORM, which makes it particularly easy to use.
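For flavor, a hedged sketch of a nested-set (MPTT) lookup; it assumes hypothetical lft/rgt columns maintained on these_pages and fetches an entire subtree in one query, no recursion:
$pageId = 42; // hypothetical page id
$result = mysql_query("SELECT child.id, child.parent
    FROM these_pages AS parent
    JOIN these_pages AS child ON child.lft BETWEEN parent.lft AND parent.rgt
    WHERE parent.id = " . (int)$pageId . "
    ORDER BY child.lft");
while ($row = mysql_fetch_assoc($result)) {
    // every descendant of page 42, already in tree order
    MyClass::$_navArray[] = array('id' => $row['id'], 'parent' => $row['parent']);
}
mysql_free_result($result);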

for vs foreach vs while: which is faster for iterating through arrays in PHP?

Which one is the fastest for iterating through arrays in PHP? Or is there yet another construct that's faster for iterating through arrays?
Even if there is any kind of difference, that difference will be so small it won't matter at all.
If you have, say, one query to the database, it'll take so long compared to the loop iterating over the results that the eternal debate of for vs foreach vs while will not change a thing -- at least if you have a reasonable amount of data.
So, use:
whatever you like
whatever fits your programming standard
whatever is best suited for your code/application
There will be plenty of other things you could/should optimize before thinking about that kind of micro-optimization.
And if you really want some numbers (even if it's just for fun), you can run a few benchmarks and see the results in practice.
For me I pick my loop based on this:
foreach
Use when iterating through an array whose length is (or can be) unknown.
for
Use when iterating through an array whose length is set, or, when you need a counter.
while
Use when you're iterating through an array with the express purpose of finding, or triggering a certain flag.
Now it's true, you can use a for loop like a foreach loop by using count($array)... so ultimately it comes down to your personal preference/style; see the sketch below.
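For concreteness, a small sketch of that for-as-foreach pattern (assuming a zero-indexed list):
for ($i = 0, $n = count($array); $i < $n; $i++) {
    $value = $array[$i]; // equivalent to foreach ($array as $value) for a plain list
}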
In general there are no appreciable speed differences between the three constructs.
Here are benchmark results demonstrating the efficiency of various methods of iterating over an array of the integers 1 to 10,000.
Benchmark results of varying PHP versions: https://3v4l.org/a3Jn4
while $i++: 0.00077605247497559 sec
for $i++: 0.00073003768920898 sec
foreach: 0.0004420280456543 sec
while current, next: 0.024288892745972 sec
while reset, next: 0.012929201126099 sec
do while next: 0.011449098587036 sec //added after terminal benchmark
while array_shift: 0.36452603340149 sec
while array_pop: 0.013902902603149 sec
The benchmarks take the individual count() calls used with while and for into consideration:
$values = range(1, 10000);

$l = count($values);
$i = 0;
while ($i < $l) {
    $i++;
}

$l = count($values);
for ($i = 0; $i < $l; $i++) {
}

foreach ($values as $val) {
}
The examples below demonstrate less efficient ways of using while for iteration. When iterating over an array while maintaining the current position, while becomes much less efficient, because next() and current() are called on every iteration.
while ($val = current($values)) {
    next($values);
}
If the current position of the array is not important, you can call reset() or current() prior to iteration.
$value = reset($values);
while ($value) {
    $value = next($values);
}
do ... while is an alternative syntax that can be used, also in conjunction with calling reset() or current() prior to iteration, with the next() call moved to the end of the iteration.
$value = current($values);
do {
} while ($value = next($values));
array_shift can also be called during the iteration, but that negatively impacts performance greatly, due to array_shift re-indexing the array each time it is called.
while ($values) {
    array_shift($values);
}
Alternatively array_reverse can be called prior to iteration, in conjunction with calling array_pop. This will avoid the impact from re-indexing when calling array_shift.
$values = array_reverse($values);
while ($values) {
    array_pop($values);
}
In conclusion, the speed of while, for, and foreach should not be the question, but rather what is done within them to maintain positioning of the array.
Terminal tests were run on PHP 5.6.20 x64 NTS CLI.
Correctly used, while is the fastest, as it can have only one check per iteration, comparing one $i with another $max variable, and no additional calls before the loop (except setting $max) or during the loop (except $i++, which is inherent to any loop statement).
When you start misusing it (like while(list(...) = each(...))), you're better off with foreach, of course, as every explicit function call will not be as optimized as the iteration built into foreach (which is pre-optimized).
Even then, array_keys() gives you the same usability as foreach, and is still faster; see the sketch below.
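A rough sketch of that array_keys() + while pattern:
$keys = array_keys($array);
$max = count($keys);
$i = 0;
while ($i < $max) {
    $value = $array[$keys[$i]];
    // ... work with $value ...
    $i++;
}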
And beyond that, if you're into 2D arrays, a home-made 2d_array_keys helper would let you use while all the way down, much faster than nested foreach loops (the outer loop's keys get passed to the inner one instead of being rediscovered).
Besides, all questions related to first or last item of a loop using a while($i
And
while ($people_care_about_optimization !== true) {
    echo "there still exists a better way of doing it and there's no reason to use any other one";
}
Make a benchmark test.
There is no major "performance" difference, because the differences are located inside the logic.
You use foreach for array iteration without integers as keys.
You use for for array iteration with integers as keys.
etc.
Remember that prefetching a lot of mysqli results into a comfortable array can raise the question of whether it's better to use for/foreach/while to loop over that array, but that's the wrong question about a bad solution that wastes a lot of RAM.
So do not prefer this:
function funny_query_results($query) {
    $results = $GLOBALS['mysqli']->query($query);
    $rows = [];
    while ($row = $results->fetch_object()) {
        $rows[] = $row;
    }
    return $rows;
}
$rows = funny_query_results("SELECT ...");
foreach ($rows as $row) { // Uh... What should I use? foreach VS for VS while?
    echo $row->something;
}
Fetching every mysqli result one by one in a simple while loop is far better optimized:
$results = $mysqli->query("SELECT ...");
while ($row = $results->fetch_object()) {
    echo $row->something;
}

PHP - Foreach loops and resources

I'm using a foreach loop to process a large set of items, and unfortunately it's using a lot of memory (probably because it's making a copy of the array).
Apparently there is a way to save some memory with the following code: $items = &$array;
Isn't it better to use for loops instead?
And is there a way to destroy each item as soon as it has been processed in a foreach loop? E.g.:
$items = &$array;
foreach ($items as $item) {
    dosomethingwithmy($item);
    destroy($item);
}
I'm just looking for the best way to process a lot of items without running out of resources.
Try a for loop:
$keys = array_keys($array);
for ($i = 0, $n = count($keys); $i < $n; ++$i) {
    $item = &$array[$keys[$i]];
    dosomethingwithmy($item);
    destroy($item);
}
Resource-wise, your code will be more efficient if you use a for loop, instead of a foreach loop. Each iteration of your foreach loop will copy the current element in memory, which will take time and memory. Using for and accessing the current item with an index is a bit better and faster.
use this:
reset($array);
while (list($key_d, $val_d) = each($array)) {
}
because foreach creates a copy
If you are getting that large data set from a database, it can often help to try and consume the data set as soon as it comes from the database. For example from the php mysql_fetch_array documentation.
$resource = mysql_query("query");
while ($row = mysql_fetch_array($resource, MYSQL_NUM)) {
    process($row);
}
This loop will not create an in-memory copy of the entire dataset (at least not redundantly). A friend of mine sped up some of her query processing by 10x using this technique (her datasets are biological, so they can get quite large).
