This is a simple programming question, coming from my lack of knowledge of how PHP handles array copying and unsetting during a foreach loop. It's like this: I have an array that comes to me from an outside source, formatted in a way I want to change. A simple example would be:
$myData = array('Key1' => array('value1', 'value2'));
But what I want would be something like:
$myData = array(0 => array('MyKey' => array('Key1' => array('value1', 'value2'))));
So I take the first $myData and format it like the second $myData. I'm totally fine with my formatting algorithm. My question lies in finding a way to conserve memory since these arrays might get a little unwieldy. So, during my foreach loop I copy the current array value(s) into the new format, then I unset the value I'm working with from the original array. E.g.:
$formattedData = array();
foreach ($myData as $key => $val) {
// do some formatting here, copy to $reformattedVal
$formattedData[] = $reformattedVal;
unset($myData[$key]);
}
Is the call to unset() a good idea here? I.e., does it conserve memory since I have copied the data and no longer need the original value? Or, does PHP automatically garbage collect the data since I don't reference it in any subsequent code?
The code runs fine, and so far my datasets have been too negligible in size to test for performance differences. I just don't know if I'm setting myself up for some weird bugs or CPU hits later on.
Thanks for any insights.
-sR
Use a reference to the variable in the foreach loop using the & operator. This avoids making a copy of the array in memory for foreach to iterate over.
edit: as pointed out by Artefacto, unsetting the variable only decreases the number of references to the original variable, so the memory saved is only on pointers rather than the value of the variable. Bizarrely, using a reference actually increases the total memory usage, as presumably the value is copied to a new memory location instead of being referenced.
Unless the array is referenced, foreach operates on a copy of the specified array and not the array itself. foreach has some side effects on the array pointer. Don't rely on the array pointer during or after the foreach without resetting it.
Use memory_get_usage() to identify how much memory you are using.
There is a good write up on memory usage and allocation here.
This is useful test code to see memory allocation - try uncommenting the commented lines to see total memory usage in different scenarios.
echo memory_get_usage() . PHP_EOL;
$test = $testCopy = array();
$i = 0;
while ($i++ < 100000) {
$test[] = $i;
}
echo memory_get_usage() . PHP_EOL;
foreach ($test as $k => $v) {
//foreach ($test as $k => &$v) {
$testCopy[$k] = $v;
//unset($test[$k]);
}
echo memory_get_usage() . PHP_EOL;
I was running out of memory while processing lines of a text (xml) file within a loop. For anyone with a similar situation, this worked for me:
while($data = array_pop($xml_data)){
//process $data
}
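One caveat with that pattern, offered as a sketch rather than a correction of the code above: a loop conditioned on the return value of array_pop() also stops at the first element that evaluates to false (an empty string or "0", for instance). Testing the array itself sidesteps that. The sample lines below are made up for illustration, and because array_pop() consumes the array from the end, items come out in reverse order.

```php
<?php
// Hypothetical sample data; note the falsy entries '0' and ''.
$xml_data = ['<a/>', '0', '', '<b/>'];
$processed = [];

// An empty array is falsy, so the loop ends only when nothing is left.
while ($xml_data) {
    $data = array_pop($xml_data); // removes the last element, shrinking the array
    $processed[] = $data;         // stand-in for "process $data"
}
```

If the original order matters, array_reverse() can be applied once before the loop.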
Please remember the rules of Optimization Club:
The first rule of Optimization Club is, you do not Optimize.
The second rule of Optimization Club is, you do not Optimize without measuring.
If your app is running faster than the underlying transport protocol, the optimization is over.
One factor at a time.
No marketroids, no marketroid schedules.
Testing will go on as long as it has to.
If this is your first night at Optimization Club, you have to write a test case.
Rules #1 and #2 are especially relevant here. Unless you know that you need to optimize, and unless you have measured that need to optimize, then don't do it. Adding the unset will add a run-time hit and will make future programmers wonder why you are doing it.
Leave it alone.
If at any point in the "formatting" you do something like:
$reformattedVal['a']['b'] = $myData[$key];
Then doing unset($myData[$key]); is irrelevant memory-wise because you are only decreasing the reference count of the variable, which now exists in two places (inside $myData[$key] and $reformattedVal['a']['b']). Actually, you save the memory of indexing the variable inside the original array, but that's almost nothing.
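A rough way to see this for yourself, assuming a large string as the payload (the key names and the one-megabyte size are arbitrary choices for illustration). The sketch prints how memory_get_usage() changes after building the data, after assigning it into the new structure, and after the unset; the exact byte counts depend on the PHP version, so none are quoted here.

```php
<?php
$before = memory_get_usage();

// Hypothetical payload: one large string stored in the source array.
$myData = ['Key1' => str_repeat('x', 1000000)];
$afterBuild = memory_get_usage();

// Assigning the value shares the same underlying string (no byte copy).
$reformattedVal = [];
$reformattedVal['a']['b'] = $myData['Key1'];
$afterCopy = memory_get_usage();

// The string is still referenced from $reformattedVal, so it stays allocated.
unset($myData['Key1']);
$afterUnset = memory_get_usage();

printf("build: %+d bytes\n", $afterBuild - $before);
printf("copy:  %+d bytes\n", $afterCopy - $afterBuild);
printf("unset: %+d bytes\n", $afterUnset - $afterCopy);
```

The "copy" and "unset" deltas should be tiny compared with the "build" delta, which is the point made above.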
Unless you're accessing the element by reference, unsetting will do nothing whatsoever, as you can't alter the array from within the iterator.
That said, it's generally considered bad practice to modify the collection you're iterating over - a better approach would be to break down the source array into smaller chunks (by only loading a portion of the source data at a time) and process these, unsetting each entire array "chunk" as you go.
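One way to avoid touching the source array while iterating, sketched under the assumption that whatever consumes the formatted data can handle one entry at a time. The function name reformatEntries is hypothetical, and the target shape is taken from the question's example; a generator is a swapped-in technique here, not something the answers above proposed.

```php
<?php
/**
 * Hypothetical helper: yields one reformatted entry at a time instead of
 * building the whole formatted array up front.
 */
function reformatEntries(iterable $source): Generator
{
    foreach ($source as $key => $val) {
        // Same target shape as in the question: 'MyKey' wraps the original pair.
        yield ['MyKey' => [$key => $val]];
    }
}

$myData = ['Key1' => ['value1', 'value2']];

foreach (reformatEntries($myData) as $entry) {
    // Each $entry can be written out or processed here, then discarded.
    var_export($entry);
    echo PHP_EOL;
}
```

This only helps if the formatted entries do not all need to be in memory at once; if they do, the generator saves nothing over building the array directly.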
Related
How do I pre-allocate memory for an array in PHP? I want to pre-allocate space for 351k longs. The function works when I don't use the array, but if I try to save long values in the array, then it fails. If I try a simple test loop to fill up 351k values with a range(), it works. I suspect that the array is causing memory fragmentation and then running out of memory.
In Java, I can use ArrayList al = new ArrayList(351000);.
I saw array_fill and array_pad but those initialize the array to specific values.
Solution:
I used a combination of answers. Kevin's answer worked alone, but I was hoping to prevent problems in the future too as the size grows.
ini_set('memory_limit','512M');
$foundAdIds = new \SplFixedArray(100000); # google doesn't return deleted ads. must keep track and assume everything else was deleted.
$foundAdIdsIndex = 0;
// $foundAdIds = array();
$result = $gaw->getAds(function ($googleAd) use ($adTemplates, &$foundAdIds, &$foundAdIdsIndex) { // use call back to avoid saving in memory
if ($foundAdIdsIndex >= $foundAdIds->count()) $foundAdIds->setSize((int) ceil($foundAdIds->count() * 1.10)); // grow the array by 10%; setSize() expects an integer
$foundAdIds[$foundAdIdsIndex++] = $googleAd->ad->id; # save ids to know which to not set deleted
// $foundAdIds[] = $googleAd->ad->id;
PHP has an Array Class with SplFixedArray
$array = new SplFixedArray(3);
$array[1] = 'test1';
$array[0] = 'test2';
$array[2] = 'test3';
foreach ($array as $k => $v) {
echo "$k => $v\n";
}
$array[] = 'fails';
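gives
0 => test2
1 => test1
2 => test3
and the final $array[] = 'fails'; line then throws, because a fixed-size array cannot be appended to.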
As other people have pointed out, you can't do this in PHP (well, you can create an array of fixed length, but that's not really what you need). What you can do, however, is increase the amount of memory for the process.
ini_set('memory_limit', '1024M');
Put that at the top of your PHP script and you should be OK. You can also set this in the php.ini file. This does not allocate 1GB of memory to PHP, but rather allows PHP to expand its memory usage up to that point.
A couple of things to point out though:
This might not be allowed on some shared hosts
If you're using this much memory, you might need to have a look at how you're doing things and see if they can be done more efficiently
Look out for opportunities to clear out unneeded resources (do you really need to keep hold of $x that contains a huge object you've already used?) using unset($x);
The quick answer is: you can't
PHP is quite different from java.
You can make an array with specific values as you said, but you already know about them. You can 'fake' it by filling it with null values, but that's about the same to be honest.
So unless you want to just create one with array_fill and null (which is a hack in my head), you just can't.
(You might want to check your reasoning about the memory. Are you sure this isn't an XY problem? As memory is limited by a number (max usage), I don't think the fragmentation would have much effect. Check what is taking your memory rather than going down this road.)
The closest you will get is using SplFixedArray. It doesn't preallocate the memory needed to store the values (because you can't pre-specify the type of values used), but it preallocates the array slots and doesn't need to resize the array itself as you add values.
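A minimal sketch of that idea, assuming the final number of values is not known exactly: start with a fixed size, grow by a factor when the slots run out, and trim at the end. The starting size of 4 and the growth factor of 1.5 are arbitrary values chosen to keep the example small.

```php
<?php
$ids = new SplFixedArray(4); // preallocates 4 slots, all initially null
$next = 0;                   // index of the next free slot

foreach (range(1, 10) as $id) {
    if ($next >= $ids->getSize()) {
        // Out of slots: grow by 50%. setSize() expects an integer.
        $ids->setSize((int) ceil($ids->getSize() * 1.5));
    }
    $ids[$next++] = $id;
}

// Drop the unused trailing slots once all values are stored.
$ids->setSize($next);
```

For the 351k case in the question, the constructor argument would simply be the expected count, so that no resizing happens during the fill.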
I'm fairly certain the answer is no, but is it possible to insert something into an array during a foreach loop? Ideally at the very spot you are at in the array during the loop.
For example:
foreach($stock->StockData as &$stock) {
if($dateTime < $stock['DateTime']) {
// INSERT NEW RECORD AT THIS SPOT IN THE ARRAY
}
}
As I say, I'm fairly certain the answer is no, but rather than build a new array, I just thought I'd ask.
I stand corrected!
http://docstore.mik.ua/orelly/webprog/php/ch05_07.htm
It apparently is just fine to do this in PHP.
According to the reference, PHP operates on a copy of the array when you start a foreach iterator, meaning that the iterator will not be corrupted by operations on the original array within the body of the foreach!
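A small sketch of what that means in practice, assuming PHP 7 or later and a by-value foreach (the question's loop uses a reference, which behaves differently). The values are made up; array_splice() inserts a new element in front of the current position while the loop keeps walking the original sequence.

```php
<?php
$items = [1, 2, 3];
$seen = [];

foreach ($items as $i => $item) {
    $seen[] = $item;
    if ($item === 2) {
        // Insert 99 before the current element in the original variable.
        array_splice($items, $i, 0, [99]);
    }
}

// The loop visited the elements the array had when the loop started;
// the insertion is visible only in $items afterwards.
var_export($seen);
echo PHP_EOL;
var_export($items);
echo PHP_EOL;
```

Note that the index $i refers to the array as it was when the loop began, so after a first insertion the positions no longer line up and later inserts would need an offset.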
You really don't want to mutate an object that is being iterated on. It will break your iterator/loop and could possibly crash the script/program by accessing or changing memory that you don't have access to anymore, possibly because the array size has been reduced.
I'm trying to program a utility that handles a large volume of data and memory is a factor. Unfortunately each time this set of loops I have runs, it eats apx. 14MB of memory because it is executed thousands of times, even with the unset() calls (and yes I'm aware they do not clean up memory entirely, kind of why I'm asking the question). I'm wondering if there is an easier way to do this. Current working code:
$qr = array();
foreach($XML->row as $row)
{
$ra = array();
foreach($row as $key => $value)
{
$ra[$key] = $value[0];
unset($key,$value);
}
$qr[] = $ra;
unset($row,$ra);
}
unset($XML);
return $qr;
Another attempt was to do this, but it lags out. Anybody know what I'm doing wrong?
$qr = array();
while(list(,$row) = each($XML->row))
{
$ra = array();
while(list($key,$value) = each($row))
{
$ra[$key] = $value[0];
unset($key,$value);
}
$qr[] = $ra;
unset($row,$ra);
}
unset($XML);
return $qr;
Basically in the first loop, I'm just trying to do a basic array/object iteration. In the 2nd loop, I'm trying to go through each array value and get the 1st element while maintaining object/array index association. It seems I originally wrote it like this due to it being the only thing that worked (because it's looping through a SimpleXML Object). Any tips on speeding this thing up or figuring out how to make it not eat memory would be appreciated.
I'm looking for solutions for garbage collection or more efficient code. I do not plan on replacing SimpleXML as there is no need for it. More clearly, I'm looking for:
A way to iterate the SimpleXML object without needing to call the inner loop (which is only due to me doing $value[0]). Why is that necessary?
A way which is more efficient (either speed or memory-wise) for iterating through the data
If you want to use less memory, I recommend you start looking at a SAX parser. Here is an example. It is more difficult to develop a parser with SAX, but it's more efficient than SimpleXML, and you can parse big XML files with it.
Your memory load is high because SimpleXML loads the entire document into memory when parsing. So your unset() calls just decrement the reference count, and because the data still persists in memory it isn't freed. This is a consequence of working with SimpleXML: the benefit of which is that the document is in memory and represented as a PHP object.
If you want to reduce your memory usage, you need to use something else, like XMLReader or XML Parser. These are stream-based (XMLReader is a pull parser; XML Parser is SAX-style and event-based): they don't load the whole XML file into memory but walk the tree one element at a time. Since you don't appear to be using something like XPath, this is your better choice.
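A minimal sketch of the streaming approach with XMLReader, assuming the document is a flat list of row elements as in the question. The sample XML and the element names name and name2 are invented for illustration; a real file would be opened with XMLReader::open() rather than loaded from a string. Each row is expanded into a small SimpleXML element on its own, so only one row is held as an object at a time.

```php
<?php
// Hypothetical sample document standing in for the real XML file.
$xml = '<rows>'
     . '<row><name>a</name><name2>b</name2></row>'
     . '<row><name>c</name><name2>d</name2></row>'
     . '</rows>';

$reader = new XMLReader();
$reader->XML($xml);

$qr = [];
while ($reader->read()) {
    // React only to opening <row> tags; everything else is skipped.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        $row = new SimpleXMLElement($reader->readOuterXml());
        $ra = [];
        foreach ($row->children() as $key => $value) {
            $ra[$key] = (string) $value; // keep plain strings, not XML objects
        }
        $qr[] = $ra;
    }
}
$reader->close();
```

The result array still grows with the number of rows, so this reduces the parsing overhead rather than the size of the final data.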
That's not how you access data from a SimpleXML object. I see you are using index [0] to get the string contents of each part of the object and treating it as an array. It's not an array, it's an object. This is how you should access string data... Example: http://php.net/manual/en/simplexml.examples-basic.php#example-5095
Something like this will do the trick:
$qr = array();
foreach($XML->row as $row)
{
$ra = array();
$ra['name'] = (string) $row->name; // cast so a plain string is stored, not a SimpleXMLElement
$ra['name2'] = (string) $row->name2;
//Add a line for each element name, etc...
$qr[] = $ra;
unset($row,$ra);
}
unset($XML);
return $qr;
It will also get rid of your inner loop and save you memory.
As taken from https://stackoverflow.com/questions/4891301/top-bad-practices-in-php
Is this similar code killing kittens, too?
foreach (file("longFile.txt") as $line) {
// ouch! Kitten killed
}
??
For those who have no idea what I am talking about:
Does PHP read longFile.txt again every time it moves to the next line of the file, or not? I'm talking about this code:
foreach (file("longFile.txt") as $line) {
// something
}
In the linked question the for loop incurs a performance hit by calling count on every iteration. foreach uses internal pointers to iterate through the array passed to it.
In your example file() will be called once and the resulting array will be passed to foreach which will iterate through it, thus the kittens are saved. Happy Caturday!
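This is easy to check with a stand-in for file() that counts how often it is called. The closure $loadLines and its sample lines are hypothetical and exist only to make the call count observable.

```php
<?php
$calls = 0;

// Hypothetical stand-in for file(): returns an array of lines and counts calls.
$loadLines = function () use (&$calls) {
    $calls++;
    return ["first\n", "second\n", "third\n"];
};

$seen = 0;
foreach ($loadLines() as $line) {
    $seen++; // the loop body runs once per line
}

printf("lines seen: %d, loader calls: %d\n", $seen, $calls);
```

The expression after foreach is evaluated a single time before iteration starts, unlike the condition of a for loop, which is re-evaluated on every pass.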
It shouldn't be killing any kittens, since, in order for PHP to get the following line of the file, it has to know the position of the file pointer since the previous line that was pulled off. You are only advancing the iterator, which maintains a reference to the file object.
Also, it's bad practice to be opening a file like that; you should have a variable to store it in and close it when you're done.
You want to use a Duff's Device to unroll the loop: http://de.wikipedia.org/wiki/Duff%E2%80%99s_Device. This would be faster than foreach and faster than a for loop without using count() on each iteration, and it would be faster than concatenating and rtrim-ing the string, but about the same as using implode().
Is that file really large? Consider:
foreach(new SplFileObject($path) as $line) {
// neither kill puppies nor kittens
}
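For comparison, a sketch of the same line-by-line reading with an explicit file handle that is stored in a variable and closed when done. The function name readLines is hypothetical, and the temporary file exists only so the example can run on its own.

```php
<?php
/**
 * Hypothetical helper: yields the lines of a file one at a time,
 * without loading the whole file into an array first.
 */
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n"); // strip the trailing line break
        }
    } finally {
        fclose($handle); // runs even if the consumer stops early
    }
}

// Throwaway sample file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "one\ntwo\nthree\n");

$lines = [];
foreach (readLines($path) as $line) {
    $lines[] = $line;
}
unlink($path);
```

Unlike file(), this keeps only the current line in memory, which matters mainly when the file is large.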
foreach always works on the concrete iterator. If you're passing an array:
Unless the array is referenced, foreach operates on a copy of the specified array and not the array itself. (ref)
So neither the array expression nor the function call will be executed again each time the foreach steps ahead.
Related: How to store and reset a PHP array pointer?
No. There are many reasons why kittens die, but foreach loops are not one of them.
Which one is the fastest for iterating through arrays in PHP? Or does another exist which is even faster for iterating through arrays?
Even if there is any kind of difference, that difference will be so small it won't matter at all.
If you have, say, one query to the database, it'll take so long compared to the loop iterating over the results that the eternal debate of for vs foreach vs while will not change a thing -- at least if you have a reasonable amount of data.
So, use:
whatever you like
whatever fits your programming standard
whatever is best suited for your code/application
There will be plenty of other things you could/should optimize before thinking about that kind of micro-optimization.
And if you really want some numbers (even if it's just for fun), you can make some benchmark and see the results in practice.
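A rough sketch of such a benchmark, assuming wall-clock timing with microtime() is precise enough for a comparison done for fun. The helper name timeRuns, the array size, and the run count are arbitrary choices, and the numbers will differ between machines and PHP versions.

```php
<?php
/**
 * Hypothetical helper: returns the wall-clock seconds taken by $runs calls of $fn.
 */
function timeRuns(callable $fn, int $runs = 100): float
{
    $start = microtime(true);
    for ($r = 0; $r < $runs; $r++) {
        $fn();
    }
    return microtime(true) - $start;
}

$values = range(1, 10000);

$results = [
    'for' => timeRuns(function () use ($values) {
        $sum = 0;
        for ($i = 0, $n = count($values); $i < $n; $i++) {
            $sum += $values[$i];
        }
        return $sum;
    }),
    'foreach' => timeRuns(function () use ($values) {
        $sum = 0;
        foreach ($values as $value) {
            $sum += $value;
        }
        return $sum;
    }),
    'while' => timeRuns(function () use ($values) {
        $sum = 0;
        $i = 0;
        $n = count($values);
        while ($i < $n) {
            $sum += $values[$i++];
        }
        return $sum;
    }),
];

foreach ($results as $name => $seconds) {
    printf("%-8s %.6f sec\n", $name, $seconds);
}
```

Each loop does the same small amount of work per element, so the timings compare the loop constructs rather than the loop bodies.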
For me I pick my loop based on this:
foreach
Use when iterating through an array whose length is (or can be) unknown.
for
Use when iterating through an array whose length is set, or, when you need a counter.
while
Use when you're iterating through an array with the express purpose of finding, or triggering a certain flag.
Now it's true, you can use a FOR loop like a FOREACH loop, by using count($array)... so ultimately it comes down to your personal preference/style.
In general there is no appreciable speed difference between the three constructs.
Here are benchmark results demonstrating the efficiency of various methods of iterating over an array from 1 to 10,000.
Benchmark results of varying PHP versions: https://3v4l.org/a3Jn4
while $i++: 0.00077605247497559 sec
for $i++: 0.00073003768920898 sec
foreach: 0.0004420280456543 sec
while current, next: 0.024288892745972 sec
while reset, next: 0.012929201126099 sec
do while next: 0.011449098587036 sec //added after terminal benchmark
while array_shift: 0.36452603340149 sec
while array_pop: 0.013902902603149 sec
Takes into consideration individual calls for count with while and for
$values = range(1, 10000);
$l = count($values);
$i = 0;
while($i<$l){
$i++;
}
$l = count($values);
for($i=0;$i<$l;$i++){
}
foreach($values as $val){
}
The below examples using while, demonstrate how it would be used less efficiently during iteration.
When functionally iterating over an array and maintaining the current position, while becomes much less efficient, as next() and current() are called during the iteration.
while($val = current($values)){
next($values);
}
If the current positioning of the array is not important, you can call reset() or current() prior to iteration.
$value = reset($values);
while ($value) {
$value = next($values);
}
do ... while is an alternative syntax that can be used, also in conjunction with calling reset() or current() prior to iteration and by moving the next() call to the end of the iteration.
$value = current($values);
do{
}while($value = next($values));
array_shift can also be called during the iteration, but that negatively impacts performance greatly, due to array_shift re-indexing the array each time it is called.
while($values){
array_shift($values);
}
Alternatively array_reverse can be called prior to iteration, in conjunction with calling array_pop. This will avoid the impact from re-indexing when calling array_shift.
$values = array_reverse($values);
while($values) {
array_pop($values);
}
In conclusion, the speed of while, for, and foreach should not be the question, but rather what is done within them to maintain positioning of the array.
The terminal tests were run on PHP 5.6.20 x64 NTS CLI.
Correctly used, while is the fastest, as it can have only one check for every iteration, comparing one $i with another $max variable, and no additional calls before loop (except setting $max) or during loop (except $i++; which is inherently done in any loop statement).
When you start misusing it (like while(list..) ) you're better off with foreach of course, as every function call will not be as optimized as the one included in foreach (because that one is pre-optimized).
Even then, array_keys() gives you the same usability as foreach, still faster.
And beyond that, if you're into 2d-arrays, a home-made 2d_array_keys will enable you to use while all the way in a much faster way than foreach can be used (just try and tell the next foreach within the first foreach, that the last foreach had <2d_array_keys($array)> as keys --- ).
Besides, all questions related to first or last item of a loop using a while($i
And
while ($people_care_about_optimization!==true){
echo "there still exists a better way of doing it and there's no reason to use any other one";
}
Make a benchmark test.
There is no major "performance" difference, because the differences are located inside the logic.
You use foreach for array iteration,
without integers as keys.
You use for for array iteration with
integers as keys.
etc.
Remember that prefetching a lot of mysqli_result rows into a convenient array can raise the question of whether it is better to use for/foreach/while to cycle through that array, but that is the wrong question about a bad solution that wastes a lot of RAM.
So do not prefer this:
function funny_query_results($query) {
$results = $GLOBALS['mysqli']->query($query);
$rows = [];
while( $row = $results->fetch_object() ) {
$rows[] = $row; // store the fetched row, not the result object
}
return $rows;
}
$rows = funny_query_results("SELECT ...");
foreach($rows as $row) { // Uh... What should I use? foreach VS for VS while?
echo $row->something;
}
Fetching each row directly from the mysqli_result in a simple while loop is far more efficient:
$results = $mysqli->query("SELECT ...");
while( $row = $results->fetch_object() ) {
echo $row->something;
}