PHP - Foreach loops and resources

I'm using a foreach loop to process a large set of items, but unfortunately it's using a lot of memory (probably because it's making a copy of the array).
Apparently there is a way to save some memory with the following code: $items = &$array;
Isn't it better to use for loops instead?
And is there a way to destroy each item as soon as it has been processed in a foreach loop?
e.g.
$items = &$array;
foreach($items as $item)
{
dosomethingwithmy($item);
destroy($item);
}
I'm just looking for the best way to process a lot of items without running out of resources.

Try a for loop:
$keys = array_keys($array);
for ($i=0, $n=count($keys); $i<$n; ++$i) {
$item = &$array[$keys[$i]];
dosomethingwithmy($item);
destroy($item);
}

Resource-wise, your code will be more efficient if you use a for loop instead of a foreach loop. Each iteration of a foreach loop copies the current element, which costs time and memory. Using for and accessing the current item by index is a bit better and faster.
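Whether foreach actually costs extra memory depends on the PHP version and on whether the array is modified inside the loop (PHP arrays are copy-on-write), so it is worth measuring rather than assuming. A minimal sketch, assuming a plain integer array stands in for the real items:

```php
<?php
// Stand-in data; the real items would likely be larger structures.
$array = range(1, 200000);

$before = memory_get_usage();

// Highest memory reading observed while iterating with foreach (by value).
$insideForeach = $before;
foreach ($array as $item) {
    $insideForeach = max($insideForeach, memory_get_usage());
}

// Highest memory reading observed while iterating with an indexed for loop.
$insideFor = $before;
for ($i = 0, $n = count($array); $i < $n; ++$i) {
    $item = $array[$i];
    $insideFor = max($insideFor, memory_get_usage());
}

printf(
    "foreach: +%d bytes, for: +%d bytes\n",
    $insideForeach - $before,
    $insideFor - $before
);
```

Running this on the PHP version you deploy on shows how much, if anything, each loop style adds on top of the array itself.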

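Use this:
reset($array);
while(list($key_d, $val_d) = each($array)){
}
because foreach creates a copy. Note that each() was deprecated in PHP 7.2 and removed in PHP 8.0, so this only works on older versions.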

If you are getting that large data set from a database, it often helps to consume the data as soon as it comes from the database. For example, from the PHP mysql_fetch_array documentation:
$resource = mysql_query("query");
while ($row = mysql_fetch_array($resource, MYSQL_NUM)) {
process($row);
}
This loop will not create an in-memory copy of the entire dataset (at least not redundantly). A friend of mine sped up some of her query processing by 10x using this technique (her datasets are biological, so they can get quite large).

Related

Laravel importer routine slows down over time

I have this routine that fetches some data from a web service and stores it in my database. This data has 20k+ items. To save them into the database, I have to retrieve some information first and then store them. So I have this foreach loop that runs 20k+ times, performing a read and a write to the database each time.
But this approach slows down over time. It takes more than an hour to finish!
I've disabled the query log (DB::disableQueryLog()) but I didn't notice any gain in performance.
Here's my code:
$data = API::getItems();
foreach ($data as $item) {
$otherItem = OtherItem::where('something', $item['something'])->first();
if (!is_null($otherItem)) {
Item::create([
...
]);
}
}
As a solution I decided to pre-fetch all the OtherItem records into a collection, and it solved the problem:
$data = API::getItems();
$otherItems = OtherItem::all();
foreach ($data as $item) {
$otherItem = $otherItems->where('something', $item['something'])->first();
if (!is_null($otherItem)) {
Item::create([
...
]);
}
}
But I want to understand why the first approach slows down drastically over time and what the best way to do this sort of thing is.
EDIT:
To clarify:
I know that doing 20k queries is not performant and, in this case, performance is not important (unless it takes hours instead of minutes). I will only run this routine now and then during development. My final approach was a mix of both answers (I hadn't thought of buffering the items and inserting them in batches).
Here's the code for anyone interested:
$data = collect(API::getPrices());
$chunks = $data->chunk(500);
$otherItems = OtherItem::all();
foreach ($chunks as $items) {
$buffer = [];
foreach ($items as $item) {
$otherItem = $otherItems->where('something', $item['something'])->first();
if (!is_null($otherItem)) {
$buffer[] = [
...
];
}
}
Item::insert($buffer);
}
So, what is bothering me is why it is painfully slow (even with all the queries). I've decided to do some benchmarking to analyse the question further.
With the two-query approach I get the following results:
For 6000 loops:
Max read: 11.5232 s
Min read: 0.0044 s
Mean read: 0.3196 s
Max write: 0.9133 s
Min write: 0.0007 s
Mean write: 0.0085 s
Every 10-20 iterations the read time goes up to over a second for 2-3 iterations, which is weird, and I have no idea why.
Just out of curiosity, I've also benchmarked the difference between chunking and buffering the items before inserting them into the DB:
without buffering: 1115.4 s (18 min 35 s)
chunking and buffering: 1064.7 s (17 min 45 s)
In the first code snippet, you're creating 40000 queries for 20000 items. That's two queries per item: the first gets the data, the second stores something.
The second code snippet creates 20001 queries, and it's a very slow solution too.
You can build an array and use insert() instead of calling the create() method each time you want to store some data. This code will create just 2 queries instead of 40000 or 20001.
$otherItems = OtherItem::all();
$items = [];
foreach ($data as $item) {
$otherItem = $otherItems->where('something', $item['something'])->first();
if (!is_null($otherItem)) {
$items[] = [.....];
}
}
Item::insert($items);
It slows down because there are simply so many queries, and each one is a round trip to the database.
Another thing you can do is try chunking the inserts with database transactions. Play around with the exact numbers but try inserting in batches of a few hundred or so.
i.e.
start transaction
loop over chunk, performing inserts
commit
repeat for next chunk until no chunks remain
Laravel's ORM provides a chunk method for this kind of use case.
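The transaction-per-chunk steps above can be sketched in plain PHP as follows. This is only an illustration under assumptions: it uses PDO with an in-memory SQLite database (so the pdo_sqlite extension must be enabled) instead of Laravel, and the items table, its something column, and the generated data are all made up.

```php
<?php
// Assumes pdo_sqlite is available; table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, something TEXT)');

// Stand-in for the data returned by the web service.
$data = [];
for ($i = 0; $i < 1200; $i++) {
    $data[] = ['something' => "value-$i"];
}

// Prepare once, execute many times.
$stmt = $pdo->prepare('INSERT INTO items (something) VALUES (:something)');

foreach (array_chunk($data, 500) as $chunk) {
    $pdo->beginTransaction();          // start transaction
    try {
        foreach ($chunk as $row) {     // loop over chunk, performing inserts
            $stmt->execute([':something' => $row['something']]);
        }
        $pdo->commit();                // commit
    } catch (Throwable $e) {
        $pdo->rollBack();              // undo the partial chunk on failure
        throw $e;
    }
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM items')->fetchColumn();
echo $count, "\n";
```

The chunk size of 500 is just a starting point; as suggested above, the exact number is worth experimenting with.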

My PHP nested loops use way too much memory

I'm trying to program a utility that handles a large volume of data, and memory is a factor. Unfortunately, each time this set of loops runs, it eats approx. 14MB of memory because it is executed thousands of times, even with the unset() calls (and yes, I'm aware they do not clean up memory entirely, which is kind of why I'm asking the question). I'm wondering if there is an easier way to do this. Current working code:
$qr = array();
foreach($XML->row as $row)
{
$ra = array();
foreach($row as $key => $value)
{
$ra[$key] = $value[0];
unset($key,$value);
}
$qr[] = $ra;
unset($row,$ra);
}
unset($XML);
return $qr;
Another attempt was to do this, but it lags out. Anybody know what I'm doing wrong?
$qr = array();
while(list(,$row) = each($XML->row))
{
$ra = array();
while(list($key,$value) = each($row))
{
$ra[$key] = $value[0];
unset($key,$value);
}
$qr[] = $ra;
unset($row,$ra);
}
unset($XML);
return $qr;
Basically in the first loop, I'm just trying to do a basic array/object iteration. In the 2nd loop, I'm trying to go through each array value and get the 1st element while maintaining object/array index association. It seems I originally wrote it like this due to it being the only thing that worked (because it's looping through a SimpleXML Object). Any tips on speeding this thing up or figuring out how to make it not eat memory would be appreciated.
I'm looking for solutions for garbage collection or more efficient code. I do not plan on replacing SimpleXML as there is no need for it. More clearly, I'm looking for:
A way to iterate the SimpleXML object without needing to call the inner loop (which is only due to me doing $value[0]). Why is that necessary?
A way which is more efficient (either speed or memory-wise) for iterating through the data
If you want to use less memory, I recommend you start looking at a SAX parser. Here is an example. It is more difficult to develop a parser with SAX, but it's more efficient than SimpleXML, and you can parse big XML files with it.
Your memory load is high because SimpleXML loads the entire document into memory when parsing. So your unset() calls just decrement the reference count, and because the data still persists in memory it isn't freed. This is a consequence of working with SimpleXML: the benefit of which is that the document is in memory and represented as a PHP object.
If you want to reduce your memory usage, you need to use something else, like XMLReader or XML Parser. These are stream-based (XML Parser is SAX-style and event-driven, XMLReader is a pull parser), so they don't load the whole XML file into memory but walk the document one node at a time. Since you don't appear to be using something like XPath, this is your better choice.
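A minimal sketch of the XMLReader approach, assuming a document whose repeated element is called row (as in the question) with made-up child elements name and name2. Each row is expanded into a small SimpleXML object only while it is being processed:

```php
<?php
// Sample document; the child element names are assumptions for illustration.
$xml = '<rows>'
     . '<row><name>a</name><name2>b</name2></row>'
     . '<row><name>c</name><name2>d</name2></row>'
     . '</rows>';

$reader = new XMLReader();
$reader->XML($xml); // for a file on disk, $reader->open($path) streams it instead

$qr = [];
while ($reader->read()) {
    // React only to the opening tag of each <row> element.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        // Only this single row is materialised as a SimpleXML object.
        $row = new SimpleXMLElement($reader->readOuterXml());
        $ra = [];
        foreach ($row->children() as $key => $value) {
            $ra[$key] = (string) $value; // store plain strings, not XML objects
        }
        $qr[] = $ra;
    }
}
$reader->close();

print_r($qr);
```

Note that $qr itself still grows with the number of rows; if memory is the main concern, processing each row inside the loop instead of collecting them all keeps usage flat.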
That's not how you access data from a SimpleXML object. I see you are using index [0] to get the string contents of each part of the object and treating it as an array. It's not an array, it's an object. This is how you should access string data: http://php.net/manual/en/simplexml.examples-basic.php#example-5095
Something like this will do the trick:
$qr = array();
foreach($XML->row as $row)
{
$ra = array();
$ra['name'] = (string) $row->name; // cast so a string is stored, not a SimpleXMLElement
$ra['name2'] = (string) $row->name2;
//Add a line for each element name, etc...
$qr[] = $ra;
unset($row,$ra);
}
unset($XML);
return $qr;
It will also get rid of your inner loop and save you memory.
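As for why the [0] seemed necessary: reading a child element from a SimpleXML object returns another SimpleXMLElement rather than a string, so some conversion is needed before storing it. A small sketch with a made-up one-row document shows the difference:

```php
<?php
// Made-up sample; element names are assumptions.
$XML = simplexml_load_string('<rows><row><name>a</name></row></rows>');

foreach ($XML->row as $row) {
    $raw  = $row->name;          // still a SimpleXMLElement tied to the document
    $text = (string) $row->name; // plain PHP string, independent of the document

    var_dump($raw instanceof SimpleXMLElement); // bool(true)
    var_dump($text);                            // string(1) "a"
}
```

Storing the cast strings instead of the element objects also means the result array no longer holds references into the parsed document.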

Initiating the same loop with either a while or foreach statement

I have code in php such as the following:
while($r = mysql_fetch_array($q))
{
// Do some stuff
}
where $q is a query retrieving a set of group members. However, certain groups have their members saved in memcached, and that memcached value is stored in an array as $mem_entry. To run through that, I'd normally do the following:
foreach($mem_entry as $k => $r)
{
// Do some stuff
}
Here's the problem. I don't want to have two blocks of identical code (the //do some stuff section) nested in two different loops just because in one case I have to use mysql for the loop and the other memcached. Is there some way to toggle starting off the loop with the while or foreach? In other words, if $mem_entry has a non-blank value, the first line of the loop will be foreach($mem_entry as $k => $r), or if it's empty, the first line of the loop will be while($r = mysql_fetch_array($q))
Edit
Well, pretty much a few seconds after I wrote this I came up with the solution. I figured I'd leave this up for anyone else who might come upon this problem. I first set the value of $members to the memcached value. If that's blank, I run the mysql query and use a while loop to transfer all the records to an array called $members. I then start the loop using foreach($members as $k => $r). Basically, I'm using a foreach loop every time, but the value of $members is set differently based on whether or not a value for it exists in memcached.
Why not just refactor out doSomeStuff() as a function which gets called from within each loop. Yes, you'll need to see if this results in a performance hit, but unless that's significant, this is a simple approach to avoiding code repetition.
If there's a way to toggle as you suggest, I don't know of it.
Not the ideal solution, but I will give you my 2 cents. The ideal would have been to call a function, but if you don't want to do that, then you can try something like this:
if(!isset($mem_entry)){
$mem_entry = array();
while($r = mysql_fetch_array($q))
{
$mem_entry[] = $r;
}
}
The idea is to just use the foreach loop to do the actual work, if there is nothing in memcache then fill your mem_entry array with stuff from mysql and then feed it to your foreach loop.
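A variation on the same idea, swapped in here as an alternative rather than anything the answers above proposed, is a generator: it lets a single foreach consume either source without first copying all database rows into an array. In this sketch the $fetch callable is a hypothetical stand-in for the database cursor and is assumed to return null once no rows remain.

```php
<?php
// Yields members from the cached array if present, otherwise row by row
// from $fetch (hypothetical: returns the next row, or null at the end).
function members(?array $memEntry, callable $fetch): Generator
{
    if (!empty($memEntry)) {
        yield from $memEntry;
        return;
    }
    while (($row = $fetch()) !== null) {
        yield $row;
    }
}

// Stand-in for a database result set.
$dbRows = [['id' => 1], ['id' => 2]];
$fetch = function () use (&$dbRows) {
    return array_shift($dbRows); // null when the array is empty
};

$seen = [];
foreach (members(null, $fetch) as $r) {            // nothing cached: uses $fetch
    $seen[] = $r['id'];
}
foreach (members([['id' => 9]], $fetch) as $r) {   // cached: $fetch is never called
    $seen[] = $r['id'];
}

print_r($seen);
```

The loop body is written once, and the choice between the two sources happens inside members().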

for vs foreach vs while which is faster for iterating through arrays in php

Which one is the fastest for iterating through arrays in PHP? Or does another one exist that is even faster for iterating through arrays?
Even if there is any kind of difference, that difference will be so small it won't matter at all.
If you have, say, one query to the database, it'll take so long compared to the loop iterating over the results that the eternal debate of for vs foreach vs while will not change a thing -- at least if you have a reasonable amount of data.
So, use :
whatever you like
whatever fits your programming standard
whatever is best suited for your code/application
There will be plenty of other things you could/should optimize before thinking about that kind of micro-optimization.
And if you really want some numbers (even if it's just for fun), you can make some benchmark and see the results in practice.
For me I pick my loop based on this:
foreach
Use when iterating through an array whose length is (or can be) unknown.
for
Use when iterating through an array whose length is set, or, when you need a counter.
while
Use when you're iterating through an array with the express purpose of finding, or triggering a certain flag.
Now it's true, you can use a FOR loop like a FOREACH loop, by using count($array)... so ultimately it comes down to your personal preference/style.
In general there is no appreciable speed difference between the three constructs.
The following benchmark results demonstrate the efficiency of various methods of iterating over an array of the values 1 to 10,000.
Benchmark results of varying PHP versions: https://3v4l.org/a3Jn4
while $i++: 0.00077605247497559 sec
for $i++: 0.00073003768920898 sec
foreach: 0.0004420280456543 sec
while current, next: 0.024288892745972 sec
while reset, next: 0.012929201126099 sec
do while next: 0.011449098587036 sec //added after terminal benchmark
while array_shift: 0.36452603340149 sec
while array_pop: 0.013902902603149 sec
The timings take into account the separate count() calls for while and for:
$values = range(1, 10000);
$l = count($values);
$i = 0;
while($i<$l){
$i++;
}
$l = count($values);
for($i=0;$i<$l;$i++){
}
foreach($values as $val){
}
The examples below show how while can be used less efficiently during iteration.
When iterating over an array while maintaining the current position, while becomes much less efficient, because next() and current() are called during the iteration.
while($val = current($values)){
next($values);
}
If the current positioning of the array is not important, you can call reset() or current() prior to iteration.
$value = reset($values);
while ($value) {
$value = next($values);
}
do ... while is an alternative syntax that can be used, also in conjunction with calling reset() or current() prior to iteration and by moving the next() call to the end of the iteration.
$value = current($values);
do{
}while($value = next($values));
array_shift can also be called during the iteration, but that negatively impacts performance greatly, due to array_shift re-indexing the array each time it is called.
while($values){
array_shift($values);
}
Alternatively array_reverse can be called prior to iteration, in conjunction with calling array_pop. This will avoid the impact from re-indexing when calling array_shift.
$values = array_reverse($values);
while($values) {
array_pop($values);
}
In conclusion, the speed of while, for, and foreach should not be the question, but rather what is done within them to maintain positioning of the array.
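To reproduce this kind of comparison locally, a small timing harness along these lines can be used. It is only a sketch: timeIt() is a helper made up for this example, it relies on hrtime() (PHP 7.3+), and the absolute numbers will vary by machine and PHP version.

```php
<?php
// Runs $fn several times and returns the best wall-clock time in seconds.
function timeIt(callable $fn, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true); // monotonic clock, nanoseconds
        $fn();
        $best = min($best, (hrtime(true) - $start) / 1e9);
    }
    return $best;
}

$values = range(1, 10000);

// Each closure captures $values by value, so every call starts from a full array.
$results = [
    'for' => timeIt(function () use ($values) {
        for ($i = 0, $l = count($values); $i < $l; $i++) {
            $v = $values[$i];
        }
    }),
    'foreach' => timeIt(function () use ($values) {
        foreach ($values as $v) {
        }
    }),
    'while array_shift' => timeIt(function () use ($values) {
        while ($values) {
            array_shift($values);
        }
    }),
];

foreach ($results as $name => $seconds) {
    printf("%-18s %.6f sec\n", $name, $seconds);
}
```

Taking the best of several runs reduces noise from other processes, which matters when the measured times are well under a millisecond.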
Terminal Tests run on PHP 5.6.20 x64 NTS CLI:
Correctly used, while is the fastest, as it can have only one check for every iteration, comparing one $i with another $max variable, and no additional calls before loop (except setting $max) or during loop (except $i++; which is inherently done in any loop statement).
When you start misusing it (like while(list..) ) you're better off with foreach of course, as every function call will not be as optimized as the one included in foreach (because that one is pre-optimized).
Even then, array_keys() gives you the same usability as foreach, still faster.
And beyond that, if you're into 2d-arrays, a home-made 2d_array_keys will enable you to use while all the way, in a much faster way than foreach can be used (just try and tell the next foreach within the first foreach, that the last foreach had <2d_array_keys($array)> as keys --- ).
Besides, all questions related to first or last item of a loop using a while($i
And
while ($people_care_about_optimization!==true){
echo "there still exists a better way of doing it and there's no reason to use any other one";
}
Make a benchmark test.
There is no major "performance" difference, because the differences are located inside the logic.
You use foreach for array iteration without integers as keys.
You use for for array iteration with integers as keys.
etc.
Remember that prefetching a large mysqli_result into a convenient array can raise the question of whether it is better to use for, foreach, or while to cycle through that array, but that is the wrong question about a bad solution that wastes a lot of RAM.
So do not prefer this:
function funny_query_results($query) {
$results = $GLOBALS['mysqli']->query($query);
$rows = [];
while( $row = $results->fetch_object() ) {
$rows[] = $row;
}
return $rows;
}
$rows = funny_query_results("SELECT ...");
foreach($rows as $row) { // Uh... What should I use? foreach VS for VS while?
echo $row->something;
}
The direct way, fetching each row from the mysqli_result one by one in a simple while loop, is a lot more optimized:
$results = $mysqli->query("SELECT ...");
while( $row = $results->fetch_object() ) {
echo $row->something;
}

The difference between loops

It's about PHP but I've no doubt many of the same comments will apply to other languages.
Simply put, what are the differences in the different types of loop for PHP? Is one faster/better than the others or should I simply put in the most readable loop?
for ($i = 0; $i < 10; $i++)
{
# code...
}
foreach ($array as $index => $value)
{
# code...
}
do
{
# code...
}
while ($flag == false);
For loops and while loops are entry-condition loops. They evaluate the condition first, so the statement block associated with the loop won't run even once if the condition is not met.
The statements inside this for loop block will run 10 times; the value of $i will go from 0 to 9:
for ($i = 0; $i < 10; $i++)
{
# code...
}
Same thing done with while loop:
$i = 0;
while ($i < 10)
{
# code...
$i++;
}
A do-while loop is an exit-condition loop. It's guaranteed to execute once, then it evaluates the condition before repeating the block:
do
{
# code...
}
while ($flag == false);
foreach is used to access array elements from start to end. At the beginning of the foreach loop, the internal pointer of the array is set to the first element of the array, in the next step it is set to the 2nd element of the array, and so on until the array ends. In the loop block, the value of the current array item is available as $value and the key of the current item is available as $index.
foreach ($array as $index => $value)
{
# code...
}
You could do the same thing with while loop, like this
while (key($array) !== null) // key() is null past the end; testing current() would also stop on falsy values
{
$index = key($array); // to get key of the current element
$value = $array[$index]; // to get value of current element
# code ...
next($array); // advance the internal array pointer of $array
}
And lastly: The PHP Manual is your friend :)
This is CS101, but since no one else has mentioned it, while loops evaluate their condition before the code block, and do-while evaluates after the code block, so do-while loops are always guaranteed to run their code block at least once, regardless of the condition.
PHP Benchmarks
@brendan:
The article you cited is seriously outdated and the information is just plain wrong. Especially the last point (use for instead of foreach) is misleading and the justification offered in the article no longer applies to modern versions of .NET.
While it's true that the IEnumerator uses virtual calls, these can actually be inlined by a modern compiler. Furthermore, .NET now knows generics and strongly typed enumerators.
There are a lot of performance tests out there that prove conclusively that for is generally no faster than foreach. Here's an example.
I use the first loop when iterating over a conventional (indexed?) array and the foreach loop when dealing with an associative array. It just seems natural and helps the code flow and be more readable, in my opinion. As for do...while loops, I use those when I have to do more than just flip through an array.
I'm not sure of any performance benefits, though.
Performance is not significantly better in either case. While is useful for more complex tasks than iterating, but for and while are functionally equivalent.
Foreach is nice, but has one important caveat: you can't modify the enumerable you're iterating. So no removing, adding or replacing entries to/in it. Modifying entries (like changing their properties) is OK, of course.
With a foreach loop, a copy of the original array is made in memory to use inside. You shouldn't use them on large structures; a simple for loop is a better choice. You can use a while loop more efficiently on a large non-numerically indexed structure like this:
while(list($key, $value) = each($array)) {
But that approach is particularly ugly for a simple small structure.
while loops are better suited for looping through streams, or as in the following example that you see very frequently in PHP:
while ($row = mysql_fetch_array($result)) {
Almost all of the time the different loops are interchangeable, and it will come down to either a) efficiency, or b) clarity.
If you know the efficiency trade-offs of the different types of loops, then yes, to answer your original question: use the one that looks the most clean.
Each looping construct serves a different purpose.
for - This is used to loop for a specific number of iterations.
foreach - This is used to loop through all of the values in a collection.
while - This is used to loop until you meet a condition.
Of the three, "while" will most likely provide the best performance in most situations. Of course, if you do something like the following, you are basically rewriting the "for" loop (which in c# is slightly more performant).
$count = 0;
do
{
...
$count++;
}
while ($count < 10);
They all have different basic purposes, but they can also be used in somewhat the same way. It completely depends on the specific problem that you are trying to solve.
With a foreach loop, a copy of the original array is made in memory to use inside.
Foreach is nice, but has one important caveat: you can't modify the enumerable you're iterating.
Both of those won't be a problem if you pass by reference instead of value:
foreach ($array as &$value) {
I think this has been allowed since PHP 5.
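A short sketch of iterating by reference, including the unset() that is easy to forget afterwards (the sample array is made up):

```php
<?php
$array = [1, 2, 3];

foreach ($array as &$value) {
    $value *= 2; // writes through to the original element
}
// $value still references the last element here; break the link so a
// later assignment to $value cannot silently change $array[2].
unset($value);

print_r($array);
```

Without the unset(), reusing $value in a later loop over the same array is a well-known source of corrupted last elements.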
When accessing the elements of an array, for clarity I would use a foreach whenever possible, and only use a for if you need the actual index values (for example, the same index in multiple arrays). This also minimizes the chance of typos, since for loops make those all too easy. In general, PHP might not be the place to be worrying too much about performance. And last but not least, for and foreach have (or should have; I'm not a PHP-er) the same Big-O time (O(n)), so you are possibly looking at a small amount of extra memory usage or a slight constant or linear hit in time.
In regards to performance, a foreach is more consuming than a for
http://forums.asp.net/p/1041090/1457897.aspx
