Recursion develops spikes in memory usage

Recursion develops spikes in memory usage - php

function rec_func($x) {
echo $x;
if ($x < 1000) {
$x++;
rec_func(&$x);
}
}
This is not the actual recursive function, that I am going to be using for my PHP script. Just like this example, my real function works perfectly fine, up to the point where I start examining how much memory is it using.
If I loop this function, say, a 100 times, nothing bad happens. The used memory in each iteration stays the same. But If I loop 1000 times and more, at one point this function will start consuming more memory than before, right up to the last iteration.
The amount of spikes seem to depend on how big is the variable I give it, and how many iterations it goes through, but I can't be certain. And this is where the problem begins. This function will start consuming more memory just once in a 1000 times. My function, however, does this every 150 or so times in 1000 iterations, and the worst thing is that MY function may have to loop even more times than that.
This made me curious, because the obvious thing to do would be to use a traditional loop instead (which I have done), but I really want to understand why is this happening, and if it can be avoided.
P.S. note that I'm using a real variable only in the first iteration and all of the iterations following it are using a reference, to preserve memory. Thanks in advance.

(correct me if I am wrong).
I think, it is related with your PHP module memory handling: It does reserve some memory when it starts running (if it is used as CGI for example, or you call it from a command prompt).
Every time when you start a recursion, it will do two things basically:
Before calling the next level of recursion, saves the current status to the call stack. If you are not manipulating with references, then the variable data as well, with the current stack state. As long as the reserved memory is enough, your memory consumption will not be increased, but when you run out of memory, a new bunch of memory will be reserved, in order if more will be needed.
It means the following:
function rec_func($x) {
echo $x;
if ($x < 1000) {
$x++;
rec_func($x);
}
}
This will have a significant bigger memory usage, as every time a recursion occures, there will be an extra line in the stack, and you will create an extra $x in the memory to store it's current state, in order to reserve the value when the recursion get's back to the current stack call.
The following lines will get rid of the memory usage problem - it manipulates with variable references (obviously ruins the functionality in this case:
function rec_func(&$x) {
echo $x;
if ($x < 1000) {
$x++;
rec_func($x);
}
}
(your original code causes fatal error due to call-time reference passing).
In case if you want to reduce the memory usage, you should (or can) use references in the function declaration for variables, which will not change during the recursion, but never do that with variables, that change meanwhile.
You can try it out yourself by playing with debug_zval_dump(), like this:
<?php
function rec_func_ref(&$x) {
//echo $x;
if ($x < 100000) {
$x++;
rec_func_ref($x);
}
echo 'rec_func_ref:'.$x.'------<br />';
debug_zval_dump($x);
echo "<br />";
}
function rec_func_not_ref($x) {
//echo $x;
if ($x < 100000) {
$x++;
rec_func_not_ref($x);
}
echo 'rec_func_not_ref:'.$x.'------<br />';
debug_zval_dump($x);
echo "<br />";
}
$a = 1;
rec_func_ref($a);
$a = 1;
rec_func_not_ref($a);
?>

Related

will php copy value when I assign value?

I see many code like this:
function load_items(&$items_arr) {
// ... some code
}
load_items($item_arr);
$v = &$items_arr[$id];
compared to code as follow:
function load_items() {
// ... some code
return $items_arr;
}
$item_arr = load_items();
$v = $items_arr[$id];
Did the second code will copy items_arr and $item_arr[$id]?
Will the first code import performance?

No, it will not copy the value right away. Copy-on-write
is one of the memory management methods used in PHP. It ensures that memory isn’t wasted when you copy values between variables.
What that means is that when you assign:
$v = $items_arr[$id];
PHP will simply update the symbol table to indicate that $v points to the same memory address of $item_arr[$id], just if you change $item_arr or $v afterwards then PHP allocates more memory and then performs the copy.
By delaying this extra memory allocation and the copying PHP saves time and memory in some cases.
There's are nice article about memory management in PHP: http://hengrui-li.blogspot.no/2011/08/php-copy-on-write-how-php-manages.html

How much resources does PHP waste on irrelevant code, if any?

This is not a question about principles or common coding procedures, it is a question about how PHP processes code, or more precisely, doesn't process code that it should ignore, in the name of better understanding how PHP works
Scenario 1:
if (1==2) { echo rand(0,99); }
Obviously, the code above will not have any output, and that's not what the question is about; but rather, about whether or not PHP even considers making any output. As PHP goes through the page, does it entirely skip the code assigned to the failed if-check, or does it get allocated any sort of resources beyond simply what the filesize does?
Scenario 2:
if (1==2) { for ($x = 0; $x <= 999999; $x++) { echo rand(0,99); } }
Similar to scenario 1 but with a key difference to clarify the point, considering that 1==2 is always going to be false, does this code use any more resources than the previous one or will they both be equally "cheap" to process? Or are there any "hidden" actions that add up even if the code in the loop is as minimal as this?
Scenario 3:
for ($x = 0; $x <= 999999; $x++) { if (1==2) { echo rand(0,99); } }
Now, this one should see a false statement a million times, but how significant is that really in terms of resources? Will it keep checking if 1 is 2 or does PHP "learn" from the first time it checks? And does it spend any resources beyond that, or is a simple if-check like this inside a loop the only thing PHP will process? Will it "read" echo rand(0,99); a million times, even though 1 is not 2?
Scenario 4:
for ($x = 0; $x <= 999999; $x++) { if (1==2) { for ($x = 0; $x <= 999999; $x++) { echo rand(0,99); } } }
Finally, a combination of them all, will this example be a massive loop-in-a-loop-level of resource wasting or will the inner loop be completely ignored from processing? In other words, will 1!=2 cause PHP to entirely skip processing the inner loop, or will it waste memory on code that it should ignore? And how different is this scenario compared to the previous three in terms of processing and resources?
Thanks in advance for any PHP and memory-usage expertise on the matter, it is my hope that the answer to this question will bring better understanding about how PHP processes code to me and others
EDIT:
Another somewhat relevant example would be that of having a large amount of comments within a loop compared to outside of it; would comments inside of a loop affect performance differently in any way (regardless of how "unnoticeable" you might consider it to be) than the same amount of comments outside of the loop?

1 & 2) Everything inside these if blocks is not evaluated
3) PHP doesnt learn anything, it will perform 1 million if checks. This isn't significant but it's not insignificant either. As one commenter suggested, try it and see the page time hit.
4) This generates the same amount of processing as #3

PHP behavior under the hood

I was just wondering how PHP works under the hood in this certain scenario. Let's say I have these two pieces of code:
function foo() {
return 2 * 2;
}
// First.
if (foo()) {
bar(foo());
}
// Second.
if (($ref = foo())) {
bar($ref);
}
Now the questions:
In the first case, does PHP make some sort of temporary variable inside the if clause? If so, isn't the second piece of code always better approach?
Does the second case take more memory? If answer to the first question is yes to the first question, then not?

The two codes are not equivalent, because the first one calls foo() twice (if it returns a truthy value). If it has side effects, such as printing something, they will be done twice. Or if it's dependent on something that can change (e.g. the contents of a file or database), the two calls may return different values. In your example where it just multiplies two numbers, this doesn't happen, but it still means it has to do an extra multiplication, which is unnecessary.
The answer to your questions is:
Yes, it needs to hold the returned value in a temporary memory location so it can test whether it's true or not.
Yes, it uses a little more memory. In the first version, the temporary memory can be reclaimed as soon as the if test is completed. In the second version, it will not be reclaimed until the variable $foo is reassigned or goes out of scope.

In the first case, you are calling a function twice, so, if the function is time consuming, it is inefficient. The second case is indeed better since you are saving the result of foo().
In both cases, PHP needs to allocate memory depending on what data foo() generates. That memory will be freed by the garbage collector later on. In terms of memory both cases are pretty much equivalent. Maybe the memory will be released earlier, maybe not, but most likely you won't encounter a case where it matters.

PHP can't make any temporary variable because it can't be sure that foo()'s returning value will always be the same. microtime(), rand() will return different values for each call, for example.
In the second example, it takes indeed more memory, since PHP needs to create and keep the value in memory.
Here is how to test it :
<?php
function foo() {
return true;
}
function bar($bool) {
echo memory_get_usage();
}
if (1) {
// 253632 bytes on my machine
if (foo()) {
bar(foo());
}
} else {
// 253720 bytes on my machine
if (($ref = foo())) {
bar($ref);
}
}

Benchmark memory usage in PHP

Let us suppose that we have some problem and at least two solutions for it. And what we want to achieve - is to compare effectiveness for them. How to do this? Obviously, the best answer is: do tests. And I doubt there's a better way when it comes to language-specific questions (for example "what is faster for PHP: echo 'foo', 'bar' or echo('foo'.'bar')").
Ok, now we'll assume that if we want to test some code, it's equal to test some function. Why? Because we can wrap that code to function and pass it's context (if any) as it's parameters. Thus, all we need - is to have, for example, some benchmark function which will do all stuff. Here's very simple one:
function benchmark(callable $function, $args=null, $count=1)
{
$time = microtime(1);
for($i=0; $i<$count; $i++)
{
$result = is_array($args)?
call_user_func_array($function, $args):
call_user_func_array($function);
}
return [
'total_time' => microtime(1) - $time,
'average_time' => (microtime(1) - $time)/$count,
'count' => $count
];
}
-this will fit our issue and can be used to do comparative benchmarks. Under comparative I mean that we can use function above for code X, then for code Y and, after that, we can say that code X is Z% faster/slower than code Y.
The problem
Ok, so we can easily measure time. But what about memory? Our previous assumption "if we want to test some code, it's equal to test some function" seems to be not true here. Why? Because - it's true from formal point, but if we'll hide code inside function, we'll never be able to measure memory after that. Example:
function foo($x, $y)
{
$bar = array_fill(0, $y, str_repeat('bar', $x));
//do stuff
}
function baz($n)
{
//do stuff, resulting in $x, $y
$bee = foo($x, $y);
//do other stuff
}
-and we want to test baz - i.e. how much memory it will use. By 'how much' I mean 'how much will be maximum memory usage during execution of function'. And it is obvious that we can not act like when we were measuring time of execution - because we know nothing about function outside of it - it's a black box. If fact, we even can't be sure that function will be successfully executed (imagine what will happen if somehow $x and $y inside baz will be assigned as 1E6, for example). Thus, may be it isn't a good idea to wrap our code inside function. But what if code itself contains other functions/methods call?
My approach
My current idea is to create somehow a function, which will measure memory after each input code's line. That means something like this: let we have code
$x = foo();
echo($x);
$y = bar();
-and after doing some thing, measure function will do:
$memory = memory_get_usage();
$max = 0;
$x = foo();//line 1 of code
$memory = memory_get_usage()-$memory;
$max = $memory>$max:$memory:$max;
$memory = memory_get_usage();
echo($x);//second line of code
$memory = memory_get_usage()-$memory;
$max = $memory>$max:$memory:$max;
$memory = memory_get_usage();
$y = bar();//third line of code
$memory = memory_get_usage()-$memory;
$max = $memory>$max:$memory:$max;
$memory = memory_get_usage();
//our result is $max
-but that looks weird and also it does not answer a question - how to measure function memory usage.
Use-case
Use-case for this: in most case, complexity-theory can provide at least big-O estimation for certain code. But:
First, code can be huge - and I want to avoid it's manual analysis as long as possible. And that is why my current idea is bad: it can be applied, yes, but it will still manual work with code. And, more, to go deeper in code's structure I will need to apply it recursively: for example, after applying it for top-level I've found that some foo() function takes too much memory. What I will do? Yes, go to this foo() function, and.. repeat my analysis within it. And so on.
Second - as I've mentioned, there are some language-specific things that can be resolved only by doing tests. That is why having some automatic way like for time measurement is my goal.
Also, garbage collection is enabled. I am using PHP 5.5 (I believe this matters)
The question
How can we effectively measure memory usage of certain function? Is it achievable in PHP? May be it's possible with some simple code (like benchmark function for time measuring above)?

After #bwoebi proposed great idea with using ticks, I've done some researching. Now I have my answer with this class:
class Benchmark
{
private static $max, $memory;
public static function memoryTick()
{
self::$memory = memory_get_usage() - self::$memory;
self::$max = self::$memory>self::$max?self::$memory:self::$max;
self::$memory = memory_get_usage();
}
public static function benchmarkMemory(callable $function, $args=null)
{
declare(ticks=1);
self::$memory = memory_get_usage();
self::$max = 0;
register_tick_function('call_user_func_array', ['Benchmark', 'memoryTick'], []);
$result = is_array($args)?
call_user_func_array($function, $args):
call_user_func($function);
unregister_tick_function('call_user_func_array');
return [
'memory' => self::$max
];
}
}
//var_dump(Benchmark::benchmarkMemory('str_repeat', ['test',1E4]));
//var_dump(Benchmark::benchmarkMemory('str_repeat', ['test',1E3]));
-so it does exactly what I want:
It is a black box
It measures maximum used memory for passed function
It is independent from context
Now, some background. In PHP, declaring ticks is possible from inside function and we can use callback for register_tick_function(). So my though was - to use anonymous function which will use local context of my benchmark function. And I've successfully created that. However, I don't want to affect global context and so I want unregister ticks handler with unregister_tick_function(). And that is where troubles are: this function expects string to be passed. So you can not unregister tick handler, which is closure (since it will try to stringify it which will cause fatal error because there's no __toString() method in Closure class in PHP). Why is it so? It's nothing else, but a bug. I hope fix will be done soon.
What are other options? The most easy option that I had in mind was using global variables. But they are weird and also it is side-effect which I want to avoid. I don't want to affect context. But, really, we can wrap all that we need in some class and then invoke tick function via call_user_func_array(). And call_user_func_array is just string, so we can overcome this buggy PHP behavior and do the whole stuff succesfully.
Update: I've implemented measurement tool from this. I've added time measurement and custom callback-defined measurement there. Feel free to use it.
Update: Bug, mentioned in this answer, is now fixed, so there's no need in trick with call_user_func(), registered as tick function. Now closure can be created and used directly.
Update: Due to feature request, I've added composer package for this measurement tool.

declare(ticks=1); // should be placed before any further file loading happens
That should say already all what I will say.
Use a tick handler and print on every execution the memory usage to a file with the file line with:
function tick_handler() {
$mem = memory_get_usage();
$bt = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 2)[0];
fwrite($file, $bt["file"].":".$bt["line"]."\t".$mem."\n");
}
register_tick_function('tick_handler'); // or in class: ([$this, 'tick_handler']);
Then look at the file to see how the memory varies in time, line by line.
You also can parse that file later by a separate program to analyse peaks etc.
(And to see how the possible peaks are by calling internal functions, you need to store the results into a variable, else it'll be already freed before the tick handler will measure the memory)

You can use the XDebug and a patch for XDebug which provides memory usage information
if this is not possible, you can always use memory_get_peak_usage() which i think would fit better than memory_get_usage()

This might not be exactly what you are looking for, but you could probably use XDebug to get at that information.

Just stumbled across
http://3v4l.org/
Although they don't provide details on how the benchmarks, respectively the taking of measures is implemented - don't think many people have over 100 PHP versions running in parallel on a VM beneath their desk ;)

Is it better call a function every time or store that value in a new variable?

I use often the function sizeof($var) on my web application, and I'd like to know if is better (in resources term) store this value in a new variable and use this one, or if it's better call/use every time that function; or maybe is indifferent :)

TLDR: it's better to set a variable, calling sizeof() only once. (IMO)
I ran some tests on the looping aspect of this small array:
$myArray = array("bill", "dave", "alex", "tom", "fred", "smith", "etc", "etc", "etc");
// A)
for($i=0; $i<10000; $i++) {
echo sizeof($myArray);
}
// B)
$sizeof = sizeof($myArray);
for($i=0; $i<10000; $i++) {
echo $sizeof;
}
With an array of 9 items:
A) took 0.0085 seconds
B) took 0.0049 seconds
With a array of 180 items:
A) took 0.0078 seconds
B) took 0.0043 seconds
With a array of 3600 items:
A) took 0.5-0.6 seconds
B) took 0.35-0.5 seconds
Although there isn't much of a difference, you can see that as the array grows, the difference becomes more and more. I think this has made me re-think my opinion, and say that from now on, I'll be setting the variable pre-loop.
Storing a PHP integer takes 68 bytes of memory. This is a small enough amount, that I think I'd rather worry about processing time than memory space.

In general, it is preferable to assign the result of a function you are likely to repeat to a variable.
In the example you suggested, the difference in processing code produced by this approach and the alternative (repeatedly calling the function) would be insignificant. However, where the function in question is more complex it would be better to avoid executing it repeatedly.
For example:
for($i=0; $i<10000; $i++) {
echo date('Y-m-d');
}
Executes in 0.225273 seconds on my server, while:
$date = date('Y-m-d');
for($i=0; $i<10000; $i++) {
echo $date;
}
executes in 0.134742 seconds. I know these snippets aren't quite equivalent, but you get the idea. Over many page loads by many users over many months or years, even a difference of this size can be significant. If we were to use some complex function, serious scalability issues could be introduced.
A main advantage of not assigning a return value to a variable is that you need one less line of code. In PHP, we can commonly do our assignment at the same time as invoking our function:
$sql = "SELECT...";
if(!$query = mysql_query($sql))...
...although this is sometimes discouraged for readability reasons.
In my view for the sake of consistency assigning return values to variables is broadly the better approach, even when performing simple functions.

If you are calling the function over and over, it is probably best to keep this info in a variable. That way the server doesn't have to keep processing the answer, it just looks it up. If the result is likely to change, however, it will be best to keep running the function.

Since you allocate a new variable, this will take a tiny bit more memory. But it might make your code a tiny bit more faster.
The troubles it bring, could be big. For example, if you include another file that applies the same trick, and both store the size in a var $sizeof, bad things might happen. Strange bugs, that happen when you don't expect it. Or you forget to add global $sizeof in your function.
There are so many possible bugs you introduce, for what? Since the speed gain is likely not measurable, I don't think it's worth it.

Unless you are calling this function a million times your "performance boost" will be negligible.

I do no think that it really matters. In a sense, you do not want to perform the same thing over and over again, but considering that it is sizeof(); unless it is a enormous array you should be fine either way.

I think, you should avoid constructs like:
for ($i = 0; $i < sizeof($array), $i += 1) {
// do stuff
}
For, sizeof will be executed every iteration, even though it is often not likely to change.
Whereas in constructs like this:
while(sizeof($array) > 0) {
if ($someCondition) {
$entry = array_pop($array);
}
}
You often have no choice but to calculate it every iteration.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Recursion develops spikes in memory usage - php

Related

will php copy value when I assign value?

How much resources does PHP waste on irrelevant code, if any?

PHP behavior under the hood

Benchmark memory usage in PHP

Is it better call a function every time or store that value in a new variable?

Categories

Resources