Calling unset() in PHP script

Coming from a C/C++ background, I am used to managing memory myself, i.e. freeing resources after using them (e.g. RAII in C++ land).
I find myself unsetting (mostly ORM) variables after using them. Are there any benefits to this habit?
I remember reading somewhere a while back that unsetting variables marks them for deletion by PHP's GC, which can help resource usage on the server side. True or false?
[Edit]
I forgot to mention, I am using PHP 5.3, and also most of the unset() calls I make are in a loop where I am processing several 'heavy' ORM variables

I find that if you're having to use unset a lot, you're probably doing it wrong. Let scoping do the "unsetting" for you. Consider the two examples:
1:
$var1 = f( ... );
....
unset( $var1 );
$var2 = g( ... );
....
unset( $var2 );
2:
function scope1()
{
$var1 = f( ... );
....
} //end of function triggers release of $var1
function scope2()
{
$var2 = g( ... );
....
} //end of function triggers release of $var2
scope1();
scope2();
The second example is preferable because it clearly defines the scope and decreases the risk of leaking variables into the global scope (where they are only released at the end of the script).
EDIT:
Another thing to keep in mind is that unset in PHP costs more CPU than letting a variable fall out of scope. While the difference is small, it goes to show how little emphasis the PHP team puts on unset. If anything, unset should give PHP a hint about how to release memory, but it actually adds to execution time. unset is really only a hack to free up variables that are no longer needed. Unless you're doing something fairly complex, reusing variables (which acts like a natural unset on the old value) and scoping should be all you ever need.
function noop( $value ){}
function test1()
{
$value = "something";
noop( $value ); //make sure $value isn't optimized out
}
function test2()
{
$value = "something";
noop( $value ); //make sure $value isn't optimized out
unset( $value );
}
$start1 = microtime(true);
for( $i = 0; $i < 1000000; $i++ ) test1();
$end1 = microtime(true);
$start2 = microtime(true);
for( $i = 0; $i < 1000000; $i++ ) test2();
$end2 = microtime(true);
echo "test1 ".($end1 - $start1)."\n"; //test1 0.404934883118
echo "test2 ".($end2 - $start2)."\n"; //test2 0.434437990189

If a very large object is used early in a long script, and there is no opportunity for the object to go out of scope, then unset() might help with memory usage. In most cases, objects go out of scope and they're marked for GC automatically.
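A minimal sketch of that situation, assuming a plain array as the stand-in for the "very large object" (the size is arbitrary). PHP releases a value as soon as its last reference disappears, so the drop should be visible through memory_get_usage() right after the unset() call:

```php
<?php
// Illustrative only: the array size is arbitrary.
$before = memory_get_usage();

$big = range(1, 200000);          // stand-in for a "very large" value
$withBig = memory_get_usage();

unset($big);                      // last reference removed, value is released
$afterUnset = memory_get_usage();

printf("before: %d\nwith array: %d\nafter unset: %d\n", $before, $withBig, $afterUnset);
```

The exact numbers depend on the PHP version and build, so only the direction of the change is meaningful here.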

Yes, it can, especially when you are dealing with big arrays and the script takes a long time to run.

Without going to look up some proof I'm going to say that it doesn't really matter. Garbage collection occurs automatically when you leave a function or a script ends. So unless you are really strapped for resources, don't bother.
OK, looked something up. Here is a good quote:
"Freeing memory - particularly large
amounts - isn't free in terms of
processor time, which means that if
you want your script to execute as
fast as possible at the expense of
RAM, you should avoid garbage
collection on large variables while
it's running, then let PHP do it en
masse at the end of the script."
For more info on the subject check out the links provided in the first answer here.

I thought PHP variables were only preserved for the lifetime of your script, so it's unlikely to help much unless your script is particularly long-running or uses a lot of temporary memory in one step.
Clearing explicitly may be slower than letting everything be cleared automatically when the script ends.
You're adding more code, which is generally going to make things slower unless you know that it helps.
Premature optimization, in any case.

The PHP GC is usually good enough so that you usually do not need to call unset() on simple variables. For objects however, the GC will only destroy them when they leave scope and no other objects refer to them. Unset can help with memory in this case. See http://us3.php.net/manual/en/language.references.unset.php
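To make the "no other references" condition concrete, here is a small sketch. The Tracked class is hypothetical and exists only to record when its destructor runs:

```php
<?php
// Hypothetical class used only to make destruction visible.
class Tracked
{
    public static $log = [];
    private $name;

    public function __construct($name) { $this->name = $name; }
    public function __destruct() { self::$log[] = "destroyed {$this->name}"; }
}

$a = new Tracked('orm-row');
$b = $a;               // second handle to the same object

unset($a);             // object is still reachable through $b
Tracked::$log[] = 'after unset($a)';

unset($b);             // last reference gone, destructor runs here
Tracked::$log[] = 'after unset($b)';

print_r(Tracked::$log);
```

The destructor entry lands between the two markers, which suggests that unset() only helps once every handle to the object has been dropped.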

I have had to use unset when running into memory issues while looping through and making copies of arrays. I would say don't use it unless you are in this situation, as the GC will kick in automatically.

Related

Benchmark memory usage in PHP

Suppose we have a problem and at least two solutions for it, and we want to compare their efficiency. How do we do this? Obviously, the best answer is: run tests. And I doubt there's a better way when it comes to language-specific questions (for example "what is faster in PHP: echo 'foo', 'bar' or echo('foo'.'bar')").
OK, now we'll assume that testing some code is equivalent to testing some function. Why? Because we can wrap that code in a function and pass its context (if any) as its parameters. Thus, all we need is some benchmark function that does all the work. Here's a very simple one:
function benchmark(callable $function, $args=null, $count=1)
{
$time = microtime(1);
for($i=0; $i<$count; $i++)
{
$result = is_array($args)?
call_user_func_array($function, $args):
call_user_func($function); // call_user_func_array() requires an argument array
}
return [
'total_time' => microtime(1) - $time,
'average_time' => (microtime(1) - $time)/$count,
'count' => $count
];
}
This fits our needs and can be used for comparative benchmarks. By comparative I mean that we can run the function above for code X, then for code Y, and afterwards say that code X is Z% faster or slower than code Y.
The problem
OK, so we can easily measure time. But what about memory? Our previous assumption (that testing some code is equivalent to testing some function) does not seem to hold here. Why? Formally it is still true, but if we hide the code inside a function, we can no longer measure its memory from the outside. Example:
function foo($x, $y)
{
$bar = array_fill(0, $y, str_repeat('bar', $x));
//do stuff
}
function baz($n)
{
//do stuff, resulting in $x, $y
$bee = foo($x, $y);
//do other stuff
}
We want to test baz, i.e. find out how much memory it uses. By 'how much' I mean the maximum memory usage during execution of the function. Obviously we cannot proceed as we did when measuring execution time, because from the outside we know nothing about the function: it's a black box. In fact, we can't even be sure that the function will execute successfully (imagine what happens if $x and $y inside baz somehow end up as 1E6, for example). So maybe it isn't a good idea to wrap our code inside a function. But what if the code itself contains other function or method calls?
My approach
My current idea is to somehow create a function that measures memory after each line of the input code. Something like this: suppose we have the code
$x = foo();
echo($x);
$y = bar();
After some processing, the measuring function would do:
$memory = memory_get_usage();
$max = 0;
$x = foo();//line 1 of code
$memory = memory_get_usage()-$memory;
$max = $memory>$max?$memory:$max;
$memory = memory_get_usage();
echo($x);//second line of code
$memory = memory_get_usage()-$memory;
$max = $memory>$max?$memory:$max;
$memory = memory_get_usage();
$y = bar();//third line of code
$memory = memory_get_usage()-$memory;
$max = $memory>$max?$memory:$max;
$memory = memory_get_usage();
//our result is $max
But that looks weird, and it also does not answer the question of how to measure a function's memory usage.
Use-case
Use case for this: in most cases, complexity theory can provide at least a big-O estimate for certain code. But:
First, code can be huge, and I want to avoid analysing it manually for as long as possible. That is why my current idea is bad: it can be applied, yes, but it is still manual work with the code. Moreover, to go deeper into the code's structure I would need to apply it recursively: for example, after applying it at the top level I find that some foo() function takes too much memory. What do I do? Go into this foo() function and repeat my analysis within it. And so on.
Second, as I've mentioned, there are some language-specific things that can only be resolved by running tests. That is why my goal is an automatic way, like the one for time measurement.
Also, garbage collection is enabled. I am using PHP 5.5 (I believe this matters).
The question
How can we effectively measure the memory usage of a certain function? Is it achievable in PHP? Maybe it's possible with some simple code (like the benchmark function for time measurement above)?

After @bwoebi proposed the great idea of using ticks, I did some research. Now I have my answer with this class:
class Benchmark
{
private static $max, $memory;
public static function memoryTick()
{
self::$memory = memory_get_usage() - self::$memory;
self::$max = self::$memory>self::$max?self::$memory:self::$max;
self::$memory = memory_get_usage();
}
public static function benchmarkMemory(callable $function, $args=null)
{
declare(ticks=1);
self::$memory = memory_get_usage();
self::$max = 0;
register_tick_function('call_user_func_array', ['Benchmark', 'memoryTick'], []);
$result = is_array($args)?
call_user_func_array($function, $args):
call_user_func($function);
unregister_tick_function('call_user_func_array');
return [
'memory' => self::$max
];
}
}
//var_dump(Benchmark::benchmarkMemory('str_repeat', ['test',1E4]));
//var_dump(Benchmark::benchmarkMemory('str_repeat', ['test',1E3]));
So it does exactly what I want:
It is a black box
It measures maximum used memory for passed function
It is independent from context
Now, some background. In PHP, declaring ticks is possible from inside a function, and we can use a callback for register_tick_function(). So my thought was to use an anonymous function that uses the local context of my benchmark function, and I successfully created that. However, I don't want to affect the global context, so I want to unregister the tick handler with unregister_tick_function(). And that is where the trouble starts: this function expects a string to be passed. So you cannot unregister a tick handler that is a closure (it will try to stringify it, which causes a fatal error because the Closure class in PHP has no __toString() method). Why is that? It's nothing but a bug. I hope a fix will be done soon.
What are the other options? The easiest option I had in mind was using global variables. But they are weird, and they are also a side effect that I want to avoid. I don't want to affect the context. But really, we can wrap everything we need in a class and then invoke the tick function via call_user_func_array(). And 'call_user_func_array' is just a string, so we can work around this buggy PHP behavior and do the whole thing successfully.
Update: I've implemented a measurement tool from this. I've added time measurement and custom callback-defined measurement there. Feel free to use it.
Update: The bug mentioned in this answer is now fixed, so there's no need for the trick with call_user_func_array() registered as the tick function. Now a closure can be created and used directly.
Update: Due to a feature request, I've added a composer package for this measurement tool.

declare(ticks=1); // should be placed before any further file loading happens
That should already say everything I want to say.
Use a tick handler that, on every execution, writes the memory usage to a file together with the file and line:
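function tick_handler() {
static $file = null; // opened once and reused on every tick
if ($file === null) $file = fopen('memory.log', 'a'); // the log file name is an arbitrary example
$mem = memory_get_usage();
$bt = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 2)[0];
fwrite($file, $bt["file"].":".$bt["line"]."\t".$mem."\n");
}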
register_tick_function('tick_handler'); // or in class: ([$this, 'tick_handler']);
Then look at the file to see how the memory varies in time, line by line.
You also can parse that file later by a separate program to analyse peaks etc.
(To see the peaks caused by calling internal functions, you need to store their results in a variable; otherwise the memory will already be freed before the tick handler measures it.)

You can use Xdebug together with a patch for Xdebug that provides memory usage information.
If this is not possible, you can always use memory_get_peak_usage(), which I think fits better than memory_get_usage().
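As a rough sketch of the memory_get_peak_usage() route: measurePeak() below is a hypothetical helper, not part of any library. It assumes that the peak above a baseline taken just before the call is a good enough estimate. On PHP 8.2 and later memory_reset_peak_usage() makes this more reliable. On older versions the peak is script-wide, so a call that uses less memory than some earlier peak will be under-reported.

```php
<?php
// Sketch: peak memory attributable to one call. measurePeak() is a hypothetical helper.
function measurePeak(callable $function, array $args = [])
{
    // memory_reset_peak_usage() exists only on PHP 8.2+; without it the peak
    // is the script-wide maximum so far, which can hide smaller later peaks.
    if (function_exists('memory_reset_peak_usage')) {
        memory_reset_peak_usage();
    }
    $baseline = memory_get_usage();
    call_user_func_array($function, $args);
    return max(0, memory_get_peak_usage() - $baseline);
}

$peak = measurePeak('str_repeat', ['x', 5000000]);
echo "peak above baseline: {$peak} bytes\n";
```

Because the return value of the measured function is discarded, its temporary memory is already freed by the time measurePeak() returns, which is why the peak counter is used rather than a second memory_get_usage() reading.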

This might not be exactly what you are looking for, but you could probably use XDebug to get at that information.

Just stumbled across
http://3v4l.org/
They don't provide details on how the benchmarks (that is, the measurements) are implemented, but I don't think many people have over 100 PHP versions running in parallel on a VM beneath their desk ;)

When unset() should really be used?

I'm curious about using the unset() language construct just about everywhere I allocate memory or declare variables (regardless of structure).
I mean, when somebody declares a variable, when should it really be left to the GC, and when should it be unset()?
Example 1:
<?php
$buffer = array(/* over 1000 elements */);
// 1) some long code, that uses $buffer
// 2) some long code, that does not use $buffer
?>
Is there any chance that $buffer might affect the performance of point 2?
Do I really need to (or should I) call unset($buffer) before entering point 2?
Example 2:
<?php
function someFunc(/* some args */){
$buffer = new VeryLargeObject();
// 1) some actions with $buffer methods and properties
// 2) some actions without usage of $buffer
return $something;
}
?>
Do I really need to (or should I) call unset($buffer) within someFunc()'s body before entering point 2?
Will the GC free all allocated memory (references and objects included) within someFunc()'s scope when the function comes to an end or reaches a return statement?
I'm interested in a technical explanation, but code style suggestions are welcome too.
Thanks.

In PHP, all memory gets cleaned up after the script is finished, and most of the time that's enough.
From php.net:
unset() does just what it's name says - unset a variable. It does not
force immediate memory freeing. PHP's garbage collector will do it
when it see fits - by intention as soon, as those CPU cycles aren't
needed anyway, or as late as before the script would run out of
memory, whatever occurs first.
If you are doing $whatever = null; then you are rewriting variable's
data. You might get memory freed / shrunk faster, but it may steal CPU
cycles from the code that truly needs them sooner, resulting in a
longer overall execution time.
In reality you would rarely use unset() to clean up memory, and this is described well in this post:
https://stackoverflow.com/a/2617786/1870446
By doing an unset() on a variable, you mark the variable for being "garbage collected" so the memory isn't immediately available. The variable does not have the data anymore, but the stack remains at the larger size.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass. (after doing gc_enable() first).
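A small sketch of where a forced pass actually does something: values that reference each other. The Node class is hypothetical. Both gc_enable() and gc_collect_cycles() are the functions named above.

```php
<?php
// Hypothetical node class; the two instances point at each other.
class Node
{
    public $other;
}

gc_enable();

$a = new Node();
$b = new Node();
$a->other = $b;
$b->other = $a;        // reference cycle: the counts cannot reach zero on their own

unset($a, $b);         // unreachable now, but still kept alive by the cycle

$collected = gc_collect_cycles();   // force a collector pass
echo "collected: {$collected}\n";
```

For values without such cycles the forced pass should have nothing to do, since they are released as soon as their last reference goes away.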
But you must understand that PHP is a scripting language, not Java, so you shouldn't treat it like one. If your script is really heavy enough to use tons of RAM, you can use unset, and when the script is close to exceeding the memory limit the GC will trigger and clean up everything useless, including your unset variables. But in most cases you can forget about it.
Also, if you want to unset every variable you no longer use: don't. It will actually make your script take longer to execute, by using more CPU cycles, for the sake of freeing memory that in most cases would never be needed.
Some people also say that they use unset to explicitly show that they won't use a variable anymore. I find that bad practice too; for me it just makes code more verbose with all these useless unsets.

PHP function variables and the garbage collector

I was wondering if anyone could answer this quick question. I tried searching for it, but I only find similar questions in the wrong context.
Take this code:
function foo()
{
$test_array = array();
for($i=0; $i<10000000; $i++)
{
$test_array[] = $i;
}
}
What happens to $test_array after the function finishes? I know that it goes out of scope; I am not new to programming.
What I am wondering is whether I should call
unset($test_array);
before the function ends or does PHP set it for deletion to the garbage collector as the function ends?
I used the for loop just to show a variable of a fair size to get my point across.
Thanks for reading
Kevin

Once $test_array is no longer in scope (and there are no additional references that point to it), it is flagged for garbage collection.
It ceases to be in scope when the process returns from the function to the calling routine.
So there is no need to unset it.
This would only be different if you had declared $test_array as static.
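The difference between an ordinary local and a static one can be sketched as follows. Both function names are hypothetical and the array size is arbitrary:

```php
<?php
function fillLocal()
{
    $test_array = range(1, 200000);   // ordinary local: released on return
    return count($test_array);
}

function fillStatic()
{
    static $test_array = [];          // static: survives between calls
    $test_array[] = count($test_array);
    return count($test_array);
}

$before = memory_get_usage();
fillLocal();
$after = memory_get_usage();
echo "growth after fillLocal(): ", $after - $before, " bytes\n";

$first = fillStatic();
$second = fillStatic();
echo "fillStatic() returned {$first}, then {$second}\n";
```

The growth reported after fillLocal() should be close to zero even though the array occupied several megabytes while the function ran, whereas the static array keeps its contents from one call to the next.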

unset() doesn't free the memory a variable uses, it just marks it for the garbage collector which will decide when to free the memory (when it has free cpu cycles or when it runs out of memory, whichever comes first).
However you have to realize that ALL memory used by a PHP script is freed when the script finishes which, most of the time, is measured in milliseconds, so if you're not doing any lengthy operations that would exceed the "normal" execution time of a PHP script you shouldn't worry about freeing memory.

Is it better call a function every time or store that value in a new variable?

I often use the function sizeof($var) in my web application, and I'd like to know whether it is better (in terms of resources) to store this value in a new variable and use that, or to call the function every time, or whether it makes no difference :)

TLDR: it's better to set a variable, calling sizeof() only once. (IMO)
I ran some tests on the looping aspect of this small array:
$myArray = array("bill", "dave", "alex", "tom", "fred", "smith", "etc", "etc", "etc");
// A)
for($i=0; $i<10000; $i++) {
echo sizeof($myArray);
}
// B)
$sizeof = sizeof($myArray);
for($i=0; $i<10000; $i++) {
echo $sizeof;
}
With an array of 9 items:
A) took 0.0085 seconds
B) took 0.0049 seconds
With an array of 180 items:
A) took 0.0078 seconds
B) took 0.0043 seconds
With an array of 3600 items:
A) took 0.5-0.6 seconds
B) took 0.35-0.5 seconds
Although there isn't much of a difference, you can see that as the array grows, the difference grows too. This has made me rethink my opinion, and from now on I'll be setting the variable before the loop.
Storing a PHP integer takes 68 bytes of memory. This is a small enough amount that I'd rather worry about processing time than memory space.

In general, it is preferable to assign the result of a function you are likely to repeat to a variable.
In the example you suggested, the difference in processing time between this approach and the alternative (repeatedly calling the function) would be insignificant. However, where the function in question is more complex, it would be better to avoid executing it repeatedly.
For example:
for($i=0; $i<10000; $i++) {
echo date('Y-m-d');
}
Executes in 0.225273 seconds on my server, while:
$date = date('Y-m-d');
for($i=0; $i<10000; $i++) {
echo $date;
}
executes in 0.134742 seconds. I know these snippets aren't quite equivalent, but you get the idea. Over many page loads by many users over many months or years, even a difference of this size can be significant. If we were to use some complex function, serious scalability issues could be introduced.
A main advantage of not assigning a return value to a variable is that you need one less line of code. In PHP, we can commonly do our assignment at the same time as invoking our function:
$sql = "SELECT...";
if(!$query = mysql_query($sql))...
...although this is sometimes discouraged for readability reasons.
In my view for the sake of consistency assigning return values to variables is broadly the better approach, even when performing simple functions.

If you are calling the function over and over, it is probably best to keep this info in a variable. That way the server doesn't have to keep processing the answer, it just looks it up. If the result is likely to change, however, it will be best to keep running the function.

Since you allocate a new variable, this will take a tiny bit more memory. But it might make your code a tiny bit faster.
The trouble it brings could be big. For example, if you include another file that applies the same trick, and both store the size in a variable $sizeof, bad things might happen: strange bugs that show up when you don't expect them. Or you forget to add global $sizeof in your function.
There are so many possible bugs you could introduce, and for what? Since the speed gain is likely not measurable, I don't think it's worth it.

Unless you are calling this function a million times your "performance boost" will be negligible.

I do not think that it really matters. In a sense, you do not want to perform the same thing over and over again, but considering that it is sizeof(), unless it is an enormous array you should be fine either way.

I think you should avoid constructs like:
for ($i = 0; $i < sizeof($array); $i += 1) {
// do stuff
}
because sizeof will be executed on every iteration, even though the result is often not likely to change.
Whereas in constructs like this:
while(sizeof($array) > 0) {
if ($someCondition) {
$entry = array_pop($array);
}
}
You often have no choice but to calculate it every iteration.
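For the first kind of loop, where the array does not change, one common way to avoid the repeated call is to compute the size once in the loop initialiser. A minimal sketch with made-up sample data (count() is used here; sizeof() is an alias of it):

```php
<?php
$array = ['bill', 'dave', 'alex', 'tom'];
$visited = [];

// count() is evaluated once, in the initialiser, not on every iteration.
for ($i = 0, $n = count($array); $i < $n; $i++) {
    $visited[] = $array[$i];
}

echo implode(', ', $visited), "\n";
```

This keeps the cached size scoped to the loop header instead of introducing a separate variable further up in the file.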

The advantage / disadvantage between global variables and function parameters in PHP?

Sorry, I'm a beginner and I can't judge how good a question this is; maybe it sounds utterly obvious to some of you.
If our use of the two functions below is the same, which is better?
function doSomething ($var1,$var2,..){
...
}
OR
function doSomething (){
global $var1,$var2,..;
...
}
By "our use" I mean this: I know that in the second scenario we can also alter the global variables' values, but what if we don't need to do that? Which is the better way of writing this function? Does passing variables take less memory than declaring globals in a function?

The memory usage is a paltry concern. It's much more important that the code be easy to follow and not have... unpredicted... results. Adding global variables is a VERY BAD IDEA from this standpoint, IMO.
If you're concerned about memory usage, the thing to do is
function doSomething (&$var1, &$var2,..) {
...
}
This will pass the variables by reference and not create new copies of them in memory. If you modify them during the execution of the function, those modifications will be reflected when execution returns to the caller.
However, please note that it's very unusual for even this to be necessary for memory reasons. The usual reason to use by-reference is for the reason I listed above (modifying them for the caller). The way to go is almost always the simple
function doSomething ($var1, $var2) {
...
}
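The behavioural difference between the two signatures can be sketched like this. The function names are hypothetical:

```php
<?php
// Hypothetical functions illustrating the two signatures.
function appendByValue(array $items)
{
    $items[] = 'added';       // changes a local copy only
    return count($items);
}

function appendByReference(array &$items)
{
    $items[] = 'added';       // changes the caller's array
    return count($items);
}

$list = ['a', 'b'];

appendByValue($list);
$afterValue = count($list);       // still 2

appendByReference($list);
$afterReference = count($list);   // now 3

echo "{$afterValue} {$afterReference}\n";
```

Note that PHP arrays and strings are copy-on-write, so a by-value parameter that is only read is not physically duplicated. That is one more reason by-reference parameters are rarely needed for memory alone.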

Avoid using global variables; pass variables as parameters instead. Depending on the size of your program, the performance difference may be negligible.
But if you are concerned with performance, here are some key things to note about global variable performance compared with local variables (variables defined within functions):
Incrementing a global variable is 2 times slower than incrementing a local variable.
Just declaring a global variable without using it in a function also slows things down (by about the same amount as incrementing a local variable). PHP probably does a check to see if the global exists.
Also, global variables increase the risk of using wrong values if they were altered elsewhere in your code.

Write it to take parameters. Maintainability is far more important than micro-optimization. When you take parameters, the variables can not be modified in unexpected places.

Although it is not good practice, as long as you guarantee that the global is only read and never written, you get the same flexibility as with parameters.
As an alternative, you can pass one parameter (or two if they really belong to the function, as with exp) and the rest in an array of options (a bit like jQuery does).
This way you are not using globals, have some parameter flexibility and have clearly defined the defaults for each parameter.
function get_things($thing_name,$opt= array()) {
if(!isset($opt["order"])) $opt["order"]= 'ASC';
}
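A slightly fuller sketch of the options-array idea, using the array union operator to fill in defaults. The 'limit' option and the returned string are invented for illustration. Only get_things, $thing_name, $opt and 'order' come from the snippet above:

```php
<?php
// Option names other than 'order' are made up for this example.
function get_things($thing_name, array $opt = [])
{
    // Keys supplied by the caller win; missing keys fall back to the defaults.
    $opt += ['order' => 'ASC', 'limit' => 10];

    return "{$thing_name} ORDER {$opt['order']} LIMIT {$opt['limit']}";
}

echo get_things('users'), "\n";
echo get_things('users', ['order' => 'DESC']), "\n";
```

Keeping all defaults in one array literal makes them easy to find, which is the "clearly defined defaults" benefit mentioned above.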

Pass in parameters, avoid globals. Keeping only the scope you need for a given situation is a measure of good code design. You may want to look at PHP variable scope...
http://php.net/manual/en/language.variables.scope.php
An excellent resource, with some pointers on best practices and memory management.

As of PHP 4 using global with big variables affects performance significantly.
With $data holding a 3 MB string of binary map data, running 10k tests of whether a bit is 0 or 1 gives the following timings for different uses of global:
function getBit($pos) {
global $data;
$posByte = floor($pos/8);
...
}
t5 bit open: 0.05495s, seek: 5.04544s, all: 5.10039s
function getBit($data) {
global $_bin_point;
$pos = $_bin_point;
$posByte = floor($pos/8);
}
t5 bit open: 0.03947s, seek: 0.12345s, all: 0.16292s
function getBit($data, $pos) {
$posByte = floor($pos/8);
...
}
t5 bit open: 0.05179s, seek: 0.08856s, all: 0.14035s
So passing parameters is way faster than using global on variables >= 3 MB. I haven't tested passing a &$data reference, and I haven't tested with PHP 5.
