When to pass-by-reference in PHP

I'm wondering if it's good practice to pass by reference when you are only reading a variable, or whether it should always be passed by value.
Example with pass-by-reference:
$a = 'fish and chips';
$b = do_my_hash($a);
echo $b;
function &do_my_hash(&$value) {
    return md5($value);
}
Example with pass-by-value:
$a = 'fish and chips';
$b = do_my_hash($a);
echo $b;
function do_my_hash($value) {
    return md5($value);
}
Which is better, e.g. if I were to run a loop with 1000 rounds?
Example of loop:
for ($i = 0; $i < 1000; $i++) {
    $a = 'Fish & Chips ' . $i;
    echo do_my_hash($a);
}

If you mean to pass a value (so that the function doesn't modify it), there is no reason to pass it by reference: it will only make your code harder to understand, as people will think "this function could modify what I pass to it... oh, it doesn't modify it?"
In the example you provided, your do_my_hash function doesn't modify the value you're passing to it; so, I wouldn't use a reference.
And if you're concerned about performance, you should read this recent blog post: Do not use PHP references:
Another reason people use reference is since they think it makes the code faster. But this is wrong. It is even worse: References mostly make the code slower! Yes, references often make the code slower - Sorry, I just had to repeat this to make it clear.
Actually, this article might be an interesting read, even if you're not primarily concerned about performance ;-)

PHP makes use of copy-on-write as much as possible (whenever it would typically increase performance), so using references is not going to give you any performance benefit; it will only hurt. Use references only when you really need them. From the PHP Manual:
Do not use return-by-reference to increase performance. The engine will automatically optimize this on its own. Only return references when you have a valid technical reason to do so.
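To check this against the question's own loop, here is a rough timing sketch. It assumes PHP 7.3+ (for hrtime()); hashByValue, hashByReference and timeHashLoop are illustrative names, and the by-reference variant drops the & on the return value, since returning md5()'s result by reference only produces a notice. A 1000-round loop is far too small for the timings to mean much, so treat them as noise unless you raise the round count.

```php
<?php
// Rough timing sketch: the question's 1000-round loop, run once with a
// by-value parameter and once with a by-reference parameter.
// Requires PHP 7.3+ for hrtime(); numbers vary by machine and PHP version.

function hashByValue($value)
{
    return md5($value);
}

function hashByReference(&$value)
{
    return md5($value);
}

function timeHashLoop(callable $hasher, int $rounds): array
{
    $hashes = [];
    $start = hrtime(true); // monotonic clock in nanoseconds
    for ($i = 0; $i < $rounds; $i++) {
        $a = 'Fish & Chips ' . $i;
        $hashes[] = $hasher($a);
    }
    $elapsedMs = (hrtime(true) - $start) / 1e6;
    return [$hashes, $elapsedMs];
}

[$byValue, $valueMs] = timeHashLoop('hashByValue', 1000);
[$byReference, $referenceMs] = timeHashLoop('hashByReference', 1000);

printf("by value: %.3f ms, by reference: %.3f ms\n", $valueMs, $referenceMs);
```

Both variants return identical hashes. Given copy-on-write, the by-reference version should not be expected to come out ahead, but measure on your own PHP version rather than taking that on trust.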

Good programming practice is always to pass by value whenever you can, and if you have to modify a single value it's generally better to return the modified value as a result of a function rather than pass the value by reference.
The only cases where you may need to pass by reference are where you need to modify multiple values. However, these cases tend to be rare and should usually be treated as a flag to check your code, because there's probably a better way of approaching the problem.
Back in the day, many early programming languages passed by reference, and passing by value was a later development to tackle the problems this produced: you tend to end up with obscure bugs, because sooner or later some programmer adds code that modifies the passed-by-reference value in some function or other, and then it's tricky to find where and fix it properly (you end up with multiple, obscure dependencies). Consequently, it's pretty perverse to seriously consider this as an option for shaving off a few machine cycles, when we're multiple generations of processor beyond the point where that was considered a good trade-off of CPU time against the complexity of keeping code clean and maintainable.

The joy of micro-optimisation. :-)
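To be honest, there's probably not a great deal to be gained by passing 'normal' variables by reference (unless you want to affect their value in their original scope). Also, since PHP 5, objects are passed by handle: passing an object to a function never copies the object itself, although the parameter is not a true reference either (reassigning it inside the function doesn't change the caller's variable).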
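The difference between "passed by handle" and a true reference can be shown with a small sketch (Basket and the three functions are hypothetical names for illustration): mutating the object is visible to the caller either way, but only a & parameter lets the function replace the caller's object.

```php
<?php
// Objects are passed by handle: the callee works on the same object, but the
// parameter is still a by-value copy of that handle, not a true reference.

class Basket
{
    public $items = [];
}

function addChips(Basket $basket)
{
    $basket->items[] = 'chips';      // mutates the caller's object
}

function replaceBasket(Basket $basket)
{
    $basket = new Basket();          // rebinds the local parameter only
    $basket->items[] = 'fish';
}

function replaceBasketByReference(Basket &$basket)
{
    $basket = new Basket();          // rebinds the caller's variable too
    $basket->items[] = 'fish';
}

$basket = new Basket();
addChips($basket);
replaceBasket($basket);
$afterByValue = $basket->items;      // ['chips']: the caller's object survived

replaceBasketByReference($basket);
$afterByReference = $basket->items;  // ['fish']: the caller now holds the new object
```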

Passing by reference offers no benefit if you don't want to modify that value inside the function. I try to use pass-by-value as much as possible, as it's much easier to read, and the flow of the script is more consistent.

Related

Passing by Reference (when variable can change, but it does not have to)

It has been explained quite thoroughly that you should only pass by reference in PHP if there is a technical reason to do so, because copy-on-write makes the performance basically equivalent. From what I understand, if the value is never changed, it is never copied.
But what if the function does change the variable, but your code never uses it again, or does not use any part that was changed? It does not matter to the code whether the original is changed or not. Yes, it is possible that the PHP optimiser takes this situation into account, but I have no reason to believe it does.
And passing a single reference is surely going to be a whole lot faster than copying a huge array or object.
So is this a good situation to pass by reference or not?
For example, say you pass in a DomCrawler (not much more than a big HTML-formatted string, except that in this specific case it is implicitly passed by reference). Crawl a little and extract some information. In many situations you would not need the crawler reset to its original position, as you are simply not using it again.
Also, imagine that later we do use the DomCrawler: we read the URI from it. The function did not change this, so passing by reference or by value is still equivalent, but won't passing by reference be significantly faster? I think this situation would be very hard for any optimiser to spot.
So is this a good situation to pass by reference or not?
No.
Okay. Imagine you have a $bigString and you pass it to a function, the function modifies it and does something with it and the caller never wants it again. Passing by reference is initially faster since it avoids the copy. However, it's still a bad idea.
(1) If a different caller calls your function that does want to continue using that variable, things break. The reference violates encapsulation, basically.
(2) As soon as you have more than one non-reference variable outside the function referring to that value, merely creating the reference requires the copy again. (Variable values are held in containers that are either a non-reference (copy-on-modify) or a reference (do nothing special on modify), so for reference variables and non-reference variables to refer to that value at the same time, it has to be duplicated.)
(3) Because of the above, something as innocent as calling strlen within the function will have to duplicate the value, because strlen's parameter is passed by-value, which is the norm. Now imagine you call a few functions, such as substr, and maybe strlen in a loop, and you're making a new copy of the data every time.
(4) DDR3 RAM can shove around more than 10 GB per second and CPU cache RAM is goodness knows how fast. I think there are bigger things to worry about with PHP performance than how long a string or array copy takes.
Don't use references for superstitious performance gains. It never works.
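Point (1) can be seen in a few lines. This is an illustrative sketch with made-up function names (it needs PHP 7+ for the scalar type declarations): the by-reference version quietly rewrites the caller's string, the by-value version doesn't, and both return the same word.

```php
<?php
// Point (1) in miniature: a by-reference parameter used as an "optimisation"
// leaks the function's internal edits back to every caller.

function firstWordUpperByRef(string &$text): string
{
    $text = strtoupper($text);       // edits the caller's variable in place
    return strtok($text, ' ');
}

function firstWordUpper(string $text): string
{
    $text = strtoupper($text);       // edits a local copy (copy-on-write)
    return strtok($text, ' ');
}

$greeting = 'hello world';
$word = firstWordUpperByRef($greeting);
// $word is 'HELLO', but $greeting has become 'HELLO WORLD' as well

$message = 'hello world';
$sameWord = firstWordUpper($message);
// $sameWord is 'HELLO' and $message is still 'hello world'
```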
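If you really want to avoid the copy, the right way to do this is probably to make your function a method of an object that looks after the variable:
class Thing {
    private $bigString;
    public function __construct($bigString) {
        $this->bigString = $bigString;
    }
    public function foo() {
        $this->bigString[0] = 'x'; // modifies the string held by the object in place
    }
}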
Then you avoid copying, get the benefits of encapsulation and none of the subtleties of references.
PS: DomCrawler is not a good example because it's an object. PHP objects are never copy-on-write anyway (well, I think they are, but there is an additional level of indirection, so the only part that is copy-on-write is a small pointer container, or something like that).
I've always avoided passing by reference for the same reason I avoid goto.
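$a = myFunction($a);
is more easily read and reused than declaring function myFunction(&$a) and calling myFunction($a); (putting the & at the call site, as in myFunction(&$a);, is call-time pass-by-reference, which has been a fatal error since PHP 5.4).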
From my understanding of how PHP works, everything is passed by "reference". So if you are passing around huge arrays or objects, they are always passed by "reference".
I put "reference" in quotes because there are two different kinds here:
Explicit references are where you tell PHP that you want the variable tracked as a reference.
Implicit references are where you want it tracked as a value.
PHP defaults to the implicit reference.
So there are no performance implications until you change an implicitly referenced value. At that point PHP copies the value to a separate memory location and updates your variable to point at the copy.
Once the variable goes out of scope (or is unset) and nothing else refers to the value, its memory is reclaimed (PHP frees values by reference counting, with a garbage collector for cycles).

Pros and cons of assigning new variables vs. re-using existing ones

First, I apologize if this is just a coding style issue. I'm wondering about the pros and cons of assigning a new variable for each property or function versus just re-assigning an existing variable. This assumes you don't need access to the variable beyond the scope.
Here's what I mean (noting that the names $var0,... are just for simplicity),
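Option #1:
$var0 = array('hello', 'world');
$var1 = "hello world";
$var2 = someLargeDatabaseQuery();  // hypothetical: some crazy large database query result
$var3 = someComplicatedFunction(); // hypothetical: some complicated function()
vs.
Option #2:
$var0 = array('hello', 'world');
$var0 = "hello world";
$var0 = someLargeDatabaseQuery();  // hypothetical: some crazy large database query result
$var0 = someComplicatedFunction(); // hypothetical: some complicated function()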
Does it depend on the memory size of the existing variable? I.e., is re-assigning memory more computationally expensive than assigning a new variable?
Is this always a scope issue, meaning you should use Option#2 only if you don't need each of the variable values outside the scope shown here?
Does it depend on what each variable value is? Does re-assigning to different data types have different costs associated with it?
Technically speaking, reusing variables would be (insignificantly) faster. It will make zero difference in measurable performance though.
While hardware gets cheaper and hours get more expensive, you should rather look to have maintainable code. This will save yourself headaches and your company hard dollars in the long run.
Only optimize where enough performance gain can be made to offset the amount of work (money) you are putting into it.
Nowadays, with clouds and server clusters, slightly less optimized code will most likely not make for a slower project in the end. It is more probable that your project will run just as fast but take a few more CPU cycles, and therefore cost you a little more money at your hosting provider. This minor added cost will most likely not outweigh hours of optimizing for performance gain. Unless, of course, you're asking this because you're developing for Amazon (and even at places like Amazon, with millions and millions of hits per day, reusing variables will not result in any noticeable performance gain).
To get back to your question; I believe you should only reuse a variable when it will hold updated content of the original state. But in general, that doesn't happen too much.
I think in the following situation, reusing the $content variable is the logical choice:
function getContent()
{
    $cacheId = 'someUniqueCacheIdSoItDoesNotTriggerANotice';
    $content = someCacheLoadingCall($cacheId);
    if (null === $content) {
        $content = someContentGeneratingFunction();
        someCacheSavingCall($cacheId, $content);
    }
    return $content;
}
Descriptive code
Also, please be kind to your future self and always use descriptive names for your variables. You will thank yourself for it. When you then make a pact with yourself to never reuse variables unless it logically makes sense, you've taken another step towards maintainable code.
Imagine that six months from now, after you've done another big project (or a few more small ones), you get a call from an important client saying there is a bug in the old project. Holy !##! Gotta fix that right now!
You open it up and see functions like this everywhere:
function gC()
{
    $cI = 'someUniqueCacheIdSoItDoesNotTriggerANotice';
    $c = sclc($cI);
    if (null === $c) {
        $c = scg_f();
        scsc($cI, $c);
    }
    return $c;
}
Much better to use descriptive variable and function names and to get a code editor with good code completion, so you're still coding as fast as you want. Right now I would recommend Aptana Studio or Zend Studio: Zend has slightly better code completion, but Aptana has proven to be more stable.
PS. I don't know your level of programming, sorry if I babbled on too far. If not relevant for you, I hope to have helped someone else who might read this :)
Personally I would say you should never ever reassign a variable to contain different stuff. This makes it really hard to debug. If you are worried about memory consumption you can always unset() variables.
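A minimal sketch of that, assuming the usual reference-counting behaviour: once the only variable holding a large value is unset(), memory_get_usage() drops accordingly. makeReport() is a hypothetical helper, used so the string is built at run time; exact byte counts vary by PHP version.

```php
<?php
// Freeing a large value with unset(): with no other references left,
// PHP releases the memory immediately (reference counting).

function makeReport(int $bytes): string
{
    return str_repeat('x', $bytes);   // hypothetical stand-in for a big value
}

$report = makeReport(5000000);        // about 5 MB
$beforeUnset = memory_get_usage();

unset($report);
$afterUnset = memory_get_usage();

printf("freed roughly %d bytes\n", $beforeUnset - $afterUnset);
```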
Also note that you should never ever have variable names like $var#. Your variable names should describe what they hold.
At the end of the day it's all about minimizing the number of WTFs in your code. And option two is one big WTF.
Does it depend on the memory size of the existing variable? I.e., is re-assigning memory more computationally expensive than assigning a new variable?
It's about limiting the number of WTFs for both you and other people (re)viewing your code.
Is this always a scope issue, meaning you should use Option#2 only if you don't need each of the variable values outside the scope shown here?
Well, if it is in a totally different scope, it is fine to use the same name multiple times, as long as it is clear what the variable contains, e.g.:
// Perfectly fine to use the same name again. I would go as far as to say this is preferred.
function doSomethingWithText($articleText)
{
    // do something
}
$articleText = 'Some text of some article';
doSomethingWithText($articleText);
Does it depend on what each variable value is? Does re-assigning to different data types have different costs associated with it?
Not a matter of cost, but a matter of maintainability. Which is often way more important.
You should never use option #2. Reusing variables for unrelated blocks of code is a terrible practice. You shouldn't even be in a situation where option #2 is possible. If your function is so long that you're changing context completely and working on some different problem, you should refactor your function into smaller single-purpose functions.
You should never reuse a variable out of some desire to "recycle" it after the old value is no longer used. If a variable is no longer needed, it should naturally fall out of scope if you're architecting your software correctly. Your decision should have nothing to do with performance or memory optimization, neither of which is affected by the naming of your variables. Your only consideration when picking variable names should be producing maintainable, stable code.
The fact that you're even asking yourself whether to reuse your variables means you're using names that are too generic. Variable names like $var0, $var1, etc. are terrible. You should be naming your variables according to what they actually contain, and declaring a new variable when you need to store a new value.

PHP: return or reference?

I have a couple of functions that manipulate data in an array, for example unset_data(): you pass it the array and any number of string arguments, like:
unset_data( $my_array, "firstname", "password" );
and it can handle multi-dimensional arrays, etc. It's quite simple.
But should this function use a reference to the array and change it directly?
Or should I return the new array with the values unset?
I can never decide whether a function should use a reference or not.
Are there specific cases or examples of when to use them and when not to?
Thanks
I'd ask myself what the expected use case of the function is. Does the typical use case involve keeping the original data intact and deriving new data from it, or is the explicit use case of this function to modify data in place?
If md5 modified data in place, that would be pretty inconvenient, since I usually want to keep the original data intact. So I'd always have to do this:
$hash = $data;
md5($hash);
instead of:
$hash = md5($data);
That's pretty ugly code, forced on you by the API of the function.
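For unset, though, I don't think the typical use case is deriving new data. If unset returned the new array instead (it doesn't: it is a language construct that modifies in place), you would have to write: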
$arr = unset($arr['foo']);
That seems pretty clunky as well as possibly a performance hit.
Generally speaking, it's better to return by value instead of taking a reference because:
It's the most common usage pattern (there's one less thing to keep in mind about this particular function)
You can create call chains freely, e.g. you can write array_filter(unset_data(...))
Generally speaking, code without side effects (I'm calling the mutation of an argument in a manner visible to the caller a side effect) is easier to reason about
Most of the time, these advantages come at the cost of using up additional memory. Unless you have good reason (or better yet, proof) to believe that the additional memory consumption is going to be an issue, my advice is to just return the mutated value.
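A minimal sketch of the return-a-value style, assuming PHP 7+ (variadic string parameter). Unlike the question's unset_data(), it only handles top-level keys, not nested paths; the sample data is made up.

```php
<?php
// Return-a-new-array style: the caller's array is never touched.
// Sketch only: top-level keys, no multi-dimensional paths.

function unset_data(array $data, string ...$keys): array
{
    foreach ($keys as $key) {
        unset($data[$key]);          // modifies the local copy only
    }
    return $data;
}

$user = ['firstname' => 'Ann', 'password' => 'secret', 'email' => 'ann@example.com', 'nickname' => ''];

$public = unset_data($user, 'firstname', 'password');
// $public is ['email' => 'ann@example.com', 'nickname' => ''] and $user is unchanged

// Because it returns a value, calls compose:
$nonEmpty = array_filter(unset_data($user, 'password'));
// $nonEmpty is ['firstname' => 'Ann', 'email' => 'ann@example.com']
```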
I feel that there is not a general you should/shouldn't answer to this question - it depends entirely on the usage case.
My personal feeling leans towards passing by reference, to keep its behaviour more in line with the native unset(), but if you are likely to end up regularly having to make copies of the array before you call the function, then go with a return value. Another advantage of the by-reference approach is that you can return some other information as well as modifying the array; for example, you could return an integer describing how many values were removed from the array based on the arguments.
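For comparison, a sketch of the by-reference style with the "return how many were removed" idea from this answer. unset_data_in_place() is a hypothetical name, PHP 7+ is assumed, and again only top-level keys are handled.

```php
<?php
// In-place style: modifies the caller's array and reports how many
// entries were actually removed. Top-level keys only.

function unset_data_in_place(array &$data, string ...$keys): int
{
    $removed = 0;
    foreach ($keys as $key) {
        if (array_key_exists($key, $data)) {
            unset($data[$key]);      // modifies the caller's array
            $removed++;
        }
    }
    return $removed;
}

$user = ['firstname' => 'Ann', 'password' => 'secret', 'email' => 'ann@example.com'];
$count = unset_data_in_place($user, 'firstname', 'password', 'missing');
// $count is 2 and $user is now ['email' => 'ann@example.com']
```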
I don't think there is a solid argument for "best practice" with either option here, so the short answer would be:
Do whatever you are most comfortable with and whatever allows you to write the most concise, readable and self-documenting code.

Code optimization in PHP

I need three temporary variables. Is there any difference between these two options?
$temp1, $temp2, $temp3 vs $temp[0], $temp[1], $temp[2];
If I have a variable named $X in page1.php and in page2.php, and I include page2.php in page1.php, is the variable $X going to be overwritten?
Can I stop this?
Generally speaking, arrays and individual variables shouldn't have any size or performance drawback in a low-level language like C/C++. In these languages arrays are just a chain of data types back-to-back in memory.
Because PHP has more advanced array constructs that let you do things you can't do in most low-level languages, such as determining the length of the array, the interpreter will likely introduce some size and/or speed overhead to array operations; however, for modest-sized arrays I'd say this would be negligible.
If optimisation is high on your list of priorities, then maybe you should consider a different programming language. PHP is not known for its bleeding-edge performance.
If you have a global variable $X in page1.php and include page2.php which also defines a global variable $X, then they will be the same global $X variable. I do not believe there is any way to avoid this.
You shouldn't be defining $X more than once in several pages to begin with. Why not just give them different names?
Also, something which might lead to a similar situation:
When page1.php includes page2.php and page3.php, and page3.php also includes page2.php, it's important to use require_once() to make sure each file is only included once.
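A small sketch of the require_once behaviour, assuming a writable temp directory: a temporary file stands in for page2.php, and the second require_once is skipped (it returns true instead of running the file again).

```php
<?php
// require_once remembers which files it has already loaded, so a second
// require_once of the same path does not execute the file again.
// A temporary file stands in for page2.php here.

$page2 = tempnam(sys_get_temp_dir(), 'page2');
file_put_contents($page2, '<?php $loadCount = ($loadCount ?? 0) + 1;');

require_once $page2;                 // runs the file: $loadCount becomes 1
$secondResult = require_once $page2; // skipped: returns true

unlink($page2);
```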
First, there isn't enough of a difference to really need to choose between one or the other.
Second, it depends on the scope of each variable: if they are defined inside, say, a function, then one won't overwrite the other (loops and other blocks do not create a new scope in PHP). Otherwise, yes.
Here is a good bit on variable scope: http://php.net/manual/en/language.variables.scope.php
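One way to keep an included file from touching a global $X, assuming you can wrap the include in a function: code in a file included inside a function runs in that function's local scope. loadPage2() is a hypothetical name, and a temp file stands in for page2.php.

```php
<?php
// If the include happens inside a function, the included file's variables
// live in that function's local scope, so the global $X is left alone.
// A temporary file stands in for page2.php.

$page2 = tempnam(sys_get_temp_dir(), 'page2');
file_put_contents($page2, '<?php $X = "set by page2";');

$X = 'set by page1';

function loadPage2(string $path): string
{
    include $path;   // $X assigned here is local to loadPage2()
    return $X;
}

$fromPage2 = loadPage2($page2);
unlink($page2);
// $X is still 'set by page1'; $fromPage2 is 'set by page2'
```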
An interpreter has to reevaluate the expressions it processes every time. $temp1[0] will very likely take longer than $temp1 to evaluate, as a consequence. Whether this matters in a visible way to the execution of program depends on how many times this is executed; if just a few, you won't be able to realistically measure the difference; if millions of times, you might be able to visibly see the difference.
If you were using a really good, optimizing compiler, it might make no difference in any case, as a good compiler can see you are referencing a unique location.
As a general rule, I'd write $temp1 rather than $temp[1] just because it is less to type, easier to read, and generally runs faster; what's not to like?
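If you'd rather measure than guess, here is a rough micro-benchmark sketch (PHP 7.3+ for hrtime(); the loop body is arbitrary). Expect the gap, if any, to be small and to vary with the PHP version and with opcache/JIT settings.

```php
<?php
// Rough micro-benchmark: three plain variables vs three array elements,
// doing identical work. Timings are indicative only; repeat the run.

$rounds = 1000000;

$start = hrtime(true); // monotonic clock, nanoseconds
$temp1 = $temp2 = $temp3 = 0;
for ($i = 0; $i < $rounds; $i++) {
    $temp1 = $i % 10;
    $temp2 = $temp1 * 2;
    $temp3 = ($temp3 + $temp2) % 1000;
}
$plainMs = (hrtime(true) - $start) / 1e6;

$start = hrtime(true);
$temp = [0, 0, 0];
for ($i = 0; $i < $rounds; $i++) {
    $temp[0] = $i % 10;
    $temp[1] = $temp[0] * 2;
    $temp[2] = ($temp[2] + $temp[1]) % 1000;
}
$arrayMs = (hrtime(true) - $start) / 1e6;

printf("plain variables: %.1f ms, array elements: %.1f ms\n", $plainMs, $arrayMs);
```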
Regarding your $X question: the value of $X at global scope won't be overwritten by the include. If the include has an assignment to $X, the act of "including" it causes the assignment to occur and $X will be overwritten.
Using $temp[0] would be better if you plan on expanding the amount of temporary variables in the future. Also, if you decide to do some operation on all of the variables then a simple foreach() loop would work, but would be much harder if you used the other method of $temp0, $temp1, etc. Therefore, use $temp[0] and protect yourself for the future.
I am pretty sure that $X would be overwritten and you could create classes or possibly use a temporary variable?
(I am not sure about my answer to the second part, but am sure about the first part.)

When in optimization phase, would it be smart to pass all array by ref?

In PHP, when we get to the optimization phase of the application (I am speaking about an app that handles hundreds of thousands of users), would it be smart to pass all arrays (where I do not mind what happens to them after I pass them to the function) by reference?
Probably not, there's likely bigger performance hits elsewhere.
Profile your application. Never trust anyone except your profiler to tell you what to optimize.
Sometimes the oddest pieces of code use up most of the execution time, and without knowing exactly where the bottlenecks are, you might spend a long time optimizing away 0.01% of the execution time.
PHP arrays have copy-on-write semantics. That means that new memory isn't allocated for the new array until it is modified. For example:
$a = array(1,2,3);
$b = $a; // $b points to the same memory locations as $a
$b[1] = 4; // $b is copied and the modification is performed
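The copy can be observed with memory_get_usage(). This is a sketch under the assumption that nothing else allocates in between; buildArray() is a hypothetical helper, and the exact byte counts differ between PHP versions.

```php
<?php
// Observing copy-on-write: assigning an array costs almost nothing until
// one of the variables is written to.

function buildArray(int $size): array
{
    return range(1, $size);          // hypothetical helper: a large array built at run time
}

$a = buildArray(100000);
$start = memory_get_usage();

$b = $a;                             // shares the same array: no copy yet
$afterAssign = memory_get_usage();

$b[1] = 4;                           // first write: $b gets its own copy
$afterWrite = memory_get_usage();

printf("assign: %d bytes, write: %d bytes\n", $afterAssign - $start, $afterWrite - $afterAssign);
```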
Passing by reference might make a difference if the arrays are passed around between a lot of different methods and constantly modified in all these localized scopes. However, that would change the semantics of the application which means that this change is only possible if the arrays aren't being modified.
That means that there's no advantage at all to passing the arrays as references. Don't do it.
This article http://derickrethans.nl/files/phparch-php-variables-article.pdf suggests this proposed optimization won't make any difference given the way arrays are implemented, and may actually slow things down.
