Will there be any measurable performance difference when passing data as values instead of as reference in PHP?
It seems that few people are aware that variables can be passed as values instead of references. Is this common knowledge or not?
From my understanding, PHP 5 passes simple data types and arrays by value, but when it comes to objects it passes by reference. It seems this is a behaviour you should be aware of - I assume arrays are passed by value and therefore large ones may well incur a performance hit if you do not require a copy to be made.
I've seen plenty of arguments against passing by reference explicitly, and in favour of letting PHP do its thing.
Also, if you want to pass an object by value then you should clone it, ideally.
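As a minimal sketch of that last point (the Counter class and increment() function are made up for illustration), a function that receives an object works on the caller's object, while passing a clone leaves the original untouched:

```php
<?php
// Hypothetical class, used only to illustrate object handles vs. clones.
class Counter {
    public $count = 0;
}

// No & in the signature, yet the caller's object is modified,
// because what gets copied is the object handle, not the object.
function increment(Counter $counter) {
    $counter->count++;
}

$original = new Counter();
increment($original);        // modifies $original

$copy = clone $original;     // independent (shallow) copy
increment($copy);            // modifies only the copy

echo $original->count, "\n"; // 1
echo $copy->count, "\n";     // 2
```

Note that clone is shallow by default, so objects held in properties are still shared unless the class defines __clone().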
If you are passing a large variable by value (which is the default for everything except objects in PHP5+), then yes, you can take a performance hit.
For example, if a user submits a large amount of POST data and you pass it to a function normally (that is, by value), the whole array would have to be copied, which would affect performance. However, unless you're on a very large-scale site, you probably won't notice the hit.
Pass by reference is possible in PHP, but it is certainly not the default (unless it's an object): you need to add an & before the parameter in the function declaration to make it pass by reference; otherwise it is passed by value (and copied). As of PHP 5, objects are passed by reference automatically, but before PHP 5 you needed to pass them by reference explicitly (i.e. add the &).
If there is a performance difference, it's negligible. Don't worry about these sorts of micro-optimizations unless you know that passing by value is causing a performance hit (though I can't imagine a situation where that is true).
On a side note, people generally advise against passing arguments by reference because it encourages bad design, much like using global variables does.
I'm not sure what you meant by the last part, though. PHP passes arguments by value by default.
Objects are always passed by reference if you use a recent version of PHP. As for the other types, the main concern is strings and arrays.
For those, it depends. PHP implements strings so that if you don't modify the string you pass as a function argument (you only read or scan it), it is never copied. This technique is called "copy-on-write". I'm not sure about arrays; I'd need to run some tests to answer that.
Unless you modify a string argument passed by value, there is no difference from passing it by reference.
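For arrays, which the answer above leaves open, a rough check along these lines can help. It is only a sketch: the function names are invented, and the exact byte counts depend on the PHP version and build.

```php
<?php
// Hypothetical helpers; each returns how much memory usage grew inside the call.

// Only reads the array.
function readOnly(array $data): int {
    $before = memory_get_usage();
    $first = $data[0];
    return memory_get_usage() - $before;
}

// Writes to the array, which forces PHP to separate (copy) it.
function writeTo(array $data): int {
    $before = memory_get_usage();
    $data[0] = -1;
    return memory_get_usage() - $before;
}

$big = range(1, 100000);

$readGrowth  = readOnly($big);
$writeGrowth = writeTo($big);

echo "growth when only reading: $readGrowth bytes\n";
echo "growth when writing:      $writeGrowth bytes\n";
```

The expectation is that reading costs next to nothing, while the first write pays for a full copy of the array; the caller's $big is unchanged either way.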
Related
Since PHP 7.0, foreach loops have been modified, and the new behavior is supposed to make iterating by reference (foreach ($tab as &$val)) quicker than before.
But I've read many times that iterating by reference is actually sometimes slower than the common loop (foreach ($tab as $val)).
Is it always quicker? Always slower? Does it depend on what we are doing in the foreach loop? Since references in PHP are not pointers, it's confusing... I'm a bit sick of reading different answers about this subject across the web, and I can't figure out where the truth is.
Thank you for bringing some light here ;)
Source : https://blog.eduonix.com/web-programming-tutorials/learn-changes-foreach-statement-php-7/
Source : http://php.net/manual/fr/control-structures.foreach.php
Source : http://php.net/manual/en/language.references.arent.php
...
Executive Summary: You are worrying about a performance problem that does not exist.
Details
PHP uses "COW" (Copy On Write).
Normal passing ($val):
Scalar -- pass the value. Writing to it changes only the local copy.
Structure/array/etc -- pass pointer to it. If the receiver (foreach loop or function body) wants to change any part of the structure, the structure is copied. The cost is proportional to the size of the structure.
Pass by reference (&$val):
Scalar -- pass a pointer to the scalar so that the innards can actually change it.
Structure -- pass the pointer. If the receiver writes to the structure, it simply (efficiently) goes through the pointer to get to the item in the structure. No COW.
The wisdom in the design is that 99% of the time PHP "just works", and is efficient, and the user does not need to know that there are two passing techniques. The tricky part is that you need to understand when to do the &$val thing.
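To make the difference between the two forms concrete, here is a small sketch (the $prices array is just sample data):

```php
<?php
$prices = [10, 20, 30];

// By value: $price is a separate variable; the array is not changed.
foreach ($prices as $price) {
    $price *= 2;
}
$afterByValue = $prices;

// By reference: each write goes through to the array element.
foreach ($prices as &$price) {
    $price *= 2;
}
unset($price); // drop the reference, otherwise $price stays bound to the last element

print_r($afterByValue); // still 10, 20, 30
print_r($prices);       // now 20, 40, 60
```

The unset() after the by-reference loop is the part people tend to forget; without it, a later assignment to $price would silently overwrite the last array element.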
PHP is different than virtually all other languages.
ALGOL 60 -- Pass by value or "name" (sort of like anonymous functions in newish languages)
FORTRAN -- Only pass by "reference" (pointer). A drawback is that if you pass a literal, the receiver can change the value of the literal! (OK, that was a 'bug' baked into old compilers.)
C -- Only pass by value. But you could say (with syntax) that you want the pointer to the thing and then dereference on the inside.
Pascal -- Value or Reference, but not identical to anything above (I forget the details).
It seems there are almost as many ways to "pass arguments" as there are Languages.
Back to the Question:
If you are passing a scalar, there may be a tiny performance penalty by having &$val, and going through the pointer.
If you are passing a structure and don't need to write to it, there is probably zero difference.
If you are passing a structure and do need to write to it, then you should have decided on how to pass it based on whether you want the change to be preserved, not on performance.
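That last case can be sketched as follows. The helper names are hypothetical; the only point is that the signature is chosen for the semantics you want, not for speed:

```php
<?php
// Hypothetical helpers showing the two choices.

// By value: the caller's array is untouched; the result is returned.
function withItem(array $items, string $item): array {
    $items[] = $item;
    return $items;
}

// By reference: the caller's array itself is changed.
function addItem(array &$items, string $item): void {
    $items[] = $item;
}

$cart = ['apple'];

$bigger = withItem($cart, 'pear'); // $cart is still ['apple']
addItem($cart, 'plum');            // $cart is now ['apple', 'plum']
```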
The language designers are simply trying to handle the situation where the array elements are big, and to clarify directly in the source-code exactly what is to occur. You can now tell PHP that the variable, $val, is to contain a reference to the array-element, which means that the big-value won't be duplicated and the memory garbage-collector won't have more work to do, and the source-code clearly says so. Subsequent programmers who read this statement will know exactly what PHP is going to do.
Of course, references are naturally more-efficient than copying values around in memory.
It has been explained quite thoroughly that you should only pass by reference in PHP if there is a technical reason to do so, because copy-on-write basically makes the performance equivalent. From what I understand, if the value is never changed, it is never copied.
But what if the function does change the variable, but your code never uses it again, or does not use any part that is changed? Then it does not matter to the code whether the original is changed or not. Yes, it is possible that the PHP optimiser takes this situation into account, but I have no reason to believe it does.
And passing a single reference is sure going to be a whole lot faster than copying a huge array or object.
So is this a good situation to pass by reference or not?
For Example, say you pass in a DomCrawler (not much more than a big [html formatted] string, except it is passed by reference implicitly in this specific case). Crawl a little and extract some information. In many situations you would not need that Crawler reset to its original position, as you are simply not using it again.
Also, imagine that later we do use the DomCrawler and read the URI from it. The function did not change this, so passing by reference or by value is still equivalent, but would passing by reference not be significantly more efficient? I think this situation would be very hard for any optimiser to spot.
So is this a good situation to pass by reference or not?
No.
Okay. Imagine you have a $bigString and you pass it to a function, the function modifies it and does something with it and the caller never wants it again. Passing by reference is initially faster since it avoids the copy. However, it's still a bad idea.
(1) If a different caller calls your function that does want to continue using that variable, things break. The reference violates encapsulation, basically.
(2) As soon as you have more than one non-reference variable outside the function referring to that value, merely creating the reference requires the copy again. (Variable values are held in containers that may be either a non-reference (copy-on-modify) or a reference (do nothing special on modify), so for reference variables and non-reference variables to refer to that value at the same time, it has to be duplicated.)
(3) Because of the above, something as innocent as calling strlen within the function will have to duplicate the value, because strlen's parameter is passed by-value, which is the norm. Now imagine you call a few functions, such as substr, and maybe strlen in a loop, and you're making a new copy of the data every time.
(4) DDR3 RAM can shove around more than 10 GB per second and CPU cache RAM is goodness knows how fast. I think there are bigger things to worry about with PHP performance than how long a string or array copy takes.
Don't use references for superstitious performance gains. It never works.
If you really want to avoid the copy, the right way to do this is probably to put your function as a method of an object that looks after the variable:
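class Thing {
    private $bigString;
    public function __construct($bigString) {
        $this->bigString = $bigString;
    }

    // Modifies the string in place, inside the object that owns it.
    public function foo() {
        $this->bigString[0] = 'x';
    }
}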
Then you avoid copying, get the benefits of encapsulation and none of the subtleties of references.
PS: DomCrawler is not a good example because it's an object. PHP objects are never copy-on-write anyway (well I think they are, but there is an additional level of indirection so the only part that is copy-on-write is a small pointer container, or something like that).
I've always avoided passing by reference for the same reason I avoid goto.
$a = myFunction($a);
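is more easily read and reused than a by-reference call such as myFunction($a); (the call-time form myFunction(&$a) has been a fatal error since PHP 5.4).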
From my understanding of the PHP system, everything is passed by "reference". So if you are passing around huge arrays or objects, they are always passed by "reference".
I put "reference" in quotes cause there are 2 different types here:
Explicit references are where you tell PHP that you want the variable tracked as a reference.
Implicit references are where you want it tracked as a value instead.
PHP defaults to the implicit reference.
So there are no performance implications until you change an implicit reference. At that point PHP will allocate memory, copy the values to separate addresses, and update your reference.
If the compiler detects that the variable is no longer used or is no longer in scope, the GC will scoop it up.
When passing arguments to functions, the convention is to pass by value if the function is not supposed to change the value of that argument. We pass &byref only when the function is going to modify that variable.
On the other hand, we do know that when passing by reference, PHP works with a pointer, rather than duplicating a copy of the variable as in the case with passing by value.
This raises a question in my mind: shouldn't we pass certain variables to our functions by reference from time to time for speed and efficiency, even though we don't need them modified?
Without getting crazy with the idea and totally abusing it, I'd like to put a frame of reference around which variables, or what kind of variables, I'm talking about here.
They are mainly $dbh (database handles) and very large variables. To tell you the truth, just the $dbh's, really!
In your opinion, is this a good practice, or is it one that should never be practiced?
Let's bring this matter down to a code snippet and comment off of that.
// Assume $dbh is the database handle for a MySQL connection.
for ($userID = 1; $userID <= 1000; $userID++) {
    display_name($dbh, "users", $userID); // outputs the name of the passed user ID
}
Here, should the display_name function take $dbh by reference or by value?
PHP uses copy-on-write, which essentially means that as long as the variable is not modified, passing it by value has the same effect as passing it by reference. That is to say, there will be no performance gain from using references in the scenario you describe (in fact, some report that references can be slower).
They are mainly $dbh ( database handles ) and very large variables. to tell you the truth, just the $dbh's! really.
What makes you think it is large? A "handle" usually means something like a pointer (or pointer to a pointer), a very small thing.
I have a couple of functions that manipulate data in an array. For example, you pass unset_data() the array and any number of string arguments, like:
unset_data( $my_array, "firstname", "password" );
and it can handle multi-dimensional arrays etc. Quite simple.
But should this function take a reference to the array and change it directly, or should I return the new array with the values unset?
I can never decide whether a function should use a reference or not. Are there specific cases or examples of when to use them and when not to?
Thanks
I'd ask myself what the expected use case of the function is. Does the typical use case involve keeping the original data intact and deriving new data from it, or is the explicit use case of this function to modify data in place?
Suppose md5 modified data in place. That would be pretty inconvenient, since I usually want to keep the original data intact. So I'd always have to do this:
$hash = $data;
md5($hash);
instead of:
$hash = md5($data);
That's pretty ugly code, forced on you by the API of the function.
For unset though, I don't think the typical use case is for deriving new data:
$arr = unset($arr['foo']);
That seems pretty clunky as well as possibly a performance hit.
Generally speaking, it's better to return by value instead of taking a reference because:
It's the most common usage pattern (there's one less thing to keep in mind about this particular function)
You can create call chains freely, e.g. you can write array_filter(unset_data(...))
Generally speaking, code without side effects (I'm calling the mutation of an argument in a manner visible to the caller a side effect) is easier to reason about
Most of the time, these advantages come at the cost of using up additional memory. Unless you have good reason (or better yet, proof) to believe that the additional memory consumption is going to be an issue, my advice is to just return the mutated value.
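A minimal sketch of the return-by-value approach could look like this. The function name is a hypothetical variant of the asker's unset_data(), and it only handles top-level keys to keep the example short:

```php
<?php
// Hypothetical by-value variant; handles top-level keys only, for brevity.
function unset_data_copy(array $data, string ...$keys): array {
    foreach ($keys as $key) {
        unset($data[$key]);
    }
    return $data;
}

$user = ['firstname' => 'Ann', 'password' => 'secret', 'email' => ''];

$public = unset_data_copy($user, 'firstname', 'password');

// The return value can be fed straight into another call.
$nonEmpty = array_filter(unset_data_copy($user, 'password'));
```

Here $user keeps all three keys, $public holds only the email entry, and $nonEmpty holds only the first name, because array_filter() drops the empty email string.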
I feel that there is not a general you should/shouldn't answer to this question - it depends entirely on the usage case.
My personal feeling leans towards passing by reference, to keep its behaviour more in line with the native unset(), but if you are likely to end up regularly having to make copies of the array before you call the function, then go with a return value. Another advantage of the by-reference approach is that you can return some other information as well as modifying the array. For example, you could return an integer describing how many values were removed from the array based on the arguments.
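A sketch of that by-reference variant, again with a hypothetical name and limited to top-level keys, might be:

```php
<?php
// Hypothetical by-reference variant; handles top-level keys only, for brevity.
// Returns how many of the given keys were actually removed.
function unset_data_in_place(array &$data, string ...$keys): int {
    $removed = 0;
    foreach ($keys as $key) {
        if (array_key_exists($key, $data)) {
            unset($data[$key]);
            $removed++;
        }
    }
    return $removed;
}

$user = ['firstname' => 'Ann', 'password' => 'secret', 'email' => 'ann@example.com'];

$removed = unset_data_in_place($user, 'firstname', 'password', 'missing');

echo $removed, "\n"; // 2, since 'missing' was not in the array
```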
I don't think there is a solid argument for "best practice" with either option here, so the short answer would be:
Do whatever you are most comfortable with and whatever allows you to write the most concise, readable and self-documenting code.
This is not a theoretical or general question about when to use "pass by reference" vs "pass by value". There are several questions of that type here, and NONE of them answer my very specific question:
In PHP, when passing a database connection resource ID to a function (just a simple procedural function, not any class or anything complicated) does it matter if I pass by reference or by value?
The connection itself does absolutely nothing. It is never changed. So both reference and value work fine.
My question I suppose is to figure out WHY I would pass it by reference. I have old inherited code, and the comments in the code suggest that passing by reference saves memory and speeds up performance. I wonder if this is true?
Btw I am using Postgresql connections, not MySQL.
Thanks!
There is no difference in that case, because a resource variable is itself just a pointer to a resource outside the script.
Just pass it by value.
You can see how PHP represents a resource if you do echo $resourceVar;.
It will print something like: "Resource id #2"
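As a runnable illustration, the sketch below uses an in-memory stream in place of a database connection (that substitution is an assumption made so the example needs no database; streams are resources too). The log_line() helper is hypothetical:

```php
<?php
// An in-memory stream stands in for the database connection here.
// It is passed by value, with no & anywhere.
function log_line($stream, string $line): void {
    fwrite($stream, $line . "\n");
}

$stream = fopen('php://memory', 'r+');

log_line($stream, 'first');
log_line($stream, 'second');

rewind($stream);
$contents = stream_get_contents($stream);
fclose($stream);

echo $contents; // both lines ended up in the caller's stream
```

The function receives a copy of the handle, not a copy of the stream, so both writes land in the same underlying resource.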
A database connection is a "resource". It is usually just a pointer, so it doesn't matter whether you pass it by value or by reference. Resources behave like objects in this sense.
Passing by reference used to have a significant effect on memory usage in the past. That's why old code contains so many & operators. But this is no longer necessary. PHP 5 uses copy-on-write semantics for variables, so it doesn't cost any more memory to pass things by value in most circumstances. Especially if it's just a pointer like a resource... but this also applies to other types of variables such as string and arrays. (Try passing a 1MB string by value. Memory consumption will NOT go up by 1MB.)
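The 1MB experiment suggested above could be sketched roughly like this. The helper name is made up, and the exact numbers will vary with PHP version and configuration:

```php
<?php
// Hypothetical helper: reads the string and reports memory usage during the call.
function length_and_memory(string $s): array {
    return ['length' => strlen($s), 'memory' => memory_get_usage()];
}

$payload = str_repeat('a', 1024 * 1024); // about 1 MB

$before = memory_get_usage();
$result = length_and_memory($payload);
$growth = $result['memory'] - $before;

echo "length: {$result['length']}\n";
echo "growth during the call: $growth bytes\n";
```

If the string were really copied on the way in, the growth would be at least 1 MB; with copy-on-write it should stay far below that.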
For most uses it doesn't matter at all (you would usually pass by value), unless you specifically want to change the database connection.
It works similar to objects, in the sense that even when passing an object by value you can still make changes to it because it's a pointer to the actual data.
So to answer the question, there is NO reason for you to pass by reference.