I ask this question because I learned that in programming and design, you must have a good reason for your decisions. I am a PHP learner and I am at a crossroads here; I am using simple incrementation to get what I'm asking across. I am certainly not here to start a debate about the pros/cons of referencing, but when it comes to PHP, which is the better programming practice:
function increment(&$param) {
    ++$param;
}
vs.
function increment($param) {
    return ++$param;
}
$param = increment($param);
First, references are not pointers.
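To make that distinction concrete, here is a minimal sketch (illustrative only): a PHP reference is just another name for the same value container, not an address you can inspect or do arithmetic on.
$a = 1;
$b = &$a;   // $b is another name for the same value container as $a
$b = 2;
echo $a;    // 2 -- writing through either name changes the shared value
unset($b);  // removes only the alias; $a still exists
echo $a;    // 2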
I tried the code given by #John in his answer, but I got strange results. It turns out microtime() returns a string, so arithmetic on it is unreliable; I even got negative results on some runs. One should use microtime(true) to get the value as a float.
I added another test of no function call, just incrementing the variable:
<?php
$param = 1;
$start = microtime(true);
for ($i = 1; $i <= 1000000; $i++) {
    $param++;
}
$end = microtime(true);
echo "increment: " . ($end - $start) . "\n";
The results on my machine, a 2.4 GHz MacBook running PHP 5.3.2:
function call with pass by reference: 2.14 sec.
function call with pass by value: 2.26 sec.
no function call, just bare increment: 0.42 sec.
So there seems to be a 5.3% performance advantage to passing by reference, but there is an 81% performance advantage to avoiding the function call completely.
I guess the example of incrementing an integer is arbitrary, and the OP is really asking about the general advantage of passing by reference. But I'm just offering this example to demonstrate that the function call alone incurs far more overhead than the method of passing parameters.
So if you're trying to decide how to micro-optimize parameter passing, you're being penny wise and pound foolish.
There are also other reasons to avoid references, even though they can simplify several algorithms, especially when you are manipulating two or more data structures that must share the same underlying data:
They give functions side effects. You should, in general, avoid functions with side effects, as they make the program more unpredictable (as in "OK, how did this value get here? Did any of the functions modify its parameters?").
They cause bugs. If you make a variable a reference, you must remember to unset it before assigning it a new value, unless you want to change the value shared by the whole reference set. This happens frequently after you run a foreach loop by reference and then re-use the loop variable (see the sketch below).
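A minimal sketch of that foreach pitfall (variable names are illustrative):
$values = array(1, 2, 3);
foreach ($values as &$v) {
    // ...
}
// $v still references $values[2] here.
foreach ($values as $v) {
    // each assignment to $v now writes into $values[2]
}
print_r($values); // Array ( [0] => 1 [1] => 2 [2] => 2 ) -- the last element was clobbered
// Calling unset($v); after the first loop would have prevented this.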
It depends on what the function's purpose is. If its express purpose is to modify the input, use references. If the purpose is to compute some data based on the input and not to alter the input, by all means use a regular return.
Take for example array_push:
int array_push(array &$array, mixed $var[, mixed $...])
The express purpose of this function is to modify an array. It's unlikely that you need both the original array and a copy of it including the pushed values.
array_push($array, 42); // most likely use case
// if you really need both versions, just do:
$old = $array;
array_push($array, 42);
If array_push didn't take references, you'd need to do this:
// array_push($array, $var[, $...])
$array = array_push($array, 42); // quite a waste to do this every time
On the other hand, a purely computational function like pow should not modify the original value:
number pow(number $base, number $exp)
You are probably more likely to use this function in a context where you want to keep the original number intact and just compute a result based on it. In this case it would be a nuisance if pow modified the original number.
$x = 123;
$y = pow($x, 42); // most likely use case
If pow took references, you'd need to do this:
// pow(&$base, $exp)
$x = 123;
$y = $x; // nuisance
pow($y, 42);
The best practice is not to write a function when an expression will do.
$param++;
is all you need.
The second version:
function increment($param){
    return $param++;
}
$param = increment($param);
Does nothing: increment() returns the original $param. Perhaps you meant ++$param or $param + 1.
I mention this not to be pedantic, but so that if you compare timings, you are comparing the same function (it could be possible for PHP's optimizer to remove the function completely).
By definition, incrementing a variable's value (at least in PHP) mutates the variable. It doesn't take a value, increase it by 1 and return it; rather it changes the variable holding that value.
So your first example would be the better way to go, as it's taking a reference (PHP doesn't really do pointers per se) of $param and post-incrementing it.
I just ran a couple quick tests with the following code:
<?php
function increment(&$param){
    $param++;
}

$param = 1;
$start = microtime();
for($i = 1; $i <= 1000000; $i++) {
    increment($param);
}
$end = microtime();
echo $end - $start;
?>
This consistently returned around .42 to .43, whereas the following code returned about .55 to .59:
<?php
function increment($param){
    return $param++;
}

$param = 1;
$start = microtime();
for($i = 1; $i <= 1000000; $i++) {
    $param = increment($param);
}
$end = microtime();
echo $end - $start;
?>
So I would say that the references are quicker, but only in extreme cases.
I think your example is a little abstract.
There is no problem with using pointers, but in most real-world cases you are probably modifying an object, not an int, in which case you don't need the reference (in PHP 5 at least).
I'd say it quite depends on what you're doing. If you're working on a large set of data and don't want an extra copy of it in memory, go ahead and pass by value (your second example). If you want to save memory and work on the data directly, pass by reference (your first example).
In C++ if you pass a large array to a function, you need to pass it by reference, so that it is not copied to the new function wasting memory. If you don't want it modified you pass it by const reference.
Can anyone verify that passing by reference will save me memory in PHP as well? I know PHP does not use addresses for references like C++ does, which is why I'm slightly uncertain. That is the question.
The following does not apply to objects, as has already been stated here. Passing arrays and scalar values by reference will only save you memory if you plan on modifying the passed value, because PHP uses a copy-on-change (aka copy-on-write) policy. For example:
# $array will not be copied, because it is not modified.
function foo($array) {
    echo $array[0];
}

# $array will be copied, because it is modified.
function bar($array) {
    $array[0] += 1;
    echo $array[0] + $array[1];
}

# This is how bar should've been implemented in the first place.
function baz($array) {
    $temp = $array[0] + 1;
    echo $temp + $array[1];
}

# This would also work (passing the array by reference) and avoids the copy,
# but it has a serious side-effect which you may not want.
function foobar(&$array) {
    $array[0] += 1;
    echo $array[0] + $array[1];
}
To summarize:
If you are working on a very large array and plan on modifying it inside a function, you actually should use a reference to prevent it from getting copied, which can seriously decrease performance or even exhaust your memory limit.
If it is avoidable though (that is, for small arrays or scalar values), I'd always use a functional-style approach with no side effects, because as soon as you pass something by reference, you can never be sure what the passed variable may hold after the function call, which can sometimes lead to nasty and hard-to-find bugs.
IMHO scalar values should never be passed by reference, because the performance impact cannot possibly be big enough to justify the loss of transparency in your code.
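If you want to see the copy-on-write behaviour for yourself, here is a rough, illustrative sketch (the function names are made up; exact numbers vary by PHP version and platform):
$big = range(1, 100000);

function read_only($arr)  { return $arr[0]; }                // no write, so no copy is expected
function modifies($arr)   { $arr[0] = -1; return $arr[0]; }  // the write forces a copy inside the call

$peak = memory_get_peak_usage();
read_only($big);
echo 'read-only call grew the peak by ', memory_get_peak_usage() - $peak, " bytes\n"; // roughly 0

$peak = memory_get_peak_usage();
modifies($big);
echo 'modifying call grew the peak by ', memory_get_peak_usage() - $peak, " bytes\n"; // roughly the array's size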
The short answer is use references when you need the functionality that they provide. Don't think of them in terms of memory usage or speed. Pass by reference is always going to be slower if the variable is read only.
Everything is passed by value, including objects. However, it's the handle of the object that is passed, so people often mistakenly call it by-reference because it looks like that.
Then what functionality does it provide? It gives you the ability to modify the variable in the calling scope:
class Bar {}
$bar = new Bar();
function by_val($o) { $o = null; }
function by_ref(&$o) { $o = null; }
by_val($bar); // $bar is still non null
by_ref($bar); // $bar is now null
So if you need such functionality (most often you do not), then use a reference. Otherwise, just pass by value.
Functions that look like this:
$foo = modify_me($foo);
sometimes are good candidates for pass-by-reference, but it should be absolutely clear that the function modifies the passed in variable. (And if such a function is useful, often it's because it really ought to just be part of some class modifying its own private data.)
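For illustration only (these function names and their behaviour are made up, not taken from the question), compare a by-reference signature whose name makes the mutation obvious with a by-value version that returns a new value:
// By reference: the name signals that the argument is modified in place.
function normalize_path_in_place(&$path) {
    $path = rtrim(str_replace('\\', '/', $path), '/');
}

// By value: the caller's variable is left untouched; a new value is returned.
function normalized_path($path) {
    return rtrim(str_replace('\\', '/', $path), '/');
}

$dir = 'C:\\temp\\logs\\';
normalize_path_in_place($dir);            // $dir is now "C:/temp/logs"
$copy = normalized_path('C:\\other\\');   // returns "C:/other"; the argument is untouched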
In PHP:
objects are passed by reference [1] -- always
arrays and scalars are passed by value by default, and can be passed by reference using an & in the function's declaration.
For the performance part of your question, PHP doesn't deal with that the same way as C/C++; you should read the following article: Do not use PHP references
[1] Or that's what we usually say -- even if it's not "completely true" -- see Objects and references
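A small sketch of what that footnote means in practice (the class and function names are illustrative): the object handle is itself passed by value, so re-assigning the parameter does not affect the caller, but modifying the object through the handle does:
class Counter { public $n = 0; }

function bumpByValue($c) {
    $c->n++;               // modifies the same underlying object
}

function replaceByValue($c) {
    $c = new Counter();    // only rebinds the local handle
}

$counter = new Counter();
bumpByValue($counter);
echo $counter->n;          // 1 -- the property change is visible to the caller
replaceByValue($counter);
echo $counter->n;          // still 1 -- the caller's handle was not replaced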
Let us suppose that we have some problem and at least two solutions for it, and what we want to achieve is to compare their effectiveness. How do we do this? Obviously, the best answer is: run tests. And I doubt there's a better way when it comes to language-specific questions (for example "what is faster in PHP: echo 'foo', 'bar' or echo('foo'.'bar')").
OK, now we'll assume that if we want to test some code, that's equivalent to testing some function. Why? Because we can wrap that code in a function and pass its context (if any) as its parameters. Thus, all we need is some benchmark function which will do all the work. Here's a very simple one:
function benchmark(callable $function, $args = null, $count = 1)
{
    $start = microtime(true);
    for ($i = 0; $i < $count; $i++)
    {
        $result = is_array($args) ?
            call_user_func_array($function, $args) :
            call_user_func($function);
    }
    $total = microtime(true) - $start;

    return [
        'total_time'   => $total,
        'average_time' => $total / $count,
        'count'        => $count
    ];
}
This will fit our needs and can be used to do comparative benchmarks. By "comparative" I mean that we can use the function above for code X, then for code Y, and after that we can say that code X is Z% faster/slower than code Y.
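For example, an illustrative call to the helper above could look like this (the benchmarked function and its arguments are arbitrary):
$stats = benchmark('str_repeat', array('x', 1000), 10000);
printf(
    "total: %.4f s, average: %.8f s over %d runs\n",
    $stats['total_time'],
    $stats['average_time'],
    $stats['count']
);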
The problem
OK, so we can easily measure time. But what about memory? Our previous assumption, "if we want to test some code, that's equivalent to testing some function", doesn't seem to hold here. Why? Because it's true from a formal point of view, but if we hide the code inside a function, we'll never be able to measure its memory usage from the outside. Example:
function foo($x, $y)
{
    $bar = array_fill(0, $y, str_repeat('bar', $x));
    //do stuff
}

function baz($n)
{
    //do stuff, resulting in $x, $y
    $bee = foo($x, $y);
    //do other stuff
}
We want to test baz, i.e. find out how much memory it will use. By "how much" I mean "what the maximum memory usage will be during execution of the function". It is obvious that we cannot act like we did when measuring execution time, because from the outside we know nothing about the function: it's a black box. In fact, we can't even be sure that the function will execute successfully (imagine what happens if somehow $x and $y inside baz end up as 1E6, for example). Thus, maybe it isn't a good idea to wrap our code inside a function. But what if the code itself contains other function/method calls?
My approach
My current idea is to somehow create a function which measures memory after each line of the input code. That means something like this: suppose we have the code
$x = foo();
echo($x);
$y = bar();
Then, after doing its thing, the measuring function will effectively run:
$memory = memory_get_usage();
$max = 0;

$x = foo(); //line 1 of code
$memory = memory_get_usage() - $memory;
$max = $memory > $max ? $memory : $max;
$memory = memory_get_usage();

echo($x); //line 2 of code
$memory = memory_get_usage() - $memory;
$max = $memory > $max ? $memory : $max;
$memory = memory_get_usage();

$y = bar(); //line 3 of code
$memory = memory_get_usage() - $memory;
$max = $memory > $max ? $memory : $max;

//our result is $max
But that looks weird, and it still doesn't answer the question of how to measure a function's memory usage.
Use-case
Use-case for this: in most cases, complexity theory can provide at least a big-O estimation for certain code. But:
First, the code can be huge, and I want to avoid analysing it manually for as long as possible. That is also why my current idea is bad: it can be applied, yes, but it is still manual work with the code. Moreover, to go deeper into the code's structure I would need to apply it recursively: for example, after applying it at the top level I find that some foo() function takes too much memory. What do I do then? Right, go into that foo() function and repeat my analysis within it. And so on.
Second, as I've mentioned, there are some language-specific things that can only be resolved by running tests. That is why my goal is an automatic way of measuring memory, like the one above for time measurement.
Also, garbage collection is enabled. I am using PHP 5.5 (I believe this matters)
The question
How can we effectively measure the memory usage of a certain function? Is it achievable in PHP? Maybe it's possible with some simple code (like the benchmark function for time measurement above)?
After #bwoebi proposed the great idea of using ticks, I did some research. Now I have my answer, with this class:
class Benchmark
{
    private static $max, $memory;

    public static function memoryTick()
    {
        self::$memory = memory_get_usage() - self::$memory;
        self::$max = self::$memory > self::$max ? self::$memory : self::$max;
        self::$memory = memory_get_usage();
    }

    public static function benchmarkMemory(callable $function, $args = null)
    {
        declare(ticks=1);
        self::$memory = memory_get_usage();
        self::$max = 0;
        register_tick_function('call_user_func_array', ['Benchmark', 'memoryTick'], []);
        $result = is_array($args) ?
            call_user_func_array($function, $args) :
            call_user_func($function);
        unregister_tick_function('call_user_func_array');
        return [
            'memory' => self::$max
        ];
    }
}
//var_dump(Benchmark::benchmarkMemory('str_repeat', ['test',1E4]));
//var_dump(Benchmark::benchmarkMemory('str_repeat', ['test',1E3]));
So it does exactly what I want:
It is a black box
It measures maximum used memory for passed function
It is independent from context
Now, some background. In PHP, declaring ticks is possible from inside a function, and we can use a callback with register_tick_function(). So my thought was to use an anonymous function that would use the local context of my benchmark function. And I successfully created that. However, I don't want to affect the global context, so I want to unregister the tick handler with unregister_tick_function(). And that is where the trouble starts: this function expects a string to be passed. So you cannot unregister a tick handler that is a closure (PHP will try to stringify it, which causes a fatal error because the Closure class has no __toString() method). Why is it so? It's nothing else but a bug. I hope a fix will come soon.
What are the other options? The easiest option that came to mind was using global variables. But they are ugly, and they are also a side effect, which I want to avoid: I don't want to affect the context. But really, we can wrap everything we need in a class and then invoke the tick function via call_user_func_array(). And 'call_user_func_array' is just a string, so we can work around this buggy PHP behavior and do the whole thing successfully.
Update: I've implemented a measurement tool from this. I've added time measurement and custom callback-defined measurements there. Feel free to use it.
Update: The bug mentioned in this answer is now fixed, so there's no need for the trick with call_user_func_array() registered as the tick function. Now a closure can be created and used directly.
Update: Due to a feature request, I've added a Composer package for this measurement tool.
declare(ticks=1); // should be placed before any further file loading happens
That comment already says most of what I have to say.
Use a tick handler and, on every tick, write the current memory usage together with the file and line to a log file:
$file = fopen('memory.log', 'w'); // the log file name here is just an example

function tick_handler() {
    global $file;
    $mem = memory_get_usage();
    $bt  = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 2)[0];
    fwrite($file, $bt["file"] . ":" . $bt["line"] . "\t" . $mem . "\n");
}

register_tick_function('tick_handler'); // or in a class: register_tick_function([$this, 'tick_handler']);
Then look at the file to see how the memory varies in time, line by line.
You can also parse that file later with a separate program to analyse peaks etc.
(To see possible peaks caused by calls to internal functions, you need to store their results in a variable; otherwise the memory is already freed before the tick handler measures it.)
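As a small illustrative follow-up (assuming the memory.log file name used in the handler above), a post-processing script could simply scan the log for the largest recorded value:
$peak = array('line' => null, 'mem' => 0);
foreach (file('memory.log', FILE_IGNORE_NEW_LINES) as $row) {
    list($location, $mem) = explode("\t", $row);
    if ((int)$mem > $peak['mem']) {
        $peak = array('line' => $location, 'mem' => (int)$mem);
    }
}
echo "Highest recorded usage: {$peak['mem']} bytes, around {$peak['line']}\n";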
You can use Xdebug, and there is a patch for Xdebug which provides memory usage information.
If this is not possible, you can always use memory_get_peak_usage(), which I think would fit better than memory_get_usage().
This might not be exactly what you are looking for, but you could probably use XDebug to get at that information.
I just stumbled across
http://3v4l.org/
They don't give details on how the benchmarks (or rather the measurements) are implemented, though -- I don't think many people have over 100 PHP versions running in parallel on a VM beneath their desk ;)
Consider the following PHP Code:
//Method 1
$array = array(1,2,3,4,5);
foreach($array as $i=>$number){
    $number++;
    $array[$i] = $number;
}
print_r($array);

//Method 2
$array = array(1,2,3,4,5);
foreach($array as &$number){
    $number++;
}
print_r($array);
Both methods accomplish the same task, one by assigning through a reference and the other by re-assigning based on the key. I want to use good programming techniques in my work, and I wonder which method is the better programming practice. Or is this one of those "it doesn't really matter" things?
Since the highest scoring answer states that the second method is better in every way, I feel compelled to post an answer here. True, looping by reference is more performant, but it isn't without risks/pitfalls.
Bottom line, as always: "Which is better X or Y", the only real answers you can get are:
It depends on what you're after/what you're doing
Oh, both are OK, if you know what you're doing
X is good for Such, Y is better for So
Don't forget about Z, and even then ...("which is better X, Y or Z" is the same question, so the same answers apply: it depends, both are ok if...)
Be that as it may, as Orangepill showed, the reference approach offers better performance. In this case, the tradeoff is one of performance vs. code that is less error-prone and easier to read/maintain. In general, it's considered better to go for safer, more reliable, and more maintainable code:
'Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.' — Brian Kernighan
I guess that means the first method has to be considered best practice. But that doesn't mean the second approach should be avoided at all times, so what follows here are the downsides, pitfalls and quirks that you'll have to take into account when using a reference in a foreach loop:
Scope:
For a start, PHP isn't truly block-scoped like C(++), C#, Java, Perl or (with a bit of luck) ECMAScript6... That means that the $value variable will not be unset once the loop has finished. When looping by reference, this means a reference to the last value of whatever object/array you were iterating is floating around. The phrase "an accident waiting to happen" should spring to mind.
Consider what happens to $value, and subsequently $array, in the following code:
$array = range(1,10);
foreach($array as &$value)
{
    $value++;
}
echo json_encode($array);
$value++;
echo json_encode($array);
$value = 'Some random value';
echo json_encode($array);
The output of this snippet will be:
[2,3,4,5,6,7,8,9,10,11]
[2,3,4,5,6,7,8,9,10,12]
[2,3,4,5,6,7,8,9,10,"Some random value"]
In other words, by reusing the $value variable (which references the last element in the array), you're actually manipulating the array itself. This makes for error-prone code, and difficult debugging. As opposed to:
$array = range(1,10);
$array[] = 'foobar';
foreach($array as $k => $v)
{
    $array[$k]++;               //increments foobar, to foobas!
    if ($array[$k] === ($v +1)) //$v + 1 yields 1 if $v === 'foobar'
    {                           //so 'foobas' === 1 => false
        $array[$k] = $v;        //restore initial value: foobar
    }
}
Maintainability/idiot-proofness:
Of course, you might say that the dangling reference is an easy fix, and you'd be right:
foreach($array as &$value)
{
    $value++;
}
unset($value);
But after you've written your first 100 loops with references, do you honestly believe you won't have forgotten to unset a single reference? Of course not! It's so uncommon to unset variables that have been used in a loop (we assume the GC will take care of it for us), so most of the time, you don't bother. When references are involved, this is a source of frustration, mysterious bug-reports, or traveling values, where you're using complex nested loops, possibly with multiple references... The horror, the horror.
Besides, as time passes, who's to say that the next person working on your code won't forget about unset? Who knows, he might not even know about references, or he might see your numerous unset calls, deem them redundant, a sign of your being paranoid, and delete them altogether. Comments alone won't help you: they need to be read, and everyone working with your code should be thoroughly briefed, perhaps even made to read a full article on the subject. The examples listed in the linked article are bad, but I've seen worse, still:
foreach($nestedArr as &$array)
{
    if (count($array)%2 === 0)
    {
        foreach($array as &$value)
        {//pointless, but you get the idea...
            $value = array($value, 'Part of even-length array');
        }
        //$value now references the last index of $array
    }
    else
    {
        $value = array_pop($array);//assigns new value to var that might be a reference!
        $value = is_numeric($value) ? $value/2 : null;
        array_push($array, $value);//congrats, X-references ==> traveling value!
    }
}
This is a simple example of a traveling value problem. I did not make this up, BTW, I've come across code that boils down to this... honestly. Quite apart from spotting the bug, and understanding the code (which has been made more difficult by the references), it's still quite obvious in this example, mainly because it's a mere 15 lines long, even using the spacious Allman coding style... Now imagine this basic construct being used in code that actually does something even slightly more complex, and meaningful. Good luck debugging that.
Side-effects:
It's often said that functions shouldn't have side-effects, because side-effects are (rightfully) considered to be code-smell. Though foreach is a language construct, and not a function, in your example, the same mindset should apply. When using too many references, you're being too clever for your own good, and might find yourself having to step through a loop, just to know what is being referenced by what variable, and when.
The first method hasn't got this problem: you have the key, so you know where you are in the array. What's more, with the first method, you can perform any number of operations on the value, without changing the original value in the array (no side-effects):
function recursiveFunc($n, $max = 10)
{
    if (--$max)
    {
        return $n === 1 ? 10-$max : recursiveFunc($n%2 ? ($n*3)+1 : $n/2, $max);
    }
    return null;
}

$array = range(10,20);
foreach($array as $k => $v)
{
    $v = recursiveFunc($v);//reassigning $v here
    if ($v !== null)
    {
        $array[$k] = $v;//only now, will the actual array change
    }
}
echo json_encode($array);
This generates the output:
[7,11,12,13,14,15,5,17,18,19,8]
As you can see, the first, the seventh and the last elements have been altered; the others haven't. If we were to rewrite this code using a loop by reference, the loop looks a lot smaller, but the output will be different (we have a side-effect):
$array = range(10,20);
foreach($array as &$v)
{
    $v = recursiveFunc($v);//Changes the original array...
    //granted, if your version permits it, you'd probably do:
    $v = recursiveFunc($v) ?: $v;
}
echo json_encode($array);
//[7,null,null,null,null,null,5,null,null,null,8]
To counter this, we'd either have to create a temporary variable, or call the function twice, or add a key and recalculate the initial value of $v, but that's just plain stupid (it's adding complexity to fix what shouldn't be broken):
foreach($array as &$v)
{
    $temp = recursiveFunc($v);//creating copy here, anyway
    $v = $temp ? $temp : $v;//assignment doesn't require the lookup, though
}
//or:
foreach($array as &$v)
{
    $v = recursiveFunc($v) ? recursiveFunc($v) : $v;//2 calls === twice the overhead!
}
//or
$base = reset($array);//get the base value
foreach($array as $k => &$v)
{//silly combine both methods to fix what needn't be a problem to begin with
    $v = recursiveFunc($v);
    if ($v === 0)
    {
        $v = $base + $k;
    }
}
Anyway, adding branches, temp variables and what have you, rather defeats the point. For one, it introduces extra overhead which will eat away at the performance benefits references gave you in the first place.
If you have to add logic to a loop, to fix something that shouldn't need fixing, you should step back, and think about what tools you're using. 9/10 times, you chose the wrong tool for the job.
The last thing that, to me at least, is a compelling argument for the first method is simple: readability. The reference-operator (&) is easily overlooked if you're doing some quick fixes, or try to add functionality. You could be creating bugs in the code that was working just fine. What's more: because it was working fine, you might not test the existing functionality as thoroughly because there were no known issues.
Discovering a bug that went into production, because of your overlooking an operator might sound silly, but you wouldn't be the first to have encountered this.
Note:
Passing by reference at call-time has been removed since PHP 5.4. Be wary of features/functionality that are subject to change. A standard iteration of an array hasn't changed in years. I guess it's what you could call "proven technology". It does what it says on the tin, and it is the safer way of doing things. So what if it's slower? If speed is an issue, you can optimize your code and introduce references to your loops then.
When writing new code, go for the easy-to-read, most failsafe option. Optimization can (and indeed should) wait until everything's tried and tested.
And as always: premature optimization is the root of all evil. Choose the right tool for the job, not the one that's new and shiny.
As far as performance is concerned Method 2 is better, especially if you either have a large array and/or are using string keys.
While both methods use the same amount of memory, the first method requires the array to be searched; even though this search is done by index, the lookup has some overhead.
Given this test script:
$array = range(1, 1000000);

$start = microtime(true);
foreach($array as $k => $v){
    $array[$k] = $v+1;
}
echo "Method 1: ".((microtime(true)-$start));
echo "\n";

$start = microtime(true);
foreach($array as $k => &$v){
    $v+=1;
}
echo "Method 2: ".((microtime(true)-$start));
The average output is
Method 1: 0.72429609298706
Method 2: 0.22671484947205
If I scale back the test to only run ten times instead of 1 million I get results like
Method 1: 3.504753112793E-5
Method 2: 1.2874603271484E-5
With string keys the performance difference is more pronounced.
So, running:
$array = array();
for($x = 0; $x<1000000; $x++){
    $array["num".$x] = $x+1;
}

$start = microtime(true);
foreach($array as $k => $v){
    $array[$k] = $v+1;
}
echo "Method 1: ".((microtime(true)-$start));
echo "\n";

$start = microtime(true);
foreach($array as $k => &$v){
    $v+=1;
}
echo "Method 2: ".((microtime(true)-$start));
Yields performance like
Method 1: 0.90371179580688
Method 2: 0.2799870967865
This is because searching by string key has more overhead than searching by array index.
It is also worth noting that, as suggested in Elias Van Ootegem's answer, to properly clean up after yourself you should unset the reference after the loop has completed, i.e. unset($v);. And the performance gains should be measured against the loss in readability.
There are some minor performance differences, but they aren't going to have any significant effect.
I would choose the first option for two reasons:
It's more readable. This is a bit of a personal preference, but at first glance, it's not immediately obvious to me that $number++ is updating the array. By explicitly using something like $array[$i]++, it's much clearer, and less likely to cause confusion when you come back to this code in a year.
It doesn't leave you with a dangling reference to the last item in the array. Consider this code:
$array = array(1,2,3,4,5);
foreach($array as &$number){
    $number++;
}
// ... some time later in an unrelated section of code
$number = intval("100");
// now unexpectedly, $array[4] == 100 instead of 6
I guess that depends. Do you care more about code readability/maintainability, or about minimizing memory usage? The second method would use slightly less memory, but I would honestly prefer the first usage, as assigning by reference in the foreach definition does not seem to be commonplace practice in PHP.
Personally if I wanted to modify an array in place like this I would go with a third option:
array_walk($array, function(&$value) {
    $value++;
});
The first method will be insignificantly slower, because each time it goes through the loop it assigns a new value to the $number variable. The second method uses the variable directly, so it doesn't need to assign a new value on each iteration.
But, as I said, the difference is not significant, the main thing to consider is readability.
In my opinion, the first method makes more sense when you don't need to modify the value in the loop; the $number variable would only be read.
The second method makes more sense when you need to modify the $number variable often, as you don't need to repeat the key each time you want to modify it, and it is more readable.
Have you considered array_map? It is designed to change values inside arrays.
$array = array(1,2,3,4,5);
$new = array_map(function($number){
    return $number + 1;
}, $array);
var_dump($new);
I'd choose #2, but it's a personal preference.
I disagree with the other answers, using references to array items in foreach loops is quite common, but it depends on the framework you're using. As always, try to follow existing coding conventions in your project or framework.
I also disagree with the other answers that suggest array_map or array_walk. These introduce the overhead of a function call for each array element. For small arrays, this won't be significant, but for large arrays, this will add a significant overhead for such a simple function. However, they are appropriate if you're performing more significant calculations or actions - you'll need to decide which to use depending on the scenario, perhaps by benchmarking.
Most of the answers interpreted your question to be about performance.
This is not what you asked. What you asked is:
I wonder which method is the better programming practice?
As you said, both do the same thing. Both work. In the end, better is often a matter of opinion.
Or is this one of those it doesn't really matter things?
I wouldn't go so far as to say it doesn't matter. As you can see there can be performance considerations for Method 1 and reference gotchas for Method 2.
I can say what matters more is readability and consistency. While there are dozens of ways to increment array elements in PHP, some look like line noise or code golf.
Ensuring your code is readable to future developers and you consistently apply your method of solving problems is a far better macro programming practice than whatever micro differences exist in this foreach code.
I often use the function sizeof($var) in my web application, and I'd like to know whether it is better (in terms of resources) to store this value in a new variable and use that, or to call the function every time; or maybe it makes no difference :)
TLDR: it's better to set a variable, calling sizeof() only once. (IMO)
I ran some tests on the looping aspect of this small array:
$myArray = array("bill", "dave", "alex", "tom", "fred", "smith", "etc", "etc", "etc");

// A)
for($i=0; $i<10000; $i++) {
    echo sizeof($myArray);
}

// B)
$sizeof = sizeof($myArray);
for($i=0; $i<10000; $i++) {
    echo $sizeof;
}
With an array of 9 items:
A) took 0.0085 seconds
B) took 0.0049 seconds
With an array of 180 items:
A) took 0.0078 seconds
B) took 0.0043 seconds
With an array of 3600 items:
A) took 0.5-0.6 seconds
B) took 0.35-0.5 seconds
Although there isn't much of a difference, you can see that as the array grows, the gap becomes larger and larger. I think this has made me re-think my opinion, and say that from now on, I'll be setting the variable pre-loop.
Storing a PHP integer takes 68 bytes of memory. This is a small enough amount, that I think I'd rather worry about processing time than memory space.
In general, it is preferable to assign the result of a function you are likely to repeat to a variable.
In the example you suggested, the difference in processing code produced by this approach and the alternative (repeatedly calling the function) would be insignificant. However, where the function in question is more complex it would be better to avoid executing it repeatedly.
For example:
for($i=0; $i<10000; $i++) {
    echo date('Y-m-d');
}
Executes in 0.225273 seconds on my server, while:
$date = date('Y-m-d');
for($i=0; $i<10000; $i++) {
    echo $date;
}
executes in 0.134742 seconds. I know these snippets aren't quite equivalent, but you get the idea. Over many page loads by many users over many months or years, even a difference of this size can be significant. If we were to use some complex function, serious scalability issues could be introduced.
A main advantage of not assigning a return value to a variable is that you need one less line of code. In PHP, we can commonly do our assignment at the same time as invoking our function:
$sql = "SELECT...";
if(!$query = mysql_query($sql))...
...although this is sometimes discouraged for readability reasons.
In my view for the sake of consistency assigning return values to variables is broadly the better approach, even when performing simple functions.
If you are calling the function over and over, it is probably best to keep this info in a variable. That way the server doesn't have to keep processing the answer, it just looks it up. If the result is likely to change, however, it will be best to keep running the function.
Since you allocate a new variable, this will take a tiny bit more memory. But it might make your code a tiny bit faster.
The trouble it brings could be big, though. For example, if you include another file that applies the same trick, and both store the size in a variable $sizeof, bad things might happen: strange bugs that appear when you least expect them. Or you forget to add global $sizeof in your function.
There are so many possible bugs you introduce, for what? Since the speed gain is likely not measurable, I don't think it's worth it.
Unless you are calling this function a million times your "performance boost" will be negligible.
I do not think that it really matters. In a sense, you do not want to perform the same thing over and over again, but considering that it is sizeof(), unless it is an enormous array you should be fine either way.
I think you should avoid constructs like:
for ($i = 0; $i < sizeof($array); $i += 1) {
    // do stuff
}
Here, sizeof($array) will be executed on every iteration, even though the size is often not likely to change; a common way around this is shown below.
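A common idiom (sketched here) is to cache the count in the loop initializer so the function runs only once:
// Cache the size once, in the loop initializer:
for ($i = 0, $n = sizeof($array); $i < $n; $i += 1) {
    // do stuff
}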
Whereas in constructs like this:
while(sizeof($array) > 0) {
    if ($someCondition) {
        $entry = array_pop($array);
    }
}
You often have no choice but to calculate it every iteration.
From what I understand, PHP stdClass objects are generally faster than arrays, when the code is deeply-nested enough for it to actually matter. How is that efficiency affected if I'm typecasting to define stdClass objects on the fly:
$var = (object)array('one' => 1, 'two' => 2);
If the code doing this is going to be executed many, many times, will I be better off explicitly defining $var as an object instead:
$var = new stdClass();
$var->one = 1;
$var->two = 2;
Is the difference negligible since I'll then be accessing $var as an object from there on, either way?
Edit:
A stdClass is the datatype I need here. I'm not concerned with whether I should use arrays or whether I should use stdClass objects; I'm more concerned with whether using the (object)array(....) shorthand of instantiating a stdClass is efficient. And yes, this is in code that will be executed potentially thousands of times.
You understand wrong. Objects are not "much faster" than arrays. In fact, the opposite is usually true, since arrays don't have the problem of inheritance (and visibility lookups). Sure, there may be specific cases where you can show a clear significant gain/loss, but in the generic case arrays will tend to be faster...
Use the tool that's semantically correct. Don't avoid using arrays because you think objects are faster. Use a construct when it makes sense. There are times when it makes sense to replace an array with an object (For example, when you want to enforce the types in the array strictly). But this is not one of them. And even at that, you wouldn't replace array() with STDClass, you'd replace it with a custom class (one that likely extends ArrayObject or at least implements Iterator and ArrayAccess interfaces). Different Data Structures exist for a reason. If it was better to use objects instead of arrays, wouldn't we all be using objects instead?
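For illustration, here is a minimal sketch of that "custom class" idea (the class name and rule are made up): a collection built on ArrayObject that enforces the type of its elements, which is a case where an object genuinely earns its place over a plain array.
class IntCollection extends ArrayObject
{
    public function offsetSet($index, $value)
    {
        if (!is_int($value)) {
            throw new InvalidArgumentException('IntCollection only accepts integers');
        }
        parent::offsetSet($index, $value);
    }
}

$ints = new IntCollection();
$ints[] = 42;         // fine
// $ints[] = 'nope';  // would throw an InvalidArgumentException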
Don't worry about micro-optimizations like "is cast faster". It's almost always not. Write readable, correct code. Then optimize if you're having a problem...
function benchmark($func_name, $iterations) {
    $begin = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $func_name();
    }
    $end = microtime(true);
    $execution_time = $end - $begin;
    echo $func_name , ': ' , $execution_time , "\n";
}

function standClass() {
    $obj = new stdClass();
    $obj->param_one = 1;
    $obj->param_two = 2;
}

function castFromArray() {
    $obj = (object)array('param_one' => 1, 'param_two' => 2);
}

benchmark('standClass', 1000);
benchmark('castFromArray', 1000);
benchmark('standClass', 100000);
benchmark('castFromArray', 100000);
Outputs:
standClass: 0.0045979022979736
castFromArray: 0.0053138732910156
standClass: 0.27266097068787
castFromArray: 0.20209217071533
Casting from an array to stdClass on the fly is around 30% more efficient, but the difference is still negligible until you know you will be performing the operation 100,000 times (and even then, you're only looking at a tenth of a second, at least on my machine).
So, in short, it doesn't really matter the vast majority of the time, but if it does, define the array in a single command and then type-cast it to an object. I definitely wouldn't spend time worrying about it unless you've identified the code in question as a bottleneck (and even then, focus on reducing your number of iterations if possible).
It's more important to chose the right data structure for the right job.
When you want to decide what to use, ask yourself: What does your data represent?
Are you representing a set, or a sequence? Then you should probably use an array.
Are you representing some thing with certain properties or behaviors? Then you should probably use an object (see the sketch below).
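For illustration only (the class and data below are made up), here is the same kind of information modeled both ways:
// A sequence of values -> an array:
$prices = array(9.99, 14.50, 3.25);

// A thing with named properties and behaviour -> an object:
class Product
{
    public $name;
    public $price;

    public function __construct($name, $price)
    {
        $this->name  = $name;
        $this->price = $price;
    }
}

$product = new Product('Widget', 9.99);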