From what I understand, PHP stdClass objects are generally faster than arrays, when the code is deeply-nested enough for it to actually matter. How is that efficiency affected if I'm typecasting to define stdClass objects on the fly:
$var = (object)array('one' => 1, 'two' => 2);
If the code doing this is going to be executed many, many times, will I be better off explicitly defining $var as an object instead:
$var = new stdClass();
$var->one = 1;
$var->two = 2;
Is the difference negligible since I'll then be accessing $var as an object from there on, either way?
Edit:
A stdClass is the datatype I need here. I'm not concerned with whether I should use arrays or whether I should use stdClass objects; I'm more concerned with whether using the (object)array(....) shorthand of instantiating a stdClass is efficient. And yes, this is in code that will be executed potentially thousands of times.
You understand wrong. Objects are not "much faster" than arrays. In fact, the opposite is usually true, since arrays don't have the problem of inheritance (and visibility lookups). Sure, there may be specific cases where you can show a clear significant gain/loss, but in the generic case arrays will tend to be faster...
Use the tool that's semantically correct. Don't avoid using arrays because you think objects are faster. Use a construct when it makes sense. There are times when it makes sense to replace an array with an object (for example, when you want to strictly enforce the types stored in the array). But this is not one of them. And even then, you wouldn't replace array() with stdClass; you'd replace it with a custom class (one that likely extends ArrayObject or at least implements the Iterator and ArrayAccess interfaces). Different data structures exist for a reason. If it were better to use objects instead of arrays, wouldn't we all be using objects instead?
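As an illustration of the "custom class" idea, here is a minimal sketch of a collection that only accepts integers. The class name IntList is hypothetical, and the method signatures assume PHP 8.0 or later (for the mixed type); it is one possible shape, not a recommended library.

```php
<?php
// Hypothetical IntList: an array-like collection that rejects non-integer values.
final class IntList implements ArrayAccess, IteratorAggregate, Countable
{
    private array $items = [];

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->items[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        // This check is the whole point: a plain array cannot enforce it.
        if (!is_int($value)) {
            throw new InvalidArgumentException('IntList only stores integers');
        }
        if ($offset === null) {
            $this->items[] = $value;          // supports $list[] = 1;
        } else {
            $this->items[$offset] = $value;   // supports $list['key'] = 1;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->items[$offset]);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->items);   // makes foreach work
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$list = new IntList();
$list[] = 1;
$list['two'] = 2;

$rejected = false;
try {
    $list[] = 'three';   // not an integer, so this throws
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

A stdClass gives none of these guarantees, which is why it is not a drop-in replacement for this kind of use.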
Don't worry about micro-optimizations like "is cast faster". It's almost always not. Write readable, correct code. Then optimize if you're having a problem...
function benchmark($func_name, $iterations) {
$begin = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
$func_name();
}
$end = microtime(true);
$execution_time = $end - $begin;
echo $func_name , ': ' , $execution_time , PHP_EOL;
}
function standClass() {
$obj = new stdClass();
$obj->param_one = 1;
$obj->param_two = 2;
}
function castFromArray() {
$obj = (object)array('param_one' => 1, 'param_two' => 2);
}
benchmark('standClass', 1000);
benchmark('castFromArray', 1000);
benchmark('standClass', 100000);
benchmark('castFromArray', 100000);
Outputs:
standClass: 0.0045979022979736
castFromArray: 0.0053138732910156
standClass: 0.27266097068787
castFromArray: 0.20209217071533
In the 100,000-iteration run, casting from an array to stdClass on the fly took about 26% less time (0.20 s versus 0.27 s), although in the 1,000-iteration run it was slightly slower. Either way, the difference is negligible until you know you will be performing the operation 100,000 times (and even then, you're looking at under a tenth of a second, at least on my machine).
So, in short, it doesn't really matter the vast majority of the time, but if it does, define the array in a single command and then type-cast it to an object. I definitely wouldn't spend time worrying about it unless you've identified the code in question as a bottleneck (and even then, focus on reducing your number of iterations if possible).
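If you do want to measure this yourself, a slightly more careful harness helps to reduce noise. The sketch below is an assumption-laden variant of the benchmark above: the helper name bestOf is hypothetical, it uses hrtime() (available from PHP 7.3) instead of microtime(), and it reports the fastest of several runs. The numbers will differ per machine and PHP version.

```php
<?php
// Hypothetical helper: returns the fastest of $runs timings, in milliseconds.
function bestOf(callable $fn, int $iterations, int $runs = 5): float
{
    $best = PHP_FLOAT_MAX;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true);                 // monotonic clock, in nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        $elapsedMs = (hrtime(true) - $start) / 1e6;
        $best = min($best, $elapsedMs);        // keep the least-disturbed run
    }
    return $best;
}

// The two construction styles being compared.
$viaProperties = function () {
    $obj = new stdClass();
    $obj->param_one = 1;
    $obj->param_two = 2;
    return $obj;
};
$viaCast = function () {
    return (object) array('param_one' => 1, 'param_two' => 2);
};

printf("new stdClass:  %.3f ms\n", bestOf($viaProperties, 100000));
printf("(object) cast: %.3f ms\n", bestOf($viaCast, 100000));
```

Taking the best of several runs is a design choice: it filters out one-off interruptions, but it still only measures this one machine.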
It's more important to choose the right data structure for the job.
When you want to decide what to use, ask yourself: What does your data represent?
Are you representing a set, or a sequence? Then you should probably use an array.
Are you representing some thing with certain properties or behaviors? Then you should probably use an object.
In C++ if you pass a large array to a function, you need to pass it by reference, so that it is not copied to the new function wasting memory. If you don't want it modified you pass it by const reference.
Can anyone verify that passing by reference will save me memory in PHP as well? I know PHP does not use addresses for references like C++ does, which is why I'm slightly uncertain. That is the question.
The following does not apply to objects, as it has been already stated here. Passing arrays and scalar values by reference will only save you memory if you plan on modifying the passed value, because PHP uses a copy-on-change (aka copy-on-write) policy. For example:
# $array will not be copied, because it is not modified.
function foo($array) {
echo $array[0];
}
# $array will be copied, because it is modified.
function bar($array) {
$array[0] += 1;
echo $array[0] + $array[1];
}
# This is how bar should've been implemented in the first place.
function baz($array) {
$temp = $array[0] + 1;
echo $temp + $array[1];
}
# This would also work (passing the array by reference), and $array is not
# copied here, but it has a serious side-effect which you may not want.
function foobar(&$array) {
$array[0] += 1;
echo $array[0] + $array[1];
}
To summarize:
If you are working on a very large array and plan on modifying it inside a function, you actually should use a reference to prevent it from getting copied, which can seriously decrease performance or even exhaust your memory limit.
If it is avoidable though (that is small arrays or scalar values), I'd always use functional-style approach with no side-effects, because as soon as you pass something by reference, you can never be sure what passed variable may hold after the function call, which sometimes can lead to nasty and hard-to-find bugs.
IMHO scalar values should never be passed by reference, because the performance impact can not be that big as to justify the loss of transparency in your code.
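The copy-on-write behaviour described above can be observed with memory_get_usage(). This is only a rough sketch: the function names readOnly and writes are made up for the illustration, and the exact byte counts depend on the PHP version and build.

```php
<?php
// Reads from the array only, so PHP keeps sharing the caller's storage.
function readOnly(array $data): int
{
    $first = $data[0];
    return memory_get_usage();
}

// The first write forces PHP to make a private copy of the array.
function writes(array $data): int
{
    $data[0] = -1;
    return memory_get_usage();
}

$big = range(1, 200000);
$base = memory_get_usage();

$readDelta  = readOnly($big) - $base;   // expected to stay close to zero
$writeDelta = writes($big) - $base;     // expected to grow by roughly the array's size

printf("read-only call: %d extra bytes\n", $readDelta);
printf("writing call:   %d extra bytes\n", $writeDelta);
```

Note that $big itself is unchanged after both calls, since the write happened on the copy.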
The short answer is use references when you need the functionality that they provide. Don't think of them in terms of memory usage or speed. Pass by reference is always going to be slower if the variable is read only.
Everything is passed by value, including objects. However, it's the handle of the object that is passed, so people often mistakenly call it by-reference because it looks like that.
Then what functionality does it provide? It gives you the ability to modify the variable in the calling scope:
class Bar {}
$bar = new Bar();
function by_val($o) { $o = null; }
function by_ref(&$o) { $o = null; }
by_val($bar); // $bar is still non null
by_ref($bar); // $bar is now null
So if you need such functionality (most often you do not), then use a reference. Otherwise, just pass by value.
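The "handle" point is easier to see next to the example above. In this sketch the Counter class and the three functions are hypothetical names used only for illustration.

```php
<?php
// Hypothetical Counter class, used only to illustrate handle semantics.
class Counter
{
    public $n = 0;
}

// Receives a copy of the handle: property changes hit the shared object.
function bump(Counter $c): void
{
    $c->n++;
}

// Rebinding the local copy of the handle does not affect the caller.
function replaceByValue(Counter $c): void
{
    $c = new Counter();
    $c->n = 99;
}

// With a reference, the caller's variable itself is rebound.
function replaceByRef(Counter &$c): void
{
    $c = new Counter();
    $c->n = 99;
}

$counter = new Counter();

bump($counter);
$afterBump = $counter->n;    // 1: the object was modified through the handle

replaceByValue($counter);
$afterValue = $counter->n;   // still 1: the caller keeps its original object

replaceByRef($counter);
$afterRef = $counter->n;     // 99: the caller's variable now holds the new object
```

So an object passed by value can still be mutated by the callee; what the reference adds is the ability to make the caller's variable point somewhere else.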
Functions that look like this:
$foo = modify_me($foo);
sometimes are good candidates for pass-by-reference, but it should be absolutely clear that the function modifies the passed in variable. (And if such a function is useful, often it's because it really ought to just be part of some class modifying its own private data.)
In PHP :
objects are passed by reference [1] -- always
arrays and scalars are passed by value by default; and can be passed by reference, using an & in the function's declaration.
For the performance part of your question, PHP doesn't deal with that the same way as C/C++; you should read the following article: Do not use PHP references
[1] Or that's what we usually say -- even if it's not "completely true" -- see Objects and references
I often use the function sizeof($var) in my web application, and I'd like to know whether it is better (in terms of resources) to store this value in a new variable and use that, or to call the function every time; or maybe it makes no difference :)
TLDR: it's better to set a variable, calling sizeof() only once. (IMO)
I ran some tests on the looping aspect of this small array:
$myArray = array("bill", "dave", "alex", "tom", "fred", "smith", "etc", "etc", "etc");
// A)
for($i=0; $i<10000; $i++) {
echo sizeof($myArray);
}
// B)
$sizeof = sizeof($myArray);
for($i=0; $i<10000; $i++) {
echo $sizeof;
}
With an array of 9 items:
A) took 0.0085 seconds
B) took 0.0049 seconds
With an array of 180 items:
A) took 0.0078 seconds
B) took 0.0043 seconds
With an array of 3600 items:
A) took 0.5-0.6 seconds
B) took 0.35-0.5 seconds
Although there isn't much of a difference, you can see that as the array grows, the gap grows with it. This has made me rethink my opinion: from now on, I'll be setting the variable before the loop.
Storing a PHP integer takes 68 bytes of memory. This is a small enough amount, that I think I'd rather worry about processing time than memory space.
In general, it is preferable to assign the result of a function you are likely to repeat to a variable.
In the example you suggested, the difference in processing code produced by this approach and the alternative (repeatedly calling the function) would be insignificant. However, where the function in question is more complex it would be better to avoid executing it repeatedly.
For example:
for($i=0; $i<10000; $i++) {
echo date('Y-m-d');
}
Executes in 0.225273 seconds on my server, while:
$date = date('Y-m-d');
for($i=0; $i<10000; $i++) {
echo $date;
}
executes in 0.134742 seconds. I know these snippets aren't quite equivalent, but you get the idea. Over many page loads by many users over many months or years, even a difference of this size can be significant. If we were to use some complex function, serious scalability issues could be introduced.
A main advantage of not assigning a return value to a variable is that you need one less line of code. In PHP, we can commonly do our assignment at the same time as invoking our function:
$sql = "SELECT...";
if(!$query = mysql_query($sql))...
...although this is sometimes discouraged for readability reasons.
In my view for the sake of consistency assigning return values to variables is broadly the better approach, even when performing simple functions.
If you are calling the function over and over, it is probably best to keep this info in a variable. That way the server doesn't have to keep processing the answer, it just looks it up. If the result is likely to change, however, it will be best to keep running the function.
Since you allocate a new variable, this will take a tiny bit more memory. But it might make your code a tiny bit faster.
The trouble it brings could be big. For example, if you include another file that applies the same trick, and both store the size in a variable named $sizeof, bad things might happen: strange bugs that show up when you don't expect them. Or you forget to add global $sizeof in your function.
There are so many possible bugs you could introduce, and for what? Since the speed gain is likely not measurable, I don't think it's worth it.
Unless you are calling this function a million times your "performance boost" will be negligible.
I do not think that it really matters. In a sense, you do not want to perform the same thing over and over again, but considering that it is sizeof(), unless it is an enormous array you should be fine either way.
I think, you should avoid constructs like:
for ($i = 0; $i < sizeof($array); $i += 1) {
// do stuff
}
because sizeof will be executed on every iteration, even though the result is usually not likely to change.
Whereas in constructs like this:
while(sizeof($array) > 0) {
if ($someCondition) {
$entry = array_pop($array);
}
}
You often have no choice but to calculate it every iteration.
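For the first case, where the size does not change during the loop, one common way to avoid the repeated call is to evaluate it once in the loop initializer. A small sketch (count() is used here; sizeof() is an alias of it):

```php
<?php
$array = array(1, 2, 3);

// count() runs once, in the initializer; the condition only compares two variables.
$sum = 0;
for ($i = 0, $n = count($array); $i < $n; $i++) {
    $sum += $array[$i];
}

// foreach sidesteps both the index and the size lookup.
$total = 0;
foreach ($array as $value) {
    $total += $value;
}
```

Keeping $n inside the for statement also limits how far the cached size can drift from the array it describes.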
After some work in C and Java I've been more and more annoyed by the wild west laws in PHP. What I really feel that PHP lacks is strict data types. The fact that string('0') == (int)0 == (boolean)false is one example.
You cannot rely on the data type that a function returns. Nor can you force the arguments of a function to be of a specific type, which might lead to a non-strict comparison producing something unexpected. Everything can be taken care of, but it still opens the door to unexpected bugs.
Is it good or bad practice to typecast arguments received for a method? And is it good to typecast the return?
IE
public function doo($foo, $bar) {
$foo = (int)$foo;
$bar = (float)$bar;
$result = $bar + $foo;
return (array)$result;
}
The example is quite stupid and I haven't tested it, but I think everyone gets the idea. Is there any reason for the PHP-god to convert data types as he wants, besides letting people who don't know about data types use PHP?
For better or worse, loose-typing is "The PHP Way". Many of the built-ins, and most of the language constructs, will operate on whatever types you give them -- silently (and often dangerously) casting them behind the scenes to make things (sort of) fit together.
Coming from a Java/C/C++ background myself, PHP's loose-typing model has always been a source of frustration for me. But through the years I've found that, if I have to write PHP I can do a better job of it (i.e. cleaner, safer, more testable code) by embracing PHP's "looseness", rather than fighting it; and I end up a happier monkey because of it.
Casting really is fundamental to my technique -- and (IMHO) it's the only way to consistently build clean, readable PHP code that handles mixed-type arguments in a well-understood, testable, deterministic way.
The main point (which you clearly understand as well) is that, in PHP, you cannot simply assume that an argument is the type you expect it to be. Doing so can have serious consequences that you are not likely to catch until after your app has gone to production.
To illustrate this point:
<?php
function displayRoomCount( $numBoys, $numGirls ) {
// we'll assume both args are int
// check boundary conditions
if( ($numBoys < 0) || ($numGirls < 0) ) throw new Exception('argument out of range');
// perform the specified logic
$total = $numBoys + $numGirls;
print( "{$total} people: {$numBoys} boys, and {$numGirls} girls \n" );
}
displayRoomCount(0, 0); // (ok) prints: "0 people: 0 boys, and 0 girls"
displayRoomCount(-10, 20); // (ok) throws an exception
displayRoomCount("asdf", 10); // (wrong!) prints: "10 people: asdf boys, and 10 girls"
One approach to solving this is to restrict the types that the function can accept, throwing an exception when an invalid type is detected. Others have mentioned this approach already. It appeals well to my Java/C/C++ aesthetics, and I followed this approach in PHP for years and years. In short, there's nothing wrong with it, but it does go against "The PHP Way", and after a while, that starts to feel like swimming up-stream.
As an alternative, casting provides a simple and clean way to ensure that the function behaves deterministically for all possible inputs, without having to write specific logic to handle each different type.
Using casting, our example now becomes:
<?php
function displayRoomCount( $numBoys, $numGirls ) {
// we cast to ensure that we have the types we expect
$numBoys = (int)$numBoys;
$numGirls = (int)$numGirls;
// check boundary conditions
if( ($numBoys < 0) || ($numGirls < 0) ) throw new Exception('argument out of range');
// perform the specified logic
$total = $numBoys + $numGirls;
print( "{$total} people: {$numBoys} boys, and {$numGirls} girls \n" );
}
displayRoomCount("asdf", 10); // (ok now!) prints: "10 people: 0 boys, and 10 girls"
The function now behaves as expected. In fact, it's easy to show that the function's behavior is now well-defined for all possible inputs. This is because the cast operation is well-defined for all possible inputs; the casts ensure that we're always working with integers; and the rest of the function is written so as to be well-defined for all possible integers.
Rules for type-casting in PHP are documented here, (see the type-specific links mid-way down the page - eg: "Converting to integer").
This approach has the added benefit that the function will now behave in a way that is consistent with other PHP built-ins, and language constructs. For example:
// assume $db_row read from a database of some sort
displayRoomCount( $db_row['boys'], $db_row['girls'] );
will work just fine, despite the fact that $db_row['boys'] and $db_row['girls'] are actually strings that contain numeric values. This is consistent with the way that the average PHP developer (who does not know C, C++, or Java) will expect it to work.
As for casting return values: there is very little point in doing so, unless you know that you have a potentially mixed-type variable, and you want to always ensure that the return value is a specific type. This is more often the case at intermediate points in the code, rather than at the point where you're returning from a function.
A practical example:
<?php
function getParam( $name, $idx=0 ) {
$name = (string)$name;
$idx = (int)$idx;
if($name==='') return null;
if($idx<0) $idx=0;
// $_REQUEST[$name] could be null, or string, or array
// this depends on the web request that came in. Our use of
// the array cast here, lets us write generic logic to deal with them all
//
$param = (array)$_REQUEST[$name];
if( count($param) <= $idx) return null;
return $param[$idx];
}
// here, the cast is used to ensure that we always get a string
// even if "fullName" was missing from the request, the cast will convert
// the returned NULL value into an empty string.
$full_name = (string)getParam("fullName");
You get the idea.
There are a couple of gotchas to be aware of:
PHP's casting mechanism is not smart enough to optimize the "no-op" cast. So casting always causes a copy of the variable to be made. In most cases, this is not a problem, but if you regularly use this approach, you should keep it in the back of your mind. Because of this, casting can cause unexpected issues with references and large arrays. See PHP Bug Report #50894 for more details.
In PHP, a whole number that is too large (or too small) to represent as an integer type will automatically be represented as a float (in PHP, double is just another name for float). This means that the result of ($big_int + $big_int) can actually be a float, and if you cast it to an int the resulting number will be gibberish. So, if you're building functions that need to operate on large whole numbers, you should keep this in mind, and probably consider some other approach.
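The second gotcha can be guarded against by checking whether the result of the arithmetic is still an integer. The helper name addInts below is hypothetical; this is a sketch of one possible guard, not the only approach.

```php
<?php
// Hypothetical helper: adds two ints and refuses to return a silently promoted float.
function addInts(int $a, int $b): int
{
    $sum = $a + $b;   // becomes a float if the result does not fit in an int
    if (!is_int($sum)) {
        throw new OverflowException('integer overflow in addInts()');
    }
    return $sum;
}

// Going past PHP_INT_MAX silently changes the type.
$promoted = PHP_INT_MAX + 1;
var_dump(is_float($promoted));   // bool(true)

$overflowed = false;
try {
    addInts(PHP_INT_MAX, 1);
} catch (OverflowException $e) {
    $overflowed = true;
}
```

For values that genuinely exceed the integer range, an arbitrary-precision extension is probably a better fit than casting.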
Sorry for the long post, but it's a topic that I've considered in depth, and through the years, I've accumulated quite a bit of knowledge (and opinion) about it. By putting it out here, I hope someone will find it helpful.
Scalar type hinting in arguments was expected for the next version of PHP (5.4 was the guess at the time), but it only arrived in PHP 7.0, as scalar type declarations.
But apart from that: dynamic type conversion really isn't something you should hate and avoid. Mostly it will work as expected. And if it doesn't, fix it by checking the type with the is_* functions, by using strict comparison, and so on.
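For reference, scalar type declarations are available as of PHP 7.0. The sketch below applies them to a cut-down variant of the displayRoomCount() example from this discussion; the function name roomTotal is hypothetical, and the behaviour shown assumes the file does not enable strict_types.

```php
<?php
// Variant of the room-count example with declared int parameters and return type.
function roomTotal(int $numBoys, int $numGirls): int
{
    if ($numBoys < 0 || $numGirls < 0) {
        throw new InvalidArgumentException('argument out of range');
    }
    return $numBoys + $numGirls;
}

// In the default (coercive) mode a numeric string is converted to int.
echo roomTotal("3", 4), "\n";   // prints 7

// A non-numeric string is rejected with a TypeError instead of being guessed at.
$rejected = false;
try {
    roomTotal("asdf", 10);
} catch (TypeError $e) {
    $rejected = true;
}
```

With declare(strict_types=1) at the top of the calling file, the first call would be rejected as well, which is closer to what a Java or C programmer expects.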
You can use type hinting for complex types. If you need to compare value + type you can use "===" for comparison.
(0 === false) => results in false
(0 == false) => results in true
Also you write return (array)$result; which makes no sense. What you want in this case is return array($result) if you want the return type to be an array.
I don't think it's bad, but I would go one step further: Use type hinting for complex types, and throw an exception if a simple type isn't one you expect. This way you make clients aware of any costs/problems with the cast (such as loss of precision going from int -> float or float -> int).
Your cast to array in the above code there though is misleading -- you should just create a new array containing the one value.
That all said, your example above becomes:
public function doo($foo, $bar) {
if (!is_int($foo)) throw new InvalidArgumentException();
if (!is_float($bar)) throw new InvalidArgumentException();
$result = $bar + $foo;
return array($result);
}
No, it's not good to typecast because you don't know what you'll have in the end. I would personally suggest using functions such as intval(), floatval(), etc.
I ask this question because I learned that in programming and design, you must have a good reason for your decisions. I am a PHP learner and I am at a crossroads here; I am using simple incrementation to get across what I'm asking. I am certainly not here to start a debate about the pros and cons of referencing, but when it comes to PHP, which is the better programming practice:
function increment(&$param) {
++$param;
}
vs.
function increment($param){
return ++$param;
}
$param = increment($param);
First, references are not pointers.
I tried the code given by #John in his answer, but I got strange results. It turns out microtime() returns a string. Arithmetic is unreliable and I even got negative results on some runs. One should use microtime(true) to get the value as a float.
I added another test of no function call, just incrementing the variable:
<?php
$param = 1;
$start = microtime(true);
for($i = 1; $i <= 1000000; $i++) {
$param++;
}
$end = microtime(true);
echo "increment: " . ($end - $start) . "\n";
The results on my machine (MacBook 2.4GHz running PHP 5.3.2):
function call with pass by reference: 2.14 sec.
function call with pass by value: 2.26 sec.
no function call, just bare increment: 0.42 sec.
So there seems to be a 5.3% performance advantage to passing by reference, but there is an 81% performance advantage to avoiding the function call completely.
I guess the example of incrementing an integer is arbitrary, and the OP is really asking about the general advantage of passing by reference. But I'm just offering this example to demonstrate that the function call alone incurs far more overhead than the method of passing parameters.
So if you're trying to decide how to micro-optimize parameter passing, you're being penny wise and pound foolish.
There are also other reasons why you should avoid references, even though they can simplify several algorithms, especially when you are manipulating two or more data structures that must share the same underlying data:
They make functions have side-effects. You should, in general, avoid functions with side-effects, as they make the program more unpredictable (as in "OK, how did this value get here? Did any of the functions modify its parameters?")
They cause bugs. If you make a variable a reference, you must remember to unset it before assigning it a value, unless you want to change the underlying value of the reference set. This happens frequently after you run a foreach loop by reference and then re-use the loop variable.
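The foreach case is worth seeing once, because the result is surprising. A minimal sketch (the variable names are arbitrary):

```php
<?php
$values = array(1, 2, 3);

foreach ($values as &$v) {
    // nothing to do; after the loop $v is still a reference to $values[2]
}
foreach ($values as $v) {
    // every assignment to $v now overwrites $values[2]
}
$buggy = $values;    // array(1, 2, 2): the last element was clobbered
unset($v);

// Same loops again, but the reference is broken in between.
$values = array(1, 2, 3);
foreach ($values as &$w) {
}
unset($w);           // $w no longer points into $values
foreach ($values as $w) {
}
$fixed = $values;    // array(1, 2, 3)
```

The unset() right after the by-reference loop is the habit that prevents this class of bug.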
It depends on what the function's purpose is. If its express purpose is to modify the input, use references. If the purpose is to compute some data based on the input and not to alter the input, by all means use a regular return.
Take for example array_push:
int array_push(array &$array, mixed $var[, mixed $...])
The express purpose of this function is to modify an array. It's unlikely that you need both the original array and a copy of it including the pushed values.
array_push($array, 42); // most likely use case
// if you really need both versions, just do:
$old = $array;
array_push($array, 42);
If array_push didn't take references, you'd need to do this:
// array_push($array, $var[, $...])
$array = array_push($array, 42); // quite a waste to do this every time
On the other hand, a purely computational function like pow should not modify the original value:
number pow(number $base, number $exp)
You are probably more likely to use this function in a context where you want to keep the original number intact and just compute a result based on it. In this case it would be a nuisance if pow modified the original number.
$x = 123;
$y = pow($x, 42); // most likely use case
If pow took references, you'd need to do this:
// pow(&$base, $exp)
$x = 123;
$y = $x; // nuisance
pow($y, 42);
The best practice is not to write a function when an expression will do.
$param++;
is all you need.
The second version:
function increment($param){
return $param++;
}
$param = increment($param);
Does nothing. increment() returns $param. Perhaps you meant ++$param or $param+1;
I mention this not to be pedantic, but so that if you compare timings, you are comparing the same function (it could be possible for PHP's optimizer to remove the function completely).
By definition, incrementing a variable's value (at least in PHP) mutates the variable. It doesn't take a value, increase it by 1 and return it; rather it changes the variable holding that value.
So your first example would be the better way to go, as it's taking a reference (PHP doesn't really do pointers per se) of $param and post-incrementing it.
I just ran a couple quick tests with the following code:
<?php
function increment(&$param){
$param++;
}
$param = 1;
$start = microtime();
for($i = 1; $i <= 1000000; $i++) {
increment($param);
}
$end = microtime();
echo $end - $start;
?>
This consistently returned around .42 to .43, whereas the following code returned about .55 to .59:
<?php
function increment($param){
return $param++;
}
$param = 1;
$start = microtime();
for($i = 1; $i <= 1000000; $i++) {
$param = increment($param);
}
$end = microtime();
echo $end - $start;
?>
So I would say that the references are quicker, but only in extreme cases.
I think your example is a little abstract.
There is no problem with using references, but in most real-world cases you are probably modifying an object, not an int, in which case you don't need the reference (in PHP5 at least).
I'd say it would quite depend on what you're doing. If you're trying to interact on a large set of data without wanting an extra copy of it in memory - go ahead, pass by value (your second example). If you want to save the memory, and interact on an object directly - pass by reference (your first example)
I'm trying to refactor a large, old project and one thing I've noticed is a range of different Iterator implementations:
while($iterator->moveNext()) {
$item = $iterator->current();
// do something with $item;
}
for($iterator = getIterator(); $iterator->HasNext();) {
$item = $iterator->Next();
// do something with $item
}
while($item = $iterator->fetch()) {
// do something with item
}
or even the Standard PHP Library (SPL) iterator which allows
foreach($iterator as $item) {
// do something with $item
}
Having so many different Iterators (with different methods for looping over collections) seems like a strong code smell, and I'm inclined to refactor everything to SPL.
Is there a compelling advantage to any of these implementations of Iterator, or is it purely a matter of personal taste?
The SPL version is definitely the way to go. Not only is it the easiest to read, but it's a part of PHP now, so will be familiar to many more people.
There's nothing "wrong" with the others, but as you stated, having all these different versions in one project isn't helping anyone.
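One way to move towards the SPL style without rewriting every legacy iterator at once is to wrap them in an adapter. The sketch below is only an assumption about what such legacy code might look like: LegacyIterator and LegacyIteratorAdapter are hypothetical names, modelled on the moveNext()/current() style from the question.

```php
<?php
// Hypothetical legacy iterator in the moveNext()/current() style.
final class LegacyIterator
{
    private $items;
    private $pos = -1;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    public function moveNext(): bool
    {
        return ++$this->pos < count($this->items);
    }

    public function current()
    {
        return $this->items[$this->pos];
    }
}

// Adapter: exposes the legacy iterator to foreach through IteratorAggregate.
final class LegacyIteratorAdapter implements IteratorAggregate
{
    private $inner;

    public function __construct(LegacyIterator $inner)
    {
        $this->inner = $inner;
    }

    public function getIterator(): Generator
    {
        // The generator drives the old protocol and yields each item in turn.
        while ($this->inner->moveNext()) {
            yield $this->inner->current();
        }
    }
}

$seen = array();
foreach (new LegacyIteratorAdapter(new LegacyIterator(array('a', 'b', 'c'))) as $item) {
    $seen[] = $item;
}
```

Call sites can then be converted to foreach one at a time, and the adapters removed once the underlying classes implement the SPL interfaces themselves.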
Imo, simply utilising one or more of the SPL libraries as an interface tends to be less ugly in use at the front end. However, the backing behind the implementation can get a bit ugly.
For instance, I wrote an iterator that efficiently iterated a database result set, so that results that were never requested were never fetched from the request pointer, and if items were prematurely fetched ( IE: $obj[ 5 ] ) , it would seek all the required results into an internal buffer.
Worked wonderfully, you just pray that code that makes the magic behind the scenes never fails because it confuses people when they see you use something that looks like an array, and it does "magic" which can fail :)
Magic got people burnt at the stake. So use it carefully and wisely, possibly make it obvious how it works.
My personal preference is for the
foreach( $object as $i => $v )
notation, because it is generally more consistent and predictable.
foreach( $dbresult->iterator() as $i => $v ){
}
style notation is functionally identical, but at least it leaves less guesswork about how it works on the surface.