Which implementation of Iterator should I use in PHP, and why? - php

I'm trying to refactor a large, old project and one thing I've noticed is a range of different Iterator implementations:
while($iterator->moveNext()) {
$item = $iterator->current();
// do something with $item;
}
for($iterator = getIterator(); $iterator->HasNext();) {
$item = $iterator->Next();
// do something with $item
}
while($item = $iterator->fetch()) {
// do something with item
}
or even the Standard PHP Library (SPL) iterator, which allows
foreach($iterator as $item) {
// do something with $item
}
Having so many different Iterators (with different methods for looping over collections) seems like a strong code smell, and I'm inclined to refactor everything to SPL.
Is there a compelling advantage to any of these implementations of Iterator, or is it purely a matter of personal taste?

The SPL version is definitely the way to go. Not only is it the easiest to read, but it's a part of PHP now, so will be familiar to many more people.
There's nothing "wrong" with the others, but as you stated, having all these different versions in one project isn't helping anyone.
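If you do consolidate on SPL, the legacy iterators need not be rewritten all at once. A minimal sketch, assuming a legacy class with the moveNext()/current() shape from the question (LegacyIterator and LegacyIteratorAdapter are hypothetical names, not part of the project or of SPL), wraps the old object in an IteratorAggregate so callers can use foreach:

```php
<?php
// Hypothetical stand-in for one of the project's legacy iterators.
final class LegacyIterator
{
    private $items;
    private $position = -1;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    // Advances the cursor; returns false once the items are exhausted.
    public function moveNext()
    {
        return ++$this->position < count($this->items);
    }

    public function current()
    {
        return $this->items[$this->position];
    }
}

// Adapter (also hypothetical): exposes the legacy object to foreach.
final class LegacyIteratorAdapter implements IteratorAggregate
{
    private $legacy;

    public function __construct(LegacyIterator $legacy)
    {
        $this->legacy = $legacy;
    }

    // A generator is an Iterator, so it satisfies IteratorAggregate.
    public function getIterator(): Iterator
    {
        while ($this->legacy->moveNext()) {
            yield $this->legacy->current();
        }
    }
}

$collected = [];
foreach (new LegacyIteratorAdapter(new LegacyIterator(['a', 'b', 'c'])) as $item) {
    $collected[] = $item;
}
```

Because the legacy cursor in this sketch cannot be rewound, the adapter can only be looped over once; that may or may not match how the real classes behave.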

Imo, simply utilising one or more of the SPL libraries as an interface tends to be less ugly in use at the front end. However, the backing behind the implementation can get a bit ugly.
For instance, I wrote an iterator that efficiently iterated a database result set, so that results that were never requested were never fetched from the request pointer, and if items were prematurely fetched (i.e. $obj[5]), it would seek all the required results into an internal buffer.
It worked wonderfully; you just pray that the code that makes the magic behind the scenes never fails, because it confuses people when they see you use something that looks like an array and it does "magic" which can fail :)
Magic got people burnt at the stake. So use it carefully and wisely, possibly make it obvious how it works.
My personal preference is for the
foreach( $object as $i => $v )
notation, for it is generally more consistent and predictable. The
foreach( $dbresult->iterator() as $i => $v ){
}
style notation is functionally identical, but at least you have less guesswork about how it works on the surface.

Related

Functional Programming - Return Transformed array and the count of the array without calculating twice

I'm trying to write more functional code in PHP without any helper libraries.
I need to return some JSON that includes the results of a transformed array AND the count of that array (for convenience on the data consumer end). Since you're not supposed to use variables in FP, I'm stumped on how to get the count of the array without recalculating/remapping the array.
Here's an example of what my code currently looks like:
$duplicates = array_filter( get_results(), 'find_duplicates' );
send_json( array(
"duplicates" => $duplicates,
"numDuplicates" => count( $duplicates )
) );
How can I do the same without storing the results of the filter in a temporary variable to avoid running array_filter() twice?
But first, acknowledge the following...
1. "Since you're not supposed to use variables in FP..." – that's a ludicrous understanding of functional programming. Variables are used constantly in functional programs. I'm guessing you saw point-free functional programs and then imagined that every program can be expressed in such a way...
2. The receiver of the JSON could easily get the number of duplicates using JSON.parse(json).duplicates.length because every Array in JavaScript has a length property – it's arguably silly to attach a numDuplicates in the first place. Anyway, let's assume your consumer has a specific API that requires the numDuplicates field...
3. Functional programming is concerned with things like function purity – maybe you've simplified your code in your post (which is bad; don't do that) or that is in fact your actual code. In such a case, the get_results() and send_json() functions are impure; send_json() has an obvious (but unknown) side effect (the return value is not used). You ask for a functional solution but you have other outstanding non-functional code... so...
There's nothing wrong with the code you have. Sometimes removing a point (a variable or an argument) hurts the readability of the code. In your case, this code is perfectly legible. It is at this point that I feel you're only trying to shorten the code or make it more clever. Your intention is to improve it, but I think you'd actually harm it in this case.
What if I told you...
a variable assignment can be replaced with a lambda? 0_0
(function ($duplicates) {
send_json([
'duplicates' => $duplicates,
'numDuplicates' => count($duplicates)
]);
}) (array_filter(get_results(), 'find_duplicates'));
But that made the code longer.. and there's added abstraction which hurts readability T_T In this case, using a normal variable assignment (as in your original code) would've been much better
Combinators
OK, so what if you had some combinators at your disposal to massage the data into the desired shape?
function apply (...$xs) {
return function ($f) use ($xs) {
return call_user_func($f, ...$xs);
};
}
function identity ($x) { return $x; }
// hey look, mom! no points!
send_json(
array_combine(
['duplicates', 'numDuplicates'],
array_map(
apply(
array_filter(get_results(), 'find_duplicates')),
['identity', 'count'])));
Did we achieve anything other than writing the weirdest PHP you or anyone else has probably seen? Not to mention, the input is strangely nested in the middle of the expression...
Remarks
I'm nearly certain that you'll be disappointed with this answer (or disagree with me), but I'm also pretty confident that you're not sure what you're looking for. A guess: you saw functional programming that "doesn't use variables" and assumed that's how all programs can and should be written; but that's just not the case. Sometimes using a variable or two can dramatically improve the readability of a given expression.
Anyway, all of this is truly beside the point because attaching numDuplicates is arguably an anti-pattern in JSON anyway (point #2 above).
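If the field really is required, one way to keep the variable while still having a pure, testable unit is to move it inside a small function. This is only a sketch: duplicates_payload() is a hypothetical name, the predicate stands in for find_duplicates, and the array_values() call is an assumption added so the result encodes as a JSON list.

```php
<?php
// Pure helper (hypothetical name): the temporary variable lives inside it.
function duplicates_payload(array $results, callable $isDuplicate)
{
    // array_values() reindexes the filtered array; an assumption so that
    // json_encode() emits a JSON list rather than an object.
    $duplicates = array_values(array_filter($results, $isDuplicate));

    return [
        'duplicates'    => $duplicates,
        'numDuplicates' => count($duplicates),
    ];
}

// Example predicate standing in for find_duplicates.
$payload = duplicates_payload([1, 2, 2, 3], function ($n) {
    return $n === 2;
});
```

The impure parts (get_results() and send_json()) then stay at the edge of the program, e.g. send_json(duplicates_payload(get_results(), 'find_duplicates')).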

What is better in a foreach loop... using the & symbol or reassigning based on key?

Consider the following PHP Code:
//Method 1
$array = array(1,2,3,4,5);
foreach($array as $i=>$number){
$number++;
$array[$i] = $number;
}
print_r($array);
//Method 2
$array = array(1,2,3,4,5);
foreach($array as &$number){
$number++;
}
print_r($array);
Both methods accomplish the same task, one by assigning a reference and another by re-assigning based on key. I want to use good programming techniques in my work and I wonder which method is the better programming practice? Or is this one of those it doesn't really matter things?
Since the highest scoring answer states that the second method is better in every way, I feel compelled to post an answer here. True, looping by reference is more performant, but it isn't without risks/pitfalls.
Bottom line, as always: "Which is better X or Y", the only real answers you can get are:
It depends on what you're after/what you're doing
Oh, both are OK, if you know what you're doing
X is good for Such, Y is better for So
Don't forget about Z, and even then ...("which is better X, Y or Z" is the same question, so the same answers apply: it depends, both are ok if...)
Be that as it may, as Orangepill showed, the reference approach offers better performance. In this case, the tradeoff is one of performance vs code that is less error-prone and easier to read/maintain. In general, it's considered better to go for safer, more reliable, and more maintainable code:
'Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.' — Brian Kernighan
I guess that means the first method has to be considered best practice. But that doesn't mean the second approach should be avoided at all times, so what follows here are the downsides, pitfalls and quirks that you'll have to take into account when using a reference in a foreach loop:
Scope:
For a start, PHP isn't truly block-scoped like C(++), C#, Java, Perl or (with a bit of luck) ECMAScript6... That means that the $value variable will not be unset once the loop has finished. When looping by reference, this means a reference to the last value of whatever object/array you were iterating is floating around. The phrase "an accident waiting to happen" should spring to mind.
Consider what happens to $value, and subsequently $array, in the following code:
$array = range(1,10);
foreach($array as &$value)
{
$value++;
}
echo json_encode($array);
$value++;
echo json_encode($array);
$value = 'Some random value';
echo json_encode($array);
The output of this snippet will be:
[2,3,4,5,6,7,8,9,10,11]
[2,3,4,5,6,7,8,9,10,12]
[2,3,4,5,6,7,8,9,10,"Some random value"]
In other words, by reusing the $value variable (which references the last element in the array), you're actually manipulating the array itself. This makes for error-prone code, and difficult debugging. As opposed to:
$array = range(1,10);
$array[] = 'foobar';
foreach($array as $k => $v)
{
$array[$k]++;//increments foobar, to foobas!
if ($array[$k] === ($v +1))//$v + 1 yields 1 if $v === 'foobar'
{//so 'foobas' === 1 => false
$array[$k] = $v;//restore initial value: foobar
}
}
Maintainability/idiot-proofness:
Of course, you might say that the dangling reference is an easy fix, and you'd be right:
foreach($array as &$value)
{
$value++;
}
unset($value);
But after you've written your first 100 loops with references, do you honestly believe you won't have forgotten to unset a single reference? Of course not! It's so uncommon to unset variables that have been used in a loop (we assume the GC will take care of it for us), so most of the time, you don't bother. When references are involved, this is a source of frustration, mysterious bug-reports, or traveling values, where you're using complex nested loops, possibly with multiple references... The horror, the horror.
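The most common way a forgotten unset bites is a later by-value loop that reuses the same variable name. A small sketch of that case (the function names are made up for illustration):

```php
<?php
// Loop by reference, then reuse the variable name WITHOUT unset().
function bump_without_unset(array $array)
{
    foreach ($array as &$value) {
        $value++;
    }
    // $value still references the last element of $array here.
    foreach ($array as $value) {
        // Every iteration assigns to $value, i.e. writes into the last element.
    }
    return $array;
}

// Same loops, but the reference is broken before the name is reused.
function bump_with_unset(array $array)
{
    foreach ($array as &$value) {
        $value++;
    }
    unset($value); // break the reference
    foreach ($array as $value) {
        // Plain by-value iteration; $array is left alone.
    }
    return $array;
}

$without = bump_without_unset([1, 2, 3]); // [2, 3, 3]
$with    = bump_with_unset([1, 2, 3]);    // [2, 3, 4]
```

Nothing in the second loop looks like a write to the array, which is what makes the first variant so hard to spot in review.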
Besides, as time passes, who's to say that the next person working on your code won't forget about unset? Who knows, he might not even know about references, or see your numerous unset calls and deem them redundant, a sign of your being paranoid, and delete them altogether. Comments alone won't help you: they need to be read, and everyone working with your code should be thoroughly briefed, perhaps have them read a full article on the subject. The examples listed in the linked article are bad, but I've seen worse, still:
foreach($nestedArr as &$array)
{
if (count($array)%2 === 0)
{
foreach($array as &$value)
{//pointless, but you get the idea...
$value = array($value, 'Part of even-length array');
}
//$value now references the last index of $array
}
else
{
$value = array_pop($array);//assigns new value to var that might be a reference!
$value = is_numeric($value) ? $value/2 : null;
array_push($array, $value);//congrats, X-references ==> traveling value!
}
}
This is a simple example of a traveling value problem. I did not make this up, BTW, I've come across code that boils down to this... honestly. Quite apart from spotting the bug, and understanding the code (which has been made more difficult by the references), it's still quite obvious in this example, mainly because it's a mere 15 lines long, even using the spacious Allman coding style... Now imagine this basic construct being used in code that actually does something even slightly more complex, and meaningful. Good luck debugging that.
Side-effects:
It's often said that functions shouldn't have side-effects, because side-effects are (rightfully) considered to be code-smell. Though foreach is a language construct, and not a function, in your example, the same mindset should apply. When using too many references, you're being too clever for your own good, and might find yourself having to step through a loop, just to know what is being referenced by what variable, and when.
The first method hasn't got this problem: you have the key, so you know where you are in the array. What's more, with the first method, you can perform any number of operations on the value, without changing the original value in the array (no side-effects):
function recursiveFunc($n, $max = 10)
{
if (--$max)
{
return $n === 1 ? 10-$max : recursiveFunc($n%2 ? ($n*3)+1 : $n/2, $max);
}
return null;
}
$array = range(10,20);
foreach($array as $k => $v)
{
$v = recursiveFunc($v);//reassigning $v here
if ($v !== null)
{
$array[$k] = $v;//only now, will the actual array change
}
}
echo json_encode($array);
This generates the output:
[7,11,12,13,14,15,5,17,18,19,8]
As you can see, the first, seventh and eleventh elements have been altered, the others haven't. If we were to rewrite this code using a loop by reference, the loop looks a lot smaller, but the output will be different (we have a side-effect):
$array = range(10,20);
foreach($array as &$v)
{
$v = recursiveFunc($v);//Changes the original array...
//granted, if your version permits it, you'd probably do this instead:
//$v = recursiveFunc($v) ?: $v;
}
echo json_encode($array);
//[7,null,null,null,null,null,5,null,null,null,8]
To counter this, we'll either have to create a temporary variable, or call the function twice, or add a key and recalculate the initial value of $v, but that's just plain stupid (that's adding complexity to fix what shouldn't be broken):
foreach($array as &$v)
{
$temp = recursiveFunc($v);//creating copy here, anyway
$v = $temp ? $temp : $v;//assignment doesn't require the lookup, though
}
//or:
foreach($array as &$v)
{
$v = recursiveFunc($v) ? recursiveFunc($v) : $v;//2 calls === twice the overhead!
}
//or
$base = reset($array);//get the base value
foreach($array as $k => &$v)
{//silly combine both methods to fix what needn't be a problem to begin with
$v = recursiveFunc($v);
if ($v === null)//recursiveFunc returns null when it gives up
{
$v = $base + $k;
}
}
Anyway, adding branches, temp variables and what have you, rather defeats the point. For one, it introduces extra overhead which will eat away at the performance benefits references gave you in the first place.
If you have to add logic to a loop, to fix something that shouldn't need fixing, you should step back, and think about what tools you're using. 9/10 times, you chose the wrong tool for the job.
The last thing that, to me at least, is a compelling argument for the first method is simple: readability. The reference-operator (&) is easily overlooked if you're doing some quick fixes, or try to add functionality. You could be creating bugs in the code that was working just fine. What's more: because it was working fine, you might not test the existing functionality as thoroughly because there were no known issues.
Discovering a bug that went into production, because of your overlooking an operator might sound silly, but you wouldn't be the first to have encountered this.
Note:
Passing by reference at call-time was removed in PHP 5.4. Be wary of features/functionality that are subject to change. A standard iteration of an array hasn't changed in years. I guess it's what you could call "proven technology". It does what it says on the tin, and is the safer way of doing things. So what if it's slower? If speed is an issue, you can optimize your code, and introduce references to your loops then.
When writing new code, go for the easy-to-read, most failsafe option. Optimization can (and indeed should) wait until everything's tried and tested.
And as always: premature optimization is the root of all evil. And choose the right tool for the job, not the one that's new and shiny.
As far as performance is concerned Method 2 is better, especially if you either have a large array and/or are using string keys.
While both methods use the same amount of memory, the first method requires the array to be searched; even though this search is done by an index, the lookup has some overhead.
Given this test script:
$array = range(1, 1000000);
$start = microtime(true);
foreach($array as $k => $v){
$array[$k] = $v+1;
}
echo "Method 1: ".((microtime(true)-$start));
echo "\n";
$start = microtime(true);
foreach($array as $k => &$v){
$v+=1;
}
echo "Method 2: ".((microtime(true)-$start));
The average output is
Method 1: 0.72429609298706
Method 2: 0.22671484947205
If I scale back the test to only run ten times instead of 1 million I get results like
Method 1: 3.504753112793E-5
Method 2: 1.2874603271484E-5
With string keys the performance difference is more pronounced.
So running:
$array = array();
for($x = 0; $x<1000000; $x++){
$array["num".$x] = $x+1;
}
$start = microtime(true);
foreach($array as $k => $v){
$array[$k] = $v+1;
}
echo "Method 1: ".((microtime(true)-$start));
echo "\n";
$start = microtime(true);
foreach($array as $k => &$v){
$v+=1;
}
echo "Method 2: ".((microtime(true)-$start));
Yields performance like
Method 1: 0.90371179580688
Method 2: 0.2799870967865
This is because searching by string key has more overhead than searching by array index.
It is also worth noting that, as suggested in Elias Van Ootegem's answer, to properly clean up after yourself you should unset the reference after the loop has completed, i.e. unset($v); and the performance gains should be measured against the loss in readability.
There are some minor performance differences, but they aren't going to have any significant effect.
I would choose the first option for two reasons:
It's more readable. This is a bit of a personal preference, but at first glance, it's not immediately obvious to me that $number++ is updating the array. By explicitly using something like $array[$i]++, it's much clearer, and less likely to cause confusion when you come back to this code in a year.
It doesn't leave you with a dangling reference to the last item in the array. Consider this code:
$array = array(1,2,3,4,5);
foreach($array as &$number){
$number++;
}
// ... some time later in an unrelated section of code
$number = intval("100");
// now unexpectedly, $array[4] == 100 instead of 6
I guess that depends. Do you care more about code readability/maintainability or about minimizing memory usage? The second method would use slightly less memory, but I would honestly prefer the first, as assigning by reference in the foreach definition does not seem to be commonplace practice in PHP.
Personally if I wanted to modify an array in place like this I would go with a third option:
array_walk($array, function(&$value) {
$value++;
});
The first method will be insignificantly slower, because each time it will go through the loop, it will assign a new value to the $number variable. The second method uses the variable directly so it doesn't need to assign a new value for each loop.
But, as I said, the difference is not significant, the main thing to consider is readability.
In my opinion, the first method makes more sense when you don't need to modify the value in the loop, the $number variable would only be read.
The second method makes more sense when you need to modify the $number variable often, as you don't need to repeat the key each time you want to modify it, and it is more readable.
Have you considered array_map? It is designed to change values inside arrays.
$array = array(1,2,3,4,5);
$new = array_map(function($number){
return $number + 1 ; // $number++ would return the value before the increment
}, $array) ;
var_dump($new) ;
I'd choose #2, but it's a personal preference.
I disagree with the other answers, using references to array items in foreach loops is quite common, but it depends on the framework you're using. As always, try to follow existing coding conventions in your project or framework.
I also disagree with the other answers that suggest array_map or array_walk. These introduce the overhead of a function call for each array element. For small arrays, this won't be significant, but for large arrays, this will add a significant overhead for such a simple function. However, they are appropriate if you're performing more significant calculations or actions - you'll need to decide which to use depending on the scenario, perhaps by benchmarking.
Most of the answers interpreted your question to be about performance.
This is not what you asked. What you asked is:
I wonder which method is the better programming practice?
As you said, both do the same thing. Both work. In the end, better is often a matter of opinion.
Or is this one of those it doesn't really matter things?
I wouldn't go so far as to say it doesn't matter. As you can see there can be performance considerations for Method 1 and reference gotchas for Method 2.
I can say what matters more is readability and consistency. While there are dozens of ways to increment array elements in PHP, some look like line noise or code golf.
Ensuring your code is readable to future developers and you consistently apply your method of solving problems is a far better macro programming practice than whatever micro differences exist in this foreach code.

"iterate" a single object or the index equivalent of a php object

I can get where I'm going, but I want the prettiest road.
What led me here was researching the best way to run a function that returns a collection of objects like $dom->getElementsByTagName() or $pdo->query('SELECT itcher FROM scratches') and then - knowing that it will only have one result - accessing that result.
I've done some research but I'd like to know that I know all that there is to know.
foreach or anything that iterates over multiple things felt silly from an aesthetic point-of-view because I know there's just one. Casting it to an array feels like a blemish, and I want to get off to my code. The one I like the most so far is $object->{'0'} because it's as close to $object[0] as I've found, but it doesn't seem to work in every case. Is there something even prettier to look at? For instance, what exactly is foreach($key as $val) doing on each iteration when it sets a $key to $val? Can I do that myself?
Maybe I'm not visualizing the idea of an object properly in my mind, but wouldn't it make more sense for getElementsByTagName and mysql queries to return arrays? Why don't they?
For instance, what exactly is foreach($key as $val) doing on each iteration when it sets a $key to $val? Can I do that myself?
If it really is an array, you can get the first element with reset().
If it's an Iterator (which is used to allow foreach access to an object) then you should be able to use $object->current();
I've never really looked at the underlying code for foreach - however reset() and current() seem analogous to what foreach would 'do' at the start of iteration.
Unfortunately, the two examples you reference are internal classes, and according to the docs internal classes (classes that are part of the language or an extension) don't follow the same rules:
Internal (built-in) classes that implement this interface can be used in a foreach construct and do not need to implement IteratorAggregate or Iterator.
That said, any user classes that return objects that can be iterated by foreach should be of a Traversable type (what foreach requires), so I believe current() should work the way you want. Note that if the object actually implements IteratorAggregate instead of Iterator, you'll have to use getIterator()->current().
However, it's still possible to do with your two examples, just using methods specific to those returned types. In the case of getElementsByTagName(), you can use getElementsByTagName()->item(0) and in the case of PDO you can use query()->fetch().
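The cases above (plain array, Iterator, IteratorAggregate, and anything else foreach accepts) can be folded into one helper. This is only a sketch; first_of() is a made-up name, and returning null for an empty collection is an assumption:

```php
<?php
// Hypothetical helper: returns the first element of an array or Traversable,
// or null when the collection is empty.
function first_of($collection)
{
    if (is_array($collection)) {
        // reset() works on the local copy and returns its first value.
        return $collection === [] ? null : reset($collection);
    }
    if ($collection instanceof IteratorAggregate) {
        $collection = $collection->getIterator();
    }
    if ($collection instanceof Iterator) {
        $collection->rewind();
        return $collection->valid() ? $collection->current() : null;
    }
    // Anything else that foreach accepts: take the first item and stop.
    foreach ($collection as $item) {
        return $item;
    }
    return null;
}

$fromArray     = first_of(['x' => 'a', 'y' => 'b']);
$fromIterator  = first_of(new ArrayIterator([7, 8]));
$fromAggregate = first_of(new ArrayObject(['z']));
$fromEmpty     = first_of([]);
```

For the two concrete cases in the question, item(0) and fetch() remain the more direct route; the helper is only useful when the type of the collection varies.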

Efficiency of PHP arrays cast as objects?

From what I understand, PHP stdClass objects are generally faster than arrays, when the code is deeply-nested enough for it to actually matter. How is that efficiency affected if I'm typecasting to define stdClass objects on the fly:
$var = (object)array('one' => 1, 'two' => 2);
If the code doing this is going to be executed many many times, will I be better off explicitly defining $var as an object instead:
$var = new stdClass();
$var->one = 1;
$var->two = 2;
Is the difference negligible since I'll then be accessing $var as an object from there on, either way?
Edit:
A stdClass is the datatype I need here. I'm not concerned with whether I should use arrays or whether I should use stdClass objects; I'm more concerned with whether using the (object)array(....) shorthand of instantiating a stdClass is efficient. And yes, this is in code that will be executed potentially thousands of times.
You understand wrong. Objects are not "much faster" than arrays. In fact, the opposite is usually true, since arrays don't have the problem of inheritance (and visibility lookups). Sure, there may be specific cases where you can show a clear significant gain/loss, but in the generic case arrays will tend to be faster...
Use the tool that's semantically correct. Don't avoid using arrays because you think objects are faster. Use a construct when it makes sense. There are times when it makes sense to replace an array with an object (for example, when you want to enforce the types in the array strictly). But this is not one of them. And even then, you wouldn't replace array() with stdClass, you'd replace it with a custom class (one that likely extends ArrayObject or at least implements the Iterator and ArrayAccess interfaces). Different data structures exist for a reason. If it was better to use objects instead of arrays, wouldn't we all be using objects instead?
Don't worry about micro-optimizations like "is cast faster". It's almost always not. Write readable, correct code. Then optimize if you're having a problem...
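As an illustration of the "custom class that enforces types" case mentioned above, here is a minimal sketch built on ArrayObject. IntList is a hypothetical name, and the sketch only guards array-style writes; other entry points such as the constructor are not covered.

```php
<?php
// Hypothetical typed collection: rejects non-integer values on
// $list[] = ... and $list[$key] = ... writes.
final class IntList extends ArrayObject
{
    public function offsetSet($key, $value): void
    {
        if (!is_int($value)) {
            throw new InvalidArgumentException('IntList accepts integers only');
        }
        parent::offsetSet($key, $value);
    }
}

$list = new IntList();
$list[] = 1;
$list[] = 2;

try {
    $list[] = 'three'; // not an integer, so this write is refused
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

A plain array or a stdClass cannot refuse the third write, which is the kind of situation where swapping the data structure pays for itself.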
function benchmark($func_name, $iterations) {
$begin = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
$func_name();
}
$end = microtime(true);
$execution_time = $end - $begin;
echo $func_name , ': ' , $execution_time;
}
function standClass() {
$obj = new stdClass();
$obj->param_one = 1;
$obj->param_two = 2;
}
function castFromArray() {
$obj = (object)array('param_one' => 1, 'param_two' => 2);
}
benchmark('standClass', 1000);
benchmark('castFromArray', 1000);
benchmark('standClass', 100000);
benchmark('castFromArray', 100000);
Outputs:
standClass: 0.0045979022979736
castFromArray: 0.0053138732910156
standClass: 0.27266097068787
castFromArray: 0.20209217071533
Casting from an array to stdClass on the fly is around 30% more efficient, but the difference is still negligible until you know you will be performing the operation 100,000 times (and even then, you're only looking at a tenth of a second, at least on my machine).
So, in short, it doesn't really matter the vast majority of the time, but if it does, define the array in a single command and then type-cast it to an object. I definitely wouldn't spend time worrying about it unless you've identified the code in question as a bottleneck (and even then, focus on reducing your number of iterations if possible).
It's more important to choose the right data structure for the right job.
When you want to decide what to use, ask yourself: What does your data represent?
Are you representing a set, or a sequence? Then you should probably use an array.
Are you representing some thing with certain properties or behaviors? Then you should probably use an object.

Mixing arrays and objects

I feel bad about mixing arrays with objects, but I'm not sure that I should.
// Populate array
foreach($participants as $participant){
$participants[$key]['contestant'] = new Contestant($participant);
$participants[$key]['brand'] = new Brand($brand);
$key++;
}
[...]
// Print array
foreach($participants as $participant){
print "Name: " . $participant['contestant']->name;
print "Nationality: " . $participant['contestant']->nationality;
}
I'm not comfortable about the $participant['contestant']->name part. I'd feel better about using objects exclusively.
Is it in fact considered bad practice to mix objects and arrays, or am I obsessing over something that everyone else thinks is fine?
It feels fine to me. Where an array makes sense, use an array. Where an object makes sense, use an object.
However, maybe you feel that a participant makes more sense as an object, which, looking at this relatively small code sample, it just may. If so, write up a quick Participant class. If that feels like too much overhead, then don't worry about it. At this point, it's personal preference, since your code does, in fact, work. It's all about which codebase you would prefer to be working with in the future.
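A quick Participant class could look roughly like the sketch below. The Contestant and Brand classes here are hypothetical stand-ins, since the real ones (and their constructor arguments) are not shown in the question:

```php
<?php
// Hypothetical stand-ins for the classes used in the question.
final class Contestant
{
    public $name;
    public $nationality;

    public function __construct($name, $nationality)
    {
        $this->name = $name;
        $this->nationality = $nationality;
    }
}

final class Brand
{
    public $name;

    public function __construct($name)
    {
        $this->name = $name;
    }
}

// Replaces the ['contestant' => ..., 'brand' => ...] inner array.
final class Participant
{
    public $contestant;
    public $brand;

    public function __construct(Contestant $contestant, Brand $brand)
    {
        $this->contestant = $contestant;
        $this->brand = $brand;
    }
}

// The outer list stays an array: it is a sequence of participants.
$participants = [
    new Participant(new Contestant('Ada', 'British'), new Brand('Acme')),
];

$lines = [];
foreach ($participants as $participant) {
    $lines[] = 'Name: ' . $participant->contestant->name;
    $lines[] = 'Nationality: ' . $participant->contestant->nationality;
}
```

The access becomes $participant->contestant->name throughout, while the collection itself is still an ordinary array.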
You're worrying yourself unnecessarily. It's fairly commonplace in other languages to mix sequences/mappings with objects. Arrays are useful when you don't know the number of elements or only a "ragtag" set of elements exist, and objects are useful when you know exactly the number of elements, and said elements each have a specific purpose.
