Equivalent of std::set in PHP?

What's the equivalent in PHP of the C++ "set" ("Sets are a kind of associative containers that stores unique elements, and in which the elements themselves are the keys.")?

There isn't one, but they can be emulated.
Here is an archived copy, made before the link died, with all of its contents:
A Set of Objects in PHP: Arrays vs. SplObjectStorage
One of my projects, QueryPath, performs many tasks that require maintaining a set of unique objects. In my quest to optimize QueryPath, I have been looking into various ways of efficiently storing sets of objects in a way that provides expedient containment checks. In other words, I want a data structure that keeps a list of unique objects, and can quickly tell me if some object is present in that list. The ability to loop through the contents of the list is also necessary.
Recently I narrowed the list of candidates down to two methods:
Use good old fashioned arrays to emulate a hash set.
Use the SplObjectStorage class present in PHP 5.2 and up.
Before implementing anything directly in QueryPath, I first set out designing the two methods, and then ran some micro-benchmarks (with Crell's help) on the pair of methods. To say that the results were surprising is an understatement. The benchmarks will likely change the way I structure future code, both inside and outside of Drupal.
The Designs
Before presenting the benchmarks, I want to quickly explain the two designs that I settled on.
Arrays emulating a hash set
The first method I have been considering is using PHP's standard array() to emulate a set backed by a hash mapping (a "hash set"). A set is a data structure designed to keep a list of unique elements. In my case, I am interested in storing a unique set of DOM objects. A hash set is a set that is implemented using a hash table, where the key is a unique identifier for the stored value. While one would normally write a class to encapsulate this functionality, I decided to test the implementation as a bare array with no layers of indirection on top. In other words, what I am about to present are the internals of what would be a hash set implementation.
The Goal: Store a (unique) set of objects in a way that makes them (a) easy to iterate, and (b) cheap to check membership.
The Strategy: Create an associative array where the key is a hash ID and the value is the object.
With a reasonably good hashing function, the strategy outlined above should work as desired.
"Reasonably good hashing function" -- that was the first gotcha. How do you generate a good hashing function for an object like DOMDocument? One (bad) way would be to serialize the object and then, perhaps, take its MD5 hash. That, however, will not work on many objects -- specifically, any object that cannot be serialized. The DOMDocument, for example, is actually backed by a resource and cannot be serialized.
One need not look far for such a function, though. It turns out that there is an object hashing function in PHP 5. It's called spl_object_hash(), and it can take any object (even one that is not native PHP) and generate a hashcode for it.
Using spl_object_hash() we can build a simple data structure that functions like a hash set. This structure looks something like this:
array(
$hashcode => $object
);
For example, we can generate an entry like so:
$object = new stdClass();
$hashcode = spl_object_hash($object);
$arr = array(
$hashcode => $object
);
In the example above, then, the hashcode string is an array key, and the object itself is the array value. Note that since the hashcode will be the same each time an object is re-hashed, it serves not only as a comparison point ("if object a's hashkey == object b's hashkey, then a == b"), it also functions as a uniqueness constraint. Only one object with the specified hashcode can exist per array, so there is no possibility of two copies (actually, two references) to the same object being placed in the array.
With a data structure like this, we have a host of readily available functions for manipulating the structure, since we have at our disposal all of the PHP array functions. So to some degree this is an attractive choice out of the box.
The most important task, in our context at least, is that of determining whether an entry exists inside of the set. There are two possible candidates for this check, and both require supplying the hashcode:
Check whether the key exists using array_key_exists().
Check whether the key is set using isset().
To cut to the chase, isset() is faster than array_key_exists(), and offers the same features in our context, so we will use that. (The fact that they handle null values differently makes no difference to us. No null values will ever be inserted into the set.)
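Here is a small sketch of the null caveat, assuming PHP 7 or later. It shows the one case where isset() and array_key_exists() disagree: a key that exists but holds null. The set never stores null, so either check works for it.

```php
<?php
$arr = array('a' => new stdClass(), 'b' => null);

// A key holding a non-null value: both checks agree.
var_dump(isset($arr['a']));            // bool(true)
var_dump(array_key_exists('a', $arr)); // bool(true)

// A key that exists but holds null: the checks disagree.
var_dump(isset($arr['b']));            // bool(false)
var_dump(array_key_exists('b', $arr)); // bool(true)
```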
With this in mind, then, we would perform a containment check using something like this:
$object = new stdClass();
$hashkey = spl_object_hash($object);
// ...
// Check whether $arr has the $object.
if (isset($arr[$hashkey])) {
// Do something...
}
Again, using an array that emulates a hash set allows us to use all of the existing array functions and also provides easy iterability. We can easily drop this into a foreach loop and iterate the contents. Before looking at how this performs, though, let's look at the other possible solution.
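As mentioned earlier, one would normally wrap this array in a class. Below is a minimal sketch of such a wrapper, assuming PHP 7.1 or later. The class name ObjectHashSet and its method names are hypothetical; they are not part of QueryPath or PHP.

```php
<?php
/**
 * Hypothetical hash set of objects backed by a plain array,
 * keyed by spl_object_hash().
 */
class ObjectHashSet implements IteratorAggregate, Countable
{
    /** @var array hashcode => object */
    private $items = array();

    public function add($object): void
    {
        // Re-adding the same object overwrites its own slot, so entries stay unique.
        $this->items[spl_object_hash($object)] = $object;
    }

    public function contains($object): bool
    {
        // isset() is safe here because null is never stored.
        return isset($this->items[spl_object_hash($object)]);
    }

    public function remove($object): void
    {
        unset($this->items[spl_object_hash($object)]);
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): Iterator
    {
        // Expose only the objects, not the hash keys.
        return new ArrayIterator(array_values($this->items));
    }
}

$set = new ObjectHashSet();
$a = new stdClass();
$b = new stdClass();
$set->add($a);
$set->add($a); // same object again: no new entry
$set->add($b);
echo count($set), "\n"; // 2
```

The set holds a reference to each object, so an object cannot be destroyed while it is in the set. Its spl_object_hash() value therefore cannot be reused by another object in the meantime.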
Using SplObjectStorage
The second method under consideration makes use of the new SplObjectStorage class from PHP 5.2+ (it might be in 5.1). This class, which is backed by a C implementation, provides a set-like storage mechanism for objects. It enforces uniqueness; only one of each object can be stored. It is also traversable, as it implements the Iterator interface. That means you can use it in loops such as foreach. (On the down side, the version in PHP 5.2 does not provide any method of random access or index-based access. The version in PHP 5.3 rectifies this shortcoming.)
The Goal: Store a (unique) set of objects in a way that makes them (a) easy to iterate, and (b) cheap to check membership.
The Strategy: Instantiate an object of class SplObjectStorage and store references to objects inside of this.
Creating a new SplObjectStorage is simple:
$objectStore = new SplObjectStorage();
An SplObjectStorage instance not only retains uniqueness information about objects, but also keeps them in a predictable order: iteration follows the order in which they were attached (first in, first out).
Adding objects is done with the attach() method:
$objectStore = new SplObjectStorage();
$object = new stdClass();
$objectStore->attach($object);
It should be noted that attach will only attach an object once. If the same object is passed to attach() twice, the second attempt will simply be ignored. For this reason it is unnecessary to perform a contains() call before an attach() call. Doing so is redundant and costly.
Checking for the existence of an object within an SplObjectStorage instance is also straightforward:
$objectStore = new SplObjectStorage();
$object = new stdClass();
$objectStore->attach($object);
// ...
if ($objectStore->contains($object)) {
// Do something...
}
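Here is a short sketch that puts these calls together, assuming PHP 5.3 or later; the object names are made up. It shows three things: a repeated attach() adds nothing, iteration follows attach order, and detach() removes an object.

```php
<?php
$store = new SplObjectStorage();

$first = new stdClass();
$first->name = 'first';
$second = new stdClass();
$second->name = 'second';

$store->attach($first);
$store->attach($second);
$store->attach($first); // already stored: no second entry is created

echo count($store), "\n"; // 2

// Iteration follows attach order.
$names = array();
foreach ($store as $obj) {
    $names[] = $obj->name;
}
echo implode(', ', $names), "\n"; // first, second

$store->detach($second);
var_dump($store->contains($second)); // bool(false)
```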
While SplObjectStorage has nowhere near the number of supporting methods that one has access to with arrays, it allows for iteration and somewhat limited access to the objects stored within. In many use cases (including the one I am investigating here), SplObjectStorage provides the requisite functionality.
Now that we have taken a look at the two candidate data structures, let's see how they perform.
The Comparisons
Anyone who has seen Larry (Crell) Garfield's micro-benchmarks for arrays and SPL ArrayAccess objects will likely come into this set of benchmarks with the same set of expectations Larry and I had. We expected PHP's arrays to blow the SplObjectStorage out of the water. After all, arrays are a primitive type in PHP, and have enjoyed years of optimizations. However, the documentation for the SplObjectStorage indicates that the search time for an SplObjectStorage object is O(1), which would certainly make it competitive if the base speed is similar to that of an array.
My testing environments are:
An iMac (current generation) with a 3.06 GHz Intel Core 2 Duo and 2 GB of 800 MHz DDR2 RAM. MAMP 1.72 (PHP 5.2.5) provides the AMP stack.
A MacBook Pro with a 2.4 GHz Intel Core 2 Duo and 4 GB of 667 MHz DDR2 RAM. MAMP 1.72 (PHP 5.2.5) provides the AMP stack.
In both cases, the performance tests averaged about the same. Benchmarks in this article come from the second system.
Our basic testing strategy was to build a simple test that captured information about three things:
The amount of time it takes to load the data structure
The amount of time it takes to seek the data structure
The amount of memory the data structure uses
We did our best to minimize the influence of other factors on the test. Here is our testing script:
<?php
/**
 * Object hashing tests.
 *
 * All timings use microtime(TRUE), which returns seconds as a float;
 * microtime(FALSE) returns a "msec sec" string that cannot be subtracted reliably.
 */
$sos = new SplObjectStorage();
$docs = array();
$iterations = 100000;
for ($i = 0; $i < $iterations; ++$i) {
    $doc = new DOMDocument();
    //$doc = new stdClass();
    $docs[] = $doc;
}
$start = $finis = 0;
$mem_empty = memory_get_usage();
// Load the SplObjectStorage
$start = microtime(TRUE);
foreach ($docs as $d) {
    $sos->attach($d);
}
$finis = microtime(TRUE);
$time_to_fill = $finis - $start;
// Check membership on the object storage
$start = microtime(TRUE);
foreach ($docs as $d) {
    $sos->contains($d);
}
$finis = microtime(TRUE);
$time_to_check = $finis - $start;
$mem_spl = memory_get_usage();
$mem_used = $mem_spl - $mem_empty;
printf("SplObjectStorage:\nTime to fill: %0.12f.\nTime to check: %0.12f.\nMemory: %d\n\n", $time_to_fill, $time_to_check, $mem_used);
unset($sos);
$mem_empty = memory_get_usage();
// Test arrays:
$start = microtime(TRUE);
$arr = array();
// Load the array
foreach ($docs as $d) {
    $arr[spl_object_hash($d)] = $d;
}
$finis = microtime(TRUE);
$time_to_fill = $finis - $start;
// Check membership on the array
$start = microtime(TRUE);
foreach ($docs as $d) {
    //$arr[spl_object_hash($d)];
    isset($arr[spl_object_hash($d)]);
}
$finis = microtime(TRUE);
$time_to_check = $finis - $start;
$mem_arr = memory_get_usage();
$mem_used = $mem_arr - $mem_empty;
printf("Arrays:\nTime to fill: %0.12f.\nTime to check: %0.12f.\nMemory: %d\n\n", $time_to_fill, $time_to_check, $mem_used);
?>
The test above is broken into four separate tests. The first two test how well the SplObjectStorage method handles loading and containment checking. The last two perform the same tests on our improvised array data structure.
There are two things worth noting about the test above.
First, the object of choice for our test was a DOMDocument. There are a few reasons for this. The obvious reason is that this test was done with the intent of optimizing QueryPath, which works with elements from the DOM implementation. There are two other interesting reasons, though. One is that DOMDocuments are not lightweight. The other is that DOMDocuments are backed by a C implementation, making them one of the more difficult cases when storing objects. (They cannot, for example, be conveniently serialized.)
That said, after observing the outcome, we repeated the test with basic stdClass objects and found the performance results to be nearly identical, and the memory usage to be proportional.
The second thing worth mentioning is that we used 100,000 iterations to test. This was about the upper bound that my PHP configuration allowed before running out of memory. Other than that, though, the number was chosen arbitrarily. When I ran tests with lower iteration counts, the SplObjectStorage definitely scaled linearly. Array performance was less predictable (larger standard deviation) with smaller data sets, though it seemed to average around the same for lower sizes as it does (more predictably) for larger sized arrays.
The Results
So how did these two strategies fare in our micro-benchmarks? Here is a representative sample of the output generated when running the above:
SplObjectStorage:
Time to fill: 0.085041999817.
Time to check: 0.073099000000.
Memory: 6124624
Arrays:
Time to fill: 0.193022966385.
Time to check: 0.153498000000.
Memory: 8524352
Averaging this over multiple runs, SplObjectStorage executed both fill and check functions twice as fast as the array method presented above. We tried various permutations of the tests above. Here, for example, are results for the same test with a stdClass object:
SplObjectStorage:
Time to fill: 0.082209110260.
Time to check: 0.070617000000.
Memory: 6124624
Arrays:
Time to fill: 0.189271926880.
Time to check: 0.152644000000.
Memory: 8524360
Not much different. Even adding arbitrary data to the object we stored does not make a difference in the time it takes for the SplObjectStorage (though it does seem to raise the time ever so slightly for the array).
Our conclusion is that SplObjectStorage is indeed a better solution for storing lots of objects in a set. Over the last week, I've ported QueryPath to SplObjectStorage (see the Quark branch at GitHub -- the existing Drupal QueryPath module can use this experimental branch without alteration), and will likely continue benchmarking. But preliminary results seem to provide a clear indication as to the best approach.
As a result of these findings, I'm much less inclined to default to arrays as "the best choice" simply because they are basic data types. If the SPL library contains features that out-perform arrays, they should be used when appropriate. From QueryPath to my Drupal modules, I expect that my code will be impacted by these findings.
Thanks to Crell for his help, and to Eddie at Frameweld for sparking my examination of these two methods in the first place.

In PHP you use arrays for that.

There is no built-in equivalent of std::set in PHP.
You can use arrays "like" sets, but it's up to you to enforce the rules.
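For scalar values (ints and strings), the usual way to enforce those rules is to store the elements as array keys, so uniqueness comes for free. A minimal sketch, assuming PHP 5.3 or later. Note that numeric strings such as "1" are converted to integer keys, so "1" and 1 count as the same element.

```php
<?php
// Elements are stored as array keys; true is just a placeholder value.
// Duplicate keys collapse, so 'apple' is stored once.
$set = array_fill_keys(array('apple', 'pear', 'apple'), true);

$set['plum'] = true;               // insert
unset($set['pear']);               // remove
$hasApple = isset($set['apple']);  // membership test by key lookup

$other = array_fill_keys(array('plum', 'kiwi'), true);
$union        = $set + $other;                     // keys in either set
$intersection = array_intersect_key($set, $other); // keys in both sets
$difference   = array_diff_key($set, $other);      // keys only in $set

echo implode(', ', array_keys($union)), "\n"; // apple, plum, kiwi
```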

Have a look at Set from Nspl. It supports basic set operations which take other sets, arrays and traversable objects as arguments. You can see examples here.

Related

Why does Symfony provide an OrderedHashMap

Symfony provides an OrderedHashMap. Its documentation states
Unlike associative arrays, the map keeps track of the order in which keys were added and removed. This order is reflected during iteration.
I'm confused by this statement, because I thought PHP associative arrays are actually already ordered maps. I found this question on SO, which confirms my previous assumption: Are PHP Associative Arrays ordered?
I wonder whether the Symfony devs didn't know PHP arrays are already ordered maps, or whether I don't understand the role of Symfony's OrderedHashMap.
Of course PHP's array is an ordered-map.
But Symfony's OrderedHashMap has some behaviors (features, really) that PHP's array does not.
OrderedHashMap supports concurrent modification during iteration. That means you can insert and remove elements from within a foreach loop, and the iterator will reflect those changes accordingly. A plain array, by contrast, is iterated as a copy (copy-on-write) in a by-value foreach, so modifications made inside the loop are not reflected in it.
use Symfony\Component\Form\Util\OrderedHashMap;
$map = new OrderedHashMap();
$map[1] = 1;
$map[2] = 2;
$map[3] = 3;
foreach ($map as $index => $value) {
    echo "$index: $value\n";
    if (1 === $index) {
        $map[1] = 4;
        $map[] = 5;
    }
}
You will get different output if you use a plain array: when iterating over an array, the loop won't see the appended 5.
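The copy-on-write point can be checked with plain arrays. This sketch assumes PHP 7 or later. There, a by-value foreach walks the array as it was when the loop started, while a by-reference foreach also visits elements changed or appended during the loop.

```php
<?php
// By value: the loop walks the array as it was when the loop started.
$arr = array(1 => 1, 2 => 2, 3 => 3);
$seen = array();
foreach ($arr as $index => $value) {
    $seen[] = $value;
    if (1 === $index) {
        $arr[2] = 4; // change an element the loop has not reached yet
        $arr[] = 5;  // append a new element
    }
}
echo implode(', ', $seen), "\n"; // 1, 2, 3

// By reference: the loop walks the live array, so it sees both changes.
$arr = array(1 => 1, 2 => 2, 3 => 3);
$seenRef = array();
foreach ($arr as $index => &$v) {
    $seenRef[] = $v;
    if (1 === $index) {
        $arr[2] = 4;
        $arr[] = 5;
    }
}
unset($v); // drop the reference left behind by the loop
echo implode(', ', $seenRef), "\n"; // 1, 4, 3, 5
```

This only illustrates the array side of the comparison. It does not reproduce everything OrderedHashMap does, such as its handling of removals during iteration.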
About "Why?": search for it in Symfony's codebase. There is only one place that uses OrderedHashMap: it stores a form's children, which are iterated via new InheritDataAwareIterator($this->children). So I think the answer is that it exists to help process a form's children.
And thanks to @fishbone:
So the benefit is not the ordering but the ability to modify it in loops.
In general, not only in Symfony's context, object-oriented structures (besides any additional features they implement) are preferred over primitive types such as int, string or array because they can be injected into a class for unit testing.
Object-oriented structures can also enforce invariants, whereas primitive types can only hold data without any behavior.

indexed array foreach shorthand

Data:
$players = array(
new Player('psycketom'),
new Player('stackexchanger'),
new Player('max')
);
Usually, in order to get something out of every object within an array, we have to use for / foreach.
foreach ($players as $player)
{
var_dump( $player->score );
}
But, since it's a repetitive task, is there a way to shortcut it to something along these imaginary lines(?):
var_dump( every( $players )->score );
every( $players )->score += 40;
Since I have not seen such a feature in PHP, is there a way to implement it?
I have asked the question with PHP as the main language, but the language-agnostic and programming-languages tags are for the second part of the question: which languages support such a shorthand, or at least something similar?
So, you are correct that PHP does not support this "out of the box" (except kinda, see below). The first language I know of that does is Objective-C (well, at least its Foundation framework). NSArray and other collections have methods to instruct, in one line, that a given method should be executed on all members. Even cooler (to me, at least) is the concept of "key paths" and the support that NSArray and others have for them. An example: let's say you have an array of "people" who each have a parent, who in turn has a "name":
arrayOfNames = [peopleArray valueForKeyPath:@"parent.name"];
arrayOfNames is now an array of all the parents' names.
The closest thing PHP has is array_map, which you can use together with anonymous functions to very quickly whip something together.
Edit: anecdotal as it may be, remember that loop structures don't need curly braces when there is only one statement to execute, so any fancier solution has to compete with this:
foreach($players as $p) $p->score += 40;
And I'll mention a deeper solution for the OOP fans out there. If you work with collection objects instead of arrays, the world is your oyster with stuff like this. The simplest concept that comes to mind is PHP's magic __call() method: how hard can it be to iterate over your members and make that call for your users? For more control, you can implement a few different strategies for iteration (one for transforms, one for filters, etc.; the difference is essentially what gets returned). So in theory you could create a few different iterator classes, and in your "main" collection class implement a couple of methods that return one of them, pre-initialized with the contents:
$players->transform()->addScore(40);
where transform() returns an instance of your "don't return the array" iterator, which uses the __call() technique.
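Here is a minimal sketch of the __call() idea, extended with __get() and __set() so that something close to the question's every($players)->score works. Every, every() and Player's addScore() method are hypothetical names for illustration. The sketch assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical proxy: forwards property reads/writes and method calls
 * to every object in a collection.
 */
class Every
{
    private $items;

    public function __construct(iterable $items)
    {
        $this->items = $items;
    }

    // every($xs)->prop: returns an array of each object's prop.
    public function __get(string $name)
    {
        $out = array();
        foreach ($this->items as $item) {
            $out[] = $item->$name;
        }
        return $out;
    }

    // every($xs)->prop = $v: assigns $v to each object's prop.
    public function __set(string $name, $value)
    {
        foreach ($this->items as $item) {
            $item->$name = $value;
        }
    }

    // every($xs)->method(...): calls the method on each object, returns the results.
    public function __call(string $method, array $args)
    {
        $out = array();
        foreach ($this->items as $item) {
            $out[] = $item->$method(...$args);
        }
        return $out;
    }
}

function every(iterable $items): Every
{
    return new Every($items);
}

// Minimal stand-in for the question's Player class; addScore() is hypothetical.
class Player
{
    public $name;
    public $score = 0;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function addScore(int $points): int
    {
        $this->score += $points;
        return $this->score;
    }
}

$players = array(new Player('psycketom'), new Player('stackexchanger'), new Player('max'));
every($players)->addScore(40);
var_dump(every($players)->score); // each element is int(40)
```

Compound assignment such as every($players)->score += 40 is not supported by this sketch. PHP would try to add 40 to the array returned by __get() and fail, which is why the sketch handles that case with a method call instead.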
The sky starts to open up at this point, and you can start to build filter iterators which take predicates and return another collection of a subset of the objects, and syntax like this is possible:
// send flyer to all parents with valid email address
$parentsPredicate = new Predicate('email');
$players->filteredByPredicate($parentsPredicate)->each()->email($flyer_email_body);
You could do:
var_dump(array_map(function($el){return $el->score;}, $players));
array_walk($players, function($el) {$el->score += 40;});
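Two more built-ins fit here, sketched below with a minimal stand-in for the question's Player class (its shape is assumed):

- array_column() accepts an array of objects and reads a public property from each (PHP 7.0+).
- Arrow functions (PHP 7.4+) shorten the array_map() version.

Objects are handles, so a by-value foreach is enough to modify them.

```php
<?php
// Minimal stand-in for the question's Player class (assumed shape).
class Player
{
    public $name;
    public $score = 0;
    public function __construct($name) { $this->name = $name; }
}

$players = array(new Player('psycketom'), new Player('stackexchanger'), new Player('max'));

// PHP 7.0+: read a public property from each object.
$scores = array_column($players, 'score');

// PHP 7.4+: arrow function version of the array_map() approach.
$names = array_map(fn($p) => $p->name, $players);

// Objects are handles, so a by-value foreach still mutates them.
foreach ($players as $p) $p->score += 40;
```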

SPL objectstorage vs SPL array vs ordinary array

What is the difference, and what are the usage scenarios, between a normal array, an SPL array and SPL object storage? It would be great if anyone could give some practical examples of using SPL arrays and SplObjectStorage.
The main advantage of SplFixedArray is that for a certain subset of use cases for arrays, it is much faster (that subset being arrays that have only integer keys, and a fixed length). So, for example:
$a = array("foo", $bar, 7, /* ... thousands of values ... */ $quux);
$b = \SplFixedArray::fromArray($a);
// here, $b will be much faster to use than $a
This class could be used for literally anything you would use an array for, but previously found too slow. That is often the case in complex calculations on large data sets. For your typical PHP-based web application or website, there aren't going to be many (if any) cases where you'd need the performance boost.
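Here is a minimal sketch of SplFixedArray's fixed-length behavior, assuming PHP 7 or later. The size is set up front, writing past it throws a RuntimeException, and setSize() grows the array explicitly.

```php
<?php
// A fixed-length array with integer indexes 0..size-1.
$fixed = new SplFixedArray(3);
$fixed[0] = 'a';
$fixed[1] = 'b';
$fixed[2] = 'c';

$outOfRange = false;
try {
    $fixed[3] = 'd'; // index 3 is past the fixed size of 3
} catch (RuntimeException $e) {
    $outOfRange = true;
}

$fixed->setSize(4); // grow explicitly; now the write succeeds
$fixed[3] = 'd';

$plain = $fixed->toArray(); // convert back to an ordinary array
echo $fixed->getSize(), "\n"; // 4
```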
The SplObjectStorage class, however, can be useful in all kinds of typical cases. It provides a way to map objects to other data. So, for example, maybe you have a Route class that you'd like to provide a mapping to a Controller class:
$routeOne = new Route(/* ... */);
$routeTwo = new Route(/* ... */);
$controllerOne = new Controller(/* ... */);
$controllerTwo = new Controller(/* ... */);
$controllers = new \SplObjectStorage();
$controllers[$routeOne] = $controllerOne;
$controllers[$routeTwo] = $controllerTwo;
// now you can look up a controller for a given route by: $controllers[$route]
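The Route and Controller classes above are not defined in the answer, so here is a runnable sketch with hypothetical stand-in classes. It assumes PHP 5.3 or later, where SplObjectStorage supports array syntax. It also shows that lookups are by object identity, and that getInfo() returns the data attached to the current object during iteration.

```php
<?php
// Hypothetical stand-ins for the answer's Route and Controller classes.
class Route
{
    public $path;
    public function __construct($path) { $this->path = $path; }
}

class Controller
{
    public $name;
    public function __construct($name) { $this->name = $name; }
}

$home  = new Route('/');
$about = new Route('/about');

$controllers = new SplObjectStorage();
$controllers[$home]  = new Controller('HomeController');  // equivalent to attach($home, ...)
$controllers[$about] = new Controller('AboutController');

echo $controllers[$about]->name, "\n"; // AboutController

// Keys are matched by identity: an equal-looking Route is a different key.
var_dump(isset($controllers[new Route('/about')])); // bool(false)

// Iteration yields the Route objects; getInfo() returns the data attached to the current one.
foreach ($controllers as $route) {
    echo $route->path, ' => ', $controllers->getInfo()->name, "\n";
}
```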

limiting a collection of objects to a unique set

Currently I have a PHP class called Collection. It uses an array to hold set of unique objects. They are unique, not in the sense that they have different memory addresses (though obviously they do), but in the sense that there are no equivalent objects in the set.
I've been reading about SplObjectStorage which has significant speed advantages over an array and might be easier to maintain than my Collection class. My problem is that SplObjectStorage does not concern itself with equivalency, only identity. For example:
class Foo {
public $id;
function __construct($int){
$this->id=$int;
}
function equals(self $obj){
return $this->id==$obj->id;
}
}
$f1a = new Foo(1);
$f1b = new Foo(1);//equivalent to $f1a
$f2a = new Foo(2);
$f2b = $f2a; //identical to $f2a
$s=new SplObjectStorage;
$s->attach($f1a);
$s->attach($f1b);//attaches (I don't want this)
$s->attach($f2a);
$s->attach($f2b);//does not attach (I want this)
foreach($s as $o) echo $o->id; //1 1 2, but I wish it would be 1 2
So I've been thinking about how to subclass SplObjectStorage so its attach() would be restricted by object equivalency, but so far it involves setting the $data of the object to an "equivalency signature" which seems to require looping through the datastructure until I find (or not) a matching value.
e.g.:
class MyFooStorage extends SplObjectStorage {
function attach(Foo $obj){
$es=$obj->id;
foreach($this as $o=>$data) {//this is the inefficient bit
if($es==$data) return;
}
parent::attach($obj);
$this[$obj]=$es;
}
}
Is there a better way?
If the only thing that defines equality is relative to another object, then I am afraid that what you want is impossible. Think about it: there is no way to determine whether an object is already included in the array unless I check every object, so the complexity will be O(n) no matter what.
However, if you make the equality absolute then this is possible. In order to do that, you will have to produce a hash value for each of the objects. Two objects will be equal if and only if their hashes are equal. Once you have that, then you can achieve O(1) with a HashMap.
Under the hood, this is exactly what SplObjectStorage does, by taking the address of an object as its hash value.
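Since PHP 5.4, SplObjectStorage lets a subclass override getHash() to decide what counts as "the same" object. Returning an absolute equivalence signature from it gives exactly the O(1) behavior described above, with no loop in attach(). A minimal sketch, assuming PHP 7 or later and assuming the id property alone defines equivalence:

```php
<?php
class Foo
{
    public $id;
    public function __construct($int) { $this->id = $int; }
}

/**
 * Objects whose getHash() values are equal are treated as one member.
 */
class FooStorage extends SplObjectStorage
{
    public function getHash($object): string
    {
        // The "equivalency signature": equal ids produce equal hashes.
        return (string) $object->id;
    }
}

$f1a = new Foo(1);
$f1b = new Foo(1); // equivalent to $f1a
$f2a = new Foo(2);
$f2b = $f2a;       // identical to $f2a

$s = new FooStorage();
$s->attach($f1a);
$s->attach($f1b); // same hash as $f1a: no new entry
$s->attach($f2a);
$s->attach($f2b); // same object: no new entry

foreach ($s as $o) echo $o->id; // 12
```

Because lookups go through getHash(), contains() and detach() also work by equivalence, so a freshly created new Foo(2) is reported as already present.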

Best method of passing/return values

The reason I am asking this question is that I landed my first real (yes, a paid office job - no more volunteering!) web development job about two months ago. I have a couple of associate degrees in computer information systems (web development and programming). But as many of you know, what you learn in college and what you need on the job can be very different, and the job demands much more. I am definitely learning from my job - I recreated the entire framework we use from scratch in an MVC architecture - my first time doing anything related to design patterns.
I was wondering what you would recommend as the best way to pass/return values around in OO PHP. Right now I have not implemented any sort of standard, but I would like to create one before the framework grows any larger. I return arrays when more than one value needs to be returned, and sometimes pass arrays or use multiple parameters. Are arrays the best way, or is there a more efficient method, such as JSON? I like the idea of arrays in that to pass more or fewer values, you just need to change the array and not the function definition itself.
Thank you all, just trying to become a better developer.
EDIT: I'm sorry all, I thought I had accepted an answer for this question. My bad, very, very bad.
How often do you run across a situation where you actually need multiple return values? I can't imagine it's that often.
And I don't mean a scenario where you are returning something that's expected to be an enumerable data collection of some sort (i.e., a query result), but one where the returned array has no other meaning than to just hold two or more values.
One technique the PHP library itself uses is the reference parameter, as with preg_match(). The function itself returns a single value indicating whether the pattern matched, but optionally uses the supplied third parameter to store the matched data. This is, in essence, a "second return value".
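Here is a sketch of that pattern, assuming PHP 7.1 or later. It shows preg_match() itself, followed by a hypothetical parseSize() function. parseSize() reports success through its return value and hands the data back through a by-reference parameter.

```php
<?php
// preg_match() returns 1 on a match and fills $matches by reference.
if (preg_match('/^(\d+)x(\d+)$/', '640x480', $matches)) {
    echo $matches[1], ' by ', $matches[2], "\n"; // 640 by 480
}

/**
 * Hypothetical function using the same pattern: the return value reports
 * success, and the by-reference $size parameter carries the data.
 */
function parseSize(string $text, ?array &$size = null): bool
{
    if (!preg_match('/^(\d+)x(\d+)$/', $text, $m)) {
        return false;
    }
    $size = array('width' => (int) $m[1], 'height' => (int) $m[2]);
    return true;
}

if (parseSize('1024x768', $size)) {
    echo $size['width'], "\n"; // 1024
}
```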
Definitely don't use a data interchange format like JSON. The purpose of these formats is to move data between disparate systems in an expected, parseable way. Within a single PHP execution you don't need that.
You can return anything you want: a single value, an array or a reference (depending on the function needs). Just be consistent.
But please don't use JSON internally. It just produces unnecessary overhead.
I also use arrays for returning multiple values, but in practice it doesn't happen very often. If it does, it's generally a sensible grouping of data, such as returning array('x'=>10,'y'=>10) from a function called getCoordinates(). If you find yourself doing lots of processing and returning wads of data in arrays from a lot of functions, there's probably some refactoring that can be done to put the work into smaller units.
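For the getCoordinates() case, PHP 7.1 and later can unpack a keyed array straight into variables, which keeps the call site readable. Here is a minimal sketch; getCoordinates() is the hypothetical function named in the answer.

```php
<?php
// A sensible grouping of data returned as an associative array.
function getCoordinates(): array
{
    return array('x' => 10, 'y' => 10);
}

// PHP 7.1+: keyed destructuring unpacks the array into named variables.
['x' => $x, 'y' => $y] = getCoordinates();
echo "$x, $y\n"; // 10, 10
```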
That being said, you mentioned:
I like the idea of arrays in that to pass more values or less, you just need to change the array and not the function definition itself.
In that regard, another technique you might be interested in is using functions with variable numbers of arguments. It is perfectly acceptable to declare a function with no parameters:
function stuff() {
//do some stuff
}
but call it with all the parameters you care to give it:
$x = stuff($var1, $var2, $var3, $var4);
By using func_get_args(), func_get_arg() (singular) and func_num_args() you can easily find/loop all the parameters that were passed. This works very well if you don't have specific parameters in mind, say for instance a sum() function:
function sum()
{
    $out = 0;
    $c = func_num_args();
    for ($i = 0; $i < $c; $i++) {
        $out += func_get_arg($i);
    }
    return $out;
}
//echoes 35
echo sum(10,10,15);
Food for thought, maybe you'll find it useful.
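Since PHP 5.6, a variadic parameter does the same job with a declared signature, which is easier to read than a func_get_args() loop. Here is a minimal sketch of the same sum():

```php
<?php
// The ...$numbers parameter collects every argument into an array.
function sum(...$numbers)
{
    return array_sum($numbers);
}

echo sum(10, 10, 15), "\n"; // 35

// An existing array can be spread into the call with the same operator.
$values = array(1, 2, 3);
echo sum(...$values), "\n"; // 6
```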
The only thing I'm careful to avoid is passing/returning arrays where the keys have "special" meaning. Example:
<?php
// Bad. Don't pass around arrays with 'special' keys
$personArray = array("eyeColor"=>"blue", "height"=>198, "weight"=>103, /* ... */);
?>
Code that uses an array like this is harder to refactor and debug. This type of structure is better represented as an object.
<?php
interface Person {
    /**
     * @return string Color Name
     */
    public function getEyeColor();
    // ...
}
?>
This interface provides a contract that the consuming code can rely on.
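To show the contract in use, here is a minimal sketch assuming PHP 7 or later. The getHeight() method, its centimetre unit and the SimplePerson class are hypothetical additions. The point is that a mistyped method name fails loudly, while a mistyped array key only raises a notice or warning and evaluates to null.

```php
<?php
interface Person
{
    /**
     * @return string Color name
     */
    public function getEyeColor();

    /**
     * @return int Height (hypothetical method; centimetres assumed)
     */
    public function getHeight();
}

// Hypothetical implementation: the constructor fixes the data's shape once,
// instead of every caller relying on correctly spelled array keys.
final class SimplePerson implements Person
{
    private $eyeColor;
    private $height;

    public function __construct(string $eyeColor, int $height)
    {
        $this->eyeColor = $eyeColor;
        $this->height = $height;
    }

    public function getEyeColor()
    {
        return $this->eyeColor;
    }

    public function getHeight()
    {
        return $this->height;
    }
}

// Consuming code depends only on the interface.
function describe(Person $p): string
{
    return $p->getEyeColor() . ' eyes, ' . $p->getHeight() . ' cm';
}

echo describe(new SimplePerson('blue', 198)), "\n"; // blue eyes, 198 cm
```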
Other than that I can't think of any reason to limit yourself.
Note: to be clear, associative arrays are great for list data, like:
<?php
// Good array
$usStates = array("AL"=>"ALABAMA", "AK"=>"ALASKA", /* ... */);
?>
