I've seen a few creative solutions for dealing with serialized SPL objects but am looking for more options (or elaborations). I store nested serialized objects - one of which is SimpleXML - in the database, only to be un-serialized later. This obviously causes some problems.
$s = new SimpleXmlElement('<foo>bar</foo>');
$ss = serialize($s);
$su = unserialize($ss);
// Warning: unserialize() [function.unserialize]: Node no longer exists...
Does anyone have any insight into highly-reliable methods for dealing with serialized SPL objects? __sleep()/__wakeup() overrides? Cast-to-stdClass? Cast-to-string, then serialize?
Any help is appreciated.
[Edit: The scope and variation of these XML schemas are too varied to map with an ORM. They are, at their most fundamental level, arbitrary payloads in stateful processes, triggered within restful APIs.]
Questions on appropriateness notwithstanding, you can turn it back into XML like this:
$xml = $simpleXmlElem->asXML();
And then, when you pull it from the database:
$simpleXmlElem = simplexml_load_string($xml);
As for whether it's appropriate to just serialize large chunks of XML, it can be true that putting XML into the database removes much of the advantage of using a relational system, but you do have the advantage of being able to accommodate an arbitrary workload. If some fields are universal, and/or you gain benefit from normalizing them properly (e.g. you want to select based on those fields), move those into normalized columns.
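Putting the two halves together, a minimal round-trip sketch (the PDO connection and the documents table are hypothetical):
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // hypothetical connection

// Store: flatten the SimpleXMLElement to its XML string.
$stmt = $pdo->prepare('INSERT INTO documents (payload) VALUES (?)');
$stmt->execute(array($simpleXmlElem->asXML()));

// Retrieve: re-parse the string back into a SimpleXMLElement.
$stmt = $pdo->query('SELECT payload FROM documents WHERE id = 1');
$simpleXmlElem = simplexml_load_string($stmt->fetchColumn());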
A clearer, more OOP approach:
namespace MyApp;

class SimpleXMLElement extends \SimpleXMLElement
{
    public function arrayToXml($array = array())
    {
        // array_walk_recursive passes ($value, $key) to the callback
        array_walk_recursive($array, array($this, 'addChildInverted'));
        return $this;
    }

    public function addChildInverted($value, $name)
    {
        parent::addChild($name, $value);
    }
}
and calling
$xml = new \MyApp\SimpleXMLElement('<resultado/>');
echo $xml->arrayToXml($app->getReturnedValue())->asXML();
Wouldn't simply rendering and storing the XML be the best way to serialize any object that represents an XML structure?
What are you trying to do with the serialized data that might prevent this?
edit:
Also,
I store nested serialized objects [...] in the database, only to be un-serialized later
Why are you storing PHP serialized data in a database? There are many better ways to store objects in a database.
Related
In PHP I'm having a real hard time using serialize/unserialize on a large array of objects (100000+ objects). These objects can be of a lot of different types, but are all descendants from a base class.
Somehow when I use unserialize on the array of objects, about 0.001% of the objects are generated wrong! A whole different object is generated instead. This does not happen randomly, but each time with the same objects. But if I change the order of the array, it happens with different objects, so this looks like a bug to me.
I switched to json_encode/json_decode, but found that this always uses stdClass as the object's class. I solved this by including the classname of each object as a property, and then using this property to construct the new object, but this solution is not very elegant.
Using var_export with eval works fine but is about 3 times slower than the other methods and uses much more memory.
Now my questions are:
1) What could cause the bug / wrong objects that are created with unserialize?
2) Is there a better way to use json_decode with an array of objects, so that classes are somehow stored within the JSON automatically?
3) Is there maybe even another method to read/write a large array of objects in PHP?
UPDATE
I'm beginning to believe there must be something strange with my array data, because with msgpack_serialize (a PHP extension, alternative to serialize) I get the same kind of errors (but, strangely enough, not the same objects are generated wrong!).
UPDATE 2
Found a solution: instead of serializing the entire array, I now do it per object: first serialize(), then base64_encode(), and then I store each serialized object as a separate line in a text file. This way I can rebuild the entire array by iterating the lines from file() with base64_decode() and unserialize(): no more errors!
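In code, the fix described above might look roughly like this (the file name is invented):
// Write: one base64-encoded serialized object per line.
$fh = fopen('objects.txt', 'w');
foreach ($objects as $object) {
    fwrite($fh, base64_encode(serialize($object)) . "\n");
}
fclose($fh);

// Read: rebuild the array one object at a time.
$objects = array();
foreach (file('objects.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    $objects[] = unserialize(base64_decode($line));
}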
Two magic methods are connected with the serialize()/unserialize() functions.
__sleep()
serialize() checks if your class has a function with the magic name __sleep(). If so, that function is executed prior to any serialization. It can clean up the object and is supposed to return an array with the names of all variables of that object that should be serialized. If the method doesn't return anything then NULL is serialized and E_NOTICE is issued.
With __sleep() you have better control over serialization: you can choose which variables are serialized and clean up resources before serialization.
When unserialize() is called, the other magic method comes into play:
__wakeup()
The intended use of __wakeup() is to reestablish any database connections that may have been lost during serialization and perform other reinitialization tasks.
About json_encode()
It doesn't have the __sleep()/__wakeup() magic methods, so you have less control
It doesn't serialize private properties
Decoded objects always come back as stdClass
json_encode() is faster than serialize()
It is up to you what you choose, but for more advanced classes with DB connections etc. I would suggest serialize().
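A minimal sketch of the pattern; the Repository class and its DSN handling are invented for illustration:
class Repository
{
    private $pdo;             // connection; cannot survive serialization
    private $dsn;
    private $rows = array();

    public function __construct($dsn)
    {
        $this->dsn = $dsn;
        $this->connect();
    }

    private function connect()
    {
        $this->pdo = new PDO($this->dsn);
    }

    // Serialize only plain state; drop the connection.
    public function __sleep()
    {
        return array('dsn', 'rows');
    }

    // Re-establish the connection after unserialize().
    public function __wakeup()
    {
        $this->connect();
    }
}

$copy = unserialize(serialize(new Repository('sqlite::memory:')));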
I have an XML file which contains the structure of some objects. The object is like this:
class Object
{
    private $name;
    private $info;
    private $items;
}
Where $items is an array of objects, so it is recursive.
As of now, when I have to list the items I use simplexml to iterate within the elements and show them.
My questions are:
1) If I parse the XML and convert the data into objects instead of working with pure XML, would it affect the overall performance of the pages all that much? Will it slow things down too much, considering that the items will have to be loaded on every page the user visits?
2) Is it a good idea to serialize() a recursive object like the one I’ve defined?
SimpleXML cannot be serialized because it's considered to be a resource. However, you can easily grab the output of $sx->asXML(); and serialize that, re-constructing the SimpleXMLElement(s) once you've unserialized them.
The performance difference of re-parsing the XML is not very considerable unless you're working with extremely large XML trees.
In regards to your object, you can also implement the __sleep() and __wakeup() magic methods, which allow you to alter the object before it is serialized and after it is unserialized, respectively.
When serializing your recursive object example, don't include your $items variable in the array returned by the __sleep() magic method, and rebuild it in __wakeup(); a sketch follows.
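Applied to the example above, the idea might look like this (the class is renamed to Item, since Object is reserved in newer PHP, and the rebuild loop is schematic):
class Item
{
    private $name;
    private $info;
    private $items = array(); // recursive: array of Item objects
    private $xml;             // raw XML string the items were parsed from

    // Persist everything except the parsed tree.
    public function __sleep()
    {
        return array('name', 'info', 'xml');
    }

    // Rebuild $items from the stored XML string on wakeup.
    public function __wakeup()
    {
        $this->items = array();
        foreach (simplexml_load_string($this->xml)->item as $node) {
            // ... construct a child Item from $node and append it ...
        }
    }
}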
I'm developing a floorplanner Flex mini application. I was just wondering whether JSON or XML would be a better choice in terms of performance when generating responses from PHP. I'm currently leaning for JSON since the responses could also be reused for Javascript. I've read elsewhere that JSON takes longer to parse than XML, is that true? What about flexibility for handling data with XML vs JSON in Flex?
I'd go with JSON. We've added native JSON support to Flash Player, so it will be as fast on the parsing side as XML and it's much less verbose/smaller.
=Ryan ryan#adobe.com
JSON is not a native structure to Flex (strange, huh? You'd think that the {} objects could be easily serialized, but not really); XML is. This means that XML is handled behind the scenes by the virtual machine, while JSON strings are parsed and turned into objects through string manipulation (even if you're using as3corelib)... gross... Personally, I've also seen inconsistencies in JSONEncoder (at one point Arrays were just numerically indexed objects).
Once the data has been translated into an AS3 object, it is still faster to search and parse data in XML than it is with Objects. XPath expressions make data traversal a pleasure (almost easy enough to make you smile compared to other things out there).
On the other hand JS is much better at parsing JSON. MUCH, MUCH BETTER. But, since the move to JavaScript is a "maybe... someday..." then you may want to consider, "will future use of JSON be worth the performance hit right now?"
But here is a question: why not simply have two outputs? Since both JS and AS can provide you POSTs with a virtually arbitrary number of variables, you really only need to concern yourself with how the server sends the data, not how it receives it. Here's a potential way to handle this:
// as you are about to output:
$name = $_GET[ 'type' ];
$type = './outputs/' . $name . '.php';
// check the raw name, not $type (which always contains '.' via '.php'),
// and reject dots/slashes to prevent path traversal
if( file_exists( $type ) && strpos( $name, '.' ) === FALSE && strpos( $name, '/' ) === FALSE )
{
    include( $type );
    echo output_data( $data );
}
else
{
    // add a 404 if you like
    die();
}
Then, when getting a $_GET['type'] == 'js', js.php would be:
function output_data( $data ){ return json_encode( $data ); }
When getting $_GET['type'] == 'xml', xml.php would hold something which had output_data return a string which represented XML (plenty of examples here)
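For instance, xml.php might look something like this (the <response> element name is invented):
// xml.php -- included when $_GET['type'] == 'xml'
function output_data( $data ){
    $xml = new SimpleXMLElement( '<response/>' );
    foreach ( $data as $key => $value ) {
        // flat, scalar data only in this sketch; recurse for nested data
        $xml->addChild( $key, htmlspecialchars( (string) $value ) );
    }
    return $xml->asXML();
}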
Of course, if you're using a framework, then you could just do something like this with a view instead (my suggestion boils down to "you should have two different views and use MVC").
No, JSON is always smaller than XML when they represent exactly the same structure, and the cost of parsing text is roughly proportional to the size of that text.
So JSON is faster than XML, and if you plan to reuse the responses on the JavaScript side, choose JSON.
Benchmark JSON vs XML:
http://www.navioo.com/ajax/ajax_json_xml_Benchmarking.php
If you're ever going to use Javascript with it, definitely go with JSON. Both have a very nice structure.
It depends on how well Flex can parse JSON though, so I would look into that. How much data are you going to be passing back? Error/Success messages? User Profiles? What kind of data is this going to contain?
Is it going to need attributes on the tags? Or just a "structure". If it needs attributes and things like that, and you don't want to go too deep into an "array like" structure, go with XML.
If you're just going to have key => value, even multi-dimensional... go with JSON.
All depends on what kind of data you're going to be passing back and forth. That will make your decision for you :)
Download time:
JSON is faster.
JavaScript parse:
JSON is faster.
ActionScript parse:
XML is faster.
Advanced use within ActionScript:
XML is better, with all the E4X functionality.
JSON is limited, with no knowledge of Vectors, meaning you are limited to Arrays or will need to override the JSON encoder in as3corelib with something such as:
else if ( value is Vector.<*> ) {
    // convert the vector to an array and encode it as a JSON array
    return arrayToString( vectorToArray( value ) );
} else if ( value is Object && value != null ) {

private function vectorToArray( __vector:Object ):Array {
    var __return:Array = new Array();
    if ( !__vector || !(__vector is Vector.<*>) )
    {
        return __return;
    }
    for each ( var __obj:* in (__vector as Vector.<*>) )
    {
        __return.push( __obj );
    }
    return __return;
}
But I am afraid getting those values back into Vectors is not as nice. I had to make a whole utility class devoted to it.
So which one you choose depends on how advanced the objects you're moving are. More advanced: go with XML to make things easier on the ActionScript side.
For simple stuff, go JSON.
I have multiple calls to many RESTful services. I translate to PHP using native PHP json_decode when I receive the data, and use json_encode when sending data.
My concern is that with deeply nested data I end up writing code like:
$interestType = $person['children'][$i]['interests'][$j]['type'];
This can get quite messy. I feel there would be some benefit in creating objects whose methods/instance variables wrap around these structures, such that I could do:
$interestType = $person->getChild($i)->getInterest($j)->getType();
It seems clearer to me, but in reality it's not much more concise.
What are the benefits of just doing everything using native PHP arrays, versus writing wrapper classes for each REST resource?
My concern is that I will have to write custom encode/decode functions to map to these wrappers.
I am not familiar with the implementation of objects in PHP, but reading this blogpost about array vs object performance, it seems the overhead is minimal. So I guess it boils down to style preferences. A simple (not-nested) array to object converter can be found here:
http://www.lost-in-code.com/programming/php-code/php-array-to-object/
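In case that link dies too, a recursive version is only a few lines (sketch):
// Recursively convert a nested associative array into stdClass objects.
// Note: numeric arrays also become objects here; special-case them if needed.
function array_to_object($value)
{
    if (!is_array($value)) {
        return $value;
    }
    $object = new stdClass();
    foreach ($value as $key => $item) {
        $object->$key = array_to_object($item);
    }
    return $object;
}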
A compromise, which would be trivial to implement:
<?php
$json = '{"a": [{"aa" : 11}, {"ab" : 12}],"b":2,"c":3,"d":4,"e":5}';
$o = json_decode($json); // plain object
$a = json_decode($json, true); // this will yield an array
echo $o->a[0]->aa;
?>
json_decode() takes an optional second argument that determines whether the supplied JSON is converted to an associative array. If it isn't ($o), you have half the syntax you aim for.
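And if you do want the fluent wrapper style from the question, a thin read-only wrapper over the decoded arrays is enough; JsonNode and its getters are hypothetical names:
// Thin read-only wrapper over decoded JSON arrays (hypothetical API).
class JsonNode
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function getChild($i)    { return new self($this->data['children'][$i]); }
    public function getInterest($j) { return new self($this->data['interests'][$j]); }
    public function getType()       { return $this->data['type']; }
}

$person = new JsonNode(json_decode($json, true));
$interestType = $person->getChild($i)->getInterest($j)->getType();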
What's the equivalent in PHP of C++'s std::set ("Sets are a kind of associative containers that stores unique elements, and in which the elements themselves are the keys.")?
There isn't one, but they can be emulated.
Here is an archived copy from before the link died, with all the contents:
A Set of Objects in PHP: Arrays vs. SplObjectStorage
One of my projects, QueryPath, performs many tasks that require maintaining a set of unique objects. In my quest to optimize QueryPath, I have been looking into various ways of efficiently storing sets of objects in a way that provides expedient containment checks. In other words, I want a data structure that keeps a list of unique objects, and can quickly tell me if some object is present in that list. The ability to loop through the contents of the list is also necessary.
Recently I narrowed the list of candidates down to two methods:
Use good old fashioned arrays to emulate a hash set.
Use the SplObjectStorage class present in PHP 5.2 and up.
Before implementing anything directly in QueryPath, I first set out designing the two methods, and then ran some micro-benchmarks (with Crell's help) on the pair of methods. To say that the results were surprising is an understatement. The benchmarks will likely change the way I structure future code, both inside and outside of Drupal.
The Designs
Before presenting the benchmarks, I want to quickly explain the two designs that I settled on.
Arrays emulating a hash set
The first method I have been considering is using PHP's standard array() to emulate a set backed by a hash mapping (a "hash set"). A set is a data structure designed to keep a list of unique elements. In my case, I am interested in storing a unique set of DOM objects. A hash set is a set that is implemented using a hash table, where the key is a unique identifier for the stored value. While one would normally write a class to encapsulate this functionality, I decided to test the implementation as a bare array with no layers of indirection on top. In other words, what I am about to present are the internals of what would be a hash set implementation.
The Goal: Store a (unique) set of objects in a way that makes them (a) easy to iterate, and (b) cheap to check membership.
The Strategy: Create an associative array where the key is a hash ID and the value is the object.
With a reasonably good hashing function, the strategy outlined above should work as desired.
"Reasonably good hashing function" -- that was the first gotcha. How do you generate a good hashing function for an object like DOMDocument? One (bad) way would be to serialize the object and then, perhaps, take its MD5 hash. That, however, will not work on many objects -- specifically any object that cannot serialze. The DOMDocument, for example, is actually backed by a resource and cannot be serialized.
One needed look far for a such a function, though. It turns out that there is an object hashing function in PHP 5. It's called spl_object_hash(), and it can take any object (even one that is not native PHP) and generate a hashcode for it.
Using spl_object_hash() we can build a simple data structure that functions like a hash set. This structure looks something like this:
array(
    $hashcode => $object
);
For example, we can generate an entry like so:
$object = new stdClass();
$hashcode = spl_object_hash($object);
$arr = array(
    $hashcode => $object
);
In the example above, then, the hashcode string is an array key, and the object itself is the array value. Note that since the hashcode will be the same each time an object is re-hashed, it serves not only as a comparison point ("if object a's hashkey == object b's hashkey, then a == b") but also as a uniqueness constraint. Only one object with the specified hashcode can exist per array, so there is no possibility of two copies of (actually, two references to) the same object being placed in the array.
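A quick demonstration of that uniqueness property:
$object = new stdClass();
$arr = array();
$arr[spl_object_hash($object)] = $object;
$arr[spl_object_hash($object)] = $object; // same key: silently overwrites
echo count($arr); // 1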
With a data structure like this, we have a host of readily available functions for manipulating the structure, since we have at our disposal all of the PHP array functions. So to some degree this is an attractive choice out of the box.
The most important task, in our context at least, is determining whether an entry exists inside of the set. There are two possible candidates for this check, and both require supplying the hashcode:
Check whether the key exists using array_key_exists().
Check whether the key is set using isset().
To cut to the chase, isset() is faster than array_key_exists(), and offers the same features in our context, so we will use that. (The fact that they handle null values differently makes no difference to us. No null values will ever be inserted into the set.)
With this in mind, then, we would perform a containment check using something like this:
$object = new stdClass();
$hashkey = spl_object_hash($object);
// ...
// Check whether $arr has the $object.
if (isset($arr[$hashkey])) {
    // Do something...
}
Again, using an array that emulates a hash set allows us to use all of the existing array functions and also provides easy iterability. We can easily drop this into a foreach loop and iterate the contents. Before looking at how this performs, though, let's look at the other possible solution.
Using SplObjectStorage
The second method under consideration makes use of the new SplObjectStorage class from PHP 5.2+ (it might be in 5.1). This class, which is backed by a C implementation, provides a set-like storage mechanism for objects. It enforces uniqueness; only one of each object can be stored. It is also traversable, as it implements the Iterator interface. That means you can use it in loops such as foreach. (On the down side, the version in PHP 5.2 does not provide any method of random access or index-based access. The version in PHP 5.3 rectifies this shortcoming.)
The Goal: Store a (unique) set of objects in a way that makes them (a) easy to iterate, and (b) cheap to check membership.
The Strategy: Instantiate an object of class SplObjectStorage and store references to objects inside of this.
Creating a new SplObjectStorage is simple:
$objectStore = new SplObjectStorage();
An SplObjectStorage instance not only retains uniqueness information about objects, but objects are also stored in predictable order. SplObjectStorage is a FIFO -- First In, First Out.
Adding objects is done with the attach() method:
$objectStore = new SplObjectStorage();
$object = new stdClass();
$objectStore->attach($object);
It should be noted that attach will only attach an object once. If the same object is passed to attach() twice, the second attempt will simply be ignored. For this reason it is unnecessary to perform a contains() call before an attach() call. Doing so is redundant and costly.
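For example:
$objectStore = new SplObjectStorage();
$object = new stdClass();
$objectStore->attach($object);
$objectStore->attach($object); // second attach is ignored
echo count($objectStore);      // 1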
Checking for the existence of an object within an SplObjectStorage instance is also straightforward:
$objectStore = new SplObjectStorage();
$object = new stdClass();
$objectStore->attach($object);
// ...
if ($objectStore->contains($object)) {
    // Do something...
}
While SplObjectStorage has nowhere near the number of supporting methods that one has access to with arrays, it allows for iteration and somewhat limited access to the objects stored within. In many use cases (including the one I am investigating here), SplObjectStorage provides the requisite functionality.
Now that we have taken a look at the two candidate data structures, let's see how they perform.
The Comparisons
Anyone who has seen Larry (Crell) Garfield's micro-benchmarks for arrays and SPL ArrayAccess objects will likely come into this set of benchmarks with the same set of expectations Larry and I had. We expected PHP's arrays to blow the SplObjectStorage out of the water. After all, arrays are a primitive type in PHP, and have enjoyed years of optimizations. However, the documentation for the SplObjectStorage indicates that the search time for an SplObjectStorage object is O(1), which would certainly make it competitive if the base speed is similar to that of an array.
My testing environments are:
An iMac (current generation) with a 3.06 GHz Intel Core 2 Duo and 2 GB of 800 MHz DDR2 RAM. MAMP 1.72 (PHP 5.2.5) provides the AMP stack.
A MacBook Pro with a 2.4 GHz Intel Core 2 Duo and 4 GB of 667 MHz DDR2 RAM. MAMP 1.72 (PHP 5.2.5) provides the AMP stack.
In both cases, the performance tests averaged about the same. Benchmarks in this article come from the second system.
Our basic testing strategy was to build a simple test that captured information about three things:
The amount of time it takes to load the data structure
The amount of time it takes to seek the data structure
The amount of memory the data structure uses
We did our best to minimize the influence of other factors on the test. Here is our testing script:
<?php
/**
 * Object hashing tests.
 */
$sos = new SplObjectStorage();
$docs = array();
$iterations = 100000;
for ($i = 0; $i < $iterations; ++$i) {
    $doc = new DOMDocument();
    //$doc = new stdClass();
    $docs[] = $doc;
}

$start = $finis = 0;
$mem_empty = memory_get_usage();

// Load the SplObjectStorage
$start = microtime(TRUE);
foreach ($docs as $d) {
    $sos->attach($d);
}
$finis = microtime(TRUE);
$time_to_fill = $finis - $start;

// Check membership on the object storage
$start = microtime(TRUE);
foreach ($docs as $d) {
    $sos->contains($d);
}
$finis = microtime(TRUE);
$time_to_check = $finis - $start;

$mem_spl = memory_get_usage();
$mem_used = $mem_spl - $mem_empty;
printf("SplObjectStorage:\nTime to fill: %0.12f.\nTime to check: %0.12f.\nMemory: %d\n\n", $time_to_fill, $time_to_check, $mem_used);

unset($sos);
$mem_empty = memory_get_usage();

// Test arrays:
$start = microtime(TRUE);
$arr = array();
// Load the array
foreach ($docs as $d) {
    $arr[spl_object_hash($d)] = $d;
}
$finis = microtime(TRUE);
$time_to_fill = $finis - $start;

// Check membership on the array
$start = microtime(TRUE);
foreach ($docs as $d) {
    //$arr[spl_object_hash($d)];
    isset($arr[spl_object_hash($d)]);
}
$finis = microtime(TRUE);
$time_to_check = $finis - $start;

$mem_arr = memory_get_usage();
$mem_used = $mem_arr - $mem_empty;
printf("Arrays:\nTime to fill: %0.12f.\nTime to check: %0.12f.\nMemory: %d\n\n", $time_to_fill, $time_to_check, $mem_used);
?>
The test above is broken into four separate tests. The first two test how well the SplObjectStorage method handles loading and containment checking. The second two perform the same test on our improvised array data structure.
There are two things worth noting about the test above.
First, the object of choice for our test was a DOMDocument. There are a few reasons for this. The obvious reason is that this test was done with the intent of optimizing QueryPath, which works with elements from the DOM implementation. There are two other interesting reasons, though. One is that DOMDocuments are not lightweight. The other is that DOMDocuments are backed by a C implementation, making them one of the more difficult cases when storing objects. (They cannot, for example, be conveniently serialized.)
That said, after observing the outcome, we repeated the test with basic stdClass objects and found the performance results to be nearly identical, and the memory usage to be proportional.
The second thing worth mentioning is that we used 100,000 iterations to test. This was about the upper bound that my PHP configuration allowed before running out of memory. Other than that, though, the number was chosen arbitrarily. When I ran tests with lower iteration counts, the SplObjectStorage definitely scaled linearly. Array performance was less predictable (larger standard deviation) with smaller data sets, though it seemed to average around the same for lower sizes as it does (more predictably) for larger sized arrays.
The Results
So how did these two strategies fare in our micro-benchmarks? Here is a representative sample of the output generated when running the above:
SplObjectStorage:
Time to fill: 0.085041999817.
Time to check: 0.073099000000.
Memory: 6124624
Arrays:
Time to fill: 0.193022966385.
Time to check: 0.153498000000.
Memory: 8524352
Averaging this over multiple runs, SplObjectStorage executed both fill and check functions twice as fast as the array method presented above. We tried various permutations of the tests above. Here, for example, are results for the same test with a stdClass object:
SplObjectStorage:
Time to fill: 0.082209110260.
Time to check: 0.070617000000.
Memory: 6124624
Arrays:
Time to fill: 0.189271926880.
Time to check: 0.152644000000.
Memory: 8524360
Not much different. Even adding arbitrary data to the object we stored does not make a difference in the time it takes for the SplObjectStorage (though it does seem to raise the time ever so slightly for the array).
Our conclusion is that SplObjectStorage is indeed a better solution for storing lots of objects in a set. Over the last week, I've ported QueryPath to SplObjectStorage (see the Quark branch at GitHub -- the existing Drupal QueryPath module can use this experimental branch without alteration), and will likely continue benchmarking. But preliminary results seem to provide a clear indication as to the best approach.
As a result of these findings, I'm much less inclined to default to arrays as "the best choice" simply because they are basic data types. If the SPL library contains features that out-perform arrays, they should be used when appropriate. From QueryPath to my Drupal modules, I expect that my code will be impacted by these findings.
Thanks to Crell for his help, and for Eddie at Frameweld for sparking my examination of these two methods in the first place.
In PHP you use arrays for that.
There is no built-in equivalent of std::set in PHP.
You can use arrays "like" sets, but it's up to you to enforce the rules.
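For scalar values, the usual trick is to use the array keys for uniqueness; a sketch:
// Emulate a set of scalars using array keys.
$set = array();
$set['apple'] = true;           // add
$set['apple'] = true;           // adding again is a no-op
$exists = isset($set['apple']); // contains
unset($set['apple']);           // remove
$members = array_keys($set);    // iterate the members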
Have a look at Set from Nspl. It supports basic set operations which take other sets, arrays and traversable objects as arguments. You can see examples here.