Serialize or json in PHP? - php

So I need to encode an array in PHP and store it as plain text in a MySQL database. My question is: should I use serialize() or json_encode()? What are the advantages and disadvantages of each?
I think either of them would do in this situation. But which one would you prefer, and why? And what if it is for something other than an array?

Main advantage of serialize : it's specific to PHP, which means it can represent PHP types, including instances of your own classes -- and you'll get your objects back, still instances of your classes, when unserializing your data.
Main advantage of json_encode : JSON is not specific to PHP : there are libraries to read/write it in several languages -- which means it's better if you want something that can be manipulated with another language than PHP.
A JSON string is also easier to read/write/modify by hand than a serialized one.
On the other hand, as JSON is not specific to PHP, it's not aware of the stuff that's specific to PHP -- like data-types.
A couple of side notes:
Even if there is a small difference in speed between the two, it shouldn't matter much: you will probably not serialize/unserialize a lot of data.
Are you sure this is the best way to store data in a database?
You won't be able to run many queries on serialized strings in a DB: you will not be able to use your data in WHERE clauses, nor update it without going through PHP...
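To illustrate the point about PHP types, here is a minimal sketch. The Point class is hypothetical and exists only for this example. It round-trips the same object through both formats and checks what comes back.

```php
<?php
// Hypothetical class, used only for illustration.
class Point
{
    public $x;
    public $y;

    public function __construct($x, $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

$p = new Point(1, 2);

// serialize() records the class name, so unserialize() rebuilds a Point.
$viaSerialize = unserialize(serialize($p));

// JSON only carries the public properties; json_decode() returns a stdClass.
$viaJson = json_decode(json_encode($p));

var_dump($viaSerialize instanceof Point); // bool(true)
var_dump($viaJson instanceof Point);      // bool(false)
var_dump(get_class($viaJson));            // string(8) "stdClass"
```

The values survive in both cases; only the serialize() round trip keeps the class.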

I did some analysis of JSON encoding vs. serialization in PHP, and I found that JSON is best for plain, simple data such as arrays.
See the results of my experiments at https://www.shozab.com/php-serialization-vs-json-encoding-for-an-array/

Another advantage of json_encode over serialize is the size. I noticed that while trying to figure out why our memcache memory usage was getting so big and looking for ways to reduce it:
<?php
$myarray = array();
$myarray["a"]="b";
$serialize=serialize($myarray);
$json=json_encode($myarray);
$serialize_size=strlen($serialize);
$json_size=strlen($json);
var_dump($serialize);
var_dump($json);
echo "Size of serialized array: $serialize_size\n";
echo "Size of json encoded array: $json_size\n";
echo "JSON is " . round(($serialize_size-$json_size)/$serialize_size*100) . "% smaller\n";
Which gives you:
string(22) "a:1:{s:1:"a";s:1:"b";}"
string(9) "{"a":"b"}"
Size of serialized array: 22
Size of json encoded array: 9
JSON is 59% smaller
Obviously I've taken the most extreme example: the shorter the array, the larger the relative overhead of serialize, because its format imposes a minimum number of characters no matter how small the content. Still, on a production website I see serialized arrays that are 20% bigger than their JSON equivalents.

Well, firstly, serializing an array or object and storing it in a database is typically a code smell. Sometimes people end up putting a comma-separated list into a column and then get into all sorts of trouble when they later find out they need to query on it.
So think very carefully about whether this is that kind of situation.
As for the differences: PHP serialize preserves PHP types but is only usable with PHP, and its output is usually somewhat larger than the equivalent JSON. JSON is cross-platform, and any speed difference in encoding and decoding is unlikely to be meaningful.

If your data will never have to leave your PHP application, I recommend serialize(), because it offers a lot of extra functionality such as the __sleep() and __wakeup() methods for your objects. It also restores objects as instances of the correct classes.
If you will pass the serialized data to another application, you should use JSON or XML for compatibility.
But storing a serialized object in a database? Maybe you should think about that again. It can be real trouble later.
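As a rough sketch of what __sleep() and __wakeup() do: the Connection class below is hypothetical, and its "handle" is just a string standing in for something that cannot be stored, such as a live connection.

```php
<?php
// Hypothetical class for illustration only.
class Connection
{
    public $dsn;
    public $handle; // stands in for a live resource that must not be stored

    public function __construct($dsn)
    {
        $this->dsn = $dsn;
        $this->handle = 'open:' . $dsn;
    }

    // Called by serialize(): returns the names of the properties to keep.
    public function __sleep()
    {
        return array('dsn');
    }

    // Called by unserialize(): rebuilds whatever was left out.
    public function __wakeup()
    {
        $this->handle = 'reopened:' . $this->dsn;
    }
}

$restored = unserialize(serialize(new Connection('db1')));

var_dump($restored instanceof Connection); // bool(true)
echo $restored->handle, "\n";              // reopened:db1
```

JSON has no equivalent hook on the decoding side, which is one reason serialize() is the more natural fit for objects that stay inside PHP.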

First, thanks to Shozab Hasan and user359650 for these tests. I was wondering which choice was best, and now I know:
To encode a simple array, use JSON, which works with both PHP and JavaScript, and probably other languages.
To encode a PHP object, serialize is the better choice, because PHP objects can only be instantiated by PHP.
To store data, either store the encoded data in a file or use MySQL with a standard schema. It will be much easier to get your data back: MySQL has great functions for retrieving data the way you want it without any PHP processing.
I've never tested it, but I think file storage is the best way to store your data if the file system's sorting is enough to get your files back in alphabetical/numerical order.
MySQL is too greedy for this kind of job, and it uses the file system too...

Related

What is data serialization?

First of all, I couldn't get a clear definition of it from Wikipedia or even from the serialize function page in the PHP manual. I need to know some cases where serialization is needed, and how things work without it. In other words: where do you need serialization, such that without it your code would be missing some important feature?
What is serialization?
Serialization encodes objects into another format.
For example you have an array in PHP like this:
$array = array("a" => 1, "b" => 2, "c" => array("a" => 1, "b" => 2));
And then you want to store it in file or send to other application.
There are several format choices, but the idea is the same:
The array has to be encoded (or you could say "translated"), into text or bytes, that can be written to a file or sent via the network.
For example, in PHP, if you:
$data = serialize($array);
you will get this:
a:3:{s:1:"a";i:1;s:1:"b";i:2;s:1:"c";a:2:{s:1:"a";i:1;s:1:"b";i:2;}}
This is PHP's own serialization format, which PHP understands, and it works in the other direction too, so you can use it to deserialize objects.
For example, you stored a serialized array in a file, and you want it back in your code as an array:
$array = unserialize($data);
But you could choose a different serialization format, for example, JSON:
$json = json_encode($array);
will give you this:
{"a":1,"b":2,"c":{"a":1,"b":2}}
The result is not only easily saved, read by human eye, or sent via network, but is also understandable by almost every other language (JavaScript, Java, C#, C++, ...)
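For completeness, a small sketch of the way back from JSON. json_decode() returns stdClass objects by default; passing true as the second argument returns associative arrays instead, which matters if you want the original array back.

```php
<?php
$array = array("a" => 1, "b" => 2, "c" => array("a" => 1, "b" => 2));
$json = json_encode($array);

$asArray  = json_decode($json, true); // nested associative arrays
$asObject = json_decode($json);       // nested stdClass objects

var_dump($asArray === $array);           // bool(true)
var_dump($asObject instanceof stdClass); // bool(true)
echo $asObject->c->b, "\n";              // 2
```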
Conclusion
Serialization translates objects into another format, in case you want to store or share data.
Are there any situations, where you cannot do anything, but serialize it?
No. But serialization usually makes things easier.
Are JSON and PHP format the only possible formats?
No, no, no and one more time no. There are plenty of formats.
XML (e.g. using a schema like WSDL or XHTML)
Bytes, Protobuf, etc.
Yaml
...
Your own formats (you can create your own format for serialization and use it, but that is a big thing to do and is not worth it, most of the time)
Serialization is the process of converting some in-memory object to another format that could be used to either store in a file or sent over the network. Deserialization is the inverse process meaning the actual object instance is restored from the given serialized representation of the object. This is very useful when communicating between various systems.
The serialization format could be either interoperable or non-interoperable. Interoperable formats (such as JSON, XML, ...) allow for serializing some object using a given platform and deserializing it using a different platform. For example with JSON you could use javascript to serialize the object and send it over the network to a PHP script that will deserialize the object and use it.
The serialize() PHP function uses a non-interoperable format. This means that only PHP can be used to both serialize and deserialize the object.
You could use the json_encode() and json_decode() functions to serialize/deserialize PHP objects using the interoperable JSON format.
Serialization is the process of turning data (e.g. variables) into a representation such as a string, that can easily be written and read back from for example a file or the database.
Use cases? There are many, but generally it revolves around the idea of taking a complex, nested array or object and turning it into a simple string that can be saved and read later to retrieve the same structure. For example, provided you have in php:
$blub = array();
$blub['a']['a'] = 1;
$blub['a']['b'] = 4;
$blub['b']['a'] = 27;
$blub['b']['b'] = 46;
Instead of going through every array member individually and writing it one could just:
$dataString = serialize($blub);
And the serialized array is ready to be written anywhere as a simple string, in such a way that retrieving this string again and doing unserialize() over it gets you the exact same array structure you had before. Yes, it's really that simple.
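A minimal sketch of that "write it anywhere, read it back" idea, using a temporary file. The $settings array and the temp-file prefix are made up for the example.

```php
<?php
// Example data; any nested array works the same way.
$settings = array(
    'a' => array('a' => 1, 'b' => 4),
    'b' => array('a' => 27, 'b' => 46),
);

// Write the serialized string to a temporary file...
$file = tempnam(sys_get_temp_dir(), 'ser');
file_put_contents($file, serialize($settings));

// ...and later read it back into the same structure.
$loaded = unserialize(file_get_contents($file));
unlink($file);

var_dump($loaded === $settings); // bool(true)
```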
I need to know some cases we need the term serialization and how things are going without it?
Serialization can come in handy if you need to store complete structures (like an invoice with all its associated data: customer address, sender address, product positions, tax calculations, etc.) that are only valid at a certain point in time.
All this data will change in the future: new tax regulations might come, the customer's address changes, products are discontinued. But the invoice still needs to be valid and stored.
This is possible with serialization, like a snapshot. The objects in memory are serialized into a byte string (PHP's format is mostly text, but it may contain binary data) that can simply be stored. It can be brought back to life later on (and in a different context). As with the invoice example: in ten years, the data can still be read and the invoice object is the same as it was ten years earlier.
In other words, where do you need serialization, such that without it your code would be missing some important feature?
That was one example. It's not that you always need it, but when things become more complex, serialization can be helpful.
Since you've tagged it with javascript, one kind of serialization could be form serialization.
Here are the references for the jQuery and prototype.JS equivalents.
What they basically do is serialize form input values into a URL-encoded string of name=value pairs joined by &.
So considering an actual usage..
$.ajax({
    url: 'insert.php?a=10&b=15', // values serialized via .serialize()
    type: 'GET'
});
And you would probably do $_GET["a"] to retrieve those values; I'm not familiar with PHP though.
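On the PHP side, the same name=value&name=value format can be produced and parsed with http_build_query() and parse_str(). This is only a sketch of the format itself; in a real request PHP fills $_GET for you.

```php
<?php
// Build a query string from an array (the separator is passed explicitly).
$query = http_build_query(array('a' => 10, 'b' => 15), '', '&');
echo $query, "\n"; // a=10&b=15

// Parse it back; note that every value comes back as a string.
parse_str($query, $values);
var_dump($values['a']); // string(2) "10"
var_dump($values['b']); // string(2) "15"
```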

Determine data type from file_get_contents()

I'm writing a command line application in PHP that accepts a path to a local input file as an argument. The input file will contain one of the following things:
JSON encoded associative array
A serialized() version of the associative array
A base 64 encoded version of the serialized() associative array
Base 64 encoded JSON encoded associative array
A plain old PHP associative array
Rubbish
In short, there are several dissimilar programs that I have no control over that will be writing to this file, in a uniform way that I can understand, once I actually figure out the format. Once I figure out how to ingest the data, I can just run with it.
What I'm considering is:
If the first byte of the file is { , try json_decode(), see if it fails.
If the first byte of the file is < or $, try include(), see if it fails.
If the first three bytes of the file match a:[0-9], try unserialize().
If not the first three, try base64_decode(), see if it fails. If not:
Check the first bytes of the decoded data, again.
If all of that fails, it's rubbish.
That just seems quite expensive for quite a simple task. Could I be doing it in a better way? If so, how?
There isn't much to optimize here. The magic bytes approach is already the way to go. But of course the actual deserialization functions can be avoided. It's feasible to use a verification regex for each instead (which despite the meme are often faster than having PHP actually unpack a nested array).
base64 is easy enough to probe for.
JSON can be checked with a regex. The question "Fastest way to check if a string is JSON in PHP?" shows the RFC version, which is meant for securing it in JS. But it would be feasible to write a complete JSON match rule using (?R) recursion.
serialize is a bit more difficult without a proper unpack function. But with some heuristics you can already assert that it's a serialize blob.
php array scripts can be probed a bit faster with token_get_all. Or if the format and data is constrained enough, again with a regex.
The more important question here is, do you need reliability - or simplicity and speed?
For speed, you could use the file(1) utility and add "magic numbers" in /usr/share/file/magic. It should be faster than a pure PHP alternative.
You can try json_decode() and unserialize(), which return NULL and false respectively if they fail, then base64_decode() and run those again. It's not fast, but it's infinitely less error-prone than hand-parsing them...
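The try-each-decoder approach could be sketched roughly as follows. The function name decodeUnknown() and its return shape are made up for this example. It only accepts arrays, does not handle the plain-PHP-array case, and relies on the allowed_classes option of unserialize() (PHP 7+) so that no arbitrary objects are instantiated from untrusted input.

```php
<?php
// Hypothetical helper: returns array(format, data), or array('rubbish', null).
function decodeUnknown($raw, $allowBase64 = true)
{
    $raw = trim($raw);

    // JSON: json_decode() returns null on failure; only arrays are accepted.
    $json = json_decode($raw, true);
    if (is_array($json)) {
        return array('json', $json);
    }

    // PHP serialize: cheap prefix check first, then the real decoder.
    // unserialize() returns false (and emits a notice) on failure.
    if (strncmp($raw, 'a:', 2) === 0) {
        $data = @unserialize($raw, array('allowed_classes' => false));
        if (is_array($data)) {
            return array('serialized', $data);
        }
    }

    // Base64: strict mode returns false on characters outside the alphabet.
    // The decoded payload gets one more pass, without a second base64 layer.
    if ($allowBase64) {
        $decoded = base64_decode($raw, true);
        if ($decoded !== false && $decoded !== '') {
            list($format, $data) = decodeUnknown($decoded, false);
            if ($format !== 'rubbish') {
                return array('base64+' . $format, $data);
            }
        }
    }

    return array('rubbish', null);
}

list($format, $data) = decodeUnknown(base64_encode(serialize(array('k' => 'v'))));
echo $format, "\n"; // base64+serialized
```

For the command line tool in the question, the input would come from file_get_contents() on the path argument.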
The issue here is that if you have no idea which format it can be, you will need to develop a detection algorithm. Conventions should be set with a file extension (check the extension and, if it fails, tell whoever put the file there to use the correct one); otherwise you will need to check the contents yourself. Most algorithms that detect what type a file actually is (exe, jpg, etc.) use heuristics to determine its contents, because such files generally have some sort of signature that identifies them. So if you don't know for certain what the content will be, it's best to look for features that are specific to each format. This sometimes means reading more than a couple of bytes.

Encoding large numbers with json_encode in php

I have a PHP script that outputs a JSON-encoded object with large numbers (greater than PHP_INT_MAX), so to store those numbers internally I have to keep them as strings. However, I need them to be shown as unquoted numbers to the client.
I've thought of several solutions, many of which haven't worked. Most of the ideas revolve around writing my own JSON encoder, which I have done already, but don't want to take the time to change all the places I have json_encode to instead say my_json_encode.
Since I have no control over the server, I cannot remove the JSON library. I cannot undeclare json_encode, nor can I rename it. Is there any easy way to handle all this, or is the best option to just go through each and every file and rename all the method calls?
With JavaScript being loosely typed, why the need to control the type in the JSON data? What are you doing with this number in JavaScript, and would parseInt/parseFloat not be able to make the leap from string to number on the client side?
The only option I had was to use my own json_encode method renamed to my_json_encode, and then change everywhere that called that method.
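For what it's worth, such a wrapper does not have to be a full encoder. The sketch below is one possible approach, not the asker's actual my_json_encode: the name json_encode_bigints() is made up, and it simply post-processes the output of json_encode(), unquoting string values that consist only of 16 or more digits. It assumes compact output (no JSON_PRETTY_PRINT) and would also unquote any genuine string that happens to look like a long number.

```php
<?php
// Hypothetical wrapper (not part of PHP): encodes normally, then unquotes
// string values made up only of 16 or more digits (no leading zero).
// Object keys are left alone because they are followed by ':'.
function json_encode_bigints($value)
{
    $json = json_encode($value);
    return preg_replace('/(?<=[:\[,])"(-?[1-9]\d{15,})"(?=[,\]}])/', '$1', $json);
}

echo json_encode_bigints(array(
    'id'   => '12345678901234567890', // too large for a PHP integer
    'code' => '42',                    // short numeric string, stays quoted
)), "\n";
// {"id":12345678901234567890,"code":"42"}
```

Keep in mind that a JavaScript client parsing this with JSON.parse cannot represent integers above 2^53 exactly, so this only helps if the consumer can handle numbers of that size.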

How to serialize URL and make it still readable in PHP?

I need to serialize an array which contains URLs:
Array(
'url1' => 'http://www.example.com',
'url2' => 'http://www.example1.com'
)
and store it in DB.
When I serialize it the standard way, it doesn't work because it contains special characters. I found a solution: encoding it with base64_encode. Then it works, but the string is unreadable to me in my DB manager program. Is there a way to make this work without base64_decode?
It should always set off a red flag when you're trying to store serialized data in a relational database. Normalize your schema so you don't have to serialize.
Storing your data in a poor format just so it is readable while in the DB is not a good idea. You want to store it in the format that is most efficient for the database system, then update your manager to unserialize it when you are ready to display it.
json_encode is popular these days, and helps make your data portable.
If you're using PHP 5.2+, try using JSON instead of the native PHP serializer. JSON is a lot more portable.
But your problem could be with automatic escaping of quotes. It would be helpful if you can show examples of your input & output to/from the DB.
There's no reason why serialize shouldn't work on this example, so it may be more to do with adequate escaping of inputs in your SQL query than with the kind of serialisation you're doing. If you're using MySQL, try running the serialised data through mysql_real_escape_string() (or, with the newer mysqli/PDO extensions, use a prepared statement) before you concatenate it into your SQL statement.
Separately, I tend to prefer json_encode() for serialisation of values to a DB field, because serialise tends to make serialised data that is very hard to read manually, and extremely difficult to edit.
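A quick sketch with the array from the question, showing that both formats round-trip URLs without any base64 step. JSON_UNESCAPED_SLASHES (PHP 5.4+) is used here only to keep the stored string readable; without it json_encode() writes each / as \/.

```php
<?php
$urls = array(
    'url1' => 'http://www.example.com',
    'url2' => 'http://www.example1.com',
);

$serialized = serialize($urls);
$json = json_encode($urls, JSON_UNESCAPED_SLASHES);

echo $json, "\n";
// {"url1":"http://www.example.com","url2":"http://www.example1.com"}

// Both strings decode back to the original array.
var_dump(unserialize($serialized) === $urls);  // bool(true)
var_dump(json_decode($json, true) === $urls);  // bool(true)
```

Either string still has to be passed to the database safely (escaped or bound as a parameter), since both contain quote characters.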

PHP unserialize keeps throwing same error over 100 times

I have a large 2d array that I serialize and base64_encode and throw into a database. On a different page I pull the array out and when I base64_decode the serialized array I can echo it out and it definitely looks valid.
However, if I try to unserialize(base64_decode($serializedArray)) it just throws the same error to the point of nearly crashing Firefox.
The error is:
Warning: unserialize() [function.unserialize]: Node no longer exists in /var/www/dev/wc_paul/inc/analyzerTester.php on line 24
I would include the entire serialized array that I echo out but last time I tried that on this form it crashed my Firefox.
Does anyone have any idea why this might be happening?
Are you sure you're just serializing an array, and not an object (e.g. DOMNode?) Like resources, not all classes are going to be happy with being unserialized. As an example with the DOM (which your error suggests to me you're working with), every node has a reference to the parentNode, and if the parentNode doesn't exist at the moment a node is being unserialized, it's not able to recreate that reference and problems ensue.
I would suggest saving the dom tree as XML to the database and loading it back later.
Make sure that the database field is large enough to hold the serialized array. Serialized data is very space-inefficient in PHP, and many DBs (like MySQL) will silently truncate field values that are too long.
What type of elements are in your array? serialize/unserialize does not work with many built-in PHP objects (SimpleXMLElement, for example), and that is usually the cause of that error.
Also, based on your comment this isn't your problem, but to save space in your database, don't base64-encode the data; just escape it, e.g. for MySQL use mysql_real_escape_string (or a prepared statement with mysqli/PDO).
Make sure you don't serialize resources, they can't be serialized.
See: Resources (php.net)
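A small sketch of what that means in practice, using an in-memory stream as the resource. The array keys are made up for the example.

```php
<?php
$handle = fopen('php://memory', 'r+');
$data = array('name' => 'log', 'handle' => $handle);

// serialize() does not fail here, but the resource itself is not preserved.
$restored = unserialize(serialize($data));

var_dump(is_resource($data['handle']));     // bool(true)
var_dump(is_resource($restored['handle'])); // bool(false)
echo $restored['name'], "\n";               // log

fclose($handle);
```

So the ordinary values survive the round trip, while anything that was a resource has to be reopened after unserializing.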
