Preferred method to store PHP arrays (json_encode vs serialize) - php
I need to store a multi-dimensional associative array of data in a flat file for caching purposes. I might occasionally come across the need to convert it to JSON for use in my web app but the vast majority of the time I will be using the array directly in PHP.
Would it be more efficient to store the array as JSON or as a PHP serialized array in this text file? I've looked around and it seems that in the newest versions of PHP (5.3), json_decode is actually faster than unserialize.
I'm currently leaning towards storing the array as JSON as I feel it's easier for a human to read if necessary, it can be used in both PHP and JavaScript with very little effort, and from what I've read, it might even be faster to decode (not sure about encoding, though).
Does anyone know of any pitfalls? Anyone have good benchmarks to show the performance benefits of either method?
Depends on your priorities.
If performance is your absolute driving characteristic, then by all means use the fastest one. Just make sure you have a full understanding of the differences before you make a choice.
Unlike serialize(), you need to add an extra parameter to keep UTF-8 characters untouched: json_encode($array, JSON_UNESCAPED_UNICODE) (otherwise it converts UTF-8 characters to Unicode escape sequences).
JSON will have no memory of what the object's original class was (they are always restored as instances of stdClass).
You can't leverage __sleep() and __wakeup() with JSON
By default, only public properties are serialized with JSON. (in PHP>=5.4 you can implement JsonSerializable to change this behavior).
JSON is more portable
And there's probably a few other differences I can't think of at the moment.
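To make a couple of those differences concrete, here is a minimal sketch (the User class is just a placeholder; JSON_UNESCAPED_UNICODE needs PHP >= 5.4):
<?php
class User
{
    public $name = 'Zoë';        // public, multibyte UTF-8 value
    private $secret = 'hidden';  // private, invisible to json_encode()
}
$user = new User();
// serialize() keeps the class name and private properties.
var_dump(unserialize(serialize($user)) instanceof User);   // bool(true)
// json_encode() escapes UTF-8 by default and only sees public properties.
echo json_encode($user), "\n";                              // {"name":"Zo\u00eb"}
echo json_encode($user, JSON_UNESCAPED_UNICODE), "\n";      // {"name":"Zoë"}
// json_decode() has no memory of the original class.
var_dump(json_decode(json_encode($user)) instanceof User);  // bool(false) - it's stdClass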
A simple speed test to compare the two
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
// Make a big, honkin test array
// You may need to adjust this depth to avoid memory limit errors
$testArray = fillArray(0, 5);
// Time json encoding
$start = microtime(true);
json_encode($testArray);
$jsonTime = microtime(true) - $start;
echo "JSON encoded in $jsonTime seconds\n";
// Time serialization
$start = microtime(true);
serialize($testArray);
$serializeTime = microtime(true) - $start;
echo "PHP serialized in $serializeTime seconds\n";
// Compare them
if ($jsonTime < $serializeTime) {
printf("json_encode() was roughly %01.2f%% faster than serialize()\n", ($serializeTime / $jsonTime - 1) * 100);
}
else if ($serializeTime < $jsonTime ) {
printf("serialize() was roughly %01.2f%% faster than json_encode()\n", ($jsonTime / $serializeTime - 1) * 100);
} else {
echo "Impossible!\n";
}
function fillArray( $depth, $max ) {
static $seed;
if (is_null($seed)) {
$seed = array('a', 2, 'c', 4, 'e', 6, 'g', 8, 'i', 10);
}
if ($depth < $max) {
$node = array();
foreach ($seed as $key) {
$node[$key] = fillArray($depth + 1, $max);
}
return $node;
}
return 'empty';
}
JSON is simpler and faster than PHP's serialization format and should be used unless:
You're storing deeply nested arrays:
json_decode(): "This function will return false if the JSON encoded data is deeper than 127 elements."
You're storing objects that need to be unserialized as the correct class
You're interacting with old PHP versions that don't support json_decode
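A hedged sketch of how to guard against the failure modes in that list - $cacheFile is a hypothetical path, and note that the nesting depth limit is configurable (512 by default in current PHP versions):
<?php
$cacheFile = '/tmp/cache.json'; // placeholder path
$raw = file_get_contents($cacheFile);
// Third argument is the nesting depth limit.
$data = json_decode($raw, true, 512);
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    // Invalid JSON or nested too deeply - fall back to the native format,
    // or fail loudly, depending on what your cache layer promises.
    $data = unserialize($raw);
}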
I've written a blog post about this subject: "Cache a large array: JSON, serialize or var_export?". In that post it is shown that serialize is the best choice for small to large arrays. For very large arrays (> 70 MB), JSON is the better choice.
You might also be interested in https://github.com/phadej/igbinary - which provides a different serialization 'engine' for PHP.
My random/arbitrary 'performance' figures, using PHP 5.3.5 on a 64-bit platform, show:
JSON :
JSON encoded in 2.180496931076 seconds
JSON decoded in 9.8368630409241 seconds
serialized "String" size : 13993
Native PHP :
PHP serialized in 2.9125759601593 seconds
PHP unserialized in 6.4348418712616 seconds
serialized "String" size : 20769
Igbinary :
WIN igbinary serialized in 1.6099879741669 seconds
WIN igbinary unserialized in 4.7737920284271 seconds
WIN serialized "String" Size : 4467
So igbinary_serialize() and igbinary_unserialize() are quicker, and the result uses less disk space.
I used the fillArray(0, 3) code as above, but made the array keys longer strings.
igbinary can store the same data types as PHP's native serialize can (so no problem with objects etc.), and you can tell PHP 5.3 to use it for session handling if you wish.
See also http://ilia.ws/files/zendcon_2010_hidden_features.pdf - specifically slides 14/15/16
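A short usage sketch, assuming the igbinary extension is installed (the data array is arbitrary):
<?php
$data = array('user' => 42, 'roles' => array('admin', 'editor'));
if (extension_loaded('igbinary')) {
    $packed   = igbinary_serialize($data);     // compact binary string
    $restored = igbinary_unserialize($packed);
} else {
    $packed   = serialize($data);              // fall back to the native format
    $restored = unserialize($packed);
}
// To use it for sessions as mentioned above, set this in php.ini:
// session.serialize_handler = igbinary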
I just tested serialize and json encode and decode, plus the size of the stored string.
JSON encoded in 0.067085981369 seconds. Size (1277772)
PHP serialized in 0.12110209465 seconds. Size (1955548)
JSON decode in 0.22470498085 seconds
PHP unserialized in 0.211947917938 seconds
json_encode() was roughly 80.52% faster than serialize()
unserialize() was roughly 6.02% faster than json_decode()
JSON string was roughly 53.04% smaller than Serialized string
We can conclude that JSON encodes faster and results in a smaller string, but unserialize is faster at decoding the string.
If you are caching information that you will ultimately want to "include" at a later point in time, you may want to try using var_export. That way you only take the hit in the "serialize" and not in the "unserialize".
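A minimal sketch of that approach - the path and data are placeholders, and it assumes the cache directory is writable:
<?php
$cacheFile = __DIR__ . '/cache/config.php'; // placeholder path
$data = array('foo' => 'bar', 'items' => array(1, 2, 3));
// "Serialize": write a valid PHP file that returns the array.
file_put_contents($cacheFile, '<?php return ' . var_export($data, true) . ';', LOCK_EX);
// "Unserialize": just include it. With OPcache enabled, repeat hits skip
// parsing entirely, which is where the real win comes from.
$cached = include $cacheFile;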
I augmented the test to include unserialization performance. Here are the numbers I got.
Serialize
JSON encoded in 2.5738489627838 seconds
PHP serialized in 5.2861361503601 seconds
Serialize: json_encode() was roughly 105.38% faster than serialize()
Unserialize
JSON decode in 10.915472984314 seconds
PHP unserialized in 7.6223039627075 seconds
Unserialize: unserialize() was roughly 43.20% faster than json_decode()
So JSON seems to be faster for encoding but slower for decoding. So it could depend on your application and what you expect to do the most.
Really nice topic and after reading the few answers, I want to share my experiments on the subject.
I got a use case where some "huge" table needs to be queried almost every time I talk to the database (don't ask why, just a fact). The database caching system isn't appropriate as it won't cache the different requests, so I thought about PHP caching systems.
I tried APCu but it didn't fit the needs; memory isn't reliable enough in this case. The next step was to cache into a file with serialization.
The table has 14355 entries with 18 columns; these are my tests and stats on reading the serialized cache:
JSON:
As you all said, the major inconvenience with json_encode/json_decode is that it transforms everything into stdClass instances (or objects). If you need to loop over it, transforming it to an array is what you'll probably do, and yes, that increases the transformation time.
average time: 780.2 ms; memory use: 41.5MB; cache file size: 3.8MB
Msgpack
#hutch mentions msgpack. Pretty website. Let's give it a try, shall we?
average time: 497 ms; memory use: 32MB; cache file size: 2.8MB
That's better, but it requires a new extension; compiling sometimes scares people off...
IgBinary
#GingerDog mentions igbinary. Note that I've set igbinary.compact_strings=Off because I care more about reading performance than file size.
average time: 411.4 ms; memory use: 36.75MB; cache file size: 3.3MB
Better than msgpack. Still, this one requires compiling too.
serialize/unserialize
average time: 477.2 ms; memory use: 36.25MB; cache file size: 5.9MB
Better performance than JSON; the bigger the array, the slower json_decode gets, but you already knew that.
Those external extensions narrow down the file size and seem great on paper. Numbers don't lie*. What's the point of compiling an extension if you get almost the same results that you'd have with a standard PHP function?
We can also deduce that depending on your needs, you will choose something different than someone else:
IgBinary is really nice and performs better than MsgPack.
MsgPack is better at compressing your data (note that I didn't try igbinary's compact_strings option).
Don't want to compile? Use the standard functions.
That's it, another serialization-method comparison to help you choose!
*Tested with PHPUnit 3.7.31, PHP 5.5.10 - decoding only, on a standard hard drive and an old dual-core CPU - average numbers over 10 runs of the same use case; your stats might be different.
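If you want to keep the choice of engine open, a small wrapper like the following lets you fall back to the built-ins when no extension is compiled in (this is a sketch, not from the tests above; the extension function names are the documented igbinary/msgpack APIs):
<?php
function cache_pack(array $data)
{
    if (extension_loaded('igbinary')) {
        return igbinary_serialize($data);
    }
    if (extension_loaded('msgpack')) {
        return msgpack_pack($data);
    }
    return serialize($data);
}
function cache_unpack($blob)
{
    // Assumes the same extension set was available when the blob was written;
    // a real cache would store the format alongside the payload.
    if (extension_loaded('igbinary')) {
        return igbinary_unserialize($blob);
    }
    if (extension_loaded('msgpack')) {
        return msgpack_unpack($blob);
    }
    return unserialize($blob);
}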
I know this is late, but the answers are pretty old; I thought my benchmarks might help, as I have just tested on PHP 7.4.
Serialize/unserialize is much faster than JSON, takes less memory and space, and wins outright in PHP 7.4, though I am not sure my test is the most efficient or the best.
I basically created a PHP file which returns an array, which I then encoded, serialized, decoded and unserialized.
$array = include __DIR__.'/../tests/data/dao/testfiles/testArray.php';
//JSON ENCODE
$json_encode_memory_start = memory_get_usage();
$json_encode_time_start = microtime(true);
for ($i=0; $i < 20000; $i++) {
$encoded = json_encode($array);
}
$json_encode_time_end = microtime(true);
$json_encode_memory_end = memory_get_usage();
$json_encode_time = $json_encode_time_end - $json_encode_time_start;
$json_encode_memory =
$json_encode_memory_end - $json_encode_memory_start;
//SERIALIZE
$serialize_memory_start = memory_get_usage();
$serialize_time_start = microtime(true);
for ($i=0; $i < 20000; $i++) {
$serialized = serialize($array);
}
$serialize_time_end = microtime(true);
$serialize_memory_end = memory_get_usage();
$serialize_time = $serialize_time_end - $serialize_time_start;
$serialize_memory = $serialize_memory_end - $serialize_memory_start;
//Write to file time:
$fpc_memory_start = memory_get_usage();
$fpc_time_start = microtime(true);
for ($i=0; $i < 20000; $i++) {
$fpc_bytes =
file_put_contents(
__DIR__.'/../tests/data/dao/testOneBigFile',
'<?php return '.var_export($array,true).' ?>;'
);
}
$fpc_time_end = microtime(true);
$fpc_memory_end = memory_get_usage();
$fpc_time = $fpc_time_end - $fpc_time_start;
$fpc_memory = $fpc_memory_end - $fpc_memory_start;
//JSON DECODE
$json_decode_memory_start = memory_get_usage();
$json_decode_time_start = microtime(true);
for ($i=0; $i < 20000; $i++) {
$decoded = json_decode($encoded);
}
$json_decode_time_end = microtime(true);
$json_decode_memory_end = memory_get_usage();
$json_decode_time = $json_decode_time_end - $json_decode_time_start;
$json_decode_memory =
$json_decode_memory_end - $json_decode_memory_start;
//UNSERIALIZE
$unserialize_memory_start = memory_get_usage();
$unserialize_time_start = microtime(true);
for ($i=0; $i < 20000; $i++) {
$unserialized = unserialize($serialized);
}
$unserialize_time_end = microtime(true);
$unserialize_memory_end = memory_get_usage();
$unserialize_time = $unserialize_time_end - $unserialize_time_start;
$unserialize_memory =
$unserialize_memory_end - $unserialize_memory_start;
//GET FROM VAR EXPORT:
$var_export_memory_start = memory_get_usage();
$var_export_time_start = microtime(true);
for ($i=0; $i < 20000; $i++) {
$array = include __DIR__.'/../tests/data/dao/testOneBigFile';
}
$var_export_time_end = microtime(true);
$var_export_memory_end = memory_get_usage();
$var_export_time = $var_export_time_end - $var_export_time_start;
$var_export_memory = $var_export_memory_end - $var_export_memory_start;
Results:
Var Export length: 11447
Serialized length: 11541
Json encoded length: 11895
file put contents Bytes: 11464
Json Encode Time: 1.9197590351105
Serialize Time: 0.160325050354
FPC Time: 6.2793469429016
Json Encode Memory: 12288
Serialize Memory: 12288
FPC Memory: 0
JSON Decoded time: 1.7493588924408
UnSerialize Time: 0.19309520721436
Var Export and Include: 3.1974139213562
JSON Decoded memory: 16384
UnSerialize Memory: 14360
Var Export and Include: 192
Seems like serialize is the one I'm going to use, for two reasons:
Someone pointed out that unserialize is faster than json_decode, and a 'read' case sounds more probable than a 'write' case.
I've had trouble with json_encode on strings containing invalid UTF-8 characters. When that happens the result ends up empty, causing loss of information (see the sketch below).
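A sketch of that failure mode on current PHP versions (where json_encode() returns false outright) and two possible guards; the byte sequence is deliberately broken Latin-1, and JSON_INVALID_UTF8_SUBSTITUTE needs PHP >= 7.2:
<?php
$broken = array('name' => "caf\xE9"); // "café" in ISO-8859-1, not valid UTF-8
$json = json_encode($broken);
var_dump($json, json_last_error_msg()); // bool(false), "Malformed UTF-8 characters..."
// Option 1: let json_encode() substitute the bad bytes (PHP >= 7.2).
$json = json_encode($broken, JSON_INVALID_UTF8_SUBSTITUTE);
// Option 2: convert to UTF-8 before encoding (needs ext/mbstring).
$fixed = array_map(function ($v) {
    return mb_convert_encoding($v, 'UTF-8', 'ISO-8859-1');
}, $broken);
$json = json_encode($fixed);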
I've tested this very thoroughly on a fairly complex, mildly nested multi-hash with all kinds of data in it (string, NULL, integers), and serialize/unserialize ended up much faster than json_encode/json_decode.
The only advantage JSON had in my tests was its smaller 'packed' size.
These are done under PHP 5.3.3, let me know if you want more details.
Here are the test results, followed by the code that produced them. I can't provide the test data since it'd reveal information that I can't let go out in the wild.
JSON encoded in 2.23700618744 seconds
PHP serialized in 1.3434419632 seconds
JSON decoded in 4.0405561924 seconds
PHP unserialized in 1.39393305779 seconds
serialized size : 14549
json_encode size : 11520
serialize() was roughly 66.51% faster than json_encode()
unserialize() was roughly 189.87% faster than json_decode()
json_encode() string was roughly 26.29% smaller than serialize()
// Time json encoding
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
json_encode( $test );
}
$jsonTime = microtime( true ) - $start;
echo "JSON encoded in $jsonTime seconds<br>";
// Time serialization
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
serialize( $test );
}
$serializeTime = microtime( true ) - $start;
echo "PHP serialized in $serializeTime seconds<br>";
// Time json decoding
$test2 = json_encode( $test );
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
json_decode( $test2 );
}
$jsonDecodeTime = microtime( true ) - $start;
echo "JSON decoded in $jsonDecodeTime seconds<br>";
// Time deserialization
$test2 = serialize( $test );
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
unserialize( $test2 );
}
$unserializeTime = microtime( true ) - $start;
echo "PHP unserialized in $unserializeTime seconds<br>";
$jsonSize = strlen(json_encode( $test ));
$phpSize = strlen(serialize( $test ));
echo "<p>serialized size : " . strlen(serialize( $test )) . "<br>";
echo "json_encode size : " . strlen(json_encode( $test )) . "<br></p>";
// Compare them
if ( $jsonTime < $serializeTime )
{
echo "json_encode() was roughly " . number_format( ($serializeTime / $jsonTime - 1 ) * 100, 2 ) . "% faster than serialize()";
}
else if ( $serializeTime < $jsonTime )
{
echo "serialize() was roughly " . number_format( ($jsonTime / $serializeTime - 1 ) * 100, 2 ) . "% faster than json_encode()";
} else {
echo 'Unpossible!';
}
echo '<BR>';
// Compare them
if ( $jsonDecodeTime < $unserializeTime )
{
echo "json_decode() was roughly " . number_format( ($unserializeTime / $jsonDecodeTime - 1 ) * 100, 2 ) . "% faster than unserialize()";
}
else if ( $unserializeTime < $jsonDecodeTime )
{
echo "unserialize() was roughly " . number_format( ($jsonDecodeTime / $unserializeTime - 1 ) * 100, 2 ) . "% faster than json_decode()";
} else {
echo 'Unpossible!';
}
echo '<BR>';
// Compare them
if ( $jsonSize < $phpSize )
{
echo "json_encode() string was roughly " . number_format( ($phpSize / $jsonSize - 1 ) * 100, 2 ) . "% smaller than serialize()";
}
else if ( $phpSize < $jsonSize )
{
echo "serialize() string was roughly " . number_format( ($jsonSize / $phpSize - 1 ) * 100, 2 ) . "% smaller than json_encode()";
} else {
echo 'Unpossible!';
}
I made a small benchmark as well. My results were the same, but I need the decode performance. There I noticed, as a few people above said as well, that unserialize is faster than json_decode; unserialize takes roughly 60-70% of the json_decode time. So the conclusion is fairly simple:
When you need performance in encoding, use json_encode; when you need performance in decoding, use unserialize. Because you cannot merge the two functions, you have to choose where you need more performance.
My benchmark in pseudo:
Define array $arr with a few random keys and values
for x < 100; x++; serialize and json_encode an array_rand of $arr
for y < 1000; y++; json_decode the json encoded string - calc time
for y < 1000; y++; unserialize the serialized string - calc time
echo the result which was faster
On average: unserialize won 96 times to json_decode's 4, with an average of roughly 1.5 ms versus 2.5 ms.
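A rough PHP version of that pseudo-benchmark, for anyone who wants to reproduce it (the key/value generation is arbitrary, and absolute numbers will vary by machine):
<?php
$arr = array();
for ($i = 0; $i < 100; $i++) {
    $arr['key' . $i] = md5((string) $i); // a few arbitrary keys and values
}
$json = json_encode($arr);
$ser  = serialize($arr);
$start = microtime(true);
for ($y = 0; $y < 1000; $y++) {
    json_decode($json, true);
}
$jsonDecodeTime = microtime(true) - $start;
$start = microtime(true);
for ($y = 0; $y < 1000; $y++) {
    unserialize($ser);
}
$unserializeTime = microtime(true) - $start;
echo ($unserializeTime < $jsonDecodeTime) ? "unserialize() won\n" : "json_decode() won\n";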
Check out the results here (sorry for the hack putting the PHP code in the JS code box):
http://jsfiddle.net/newms87/h3b0a0ha/embedded/result/
RESULTS: serialize() and unserialize() are both significantly faster in PHP 5.4 on arrays of varying size.
I made a test script on real-world data for comparing json_encode vs serialize and json_decode vs unserialize. The test was run on the caching system of an in-production e-commerce site. It simply takes the data already in the cache and tests the times to encode/decode (or serialize/unserialize) all of it, and I put the results in an easy-to-read table.
I ran this on a PHP 5.4 shared hosting server.
The results were very conclusive: for these small to large data sets, serialize and unserialize were the clear winners. In particular for my use case, json_decode and unserialize are the most important for the caching system. Unserialize was almost a ubiquitous winner here, typically 2 to 4 times (sometimes 6 or 7 times) as fast as json_decode.
It is interesting to note the difference in results from #peter-bailey.
Here is the PHP code used to generate the results:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
function _count_depth($array)
{
$count = 0;
$max_depth = 0;
foreach ($array as $a) {
if (is_array($a)) {
list($cnt, $depth) = _count_depth($a);
$count += $cnt;
$max_depth = max($max_depth, $depth);
} else {
$count++;
}
}
return array(
$count,
$max_depth + 1,
);
}
function run_test($file)
{
$memory = memory_get_usage();
$test_array = unserialize(file_get_contents($file));
$memory = round((memory_get_usage() - $memory) / 1024, 2);
if (empty($test_array) || !is_array($test_array)) {
return;
}
list($count, $depth) = _count_depth($test_array);
//JSON encode test
$start = microtime(true);
$json_encoded = json_encode($test_array);
$json_encode_time = microtime(true) - $start;
//JSON decode test
$start = microtime(true);
json_decode($json_encoded);
$json_decode_time = microtime(true) - $start;
//serialize test
$start = microtime(true);
$serialized = serialize($test_array);
$serialize_time = microtime(true) - $start;
//unserialize test
$start = microtime(true);
unserialize($serialized);
$unserialize_time = microtime(true) - $start;
return array(
'Name' => basename($file),
'json_encode() Time (s)' => $json_encode_time,
'json_decode() Time (s)' => $json_decode_time,
'serialize() Time (s)' => $serialize_time,
'unserialize() Time (s)' => $unserialize_time,
'Elements' => $count,
'Memory (KB)' => $memory,
'Max Depth' => $depth,
'json_encode() Win' => ($json_encode_time > 0 && $json_encode_time < $serialize_time) ? number_format(($serialize_time / $json_encode_time - 1) * 100, 2) : '',
'serialize() Win' => ($serialize_time > 0 && $serialize_time < $json_encode_time) ? number_format(($json_encode_time / $serialize_time - 1) * 100, 2) : '',
'json_decode() Win' => ($json_decode_time > 0 && $json_decode_time < $serialize_time) ? number_format(($serialize_time / $json_decode_time - 1) * 100, 2) : '',
'unserialize() Win' => ($unserialize_time > 0 && $unserialize_time < $json_decode_time) ? number_format(($json_decode_time / $unserialize_time - 1) * 100, 2) : '',
);
}
$files = glob(dirname(__FILE__) . '/system/cache/*');
$data = array();
foreach ($files as $file) {
if (is_file($file)) {
$result = run_test($file);
if ($result) {
$data[] = $result;
}
}
}
uasort($data, function ($a, $b) {
return ($a['Memory (KB)'] < $b['Memory (KB)']) ? 1 : -1; // sort by memory, descending
});
$fields = array_keys($data[0]);
?>
<table>
<thead>
<tr>
<?php foreach ($fields as $f) { ?>
<td style="text-align: center; border:1px solid black;padding: 4px 8px;font-weight:bold;font-size:1.1em"><?= $f; ?></td>
<?php } ?>
</tr>
</thead>
<tbody>
<?php foreach ($data as $d) { ?>
<tr>
<?php foreach ($d as $key => $value) { ?>
<?php $is_win = strpos($key, 'Win'); ?>
<?php $color = ($is_win && $value) ? 'color: green;font-weight:bold;' : ''; ?>
<td style="text-align: center; vertical-align: middle; padding: 3px 6px; border: 1px solid gray; <?= $color; ?>"><?= $value . (($is_win && $value) ? '%' : ''); ?></td>
<?php } ?>
</tr>
<?php } ?>
</tbody>
</table>
First, I changed the script to do some more benchmarking (and also do 1000 runs instead of just 1):
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
// Make a big, honkin test array
// You may need to adjust this depth to avoid memory limit errors
$testArray = fillArray(0, 5);
$totalJsonTime = 0;
$totalSerializeTime = 0;
$totalJsonWins = 0;
for ($i = 0; $i < 1000; $i++) {
// Time json encoding
$start = microtime(true);
$json = json_encode($testArray);
$jsonTime = microtime(true) - $start;
$totalJsonTime += $jsonTime;
// Time serialization
$start = microtime(true);
$serial = serialize($testArray);
$serializeTime = microtime(true) - $start;
$totalSerializeTime += $serializeTime;
if ($jsonTime < $serializeTime) {
$totalJsonWins++;
}
}
$totalSerializeWins = 1000 - $totalJsonWins;
// Compare them
if ($totalJsonTime < $totalSerializeTime) {
printf("json_encode() (wins: $totalJsonWins) was roughly %01.2f%% faster than serialize()\n", ($totalSerializeTime / $totalJsonTime - 1) * 100);
} else {
printf("serialize() (wins: $totalSerializeWins) was roughly %01.2f%% faster than json_encode()\n", ($totalJsonTime / $totalSerializeTime - 1) * 100);
}
$totalJsonTime = 0;
$totalJson2Time = 0;
$totalSerializeTime = 0;
$totalJsonWins = 0;
for ($i = 0; $i < 1000; $i++) {
// Time json decoding
$start = microtime(true);
$orig = json_decode($json, true);
$jsonTime = microtime(true) - $start;
$totalJsonTime += $jsonTime;
$start = microtime(true);
$origObj = json_decode($json);
$jsonTime2 = microtime(true) - $start;
$totalJson2Time += $jsonTime2;
// Time serialization
$start = microtime(true);
$unserial = unserialize($serial);
$serializeTime = microtime(true) - $start;
$totalSerializeTime += $serializeTime;
if ($jsonTime < $serializeTime) {
$totalJsonWins++;
}
}
$totalSerializeWins = 1000 - $totalJsonWins;
// Compare them
if ($totalJsonTime < $totalSerializeTime) {
printf("json_decode() was roughly %01.2f%% faster than unserialize()\n", ($totalSerializeTime / $totalJsonTime - 1) * 100);
} else {
printf("unserialize() (wins: $totalSerializeWins) was roughly %01.2f%% faster than json_decode()\n", ($totalJsonTime / $totalSerializeTime - 1) * 100);
}
// Compare them
if ($totalJson2Time < $totalSerializeTime) {
printf("json_decode() was roughly %01.2f%% faster than unserialize()\n", ($totalSerializeTime / $totalJson2Time - 1) * 100);
} else {
printf("unserialize() (wins: $totalSerializeWins) was roughly %01.2f%% faster than array json_decode()\n", ($totalJson2Time / $totalSerializeTime - 1) * 100);
}
function fillArray( $depth, $max ) {
static $seed;
if (is_null($seed)) {
$seed = array('a', 2, 'c', 4, 'e', 6, 'g', 8, 'i', 10);
}
if ($depth < $max) {
$node = array();
foreach ($seed as $key) {
$node[$key] = fillArray($depth + 1, $max);
}
return $node;
}
return 'empty';
}
I used this build of PHP 7:
PHP 7.0.14 (cli) (built: Jan 18 2017 19:13:23) ( NTS )
Copyright (c) 1997-2016 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
with Zend OPcache v7.0.14, Copyright (c) 1999-2016, by Zend Technologies
And my results were:
serialize() (wins: 999) was roughly 10.98% faster than json_encode()
unserialize() (wins: 987) was roughly 33.26% faster than json_decode()
unserialize() (wins: 987) was roughly 48.35% faster than array json_decode()
So clearly, serialize/unserialize is the fastest method, while json_encode/decode is the most portable.
If you consider a scenario where you read/write serialized data 10x or more often than you need to send it to or receive it from a non-PHP system, you are STILL better off, in terms of time, using serialize/unserialize and calling json_encode or json_decode only at that boundary.
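A sketch of that split - keep the hot path on serialize()/unserialize() and pay the JSON cost only at the boundary (the function names and file handling are illustrative, not from the answer above):
<?php
function cache_write($file, array $data)
{
    // Hot path: PHP-to-PHP, so use the native format.
    file_put_contents($file, serialize($data), LOCK_EX);
}
function cache_read($file)
{
    return unserialize(file_get_contents($file));
}
function cache_export_json($file)
{
    // Cold path: only when a non-PHP consumer asks for the data.
    return json_encode(cache_read($file));
}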
Before you make your final decision, be aware that the JSON format is not safe for associative arrays - json_decode() will return them as objects instead:
$config = array(
'Frodo' => 'hobbit',
'Gimli' => 'dwarf',
'Gandalf' => 'wizard',
);
print_r($config);
print_r(json_decode(json_encode($config)));
Output is:
Array
(
[Frodo] => hobbit
[Gimli] => dwarf
[Gandalf] => wizard
)
stdClass Object
(
[Frodo] => hobbit
[Gimli] => dwarf
[Gandalf] => wizard
)
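Worth noting: the round trip can be made lossless for plain arrays by passing true as json_decode()'s second argument, so this is a default-behaviour issue rather than a limitation of the format itself:
<?php
$config = array('Frodo' => 'hobbit', 'Gimli' => 'dwarf', 'Gandalf' => 'wizard');
// true as the second argument decodes to associative arrays, not stdClass.
print_r(json_decode(json_encode($config), true));
// Array ( [Frodo] => hobbit [Gimli] => dwarf [Gandalf] => wizard )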
Just an FYI - if you want to serialize your data to something easy to read and understand like JSON, but with more compression and higher performance, you should check out MessagePack.
JSON is better if you want to back up data and restore it on a different machine or via FTP.
For example, with serialize, if you store data on a Windows server, download it via FTP and restore it on a Linux one, it may no longer work due to character re-encoding, because serialize stores the lengths of the strings, and in the Unicode > UTF-8 transcoding some one-byte characters can become two bytes long, making the algorithm break.
Thanks for this benchmark code.
My results on the array I use for configuration are as follows:
JSON encoded in 0.0031511783599854 seconds
PHP serialized in 0.0037961006164551 seconds
json_encode() was roughly 20.47% faster than serialize()
JSON encoded in 0.0070841312408447 seconds
PHP serialized in 0.0035839080810547 seconds
serialize() was roughly 97.66% faster than json_encode()
So - test it on your own data.
To sum up what people say here, json_decode/encode seems faster than serialize/unserialize, BUT
if you var_dump the decoded value, you'll see the type of the object has changed.
If for some reason you want to keep the type, go with serialize!
(Try for example stdClass vs array.)
serialize/unserialize:
Array cache:
array (size=2)
'a' => string '1' (length=1)
'b' => int 2
Object cache:
object(stdClass)[8]
public 'field1' => int 123
This cache:
object(Controller\Test)[8]
protected 'view' =>
json encode/decode
Array cache:
object(stdClass)[7]
public 'a' => string '1' (length=1)
public 'b' => int 2
Object cache:
object(stdClass)[8]
public 'field1' => int 123
This cache:
object(stdClass)[8]
As you can see, json_encode/decode converts everything to stdClass, which is not that good - the object info is lost... So decide based on your needs, especially if it is not only arrays...
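A minimal check of that point, with a placeholder class standing in for Controller\Test:
<?php
class Cached
{
    public $field1 = 123;
}
$obj = new Cached();
// serialize() round-trips the original class...
var_dump(unserialize(serialize($obj)) instanceof Cached);   // bool(true)
// ...json_encode()/json_decode() does not.
var_dump(json_decode(json_encode($obj)) instanceof Cached); // bool(false) - stdClass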
I would suggest using Super Cache, a file cache mechanism that doesn't use json_encode or serialize. It is simple to use and really fast compared to other PHP cache mechanisms.
https://packagist.org/packages/smart-php/super-cache
Ex:
<?php
require __DIR__.'/vendor/autoload.php';
use SuperCache\SuperCache as sCache;
//Saving cache value with a key
// sCache::cache('<key>')->set('<value>');
sCache::cache('myKey')->set('Key_value');
//Retrieving cache value with a key
echo sCache::cache('myKey')->get();
?>
Memory footprint way too large
I have tried the memory usage of some simple variables and encounter unexpected results, please see this code: $datetimes = []; $memory_before = memory_get_usage(); for ($x = 0; $x < 1000; $x++) { $datetimes[] = new \DateTime(); } var_dump('DateTimes: ' . (memory_get_usage() - $memory_before)); $ints = []; $memory_before = memory_get_usage(); for ($x = 0; $x < 1000; $x++) { $ints[] = $x; } var_dump('Integers: ' . (memory_get_usage() - $memory_before)); I get this output (on PHP 7.4, 64bit): string(17) "DateTimes: 350504" string(15) "Integers: 37160" 37 KB memory for 1000 ints does not make sense to me, right? I'd expect 8000 byte plus some array overhead. My experiment scales: for a million ints, I get 33558808 byte memory usage. I have disabled xdebug.
It's how PHP works and the disadvantage of having dynamically-typed variables. The integer is in reality a Zend object. 1000 x (64 * 2) = 128 Kbit so 16KB. Add to that the array of size 1000. In memory, zval is represented as two 64-bit words. The first word keeps the value — and the second word keeps the type, type_flags, extra, and reserved fields.
Does a boolean value in PHP take up only 1 bit of memory?
As the question states, would the following array require 5 bits of memory? $flags = array(true, false, true, false, false); [EDIT]: Apologies just found this duplicate.
Each element in the array stored in a separate memory location, you also need to store the hashtable for the array, along with the keys, so NOOOO, it's going to be a lot more.
No. PHP has internal metadata attached to every variable/array element definined. PHP does not support bit fields directly, so the smallest ACTUAL allocation is a byte, plus metadata overhead.
I doubt there is an application that uses less than system arcitecture's data word as a minimum data storage unit. But I am sure it shouldn't be your concern at all.
It depends on the php interpreter. The standard interpreter is extremely wasteful, although this is not uncommon for a dynamic language. The massive overhead is caused by garbage collection, and the dynamic nature of every value; since the contents of an array can take arbitrary values of arbitrary types (i.e. you can write $ar[1] = 's';), the type and additional metainformation must be stored. With the following test script: <?php $n = 20000000; $ar = array(); $i = 0; $before = memory_get_usage(); for ($i = 0;$i < $n;$i++) { $ar[] = ($i % 2 == 0); } $after = memory_get_usage(); echo 'Using ' . ($after - $before) . ' Bytes for ' . $n . ' values'; echo ', per value: ' . (($after - $before) / $n) . "\n"; I get about 150 Bytes per array entry (x64, php 5.4.0-2). This seems to be at the higher end of implementations; ideone reports 73 Bytes/entry (php 5.2.11), and so does codepad.
Cheating PHP integers
This is in relation to my post here but taken in a completely different direction Charset detection in PHP essentially, i'm looking to reduce the memory that many huge arrays cause. These arrays are just full of integers but seeing as PHP uses 32bit and 64 bit integers internally (depending which version you have compiled for your CPU type), it eats the memory. is there a way to cheat PHP into using 8bit or 16bit integers? I've thought about using pack(); to accomplish this so I can have an array of packed binary values and just unpack them as I need them (yes I know this would make it slower but is much faster than the alternative of loading and then running through each array individually as you can stream the text through so they all need to be in memory at the same time to keep speed up) can you suggest any better alternatives to accomplish this? i know it's very hacky but I need to prevent huge memory surges.
Don't tell nobody!
class IntegerstringArray implements ArrayAccess
{
    var $evil = "0000111122220000ffff"; // 16 bit each

    function offsetExists($offset)
    {
        return (strlen($this->evil) / 4) - 1 >= $offset;
    }

    function offsetGet($offset)
    {
        return hexdec(substr($this->evil, $offset * 4, 4));
    }

    function offsetSet($offset, $value)
    {
        $hex = dechex($value);
        if ($fill = 4 - strlen($hex)) {
            $hex = str_repeat("0", $fill) . $hex;
        }
        for ($i = 0; $i < 4; $i++) {
            $this->evil[$offset * 4 + $i] = $hex[$i];
        }
    }

    function offsetUnset($offset)
    {
        assert(false);
    }
}
So you can pretty much create an array object from this:
$array = new IntegerstringArray();
$array[2] = 65535;
print $array[2];
It internally stores a list and accepts 16-bit integers. The array offsets must be consecutive. Not tested. Just as an implementation guide.
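The pack() idea from the question could look roughly like this; it is only a sketch, using the 'v' format code (unsigned 16-bit, little-endian), with made-up example data:
<?php
// Sketch of the pack()-based idea: keep the integers in one binary string
// (2 bytes per value) instead of a PHP array, and unpack on demand.
$values = range(0, 9999);           // example data, must fit in 16 bits
$packed = '';
foreach ($values as $v) {
    $packed .= pack('v', $v);       // append 2 bytes per value
}

// Read back the value at index $i without unpacking the whole string.
$i = 42;
$tmp = unpack('v', substr($packed, $i * 2, 2));
echo $tmp[1], "\n";                 // prints 42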
Problem reading files greater than 1GB with XMLReader
Is there a maximum file size XMLReader can handle? I'm trying to process an XML feed about 3GB in size. There are certainly no PHP errors, as the script runs fine and successfully loads into the database after it's been run. The script also runs fine with smaller test feeds of 1GB and below. However, when processing larger feeds the script stops reading the XML file after about 1GB and continues running the rest of the script. Has anybody experienced a similar problem, and if so, how did you work around it? Thanks in advance.
I had the same kind of problem recently and thought I'd share my experience. It seems the problem lies in the way PHP was compiled: whether it was built with support for 64-bit file sizes/offsets or only 32-bit ones. With 32 bits you can only address 4GB of data. You can find a somewhat confusing but good explanation here: http://blog.mayflower.de/archives/131-Handling-large-files-without-PHP.html I had to split my files with the Perl utility xml_split, which you can find here: http://search.cpan.org/~mirod/XML-Twig/tools/xml_split/xml_split I used it to split my huge XML file into manageable chunks. The nice thing about the tool is that it splits XML files along whole elements. Unfortunately it's not very fast. I needed to do this only once and it suited my needs, but I wouldn't recommend it for repetitive use. After splitting, I used XMLReader on the smaller files of about 1GB in size.
Splitting up the file will definitely help. Other things to try:
Adjust the memory_limit variable in php.ini: http://php.net/manual/en/ini.core.php
Rewrite your parser using SAX (see the sketch below): http://php.net/manual/en/book.xml.php This is a stream-oriented parser that doesn't need to parse the whole tree. It is much more memory-efficient but slightly harder to program.
Depending on your OS, there might also be a 2GB limit on the RAM chunk that you can allocate. Very possible if you're running on a 32-bit OS.
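A minimal sketch of the SAX route using PHP's expat-based parser functions; the element name, chunk size and file path are placeholders, not taken from the original feed:
<?php
// Stream-oriented parsing: the whole document is never held in memory,
// only the current chunk and whatever state you keep yourself.
$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) {   // called on every opening tag
        if ($name === 'RECORD') {         // element name is just an example
            // start collecting data for one record here
        }
    },
    function ($parser, $name) {           // called on every closing tag
        if ($name === 'RECORD') {
            // process the finished record here, then discard it
        }
    }
);

$handle = fopen('huge-feed.xml', 'rb');   // path is a placeholder
while (!feof($handle)) {
    $chunk = fread($handle, 65536);       // feed the parser 64 KB at a time
    xml_parse($parser, $chunk, feof($handle));
}
fclose($handle);
xml_parser_free($parser);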
It should be noted that PHP in general has a maximum file size it can address. PHP does not have unsigned or long integer types, meaning integers are capped at 2^31 - 1 (or 2^63 - 1 on 64-bit builds). This matters because PHP uses an integer for the file pointer (your position in the file as you read through it), so a 32-bit build cannot seek past 2^31 - 1 bytes. That is still well over 1 gigabyte, though. I ran into issues at two gigabytes (as expected, since 2^31 is roughly 2 billion).
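A quick way to check which limit applies on a given build:
<?php
// On a 32-bit build this prints 4 / 2147483647,
// on a 64-bit build 8 / 9223372036854775807.
echo PHP_INT_SIZE, "\n";   // size of an integer in bytes
echo PHP_INT_MAX, "\n";    // largest representable integer, also the practical seek limit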
I've run into a similar issue when parsing large documents. What I wound up doing was breaking the feed into smaller chunks using filesystem functions, then parsing those smaller chunks. So if you have a bunch of <record> tags that you are parsing, pull them out of the stream with string functions, and when you have a full record in the buffer, parse that with the XML functions (see the sketch below). It sucks, but it works quite well, and it is very memory-efficient, since you have at most one record in memory at any one time.
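Roughly, that approach could look like the sketch below; the <record> tag name, chunk size and file path are illustrative assumptions, not the actual feed format:
<?php
// Sketch: stream the raw file, cut out one <record>...</record> at a time with
// string functions, and hand each complete record to the XML parser on its own.
$handle = fopen('huge-feed.xml', 'rb');   // path is a placeholder
$buffer = '';
$endTag = '</record>';

while (!feof($handle)) {
    $buffer .= fread($handle, 65536);     // pull in 64 KB at a time

    while (($end = strpos($buffer, $endTag)) !== false) {
        $start  = strpos($buffer, '<record');
        $record = substr($buffer, $start, $end + strlen($endTag) - $start);
        $buffer = substr($buffer, $end + strlen($endTag));

        $xml = simplexml_load_string($record);   // only one record in memory
        // ... process $xml here ...
    }
}
fclose($handle);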
Do you get any errors with
libxml_use_internal_errors(true);
libxml_clear_errors();

// your parser stuff here....
$r = new XMLReader(...);
// ....

foreach( libxml_get_errors() as $err ) {
    printf(". %d %s\n", $err->code, $err->message);
}
when the parser stops prematurely?
Using Windows XP, NTFS as the filesystem and PHP 5.3.2, there was no problem with this test script:
<?php
define('SOURCEPATH', 'd:/test.xml');

if ( 0 ) {
    build();
} else {
    echo 'filesize: ', number_format(filesize(SOURCEPATH)), "\n";
    timing('read');
}

function timing($fn) {
    $start = new DateTime();
    echo 'start: ', $start->format('Y-m-d H:i:s'), "\n";
    $fn();
    $end = new DateTime();
    echo 'end: ', $start->format('Y-m-d H:i:s'), "\n";
    echo 'diff: ', $end->diff($start)->format('%I:%S'), "\n";
}

function read() {
    $cnt = 0;
    $r = new XMLReader;
    $r->open(SOURCEPATH);
    while( $r->read() ) {
        if ( XMLReader::ELEMENT === $r->nodeType ) {
            if ( 0===++$cnt%500000 ) {
                echo '.';
            }
        }
    }
    echo "\n#elements: ", $cnt, "\n";
}

function build() {
    $fp = fopen(SOURCEPATH, 'wb');
    $s = '<catalogue>';
    //for($i = 0; $i < 500000; $i++) {
    for($i = 0; $i < 60000000; $i++) {
        $s .= sprintf('<item>%010d</item>', $i);
        if ( 0===$i%100000 ) {
            fwrite($fp, $s);
            $s = '';
            echo $i/100000, ' ';
        }
    }
    $s .= '</catalogue>';
    fwrite($fp, $s);
    flush($fp);
    fclose($fp);
}
output:
filesize: 1,380,000,023
start: 2010-08-07 09:43:31
........................................................................................................................
#elements: 60000001
end: 2010-08-07 09:43:31
diff: 07:31
(as you can see I screwed up the output of the end-time, but I don't want to run this script for another 7+ minutes ;-))
Does this also work on your system?
As a side note: the corresponding C# test application took only 41 seconds instead of 7.5 minutes, and my slow hard drive might have been the (or one) limiting factor in this case.
filesize: 1.380.000.023
start: 2010-08-07 09:55:24
........................................................................................................................
#elements: 60000001
end: 2010-08-07 09:56:05
diff: 00:41
and the source:
using System;
using System.IO;
using System.Xml;

namespace ConsoleApplication1
{
    class SOTest
    {
        delegate void Foo();
        const string sourcepath = @"d:\test.xml";

        static void timing(Foo bar)
        {
            DateTime dtStart = DateTime.Now;
            System.Console.WriteLine("start: " + dtStart.ToString("yyyy-MM-dd HH:mm:ss"));
            bar();
            DateTime dtEnd = DateTime.Now;
            System.Console.WriteLine("end: " + dtEnd.ToString("yyyy-MM-dd HH:mm:ss"));
            TimeSpan s = dtEnd.Subtract(dtStart);
            System.Console.WriteLine("diff: {0:00}:{1:00}", s.Minutes, s.Seconds);
        }

        static void readTest()
        {
            XmlTextReader reader = new XmlTextReader(sourcepath);
            int cnt = 0;
            while (reader.Read())
            {
                if (XmlNodeType.Element == reader.NodeType)
                {
                    if (0 == ++cnt % 500000)
                    {
                        System.Console.Write('.');
                    }
                }
            }
            System.Console.WriteLine("\n#elements: " + cnt + "\n");
        }

        static void Main()
        {
            FileInfo f = new FileInfo(sourcepath);
            System.Console.WriteLine("filesize: {0:N0}", f.Length);
            timing(readTest);
            return;
        }
    }
}