I have a PHP script that outputs a JSON-encoded object containing large numbers (greater than PHP_INT_MAX), so I have to store those numbers internally as strings. However, I need them to be sent to the client as unquoted numbers.
I've thought of several solutions, many of which haven't worked. Most of the ideas revolve around writing my own JSON encoder, which I have already done, but I don't want to take the time to change every place where I call json_encode so that it calls my_json_encode instead.
Since I have no control over the server, I cannot remove the JSON library. I cannot undeclare json_encode, nor can I rename it. Is there an easy way to handle all this, or is the best option to go through each and every file and rename all the function calls?
With JavaScript being loosely typed, why do you need to control the type in the JSON data? What are you doing with this number in JavaScript, and would parseInt/parseFloat not be able to make the leap from string to number on the client side?
The only option I had was to use my own json_encode method renamed to my_json_encode, and then change everywhere that called that method.
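For illustration only, here is a minimal sketch of what such a wrapper might look like. The original encoder is not shown in this thread, so the body below is an assumption: it calls json_encode() and then unquotes string values that consist only of digits (16 or more, no leading zero). The 16-digit threshold is an arbitrary choice made for this sketch.

```php
<?php
// Hypothetical wrapper; the name my_json_encode comes from the question,
// but this body is only a sketch and not the asker's actual encoder.
function my_json_encode($data): string
{
    $json = json_encode($data);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }
    // Unquote values that are long digit-only strings. Quotes inside JSON
    // strings are always escaped as \", so a bare "..." preceded by : [ or ,
    // and followed by , ] or } can only be a complete string value.
    $pattern = '/(?<=[:\[,])"(-?[1-9]\d{15,})"(?=[,\]}])/';
    return preg_replace($pattern, '$1', $json);
}

echo my_json_encode(['id' => '12345678901234567890', 'name' => 'x', 'small' => '42']), "\n";
// prints {"id":12345678901234567890,"name":"x","small":"42"}
```

Note that JavaScript's JSON.parse reads such values as doubles, so the client can lose precision above 2^53 unless it uses a parser that handles big integers.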
Related
I'm currently working with data I'd like to temporarily store in my database as encrypted data. I'm not worried about the database getting hacked into, I just want to ensure the people that had entered the data that it is not reachable by any other than themselves. (and me of course)
The data is not meant to be stored permanently in the database since I'm exporting it to a third party application using their API, but since they have a rate limit I need to store the data in our database until the limit is over and I can upload it. (Assuming the rate limit occurs)
The process:
The request I receive from the form is in an array, so to begin with I serialize() the array to get a long string which I will unserialize() later.
Then I want to use a method that lets me convert the string into numbers and back again without losing information.
The reason I want to turn the data into numbers is because I use the HashIds library, which only encodes numbers. To my knowledge it's an extra layer of security I'm happy to add.
Read more on HashIds here: http://hashids.org/
What I have tried:
I tried converting the string into hex numbers, and then the hex numbers into decimals. Unfortunately the number was too large, and I haven't had any luck using a big-integer library with it.
I also tried base64_encode(), which does not turn the data into numbers by itself, but base-converting the result would. However, I couldn't figure out base conversion in PHP, since it apparently behaves rather oddly.
Conclusion:
How can I convert the data I'm receiving from a form request into a short encoded string which can be converted back into the data without too much hassle? I don't quite know all the options PHP offers yet.
UPDATE:
To conclude this thread, I ended up using OpenSSL to encrypt my serialized array. The only problem I ran into was that if the request contained a file, I couldn't serialize it and save the object to the database. I still need a way around this: since the third-party application expects the file as a multipart/form-data object, I can't just save the file path to the database and upload that. But I guess I will have to figure that one out later.
That link, http://hashids.org/, provides a pretty clear example. Let's assume that your integer is 15.
$hashids = new Hashids\Hashids('some random string for a salt. Make sure you use the same salt if you want to be able to decode');
$encoded = $hashids->encode(15);
print_r(['hashedId' => $encoded]);
$decoded = $hashids->decode($encoded);
print_r(['decoded' => $decoded]);
Note that decode() returns an array of numbers, so $decoded[0] should equal 15.
Update
Sorry - the hashids bit of your question threw me and as such, I misunderstood what you were asking. I will update my answer:
You should really be using https://secure.php.net/openssl_encrypt and https://secure.php.net/manual/en/function.openssl-decrypt.php
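As a rough sketch of how those two functions could wrap the serialized form data: the helper names encrypt_payload and decrypt_payload, the choice of AES-256-CBC, and the layout (IV prepended, then base64) are all assumptions made for this example, not something the answer prescribes. CBC on its own does not detect tampering, so an authenticated mode may be a better fit in practice.

```php
<?php
// Hypothetical helpers; names and storage layout are assumptions.
const PAYLOAD_CIPHER = 'aes-256-cbc';

function encrypt_payload(array $data, string $key): string
{
    // A fresh random IV is needed for every message.
    $iv = random_bytes(openssl_cipher_iv_length(PAYLOAD_CIPHER));
    $ciphertext = openssl_encrypt(serialize($data), PAYLOAD_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    // Store the IV next to the ciphertext; it is not secret.
    return base64_encode($iv . $ciphertext);
}

function decrypt_payload(string $encoded, string $key): array
{
    $raw = base64_decode($encoded, true);
    $ivLength = openssl_cipher_iv_length(PAYLOAD_CIPHER);
    $iv = substr($raw, 0, $ivLength);
    $ciphertext = substr($raw, $ivLength);
    $plain = openssl_decrypt($ciphertext, PAYLOAD_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed');
    }
    // Refuse to instantiate objects from stored data.
    return unserialize($plain, ['allowed_classes' => false]);
}

$key = random_bytes(32); // in practice, load a persistent 32-byte key from configuration
$stored = encrypt_payload(['name' => 'Alice', 'age' => 30], $key);
var_dump(decrypt_payload($stored, $key) === ['name' => 'Alice', 'age' => 30]);
```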
First of all, I couldn't get a clear definition of it from Wikipedia or even from the serialize function page in the PHP manual. I need to know some cases where serialization is needed, and how things work without it. In other words, where do you need serialization such that without it your code would be missing some important feature?
What is serialization?
Serialization encodes objects into another format.
For example you have an array in PHP like this:
$array = array("a" => 1, "b" => 2, "c" => array("a" => 1, "b" => 2));
And then you want to store it in file or send to other application.
There are several format choices, but the idea is the same:
The array has to be encoded (or you could say "translated"), into text or bytes, that can be written to a file or sent via the network.
For example, in PHP, if you:
$data = serialize($array);
you will get this:
a:3:{s:1:"a";i:1;s:1:"b";i:2;s:1:"c";a:2:{s:1:"a";i:1;s:1:"b";i:2;}}
This is PHP's particular serializing format that PHP understands, and it works vice versa, so you are able to use it to deserialize objects.
For example, you stored a serialized array in a file, and you want it back in your code as an array:
$array = unserialize($data);
But you could choose a different serialization format, for example, JSON:
$json = json_encode($array);
will give you this:
{"a":1,"b":2,"c":{"a":1,"b":2}}
The result is not only easily saved, read by human eye, or sent via network, but is also understandable by almost every other language (JavaScript, Java, C#, C++, ...)
Conclusion
Serialization translates objects to another format, in case you want to store or share data.
Are there any situations, where you cannot do anything, but serialize it?
No. But serialization usually makes things easier.
Are JSON and PHP format the only possible formats?
No, no, no and one more time no. There are plenty of formats.
XML (e.g. using a schema like WSDL or XHTML)
Bytes, Protobuf, etc.
Yaml
...
...
Your own formats (you can create your own format for serialization and use it, but that is a big thing to do and is not worth it, most of the time)
Serialization is the process of converting some in-memory object to another format that could be used to either store in a file or sent over the network. Deserialization is the inverse process meaning the actual object instance is restored from the given serialized representation of the object. This is very useful when communicating between various systems.
The serialization format could be either interoperable or non-interoperable. Interoperable formats (such as JSON, XML, ...) allow for serializing some object using a given platform and deserializing it using a different platform. For example with JSON you could use javascript to serialize the object and send it over the network to a PHP script that will deserialize the object and use it.
The serialize() PHP function uses a non-interoperable format. This means that, in practice, only PHP is used both to serialize the object and to deserialize it back.
You could use the json_encode() and json_decode() functions in order to serialize/deserialize PHP objects using the interoperable JSON format.
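A small round trip may make this concrete. The sketch below reuses the array shape from the earlier answer; the variable names are made up for the example. It also shows that json_decode() returns stdClass objects unless its second argument is true.

```php
<?php
$original = ['a' => 1, 'b' => 2, 'c' => ['a' => 1, 'b' => 2]];

// Serialize to the interoperable JSON format.
$json = json_encode($original);

// Deserialize back: true asks for associative arrays instead of objects.
$asArray = json_decode($json, true);
$asObject = json_decode($json);

var_dump($asArray === $original);        // bool(true)
var_dump($asObject instanceof stdClass); // bool(true)
```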
Serialization is the process of turning data (e.g. variables) into a representation such as a string, that can easily be written and read back from for example a file or the database.
Use cases? There are many, but generally it revolves around the idea of taking a complex, nested array or object and turning it into a simple string that can be saved and read later to retrieve the same structure. For example, provided you have in php:
$blub = array();
$blub['a']['a'] = 1; // nested arrays are created automatically
$blub['a']['b'] = 4;
$blub['b']['a'] = 27;
$blub['b']['b'] = 46;
Instead of going through every array member individually and writing it one could just:
$dataString = serialize($blub);
And the serialized array is ready to be written anywhere as a simple string, in such a way that retrieving this string again and doing unserialize() over it gets you the exact same array structure you had before. Yes, it's really that simple.
I need to know some cases we need the term serialization and how things are going without it?
Serialization can come in handy if you need to store complete structures (like an invoice with all associated data such as customer address, sender address, product positions, tax calculations etc.) that are only valid at a certain point in time.
All these data will change in the future, new tax regulations might come, the address of a customer changes, products go out of life. But still the invoice needs to be valid and stored.
This is possible with serialization. It works like a snapshot. The objects in memory are serialized into a form that can simply be stored (in PHP, a string that may contain binary data). It can be brought back to life later on (and in a different context). Take the invoice example: in ten years, the data can still be read and the invoice object is the same as it was ten years earlier.
In other words, where do you need serialization such that without it your code would be missing some important feature?
That was one example. It's not that you always need it, but when things become more complex, serialization can be helpful.
Since you've tagged it with javascript, one kind of serialization could be form serialization.
Here are the references for the jQuery and prototype.JS equivalents.
What they basically do is serialize form input values into a URL-encoded string of name=value pairs joined by ampersands.
So considering an actual usage..
$.ajax({
url : 'insert.php?a=10&b=15', // values serialized via .serialize()
type: 'GET'
});
And you would probably use $_GET["a"] to retrieve those values; I'm not familiar with PHP though.
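On the PHP side, a request to insert.php fills $_GET automatically. To see the same decoding outside a web request, here is a sketch that uses parse_str() on the query string from the example above; the variable names are only for illustration.

```php
<?php
// The same query string a serialized form with fields a and b would produce.
$query = 'a=10&b=15';

// parse_str() decodes it the way PHP fills $_GET for an incoming request.
parse_str($query, $params);

var_dump($params['a']); // string(2) "10"
var_dump($params['b']); // string(2) "15"

// http_build_query() performs the reverse step.
var_dump(http_build_query(['a' => 10, 'b' => 15])); // string(9) "a=10&b=15"
```

The values arrive as strings, so cast them if you need integers.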
I have a PHP script that fetches a relatively large amount of data, and formats it as HTML unordered lists for use in an Ajax application.
Considering the data is in the order of tens to possibly more than a hundred KB, and that I want to be able to differentiate between the different lists with Javascript, what would be the best way to go about doing this?
I thought about json_encode, but that results in [null] when more than a certain amount of rows are requested (maybe PHP memory limit?).
Thanks a lot,
Fela
Certain illegal characters in the string could be breaking the json_encode() function in PHP which you will need to sanitize before this will work correctly. You could do this using regular expressions if this becomes a problem.
However, if you are sending requests with that amount of data it may be unwise to send this using AJAX as your application will seem very unresponsive. It may be better to get this data directly from the database as this would be a far faster method although you will have to obviously compromise.
I cannot click up, but I agree with Daniel West.
Ensure your strings are UTF-8 encoded, or use mysql_set_charset('utf8') when you connect. The default charset for MySQL is unfortunately Latin-1 (Windows-style). The null is the result of a failed encoding caused by this; if the script had run out of memory, the script itself would have failed.
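A minimal sketch of that failure and one possible fix, assuming the data really is ISO-8859-1 and the mbstring extension is available. On current PHP versions json_encode() returns false for invalid UTF-8 rather than the null described above, and json_last_error() reports the cause.

```php
<?php
// "café" as ISO-8859-1 bytes: \xE9 alone is not valid UTF-8.
$latin1 = "caf\xE9";

$failed = json_encode(['name' => $latin1]);
var_dump($failed);                                // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)

// Convert to UTF-8 first (assumes the source really is ISO-8859-1).
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$ok = json_encode(['name' => $utf8]);
echo $ok, "\n"; // {"name":"caf\u00e9"}
```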
I would pass the data around via JSON, and do it in small batches that you can present incrementally. This will make it appear faster and may get around the memory issues you are having. If the data above the scroll (the first 40 lines or so) loads quickly, it is OK if the rest of the page takes several seconds to load. If you want to get tricky, you can even load the first page and then wait for a scroll event to load the rest, so you don't have to hit the server too much if the user never scrolls to look at the data below the scrollbar. If PHP is returning null from json_encode, it is because of invalid characters. If you can't control the data, you could just send HTML from the server to the client and avoid all the encoding/decoding, but this means more data to transfer.
With jquery you can transform your unordered list into a javascript array in your ajax application.
$.map( $('li'), function (element) { return $(element).text() });
Also, Underscore.js has some very neat functions for JavaScript arrays and collections.
http://documentcloud.github.com/underscore/
I would prefer JSON, and I think the 'null' you get from PHP's encoder results from JSON's double-escaping mechanism with JavaScript. I have explained it in another post.
You need to double-escape special characters (one suspicious cause of the 'null' in your case is '\n').
json parse error with double quotes
I'm writing a command line application in PHP that accepts a path to a local input file as an argument. The input file will contain one of the following things:
JSON encoded associative array
A serialized() version of the associative array
A base 64 encoded version of the serialized() associative array
Base 64 encoded JSON encoded associative array
A plain old PHP associative array
Rubbish
In short, there are several dissimilar programs that I have no control over that will be writing to this file, in a uniform way that I can understand, once I actually figure out the format. Once I figure out how to ingest the data, I can just run with it.
What I'm considering is:
If the first byte of the file is { , try json_decode(), see if it fails.
If the first byte of the file is < or $, try include(), see if it fails.
if the first three bytes of the file match a:[0-9], try unserialize().
If not the first three, try base64_decode(), see if it fails. If not:
Check the first bytes of the decoded data, again.
If all of that fails, it's rubbish.
That just seems quite expensive for quite a simple task. Could I be doing it in a better way? If so, how?
There isn't much to optimize here. The magic bytes approach is already the way to go. But of course the actual deserialization functions can be avoided. It's feasible to use a verification regex for each instead (which despite the meme are often faster than having PHP actually unpack a nested array).
base64 is easy enough to probe for.
json can be checked with a regex. The question "Fastest way to check if a string is JSON in PHP?" shows the RFC version, which was written for securing it in JS. But it would be feasible to write a complete json (?R) match rule.
serialize is a bit more difficult without a proper unpack function. But with some heuristics you can already assert that it's a serialize blob.
php array scripts can be probed a bit faster with token_get_all. Or if the format and data is constrained enough, again with a regex.
The more important question here is, do you need reliability - or simplicity and speed?
For speed, you could use the file(1) utility and add "magic numbers" in /usr/share/file/magic. It should be faster than a pure PHP alternative.
You can try json_decode() and unserialize() which will return NULL if they fail, then base64_decode() and run that again. It's not fast, but it's infinitely less error prone than hand parsing them...
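The try-then-fall-back approach could be sketched roughly as follows. The function names and format labels are invented for this example, the include() case for plain PHP arrays is deliberately left out because executing an untrusted file is risky, and the serialize check reuses the a:[0-9] prefix idea from the question.

```php
<?php
// Hypothetical helper: try JSON first, then PHP's serialize format.
function try_decode(string $raw): ?array
{
    $json = json_decode($raw, true);
    if (is_array($json)) {
        return ['json', $json];
    }
    if (preg_match('/^a:\d+:\{/', $raw)) {
        // @ hides the notice unserialize() raises on malformed input.
        $data = @unserialize($raw, ['allowed_classes' => false]);
        if (is_array($data)) {
            return ['serialize', $data];
        }
    }
    return null;
}

// Returns [format label, decoded array or null].
function detect_format(string $raw): array
{
    $raw = trim($raw);
    $direct = try_decode($raw);
    if ($direct !== null) {
        return $direct;
    }
    // Strict mode rejects characters outside the base64 alphabet.
    $decoded = base64_decode($raw, true);
    if ($decoded !== false && $decoded !== '') {
        $inner = try_decode($decoded);
        if ($inner !== null) {
            return ['base64+' . $inner[0], $inner[1]];
        }
    }
    return ['rubbish', null];
}

print_r(detect_format(base64_encode(serialize(['a' => 1]))));
```

Each input is decoded at most twice per format, so the cost is likely small next to reading the file itself.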
The issue here is that if you have no idea which format it can be, you will need to develop a detection algorithm. Conventions should be set with an extension (check the extension; if it fails, tell whoever put the file there to use the correct extension), otherwise you will need to check yourself. Most algorithms that detect what type a file actually is use heuristics to determine its contents (exe, jpg etc.), because such files generally have some sort of signature that identifies them. So if you don't know for certain what the content will be, it's best to look for features that are specific to each kind of content. This does sometimes mean reading more than a couple of bytes.
So I need to encode an array in PHP and store it in plain text in MySQL database, my question is should I use serialize() or json_encode()? What are the advantages and disadvantages of each of them?
I think either of them would do in this situation. But which one would you prefer and why? If it is for something other than an array?
Main advantage of serialize : it's specific to PHP, which means it can represent PHP types, including instances of your own classes -- and you'll get your objects back, still instances of your classes, when unserializing your data.
Main advantage of json_encode : JSON is not specific to PHP : there are libraries to read/write it in several languages -- which means it's better if you want something that can be manipulated with another language than PHP.
A JSON string is also easier to read/write/modify by hand than a serialized one.
On the other hand, as JSON is not specific to PHP, it's not aware of the stuff that's specific to PHP -- like data-types.
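To illustrate the difference, here is a short sketch with a made-up Point class. After a serialize() round trip the value is still a Point, while after a JSON round trip it comes back as a generic stdClass.

```php
<?php
// Hypothetical class used only for this comparison.
class Point
{
    public $x;
    public $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

$point = new Point(1, 2);

// serialize() records the class name, so the type survives.
$viaSerialize = unserialize(serialize($point));
var_dump($viaSerialize instanceof Point); // bool(true)

// JSON only carries the public data, not the PHP class.
$viaJson = json_decode(json_encode($point));
var_dump($viaJson instanceof Point);    // bool(false)
var_dump($viaJson instanceof stdClass); // bool(true)
```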
As a couple of sidenotes :
Even if there is a small difference in speed between those two, it shouldn't matter much : you will probably not serialize/unserialize a lot of data
Are you sure this is the best way to store data in a database ?
You won't be able to run many queries on serialized strings in a DB: you will not be able to use your data in WHERE clauses, nor update it without the intervention of PHP...
I did some analysis of JSON encoding vs. serialization in PHP, and I found that JSON is best for plain and simple data like arrays.
See the results of my experiments at https://www.shozab.com/php-serialization-vs-json-encoding-for-an-array/
Another advantage of json_encode over serialize is the size. I noticed this while I was trying to figure out why our memcache memory usage was getting so big, and was trying to find ways to reduce it:
<?php
$myarray = array();
$myarray["a"]="b";
$serialize=serialize($myarray);
$json=json_encode($myarray);
$serialize_size=strlen($serialize);
$json_size=strlen($json);
var_dump($serialize);
var_dump($json);
echo "Size of serialized array: $serialize_size\n";
echo "Size of json encoded array: $json_size\n";
echo "Serialize is " . round(($serialize_size-$json_size)/$serialize_size*100) . "% bigger\n";
Which gives you:
string(22) "a:1:{s:1:"a";s:1:"b";}"
string(9) "{"a":"b"}"
Size of serialized array: 22
Size of json encoded array: 9
Serialize is 59% bigger
Obviously I've taken the most extreme example: the shorter the array, the more significant the overhead of serialize (relative to the initial object size, because its formatting imposes a minimum number of characters no matter how small the content). Still, on a production website I see serialized arrays that are 20% bigger than their JSON equivalents.
Well firstly serializing an array or object and storing it in a database is typically a code smell. Sometimes people end up putting a comma separated list into a column and then get into all sorts of trouble when they later find out they need to query on it.
So think very carefully about that if this is that kind of situation.
As for the differences. PHP serialize is probably more compact but only usable with PHP. JSON is cross-platform and possibly slower to encode and decode (although I doubt meaningfully so).
If your data will never have to leave your PHP application, I recommend serialize() because it offers a lot of extra functionality, like the __sleep() and __wakeup() methods for your objects. It also restores objects as instances of the correct classes.
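As a rough sketch of those two hooks, consider the made-up Report class below: __sleep() chooses which properties are written out, and __wakeup() runs right after the object has been rebuilt.

```php
<?php
// Hypothetical class; property names are invented for the example.
class Report
{
    public $title;
    public $cache = [];
    public $restored = false;

    public function __construct(string $title)
    {
        $this->title = $title;
    }

    // Called by serialize(): only the listed properties are stored.
    public function __sleep()
    {
        return ['title'];
    }

    // Called by unserialize(): a place to rebuild transient state.
    public function __wakeup()
    {
        $this->cache = [];
        $this->restored = true;
    }
}

$report = new Report('Q1');
$report->cache = ['expensive' => 'result'];

$copy = unserialize(serialize($report));
var_dump($copy->title);    // string(2) "Q1"
var_dump($copy->cache);    // empty array: the cache was not stored
var_dump($copy->restored); // bool(true)
```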
If you will pass the serialized data to another application, you should use JSON or XML for compatibility.
But storing a serialized object in a database? Maybe you should think about that again. It can cause real trouble later.
First, thanks to Shozab Hasan and user359650 for these tests. I was wondering which choice was the best and now I know:
To encode a simple array, use JSON, which works with both PHP and JavaScript, and probably other languages.
To encode a PHP object, serialize is a better choice, because PHP objects can only be instantiated by PHP.
To store data, either store the encoded data in a file or use MySQL with a standard format. It would be much easier to get your data back. MySQL has great functions to retrieve data the way you'd like without any PHP processing.
I've never run any tests, but I think that file storage is the best way to store your data if file system sorting is enough to get your files back in alphabetical/numerical order.
MySQL is too greedy for this kind of processing, and it uses the file system too...