Is PHP's serialize function compatible with UTF-8? - php

I have a site I want to migrate from ISO to UTF-8.
I have a record in the database indexed by the following primary key:
s:22:"Informations générales";
The problem is that now (with UTF-8), when I serialize the string, I get:
s:24:"Informations générales";
(notice that the size is now the number of bytes, not the string length)
So this is not compatible with the previous non-UTF-8 records!
Did I do something wrong? How can I fix this?
Thanks

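The behaviour is correct. serialize() records the length of each string in bytes, and the same text in two encodings is two different byte streams: "Informations générales" is 22 bytes in ISO-8859-1 but 24 bytes in UTF-8, where each é takes two bytes. That gives two different serialization strings.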
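To see those byte counts directly, here is a small sketch (PHP 7+, for the \u{...} escape) that builds the question's key once as ISO-8859-1 bytes and once as UTF-8 bytes, then serializes both. The variable names are illustrative only.

```php
<?php
// "Informations générales" encoded as ISO-8859-1: é is the single byte 0xE9
$latin1 = "Informations g\xE9n\xE9rales";
// The same text encoded as UTF-8: é is the two bytes 0xC3 0xA9
$utf8 = "Informations g\u{e9}n\u{e9}rales";

echo strlen($latin1), "\n"; // 22 (strlen() counts bytes, not characters)
echo strlen($utf8), "\n";   // 24

// serialize() writes that byte count into the s:N: prefix
echo substr(serialize($latin1), 0, 5), "\n"; // s:22:
echo substr(serialize($utf8), 0, 5), "\n";   // s:24:
```

Neither result is wrong; they are simply different byte strings, which is why old Latin-1 records no longer match newly serialized UTF-8 ones.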

Dump the database in latin1.
On the command line:
sed -e 's/latin1/utf8/g' -i ./DBNAME.sql
Import the converted file into a new UTF-8 database.
Use a PHP script to update each field.
Query the rows, loop through each field, and recompute the lengths in each serialized string using this:
$str = preg_replace_callback('!s:(\d+):"(.*?)";!s',
    function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; }, $str);
After that, I was able to use unserialize() and everything worked with UTF-8.
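The steps above can be tried end to end on a single value. In this sketch, the hypothetical latin1_to_utf8() helper stands in for the dump/sed/import round trip (it converts ISO-8859-1 bytes to UTF-8 by hand, so it needs neither mbstring nor iconv, and it does not handle Windows-1252), and the hypothetical fix_serialized_lengths() helper recomputes each s:N: prefix with preg_replace_callback(). The sample record is made up.

```php
<?php
// Hypothetical helper: ISO-8859-1 -> UTF-8 without mbstring/iconv.
// A byte >= 0x80 is code point U+0080..U+00FF, encoded as two UTF-8 bytes.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80
            ? $s[$i]
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical helper: rewrite every s:N:"..." so N is the real byte count.
function fix_serialized_lengths(string $str): string
{
    return preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function (array $m): string {
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $str
    );
}

// A record serialized while the site was still Latin-1 (22-byte string)
$old = "a:1:{s:5:\"title\";s:22:\"Informations g\xE9n\xE9rales\";}";

$converted = latin1_to_utf8($old);            // bytes are UTF-8, prefix still says 22
$fixed     = fix_serialized_lengths($converted); // prefix now says 24

var_dump(@unserialize($converted)); // bool(false): s:22 no longer matches the bytes
var_dump(unserialize($fixed));      // the array, with "title" => "Informations générales"
```

Like the original regex, this assumes no string value itself contains the sequence `";`, since that would end the non-greedy match early.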

To unserialize a serialized array that was written before the switch to UTF-8 (so its lengths are Latin-1 byte counts):
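$array = @unserialize($arrayFromDatabase);
if ($array === false) {
    // the stored lengths count Latin-1 bytes, so decode to Latin-1 first
    // (utf8_decode()/utf8_encode() are deprecated as of PHP 8.2)
    $array = @unserialize(utf8_decode($arrayFromDatabase));
    if (is_array($array)) {
        // re-encode each value as UTF-8 (flat arrays of strings only)
        $array = array_map('utf8_encode', $array);
    }
}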

PHP 4 and 5 do not have built-in Unicode support: strings are plain byte sequences. PHP 6 was meant to add native Unicode support, but it was abandoned and never released, and PHP 7 and later still treat strings as bytes.

You did nothing wrong. PHP's strings are not Unicode-aware, so PHP doesn't support Unicode unless you make it (e.g., via the mbstring extension or other means).
We wrote our own wrapper around serialize() to remedy this. You could also move to another serialization technique, such as JSON (json_encode() and json_decode() are available since PHP 5.2.0).
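A minimal sketch of the JSON route, assuming the data is plain arrays and scalars that are already valid UTF-8 (json_encode() rejects invalid UTF-8). JSON_UNESCAPED_UNICODE needs PHP 5.4+, and the \u{...} escape PHP 7+. The $record array is illustrative.

```php
<?php
// JSON stores no byte lengths, so re-encoding the text cannot break it
$record = ['title' => "Informations g\u{e9}n\u{e9}rales"];

// JSON_UNESCAPED_UNICODE keeps é readable instead of writing \u00e9
$json = json_encode($record, JSON_UNESCAPED_UNICODE);
echo $json, "\n"; // {"title":"Informations générales"}

// true => decode JSON objects into associative arrays
$decoded = json_decode($json, true);
var_dump($decoded === $record); // bool(true)
```

The trade-off: unlike serialize(), JSON does not preserve PHP object classes, so objects come back as arrays (or stdClass).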

Related

Weird encoding issue in XML-RPC call

I'm retrieving from Odoo 9 on Ubuntu 14.04 ENG a list of partners via XML-RPC using PHP and ripcord
Some names contain one or more diacritics:
Pièr
Frère Pièr
All those names have been entered from a single computer running Windows 8.1 using one version of Chrome.
The strange fact is that I get a list where some diacritics are correct, some other have encoding problems, like:
Pi�r
Fr�re Pièr
The same diacritic in the same string may be encoded correctly or not.
In subsequent calls the result is always the same.
If I edit the string, then it could change the results, giving
Frère Pi�r
Frère Pièr
Fr�re Pi�r...
I need to output JSON, and thus I need to encode this in UTF-8, but that is currently impossible since I have no clue what encoding the original text is in (it seems to not have any encoding at all!)
Any idea?
I found out that the incoming array was in the "Latin1" charset.
I solved it by normalizing the array generated from the XML-RPC output, recursively applying a multibyte conversion function:
// given an XML-RPC output named $arr_output...
function descramble_diacritics(&$entry, $key) {
    // convert strings only, so ints and booleans keep their JSON types
    if (is_string($entry)) {
        $entry = mb_convert_encoding($entry, 'UTF-8', 'Latin1');
    }
}
array_walk_recursive($arr_output, 'descramble_diacritics');
header('Access-Control-Allow-Origin: *');
header('Content-Type: application/json');
echo json_encode($arr_output);
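If the XML-RPC output could mix values that are already UTF-8 with values in Latin-1 (as the question's symptoms suggest), converting everything would double-encode the good ones. A hedged variant converts a string only when mb_check_encoding() says it is not valid UTF-8. This is a heuristic: it works per value (not inside a single value that mixes both), and a Latin-1 string can occasionally happen to be valid UTF-8. The function name and the sample array are hypothetical; mbstring is required, as in the answer.

```php
<?php
// Hypothetical variant of descramble_diacritics(): only re-encode strings
// that are not already valid UTF-8, leaving non-strings untouched.
function to_utf8_if_needed(&$entry, $key)
{
    if (is_string($entry) && !mb_check_encoding($entry, 'UTF-8')) {
        $entry = mb_convert_encoding($entry, 'UTF-8', 'ISO-8859-1');
    }
}

// Sample input mixing both cases (made up, not real Odoo output)
$arr_output = [
    ['name' => "Pi\xE8r", 'id' => 7],               // Latin-1 bytes
    ['name' => "Fr\u{e8}re Pi\u{e8}r", 'id' => 8], // already UTF-8
];

array_walk_recursive($arr_output, 'to_utf8_if_needed');
echo json_encode($arr_output, JSON_UNESCAPED_UNICODE), "\n";
// [{"name":"Pièr","id":7},{"name":"Frère Pièr","id":8}]
```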

PHP & OCI query returns NUMBER columns as STRING

I'm using PHP5 and OCI 8 with Oracle 11g.
When I fetch a row using oci_fetch_all, every value comes back as a STRING, even for NUMBER columns and even if I use Oracle's TO_NUMBER in the query.
What I'm trying to do is simple: JavaScript calls the PHP script through an Ajax request. The script just fetches some NUMBER data and encodes it as JSON. I want the data to be encoded as integers, so the JavaScript can do math on it (add, divide, etc.) without any conversion.
I am pretty sure that the problem comes from OCI and not from the JSON encoding, because when I var_dump the result of oci_fetch_all, I can clearly see double quotes around every value:
{
"COLUMN1":"12",
"COLUMN2":"52"
}
I want the result to look like this:
{
"COLUMN1":12,
"COLUMN2":52
}
I tried to:
Change the flag of oci_fetch_all (OCI_FETCHSTATEMENT_BY_ROW, OCI_FETCHSTATEMENT_BY_COLUMN...)
Use oci_fetch_array instead of oci_fetch_all
Remove the UTF8 encoding on the connection to Oracle (I know, it's stupid)
The strange thing is that I can't find anything on the internet about this problem... It's like nobody has faced the same issue. Maybe I'm doing something wrong...
Thanks in advance
You can use an extra option in json_encode:
json_encode($rows, JSON_NUMERIC_CHECK);
However, this option requires PHP 5.3.3 or higher (so it's fine for you).
Many database extensions in PHP work like this (OCI8 included), and there's nothing you can do about it in the driver.
You'll have to type-cast the database results manually.
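Both answers can be compared without an Oracle connection. This sketch uses made-up rows shaped like oci_fetch_all() output with OCI_FETCHSTATEMENT_BY_ROW (every NUMBER value is a string) and a hypothetical list of numeric columns. The explicit cast avoids a JSON_NUMERIC_CHECK side effect: that flag converts every numeric-looking string, so a zero-padded code such as "007" would also become 7.

```php
<?php
// Rows shaped like oci_fetch_all(..., OCI_FETCHSTATEMENT_BY_ROW) output;
// the data is made up.
$rows = [
    ['COLUMN1' => '12', 'COLUMN2' => '52'],
    ['COLUMN1' => '7',  'COLUMN2' => '3.5'],
];

// Option 1: json_encode() converts every numeric-looking string (PHP 5.3.3+)
$json1 = json_encode($rows, JSON_NUMERIC_CHECK);

// Option 2: cast only the columns known to be NUMBER (hypothetical list)
$numericColumns = ['COLUMN1', 'COLUMN2'];
$typed = array_map(function (array $row) use ($numericColumns): array {
    foreach ($numericColumns as $col) {
        if (isset($row[$col]) && is_numeric($row[$col])) {
            $row[$col] = $row[$col] + 0; // "12" -> int 12, "3.5" -> float 3.5
        }
    }
    return $row;
}, $rows);
$json2 = json_encode($typed);

echo $json2, "\n"; // [{"COLUMN1":12,"COLUMN2":52},{"COLUMN1":7,"COLUMN2":3.5}]
```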

PHP json_encode assigns null instead of value?

I have a CSV file that looks like this:
http://ideone.com/YWuuWx
I read the file and convert it to an array, which works completely fine, but when I JSON-encode the array, json_encode doesn't put in the real values; it puts null. Here is the dump of the array and of the JSON-encoded array:
http://jave.jecool.net/stackoverflowdemos/csv_to_json_to_arraydump.php
I convert like this: $php_array= json_encode($json_array,JSON_PRETTY_PRINT);
Does anyone know what might cause the problem?
EDIT: I think there is about a 90% chance that it's caused by the latin1 characters. Does anyone know the best workaround?
Assuming that it is in fact an encoding error, and that your data is actually encoded in some ISO-8859 variant (I'm guessing latin2 rather than latin1 based on your use of LATIN SMALL LETTER R WITH CARON), and that it is CONSISTENTLY so, you can use iconv() to re-encode it as UTF-8 before doing json_encode():
$foo = iconv('ISO-8859-2', 'UTF-8', $foo);
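A short sketch of both the symptom and the fix, assuming (as the answer does) that the data is ISO-8859-2, where byte 0xF8 is ř. With invalid UTF-8, json_encode() fails outright, and with JSON_PARTIAL_OUTPUT_ON_ERROR (PHP 5.5+) it substitutes null for the bad value, which matches the nulls in the question. The one-field $row is made up.

```php
<?php
// "ř" (LATIN SMALL LETTER R WITH CARON) is the single byte 0xF8 in ISO-8859-2
$row = ['name' => "\xF8"];

var_dump(json_encode($row));                     // bool(false): not valid UTF-8
var_dump(json_last_error() === JSON_ERROR_UTF8); // bool(true)
echo json_encode($row, JSON_PARTIAL_OUTPUT_ON_ERROR), "\n"; // {"name":null}

// Re-encode every string from ISO-8859-2 to UTF-8 before json_encode()
array_walk_recursive($row, function (&$value) {
    if (is_string($value)) {
        $value = iconv('ISO-8859-2', 'UTF-8', $value);
    }
});
echo json_encode($row, JSON_UNESCAPED_UNICODE), "\n"; // {"name":"ř"}
```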

PHP json_decode returns null

I'm writing PHP code that uses a database. To do so, I use an array as a hash-map.
Every time content is added or removed from my DB, I save it to file.
I'm forced by my DB structure to use this method and can't use MySQL or any other standard DB (it's a school project, so the structure stays as is).
I built two functions:
function saveDB($db){
$json_db = json_encode($db);
file_put_contents("wordsDB.json", $json_db);
} // saveDB
function loadDB(){
$json_db = file_get_contents("wordsDB.json");
return json_decode($json_db, true);
} // loadDB
When I echo the string I get after encoding, or after loading it from the file, I get valid JSON (tested it in a JSON viewer). Whenever I try to decode the string using json_decode(), I get null (tested it with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single/double-quotes with double/single-quotes (Without any backslashes, with one backslash and three backslashes. And any combination I could think of with a different number of backslashes in the original and replaced string), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (Right after encoding).
Checked for slashes and backslashes. None to be found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't work. I get no error number (I get null, not 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
In your JSON file change null to "null" and it will solve the problem.
Check if your file is UTF8 encoded. json_decode works with UTF8 encoded data only.
EDIT:
After I saw the uploaded JSON data, I did some digging and found that there is a null key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to a valid property name and json_decode will do the job.
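Instead of only seeing null, the decoder can be asked why it failed. This is a sketch of a variant of the question's loadDB() that throws with json_last_error_msg() (PHP 5.5+); the $path parameter and the trimmed sample file are assumptions modeled on the answer above. On PHP 7.3+, passing JSON_THROW_ON_ERROR to json_decode() achieves the same.

```php
<?php
// Variant of loadDB() that reports why decoding failed
// instead of silently returning null. $path is an assumed parameter.
function loadDB(string $path)
{
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $db = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // e.g. "Syntax error" for an unquoted null used as an object key
        throw new RuntimeException('json_decode failed: ' . json_last_error_msg());
    }
    return $db;
}

// Reproduce the problem with a temporary file containing a bare null key
$tmp = tempnam(sys_get_temp_dir(), 'wordsDB');
file_put_contents($tmp, '{"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":1}}');
try {
    loadDB($tmp);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // json_decode failed: Syntax error
}
unlink($tmp);
```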
I had a similar problem last week. My JSON was valid according to jsonlint.com.
My JSON string contained a # and a &, and those two made json_decode fail and return null.
By using var_dump(json_decode($myvar)), which stops right where it fails, I managed to figure out where the problem was coming from.
I suggest var_dumping and using a find function to look for these kinds of characters.
Just on the off chance, and more for anyone hitting this thread than for the OP's issue: I missed that someone had called htmlentities($json) way above me in the call stack. Make sure you haven't been bitten by the same thing, and check the HTML source.
Kickself #124
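A tiny reproduction of that pitfall, with a made-up JSON string: htmlentities() turns every double quote into &quot;, so the text is no longer JSON and json_decode() returns null. Removing the stray call is the real fix; html_entity_decode() appears here only to confirm the diagnosis.

```php
<?php
$json = '{"title":"Informations"}';

// An htmlentities() call somewhere up the call stack rewrites the quotes
$escaped = htmlentities($json);
echo $escaped, "\n"; // {&quot;title&quot;:&quot;Informations&quot;}

var_dump(json_decode($escaped));                   // NULL: no longer valid JSON
var_dump(json_last_error() === JSON_ERROR_SYNTAX); // bool(true)

// Undoing the escaping confirms that was the only problem
var_dump(json_decode(html_entity_decode($escaped), true));
```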

How do I get PHP to accept ISO-8859-1 characters in general?

This has been bugging me for ages and I want to get to the bottom of this once and for all. I have an associative array whose keys I have defined using ISO-8859-1 characters. For instance:
array("utført" => "red");
I also have another array that I have loaded from a file. I have printed this array out in a browser, checking that values like Æ, Ø and Å are intact. When I try to compare two fields from these arrays, I'm slapped with the message:
Undefined index: utfã¸rt on line 39
I can't help but sob. Every single damn time I involve any letters outside ASCII in a script, they are at some point converted into ã¸r or similar nonsense.
My script file is encoded in ISO-8859-1, the document from which I'm loading my data is the same, and so is the MySQL table I'm trying to save the data to.
So the only conclusion I can draw is that PHP isn't accepting just any character set in its code, and I have to somehow force PHP to speak Norwegian.
Thanks for any suggestions
Just FYI, I won't accept any answers along the lines of "Just don't use those characters" or "Just replace those characters with UTF equivalents at file load" or any other hack solutions.
When you read your data from the external file, try to convert it to the proper encoding.
Something like this is what I have in mind:
$f = file_get_contents('externaldata.txt');
$f = mb_convert_encoding($f, 'ISO-8859-1', 'UTF-8'); // assumes the file is actually UTF-8
// from this point on, deal with $f however you want
Also, see the mb_convert_encoding() manual for more info.
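Why the lookup fails can be shown in a few lines. In UTF-8, ø is the two bytes 0xC3 0xB8, which read as ISO-8859-1 are "Ã¸", the kind of garbage in the error message; in ISO-8859-1 it is the single byte 0xF8. So the "same" key is two different array keys. This sketch assumes the external data is UTF-8 and the script is ISO-8859-1, as in the question, and needs mbstring like the answer.

```php
<?php
$latin1Key = "utf\xF8rt";    // ISO-8859-1: ø is one byte (0xF8), 6 bytes total
$utf8Key   = "utf\u{f8}rt";  // UTF-8: ø is two bytes (0xC3 0xB8), 7 bytes total

$colors = [$latin1Key => 'red']; // key defined in an ISO-8859-1 script

var_dump(isset($colors[$utf8Key])); // bool(false): different bytes, different key

// Bring the external value into the script's encoding before using it as a key
$normalized = mb_convert_encoding($utf8Key, 'ISO-8859-1', 'UTF-8');
var_dump(isset($colors[$normalized])); // bool(true)
```

Whichever encoding you pick, the point is to use one encoding everywhere (script, data files, database connection) so that equal text means equal bytes.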
