json_encode charset problem - PHP

When I use json_encode to encode my multilingual strings, it also changes special characters. What should I do to keep them the same?
For example:
<?php
echo json_encode(array('şüğçö'));
It returns something like ["\u015f\u00fc\u011f\u00e7\u00f6"]
But I want ["şüğçö"]

Try this:
<?php
echo json_encode(array('şüğçö'), JSON_UNESCAPED_UNICODE);

In JSON any character in strings may be represented by a Unicode escape sequence. Thus "\u015f\u00fc\u011f\u00e7\u00f6" is semantically equal to "şüğçö".
Although those characters can also be used unescaped, json_encode probably escapes them by default to avoid character-encoding issues.
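To check that the two forms really are interchangeable, decode both and compare them. This is a quick sketch that assumes the PHP file is saved as UTF-8, so the literal 'şüğçö' holds UTF-8 bytes.

```php
<?php
// The escaped form exactly as json_encode() produces it (single quotes keep the backslashes)
$escaped = '["\u015f\u00fc\u011f\u00e7\u00f6"]';
// The same array written with the raw UTF-8 characters
$raw = '["şüğçö"]';

// Both decode to the same PHP array
var_dump(json_decode($escaped) === json_decode($raw)); // bool(true)
```

Any conforming JSON parser, not just PHP's, treats the two documents as identical.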

PHP 5.4 adds the option JSON_UNESCAPED_UNICODE, which does what you want. Note that json_encode always outputs UTF-8.
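A minimal sketch of the difference the flag makes, assuming the input is already valid UTF-8 and the file is saved as UTF-8. The `defined()` guard is only there so the same script also runs on PHP 5.3, where the constant does not exist.

```php
<?php
$data = array('şüğçö');

// Default: non-ASCII characters become \uXXXX escapes
echo json_encode($data), "\n"; // ["\u015f\u00fc\u011f\u00e7\u00f6"]

// PHP 5.4+: keep them as raw UTF-8
if (defined('JSON_UNESCAPED_UNICODE')) {
    echo json_encode($data, JSON_UNESCAPED_UNICODE), "\n"; // ["şüğçö"]
}
```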

You shouldn't need this, but it's definitely possible, even without PHP 5.4.
First, use json_encode() to encode the string and save it in a variable.
Then use preg_replace_callback() to replace every \uXXXX sequence with the corresponding UTF-8 character.
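A minimal sketch of the encode-then-replace approach for PHP 5.3. Both function names are hypothetical helpers, not part of PHP. Two details make it safer than a bare `\\u(....)` replacement: it walks the escapes left to right, so an escaped backslash followed by the letters `u00fc` is not mistaken for a Unicode escape, and it recombines the surrogate pairs that json_encode() emits for characters beyond U+FFFF. ASCII escapes such as `\u0001` are left alone, since unescaping them would produce invalid JSON.

```php
<?php
// Hypothetical helper: encode a code point as UTF-8 bytes (mb_chr() needs PHP 7.2+).
function utf8_from_codepoint($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: json_encode() without \uXXXX escapes for non-ASCII text,
// for PHP versions where JSON_UNESCAPED_UNICODE does not exist.
function json_encode_unescaped_unicode($value)
{
    $json = json_encode($value);
    // Every backslash in json_encode() output starts an escape, so matching escapes
    // left to right keeps "\\" (an escaped backslash) from being misread.
    // Alternatives: surrogate pair | single \uXXXX | any other escape (\" \\ \/ \n ...)
    $pattern = '/\\\\u(d[89ab][0-9a-f]{2})\\\\u(d[c-f][0-9a-f]{2})|\\\\u([0-9a-f]{4})|\\\\./is';

    return preg_replace_callback($pattern, function ($m) {
        if (isset($m[1]) && $m[1] !== '') {
            // Characters beyond U+FFFF arrive as a UTF-16 surrogate pair
            $cp = 0x10000 + ((hexdec($m[1]) - 0xD800) << 10) + (hexdec($m[2]) - 0xDC00);
            return utf8_from_codepoint($cp);
        }
        if (isset($m[3]) && $m[3] !== '') {
            $cp = hexdec($m[3]);
            // Keep ASCII escapes (control characters etc.) and lone surrogates escaped
            if ($cp < 0x80 || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                return $m[0];
            }
            return utf8_from_codepoint($cp);
        }
        return $m[0]; // not a \u escape: keep it unchanged
    }, $json);
}

echo json_encode_unescaped_unicode(array('şüğçö')); // ["şüğçö"]
```

On PHP 5.4 and later, `json_encode($value, JSON_UNESCAPED_UNICODE)` does the same job and should be preferred.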

In versions prior to 5.4, json_encode() does not provide any option for choosing the charset of its output.

<?php
print_r(json_decode(json_encode(array('şüğçö'))));
/*
Array
(
[0] => şüğçö
)
*/
So do you really need to keep these characters unescaped in the JSON?

json_encode charset solution for PHP 5.3.3
JSON_UNESCAPED_UNICODE is not available in PHP 5.3.3 (it was added in 5.4), so we used this method instead, and it works.
$data = array(
    'text' => 'Päiväkampanjat'
);
$json = json_encode($data);
var_dump($json); // {"text":"P\u00e4iv\u00e4kampanjat"}
// Replace each \uXXXX escape with the UTF-8 character it stands for
$unescaped_data = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($matches) {
    return html_entity_decode('&#x' . $matches[1] . ';', ENT_COMPAT, 'UTF-8');
}, $json);
var_dump($unescaped_data); // {"text":"Päiväkampanjat"}

Related

Php json_encode converts utf8 string to characters codes

I have the Persian text "سرما".
When I convert it to JSON using json_encode(), I get a series of escaped character codes such as \u0633, which seems expected and reasonable. My confusion is that I don't know how to convert them back into a readable string. How should I do that in PHP?
Should I use anything from the mb_* family? I have also checked the json_encode() parameters and found nothing appropriate.
UPDATE
What gets saved in my DB is:
["u0633u0631u0645u0627"]
which shows that the backslashes have been lost. If I change it to
["\u0633\u0631\u0645\u0627"], it becomes easily readable by json_decode().
They should be converted back on the other end when the JSON is decoded. This is the safest option, as it might not be possible to guarantee that transmission or storage will not corrupt a multi-byte encoding.
If you're certain that everything is UTF-8 safe end to end, you can do:
$res = json_encode($foo, \JSON_UNESCAPED_UNICODE);
http://php.net/manual/en/function.json-encode.php
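A quick round trip shows that json_decode() alone restores the original characters, so no mb_* function is needed as long as the backslashes survive storage. This sketch assumes the file is saved as UTF-8.

```php
<?php
$text = 'سرما';

$json = json_encode($text); // "\u0633\u0631\u0645\u0627" (pure ASCII, safe to transmit)
$back = json_decode($json); // the original UTF-8 string again

var_dump($back === $text);  // bool(true)
```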
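Maybe try encoding the Unicode characters, then json_encode the result; on the other side (receiving the JSON), decode the JSON and then decode the Unicode.
Example:
//Encode
json_encode(utf8_encode($string));
//Decode
utf8_decode(json_decode($string));
Note that utf8_encode() treats its input as ISO-8859-1 (and is deprecated as of PHP 8.2), so for text that is already UTF-8, like the Persian string above, the JSON in between will contain mis-encoded characters, even though utf8_decode() restores the original bytes at the end.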
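It's simple: your problem isn't UTF-8. JSON_UNESCAPED_SLASHES is often suggested here, but that flag only stops json_encode from escaping "/"; what is actually missing are the backslashes of the \u escapes, which get lost on the way into the database.
Example:
$bar = "سرما";
$res = json_encode($bar, JSON_UNESCAPED_SLASHES);
// $res equals "\u0633\u0631\u0645\u0627"
If you check the result in your MySQL database, the backslashes are gone.
This happens when you don't use addslashes():
Example:
$bar = "سرما";
$res = json_encode($bar, JSON_UNESCAPED_SLASHES);
$res = addslashes($res);
// $res equals \"\\u0633\\u0631\\u0645\\u0627\" and is now ready to use inside a quoted MySQL string
// (a prepared statement or mysqli_real_escape_string() is the safer way to do this)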
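The stored value u0633u0631... is what remains when the backslashes are stripped on the way into the database; for example, MySQL reads an unescaped \u inside a string literal as a plain u. The sketch below uses stripslashes() only to imitate that loss; it is an illustration, not part of any real storage code.

```php
<?php
$json = json_encode('سرما');     // "\u0633\u0631\u0645\u0627"

// Imitate the lost backslashes (illustration only)
$damaged = stripslashes($json); // "u0633u0631u0645u0627"

var_dump(json_decode($json));    // string(8) "سرما"
// The damaged value is still valid JSON, but it now decodes to plain ASCII text
var_dump(json_decode($damaged)); // string(20) "u0633u0631u0645u0627"
```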

Encoding string with non-ascii characters

I have a string such as Panamá. I need to convert it to Panam\xE1 so it's readable in a JavaScript file I'm generating with PHP.
Is there a function to encode this in PHP? Any ideas would be appreciated.
My rule is: if you find yourself encoding or escaping data with preg_replace, massive mapping arrays or str_replace, STOP. You are probably doing it wrong.
All it takes is one missed or erroneous mapping (and you WILL miss some mappings), and you end up with code that doesn't work in all cases and that corrupts your data in some of them. Whole libraries (e.g. iconv) are already dedicated to doing these translations for you, and for escaping data you should use the proper PHP function.
If you plan on outputting the data to a browser (the fact that you want to encode for JavaScript suggests this), I suggest using UTF-8. If your data is in Latin-1, convert it with the utf8_encode function (deprecated as of PHP 8.2; iconv() or mb_convert_encoding() do the same job).
Whether your PHP string contains ASCII characters or not, to send any data from PHP to JS you should ALWAYS use the json_encode function.
PHP code
$your_encoding = 'latin1';
$panama = "Panamá";
// Convert the data to UTF-8 if it isn't already (this assumes $panama holds Latin-1 bytes)
$panama = iconv($your_encoding, "utf-8", $panama);
$panama_encoded = json_encode($panama);
echo "var js_panama = " . $panama_encoded . ";";
JS Output
var js_panama = "Panam\u00e1";
Even though JSON supports Unicode, raw UTF-8 output may not be compatible with a JavaScript file that isn't UTF-8 encoded. This is not a problem, because json_encode escapes non-ASCII characters as \uXXXX by default.
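The conversion step matters because json_encode() only accepts valid UTF-8. A short sketch, assuming the string arrives as Latin-1 bytes and the iconv extension is available; on current PHP versions (5.5 and later) json_encode() returns false for the raw Latin-1 input.

```php
<?php
// "Panam\xE1" is "Panamá" as Latin-1 (ISO-8859-1) bytes; a lone 0xE1 byte is not valid UTF-8
$latin1 = "Panam\xE1";

$bad = json_encode($latin1);                   // false (json_last_error() reports JSON_ERROR_UTF8)
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1); // "Panamá" as UTF-8
$good = json_encode($utf8);                    // "Panam\u00e1"

echo 'var js_panama = ' . $good . ";\n";
```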
Assuming that your input is in the latin-1 encoding then ord and dechex will do what you want:
$result = preg_replace_callback(
'/[\x80-\xff]/',
function($match) {
return '\x'.dechex(ord($match[0]));
},
$input);
If your input is in any other encoding, you would need to know which encoding it is and adapt the solution accordingly. Note that in that case the \x## notation cannot be used for every character in the JS output, because \x## only covers code points up to U+00FF.
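The Latin-1 approach above can be wrapped in a reusable function. The name `latin1_to_js_escapes` is hypothetical, the input is assumed to be ISO-8859-1, and `sprintf('%02X')` is used instead of `dechex()` only to get the upper-case `\xE1` form asked for in the question (JavaScript accepts either case).

```php
<?php
// Hypothetical wrapper around the Latin-1 approach; assumes ISO-8859-1 input.
function latin1_to_js_escapes($input)
{
    return preg_replace_callback('/[\x80-\xff]/', function ($match) {
        // Two upper-case hex digits per byte, e.g. 0xE1 -> \xE1
        return sprintf('\x%02X', ord($match[0]));
    }, $input);
}

echo latin1_to_js_escapes("Panam\xE1"); // Panam\xE1
```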
This should work for you:
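$str = "Panamá";
$str = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    // Code point of the matched character (UCS-4BE = 4 big-endian bytes)
    $cp = hexdec(bin2hex(iconv('UTF-8', 'UCS-4BE', $m[0])));
    // JS "\x" takes exactly two hex digits, so only use it up to U+00FF
    if ($cp <= 0xFF) return sprintf('\x%02X', $cp);
    if ($cp <= 0xFFFF) return sprintf('\u%04X', $cp);
    $cp -= 0x10000; // beyond the BMP: emit a UTF-16 surrogate pair
    return sprintf('\u%04X\u%04X', 0xD800 | ($cp >> 10), 0xDC00 | ($cp & 0x3FF));
}, $str);
echo $str;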
Output (Source Code):
Panam\xE1


PHP and accent characters (Ba\u015f\u00e7\u0131l)

I have a string like "Ba\u015f\u00e7\u0131l". I'm assuming those are special accented characters. How do I:
1) Display the string with the accents (i.e. replace the codes with the actual characters)?
2) What is the best practice for storing strings like this?
3) If I don't want to allow such characters, how do I replace them with "normal" characters?
My educated guess is that you obtained such values from a JSON string. If that's the case, you should properly decode the full piece of data with json_decode():
<?php
header('Content-Type: text/plain; charset=utf-8');
$data = '"Ba\u015f\u00e7\u0131l"';
var_dump( json_decode($data) );
?>
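When the escaped value is part of a larger JSON payload, decode the whole document once rather than patching individual fields. The payload below is a hypothetical example, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical payload: the escaped name arrives inside a larger JSON document
$payload = '{"name":"Ba\u015f\u00e7\u0131l","id":7}';

// Decode the whole document; true returns associative arrays instead of objects
$data = json_decode($payload, true);

echo $data['name'], "\n"; // Başçıl
```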
To display the characters look at How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
You can store the characters like that or decoded; just make sure your storage can handle the UTF-8 charset.
Use iconv() with the //TRANSLIT option.
Here's an example...
// Decode one \uXXXX escape (UCS-2BE covers characters up to U+FFFF only)
function replace_unicode_escape_sequence($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}
$str = 'Ba\u015f\u00e7\u0131l';
$str = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $str);
echo $str;
echo '<br/>';
// Transliterate to ASCII; the result depends on the current locale (LC_CTYPE)
$str = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
echo $str;
Here's another option:
<html><head>
<!-- don't forget to tell the browser what encoding you're using: -->
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
</head><body><?php
$string = "Ba\u015f\u00e7\u0131l";
echo json_decode('"'.str_replace('"', '\"', $string).'"');
?></body></html>
This works because \uXXXX is the escape syntax JSON uses. Note that json_decode() requires the JSON extension, which is now part of the standard PHP installation.
There is no native support in PHP to decode such strings.
There are several tricks using native functions, though I am not sure any of them is safe and injection-proof:
json_decode. See http://noteslog.com/post/escaping-and-unescaping-utf-8-characters-in-php/
XML parser
regex replace
If anybody has other options for escaping/unescaping UTF-8 using native functions, please post a reply.
Another option using Zend Framework is to download the Zend_Utf8 proposal class. See more information at Zend_Utf8 proposal for Zend Framework
Outputting them would output the appropriate characters. If you don't specify an encoding for the output document, the browser will try to guess the best one; otherwise, you should determine the encoding and declare it explicitly.
Simply store them as they are, or turn them into normal characters and store them as binary.
Use the iconv functions to convert from one encoding to another, and save your source file in the desired encoding so that it supports it.

Convert a JSON into a UTF-8 string

I want to convert a JSON object into a string. When I use json_encode I get a string, but all non-ASCII characters come out as hex escape codes. I want real UTF-8 instead; in other words, I want to see the characters. How do I do it?
I was using json_encode to store data such as Arabic characters in MySQL fields.
It stored the Arabic characters as hex codes within the database, like this:
u0644 u063a...
This is incorrect. You must escape the output of json_encode before inserting it, e.g. with mysql_escape_string() (the mysql extension was removed in PHP 7.0, so today use mysqli_real_escape_string() or a prepared statement).
This makes sure that the data is stored in MySQL as:
\u0644\u063a...
Then, when you use json_decode, it converts the \u escape sequences back into UTF-8 and the text is output correctly.
You can try passing an option to json_encode():
json_encode ( $value, JSON_UNESCAPED_UNICODE );
The JSON_UNESCAPED_UNICODE option is only available in PHP version 5.4.0 and later.
You can't before PHP 5.4. Besides, the strings will be the same once you decode them.
You are looking for exactly the function json_decode.
It converts the escaped JSON strings back into UTF-8.
Here is an example with an Arabic word:
$re = json_encode('لغة عربية');
echo $re, "\n";
$dd = json_decode($re);
echo $dd, "\n";
die;
It outputs:
"\u0644\u063a\u0629 \u0639\u0631\u0628\u064a\u0629"
لغة عربية
More examples here:
http://php.net/manual/en/function.json-decode.php
