If JSON data such as
{"inf": "Väri-väri"}
is saved as
{"inf": "Vu00e4ri-vu00e4ri"}
how can I recover the letters õ, ä, ö, ü, etc. throughout the whole JSON document with PHP? I have already tried utf8_decode and utf8_encode.
Thank you.
json_encode accepts option flags (see http://php.net/manual/en/json.constants.php). Try:
json_encode($myVar, JSON_UNESCAPED_UNICODE);
The problem in your case is not the JSON encoding itself, but how you store the encoded JSON document. Note what the encoded JSON document should actually look like:
$a = ["inf" => "Väri-väri"];
echo json_encode($a) . "\n";
// prints: {"inf":"V\u00e4ri-v\u00e4ri"}
This is the expected behaviour in PHP and fully consistent with the JSON spec in RFC 7159:
Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lower case. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".
However, you're losing the \ characters at some point when storing the data. A wild guess is that you're storing these strings in a relational database using SQL and are not escaping them properly. The first thing I'd suggest is to investigate how you store your data and make sure that backslashes are properly escaped when these strings are written to the database. If the document is stored correctly, json_decode will easily decode the escaped characters back to regular Unicode characters.
Alternatively, you can disable this behaviour by passing the JSON_UNESCAPED_UNICODE flag into json_encode:
echo json_encode($a, JSON_UNESCAPED_UNICODE);
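For data that has already been stored without its backslashes, as in the question, one possible repair is to put the backslash back in front of every "u" followed by four hexadecimal digits and then decode the result. This is only a heuristic sketch: restoreUnicodeEscapes is a hypothetical helper, not a PHP function, and it will also rewrite ordinary text that happens to look like "uXXXX", so check the result before overwriting anything.

```php
<?php
// Hypothetical helper: re-inserts the backslash that was lost in front of
// "uXXXX" sequences so that json_decode can interpret them as escapes again.
function restoreUnicodeEscapes(string $damagedJson): string
{
    // Match "u" plus four hex digits unless a backslash already precedes it.
    return preg_replace('/(?<!\\\\)u([0-9a-fA-F]{4})/', '\\\\u$1', $damagedJson);
}

$damaged  = '{"inf": "Vu00e4ri-vu00e4ri"}';   // the stored form from the question
$repaired = restoreUnicodeEscapes($damaged);  // {"inf": "V\u00e4ri-v\u00e4ri"}
$data     = json_decode($repaired, true);

echo $data['inf'], "\n"; // prints: Väri-väri
```

Fixing the storage step remains the real solution; the sketch above only salvages rows that are already damaged.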
Have a look at the PHP documentation. If you decode the JSON, the letters will be recovered.
Related
When using "special" Unicode characters they come out as weird garbage when encoded to JSON:
php > echo json_encode(['foo' => '馬']);
{"foo":"\u99ac"}
Why? Have I done something wrong with my encodings?
(This is a reference question to clarify the topic once and for all, since this comes up again and again.)
First of all: There's nothing wrong here. This is how characters can be encoded in JSON. It is in the official standard. It is based on how string literals can be formed in JavaScript (ECMAScript, section 7.8.4 "String Literals") and is described as such:
Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point. [...] So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".
In short: Any character can be encoded as \u...., where .... is the Unicode code point of the character (or the code point of half of a UTF-16 surrogate pair, for characters outside the BMP).
"馬"
"\u99ac"
These two string literals represent the exact same character, they're absolutely equivalent. When these string literals are parsed by a compliant JSON parser, they will both result in the string "馬". They don't look the same, but they mean the same thing in the JSON data encoding format.
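A quick way to check this equivalence yourself is to decode both literals and compare the results. The sketch below also encodes one character from outside the BMP (an emoji picked purely for illustration) to show the surrogate-pair form mentioned above.

```php
<?php
// Single quotes keep the backslash literal, so the JSON parser sees the escape.
$escaped = json_decode('"\u99ac"');
$literal = json_decode('"馬"');

var_dump($escaped === $literal); // bool(true)

// A character outside the BMP is escaped as two \u.... units,
// one for each half of its UTF-16 surrogate pair.
$pair = json_encode('😀');
echo $pair, "\n"; // prints: "\ud83d\ude00"

var_dump(json_decode($pair) === '😀'); // bool(true)
```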
By default, PHP's json_encode encodes non-ASCII characters using \u.... escape sequences. Technically it doesn't have to, but it does. And the result is perfectly valid. If you prefer to have literal characters in your JSON instead of escape sequences, you can set the JSON_UNESCAPED_UNICODE flag in PHP 5.4 or higher:
php > echo json_encode(['foo' => '馬'], JSON_UNESCAPED_UNICODE);
{"foo":"馬"}
To emphasise: this is just a preference, it is not necessary in any way to transport "Unicode characters" in JSON.