Unable to parse json file with TM symbol - php

I receive generated JSON files, and some of them contain the ™ symbol. When a file contains it, json_decode fails and var_dump($data) prints NULL. If I remove the symbol manually, the data is decoded correctly.
I am using the code below. It prints the contents of each JSON file until it reaches one that contains the ™ symbol:
$json = file_get_contents($count . '.json');
$data = json_decode($json);
echo '<pre>';
var_dump($data);
echo '</pre>';
I have tried urlencode, urldecode and htmlspecialchars, but they don't work either.

json_decode only parses UTF-8 strings.
If the file you are reading is not UTF-8 encoded, decoding fails and NULL is returned.
If you do not know the encoding of the file you will be reading,
you can convert the data to UTF-8 before parsing it, as shown in this post:
PHP: Convert any string to UTF-8 without knowing the original character set, or at least try
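As a minimal sketch of that approach, the helper below converts the input only when it is not already valid UTF-8, then reports the decoder's error instead of silently returning NULL. The function name decodeJsonLenient and the Windows-1252 fallback are assumptions made for illustration; the real source encoding of your files may differ.

```php
<?php
// Hypothetical helper: decode JSON whose source encoding is uncertain.
function decodeJsonLenient(string $raw, string $fallbackEncoding = 'Windows-1252')
{
    // Valid UTF-8 is passed through untouched. Anything else is ASSUMED
    // to be in $fallbackEncoding (this is a guess, not a detection).
    if (!mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', $fallbackEncoding);
    }

    $data = json_decode($raw, true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        // Surface the reason rather than returning a bare NULL.
        throw new RuntimeException('JSON error: ' . json_last_error_msg());
    }
    return $data;
}

// "\x99" is the ™ symbol in Windows-1252; on its own it is not valid UTF-8.
$legacy = "{\"name\":\"Acme\x99\"}";
$data = decodeJsonLenient($legacy);
echo $data['name'], "\n";
```

Calling json_last_error_msg() right after a failed json_decode is usually the quickest way to confirm that encoding, rather than syntax, is the problem.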

Related

print cyrillic string from json php

I receive a JSON response from another script. I then call $json = json_decode($json) and die($json['message']) to show a specific string, and this value contains Cyrillic text.
mb_detect_encoding() reports that the string is UTF-8.
I set charset="utf-8" in the HTML file, but
I see this output "Пользователь с этим адресом электронной почты уже существует" in my browser.
I tried mb_convert_encoding($json['message'], 'UTF-8'), without any effect.
Only var_dump($json) shows me the decoded string.
Maybe I am accessing the JSON data incorrectly?
Use mb_convert_encoding($json['message'], "utf-8", "windows-1251"); to convert the string properly.
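A small sketch of that conversion, assuming the bytes really are Windows-1251 (the sample bytes below are made up for illustration). Note the argument order: target encoding first, source encoding second.

```php
<?php
// "Привет" encoded as Windows-1251 bytes (illustrative sample data).
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";

// These bytes are not valid UTF-8, so converting is appropriate here.
// If mb_check_encoding() already reports valid UTF-8, converting again
// would corrupt the text instead of fixing it.
$isUtf8 = mb_check_encoding($cp1251, 'UTF-8');

$utf8 = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');
echo $utf8, "\n";
```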

json_decode & file_get_contents doesn't get the UTF8 characters

I use
$link = json_decode(file_get_contents("http://graph.facebook.com/111866602162732"));
the result on that page shows:
"name": "L\u00e9ry, Quebec",
I then want to convert that to the accented form, like this:
$location_name = $link->name;
echo 'NAME ORIGINAL: '.$location_name;
$location_name = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x\1;', $location_name); // convert to UTF8
echo ' NAME after: '.$location_name;
I get the following result:
NAME ORIGINAL: Léry, Quebec NAME after: Léry, Quebec
My preg_replace is correct, so it must be the original name that is being transformed by file_get_contents.
If file_get_contents doesn't give you back well-formed UTF-8 text, then json_decode will return NULL. JSON MUST be UTF-8 encoded.
From the json_decode documentation: "This function only works with UTF-8 encoded strings."
So I guess that you're reading the data with another encoding. Check it.
Most likely, you're treating the valid UTF-8 output given to you by json_decode as ISO-8859-1.
See here, for example: http://www.i18nqa.com/debug/bug-utf-8-latin1.html
Make sure that you're treating your debug output as UTF-8 - that should solve the problem.
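The effect can be reproduced in a few lines. This is only a sketch: the JSON string is a shortened stand-in for the Graph API response, and the mb_convert_encoding call simulates a viewer that misreads UTF-8 bytes as ISO-8859-1.

```php
<?php
// Single quotes keep \u00e9 literal so that json_decode sees the escape.
$json = '{"name":"L\u00e9ry, Quebec"}';

// json_decode already returns UTF-8: the "é" is the two bytes C3 A9.
$name = json_decode($json)->name;

// Simulate a browser or terminal that interprets those bytes as ISO-8859-1:
// each byte becomes its own character, producing the familiar mojibake.
$misread = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

echo bin2hex($name), "\n";
echo $misread, "\n";

// In a web page, declaring the charset before any output avoids the misreading:
// header('Content-Type: text/html; charset=utf-8');
```

No preg_replace step is needed, because the \u escapes are gone by the time json_decode returns.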

JSON encode unicode issue on PHP5.3

A Hebrew string looks like this after json_encode:
[{"id":"1","value":"\u05d1\u05dc\u05d0\u05d2\u05df"}
Any idea what encoding this is, and how I can get it to either work or be readable again?
BTW, this is a Joomla system running on PHP 5.3. The string comes from a POST request, not a database, and the UTF-8 meta tag is present.
That's just how JSON encodes non-ASCII characters. The text will be readable again when you pass it through a JSON parser.
PHP 5.4 defines a new option for json_encode, JSON_UNESCAPED_UNICODE, that would pass UTF-8 text through as-is without converting it to escape codes. Since you are using PHP 5.3 you can't use it, but if you had 5.4 this is how it would be used:
$json = json_encode($obj, JSON_UNESCAPED_UNICODE); // PHP 5.4 required
However, this should not be needed because the JSON parser will decode the escape codes.
$encoded = json_encode($json);
$unescaped = preg_replace_callback('/\\\\u(\w{4})/', function ($matches) {
return html_entity_decode('&#x' . $matches[1] . ';', ENT_COMPAT,'UTF-8');
}, $encoded);
file_put_contents('sample.json', $unescaped);
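To illustrate that the escape codes are lossless, here is a short round trip. The sample string is modeled on the one in the question, with a closing bracket added so that it is valid JSON; JSON_UNESCAPED_UNICODE requires PHP 5.4 or later.

```php
<?php
// Escaped form, as produced by json_encode with default options.
$escaped = '[{"id":"1","value":"\u05d1\u05dc\u05d0\u05d2\u05df"}]';

// Any JSON parser turns the \uXXXX sequences back into real characters.
$rows = json_decode($escaped, true);
echo $rows[0]['value'], "\n";

// Encoding again with default options reproduces the escaped form.
$again = json_encode($rows);

// With JSON_UNESCAPED_UNICODE the UTF-8 text is emitted as-is.
$readable = json_encode($rows, JSON_UNESCAPED_UNICODE);
echo $readable, "\n";
```

Both outputs decode to the same data, so the escaped form only matters if a human needs to read the raw file.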

json_encode returning null for UTF-8 charset

I have a json file like this
{"downloads":[
{
"url":"arquivo1.pdf",
"descricao":"árquivo 1"
},
{
"url":"arquivo2.pdf",
"descricao":"arquivo 2"
}
]}
I save it with UTF-8 encoding in Notepad++.
Then I get the file content:
function getContent($name)
{
$content = file_get_contents("configs/" . $name . ".json");
$encoded = utf8_encode($content);
return json_decode($encoded);
}
and json_decode returns null.
If I save the JSON file as ANSI, it works. But I'd like to save it as UTF-8.
I suspect that the initial file is either already UTF-8 or malformed.
NULL is returned if the json cannot be decoded or if the encoded data is deeper than the recursion limit.
Bad encoding
You can check if your input content is already valid utf-8 by doing this:
$is_valid_utf8 = mb_check_encoding($content, 'utf-8');
If it is, don't re-encode it.
The documentation has more to offer: http://php.net/mb-check-encoding
BOM
Or maybe Notepad++ sets a BOM which could confuse json_decode.
//Remove UTF-8 BOM if present, json_decode() does not like it.
if(substr($content, 0, 3) == pack("CCC", 0xEF, 0xBB, 0xBF)) {
$content = substr($content, 3);
}
See json_decode's documentation.
json_decode works fine with UTF-8 without a BOM. Do you have any particular reason to use a BOM?
If you convert your JSON file to UTF-8 without a BOM, you will not need to encode the content later with utf8_encode.
It isn't working because your file is already UTF-8. When you encode it again with utf8_encode(), PHP assumes your string is ISO-8859-1 and therefore breaks it.
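Putting the answers above together, a reader for UTF-8 files could look like the sketch below. The function name decodeUtf8Json is hypothetical, and the in-memory string stands in for the contents of a file saved by Notepad++ as "UTF-8 with BOM".

```php
<?php
// Hypothetical helper: decode UTF-8 JSON, tolerating a leading BOM.
function decodeUtf8Json(string $content)
{
    // Strip the 3-byte UTF-8 BOM if present; json_decode rejects it.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);
    }
    // No utf8_encode() here: the content is already UTF-8.
    return json_decode($content, true);
}

// Simulated file contents: BOM followed by UTF-8 JSON.
$withBom = "\xEF\xBB\xBF" . '{"descricao":"árquivo 1"}';

var_dump(json_decode($withBom));               // fails because of the BOM
echo decodeUtf8Json($withBom)['descricao'], "\n";
```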

write unicode characters into a file in php

I have a JSON array that holds the correct string regardless of language, but when the JSON is encoded and written to a file, the values are wrong. The file contains other values, seemingly random English letters (e.g. uuadb). I want to write a string to a file where the string could be in any language. For now I am testing with Tamil, but it looks as if PHP doesn't support Unicode. How can I write Unicode characters to a file using PHP?
I tried the pack function, but I don't see how to use it for arbitrary languages. Is there any other way of doing this?
My guess is that you're seeing \uXXXX escapes instead of the non-ASCII characters you asked for. By default, json_encode escapes non-ASCII characters:
<?php
$arr = array("♫");
$json = json_encode($arr);
echo "$json\n";
# Prints ["\u266b"]
$str = '["♫"]';
$array = json_decode($str);
echo "{$array[0]}\n";
# Prints ♫
?>
If this is what you're getting, it's not wrong. You just have to ensure it's being decoded properly on the receiving end.
Another possibility is that the string you're passing is not in UTF-8. According to the documentation for json_encode and json_decode, these functions only work with UTF-8 data. Call mb_detect_encoding on your input string, and make sure it outputs either UTF-8 or ASCII.
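If readable text in the file matters, one option on PHP 5.4 or later is the JSON_UNESCAPED_UNICODE flag. The sketch below assumes the source file itself is saved as UTF-8; the array contents and the temporary file are illustrative only.

```php
<?php
// Sample data with Tamil text (the source file is assumed to be UTF-8).
$data = ['greeting' => 'வணக்கம்'];

// json_encode requires UTF-8 input, so check before encoding.
$isUtf8 = mb_check_encoding($data['greeting'], 'UTF-8');

// JSON_UNESCAPED_UNICODE keeps the characters instead of \uXXXX escapes.
$json = json_encode($data, JSON_UNESCAPED_UNICODE);

// Write to a temporary file and read it back to confirm nothing is lost.
$path = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($path, $json);
$back = json_decode(file_get_contents($path), true);
unlink($path);

echo $json, "\n";
```

The default escaped output decodes to exactly the same data, so the flag changes only how the file looks, not what it contains.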
