can't decode a urlencoded utf-8 string - php

Here's the string:
%d0%91%d0%b5%d0%b7
I think it's Cyrillic and I need it converted to something readable.
mb_detect_encoding() states it's ASCII.
When I do iconv('ASCII', 'UTF-8', $str), it shows me the same string.
Judging by this article, it looks like it's in UTF-8, but how do I decode it into readable UTF-8 text?
Please help
UPDATE: the following site was able to decode the text: http://2cyr.com/decode/?lang=en (thanks to Faiz Rasool for pointing it out). The preset I used there is source=utf-8; postfilter=urlencoded, but I have no idea how to reproduce that on my server.

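This string looks URL-encoded. urldecode() gives you back the raw UTF-8 bytes, so no further conversion is needed (passing the result through utf8_encode() would encode it a second time and garble it). Just make sure the page is served as UTF-8:
<?php
header('Content-Type: text/html; charset=utf-8'); // tell the browser the bytes are UTF-8
$str = "%d0%91%d0%b5%d0%b7";
echo urldecode($str); // Без
?>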
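Here is a slightly fuller sketch of the same idea. It assumes the mbstring extension is available, and `decodeUrlEncodedUtf8` is a hypothetical helper name, not a PHP function. The helper decodes the percent escapes with urldecode() and then uses mb_check_encoding() to confirm the bytes form valid UTF-8. If they do not, it returns null.

```php
<?php
// Hypothetical helper: decode a percent-encoded UTF-8 string and
// confirm that the resulting bytes really are well-formed UTF-8.
function decodeUrlEncodedUtf8(string $encoded): ?string
{
    // urldecode() turns each %XX into the raw byte 0xXX (and '+' into a space).
    $decoded = urldecode($encoded);

    // Reject input that does not decode to valid UTF-8.
    return mb_check_encoding($decoded, 'UTF-8') ? $decoded : null;
}

$str = "%d0%91%d0%b5%d0%b7";
$text = decodeUrlEncodedUtf8($str);

echo $text, "\n";          // Без
echo bin2hex($text), "\n"; // d091d0b5d0b7 (three two-byte UTF-8 sequences)
```

rawurldecode() would give the same result for this input. The only difference is that urldecode() also turns `+` into a space, which matters for form-encoded data.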

Related

Json encode with special char [duplicate]

I have some JSON I need to decode, alter and then encode without messing up any characters.
If I have a Unicode character in a JSON string, it will not decode. I'm not sure why, since json.org says a string can contain "any-Unicode-character-except-"-or-\-or-control-character". But it doesn't work in Python either.
{"Tag":"Odómetro"}
I can use utf8_encode, which allows the string to be decoded with json_decode. However, the character gets mangled into something else. This is the result from a print_r of the result array (two characters):
[Tag] => OdÃ³metro
When I encode the array again, I get the character escaped to ASCII, which is correct according to the JSON spec:
"Tag"=>"Od\u00f3metro"
Is there some way I can un-escape this? json_encode gives no such option, utf8_encode does not seem to work either.
Edit: I see there is a JSON_UNESCAPED_UNICODE option for json_encode. However, it's not working as expected. Oh damn, it's only in PHP 5.4. I will have to use some regex, as I only have 5.3.
$json = json_encode($array, JSON_UNESCAPED_UNICODE);
Warning: json_encode() expects parameter 2 to be long, string ...
I found the following way to fix this issue. I hope it helps you.
json_encode($data,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
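The sketch below compares the default output with the output using the two flags from the line above. It needs PHP 7+ for the `\u{...}` string syntax. The sample array and URL are made up for illustration.

```php
<?php
// Sample data: "\u{00f3}" is 'ó'; the URL is only a placeholder.
$data = ['Tag' => "Od\u{00f3}metro", 'url' => 'http://example.com/a'];

// Default behaviour: non-ASCII characters become \uXXXX and "/" becomes "\/".
$default = json_encode($data);
echo $default, "\n"; // {"Tag":"Od\u00f3metro","url":"http:\/\/example.com\/a"}

// Both flags exist since PHP 5.4 and are combined with a bitwise OR.
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
echo $unescaped, "\n"; // {"Tag":"Odómetro","url":"http://example.com/a"}

// Both forms are valid JSON and decode back to the same array.
var_dump(json_decode($default, true) === json_decode($unescaped, true)); // bool(true)
```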
Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.
Here's why I think so:
json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
You did say that you got "mangled" output when using print_r after doing utf8_encode. But the mangled output you got is exactly what happens when UTF-8 text is parsed as ISO 8859-1: ó is \xc3\xb3 in UTF-8, and that byte sequence reads as Ã³ in ISO 8859-1.
Your htmlentities hackaround solution worked. htmlentities needs to know the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
You said you had the same problem in Python, which would seem to exclude PHP from being the issue.
PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.
So, it seems like you need to configure your connection to Postgres so that it gives you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. It is also possible that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1.)
Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).
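The sketch below reproduces both symptoms to make this diagnosis concrete. It assumes the input really is ISO 8859-1 and that the mbstring extension is installed. It uses mb_convert_encoding() rather than utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Assumption: the database hands back ISO 8859-1 (Latin-1) bytes,
// where 'ó' is the single byte 0xF3.
$latin1 = "Od\xF3metro";

// json_encode() only accepts UTF-8, so it rejects the raw Latin-1 bytes.
var_dump(json_encode($latin1)); // bool(false)

// Converting from ISO 8859-1 to UTF-8 turns 0xF3 into the two bytes 0xC3 0xB3.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo json_encode($utf8), "\n"; // "Od\u00f3metro"

// Reading those UTF-8 bytes as ISO 8859-1 yields the "two characters" seen with print_r.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n"; // OdÃ³metro
```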
JSON_UNESCAPED_UNICODE was added in PHP 5.4, so it looks like you need to upgrade your version of PHP to take advantage of it. 5.4 is not released yet, though! :(
There is a 5.4 alpha release candidate on QA, though, if you want to play with it on your development machine.
A hacky way of doing JSON_UNESCAPED_UNICODE in PHP 5.3. Really disappointed by PHP's JSON support. Maybe this will help someone else.
$array = some_json(); // placeholder: however you load your data
// Encode all string children in the array to HTML entities.
// ENT_NOQUOTES leaves " alone; otherwise html_entity_decode() below would
// turn &quot; back into a bare " inside the JSON strings and break the JSON.
array_walk_recursive($array, function (&$item, $key) {
    if (is_string($item)) {
        $item = htmlentities($item, ENT_NOQUOTES);
    }
});
$json = json_encode($array);
// Decode the HTML entities and end up with Unicode again.
$json = html_entity_decode($json);
$json = array('tag' => 'Odómetro'); // Original array
$json = json_encode($json); // {"tag":"Od\u00f3metro"}
$json = json_decode($json); // Od\u00f3metro becomes Odómetro (as UTF-8)
echo $json->{'tag'}; // OdÃ³metro when the page is interpreted as ISO 8859-1
echo utf8_decode($json->{'tag'}); // Odómetro
You were close, just use utf8_decode.
Try setting the UTF-8 encoding in your page:
header('content-type:text/html;charset=utf-8');
This works for me:
$arr = array('tag' => 'Odómetro');
$encoded = json_encode($arr);
$decoded = json_decode($encoded);
echo $decoded->{'tag'};
Try using utf8_decode() and utf8_encode().
To encode an array that contains special characters, convert it from ISO 8859-1 to UTF-8 first. (If utf8_encode and utf8_decode are not working for you, this might be an option.)
Everything that is in ISO-8859-1 should be converted to UTF-8:
$iso88591 = "Od\xF3metro"; // example ISO 8859-1 bytes ('ó' is the single byte 0xF3)
$data = mb_convert_encoding($iso88591, 'UTF-8', 'ISO-8859-1'); // to, from
Encoding should work after this:
$encoded_data = json_encode($data);
Convert UTF-8 to & from ISO 8859-1

Converting to UTF-8 in PHP

I'm calling the Google Translate API and I need to send UTF-8 as input.
I have a piece of code to convert a string to UTF-8, but no matter what I try, when I check the encoding right after the conversion I get ASCII as the encoding of the string.
Here is the most popular answer I could find:
iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);
The other way I tried was like this:
$text = utf8_encode($text);
As soon as I check the encoding again (on both cases) I get ASCII as the result:
echo mb_detect_encoding($text);
What am I missing here?
Thanks for any tips.
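The sketch below is not from the original thread. It shows why the check keeps reporting ASCII, assuming the mbstring extension is available. The first 128 UTF-8 code points are byte-for-byte identical to ASCII, so text made only of ASCII bytes is already valid UTF-8. With ASCII listed first, mb_detect_encoding() names the narrower match. For an API that expects UTF-8, the relevant test is whether the bytes are valid UTF-8.

```php
<?php
$ascii    = "Hello world";        // only ASCII bytes
$accented = "Od\u{00f3}metro";    // contains one non-ASCII character (PHP 7+ syntax)

// Strict detection against an explicit candidate list.
echo mb_detect_encoding($ascii, ['ASCII', 'UTF-8'], true), "\n";    // ASCII
echo mb_detect_encoding($accented, ['ASCII', 'UTF-8'], true), "\n"; // UTF-8

// Both strings are valid UTF-8, so both can be sent as-is.
var_dump(mb_check_encoding($ascii, 'UTF-8'));    // bool(true)
var_dump(mb_check_encoding($accented, 'UTF-8')); // bool(true)
```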

json_decode & file_get_contents doesn't get the UTF8 characters

I use
$link = json_decode(file_get_contents("http://graph.facebook.com/111866602162732"));
The result on that page shows:
"name": "L\u00e9ry, Quebec",
I then want to convert that back to the accented characters, like this:
$location_name = $link->name;
echo 'NAME ORIGINAL: '.$location_name;
$location_name = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x\1;', $location_name); // convert to UTF8
echo ' NAME after: '.$location_name;
I get the following result:
NAME ORIGINAL: LÃ©ry, Quebec NAME after: LÃ©ry, Quebec
My preg_replace is correct, so it's the original name that is being transformed by file_get_contents.
If file_get_contents didn't give you back well-formed UTF-8 text, json_decode would return NULL. JSON MUST be UTF-8 encoded.
From the json_decode documentation: "This function only works with UTF-8 encoded strings."
So, I guess that you're reading the data with another encoding. Check it out.
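The sketch below illustrates the behaviour described above. A hard-coded JSON string stands in for the Facebook response (an assumption, so no network call is made). Valid UTF-8 input decodes normally. The same text containing a raw ISO 8859-1 byte makes json_decode() return NULL.

```php
<?php
// Valid UTF-8 JSON: the \u00e9 escape is decoded into the UTF-8 bytes for 'é'.
$good = '{"name": "L\u00e9ry, Quebec"}';
$obj = json_decode($good);
var_dump($obj->name === "L\u{00e9}ry, Quebec"); // bool(true)

// The same text with a raw ISO 8859-1 byte (0xE9) is not valid UTF-8.
$bad = "{\"name\": \"L\xE9ry, Quebec\"}";
var_dump(json_decode($bad));                     // NULL
var_dump(json_last_error() === JSON_ERROR_UTF8); // bool(true)
```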
Most likely, you're treating valid UTF-8 output given to you by json_decode as ISO-8859-1.
See here, for example: http://www.i18nqa.com/debug/bug-utf-8-latin1.html
Make sure that you're treating your debug output as UTF-8 - that should solve the problem.
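The sketch below shows what "treating the output as UTF-8" can look like in practice. A hard-coded string again stands in for the Graph API response (an assumption). The script declares the charset before any output and escapes the value with the matching encoding. Dumping the bytes confirms that json_decode() already produced UTF-8.

```php
<?php
// Declare UTF-8 before any output so the browser does not fall back to ISO 8859-1.
// (header() is ignored when the script runs from the command line.)
header('Content-Type: text/html; charset=utf-8');

// Stand-in for the Graph API response.
$link = json_decode('{"name": "L\u00e9ry, Quebec"}');
$location_name = $link->name;

// json_decode() has already produced UTF-8: 'é' is the two bytes c3 a9.
echo bin2hex(substr($location_name, 1, 2)), "\n"; // c3a9

echo 'NAME ORIGINAL: ' . htmlspecialchars($location_name, ENT_QUOTES, 'UTF-8'), "\n";
```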

Decode large string base64

I have to decode a large base64-encoded string:
<?php
$str ='base64code';
echo base64_decode($str);
?>
The link contains the base64 encoded string: http://www.interwebmedia.nl/dataxi/base64.txt
Online decoders give the right result but this php function doesn't. Is there a solution?
base64_decode outputs exactly what was encoded before. It does not alter any of the contained values.
You are writing everything out in an HTML context, where any <tags> are not shown in the browser window. Use "view source", or escape the output with htmlspecialchars.
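The sketch below shows the htmlspecialchars() route. The short sample payload is hypothetical and stands in for the large string in the question. Strict mode makes base64_decode() return false on characters outside the base64 alphabet, so bad input is caught before it is echoed.

```php
<?php
// Hypothetical payload: base64 of a small HTML fragment.
$str = base64_encode('<p>Hello & welcome</p>');

$decoded = base64_decode($str, true); // strict: false on invalid characters
if ($decoded === false) {
    exit("Input is not valid base64\n");
}

// Escape the markup so the browser displays the tags instead of rendering them.
echo htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8'), "\n";
// &lt;p&gt;Hello &amp; welcome&lt;/p&gt;
```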

write unicode characters into a file in php

I have a JSON array that holds the correct string regardless of language, but when the JSON is encoded and written to the file it does not have the correct values. Instead, it contains other values, seemingly random English letters, e.g. (uuadb). I want to write a string into a file where the string could be in any language. Right now I am testing with the Tamil language. But I found that PHP doesn't support Unicode. Please help me write Unicode characters into the file using PHP.
I tried using the pack function, but how do I use pack for any language? Or is there another way of doing this? Please help me.
My guess is that you're seeing \uXXXX escapes instead of the non-ASCII characters you asked for. By default, json_encode escapes Unicode characters:
<?php
$arr = array("♫");
$json = json_encode($arr);
echo "$json\n";
# Prints ["\u266b"]
$str = '["♫"]';
$array = json_decode($str);
echo "{$array[0]}\n";
# Prints ♫
?>
If this is what you're getting, it's not wrong. You just have to ensure it's being decoded properly on the receiving end.
Another possibility is that the string you're passing is not in UTF-8. According to the documentation for json_encode and json_decode, these functions only work with UTF-8 data. Call mb_detect_encoding on your input string, and make sure it outputs either UTF-8 or ASCII.
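The sketch below writes Unicode text to a file and reads it back. It assumes PHP 7+ and the mbstring extension, and the Tamil sample word and output path are illustrative only. It first checks that the input is UTF-8, as suggested above. It then encodes with JSON_UNESCAPED_UNICODE (PHP 5.4+) so the file contains the characters themselves rather than \uXXXX escapes.

```php
<?php
// Tamil sample text written with \u{} escapes, so this file's own encoding does not matter.
$arr = ['lang' => "\u{0BA4}\u{0BAE}\u{0BBF}\u{0BB4}\u{0BCD}"]; // தமிழ்

// The input must be valid UTF-8 for json_encode() to accept it.
if (!mb_check_encoding($arr['lang'], 'UTF-8')) {
    exit("Input is not UTF-8\n");
}

// Keep the characters as-is instead of \uXXXX escapes.
$json = json_encode($arr, JSON_UNESCAPED_UNICODE);

// file_put_contents() writes the bytes unchanged, so the file is UTF-8 as well.
$path = sys_get_temp_dir() . '/tamil.json'; // hypothetical output path
file_put_contents($path, $json);

// Reading the file back and decoding restores the original array.
$roundTrip = json_decode(file_get_contents($path), true);
echo $roundTrip['lang'], "\n";
```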
