I want to encode a URL that contains Latin characters like à as ISO-8859-1, using PHP.
The encoded string will be used to perform a request to a webservice. So if the request is:
http://www.mywebservice.com?param=à
the encoded string should be:
http://www.mywebservice.com?param=%E0
I've tried using PHP's function urlencode() but it returns the input encoded in UTF-8:
http://www.mywebservice.com?param=%C3%A0
Use utf8_decode before urlencode:
urlencode(utf8_decode("à"))
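Since utf8_decode() is deprecated as of PHP 8.2, the same idea can be sketched with mb_convert_encoding(). This is a minimal example, assuming the input string is UTF-8 and the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// "\u{E0}" is à as UTF-8 (bytes C3 A0); assumes the input really is UTF-8.
$param = "\u{E0}";

// Convert to ISO-8859-1 first, so the character becomes the single byte E0.
$latin1 = mb_convert_encoding($param, 'ISO-8859-1', 'UTF-8');

// urlencode() then percent-encodes that single byte.
$url = 'http://www.mywebservice.com?param=' . urlencode($latin1);

echo $url, "\n"; // http://www.mywebservice.com?param=%E0
```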
My PHP application outputs JSON where special characters are encoded; for example, the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, and how I get back from "Br\u00f8ndum" to "Brøndum"?
I have tried utf8_encode/decode but they don't work as expected.
Thanks!
That's standard JSON Unicode escaping.
You get back to the actual character by using a JSON parser; in PHP, that is json_decode.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
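A short round trip may make this concrete. It is only a sketch: the variable names are made up for illustration, and it assumes the decoded string is handled as UTF-8.

```php
<?php
// The JSON text as it arrives: the backslash stays literal inside single quotes.
$escaped = '"Br\u00f8ndum"';

// A JSON parser turns the \u00f8 escape back into the real character (as UTF-8).
$decoded = json_decode($escaped);

// By default json_encode() escapes non-ASCII characters again...
$default = json_encode($decoded);

// ...unless JSON_UNESCAPED_UNICODE is passed.
$unescaped = json_encode($decoded, JSON_UNESCAPED_UNICODE);

echo $decoded, "\n";   // Brøndum
echo $default, "\n";   // "Br\u00f8ndum"
echo $unescaped, "\n"; // "Brøndum"
```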
mb_detect_encoding is the function you want. You pass it the string and it detects the encoding. You can also pass it a list of candidate encodings, since a plain string like "hello" is valid in several different encodings.
echo mb_detect_encoding("Br\u00f8ndum");
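To illustrate passing a candidate list, here is a minimal sketch. The sample bytes and the candidate order are assumptions chosen for the example; strict mode is enabled so that candidates for which the bytes are invalid get ruled out.

```php
<?php
// "Brøndum" stored as ISO-8859-1: ø is the single byte F8.
$bytes = "Br\xF8ndum";

// Candidate encodings to test; this list is an assumption for the example.
$candidates = ['UTF-8', 'ISO-8859-1'];

// The third argument enables strict detection. F8 followed by "n" is not
// valid UTF-8, so only ISO-8859-1 remains.
$detected = mb_detect_encoding($bytes, $candidates, true);

echo $detected, "\n"; // ISO-8859-1
```

Detection works on raw bytes and is inherently a guess, so it is usually safer to know the encoding of your data than to detect it.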
How do you correctly encode a URL with foreign characters in PHP?
I assumed urlencode() would do the trick but it does not.
The correct encoding for the following URL
http://eu.battle.net/wow/en/character/anachronos/Paddestøel/advanced
is this:
http://eu.battle.net/wow/en/character/anachronos/Paddest%C3%B8el/advanced
But urlencode encodes it like this:
http://eu.battle.net/wow/en/character/anachronos/Paddest%F8el/advanced
What function do I use to encode it like on the second example?
Your PHP scripts seem to use some single-byte encoding. You can either:
Save the source code as UTF-8
Convert data to UTF-8 with iconv() or mb_convert_encoding()
In general, making the full switch to UTF-8 fixes all encoding issues at once but initial migration might require some extra work.
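The second option can be sketched as follows. The helper name encode_path_segment and its default source encoding are hypothetical, chosen only for this example; it assumes the data arrives as ISO-8859-1 and that mbstring is available.

```php
<?php
// Hypothetical helper: converts a single-byte string to UTF-8, then
// percent-encodes it. The default source encoding is an assumption.
function encode_path_segment(string $segment, string $sourceEncoding = 'ISO-8859-1'): string
{
    $utf8 = mb_convert_encoding($segment, 'UTF-8', $sourceEncoding);

    // rawurlencode() is used because this is a path segment, not a query string.
    return rawurlencode($utf8);
}

// "Paddestøel" as ISO-8859-1 bytes (ø is F8).
$name = "Paddest\xF8el";

echo 'http://eu.battle.net/wow/en/character/anachronos/'
    . encode_path_segment($name) . '/advanced', "\n";
// http://eu.battle.net/wow/en/character/anachronos/Paddest%C3%B8el/advanced
```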
There is no "correct" encoding. URL-percent-encoding simply represents raw bytes. It's up to you what those bytes are or how you're going to interpret them later. If your string is UTF-8 encoded, the percent-encoded raw byte representation is %C3%B8. If your string is not UTF-8 encoded, it's something else. If you want %C3%B8, make sure your string is UTF-8 encoded.
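To see that percent-encoding only reflects the underlying bytes, compare the same character in two encodings. This is an illustrative sketch; the variable names are made up.

```php
<?php
$utf8   = "\u{F8}"; // ø as UTF-8: two bytes, C3 B8
$latin1 = "\xF8";   // ø as ISO-8859-1: one byte, F8

// bin2hex() shows the raw bytes; rawurlencode() mirrors them one to one.
echo bin2hex($utf8), ' => ', rawurlencode($utf8), "\n";     // c3b8 => %C3%B8
echo bin2hex($latin1), ' => ', rawurlencode($latin1), "\n"; // f8 => %F8
```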
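Convert the string to UTF-8 before encoding it:
function url_encode($string) {
    // utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2;
    // mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1') is the equivalent.
    return urlencode(utf8_encode($string));
}
Then use this function to encode your URL (I found it in a comment here: http://php.net/manual/en/function.urlencode.php).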
I've got a string in my database like 中华武魂. When I post my request to retrieve the data via my website, the data reaches the server in the format %E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82
What decoding steps do I have to take in order to get it back to a usable form,
while also cleaning the user input to ensure they're not going to try an SQL injection attack?
(Should I escape the string before or after decoding?)
EDIT:
rawurldecode(); // returns "ä¸åŽæ¦é‚"
urldecode(); // returns "ä¸åŽæ¦é‚"
public function utf8_urldecode($str) {
    $str = preg_replace("/%u([0-9a-f]{3,4})/i","&#x\\1;",urldecode($str));
    return html_entity_decode($str,null,'UTF-8');
}
// returns "ä¸åŽæ¦é‚"
... which actually works when I try to use it in an SQL statement.
I think the output only looked wrong because I was doing an echo and die(); without sending a UTF-8 charset header, so the browser was presumably rendering it as Latin-1.
Thanks for the help!
When your data is actually that percent-encoded form, you just have to call rawurldecode:
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);
This suffices as the data already is encoded in UTF-8: 中 (U+4E2D) is encoded with the byte sequence 0xE4B8AD in UTF-8 and that is encoded with %E4%B8%AD when using the percent-encoding.
If your output does not look as expected, it is probably being interpreted with the wrong character encoding, most likely Windows-1252 instead of UTF-8. In Windows-1252, 0xE4 represents ä, 0xB8 represents ¸, 0xAD is the (normally invisible) soft hyphen, 0xE5 represents å, and so on. So make sure to specify the output character encoding properly.
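Putting both points together, a minimal sketch might look like this. It assumes the response is plain text and that no output has been sent before the header; the variable names are illustrative.

```php
<?php
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str  = rawurldecode($data);

// The decoded bytes are already valid UTF-8; no further conversion is needed.
$isUtf8 = mb_check_encoding($str, 'UTF-8');

// Declare the charset so the client does not fall back to Windows-1252.
// The guard avoids a warning if output has already started.
if (!headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
}

echo $str, "\n"; // 中华武魂 when the output is read as UTF-8
```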
Use PHP's urldecode:
http://php.net/manual/en/function.urldecode.php
You have choices here: urldecode or rawurldecode.
If you encoded your string using urlencode, you must use urldecode because of the way spaces are handled: urlencode converts spaces to +, whereas rawurlencode converts them to %20, and rawurldecode does not turn + back into a space.
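The difference is easiest to see on a value that contains both a space and a literal plus sign. This is a small sketch; the sample value is arbitrary.

```php
<?php
$value = 'a b+c';

$form = urlencode($value);    // space becomes +, literal + becomes %2B
$raw  = rawurlencode($value); // space becomes %20, literal + becomes %2B

echo $form, "\n"; // a+b%2Bc
echo $raw, "\n";  // a%20b%2Bc

// Matching decoder restores the original value.
echo urldecode($form), "\n";    // a b+c
echo rawurldecode($raw), "\n";  // a b+c

// Mismatched decoder: rawurldecode() leaves the + as a plus sign.
echo rawurldecode($form), "\n"; // a+b+c
```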
I'm trying to decode a JSON feed containing some Cyrillic characters. Not all of the characters in the feed are Cyrillic, though. I'm using json_decode, which works fine for everything else but returns garbage when there are Cyrillic characters.
The results look like this: Деффачки
Any ideas?
Your page is being decoded as CP1252 when it's actually UTF-8. Set your headers properly.
>>> print u'Деффачки'.encode('cp1252').decode('utf-8')
Деффачки
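The same effect can be reproduced in PHP, which may help confirm the diagnosis. This is a sketch that assumes the source file is saved as UTF-8 and that mbstring is loaded; the variable names are illustrative.

```php
<?php
// The correct text, as UTF-8 bytes (assumes this file is saved as UTF-8).
$original = 'Деффачки';

// Simulate the bug: treat the UTF-8 bytes as if they were Windows-1252.
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Undo it: map the garbled characters back to their Windows-1252 bytes,
// which are the original UTF-8 bytes.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');

echo $garbled, "\n";
echo $repaired, "\n";
```

In a real page the fix is usually not to convert anything, but to send a header such as Content-Type: text/html; charset=utf-8 so the browser reads the bytes correctly.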
If you cannot decode Unicode characters with json_decode, apply addslashes() to the output of json_encode. The problem comes from Unicode escape sequences that start with \, such as \u30d7.
$json_data = addslashes(json_encode($unicode_string_or_array));
@hermanschutte: Use the escape function when sending data through JavaScript.
I have written an XML file that uses the ISO-8859-15 encoding, and most of the data within the feed is run through htmlspecialchars().
I am then using simplexml_load_string() to retrieve the contents of the XML file for use in my script. However, any special characters (e.g. é á ó) come out as "é á ó". The
How can I get my script to display the proper special accented characters?
You're probably using a different character encoding for your output than the one the XML data is actually encoded in.
According to your description, your XML data is encoded in UTF-8 but your output uses ISO 8859-15. UTF-8 encodes the character é (U+00E9) as 0xC3A9, and those two bytes represent the characters Ã and © respectively in ISO 8859-15.
So either use UTF-8 for your output as well, or convert the data from UTF-8 to ISO 8859-15 using mb_convert_encoding.
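The conversion route could look like the sketch below. It assumes the strings you are holding are UTF-8 and that mbstring is loaded; the sample text and variable names are illustrative.

```php
<?php
// "é á ó" as UTF-8 (each accented letter is two bytes).
$utf8 = "\u{E9} \u{E1} \u{F3}";

// Convert to ISO-8859-15, where each accented letter is a single byte.
$latin9 = mb_convert_encoding($utf8, 'ISO-8859-15', 'UTF-8');

echo bin2hex($utf8), "\n";   // c3a920c3a120c3b3
echo bin2hex($latin9), "\n"; // e920e120f3
```

The alternative is to leave the data as UTF-8 and declare UTF-8 as the output charset instead.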