This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How to get the character from unicode value in PHP?
PHP: Convert unicode codepoint to UTF-8
How can I convert a Unicode character such as %u05E1 to a normal character via PHP?
The chr function does not cover it, and I am looking for something similar.
"%uXXXX" is a non-standard scheme for URL-encoding Unicode characters. Apparently it was proposed but never really used. As such, there's hardly any standard function that can decode it into an actual UTF-8 sequence.
It's not too difficult to do it yourself though:
$string = '%u05E1%u05E2';
$string = preg_replace('/%u([0-9A-Fa-f]{4})/', '&#x$1;', $string);
echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');
This converts the %uXXXX notation to HTML entity notation &#xXXXX;, which can be decoded to actual UTF-8 by html_entity_decode. The above outputs the characters "סע" in UTF-8 encoding.
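The same idea can be sketched without the detour through HTML entities. The helper name decode_percent_u below is hypothetical, and the sketch assumes the mbstring extension is available and that every escape is exactly four hex digits:

```php
<?php
// Hypothetical helper: decodes non-standard %uXXXX escapes into UTF-8.
function decode_percent_u(string $input): string
{
    return preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})/',
        static function (array $m): string {
            // Treat the four hex digits as one big-endian UCS-2 code unit.
            return mb_convert_encoding(pack('H*', $m[1]), 'UTF-8', 'UCS-2BE');
        },
        $input
    );
}

echo decode_percent_u('%u05E1%u05E2'), "\n";
```

Characters outside the Basic Multilingual Plane would arrive as two %uXXXX surrogate halves, which this sketch does not join.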
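Use hexdec to convert the hex digits to their decimal value first. chr only produces a single byte (values 0-255), so pass the result to mb_chr (PHP 7.2+, mbstring) to get the UTF-8 character:
echo mb_chr(hexdec("05E1"), 'UTF-8');
var_dump(hexdec("%u05E1") == hexdec("05E1")); // true: hexdec ignores non-hex characters (deprecated since PHP 7.4)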
Unicode character in PHP string
Is there a way to print special characters in PHP using source code that contains only ASCII characters?
For example, in JavaScript we can use \u00e1 in the middle of text.
In Java we can use \u2202, for example.
And in PHP? How can I do this?
I don't want to include special characters in my source code.
I found 3 ways for this.
Php Documentation: http://php.net/manual/en/language.types.string.php#language.types.string.syntax.double
A good explanation in Portuguese: https://pt.stackoverflow.com/questions/293500/escrevendo-c%C3%B3digo-em-php-sem-caracteres-especiais
Syntax added in PHP 7:
\u{[0-9A-Fa-f]+}
The sequence of characters matching the regular expression is a Unicode code point, which is output to the string as that code point's UTF-8 representation.
Examples:
<?php
echo "\u{00e1}\n";
echo "\u{2202}\n";
echo "\u{aa}\n";
echo "\u{0000aa}\n";
echo "\u{9999}\n";
Syntax for PHP 7 and older PHP versions:
\x[0-9A-Fa-f]{1,2}
The sequence of characters matching the regular expression is a single byte in hexadecimal notation, so a UTF-8 character has to be written byte by byte.
Examples:
<?php
echo "\xc3\xa1\n";
echo "\u{00e1}\n"; // same character via the PHP 7+ syntax, for comparison
Using integer-to-byte conversion functions:
<?php
printf('%c%c', 0xC3, 0xA1);
echo chr(0xC3) . chr(0xA1);
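The byte values passed to chr above were worked out by hand. As a rough sketch, the UTF-8 bytes can also be computed from the code point itself, which works on PHP versions without mb_chr. The function name codepoint_to_utf8 is hypothetical, and the type declarations assume PHP 7+ (drop them for older versions):

```php
<?php
// Hypothetical helper: encodes a Unicode code point as UTF-8 bytes using chr().
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('Not a valid Unicode scalar value');
    }
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo codepoint_to_utf8(0x00E1), codepoint_to_utf8(0x2202), "\n";
```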
printf() Extended Unicode Characters?
http://phptester.net/
How can I convert the unknown symbols in a URL like this:
https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl\u003dyes\u0026id\u003d376b916e4a3c65b1\u0026itag\u003d22\u0026source\u003dwebdrive\u0026app\u003dtexmex\u0026ip\u003d109.110.116.1\u0026ipbits\u003d8\u0026expire\u003d1456065477\u0026sparams\u003drequiressl%2Cid%2Citag%2Csource%2Cip%2Cipbits%2Cexpire\u0026signature\u003d5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE\u0026key\u003dck2\u0026mm\u003d30\u0026mn\u003dsn-hgn7zn7r\u0026ms\u003dnxu\u0026mt\u003d1456050981\u0026mv\u003dm\u0026nh\u003dIgpwcjAyLm1yczAyKgkxMjcuMC4wLjE\u0026pl\u003d22
into a real link like this:
https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl=yes&id=376b916e4a3c65b1&itag=22&source=webdrive&app=texmex&ip=109.110.116.1&ipbits=8&expire=1456065477&sparams=requiressl,id,itag,source,ip,ipbits,expire&signature=5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE&key=ck2&mm=30&mn=sn-hgn7zn7r&ms=nxu&mt=1456050981&mv=m&nh=IgpwcjAyLm1yczAyKgkxMjcuMC4wLjE&pl=22
I have no idea how to convert it in code.
At the moment I use this website to convert the link:
DDecode - Hex, Octal, HTML Decode
In your case, you have to convert Unicode escape sequences like "\uxxxx" into UTF-8 characters.
Use the preg_replace_callback function to replace every matched escape sequence with the corresponding UTF-8 character.
In the callback, pack converts the four hex digits into a binary string, and mb_convert_encoding then converts those bytes from big-endian UCS-2 ('UCS-2BE') to UTF-8.
$str = "https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl\u003dyes\u0026id\u003d376b916e4a3c65b1\u0026itag\u003d22\u0026source\u003dwebdrive\u0026app\u003dtexmex\u0026ip\u003d109.110.116.1\u0026ipbits\u003d8\u0026expire\u003d1456065477\u0026sparams\u003drequiressl%2Cid%2Citag%2Csource%2Cip%2Cipbits%2Cexpire\u0026signature\u003d5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE\u0026key\u003dck2\u0026mm\u003d30\u0026mn\u003dsn-hgn7zn7r\u0026ms\u003dnxu\u0026mt\u003d1456050981\u0026mv\u003dm\u0026nh\u003dIgpwcjAyLm1yczAyKgkxMjcuMC4wLjE\u0026pl\u003d22";
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, rawurldecode($str));
echo $str;
// the output:
https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl=yes&id=376b916e4a3c65b1&itag=22&source=webdrive&app=texmex&ip=109.110.116.1&ipbits=8&expire=1456065477&sparams=requiressl,id,itag,source,ip,ipbits,expire&signature=5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE&key=ck2&mm=30&mn=sn-hgn7zn7r&ms=nxu&mt=1456050981&mv=m&nh=IgpwcjAyLm1yczAyKgkxMjcuMC4wLjE&pl=22
http://php.net/manual/en/function.preg-replace-callback.php
It appears to be "Unicode Escape Sequences for Latin 1 Characters" (see http://archive.oreilly.com/pub/a/actionscript/excerpts/as3-cookbook/appendix.html).
A quick search didn't find any native library for decoding this in PHP, but it should be straightforward to decode the characters you're most likely to encounter that need decoding (& and = specifically).
Here's a SO solution to doing it from 5 years ago: How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
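Since \uXXXX is also the escape syntax of JSON strings, one possible shortcut is to let json_decode do the decoding. This is only a sketch: the helper name decode_json_escapes and the shortened example.com URL are made up for illustration, and it assumes the input contains no raw double quotes or control characters and that every backslash starts a valid JSON escape.

```php
<?php
// Hypothetical helper: decodes \uXXXX escapes by parsing the text as the body of a JSON string.
function decode_json_escapes(string $s): ?string
{
    // Assumption: $s has no raw double quotes; invalid input makes json_decode return null.
    $decoded = json_decode('"' . $s . '"');
    return is_string($decoded) ? $decoded : null;
}

// Shortened, made-up URL in the same style as the one in the question.
$url = 'https://example.com/videoplayback?requiressl\u003dyes\u0026id\u003d376b\u0026sparams\u003dip%2Cexpire';
$decoded = decode_json_escapes($url);
if ($decoded !== null) {
    // rawurldecode additionally turns %2C into a comma, as in the answer above.
    echo rawurldecode($decoded), "\n";
}
```

Unlike the regular-expression approach, this also joins surrogate pairs, but it rejects the whole string if any escape is malformed.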
Convert ASCII TO UTF-8 Encoding
Is UTF-8 not the same as ASCII? How would you explain the different results I get from:
$result = mb_detect_encoding($PLAINText, mb_detect_order(), true);
Sometimes I get "UTF-8" in $result and sometimes I get "ASCII", so they are different. But that is not my question. My question is: why doesn't the following iconv() call convert from ASCII to UTF-8?
$result = iconv("ASCII","UTF-8//IGNORE",$PLAINText);
When I check the encoding of $result afterwards with mb_detect_encoding(), it is still "ASCII", not "UTF-8".
The reason is that ASCII is a subset of UTF-8: a UTF-8 string that contains only ASCII characters is byte-for-byte identical to the ASCII string, so the two cannot be told apart and iconv has nothing to change. (Unless a byte order mark is used, but it's optional.)
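A small sketch of this, assuming the iconv and mbstring extensions are loaded; the sample strings are made up for illustration:

```php
<?php
$ascii = 'plain text';

// Converting pure ASCII to UTF-8 leaves every byte unchanged.
$converted = iconv('ASCII', 'UTF-8', $ascii);
var_dump($converted === $ascii);

// The same bytes are valid in both encodings, so detection reports
// whichever candidate comes first in the detection order.
var_dump(mb_check_encoding($ascii, 'ASCII'));
var_dump(mb_check_encoding($ascii, 'UTF-8'));

// Only a non-ASCII character makes the difference visible.
$accented = "caf\xC3\xA9"; // "café" written as UTF-8 bytes
var_dump(mb_check_encoding($accented, 'ASCII'));
var_dump(mb_detect_encoding($accented, ['ASCII', 'UTF-8'], true));
```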
Detect encoding and make everything UTF-8
I have a legacy database table with a mixed encoding. Some lines are UTF-8 and some lines are ISO 8859-1.
Are there some heuristics I can apply on the content of a line to guess which encoding best represents the content?
Convert from UTF-8. If that fails then it's not UTF-8, so you should probably convert from Latin-1 instead.
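One way to sketch that heuristic is to test whether the bytes are valid UTF-8 and fall back to Latin-1 otherwise. The function name to_utf8 is hypothetical, and the sketch assumes the mbstring extension and that each row is either UTF-8 or ISO 8859-1:

```php
<?php
// Hypothetical helper: returns $text as UTF-8, assuming it is either UTF-8 or ISO-8859-1.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Valid UTF-8 (including plain ASCII): keep as-is.
        return $text;
    }
    // Not valid UTF-8: treat every byte as one Latin-1 character.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo to_utf8("caf\xE9"), "\n";     // Latin-1 input
echo to_utf8("caf\xC3\xA9"), "\n"; // UTF-8 input
```

The guess can be wrong in the rare case where Latin-1 text happens to form a valid UTF-8 byte sequence, so it is worth spot-checking the converted rows.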
Compare
iconv("UTF-8", "ISO-8859-1//IGNORE", $text)
and
iconv("UTF-8", "ISO-8859-1", $text)
If they are not equal - consider it UTF-8.
How to decode Unicode escape sequences like “\u00ed” to proper UTF-8 encoded characters?
How can I convert \u014D to ō in PHP?
Thank You
It's not immediately clear what you mean when you say "to ō". If you're asking how to convert it into a different encoding, then a general approach is to use the iconv function. 014D is the UCS-2 (Unicode) value of your desired character, so if you have a string containing the two bytes 01 4D (big-endian order) you could use
iconv('UCS-2BE', 'UTF-8', $s)
to convert from UCS-2 to UTF-8. Similarly if you want to convert to a different encoding - although you need to be aware that not all encodings will include the character you are using. You'll see from the iconv documentation that the //TRANSLIT option may help in that case.
Note that iconv is taking a byte sequence so, if you actually have a string containing a slash, then a u, then a 0 etc... you'll need to convert that into the byte sequence first.
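As a minimal sketch of that last step, assuming the input is exactly one literal \uXXXX sequence and the iconv extension is available (the variable names are made up for illustration):

```php
<?php
// Six literal characters: backslash, u, 0, 1, 4, D.
$escaped = '\u014D';

// Drop the leading "\u" and turn the hex digits into two raw bytes.
$hex = substr($escaped, 2);
$bytes = pack('H*', $hex);

// Interpret the bytes as big-endian UCS-2 and convert to UTF-8.
$utf8 = iconv('UCS-2BE', 'UTF-8', $bytes);
echo $utf8, "\n";
```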
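If you have the escape sequence literally in the string, you could let PHP parse it as a string literal with a messy eval statement (eval, not exec: exec runs a shell command). This needs PHP 7+, and the sequence has to be rewritten to the \u{XXXX} form first.
$string = '\\u014D';
$literal = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '\\u{$1}', $string);
eval("\$string = \"$literal\";");
This way, the Unicode escape sequence is recognized and interpreted as a Unicode character when the string is parsed.
Of course, you should never use eval unless absolutely necessary, and never on untrusted input.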