php converting unknown symbols to the known symbols in url - php

Converting unknown symbols in url ,
like this
https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl\u003dyes\u0026id\u003d376b916e4a3c65b1\u0026itag\u003d22\u0026source\u003dwebdrive\u0026app\u003dtexmex\u0026ip\u003d109.110.116.1\u0026ipbits\u003d8\u0026expire\u003d1456065477\u0026sparams\u003drequiressl%2Cid%2Citag%2Csource%2Cip%2Cipbits%2Cexpire\u0026signature\u003d5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE\u0026key\u003dck2\u0026mm\u003d30\u0026mn\u003dsn-hgn7zn7r\u0026ms\u003dnxu\u0026mt\u003d1456050981\u0026mv\u003dm\u0026nh\u003dIgpwcjAyLm1yczAyKgkxMjcuMC4wLjE\u0026pl\u003d22
to real link,
like this
https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl=yes&id=376b916e4a3c65b1&itag=22&source=webdrive&app=texmex&ip=109.110.116.1&ipbits=8&expire=1456065477&sparams=requiressl,id,itag,source,ip,ipbits,expire&signature=5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE&key=ck2&mm=30&mn=sn-hgn7zn7r&ms=nxu&mt=1456050981&mv=m&nh=IgpwcjAyLm1yczAyKgkxMjcuMC4wLjE&pl=22
i have no idea how convert it ,
i use this website to convert the link
DDecode - Hex,Octal,HTML Decode

In your case, you have to convert unicode escape sequences like "\uxxxx" into utf8 characters.
Use preg_repalce_callback function to replace all matched escape sequences with the respective utf8 character.
In the callback function we are using pack function which will pack the initial HEX string to binary string, then it will convert that binary order('UCS-2BE') into UTF-8 equivalent with mb-convert-encoding.
$str = "https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl\u003dyes\u0026id\u003d376b916e4a3c65b1\u0026itag\u003d22\u0026source\u003dwebdrive\u0026app\u003dtexmex\u0026ip\u003d109.110.116.1\u0026ipbits\u003d8\u0026expire\u003d1456065477\u0026sparams\u003drequiressl%2Cid%2Citag%2Csource%2Cip%2Cipbits%2Cexpire\u0026signature\u003d5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE\u0026key\u003dck2\u0026mm\u003d30\u0026mn\u003dsn-hgn7zn7r\u0026ms\u003dnxu\u0026mt\u003d1456050981\u0026mv\u003dm\u0026nh\u003dIgpwcjAyLm1yczAyKgkxMjcuMC4wLjE\u0026pl\u003d22";
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, rawurldecode($str));
echo $str;
// the output:
https://r4---sn-hgn7zn7r.c.docs.google.com/videoplayback?requiressl=yes&id=376b916e4a3c65b1&itag=22&source=webdrive&app=texmex&ip=109.110.116.1&ipbits=8&expire=1456065477&sparams=requiressl,id,itag,source,ip,ipbits,expire&signature=5C06093099C3B4A7DE28AF323E2E15AC7DE5BEEE.758E1110B23CD41EA7E246DE2564ABE5368431FE&key=ck2&mm=30&mn=sn-hgn7zn7r&ms=nxu&mt=1456050981&mv=m&nh=IgpwcjAyLm1yczAyKgkxMjcuMC4wLjE&pl=22
http://php.net/manual/en/function.preg-replace-callback.php

It appears to be "Unicode Escape Sequences for Latin 1 Characters" (see http://archive.oreilly.com/pub/a/actionscript/excerpts/as3-cookbook/appendix.html).
A quick search didn't find any native library for decoding this in PHP, but it should be straightforward to decode the characters you're most likely to encounter that need decoding (& and = specifically).
Here's a SO solution to doing it from 5 years ago: How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?

Related

How to encode unicode strings to sequence like this "\xd1\x81" in PHP?

I have a string of unicode - сентрября
and i know this is expressed in the sequence like this:
\xd1\x81\xd0\xb5\xd0\xbd\xd1\x82\xd1\x8f\xd1\x80\xd0\xb1\xd1\x80\xd1\x8f
What is this type of expression encoded characters and how to convert any text from unicode to sequences like this in PHP?
The prefix "\x" indicates that it is hexadecimal. If the prefix is removed, you get the same output as with the "bin2hex" function in php.
I think this is the function you search for:
https://www.php.net/manual/de/function.bin2hex.php
bin2hex("сентрября") = d181d0b5d0bdd182d180d18fd0b1d180d18f
Those are characters in Windows-1252 (CP1252) some of them are special characters, so you can decode with iconv function
\x81 = Ñ (LATIN character)
echo iconv("cp1252", "utf-8//IGNORE", "\x81\xd0\xb5\xd0\xbd\xd1\x82\xd1\x8f\xd1\x80\xd0\xb1\xd1\x80\xd1\x8f");
Result of the code Above: ентÑрбрÑ
NOTE this code is usually used for hack and inject code in your website.

Php Convert escaped characters to utf-8

I have emotions in javascript escaped characters in a string and need to pass it to Android as json. In the android side the text has to be utf8 to display the emotion character properly. For example I have \ud83d\ude00 in a string which is a code for GRINNING FACE. But I need it to be converted to f0 9f 98 80 using Php.
I tried mb_convert_encoding and iconv but they outputs some strange characters. Please help. Thanks
Seems like a duplicate of:
How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, $str);

PHP, convert string into UTF-8 and then hexadecimal

In PHP, I want to convert a string which contains non-ASCII characters into a sequence of hexadecimal numbers which represents the UTF-8 encoding of these characters. For instance, given this:
$text = 'ąćę';
I need to produce this:
C4=84=C4=87=C4=99
How do I do that?
As your question is written, and assuming that your text is properly UTF-8 encoded to start with, this should work:
$text = 'ąćę';
$result = implode('=', str_split(strtoupper(bin2hex($text)), 2));
If your text is not UTF-8, but some other encoding, then you can use
$utf8 = mb_convert_encoding($text, 'UTF-8', $yourEncoding);
to get it into UTF-8, where $yourEncoding is some other character encoding like 'ISO-8859-1'.
This works because in PHP, strings are just arrays of bytes. So as long as your text is encoded properly to start with, you don't have to do anything special to treat it as bytes. In fact, this code will work for any character encoding you want without modification.
Now, if you want to do quoted-printable, then that's another story. You could try using the function quoted_printable_encode (requires PHP 5.3 or higher).

PHP Utf8 Decoding Issue

I have the following address line: Praha 5, Staré Město,
I need to use utf8_decode() function on this string before I can write it to a PDF file (using domPDF lib).
However, the php utf8 decode function for the above address line appears incorrect (or rather, incomplete).
The following code:
<?php echo utf8_decode('Praha 5, Staré Město,'); ?>
Produces this:
Praha 5, Staré M?sto,
Any idea why ě is not getting decoded?
utf8_decode converts the string from a UTF-8 encoding to ISO-8859-1, a.k.a. "Latin-1".
The Latin-1 encoding cannot represent the letter "ě". It's that simple.
"Decode" is a total misnomer, it does the same as iconv('UTF-8', 'ISO-8859-1', $string).
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
I wound up using a home-grown UTF-8 / UTF-16 decoding function (convert to &#number; representations), I haven't found any pattern to why UTF-8 isn't detected, I suspect it's because the "encoded-as" sequence isn't always exactly in the same position in the string returned. You might do some additional checking on that.
Three-character UTF-8 indicator: $startutf8 = chr(0xEF).chr(187).chr(191); (if you see this ANYWHERE, not just first three characters, the string is UTF-8 encoded)
Decode according to UTF-8 rules; this replaced an earlier version which chugged through byte by byte:using
function charset_decode_utf_8 ($string) {
/* Only do the slow convert if there are 8-bit characters */
/* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
if (! ereg("[\200-\237]", $string) and ! ereg("[\241-\377]", $string))
return $string;
// decode three byte unicode characters
$string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e",
"'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'",
$string);
// decode two byte unicode characters
$string = preg_replace("/([\300-\337])([\200-\277])/e",
"'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
$string);
return $string;
}
Problem is in your PHP file encoding , save your file in UTF-8 encoding , then even no need to use utf8_decode , if you get these data 'Praha 5, Staré Město,' from database , better change it charset to UTF-8
you don't need that (#Rajeev :this string is automatically detected as utf-8 encoded :
echo mb_detect_encoding('Praha 5, Staré Město,');
will always return UTF-8.).
You'd rather see :
https://code.google.com/p/dompdf/wiki/CPDFUnicode

Convert UTF8 characters returned from Facebook Graph API

The character is UTF8 encoded, like..
"\u676f\u845b"
How to convert it back to normal UTF8 string in PHP?
The simple approach would be to wrap your string into double quotes and let json_decode convert the \u0000 escapes. (Which happen to be Javascript string syntax.)
$str = json_decode("\"$str\"");
Seems to be asian letters: 杯葛 (It's already UTF-8 when json_decode returns it.)
(Source)
http://webarto.com/83/php-unicode_decode-5.3
demo: http://ideone.com/AtY0v
function decode_encoded_utf8($string){
return preg_replace_callback('#\\\\u([0-9a-f]{4})#ism', function($matches) { return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE"); }, $string);
}
echo unicode_decode('\u676f\u845b'); # 杯葛

Categories