Decoding javascript escape sequences in PHP (\x27, \x22, etc...) - php

If I have this PHP string:
$string = '\\x27\\x22';
How would I decode it to '"?

A regex could help you here:
$out = preg_replace_callback(
"(\\\\x([0-9a-f]{2}))i",
function($a) {return chr(hexdec($a[1]));},
$string
);

You do not need to decode it. Just do str_replace('\\x27', "'", $str);. In case your '" was just an example, note that you have a repeating pattern \\xAA, where x indicates hexadecimal notation and AA is the hex value itself, so each \\xAA represents a single byte with AA ranging from 00 to FF. So you can use a regexp, or walk over your string any other way, extract these AA values, convert each one with chr(hexdec($AA)) to the corresponding character, and append it to the result string.
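The generic approach described above could be wrapped up roughly like this. It is only a sketch: decodeHexEscapes is a made-up name, and it assumes every \xAA sequence should become exactly one raw byte.

```php
<?php
// Hypothetical helper (not part of PHP): decodes literal \xHH sequences into single bytes.
function decodeHexEscapes(string $input): string
{
    return preg_replace_callback(
        '/\\\\x([0-9a-fA-F]{2})/',           // a literal backslash, "x", then two hex digits
        function (array $m): string {
            return chr(hexdec($m[1]));       // hex digits -> integer -> one byte
        },
        $input
    );
}

$string = '\\x27\\x22';                      // the eight characters \x27\x22
echo decodeHexEscapes($string), "\n";        // prints '"
```

Anything that does not look like \x followed by two hex digits is left untouched.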

$out = preg_replace_callback(
"(\\\\x([0-9a-f]{2}))i",
function($a) {return '\u00'.bin2hex(hex2bin($a[1]));},
$string
);
That's ok after I converted the value from ascii to unicode.

Related

convert ASCII value to a character based on specific char in php

I'd like to ask dumb question.
How could I convert #48#49#50 based on # char ?
In PHP, I understand that the chr() function is used to convert an ASCII value to a character.
48 is 0.
49 is 1.
And 50 is 2.
May I know how to convert #48#49#50 as 012 and how to store 012 in one variable ?
Eg- $num = 012
We can try using preg_replace_callback() here with the regex pattern #\d+. As we capture each #\d+ match, use chr() on that match to generate the ASCII character replacement.
$input = "#48#49#50";
$out = preg_replace_callback(
"/#(\d+)/",
function($m) { return chr($m[1]); },
$input
);
echo $out; // 012
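If you would rather avoid a regex, a rough alternative is to split on the # character. This is just a sketch; decodeHashCodes is a made-up name, and it assumes the input contains nothing but #-prefixed decimal codes.

```php
<?php
// Hypothetical helper: turns "#48#49#50" into "012" without a regex.
function decodeHashCodes(string $input): string
{
    // Split on '#' and drop the empty piece that precedes the first '#'.
    $codes = array_filter(explode('#', $input), 'strlen');

    // Map each decimal code to its character and glue the result together.
    return implode('', array_map(function (string $code): string {
        return chr((int) $code);
    }, $codes));
}

$num = decodeHashCodes('#48#49#50');
var_dump($num); // string(3) "012"
```

Note that the result is kept as a string. Written as a bare number, 012 would be read by PHP as an octal literal, and an integer cannot keep the leading zero anyway.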

strreplace PHP function for utf-8 string

I want to replace character at specific index of string.
preg_replace('/('.$txt.')/u', $replacement, $str,1);
but it's not taking index, so not working for me.
How can I do this easily?
To manipulate Unicode strings, you need to use appropriate string functions. Here, you can use mb_substr:
Performs a multi-byte safe substr() operation based on number of characters. Position is counted from the beginning of str. First character's position is 0. Second character position is 1, and so on.
Sample PHP code:
$str = "Вася";
$replacement = "н";
$start = 3;
echo mb_substr($str,0,$start-1,"utf8") .
$replacement .
mb_substr($str,$start,mb_strlen($str),"utf8");
This will change Вася into Ваня, as the 3rd symbol will get "replaced" with the $replacement.
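With PHP, if you do
$string[1] = $replacement;
it will replace the byte at that offset within the string. This only works reliably for single-byte (ASCII) characters; in a UTF-8 string it can split a multi-byte character.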
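The mb_substr idea can be wrapped in a small function so the index handling lives in one place. A minimal sketch, assuming the mbstring extension is loaded and a 0-based character index; mbReplaceAt is a made-up name, not a built-in.

```php
<?php
// Hypothetical helper: replaces the character at a 0-based *character* index.
function mbReplaceAt(string $str, int $index, string $replacement, string $encoding = 'UTF-8'): string
{
    return mb_substr($str, 0, $index, $encoding)          // everything before the index
        . $replacement
        . mb_substr($str, $index + 1, null, $encoding);   // everything after it, to the end
}

echo mbReplaceAt('Вася', 2, 'н'), "\n"; // Ваня
```

Passing null as the length tells mb_substr to read to the end of the string, so no mb_strlen call is needed.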

preg_replace make matched hex value appear as text

How to use preg_replace to convert only matching hex values into text representation of hex?
$string = 'abcd'.hex2bin('23').'abc'.hex2bin('24');
For example preg_replace('/[\x20-\x25]/', 'what here?', $string) would get output like:
abcd[HEX:23]abc[HEX:24]
What exactly I want to do: I'm looking for hidden characters, and want to display their hex values.
You need something like preg_replace_callback() to have a callback called against all matches.
Try:
$string = 'abcd'.hex2bin('23').'abc'.hex2bin('24');
$text = preg_replace_callback('/[\\x20-\\x25]/', function($matches) {
    // bin2hex() turns the matched byte into its two-digit hex representation
    $hex = bin2hex($matches[0]);
    return "[HEX:{$hex}]";
}, $string);
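Since the goal is to find hidden characters, the same callback can be pointed at the ASCII control range instead. This is only a sketch: the range \x00-\x1F plus \x7F is my assumption about what counts as "hidden", and revealControlChars is a made-up name.

```php
<?php
// Hypothetical helper: makes ASCII control characters visible as [HEX:..] markers.
function revealControlChars(string $input): string
{
    return preg_replace_callback(
        '/[\x00-\x1F\x7F]/',                        // assumed definition of "hidden" bytes
        function (array $m): string {
            return '[HEX:' . bin2hex($m[0]) . ']';  // one byte -> two hex digits
        },
        $input
    );
}

echo revealControlChars("ab\tcd\x00"), "\n"; // ab[HEX:09]cd[HEX:00]
```

Adjust the character class if your hidden characters live elsewhere, for example in the \x80-\xFF range.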

how to transform japanese english character to normal english character?

I have a Japanese-style English string.
It is not a normal English string.
Characters: Ｇａｍｅ
How can I transform it into normal English characters in PHP?
Subtract 65248 from the ordinal value of each character. In other words:
$str = "Ｇａｍｅ some other text by ヴィックサ";
$str = preg_replace_callback(
"/[\x{ff01}-\x{ff5e}]/u",
function($c) {
// convert the 3-byte UTF-8 sequence to its code point
$code = ((ord($c[0][0])&0xf)<<12)|((ord($c[0][1])&0x3f)<<6)|(ord($c[0][2])&0x3f);
// subtract 65248 (0xfee0) to land on the matching ASCII character
return chr($code-0xfee0);
},
$str);
This will replace all of the "Fullwidth" characters with their normal width equivalents.
It would be easier to use mb_convert_kana:
$string = 'Characters: Ｇａｍｅ';
$newString = mb_convert_kana($string,'a');
I'm sure there is a much easier answer, but couldn't you make a dictionary object with the special character as the key and the char you want as the value,
then just do a simple find and replace?
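The dictionary idea could look something like this. It is a sketch only: fullwidthToAscii is a made-up name, mb_chr needs PHP 7.2+ with mbstring, and the table covers just the U+FF01 to U+FF5E block.

```php
<?php
// Hypothetical helper: builds a fullwidth => ASCII lookup table and applies it with strtr().
function fullwidthToAscii(string $str): string
{
    $map = [];
    for ($code = 0xFF01; $code <= 0xFF5E; $code++) {
        // Each fullwidth form sits 0xFEE0 (65248) above its ASCII counterpart.
        $map[mb_chr($code, 'UTF-8')] = chr($code - 0xFEE0);
    }
    return strtr($str, $map);   // replaces every key found in the string with its value
}

echo fullwidthToAscii('Ｇａｍｅ'), "\n"; // Game
```

Characters outside the table, such as katakana, pass through unchanged.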

Convert Unicode from JSON string with PHP

I've been reading up on a few solutions but have not managed to get anything to work as yet.
I have a JSON string that I read in from an API call and it contains Unicode characters - \u00c2\u00a3 for example is the £ symbol.
I'd like to use PHP to convert these into either £ or &pound;.
I'm looking into the problem and found the following code (using my pound symbol to test) but it didn't seem to work:
$title = preg_replace("/\\\\u([a-f0-9]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", '\u00c2\u00a3');
The output is Â£.
Am I correct in thinking that this is UTF-16 encoded? How would I convert these to output as HTML?
UPDATE
It seems that the JSON string from the API has 2 or 3 unescaped Unicode strings, e.g.:
That\u00e2\u0080\u0099s (right single quotation)
\u00c2\u00a (pound symbol)
It is not UTF-16 encoding. It rather seems like bogus encoding, because the \uXXXX encoding is independent of whatever UTF or UCS encodings for Unicode. \u00c2\u00a3 really maps to the Â£ string.
What you should have is \u00a3 which is the unicode code point for £.
{0xC2, 0xA3} is the UTF-8 encoded 2-byte character for this code point.
If, as I think, the software that encoded the original UTF-8 string to JSON was oblivious to the fact it was UTF-8 and blindly encoded each byte to an escaped unicode code point, then you need to convert each pair of unicode code points to a UTF-8 encoded character, and then decode it to the native PHP encoding to make it printable.
function fixBadUnicode($str) {
return utf8_decode(preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2"))', $str));
}
Example here: http://phpfiddle.org/main/code/6sq-rkn
Edit:
If you want to fix the string in order to obtain a valid JSON string, you need to use the following function:
function fixBadUnicodeForJson($str) {
$str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2")).chr(hexdec("$3")).chr(hexdec("$4"))', $str);
$str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2")).chr(hexdec("$3"))', $str);
$str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2"))', $str);
$str = preg_replace("/\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1"))', $str);
return $str;
}
Edit 2: fixed the previous function to transform any wrongly unicode escaped utf-8 byte sequence into the equivalent utf-8 character.
Be careful that some of these characters, which probably come from an editor such as Word, are not translatable to ISO-8859-1 and will therefore appear as '?' after utf8_decode.
The output is correct.
\u00c2 == Â
\u00a3 == £
So nothing is wrong here. And converting to HTML entities is easy:
htmlentities($title);
Here is an updated version of the function using preg_replace_callback instead of preg_replace, since the /e modifier was removed in PHP 7.
function fixBadUnicodeForJson($str) {
    // Longest byte sequences first: 4, 3, 2, then single \u00XX escapes.
    // Inside a callback the captured groups are in $matches, not "$1".
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])).chr(hexdec($matches[2])).chr(hexdec($matches[3])).chr(hexdec($matches[4])); },
        $str
    );
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])).chr(hexdec($matches[2])).chr(hexdec($matches[3])); },
        $str
    );
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])).chr(hexdec($matches[2])); },
        $str
    );
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])); },
        $str
    );
    return $str;
}
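Another way to get the same effect without hand-written regexes is to let json_decode do the unescaping and then squeeze each resulting character back into one byte. This is a sketch under the assumption that the input is a single JSON string literal whose \u00XX escapes are really UTF-8 bytes; decodeDoubleEncodedJsonString is a made-up name.

```php
<?php
// Hypothetical helper: undoes "one \u00XX escape per UTF-8 byte" in a JSON string literal.
function decodeDoubleEncodedJsonString(string $json): string
{
    // json_decode turns \u00c2\u00a3 into the two characters U+00C2 U+00A3, encoded as UTF-8.
    $decoded = json_decode($json);

    // Converting UTF-8 to ISO-8859-1 maps each of those characters back to a single byte,
    // which restores the original UTF-8 byte sequence (C2 A3 for the pound sign).
    return mb_convert_encoding($decoded, 'ISO-8859-1', 'UTF-8');
}

echo decodeDoubleEncodedJsonString('"\u00c2\u00a3"'), "\n";             // £
echo decodeDoubleEncodedJsonString('"That\u00e2\u0080\u0099s"'), "\n";  // That’s
```

This only makes sense when the whole string is mis-encoded. A correctly escaped character above U+00FF has no ISO-8859-1 equivalent and would be replaced by a substitute character.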
