Converting Hex Codes into Characters - php

Does PHP have a function that searches for hex codes in a string and converts them into their char equivalents?
For example - I have a string that contains the following
"Hello World\x19s"
And I want to convert it to
"Hello World's"
Thanks in advance.

This code will convert "Hello World\x27s" into "Hello World's". It will convert "\x19" into the "end of medium" character, since that's what 0x19 represents in ASCII.
$str = preg_replace('/\\\\x([0-9a-f]{2})/e', 'chr(hexdec($1))', $str);

Correct me if I'm wrong, but I think you should change the replacement expression like so:
$str = preg_replace('/\\\\x([0-9a-f]{2})/e', 'chr(hexdec(\'$1\'))', $str);
With the single quotes added, characters like '=' (\x3d) are converted correctly too.

The /e modifier generates an error in current PHP, which advises using preg_replace_callback instead. Try this:
$str = preg_replace_callback('/\\\\x([0-9a-f]{2})/', function ($m) { return chr(hexdec($m[1])); }, $str);

The /e modifier causes errors in current PHP: it was deprecated in PHP 5.5 and removed in PHP 7.0. If the hex codes in your string are HTML entities rather than \x escapes, you can decode them with:
$str = html_entity_decode($str, ENT_QUOTES | ENT_XML1, 'UTF-8');
This will turn &#x27; into ' and &amp; into &, etc.
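The answers above cover two different input formats, so it may help to see them side by side. The following is a minimal sketch, assuming the sample strings shown (they are illustrative, not from the original question): literal \xNN escapes go through preg_replace_callback, while &#xNN; entities go through html_entity_decode.

```php
<?php
// Case 1: literal backslash escapes (\xNN) inside the string.
// Single quotes keep the backslash literal, as in the question.
$escaped = 'Hello World\x27s';
$fromEscapes = preg_replace_callback(
    '/\\\\x([0-9a-f]{2})/i',                      // "i" also accepts \x3D
    function ($m) { return chr(hexdec($m[1])); }, // hex pair -> single byte
    $escaped
);

// Case 2: HTML/XML numeric entities (&#xNN;) and named entities.
$entities = 'Hello World&#x27;s &amp; more';
$fromEntities = html_entity_decode($entities, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $fromEscapes, "\n";  // Hello World's
echo $fromEntities, "\n"; // Hello World's & more
```

Neither function handles the other format, so it is worth checking which one the input actually contains.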

Related

php: I want to trim "h;" from "w;h;" to get "w;" but what I got is "w"

The program below is meant to trim the string "h;" from the right end of "w;h;" to get "w;". Unexpectedly, what I get is "w", not "w;".
<?php
$string="w;h;";
$str="h;";
$nStr=rtrim($string,$str);
echo $nStr.'</br>';
?>
Use str_replace: http://php.net/manual/en/function.str-replace.php
$string="w;h;";
$str="h;";
echo str_replace($str, "", $string);
https://3v4l.org/P9Hbe
str_replace replaces every occurrence of $str with nothing, leaving w; (note that it also removes occurrences that are not at the end of the string).
From the manual:
character_mask
Optionally, the stripped characters can also be specified using the character_mask parameter. Simply list all characters that you want to be stripped. With .. you can specify a range of characters.
If you do rtrim("w;h;", "h;") you're saying "trim either h or ; from the end of the string", meaning it cuts off characters that are either h or ; until it reaches a character that is neither. In this case it keeps cutting until only w is left.
If you want to remove a specific string from the end you have to do something like:
if (substr($string, -strlen($str)) == $str) {
    $string = substr($string, 0, -strlen($str));
}
Note: This assumes ASCII strings. For UTF-8 multibyte strings use mb_substr.
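The suffix check above can be wrapped in a small helper. This is a minimal sketch; removeSuffix is a hypothetical name, not a built-in PHP function, and it compares bytes, so both strings are assumed to use the same encoding.

```php
<?php
// Hypothetical helper: remove $suffix from the end of $string, but only
// when $string really ends with that exact sequence.
function removeSuffix(string $string, string $suffix): string
{
    $len = strlen($suffix);
    // Guard against an empty suffix: substr($string, -0) is the whole string.
    if ($len > 0 && substr($string, -$len) === $suffix) {
        return substr($string, 0, -$len);
    }
    return $string;
}

echo removeSuffix('w;h;', 'h;'), "\n"; // w;
echo rtrim('w;h;', 'h;'), "\n";        // w  (character mask, for comparison)
```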

unexpected output of ltrim in php

Can anybody explain this unusual output of ltrim?
var_dump(ltrim('/btcapi/participation/set-user-event-participation','/btcapi'));
rticipation/set-user-event-participation //output
The expected output is
/participation/set-user-event-participation
Use str_replace if you are sure this is the only occurrence in your string.
$str = '/btcapi/participation/set-user-event-participation';
echo str_replace('/btcapi', '', $str); // returns: '/participation/set-user-event-participation'
Or use a regex if you need to remove only the occurrence at the beginning of the string.
$str = '/btcapi/participation/set-user-event-participation';
echo preg_replace('~^/btcapi~', '', $str);
The trim characters are read as individual characters, not as a string.
For example, it also strips the second / and the following "pa" because those characters are part of the mask.
Just use str_replace or a custom loop.
RTM: http://php.net/ltrim
the second argument is a character MASK, e.g. characters you want to strip. CHARACTERS, not STRING.
php > $foo = 'abc123';
php > echo ltrim($foo, 'abpq');
c123
php > echo ltrim($foo, 'a1');
bc123
^---not stripped, because 'bc' are not in the mask.
php >
PHP will strip all characters from the left of the string that appear in the mask, until it encounters a character NOT in the mask.
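To strip a whole prefix string rather than a set of characters, one option is to compare the start of the string explicitly. This is a minimal sketch; removePrefix is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: remove $prefix only when $string starts with it.
function removePrefix(string $string, string $prefix): string
{
    // strncmp compares the first strlen($prefix) bytes of both strings.
    if ($prefix !== '' && strncmp($string, $prefix, strlen($prefix)) === 0) {
        return substr($string, strlen($prefix));
    }
    return $string;
}

$path = '/btcapi/participation/set-user-event-participation';
echo removePrefix($path, '/btcapi'), "\n"; // /participation/set-user-event-participation
echo ltrim($path, '/btcapi'), "\n";        // rticipation/set-user-event-participation
```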

How to parse characters in a single-quoted string?

To get the escape sequences in a single-quoted string (which I cannot change) parsed correctly, I have to do the following:
$string = '15 Rose Avenue\n Irlam\n Manchester';
$string = str_replace('\n', "\n", $string);
print nl2br($string); // demonstrates that the \n's are now linebreak characters
So far, so good.
But in my given string there are characters like \xC3\xA4. There are many characters like this (beginning with \x..)
How can I get them correctly parsed as shown above with the linebreak?
You can use
$str = stripcslashes($str);
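As a quick illustration of what stripcslashes does with such input, here is a minimal sketch. The sample string is an assumption made up for this example; note that stripcslashes also interprets other C-style escapes (octal, \t, \r, and so on) and drops the backslash from unrecognized ones, which may or may not be what you want.

```php
<?php
// Single-quoted, so \n and \xC3\xA4 are literal backslash sequences here.
$raw = 'Irlam\n Manchester \xC3\xA4';

// stripcslashes turns \n into a newline and \xC3\xA4 into the bytes C3 A4,
// which form the UTF-8 encoding of "ä".
$decoded = stripcslashes($raw);

echo $decoded, "\n";
```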
You can escape a \ in single quotes:
$string = str_replace('\\n', "\n", $string);
But you're going to have a lot of potential replacements if you need to handle \\xC3 and so on. It is best to use preg_replace_callback() with a callback function that translates them to bytes.

Convert Unicode from JSON string with PHP

I've been reading up on a few solutions but have not managed to get anything to work as yet.
I have a JSON string that I read in from an API call and it contains Unicode characters - \u00c2\u00a3 for example is the £ symbol.
I'd like to use PHP to convert these into either £ or &pound;.
I'm looking into the problem and found the following code (using my pound symbol to test) but it didn't seem to work:
$title = preg_replace("/\\\\u([a-f0-9]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", '\u00c2\u00a3');
The output is Â£.
Am I correct in thinking that this is UTF-16 encoded? How would I convert these to output as HTML?
UPDATE
It seems that the JSON string from the API has 2 or 3 unescaped Unicode strings, e.g.:
That\u00e2\u0080\u0099s (right single quotation)
\u00c2\u00a3 (pound symbol)
It is not UTF-16 encoding. It rather looks like a bogus encoding, because the \uXXXX escape is independent of whichever UTF or UCS encoding is used for Unicode. \u00c2\u00a3 really maps to the string Â£.
What you should have is \u00a3 which is the unicode code point for £.
{0xC2, 0xA3} is the UTF-8 encoded 2-byte character for this code point.
If, as I think, the software that encoded the original UTF-8 string to JSON was oblivious to the fact that it was UTF-8 and blindly encoded each byte as an escaped Unicode code point, then you need to convert each pair of escaped code points back to a UTF-8 encoded character, and then decode it to the native PHP encoding to make it printable.
function fixBadUnicode($str) {
return utf8_decode(preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2"))', $str));
}
Example here: http://phpfiddle.org/main/code/6sq-rkn
Edit:
If you want to fix the string in order to obtain a valid JSON string, you need to use the following function:
function fixBadUnicodeForJson($str) {
$str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2")).chr(hexdec("$3")).chr(hexdec("$4"))', $str);
$str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2")).chr(hexdec("$3"))', $str);
$str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2"))', $str);
$str = preg_replace("/\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1"))', $str);
return $str;
}
Edit 2: fixed the previous function to transform any wrongly Unicode-escaped UTF-8 byte sequence into the equivalent UTF-8 character.
Be careful: some of these characters, which probably come from an editor such as Word, are not translatable to ISO-8859-1 and will therefore appear as '?' after utf8_decode.
The output is correct.
\u00c2 == Â
\u00a3 == £
So nothing is wrong here. And converting to HTML entities is easy:
htmlentities($title);
Here is an updated version of the function using preg_replace_callback instead of preg_replace.
function fixBadUnicodeForJson($str) {
    // Inside a callback the captured groups are in $matches[1..n];
    // "$1" is only meaningful in a preg_replace replacement string.
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])).chr(hexdec($matches[2])).chr(hexdec($matches[3])).chr(hexdec($matches[4])); },
        $str
    );
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])).chr(hexdec($matches[2])).chr(hexdec($matches[3])); },
        $str
    );
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])).chr(hexdec($matches[2])); },
        $str
    );
    $str = preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})/',
        function($matches) { return chr(hexdec($matches[1])); },
        $str
    );
    return $str;
}
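Since every pattern in the function above turns each \u00XX escape into one byte, a single pattern appears to be enough to get the same result. The following is a minimal sketch under that assumption; fixBytewiseEscapes is a hypothetical name, and the input is assumed to contain only byte-wise escaped UTF-8 (a legitimately escaped code point such as \u00e9 would be turned into a lone byte that is not valid UTF-8).

```php
<?php
// Hypothetical helper: convert every \u00XX escape back into the raw byte XX.
function fixBytewiseEscapes(string $str): string
{
    return preg_replace_callback(
        '/\\\\u00([0-9a-f]{2})/i',
        function (array $m): string { return chr(hexdec($m[1])); },
        $str
    );
}

// Single-quoted, so the \u sequences are literal text as they are in the raw JSON.
$fixed = fixBytewiseEscapes('That\u00e2\u0080\u0099s \u00c2\u00a3');
echo $fixed, "\n"; // That’s £ when the output is interpreted as UTF-8
```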

Extract digit from unicode string - PHP RegExpression

I am parsing a web page to get a price from it. The price includes a Rupee symbol (₹).
So I used preg_replace to extract the digits.
For example:
$str='₹ 1,195 ';
echo preg_replace("/[^0-9]/", '', $str);
Output is :
2091195
I tried running the same code on http://writecodeonline.com/php/.
There I get the correct output, 1195.
I don't understand what the problem is.
Thanks in advance.
If the Unicode string is UTF-8, you can use the u (PCRE_UTF8) modifier to tell preg_replace that it should use UTF-8 mode. If not, re-encode it to UTF-8 first and then use the modifier.
Example:
$subject = '₹ 1,195 ';
$pattern = "/[^0-9]/u";
$result = preg_replace($pattern, '', $subject);
echo $result;
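One possible explanation for the extra "209", which is an assumption and not confirmed anywhere in the question, is that the scraped page contains the Rupee sign as the HTML entity &#x20B9; (its code point is U+20B9). In that case the digits 2, 0 and 9 come from the entity itself, and decoding entities before stripping non-digits would avoid the problem. A minimal sketch with a made-up input string:

```php
<?php
// Assumed raw value as it might appear in the page source.
$raw = '&#x20B9; 1,195 ';

// Without decoding, the digits inside the entity survive the replacement.
$undecoded = preg_replace('/[^0-9]/', '', $raw);

// Decode entities first, then strip everything that is not a digit.
$text  = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$clean = preg_replace('/[^0-9]/u', '', $text);

echo $undecoded, "\n"; // 2091195
echo $clean, "\n";     // 1195
```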
