I want to convert normal text to \x escape codes, e.g. \x14\x65\x60
For example :
normal text = "base64_decode"
converted \x codes for above text = "\x62\141\x73\145\x36\64\x5f\144\x65\143\x6f\144\x65"
How to do this? Thanks in advance.
PHP 5.3 one-liner:
echo preg_replace_callback("/./s", function($matched) {
    // %02x zero-pads bytes below 0x10; the s flag lets "." match newlines too
    return sprintf('\x%02x', ord($matched[0]));
}, 'base64_decode');
Outputs \x62\x61\x73\x65\x36\x34\x5f\x64\x65\x63\x6f\x64\x65
The ord() function gives you the decimal value of a single byte, and dechex() converts it to hex. So loop through every character in the string and apply both functions.
$str = 'base64_decode';
$length = strlen($str);
$result = '';
for ($i = 0; $i < $length; $i++) $result .= '\\x'.str_pad(dechex(ord($str[$i])),2,'0',STR_PAD_LEFT);
print($result);
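None of the snippets above reproduce the exact format from the question, which alternates hex (\xNN) and octal (\NNN) escapes. Here is a minimal sketch of that, assuming the pattern is simply hex for even byte positions and octal for odd ones. The name mixed_escape is made up for illustration.

```php
<?php
// Hypothetical helper: alternate hex (\xNN) and octal (\NNN) escapes per byte.
function mixed_escape(string $text): string
{
    if ($text === '') {
        return '';
    }
    $out = '';
    foreach (str_split($text) as $i => $char) {
        $byte = ord($char);
        // Assumption: even positions use hex, odd positions use octal
        $out .= ($i % 2 === 0)
            ? sprintf('\\x%02x', $byte)
            : sprintf('\\%o', $byte);
    }
    return $out;
}

echo mixed_escape('base64_decode'), "\n";
// \x62\141\x73\145\x36\64\x5f\144\x65\143\x6f\144\x65
```

Because every byte is escaped, each octal escape is always followed by a backslash, so the unpadded octal digits stay unambiguous.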
Here's working code:
function make_hexcodes($text) {
    $retval = '';
    $length = strlen($text);
    for ($i = 0; $i < $length; ++$i) {
        // %02x zero-pads bytes below 0x10 so every escape has two hex digits
        $retval .= sprintf('\x%02x', ord($text[$i]));
    }
    return $retval;
}
echo make_hexcodes('base64_decode');
As an alternative to dechex(ord()), you can use bin2hex($char), sprintf('\x%02X', ord($char)), or unpack('H*', $char). Instead of preg_replace_callback, you can also use array_map with str_split.
Hexadecimal Encoding: https://3v4l.org/Ai3HZ
bin2hex
$word = 'base64_decode';
echo implode(array_map(function($char) {
return '\x' . bin2hex($char);
}, (array) str_split($word)));
unpack
$word = 'base64_decode';
echo implode(array_map(function($char) {
return '\x' . implode(unpack('H*', $char));
}, (array) str_split($word)));
sprintf
$word = 'base64_decode';
echo implode(array_map(function($char) {
return sprintf('\x%02X', ord($char));
}, (array) str_split($word)));
Result
\x62\x61\x73\x65\x36\x34\x5f\x64\x65\x63\x6f\x64\x65
Hexadecimal Decoding
To decode the encoded string back to the plain-text, use one of the following methods.
$encoded = '\x62\x61\x73\x65\x36\x34\x5f\x64\x65\x63\x6f\x64\x65';
$hexadecimal = str_replace('\x', '', $encoded);
hex2bin
echo hex2bin($hexadecimal);
pack
echo pack('H*', $hexadecimal);
sscanf + vprintf
vprintf(str_repeat('%c', count($f = sscanf($hexadecimal, str_repeat('%02X', substr_count($encoded , '\x'))))), $f);
Result
base64_decode
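If the escaped text uses C-style escapes like the string in the question (a mix of \x hex and octal), PHP's built-in stripcslashes() should decode both forms in one call. A small sketch; note that it also interprets escapes such as \n and \t, so it only fits when that is acceptable for your input.

```php
<?php
// Single quotes keep the backslashes literal, so this is the escaped text itself
$escaped = '\x62\141\x73\145\x36\64\x5f\144\x65\143\x6f\144\x65';

// stripcslashes() recognizes C-like escapes, including hex and octal forms
$decoded = stripcslashes($escaped);

echo $decoded, "\n"; // base64_decode
```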
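I can't read this code: \ud83d\udc33 🐳
function unicode_decode(string $str)
{
    // Match whole runs of \uXXXX escapes so that surrogate pairs stay together
    return preg_replace_callback('/(?:\\\\u[0-9a-f]{4})+/i', function ($match) {
        $hex = str_ireplace('\u', '', $match[0]);
        // UTF-16BE (unlike UCS-2BE) understands surrogate pairs
        return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
    }, $str);
}
echo unicode_decode('Learn Docker in 12 Minutes \ud83d\udc33'); // Learn Docker in 12 Minutes 🐳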
I'm trying to detect the emoji that I get through e.g. a POST (the source is not important).
As an example I'm using this emoji: ✊🏾 (I hope it's visible)
The code for it is U+270A U+1F3FE (I'm using http://unicode.org/emoji/charts/full-emoji-list.html for the codes)
Now I converted the emoji with json_encode and I get: \u270a\ud83c\udffe
Here the only part that is equal is 270a. \ud83c\udffe is not equal to U+1F3FE, not even if I add them together (1B83A)
How do I get from ✊🏾 to U+270A U+1F3FE with e.g. php?
Use mb_convert_encoding and convert from UTF-8 to UTF-32. Then do some additional formatting:
// Strips leading zeros
// And returns str in UPPERCASE letters with a U+ prefix
function format($str) {
$copy = false;
$len = strlen($str);
$res = '';
for ($i = 0; $i < $len; ++$i) {
$ch = $str[$i];
if (!$copy) {
if ($ch != '0') {
$copy = true;
}
// Prevent format("0") from returning ""
else if (($i + 1) == $len) {
$res = '0';
}
}
if ($copy) {
$res .= $ch;
}
}
return 'U+'.strtoupper($res);
}
function convert_emoji($emoji) {
// ✊🏾 --> 0000270a0001f3fe
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$hex = bin2hex($emoji);
// Split the UTF-32 hex representation into chunks
$hex_len = strlen($hex) / 8;
$chunks = array();
for ($i = 0; $i < $hex_len; ++$i) {
$tmp = substr($hex, $i * 8, 8);
// Format each chunk
$chunks[$i] = format($tmp);
}
// Convert chunks array back to a string
return implode(' ', $chunks);
}
echo convert_emoji('✊🏾'); // U+270A U+1F3FE
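As for why \ud83c\udffe does not look like U+1F3FE: json_encode emits UTF-16 code units, and code points above U+FFFF are stored as a surrogate pair. The two halves are not added together; each one contributes 10 bits on top of 0x10000. A sketch of that arithmetic (the function name is hypothetical):

```php
<?php
// Combine a UTF-16 surrogate pair into a single Unicode code point.
// $high must be in 0xD800-0xDBFF and $low in 0xDC00-0xDFFF.
function surrogates_to_codepoint(int $high, int $low): int
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

$cp = surrogates_to_codepoint(0xD83C, 0xDFFE);
printf("U+%X\n", $cp); // U+1F3FE
```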
A simple function, inspired by @d3L's answer above:
function emoji_to_unicode($emoji) {
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$unicode = strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emoji)));
return $unicode;
}
Example
emoji_to_unicode("💵");//returns U+1F4B5
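Note that stripping leading zeros with a regex only works for a single code point. For sequences such as ✊🏾 (two code points), one option is to unpack the UTF-32 bytes as 32-bit integers. A rough sketch, assuming the mbstring extension is available; the function name codepoints is made up here:

```php
<?php
// Hypothetical helper: list every code point of a UTF-8 string as U+XXXX.
function codepoints(string $text): string
{
    // UTF-32BE stores each code point as one 4-byte big-endian integer
    $utf32 = mb_convert_encoding($text, 'UTF-32BE', 'UTF-8');
    $labels = [];
    foreach (unpack('N*', $utf32) as $cp) {
        $labels[] = sprintf('U+%04X', $cp);
    }
    return implode(' ', $labels);
}

echo codepoints("\u{270A}\u{1F3FE}"), "\n"; // U+270A U+1F3FE
```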
You can do it like this, treating the emoji as a normal character.
$emoji = "✊🏾";
$str = str_replace('"', "", json_encode($emoji, JSON_HEX_APOS));
$myInput = $str;
$myHexString = str_replace('\\u', '', $myInput);
$myBinString = hex2bin($myHexString);
print iconv("UTF-16BE", "UTF-8", $myBinString);
I'm using an RTF converter and I need 240 as &#U050&#U052&#U048, but I'm not sure how to do this.
I have tried using the following function:
function string_to_ascii($string) {
$ascii = NULL;
for ($i = 0; $i < strlen($string); $i++) {
$ascii += "&#U"+str_pad(ord($string[$i]),3,"0",STR_PAD_LEFT);
}
return($ascii);
}
But it still outputs just a number (e.g. 2 = 50), and ord just makes it go mad.
I've tried echo "-&#U"+ord("2")+"-"; and I get 50416 !?!?
I have a feeling it might have something to do with encoding
I think you're overthinking this. Convert the string to an array with str_split, map ord over it, and then, if you want to format each value, use sprintf (or str_pad if you prefer), like this:
function string_to_ascii($string) {
$array = array_map( 'ord', str_split( $string));
// Optional formatting:
foreach( $array as $k => &$v) {
$v = sprintf( "%03d", $v);
}
return "&#U" . implode( "&#U", $array);
}
Now, when you pass string_to_ascii( '240'), you get back string(18) "&#U050&#U052&#U048".
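The odd numbers in the question most likely come from the + operator: in PHP, + is numeric addition, so the string pieces get treated as numbers. String concatenation uses the . operator instead. A tiny sketch of the difference in approach (variable names are only for illustration):

```php
<?php
$code = ord('2');                     // 50, the byte value of the character "2"

// Concatenate with "." rather than "+"
$joined = '-&#U' . $code . '-';

// sprintf() can do the zero-padding and the prefix in one step
$padded = sprintf('&#U%03d', $code);

echo $joined, "\n"; // -&#U50-
echo $padded, "\n"; // &#U050
```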
Just found this:
function to_ascii($string) {
$ascii_string = '';
foreach (str_split($string) as $char) {
$ascii_string .= '&#' . ord($char) . ';';
}
return $ascii_string;
}
Given a string that may contain any character (including Unicode characters), how can I convert it to its hexadecimal representation, and then reverse the process to obtain the original string from the hexadecimal?
Use pack() and unpack():
function hex2str( $hex ) {
return pack('H*', $hex);
}
function str2hex( $str ) {
return unpack('H*', $str)[1]; // unpack() returns a 1-indexed array
}
$txt = 'This is test';
$hex = str2hex( $txt );
$str = hex2str( $hex );
echo "{$txt} => {$hex} => {$str}\n";
would produce
This is test => 546869732069732074657374 => This is test
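Since the question asks about Unicode: the conversion works on bytes, so multi-byte UTF-8 characters survive the round trip unchanged. A quick check using the built-in bin2hex() and hex2bin() (PHP 5.4+), which do the same job as the pack()/unpack() pair above:

```php
<?php
// "qué 🐳" as UTF-8: é takes 2 bytes, the emoji takes 4
$txt = "qu\u{E9} \u{1F433}";

$hex  = bin2hex($txt);   // two hex digits per byte
$back = hex2bin($hex);   // back to the original bytes

echo $hex, "\n";                   // 7175c3a920f09f90b3
var_dump($back === $txt);          // bool(true)
```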
Use a function like this:
<?php
// The functions are renamed here because PHP does not allow
// redeclaring the built-in bin2hex() and hex2bin().
function naive_bin2hex($str) {
    $hex = "";
    for ($i = 0; $i < strlen($str); $i++) {
        $hex .= dechex(ord($str[$i]));
    }
    return $hex;
}
// Look what happens when ord($str[$i]) is 0...15:
// you get a single-digit hexadecimal value 0...f.
// naive_bin2hex("\x4a\x03") returns 4a3 for the bytes (74, 3).
function my_hex2bin($str) {
    $bin = "";
    for ($i = 0; $i + 1 < strlen($str); $i += 2) {
        $bin .= chr(hexdec($str[$i] . $str[$i + 1]));
    }
    return $bin;
}
// my_hex2bin("4a3") cannot split that into byte pairs. Now what?
// Use sprintf() to pad every byte to two hex digits.
function padded_bin2hex($str) {
    $hex = "";
    for ($i = 0; $i < strlen($str); $i++) {
        $hex .= sprintf("%02x", ord($str[$i]));
    }
    return $hex;
}
// Now the bytes (74, 3) become 4a03 instead of 4a3,
// which my_hex2bin() can decode again.
?>
Source: http://php.net/manual/en/function.bin2hex.php
It seems like MySQL does not support characters with more than 3 bytes in its default UTF-8 charset.
So, in PHP, how can I get rid of all 4(-and-more)-byte characters in a string and replace them with some other character?
NOTE: you should not just strip, but replace with replacement character U+FFFD to avoid unicode attacks, mostly XSS:
http://unicode.org/reports/tr36/#Deletion_of_Noncharacters
preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
Since 4-byte UTF-8 sequences always start with the bytes 0xF0-0xF7, the following should work:
$str = preg_replace('/[\xF0-\xF7].../s', '', $str);
Alternatively, you could use preg_replace in UTF-8 mode but this will probably be slower:
$str = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);
This works because 4-byte UTF-8 sequences are used for code points in the supplementary Unicode planes starting from 0x10000.
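To see that the two patterns agree, here is a small check on a sample string. It is only a sketch: it assumes the input is valid UTF-8 and uses the \u{...} string syntax, which needs PHP 7 or later.

```php
<?php
// "qué 𝒳 tal": the middle character is U+1D4B3, which takes 4 bytes in UTF-8
$str = "qu\u{E9} \u{1D4B3} tal";

// Byte-oriented pattern: a lead byte 0xF0-0xF7 followed by three more bytes
$byByte = preg_replace('/[\xF0-\xF7].../s', '', $str);

// UTF-8 mode pattern: any code point in the supplementary planes
$byCodePoint = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);

var_dump($byByte === $byCodePoint); // bool(true)
```

Both calls leave the 2-byte é untouched and drop only the 4-byte character.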
Here's an example:
<?php
mb_internal_encoding("UTF-8");
//utf8 string, 13 bytes, 9 utf8 chars, 7 ASCII, 1 in latin1, 1 outside the BMP
$str = "qué \xF0\x9D\x92\xB3 tal";
$array = mbStringToArray($str);
print "str: [$str] strlen:" . strlen($str) . " chars:" . count($array) . "\n";
$str1 = "";
foreach($array as $c) {
// print "$c : " . strlen($c) ."\n";
$str1 .= strlen($c)<=3? $c : '?';
}
print "[$str1]\n";
function mbStringToArray ($str) {
if ($str === '') return array();
$len = mb_strlen($str);
$array = array();
for ($i = 0; $i < $len; $i++) {
$array[] = mb_substr($str, $i, 1);
}
return $array;
}
Or, a little more compact and efficient:
<?php
mb_internal_encoding("UTF-8");
//utf8 string, 13 bytes, 9 utf8 chars, 7 ASCII, 1 in latin1, 1 outside the BMP
$str = "qué \xF0\x9D\x92\xB3 tal";
$str1 = trimOutsideBMP($str);
print "original: [$str]\n";
print "trimmed: [$str1]\n";
// Replaces non-BMP characters in the UTF-8 string by a '?' character
// Assumes UTF-8 default encoding ( if not sure, call first mb_internal_encoding("UTF-8"); )
function trimOutsideBMP($str) {
if (empty($str)) return $str;
$len = mb_strlen($str);
$str1 = '';
for ($i = 0; $i < $len; $i++) {
$c = mb_substr($str, $i, 1);
$str1 .= strlen($c) <= 3 ? $c : '?';
}
return $str1;
}
Came across this question when trying to solve my own issue (Facebook spits out certain emoticons as 4-byte characters, Amazon Mechanical Turk does not accept 4-byte characters).
I ended up using this; it doesn't require the mbstring extension:
function remove_4_byte($string) {
$char_array = preg_split('/(?<!^)(?!$)/u', $string );
for($x=0;$x<sizeof($char_array);$x++) {
if(strlen($char_array[$x])>3) {
$char_array[$x] = "";
}
}
return implode("", $char_array);
}
The function below replaces 3- and 4-byte characters in a UTF-8 string with '#':
function remove3and4bytesCharFromUtf8Str($str) {
return preg_replace('/([\xF0-\xF7]...)|([\xE0-\xEF]..)/s', '#', $str);
}
Here is my implementation to filter out 4-byte chars
$string = preg_replace_callback(
'/./u',
function (array $match) {
return strlen($match[0]) >= 4 ? null : $match[0];
},
$string
);
You could tweak it and replace null (which removes the character) with some substitute string. You can also replace >= 4 with some other byte-length check.
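One way that tweak might look, with the byte limit and the substitute passed in as parameters. This is just a sketch: the function name and defaults are my own, and it returns null if the input is not valid UTF-8, because preg_replace_callback does.

```php
<?php
// Hypothetical wrapper: replace every character longer than $maxBytes bytes.
function replace_long_chars(string $string, int $maxBytes = 3, string $substitute = "\u{FFFD}"): ?string
{
    return preg_replace_callback(
        '/./su', // u: match whole UTF-8 characters, s: include newlines
        function (array $match) use ($maxBytes, $substitute) {
            return strlen($match[0]) > $maxBytes ? $substitute : $match[0];
        },
        $string
    );
}

echo replace_long_chars("a\u{1F433}b"), "\n";      // a, U+FFFD, b
echo replace_long_chars("a\u{E9}b", 1, '?'), "\n"; // a?b
```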
Another filter implementation, more complex.
It tries to transliterate to ASCII characters; otherwise it inserts the Unicode replacement character to avoid XSS, e.g.: <a href='java\uFEFFscript:alert("XSS")'>
$tr = preg_replace_callback('/([\x{10000}-\x{10FFFF}])/u', function($m){
$c = iconv('ISO-8859-2', 'UTF-8',iconv('utf-8','ISO-8859-2//TRANSLIT//IGNORE', $m[1]));
if($c == '')
return '�';
return $c;
}, $s);
I want to convert hello@domain.com to its numeric HTML entities:
&#104;&#101;&#108;&#108;&#111;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;
I have tried:
urlencode($string)
This returns the string I entered, with only the @ symbol converted to %40.
I also tried:
htmlentities($string)
This returns the same string unchanged.
I am using the UTF-8 charset; I'm not sure whether that makes a difference.
Here it goes (assumes UTF-8, but it's trivial to change):
function encode($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8*(3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
EDIT Recommended alternative using unpack:
function encode2($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8');
$t = unpack("N*", $str);
$t = array_map(function($n) { return "&#$n;"; }, $t);
return implode("", $t);
}
Much easier way to do this:
function convertToNumericEntities($string) {
$convmap = array(0x80, 0x10ffff, 0, 0xffffff);
return mb_encode_numericentity($string, $convmap, "UTF-8");
}
You can change the encoding if you are using anything different.
Fixed map range. Thanks to Artefacto.
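A note on the convmap: it is a list of [start, end, offset, mask] groups, and only code points between start and end get converted. With a start of 0x80, plain ASCII text passes through untouched. If you want every character encoded, a variant starting at 0x0 might look like this (a sketch with a made-up function name, assuming the mbstring extension is available):

```php
<?php
// Hypothetical variant: encode every character, including ASCII.
function all_to_numeric_entities(string $string): string
{
    // [start, end, offset, mask]; starting at 0x0 includes the ASCII range
    $convmap = [0x0, 0x10ffff, 0, 0xffffff];
    return mb_encode_numericentity($string, $convmap, 'UTF-8');
}

echo all_to_numeric_entities('a@b.c'), "\n"; // &#97;&#64;&#98;&#46;&#99;
```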
function uniord($char) {
    // UTF-32BE encodes each code point as one 4-byte big-endian integer
    $k = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
    $value = unpack('N', $k);
    return (string) $value[1];
}
The function above works for a single character. If you have a whole string, you can do this:
$string = "anytext";
$arr = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
$temp = "";
foreach ($arr as $v) {
    $temp .= "&#" . uniord($v) . ";"; // appends the equivalent HTML entity of each character
}
echo $temp;