Encode all characters to entities - php

I would like to convert all characters to character entities as spam protection for an email address. I need entities in this format:
&#121;&#111;&#117;...
just like on this website (done there with JavaScript):
character entities encoding
Is there a simple way to do that in PHP, ideally with built-in functions?

function encode2($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8');
$t = unpack("N*", $str);
$t = array_map(function($n) { return "&#$n;"; }, $t);
return implode("", $t);
}
or
function encode($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8*(3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
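On PHP 7.4 or later with the mbstring extension, the same result can probably be had without the UTF-32 round trip. This is only a sketch under that assumption; the name encode_entities is made up for illustration. It relies on mb_str_split() to split the string into characters and mb_ord() to get each code point.

```php
<?php
// Hypothetical helper (not from the answers above): encode every character
// of a UTF-8 string as a decimal numeric entity.
function encode_entities(string $str): string
{
    $out = '';
    // mb_str_split() yields one (possibly multi-byte) character per element
    foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
        // mb_ord() returns the Unicode code point of the character
        $out .= '&#' . mb_ord($char, 'UTF-8') . ';';
    }
    return $out;
}

echo encode_entities('you'), "\n"; // &#121;&#111;&#117;
```

A browser decodes the entities back to the plain address, so this only deters naive harvesters that do not decode HTML.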


convert emoji to their hex code

I'm trying to detect the emoji that I get through e.g. a POST (the source is not important).
As an example I'm using this emoji: ✊🏾 (I hope it's visible)
The code for it is U+270A U+1F3FE (I'm using http://unicode.org/emoji/charts/full-emoji-list.html for the codes)
Now I converted the emoji with json_encode and I get: \u270a\ud83c\udffe
Here the only part that is equal is 270a. \ud83c\udffe is not equal to U+1F3FE, not even if I add them together (1B83A)
How do I get from ✊🏾 to U+270A U+1F3FE with e.g. php?
Use mb_convert_encoding and convert from UTF-8 to UTF-32. Then do some additional formatting:
// Strips leading zeros
// And returns str in UPPERCASE letters with a U+ prefix
function format($str) {
$copy = false;
$len = strlen($str);
$res = '';
for ($i = 0; $i < $len; ++$i) {
$ch = $str[$i];
if (!$copy) {
if ($ch != '0') {
$copy = true;
}
// Prevent format("0") from returning ""
else if (($i + 1) == $len) {
$res = '0';
}
}
if ($copy) {
$res .= $ch;
}
}
return 'U+'.strtoupper($res);
}
function convert_emoji($emoji) {
// ✊🏾 --> 0000270a0001f3fe
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$hex = bin2hex($emoji);
// Split the UTF-32 hex representation into chunks
$hex_len = strlen($hex) / 8;
$chunks = array();
for ($i = 0; $i < $hex_len; ++$i) {
$tmp = substr($hex, $i * 8, 8);
// Format each chunk
$chunks[$i] = format($tmp);
}
// Convert chunks array back to a string
// (separator first: the reversed argument order was removed in PHP 8)
return implode(' ', $chunks);
}
echo convert_emoji('✊🏾'); // U+270A U+1F3FE
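As for why \ud83c\udffe does not look like U+1F3FE: json_encode emits UTF-16 code units, and code points above U+FFFF are written as a surrogate pair. The pair is not added together; it is combined with the standard UTF-16 formula. A small sketch (the function name is made up for illustration):

```php
<?php
// Hypothetical helper: combine a UTF-16 high/low surrogate pair into
// the code point it represents.
function surrogate_pair_to_codepoint(int $high, int $low): int
{
    // The high surrogate carries the top 10 bits, the low surrogate the
    // bottom 10 bits, and the result is offset by 0x10000.
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

printf("U+%X\n", surrogate_pair_to_codepoint(0xD83C, 0xDFFE)); // U+1F3FE
```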
Simple function, inspired by @d3L's answer above (note that it only handles a single code point):
function emoji_to_unicode($emoji) {
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$unicode = strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emoji)));
return $unicode;
}
Example
emoji_to_unicode("💵");//returns U+1F4B5
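Because the regex only strips the leading zeros of the whole hex string, a sequence such as ✊🏾 (two code points) would come out as one run of digits. A possible variant that handles several code points, sketched with unpack() on the UTF-32 bytes; the function name is hypothetical and 'UTF-32BE' is spelled out so that the byte order matches the "N" format:

```php
<?php
// Hypothetical variant: format every code point of a UTF-8 string as U+XXXX.
function emoji_to_codepoints(string $emoji): string
{
    // Each character becomes exactly 4 big-endian bytes in UTF-32BE
    $utf32 = mb_convert_encoding($emoji, 'UTF-32BE', 'UTF-8');
    // "N*" reads consecutive unsigned 32-bit big-endian integers
    $codePoints = unpack('N*', $utf32);
    $parts = [];
    foreach ($codePoints as $cp) {
        $parts[] = sprintf('U+%04X', $cp);
    }
    return implode(' ', $parts);
}

echo emoji_to_codepoints("\u{270A}\u{1F3FE}"), "\n"; // U+270A U+1F3FE
```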
You can do it like this, treating the emoji as a normal character:
$emoji = "✊🏾";
$str = str_replace('"', "", json_encode($emoji, JSON_HEX_APOS));
$myInput = $str;
$myHexString = str_replace('\\u', '', $myInput);
$myBinString = hex2bin($myHexString);
print iconv("UTF-16BE", "UTF-8", $myBinString);

Optimal function to create a random UTF-8 string in PHP? (letter characters only)

I wrote this function that creates a random string of UTF-8 characters. It works well, but the regular expression [^\p{L}] does not seem to filter out all non-letter characters. I can't think of a better way to generate the full range of Unicode without non-letter characters, short of manually looking up and defining the decimal letter ranges between 65 and 65533.
function rand_str($max_length, $min_length = 1, $utf8 = true) {
static $utf8_chars = array();
if ($utf8 && !$utf8_chars) {
for ($i = 1; $i <= 65533; $i++) {
$utf8_chars[] = mb_convert_encoding("&#$i;", 'UTF-8', 'HTML-ENTITIES');
}
$utf8_chars = preg_replace('/[^\p{L}]/u', '', $utf8_chars);
foreach ($utf8_chars as $i => $char) {
if (trim($utf8_chars[$i])) {
$chars[] = $char;
}
}
$utf8_chars = $chars;
}
$chars = $utf8 ? $utf8_chars : str_split('abcdefghijklmnopqrstuvwxyz');
$num_chars = count($chars);
$string = '';
$length = mt_rand($min_length, $max_length);
for ($i = 0; $i < $length; $i++) {
$string .= $chars[mt_rand(1, $num_chars) - 1];
}
return $string;
}
\p{L} might be matching too much. Try limiting it to \p{Ll} and \p{Lu}; \p{L} also includes \p{Lo} (other letters).
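A rough sketch of that idea, assuming the mbstring extension (mb_chr) is available. The function names are invented for illustration, and the pool is limited to the Basic Multilingual Plane like the original:

```php
<?php
// Hypothetical helper: build (once) the list of BMP characters that are
// lowercase or uppercase letters.
function letter_pool(): array
{
    static $pool = [];
    if (!$pool) {
        for ($cp = 0x41; $cp <= 0xFFFD; $cp++) {
            // Skip the surrogate range, which has no UTF-8 representation
            if ($cp >= 0xD800 && $cp <= 0xDFFF) {
                continue;
            }
            $ch = mb_chr($cp, 'UTF-8');
            // Keep only characters in the Ll or Lu general categories
            if ($ch !== false && preg_match('/^[\p{Ll}\p{Lu}]$/u', $ch) === 1) {
                $pool[] = $ch;
            }
        }
    }
    return $pool;
}

// Hypothetical helper: pick $length random letters from the pool.
function rand_letters(int $length): string
{
    $pool = letter_pool();
    $max = count($pool) - 1;
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $pool[mt_rand(0, $max)];
    }
    return $out;
}

echo rand_letters(12), "\n";
```

Testing each character individually avoids the empty strings that preg_replace leaves behind in the original array.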
With PHP7 and IntlChar there is now a better way:
function utf8_random_string(int $length) : string {
$r = "";
for ($i = 0; $i < $length; $i++) {
$codePoint = mt_rand(0x80, 0xffff);
$char = \IntlChar::chr($codePoint);
if ($char !== null && \IntlChar::isprint($char)) {
$r .= $char;
} else {
$i--;
}
}
return $r;
}

PHP Calling a function on matches in regexp

I need to scramble/encode all e-mail addresses in a string, turn them into links, and leave the rest of the string intact.
I'm using
$withlinks = preg_replace("/([\w-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i","<a href=\"mailto:$1\">$1</a>",$nolinks);
to make links out of e-mails and
function encode_email($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8*(3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
to encode the addresses but I can't figure out how to put them together, so that only e-mails would be encoded and turned into links.
You can use preg_replace_callback so that you can manipulate the replacement text to be exactly what you want...
<?php
// test string
$nolinks = "amy@winehous.com is an email for bobby@fisher.com plays chess";
// your original function
function encode_email($str)
{
$str = mb_convert_encoding($str, 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8 * (3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
// function used for callback
function encode_email_and_add_link($in)
{
// get encoded email address (don't actually know what this function does)
$encoded = encode_email($in[1]);
// return a hyperlink string built with encoded email address
return "<a href=\"mailto:$encoded\">$encoded</a>";
}
// do the regex with callback
$withlinks = preg_replace_callback("/([\w-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i", 'encode_email_and_add_link', $nolinks);
// output the results
echo $withlinks;
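The same idea can be written more compactly with an anonymous function as the callback. This is only a sketch: the function name is made up, and the e-mail pattern is a deliberately simplified assumption, not the regex from the question.

```php
<?php
// Hypothetical helper: entity-encode every e-mail address in $text and
// wrap it in a mailto link, leaving the rest of the text untouched.
function obfuscate_emails(string $text): string
{
    // Simplified pattern (an assumption): local part, "@", dotted host name
    $pattern = '/[\w.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole matched address
        $codes = unpack('N*', mb_convert_encoding($m[0], 'UTF-32BE', 'UTF-8'));
        $encoded = '';
        foreach ($codes as $n) {
            $encoded .= "&#$n;";
        }
        return "<a href=\"mailto:$encoded\">$encoded</a>";
    }, $text);
}

echo obfuscate_emails('amy@winehous.com is an email'), "\n";
```

Only the matched addresses pass through the callback, so " is an email" is emitted unchanged.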

Reverse letters in each word of a string without using native splitting or reversing functions [duplicate]

This task has already been asked/answered, but I recently had a job interview that imposed some additional challenges to demonstrate my ability to manipulate strings.
Problem: How to reverse words in a string? You can use strpos(), strlen() and substr(), but not other very useful functions such as explode(), strrev(), etc.
Example:
$string = "I am a boy";
Answer:
I ma a yob
Below is my working coding attempt that took me 2 days [sigh], but there must be a more elegant and concise solution.
Intention:
1. get number of words
2. based on word count, grab each word and store into array
3. loop through array and output each word in reverse order
Code:
$str = "I am a boy";
echo reverse_word($str) . "\n";
function reverse_word($input) {
//first find how many words in the string based on whitespace
$num_ws = 0;
$p = 0;
while(strpos($input, " ", $p) !== false) {
$num_ws ++;
$p = strpos($input, ' ', $p) + 1;
}
echo "num ws is $num_ws\n";
//now start grabbing word and store into array
$p = 0;
for($i=0; $i<$num_ws + 1; $i++) {
$ws_index = strpos($input, " ", $p);
//if no more ws, grab the rest
if($ws_index === false) {
$word = substr($input, $p);
}
else {
$length = $ws_index - $p;
$word = substr($input, $p, $length);
}
$result[] = $word;
$p = $ws_index + 1; //move onto first char of next word
}
print_r($result);
//append reversed words
$str = '';
for($i=0; $i<count($result); $i++) {
$str .= reverse($result[$i]) . " ";
}
return $str;
}
function reverse($str) {
$a = 0;
$b = strlen($str)-1;
while($a < $b) {
swap($str, $a, $b);
$a ++;
$b --;
}
return $str;
}
function swap(&$str, $i1, $i2) {
$tmp = $str[$i1];
$str[$i1] = $str[$i2];
$str[$i2] = $tmp;
}
$string = "I am a boy";
$reversed = "";
$tmp = "";
for($i = 0; $i < strlen($string); $i++) {
if($string[$i] == " ") {
$reversed .= $tmp . " ";
$tmp = "";
continue;
}
$tmp = $string[$i] . $tmp;
}
$reversed .= $tmp;
print $reversed . PHP_EOL;
>> I ma a yob
Whoops! Mis-read the question. Here you go (Note that this will split on all non-letter boundaries, not just space. If you want a character not to be split upon, just add it to $wordChars):
function revWords($string) {
    // We need to find word boundaries
    $wordChars = 'abcdefghijklmnopqrstuvwxyz';
    $buffer = '';
    $return = '';
    $len = strlen($string);
    $i = 0;
    while ($i < $len) {
        $chr = $string[$i];
        // The bit tests need the byte value, not the one-character string
        $byte = ord($chr);
        if (($byte & 0xC0) == 0xC0) {
            // UTF-8 character (lead byte of a multi-byte sequence)
            if (($byte & 0xF0) == 0xF0) {
                // 4-byte sequence
                $chr .= substr($string, $i + 1, 3);
                $i += 3;
            } elseif (($byte & 0xE0) == 0xE0) {
                // 3-byte sequence
                $chr .= substr($string, $i + 1, 2);
                $i += 2;
            } else {
                // 2-byte sequence
                $i++;
                $chr .= $string[$i];
            }
        }
        if (stripos($wordChars, $chr) !== false) {
            // Word character: prepend, so the buffer holds the word reversed
            $buffer = $chr . $buffer;
        } else {
            // Boundary: flush the reversed word, then the boundary character
            $return .= $buffer . $chr;
            $buffer = '';
        }
        $i++;
    }
    return $return . $buffer;
}
Edit: Now it's a single function, and stores the buffer naively in reversed notation.
Edit2: Now handles UTF8 characters (just add "word" characters to the $wordChars string)...
My answer is to count the string length, split the letters into an array, and then loop over it backwards. This is also a good way to check whether a word is a palindrome. It only works for plain single-byte strings and numbers.
preg_split can be replaced with str_split() as well.
/**
 * Code snippet to reverse a string (LM)
 */
$words = array('one', 'only', 'apple', 'jobs');
foreach ($words as $d) {
    $strlen = strlen($d);
    $splits = preg_split('//', $d, -1, PREG_SPLIT_NO_EMPTY);
    // Reset the result for every word
    $reverse = '';
    // Start at the last valid index ($strlen - 1) and walk back to 0
    for ($i = $strlen - 1; $i >= 0; $i--) {
        $reverse .= $splits[$i];
    }
    echo "Regular: {$d}".PHP_EOL;
    echo "Reverse: {$reverse}".PHP_EOL;
    echo "-----".PHP_EOL;
}
Without using any function.
$string = 'I am a boy';
$newString = '';
$temp = '';
$i = 0;
while (isset($string[$i]))
{
if($string[$i] == ' ') {
$newString .= $temp . ' ';
$temp = '';
}
else {
$temp = $string[$i] . $temp;
}
$i++;
}
$newString .= $temp . ' ';
echo $newString;
Output: I ma a yob

How to convert all characters to their html entity equivalent using PHP

I want to convert hello@domain.com to
&#104;&#101;&#108;&#108;&#111;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;
I have tried:
urlencode($string)
This returns the same string I entered, with the @ symbol converted to %40.
I also tried:
htmlentities($string)
This returns the same string unchanged.
I am using the UTF-8 charset; I'm not sure whether this makes a difference.
Here it goes (assumes UTF-8, but it's trivial to change):
function encode($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8*(3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
EDIT Recommended alternative using unpack:
function encode2($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8');
$t = unpack("N*", $str);
$t = array_map(function($n) { return "&#$n;"; }, $t);
return implode("", $t);
}
Much easier way to do this:
function convertToNumericEntities($string) {
$convmap = array(0x80, 0x10ffff, 0, 0xffffff);
return mb_encode_numericentity($string, $convmap, "UTF-8");
}
You can change the encoding if you are using anything different.
Fixed map range. Thanks to Artefacto.
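Note that a map starting at 0x80 leaves plain ASCII (which includes every character of a typical e-mail address) untouched. If the goal is to encode all characters, as the question asks, the range can start at 0x0 instead. A sketch under that assumption, with a made-up function name:

```php
<?php
// Hypothetical variant: encode every character, ASCII included.
function encode_all_entities(string $string): string
{
    // convmap = [first code point, last code point, offset, mask]
    // Starting at 0x0 (an assumption, not the map from the answer above)
    // makes ASCII characters eligible for conversion as well.
    $convmap = [0x0, 0x10FFFF, 0, 0xFFFFFF];
    return mb_encode_numericentity($string, $convmap, 'UTF-8');
}

echo encode_all_entities('a@b'), "\n"; // &#97;&#64;&#98;
```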
function uniord($char) {
    // mb_convert_encoding() produces big-endian UTF-32: 4 bytes per character
    $k = mb_convert_encoding($char, 'UTF-32', 'UTF-8');
    // Read those 4 bytes as one unsigned 32-bit big-endian integer
    $parts = unpack('N', $k);
    return (string) $parts[1];
}
The above function works for one character. If you have a string, you can do it like this:
$string = "anytext";
$arr = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
$temp = "";
foreach ($arr as $v) {
    // append the equivalent HTML entity of each character
    $temp .= "&#" . uniord($v) . ";";
}
