PHP Calling a function on matches in regexp


I need to scramble/encode all e-mail addresses in a string, turn them into links, and leave the rest of the string intact.
I'm using
$withlinks = preg_replace("/([\w\-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i","<a href=\"mailto:$1\">$1</a>",$nolinks);
to make links out of e-mails and
function encode_email($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8*(3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
to encode the addresses but I can't figure out how to put them together, so that only e-mails would be encoded and turned into links.

You can use preg_replace_callback so that you can manipulate the replacement text to be exactly what you want...
<?php
// test string
$nolinks = "amy@winehous.com is an email for bobby@fisher.com plays chess";
// your original function
function encode_email($str)
{
$str = mb_convert_encoding($str, 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8 * (3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
// function used for callback
function encode_email_and_add_link($in)
{
// get encoded email address (don't actually know what this function does)
$encoded = encode_email($in[1]);
// return a hyperlink string built with encoded email address
return "<a href=\"mailto:$encoded\">$encoded</a>";
}
// do the regex with callback
$withlinks = preg_replace_callback("/([\w\-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i", 'encode_email_and_add_link', $nolinks);
// output the results
echo $withlinks;
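
A more compact take on the same idea, offered as a sketch rather than a drop-in replacement: an anonymous function is passed straight to preg_replace_callback, and the entity encoding uses unpack. The names to_numeric_entities and link_emails are made up for this illustration, and the pattern is a deliberately simplified e-mail matcher, not the one from the question.

```php
<?php
// Hypothetical helper: turn every character into a decimal numeric entity.
function to_numeric_entities(string $text): string
{
    // UTF-32BE gives exactly 4 big-endian bytes per code point.
    $codepoints = unpack('N*', mb_convert_encoding($text, 'UTF-32BE', 'UTF-8'));
    return implode('', array_map(function ($n) {
        return "&#$n;";
    }, $codepoints));
}

// Hypothetical wrapper: encode each matched address and wrap it in a mailto link.
function link_emails(string $text): string
{
    // Simplified pattern for illustration only; it is not a full e-mail validator.
    $pattern = '/[\w.+\-]+@[A-Za-z0-9\-]+(?:\.[A-Za-z0-9\-]+)+/';
    return preg_replace_callback($pattern, function (array $m) {
        $encoded = to_numeric_entities($m[0]);
        return '<a href="mailto:' . $encoded . '">' . $encoded . '</a>';
    }, $text);
}

echo link_emails('write to a@b.co today'), PHP_EOL;
```

Text outside the matches is passed through untouched, because preg_replace_callback only hands the matched substrings to the callback.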

Related

Encode all characters to entities

I would like to convert all characters to character entities as spam protection for e-mail addresses. I need entities in this format:
&#121;&#111;&#117; ...
just like on this website (done there with JS):
character entities encoding
Is there a simple way to do that in PHP, for example with built-in functions?

function encode2($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8');
$t = unpack("N*", $str);
$t = array_map(function($n) { return "&#$n;"; }, $t);
return implode("", $t);
}
or
function encode($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8*(3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
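
One way to check either function is to decode the result again with html_entity_decode, which understands decimal numeric references. A small sketch; entities_for is a hypothetical stand-in that is meant to behave like encode2 above:

```php
<?php
// Hypothetical stand-in that is meant to behave like encode2().
function entities_for(string $str): string
{
    // 4 big-endian bytes per code point, read back as unsigned 32-bit integers.
    $codepoints = unpack('N*', mb_convert_encoding($str, 'UTF-32BE', 'UTF-8'));
    $out = '';
    foreach ($codepoints as $n) {
        $out .= "&#$n;";
    }
    return $out;
}

$encoded = entities_for('you');
echo $encoded, PHP_EOL; // &#121;&#111;&#117;

// Decoding the entities should give back the original text.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decoded === 'you'); // bool(true)
```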

convert emoji to their hex code

I'm trying to detect the emoji that I get through e.g. a POST (the source is not important).
As an example I'm using this emoji: ✊🏾 (I hope it's visible)
The code for it is U+270A U+1F3FE (I'm using http://unicode.org/emoji/charts/full-emoji-list.html for the codes)
Now I converted the emoji with json_encode and I get: \u270a\ud83c\udffe
Here the only part that is equal is 270a. \ud83c\udffe is not equal to U+1F3FE, not even if I add them together (1B83A)
How do I get from ✊🏾 to U+270A U+1F3FE with e.g. php?

Use mb_convert_encoding and convert from UTF-8 to UTF-32. Then do some additional formatting:
// Strips leading zeros
// And returns str in UPPERCASE letters with a U+ prefix
function format($str) {
$copy = false;
$len = strlen($str);
$res = '';
for ($i = 0; $i < $len; ++$i) {
$ch = $str[$i];
if (!$copy) {
if ($ch != '0') {
$copy = true;
}
// Prevent format("0") from returning ""
else if (($i + 1) == $len) {
$res = '0';
}
}
if ($copy) {
$res .= $ch;
}
}
return 'U+'.strtoupper($res);
}
function convert_emoji($emoji) {
// ✊🏾 --> 0000270a0001f3fe
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$hex = bin2hex($emoji);
// Split the UTF-32 hex representation into chunks
$hex_len = strlen($hex) / 8;
$chunks = array();
for ($i = 0; $i < $hex_len; ++$i) {
$tmp = substr($hex, $i * 8, 8);
// Format each chunk
$chunks[$i] = format($tmp);
}
// Convert chunks array back to a string
// Separator first: the (array, separator) argument order is no longer accepted in PHP 8
return implode(' ', $chunks);
}
echo convert_emoji('✊🏾'); // U+270A U+1F3FE
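
On newer PHP versions the same result can be reached with less code. This is only a sketch and assumes PHP 7.4 or later with mbstring (mb_str_split needs 7.4, mb_ord needs 7.2); codepoints_of is a hypothetical name:

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string as "U+XXXX".
function codepoints_of(string $text): string
{
    $labels = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // mb_ord() returns the code point as an integer;
        // %04X prints it in uppercase hex, padded to at least 4 digits.
        $labels[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return implode(' ', $labels);
}

echo codepoints_of('✊🏾'), PHP_EOL; // U+270A U+1F3FE
```

Unlike format() above, this pads short values to four hex digits (U+0041 rather than U+41), which is the usual way code points are written.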

Simple function, inspired by @d3L's answer above
function emoji_to_unicode($emoji) {
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$unicode = strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emoji)));
return $unicode;
}
Example (note that this only works for a single code point):
emoji_to_unicode("💵");//returns U+1F4B5

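You can do it like this, treating the emoji as a normal character. This round-trips through the \u escapes produced by json_encode and prints the original emoji again: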
$emoji = "✊🏾";
$str = str_replace('"', "", json_encode($emoji, JSON_HEX_APOS));
$myInput = $str;
$myHexString = str_replace('\\u', '', $myInput);
$myBinString = hex2bin($myHexString);
print iconv("UTF-16BE", "UTF-8", $myBinString);
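
As for why \ud83c\udffe does not look like U+1F3FE: json_encode emits UTF-16 code units, and code points above U+FFFF are written as a surrogate pair. The pair is combined with a fixed formula, not by adding the two values. A minimal sketch of that arithmetic (the function name is hypothetical):

```php
<?php
// Combine a UTF-16 surrogate pair into one Unicode code point.
function surrogates_to_codepoint(int $high, int $low): int
{
    // The high surrogate carries the top 10 bits, the low surrogate the bottom 10 bits.
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

printf("U+%X\n", surrogates_to_codepoint(0xD83C, 0xDFFE)); // U+1F3FE
```

Worked through by hand: 0xD83C - 0xD800 = 0x3C, shifted left by 10 bits gives 0xF000; 0xDFFE - 0xDC00 = 0x3FE; and 0x10000 + 0xF000 + 0x3FE = 0x1F3FE.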

PHP functions question

I'm fairly new to PHP functions and I really don't know what the functions below do. Can someone give an explanation or a working example? Thanks.
PHP functions.
function mbStringToArray ($str) {
if (empty($str)) return false;
$len = mb_strlen($str);
$array = array();
for ($i = 0; $i < $len; $i++) {
$array[] = mb_substr($str, $i, 1);
}
return $array;
}
function mb_chunk_split($str, $len, $glue) {
if (empty($str)) return false;
$array = mbStringToArray ($str);
$n = 0;
$new = '';
foreach ($array as $char) {
if ($n < $len) $new .= $char;
elseif ($n == $len) {
$new .= $glue . $char;
$n = 0;
}
$n++;
}
return $new;
}

The first function takes a multibyte string and converts it into an array of characters, returning the array.
The second function takes a multibyte string and inserts the $glue string every $len characters.
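
To make that concrete, here is a short sketch that produces the same kind of result with built-in functions. It does not call the two functions from the question; the sample string, chunk length and glue are arbitrary choices for illustration:

```php
<?php
// Split a UTF-8 string into characters (the job of mbStringToArray()).
$chars = preg_split('//u', 'héllowörld', -1, PREG_SPLIT_NO_EMPTY);
// $chars is ['h', 'é', 'l', 'l', 'o', 'w', 'ö', 'r', 'l', 'd']

// Group the characters three at a time and join the groups with the glue,
// which is the effect mb_chunk_split($str, 3, '-') aims for.
$groups = array_map(function (array $group) {
    return implode('', $group);
}, array_chunk($chars, 3));

echo implode('-', $groups), PHP_EOL; // hél-low-örl-d
```

The multibyte handling is the point: a byte-based split would cut é and ö in half, since each takes two bytes in UTF-8.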

function mbStringToArray ($str) { // $str is a function argument
if (empty($str)) return false; // empty() is true for "", "0", null, false, etc.; bail out early in that case
$len = mb_strlen($str); // returns the length of a multibyte string (ie UTF-8)
$array = array(); // init of an array
for ($i = 0; $i < $len; $i++) { // self explanatory
$array[] = mb_substr($str, $i, 1); // mb_substr() extracts one character from $str at position $i on each pass
}
return $array; // returns the result as an array
}
That should help you to understand the second function.

Reverse letters in each word of a string without using native splitting or reversing functions [duplicate]

This task has already been asked/answered, but I recently had a job interview that imposed some additional challenges to demonstrate my ability to manipulate strings.
Problem: How to reverse words in a string? You can use strpos(), strlen() and substr(), but not other very useful functions such as explode(), strrev(), etc.
Example:
$string = "I am a boy"
Answer:
I ma a yob
Below is my working coding attempt that took me 2 days [sigh], but there must be a more elegant and concise solution.
Intention:
1. get number of words
2. based on word count, grab each word and store into array
3. loop through array and output each word in reverse order
Code:
$str = "I am a boy";
echo reverse_word($str) . "\n";
function reverse_word($input) {
//first find how many words in the string based on whitespace
$num_ws = 0;
$p = 0;
while(strpos($input, " ", $p) !== false) {
$num_ws ++;
$p = strpos($input, ' ', $p) + 1;
}
echo "num ws is $num_ws\n";
//now start grabbing word and store into array
$p = 0;
for($i=0; $i<$num_ws + 1; $i++) {
$ws_index = strpos($input, " ", $p);
//if no more ws, grab the rest
if($ws_index === false) {
$word = substr($input, $p);
}
else {
$length = $ws_index - $p;
$word = substr($input, $p, $length);
}
$result[] = $word;
$p = $ws_index + 1; //move onto first char of next word
}
print_r($result);
//append reversed words
$str = '';
for($i=0; $i<count($result); $i++) {
$str .= reverse($result[$i]) . " ";
}
return $str;
}
function reverse($str) {
$a = 0;
$b = strlen($str)-1;
while($a < $b) {
swap($str, $a, $b);
$a ++;
$b --;
}
return $str;
}
function swap(&$str, $i1, $i2) {
$tmp = $str[$i1];
$str[$i1] = $str[$i2];
$str[$i2] = $tmp;
}

$string = "I am a boy";
$reversed = "";
$tmp = "";
for($i = 0; $i < strlen($string); $i++) {
if($string[$i] == " ") {
$reversed .= $tmp . " ";
$tmp = "";
continue;
}
$tmp = $string[$i] . $tmp;
}
$reversed .= $tmp;
print $reversed . PHP_EOL;
>> I ma a yob

Whoops! Mis-read the question. Here you go (Note that this will split on all non-letter boundaries, not just space. If you want a character not to be split upon, just add it to $wordChars):
function revWords($string) {
//We need to find word boundaries
$wordChars = 'abcdefghijklmnopqrstuvwxyz';
$buffer = '';
$return = '';
$len = strlen($string);
$i = 0;
while ($i < $len) {
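$chr = $string[$i];
// Work on the byte value: bitwise & on a string and an int does not test the byte
$byte = ord($chr);
if (($byte & 0xC0) == 0xC0) {
//UTF-8 lead byte of a multi-byte character
if (($byte & 0xF0) == 0xF0) {
//4 Byte Sequence
$chr .= substr($string, $i + 1, 3);
$i += 3;
} elseif (($byte & 0xE0) == 0xE0) {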
//3 Byte Sequence
$chr .= substr($string, $i + 1, 2);
$i += 2;
} else {
//2 Byte Sequence
$i++;
$chr .= $string[$i];
}
}
if (stripos($wordChars, $chr) !== false) {
$buffer = $chr . $buffer;
} else {
$return .= $buffer . $chr;
$buffer = '';
}
$i++;
}
return $return . $buffer;
}
Edit: Now it's a single function, and stores the buffer naively in reversed notation.
Edit2: Now handles UTF8 characters (just add "word" characters to the $wordChars string)...

My answer is to count the string length, split the letters into an array, and then loop through it backwards. This is also a good way to check whether a word is a palindrome. It only works for single-byte strings (plain letters and digits).
str_split() can be used instead of preg_split() as well.
/**
* Code snippet to reverse a string (LM)
*/
$words = array('one', 'only', 'apple', 'jobs');
foreach ($words as $d) {
$strlen = strlen($d);
$splits = preg_split('//', $d, -1, PREG_SPLIT_NO_EMPTY);
$reverse = '';
// Walk from the last index down to 0
for ($i = $strlen - 1; $i >= 0; $i--) {
$reverse .= $splits[$i];
}
echo "Regular: {$d}".PHP_EOL;
echo "Reverse: {$reverse}".PHP_EOL;
echo "-----".PHP_EOL;
unset($reverse);
}

Without using any function.
$string = 'I am a boy';
$newString = '';
$temp = '';
$i = 0;
while(isset($string[$i]))
{
if($string[$i] == ' ') {
$newString .= $temp . ' ';
$temp = '';
}
else {
$temp = $string[$i] . $temp;
}
$i++;
}
$newString .= $temp . ' ';
echo $newString;
Output: I ma a yob

How to convert all characters to their html entity equivalent using PHP

I want to convert this hello@domain.com to
&#104;&#101;&#108;&#108;&#111;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;
I have tried:
urlencode($string)
this provides the same string I entered, returned with the @ symbol converted to %40
also tried:
htmlentities($string)
this provides the same string right back.
I am using a UTF-8 charset; not sure if this makes a difference.

Here it goes (assumes UTF-8, but it's trivial to change):
function encode($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8'); //big endian
$split = str_split($str, 4);
$res = "";
foreach ($split as $c) {
$cur = 0;
for ($i = 0; $i < 4; $i++) {
$cur |= ord($c[$i]) << (8*(3 - $i));
}
$res .= "&#" . $cur . ";";
}
return $res;
}
EDIT Recommended alternative using unpack:
function encode2($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8');
$t = unpack("N*", $str);
$t = array_map(function($n) { return "&#$n;"; }, $t);
return implode("", $t);
}

Much easier way to do this:
function convertToNumericEntities($string) {
$convmap = array(0x80, 0x10ffff, 0, 0xffffff);
return mb_encode_numericentity($string, $convmap, "UTF-8");
}
You can change the encoding if you are using anything different.
Fixed map range. Thanks to Artefacto.
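
Note that the map above starts at 0x80, so plain ASCII characters such as those in hello@domain.com are left as they are. A sketch of a variant that encodes every character, assuming the mbstring extension is available; the function name is hypothetical:

```php
<?php
// Hypothetical variant: encode every character, including ASCII.
function convert_all_to_numeric_entities(string $string): string
{
    // Each map entry is [first code point, last code point, offset, mask].
    $convmap = [0x0, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($string, $convmap, 'UTF-8');
}

echo convert_all_to_numeric_entities('a@b'), PHP_EOL; // &#97;&#64;&#98;
```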

function uniord($char) {
// UTF-32BE stores each code point as 4 big-endian bytes
$k = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
// unpack('N', ...) reads one unsigned 32-bit big-endian integer
$parts = unpack('N', $k);
return (string) $parts[1];
}
The above function works for one character; for a whole string you can do this:
$string = "anytext";
$arr = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
$temp = "";
foreach ($arr as $v) {
$temp .= "&#" . uniord($v) . ";"; // append the numeric HTML entity for each character
}
