PHP substr breaks emoji - php

I need to get only the first 30 characters of a paragraph submitted by a user. If the 30th character is an emoji, the output shows question marks. How can I avoid breaking the emoji?
echo substr("Hello world Hello world Hell😄 ", 0, 30);
Output: Hello world Hello world Hell��
Also, when using json_encode to return the output, the output is blank.
$myvariable = array();
$myvariable['hello'] = substr("Hello world Hello world Hell😄 ", 0, 30);
echo json_encode($myvariable);

I think the simplest solution would be to use mb_substr
Performs a multi-byte safe substr() operation based on number of
characters.
php > $myvariable = array();
php > $myvariable['hello'] = mb_substr("Hello world Hello world Hell😄 ", 0, 30);
php > var_dump($myvariable);
array(1) {
["hello"]=>
string(33) "Hello world Hello world Hell😄 "
}
php > echo json_encode($myvariable);
{"hello":"Hello world Hello world Hell\ud83d\ude04 "}
php >
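To make the difference concrete, here is a minimal sketch comparing the byte-based and character-based cuts on the string from the question. The helper name truncate_chars is made up for illustration, and the sketch assumes the input is UTF-8 and the mbstring extension is loaded.

```php
<?php
// Hypothetical helper name; mb_substr() itself is the standard mbstring function.
function truncate_chars(string $text, int $limit): string
{
    // Count characters, not bytes, and state the encoding explicitly.
    return mb_substr($text, 0, $limit, 'UTF-8');
}

// Same text as in the question; \u{1F604} is the emoji.
$input = "Hello world Hello world Hell\u{1F604} ";

$byBytes = substr($input, 0, 30);      // cuts the 4-byte emoji after 2 bytes
$byChars = truncate_chars($input, 30); // keeps the emoji whole

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($byChars, 'UTF-8')); // bool(true)

// json_encode() returns false for invalid UTF-8, hence the blank output.
var_dump(json_encode(['hello' => $byBytes]));   // bool(false)

// JSON_UNESCAPED_UNICODE is optional; it emits the emoji itself instead of \ud83d\ude04.
echo json_encode(['hello' => $byChars], JSON_UNESCAPED_UNICODE), "\n";
```

The string has exactly 30 characters (33 bytes), so the character-based cut returns it unchanged.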

<meta charset="ISO-8859-1">
OR
function entities( $string ) {
$stringBuilder = "";
$offset = 0;
if ( empty( $string ) ) {
return "";
}
while ( $offset >= 0 ) {
$decValue = ordutf8( $string, $offset );
$char = unichr($decValue);
$htmlEntited = htmlentities( $char );
if( $char != $htmlEntited ){
$stringBuilder .= $htmlEntited;
} elseif( $decValue >= 128 ){
$stringBuilder .= "&#" . $decValue . ";";
} else {
$stringBuilder .= $char;
}
}
return $stringBuilder;
}
// source - http://php.net/manual/en/function.ord.php#109812
function ordutf8($string, &$offset) {
$code = ord(substr($string, $offset,1));
if ($code >= 128) { //otherwise 0xxxxxxx
if ($code < 224) $bytesnumber = 2; //110xxxxx
else if ($code < 240) $bytesnumber = 3; //1110xxxx
else if ($code < 248) $bytesnumber = 4; //11110xxx
$codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
for ($i = 2; $i <= $bytesnumber; $i++) {
$offset ++;
$code2 = ord(substr($string, $offset, 1)) - 128; //10xxxxxx
$codetemp = $codetemp*64 + $code2;
}
$code = $codetemp;
}
$offset += 1;
if ($offset >= strlen($string)) $offset = -1;
return $code;
}
// source - http://php.net/manual/en/function.chr.php#88611
function unichr($u) {
return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}
/* ---- */
var_dump( entities( "&" ) ) . "\n";
var_dump( entities( "<" ) ) . "\n";
var_dump( entities( "😎" ) ) . "\n";
var_dump( entities( "☚" ) ) . "\n";
var_dump( entities( "" ) ) . "\n";
var_dump( entities( "A" ) ) . "\n";
var_dump( entities( "Hello 😎 world" ) ) . "\n";
var_dump( entities( "this & that 😎" ) ) . "\n";

$first = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
$char = current($m);
$utf = iconv('UTF-8', 'UCS-4', $char);
return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $string);
Output
string 'Fran&#xE7;ais' (length=13)
OR
echo json_decode('"\uD83D\uDE00"');

Related

how to change the color of decimals points in php

how can I change the colors of ONLY decimals of a number in PHP?
this is my function for formatting numbers
function formatNumber($input, $decimals = 'auto', $prefix = '', $suffix = '') {
$input = floatval($input);
$absInput = abs($input);
if ($decimals === 'auto') {
if ($absInput >= 0.01) {
$decimals = 2;
} elseif (0.0001 <= $absInput && $absInput < 0.01) {
$decimals = 4;
} elseif (0.000001 <= $absInput && $absInput < 0.0001) {
$decimals = 6;
} elseif ($absInput < 0.000001) {
$decimals = 8;
}
}
if($input>1000000000000000){
$result = ROUND(($input/1000000000000000),2).' TH ';
}elseif($input>1000000000000){
$result = ROUND(($input/1000000000000),2).' T ';
}elseif($input>1000000000){
$result = ROUND(($input/1000000000),2).' B ';
}elseif($input>1000000) {
$result = ROUND(($input / 1000000), 2) . ' M ';
} else {
$result = number_format($input, $decimals, config('decimal-separator','.'), config('thousand-separator', ',')) ;
}
return ($prefix ? $prefix : '') . $result. ($suffix ? $suffix : '');
}
and I use it like that
<?php echo formatNumber($chart['assist'], 2)?>
i want my decimals with a different color... can i use css there or add classes?
Here is an example of what I meant in my comment by manipulate the string:
<?php
$n = 123.456;
$whole = floor($n); // 123
$fraction = $n - $whole; // .456
//echo str_replace('.', '<span class="colorme">.</span>', $n);
echo $whole . '<span class="colorme">.</span>' . substr($fraction, strpos($fraction, '.')+1);
//Simply do a string replace on the decimal point.
UPDATED break out parts, concatenate.
A client side approach with Javascript (with some jQuery) would be something like:
$('#myDiv').each(function () {
$(this).html($(this).html().replace(/\./g, '<span class="colorme">.</span>'));
//or decimal point and decimal number part...
$(this).html($(this).html().replace(/\.([0-9]+)/g, '<span class="colorme">.$1</span>'));
});
Remember that other locales don't always use . as the decimal separator.
So with your existing code, you could do something like:
$dec_point = config('decimal-separator','.');
$wrapped_dec_point = "<span class='dec_point'>{$dec_point}</span>";
$result = number_format($input, $decimals, $wrapped_dec_point, config('thousand-separator', ',')) ;
and then of course, for your CSS, you would just need
.dec_point {
color: magenta;
}
Here is shorter solution
$n = 123.456;
$nums = explode(".",$n);
echo $nums[0] . '<span class="colorme">.' . $nums[1] . '</span>';
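If the goal is to colour the whole decimal part rather than just the separator, the ideas above can be combined roughly like this. This is only a sketch: the function name and the decimals class are made up, and it assumes the decimal separator is a non-empty string that differs from the thousands separator.

```php
<?php
// Hypothetical helper: format a number, then wrap the separator plus the
// digits after it in a span that CSS can target.
function format_with_colored_decimals(
    float $value,
    int $decimals = 2,
    string $decPoint = '.',
    string $thousandsSep = ','
): string {
    $formatted = number_format($value, $decimals, $decPoint, $thousandsSep);
    if ($decimals <= 0) {
        return $formatted; // nothing after the separator to wrap
    }
    // The decimal separator is the last occurrence in the formatted string.
    $pos = strrpos($formatted, $decPoint);
    $whole = substr($formatted, 0, $pos);
    $fraction = substr($formatted, $pos); // includes the separator itself
    return $whole . '<span class="decimals">' . $fraction . '</span>';
}

echo format_with_colored_decimals(1234.5), "\n";
// 1,234<span class="decimals">.50</span>
echo format_with_colored_decimals(1234.5, 2, ',', '.'), "\n";
// 1.234<span class="decimals">,50</span>
```

The CSS is then the same idea as above, e.g. .decimals { color: magenta; }.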

htmlentities not working for emoji

I am trying to show a character's HTML entity:
echo htmlentities(htmlentities("&"));
//outputs &amp;
echo htmlentities(htmlentities("<"));
//outputs &lt;
but it does not seem to work with emoji
echo htmlentities(htmlentities("😎"));
//outputs 😎
How can I get it to output &#128526;?
Edit:
I am trying to display a string input by the user with all of the HTML entities encoded.
echo htmlentities(htmlentities($input));
Example:
"this & that 😎" -> "this &amp; that &#128526;"
This works for regular HTML entities, UTF-8 emoticons (and other utf stuff) as well as regular strings of course.
I was just having trouble with empty string value, so I had to put this condition into the function.
function entities( $string ) {
$stringBuilder = "";
$offset = 0;
if ( empty( $string ) ) {
return "";
}
while ( $offset >= 0 ) {
$decValue = ordutf8( $string, $offset );
$char = unichr($decValue);
$htmlEntited = htmlentities( $char );
if( $char != $htmlEntited ){
$stringBuilder .= $htmlEntited;
} elseif( $decValue >= 128 ){
$stringBuilder .= "&#" . $decValue . ";";
} else {
$stringBuilder .= $char;
}
}
return $stringBuilder;
}
// source - http://php.net/manual/en/function.ord.php#109812
function ordutf8($string, &$offset) {
$code = ord(substr($string, $offset,1));
if ($code >= 128) { //otherwise 0xxxxxxx
if ($code < 224) $bytesnumber = 2; //110xxxxx
else if ($code < 240) $bytesnumber = 3; //1110xxxx
else if ($code < 248) $bytesnumber = 4; //11110xxx
$codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
for ($i = 2; $i <= $bytesnumber; $i++) {
$offset ++;
$code2 = ord(substr($string, $offset, 1)) - 128; //10xxxxxx
$codetemp = $codetemp*64 + $code2;
}
$code = $codetemp;
}
$offset += 1;
if ($offset >= strlen($string)) $offset = -1;
return $code;
}
// source - http://php.net/manual/en/function.chr.php#88611
function unichr($u) {
return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}
/* ---- */
var_dump( entities( "&" ) ) . "\n";
var_dump( entities( "<" ) ) . "\n";
var_dump( entities( "😎" ) ) . "\n";
var_dump( entities( "☚" ) ) . "\n";
var_dump( entities( "" ) ) . "\n";
var_dump( entities( "A" ) ) . "\n";
var_dump( entities( "Hello 😎 world" ) ) . "\n";
var_dump( entities( "this & that 😎" ) ) . "\n";
$emoji = "\xF0\x9F\x98\x8E"; // its your emoji
I get this callback from convert unicode to html entities hex
$hex = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
$char = current($m);
$utf = iconv('UTF-8', 'UCS-4', $char);
return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $emoji);
echo $hex;
echo json_encode(("\xF0\x9F\x98\x8E")); // its decoded. htmlentities doesn't work with it.
Is this OK ?
htmlentities documentation states that
all characters which have HTML character entity equivalents are
translated into these entities.
Your emoji does not have an equivalent like &lt; is for <, so it doesn't get converted. &#128526; is just an HTML code, not an HTML entity.
function htmlEntitiesOrCode($string) {
//try htmlentities first
$result = htmlentities($string, ENT_COMPAT, "UTF-8");
//if the output is different from input, an entity was returned
if ($result != $string) {
return $result;
}
//get the html code
$offset = 0;
$code = ord(substr($string, $offset,1));
if ($code >= 128) {
if ($code < 224) {
$bytesnumber = 2;
} else if ($code < 240) {
$bytesnumber = 3;
} else if ($code < 248) {
$bytesnumber = 4;
}
$codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
for ($i = 2; $i <= $bytesnumber; $i++) {
$offset ++;
$code2 = ord(substr($string, $offset, 1)) - 128;
$codetemp = $codetemp*64 + $code2;
}
$code = $codetemp;
}
$offset += 1;
if ($offset >= strlen($string)) {
$offset = -1;
}
$result = "&#" . $code . ";"; // a numeric reference needs the closing semicolon
return $result;
}
HTML code function taken from here: http://php.net/manual/en/function.ord.php#109812
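For what it's worth, on PHP 7.2+ the byte arithmetic above can probably be replaced by mb_ord(). The sketch below is not taken from any of the answers: the function name is made up, and it assumes valid UTF-8 input and the mbstring extension. It escapes the usual HTML specials first and then rewrites every non-ASCII character as a decimal numeric reference.

```php
<?php
// Hypothetical helper: htmlspecialchars() for & < > and quotes, then a
// numeric character reference for anything outside ASCII.
function encode_non_ascii(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return preg_replace_callback(
        '/[\x{80}-\x{10FFFF}]/u',
        function (array $m): string {
            // mb_ord() returns the Unicode code point of the character.
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $escaped
    );
}

echo encode_non_ascii("this & that \u{1F60E}"), "\n"; // this &amp; that &#128526;
echo encode_non_ascii('<'), "\n";                     // &lt;
```

Unlike htmlentities(), this never produces named entities such as &eacute; for accented letters; they come out as numeric references too.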

Partially hide email address in PHP

I am building a simple friend/buddy system, and when someone tries to search for new friends, I want to show partially hidden email addresses, so as to give an idea about who the user might be, without revealing the actual details.
So I want abcdlkjlkjk#hotmail.com to become abcdl******#hotmail.com.
As a test I wrote:
<?php
$email = "abcdlkjlkjk#hotmail.com";
$em = explode("#",$email);
$name = $em[0];
$len = strlen($name);
$showLen = floor($len/2);
$str_arr = str_split($name);
for($ii=$showLen;$ii<$len;$ii++){
$str_arr[$ii] = '*';
}
$em[0] = implode('',$str_arr);
$new_name = implode('#',$em);
echo $new_name;
This works, but I was wondering if there was any easier/shorter way of applying the same logic? Like a regex maybe?
here's something quick:
function obfuscate_email($email)
{
$em = explode("#",$email);
$name = implode('#', array_slice($em, 0, count($em)-1));
$len = floor(strlen($name)/2);
return substr($name,0, $len) . str_repeat('*', $len) . "#" . end($em);
}
// to see in action:
$emails = ['"Abc\#def"#iana.org', 'abcdlkjlkjk#hotmail.com'];
foreach ($emails as $email)
{
echo obfuscate_email($email) . "\n";
}
echoes:
"Abc\*****#iana.org
abcdl*****#hotmail.com
uses substr() and str_repeat()
Maybe this is not what you want, but I would go for this:
<?php
/*
Here's the logic:
We want to show X numbers.
If length of STR is less than X, hide all.
Else replace the rest with *.
*/
function mask($str, $first, $last) {
$len = strlen($str);
$toShow = $first + $last;
return substr($str, 0, $len <= $toShow ? 0 : $first).str_repeat("*", $len - ($len <= $toShow ? 0 : $toShow)).substr($str, $len - $last, $len <= $toShow ? 0 : $last);
}
function mask_email($email) {
$mail_parts = explode("#", $email);
$domain_parts = explode('.', $mail_parts[1]);
$mail_parts[0] = mask($mail_parts[0], 2, 1); // show first 2 letters and last 1 letter
$domain_parts[0] = mask($domain_parts[0], 2, 1); // same here
$mail_parts[1] = implode('.', $domain_parts);
return implode("#", $mail_parts);
}
$emails = array(
'a#a.com',
'ab#aa.com',
'abc#aaa.com',
'abcd#aaaa.com',
'abcde#aaaaa.com',
'abcdef#aaaaaa.com',
'abcdefg#aaaaaaa.com',
'abcdefgh#aaaaaaaa.com',
'abcdefghi#aaaaaaaaa.com'
);
foreach ($emails as $email){
echo '<b>'.$email.'</b><br>'.mask_email($email).'<br><hr>';
}
Result:
a#a.com
*#*.com
ab#aa.com
**#**.com
abc#aaa.com
***#***.com
abcd#aaaa.com
ab*d#aa*a.com
abcde#aaaaa.com
ab**e#aa**a.com
abcdef#aaaaaa.com
ab***f#aa***a.com
abcdefg#aaaaaaa.com
ab****g#aa****a.com
abcdefgh#aaaaaaaa.com
ab*****h#aa*****a.com
abcdefghi#aaaaaaaaa.com
ab******i#aa******a.com
Here's my alternate solution for this.
I wouldn't use the exact number of mask characters to match the original length of the email, but rather use a fixed length mask for privacy reasons. I would also set the maximum allowed characters to show as well as never show more than half of the email. I would also mask all emails less than a minimum length.
With those rules in mind, here's my function with optional parameters:
function maskEmail($email, $minLength = 3, $maxLength = 10, $mask = "***") {
$atPos = strrpos($email, "#");
$name = substr($email, 0, $atPos);
$len = strlen($name);
$domain = substr($email, $atPos);
if (($len / 2) < $maxLength) $maxLength = (int) floor($len / 2); // keep the length an integer for substr()
$shortenedEmail = (($len > $minLength) ? substr($name, 0, $maxLength) : "");
return "{$shortenedEmail}{$mask}{$domain}";
}
Tests:
$email = "";
$tests = [];
for ($i=0; $i < 22; $i++) {
$email .= chr(97 + $i);
$tests[] = $email . " -> " . maskEmail("{$email}#example.com");
}
print_r($tests);
Results:
Array
(
[0] => a -> ***#example.com
[1] => ab -> ***#example.com
[2] => abc -> ***#example.com
[3] => abcd -> ab***#example.com
[4] => abcde -> ab***#example.com
[5] => abcdef -> abc***#example.com
[6] => abcdefg -> abc***#example.com
[7] => abcdefgh -> abcd***#example.com
[8] => abcdefghi -> abcd***#example.com
[9] => abcdefghij -> abcde***#example.com
[10] => abcdefghijk -> abcde***#example.com
[11] => abcdefghijkl -> abcdef***#example.com
[12] => abcdefghijklm -> abcdef***#example.com
[13] => abcdefghijklmn -> abcdefg***#example.com
[14] => abcdefghijklmno -> abcdefg***#example.com
[15] => abcdefghijklmnop -> abcdefgh***#example.com
[16] => abcdefghijklmnopq -> abcdefgh***#example.com
[17] => abcdefghijklmnopqr -> abcdefghi***#example.com
[18] => abcdefghijklmnopqrs -> abcdefghi***#example.com
[19] => abcdefghijklmnopqrst -> abcdefghij***#example.com
[20] => abcdefghijklmnopqrstu -> abcdefghij***#example.com
[21] => abcdefghijklmnopqrstuv -> abcdefghij***#example.com
)
For instance :
substr($email, 0, 3).'****'.substr($email, strpos($email, "#"));
Which will give you something like:
abc****#hotmail.com
I'm using this:
function secret_mail($email)
{
$prop=2;
$start=''; // initialise before appending with .=
$end='';
$domain = substr(strrchr($email, "#"), 1);
$mailname=str_replace($domain,'',$email);
$name_l=strlen($mailname);
$domain_l=strlen($domain);
for($i=0;$i<=$name_l/$prop-1;$i++)
{
$start.='x';
}
for($i=0;$i<=$domain_l/$prop-1;$i++)
{
$end.='x';
}
return substr_replace($mailname, $start, 2, $name_l/$prop).substr_replace($domain, $end, 2, $domain_l/$prop);
}
Will output something like:
cyxxxxxone#gmxxxxcom
I created a function that may help someone:
function hideEmail($email)
{
$mail_parts = explode("#", $email);
$length = strlen($mail_parts[0]);
$show = floor($length/2);
$hide = $length - $show;
$replace = str_repeat("*", $hide);
return substr_replace ( $mail_parts[0] , $replace , $show, $hide ) . "#" . substr_replace($mail_parts[1], "**", 0, 2);
}
hideEmail("name#example.com"); // output: na**#**ample.com
hideEmail("something#example.com"); // output: some*****#**ample.com
You can customize as you want .. something like this (if length is 4 or less display only the first)
function hideEmail($email) {
$mail_parts = explode("#", $email);
$length = strlen($mail_parts[0]);
if($length <= 4 && $length > 1)
{
$show = 1;
}else{
$show = floor($length/2);
}
$hide = $length - $show;
$replace = str_repeat("*", $hide);
return substr_replace ( $mail_parts[0] , $replace , $show, $hide ) . "#" . substr_replace($mail_parts[1], "**", 0, 2);
}
hideEmail("name#example.com"); // output: n***#**ample.com
hideEmail("something#example.com"); // output: some*****#**ample.com
Very simple RegExp way:
$email = preg_replace('/\B[^#.]/', '*', $email);
Results:
john#smith.com: j***#s****.c**
abcdef#example.org: a*****#e******.o**
abcdef: a*****
Sometimes it's good to show the last character too.
ABCDEFZ#gmail.com becomes
A*****Z#gmail.com
I will suggest you keep things simple.
Maybe something like this is simple enough
https://github.com/fedmich/PHP_Codes/blob/master/mask_email.php
Masks an email to show first 3 characters and then the last character before the # sign
function mask_email( $email ) {
/*
Author: Fed
Simple way of masking emails
*/
$char_shown = 3;
$mail_parts = explode("#", $email);
$username = $mail_parts[0];
$len = strlen( $username );
if( $len <= $char_shown ){
return implode("#", $mail_parts );
}
//Logic: show asterisk in middle, but also show the last character before #
$mail_parts[0] = substr( $username, 0 , $char_shown )
. str_repeat("*", $len - $char_shown - 1 )
. substr( $username, $len - 1 , 1 ) // last character, whatever $char_shown is
;
return implode("#", $mail_parts );
}
I'm using fedmich's answer above, tweaked a bit for my needs:
function mask_email($email, $char_shown_front = 1, $char_shown_back = 1)
{
$mail_parts = explode('#', $email);
$username = $mail_parts[0];
$len = strlen($username);
if ($len < $char_shown_front + $char_shown_back) { // too short to mask; also avoids a negative str_repeat() count
return implode('#', $mail_parts);
}
//Logic: show asterisk in middle, but also show the last character before #
$mail_parts[0] = substr($username, 0, $char_shown_front)
. str_repeat('*', $len - $char_shown_front - $char_shown_back)
. substr($username, $len - $char_shown_back, $char_shown_back);
return implode('#', $mail_parts);
}
test123#gmail.com -> t*****3#gmail.com
You can pass in the number of characters to show at the front and at the back.
You can also try this....
<?php
$email = "abcdlkjlkjk#hotmail.com";
$resultmob = substr($email,0,5);
$resultmob .= "**********";
$resultmob .= substr($email,strpos($email, "#"));
echo $resultmob;
?>
Answer:-
abcdl**********#hotmail.com
Another variant that was heavily influenced by the answers already shared.
This has two key extra benefits:
It keeps the first characters after a defined set of delimiters, making it more readable while still preserving privacy.
It works with longer domain endings such as .org.uk and .com.au
Example: firstname.lastname#example.co.uk becomes f********.l*******#e******.c*.u*
function mask_email( $email ) {
$masked = '';
$show_next = true;
foreach ( str_split( $email ) as $chr ) {
if ( $show_next ) {
$masked .= $chr;
$show_next = false;
}
else if ( in_array( $chr, array('.', '#', '+') ) ) {
$masked .= $chr;
$show_next = true;
}
else {
$masked .= '*';
$show_next = false;
}
}
return $masked;
}
Try this function. This will work with valid emails, such as "Abc\#def"#iana.org.
function hideEmail($email){
$prefix = substr($email, 0, strrpos($email, '#'));
$suffix = substr($email, strripos($email, '#'));
$len = floor(strlen($prefix)/2);
return substr($prefix, 0, $len) . str_repeat('*', $len) . $suffix;
}
echo hideEmail('abcdljtrsjtrsjlkjk#hotmail.com');
echo hideEmail('"abc\#def"#iana.org');
Returns
abcdljtrs*********#hotmail.com
"abc\*****#iana.org
I have a function
function hide_email($email){
$final_str = '';
$string = explode('#', $email);
$leftlength = strlen($string[0]);
$string2 = explode('.', $string[1]);
$string2len = strlen($string2[0]);
$leftlength_new = $leftlength-1;
$first_letter = substr($string[0], 0,1);
$stars = '';
$stars2 = '';
for ($i=0; $i < $leftlength_new; $i++) {
$stars .= '*';
}
for ($i=0; $i < $string2len; $i++) {
$stars2 .= '*';
}
return $first_letter.$stars.'#'.$stars2.'.'.$string2[1];
}
echo hide_email('Hello#PHP.com');
There was an issue when there is only 1 character before the #. I have fixed it in the function below.
function obfuscate_email($email)
{
$em = explode("#",$email);
if(strlen($em[0])==1){
return '*'.'#'.$em[1];
}
$name = implode('#', array_slice($em, 0, count($em)-1)); // separator first; the reversed order was removed in PHP 8.0
$len = floor(strlen($name)/2);
return substr($name,0, $len) . str_repeat('*', $len) . "#" . end($em);
}
Here is version with only 2 lines (if you remove function stuff).
<?php
function censor_email($str,$amount=2, $char='*') {
list($local, $domain)=explode("#",$str);
return substr($local,0,$amount).str_repeat($char,max(0,strlen($local)-$amount))."#".$domain;
}
?>
function maskEmail($email) {
preg_match('/^.?(.*)?.#.+$/', $email, $matches);
return str_replace($matches[1], str_repeat('*', strlen($matches[1])), $email);
}
echo maskEmail('abcdefgh#example.com') . "\n";
echo maskEmail('abh#example.com') . "\n";
echo maskEmail('ah#example.com') . "\n";
echo maskEmail('a#example.com') . "\n";
returns
a******h#example.com
a*h#example.com
ah#example.com
a#example.com
This is what I did, as I required exact number of string count same as plain email.
This function only shows first & last two characters before "#"
function mask_email($email)
{
$em = explode("#",$email);
$len = strlen($em[0]);
$substr_count = 1;
if($len > 6)
$substr_count = 2;
$first = substr($em[0], 0,$substr_count);
$last = substr($em[0], -$substr_count);
$no_of_star = $len - ($substr_count * 2);
return $first.str_repeat('*', $no_of_star).$last."#".end($em);
}
Though this is an old thread and has many answers already, I want to share my own snippet too. It lets you control:
Whether it's a valid email or not.
How many characters to censor and to show.
What character should be used to censor.
function get_censored_email($email, $show_chars = 3, $censor_char = '*'){
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
$char_length = strlen($email);
$censor_count = $char_length - $show_chars;
$return_email = substr($email, 0, $show_chars);
$return_email .= str_repeat($censor_char, $censor_count);
return $return_email;
}
}
$email = 'noman.ibrahim115#gmail.com';
echo get_censored_email($email, 3, '*'); // returns nom***********************
Method 1:
<?php
function hideEmailAddress($email) {
if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
list($first, $last) = explode('#', $email);
$first = str_replace(substr($first, '3'), str_repeat('*', strlen($first)-3), $first);
$last = explode('.', $last);
$last_domain = str_replace(substr($last['0'], '1'), str_repeat('*', strlen($last['0'])-1), $last['0']);
$hideEmailAddress = $first.'#'.$last_domain.'.'.$last['1'];
return $hideEmailAddress;
}
}
$email = "test#example.com";
echo hideEmailAddress($email);
?>
Method 2:
<?php
function hideEmailAddress($email) {
$em = explode("#",$email);
$name = implode('#', array_slice($em, 0, count($em)-1)); // separator first; the reversed order was removed in PHP 8.0
$len = floor(strlen($name)/2);
return substr($name,0, $len) . str_repeat('*', $len) . "#" . end($em);
}
$email = 'test#example.com';
echo hideEmailAddress($email);
?>
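Most of the snippets above count bytes with strlen() and substr(), so a local part containing accented or other multibyte characters could be cut in the middle of a character. Below is a rough sketch of a multibyte-aware variant that keeps the first and last character. The function name and parameters are made up for illustration. The separator is a parameter because the addresses on this page are written with #, while real addresses use @.

```php
<?php
// Hypothetical helper: mask everything in the local part except its first
// and last character, counting characters rather than bytes.
function mask_local_part(string $email, string $separator = '@', string $maskChar = '*'): string
{
    // Split on the LAST separator so quoted local parts containing it still work.
    $pos = strrpos($email, $separator);
    if ($pos === false) {
        return $email; // not address-shaped; leave untouched
    }
    $local = substr($email, 0, $pos);
    $domain = substr($email, $pos); // includes the separator

    $len = mb_strlen($local, 'UTF-8');
    if ($len <= 2) {
        // Too short to reveal anything safely: mask it all.
        return str_repeat($maskChar, $len) . $domain;
    }

    return mb_substr($local, 0, 1, 'UTF-8')
        . str_repeat($maskChar, $len - 2)
        . mb_substr($local, -1, 1, 'UTF-8')
        . $domain;
}

echo mask_local_part('ABCDEFZ@gmail.com'), "\n";   // A*****Z@gmail.com
echo mask_local_part('ab#example.com', '#'), "\n"; // **#example.com
```

Keeping the mask the same length as the original still leaks the length of the local part; a fixed-length mask, as one answer above suggests, avoids that.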

Where can I find a php regex generator that matches number ranges?

I am looking for something like this: How to generate a regular expression at runtime to match a numeric range but written in php.
Answering your question here, since comments are horrible for code blocks. I wouldn't translate a statement like that directly, as it's nearly unreadable. It's far easier to pick apart like this:
if ($n == $m) { // max/min ranges are the same, so just look for that number of characters
$format = '{' . $n . '}'; // {n}
} elseif ($n == 1) { // min range is 1, so use the max
$format = '{1,' . $m . '}'; // {1,m}
} else { // arbitrary n->m range
$format = '{' . $n . ',' . $m . '}'; // {n,m}
}
It CAN be done in PHP as a ternary, it's just as illegible/impossible to debug, though:
$format = ($n == $m) ? '{' . $n . '}' : (($n == 1) ? '{1,' . $m . '}' : '{' . $n . ',' . $m . '}');
I think this should work:
class NumericRangeRegexGenerator {
private function baseRange($num,$up, $leading1) {
$c = $num[0];
$low = $up ? $c : ($leading1 ? '1' : '0');
$high = $up ? '9': $c;
if (strlen($num) == 1)
return $this->charClass($low, $high);
$re = $c . "(" . $this->baseRange(substr($num,1), $up, false) . ")";
if ($up) $low++; else $high--;
if ($low <= $high)
$re .= "|" . $this->charClass($low, $high) . $this->nDigits(strlen($num) - 1);
return $re;
}
private function charClass($b, $e) {
//String.format(b==e ? "%c" : e-b>1 ? "[%c-%c]" : "[%c%c]", b, e); (in java)
if ($b == $e) {
$format = $b;
} elseif ($e-$b>1) {
$format = '['.$b.'-'.$e.']';
} else {
$format = '['.$b.$e.']';
}
return $format;
}
private function nDigits($n, $m=null) {
//String.format(n==m ? n==1 ? "":"{%d}":"{%d,%d}", n, m) (in java)
if ($m === null) {
$m = $n; // single-argument form: exactly $n digits
}
if ($n == $m) { // max/min ranges are the same, so just look for that number of characters
$format = '{' . $n . '}'; // {n}
} elseif ($n == 1) { // min range is 1, so use the max
$format = '{1,' . $m . '}'; // {1,m}
} else { // arbitrary n->m range
$format = '{' . $n . ',' . $m . '}'; // {n,m}
}
return "[0-9]" . $format;
}
private function eqLengths($from, $to) {
$fc = $from[0];
$tc = $to[0];
if (strlen($from) == 1 && strlen($to) == 1)
return $this->charClass($fc, $tc);
if ($fc == $tc)
return $fc . "(".$this->rangeRegex(substr($from,1), substr($to,1)).")";
$re = $fc . "(" . $this->baseRange(substr($from,1), true, false) . ")|"
. $tc . "(" . $this->baseRange(substr($to,1), false, false) . ")";
if (++$fc <= --$tc)
$re .= "|" . $this->charClass($fc, $tc) . $this->nDigits(strlen($from) - 1);
return $re;
}
private function nonEqLengths($from, $to) {
$re = $this->baseRange($from,true,false) . "|" . $this->baseRange($to,false,true);
if (strlen($to) - strlen($from) > 1)
$re .= "|[1-9]" . $this->nDigits(strlen($from), strlen($to) - 2);
return $re;
}
public function rangeRegex($n, $m) {
return strlen($n) == strlen($m) ? $this->eqLengths($n, $m) : $this->nonEqLengths($n, $m);
}
}
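Since a generator like this is easy to get subtly wrong, it may be worth brute-force checking whatever pattern it produces. The sketch below does not depend on the class above: the function name is made up, it takes any pattern string, and it assumes the pattern contains no / (which is used as the delimiter here). The sample patterns are hand-written and serve only as input.

```php
<?php
// Hypothetical checker: returns every integer in 0..$probeMax for which the
// pattern's verdict disagrees with the intended range [$from, $to].
function range_pattern_mismatches(string $pattern, int $from, int $to, int $probeMax): array
{
    $mismatches = [];
    // Anchor the whole alternation so "1[0-9]|2[0-5]" cannot match a substring.
    $anchored = '/^(?:' . $pattern . ')$/';

    for ($i = 0; $i <= $probeMax; $i++) {
        $matched = preg_match($anchored, (string) $i) === 1;
        $expected = ($i >= $from && $i <= $to);
        if ($matched !== $expected) {
            $mismatches[] = $i;
        }
    }
    return $mismatches;
}

// Correct hand-written pattern for 10..25: no mismatches.
var_dump(range_pattern_mismatches('1[0-9]|2[0-5]', 10, 25, 200));
// Deliberately wrong pattern: reports 26, 27, 28 and 29.
var_dump(range_pattern_mismatches('1[0-9]|2[0-9]', 10, 25, 200));
```

Feeding it the output of rangeRegex() for a few ranges would be a cheap sanity check before relying on the generated expressions.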

optimizing a php function that trims strings

I wrote this PHP function that takes any text/HTML string and trims it.
For example:
gen_string("Hello, how are you today?",10);
Returns:
Hello, how...
The problem arises when the function string limit is the same as the position of a special character such as: á, ñ, etc...
In which case:
gen_string("Helló my friend",5);
Returns: Hell�...
Any ideas on how to solve this issue? This is the current function:
# string: advanced substr
function gen_string($string,$min,$clean=false) {
$text = trim(strip_tags($string));
if(strlen($text)>$min) {
$blank = strpos($text,' ');
if($blank) {
# limit plus last word
$extra = strpos(substr($text,$min),' ');
$max = $min+$extra;
$r = substr($text,0,$max);
if(strlen($text)>=$max && !$clean) $r=trim($r,'.').'...';
} else {
# if there are no spaces
$r = substr($text,0,$min).'...';
}
} else {
# if original length is lower than limit
$r = $text;
}
return trim($r);
}
Thanks!
You should use the multibyte string functions to correctly handle unicode characters.
For example you could try using mb_strimwidth to truncate a string to a specified length.
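A minimal sketch of that suggestion, assuming UTF-8 input and the mbstring extension. Note that the width passed to mb_strimwidth includes the trim marker, and that it measures display width, so East Asian wide characters count as two columns.

```php
<?php
$text = "Helló my friend";

// Width 8 includes the 3-column "..." marker, so 5 characters of text remain.
echo mb_strimwidth($text, 0, 8, '...', 'UTF-8'), "\n"; // Helló...

// A string that already fits is returned unchanged, without the marker.
echo mb_strimwidth('Hello', 0, 8, '...', 'UTF-8'), "\n"; // Hello
```

Unlike the original gen_string, this cuts at a fixed width and does not try to keep the last word whole.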
You could also take a different approach and make use of the PCRE regex extension's UTF-8 capabilities (assuming your strings are UTF-8!).
function gen_string($string, $length)
{
$str = trim(strip_tags($string));
$strlen = mb_strlen($str, 'UTF-8'); // character count; utf8_decode() is deprecated as of PHP 8.2
// String is less than limit
if ($strlen <= $length) return $str;
// Shorten string, preserving whole "words" (non-whitespace)
preg_match('/^.{'.($length-1).'}\S*/su', $str, $match);
// Append ellipsis if needed (bytes length is OK to check)
if (strlen($match[0]) !== strlen($str)) $match[0] .= '...';
return $match[0];
}
Aside from the multibyte issue, maybe you can write it shorter
function gen_string($str, $limit) {
if (strlen($str) <= $limit)
return $str;
$offset = -(strlen($str) - $limit);
return substr($str, 0, strrpos($str, ' ', $offset)).'...';
}
It will limit the length of the string, so rather than cut it after the first word beyond the limit, it ensures that the length is never larger than the limit.
strlen() cannot be used for UTF-8 strings, because it also counts the continuation bytes, which should not be counted.
You can try with the following code:
define('PREG_CLASS_UNICODE_WORD_BOUNDARY',
'\x{0}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{A9}\x{AB}-\x{B1}\x{B4}' .
'\x{B6}-\x{B8}\x{BB}\x{BF}\x{D7}\x{F7}\x{2C2}-\x{2C5}\x{2D2}-\x{2DF}' .
'\x{2E5}-\x{2EB}\x{2ED}\x{2EF}-\x{2FF}\x{375}\x{37E}-\x{385}\x{387}\x{3F6}' .
'\x{482}\x{55A}-\x{55F}\x{589}-\x{58A}\x{5BE}\x{5C0}\x{5C3}\x{5C6}' .
'\x{5F3}-\x{60F}\x{61B}-\x{61F}\x{66A}-\x{66D}\x{6D4}\x{6DD}\x{6E9}' .
'\x{6FD}-\x{6FE}\x{700}-\x{70F}\x{7F6}-\x{7F9}\x{830}-\x{83E}' .
'\x{964}-\x{965}\x{970}\x{9F2}-\x{9F3}\x{9FA}-\x{9FB}\x{AF1}\x{B70}' .
'\x{BF3}-\x{BFA}\x{C7F}\x{CF1}-\x{CF2}\x{D79}\x{DF4}\x{E3F}\x{E4F}' .
'\x{E5A}-\x{E5B}\x{F01}-\x{F17}\x{F1A}-\x{F1F}\x{F34}\x{F36}\x{F38}' .
'\x{F3A}-\x{F3D}\x{F85}\x{FBE}-\x{FC5}\x{FC7}-\x{FD8}\x{104A}-\x{104F}' .
'\x{109E}-\x{109F}\x{10FB}\x{1360}-\x{1368}\x{1390}-\x{1399}\x{1400}' .
'\x{166D}-\x{166E}\x{1680}\x{169B}-\x{169C}\x{16EB}-\x{16ED}' .
'\x{1735}-\x{1736}\x{17B4}-\x{17B5}\x{17D4}-\x{17D6}\x{17D8}-\x{17DB}' .
'\x{1800}-\x{180A}\x{180E}\x{1940}-\x{1945}\x{19DE}-\x{19FF}' .
'\x{1A1E}-\x{1A1F}\x{1AA0}-\x{1AA6}\x{1AA8}-\x{1AAD}\x{1B5A}-\x{1B6A}' .
'\x{1B74}-\x{1B7C}\x{1C3B}-\x{1C3F}\x{1C7E}-\x{1C7F}\x{1CD3}\x{1FBD}' .
'\x{1FBF}-\x{1FC1}\x{1FCD}-\x{1FCF}\x{1FDD}-\x{1FDF}\x{1FED}-\x{1FEF}' .
'\x{1FFD}-\x{206F}\x{207A}-\x{207E}\x{208A}-\x{208E}\x{20A0}-\x{20B8}' .
'\x{2100}-\x{2101}\x{2103}-\x{2106}\x{2108}-\x{2109}\x{2114}' .
'\x{2116}-\x{2118}\x{211E}-\x{2123}\x{2125}\x{2127}\x{2129}\x{212E}' .
'\x{213A}-\x{213B}\x{2140}-\x{2144}\x{214A}-\x{214D}\x{214F}' .
'\x{2190}-\x{244A}\x{249C}-\x{24E9}\x{2500}-\x{2775}\x{2794}-\x{2B59}' .
'\x{2CE5}-\x{2CEA}\x{2CF9}-\x{2CFC}\x{2CFE}-\x{2CFF}\x{2E00}-\x{2E2E}' .
'\x{2E30}-\x{3004}\x{3008}-\x{3020}\x{3030}\x{3036}-\x{3037}' .
'\x{303D}-\x{303F}\x{309B}-\x{309C}\x{30A0}\x{30FB}\x{3190}-\x{3191}' .
'\x{3196}-\x{319F}\x{31C0}-\x{31E3}\x{3200}-\x{321E}\x{322A}-\x{3250}' .
'\x{3260}-\x{327F}\x{328A}-\x{32B0}\x{32C0}-\x{33FF}\x{4DC0}-\x{4DFF}' .
'\x{A490}-\x{A4C6}\x{A4FE}-\x{A4FF}\x{A60D}-\x{A60F}\x{A673}\x{A67E}' .
'\x{A6F2}-\x{A716}\x{A720}-\x{A721}\x{A789}-\x{A78A}\x{A828}-\x{A82B}' .
'\x{A836}-\x{A839}\x{A874}-\x{A877}\x{A8CE}-\x{A8CF}\x{A8F8}-\x{A8FA}' .
'\x{A92E}-\x{A92F}\x{A95F}\x{A9C1}-\x{A9CD}\x{A9DE}-\x{A9DF}' .
'\x{AA5C}-\x{AA5F}\x{AA77}-\x{AA79}\x{AADE}-\x{AADF}\x{ABEB}' .
'\x{D800}-\x{F8FF}\x{FB29}\x{FD3E}-\x{FD3F}\x{FDFC}-\x{FDFD}' .
'\x{FE10}-\x{FE19}\x{FE30}-\x{FE6B}\x{FEFF}-\x{FF0F}\x{FF1A}-\x{FF20}' .
'\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF65}\x{FFE0}-\x{FFFD}');
function utf8_strlen($text) {
if (function_exists('mb_strlen')) {
return mb_strlen($text);
}
// Do not count UTF-8 continuation bytes.
return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}
function utf8_truncate($string, $max_length, $wordsafe = FALSE, $add_ellipsis = FALSE, $min_wordsafe_length = 1) {
$ellipsis = '';
$max_length = max($max_length, 0);
$min_wordsafe_length = max($min_wordsafe_length, 0);
if (utf8_strlen($string) <= $max_length) {
// No truncation needed, so don't add ellipsis, just return.
return $string;
}
if ($add_ellipsis) {
// Truncate ellipsis in case $max_length is small.
$ellipsis = utf8_substr('...', 0, $max_length);
$max_length -= utf8_strlen($ellipsis);
$max_length = max($max_length, 0);
}
if ($max_length <= $min_wordsafe_length) {
// Do not attempt word-safe if lengths are bad.
$wordsafe = FALSE;
}
if ($wordsafe) {
$matches = array();
// Find the last word boundary, if there is one within $min_wordsafe_length
// to $max_length characters. preg_match() is always greedy, so it will
// find the longest string possible.
$found = preg_match('/^(.{' . $min_wordsafe_length . ',' . $max_length . '})[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']/u', $string, $matches);
if ($found) {
$string = $matches[1];
}
else {
$string = utf8_substr($string, 0, $max_length);
}
}
else {
$string = utf8_substr($string, 0, $max_length);
}
if ($add_ellipsis) {
$string .= $ellipsis;
}
return $string;
}
function utf8_substr($text, $start, $length = NULL) {
if (function_exists('mb_substr')) {
return $length === NULL ? mb_substr($text, $start) : mb_substr($text, $start, $length);
}
else {
$strlen = strlen($text);
// Find the starting byte offset.
$bytes = 0;
if ($start > 0) {
// Count all the continuation bytes from the start until we have found
// $start characters or the end of the string.
$bytes = -1;
$chars = -1;
while ($bytes < $strlen - 1 && $chars < $start) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
elseif ($start < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters.
$start = abs($start);
$bytes = $strlen;
$chars = 0;
while ($bytes > 0 && $chars < $start) {
$bytes--;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
$istart = $bytes;
// Find the ending byte offset.
if ($length === NULL) {
$iend = $strlen;
}
elseif ($length > 0) {
// Count all the continuation bytes from the starting index until we have
// found $length characters or reached the end of the string, then
// backtrace one byte.
$iend = $istart - 1;
$chars = -1;
$last_real = FALSE;
while ($iend < $strlen - 1 && $chars < $length) {
$iend++;
$c = ord($text[$iend]);
$last_real = FALSE;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
$last_real = TRUE;
}
}
// Backtrace one byte if the last character we found was a real character
// and we don't need it.
if ($last_real && $chars >= $length) {
$iend--;
}
}
elseif ($length < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters, then backtrace one byte.
$length = abs($length);
$iend = $strlen;
$chars = 0;
while ($iend > 0 && $chars < $length) {
$iend--;
$c = ord($text[$iend]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
// Backtrace one byte if we are not at the beginning of the string.
if ($iend > 0) {
$iend--;
}
}
else {
// $length == 0, return an empty string.
return '';
}
return substr($text, $istart, max(0, $iend - $istart + 1));
}
}
For your return statement you could try:
return htmlspecialchars(trim($r));
EDIT: I tried your code as you provided it and it ran fine for me without having to use htmlspecialchars(). This is probably due to the fact that in the <head> of the page the code was running on, the charset was set to UTF-8. So your options could be to set the encoding of the page like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or to use htmlspecialchars() as above.
