Looking for regex to obfuscate email to t*#w***.de - php

I have the following:
$pattern = "/^([\w_]{1})(.+)([\w_]{1}#)/u";
$replacement = "$1*$3***$4";
$email = "testa#weste.de";
echo "obfuscated: ".preg_replace($pattern, $replacement, $email).RT;
The result is: t*a#***weste.de
But I would like to have: t*#w***.de
How to grab the letter after the # and not before. And how does it work with the .de part?

For the replacement in the example data, you might use a match with \K to forget what is matched after the first character and keep it.
To keep the first character after the # sign, you can use a capture group and use that in the replacement.
^\w\K[^\s#]+#(\w)[^\s.#]+
^ Start of string
\w Match a single word char (That will also match _)
\K Forget what is matched so far
[^\s#]+ Match 1+ chars other than # or a whitespace char
# Match the # char
(\w) Capture group 1, match a word char (to keep)
[^\s.#]+ Match 1+ chars other than #, a whitespace char or dot
Regex demo | Php demo
In the replacement use a single capture group *#$1***
$email = "testa#weste.de";
$pattern = "/^\w\K[^\s#]+#(\w)[^\s.#]+/";
$replacement = "*#$1***";
echo preg_replace($pattern, $replacement, $email);
Output
t*#w***.de
You can make the pattern as specific as you would like. If there should for example be a dot followed by at least 2 chars a-z at the end of the string, and you don't want to stop matching at the first dot after the #
^\w\K[^\s#]+#(\w)[^\s#]+(?=\.[a-z]{2,}$)
Regex demo

I found this way to do it:
$email = 'someemail#domain.com'
[$firstPart, $lastPart] = explode('#', $email);
$maskedEmail = str_replace(substr($firstPart, 0, 7), str_repeat('*', 7), $email);
Uses PHP native functions and works just fine!

Related

PHP Regular Expression Exclusion

Here is the sample PHP code:
<?php
$str = '10,000.1 $100,000.1';
$pattern = '/(?!\$)\d+(,\d{3})*\.?\d*/';
$replacement_str = 'Without$sign';
echo preg_replace($pattern, $replacement_str, $str);?>
Target is to replace numbers only (i.e. "$100,000.1" should not be replaced). But the above code replaces both 10,000.1 and $100,000.1. How to achieve the exclusion?
This assertion is always true (?!\$)\d+ as you match a digit which can not be a $
As the . and the digits at the end of the pattern are optional, it could also match ending on a dot like for example 0,000.
Instead you can assert a whitespace boundary to the left, and optionally match a dot followed by 1 or more digits:
(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?\b
Regex demo
Example:
$str = '10,000.1 $100,000.1';
$pattern = '/(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?\b/';
$replacement_str = 'Without$sign';
echo preg_replace($pattern, $replacement_str, $str);
Output (If you remove the numbers, the text "Without$sign" is not correct)
Without$sign $100,000.1

Regex only grabbing first digit

I'm trying to grab everything after the following digits, so I end up with just the store name in this string:
full string: /stores/1077029-gacha-pins
what I want to ignore: /stores/1077029-
what I need to grab: gacha-pins
Those digits can change at any time so it's not specifically that ID, but any numbers after /stores/
My attempt so far is only grabbing /stores/1
\/stores\/[0-9]
I'm still trying, just thought I would see if I can get some help in the meantime too, will post an answer if I solve.
You may use
'~/stores/\d+-\K[^/]+$~'
Or a more specific one:
'~/stores/\d+-\K\w+(?:-\w+)*$~'
See the regex demo and this regex demo.
Details
/stores/ - a literal string
\d+ - 1+ digits
- - a hyphen
\K - match reset operator
[^/]+ - any 1+ chars other than /
\w+(?:-\w+)* - 1+ word chars and then 0+ sequences of - and 1+ word chars
$ - end of string.
See the PHP demo:
$s = "/stores/1077029-gacha-pins";
$rx = '~/stores/\d+-\K[^/]+$~';
if (preg_match($rx, $s, $matches)) {
echo "Result: " . $matches[0];
}
// => Result: gacha-pins
You should do it like this:
$string = '/stores/1077029-gacha-pins';
preg_match('#/stores/[0-9-]+(.*)#', $string, $matches);
$part = $matches[1];
print_r($part);

split string in numbers and text but accept text with a single digit inside

Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?
To slit a string in two parts, use any of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing any one or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The ~s modifier is a DOTALL one forcing the . to match any character, even a newline, that it does not match without this modifier.
Or
preg_split('~\s*(?=\s*\d+\D*$)~', $s);
This \s*(?=\s*\d+\D*$) pattern:
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));

Regex to match words starting with hyphen

I have a regex which does all matches except one match.The PHP Code for the word match is:
$string = preg_replace("/\b".$wordToMatch."\b/","<span class='sp_err' style='background-color:yellow;'>".$wordToMatch."</span>",$string);
Here in the above regex when the $wordToMatch variable value becomes "-abc" and the $string value is "The word -abc should match and abc-abc should not match".With above regex it fails to catch "-abc".
I want enhancement in the above regex so that it can catch "-abc" in $string,but if it tries to match "-abc" in "abc-abc" of $string it should not.
In case your keywords can have non-word characters on both ends you can rely on lookarounds for a whole word match:
"/(?<!\\w)".$wordToMatch."(?!\\w)/"
Here, (?<!\w) will make sure there is no word character before the word to match, and (?!\w) negative lookahead will make sure there is no word character after the word to match. These are unambiguous subpatterns, while \b meaning depends on the context.
See regex demo showing that -abc is not matched in abc-abc and matches if it is not enclosed with word characters.
PHP demo:
$wordToMatch = "-abc";
$re = "/(?<!\\w)" . $wordToMatch . "(?!\\w)/";
$str = "abc-abc -abc";
$subst = "!$0!";
$result = preg_replace($re, $subst, $str);
echo $result; // => abc-abc !-abc!

Match exact word with any character regex

How to match exact word contains any special character ?
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';
preg_match("/\b#worldemojiday\b/i",$string); //false
I want to match exact word containing any character. Like if I want to match word 'download' in this string, It should return false
preg_match("/\bdownload\b/i",$string); //false
But when I search for downloading, It should return true.
Thanks
The problem is with \b word boundary before # non-word character. \b cannot match the position between 2 non-word (or between 2 word) characters, thus, you do not get a match.
A solution is either to remove the first \b, or use \B (a non-word boundary matching between 2 word or 2 non-word characters) instead of it.
\B#worldemojiday\b
Or
#worldemojiday\b
See demo (or this one)
Note that \B also matches at the beginning of a string.
Here is a way to build a regex dynamically, adding word boundaries only where necessary:
$srch = "žvolen";
$srch = preg_quote($srch);
if (preg_match('/\w$/u', $srch)) {
$srch .= '\\b';
}
if (preg_match('/^\w/u', $srch)) {
$srch = '\\b' . $srch;
}
echo preg_match("/" . $srch . "/ui", "žvolen is used.");
What about using lookarounds:
(?<!\w)#WorldEmojiDay(?!\w)
This ensures, that there's no word character before or after the string. See test at regex101

Categories