How to match an exact word that contains a special character?
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';
preg_match("/\b#worldemojiday\b/i",$string); //false
I want to match an exact word that may contain any character. For example, if I search for the word 'download' in this string, it should return false:
preg_match("/\bdownload\b/i",$string); //false
But when I search for 'downloading', it should return true.
Thanks
The problem is the \b word boundary before the non-word character #. \b cannot match the position between two non-word characters (or between two word characters), so you do not get a match.
A solution is either to remove the first \b, or use \B (a non-word boundary matching between 2 word or 2 non-word characters) instead of it.
\B#worldemojiday\b
Or
#worldemojiday\b
Note that \B also matches at the beginning of a string when the first character is a non-word character such as #.
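To make the difference concrete, here is a minimal sketch that runs the three variants against a shortened copy of the string from the question. The second subject, 'abc#WorldEmojiDay', is a made-up input added only to show where \B and "no leading assertion" diverge.

```php
<?php
// Shortened copy of the sample text from the question.
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard';

// \b before "#" needs a word character on exactly one side. Here both
// neighbours (a space and "#") are non-word characters, so there is no match.
$withBoundary = preg_match('/\b#worldemojiday\b/i', $string);

// \B asserts "not a word boundary", which holds between the space and "#".
$withNonBoundary = preg_match('/\B#worldemojiday\b/i', $string);

// Hypothetical input: the hashtag glued to a preceding word.
$glued = 'abc#WorldEmojiDay';

// Without a leading assertion the glued form is accepted as well.
$gluedNoAssertion = preg_match('/#worldemojiday\b/i', $glued);

// With \B it is rejected, because "c" followed by "#" is a word boundary.
$gluedNonBoundary = preg_match('/\B#worldemojiday\b/i', $glued);

var_dump($withBoundary, $withNonBoundary, $gluedNoAssertion, $gluedNonBoundary);
// int(0) int(1) int(1) int(0)
```

Which variant is right depends on whether a hashtag directly attached to a word should count as a match.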
Here is a way to build a regex dynamically, adding word boundaries only where necessary:
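$srch = "žvolen";
// Escape regex metacharacters and the "/" delimiter used below.
$srch = preg_quote($srch, '/');
// Add a trailing \b only if the term ends with a word character.
if (preg_match('/\w$/u', $srch)) {
    $srch .= '\\b';
}
// Add a leading \b only if the term starts with a word character.
if (preg_match('/^\w/u', $srch)) {
    $srch = '\\b' . $srch;
}
echo preg_match("/" . $srch . "/ui", "žvolen is used.");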
What about using lookarounds:
(?<!\w)#WorldEmojiDay(?!\w)
This ensures that there is no word character directly before or after the string.
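As a rough sketch, the lookaround idea can be wrapped in a small helper so the search term is escaped before it goes into the pattern. The function name containsWholeTerm is hypothetical, and the u modifier assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the answers above): true when $term occurs in
// $text with no word character directly before or after it.
function containsWholeTerm(string $term, string $text): bool
{
    // preg_quote escapes regex metacharacters and the "/" delimiter in the term.
    $pattern = '/(?<!\w)' . preg_quote($term, '/') . '(?!\w)/iu';

    return preg_match($pattern, $text) === 1;
}

$string = 'Celebrate #WorldEmojiDay by downloading our keyboard';

var_dump(containsWholeTerm('#worldemojiday', $string)); // bool(true)
var_dump(containsWholeTerm('download', $string));       // bool(false)
var_dump(containsWholeTerm('downloading', $string));    // bool(true)
```

Unlike \b, the lookarounds behave the same whether the term starts or ends with a word character or a symbol, so no per-term decision is needed.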
Related
I have the following:
$pattern = "/^([\w_]{1})(.+)([\w_]{1}#)/u";
$replacement = "$1*$3***$4";
$email = "testa#weste.de";
echo "obfuscated: ".preg_replace($pattern, $replacement, $email).PHP_EOL;
The result is: t*a#***weste.de
But I would like to have: t*#w***.de
How can I keep the letter after the # rather than the one before it? And how do I handle the .de part?
For the example data, you can match the first character and then use \K, which discards everything matched so far from the reported match, so the first character is left untouched by the replacement.
To keep the first character after the # sign, you can use a capture group and use that in the replacement.
^\w\K[^\s#]+#(\w)[^\s.#]+
^ Start of string
\w Match a single word char (That will also match _)
\K Forget what is matched so far
[^\s#]+ Match 1+ chars other than # or a whitespace char
# Match the # char
(\w) Capture group 1, match a word char (to keep)
[^\s.#]+ Match 1+ chars other than #, a whitespace char or dot
In the replacement use a single capture group *#$1***
$email = "testa#weste.de";
$pattern = "/^\w\K[^\s#]+#(\w)[^\s.#]+/";
$replacement = "*#$1***";
echo preg_replace($pattern, $replacement, $email);
Output
t*#w***.de
You can make the pattern as specific as you like. For example, if the string should end with a dot followed by at least 2 chars a-z, and you don't want to stop matching at the first dot after the #, you can use a lookahead:
^\w\K[^\s#]+#(\w)[^\s#]+(?=\.[a-z]{2,}$)
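The difference between the two patterns only shows up when there is more than one dot after the #. The sketch below uses a made-up address, 'testa#mail.weste.de', to compare them.

```php
<?php
// Hypothetical address with more than one dot after the "#".
$email = 'testa#mail.weste.de';

// First pattern: the match stops at the first dot after "#".
$firstDot = preg_replace('/^\w\K[^\s#]+#(\w)[^\s.#]+/', '*#$1***', $email);

// Second pattern: the match runs up to the final ".xx" suffix, which the
// lookahead checks but does not consume.
$lastDot = preg_replace('/^\w\K[^\s#]+#(\w)[^\s#]+(?=\.[a-z]{2,}$)/', '*#$1***', $email);

echo $firstDot, PHP_EOL; // t*#m***.weste.de
echo $lastDot, PHP_EOL;  // t*#m***.de
```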
I found this way to do it:
$email = 'someemail#domain.com';
[$firstPart, $lastPart] = explode('#', $email);
$maskedEmail = str_replace(substr($firstPart, 0, 7), str_repeat('*', 7), $email);
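This uses only native PHP functions. Note that it masks up to the first 7 characters of the part before the # with 7 asterisks, and that str_replace would also replace that same substring if it appeared again later in the address.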
I have a regex that handles every match except one. The PHP code for the word match is:
$string = preg_replace("/\b".$wordToMatch."\b/","<span class='sp_err' style='background-color:yellow;'>".$wordToMatch."</span>",$string);
With this regex, when $wordToMatch is "-abc" and $string is "The word -abc should match and abc-abc should not match", it fails to catch "-abc".
I want to improve the regex so that it catches the standalone "-abc" in $string, but does not match the "-abc" inside "abc-abc".
In case your keywords can have non-word characters on both ends you can rely on lookarounds for a whole word match:
"/(?<!\\w)".$wordToMatch."(?!\\w)/"
Here, the negative lookbehind (?<!\w) makes sure there is no word character before the word to match, and the negative lookahead (?!\w) makes sure there is no word character after it. These subpatterns are unambiguous, whereas the meaning of \b depends on the context.
With this pattern, -abc is not matched inside abc-abc, but it is matched when it is not enclosed by word characters.
PHP demo:
$wordToMatch = "-abc";
$re = "/(?<!\\w)" . $wordToMatch . "(?!\\w)/";
$str = "abc-abc -abc";
$subst = "!$0!";
$result = preg_replace($re, $subst, $str);
echo $result; // => abc-abc !-abc!
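Applied to the highlighting case from the question, this could look roughly like the sketch below. The helper name highlightTerm is hypothetical, and adding preg_quote is an assumption on my part: it is not in the original answer, but it keeps a term such as "a.c" from being read as a pattern.

```php
<?php
// Hypothetical wrapper around the lookaround pattern shown above.
function highlightTerm(string $term, string $text): string
{
    // Escape the term so its characters are matched literally.
    $pattern = '/(?<!\w)' . preg_quote($term, '/') . '(?!\w)/';

    // $0 in the replacement refers to the whole match.
    return preg_replace($pattern, '<span class="sp_err">$0</span>', $text);
}

echo highlightTerm('-abc', 'The word -abc should match and abc-abc should not match');
// The word <span class="sp_err">-abc</span> should match and abc-abc should not match
```

Using $0 instead of re-inserting the search term keeps the original casing of the matched text if a case-insensitive modifier is added later.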
I have a PHP app that needs to parse a comma separated list of items in any order. Unfortunately some of the keywords overlap:
$mylist = 'foo,wind,unwind';
$contains_foo = preg_match('/foo/i', $mylist);
$contains_bar = preg_match('/bar/i', $mylist);
$contains_unwind = preg_match('/unwind/i', $mylist);
$contains_wind = preg_match('/wind/i', $mylist); # BUG!
How can I craft a regex that only matches 'wind' if it's not preceded by 'un'?
Note that I can't match for /,wind/, because it might be the first item in the list.
I could probably do /^wind/ || /,wind/ but would prefer to have it in a single regex.
You can use a negative lookbehind:
$contains_wind = preg_match('/(?<!un)wind/i', $mylist);
Here (?<!un) is a negative lookbehind: the match fails if wind is immediately preceded by un.
On another note looking at your example you could also use word boundaries:
$contains_wind = preg_match('/\bwind\b/i', $mylist);
Here the assertion \b is a word boundary: wind matches only if each side is either a non-word character (such as a comma) or the start or end of the string.
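The two suggestions are not equivalent, which the following sketch tries to show. The list 'foo,rewind,unwind' is a made-up input, and the explode/in_array variant at the end is an alternative technique that was not proposed in the thread.

```php
<?php
// Hypothetical list containing another keyword that ends in "wind".
$list = 'foo,rewind,unwind';

// The lookbehind only rules out "un", so the "wind" inside "rewind" still matches.
$lookbehind = preg_match('/(?<!un)wind/i', $list);

// \b requires a non-word neighbour (a comma here) or a string edge on both sides.
$boundary = preg_match('/\bwind\b/i', $list);

// With "wind" as a real item (first in the list), \b finds it.
$boundaryHit = preg_match('/\bwind\b/i', 'wind,unwind');

// Alternative without regex: split the list and compare whole items.
$items = array_map('strtolower', explode(',', $list));
$exact = in_array('wind', $items, true);

var_dump($lookbehind, $boundary, $boundaryHit, $exact);
// int(1) int(0) int(1) bool(false)
```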
I am looking to find and replace words in a long string. I want to find words that look like this: $test$ and replace them with nothing.
I have tried a lot of things and can't figure out the regular expression. This is the last one I tried:
preg_replace("/\b\\$(.*)\\$\b/im", '', $text);
No matter what I do, I can't get it to replace words that begin and end with a dollar sign.
Use single quotes instead of double quotes and remove the double escape.
$text = preg_replace('/\$(.*?)\$/', '', $text);
Also, a word boundary \b does not consume any characters; it asserts that there is a word character on one side and not on the other. Because $ is not a word character, the \b assertions have to be removed for this to work. The pattern contains no letters, so the i modifier has no effect, and it contains no anchors, so the m (multi-line) modifier can be removed as well.
In addition, * is a greedy quantifier: .* matches as much as it can while still allowing the remainder of the regular expression to match. In the following example it removes the entire string:
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*)\$/', '', $text));
# => string(0) ""
I recommend using the non-greedy quantifier *? here. The question mark tells the engine not to be greedy: as soon as it finds a closing $, it stops.
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*?)\$/', '', $text));
# => string(10) " bar  quz "
Edit
To fix your problem, you can use \S which matches any non-white space character.
$text = '$20.00 is the $total$';
var_dump(preg_replace('/\$\S+\$/', '', $text));
# string(14) "$20.00 is the "
There are three different positions that qualify as word boundaries \b:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
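These three cases can be checked directly by asking PCRE for every position where \b matches. The sketch below uses a made-up subject, 'ab $c', and reads the offsets of the empty matches.

```php
<?php
// Hypothetical subject: a word, a space, a dollar sign, and a word character.
$subject = 'ab $c';

// \b matches an empty string at each boundary; PREG_OFFSET_CAPTURE records where.
preg_match_all('/\b/', $subject, $matches, PREG_OFFSET_CAPTURE);

// Each entry in $matches[0] is [matched text, offset]; keep only the offsets.
$offsets = array_column($matches[0], 1);

print_r($offsets);
// 0: before "a" (start of string, first character is a word character)
// 2: between "b" and the space
// 4: between "$" and "c"
// 5: after "c" (end of string, last character is a word character)
```

Offset 3, between the space and "$", is missing from the result because both neighbours are non-word characters.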
$ is not a word character, so don't use \b or it won't work. Also, there is no need for the double escaping and no need for the im modifiers:
preg_replace('/\$(.*)\$/', '', $text);
I would use:
preg_replace('/\$[^$]+\$/', '', $text);
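The patterns suggested in this thread behave differently once a lone dollar sign appears in the text. As a quick comparison, here is a sketch that runs three of them on the '$20.00 is the $total$' example used earlier.

```php
<?php
$text = '$20.00 is the $total$';

// Lazy quantifier: pairs the first "$" with the next "$", spaces included.
$lazy = preg_replace('/\$.*?\$/', '', $text);

// Negated class: same pairing, because only "$" itself is excluded.
$negated = preg_replace('/\$[^$]+\$/', '', $text);

// \S+ cannot cross whitespace, so "$20.00" finds no closing "$" and is kept.
$nonWhitespace = preg_replace('/\$\S+\$/', '', $text);

var_dump($lazy, $negated, $nonWhitespace);
// string(6) "total$"
// string(6) "total$"
// string(14) "$20.00 is the "
```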
You can use preg_quote to help you out on 'quoting':
$t = preg_replace('/' . preg_quote('$', '/') . '.*?' . preg_quote('$', '/') . '/', '', $text);
echo $t;
From the docs:
This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Contrary to your use of word boundary markers (\b), you actually want the inverse effect (\B)-- you want to make sure that there ISN'T a word character next to the non-word character $.
You also don't need to use capturing parentheses because you are not using a backreference in your replacement string.
\S+ means one or more non-whitespace characters, matched greedily (the engine backtracks as needed to find the closing $).
Code:
$text = '$foo$ boo hi$$ mon$k$ey $how thi$ $baz$ bar $foobar$';
var_export(
    preg_replace(
        '/\B\$\S+\$\B/',
        '',
        $text
    )
);
Output:
' boo hi$$ mon$k$ey $how thi$  bar '
How would I remove repeating characters (e.g. remove the letter k in cakkkke for it to be cake)?
One straightforward way to do this would be to loop through each character of the string and append each character of the string to a new string if the character isn't a repeat of the previous character.
Here is some code that can do this:
$newString = '';
$oldString = 'cakkkke';
$lastCharacter = '';
for ($i = 0; $i < strlen($oldString); $i++) {
    if ($oldString[$i] !== $lastCharacter) {
        $newString .= $oldString[$i];
    }
    $lastCharacter = $oldString[$i];
}
echo $newString;
Is there a way to do the same thing more concisely using regex or built-in functions?
Use backreferences:
echo preg_replace("/(.)\\1+/", "$1", "cakkke");
Output:
cake
Explanation:
(.) captures any character
\\1 is a backreference to the first capture group, (.) in this case.
+ makes the backreference match at least once (so that it matches aa, aaa, aaaa, but not a)
Replacing it with $1 replaces the complete matched text, kkk in this case, with the first capture group, k in this case.
You want to first match a character, followed by that character repeated: (.)\1+. Replace that with the first character. The parentheses create a capture group holding the first character, which you use both to match the repeated instances (via the backreference \1) and as the replacement text.
preg_replace('/(.)\1+/', '$1', $str);
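A minimal sketch of the same idea as a reusable function follows. The name squeezeRepeats is hypothetical, and the u modifier is an addition that assumes UTF-8 input, so that a multibyte character is treated as one unit rather than as separate bytes.

```php
<?php
// Hypothetical helper: collapse every run of the same character into one.
function squeezeRepeats(string $str): string
{
    // With the u modifier, "." matches a whole UTF-8 character.
    // preg_replace returns null on invalid UTF-8, so fall back to the input.
    return preg_replace('/(.)\1+/u', '$1', $str) ?? $str;
}

echo squeezeRepeats('cakkkke'), PHP_EOL; // cake
echo squeezeRepeats('crééé'), PHP_EOL;   // cré
echo squeezeRepeats('hello'), PHP_EOL;   // helo
```

As the last line shows, the pattern also collapses legitimately doubled letters, so it is only suitable when no character should ever repeat.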