I have a PHP app that needs to parse a comma-separated list of items in any order. Unfortunately, some of the keywords overlap:
$mylist = 'foo,wind,unwind';
$contains_foo = preg_match('/foo/i', $mylist);
$contains_bar = preg_match('/bar/i', $mylist);
$contains_unwind = preg_match('/unwind/i', $mylist);
$contains_wind = preg_match('/wind/i', $mylist); # BUG!
How can I craft a regex that only matches 'wind' if it's not preceded by 'un'?
Note that I can't just match /,wind/, because wind might be the first item in the list.
I could probably do /^wind/ || /,wind/, but I would prefer to have it in a single regex.
You can use a negative lookbehind:
$contains_wind = preg_match('/(?<!un)wind/i', $mylist);
Here (?<!un) is a negative lookbehind: the match fails if wind is immediately preceded by un.
Looking at your example, you could also use word boundaries:
$contains_wind = preg_match('/\bwind\b/i', $mylist);
Here \b is a word-boundary assertion, so wind matches only when it is surrounded by non-word characters (such as commas) or by the start or end of the string.
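The following is a minimal sketch that compares the two patterns on a few sample lists. Only 'foo,wind,unwind' comes from the question; the other lists are made up for illustration. The lookbehind only rules out a preceding 'un', so it still matches the 'wind' inside 'rewind'. The word-boundary version requires 'wind' to be a whole item.

```php
<?php
// Sample lists: only the first comes from the question, the rest are illustrative.
$lists = ['foo,wind,unwind', 'foo,unwind', 'wind', 'unwind,rewind'];

$results = [];
foreach ($lists as $mylist) {
    $results[$mylist] = [
        preg_match('/(?<!un)wind/i', $mylist), // negative lookbehind
        preg_match('/\bwind\b/i', $mylist),    // word boundaries
    ];
    printf("%-16s lookbehind=%d boundary=%d\n", $mylist, ...$results[$mylist]);
}
```

Choose the lookbehind if only the 'un' prefix is a problem. Choose \b if every item must match exactly.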
I have the following:
$pattern = "/^([\w_]{1})(.+)([\w_]{1}#)/u";
$replacement = "$1*$3***$4";
$email = "testa#weste.de";
echo "obfuscated: ".preg_replace($pattern, $replacement, $email).RT;
The result is: t*a#***weste.de
But I would like to have: t*#w***.de
How do I grab the letter after the # instead of the one before it? And how does this work with the .de part?
For the example data, you might match the first character and then use \K to forget what has been matched so far, so the first character is kept.
To keep the first character after the # sign, you can use a capture group and use that in the replacement.
^\w\K[^\s#]+#(\w)[^\s.#]+
^ Start of string
\w Match a single word char (That will also match _)
\K Forget what is matched so far
[^\s#]+ Match 1+ chars other than # or a whitespace char
# Match the # char
(\w) Capture group 1, match a word char (to keep)
[^\s.#]+ Match 1+ chars other than #, a whitespace char or dot
In the replacement, use the single capture group: *#$1***
$email = "testa#weste.de";
$pattern = "/^\w\K[^\s#]+#(\w)[^\s.#]+/";
$replacement = "*#$1***";
echo preg_replace($pattern, $replacement, $email);
Output
t*#w***.de
You can make the pattern as specific as you like. For example, the string might need to end with a dot followed by at least 2 chars a-z, and you might not want to stop matching at the first dot after the #. In that case you can add a lookahead:
^\w\K[^\s#]+#(\w)[^\s#]+(?=\.[a-z]{2,}$)
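The following small sketch shows the difference between the two patterns. It applies both to the question's address and to a made-up address with a dotted domain. The first pattern stops at the first dot after the #. The lookahead version keeps only the final .tld.

```php
<?php
// The second address is a hypothetical example with more than one dot in the domain.
$emails = ['testa#weste.de', 'testa#mail.weste.de'];

$firstDot    = '/^\w\K[^\s#]+#(\w)[^\s.#]+/';
$lastDot     = '/^\w\K[^\s#]+#(\w)[^\s#]+(?=\.[a-z]{2,}$)/';
$replacement = '*#$1***';

$out = [];
foreach ($emails as $email) {
    $out[$email] = [
        preg_replace($firstDot, $replacement, $email), // stops at the first dot
        preg_replace($lastDot, $replacement, $email),  // keeps only the last .tld
    ];
    echo $email, ': ', implode(' | ', $out[$email]), PHP_EOL;
}
```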
I found this way to do it:
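$email = 'someemail#domain.com';
[$firstPart, $lastPart] = explode('#', $email);
// Mask up to the first 7 characters of the local part (never more than it has)
$n = min(7, strlen($firstPart));
// substr_replace() only touches the start, unlike str_replace() which hits every occurrence
$maskedEmail = substr_replace($email, str_repeat('*', $n), 0, $n);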
Uses PHP native functions and works just fine!
http://www.tehplayground.com/#0qrTOzTh3
$inputs = array(
'2', // no match
'29.2', // no match
'2.48',
'8.06.16', // no match
'-2.41',
'-.54', // no match
'4.492', // no match
'4.194,32',
'39,299.39',
'329.382,39',
'-188.392,49',
'293.392,193', // no match
'-.492.183,33', // no match
'3.492.249,11',
'29.439.834,13',
'-392.492.492,43'
);
$number_pattern = '-?(?:[0-9]|[0-9]{2}|[0-9]{3}[\.,]?)?(?:[0-9]|[0-9]{2}|[0-9]{3})[\.,][0-9]{2}(?!\d)';
foreach($inputs as $input){
preg_match_all('/'.$number_pattern.'/m', $input, $matches);
print_r($matches);
}
It seems you are looking for:
$number_pattern = '-?(?<![\d.,])\d{1,3}(?:[,.]\d{3})*[.,]\d{2}(?![\d.])';
No anchors are used; instead, there are lookarounds on both sides of the pattern.
Pattern details:
-? - an optional hyphen
(?<![\d.,]) - there cannot be a digit, comma or dot before the current location
\d{1,3} - 1 to 3 digits
(?:[,.]\d{3})* - zero or more sequences of a comma or dot followed by 3 digits
[.,] - a comma or dot
\d{2} - 2 digits that are
(?![\d.]) - not followed by a digit or dot.
Note that in PHP you do not need the /m (MULTILINE) modifier here, because the pattern uses no ^ or $ anchors:
preg_match_all('/'.$number_pattern.'/', $input, $matches);
is enough to match the numbers you need in larger texts.
If you need to match them as standalone strings, use the simpler anchored pattern:
^-?\d{1,3}(?:[,.]\d{3})*[.,]\d{2}$
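The following minimal sketch runs both patterns over the inputs from the question. It uses the unanchored pattern with preg_match_all, as it would be used on larger text. It uses the anchored pattern with preg_match to validate whole strings. Both should keep exactly the inputs that are not marked // no match.

```php
<?php
$inputs = ['2', '29.2', '2.48', '8.06.16', '-2.41', '-.54', '4.492', '4.194,32',
    '39,299.39', '329.382,39', '-188.392,49', '293.392,193', '-.492.183,33',
    '3.492.249,11', '29.439.834,13', '-392.492.492,43'];

// Unanchored: finds numbers anywhere in a string
$number_pattern = '-?(?<![\d.,])\d{1,3}(?:[,.]\d{3})*[.,]\d{2}(?![\d.])';
// Anchored: the whole string must be a single number
$standalone_pattern = '/^-?\d{1,3}(?:[,.]\d{3})*[.,]\d{2}$/';

$found = [];
foreach ($inputs as $input) {
    preg_match_all('/' . $number_pattern . '/', $input, $matches);
    $found = array_merge($found, $matches[0]);
}

$standalone = array_values(array_filter(
    $inputs,
    fn($s) => preg_match($standalone_pattern, $s) === 1
));

print_r($found);
print_r($standalone);
```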
I have a regex that handles every match except one. The PHP code for the word match is:
$string = preg_replace("/\b".$wordToMatch."\b/","<span class='sp_err' style='background-color:yellow;'>".$wordToMatch."</span>",$string);
When $wordToMatch is "-abc" and $string is "The word -abc should match and abc-abc should not match", the regex above fails to catch "-abc".
I want to improve the regex so that it catches "-abc" in $string, but does not match the "-abc" inside "abc-abc".
If your keywords can have non-word characters at either end, you can rely on lookarounds for a whole-word match:
"/(?<!\\w)".$wordToMatch."(?!\\w)/"
Here, the negative lookbehind (?<!\w) makes sure there is no word character before the keyword, and the negative lookahead (?!\w) makes sure there is no word character after it. These subpatterns are unambiguous, while the meaning of \b depends on the context: before a non-word character such as -, \b requires a word character on its left.
As a result, -abc is not matched inside abc-abc, but it is matched when it is not enclosed in word characters.
PHP demo:
$wordToMatch = "-abc";
$re = "/(?<!\\w)" . $wordToMatch . "(?!\\w)/";
$str = "abc-abc -abc";
$subst = "!$0!";
$result = preg_replace($re, $subst, $str);
echo $result; // => abc-abc !-abc!
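The code above concatenates $wordToMatch straight into the pattern. That breaks if the keyword contains regex metacharacters such as + or a / delimiter. The following is a sketch of the same lookaround approach with preg_quote() added. The function name highlightWord and the span markup are hypothetical.

```php
<?php
// Hypothetical helper: wraps every whole-word occurrence of $word in $text.
// preg_quote() escapes regex metacharacters and the '/' delimiter in the keyword.
function highlightWord(string $text, string $word): string
{
    $re = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/';
    return preg_replace($re, '<span class="sp_err">$0</span>', $text);
}

echo highlightWord('The word -abc should match and abc-abc should not match', '-abc'), PHP_EOL;
echo highlightWord('c++ is not abc++', 'c++'), PHP_EOL;
```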
How do I match an exact word that contains special characters?
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';
preg_match("/\b#worldemojiday\b/i",$string); //false
I want to match the exact word, whatever characters it contains. For example, if I search for the word 'download' in this string, it should return false:
preg_match("/\bdownload\b/i",$string); //false
But when I search for 'downloading', it should return true.
The problem is the \b word boundary before #, which is a non-word character. \b cannot match a position between 2 non-word (or between 2 word) characters. In your string, # is preceded by a space, so there is no match.
A solution is either to remove the first \b, or to use \B instead (a non-word boundary, which matches between 2 word or 2 non-word characters).
\B#worldemojiday\b
Or
#worldemojiday\b
Note that \B also matches at the start of the string when the first character is a non-word character, such as #.
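The following small sketch checks the three variants against the question's string. According to the explanation above, only the first should fail.

```php
<?php
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';

$patterns = [
    '/\b#worldemojiday\b/i', // \b cannot match between ' ' and '#'
    '/\B#worldemojiday\b/i', // \B matches at that position instead
    '/#worldemojiday\b/i',   // no assertion before '#'
];

$results = [];
foreach ($patterns as $p) {
    $results[] = preg_match($p, $string);
    echo $p, ' => ', end($results), PHP_EOL;
}
```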
Here is a way to build a regex dynamically, adding word boundaries only where necessary:
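$srch = "žvolen";
// Escape regex metacharacters, including the '/' delimiter used below
$srch = preg_quote($srch, '/');
// Add \b only on a side where the keyword starts or ends with a word character
if (preg_match('/\w$/u', $srch)) {
    $srch .= '\\b';
}
if (preg_match('/^\w/u', $srch)) {
    $srch = '\\b' . $srch;
}
echo preg_match("/" . $srch . "/ui", "žvolen is used.");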
What about using lookarounds:
(?<!\w)#WorldEmojiDay(?!\w)
This ensures that there is no word character before or after the string.
Here is my problem:
I have a string and I need to extract characters two by two, with overlap.
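The following sketch applies the lookaround version to the question's three cases, wrapped in a small helper. The name containsWord is hypothetical. preg_quote() is added so that keywords such as #worldemojiday are inserted literally.

```php
<?php
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';

// Hypothetical helper: true if $word occurs in $text with no word character
// directly before or after it (case-insensitive).
function containsWord(string $text, string $word): bool
{
    $re = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/i';
    return preg_match($re, $text) === 1;
}

var_dump(containsWord($string, '#worldemojiday')); // bool(true)
var_dump(containsWord($string, 'download'));       // bool(false)
var_dump(containsWord($string, 'downloading'));    // bool(true)
```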
$str = "abcdef" should return array('ab', 'bc', 'cd', 'de', 'ef'). I want to use preg_match_all instead of loops. Here is the pattern I am using.
$str = "abcdef";
preg_match_all('/[\w]{2}/', $str);
The thing is, it returns Array('ab', 'cd', 'ef'). It misses 'bc' and 'de'.
I have the same problem if I want to extract a certain number of words at a time:
$str = "ab cd ef gh ij";
preg_match_all('/([\w]+ ){2}/', $str); // returns array('ab cd', 'ef gh'), I'm also missing the last part
What am I missing? Or is it simply not possible to do so with preg_match_all?
For the first problem, you want to match overlapping substrings. That requires a zero-width (non-consuming) lookahead with a capturing group inside it to grab the characters:
/(?=(\w{2}))/
The lookahead itself matches an empty string, so the pairs are captured in the first capturing group ($matches[1] with preg_match_all).
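The following minimal sketch shows the lookahead pattern with preg_match_all on the question's string. The pairs come from group 1, and $matches[0] holds only empty strings.

```php
<?php
$str = "abcdef";
// The lookahead matches an empty string at each position; group 1 holds the pair.
preg_match_all('/(?=(\w{2}))/', $str, $matches);
print_r($matches[1]); // ab, bc, cd, de, ef
```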
For the second problem, it seems that you also want overlapping matches. Using the same trick:
/(?=(\b\w+ \w+\b))/
Note that \b is added to check the word boundary. Since the match does not consume text, the next match is attempted at the next index (which is in the middle of the first word) instead of at the end of the 2nd word. We don't want to capture from the middle of a word, so we need the boundary check.
Note that \b's definition is based on \w, so if you ever change the definition of a word, you need to emulate the word boundary with look-ahead and look-behind with the corresponding character set.
If you need a non-regex solution, try this:
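The following sketch covers the word-pair pattern. It also shows the emulated boundary described above for a hypothetical word definition that also allows apostrophes (\w plus '). With plain \b, "don't" is split at the apostrophe.

```php
<?php
$str = "ab cd ef gh ij";
preg_match_all('/(?=(\b\w+ \w+\b))/', $str, $matches);
$pairs = $matches[1]; // ab cd, cd ef, ef gh, gh ij

// Hypothetical word definition: \w plus the apostrophe.
// \b no longer fits, so emulate it with a lookbehind and a lookahead.
$w = '[\w\']';
$re = "/(?<!$w)(?=($w+ $w+)(?!$w))/";
$text = "don't stop me now";
preg_match_all($re, $text, $m);
$customPairs = $m[1]; // don't stop, stop me, me now

// For comparison, the \b version starts a word after the apostrophe.
preg_match_all('/(?=(\b\w+ \w+\b))/', $text, $m2);
$naivePairs = $m2[1]; // t stop, stop me, me now

print_r($pairs);
print_r($customPairs);
print_r($naivePairs);
```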
<?php
$str = "abcdef";
$len = strlen($str);
$arr = array();
for($count = 0; $count < ($len - 1); $count++)
{
$arr[] = $str[$count].$str[$count+1];
}
print_r($arr);
?>
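The loop above only handles character pairs. The following sketch generalizes it, still without regex, to windows of any size and to the word-pair case from the question. The helper names charWindows and wordWindows are hypothetical. wordWindows assumes words are separated by single spaces.

```php
<?php
// Hypothetical helper: overlapping substrings of length $n.
// Works on bytes, so multibyte UTF-8 characters would be split.
function charWindows(string $str, int $n): array
{
    $out = [];
    for ($i = 0; $i + $n <= strlen($str); $i++) {
        $out[] = substr($str, $i, $n);
    }
    return $out;
}

// Hypothetical helper: overlapping groups of $n words (single-space separated).
function wordWindows(string $str, int $n): array
{
    $words = explode(' ', $str);
    $out = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        $out[] = implode(' ', array_slice($words, $i, $n));
    }
    return $out;
}

print_r(charWindows('abcdef', 2));
print_r(wordWindows('ab cd ef gh ij', 2));
```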