word boundaries preg_replace

I would like to match any substring of the form %[a-z0-9]+, as shown in the following examples:
1. %xxxxxxxxxxxxx //match
2. this will work %xxxxxx but not this%xxxxxxxxx. //match 1st, not 2nd
3. and also %xxxxxxxxxx. //match
4. just a line ending with %xxxxxxxxxxx //match
5. %Xxxxxxxxxxx //no match
6. 100% of dogs //no match
7. 65%. Begining of new phrase //no match
8. 65%.Begining of new phrase //no match
The token can be at the beginning or at the end of the string, but not in the middle of a word. It can of course also appear inside the string as a separate word (delimited by spaces).
I have tried
/(\b)%[a-z0-9]+(\b)/
/(^|\b)%[a-z0-9]+($|\b)/
/(\w)%[a-z0-9]+(\w)/
and others like these, but I can't get any of them to behave the way I want. I guess the \b token does not work in example 2 because there is a word boundary before the % sign.
Any help would be greatly appreciated.

Try
/\B%[a-z0-9]+\b/
There is no word boundary \b between a space and the %, but there is one between s and %.
\B is the opposite of \b: it matches at any position that is not a word boundary.
See it here on regex101
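
As a quick check, the pattern can be run against sample lines taken from the question. This is only a sketch: the $lines array and the $found variable are illustrative names, and the expected behaviour is the one described in the question's "match" / "no match" notes.

```php
<?php
// Sample lines from the question (numbering and comments stripped).
$lines = [
    '%xxxxxxxxxxxxx',
    'this will work %xxxxxx but not this%xxxxxxxxx.',
    'and also %xxxxxxxxxx.',
    '%Xxxxxxxxxxx',
    '100% of dogs',
    '65%.Begining of new phrase',
];

$found = [];
foreach ($lines as $line) {
    // \B before % requires that % is NOT preceded by a word character;
    // \b after the token requires that the token ends at a word boundary.
    preg_match_all('/\B%[a-z0-9]+\b/', $line, $m);
    $found[] = $m[0];
}

print_r($found);
```

Only the first three lines should produce a match, and the second line should yield %xxxxxx but not the token glued to "this".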

%[a-z0-9]+(?=\s|$)|(?:^|(?<=\s))%[a-z0-9]+
Try this. See demo:
https://regex101.com/r/iS6jF6/20
$re = "/%[a-z0-9]+(?=\\s|$)|(?:^|(?<=\\s))%[a-z0-9]+/m";
$str = "1. %xxxxxxxxxxxxx //match\n2. this will work %xxxxxx but not this%xxxxxxxxx. //match 1st, not 2nd\n3. and also %xxxxxxxxxx. //match\n4. just a line ending with %xxxxxxxxxxx //match\n5. %Xxxxxxxxxxx //no match\n6. 100% of dogs //no match\n7. 65%. Begining of new phrase //no match\n8. 65%.Begining of new phrase //no match";
preg_match_all($re, $str, $matches);
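or, more loosely (note that this variant also matches the %xxxxxxxxx in this%xxxxxxxxx, because there is a word boundary between s and %):
%[a-z0-9]+\b|\b%[a-z0-9]+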

Related

How can I replace the second occurrence of a match in PHP?

I'm searching a paragraph (a string) for a certain word and I want to replace that word with another one, but only at the second occurrence of my search term.
Here is what I tried:
$string = 'hello my name is hello';
$output = str_replace('hello', 'Gary', $string);
// desired output
//hello my name is Gary
It seems very simple, but I can't get it right. Please bear in mind that my string is very long and contains all types of characters.

With this regex: /^.*?hello\b.*?\Khello/, where:
^ asserts position at the start of the string
.*? lazily matches any characters (except newline), as few as possible
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
Check this demo: https://regex101.com/r/lW2kK1/2
which gives you:
$re = "/^.*?hello\\b.*?\\Khello/";
$str = "hello my name is hello";
$subst = "Gary";
$result = preg_replace($re, $subst, $str);
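
Since the question mentions a very long string with all types of characters, a plain string approach may also be worth considering. The following is a minimal sketch that swaps the regex for strpos and substr_replace; replaceSecondOccurrence is a hypothetical helper name, and unlike the regex above it matches the search term literally, without any word boundary check.

```php
<?php
// Hypothetical helper: replaces only the second literal occurrence of $search.
function replaceSecondOccurrence(string $haystack, string $search, string $replace): string
{
    if ($search === '') {
        return $haystack; // nothing sensible to search for
    }
    $first = strpos($haystack, $search);
    if ($first === false) {
        return $haystack; // no occurrence at all
    }
    // Continue searching right after the end of the first occurrence.
    $second = strpos($haystack, $search, $first + strlen($search));
    if ($second === false) {
        return $haystack; // only one occurrence, leave the string untouched
    }
    return substr_replace($haystack, $replace, $second, strlen($search));
}

echo replaceSecondOccurrence('hello my name is hello', 'hello', 'Gary');
```

Because no pattern is involved, newlines and regex metacharacters in either string need no special handling.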

Regex to match words starting with hyphen

I have a regex that matches everything I need except one case. The PHP code for the word match is:
$string = preg_replace("/\b".$wordToMatch."\b/","<span class='sp_err' style='background-color:yellow;'>".$wordToMatch."</span>",$string);
When $wordToMatch is "-abc" and $string is "The word -abc should match and abc-abc should not match", the regex above fails to catch "-abc".
I want to enhance the regex so that it catches the standalone "-abc" in $string, but does not match the "-abc" inside "abc-abc".

In case your keywords can have non-word characters on both ends you can rely on lookarounds for a whole word match:
"/(?<!\\w)".$wordToMatch."(?!\\w)/"
Here, the negative lookbehind (?<!\w) makes sure there is no word character before the word to match, and the negative lookahead (?!\w) makes sure there is no word character after it. These are unambiguous subpatterns, while the meaning of \b depends on the context.
See regex demo showing that -abc is not matched in abc-abc and matches if it is not enclosed with word characters.
PHP demo:
$wordToMatch = "-abc";
$re = "/(?<!\\w)" . $wordToMatch . "(?!\\w)/";
$str = "abc-abc -abc";
$subst = "!$0!";
$result = preg_replace($re, $subst, $str);
echo $result; // => abc-abc !-abc!
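
If the keyword can contain regex metacharacters (a dot, a plus sign and so on), it is probably safer to pass it through preg_quote before building the pattern. A minimal sketch, where highlightWholeWord is a hypothetical helper name and the !...! markers are the same ones used in the demo above:

```php
<?php
// Hypothetical helper wrapping the lookaround pattern from the answer.
function highlightWholeWord(string $text, string $word): string
{
    // preg_quote escapes regex metacharacters in $word;
    // the second argument also escapes the '/' delimiter used below.
    $pattern = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/';
    return preg_replace($pattern, '!$0!', $text);
}

echo highlightWholeWord('abc-abc -abc', '-abc'), "\n";
echo highlightWholeWord('a.c abc', 'a.c'), "\n";
```

In the second call the dot is treated literally, so abc should be left alone.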

Match exact word with any character regex

How can I match an exact word that contains a special character?
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';
preg_match("/\b#worldemojiday\b/i",$string); //false
I want to match the exact word, whatever characters it contains. For example, if I search for the word 'download' in this string, it should return false:
preg_match("/\bdownload\b/i",$string); //false
But when I search for 'downloading', it should return true.
Thanks

The problem is with \b word boundary before # non-word character. \b cannot match the position between 2 non-word (or between 2 word) characters, thus, you do not get a match.
A solution is either to remove the first \b, or use \B (a non-word boundary matching between 2 word or 2 non-word characters) instead of it.
\B#worldemojiday\b
Or
#worldemojiday\b
See demo (or this one)
Note that \B also matches at the beginning of a string.
Here is a way to build a regex dynamically, adding word boundaries only where necessary:
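$srch = "žvolen";
$srch = preg_quote($srch);
// Append \b only if the search term ends with a word character.
if (preg_match('/\w$/u', $srch)) {
    $srch .= '\\b';
}
// Prepend \b only if the search term starts with a word character.
if (preg_match('/^\w/u', $srch)) {
    $srch = '\\b' . $srch;
}
echo preg_match("/" . $srch . "/ui", "žvolen is used.");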

What about using lookarounds:
(?<!\w)#WorldEmojiDay(?!\w)
This ensures that there is no word character before or after the string. See test at regex101
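
The difference between the two approaches can be checked with a short script. This is a sketch using a shortened version of the string from the question; the $results array and its keys are illustrative names only.

```php
<?php
$string = 'Celebrate #WorldEmojiDay by downloading our keyboard';

$results = [
    // \b before # fails: the space and # are both non-word characters.
    'boundary_hashtag'    => preg_match('/\b#worldemojiday\b/i', $string),
    // Lookarounds only require that no word character touches the match.
    'lookaround_hashtag'  => preg_match('/(?<!\w)#worldemojiday(?!\w)/i', $string),
    // "download" is followed by "ing", so the lookahead rejects it.
    'lookaround_partial'  => preg_match('/(?<!\w)download(?!\w)/i', $string),
    'lookaround_complete' => preg_match('/(?<!\w)downloading(?!\w)/i', $string),
];

var_dump($results);
```

The first and third entries should be 0 and the other two 1.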

Regex to insert dot (.) after characters, before new line

I'm reformatting some text, and sometimes I have a string, where there is a sentence which is not ended by a dot.
I'm running various checks for this purpose, and one more I'd like is to "Add dot after last character before new line".
I'm not sure how to form the regular expression for this:
$string = preg_replace("/???/", ".\n", $string);

Try this one:
$string = preg_replace("/(?<![.])(?=[\n\r]|$)/", ".", $string);
The negative lookbehind (?<![.]) checks that the previous character is not a dot.
The positive lookahead (?=[\n\r]|$) checks that the next character is a newline or that the end of the string has been reached.
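
A variant of the same lookaround idea, sketched here under the assumption that the input may mix \n and \r\n line endings and that lines ending in whitespace should be left alone: the lookbehind requires a character that is neither a dot nor whitespace, and \R treats \r\n as a single line break. The sample string is made up for illustration.

```php
<?php
// Illustrative input mixing \n and \r\n line endings.
$string = "First line\nSecond line.\r\nThird";

// (?<=[^.\s]) : the previous character is neither a dot nor whitespace
// (?=\R|\z)   : what follows is a line break sequence or the very end of the string
$result = preg_replace('/(?<=[^.\s])(?=\R|\z)/', '.', $string);

echo $result;
```

With this input, a dot should be added after "First line" and after "Third", while "Second line." stays unchanged.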

like this I suppose:
<?php
$string = "Add dot after last character before new line\n";
$string = preg_replace("/(.)$/", "$1.\n", $string);
print $string;
?>
This way the dot will be added after the word line in the sentence and before the \n.
demo : http://ideone.com/J4g7tH

I'd do:
$string = "Add dot after last character before new line\n";
$string = preg_replace("/([^.\r\n])$/s", "$1.", $string);

Thanks for all the answers, but none of them really caught all scenarios right.
I fumbled my way to a good solution using the word boundary assertion \b:
// Add a dot after every word boundary that is followed by a new line.
$string = preg_replace("/\b\n/", ".\n", $string);
Note that \b must be written without square brackets: inside a character class, [\b] means a backspace character rather than a word boundary.

This works for me:
$content = preg_replace("/(\w+)(\n)/", "$1.$2", $content);
It will match a word immediately followed by a new line, and add a dot in between.
Will match:
Hello\n
Will not match:
Hello \n
or
Hello.\n

Make regex more specific - only select "VB", not variations ("VB%")

Consider the following string and regex:
$string= "just/RB convinced/VBN closing/VBG 10dma/NN need/VBN see/VB";
echo preg_replace("/(\w+)\/(JJ|RB|VB)/", "not$1/$2", $string);
I want to concatenate "not" to every word ending in /JJ, /RB or /VB. However, the regex also captures variations on /VB: /VBN and /VBG. The output is
notjust/RB notconvinced/VBN notclosing/VBG 10dma/NN notneed/VBN notsee/VB
However, the expected output is:
notjust/RB convinced/VBN closing/VBG 10dma/NN need/VBN notsee/VB
How can I stop the regex from grabbing these variations?

Use a \b word boundary:
$string= "just/RB convinced/VBN closing/VBG 10dma/NN need/VBN see/VB";
echo preg_replace("/(\w+)\/(JJ|RB|VB)\b/", "not$1/$2", $string);
\b matches only between a word character (letter, digit or underscore) and either a non-word character or the start/end of the string, so VB followed by N or G no longer matches.
