PHP Regex Word Boundary exclude underscore _ - php

I'm using the regex word boundary \b to match foo in the following $sentence, but the result is not what I need. The underscore is the problem: I want an underscore to count as a word boundary, just like a hyphen or a space:
$sentence = "foo_foo_foo foo-foo_foo";
X X X YES X X
Expected:
$sentence = "foo_foo_foo foo-foo_foo";
YES YES YES YES YES YES
My code:
preg_match("/\bfoo\b/i", $sentence);

You would have to create DIY boundaries.
(?:\b|_\K)foo(?=\b|_)
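As a rough sketch of how these DIY boundaries behave on the question's sentence (the variable names are illustrative, not part of the answer): \K drops the consumed underscore from the reported match, and the lookahead leaves the following underscore available for the next match.

```php
<?php
$sentence = "foo_foo_foo foo-foo_foo";

// (?:\b|_\K) : start at a normal word boundary, or right after an underscore
//              (\K discards the underscore from the reported match).
// (?=\b|_)   : require a word boundary or an underscore next, without consuming it.
$pattern = '/(?:\b|_\K)foo(?=\b|_)/i';

$count = preg_match_all($pattern, $sentence, $matches, PREG_OFFSET_CAPTURE);

echo $count, PHP_EOL; // 6
foreach ($matches[0] as [$text, $offset]) {
    // Each entry is [matched text, byte offset in $sentence].
    echo $offset, ' => ', $text, PHP_EOL;
}
```

The reported offsets are those of each foo itself, not of the underscore before it.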

Does this do what you want?
preg_match_all("/foo/i", $sentence, $matches);
var_dump($matches);

You can subtract _ from the \w and use unambiguous word boundaries:
/(?<![^\W_])foo(?![^\W_])/i
See this regex demo. Note that, because foo starts and ends with a word character, \bfoo is equivalent to (?<!\w)foo and foo\b is equivalent to foo(?!\w). Since \w is the same as [^\W], subtracting _ from it gives [^\W_].
In PHP, you can use preg_match_all to find all occurrences:
preg_match_all("/(?<![^\W_])foo(?![^\W_])/i", $sentence)
To replace / remove all occurrences, you may use preg_replace:
preg_replace("/(?<![^\W_])foo(?![^\W_])/i", "YES", $sentence)
See the PHP demo online:
$sentence = "foo_foo_foo foo-foo_foo";
if (preg_match_all("/(?<![^\W_])foo(?![^\W_])/i", $sentence, $matches)) {
print_r($matches[0]);
}
// => Array( [0] => foo [1] => foo [2] => foo [3] => foo [4] => foo [5] => foo)
echo PHP_EOL . preg_replace("/(?<![^\W_])foo(?![^\W_])/i", "YES", $sentence);
// => YES_YES_YES YES-YES_YES
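If the word to find is not fixed, the same lookarounds can be wrapped in a small helper. This is only a sketch: the function name wholeWordPattern is hypothetical, and it assumes the word should be treated as literal text, hence the preg_quote call.

```php
<?php
// Hypothetical helper: builds the underscore-aware pattern for any literal word.
function wholeWordPattern(string $word): string
{
    // preg_quote escapes regex metacharacters and the "/" delimiter.
    return '/(?<![^\W_])' . preg_quote($word, '/') . '(?![^\W_])/i';
}

$sentence = "foo_foo_foo foo-foo_foo";

// Underscores, hyphens and spaces all act as boundaries.
echo preg_match_all(wholeWordPattern('foo'), $sentence), PHP_EOL; // 6

// Letters and digits next to the word still block a match.
echo preg_match_all(wholeWordPattern('foo'), 'food foobar 1foo'), PHP_EOL; // 0
```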

Related

PHP regexp how get all matches in preg_match

I have string
$s = 'Sections: B3; C2; D4';
and regexp
preg_match('/Sections(?:[:;][\s]([BCDE][\d]+))+/ui', $s, $m);
Result is
Array
(
[0] => Sections: B3; C2; D4
[1] => D4
)
How can I get an array with all the sections: B3, C2, D4?
I can't use preg_match_all with '/[BCDE][\d]+/ui' because the codes must only be matched when they follow the word Sections:.
The number of elements (B3, C2, ...) can be anything.
You may use
'~(?:\G(?!^);|Sections:)\s*\K[BCDE]\d+~i'
See the regex demo
Details
(?:\G(?!^);|Sections:) - either the end of the previous match and a ; (\G(?!^);) or (|) a Sections: substring
\s* - 0 or more whitespace chars
\K - a match reset operator that discards the text matched so far from the returned match
[BCDE] - a char from the character set (due to i modifier, case insensitive)
\d+ - 1 or more digits.
See the PHP demo:
$s = "Sections: B3; C2; D4";
if (preg_match_all('~(?:\G(?!^);|Sections:)\s*\K[BCDE]\d+~i', $s, $m)) {
print_r($m[0]);
}
Output:
Array
(
[0] => B3
[1] => C2
[2] => D4
)
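To see what the \G(?!^) anchor buys, here is a small sketch on a made-up input (not from the question) in which codes also appear before the Sections: list and after it ends. Only the codes chained directly to Sections: by semicolons should be returned.

```php
<?php
// Illustrative input: C9 precedes the list and E7 is not joined to it by ";".
$s = 'Note C9; Sections: B3; C2; D4 and E7';

// \G(?!^); continues only from the end of the previous match, never from the
// start of the string, so every code must chain back to "Sections:".
$pattern = '~(?:\G(?!^);|Sections:)\s*\K[BCDE]\d+~i';
preg_match_all($pattern, $s, $m);

print_r($m[0]); // B3, C2, D4
```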
You don't need regex; explode will do fine.
Remove "Sections: ", then explode the rest of the string.
$s = 'Sections: B3; C2; D4';
$s = str_replace('Sections: ', '', $s);
$arr = explode("; ", $s);
var_dump($arr);
https://3v4l.org/PcrNK

How to preg_split without losing a character?

I have a string like this
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/;\w/", $string);
print_r($new);
I am trying to split the string only when there is no white-space between the words and ";". But when I do this, I lose the H from Hey. It's probably because the split happens through the recognition of ;H. Could someone tell me how to prevent this?
My output:
$array = [
0 => [
0 => 'Hello; how are you ',
1 => 0,
],
1 => [
0 => 'ey, I am fine',
1 => 21,
],
]
You might use a word boundary \b:
\b;\b
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/\b;\b/", $string);
print_r($new);
Demo
Or a negative lookahead and negative lookbehind
(?<! );(?! )
Demo
Lookarounds cost more steps. In terms of pattern efficiency, a word boundary is better, and it is still a zero-length assertion, so no characters are consumed.
In well-formed English, you won't ever have to check for a space before a semi-colon, so only 1 word boundary seems sufficient (I don't know if malformed English is possible because it is not represented in your sample string).
If you want to acquire the offset value, preg_split() has a flag for that.
Code: (Demo)
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/;\b/", $string, -1, PREG_SPLIT_OFFSET_CAPTURE);
var_export($new);
Output:
array (
0 =>
array (
0 => 'Hello; how are you',
1 => 0,
),
1 =>
array (
0 => 'Hey, I am fine',
1 => 19,
),
)
Split on the regex ;(?=\w) and you will not lose the H.
Your regex consumes the \w as part of the delimiter, which you don't want. Therefore, do this:
$new = preg_split("/;(?=\w)/", $string);
Plain parentheses define a capture group, but (?=...) is a lookahead: it asserts that a word character follows without consuming it.
Check it out here https://3v4l.org/Q77LZ
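A minimal side-by-side sketch of the difference, using the question's string (the variable names $consumed and $kept are just for illustration):

```php
<?php
$string = "Hello; how are you;Hey, I am fine";

// ";\w" consumes the word character, so it is removed along with the delimiter.
$consumed = preg_split('/;\w/', $string);

// ";(?=\w)" only peeks at the word character, so it stays in the next piece.
$kept = preg_split('/;(?=\w)/', $string);

var_export($consumed); // 'Hello; how are you', 'ey, I am fine'
echo PHP_EOL;
var_export($kept);     // 'Hello; how are you', 'Hey, I am fine'
```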

Not able to match regex

I have a string like "5-2,5-12,15-27,5-22,50-3,5-100"
I need a regular expression that matches all occurrences like the ones below:
5-2
5-12
5-22
5-100
What is the correct regex to match all of them?
Use the regex below:
(?<!\d)5-\d{1,}
DEMO
I'm not sure I understand your needs, but how about:
$str = "5-2,5-12,15-27,5-22,50-3,5-100";
preg_match_all('/\b5-\d+/', $str, $matches);
print_r($matches);
or
preg_match_all('/\b\d-\d+/', $str, $matches);
Output:
Array
(
[0] => Array
(
[0] => 5-2
[1] => 5-12
[2] => 5-22
[3] => 5-100
)
)
How about:
Online Demo
/(?<!\d)\d\-\d{1,3}/g
If I understand correctly, the first part of the pattern is a single digit (\d), so a lookbehind (?<!\d) is needed to exclude longer numbers. It is followed by a - and then a number of up to 3 digits, \d{1,3}. If you need more digits, drop the 3 and use \d{1,} instead. Note that the g flag comes from the online tester; PHP has no g modifier, so use preg_match_all to get every match.
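The effect of the lookbehind is easiest to see by running the pattern with and without it. This sketch uses the question's string; the variable names are illustrative.

```php
<?php
$str = "5-2,5-12,15-27,5-22,50-3,5-100";

// (?<!\d) rejects a 5 that is the tail of a longer number such as 15.
preg_match_all('/(?<!\d)5-\d+/', $str, $withLookbehind);

// Without the lookbehind, the "5-27" inside "15-27" also matches.
preg_match_all('/5-\d+/', $str, $without);

print_r($withLookbehind[0]); // 5-2, 5-12, 5-22, 5-100
print_r($without[0]);        // 5-2, 5-12, 5-27, 5-22, 5-100
```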

Split sentence into words (with special word list) [duplicate]

This question already has answers here:
Split sentence into words
(3 answers)
Closed 9 years ago.
I have a sentence:
$text = "word word, dr. word: a.sh. word a.k word?!..";
The special words are "dr.", "a.sh" and "a.k".
This code:
$text = "word word, dr. word: a.sh. word a.k word?!..";
$split = preg_split("/[^\w]([\s]+[^\w]|$)/", $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($split);
gives me this:
Array (
[0] => word
[1] => word
[2] => dr
[3] => word
[4] => a.sh
[5] => word
[6] => a.k
[7] => word )
and I need:
Array (
[0] => word
[1] => word
[2] => dr. #<----- the period must stay here because "dr." is a special word
[3] => word
[4] => a.sh. #<----- the period must stay here because "a.sh" is a special word
[5] => word
[6] => a.k
[7] => word)
I think you're going about this backwards. Instead of trying to define a regular expression that is not a word - define what is a word, and capture all character sequences that match that.
$special_words = array("dr.", "a.sh.", "a.k");
array_walk($special_words, function(&$item, $key){ $item= preg_quote($item, '~');});
$regex = '~(?<!\w)(' . implode('|', $special_words) . '|\w+)(?!\w)~';
$str = 'word word, dr. word: a.sh. word a.k word?!..';
preg_match_all($regex, $str, $matches);
var_dump($matches[0]);
The keys here are an array of special words, the array_walk, and the regular expression.
array_walk
This line, right after your array definition, walks through each of your special words and escapes all of the REGEX special characters (like . and ?), including the delimiter we're going to use later. That way, you can define whatever words you like and you don't have to worry about how it will affect the regular expression.
Regular Expression.
The regex is actually pretty simple. Implode the special words using a | as glue, then add another pipe and your standard word definition (I chose \w+ because it makes the most sense to me). Surround that alternation with parentheses to group it; I added a lookbehind and a lookahead to ensure we aren't stealing from the middle of a word. Because regex works left to right, the a in a.sh. won't be split off into its own word, since the a.sh. special word will match it first. The exception is input such as a.sh.e: there the lookahead rejects the special word, and a, sh and e match as three separate words.
Check it out.
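The same idea can be packaged as a reusable function. This is a sketch under a few assumptions: the name splitWords is hypothetical, arrow functions require PHP 7.4 or later, and the length-descending sort is an addition that is not in the answer above. The sort matters when one special word is a prefix of another (for example "a.k" and "a.k."), because alternation takes the first alternative that succeeds.

```php
<?php
// Hypothetical wrapper around the answer's approach; requires PHP 7.4+.
function splitWords(string $text, array $specialWords): array
{
    // Assumption: try longer special words first, so "a.k." wins over "a.k".
    usort($specialWords, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape regex metacharacters and the "~" delimiter in every special word.
    $alternatives = array_map(fn($w) => preg_quote($w, '~'), $specialWords);
    $alternatives[] = '\w+'; // ordinary words are tried last

    $regex = '~(?<!\w)(?:' . implode('|', $alternatives) . ')(?!\w)~';
    preg_match_all($regex, $text, $matches);

    return $matches[0];
}

$words = splitWords(
    'word word, dr. word: a.sh. word a.k word?!..',
    ['dr.', 'a.sh.', 'a.k']
);
print_r($words); // word, word, dr., word, a.sh., word, a.k, word
```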

PCRE regex for movie data

I have a string like this:
<14> south.park.s14e01.locdog.avi [190713856]
I need a PHP regexp to get an array like this:
array(14, 'south.park.s14e01.locdog.avi', 190713856)
Please help.
preg_match('/^<(\d+)> \s+ (\S+) \s+ \[(\d+)\]$/x', $input, $your_array);
Where your desired results are in $your_array starting at index 1.
$test = '<14> south.park.s14e01.locdog.avi [190713856]';
preg_match('/<(\d{2})>\s(.+)\s\[(\d{9})\]/',$test,$m);
print_r($m);//[1] => 14 [2] => south.park.s14e01.locdog.avi [3] => 190713856
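Both answers leave the full match at index 0 and return the numbers as strings. A possible wrapper that produces exactly the array asked for in the question might look like this; the function name parseEntry is hypothetical, and the pattern is the first answer's written without the x flag.

```php
<?php
// Hypothetical helper; returns null when the line does not have the expected shape.
function parseEntry(string $line): ?array
{
    if (!preg_match('/^<(\d+)>\s+(\S+)\s+\[(\d+)\]$/', $line, $m)) {
        return null;
    }

    // Cast the numeric captures so the result holds integers, not strings.
    return [(int) $m[1], $m[2], (int) $m[3]];
}

var_export(parseEntry('<14> south.park.s14e01.locdog.avi [190713856]'));
```

Unlike the \d{2} and \d{9} quantifiers in the second answer, \d+ does not assume a fixed number of digits.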
