Match a pattern by ignoring different brackets


I have a string and I would like to know the first position of a pattern. But it should only be found if it's not enclosed in brackets.
Example String: "This is a (first) test with the first hit"
I want to know the position of the second "first" => 32. To match it, the (first) must be ignored, because it's enclosed in brackets.
Unfortunately, I don't only have to ignore round brackets ( ); I also have to ignore square brackets [ ] and curly braces { }.
I tried this:
preg_match(
    '/^(.*?)(first)/',
    "This is a (first) test with the first hit",
    $matches
);
$result = strlen( $matches[1] ); // length of the text before the match = its position
It works fine, but the result is the position of the first match (11).
So I need to change the .*?.
I tried to replace it with .(?:\(.*?\))*? in the hope that all characters inside the brackets would be ignored. But this does not match the brackets.
And I can't use a negative lookbehind/lookahead like '/(?<!\()first(?!\))/', since I have three different bracket types, and each opening bracket has to match its own closing bracket.

You can match all three formats you don't want using a non-capturing group with an alternation, and use (*SKIP)(*FAIL) so that those matches are discarded. Then match first between word boundaries \b:
(?:\(first\)|\[first]|{first})(*SKIP)(*FAIL)|\bfirst\b
Example code
$strings = [
"This is a (first) test with the first hit",
"This is a (first] test with the first hit"
];
foreach ($strings as $str) {
    preg_match(
        '/(?:\(first\)|\[first]|{first})(*SKIP)(*FAIL)|\bfirst\b/',
        $str,
        $matches,
        PREG_OFFSET_CAPTURE
    );
    print_r($matches);
}
Output
Array
(
[0] => Array
(
[0] => first
[1] => 32
)
)
Array
(
[0] => Array
(
[0] => first
[1] => 11
)
)
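The pattern above only skips a bracket pair that wraps exactly the word first. If the word can also appear together with other text inside the brackets, such as (the first one), the same (*SKIP)(*FAIL) idea can skip any bracketed segment instead. A minimal sketch, assuming brackets are never nested; the helper name firstOutsideBrackets() is hypothetical:

```php
<?php
// Sketch only: assumes brackets are never nested.
// firstOutsideBrackets() is a hypothetical helper, not a PHP built-in.
function firstOutsideBrackets(string $word, string $subject): int
{
    $w = preg_quote($word, '/');
    // Left branch: any (...), [...] or {...} segment without inner brackets.
    // (*SKIP)(*FAIL) throws that match away and resumes after the segment.
    // Right branch: the word itself, as a whole word.
    $pattern = '/(?:\([^()]*\)|\[[^\[\]]*\]|\{[^{}]*\})(*SKIP)(*FAIL)|\b' . $w . '\b/';
    // With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset].
    return preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE) ? $m[0][1] : -1;
}

echo firstOutsideBrackets('first', 'This is a (first) test with the first hit'), "\n";
echo firstOutsideBrackets('first', 'This is a (the first one) test with the first hit'), "\n";
echo firstOutsideBrackets('first', 'Only [first] and {first} here'), "\n";
```

This prints 32, 40 and -1. Like the original pattern, a mismatched pair such as (first] is not skipped, so that case still returns 11.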

Related

How can I explode (split) a given string into text and separators?

How can I parse a string made of two kinds of blocks:
{ text }
text
I have a regex for each situation:
\{[^\}]+\}
.*
But I don't understand how to combine them into one regex. I thought of using the or ( | ) operator like this:
/(\{[^\}]+\}|.*)/
But it doesn't work. How can I solve it?
Concretely, if I have a string like:
"{this is first text} this is second text {this is third text}"
using preg_match_all I want to get something like:
Array
(
[0] => Array
(
[0] => {this is first text}
[1] => this is second text
[2] => {this is third text}
)
)
But I get this result:
Array
(
[0] => Array
(
[0] => {this is first text}
[1] => this is second text {this is third text}
[2] =>
)
)
Thanks very much for the help.

Try this one:
$str = "{this is first text} this is second text {} this is third text";
preg_match_all('/\s*(?:{})*\s*({.+?}|[^{}]+)/', $str, $matches);
print_r($matches[1]);
'/\s*(?:{})*\s*({.+?}|[^{}]+)/' skips whitespace and empty braces, then captures either a {...} piece (with at least one character inside, up to the first closing }) or a run of characters other than { and }. Note that a plain-text piece keeps its trailing whitespace.
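A minimal sketch wrapping that preg_match_all pattern in a hypothetical helper, splitBlocks(), that also trims the trailing whitespace from the plain-text pieces and drops pieces that were only whitespace (the fn arrow syntax assumes PHP 7.4 or later):

```php
<?php
// splitBlocks() is a hypothetical helper around the pattern above.
function splitBlocks(string $str): array
{
    // Group 1 is either a {...} block or a run of characters other than { and }.
    preg_match_all('/\s*(?:{})*\s*({.+?}|[^{}]+)/', $str, $matches);
    // The [^{}]+ branch keeps trailing spaces, so trim each piece and
    // drop any piece that was only whitespace.
    $pieces = array_map('trim', $matches[1]);
    return array_values(array_filter($pieces, fn($p) => $p !== ''));
}

print_r(splitBlocks('{this is first text} this is second text {this is third text}'));
```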

You can use preg_split with your first pattern wrapped in a capturing group:
$str = "{this is first text} this is second text {this is third text} {} more text here";
print_r( preg_split('~\s*(?:(\{[^{}]+})|\{})\s*~', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) );
Output:
Array
(
[0] => {this is first text}
[1] => this is second text
[2] => {this is third text}
[3] => more text here
)
Details
\s*(?:(\{[^{}]+})|\{})\s* - matches as follows:
\s* - matches zero or more whitespace characters (so they are removed from the result)
(?:(\{[^{}]+})|\{}) - a non-capturing group matching either of:
(\{[^{}]+}) - captures into Group 1 (so it is returned as part of the resulting array) a {, one or more chars other than { and }, and then }
| - or
\{} - just matches (and thus removes in the end) a {} substring
\s* - matches zero or more whitespace characters (so they are removed from the result)
-1 is passed as the $limit argument so that all pieces resulting from the split are returned
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY make preg_split return the captured substrings and omit any empty items.
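To see what PREG_SPLIT_NO_EMPTY contributes, here is a small sketch that runs the same split pattern on the question's original string, once with both flags and once with PREG_SPLIT_DELIM_CAPTURE alone:

```php
<?php
$str = "{this is first text} this is second text {this is third text}";
$pattern = '~\s*(?:(\{[^{}]+})|\{})\s*~';

// With both flags: captured {...} blocks are kept, empty pieces are dropped.
$clean = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Without PREG_SPLIT_NO_EMPTY: the empty strings before the first block and
// after the last block show up in the result.
$raw = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($clean);
print_r($raw);
```

Because the string starts and ends with a {...} block, $raw has an empty string at both ends, which PREG_SPLIT_NO_EMPTY removes.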

How to get only last bracket from multiple arrays? PHP

I have a few arrays and I need only what is in the last set of brackets. How can I do it?
My arrays always look similar but differ, for example:
Array
(
[0] => 3 BUILTIN\Users:(OI)(CI)(F)
)
Array
(
[0] => BUILTIN\Users:(OI)(CI)(R)
)
Array
(
[0] => 22 BUILTIN\Users:(OI)(CI)(R,W)
)
And I want to get results like:
(F)
(R)
(R,W)
Do I have to use substr, or something else?
Regards

You can do this simply with preg_filter
$arr = array(
'3 BUILTIN\Users:(OI)(CI)(F)',
'BUILTIN\Users:(OI)(CI)(R)',
'22 BUILTIN\Users:(OI)(CI)(R,W)'
);
print_r(preg_filter('/^.+(\([^)]+\))$/', '\1', $arr));
Output
Array
(
[0] => (F)
[1] => (R)
[2] => (R,W)
)
The Regex
^ - match the start of the string
.+ - match any character, one or more times (greedy)
(...) - first capture group:
\( - a literal (
[^)]+ - one or more characters that are not )
\) - a literal )
$ - match the end of the string.
So this replaces each array item with \1, the first capture group, which drops everything outside the group. Because .+ is greedy, it consumes as much as it can and only gives back enough for the rest of the pattern to match, so the group starts at the last ( and runs to the ) at the end of the string. That is exactly the last set of parentheses we want.
This should also remove anything from the array that does not match that pattern. For example:
$arr = array(
'3 BUILTIN\Users:(OI)(CI)(F)',
'BUILTIN\Users:(OI)(CI)(R)',
'22 BUILTIN\Users:(OI)(CI)(R,W)',
'foo' //--- foo will not appear in the results, because it does not end with (...)
);
Hope it helps!
preg_filter() is identical to preg_replace() except it only returns the (possibly transformed) subjects where there was a match. For details about how this function works, read the preg_replace() documentation.
https://www.php.net/manual/en/function.preg-filter.php
PS: I used preg_filter() because it highlights the difference between preg_replace() and preg_filter() quoted above. You could do the same with just preg_replace() if you are sure every item will match.
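A short sketch of that difference, using the same pattern and the example array including 'foo': preg_filter() leaves out the item that does not match, while preg_replace() returns it unchanged.

```php
<?php
$arr = [
    '3 BUILTIN\Users:(OI)(CI)(F)',
    'BUILTIN\Users:(OI)(CI)(R)',
    '22 BUILTIN\Users:(OI)(CI)(R,W)',
    'foo', // does not end with (...)
];
$pattern = '/^.+(\([^)]+\))$/';

// preg_filter() keeps only the items that matched.
$filtered = preg_filter($pattern, '\1', $arr);

// preg_replace() returns every item; 'foo' comes back unchanged.
$replaced = preg_replace($pattern, '\1', $arr);

print_r($filtered);
print_r($replaced);
```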

Here you go:
$arr = array(
'3 BUILTIN\Users:(OI)(CI)(F)',
'BUILTIN\Users:(OI)(CI)(R)',
'22 BUILTIN\Users:(OI)(CI)(R,W)'
);
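$newArr = array();
foreach ($arr as $v) {
    // Take the part after ':' and split it on '('; array_filter() drops the empty first piece
    $parts = array_filter(explode('(', explode(':', $v)[1]));
    // end() returns the last piece, e.g. "R,W)"; put the '(' back in front
    $newArr[] = '(' . end($parts);
}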
print_r($newArr);
Result:
Array
(
[0] => (F)
[1] => (R)
[2] => (R,W)
)
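A shorter alternative, assuming the last set of parentheses always sits at the end of each string: strrchr() returns everything from the last occurrence of a character onward (the fn arrow syntax assumes PHP 7.4 or later).

```php
<?php
$arr = [
    '3 BUILTIN\Users:(OI)(CI)(F)',
    'BUILTIN\Users:(OI)(CI)(R)',
    '22 BUILTIN\Users:(OI)(CI)(R,W)',
];

// strrchr() returns the rest of the string starting at the LAST '(',
// or false if the string contains no '(' at all.
$newArr = array_map(fn($v) => strrchr($v, '('), $arr);

print_r($newArr);
```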

Exclude character from being returned in array

I have the following regex function:
function getMatches($string_content) {
$matches = array();
preg_match_all('/#([A-Za-z0-9_]+)/', $string_content, $matches);
return $matches;
}
Right now, it returns an array like this:
Array (
[0] => Array (
[0] => #test
[1] => #test2
)
[1] => Array (
[0] => test
[1] => test2
)
)
How can I make it only return the matches without the # symbol?

Return $matches[1] instead of $matches.
That will give you the first capture group instead of all matches.

With this small tweak:
preg_match_all('~#\K\w+~', $string_content, $matches);
Explanation
In your original regex, the parentheses around ([A-Za-z0-9_]+) create a capture group. This is why the array contains a second element with index #1: this element contains the Group 1 captures.
\w is equivalent to [A-Za-z0-9_] (for plain ASCII text without the u modifier)
The \K tells the engine to drop what was matched so far from the final match it returns. It is more efficient than using a lookbehind (?<=#)
The ~ delimiter is just a small aesthetic tweak; you can use any delimiter you like around your regex pattern.

Just use \K in your regex to drop the # from the final result; you don't need to capture anything:
preg_match_all('~#\K[A-Za-z0-9_]+~', $string_content, $matches);
OR
Use a lookbehind:
preg_match_all('~(?<=#)[A-Za-z0-9_]+~', $string_content, $matches);
Explanation:
(?<=#) - asserts that the current position is immediately preceded by a # symbol, without including it in the match.
[A-Za-z0-9_]+ - matches one or more word characters.
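A small sketch comparing the approaches from these answers on the same sample input: reading capture group 1, using \K, and using a lookbehind all yield the tags without the #. The sample string is taken from the $matches[1] answer.

```php
<?php
$string_content = 'foo bar #test baz #test2 quz';

// 1) Keep the capture group and read $matches[1]
preg_match_all('/#([A-Za-z0-9_]+)/', $string_content, $m1);
$viaGroup = $m1[1];

// 2) \K drops the "#" matched before it from the reported match
preg_match_all('~#\K[A-Za-z0-9_]+~', $string_content, $m2);
$viaK = $m2[0];

// 3) A lookbehind checks for "#" without consuming it
preg_match_all('~(?<=#)[A-Za-z0-9_]+~', $string_content, $m3);
$viaLookbehind = $m3[0];

print_r($viaGroup);
print_r($viaK);
print_r($viaLookbehind);
```

With \K or a lookbehind there is no capture group, so $m2 and $m3 only contain index 0.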

You don't need to change your regular expression; simply refer to capturing group #1, i.e. $matches[1], which holds the text captured by the group without the #.
Your code would look like this:
function getMatches($string_content) {
preg_match_all('/#([A-Za-z0-9_]+)/', $string_content, $matches);
return $matches[1];
}
print_r(getMatches('foo bar #test baz #test2 quz'));
Output
Array
(
[0] => test
[1] => test2
)

PHP preg_match_all $matches output contains 3 rows

Here is my test code:
$test = '#12345 abc #12 #abd engng#geneng';
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
print_r($matches);
And the output $matches:
Array ( [0] => Array ( [0] => #12345 [1] => #12 [2] => #abd ) [1] => Array ( [0] => [1] => [2] => ) [2] => Array ( [0] => 12345 [1] => 12 [2] => abd ) )
My question is why does it have an empty row?
[1] => Array ( [0] => [1] => [2] => )
If I get rid of (^|\s) in the regex, the second row disappears. However, I would then not be able to prevent matching #geneng.
Any answer will be appreciated.

The extra row comes from the first group (^|\s): it matches the start of the string or the whitespace before each #, and because it is a capturing group, that (empty or blank) text is stored in $matches[1]. You can avoid this by using a lookaround. In this case, a positive lookbehind works:
preg_match_all('/(?<=^|\s)#([^# ]+)/', $test, $matches);
This matches # and the text after it only if the # is at the beginning of the string or preceded by whitespace. It's important to note that lookarounds do not actually consume characters. They just assert that the current position is followed or preceded by something matching the given pattern.
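A minimal sketch running the lookbehind version on the question's test string; since the lookbehind captures nothing, $matches only has the full matches and group 1:

```php
<?php
$test = '#12345 abc #12 #abd engng#geneng';

// (?<=^|\s) checks what precedes "#" but does not consume or capture it.
preg_match_all('/(?<=^|\s)#([^# ]+)/', $test, $matches);

print_r($matches);
```

The full matches are #12345, #12 and #abd (without the preceding space), and #geneng is still excluded.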

It's because of the capturing group (^|\s):
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
Its match is stored as capture group 1, so to avoid that you can simply use non-capturing parentheses (?:...):
preg_match_all('/(?:^|\s)#([^# ]+)/', $test, $matches);
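A quick sketch of the non-capturing version on the question's test string. It removes the empty row, but note that the full matches in $matches[0] still include the whitespace before the #, because a non-capturing group still consumes it:

```php
<?php
$test = '#12345 abc #12 #abd engng#geneng';

// (?:^|\s) groups the alternation without creating a capture group.
preg_match_all('/(?:^|\s)#([^# ]+)/', $test, $matches);

print_r($matches);
```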

preg_match_all uses by default the PREG_PATTERN_ORDER flag. This means that you will obtain:
$matches[0] -> all substrings that match the whole pattern
$matches[1] -> all captures of group 1
$matches[2] -> all captures of group 2
etc.
You can change this behavior using the PREG_SET_ORDER flag:
$matches[0] -> array with the whole pattern and the capture groups for the first result
$matches[1] -> same for the second result
$matches[2] -> etc.
In your code (PREG_PATTERN_ORDER by default), $matches[1] contains only empty or blank items because it holds the captures of group 1, (^|\s).
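A small sketch of the same pattern with PREG_SET_ORDER, so the layout difference is visible: each match gets its own array holding the full match, group 1 and group 2.

```php
<?php
$test = '#12345 abc #12 #abd engng#geneng';

// PREG_SET_ORDER: one array per match: [full match, group 1, group 2].
preg_match_all('/(^|\s)#([^# ]+)/', $test, $sets, PREG_SET_ORDER);

print_r($sets);
```

For the first match group 1 is the empty string (it matched ^); for the other two it is the single space before the #.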

There are two sets of capturing parentheses, which is why you get the extra row: PHP returns one array per capturing group. Removing one of them (or making it non-capturing) removes one array.
FYI: in this case you cannot use [^|\s] instead of (^|\s), because a ^ at the start of a character class negates it, so [^|\s] matches any single character that is neither | nor whitespace.
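A quick sketch checking that on the question's test string: with the negated class [^|\s], the only hit is exactly the #geneng that the question wanted to exclude.

```php
<?php
$test = '#12345 abc #12 #abd engng#geneng';

// [^|\s] is a negated character class: one character that is not | and not whitespace.
preg_match_all('/[^|\s]#([^# ]+)/', $test, $matches);

print_r($matches);
```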

Regular expression repeat inside a pattern

I have the following text, and I would like to preg_match_all what is within { and } if it contains only a-zA-Z0-9 and :
some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf
I am trying to match {SOMETHING21}, {SOMETHI32NG:MORE} and {TEXT:GET:2}; there can be several colons within a tag.
What I currently have is:
preg_match_all('/\{([a-zA-Z0-9\-]+)(\:([a-zA-Z0-9\-]+))*\}/', $from, $matches, PREG_SET_ORDER);
It works as expected for {SOMETHING21} and {SOMETHI32NG:MORE}, but for {TEXT:GET:2} it only captures TEXT and 2.
So it only matches the first and last word within the tag, and leaves the middle ones out of the $matches array. Is this even possible or should I just match them and then explode on : ?
-- edit --
Well, the question isn't whether I can get the tags; the question is whether I can get them grouped without having to explode the results again. Even though my current regex finds all the results, the subpattern does not return all of its matches in $matches.
I hope the following clears it up a bit more:
\{ // the match has to start with {
([a-zA-Z0-9\-]+) // after the { the match needs to have alphanum consisting out of 1 or more characters
(
\: // if we have : it should be followed by alphanum consisting out of 1 or more characters
([a-zA-Z0-9\-]+) // <---- !! this is what it is about !! even though this subexpression is between brackets, it is not put into $matches if more than one of these is found
)* // there could be none or more of the previous subexpression
\} // the match has to end with }

You can't get all the values matched by a repeated capturing group; you only get the last one.
So you have to match the pattern:
preg_match_all('/{([a-z\d-]+(?::[a-z\d-]+)*)}/i', $from, $matches);
and then split each element of $matches[1] on :.
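A minimal sketch of those two steps on the question's text: capture the whole colon-separated list inside each {...}, then explode each captured list on : (the fn arrow syntax assumes PHP 7.4 or later).

```php
<?php
$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";

// Step 1: capture the whole colon-separated list inside each {...}
preg_match_all('/{([a-z\d-]+(?::[a-z\d-]+)*)}/i', $from, $matches);

// Step 2: split each captured list on ':'
$parts = array_map(fn($tag) => explode(':', $tag), $matches[1]);

print_r($parts);
```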

I used non-capture groupings to eliminate the inner groups, and just capture the outer complete colon-separated list.
$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";
preg_match_all('/\{((?:[a-zA-Z0-9\-]+)(?:\:(?:[a-zA-Z0-9\-]+))*)\}/', $from, $matches, PREG_SET_ORDER);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => SOMETHING21
)
[1] => Array
(
[0] => {SOMETHI32NG:MORE}
[1] => SOMETHI32NG:MORE
)
[2] => Array
(
[0] => {TEXT:GET:2}
[1] => TEXT:GET:2
)
)

Maybe I didn't understand the requirement, but...
preg_match_all('/{[A-Za-z0-9:-]+}/', $from, $matches, PREG_PATTERN_ORDER);
results in:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => {SOMETHI32NG:MORE}
[2] => {TEXT:GET:2}
)
)
