RegEX not in brackets - php

I need to split text by pipe that is not in brackets. Here is the sample text
I {need|want|{ask|prefer}} you to {help {Jason|Maria|Santa|{Lucia|Raul}'s father}|go to school}
I have found this /\|(?![^{]*})/g
here: regex, extract string NOT between two brackets
Now, when I want to split this part of the string by pipe:
help {Jason|Maria|Santa|{Lucia|Raul}'s father}|go to school
It also selects the pipes between Jason, Maria and Santa, because there is an opening bracket after them. How can I change the regex to match a pipe only if it is not inside any of the brackets?
test strings:
help {Jason|Maria|Santa|{Lucia|Raul}'s father}|go to school
should return
help {Jason|Maria|Santa|{Lucia|Raul}'s father}
go to school
.
Jason|Maria|Santa|{Lucia|Raul}'s father
should return
Jason
Maria
Santa
{Lucia|Raul}'s father

You may use a SKIP-FAIL regex:
'~(\{(?:[^{}]++|(?1))*})(*SKIP)(*F)|\|~'
See the regex demo
Details
(\{(?:[^{}]++|(?1))*})(*SKIP)(*F) - match a substring that is between balanced curly braces and skip this match
(\{(?:[^{}]++|(?1))*}) - Capturing group 1 matching {, then 0+ repetitions of 1+ chars other than { and } or the whole Group 1 pattern is recursed ((?1) is a regex subroutine), and then } (balanced curly braces substring)
(*SKIP)(*F) - the PCRE verbs that make the regex engine fail the match and skip the matched text to proceed matching from the match end
| - or
\| - match a literal pipe to split with.
PHP demo:
$re = '~(\{(?:[^{}]++|(?1))*})(*SKIP)(*F)|\|~';
$str = "Jason|Maria|Santa|{Lucia|Raul}'s father";
print_r( preg_split($re, $str) );
Output:
Array
(
[0] => Jason
[1] => Maria
[2] => Santa
[3] => {Lucia|Raul}'s father
)
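As a quick check, the same pattern can be run against the other test string from the question and against the full sample sentence. This is only a sketch; the variable names ($outer, $sentence, $outerParts, $sentenceParts) are made up for illustration.

```php
<?php
// Same SKIP-FAIL pattern as in the answer above.
$re = '~(\{(?:[^{}]++|(?1))*})(*SKIP)(*F)|\|~';

// First test string from the question: exactly one pipe sits outside all braces.
$outer = "help {Jason|Maria|Santa|{Lucia|Raul}'s father}|go to school";
$outerParts = preg_split($re, $outer);
print_r($outerParts);

// Full sample sentence: every pipe is inside some brace pair,
// so preg_split() returns the whole string as a single element.
$sentence = "I {need|want|{ask|prefer}} you to {help {Jason|Maria|Santa|{Lucia|Raul}'s father}|go to school}";
$sentenceParts = preg_split($re, $sentence);
print_r($sentenceParts);
```

To expand a nested template level by level, one would strip the outer braces of a group and split its contents with the same pattern again.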

Related

Regex for find value between curly braces which have pipe separator

$str = ({max_w} * {max_h} * {key|value}) / {key_1|value}
I have the above formula and want to match only the values in curly braces that contain a pipe separator. The problem is that my pattern also returns the values without a pipe separator. I am new to regex, so I don't have much idea about this. I tried the one below:
preg_match_all("^\{(|.*?|)\}^",$str, PREG_PATTERN_ORDER);
It gives below output
Array
(
[0] => key|value
[1] => max_w
[2] => max_h
[3] => key_1|value
)
Expected output
Array
(
[0] => key|value
[1] => key_1|value
)
Not sure about PHP. Here's the general regex that will do this.
{([^{}]*\|[^{}]*)}
Here is the demo.
You can use
(?<={)[^}]*\|[^}]*(?=})
For the given string the two matches are shown by the pointy characters:
({max_w} * {max_h} * {key|value}) / {key_1|value}
^^^^^^^^^ ^^^^^^^^^^^
Demo
(?<={) is a positive lookbehind. Arguably, the positive lookahead (?=}) is not needed if it is known that all braces appear in matching, non-overlapping pairs.
The pattern \{(|.*?|)\} has 2 alternations | that can be omitted as the alternatives on the left and right of it are not really useful.
That leaves \{(.*?)} where the . can match any char including a pipe char, and therefore does not make sure that it is matched in between.
You can use a pattern that does not cross a curly brace or a pipe char, so that it matches exactly one pipe in between.
{\K[^{}|]*\|[^{}|]*(?=})
{ Match opening {
\K Forget what is matched so far
[^{}|]* Match any char except the listed
\| Match a | char
[^{}|]* Match any char except the listed
(?=}) Assert a closing } to the right
Regex demo | PHP demo
$str = "({max_w} * {max_h} * {key|value}) / {key_1|value}";
$pattern = "/{\K[^{}|]*\|[^{}|]*(?=})/";
preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => key|value
[1] => key_1|value
)
Or using a capture group:
{([^{}|]*\|[^{}|]*)}
Regex demo
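The two capture-group patterns in this thread differ only in whether the pipe is excluded from the character classes. A small sketch of the difference, assuming an extra {a|b|c} group that is not in the original formula and was added here purely for illustration:

```php
<?php
// The original formula plus a hypothetical {a|b|c} group with two pipes.
$str = '({max_w} * {max_h} * {key|value}) / {key_1|value} / {a|b|c}';

// Variant 1: [^{}]* may itself contain pipes, so any number of pipes (1+) is accepted.
preg_match_all('/\{([^{}]*\|[^{}]*)\}/', $str, $anyPipes);

// Variant 2: the pipe is excluded from both classes,
// so only groups with exactly one pipe match.
preg_match_all('/\{([^{}|]*\|[^{}|]*)\}/', $str, $onePipe);

print_r($anyPipes[1]);
print_r($onePipe[1]);
```

For the formula in the question both variants return the same two values; which one fits depends on whether multi-pipe groups should count.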

Remove text outside of [] {} () bracket

I have a string, from which I want to keep text inside a pair of brackets and remove everything outside of the brackets:
Hello [123] {45} world (67)
Hello There (8) [9] {0}
Desired output:
[123] {45} (67) (8) [9] {0}
Code tried but fails:
$re = '/[^()]*+(\((?:[^()]++|(?1))*\))[^()]*+/';
$text = preg_replace($re, '$1', $text);
If every opening bracket in the string is paired with a closing bracket and there are no nested parts, you can match all the bracket pairs that you want to keep, and then match all other characters (everything except the brackets), which you want to remove.
(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|[^][(){}]+
Explanation
(?: Non-capture group
\[[^][]*] Match from [...]
| Or
\([^()]*\) Match from (...)
| Or
{[^{}]*} Match from {...}
) Close non capture group
(*SKIP)(*F)| consume characters that you want to avoid, and that must not be a part of the match result
[^][(){}]+ Match 1+ times any char other than 1 of the listed
Regex demo | Php demo
Example code
$re = '/(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|[^][(){}]+/m';
$str = 'Hello [123] {45} world (67)
Hello There (8) [9] {0}';
$result = preg_replace($re, '', $str);
echo $result;
Output
[123]{45}(67)(8)[9]{0}
If you want to remove all other values:
(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|.
Regex demo
Looks like you wanted to target nested stuff as well. There are already questions about how to match balanced parentheses. Adjust one of those patterns to fit your needs, e.g. something like
$pattern = '/\((?:[^)(]*(?R)?)*+\)|\{(?:[^}{]*+(?R)?)*\}|\[(?:[^][]*+(?R)?)*\]/';
You can try this on Regex101. Extract those with preg_match_all and implode the matches.
if(preg_match_all($pattern, $str, $out) > 0)
echo implode(' ', $out[0]);
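Put together as a self-contained script, the extract-and-implode idea might look like the sketch below. The trailing "(a (nested) pair)" part of the input is an assumption added here to exercise the recursion; $kept is a made-up variable name.

```php
<?php
// Recursive pattern from the answer: each alternative matches one balanced bracket pair.
$pattern = '/\((?:[^)(]*(?R)?)*+\)|\{(?:[^}{]*+(?R)?)*\}|\[(?:[^][]*+(?R)?)*\]/';

// The question's sample text plus a hypothetical nested group at the end.
$str = "Hello [123] {45} world (67)\nHello There (8) [9] {0} and (a (nested) pair)";

$kept = '';
if (preg_match_all($pattern, $str, $out) > 0) {
    // $out[0] holds the full matches, i.e. the top-level bracketed substrings.
    $kept = implode(' ', $out[0]);
}
echo $kept, "\n";
```

Joining with a single space reproduces the spacing of the desired output, which the replace-based approaches do not.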
If you need to match the stuff outside, even with this pattern you can use (*SKIP)(*F), which @Thefourthbird also used in his elaborate answer! For skipping the bracketed parts, see this other demo.
If the brackets are not nested, the following should suffice:
[^[{(\]})]+(?=[[{(]|$)
Demo.
Breakdown:
[^[{(\]})]+ # Match one or more characters except for opening/closing bracket chars.
(?=[[{(]|$) # A positive Lookahead to ensure that the match is either followed by
# an opening bracket char or is at the end of the string.
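A minimal PHP sketch of the lookahead-based pattern above, applied with preg_replace() to the sample text from the question ($stripped is a made-up variable name):

```php
<?php
$str = "Hello [123] {45} world (67)\nHello There (8) [9] {0}";

// Remove every run of non-bracket characters that is followed by an
// opening bracket or by the end of the string. Text inside a bracket pair
// is followed by a closing bracket, so it is left alone.
$stripped = preg_replace('/[^[{(\]})]+(?=[[{(]|$)/', '', $str);

echo $stripped, "\n";
```

Because the negated class also matches spaces and newlines, the separators between the groups are removed as well, so the result has no spaces between the bracket pairs.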

regex expected value in a position depends on a random value in another position

I need regex to find all shortcode tag pairs that look like this [sc1-g-data]b[/sc1-g-data] but the number next to the sc can vary but they must match.
Something like \[sc(.*?)\-((.|\n)*?)\[\/sc(.*?)\- won't work, as it also matches mismatched tag pairs such as [sc1-g-data]b[/sc2-g-data], which I don't want.
So the expected number in the second tag depends on a random number in the first tag.
You may use a regex like:
\[(sc\d*-[^\]\[]*)\]([\s\S]*?)\[\/\1\]
See the regex demo
\[ - a [ char
(sc\d*-[^\]\[]*) - Capturing group 1: sc, 0+ digits, -, and then 0+ chars other than ] and [
\] - a ] char
([\s\S]*?) - Capturing group 2: any 0+ chars, as few as possible
\[\/ - a [/ string
\1 - the same text stored in Group 1
\] - a ] char
PHP demo:
$pattern = '~\[(sc\d*-[^][]*)](.*?)\[/\1]~s';
$string = '[sc1-g-data]a[/sc1-g-data] ';
if (preg_match($pattern, $string, $matches)) {
print_r($matches);
}
Mind the use of a single quoted string literal, if you use a double quoted one you will need to use \\1, not \1 as '\1' != "\1" in PHP.
Output:
Array
(
[0] => [sc1-g-data]a[/sc1-g-data]
[1] => sc1-g-data
[2] => a
)
If your tags are just anything between brackets [blah][/blah] you can use:
\[(.*?)\].*?\[\/\1\]
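To see the backreference reject mismatched pairs, the sc pattern from the PHP demo can be run over several tags at once. This is a sketch; the input string, including the [sc7-x] tag, is invented for illustration.

```php
<?php
$pattern = '~\[(sc\d*-[^][]*)](.*?)\[/\1]~s';

// Hypothetical input: a matching pair, a mismatched pair (sc1 vs sc2), another matching pair.
$string = '[sc1-g-data]a[/sc1-g-data] [sc1-g-data]b[/sc2-g-data] [sc7-x]c[/sc7-x]';

preg_match_all($pattern, $string, $matches);
print_r($matches[1]); // tag names of the pairs that matched
print_r($matches[2]); // their contents

// Why the single-quoted literal matters: in double quotes, "\1" is an
// octal escape (one byte), not the two characters backslash and 1.
var_dump(strlen('\1'), strlen("\1"));
```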

get all text between bracket but skip nested bracket

I'm trying to figure out how to get the text between two brackets, without stopping at the first closing )
__('This is a (TEST) all of this i want') i dont want any of this;
my current pattern is __\((.*?)\)
which gives me
__('This is a (TEST)
but i want
__('This is a (TEST) all of this i want')
Thanks
You may use a regex subroutine to match text inside nested parentheses after __:
if (preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $matches)) {
print_r($matches[2]);
}
See the regex demo.
Details
__ - a __ substring
(\(((?:[^()]++|(?1))*)\)) - Group 1 (it will be recursed using the (?1) subroutine):
\( - a ( char
((?:[^()]++|(?1))*) - Group 2 capturing 0 or more repetitions of any 1+ chars other than ( and ) or the whole Group 1 pattern is recursed
\) - a ) char.
See the PHP demo:
$s = "__('This is a (TEST) all of this i want') i dont want any of this; __(extract this)";
if (preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $matches)) {
print_r($matches[2]);
}
// => Array ( [0] => 'This is a (TEST) all of this i want' [1] => extract this )
You forgot to escape two parentheses in your regex: __\((.*)\);
Check on regex101.com.
Use the pattern __\((.*)?\).
The \ escapes the parentheses to catch literal parentheses. This then captures all the text inside that set of parentheses.
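The greedy and the recursive patterns behave differently once a line contains more than one call. A short comparison sketch on the demo string from the first answer ($greedy and $balanced are made-up variable names):

```php
<?php
$s = "__('This is a (TEST) all of this i want') i dont want any of this; __(extract this)";

// Greedy version: .* runs to the LAST ")" in the string,
// so the two calls are merged into one capture.
preg_match_all('~__\((.*)\)~', $s, $greedy);

// Recursive version: stops at the ")" that balances the opening "(".
preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $balanced);

print_r($greedy[1]);
print_r($balanced[2]);
```

With a single call in the string and no later closing parenthesis, the greedy pattern returns the expected text as well, which is probably why it looks sufficient on the question's one-line sample.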

Match regex pattern that isn't within a bbcode tag

I am attempting to create a regex pattern that will match words in a string that begin with #
Regex that solves this initial problem is '~(#\w+)~'
A second requirement of the code is that it must also ignore any matches that occur within [quote] and [/quote] tags
A couple of attempts that have failed are:
(?:[0-9]+|~(#\w+)~)(?![0-9a-z]*\[\/[a-z]+\])
/[quote[\s\]][\s\S]*?\/quote](*SKIP)(*F)|~(#\w+)~/i
Example: the following string should have an array output as displayed:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
//run regex match
preg_match_all('regex', $string, $results);
//dump results
var_dump($results[1]);
//results: array consisting of:
[1]=>"#friends"
[2]=>"#john"
[3]=>"#doe"
You may use the following regex (based on another related question):
'~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s'
See the regex demo. The regex accounts for nested [quote] tags.
Details
(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F) - matches the pattern inside capturing parentheses and then (*SKIP)(*F) make the regex engine omit the matched text:
\[quote] - a literal [quote] string
(?:(?1)|.)*? - any 0+ (but as few as possible) occurrences of the whole Group 1 pattern ((?1)) or any char (.)
\[/quote] - a literal [/quote] string
| - or
#\w+ - a # followed with 1+ word chars.
PHP demo:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
$rx = '~(\[quote\](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => #friends [1] => #john [2] => #doe )
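The claim about nested [quote] tags can be checked with a small sketch. The input string below is invented for illustration and is not from the question.

```php
<?php
$rx = '~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';

// Hypothetical input with one [quote] block nested inside another.
$string = "#a [quote]#b [quote]#c[/quote] #d[/quote] #e";

preg_match_all($rx, $string, $results);
print_r($results[0]);
```

Here the (?1) subroutine consumes the inner [quote]...[/quote] as a unit, so the outer block only ends at its own closing tag and #d is skipped along with #b and #c.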
