RegExp in PHP. Get text between first level parentheses

I have two types of strings in one text:
a(bc)de(fg)h
a(bcd(ef)g)h
I need to get text between first level parentheses. In my example this is:
bc
fg
bcd(ef)g
I tried the regular expression /\((.+)\)/ with the ungreedy (U) flag, which gives:
bc
fg
bcd(ef
And without it:
bc)de(fg
bcd(ef)g
Neither variant does what I need. Does anyone know how to solve this?

Use a PCRE recursive pattern to match substrings in nested parentheses:
$str = "a(bc)de(fg)h some text a(bcd(ef)g)h ";
preg_match_all("/\((((?>[^()]+)|(?R))*)\)/", $str, $m);
print_r($m[1]);
The output:
Array
(
[0] => bc
[1] => fg
[2] => bcd(ef)g
)
\( ( (?>[^()]+) | (?R) )* \)
First it matches an opening parenthesis. Then it matches any number of
substrings which can either be a sequence of non-parentheses, or a
recursive match of the pattern itself (i.e. a correctly parenthesized
substring). Finally, there is a closing parenthesis.
Technical cautions:
If there are more than 15 capturing parentheses in a pattern, PCRE has
to obtain extra memory to store data during a recursion, which it does
by using pcre_malloc, freeing it via pcre_free afterwards. If no
memory can be obtained, it saves data for the first 15 capturing
parentheses only, as there is no way to give an out-of-memory error
from within a recursion.
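As a rough sketch, the recursive pattern described above could be wrapped in a small helper. The function name firstLevelParens is hypothetical, and the possessive quantifier ++ is used here in place of the atomic group (?>...) from the answer; both are meant to stop the engine from backtracking into a run of non-parenthesis characters.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text
// inside each top-level pair of parentheses found in $text.
function firstLevelParens(string $text): array
{
    // \( ... \) wraps one balanced group; (?R) recurses into nested groups,
    // and [^()]++ consumes runs of non-parenthesis characters.
    preg_match_all('/\(((?:[^()]++|(?R))*)\)/', $text, $m);
    return $m[1];
}

print_r(firstLevelParens('a(bc)de(fg)h')); // bc, fg
print_r(firstLevelParens('a(bcd(ef)g)h')); // bcd(ef)g
```

Unbalanced input (for example a missing closing parenthesis) is not handled specially here; such groups are simply skipped.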

This question pretty much has the answer, but the implementations are a little ambiguous. You can use the logic in the accepted answer without the ~s to get this regex:
\(((?:[^()]++|(?R))*)\)
Tested with this output:

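Please try this:
preg_match("/\((.+)\)/", $input_line, $output_array);
Note that .+ is greedy here, so the capture runs from the first ( to the last ) in the string; this only gives the first-level text when the string contains a single top-level group.
You can test this code at http://www.phpliveregex.com/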
Regex: \((.+)\)
Input: a(bcd(eaerga(er)gaergf)g)h
Output: array(2
0 => (bcd(eaerga(er)gaergf)g)
1 => bcd(eaerga(er)gaergf)g
)

Related

How do I locate and replace text with a common element using regex?

I'm pretty lousy at regex, and need help with the following scenario. I need to locate and replace text that has a common structure, but one aspect will be different:
here is a string (with 3 values)
here is another string (with 5 values)
In the above examples, I need to locate and then replace the value in parenthesis. I can't search by parens alone, as the string may contain other parens. But the value in the parens that needs to be replaced is consistently constructed: (with # values) -- the only difference will be the number.
So ideally the regex returns (with 3 values) and (with 5 values) so I can use a simple str_replace to change the text.
This is regex in a PHP script.

Try this regex:
\(with\s+\d+\s+values\)
Demo here

The following regex should work for you:
/\(with (\d+) values\)/
This matches strings of the specified format and gives the number in a capture group so it can be used in the replacement. PHP's PCRE functions have no g flag (using one raises an "Unknown modifier 'g'" warning); preg_replace and preg_match_all already process every match in the string.
Demo here
If, however, there can only be one digit, then the following will work:
/\(with (\d) values\)/
Or, if the number can only be a digit greater than 1, for example, then the following:
/\(with ([2-9]) values\)/
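A minimal sketch of how the capture group might be used in PHP, assuming the goal is to rewrite the parenthesized text while keeping the number. The sample text and the replacement wording ("items") are made up for illustration. preg_replace replaces every occurrence by default, so no global flag is involved.

```php
<?php
// Illustrative input; the replacement text "(with N items)" is an assumption.
$text = 'here is a string (with 3 values) and another (with 5 values)';

// $1 in the replacement refers to the digits captured by (\d+).
$result = preg_replace('/\(with (\d+) values\)/', '(with $1 items)', $text);

echo $result, "\n";
// here is a string (with 3 items) and another (with 5 items)
```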

If I got you right, you are looking for exactly three or five items within parentheses (comma separated).
This could be accomplished by
\( # "(" literally
(?:[^,()]+,){2} # not , or ( or ) exactly two times
(?:(?:[^,()]+,){2})? # repeated
[^,()]+ # without the comma in the end
\) # the closing parenthesis
See a demo on regex101.com.
If you're really looking for only two variants of the string, you could very easily do
\(with (?:3|5) values\)
In general
\(with \d+ values\)
as proposed by @SchoolBoy.

Something like this maybe
$str ="here is another string (with 5 values)";
preg_match_all("/\(with (\d+) values\)/", $str, $out );
print_r( $out );
Output:
Array
(
[0] => Array
(
[0] => (with 5 values)
)
[1] => Array
(
[0] => 5
)
)
Here at ideone...
It uses the regex
\(with (\d+) values\)
that matches the literal opening parentheses followed by the string with # values, capturing the actual number #, and finally the closing parentheses.
It returns the complete match (the parenthesized string) in the first dimension and the actual number in the second.

How to alter my regex so that preg_match returns the desired string

I have the string:
<mml:mi>P</mml:mi><mml:mn>2</mml:mn>
and wish to retrieve the 2
My pattern is:
/(?:<mml:)(mn|mi|mo)>(.+)(?:<\/mml:\1>)$/
the return is the 2 as it should be,
but if the string is:
<mml:mi>P</mml:mi><mml:mi>s</mml:mi>
the pattern should then return the s, from inside the second set of tags, but returns the P from inside the first set
P</mml:mi><mml:mi>s
when changing the pattern as in the suggestion below to:
/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU
the return is the same. The line of php is:
preg_match('/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU', '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>', $ret, PREG_OFFSET_CAPTURE);
and $ret contains:
Array
(
[0] => Array
(
[0] => <mml:mi>P</mml:mi><mml:mi>s</mml:mi>
[1] => 0
)
[1] => Array
(
[0] => mi
[1] => 5
)
[2] => Array
(
[0] => P</mml:mi><mml:mi>s
[1] => 8
)
)
and when changed to the edited suggestion, with the ? removed
/<mml:(mn|mi|mo)>(.*)<\/mml:\1>/sU
the return is P, from the first occurrence, rather than the s from the second.

Typing from my phone, so will be brief.
Instead of matching any character (.+), match any character that is not the beginning of the next tag ([^<]+)
This way you don't have to worry about using back references, nor will you grab everything between two identical tags.
(Double check where I put the caret, this is off the top of my head. )
To get the last occurrence, wrap the whole regex in ()+
/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/
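A short sketch of the repeated-group idea above, assuming the elements sit directly next to each other as in the question. The helper name lastMmlText is hypothetical. When a group is repeated, its captures hold the values from the final repetition, which is what makes the last element's text available.

```php
<?php
// Hypothetical helper: returns the text of the last <mml:mn|mi|mo> element
// in a run of adjacent elements, or null when nothing matches.
function lastMmlText(string $markup): ?string
{
    // The outer group repeats; after the match, groups 2 and 3 hold the
    // tag name and text from the final repetition.
    if (preg_match('/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/', $markup, $m)) {
        return $m[3];
    }
    return null;
}

echo lastMmlText('<mml:mi>P</mml:mi><mml:mn>2</mml:mn>'), "\n"; // 2
echo lastMmlText('<mml:mi>P</mml:mi><mml:mi>s</mml:mi>'), "\n"; // s
```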

Here is an optimized pattern which will not only run faster than Tim's; preg_match() will also return fewer elements in the output array:
~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~
Pattern Demo
Enhancements:
Replace standard pattern delimiter slash / with ~ to avoid escaping for improved brevity.
Use quantifiers for consecutive characters for improved efficiency. {2}
Use character class instead of pipes for improved efficiency and brevity. m[ino]
Use \K to start the fullstring match from middle of pattern, effectively removing the need for an extra capture group for improved efficiency.
Use a negated character class to match the desired character: [^<]. Note: if your desired substring is longer than one character, use [^<]+
Use positive lookahead to accurately match closing tag followed by end of line anchor $.
PHP Implementation: (Demo)
echo preg_match('~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~','<mml:mi>P</mml:mi><mml:mi>s</mml:mi>',$out)?$out[0]:'fail';
Output:
s

php preg_match_all between ... and

I'm trying to use preg_match_all to match anything between ... and ..., where the text may wrap across lines. I've done a number of searches on Google and tried different combinations, but nothing is working. I have tried this:
preg_match_all('/...(.*).../m/', $rawdata, $m);
Below is an example of what the format will look like:
...this is a test...
...this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test...

The s modifier allows for . to include new line characters so try:
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);
The m modifier you were using makes ^ and $ match on a per-line basis rather than per string; since your pattern has no ^ or $, it has no effect here.
You can read more about the modifiers here.
Note the . needs to be escaped as well because it is a special character meaning any character. The ? after the .* makes it non-greedy so it will match the first ... that is found. The {3} says three of the previous character.
Regex101 demo: https://regex101.com/r/eO6iD1/1
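A small sketch of the s-modifier pattern applied to wrapped input. The sample data is invented for illustration, with a line break inside the second block.

```php
<?php
// Made-up sample data: the second block wraps onto a new line.
$rawdata = "...this is a test...\n...this is a\nwrapped test...";

// s lets . match newlines; .*? stops at the first closing "...".
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);

print_r($m[1]);
// [0] => this is a test
// [1] => this is a (newline) wrapped test
```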

Please escape the literal dots, since the dot is a reserved character in regular expressions (as you use it yourself inside the capture group):
preg_match_all('/\.\.\.(.*)\.\.\./m', $rawdata, $m);
If what you meant is that there are line breaks within the content to match, you have to allow for them explicitly, for example with a character class that matches any character including newlines:
preg_match_all('/\.\.\.([\s\S]*?)\.\.\./m', $rawdata, $m);
Check here for reference on what characters the dot includes:
http://www.regular-expressions.info/dot.html

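You're almost there; you just need to update your regex to:
/\.{3}(.*)\.{3}/m
Regex breakdown
/: pattern delimiter (marks the start and end of the regex)
\.: match a literal .
{3}: match exactly 3 of the preceding token (in this case, exactly 3 dots)
(.*): capture any characters (except newlines) between the two sets of dots
m: multiline mode; ^ and $ match at line boundaries. It does not make . match newlines (that is the s modifier), so this pattern only matches when both ... are on the same line.
Putting it all together: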
$str = "...this is a test...";
preg_match_all('/\.{3}(.*)\.{3}/m', $str, $m);
print_r($m);
outputs
Array
(
[0] => Array
(
[0] => ...this is a test...
)
[1] => Array
(
[0] => this is a test
)
)
DEMO

PHP preg_split (with the delimiter included)

I was trying to include the delimiter while using preg_split but was unsuccessful.
print_r(preg_split('/((?:fy)[.]+)/', 'fy13 eps fy14 rev', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
I'm trying to return:
array(
[0] => fy13 eps
[1] => fy14 rev
)
With the flags parameter set to PREG_SPLIT_DELIM_CAPTURE:
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
The fy is in parenthesis, so I don't know why this doesn't work.

Your current approach isn't working because "parenthesized expression" here refers to capturing groups, and the ?: at the start of your group makes it a non-capturing group. So you can get the fy included by changing your expression to /(fy)/; however, I don't think this is what you want, because you will get an array that contains fy, 13 eps, fy, and 14 rev (the parenthesized expressions are separate entries in the result).
Instead, try the following:
print_r(preg_split('/(?=fy)/', 'fy13 eps fy14 rev', -1, PREG_SPLIT_NO_EMPTY));
This uses a lookahead to split just before each occurrence of fy in your string.
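A brief sketch of the lookahead split with one extra step. Because the split happens just before each fy, the first piece keeps the space that preceded the second fy; trimming each piece (an addition not in the answer above) gives exactly the array from the question.

```php
<?php
// Split just before every "fy"; PREG_SPLIT_NO_EMPTY drops the empty leading piece.
$parts = preg_split('/(?=fy)/', 'fy13 eps fy14 rev', -1, PREG_SPLIT_NO_EMPTY);

// The first piece is "fy13 eps " (trailing space), so trim each element.
$parts = array_map('trim', $parts);

print_r($parts);
// [0] => fy13 eps
// [1] => fy14 rev
```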

With the example you gave, I am not sure that you really need to use the preg_split function. For example you can obtain the same with preg_match_all in a more efficient way (from the perspective of performance):
preg_match_all('/fy(?>[^f]++|f++(?!y\d))*/', 'fy13 eps fy14 rev', $results);
print_r($results);
The idea here is to match fy followed by zero or more repetitions of either a run of characters other than f, or a run of f characters not followed by y and a digit.
More information about (?>..) and ++ respectively:
here for atomic groups
and here for possessive quantifiers

Combine multiple match regular expression into one and get the matching ones

I have a list of regular expressions:
suresnes|suresne|surenes|surene
pommier|pommiers
^musique$
^(faq|aide)$
^(file )?loss( )?less$
paris
faq <<< this matches twice
My use case is that each pattern that matches displays a link to my user,
so several patterns can match at once.
I test those patterns against a simple string of text: "live in paris" / "faq" / "pom"...
The simple way to do it is to loop over all the patterns with preg_match, but I will be doing that a lot on a performance-critical page, so this looks bad to me.
Here is what I have tried: combining all those expressions into one with group names:
preg_match("#(?P<group1>^(faq|aide|todo|paris)$)|(?P<group2>(paris)$)#im", "paris", $groups);
As you can see, each pattern is grouped: (?P<GROUPNAME>PATTERN) and they are all separated by a pipe |.
The result is not what I expect, as only the first matching group is returned. It looks like parsing stops as soon as a match occurs.
What I want is the list of all the matching groups. preg_match_all does not help either.
Thanks!

How about:
preg_match("#(?=(?P<group1>^(faq|aide|todo|paris)$))(?=(?P<group2>(paris)$))#im", "paris", $groups);
print_r($groups);
output:
Array
(
[0] =>
[group1] => paris
[1] => paris
[2] => paris
[group2] => paris
[3] => paris
[4] => paris
)
The (?= ) is called lookahead
Explanation of the regex:
(?= # start lookahead
  (?P<group1> # start named group group1 (capture group #1)
    ^ # start of line
    ( # start capture group #2
      faq|aide|todo|paris # match any of faq, aide, todo or paris
    ) # end capture group #2
    $ # end of line
  ) # end of named group group1
) # end of lookahead
(?= # start lookahead
  (?P<group2> # start named group group2 (capture group #3)
    ( # start capture group #4
      paris # paris
    ) # end capture group #4
    $ # end of line
  ) # end of named group group2
) # end of lookahead
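Note that the two lookaheads above must both succeed for the regex to match at all. One possible variation, sketched below under several assumptions, is to make each lookahead optional so that groups which do not match are simply left unset. The helper name matchingGroups and the group names are hypothetical; the sketch assumes PHP 7.2+ (for PREG_UNMATCHED_AS_NULL), patterns that do not contain the # delimiter, and it drops the m flag so that ^ and $ refer to the whole subject.

```php
<?php
// Hypothetical helper: returns the names of the sub-patterns that match $subject.
// Keys of $patterns must be valid PCRE group names (letters, digits, underscore).
function matchingGroups(array $patterns, string $subject): array
{
    $regex = '#\A';
    foreach ($patterns as $name => $pattern) {
        // Optional lookahead: scan forward for the sub-pattern, but let the
        // overall match continue when it is not found.
        $regex .= "(?=(?s:.*?)(?P<{$name}>{$pattern}))?";
    }
    $regex .= '#i';

    if (!preg_match($regex, $subject, $m, PREG_UNMATCHED_AS_NULL)) {
        return [];
    }

    $hits = [];
    foreach ($m as $key => $value) {
        // Named groups appear under string keys; unmatched ones are null.
        if (is_string($key) && $value !== null) {
            $hits[] = $key;
        }
    }
    return $hits;
}

// Illustrative pattern list based on the question's examples.
$patterns = [
    'help'  => '^(faq|aide)$',
    'paris' => 'paris',
    'faq'   => 'faq',
];

print_r(matchingGroups($patterns, 'faq'));           // help, faq
print_r(matchingGroups($patterns, 'live in paris')); // paris
```

Whether this is faster than one preg_match call per pattern would need to be measured; the combined regex still has to try every sub-pattern.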

Try this approach:
#/ define input string
$str_1 = "{STRING HERE}";
#/ Define regex array
$reg_arr = array(
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    '^(file )?loss( )?less$',
    'paris',
    'faq'
);
#/ define a callback function to test one regex from the array
function cb_reg($reg_t)
{
    global $str_1;
    if (preg_match("/{$reg_t}/ims", $str_1, $matches)) {
        return $matches[0]; // full match; $matches[1] exists only if the pattern has a capture group
        // or return the whole $matches array; it's up to you how to do it.
    } else {
        return '';
    }
}
#/ Apply the callback to every regex with array_map (instead of an explicit loop)
$results = array_map('cb_reg', $reg_arr); // returns regex results
$results = array_diff($results, array('')); // remove empty values returned
Basically, this is the most straightforward way I could think of.
You can't realistically combine hundreds of regexes into one call, as the resulting regex would be very complex to build and would have many ways to fail. This is one of the more robust ways to do it.
In my opinion, combining a large number of regexes into one (if that can be achieved at all) is likely to be slower to execute with preg_match than this approach of a callback over the array.
Also note:
array_map still invokes the callback once per pattern, so the work remains linear in the number of patterns; it is a tidier way to express the loop rather than a faster algorithm.
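As a hedged alternative to the global variable used above, the same per-pattern check could be written with a closure that returns the patterns that matched. The function name matchingPatterns is hypothetical, the # delimiter assumes no pattern contains that character, and the pattern list is a subset of the one in the question.

```php
<?php
// Hypothetical helper: returns the patterns from $patterns that match $subject.
function matchingPatterns(array $patterns, string $subject): array
{
    return array_values(array_filter(
        $patterns,
        function (string $pattern) use ($subject): bool {
            // One preg_match call per pattern; case-insensitive like the original.
            return preg_match('#' . $pattern . '#i', $subject) === 1;
        }
    ));
}

$patterns = [
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    'paris',
    'faq',
];

print_r(matchingPatterns($patterns, 'faq'));           // ^(faq|aide)$ and faq
print_r(matchingPatterns($patterns, 'live in paris')); // paris
```

Passing the subject through use avoids shared global state, which makes the helper easier to call with different inputs.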

You can combine all of your regexes with "|" in between them. Then apply this: http://www.rexegg.com/regex-optimizations.html, which will optimize it, collapse common expressions, etc.
