PHP regular expression if escaped parenthesis exist - php

In PHP I am trying to complete a simple task of pulling some information from a string using preg_match_all
I have a string like this for example 0(a)1(b)2(c)3(d)4(e)5(f)
and I am trying to return the contents of each (), while respecting the fact that escaped parentheses might appear inside them.
I have tried multiple combinations, but I can't get any regular expression to handle something like 4(here are some escaped parens\(\) more text) and return here are some escaped parens\(\) more text rather than here are some escaped parens\(\)
I have a regular expression that works, but not with escaped parentheses:
[0-9]*\(([^ESCAPED PARENTHESIS])*?\)
Can someone give me an idea on how to accomplish this?

You can use a negative lookbehind so that the regex engine only matches a closing parenthesis that is not preceded by a backslash:
\((.+?)(?<!\\)\)
See Demo https://regex101.com/r/oU9sF2/1
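As a rough sketch of how this lookbehind pattern might be used from PHP (the sample string is adapted from the question, and the variable names are only illustrative): inside a single-quoted PHP string the backslash in the lookbehind has to be written as four backslashes, so that the regex engine receives two and reads them as one literal backslash.

```php
<?php
// Sample input adapted from the question; single quotes keep the backslashes literal.
$str = '0(a)1(b)4(here are some escaped parens\(\) more text)5(f)';

// '\\\\' in PHP source becomes \\ in the pattern, i.e. one literal backslash.
$pattern = '/\((.+?)(?<!\\\\)\)/';

preg_match_all($pattern, $str, $matches);

// $matches[1] holds the text captured between each pair of parentheses.
print_r($matches[1]);
```

Note that (.+?) needs at least one character, so an empty pair () would not be matched on its own.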

You can use this regex to match your text:
preg_match_all('/(?<!\\\\)\((.*?)(?<!\\\\)\)/', $str, $matches);
print_r($matches[1]);

You can use this pattern:
$pattern = <<<'EOD'
~[0-9]+\([^)\\]*+(?s:\\.[^)\\]*)*+\)~
EOD;
The idea is to match every character that is neither a closing parenthesis nor a backslash. When a backslash is reached, the character that follows it is consumed as well, and then matching continues with more characters that are neither a closing parenthesis nor a backslash. This repeats until an unescaped closing parenthesis is found.
Note: possessive quantifiers *+ are only here to limit the backtracking when there is no closing parenthesis.
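A minimal sketch of this pattern in use, assuming you also want the inner text back: the extra capturing group around the content and the sample string are additions for illustration, not part of the pattern above.

```php
<?php
// Nowdoc syntax: the pattern is taken verbatim, so backslashes are not doubled again.
// The capturing group around the content is an addition for this example.
$pattern = <<<'EOD'
~[0-9]+\(([^)\\]*+(?s:\\.[^)\\]*)*+)\)~
EOD;

// Sample input adapted from the question.
$str = '0(a)1(b)4(here are some escaped parens\(\) more text)5(f)';

preg_match_all($pattern, $str, $matches);

// $matches[1] holds the content of each pair of parentheses.
print_r($matches[1]);
```

Because a backslash always consumes the character after it, an escaped closing parenthesis can never be mistaken for the end of the group.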

Here is a working regex:
[0-9]*\(([^()\\]*(?:\\.[^()\\]*?)*)\)
See IDEONE demo:
$re = '~[0-9]*\(([^()\\\\]*(?:\\\\.[^()\\\\]*?)*)\)~s';
$str = "0(a)1(b)2(c)3(d)4(here are some escaped parens\(\) more text)5(f)";
preg_match_all($re, $str, $matches);
print_r($matches[1]);
Regex breakdown:
[0-9]* - matches 0 or more digits
\( - matches a literal (
([^()\\]*(?:\\.[^()\\]*?)*) - matches and captures:
[^()\\]* - 0 or more characters other than \, ( and )
(?:\\.[^()\\]*?)* - 0 or more sequences of:
\\. - a backslash followed by any character (an escaped character), followed by
[^()\\]*? - as few as possible characters other than \, ( and )
\) - matches a literal )

Related

preg_match_all for backslash [\] & [u002F]

I have a URL:
https:\u002F\u002Fsite.vid.com\u002F93836af7-f465-4d2c-9feb-9d8128827d85\u002F6njx6dp3gi.m3u8?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjb3VudHJ5IjoiSU4iLCJkZXZpY2VfaWQiOiI1NjYxZTY3Zi0yYWE3LTQ1MjUtOGYwYy01ODkwNGQyMjc3ZmYiLCJleHAiOjE2MTA3MjgzNjEsInBsYXRmb3JtIjoiV0VCIiwidXNlcl9pZCI6MH0.c3Xhi58DnxBhy-_I5yC2XMGSWU3UUkz5YgeVL1buHYc","
And I want to match it using preg_match_all. My regex expression is:
preg_match_all('/(https:\/\/site\.vid\.com\/.*\",")/', $input_lines, $output_array);
But I am not able to match the special characters \ and u002F with the code above. I tried using an escaping function, but it still does not match. I know it may be a lame question, but if anyone could help me match \ and u002F, or escape them in preg_match_all, that would be helpful.
Question Edit:
I want to use only preg_match_all because I am trying to extract the above URL from an HTML page.
You may use
preg_match_all('~https:(?://|(?:\\\\u002F){2})site\.vid\.com(?:/|\\\\u002F)[^"]*~', $string)
See the regex demo. Details:
https: - a literal string (if s is optional, use https?:)
(?://|(?:\\u002F){2}) - a non-capturing group matching either // or (|) two occurrences of \u002F
site\.vid\.com - a literal site.vid.com string (the dot is a metacharacter that matches any char but line break chars, so it must be escaped)
(?:/|\\u002F) - a non-capturing group matching / or \u002F text
[^"]* - a negated character class matching zero or more chars other than ".
See the PHP demo:
$re = '~https:(?://|(?:\\\\u002F){2})site\.vid\.com(?:/|\\\\u002F)[^"]*~';
$str = 'https:\\u002F\\u002Fsite.vid.com\\u002F93836af7-f465-4d2c-9feb-9d8128827d85\\u002F6njx6dp3gi.m3u8?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjb3VudHJ5IjoiSU4iLCJkZXZpY2VfaWQiOiI1NjYxZTY3Zi0yYWE3LTQ1MjUtOGYwYy01ODkwNGQyMjc3ZmYiLCJleHAiOjE2MTA3MjgzNjEsInBsYXRmb3JtIjoiV0VCIiwidXNlcl9pZCI6MH0.c3Xhi58DnxBhy-_I5yC2XMGSWU3UUkz5YgeVL1buHYc","';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches[0]);
// => Array( [0] => https:\u002F\u002Fsite.vid.com\u002F93836af7-f465-4d2c-9feb-9d8128827d85\u002F6njx6dp3gi.m3u8?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjb3VudHJ5IjoiSU4iLCJkZXZpY2VfaWQiOiI1NjYxZTY3Zi0yYWE3LTQ1MjUtOGYwYy01ODkwNGQyMjc3ZmYiLCJleHAiOjE2MTA3MjgzNjEsInBsYXRmb3JtIjoiV0VCIiwidXNlcl9pZCI6MH0.c3Xhi58DnxBhy-_I5yC2XMGSWU3UUkz5YgeVL1buHYc )

Regex curly braces and quotes get inner text

In the following string {lang('stmt')} I want to get just the stmt where it may also be as follows {lang("stmt")}.
I'm bad with regex, I've tried {lang(.*?)} which gives me ('stmt').
You might match {lang(" or {lang(' and capture the ' or " using a capturing group. This group can be used with a backreference to match the same character.
Use \K to forget what was previously matched.
Then match 0+ characters non greedy .*? and use a positive lookahead using the backreference \1 to assert what follows is ')} or ")}
\{lang\((['"])\K.*?(?=\1\)})
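The \K pattern above could be tried from PHP roughly as follows. This is only a sketch: the two sample strings come from the question, and $found is a hypothetical variable used to collect the results. Because \K discards everything matched before it, the wanted text ends up in the full match ($m[0]) rather than in a capture group.

```php
<?php
// The single quote inside the class is escaped only for the PHP string literal.
$pattern = '/\{lang\(([\'"])\K.*?(?=\1\)})/';

$found = [];
foreach (["{lang('stmt')}", '{lang("stmt")}'] as $str) {
    if (preg_match($pattern, $str, $m)) {
        // $m[0] is the text after \K; $m[1] is the quote character that was used.
        $found[] = $m[0];
    }
}

print_r($found);
```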
Match either ' or " with a character set, then lazy-repeat any character until the first capture group can be matched again:
lang\((['"])(.*?)\1
https://regex101.com/r/MBKhX3/1
In PHP code:
$str = "{lang('stmt')}";
preg_match('/lang\(([\'"])(.*?)\1/', $str, $matches);
print(json_encode($matches));
Result:
["lang('stmt'","'","stmt"]
(the string you want will be in the second capture group)
Try this one too.
lang\([('")][a-z]*['")]\)
Keep ( and ) outside the (.*) to get value without ( and )
regex:
{lang\(['"](.*?)['"]\)}
php: '/{lang\([\'"](.*?)[\'"]\)}/'

Regex string inside 2 special characters without space

I am trying to write a RegEx for preg_match_all in php to match a string inside 2 $ symbols, like $abc$ but only if it doesn't have a space, for example, I don't need to match $ab c$.
I wrote this regex /[\$]\S(.*)[\$]/U and some variations but can't get it to work.
Thanks for your help guys.
Overview
Your regex: [\$]\S(.*)[\$]
[\$] - No point in escaping $ inside [] because it's already interpreted as the literal character. No point putting \$ inside [] because \$ is the escaped version. Just use one or the other [$] or \$.
\S(.*) Matches any non-whitespace character (once), followed by any character (except \n) any number of times
Code
\$\S+\$
\$ Match $ literally
\S+ Match any non-whitespace character one or more times
\$ Match $ literally
Usage
$re = '/\$\S+\$/';
$str = '$abc$
$ab c$';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
I think this will suit your needs.
https://regex101.com/r/WgUwh9/1
\$([a-zA-Z]*)\$
It will match letters a-z and A-Z of any length, with no spaces, between two $ characters.

Explode and/or regex text to HTML link in PHP

I have a database of texts that contains this kind of syntax in the middle of English sentences that I need to turn into HTML links using PHP
"text1(text1)":http://www.example.com/mypage
Notes:
text1 is always identical to the text in parenthesis
The whole string always have the quotation marks, parenthesis, colon, so the syntax is the same for each.
Sometimes there is a space at the end of the string, but other times there is a question mark or comma or other punctuation mark.
I need to turn these into basic links, like
<a href="http://www.example.com/mypage">text1</a>
How do I do this? Do I need explode or regex or both?
"(.*?)\(\1\)":(.*\/[a-zA-Z0-9]+)(?=\?|\,|\.|$)
You can use this.
See Demo.
http://regex101.com/r/zF6xM2/2
You can use this replacement:
$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';
$replacement = '<a href="\2">\1</a>';
$result = preg_replace($pattern, $replacement, $text);
pattern details:
([^("]+) this part will capture text1 in the group 1. The advantage of using a negated character class (that excludes the double quote and the opening parenthesis) is multiple:
it allows to use a greedy quantifier, that is faster
since the class excludes the opening parenthesis and is immediately followed by a parenthesis in the pattern, if another part of the text contains content between double quotes but without parentheses inside, the regex engine will not go backward to test other possibilities; it will skip this substring without backtracking. (This is because the PCRE regex engine automatically converts [^a]+a into [^a]++a before processing the string)
\S+ means all that is not a whitespace one or more times
(?=[\s\pP]|\z) is a lookahead assertion that checks that the url is followed by a whitespace, a punctuation character (\pP) or the end of the string.
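A short sketch of the pattern described above applied to a sample sentence. The sentence is made up for illustration, and the replacement string assumes the goal is an anchor tag built from group 2 (the URL) and group 1 (the link text).

```php
<?php
// Hypothetical sample sentence containing the syntax from the question.
$text = 'See "text1(text1)":http://www.example.com/mypage for details.';

$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';

// Assumed replacement: \2 is the captured URL, \1 the captured link text.
$replacement = '<a href="\2">\1</a>';

$result = preg_replace($pattern, $replacement, $text);

echo $result, PHP_EOL;
```

If the link text can contain characters such as < or &, it would probably be safer to build the tag in preg_replace_callback and pass the pieces through htmlspecialchars.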
You can use this regex:
"(.*?)\(.*?:(.*)
An appropriate Regular Expression could be:
$str = '"text1(text1)":http://www.example.com/mypage';
preg_match('#^"([^\(]+)' .
'\(([^\)]+)\)[^"]*":(.+)#', $str, $m);
print '<a href="'.$m[3].'">'.$m[2].'</a>' . PHP_EOL;

exploding a string using a regular expression

I have a string as below (the letters in the example could be numbers or texts and could be either uppercase or lowercase or both. If a value is a sentence, it should be between single quotations):
$string="a,b,c,(d,e,f),g,'h, i j.',k";
How can I explode that to get the following result?
Array([0]=>"a",[1]=>"b",[2]=>"c",[3]=>"(d,e,f)",[4]=>"g",[5]=>"'h,i j'",[6]=>"k")
I think using regular expressions will be a fast as well as clean solution. Any idea?
EDIT:
This is what I have done so far, which is very slow for the strings having a long part between parenthesis:
$separator="*"; // whatever which is not used in the string
$Pattern="'[^,]([^']+),([^']+)[^,]'";
while(ereg($Pattern,$String,$Regs)){
$String=ereg_replace($Pattern,"'\\1$separator\\2'",$String);
}
$Pattern="\(([^(^']+),([^)^']+)\)";
while(ereg($Pattern,$String,$Regs)){
$String=ereg_replace($Pattern,"(\\1$separator\\2)",$String);
}
return $String;
This will replace all the commas between the parentheses. Then I can explode the string by commas and replace the $separator with the original comma.
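For reference, the replace-then-explode idea described above might be sketched with the preg_* functions instead (the ereg functions were removed in PHP 7). The NUL-byte placeholder and the variable names are assumptions made for this example only.

```php
<?php
$string = "a,b,c,(d,e,f),g,'h, i j.',k";
$separator = "\x00"; // placeholder, assumed never to occur in the input

// Hide the commas inside (...) or '...' behind the placeholder.
$protected = preg_replace_callback(
    "~\([^)]*\)|'[^']*'~",
    function (array $m) use ($separator) {
        return str_replace(',', $separator, $m[0]);
    },
    $string
);

// Split on the remaining commas, then restore the hidden ones.
$parts = array_map(
    function ($part) use ($separator) {
        return str_replace($separator, ',', $part);
    },
    explode(',', $protected)
);

print_r($parts);
```

This runs in a single pass over the string rather than looping until no more replacements are possible.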
You can do the job using preg_match_all
$string="a,b,c,(d,e,f),g,'h, i j.',k";
preg_match_all("~'[^']+'|\([^)]+\)|[^,]+~", $string, $result);
print_r($result[0]);
Explanation:
The trick is to match parenthesis before the ,
~ Pattern delimiter
'
[^'] All characters except a single quote
+ one or more times
'
| or
\([^)]+\) the same with parenthesis
| or
[^,]+ Any characters except commas one or more times
~
Note that the quantifiers in [^']+', in [^)]+\) but also in [^,]+ are all automatically optimized to possessive quantifiers at compile time due to "auto-possessification". The first two because the character class doesn't contain the next character, and the last because it is at the end of the pattern. In both cases, an eventual backtracking is unnecessary.
If you have more than one kind of delimiter that, like quotes, is the same for opening and closing, you can write your pattern like this, using a capture group:
$string="a,b,c,(d,e,f),g,'h, i j.',k,°l,m°,#o,p#,#q,r#,s";
preg_match_all('~([\'##°]).*?\1|\([^)]+\)|[^,]+~u', $string, $result); // u modifier: treat the multibyte ° as one character (assumes UTF-8 input)
print_r($result[0]);
explanation:
(['##°]) one character in the class is captured in group 1
.*? any character zero or more times in lazy mode
\1 group 1 content
With nested parenthesis:
$string="a,b,(c,(d,(e),f),t),g,'h, i j.',k,°l,m°,#o,p#,#q,r#,s";
preg_match_all('~([\'##°]).*?\1|(\((?:[^()]+|(?-1))*+\))|[^,]+~u', $string, $result); // u modifier: treat the multibyte ° as one character (assumes UTF-8 input)
print_r($result[0]);
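To see the recursive part on its own, here is a reduced sketch with the quote-like delimiters left out (the shortened sample string is an assumption for illustration). The relative reference (?-1) re-enters the group it sits in, so each nested pair of parentheses is consumed as part of the outer item.

```php
<?php
// Shortened sample with three levels of nesting.
$string = "a,b,(c,(d,(e),f),t),g";

// Group 1 matches "(" ... ")" and calls itself through (?-1) for inner pairs;
// anything else is split on commas by [^,]+.
preg_match_all('~(\((?:[^()]+|(?-1))*+\))|[^,]+~', $string, $result);

print_r($result[0]);
```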
