Get all strings matching pattern in text - php

I'm trying to get from text all strings which are between t(" and ") or t(' and ').
I came up with the regexp /[^t\(("|\')]*(?=("|\')\))/, but it also drops the character 't' when that 't' is not directly before '('.
For example:
$str = 'This is a text, t("string1"), t(\'string2\')';
preg_match_all('/[^t\(("|\')]*(?=("|\')\))/', $str, $m);
var_dump($m);
returns ring1 and ring2, but I need to get string1 and string2.
You can consider this also.

You need to use separate regex for each.
(?<=t\(").*?(?="\))|(?<=t\(\').*?(?='\))
Code:
$re = "/(?<=t\\(\").*?(?=\"\\))|(?<=t\\(\\').*?(?='\\))/m";
$str = "This is a text, t(\"string1\"), t('string2')";
preg_match_all($re, $str, $matches);
OR
Use capturing group along with \K
t\((['"])\K.*?(?=\1\))
\K resets the start of the reported match, so the characters matched before it (here t( and the opening quote) are left out of the final result.
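The \K variant can be tried with a short script. This is only a sketch using the sample string from the question; the variable name $found is an assumption made for illustration.

```php
<?php
// Sample text from the question; $found is a name chosen for this sketch.
$str = 'This is a text, t("string1"), t(\'string2\')';

// (['"]) captures the opening quote, \K drops "t(" and the quote from the
// reported match, and the lookahead requires the same quote followed by ")".
preg_match_all('/t\(([\'"])\K.*?(?=\1\))/', $str, $m);

$found = $m[0];   // whole matches only: string1 and string2
print_r($found);
```

Because of \K the wanted text is in $m[0]; $m[1] only holds the quote character that opened each string.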

You can do it in a few steps with this pattern:
$pattern = '~t\((?|"([^"\\\]*+(?s:\\\.[^"\\\]*)*+)"\)|\'([^\'\\\]*+(?s:\\\.[^\'\\\]*)*+)\'\))~';
if (preg_match_all($pattern, $str, $matches))
print_r($matches[1]);
It is a little long and repetitive, but it is fast and can deal with escaped quotes.
details:
t\(
(?| # Branch reset feature allows captures to have the same number
"
( # capture group 1
[^"\\]*+ # all that is not a double quote or a backslash
(?s: # non capturing group in singleline mode
\\. # an escaped character
[^"\\]* # all that is not a double quote or a backslash
)*+
)
"\)
| # OR the same with single quotes (and always in capture group 1)
'([^'\\]*+(?s:\\.[^'\\]*)*+)'\)
)
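To see how the pattern copes with escaped quotes, here is a small sketch. The input string is made up for illustration; a nowdoc is used so that the backslashes reach the regex engine exactly as typed.

```php
<?php
// Nowdoc keeps the backslashes as typed (sample input assumed for this sketch).
$str = <<<'TXT'
t("say \"hi\"") and t('it\'s')
TXT;

// Same pattern as in the answer, written as a single-quoted PHP string.
$pattern = '~t\((?|"([^"\\\]*+(?s:\\\.[^"\\\]*)*+)"\)|\'([^\'\\\]*+(?s:\\\.[^\'\\\]*)*+)\'\))~';

if (preg_match_all($pattern, $str, $matches)) {
    // Group 1 holds the raw contents, escape sequences included.
    print_r($matches[1]);
}
```

The captured values still contain the backslashes; whether to unescape them afterwards (for example with stripslashes) depends on what the strings are used for.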


Split and catch text by a variable delimiter

I have a text which include delimiter tags in the following format:
<!--[od]+-\d+-->
Example:
<!--od-14-->
<!--od-1-->
<!--od-65-->
I need a regex which will split the text and catch the \d+ numeric argument in the split, also the text after it.
Here's the regex I came up with; the problem is that it does not match text that spans multiple lines.
https://regex101.com/r/xvw8Xw/2
One option is to make the dot match a newline using for example an inline modifier (?s). Then use a non greedy match with a positive lookahead to assert the next comment or the end of the string:
(?s)<\!--[od]+-(\d+)-->(.*?)(?=<!--|$)
(?s) Inline modifier, make the dot match a newline
<\!-- match <!--
[od]+-(\d+)--> Match 1+ times either o or d (which might just be od), then -, capture 1+ digits in group 1, then match -->
(.*?) Group 2: match any char, including a newline because of (?s), 0+ times, non greedy
(?=<!--|$) Positive lookahead, assert what is on the right is <!-- or the end of the string
For example using /s in the pattern:
$re = '/<\!--[od]+-(\d+)-->(.*?)(?=<!--|$)/s';
$str = '<!--od-1--> cdskc sdkjc
dsd
sk<!--od-2-->cscdscsdcsd
cdscs
csdcsdc
<!--od-432-->cdcdscsd';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
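This expression might also work here without any flag, because [\s\S] already matches newlines (do not add the m flag: it lets $ match at the end of every line, which would cut the lazy match short):
<!--od-(\d+)-->([\s\S]*?)(?=<|$)
or this one in s mode:
<!--od-(\d+)-->(.*?)(?=<|$)
Note that the lookahead (?=<|$) stops the captured text at any <, not only at the next comment.
Test:
$re = '/<!--od-(\d+)-->(.*?)(?=<|$)/s';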
$str = '<!--od-1--> cdskc sdkjc
dsd
sk<!--od-2-->cscdscsdcsd
cdscs
csdcsdc
<!--od-432-->cdcdscsd';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
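Since the question asks to split the text, preg_split may be worth a look as well. This is a different technique from the answers above, shown only as a sketch: the sample string and the variable names $parts and $pairs are assumptions.

```php
<?php
// Shortened sample input, made up for this sketch.
$str = "<!--od-1--> first\nblock<!--od-2-->second<!--od-432-->third";

// PREG_SPLIT_DELIM_CAPTURE keeps the captured number of each delimiter
// in the result, between the text pieces.
$parts = preg_split('/<!--od-(\d+)-->/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

array_shift($parts);               // drop the text before the first tag (empty here)
$pairs = array_chunk($parts, 2);   // [[number, text], [number, text], ...]
print_r($pairs);
```

No s flag is needed here, because the text between the tags is never matched by a dot; it is simply whatever is left between two delimiters.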

Need Help About php preg_match

I have a string:
access":"YOU HAVE 0 BALANCE","machine
How can I extract the string in between the double quotes and have only the text (without the double quotes):
YOU HAVE 0 BALANCE
I have tried
if(preg_match("'access":"(.*?)","machine'", $tok2, $matches)){
but with no luck :( .
You may use
'/access":"(.*?)","machine/'
The value you need is in Group 1.
Details
access":" - a literal substring
(.*?) - Group 1: any 0+ chars other than line break chars, as few as possible, since *? is a lazy quantifier
","machine - a literal substring
For example, in PHP:
$re = '/access":"(.*?)","machine/';
$str = 'access":"YOU HAVE 0 BALANCE","machine';
if (preg_match($re, $str, $matches)) {
print_r($matches[1]);
}
// => YOU HAVE 0 BALANCE
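If the surrounding markers change from case to case, the same idea can be wrapped in a small function. The helper name textBetween is hypothetical and not part of PHP; this is only a sketch of one way to do it.

```php
<?php
// Hypothetical helper: returns the text between two literal markers, or null.
function textBetween(string $text, string $start, string $end): ?string
{
    // preg_quote escapes regex metacharacters (and the "/" delimiter)
    // so the markers are matched literally.
    $re = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    return preg_match($re, $text, $m) === 1 ? $m[1] : null;
}

$tok2 = 'access":"YOU HAVE 0 BALANCE","machine';
$balance = textBetween($tok2, 'access":"', '","machine');
echo $balance, "\n";
```

The s flag is an extra assumption here: it lets the value span several lines, which the original pattern does not allow.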

How can I remove a specific format from string with RegEx?

I have a list of string like this
$16,500,000(#$2,500)
$34,000(#$11.00)
$214,000(#$18.00)
$12,684,000(#$3,800)
How can I remove all symbols and the (#$xxxx) part from these strings so that they look like this:
16500000
34000
214000
12684000
\(.*?\)|\$|,
Try this and replace each match with an empty string. See the demo:
https://regex101.com/r/vD5iH9/42
$re = "/\\(.*?\\)|\\$|,/m";
$str = "\$16,500,000(#\$2,500)\n\$34,000(#\$11.00)\n\$214,000(#\$18.00)\n\$12,684,000(#\$3,800)";
$subst = "";
$result = preg_replace($re, $subst, $str);
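To remove the trailing (#$xxxx) part, you could use the regex:
\(\#\$.+\)
and replace it with nothing:
preg_replace('/\(\#\$.+\)/', '', $myStringToReplaceWith);
PHP has no g (global) modifier: preg_replace already replaces every match by default. Put the pattern in single quotes, because inside double quotes \$ turns into a plain $, which the regex would then read as an end-of-line anchor.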
Here's a breakdown of that regex:
\( matches the ( character literally
\# matches the # character literally
\$ matches the $ character literally
.+ matches any character 1 or more times
\) matches the ) character literally
In order to remove all of these characters:
$ , ( ) # .
From a string, you could use the regex:
\$|\,|\(|\)|#|\.
Which will match all of the characters above.
The | character above is the regex or operator, effectively making it so
$ OR , OR ( OR ) OR # OR . will be matched.
Next, you could replace every match with nothing using preg_replace, which replaces all occurrences by default (PHP has no g modifier):
preg_replace('/\$|\,|\(|\)|#|\./', '', $myStringToReplaceWith);
So in the end, your code could look like this:
$str = preg_replace('/\(\#\$.+\)/', '', $str);
$str = preg_replace('/\$|\,|\(|\)|#|\./', '', $str);
Although it isn't in one regex, it does not use any look-ahead, or look-behind (both of which are not bad, by the way).
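The two-step clean-up can be checked against the sample values from the question. This is a sketch only: the array name $lines is an assumption, and the second step uses a character class for $ and , instead of the longer alternation.

```php
<?php
$lines = [
    '$16,500,000(#$2,500)',
    '$34,000(#$11.00)',
    '$214,000(#$18.00)',
    '$12,684,000(#$3,800)',
];

// Step 1: drop the trailing "(#$...)" part.
// preg_replace accepts an array subject and replaces every match in each element.
$clean = preg_replace('/\(#\$.+\)/', '', $lines);

// Step 2: drop the remaining "$" and "," characters.
$clean = preg_replace('/[$,]/', '', $clean);
print_r($clean);
```

Inside a character class $ is a literal character, so it needs no backslash there.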

Preg_replace pattern for BBCode quote tag

I want to use the PHP preg_replace function on two strings but I am unsure of the regular expression to use.
For the first string, I only require the author value (so everything after author=, but nothing after the space):
[quote author=username link=1150111054/0#7 date=1150151926]
Result:
[quote=username]
For the second string, there is no author= tag. The username simply appears after a closed open quote
[quote] username link=1142890417/0#43 date=1156429613]
Ideally, the result should be:
[quote=username]
Make the string author= and the ] optional in order to do the replacement on both types of strings.
Regex:
^\[(\S+?)\]?\s+(?:author=)?(\S+).*$
If you want to match the literal string quote in your regex, use this:
^\[(quote)\]?\s+(?:author=)?(\S+).*$
Replacement string:
[$1=$2]
<?php
$string =<<<EOT
[quote author=username link=1150111054/0#7 date=1150151926]
[quote] username link=1142890417/0#43 date=1156429613]
EOT;
echo preg_replace("~^\[(\S+?)\]?\s+(?:author=)?(\S+).*$~m", "[$1=$2]", $string);
?>
Output:
[quote=username]
[quote=username]
For the first one: /author=(.*?) /
And for the second one /\[quote\] (.*?) /
In your case:
$str1 = "[quote author=username link=1150111054/0#7 date=1150151926]";
$str2 = "[quote] username link=1142890417/0#43 date=1156429613]";
$regex1 = '/author=(.*?) /';
$regex2 = '/\[quote\] (.*?) /';
if (preg_match($regex1, $str1, $match1)) {
    $newStr1 = '[quote=' . $match1[1] . ']'; // build the new tag first, then print it
    echo $newStr1;
}
if (preg_match($regex2, $str2, $match2)) {
    $newStr2 = '[quote=' . $match2[1] . ']';
    echo $newStr2;
}
Here is another way to handle both with a single regex.
# Find: '~(?|\[quote\]\s*(\S+).*|\[quote\s+author=\s*(\S+).*)~'
# Replace: '[quote=$1]'
(?|
\[quote\] \s*
( \S+ )
.*
|
\[quote \s+ author= \s*
( \S+ )
.*
)
Input:
[quote author=username link=1150111054/0#7 date=1150151926]
[quote] username link=1142890417/0#43 date=1156429613]
Output:
[quote=username]
[quote=username]
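As a runnable sketch, the branch-reset pattern can be passed to preg_replace like this. The replacement used here is [quote=$1], the form asked for in the question; the variable names are assumptions.

```php
<?php
$string = <<<'EOT'
[quote author=username link=1150111054/0#7 date=1150151926]
[quote] username link=1142890417/0#43 date=1156429613]
EOT;

// Branch reset (?|...) makes both alternatives store the user name in group 1,
// so one replacement string covers both tag formats.
$pattern = '~(?|\[quote\]\s*(\S+).*|\[quote\s+author=\s*(\S+).*)~';
$result = preg_replace($pattern, '[quote=$1]', $string);
echo $result, "\n";
```

The trailing .* stops at the end of each line, so the two tags are rewritten independently even though they sit in one string.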

Match all occurrences of a string

My search text is as follows.
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
It contains many lines (it is actually a JavaScript file), but I need to parse the values in the variable strings, i.e. aaa, bbb, ccc, ddd, eee.
Following is the Perl code, or use PHP at bottom
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";
I know the above script will match all instances, but it will also parse strings ("xyz") in the other lines. So I need to check for the string var strings =
/var strings = \[(?:\"(.+?)\",?)/g
Using above regex it will parse aaa.
/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g
Using above, will get aaa , and bbb. So to avoid the regex repeating I used '+' quantifier as below.
/var strings = \[(?:\"(.+?)\",?)+/g
But I got only eee, So my question is why I got eee ONLY when I used '+' quantifier?
Update 1: Using PHP preg_match_all (doing it to get more attention :-) )
$str = <<<STR
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR;
preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);
Update 2: Why it matched eee ? Because of the greediness of (?:\"(.+?)\",?)+ . By removing greediness /var strings = \[(?:\"(.+?)\",?)+?/ aaa will be matched. But why only one result? Is there any way it can be achieved by using single regex?
Here's a single-regex solution:
/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g
\G is a zero-width assertion that matches the position where the previous match ended (or the beginning of the string if it's the first match attempt). So this acts like:
var\s+strings\s*=\s*\[\s*"([^"]*)"
...on the first attempt, then:
,\s*"([^"]*)"
...after that, but each match has to start exactly where the last one left off.
This works in PHP (with preg_match_all), and it will work in Perl, too.
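A minimal PHP sketch of the \G approach, assuming a small made-up JavaScript snippet as input (the extra var lines are there to show that other arrays are skipped):

```php
<?php
$str = <<<'JS'
var other = ["zzz"];
var strings = ["aaa","bbb","ccc","ddd","eee"];
var more = ["yyy"];
JS;

// The first alternative anchors on "var strings = [", the second one continues
// from the end of the previous match (\G) after a comma.
$re = '/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/';
preg_match_all($re, $str, $m);

$values = $m[1];   // one captured string per match
print_r($values);
```

One caveat: \G also matches at the very start of the subject, so a text that begins with ,"x" would produce a stray first match. That does not happen with input like the one above.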
You may prefer this solution which first looks for the string var strings = [ using the /g modifier. This sets \G to match immediately after the [ for the next regex, which looks for all immediately following occurrences of double-quoted strings, possibly preceded by commas or whitespace.
my @matches;
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
@matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}
Despite using the /g modifier, your regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once because there is no second occurrence of var strings = [. Each match returns a list of the values of the capture variables $1, $2, $3 etc. as they stood when the match completed, and /(?:"(.+?)",?)+/ (there is no need to escape the double quotes) captures multiple values into $1 in turn, leaving only the final value there. You need to write something like the above, which captures only a single value into $1 for each match.
Because the + repeats the group (?:"(.+?)",?) one or more times within a single match. Each repetition overwrites capture group 1, so when the match finishes only the last captured value, eee, is left.
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();
The regular expression:
(?-imsx:var strings = \[(?:"(.+?)",?)+)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
var strings = 'var strings = '
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.+? any character except \n (1 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
,? ',' (optional (matching the most amount
possible))
----------------------------------------------------------------------
)+ end of grouping
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
A simpler example would be:
my @m = ('abcd' =~ m/(\w)+/g);
print "@m";
Prints only d. This is due to:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();
The regular expression:
(?-imsx:(\w)+)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1 (1 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\w word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
)+ end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
If you use the quantifier on the capture group, only the last instance will be used.
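The same behaviour can be observed in PHP. This short sketch uses the abcd example; the variable names are chosen for illustration.

```php
<?php
// A quantified capture group keeps only its last repetition.
preg_match('/(\w)+/', 'abcd', $m);
$whole = $m[0];   // the full match
$last  = $m[1];   // group 1 after the final repetition

// Moving the repetition out of the pattern and into preg_match_all
// returns every character as its own match.
preg_match_all('/\w/', 'abcd', $all);
$each = $all[0];

print_r([$whole, $last, $each]);
```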
Here's a way that works:
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches;
$str =~ m/var strings = \[(.+?)\]/; # get the array first
my $jsarray = $1;
@matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that
print "@matches";
Update:
A single-line solution (though not a single regex) would be:
@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;
But this is highly unreadable imho.
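For the PHP side of the question, the two-step approach could look roughly like this. It is a sketch under the assumption that the array literal sits on one line and contains no ] inside its strings.

```php
<?php
$str = <<<'JS'
var other = ["zzz"];
var strings = ["aaa","bbb","ccc","ddd","eee"];
JS;

$values = [];
// Step 1: isolate the body of the array literal.
if (preg_match('/var strings = \[(.+?)\]/', $str, $m)) {
    // Step 2: collect every double-quoted string inside it.
    preg_match_all('/"(.+?)"/', $m[1], $inner);
    $values = $inner[1];
}
print_r($values);
```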
