Regex/PHP Replace any repeating word group

How can I match a word group that is repeated after a separator, ignoring case?
$string = "Foo Bar (Any Group - ANY GROUP Baz)";
This should be returned as "Foo Bar (Any Group - Baz)".
Is it possible without brute force, unlike the approach in "Replace repeating strings in a string"?
Edit:
* The group could consist of 1-4 words while each word could match [A-Za-z0-9\/\(\)]{1,30}
* The separator would always be -

Leaving the space out of the list of allowed "word" characters, the following works for your example:
$result = preg_replace(
'%
( # Match and capture
(?: # the following:...
[\w/()]{1,30} # 1-30 "word" characters
[^\w/()]+ # 1 or more non-word characters
){1,4} # 1 to 4 times
) # End of capturing group 1
([ -]*) # Match any number of intervening characters (space/dash)
\1 # Match the same as the first group
%ix', # Case-insensitive, verbose regex
'\1\2', $string);
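As a quick check, the pattern above can be wrapped in a small function and run against the sample string from the question. This is only a sketch: the function name collapseRepeatedGroup is hypothetical, and the pattern is the one from the answer with the comments reworded.

```php
<?php
// Hypothetical helper name; the regex itself is taken from the answer above.
function collapseRepeatedGroup(string $text)
{
    return preg_replace('%
        (                   # capture group 1:
          (?:
            [\w/()]{1,30}   #   1-30 "word" characters
            [^\w/()]+       #   1 or more non-word characters
          ){1,4}            #   repeated 1 to 4 times
        )
        ([ -]*)             # group 2: intervening spaces and dashes
        \1                  # the text of group 1 again (case-insensitive via the i flag)
        %ix', '\1\2', $text);
}

$collapsed = collapseRepeatedGroup('Foo Bar (Any Group - ANY GROUP Baz)');
echo $collapsed, "\n"; // Foo Bar (Any Group - Baz)
```

The backreference \1 is compared case-insensitively because of the i flag, which is why "ANY GROUP " counts as a repeat of "Any Group ".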

Related

How to do preg_replace that only matches particular conditions?

I am struggling to write a preg_replace command that achieves what I need.
Essentially I have the following array (all the items follow one of these four patterns):
$array = array('Dogs/Cats', 'Dogs/Cats/Mice', 'ANIMALS/SPECIES Dogs/Cats/Mice', '(Animals/Species) Dogs/Cats/Mice' );
I need to be able to get the following result:
Dogs/Cats = Dogs or Cats
Dogs/Cats/Mice = Dogs or Cats or Mice
ANIMALS/SPECIES Dogs/Cats/Mice = ANIMALS/SPECIES Dogs or Cats or Mice
(Animals/Species) Dogs/Cats/Mice = (Animals/Species) Dogs or Cats or Mice
So basically replace slashes in anything that isn't capital letters or brackets.
I am starting to grasp it but still need some guidance:
preg_replace('/(\(.*\)|[A-Z]\W[A-Z])[\W\s\/]/', '$1 or', $array);
As you can see this recognises the first patterns but I don't know where to go from there
Thanks!
You can use the \G anchor to assert the position at the end of the previous match, and \K to forget what has been matched so far, so that only the / is part of the match.
You could optionally match ANIMALS/SPECIES or (Animals/Species) at the start.
(?:^(?:\(\w+/\w+\)\h+|[A-Z]+/[A-Z]+\h+)?|\G(?!^))\w+\K/
Explanation
(?: Non capturing group
^ Assert start of string
(?: Non capturing group, match either
\(\w+/\w+\)\h+ Match between (....) 1+ word chars with a / between ending with 1+ horizontal whitespace chars
| Or
[A-Z]+/[A-Z]+\h+ Match 1+ times [A-Z], a /, again 1+ times [A-Z], then 1+ horizontal whitespace chars
)? Close non capturing group and make it optional
| Or
\G(?!^) Assert the position at the end of the previous match, but not at the start of the string
)\w+ Close non capturing group and match 1+ times a word char
\K/ Forget what was matched, and match a /
In the replacement, use " or " (a space, the word or, and a space).
For example
$array = array('Dogs/Cats', 'Dogs/Cats/Mice', 'ANIMALS/SPECIES Dogs/Cats/Mice', '(Animals/Species) Dogs/Cats/Mice');
$re = '~(?:^(?:\(\w+/\w+\)\h+|[A-Z]+/[A-Z]+\h+)?|\G(?!^))\w+\K/~';
$array = preg_replace($re, " or ", $array);
print_r($array);
Result:
Array
(
[0] => Dogs or Cats
[1] => Dogs or Cats or Mice
[2] => ANIMALS/SPECIES Dogs or Cats or Mice
[3] => (Animals/Species) Dogs or Cats or Mice
)
The way you present your problem with your example strings, doing:
$result = preg_replace('~(?:\S+ )?[^/]*+\K.~', ' or ', $array);
seems to be enough. In other words, you only have to check whether there is a space somewhere: if so, consume the beginning of the string up to and including that space, and discard it from the match result using \K.
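A minimal sketch of that short pattern applied to the four sample strings, plus one extra input that is not from the question (a prefix containing a space) to show where it stops being enough:

```php
<?php
$samples = [
    'Dogs/Cats',
    'Dogs/Cats/Mice',
    'ANIMALS/SPECIES Dogs/Cats/Mice',
    '(Animals/Species) Dogs/Cats/Mice',
];
// (?:\S+ )? optionally skips everything up to the first space,
// [^/]*+ skips up to the next slash, and \K keeps only that slash in the match.
$simple = preg_replace('~(?:\S+ )?[^/]*+\K.~', ' or ', $samples);
print_r($simple);

// Hypothetical extra input: the prefix itself contains a space,
// so only "WILD " is skipped and the slash inside the prefix is replaced too.
$broken = preg_replace('~(?:\S+ )?[^/]*+\K.~', ' or ', 'WILD ANIMALS/SPECIES Dogs/Cats');
echo $broken, "\n"; // WILD ANIMALS or SPECIES Dogs or Cats
```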
But to avoid future disappointments, it is sometimes useful to put yourself in the shoes of the Devil to consider more complex cases and ask embarrassing questions:
What if a category, a subcategory or an item contains a space?
~
(?:^
(?:
\( [^)]* \)
|
\p{Lu}+ (?> [ ] \p{Lu}+ \b )*
(?> / \p{Lu}+ (?> [ ] \p{Lu}+ \b )* )*
)
[ ]
)?
[^/]*+ \K .
~xu
In the same way, to deal with hyphens, single quotes or whatever, you can replace [ ] with [^\pL/] (a class that excludes letters and the slash) or something more specific.
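A sketch of the verbose pattern in use. The pattern is unchanged from above; the input strings are made up for illustration and are not part of the question.

```php
<?php
// Verbose pattern from the answer above (x: free spacing, u: UTF-8 mode).
$re = '~
    (?:^
        (?:
            \( [^)]* \)
          |
            \p{Lu}+ (?> [ ] \p{Lu}+ \b )*
            (?> / \p{Lu}+ (?> [ ] \p{Lu}+ \b )* )*
        )
        [ ]
    )?
    [^/]*+ \K .
~xu';

// Hypothetical inputs with spaces inside categories and items.
$spaced = preg_replace($re, ' or ', [
    'WILD ANIMALS/SPECIES Dogs/Big Cats',
    '(Farm Animals/Species) Dogs/Cats',
    'Dogs/Cats',
]);
print_r($spaced);
```

The \b after each uppercase run is what stops the prefix from swallowing the "D" of "Dogs": an uppercase letter followed directly by a lowercase one is not at a word boundary, so that repetition is rejected.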

Split a string that contains decimals instead of integers

I split a string '3(1-5)' like this:
$pattern = '/^(\d+)\((\d+)\-(\d+)\)$/';
preg_match($pattern, $string, $matches);
But I need to do the same thing for decimals, i.e. '3.5(1.5-4.5)'.
And what do I have to do, if the user writes '3,5(1,5-4,5)'?
Output of '3.5(1.5-4.5)' should be:
$matches[1] = 3.5
$matches[2] = 1.5
$matches[3] = 4.5
You can use the following regular expression.
$pattern = '/^(\d+(?:[.,]\d+)?)\(((?1))-((?1))\)$/';
The first capturing group ( ... ) matches the following pattern:
( # group and capture to \1:
\d+ # digits (0-9) (1 or more times)
(?: # group, but do not capture (optional):
[.,] # any character of: '.', ','
\d+ # digits (0-9) (1 or more times)
)? # end of grouping
) # end of \1
Afterwards, we look for an opening parenthesis, then recurse into (match and capture) the 1st subpattern, followed by a hyphen (-), then recurse into the 1st subpattern again, followed by a closing parenthesis.
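A minimal sketch of how the pattern could be used, assuming the numbers should end up as floats. The helper name parseRange and the comma-to-dot normalisation are assumptions added here, not part of the answer.

```php
<?php
// Hypothetical helper: returns the three numbers as floats, or null if the input does not match.
function parseRange(string $input): ?array
{
    // (?1) re-uses the sub-pattern of group 1 instead of repeating it.
    $pattern = '/^(\d+(?:[.,]\d+)?)\(((?1))-((?1))\)$/';
    if (preg_match($pattern, $input, $m) !== 1) {
        return null;
    }
    // Treat a decimal comma like a decimal point before converting.
    return array_map(
        function ($n) { return (float) str_replace(',', '.', $n); },
        array_slice($m, 1)
    );
}

print_r(parseRange('3.5(1.5-4.5)'));
print_r(parseRange('3,5(1,5-4,5)'));
```

Both calls give 3.5, 1.5 and 4.5; plain integers such as '3(1-5)' still match because the fractional part is optional.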
This pattern should help (note that it requires at least two digits in each number, so it does not match '3(1-5)'):
^(\d+\.?\,?\d+)\((\d+\,?\.?\d+)\-(\d+\.?\,?\d+)\)$

Preg_match/Preg_replace in php for matching pattern and replacing it in php

I want to replace the values in a string with XXX.
input:
insert into employees values('shrenik', 555, NULL)
output:
insert into employees values('XXX', XXX, NULL)
I tried this: ([0-9]|\'.*\')
I want to match insert into first and then skip everything up to the (. The pattern I tried and the output I need are shown above.
Thanks in advance.
You can use this:
$sql = 'insert into employees values(\'shrenik\', 555, NULL)';
$pattern = '~(?:\binsert into [^(]*\(|\G(?<!^),(?:\s*+NULL,)*)\s*+\K(\')?(?(1)[^\']*\'|(?!NULL\b)[^\s,)]*)~i';
$sql = preg_replace($pattern, '$1XXX$1', $sql);
pattern details
~ # pattern delimiter
(?: # non capturing group: where the pattern is allowed to start
\binsert into [^(]*\( # after "insert into" until the opening parenthesis
| # OR
\G(?<!^), # after a previous match if there is a comma
(?:\s*+NULL,)* # skip NULL values
)
\s*+ # zero or more spaces
\K # reset all that was matched before from match result
(')? # optional capture group 1 with single quote
(?(1) # IF capture group 1 exists:
[^']*' # THEN matches all characters except ' followed by a literal '
| # ELSE
(?!NULL\b)[^\s,)]* # matches all characters except whitespace, comma and ), unless the value is NULL
) # ENDIF
~i # closing pattern delimiter, case-insensitive
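A short sketch that runs the pattern on the statement from the question and on one additional statement. The second statement is a made-up input, added to show that a NULL in the middle is skipped.

```php
<?php
$pattern = '~(?:\binsert into [^(]*\(|\G(?<!^),(?:\s*+NULL,)*)\s*+\K(\')?(?(1)[^\']*\'|(?!NULL\b)[^\s,)]*)~i';

$statements = [
    "insert into employees values('shrenik', 555, NULL)",
    "insert into employees values('a', NULL, 42)", // hypothetical extra input
];
// '$1XXX$1' re-adds the quotes only when group 1 captured one.
$masked = preg_replace($pattern, '$1XXX$1', $statements);
print_r($masked);
```

Because \G only continues from the end of the previous match, values are replaced only while they follow each other directly inside the parentheses.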

Regex/PHP Replace any repeating (but flexible) word group

How can I match "Any Group" repeated as "ANY GROUP" or "ANYGROUP"
$string = "Foo Bar (Any Group - ANY GROUP Baz)
Foo Bar (Any Group - ANYGROUP Baz)";
so they return as "Foo Bar (Any Group - Baz)"
The separator would always be -
This post extends Regex/PHP Replace any repeating word group
This matches "Any Group - ANY GROUP" but not when repeated without blank.
$result = preg_replace(
'%
( # Match and capture
(?: # the following:...
[\w/()]{1,30} # 1-30 "word" characters
[^\w/()]+ # 1 or more non-word characters
){1,4} # 1 to 4 times
) # End of capturing group 1
([ -]*) # Match any number of intervening characters (space/dash)
\1 # Match the same as the first group
%ix', # Case-insensitive, verbose regex
'\1\2', $subject);
This is ugly (as I said it would be), but it should work:
$result = preg_replace(
'/((\b\w+)\s+) # One repeated word
\s*-\s*
\2
|
((\b\w+)\s+(\w+)\s+) # Two repeated words
\s*-\s*
\4\s*\5
|
((\b\w+)\s+(\w+)\s+(\w+)\s+) # Three
\s*-\s*
\7\s*\8\s*\9
|
((\b\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+) # Four
\s*-\s*
\11\s*\12\s*\13\s*\14\b/ix',
'\1\3\6\10-', $subject);
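A sketch that applies this alternation to both variants from the question, with and without the blank in the repeated group. The variable names are chosen here for illustration.

```php
<?php
$pattern = '/((\b\w+)\s+)                  # one repeated word
    \s*-\s*
    \2
  |
    ((\b\w+)\s+(\w+)\s+)                   # two repeated words
    \s*-\s*
    \4\s*\5
  |
    ((\b\w+)\s+(\w+)\s+(\w+)\s+)           # three
    \s*-\s*
    \7\s*\8\s*\9
  |
    ((\b\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+)   # four
    \s*-\s*
    \11\s*\12\s*\13\s*\14\b/ix';

// Only one alternative takes part in a match, so the unused groups are empty in the replacement.
$deduped = preg_replace($pattern, '\1\3\6\10-', [
    'Foo Bar (Any Group - ANY GROUP Baz)',
    'Foo Bar (Any Group - ANYGROUP Baz)',
]);
print_r($deduped);
```

The \s* between the backreferences is what allows "ANYGROUP" to count as a repeat of "Any Group".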
A solution for up to 6 words:
$result = preg_replace(
'/
(\(\s*)
(([^\s-]+)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*))
(\s*\-\s*)
\3\s*\4\s*\5\s*\6\s*\7\s*\8\s*
/ix',
'\1\2\9',
$string);

Match all occurrences of a string

My search text is as follows.
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
It contains many lines (it is actually a JavaScript file), but I need to parse the values in the variable strings, i.e. aaa, bbb, ccc, ddd, eee.
Following is the Perl code, or use PHP at bottom
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";
I know the above script will match all instances, but it will also parse strings ("xyz") in the other lines. So I need to check for the string var strings =
/var strings = \[(?:\"(.+?)\",?)/g
Using above regex it will parse aaa.
/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g
Using above, will get aaa , and bbb. So to avoid the regex repeating I used '+' quantifier as below.
/var strings = \[(?:\"(.+?)\",?)+/g
But I got only eee, So my question is why I got eee ONLY when I used '+' quantifier?
Update 1: Using PHP preg_match_all (doing it to get more attention :-) )
$str = <<<STR
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR;
preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);
Update 2: Why did it match eee? Because of the greediness of (?:\"(.+?)\",?)+. By removing the greediness, /var strings = \[(?:\"(.+?)\",?)+?/, aaa will be matched. But why only one result? Is there any way this can be achieved using a single regex?
Here's a single-regex solution:
/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g
\G is a zero-width assertion that matches the position where the previous match ended (or the beginning of the string if it's the first match attempt). So this acts like:
var\s+strings\s*=\s*\[\s*"([^"]*)"
...on the first attempt, then:
,\s*"([^"]*)"
...after that, but each match has to start exactly where the last one left off.
Here's a demo in PHP, but it will work in Perl, too.
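A minimal PHP sketch of the \G approach with preg_match_all. The surrounding lines in the input are made up, to check that quoted strings elsewhere in the file are not picked up.

```php
<?php
// Hypothetical input: only the "strings" array should be read.
$js = <<<'JS'
var other = "xyz";
var strings = ["aaa","bbb","ccc","ddd","eee"];
var more = ["zzz","yyy"];
JS;

// First alternative: start at "var strings = [".
// Second alternative: continue with a comma exactly where the previous match ended.
$re = '/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/';
preg_match_all($re, $js, $found);
print_r($found[1]); // aaa, bbb, ccc, ddd, eee
```

Once the closing ] is reached, \G, can no longer match, so the chain ends and the array on the last line is ignored.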
You may prefer this solution which first looks for the string var strings = [ using the /g modifier. This sets \G to match immediately after the [ for the next regex, which looks for all immediately following occurrences of double-quoted strings, possibly preceded by commas or whitespace.
my @matches;
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
@matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}
Despite using the /g modifier, your regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once because there is no second occurrence of var strings = [. Each match returns a list of the values of the capture variables $1, $2, $3 etc. when the match completed, and /(?:"(.+?)",?)+/ (there is no need to escape the double-quotes) captures multiple values into $1, leaving only the final value there. You need to write something like the above, which captures only a single value into $1 for each match.
Because the + tells it to repeat the group (?:"(.+?)",?) one or more times within a single match. Each repetition overwrites the capture, so the group matches "aaa", "bbb" and so on in turn, and when the match ends only the last value, "eee", is left in $1.
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();
The regular expression:
(?-imsx:var strings = \[(?:"(.+?)",?)+)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
var strings = 'var strings = '
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.+? any character except \n (1 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
,? ',' (optional (matching the most amount
possible))
----------------------------------------------------------------------
)+ end of grouping
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
A simpler example would be:
my @m = ('abcd' =~ m/(\w)+/g);
print "@m";
Prints only d. This is due to:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();
The regular expression:
(?-imsx:(\w)+)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1 (1 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\w word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
)+ end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
If you use the quantifier on the capture group, only the last instance will be used.
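The same behaviour can be reproduced in PHP. This is a small sketch with variable names chosen for illustration:

```php
<?php
// A quantified capture group is overwritten on every repetition.
preg_match('/(\w)+/', 'abcd', $last);
echo $last[0], ' ', $last[1], "\n"; // abcd d

// The pattern from the question behaves the same way: only the final string survives.
$line = 'var strings = ["aaa","bbb","ccc","ddd","eee"];';
preg_match('/var strings = \[(?:"(.+?)",?)+/', $line, $fromQuestion);
echo $fromQuestion[1], "\n"; // eee

// Capturing once per match and letting preg_match_all repeat keeps every value.
preg_match_all('/(\w)/', 'abcd', $each);
print_r($each[1]); // a, b, c, d
```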
Here's a way that works:
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches;
$str =~ m/var strings = \[(.+?)\]/; # get the array first
my $jsarray = $1;
@matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that
print "@matches";
Update:
A single-line solution (though not a single regex) would be:
@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;
But this is highly unreadable imho.
