Regular expression for dotted string with exception - php

I'm not sure whether this is possible in a single regex, but it doesn't hurt to ask.
I created the expression:
/(?<variable>\w+)((\.(?<method>\w+)\((?<parameter>[^{}%]*)\))|(\.(?<subvariable>\w+)))?/i
which helps me "convert" dotted strings to array accesses or method calls:
core.settings => $core['settings']
core.set(param1, param2) => $core->set('param1', 'param2')
It works very well. But I have no idea how to build a multi-level expression that works like this:
string: core.settings
group <variable> = core
group <subvariable> = settings
string: core.get(param)
group <variable> = core
group <method> = get
group <parameter> = param
string: core.settings.time
group <variable> = core
group <subvariable> = settings.time
string: core.settings.time.set(param)
group <variable> = core
group <subvariable> = settings.time
group <method> = set
group <parameter> = param
Any ideas? Is this possible at all?

You can use
^(?<variable>\w+)(?:\.(?<subvariable>\w+(?:\.\w+)*))??(?:\.(?<method>\w+)\((?<parameter>[^{}%]*)\))?$
Details:
^ - start of string
(?<variable>\w+) - Group "variable": one or more word chars
(?:\.(?<subvariable>\w+(?:\.\w+)*))?? - an optional group matched lazily (it is tried only when the rest of the pattern cannot match without it): a . and then Group "subvariable" matching one or more word chars followed by zero or more occurrences of a . and one or more word chars
(?:\.(?<method>\w+)\((?<parameter>[^{}%]*)\))? - an optional sequence of
\. - a dot
(?<method>\w+) - Group "method": one or more word chars
\( - a ( char
(?<parameter>[^{}%]*) - Group "parameter": zero or more chars other than {, }, %
\) - a ) char
$ - end of string.
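As a quick check of the pattern above, here is a minimal sketch that runs it against the four strings from the question. The helper name parseDotted is made up for this illustration, and dropping empty groups is an assumption about what the caller wants (it would also hide an empty parameter list such as get()).

```php
<?php
// The pattern from the answer, wrapped in / delimiters for preg_match().
$re = '/^(?<variable>\w+)(?:\.(?<subvariable>\w+(?:\.\w+)*))??(?:\.(?<method>\w+)\((?<parameter>[^{}%]*)\))?$/';

// Hypothetical helper: returns only the named groups that captured text.
function parseDotted(string $re, string $input): array
{
    if (!preg_match($re, $input, $m)) {
        return [];
    }
    $out = [];
    foreach (['variable', 'subvariable', 'method', 'parameter'] as $name) {
        // Groups that did not participate are either missing or empty strings.
        if (isset($m[$name]) && $m[$name] !== '') {
            $out[$name] = $m[$name];
        }
    }
    return $out;
}

$inputs = [
    'core.settings',
    'core.get(param)',
    'core.settings.time',
    'core.settings.time.set(param)',
];
foreach ($inputs as $s) {
    echo $s, ' => ', json_encode(parseDotted($re, $s)), "\n";
}
```

The lazy ?? on the subvariable group is what lets the last input split into settings.time plus the set(param) call instead of swallowing set into the subvariable.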


regex expected value in a position depends on a random value in another position

I need a regex to find all shortcode tag pairs that look like this: [sc1-g-data]b[/sc1-g-data]. The number next to sc can vary, but it must be the same in the opening and the closing tag.
So something like this won't work: \[sc(.*?)\-((.|\n)*?)\[\/sc(.*?)\- because it also matches mismatched tag pairs such as [sc1-g-data]b[/sc2-g-data], which I don't want.
In other words, the number expected in the second tag depends on an arbitrary number in the first tag.
You may use a regex like:
\[(sc\d*-[^\]\[]*)\]([\s\S]*?)\[\/\1\]
\[ - a [ char
(sc\d*-[^\]\[]*) - Capturing group 1: sc, 0+ digits, -, and then 0+ chars other than ] and [
\] - a ] char
([\s\S]*?) - Capturing group 2: any 0+ chars, as few as possible
\[\/ - a [/ string
\1 - the same text stored in Group 1
\] - a ] char
PHP demo:
$pattern = '~\[(sc\d*-[^][]*)](.*?)\[/\1]~s';
$string = '[sc1-g-data]a[/sc1-g-data] ';
if (preg_match($pattern, $string, $matches)) {
    print_r($matches);
}
Mind the use of a single-quoted string literal. If you use a double-quoted one, you will need to write \\1, not \1, because '\1' != "\1" in PHP: inside double quotes, \1 is an octal escape for the character with code 1.
Output:
Array
(
[0] => [sc1-g-data]a[/sc1-g-data]
[1] => sc1-g-data
[2] => a
)
If your tags are just anything between brackets [blah][/blah] you can use:
\[(.*?)\].*?\[\/\1\]
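The quoting note and the backreference behaviour can both be checked with a short script. This is only a sketch: the variable names and the second, deliberately mismatched tag pair in the sample text are made up for the illustration.

```php
<?php
// Single-quoted literal: \1 reaches PCRE unchanged and acts as a backreference.
$single = '~\[(sc\d*-[^][]*)](.*?)\[/\1]~s';
// Double-quoted literal: every backslash is doubled so that PHP does not
// turn "\1" into the character with code 1 before PCRE sees the pattern.
$double = "~\\[(sc\\d*-[^][]*)](.*?)\\[/\\1]~s";

var_dump($single === $double); // bool(true): both literals yield the same pattern
var_dump("\1" === chr(1));     // bool(true): the unescaped form is a control character

// The second pair opens with sc1 but closes with sc2, so only the first pair matches.
$text = '[sc1-g-data]a[/sc1-g-data] [sc1-g-data]b[/sc2-g-data]';
preg_match_all($single, $text, $pairs, PREG_SET_ORDER);
foreach ($pairs as $pair) {
    echo $pair[1], ' => ', $pair[2], "\n"; // sc1-g-data => a
}
```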

Using php regex to split string

I'm having trouble splitting this string into components.
The example string I have is Criminal.Minds.S10E22.WEB-DL.x264-FUM[ettv]. I'm trying to split it into the following:
Criminal Minds, 10, 22.
Though I've dabbled a bit in perl regex, the php implementation is confusing me.
I've written the following:
$word = "Criminal.Minds.S10E22.WEB-DL.x264-FUM[ettv]";
// First replace periods and dashes by spaces
$patterns = array();
$patterns[0] = '/\./';
$patterns[1] = '/-/';
$replacement = ' ';
$word = preg_replace($patterns, $replacement, $word);
print_r(preg_split('#([a-zA-Z])+\sS(\d+)E(\d+)#i', $word));
Which outputs Array ( [0] => Criminal [1] => WEB DL x264 FUM[ettv] )
Please point me in the right direction.
Use matching rather than splitting if the string is always in this format:
$word = "Criminal.Minds.S10E22.WEB-DL.x264-FUM[ettv]";
preg_match('~^(?<name>.*?)\.S(?<season>\d+)E(?<episode>\d+)~', $word, $m);
print_r($m);
Then, you can access the name, season and episode values using $m["name"], $m["season"] and $m["episode"].
Pattern details:
^ - start of string
(?<name>.*?) - a "name" named capturing group matching any 0+ chars other than line break symbols, as few as possible, up to the first occurrence of the subpatterns that follow
\.S - .S substring of literal chars
(?<season>\d+) - a "season" named capturing group matching 1+ digits
E - a literal char E
(?<episode>\d+) - an "episode" named capturing group matching 1+ digits
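To get from the named groups to the "Criminal Minds, 10, 22" form asked for in the question, one possible post-processing step is sketched below. It assumes the dots inside the captured name only stand in for spaces; the variable names are chosen for this example.

```php
<?php
$word = "Criminal.Minds.S10E22.WEB-DL.x264-FUM[ettv]";

if (preg_match('~^(?<name>.*?)\.S(?<season>\d+)E(?<episode>\d+)~', $word, $m)) {
    // Assumption: dots inside the captured name are separators, not part of the title.
    $name = str_replace('.', ' ', $m['name']);
    // Casting to int is optional; it also drops leading zeros ("03" becomes 3).
    $season  = (int) $m['season'];
    $episode = (int) $m['episode'];

    echo "$name, $season, $episode\n"; // Criminal Minds, 10, 22
}
```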

Is it possible to match all attributes in a preg_match with empty or missing attributes?

I'm having a little bit of an issue with preg_match.
I have a string that can come with attributes in any order (eg. [foobar a="b" c="d" f="g"] or [foobar c="d" a="b" f="g"] or [foobar f="g" a="b" c="d"] etc.)
These are the patterns I have tried:
// Matches when all searched for attributes are present
// doesn't match if one of them is missing
// http://www.phpliveregex.com/p/dHi
$pattern = '\[foobar\b(?=\s)(?=(?:(?!\]).)*\s\ba=(["|'])((?:(?!\1).)*)\1)(?=(?:(?!\]).)*\s\bc=(["'])((?:(?!\3).)*)\3)(?:(?!\]).)*]'
// Matches only when attributes are in the right order
// http://www.phpliveregex.com/p/dHj
$pattern = '\[foobar\s+a=["\'](?<a>[^"\']*)["\']\s+c=["\'](?<c>[^"\']*).*?\]'
I'm trying to figure it out, but can't seem to get it right.
Is there a way to match all the attributes, even when other ones are missing or empty (a='')?
I've even toyed with explode at the spaces between the attributes followed by str_replace, but that seemed like overkill and not the right way to go about this.
In the links I've only matched for a="b" and c="d" but I also want to match these cases even if there is an e="f" or a z="x"
If you have the [...] strings as separate strings, not inside larger text, it is easy to use a \G based regex to mark a starting boundary ([some_text) and then match any key-value pair with some basic regex subpatterns using negated character classes.
Here is the regex:
(?:\[foobar\b|(?!^)\G)\s+\K(?<key>[^=]+)="(?<val>[^"]*)"(?=\s+[^=]+="|])
Here is what it matches in human words:
(?:\[foobar\b|(?!^)\G) - a leading boundary, the regex engine should find it first before proceeding, and it matches literal [foobar or the end of the previous successful match (\G matches the string start or position right after the last successful match, and since we need the latter only, the negative lookahead (?!^) excludes the beginning of the string)
\s+ - 1 or more whitespaces (they are necessary to delimit tag name with attribute values)
\K - regex operator that forces the regex engine to omit all the matched characters grabbed so far. A cool alternative to a positive lookbehind in PCRE.
(?<key>[^=]+) - Named capture group "key" matching 1 or more characters other than a =.
=" - matches a literal =" sequence
(?<val>[^"]*) - Named capture group "val" matching 0 or more characters (due to * quantifier) other than a "
" - a literal " that is a closing delimiter for a value substring.
(?=\s+[^=]+="|]) - a positive lookahead making sure there is a next attribute or the end of the [tag xx="yy"...] entity.
PHP code:
$re = '/(?:\[foobar\b|(?!^)\G)\s+\K(?<key>[^=]+)="(?<val>[^"]*)"(?=\s+[^=]+="|])/';
$str = "[foobar a=\"b\" c=\"d\" f=\"g\"]";
preg_match_all($re, $str, $matches);
print_r(array_combine($matches["key"], $matches["val"]));
Output: [a] => b, [c] => d, [f] => g.
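For the empty and missing attribute cases from the question, a sketch along the same lines could merge the matches into a set of defaults. The $defaults array and the sample string are assumptions made for this example, not part of the answer above.

```php
<?php
$re = '/(?:\[foobar\b|(?!^)\G)\s+\K(?<key>[^=]+)="(?<val>[^"]*)"(?=\s+[^=]+="|])/';

// Hypothetical defaults for the attributes we care about.
$defaults = ['a' => null, 'c' => null];

// Attributes out of order, "a" present but empty, plus an extra "z".
$str = '[foobar c="d" a="" z="x"]';

preg_match_all($re, $str, $matches);
$found = array_combine($matches['key'], $matches['val']);

// Values that were found override the defaults; missing keys keep theirs.
$attrs = array_merge($defaults, $found);
print_r($attrs);
```

An empty value such as a="" still matches because [^"]* allows zero characters, so it overrides the default with an empty string rather than being treated as missing.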
You could use the following function:
function toAssociativeArray($str) {
    // Single key/value pair extraction pattern:
    $pattern = '(\w+)\s*=\s*"([^"]*)"';
    $res = array();
    // Valid string?
    if (preg_match("/\[foobar((\s+$pattern)*)\]/", $str, $matches)) {
        // Yes, extract key/value pairs:
        preg_match_all("/$pattern/", $matches[1], $matches);
        for ($i = 0; $i < count($matches[1]); $i += 1) {
            $res[$matches[1][$i]] = $matches[2][$i];
        }
    }
    return $res;
}
This is how you could use it:
// Some test data:
$testData = array('[foobar a="b" c="d" f="g"]',
                  '[foobar a="b" f="g" a="d"]',
                  '[foobar f="g" a="b" c="d"]',
                  '[foobar f="g" a="b"]',
                  '[foobar f="g" c="d" f="x"]');
// Properties I am interested in, with a default value:
$base = array("a" => "null", "c" => "nothing", "f" => "");
// Loop through the test data:
foreach ($testData as $str) {
    // get the key/value pairs and merge with defaults:
    $res = array_merge($base, toAssociativeArray($str));
    // print value of the "a" property
    echo "value of a is {$res['a']} <br>";
}
This script outputs:
value of a is b
value of a is d
value of a is b
value of a is b
value of a is null

Regex/PHP Replace any repeating word group

How can I match a repeated word group, so that
$string = "Foo Bar (Any Group - ANY GROUP Baz)";
is returned as "Foo Bar (Any Group - Baz)"?
Is this possible without brute force, as used in "Replace repeating strings in a string"?
Edit:
* The group could consist of 1-4 words while each word could match [A-Za-z0-9\/\(\)]{1,30}
* The separator would always be -
Leaving the space out of the list of allowed "word" characters, the following works for your example:
$result = preg_replace(
    '%
    (                  # Match and capture
      (?:              # the following:...
        [\w/()]{1,30}  # 1-30 "word" characters
        [^\w/()]+      # 1 or more non-word characters
      ){1,4}           # 1 to 4 times
    )                  # End of capturing group 1
    ([ -]*)            # Match any number of intervening characters (space/dash)
    \1                 # Match the same as the first group
    %ix',              # Case-insensitive, verbose regex
    '\1\2', $subject);
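A quick check against the string from the question, as a sketch: the pattern below is the same expression written on one line without the comments (so the x flag is dropped), and $pattern is a variable name chosen for this example.

```php
<?php
// Compact form of the verbose pattern: group 1 is the 1-4 word run,
// group 2 the separator, and \1 the repeated run (case-insensitive via i).
$pattern = '%((?:[\w/()]{1,30}[^\w/()]+){1,4})([ -]*)\1%i';

$string = "Foo Bar (Any Group - ANY GROUP Baz)";
$result = preg_replace($pattern, '\1\2', $string);

echo $result, "\n"; // Foo Bar (Any Group - Baz)
```

The replacement keeps the first occurrence and the separator, so the casing of the first copy ("Any Group") is the one that survives.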

PHP regular expressions extract data inside brackets

I have a string like Western Australia 223/5 (59.3 ov)
I would like to split this string and extract the following information with regular expressions:
$team = 'Western Australia'
$runs = 223/5
$overs = 59.3
The issue is that the format of the text varies; it may be any of the following:
Western Australia 223/5 (59.3 ov)
Australia 223/5 (59.3 ov)
KwaZulu-Natal Inland
Sri Lanka v West Indies
Any help (for example, whether this is possible with a single regexp) will be appreciated.
if (preg_match(
    '%^                     # start of string
    (?P<team>.*?)           # Any number of characters, as few as possible (--> team)
    (?:\s+                  # Try to match the following group: whitespace plus...
      (?P<runs>\d+          # a string of the form number...
        (?:/\d+)?           # optionally followed by /number
      )                     # (--> runs)
    )?                      # optionally
    (?:\s+                  # Try to match the following group: whitespace plus...
      \(                    # (
      (?P<overs>[\d.]+)     # a number (optionally decimal) (--> overs)
      \s+ov\)               # followed by ov)
    )?                      # optionally
    \s*                     # optional whitespace at the end
    $                       # end of string
    %six',
    $subject, $regs)) {
    $team = $regs['team'];
    $runs = $regs['runs'];
    $overs = $regs['overs'];
} else {
    $result = "";
}
You might need to catch an error if the matches <runs> and/or <overs> are not actually present in the string. I don't know much about PHP. (Don't know much biology...SCNR)
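One way to guard against the missing-group case mentioned above is sketched here. In PHP, a trailing group that did not take part in the match is left out of the result array, while an unmatched group in the middle comes back as an empty string, so the sketch maps both to null. parseScore is a hypothetical helper name, and the pattern is the same expression written on one line without the comments.

```php
<?php
// Compact form of the pattern above (no comments, so the x flag is not needed).
$re = '%^(?P<team>.*?)(?:\s+(?P<runs>\d+(?:/\d+)?))?(?:\s+\((?P<overs>[\d.]+)\s+ov\))?\s*$%si';

// Hypothetical helper: returns team/runs/overs, with null for absent parts.
function parseScore(string $re, string $subject): ?array
{
    if (!preg_match($re, $subject, $regs)) {
        return null;
    }
    // Absent key or empty string both mean "this part was not in the input".
    $pick = function (string $key) use ($regs) {
        return ($regs[$key] ?? '') === '' ? null : $regs[$key];
    };
    return [
        'team'  => $regs['team'],
        'runs'  => $pick('runs'),
        'overs' => $pick('overs'),
    ];
}

foreach (['Western Australia 223/5 (59.3 ov)', 'KwaZulu-Natal Inland'] as $line) {
    var_dump(parseScore($re, $line));
}
```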
Assuming you are using preg_match, you can use the following:
preg_match('/^([\w\s-]+?)\s*(\d+\/\d+)?\s*(\(\d+\.\d+ ov\))?$/', $input, $matches);
Then you can inspect $matches to see which of the optional parts were found.
See preg_match documentation for more information.
