Getting multiple subpatterns with the same name - php

Following up on my previous post, I'm trying to match all use statements in a class file with a regular expression.
<?php
use Vendor\ProjectArticle\Model\Peer,
Vendor\Library\Template;
use Vendor\Blablabla;
$file = file_get_contents($class_path);
$a = preg_match_all('#use (?:(?<ns>[^,;]+),?)+;#mi', $file, $use);
var_dump(array('$a' => $a, '$use' => $use));
Unfortunately, I don't get all the namespaces when a single use statement lists several class names. Only the last one matched is stored.
Array
(
[$a] => 2
[$use] => Array
(
[0] => Array
(
[0] => use Vendor\ProjectArticle\Model\Peer,
Vendor\Library\Template;
[1] => use Vendor\Blablabla;
)
[ns] => Array
(
[0] =>
Vendor\Library\Template
[1] => Vendor\Blablabla
)
[1] => Array
(
[0] =>
Vendor\Library\Template
[1] => Vendor\Blablabla
)
)
)
Can this be accomplished with some pattern modifier or something?
~Thanks

You should be able to use the \G anchor for this.
# '~(?:(?!\A)\G|^Use\s+),?\s*(?<ns>[^,;]+)(?=(?:,|[^,;]*)*;)~mi'
(?xmi-) # Inline modifier = expanded, multiline, case insensitive
(?:
(?! \A ) # Not beginning of string
\G # If matched before, start at end of last match
| # or,
^ Use \s+ # Beginning of line then 'Use' + whitespace
)
,? \s* # Whitespace trim
(?<ns> [^,;]+ ) # (1), A namespace value
(?= # Lookahead, each match validates a final ';'
(?: , | [^,;]* )*
;
)
Output:
** Grp 0 - ( pos 0 , len 36 )
use Vendor\ProjectArticle\Model\Peer
** Grp 1 - ( pos 4 , len 32 )
Vendor\ProjectArticle\Model\Peer
---------------------
** Grp 0 - ( pos 36 , len 30 )
,
Vendor\Library\Template
** Grp 1 - ( pos 43 , len 23 )
Vendor\Library\Template
---------------------
** Grp 0 - ( pos 69 , len 20 )
use Vendor\Blablabla
** Grp 1 - ( pos 73 , len 16 )
Vendor\Blablabla
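To collect the namespaces in PHP, the \G pattern above can be passed to preg_match_all and the ns group read back. This is a minimal sketch, assuming the class file contents are already in a string; the $source sample and the variable names are made up for illustration.

```php
<?php
// Illustrative input standing in for file_get_contents($class_path).
$source = <<<'SRC'
use Vendor\ProjectArticle\Model\Peer,
    Vendor\Library\Template;
use Vendor\Blablabla;
SRC;

// Same pattern as in the answer above, written on one line.
// \G lets each further match continue where the previous one ended,
// so every name in a comma-separated use statement becomes its own match.
$pattern = '~(?:(?!\A)\G|^use\s+),?\s*(?<ns>[^,;]+)(?=(?:,|[^,;]*)*;)~mi';

preg_match_all($pattern, $source, $matches);

// One entry per namespace, in source order.
$namespaces = $matches['ns'];
print_r($namespaces);
```

Each namespace is a separate match here, rather than a repeated group inside one match, which is why none of them is overwritten.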


Time String to Seconds

How can I parse strings with a regex to calculate the total seconds?
The strings will look like this, for example:
40s
11m1s
1h47m3s
I started with the following regex
((\d+)h)((\d+)m)((\d+)s)
But this regex will only match the last example.
How can I make the parts optional?
Is there a better regex?
The format that you are using is very similar to the one that is used by java.time.Duration:
https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html#parse-java.lang.CharSequence-
Maybe you can use it instead of writing something custom?
Duration uses a format like this:
PT1H47M3S
Maybe you can add the leading "PT" and parse it (not sure if you have to uppercase)?
The format is called "ISO-8601":
https://en.wikipedia.org/wiki/ISO_8601
For example,
$set = array(
'40s',
'11m1s',
'1h47m3s'
);
$date = new DateTime();
$date2 = clone $date; // start from the same instant so the difference is only the added intervals
foreach ($set as $value) {
$date2->add(new DateInterval('PT'.strtoupper($value)));
}
echo $date2->getTimestamp() - $date->getTimestamp(); // 7124 = 1hour 58mins 44secs.
You could use optional non-capture groups, for each (\dh, \dm, \ds):
$strs = ['40s', '11m1s', '1h47m3s'];
foreach ($strs as $str) {
if (preg_match('~(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?~', $str, $matches)) {
print_r($matches);
}
}
Outputs:
Array
(
[0] => 40s
[1] => // h
[2] => // m
[3] => 40 // s
)
Array
(
[0] => 11m1s
[1] => // h
[2] => 11 // m
[3] => 1 // s
)
Array
(
[0] => 1h47m3s
[1] => 1 // h
[2] => 47 // m
[3] => 3 // s
)
Regex:
(?: # non-capture group 1
( # capture group 1
\d+ # 1 or more number
) # end capture group1
h # letter 'h'
) # end non-capture group 1
? # optional
(?: # non-capture group 2
( # capture group 2
\d+ # 1 or more number
) # end capture group 2
m # letter 'm'
) # end non-capture group 2
? # optional
(?: # non-capture group 3
( # capture group 3
\d+ # 1 or more number
) # end capture group 3
s # letter 's'
) # end non-capture group 3
? # optional
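To get from the captured groups to a total, the three numbers only need to be weighted and summed. This is a rough sketch built on the optional-group regex above; the function name durationToSeconds is hypothetical, and the ^ and $ anchors plus the empty-string check are additions so that unrelated input is rejected.

```php
<?php
// Hypothetical helper: converts strings such as "1h47m3s" to seconds.
// Returns null when the string does not fit the h/m/s format.
function durationToSeconds(string $str): ?int
{
    // Every part is optional, so an empty string would also match; reject it.
    if ($str === '' || !preg_match('~^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$~', $str, $m)) {
        return null;
    }

    // Unmatched groups are either empty strings or missing from $m.
    $hours   = (int) ($m[1] ?? 0);
    $minutes = (int) ($m[2] ?? 0);
    $seconds = (int) ($m[3] ?? 0);

    return $hours * 3600 + $minutes * 60 + $seconds;
}

foreach (['40s', '11m1s', '1h47m3s'] as $str) {
    echo $str, ' => ', durationToSeconds($str), PHP_EOL;
}
```

The three sample strings give 40, 661 and 6423 seconds, which add up to the 7124 shown in the DateInterval example.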
This expression:
/(\d*?)s|(\d*?)m(\d*?)s|(\d*?)h(\d*?)m(\d*?)s/gm
returns 3 matches, one for each line, with the digits of each unit in their own capture groups. (The /gm flags are regex-tester notation; in PHP, drop the g and call preg_match_all instead.)
The gist is that this matches either digits before an 's', or digits before an 'm' followed by digits before an 's', or digits before an 'h' followed by the 'm' and 's' parts.

regexp monetary strings with decimals and thousands separator

https://www.tehplayground.com/KWmxySzbC9VoDvP9
Why is the first string matched?
$list = [
'3928.3939392', // Should not be matched
'4.239,99',
'39',
'3929',
'2993.39',
'393993.999'
];
foreach($list as $str){
preg_match('/^(?<![\d.,])-?\d{1,3}(?:[,. ]?\d{3})*(?:[^.,%]|[.,]\d{1,2})-?(?![\d.,%]|(?: %))$/', $str, $matches);
print_r($matches);
}
output
Array
(
[0] => 3928.3939392
)
Array
(
[0] => 4.239,99
)
Array
(
[0] => 39
)
Array
(
[0] => 3929
)
Array
(
[0] => 2993.39
)
Array
(
)
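The first string matches because the separator in [,. ]? is optional: the pattern can read the input as 3, 928, .393, 939 and then let [^.,%] consume the final 2.
You seem to want to match the numbers as standalone strings, so you do not need the lookarounds; you only need anchors, with the separator made mandatory.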
You may use
^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$
Details
^ - start of string
-? - an optional -
(?: - start of a non-capturing alternation group:
\d{1,3}(?:[,. ]\d{3})* - 1 to 3 digits, followed by zero or more sequences of a comma, dot or space and then 3 digits
| - or
\d* - 0+ digits
) - end of the group
(?:[.,]\d{1,2})? - an optional sequence of . or , followed with 1 or 2 digits
$ - end of string.
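A quick way to check the anchored pattern is to filter the question's list with it. This is a small sketch; the variable names are illustrative.

```php
<?php
// Anchored pattern from the answer above.
$pattern = '/^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$/';

$list = [
    '3928.3939392', // should not be matched
    '4.239,99',
    '39',
    '3929',
    '2993.39',
    '393993.999',
];

// Keep only the strings the pattern accepts.
$matched = array_values(array_filter($list, function ($str) use ($pattern) {
    return preg_match($pattern, $str) === 1;
}));

print_r($matched);
```

Only '4.239,99', '39', '3929' and '2993.39' pass. Because every part of the pattern is optional, an empty string passes too, so it may be worth rejecting empty input separately.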

Matching math between < > symbols

I'm trying to build a function that matches the math expression between two greater (or equal) or smaller (or equal) symbols.
I have the following preg_match function:
preg_match("/(<=?|>=?)(([0-9]|\+|\(|\))+)(<=?|>=?)/", "2<(2+2)<8", $matches);
When I read the $matches array I get:
Array
(
[0] => <(2+2)<
[1] => <
[2] => (2+2)
[3] => )
[4] => <
)
Can anyone explain why the closing ) gets matched both as part of (2+2) and on its own? I would like it to match only the whole (2+2).
Because you've got two capturing groups for the expression between comparison signs:
(<=?|>=?)(([0-9]|\+|\(|\))+)(<=?|>=?)
         ^^              ^ ^
         |`----- $3 -----' |
         `------- $2 ------'
Change it to
(<=?|>=?)((?:[0-9]|\+|\(|\))+)(<=?|>=?)
           ^^
Because you have a quantified capture group, (...)+.
Each pass through the group overwrites what it captured on the previous pass.
The result is that you only see the last capture.
You can see the group below, marked as (3 start) and (3 end).
( <=? | >=? ) # (1)
( # (2 start)
( # (3 start)
[0-9]
| \+
| \(
| \)
)+ # (3 end)
) # (2 end)
( <=? | >=? ) # (4)
The individual pieces are of no use in this case;
changing the inner group to a non-capturing (cluster) group will exclude it from the output array.
( <=? | >=? ) # (1)
( # (2 start)
(?:
[0-9]
| \+
| \(
| \)
)+
) # (2 end)
( <=? | >=? ) # (3)
Output
** Grp 0 - ( pos 0 , len 7 )
<(2+2)<
** Grp 1 - ( pos 0 , len 1 )
<
** Grp 2 - ( pos 1 , len 5 )
(2+2)
** Grp 3 - ( pos 6 , len 1 )
<
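For reference, this is how the corrected pattern behaves when run through preg_match with the input from the question. It is a minimal sketch with nothing beyond the pattern and string already shown.

```php
<?php
// Inner group made non-capturing with (?: ... ), so it no longer
// shows up as a separate entry in $matches.
$pattern = '/(<=?|>=?)((?:[0-9]|\+|\(|\))+)(<=?|>=?)/';

preg_match($pattern, '2<(2+2)<8', $matches);

// [0] whole match, [1] left operator, [2] expression, [3] right operator
print_r($matches);
```

The array now has four entries, and the stray ")" from the repeated group is gone.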

Parsing parameter from string | regex | php

I am having problems parsing parameters from a string.
Parameters are defined as follows:
They can be written in short or long notation, e.g.:
-a / --long
Characters range over [a-z0-9] for short and [a-z0-9\-] for long notation, e.g.:
--long-with-dash
They can have a value, but don't have to, e.g.:
-a test / --aaaa
They can have multiple arguments without being in quotes, e.g.:
-a val1 val2
(that should be captured as one group: value = "val1 val2")
They can have custom text inside quotes:
--custom "here can stand everything, --test test :( "
A parameter can have a "!" in front:
! --test test / ! -a
Values can have "-" inside:
-a value-with-dash
All these parameters come in one long string, e.g.:
-a val1 ! -b val2 --other "string with crazy -a --test stuff inside" --param-with-dash val1 val2 -test value-with-dash ! -c -d ! --test
-- EDIT ----
also --param value-with-dash
-- END EDIT ---
This is as close as I can get:
https://regex101.com/r/3aPHzp/1
/(?:(?P<inverted>\!) )?(?P<names>\-{1,2}\S+)($| (?P<values>.+(?=(?: [\!|\-])|$)))/U
Unfortunately, it breaks on a free-text value inside quotes, and when a parameter without a value is followed by the next parameter.
(I'm trying to parse the output of iptables-save, in case you are interested. Maybe I could split the string in some other way first to avoid a huge regex, but I don't see how.)
Thank you very much for your help!
-- FINAL SOLUTION --
for PHP >= 5.6
(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K
Demo: https://regex101.com/r/xSfgxP/1
for PHP < 5.6
(?<inverted>\!)?\s*(?<=(?:\s)|^)(?<name>\-{1,2}\w[\w\-]*)\s+(?<value>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)
RegEx:
(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S+|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K
Breakdown
(?<inverted> ! )? # (1) Named-capturing group for inverted result
\s* # Match any spaces
(?<name> --? \w [\w-]* ) # (2) Named-capturing group for parameter name
\s* # Match any spaces
(?<values> # (3 start) Named capturing group for values
(?: # Beginning of a non-capturing group (a)
\s* # Match any spaces
(?: # Beginning of a non-capturing group (b)
\w\S+ # Match a [a-zA-Z0-9_] character then one or more non-whitespace characters
| # Or
["'] # Match a quotation mark
(?: # Beginning of a non-capturing group (c)
[^"'\\]* # Match anything except `"`, `'` or `\`
(?: \\ . [^"'\\]* )* # Match an escaped character then anything except `"`, `'` or `\` as much as possible
) # End of non-capturing group (c)
['"] # Match the closing quotation mark
) # End of non-capturing group (b)
)* # Greedy (a), end of non-capturing group (a)
) # (3 end)
\K # Reset the start of the reported match, so everything matched so far is dropped from group 0
PHP code:
<?php
$str = '-a val1 ! -b val2 --custom "string :)(#with crazy -a --test stuff inside" --param-with-dash val1 val2 -c ! -d ! --test';
$re = <<< 'RE'
~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S+|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
RE;
preg_match_all($re, $str, $matches, PREG_SET_ORDER);
print_r(array_map('array_filter', $matches));
Output:
Array
(
[0] => Array
(
[name] => -a
[2] => -a
[values] => val1
[3] => val1
)
[1] => Array
(
[inverted] => !
[1] => !
[name] => -b
[2] => -b
[values] => val2
[3] => val2
)
[2] => Array
(
[name] => --custom
[2] => --custom
[values] => "string :)(#with crazy -a --test stuff inside"
[3] => "string :)(#with crazy -a --test stuff inside"
)
[3] => Array
(
[name] => --param-with-dash
[2] => --param-with-dash
[values] => val1 val2
[3] => val1 val2
)
[4] => Array
(
[name] => -c
[2] => -c
)
[5] => Array
(
[inverted] => !
[1] => !
[name] => -d
[2] => -d
)
[6] => Array
(
[inverted] => !
[1] => !
[name] => --test
[2] => --test
)
)
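The named groups can then be turned into a flatter structure for further processing. This is a rough sketch that reuses the pattern above unchanged; the shortened input string and the name/inverted/values array layout are assumptions made for illustration.

```php
<?php
// Same pattern as above, kept in a nowdoc so no backslash needs doubling.
$re = <<<'RE'
~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S+|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
RE;

// Shortened, made-up input in the same style as the question's string.
$str = '-a val1 ! -b --flag "x y"';

preg_match_all($re, $str, $sets, PREG_SET_ORDER);

$params = [];
foreach ($sets as $m) {
    $params[] = [
        'name'     => $m['name'],
        'inverted' => $m['inverted'] === '!', // unmatched group is an empty string
        'values'   => $m['values'],           // quotes are kept as written
    ];
}

print_r($params);
```

Because of the trailing \K, group 0 is always empty, so only the named groups carry information.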

Regexp tip request

I have a string like
"first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"
I want to explode it to array
Array (
0 => "first",
1 => "second[,b]",
2 => "third[a,b[1,2,3]]",
3 => "fourth[a[1,2]]",
4 => "sixth"
)
I tried to remove brackets:
preg_replace("/\[ ( (?>[^\[\]]+) | (?R) )* \]/xis",
"",
"first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"
);
But I got stuck on the next step.
PHP's regex flavor supports recursive patterns, so something like this would work:
$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth";
preg_match_all('/[^,\[\]]+(\[([^\[\]]|(?1))*])?/', $text, $matches);
print_r($matches[0]);
which will print:
Array
(
[0] => first
[1] => second[,b]
[2] => third[a,b[1,2,3]]
[3] => fourth[a[1,2]]
[4] => sixth
)
The key here is not to split, but match.
Whether you want to add such a cryptic regex to your code base, is up to you :)
EDIT
I just realized that my suggestion above will not match entries starting with [. To do that, do it like this:
$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]";
preg_match_all("/
( # start match group 1
[^,\[\]] # any char other than a comma or square bracket
| # OR
\[ # an opening square bracket
( # start match group 2
[^\[\]] # any char other than a square bracket
| # OR
(?R) # recursively match the entire pattern
)* # end match group 2, and repeat it zero or more times
] # a closing square bracket
)+ # end match group 1, and repeat it once or more times
/x",
$text,
$matches
);
print_r($matches[0]);
which prints:
Array
(
[0] => first
[1] => second[,b]
[2] => third[a,b[1,2,3]]
[3] => fourth[a[1,2]]
[4] => sixth
[5] => [s,[,e,[,v,],e,],n]
)
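If the recursive regex feels too cryptic for the code base, the same split can be done without a regex by tracking bracket depth. This is an alternative technique, not the answer's method; the function name splitTopLevel is hypothetical, and the sketch assumes the brackets are balanced.

```php
<?php
// Hypothetical helper: splits on commas that are not inside [ ... ].
function splitTopLevel(string $text): array
{
    $parts = [];
    $current = '';
    $depth = 0; // how many brackets are currently open

    foreach (str_split($text) as $ch) {
        if ($ch === '[') {
            $depth++;
        } elseif ($ch === ']') {
            $depth--;
        }

        if ($ch === ',' && $depth === 0) {
            // Top-level comma: close the current entry.
            $parts[] = $current;
            $current = '';
        } else {
            $current .= $ch;
        }
    }

    $parts[] = $current; // last entry has no trailing comma
    return $parts;
}

$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]";
$parts = splitTopLevel($text);
print_r($parts);
```

Unlike the matching approach, this one splits, so an empty entry between two top-level commas is kept as an empty string.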
