I have been using the regular expression below. My aim is to extract a number with one decimal place, like 8.4, from a string. The code I have used is:
$reg = "/[0-9]+[ ]+([0-9]\.[0-9])/";
preg_match_all($reg, $buffer, $matches);
For an input like
0000001222 86257 8.4
I am getting the array $matches as:
Array
(
[0] => Array
(
[0] => 86257 8.4
)
[1] => Array
(
[0] => 8.4
)
)
Why is the pattern matched 2 times? I would like my matches array to be like:
Array
(
[0] => 8.4
)
Match #0 in a regular expression is (almost) always the entire matched string. In your case, because you're using parentheses (a capture group) you're telling the expression to capture a subset of the matched string too. Capture groups are returned starting from match #1.
If you explicitly don't want to capture the full string, consider using lookarounds instead; note that you'll need to refactor your expression a bit, because lookbehinds in PHP (PCRE) must match a fixed-length string, and thus unbounded quantifiers (the +) are not allowed inside them.
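As a rough sketch of that refactoring (assuming the number is always preceded by a digit and exactly one space, which is what a fixed-length lookbehind needs), the full match can be reduced to just the number. The second pattern uses \K, which is a different technique from lookarounds: it keeps the original quantifiers and simply discards everything matched before it. The variable names $lookbehind and $reset are only illustrative.

```php
<?php
$buffer = '0000001222 86257 8.4';

// Fixed-length lookbehind: a digit and one space must precede the number.
preg_match_all('/(?<=[0-9] )[0-9]\.[0-9]/', $buffer, $lookbehind);

// \K resets the start of the reported match, so quantifiers are still allowed.
preg_match_all('/[0-9]+[ ]+\K[0-9]\.[0-9]/', $buffer, $reset);

print_r($lookbehind[0]); // full matches only, no capture groups involved
print_r($reset[0]);
```

In both cases the result still arrives as a nested array, because preg_match_all() always groups results by subpattern.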
You should check the manual (PREG_PATTERN_ORDER is the default):
Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.
So in your case the result you want will always be in $matches[1].
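A minimal sketch of reading that result, using the pattern and input from the question (the $numbers variable is just an illustrative name):

```php
<?php
$buffer = '0000001222 86257 8.4';
preg_match_all('/[0-9]+[ ]+([0-9]\.[0-9])/', $buffer, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$numbers = $matches[1];
print_r($numbers);
```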
I have the string:
<mml:mi>P</mml:mi><mml:mn>2</mml:mn>
and wish to retrieve the 2
My pattern is:
/(?:<mml:)(mn|mi|mo)>(.+)(?:<\/mml:\1>)$/
the return is the 2 as it should be,
but if the string is:
<mml:mi>P</mml:mi><mml:mi>s</mml:mi>
the pattern should then return the s, from inside the second set of tags, but returns the P from inside the first set
P</mml:mi><mml:mi>s
when changing the pattern as in the suggestion below to:
/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU
the return is the same. The line of php is:
preg_match('/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU', '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>', $ret, PREG_OFFSET_CAPTURE);
and $ret contains:
Array
(
[0] => Array
(
[0] => <mml:mi>P</mml:mi><mml:mi>s</mml:mi>
[1] => 0
)
[1] => Array
(
[0] => mi
[1] => 5
)
[2] => Array
(
[0] => P</mml:mi><mml:mi>s
[1] => 8
)
)
and when changed to the edited suggestion, with the ? removed
/<mml:(mn|mi|mo)>(.*)<\/mml:\1>/sU
the return is P, from the first occurrence, rather than the s from the second.
Typing from my phone, so will be brief.
Instead of matching any character (.+), match any character that is not the beginning of the next tag ([^<]+)
This way you don't have to worry about using back references, nor will you grab everything between two identical tags.
(Double check where I put the caret, this is off the top of my head. )
To get the last occurrence, wrap the whole regex in ()+
/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/
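A short sketch of why the earlier attempts behaved as they did, and of what the repeated group returns. It relies on two PCRE behaviours: the U modifier inverts greediness (so under U, .*? is greedy and .* is lazy), and a repeated capture group keeps the text from its last iteration. The variable names are illustrative.

```php
<?php
$input = '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>';

// Under the U flag the "lazy" .*? becomes greedy, so it spans both tags.
preg_match('/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU', $input, $withU);

// Repeating the whole tag pattern: each capture group keeps its last iteration.
preg_match('/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/', $input, $last);

echo $withU[2], "\n"; // content spanning both tags
echo $last[3], "\n";  // content of the last tag
```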
Here is an optimized pattern, which not only runs faster than Tim's, but also makes preg_match() return fewer elements in the output array:
~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~
Pattern Demo
Enhancements:
Replace standard pattern delimiter slash / with ~ to avoid escaping for improved brevity.
Use quantifiers for consecutive characters for improved efficiency. {2}
Use character class instead of pipes for improved efficiency and brevity. m[ino]
Use \K to start the fullstring match from middle of pattern, effectively removing the need for an extra capture group for improved efficiency.
Use negated character class to match desired character [^<] *note, if your desired substring is more than one character use: [^<]+
Use positive lookahead to accurately match the closing tag followed by the end-of-string anchor $.
PHP Implementation: (Demo)
echo preg_match('~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~','<mml:mi>P</mml:mi><mml:mi>s</mml:mi>',$out)?$out[0]:'fail';
Output:
s
I've used preg_match() to check a string coming from an XML feed (i.e. $resp = simplexml_load_file($API);) which returns upwards of 1000 items. With preg_match() I've extracted a bit of data from each item, which is stored in $matches, but I don't know how to make use of what preg_match() has stored in $matches.
Here's what I've got and what I've tried.
Note: I have print_r($matches); just so I could see the results while modifying the preg pattern.
$matches;
preg_match('/(?<=\s|^)[a-zA-Z]{5,19} ?-?\d\d\d\d\d\d\d\d?*(?=\s|$)/', $Apples, $matches);
print_r($matches);
/*Note: $matches returns an array as such: Array ( [0] => Stringdata ) Array ( [0] => moreStringdata ) Array ( [0] => stillmoreStringData ) Array ( [0] => evenmoreStringData ) Array ( [0] => moreStringDataStill )... and I'm just wanting to use array[0] from each in the $results string which is output to the screen */
$results.= "<div class='MyClass'><img src=\"$linktopicture\">$matches</div>";
I also tried $matches(), $matches[] and $matches[0] in the $results string, but nothing works. Since I don't know much about using arrays I thought I'd ask, so if anyone wouldn't mind setting me straight on what is probably very elementary, I'd be most appreciative. Thank you all in advance.
Make sure to read the preg_match doc page to understand the way the function works.
Firstly, check to see whether preg_match returns 1 (which means the value in $Apples does match the pattern) or 0 (which means $Apples does not match the pattern) or FALSE (which means an error occurred).
Assuming 1 is returned, then $matches[0] will contain the entire portion of the $Apples string which matches the pattern. If you have capture groups then the portion of that match which falls within the first capture group will be found in $matches[1], second in $matches[2] and so on.
If you can't share your regex pattern it's not possible to see whether your pattern contains any capture groups, so let's use this example:
preg_match("/key:([A-Z]+);value:([0-9]+)/", "key:ERRORCODE;value:500", $matches);
Now $matches[0] should contain "key:ERRORCODE;value:500" because the entire string matches the pattern, and $matches[1] should contain "ERRORCODE", and $matches[2] should contain "500", because these portions fit the patterns in the capture groups of the full pattern.
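Putting those steps together, here is a hedged sketch of a loop over feed items. The $items array is hypothetical stand-in data (the real items would come from the XML feed), and the pattern is the example one above rather than the asker's.

```php
<?php
// Hypothetical feed items; in the question these would come from the XML feed.
$items = ['key:ERRORCODE;value:500', 'nothing useful', 'key:OK;value:200'];
$results = '';

foreach ($items as $item) {
    // Only use $matches when preg_match() reports exactly one match.
    if (preg_match('/key:([A-Z]+);value:([0-9]+)/', $item, $matches) === 1) {
        // Interpolate a single element; the bare array would print as "Array".
        $results .= "<div class='MyClass'>{$matches[0]}</div>";
    }
}

echo $results;
```

The strict comparison with 1 also skips the FALSE error case, so $matches is never read after a failed call.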
I am trying to build a small system that captures variables from an HTML template.
The variables are defined as #XXX# and may (but are not required to) have extra parameters that are sent after a colon (:), e.g. #XXX#:j to send the data JSON-encoded.
What I managed to do is create a preg_match_all call to capture the variables and those extra parameters. I came up with this:
preg_match_all("/(#.*#(?:(j|n|x|z))?)/imU", $string,$this->localVariables,PREG_PATTERN_ORDER);
j|n|x|z are the available extra parameters.
the string that i send to $string is:
#geterr# #domain#:j #jhon#:n
the result i get from preg_match_all is:
Array
(
[0] => Array
(
[0] => #geterr#
[1] => #domain#
[2] => #jhon#
)
[1] => Array
(
[0] => #geterr#
[1] => #domain#
[2] => #jhon#
)
[2] => Array
(
[0] =>
[1] =>
[2] =>
)
)
I know (or think I know) that ?: is used for an optional subpattern.
The modifiers I've used are:
i for case-insensitive matching
m to allow my string to be multiline
U to be non-greedy
I have no clue what I am doing wrong.
Any help would be greatly appreciated.
There are some issues in your pattern /(#.*#(?:(j|n|x|z))?)/imU
You don't need a capturing group around the whole pattern.
?: creates a non-capturing group; it does not make the group optional (the ? after the closing parenthesis does that).
The modifier m is called multiline, but this is a bit misleading: it only affects the anchors ^ and $, so that they also match at the start and end of each line and not only of the whole string.
What you want is the modifier s, the singleline modifier. It treats the whole string as one line, and makes the . match newline characters too.
The modifier U makes your whole regex ungreedy. This is not what you want, because it also affects your optional group: since that group is at the end of the pattern, the ungreedy ? is satisfied by matching nothing, so the group will never match.
You need to match the : in your string
So I would remove U and make only the first quantifier ungreedy, by adding a ? after it.
So I think your regex should be:
/#(.*?)#(?::(j|n|x|z))?/is
This would put the first part between the # in the first capturing group and the parameter in the second group.
See it here on Regexr
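As a quick check, here is a small sketch running that pattern against the string from the question with the default PREG_PATTERN_ORDER (the $found variable name is only illustrative):

```php
<?php
$string = '#geterr# #domain#:j #jhon#:n';
preg_match_all('/#(.*?)#(?::(j|n|x|z))?/is', $string, $found);

print_r($found[1]); // variable names
print_r($found[2]); // parameters; an empty string where none was given
```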
I am trying to parse an integer from a list of URIs like this:
uri.com/upload/123456789_abc.ext
I am using this pattern:
preg_match( "#uri\.com\/upload\/(.*?)_#is", $uri, $match );
Which works and returns:
Array
(
[0] => uri.com/upload/123456789_
[1] => 123456789
)
But I was wondering if there's a way to make $match == "123456789" instead of returning an array with multiple values.
Is it possible to do it by only modifying the pattern?
It will always return an array, but you can change the pattern, so that it only matches what you want.
$uri = "uri.com/upload/123456789_abc.ext";
preg_match('#(?<=uri\.com/upload/)\d+#is', $uri, $match );
print_r($match);
returns
Array ( [0] => 123456789 )
So it is still an array, but it contains only the whole match, which is your number.
(?<=uri\.com/upload/) is a lookbehind; it does not consume that part, so it is not part of the result.
\d+ is only matching digits, so the _ is not needed anymore.
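If a plain string is wanted rather than an array, one possible sketch is to unwrap the single element right after the call. The $id variable and the null fallback are assumptions for illustration, not part of the original question.

```php
<?php
$uri = 'uri.com/upload/123456789_abc.ext';

// preg_match() returns 1 on a match; fall back to null otherwise.
$id = preg_match('#(?<=uri\.com/upload/)\d+#', $uri, $match) === 1 ? $match[0] : null;

var_dump($id);
```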
Not in PHP. In Perl, on the other hand, you do have automatic variables like $1 and $2 that refer to the capture groups of the last successful match. That is, in your example, after performing the match, the $1 variable would hold the integer.
So the idea is generally ok. You would love perl.. :-)
Suppose I have '/srv/www/site.com/htdocs/system/application/views/' and want to test it against a regexp that matches each directory name in the path?
Something like this pattern: '(/[^/])'
That yields an array with 'srv','www','site.com'... etc.
PS: the regexp syntax I wrote is just to illustrate, it's not tested and surely wrong, but just to give an idea.
PS2: I know there's explode() but let's see if we can do this with a regexp (it's useful for other languages and frameworks which don't have explode).
preg_match_all:
$str = '/srv/www/site.com/htdocs/system/application/views/';
preg_match_all('/\/([^\/]+)/', $str, $matches);
// $matches[0] contains matching strings
// $matches[1] contains first subgroup matches
print_r($matches[1]);
Output:
Array
(
[0] => srv
[1] => www
[2] => site.com
[3] => htdocs
[4] => system
[5] => application
[6] => views
)
There is preg_split for splitting strings on regular expressions, similar to explode, and then there is preg_match_all, which does what you want.
I don't think you can, but you could instead use preg_match_all() to get multiple matches from a regular expression. There is also preg_split() which may be more appropriate.
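For comparison, a minimal sketch of the preg_split() route on the same path; the PREG_SPLIT_NO_EMPTY flag is what removes the empty pieces caused by the leading and trailing slash:

```php
<?php
$str = '/srv/www/site.com/htdocs/system/application/views/';

// Split on every slash and drop the empty strings at both ends.
$dirs = preg_split('#/#', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($dirs);
```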