regex issue with ab*a - php

a* means zero or more of a.
In the string 'abbabba' we have two occurrences of abba: (abba)bba and abb(abba).
preg_match_all matches only the first occurrence.
Am I missing some regex fundamental?
$string = 'abbabba';
preg_match_all("/ab*a/", $string, $matches);
print_r($matches);
Array ( [0] => Array ( [0] => abba ) )

Searches subject for all matches to the regular expression given in pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued on from end of the last match.
Source

Solution: Use a lookahead assertion together with a capturing group:
preg_match_all('/(?=(ab*a))/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1];

Because ab*a consumes the text it matches. The engine matches the first occurrence, abba, and then resumes the search from bba, which contains no further match for your pattern.
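A minimal sketch of what "consumes" means here, using the question's input. PREG_OFFSET_CAPTURE reports where the match starts; the explicit offset of 4 passed to preg_match is only a hand-made illustration of where the next search begins, not something preg_match_all exposes.

```php
<?php
$string = 'abbabba';

// With PREG_OFFSET_CAPTURE each match is returned as [text, byte offset].
preg_match_all('/ab*a/', $string, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]); // one entry: 'abba' starting at offset 0

// The first match covers offsets 0-3, so the scan resumes at offset 4 ('bba').
// Repeating that step by hand shows there is nothing left to match.
$found = preg_match('/ab*a/', $string, $rest, 0, 4);
var_dump($found); // int(0)
```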

To find number of occurrences go with:
preg_match_all('/(?=ab*a)/', $input, $result);
print(count($result[0]));
To find matches, use:
preg_match_all('/(?=(ab*a))/', $input, $result);
print_r($result[1]);
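As a rough illustration of both snippets on the question's string, the sketch below also adds PREG_OFFSET_CAPTURE (my addition, not part of the answer) so the overlapping start positions become visible.

```php
<?php
$input = 'abbabba';

// A bare lookahead matches an empty string at every position where
// ab*a could start, so counting the matches counts the occurrences.
preg_match_all('/(?=ab*a)/', $input, $positions);
$count = count($positions[0]);
echo $count, "\n"; // 2

// Wrapping the pattern in a capture group keeps the text, and
// PREG_OFFSET_CAPTURE shows where each overlapping occurrence begins.
preg_match_all('/(?=(ab*a))/', $input, $result, PREG_OFFSET_CAPTURE);
foreach ($result[1] as [$text, $offset]) {
    echo "$text at offset $offset\n"; // abba at offset 0, then abba at offset 3
}
```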

The * is a greedy quantifier. When the engine reaches b* it matches as many consecutive b characters as it can, and gives some back only if the rest of the pattern would otherwise fail.
First it looks to match the a, which it does right away at the start of the string. Then it sees b*, so it matches every b that follows that a (the * makes the b match greedily), having matched abb at this point. It then needs another a to complete the match, so it takes the next character, which is an a, and it's done. That leaves bba, which doesn't match your pattern. Hope this helps.
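To see the greedy behaviour in isolation, here is a small sketch comparing b* with its lazy form b*? on the same string. The lazy variants are my own addition for contrast and are not part of the question.

```php
<?php
// Greedy: b* takes every consecutive b it can before the engine moves on.
preg_match('/ab*/', 'abbabba', $greedy);
// Lazy: b*? takes as few b's as possible, here none at all.
preg_match('/ab*?/', 'abbabba', $lazy);

echo $greedy[0], "\n"; // abb
echo $lazy[0], "\n";   // a

// With the trailing a required, both forms have to consume 'bb' to succeed,
// so they end up with the same match.
preg_match('/ab*a/', 'abbabba', $m1);
preg_match('/ab*?a/', 'abbabba', $m2);
echo $m1[0], ' ', $m2[0], "\n"; // abba abba
```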
If you do what Omega said you will achieve epic victory.

Related

Capture two regular expression from string

I have strings containing this (where the number are integers representing the user id)
#[calbert](3)
#[username](684684)
I figured I need the following to get the username and user id
\((.*?)\)
and
\[(.*?)\]
But is there a way to get both at once?
Also, given what PHP returns, is it possible to get only the result without the parentheses (and without the brackets in the username case)?
Array
(
[0] => (3)
[1] => 3
)
\[([^\]]*)\]|\(([^)]*)\)
Try this (see demo). You need the | (alternation) operator, which lets the regex engine try the second capturing group when the first alternative fails.
https://regex101.com/r/tX2bH4/31
$re = "/\\[([^\\]]*)\\]|\\(([^)]*)\\)/im";
$str = " #[calbert](3)\n #[username](684684)";
preg_match_all($re, $str, $matches);
Or you can use your own regex: \((.*?)\)|\[(.*?)\]
Another option is to use positive lookbehind and lookahead assertions:
(?<=\[)[^\]]*(?=])|(?<=\()\d+(?=\))
(?<=\[) asserts that the match must be preceded by a [ character.
[^\]]* matches any character except ], zero or more times.
(?=]) asserts that the match must be followed by a ] character.
| is the alternation (logical OR) operator, used to combine two regexes.
DEMO
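A short sketch of the lookaround pattern applied to the two sample strings. Pairing the values with array_chunk is a hypothetical follow-up step of mine; it assumes every name is followed by its id.

```php
<?php
$str = "#[calbert](3)\n#[username](684684)";

// Lookarounds check for the brackets and parentheses without including them,
// so the full match in $matches[0] is already the bare value.
$re = '/(?<=\[)[^\]]*(?=])|(?<=\()\d+(?=\))/';
preg_match_all($re, $str, $matches);
print_r($matches[0]);

// The values alternate name, id, so they can be grouped two at a time.
$pairs = array_chunk($matches[0], 2);
print_r($pairs);
```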

PHP RegEx get first letter after set of characters

I have some text made up of a leading string of letters followed by digits.
I need to get the first single digit after the leading letters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the regex works, but it also captures the first group in the results. I need to get only this digit, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$) , the regex doesn't work.
Any ideas?
Thanks in advance.
(?=..) is a lookahead that means followed by and checks the string on the right of the current position.
(?<=...) is a lookbehind that means preceded by and checks the string on the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the overall match result. The only problem is that a lookbehind can't match variable-length content.
A way to avoid the problem is to use the \K feature, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
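A minimal sketch of that \K pattern run over a few of the sample lines; the $lines and $digits arrays are just illustrative names.

```php
<?php
$lines = ['ABC105001', 'ABC205001', 'ABCD305001'];
$digits = [];

foreach ($lines as $line) {
    // \K discards the letters matched so far; the lookahead checks the
    // remaining digits without adding them to the match.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $line, $m)) {
        $digits[] = $m[0];
        echo "$line => {$m[0]}\n";
    }
}
```

Here the wanted digit is the whole match, so there is no group index to pick out.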
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*)(\d)\d*$
The (?: sequence starts a non-capturing group. It will not come back in the matches.
So, if you used preg_match(';^(?:\D*)(\d)\d*$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] will always be the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a non-capturing group
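For illustration, this is roughly what the non-capturing variant returns for one of the sample lines (with \d{1} shortened to the equivalent \d): the letters still appear in the full match at index 0, but the digit is the only captured group.

```php
<?php
preg_match('/^(?:\D*)(\d)(?=\d*$)/', 'ABCD205001', $matches);
print_r($matches);

// $matches[0] is 'ABCD2': the full match, without the looked-ahead digits.
// $matches[1] is '2': the only capturing group.
$digit = $matches[1];
echo $digit, "\n";
```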

How do I avoid capturing the primary group of a given regex pattern?

I have a regexp pattern:
<^(([a-z]+)\:([0-9]+)\/?.*)$>
How do I avoid capturing the primary group?
<^(?:([a-z]+)\:([0-9]+)\/?.*)$>
The above pattern will still put the whole string 'localhost:8080' into the first (0) group. But I need to get only 2 matched groups, so that first (0) group is populated with 'localhost' and second (1) with '8080'.
Where did I make a mistake?
The first group, 0, will always be the entire match.
from the docs:
matches
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
if you don't care about the full match, you can use array_shift() to remove the unwanted element.
array_shift($matches);
That's just the way the regex functions work. The first group is always the entire match. You can use array_shift to get rid of it.
http://www.php.net/manual/en/function.array-shift.php
In a regex, $0 is always the whole matched string and not one of the groups. Match groups always start at $1, so look at $1 and $2 instead of $0 and $1.
If you are dealing with URLs, you can try PEAR Net_URL, or, what might be better for you in this case, parse_url():
print_r(parse_url($url));
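A brief sketch of both suggestions. The pattern uses # as the delimiter instead of the question's <...>, and the http:// scheme passed to parse_url is an assumption added for the example, since the question only shows 'localhost:8080'.

```php
<?php
preg_match('#^([a-z]+):([0-9]+)/?.*$#', 'localhost:8080', $matches);
// At this point $matches is ['localhost:8080', 'localhost', '8080'].

array_shift($matches); // drop the full match at index 0
print_r($matches);     // 'localhost' is now at index 0, '8080' at index 1

// parse_url() alternative; note that the port comes back as an integer.
$parts = parse_url('http://localhost:8080/');
echo $parts['host'], ':', $parts['port'], "\n"; // localhost:8080
```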

Regular Expression get part of string

How can I get only the text inside "()"
For example from "(en) English" I want only the "en".
I've written this pattern "/\(.[a-z]+\)/i" but it also gets the "()";
Thanks in advance.
<?php
$string = '(en) English';
preg_match('#\((.*?)\)#is', $string, $matches);
echo $matches[1]; # en
?>
$matches[0] will contain the entire matched string; $matches[1] will contain the first group, in this case the (.*?) between ( and ).
What is the dot in your regex for? I assume it's there by mistake.
Second, to give you an alternative to the capturing group answer (which is perfectly fine!), here is a solution using lookbehind and lookahead.
(?<=\()[a-z]+(?=\))
See it here on Regexr
The trick here is, those lookarounds do not match the characters inside, they just check if they are there. So those characters are not included in the result.
(?<=\() positive look behind assertion, checking for the character ( before its position
(?=\)) positive look ahead assertion, checking for the character ) ahead of its position
That should do the job.
"/\(([a-z]+)\)/i"
The easiest way is to get "/\(([a-z]+)\)/i" and use the capture group to get what you want.
Otherwise, you have to get into look ahead, look behinds
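Side by side, the two approaches look roughly like this on the question's sample string (variable names are illustrative only):

```php
<?php
$string = '(en) English';

// Capture group: the parentheses are matched, but only group 1 is used.
preg_match('/\(([a-z]+)\)/i', $string, $withGroup);
echo $withGroup[1], "\n"; // en

// Lookarounds: the parentheses are only checked, so the full match is bare.
preg_match('/(?<=\()[a-z]+(?=\))/i', $string, $withLookaround);
echo $withLookaround[0], "\n"; // en
```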
You could use a capture group like everyone else proposes
OR
you can make your pattern only check that the match is preceded by "(" and followed by ")". That is called lookbehind and lookahead.
"/(?<=\()[a-z]+(?=\))/i"

Erroneous Matches with Regular Expression

$regexp = '/(?:<input\stype="hidden"\sname="){1}([a-zA-Z0-9]*)(?:"\svalue="1"\s\/>)/';
$response = '<input type="hidden" name="7d37dddd0eb2c85b8d394ef36b35f54f" value="1" />';
preg_match($regexp, $response, $matches);
echo $matches[1]; // Outputs: 7d37dddd0eb2c85b8d394ef36b35f54f
So I'm using this regular expression to search for an authentication token on a webpage running Joomla in order to perform a scripted login.
I've got all this working but am wondering what is wrong with my regular expression as it always returns 2 items.
Array ( [0] => [1] => 7d37dddd0eb2c85b8d394ef36b35f54f)
Also the name of the input I'm checking for changes every page load both in length and name.
Nothing is wrong. Item [0] always contains the entire match. From the docs (emphasis mine):
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
Your regex (overlooking the fact that you are working on HTML with regexes in the first place, which you know you shouldn't) is a bit too complicated.
$regexp = '#<input\s+type="hidden"\s+name="([0-9a-f]*)"\s+value="1"\s*/>#i';
You don't need the non-capturing groups at all.
You use \s, which limits you to a single whitespace character. \s+ is probably better.
Using something different than / as the regex boundary makes escaping of forward slashes in the regex unnecessary.
Making the regex case-insensitive could be useful, too.
The auth token looks like a hex string, so matching the whole a-z range is unnecessary; 0-9a-f is enough.
As per the manual entry for preg_match:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
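Putting the simplified pattern to work on the sample response might look like the sketch below. The htmlspecialchars call is my own addition: it is a guess at why index 0 looked empty in the question's output, since a raw input tag printed into an HTML page would be rendered rather than shown.

```php
<?php
$response = '<input type="hidden" name="7d37dddd0eb2c85b8d394ef36b35f54f" value="1" />';
$regexp = '#<input\s+type="hidden"\s+name="([0-9a-f]*)"\s+value="1"\s*/>#i';

if (preg_match($regexp, $response, $matches)) {
    // Index 0 is the whole <input ...> tag, index 1 is the token.
    $token = $matches[1];
    echo $token, "\n";

    // Escaping makes index 0 visible when the output is viewed as HTML.
    echo htmlspecialchars($matches[0]), "\n";
}
```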
