preg_match fails when brackets are added - php

The following is a simplification of a regex I am using. On my development machine both $pattern1 and $pattern2 return a match, but on my production machine only $pattern1 does. The only difference between the two is that $pattern2 has parentheses around one word. As far as I know, both are valid patterns that should match the given haystack.
$pattern1 = '/\<a name="ERROR TEXT"\>\<\/a\>\s*?validated\s*?\<\/span\>\s*?\<\/h1\>/';
$pattern2 = '/\<a name="ERROR TEXT"\>\<\/a\>\s*?(validated)\s*?\<\/span\>\s*?\<\/h1\>/';
$haystack = '- IFCS msg value, BOOKMARKED AS ERROR TEXT -->
<a name="ERROR TEXT"></a>
validated</span>
</h1>
<!-- START: .formActionHolder -->
<div class="formActionHolder">';
preg_match($pattern1, $haystack, $matches);
print_r($matches);
Has anyone run into this problem before? Note that this is not the whole regex; it is a simplified version that I have identified as the source of the problem. In my actual code the value 'validated' is not a constant, which is why I use parentheses to capture the word. The real patterns also have other characters inside the parentheses so that I can capture the variable words. This simplified example just homes in on the problem I am having with two seemingly fine regexes.
On my development machine I am using PHP 5.3.2 with the PCRE 7.8 library, and on my production machine I am using PHP 5.2.4 with PCRE 7.4.

Parentheses are used for grouping in a PHP regex and act as such unless you escape them to make them match the literal characters.

Are you sure $pattern2 can't match? In my Eclipse setup it matches and shows:
Array ( [0] => validated
[1] => validated )

I had a thought about the ?( combination in $pattern2, so I removed the ? to make
$pattern = '/\<a name="ERROR TEXT"\>\<\/a\>\s*(validated)\s*?\<\/span\>\s*?\<\/h1\>/';
And that works! It's very strange - possibly even a bug?
So it looks like the ?(validated) bit was being interpreted as a conditional subpattern rather than the question mark being used to make the \s* ungreedy.
That doesn't look like correct behavior to me.
Ah well... it's a bit of a pain, since my * will now be greedy. The regex pattern does what I want in this instance, though.
Thanks for all your helpful comments!
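To compare the two machines directly, a small diagnostic can help. The sketch below is only an illustration: the patterns and haystack come from the question, while the loop and the output format are assumptions of mine. It prints the PCRE version PHP was built against, then the raw return value of preg_match() for each pattern, so a 0 (no match) can be told apart from false (error).

```php
<?php
// Diagnostic sketch: run both patterns from the question and report the
// PCRE library version, so results from two machines can be compared.
$haystack = '- IFCS msg value, BOOKMARKED AS ERROR TEXT -->
<a name="ERROR TEXT"></a>
validated</span>
</h1>
<!-- START: .formActionHolder -->
<div class="formActionHolder">';

$patterns = array(
    'pattern1' => '/\<a name="ERROR TEXT"\>\<\/a\>\s*?validated\s*?\<\/span\>\s*?\<\/h1\>/',
    'pattern2' => '/\<a name="ERROR TEXT"\>\<\/a\>\s*?(validated)\s*?\<\/span\>\s*?\<\/h1\>/',
);

echo 'PCRE version: ', PCRE_VERSION, "\n";

$results = array();
foreach ($patterns as $name => $pattern) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $results[$name] = preg_match($pattern, $haystack, $matches);
    echo $name, ': ', var_export($results[$name], true), "\n";
}
```

Running the same script on both machines shows whether the difference follows the PCRE version rather than the PHP code.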

Related

Weird PHP Regex Preg_Match Bug?

My PHP version is PHP 7.2.24-0ubuntu0.18.04.7 (cli). However it looks like this problem occurs with all versions I've tested.
I've encountered a very weird bug when using preg_match. Anyone know a fix?
The first section of code here works, the second one doesn't. But the regex itself is valid. For some reason the something_happened word is causing it to fail.
$one = ' (branch|leaf)';
echo "ONE:\n";
preg_match('/(?:\( ?)?((?:(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+(?: ?\| ?(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+)(?: ?\))?/', $one, $matches, PREG_OFFSET_CAPTURE);
print_r($matches); // this works
$two = 'something_happened (branch|leaf)';
echo "\nTWO:\n";
preg_match('/(?:\( ?)?((?:(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+(?: ?\| ?(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+)(?: ?\))?/', $two, $matches2, PREG_OFFSET_CAPTURE);
print_r($matches2); // this doesn't work
It seems somehow related to the word something_happened. If I change this word it works.
The regex is matching 2 or more type names separated by | that may or may not be surrounded in (), and each type name may or may not be preceded by any number of [] (or [some number] or [!some number]) and *.
Try it and see for yourself! Please let me know if you know how to fix it!
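The problem lies in the (?:(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+ group: the + quantifier repeats a group made of several optional patterns followed by [A-Za-z_]\w*, so a word such as something_happened can be split between repetitions in a huge number of ways. When the subsequent patterns fail to match, the engine tries all of those splits before giving up (catastrophic backtracking).
In PHP, you can work around the problem by using either
1. Possessive quantifier: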
'/(?:\(\ ?)?((?:(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)++(?:\ ?\|\ ?(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+)(?:\ ?\))?/'
Note the ++ at the end of the group mentioned.
2. Atomic group:
'/(?:\(\ ?)?((?>(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+(?:\ ?\|\ ?(?:\**\[(?:!?\d+)?\])*\**[A-Za-z_]\w*)+)(?:\ ?\))?/'
Note the (?>...) syntax.
Also, it is very convenient to use the x (extended) flag to break a regex like this into several lines and format it, so that it is easier to track down the issue. Under x, all literal whitespace and # chars must be escaped (which is why the patterns above write spaces as "\ "), but that is a minor inconvenience when it comes to debugging long patterns like this.
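As an illustration of both points, here is one way the possessive variant could be laid out with the x flag. The line breaks and comments are my own and not the original answer's formatting. The script also checks for a false return value, since preg_match() returns false rather than 0 when the engine gives up on a pattern.

```php
<?php
// Possessive-quantifier variant, laid out with the x flag.
// Under x, unescaped whitespace in the pattern is ignored and "#" starts a
// comment, which is why literal spaces are written as "\ ".
$pattern = '/
    (?:\(\ ?)?                      # optional opening parenthesis
    (
        (?:
            (?:\**\[(?:!?\d+)?\])*  # any number of [] or [n] or [!n] prefixes
            \**[A-Za-z_]\w*         # optional stars, then a type name
        )++                         # possessive: never re-split a matched name
        (?:
            \ ?\|\ ?                # separator, optionally padded with spaces
            (?:\**\[(?:!?\d+)?\])*
            \**[A-Za-z_]\w*
        )+
    )
    (?:\ ?\))?                      # optional closing parenthesis
/x';

$subject = 'something_happened (branch|leaf)';
$result = preg_match($pattern, $subject, $matches);

if ($result === false) {
    // An engine error (for example a backtracking limit), not a plain "no match".
    echo 'preg_match failed, error code: ', preg_last_error(), "\n";
} elseif ($result === 1) {
    echo 'Captured: ', $matches[1], "\n";
} else {
    echo "No match\n";
}
```

Checking for false explicitly is what distinguishes "the pattern did not match" from "the engine aborted", which is easy to miss when only print_r($matches) is inspected.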

Correct regex for this pattern

I've got some issues understanding this regex.
I tried doing a pattern but does not work like intended.
What I want is [A-Za-z]{2,3}[0-9]{2,30}
That is 2-3 letters at the beginning and 2-30 digits after that, for example:
FA1321321
BFA18098097
I want to use it to validate an input field but can't figure out what the regex should look like.
Can anyone help me out and maybe explain a bit about it?
Your regex is correct - just make sure to surround it with / in PHP, and perhaps ^, $ if you want it to strictly match the entire string (no extra characters before/after).
$pattern = '/^[A-Za-z]{2,3}[0-9]{2,30}$/';
$found = preg_match($pattern, $your_str);
From the PHP documentation:
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
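Because of those three possible return values, a validation check is safest when it compares against 1 explicitly. A minimal sketch follows; the function name isValidCode and the two rejected samples are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the thread) wrapping the pattern above.
function isValidCode($input)
{
    // Compare against 1 explicitly: 0 means "no match", false means "error".
    return preg_match('/^[A-Za-z]{2,3}[0-9]{2,30}$/', $input) === 1;
}

foreach (array('FA1321321', 'BFA18098097', 'F1', 'ABCD123') as $candidate) {
    echo $candidate, ' => ', isValidCode($candidate) ? 'valid' : 'invalid', "\n";
}
```

The first two inputs are the examples from the question. The last two fail because they have too few or too many leading letters.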

How to get a number from a html source page?

I'm trying to retrieve the followed by count on my instagram page. I can't seem to get the Regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 numbers after that.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...
You haven't posted the strings you are matching against, so what the regex should be is a guess. Instead, I'll explain why your attempts fail.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search on. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
Is closer, but preg_replace is global, so this searches your whole file (or would, if not for the anchor) and replaces the found value with nothing. What you really want (I think) is preg_match.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was kind of correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to start at the start of your string, escaped the {, and made the \d match exactly 4 digits. The () is a capture group and stores whatever is found inside it, in this case the 4 digits.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.
This regexp should capture value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with the preg_match_all function to capture what you want into an array (you're using preg_replace, which isn't for retrieving data but for, well, replacing it).
Your regexp isn't working because you didn't escape the curly brackets. You also didn't put a quantifier after the digit class (the plus sign in my example), so it would only capture the first digit anyway.
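Putting the escaped braces and the + quantifier together, a minimal sketch might look like this. The sample $code string and the "followed_by" key are assumptions based on the fragments in the question; the real page source may differ.

```php
<?php
// Hypothetical fragment of page source; the real markup may differ.
$code = '"followed_by":{"count":1234},"follows":{"count":56}';

$count = null;
// \{ and \} match literal braces; (\d+) captures one or more digits.
if (preg_match('/"followed_by":\{"count":(\d+)\}/', $code, $match) === 1) {
    $count = (int) $match[1];
}

echo $count === null ? 'count not found' : $count, "\n";
```

Anchoring on the key name in front of the braces keeps the pattern from picking up some other "count" field in the same source.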

Getting String in Text

I can't seem to figure out exactly how to get a certain portion of text I need out of a larger block. I need to be able to grab a string between a : and a ( in a regular block of text while parsing an email. An example is below. I know a bit about preg_match and I figured that was the answer, but can't seem to get that to work either. Any help would be appreciated as searches of here and Google have turned up nothing.
For GC3J26P: Wise Lake II (Traditional)
I need just the text between the : and the opening parenthesis. Thanks for any help.
Since you say you've tried and not managed to get anything going, try this:
$str = "For GC3J26P: Wise Lake II (Traditional)";
preg_match('/(?<=:).*(?=\()/', $str, $matches);
if ($matches) echo $matches[0];
If you're new to the murky - yet beautiful - world of REGEX, allow me to explain what's happening here.
It's all about the pattern. Our pattern defines what is and is not acceptable in what we match - i.e. capture.
More than that, it even states what should immediately precede and follow our match. These are called look-behind and look-ahead assertions, respectively. These assertions are anchors for our matching points - they do not contribute to the captured match itself.
So our pattern translates as:
1) begin the match after a colon (but do not include the colon in the match)
2) then allow and capture any character except a line break, zero or more times
3) match up to (but not including) an opening parenthesis
Our pattern is what's called greedy. In our case, this means the match runs from the first colon to the last opening parenthesis, so if the sub-string you wish to match itself contains a colon or parenthesis, this will be no problem and won't break things. As long as there is a valid match available starting from SOME colon, and up to SOME opening parenthesis, all's fine. (Note: greedy behaviour can be modified, if required).
There's far, far more to REGEX and you either love it or hate it. If you're interested, I suggest reading up on it. It's very satisfying once you get into it.
And with that, it's bed time.
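To make the greedy note concrete, here is a small comparison between the greedy pattern and a lazy variant (.*? instead of .*). The subject is an assumed variation of the original line with an extra parenthesised part added, so that the two behaviours differ.

```php
<?php
// Assumed sample: the original subject with an extra parenthesised part added.
$str = 'For GC3J26P: Wise Lake (II) (Traditional)';

preg_match('/(?<=:).*(?=\()/', $str, $greedy);  // greedy: runs to the last "("
preg_match('/(?<=:).*?(?=\()/', $str, $lazy);   // lazy: stops at the first "("

var_dump($greedy[0]); // value: " Wise Lake (II) "
var_dump($lazy[0]);   // value: " Wise Lake "
```

Both results keep the spaces next to the colon and the parenthesis, because the assertions only exclude the : and ( themselves.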
You can try this:
$string = 'For GC3J26P: Wise Lake II (Traditional)';
preg_match('/\:(.*?)\(/', $string, $matches);
echo $matches[0]; // ": Wise Lake II ("
echo $matches[1]; // " Wise Lake II " (with surrounding spaces; trim() removes them)
Here it is:
I used a 'named sub-pattern' (named it "name", but could be called almost anything)...
$str="For GC3J26P: Wise Lake II (Traditional)";
preg_match('/:(?P<name>[\S\s]+)\(/', $str, $matches);
echo $matches["name"];
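All of the patterns above capture the spaces next to the colon and the parenthesis. One possible variation, a sketch of mine rather than something proposed in the answers, puts \s* on both sides of a lazy capture group so the result comes back already trimmed.

```php
<?php
$str = 'For GC3J26P: Wise Lake II (Traditional)';

$name = null;
// \s* on each side swallows the padding; (.*?) lazily captures the rest.
if (preg_match('/:\s*(.*?)\s*\(/', $str, $matches) === 1) {
    $name = $matches[1];
}

var_dump($name); // value: "Wise Lake II"
```

Calling trim() on the earlier captures gives the same result; moving the whitespace into the pattern just saves the extra call.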

Need to negate this regex pattern, but no clue how

I found a regex pattern for PHP that does the exact OPPOSITE of what I'm needing, and I'm wondering how I can reverse it?
Let's say I have the following text: Item_154 ($12)
This pattern /\((.*?)\)/ gets what's inside the parentheses, but I need to get "Item_154" and cut out what's in the parentheses and the space before them.
Anybody know how I can do that?
Regex is above my head apparently...
/^([^( ]*)/
Match everything from the start of the string until the first space or (.
If the item you need to match can have spaces in it, and you only want to get rid of whitespace immediately before the parenthetical, then you can use this instead:
/^([^(]*?)\s*\(/
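A quick sketch of the two patterns side by side. The second subject, with spaces in the item name, is a made-up sample used only to show the difference.

```php
<?php
// First pattern: stops at the first space or "(".
preg_match('/^([^( ]*)/', 'Item_154 ($12)', $first);

// Second pattern: allows spaces in the name, drops whitespace before "(".
// 'Blue Item 154 ($12)' is a hypothetical input.
preg_match('/^([^(]*?)\s*\(/', 'Blue Item 154 ($12)', $second);

echo $first[1], "\n";  // Item_154
echo $second[1], "\n"; // Blue Item 154
```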
The following will match anything that looks like text (...) but returns just the text part in the match.
\w+(?=\s*\([^)]*\))
Explanation:
The \w includes alphanumeric and underscore, with + saying match one or more.
The (?= ) group is positive lookahead, saying "confirm this exists but don't match it".
Then we have \s for whitespace, and * saying zero or more.
The \( and \) match literal ( and ) characters (since they're normally special characters).
The [^)] is anything non-) character, and again * is zero or more.
Hopefully all makes sense?
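Applied to the string from the question, the lookahead pattern could be used like this (a minimal sketch; the variable names are mine). Because the lookahead is not part of the match, the whole match in index 0 is already the text part and no capture group is needed.

```php
<?php
$text = 'Item_154 ($12)';

$item = null;
// \w+ is the match; the lookahead only checks that " (...)" follows it.
if (preg_match('/\w+(?=\s*\([^)]*\))/', $text, $m) === 1) {
    $item = $m[0];
}

var_dump($item); // value: "Item_154"
```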
/(.*)\(.*\)/
What is not in () will now be your first capture group :)
One site that really helped me was http://gskinner.com/RegExr/
It'll let you build a regex and then paste in some sample targets/text to test it against, highlighting matches. All of the possible regex components are listed on the right with (essentially) a tooltip describing the function.
<?php
$string = 'Item_154 ($12)';
$pattern = '/(.*)\(.*?\)/';
preg_match($pattern, $string, $matches);
var_dump($matches[1]);
?>
This should get you Item_154, followed by the space before the parenthesis; rtrim() removes it.
The following regex works for your string as a replacement if that helps? :-
\s*\(.*?\)
Here's an explanation of what it's doing...
Whitespace, any number of repetitions - \s*
Literal - \(
Any character, any number of repetitions, as few as possible - .*?
Literal - \)
I've found Expresso (http://www.ultrapico.com/) is the best way of learning/working out regular expressions.
HTH
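Used as a replacement, the pattern \s*\(.*?\) from this answer could be applied as follows (a short sketch using the string from the question).

```php
<?php
$text = 'Item_154 ($12)';

// Remove optional whitespace plus the parenthesised part, leaving the item.
$clean = preg_replace('/\s*\(.*?\)/', '', $text);

var_dump($clean); // value: "Item_154"
```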
Here is a one-shot to do the whole thing
$text = 'Item_154 ($12)';
$text = preg_replace('/([^\s]*)\s(\()[^)]*(\))/', '$1$2$3', $text);
var_dump($text);
//Outputs: Item_154()
Keep in mind that using any PCRE functions involves a fair amount of overhead, so if you are using something like this in a long loop and the text is simple, you could probably do something like this with substr/strpos and then concat the parens on to the end since you know that they should be empty anyway.
That said, if you are looking to learn REGEXs and be productive with them, I would suggest checking out: http://rexv.org
I've found the PCRE tool there to be very useful, though it can be quirky in certain ways. In particular, any examples that you work with there should only use single quotes if possible, as it doesn't work with double quotes correctly.
Also, to really get a grip on how to use regexes, I would check out Mastering Regular Expressions by Jeffrey Friedl, ISBN-13: 978-0596528126.
Since you are using PHP, I would try to get the 3rd Edition, since it has a section specifically on PHP PCRE. Just make sure to read the first 6 chapters first, since they give you the foundation needed to work with the material in that particular chapter. If you see the 2nd Edition on the cheap somewhere, that's pretty much the same core material, so it would be a good buy as well.
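For the substr/strpos route mentioned in this answer, a minimal sketch could look like the following. The variable names and the fallback for input without a parenthesis are assumptions of mine.

```php
<?php
$text = 'Item_154 ($12)';

// Find the first "(" and keep everything before it, minus trailing whitespace.
$pos = strpos($text, '(');
$name = ($pos === false) ? $text : rtrim(substr($text, 0, $pos));

// Re-append empty parentheses to mirror the preg_replace output above.
$withParens = $name . '()';

var_dump($name);       // value: "Item_154"
var_dump($withParens); // value: "Item_154()"
```

The strict === false comparison matters here, because strpos() returns 0, not false, when the parenthesis is the first character.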
