Lookahead, lookbehind condition in regular expression - php

The following example is about using lookahead assertion as a condition. I found it in the PHP manual at: http://www.php.net/manual/en/regexp.reference.conditional.php
(?(?=[^a-z]*[a-z])
\d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )
Here's the description about this regex:
The condition is a positive lookahead assertion that matches an optional sequence of non-letters followed by a letter. In other words, it tests for the presence of at least one letter in the subject. If a letter is found, the subject is matched against the first alternative; otherwise it is matched against the second. This pattern matches strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
Could anyone tell me why a lookahead assertion is used as the condition in this example? Why not a lookbehind assertion? I get confused when they are used as conditions like this because I don't know how they match the subject string. Thanks in advance!

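In this case the lookahead assertion decides which pattern to use. It looks like it is choosing between dates of the form 01-jan-12 and 01-01-12 (since [a-z] only matches lowercase letters, 01-Jan-12 would need the i flag). The lookahead checks whether there is a letter anywhere in what we are trying to match. If so, \d{2}-[a-z]{3}-\d{2} is used to try to match 01-jan-12; if not, \d{2}-\d{2}-\d{2} is used to try to match 01-01-12. A lookahead is used rather than a lookbehind because the condition is tested at the current position, before any of the date has been consumed: the text that decides the format lies ahead, while a lookbehind could only inspect what comes before that position.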
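As a rough sketch of how the conditional behaves, the following PHP snippet runs the manual's pattern through preg_match. The ^ and $ anchors and the sample subjects are assumptions added here for illustration; they are not part of the manual's example.

```php
<?php
// Conditional subpattern from the PHP manual, written without the
// layout whitespace so that no x flag is needed.
// Assumption: ^ and $ are added so the whole subject has to match.
$pattern = '/^(?(?=[^a-z]*[a-z])\d{2}-[a-z]{3}-\d{2}|\d{2}-\d{2}-\d{2})$/';

// Hypothetical sample subjects.
$subjects = ['01-jan-12', '01-01-12', '01-jan-1x', 'ab-01-12'];

$results = [];
foreach ($subjects as $s) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$s] = preg_match($pattern, $s);
}

print_r($results);
```

Once the condition finds a letter, only the first alternative is tried, so 'ab-01-12' is rejected rather than being retried against the digits-only form.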

Related

why regexp doesn't match?

Below is a pattern that matches numbers. It almost works: the second line should produce the match 99, but there is no match. Why?
(?<!\d[- ]|[\d.,])\(?-?(?:(?:[1-9]\d{0,2}(?:(?:[. ]\d{3})*|\d*))|0)(?:\b|[,]\d{1,3})-?\)?(?![\d.,\/]|-[\d\/])
100,00stk => 100,00
99stk => 99 \\ this is not matched
10,45stk => 10,45
https://regex101.com/r/nwRCKo/1
The main problem here is the use of the word boundary, but the fix is not that obvious. In 99stk there is no word boundary between 9 and s, because both are word characters, so (?:\b|[,]\d{1,3}) cannot match.
The key point is that your regex matches numbers only in a specific context, and the lookarounds on both sides are meant to fail the match entirely when that context is wrong. If a negative lookahead follows optional patterns such as the optional ) char, the regex engine can backtrack into them and still return a (shorter) match. So, after removing the word boundary, you need to prevent any backtracking there.
Replace (?:\b|[,]\d{1,3}) with (?:[,]\d{1,3})? and make all the subsequent optional patterns atomic by applying possessive quantifiers:
(?<!\d[- ]|[\d.,])\(?-?(?:(?:[1-9]\d{0,2}(?:(?:[. ]\d{3})*|\d*))|0)(?:,\d{1,3})?+-?+\)?+(?![\d.,\/]|-[\d\/])
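To check the difference, here is a small PHP sketch that runs both the original and the corrected pattern against the three sample lines from the question. The ~ delimiter and the variable names are choices made here for illustration.

```php
<?php
// Original pattern from the question (uses \b before the optional suffix).
$original = '~(?<!\d[- ]|[\d.,])\(?-?(?:(?:[1-9]\d{0,2}(?:(?:[. ]\d{3})*|\d*))|0)(?:\b|[,]\d{1,3})-?\)?(?![\d.,\/]|-[\d\/])~';

// Corrected pattern: optional decimal part, possessive quantifiers after it.
$fixed = '~(?<!\d[- ]|[\d.,])\(?-?(?:(?:[1-9]\d{0,2}(?:(?:[. ]\d{3})*|\d*))|0)(?:,\d{1,3})?+-?+\)?+(?![\d.,\/]|-[\d\/])~';

$subjects = ['100,00stk', '99stk', '10,45stk'];

$found = [];
foreach ($subjects as $s) {
    // Store the matched number, or null when nothing matches.
    $found[$s] = preg_match($fixed, $s, $m) ? $m[0] : null;
}

// The original pattern finds nothing in "99stk".
$originalOn99 = preg_match($original, '99stk');

print_r($found);
var_dump($originalOn99);
```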

PHP regex - match everything but not exactly one or more word

I am trying to match any string that is not exactly one of several words.
My pattern:
(?!(^ignoreme$)|(^ignoreme2$))
I am looking for:
ignoreme - no
ignoreme2 - no
ignoremex - match
ignorem - match
gnoreme - match
ignoreme22 - match
But it returns many empty matches. How can I do that? Thanks.
https://regex101.com/r/u4EsNv/1
You may use this corrected regex:
^(?!ignoreme2?$).*$
RegEx Details:
^: Start
(?!ignoreme2?$): Negative lookahead that fails the match when ignoreme or ignoreme2 is all that lies ahead, up to the end.
.*: Match 0 or more of any character
$: End
Note that the regex (?!(^ignoreme$)|(^ignoreme2$)) still matches in the first 2 (invalid) cases because the ^ is inside the negative lookahead rather than outside it. The pattern itself is not anchored, so the regex engine simply starts matching after the 1st character, where ^ no longer matches and the lookahead is satisfied. (You can see that in the matches highlighted on regex101.)
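A minimal PHP sketch of the corrected pattern, filtering the sample words from the question with preg_grep. The variable names are illustrative.

```php
<?php
// Anchored pattern: reject exactly "ignoreme" or "ignoreme2", accept anything else.
$pattern = '/^(?!ignoreme2?$).*$/';

$words = ['ignoreme', 'ignoreme2', 'ignoremex', 'ignorem', 'gnoreme', 'ignoreme22'];

// preg_grep keeps the entries that match; array_values reindexes the result.
$kept = array_values(preg_grep($pattern, $words));

print_r($kept);
```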

Exclude a certain match in a capturing group in regex

I have a regex capturing group and I want to exclude a number if it matches a certain pattern also.
This is my capturing group:
https://regex101.com/r/zL1tL8/1
If \n is followed by a number and characters like "1st", "2nd", "4dffgsd", "3sf", then the match should stop BEFORE the number.
0-9 is important in the capturing group.
So far I have this pattern [0-9][a-zA-Z]+ to match a number followed by characters. How do I apply this to the capturing group as a condition?
Update:
https://regex101.com/r/zL1tL8/4
Line 1 is wrong.
It should not match a number followed by characters
You'll want to use a negative lookahead to "stop" the match if something after matches your pattern. So, something like this might work:
(\\n(?![0-9][a-zA-Z]))
See it in use here: https://regex101.com/r/zL1tL8/2
Here's a page with some more info on lookahead and lookbehind: http://www.rexegg.com/regex-lookarounds.html
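As a rough illustration of the negative lookahead, the PHP sketch below replaces each matched \n with a | marker. It assumes the subject contains the literal two-character sequence \n (a backslash followed by n), as the pattern above suggests; the sample text is made up.

```php
<?php
// Single-quoted, so each \n below is a literal backslash followed by "n".
$text = 'foo\n12 bar\n1st baz\n3sf';

// '\\\\n' in a single-quoted PHP string is the regex \\n,
// i.e. a literal backslash followed by "n".
$pattern = '/(\\\\n(?![0-9][a-zA-Z]))/';

// Replace every \n that is NOT followed by a digit and then a letter.
$marked = preg_replace($pattern, '|', $text);

echo $marked, PHP_EOL;
```

Only the \n before "12" is replaced; the ones before "1st" and "3sf" are rejected by the lookahead.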

Regex capturing words that have at least one lowercase letter

I'm trying to capture words in a string like:
1vTvFpU
KOoy6Cc
With regex pattern:
\b(?=(?:.*?[a-z]){1,})[A-Za-z0-9\/\-_.]{7,7}\b
But I have a problem because it also matches words like:
FDSFDFI
WEWEFDP
RRRRRRR
In a string:
FDSFDFI sdfdfdf
WEWEFDP traliii
RRRRRRR sdfdfdf
What am I doing wrong?
I suggest using \S* instead of .* inside the lookahead. When you use .*? inside the lookahead, it checks for at least one lowercase letter in the rest of the line, not just in the word.
\b(?=(?:\S*?[a-z]))[A-Za-z0-9\/\-_.]{7}\b
{7,7} is equal to {7}
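The difference between the two lookaheads can be seen with a short PHP sketch. The sample text combines words from the question; the ~ delimiter and variable names are choices made for this example.

```php
<?php
$text = "FDSFDFI sdfdfdf\n1vTvFpU KOoy6Cc";

// Lookahead with .*? can find the lowercase letter anywhere later on the line.
$lineWide = '~\b(?=(?:.*?[a-z]){1,})[A-Za-z0-9\/\-_.]{7,7}\b~';

// Lookahead with \S*? cannot cross whitespace, so it stays inside the word.
$wordOnly = '~\b(?=(?:\S*?[a-z]))[A-Za-z0-9\/\-_.]{7}\b~';

preg_match_all($lineWide, $text, $a);
preg_match_all($wordOnly, $text, $b);

print_r($a[0]); // includes FDSFDFI, which has no lowercase letter
print_r($b[0]); // FDSFDFI is no longer matched
```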
No need to use a lookahead to do that, character classes suffice:
[^\Wa-z]*+\w+
Then check the string length with PHP (for example with array_filter).
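A sketch of this second approach, assuming a length of exactly 7 as in the question. The sample text is made up for illustration.

```php
<?php
$text = "FDSFDFI 1vTvFpU KOoy6Cc ab";

// [^\Wa-z]*+ possessively eats word characters that are not lowercase letters,
// so the \w+ that follows has to start with a lowercase letter.
preg_match_all('/[^\Wa-z]*+\w+/', $text, $m);

// Keep only the 7-character words; array_values reindexes the result.
$words = array_values(array_filter($m[0], function ($w) {
    return strlen($w) === 7;
}));

print_r($m[0]);
print_r($words);
```

Note that \w does not cover the /, - and . characters allowed by the class in the question, so this is only equivalent for purely alphanumeric words.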

PHP RegEx get first letter after set of characters

I have some text that starts with a string of letters followed by digits.
I need to get the first single digit after the leading letters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the RegEx works OK, but it also captures the first group in the results. I need to get only this digit, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$), the regex doesn't work.
Any ideas?
Thanks in advance.
(?=...) is a lookahead: it means "followed by" and checks the string to the right of the current position.
(?<=...) is a lookbehind: it means "preceded by" and checks the string to the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the overall match result. The only problem is that a lookbehind can't match variable-length content.
A way to avoid the problem is to use the \K feature, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
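A short PHP sketch of the \K approach on some of the sample lines from the question. Because \K resets the start of the reported match, the digit ends up in $m[0] without any capturing group.

```php
<?php
$lines = ['ABC105001', 'ABCD205001', 'ABCD305001'];

$digits = [];
foreach ($lines as $line) {
    // \K discards the leading letters from the reported match.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $line, $m)) {
        $digits[] = $m[0];
    }
}

print_r($digits);
```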
You're trying to use a positive lookahead when really you want to use non-capturing groups.
This regex gives you the one match you want:
^(?:\D*)(\d)\d*$
The (?: sequence starts a non-capturing group, which is not returned in the matches.
So, if you used preg_match(';^(?:\D*)(\d)\d*$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] will always be the full match from preg_match.)
Try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a non-capturing group
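To see what ends up where with this last pattern, here is a minimal PHP sketch using one of the sample lines. The variable names are illustrative.

```php
<?php
$string = 'ABC105001';

// The non-capturing group consumes the letters; group 1 captures the first digit;
// the lookahead checks the remaining digits without consuming them.
preg_match('/^(?:\D*)(\d{1})(?=\d*$)/', $string, $matches);

$full  = $matches[0]; // full match: the letters plus the first digit
$digit = $matches[1]; // first capturing group: only the digit

echo $full, ' ', $digit, PHP_EOL;
```

The non-capturing group still contributes to the full match in $matches[0]; it only avoids creating an extra entry in $matches.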
