Regex with negative lookbehind and unknown mid-section PHP - php

When performing a preg_match in PHP I'm using the following regex:
/\/(bp|s)?\d+\//
This correctly matches the following strings:
/v/test/bp21/
Matched: /bp21/
/v/test/s21/
Matched: /s21/
/v/test/21/
Matched: /21/
I now want to stop matching if the string begins with /cp/. I thought this would be as simple as adding a negative lookbehind:
/(?<!^\/cp\/)\/(bp|s)?\d+\//
However, this doesn't seem to work. Instead, I get the following results:
/v/test/bp21/
Matched: /bp21/
/v/test/s21/
Matched: /s21/
/v/test/21/
Matched: /21/
/cp//123123/
No match (desired effect)
/cp/test/123123/
Matched: /123123/ - undesired
I'm guessing this is because I'm not specifying anything in-between the lookbehind and the main expression, but the string could contain any number of characters after /cp/ and before /21/ for example.
Here is the example on RegExr: https://regexr.com/3npis
I've tried loads of variations of .* and lazy quantifiers, but I can't seem to get it to work. Has anyone else successfully overcome this? Thanks!

Note that the following regular expression uses a character other than / as the pattern delimiter in PHP. The regex101 link below uses ~ as the delimiter. This avoids having to escape every / (since they're common in this regex).
See regex in use here
^(?!/cp/).*\K/(?:bp|s)?\d+/
^(?!/cp/).*/(?:bp|s)?\d+/ # If you're just doing boolean operations
^(?!/cp/).*\K/(bp|s)?\d+/ # If you want to capture bp or s
^ Assert position at the start of the line
(?!/cp/) Negative lookahead ensuring /cp/ does not follow
.* Match any character any number of times
\K Resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
/ Match this literally
(?:bp|s)? Optionally match bp or s
\d+ Match one or more digits
/ Match this literally
Results:
/v/test/bp21/ # Matches /bp21/
/v/test/s21/ # Matches /s21/
/v/test/21/ # Matches /21/
/cp//123123/ # Does not match
/cp/test/123123/ # Does not match
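For reference, here is a minimal PHP sketch of the first pattern run against the sample strings. The helper name matchSegment is hypothetical and only used for illustration; the pattern and the inputs are the ones shown above.

```php
<?php
// Pattern from the answer above, with ~ as the delimiter so "/" needs no escaping.
$pattern = '~^(?!/cp/).*\K/(?:bp|s)?\d+/~';

// Hypothetical helper: returns the reported match, or null when nothing matches.
function matchSegment(string $pattern, string $path): ?string
{
    return preg_match($pattern, $path, $m) === 1 ? $m[0] : null;
}

$paths = [
    '/v/test/bp21/',
    '/v/test/s21/',
    '/v/test/21/',
    '/cp//123123/',
    '/cp/test/123123/',
];

foreach ($paths as $path) {
    // Thanks to \K, only the final /…digits/ segment is reported.
    echo $path, ' => ', matchSegment($pattern, $path) ?? 'no match', PHP_EOL;
}
```

Because the pattern is anchored with ^ and the lookahead sits right after it, a string starting with /cp/ fails once at the start and is never retried from a later position.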

Related

Special Expression not allowed - Regular Expression in PHP

I am trying to write a pattern that rejects image names ending in dimensions, for example the 150x150 in the image name below:
test-string-150x150.png
I am using the following pattern to match this String:
/^([^0-9x0-9]+)\..+/
It works fine, except in a case such as:
teststring.com-150x150.jpg
What I need: the pattern must disallow dimensions only at the end of the name. Here are some examples:
test-string-150x150.png > must disallow
any-string.png > allow
200x200-test.png > allow
1x1.png-100x100.jpg > disallow
You could use a negative lookahead to assert that the string does not contain the sizes followed by a dot and 1+ word characters till the end of the string.
^(?!.*\d+x\d+\.\w+$).+$
Explanation
^ Start of string
(?! Negative lookahead, assert what is on the right is not
.* Match 0+ occurrences of any char except a newline
\d+x\d+ Match the sizes format, where \d+ means 1 or more digits
\.\w+$ Match a dot, 1+ word characters and assert the end of the string $
) Close lookahead
.+ Match 1+ occurrences of any char except a newline
$ End of string
Regex demo
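A minimal PHP sketch of how this pattern could be applied to the examples from the question. The helper name isAllowedName is an assumption made for illustration.

```php
<?php
// Pattern from the answer above: reject names that end in <digits>x<digits>.<ext>.
$pattern = '/^(?!.*\d+x\d+\.\w+$).+$/';

// Hypothetical helper: true when the name does NOT end in a dimensions suffix.
function isAllowedName(string $pattern, string $name): bool
{
    return preg_match($pattern, $name) === 1;
}

$names = [
    'test-string-150x150.png',
    'any-string.png',
    '200x200-test.png',
    '1x1.png-100x100.jpg',
];

foreach ($names as $name) {
    echo $name, ' => ', isAllowedName($pattern, $name) ? 'allow' : 'disallow', PHP_EOL;
}
```

200x200-test.png stays allowed because the dimensions there are followed by a hyphen, not by the dot and extension that the lookahead requires at the end of the string.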
If I understand your question, you're trying to find image names that do not include the image dimensions. If so, try this:
/^(?![\w-\.]+(\d+x\d+))[\w-\.]+\.\w+$/gm
For details about this code, please see regexr.com/4tmd1. This site is a great place to play around with regexes to make sure you're getting the results you expect.
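Be aware that the exact syntax of the regular expression depends on the regex engine used by whatever program you're running. For example, the g flag shown above is JavaScript-style; PHP's preg_* functions do not accept it, and global matching is done with preg_match_all() instead.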

Match regular expression specific character quantities in any order

I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches the following data in the following cases. The only one that does not match is the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks
One option is to use lookaheads to assert that there are at least 3 digits, fewer than 2 backslashes, and fewer than 2 hyphens.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert that what is on the left is not a non-whitespace char
(?=(?:[^\d\s]*\d){3}) Assert that what is on the right is, 3 times, zero or more chars that are neither digits nor whitespace followed by a digit
(?!(?:[^\s-]*-){2}) Assert that what is on the right is not, 2 times, zero or more chars that are neither whitespace nor a hyphen followed by a hyphen
(?!(?:[^\s\\]*\\){2}) Assert that what is on the right is not, 2 times, zero or more chars that are neither whitespace nor a backslash followed by a backslash
[A-Z0-9/\\-]+ Match any of the listed chars 1+ times
(?!\S) Assert that what is on the right is not a non-whitespace char
Regex demo
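As a rough illustration, the pattern can be run over whitespace-separated input with preg_match_all(). The sample text and variable names below are assumptions; a nowdoc is used so the backslashes in the pattern need no extra PHP escaping.

```php
<?php
// Pattern from the answer above, wrapped in ~ delimiters.
// A nowdoc keeps every backslash exactly as written.
$pattern = <<<'REGEX'
~(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)~
REGEX;

// Hypothetical sample: candidate codes separated by spaces.
$text = '02ABU-D9435 013DFC 01-01-2018 AA-AA 03SDFA9433/0';

preg_match_all($pattern, $text, $m);
$found = $m[0];

// 01-01-2018 is skipped (two hyphens) and AA-AA is skipped (no digits).
print_r($found);
```

Note that, as written, this pattern limits backslashes rather than forward slashes, following the literal wording of the question, so a value such as 01/01/2018 would still be matched by it.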
Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Check out this Regex101 demo.
Remark
This solution assumes that each string tested resides on its own line, ie. not just being separated by whitespace.
If the strings are separated by whitespace instead, use the solution by @TheFourthBird (which is essentially the same as this one but caters for whitespace separation).
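A short PHP sketch of the line-based variant, assuming one candidate per line and the m (multiline) modifier so that ^ and $ apply to each line. The sample input and variable names are hypothetical.

```php
<?php
// Pattern from this answer; the m modifier makes ^ and $ match per line.
$pattern = '/^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$/m';

// Hypothetical input: one candidate string per line.
$lines = "02ABU-D9435\n013DFC\n01/01/2018\nAA-AA\n03SDFA9433/0";

preg_match_all($pattern, $lines, $m);
$valid = $m[0];

// 01/01/2018 fails the "two slashes" check, AA-AA fails the "three digits" check.
print_r($valid);
```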
You can test the condition for both the hyphen and the slash in a single lookahead using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
demo
Detailed:
~ # using the tilde as the pattern delimiter avoids escaping all slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # any remaining characters until the end of the string
\z # end of the string.
~
To better understand this (if you are not familiar with it): these three subpatterns all start from the same position (in this case the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the first two are zero-width assertions: simple tests that don't consume any characters.
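A minimal sketch of this pattern in PHP, testing one string at a time. The helper name isValidCode and the extra sample 1-2/3 are assumptions added for illustration.

```php
<?php
// Pattern from this answer: one lookahead with a backreference covers both - and /.
$pattern = '~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~';

// Hypothetical helper: true when the whole string satisfies the rules.
function isValidCode(string $pattern, string $code): bool
{
    return preg_match($pattern, $code) === 1;
}

$codes = ['02ABU-D9435', '03SDFA9433/0', '1-2/3', '01/01/2018', 'AA-AA'];

foreach ($codes as $code) {
    echo $code, ' => ', isValidCode($pattern, $code) ? 'match' : 'no match', PHP_EOL;
}
```

The backreference \1 only repeats whichever character the group captured, so one hyphen plus one slash (as in 1-2/3) is accepted, while two of the same separator are rejected.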

PHP regex - match everything but not exactly one or more word

I'm trying to match any string that is not exactly one of a set of words.
My pattern:
(?!(^ignoreme$)|(^ignoreme2$))
I am looking for:
ignoreme - no
ignoreme2 - no
ignoremex - match
ignorem - match
gnoreme - match
ignoreme22 - match
But it returns a lot of empty matches. How can I do that? Thanks.
https://regex101.com/r/u4EsNv/1
You may use this corrected regex:
^(?!ignoreme2?$).*$
Updated RegEx Demo
RegEx Details:
^: Start
(?!ignoreme2?$): Negative lookahead that fails the match when ignoreme or ignoreme2 is all that lies ahead, up to the end.
.*: Match 0 or more of any character
$: End
Note that the regex (?!(^ignoreme$)|(^ignoreme2$)) still matches in the first 2 invalid cases because you have put ^ inside the negative lookahead expressions rather than outside. This lets the regex engine start matching after the 1st character, where the lookahead assertions are satisfied. (You can see that in the highlighted matches on regex101.)
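The corrected pattern can be checked against the cases from the question with a small PHP sketch like the one below. The helper name isAccepted is hypothetical.

```php
<?php
// Corrected pattern from the answer: ^ sits outside the lookahead.
$pattern = '/^(?!ignoreme2?$).*$/';

// Hypothetical helper: true when the word is not exactly "ignoreme" or "ignoreme2".
function isAccepted(string $pattern, string $word): bool
{
    return preg_match($pattern, $word) === 1;
}

$words = ['ignoreme', 'ignoreme2', 'ignoremex', 'ignorem', 'gnoreme', 'ignoreme22'];

foreach ($words as $word) {
    echo $word, ' - ', isAccepted($pattern, $word) ? 'match' : 'no', PHP_EOL;
}
```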

regex matches numbers, but not letters

I have a string that looks like this:
[if-abc] 12345 [if-def] 67890 [/if][/if]
I have the following regex:
/\[if-([a-z0-9-]*)\]([^\[if]*?)\[\/if\]/s
This matches the inner brackets just like I want it to. However, when I replace the 67890 with text (ie. abcdef), it doesn't match it.
[if-abc] 12345 [if-def] abcdef [/if][/if]
I want to be able to match ANY characters, including line breaks, except for another opening bracket [if-.
This part doesn't work like you think it does:
[^\[if]
This will match a single character that is none of [, i, or f, regardless of the combination. You can mimic the desired behavior using a negative lookahead though:
~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s
I've also included closing tags in the lookahead, as this avoids the ungreedy repetition (which is usually worse performance-wise). Plus, I've changed the delimiters, so that you don't have to escape the slash in the pattern.
So this is the interesting part ((?:(?!\[/?if).)*) explained:
( # capture the contents of the tag-pair
(?: # start a non-capturing group (the ?: are just a performance
# optimization). this group represents a single "allowed" character
(?! # negative lookahead - makes sure that the next character does not mark
# the start of either [if or [/if (the negative lookahead will cause
# the entire pattern to fail if its contents match)
\[/?if
# match [if or [/if
) # end of lookahead
. # consume/match any single character
)* # end of group - repeat 0 or more times
) # end of capturing group
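A minimal PHP sketch of this pattern applied to the string from the question. The variable names are assumptions made for illustration.

```php
<?php
// Pattern from this answer: each consumed character is first checked by the
// negative lookahead, so the body can never run into another [if or [/if.
$pattern = '~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s';

$input = '[if-abc] 12345 [if-def] abcdef [/if][/if]';

$matched = preg_match($pattern, $input, $m);

// The innermost tag pair is matched; the outer [if-abc] cannot match
// because its body would have to cross the nested [if-def].
print_r($m);
```

Here $m[0] holds the full inner tag pair, $m[1] the tag name, and $m[2] the body including its surrounding spaces.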
Modifying a little results in:
/\[if-([a-z0-9-]+)\](.+?)(?=\[if)/s
Running it on [if-abc] 12345 [if-def] abcdef [/if][/if]
Results in a first match as: [if-abc] 12345
Your groups are: abc and 12345
And modifying even further:
/\[if-([a-z0-9-]+)\](.+?)(?=(?:\[\/?if))/s
matches both groups. Although the delimiter [/if] is not captured by either of these.
NOTE: Instead of matching the delimiters, I used a lookahead ((?=)) in the regex to stop when the text ahead matches the lookahead.
Use a period to match any character.

Regex matching if maximum two occurrences of dot and dash

I need a regular expression that will match any string containing at most 2 dashes and 2 dots.
There does not HAVE to be a dash or a dot, but if there are 3+ dashes or 3+ dots, or even both 3+ dashes and 3+ dots, then the regex must not match the string.
Intended for use in PHP.
I know of easy alternatives using PHP functions, but it is to be used in a large system that just allows filtering using regular expressions.
Example string that will be MATCHED:
hello-world.com
Example string that will NOT be matched:
www.hello-world.easy.com or hello-world-i-win.com
Is this matching your expectations?
(?!^.*?([.-]).*\1.*\1.*$)^.*$
See it here on Regexr
(?!^.*?([.-]).*\1.*\1.*$) is a negative lookahead. It matches the first . or -, puts it in capture group 1, and then checks whether there are two more of the same character using the backreference \1. As soon as it finds three, the expression no longer matches.
^.*$ matches everything from start to the end, if the negative lookahead has not matched.
Use this: (?!^.*?([-.])(?:.*\1){2}.*$)^.*$
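A short PHP sketch of the backreference approach, using the examples from the question plus one hypothetical boundary case (a-b-c.d.e). The helper name withinLimit is an assumption.

```php
<?php
// Backreference pattern from the answer above: fail when the same
// separator (dot or dash) occurs three times.
$pattern = '/(?!^.*?([.-]).*\1.*\1.*$)^.*$/';

// Hypothetical helper: true when the string has at most 2 dots and at most 2 dashes.
function withinLimit(string $pattern, string $s): bool
{
    return preg_match($pattern, $s) === 1;
}

$samples = [
    'hello-world.com',
    'a-b-c.d.e',
    'www.hello-world.easy.com',
    'hello-world-i-win.com',
];

foreach ($samples as $s) {
    echo $s, ' => ', withinLimit($pattern, $s) ? 'match' : 'no match', PHP_EOL;
}
```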
This tested regex will do the trick:
$re = '/# Match string with 2 or fewer dots or dashes
^ # Anchor to start of string.
(?=[^.]*(?:\.[^.]*){0,2}$) # Assert 2 or fewer dots.
(?=[^\-]*(?:-[^\-]*){0,2}$) # Assert 2 or fewer dashes.
.* # Ok to match string.
$ # Anchor to end of string.
/sx';
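For completeness, a sketch of how the commented pattern might be used with preg_match(). The pattern is repeated so the snippet runs on its own, and the helper name hasAtMostTwo is hypothetical.

```php
<?php
// Same pattern as above, in free-spacing (x) mode with inline comments.
$re = '/# Match string with 2 or fewer dots or dashes
    ^                            # Anchor to start of string.
    (?=[^.]*(?:\.[^.]*){0,2}$)   # Assert 2 or fewer dots.
    (?=[^\-]*(?:-[^\-]*){0,2}$)  # Assert 2 or fewer dashes.
    .*                           # Ok to match string.
    $                            # Anchor to end of string.
    /sx';

// Hypothetical helper: true when both lookahead assertions hold.
function hasAtMostTwo(string $re, string $s): bool
{
    return preg_match($re, $s) === 1;
}

echo hasAtMostTwo($re, 'hello-world.com') ? 'match' : 'no match', PHP_EOL;
echo hasAtMostTwo($re, 'www.hello-world.easy.com') ? 'match' : 'no match', PHP_EOL;
echo hasAtMostTwo($re, 'hello-world-i-win.com') ? 'match' : 'no match', PHP_EOL;
```

Unlike the backreference version, this variant counts dots and dashes in two separate lookaheads, which makes it easier to give each character its own limit.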
