Building a complex regex with "conditions" - php

I'm trying to build a complex regex with the following constraints:
1. My string can only be composed of:
"Regular" alphanumeric characters : a-zA-Z0-9
4 specials characters : space . _ -
2. Length has to be between 3 and 25
So far it's quite easy but then it gets complicated :
3. There cannot be 2 consecutive special characters, unless the 1st one is a space and the 2nd one isn't a space. Logical consequence : there cannot be 3 consecutive special characters
4 The string cannot start or end with a space
I'm especially struggling with 3.
Any help/hint would be much appreciated.
Examples:
" lkjsdi1SD" => FALSE (starts with a space)
"-lkjsdi1SD" => TRUE
"lkjsd -i1SD " => FALSE (ends with a space)
".Dg5 -lkjsdi1SD" => TRUE
"jhv5675gjjvghHJHvg655775vfFVHFJFf445576JHFFfhd12" => FALSE (too long)
"jhv 12" => FALSE (two consecutive spaces)
"as" => FALSE (too short)
"a r" => TRUE

I suggest using:
^ # Start of string
(?=.{3,25}$) # The total string length is from 3 to 25
[._-]? # An optional . _ or - (? means "match 1 or 0 times")
[a-zA-Z0-9]+ # one or more alphanumeric symbols
(?: # Zero or more sequences of:
(?:[._-]|[ ][._-]?) # one . _ or - OR a space followed with an optional . _ or -
[a-zA-Z0-9]+ # one or more alphanumerics
)* # (here * defines zero or more times)
[._-]? # one optional . _ or -
$ # End of string
See the inline description for each part (I used /x VERBOSE (or free-space) modifier to enable comments that is helpful to keep long patterns readable).
See the regex demo
More pattern details
^ - start of string anchor, the regex engine will only look for the whole pattern at the string start. Thus, if there is a space at the start, no match will be returned as [a-zA-Z0-9]+, the first obligatory subpattern, requires an alphanumeric, and [._-]? (a character class that matches one or zero ., _, or - (the ? is a quantifier matching one or zero occurrences of the quantified subpattern) only allows 1 of these 3 characters before the first alphanumeric.
(?=.{3,25}$) is a positive lookahead anchored at the start, that requires at least 3 and at most 25 characters other than a newline (. matches any char other than a LF if /s modifier is not defined) from start till end ($ is the string end anchor that matches at the end of string or before the final char that is a newline character, replace with \z if you want to disallow matching a string with a newline symbol at the end).
The {3,25} is a limiting quantifier that allows matching min to max amount of characters conforming to the subpattern quantified. Note that a lookahead does not consume the text, i.e. the regex engine returns to the place where it starts matching the lookahead pattern with the true or false result, and if true, goes on matching the rest of the pattern.
[._-]? - an optional single char, one of the defined chars in the character class (see explanation above)
[a-zA-Z0-9]+ - one or more (I wrote "1+") characters (the + quantifier matches 1 or more occurrences) that are in the ranges defined in the character class.
(?:(?:[._-]|[ ][._-]?)[a-zA-Z0-9]+)* - is a non-capturing group used only for grouping subpatterns (to match them consecutively) that can match one or more (as the * stands after it) sequences of (?:[._-]|[ ][._-]?)[a-zA-Z0-9]+:
(?:[._-]|[ ][._-]?) - either a ., _, or -, OR (due to the | alternation operator) the space (I put the space into a character class [ ] because I used the /x VERBOSE modifier to introduce newline formatting and comments into the pattern, you may use a regular space if you do not use the /x modifier) followed with ., _, or -.
[a-zA-Z0-9]+ - 1 or more (due to +) alphanumerics.

Try using this:
^(?:[a-zA-Z0-9]|[._-](?![ ._-]))(?:[a-zA-Z0-9 ]|[._-](?![ ._-])){1,23}[a-zA-Z0-9._-]$
The part [._-](?![ ._-]) means "match [._-] if it's not followed by [ ._-].
In general you can look into lookarounds

Related

Match regular expression specific character quantities in any order

I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches the following data in the following cases. The only one that does not match is the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks
One option could be using lookaheads to assert 3 digits, not 2 backslashes and not 2 times a hyphen.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert what is on the left is not a non whitespace char
(?=(?:[^\d\s]*\d){3}) Assert wat is on the right is 3 times a whitespace char or digit
(?!(?:[^\s-]*-){2}) Assert what is on the right is not 2 times a whitespace char a hyphen
(?!(?:[^\s\\]*\\){2}) Assert what is on the right is not 2 times a whitespace char a backslash
[A-Z0-9/\\-]+ Match any of the listed 1+ times
(?!\S) Assert what is on the right is not a non whitespace char
Regex demo
Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Check out this Regex101 demo.
Remark
This solution assumes that each string tested resides on its own line, ie. not just being separated by whitespace.
In case the strings are separated by whitespace, choose the solution of user #TheFourthBird (which essentially is the same as this one but caters for the whitespace separation)
You can test the condition for both the hyphen and the slash into a same lookahead using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
demo
detailled:
~ # using the tild as pattern delimiter avoids to escape all slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # eventual other characters until the end of the string
\z # end of the string.
~
To better understand (if you are not familiar with): these three subpatterns start from the same position (in this case the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the two first are zero-width assertions that are simple tests and don't consume any character.

PHP: Complex Multi-Line RegEx, Tilde delimeters

I need to generate a regex that will match the following format:
-1 LKSJDF LSAALSKJ~
Syjsdf
lkjdf
This block may contain multiple characters including digits, colons, etc. Any character other than a tilde.
~
I'm currently using this:
/(-\d|\d)\s([^$\~][a-zA-Z\s]*)\~\n/s
Which matches the first line fine. I need to capture the -1 through 60 that begins the pattern, the words after the space and up until the first tilde. I then need to capture all of the text BETWEEN the tildes.
I'm not the strongest with regex in the first place, but I'm having trouble getting this to work without also capturing the tildes.
You can use
'/^(-?\d+)\s+([^~]*)~([^~]+)~/m'
See demo
The regex matches:
^ - start of a line (due to /m modifier ^ does not match start of string any longer)
(-?\d+) - (Group 1) a one or zero - followed with one or more digits
\s+ - one or more whitespace symbols (to only match tab and regular spaces, use \h+ instead)
([^~]*) - (Group 2) zero or more characters other than a ~ (you can force to match these characters on the first line only by adding a \n\r to the negated character class - [^~\n\r])
~ - a literal leading tilde
([^~]+) - (Group 3) one or more characters other than a tilde
~ - a literal trailing tilde
If you need to only match these strings if the number is an integer between -1 and 60, you can use
'/^(-1|[1-5]?[0-9]|60)\s+([^~]*)~([^~]+)~/m'
See another demo
Here, the first group matches integer numbers from -1 to 60 with (-1|[1-5]?[0-9]|60) alternation group. -1 and 60 match literal numbers, and [1-5]?[0-9] matches one or zero (optional) digit from 1 to 5 (replace with [0-5]? if a leading zero is allowed) and then any one digit may follow.

explanation of preg_replace function in PHP [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
The preg_replace() function has so many possible values, like:
<?php
$patterns = array('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/', '/^\s*{(\w+)}\s*=/');
$replace = array('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
What does:
\3/\4/\1\2
And:
/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/','/^\s*{(\w+)}\s*=/
mean?
Is there any information available to help understand the meanings at one place? Any help or documents would be appreciated! Thanks in Advance.
Take a look at http://www.tutorialspoint.com/php/php_regular_expression.htm
\3 is the captured group 3
\4 is the captured group 4
...an so on...
\w means any word character.
\d means any digit.
\s means any white space.
+ means match the preceding pattern at least once or more.
* means match the preceding pattern 0 times or more.
{n,m} means match the preceding pattern at least n times to m times max.
{n} means match the preceding pattern exactly n times.
(n,} means match the preceding pattern at least n times or more.
(...) is a captured group.
So, the first thing to point out, is that we have an array of patterns ($patterns), and an array of replacements ($replace). Let's take each pattern and replacement and break it down:
Pattern:
/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/
Replacement:
\3/\4/\1\2
This takes a date and converts it from a YYYY-M-D format to a M/D/YYYY format. Let's break down it's components:
/ ... / # The starting and trailing slash mark the beginning and end of the expression.
(19|20) # Matches either 19 or 20, capturing the result as \1.
# \1 will be 19 or 20.
(\d{2}) # Matches any two digits (must be two digits), capturing the result as \2.
# \2 will be the two digits captured here.
- # Literal "-" character, not captured.
(\d{2}) # Either 1 or 2 digits, capturing the result as \3.
# \3 will be the one or two digits captured here.
- # Literal "-" character, not captured.
(\d{2}) # Either 1 or 2 digits, capturing the result as \4.
# \4 will be the one or two digits captured here.
This match is replaced by \3/\4/\1\2, which means:
\3 # The two digits captured in the 3rd set of `()`s, representing the month.
/ # A literal '/'.
\4 # The two digits captured in the 4rd set of `()`s, representing the day.
/ # A literal '/'.
\1 # Either '19' or '20'; the first two digits captured (first `()`s).
\2 # The two digits captured in the 2nd set of `()`s, representing the last two digits of the year.
Pattern:
/^\s*{(\w+)}\s*=/
Replacement:
$\1 =
This takes a variable name encoded as {variable} and converts it to $variable = <date>. Let's break it down:
/ ... / # The starting and trailing slash mark the beginning and end of the expression.
^ # Matches the beginning of the string, anchoring the match.
# If the following character isn't matched exactly at the beginning of the string, the expression won't match.
\s* # Any whitespace character. This can include spaces, tabs, etc.
# The '*' means "zero or more occurrences".
# So, the whitespace is optional, but there can be any amount of it at the beginning of the line.
{ # A literal '{' character.
(\w+) # Any 'word' character (a-z, A-Z, 0-9, _). This is captured in \1.
# \1 will be the text contained between the { and }, and is the only thing "captured" in this expression.
} # A literal '}' character.
\s* # Any whitespace character. This can include spaces, tabs, etc.
= # A literal '=' character.
This match is replaced by $\1 =, which means:
$ # A literal '$' character.
\1 # The text captured in the 1st and only set of `()`s, representing the variable name.
# A literal space.
= # A literal '=' character.
Lastly, I wanted to show you a couple of resources. The regex-format you're using is called "PCRE", or Perl-Compatible Regular Expressions. Here is a quick cheat-sheet on PCRE for PHP. Over the last few years, several tools have been popping up to help you visualize, explain, and test regular expressions. One is Regex 101 (just Google "regex tester" or "regex visualizer"). If you look here, this is an explanation of the first RegEx, and here is an explanation of the second. There are others as well, like Debuggex, Regex Tester, etc. But I find the detailed match breakdown on Regex 101 to be pretty useful.

Password Regular expression with four criteria

I am trying to write a regular expression in PHP to ensure a password matches a criteria which is:
It should atleast 8 characters long
It should include at least one special character
It should include at least one capital letter.
I have written the following expression:
$pattern=([a-zA-Z\W+0-9]{8,})
However, it doesn't seem to work as per the listed criteria. Could I get another pair of eyes to aid me please?
Your regex - ([a-zA-Z\W+0-9]{8,}) - actually searches for a substring in a larger text that is at least 8 characters long, but also allowing any English letters, non-word characters (other than [a-zA-Z0-9_]), and digits, so it does not enforce 2 of your requirements. They can be set with look-aheads.
Here is a fixed regex:
^(?=.*\W.*)(?=.*[A-Z].*).{8,}$
Actually, you can replace [A-Z] with \p{Lu} if you want to also match/allow non-English letters. You can also consider using \p{S} instead of \W, or further precise your criterion of a special character by adding symbols or character classes, e.g. [\p{P}\p{S}] (this will also include all Unicode punctuation).
An enhanced regex version:
^(?=.*[\p{S}\p{P}].*)(?=.*\p{Lu}.*).{8,}$
A human-readable explanation:
^ - Beginning of a string
(?=.*\W.*) - Requirement to have at least 1 non-word character
OR (?=.*[\p{S}\p{P}].*) - At least 1 Unicode special or punctuation symbol
(?=.*[A-Z].*) - Requirement to have at least 1 uppercase English letter
OR (?=.*\p{Lu}.*) - At least 1 Unicode letter
.{8,} - Requirement to have at least 8 symbols
$ - End of string
See Demo 1 and Demo 2 (Enhanced regex)
Sample code:
if (preg_match('/^(?=.*\W.*)(?=.*[A-Z].*).{8,}$/u', $header)) {
// PASS
}
else {
# FAIL
}
Using positive lookahead ?= we make sure that all password requirements are met.
Requirements for strong password:
At least 8 chars long
At least 1 Capital Letter
At least 1 Special Character
Regex:
^((?=[\S]{8})(?:.*)(?=[A-Z]{1})(?:.*)(?=[\p{S}])(?:.*))$
PHP implementation:
if (preg_match('/^((?=[\S]{8})(?:.*)(?=[A-Z]{1})(?:.*)(?=[\p{S}])(?:.*))$/u', $password)) {
# Strong Password
} else {
# Weak Password
}
Examples:
12345678 - WEAK
1234%fff - WEAK
1234_44A - WEAK
133333A$ - STRONG
Regex Explanation:
^ assert position at start of the string
1st Capturing group ((?=[\S]{8})(?:.*)(?=[A-Z]{1})(?:.*)(?=[\p{S}])(?:.*))
(?=[\S]{8}) Positive Lookahead - Assert that the regex below can be matched
[\S]{8} match a single character present in the list below
Quantifier: {8} Exactly 8 times
\S match any kind of visible character [\P{Z}\H\V]
(?:.*) Non-capturing group
.* matches any character (except newline) [unicode]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?=[A-Z]{1}) Positive Lookahead - Assert that the regex below can be matched
[A-Z]{1} match a single character present in the list below
Quantifier: {1} Exactly 1 time (meaningless quantifier)
A-Z a single character in the range between A and Z (case sensitive)
(?:.*) Non-capturing group
.* matches any character (except newline) [unicode]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?=[\p{S}]) Positive Lookahead - Assert that the regex below can be matched
[\p{S}] match a single character present in the list below
\p{S} matches math symbols, currency signs, dingbats, box-drawing characters, etc
(?:.*) Non-capturing group
.* matches any character (except newline) [unicode]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
$ assert position at end of the string
u modifier: unicode: Pattern strings are treated as UTF-16. Also causes escape sequences to match unicode characters
Demo:
https://regex101.com/r/hE2dD2/1

PHP: RegEx Syntax

I am a serious newbie with regular expression so please disregard my mistakes. I need to be sure that several criteria in a string are met.
Requirements:
Have at most 5 words
Max of 256 characters
Word is considered 1 or more characters - no spaces
Shouldn't contain two consecutive spaces
Example:
Tree blows in the wind
1-Tree falls over
Failure Example:
Tree blows in the night sky
Tree breaks 2 limbs during night
Can this be done in one single expression or should it be broken up?
Validating for 2 spaces:
- /^\s\s$/
Max of 256 characters:
- /^[a-zA-Z0-9]{,256}$/
I am not sure how to test case for the 5 words and combine the other criteria that I impose. Can anyone help?
Test for word:
- /^\w{1,5}$
You can try this:
(?s)\A(?!.{257}|.*\s\s)\W*\w*(?:\W+\w+){0,4}\W*\z
pattern details:
(?s) # turn on the singleline mode: allow the dot to match newlines
\A # start of the string anchor
(?! # open a negative lookahead assertion: means not followed by
.{257} # 257 characters
| # OR
.*\s\s # two consecutive whitespaces
) # close the negative lookahead
\W* # optional non-word characters
\w* # optional word characters (nothing in your requirements forbids to have a string without words or an empty string)
(?: # open a non-capturing group
\W+ # non-word characters: words are obviously separated with non-word characters
\w+ # an other word
){0,4} # repeat the non-capturing group between zero and 4 times
\W* # optional non-word characters
\z # anchor for the end of the string

Categories