Help with password complexity regex - php

I'm using the following regex to validate password complexity:
/^.*(?=.{6,12})(?=.*[0-9]{2})(?=.*[A-Z]{2})(?=.*[a-z]{2}).*$/
In a nutshell: 2 lowercase, 2 uppercase, 2 numbers, min length is 6 and max length is 12.
It works perfectly, except for the maximum length, when I'm using a minimum length as well.
For example:
/^.*(?=.{6,})(?=.*[0-9]{2})(?=.*[A-Z]{2})(?=.*[a-z]{2}).*$/
This correctly requires a minimum length of 6!
And this:
/^.*(?=.{,12})(?=.*[0-9]{2})(?=.*[A-Z]{2})(?=.*[a-z]{2}).*$/
Correctly requires a maximum length of 12.
However, when I pair them together as in the first example, it just doesn't work!!
What gives? Thanks!

You want:
/^(?=.{6,12}$)...
What you're doing is saying: find me any sequence of characters that is followed by:
6-12 characters
another sequence of characters that is followed by 2 digits
another sequence of characters that is followed by 2 uppercase letters
another sequence of characters that is followed by 2 lowercase letters
And all that is followed by yet another sequence of characters. That's why the maximum length isn't working: the lookahead (?=.{6,12}) is not anchored to the end of the string, so it only checks that at least 6 characters follow and ignores anything beyond the 12th. 30 characters followed by 00AAaa and another 30 characters will pass.
You're also forcing the two numbers to be adjacent. To be less strict than that while still requiring at least two numbers anywhere in the string:
/^(?=.{6,12}$)(?=(.*?\d){2})(?=(.*?[A-Z]){2})(?=(.*?[a-z]){2})/
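As a quick sanity check, the two patterns can be run side by side. This is only a sketch: the sample passwords and variable names are made up for illustration.

```php
<?php
// Original pattern from the question: the length lookahead has no end anchor.
$unanchored = '/^.*(?=.{6,12})(?=.*[0-9]{2})(?=.*[A-Z]{2})(?=.*[a-z]{2}).*$/';
// Anchored pattern from the answer: every lookahead is evaluated at position 0.
$anchored = '/^(?=.{6,12}$)(?=(.*?\d){2})(?=(.*?[A-Z]){2})(?=(.*?[a-z]){2})/';

// Hypothetical sample passwords, chosen only for illustration.
$tooLong = str_repeat('x', 30) . '00AAaa' . str_repeat('x', 30); // 66 characters
$samples = ['a1B2cD', '1aA2bB3cC4dD', 'abCD1', 'abcdEF1', $tooLong];

// Collect the verdict of the anchored pattern for each sample.
$accepted = [];
foreach ($samples as $password) {
    $accepted[] = preg_match($anchored, $password) === 1;
}

// The 66-character string slips through the original pattern but not the anchored one.
$passesUnanchored = preg_match($unanchored, $tooLong) === 1;
$passesAnchored = preg_match($anchored, $tooLong) === 1;
```

Only the first two samples satisfy every rule; the third is too short, the fourth has a single digit and the last is too long.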
Lastly you'll note that I'm using non-greedy expressions (.*?). That will avoid a lot of backtracking and for this kind of validation is what you should generally use. The difference between:
(.*\d){2}
and
(.*?\d){2}
Is that the first will grab all the characters with .* and then look for a digit. It won't find one because it will be at the end of the string, so it will backtrack one character and then look for a digit. If it's not a digit it will keep backtracking until it finds one. After it does, it will match that whole expression a second time, which will trigger even more backtracking.
That's what greedy wildcards means.
The second version will pass zero characters to .*? and look for a digit. If it's not a digit, .*? will grab another character and then look for a digit, and so on. Particularly on long search strings this can be orders of magnitude faster. On a short password it almost certainly won't make a difference, but it's a good habit to know how the regex matcher works and to write the best regex you can.
That being said, this is probably an example of being too clever for your own good. If a password is rejected as not satisfying those conditions, how do you determine which one failed in order to give feedback to the user about what to fix? A programmatic solution is, in practice, probably preferable.
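A programmatic version might look something like the sketch below. The function name passwordProblems and the message wording are hypothetical; the rules are the ones from the question (6 to 12 characters, 2 digits, 2 uppercase, 2 lowercase).

```php
<?php
/**
 * Returns a list of human-readable problems; an empty list means the password passed.
 * Hypothetical helper, not part of any library.
 */
function passwordProblems(string $password): array
{
    $problems = [];
    $length = strlen($password); // byte length; mb_strlen() would be needed for multibyte input

    if ($length < 6) {
        $problems[] = 'must be at least 6 characters long';
    }
    if ($length > 12) {
        $problems[] = 'must be at most 12 characters long';
    }

    // One single-character pattern per class; preg_match_all() returns how many times it matched.
    $rules = [
        '/[0-9]/' => 'must contain at least 2 digits',
        '/[A-Z]/' => 'must contain at least 2 uppercase letters',
        '/[a-z]/' => 'must contain at least 2 lowercase letters',
    ];
    foreach ($rules as $pattern => $message) {
        if (preg_match_all($pattern, $password) < 2) {
            $problems[] = $message;
        }
    }

    return $problems;
}
```

Each failed rule produces its own message, so the user can be told exactly what to fix.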

Related

Regex repeating numbers and sequences

Another Regex question. Have spent ages trawling through StackOverflow with no joy.
I need regexs (regexai?) for the following:
Can’t have more than 4 double numbers in a row. E.g. 22334455 fails.
Can’t have a sequence of numbers longer than or equal to 5 digits. E.g. 12345 or 56789 both fail.
Must have 4 or more different digits. E.g. 77788778877 fails.
I don't expect one expression to fit all, guessing it'll probably be 2/3 required.
Cheers
In my opinion, the only requirement that can be solved with a regular expression is the first, with an expression such as this: ((\d)\2){4}. This will attempt to match a digit followed by the same digit, 4 times in a row (it will look for 4 pairs).
The other requirements, such as checking whether a digit is one less than the one that follows it, and the last one, cannot, to my knowledge, be solved with a regular expression.
My recommendation would be to have a method that checks each requirement and returns a boolean value denoting failure or success. This way you will at least know what each check is doing, and you will be in a position to maintain the solution should the requirements change one day.
Long story short, what you are after can be achieved through a simple loop and some numerical checks.
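One possible shape for those checks, as a rough sketch. The function names are hypothetical, the input is assumed to be a string of ASCII digits, and "sequence" is read as an ascending run such as 12345, since that is all the examples in the question show.

```php
<?php
// Rule 1: four pairs of doubled digits in a row, using the expression from the answer.
function hasFourPairsInARow(string $digits): bool
{
    return preg_match('/((\d)\2){4}/', $digits) === 1;
}

// Rule 2: a run of $limit or more digits, each exactly one greater than the previous.
function hasAscendingRun(string $digits, int $limit = 5): bool
{
    $run = 1;
    for ($i = 1, $n = strlen($digits); $i < $n; $i++) {
        // Extend the run if this digit continues the sequence, otherwise start over.
        $run = ((int) $digits[$i] === (int) $digits[$i - 1] + 1) ? $run + 1 : 1;
        if ($run >= $limit) {
            return true;
        }
    }
    return false;
}

// Rule 3: count_chars(..., 3) returns each distinct byte once, so its length is the distinct count.
function hasEnoughDistinctDigits(string $digits, int $minimum = 4): bool
{
    return strlen(count_chars($digits, 3)) >= $minimum;
}

// Combines the three checks; true means the number is acceptable.
function isAcceptable(string $digits): bool
{
    return !hasFourPairsInARow($digits)
        && !hasAscendingRun($digits)
        && hasEnoughDistinctDigits($digits);
}
```

Keeping one function per rule also makes it easy to report which requirement failed.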

Regular expression - avoid the repetition of the sequence of the same letters

I'm trying to make a check on the password inserted by a user, working on a PHP website.
My check should enforce:
at least 8 characters
maximum 20 characters
accept letters, numbers, and common special characters like (dot) # + $ - _ !
Up to this point I've been able to figure out the right expression, but now I want to add another rule: a user can't write the same sequence of letters more than once.
Let's say that, ignoring a single letter repeated twice, if the user writes the same string (3 or more characters) more than once, it should not match.
For example:
abcde not valid - should be at least 8 characters
abcde1234 valid
abcd1abcd1 not valid due to repetition of the string abcd1
More examples (updated):
abababab not valid - the string "ab" is repeated 2 times or more
aaaaaaaa not valid - the string aaa is repeated more than once
helloworld valid - even if there is the letter "l" repeated two times
Any suggestion?
I don't know if it's possible to write a correct RegExp; maybe I'm trying to do something impossible.
Before abandoning the idea, I was curious to hear the opinion of someone who knows more than me about RegExp.
Thanks in advance
^(?!.*?(.+)\1)([\w#+$!.-]+){8,20}$
seems to work well: http://regex101.com/r/cU9lD0/1
The tricky part is ^(?!.*?(.+)\1) which reads "start of input, when not followed by: something, then some substring, then that substring once again".
That being said, all "password validation" is a quite pointless enterprise, which actually stops people from using really good passwords.
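Two things may be worth checking in the pattern above. Because the character class carries its own +, the {8,20} counts repetitions of the group rather than characters, so longer inputs can still match. And (.+)\1 also fires on a single doubled letter, such as the "ll" in helloworld. Below is a sketch of a variant that puts the quantifier directly on the class and only rejects adjacent repeats of two or more characters. The {2,} threshold is an assumption read off the examples in the question, not something the answer states.

```php
<?php
// Pattern as posted in the answer.
$original = '/^(?!.*?(.+)\1)([\w#+$!.-]+){8,20}$/';
// Hypothetical variant: {8,20} now counts characters, and only repeats of 2+ characters are rejected.
$variant = '/^(?!.*(.{2,})\1)[\w#+$!.-]{8,20}$/';

// Examples from the question, plus one 30-character string with no repeats at all.
$samples = [
    'abcde', 'abcde1234', 'abcd1abcd1', 'abababab', 'aaaaaaaa', 'helloworld',
    'abcdefghijklmnopqrstuvwxyz0123',
];

// Map each sample to the variant's verdict.
$verdicts = [];
foreach ($samples as $password) {
    $verdicts[$password] = preg_match($variant, $password) === 1;
}

// For comparison: how the original pattern treats the over-long string and helloworld.
$originalAcceptsTooLong = preg_match($original, 'abcdefghijklmnopqrstuvwxyz0123') === 1;
$originalAcceptsHelloWorld = preg_match($original, 'helloworld') === 1;
```

With this variant only abcde1234 and helloworld are accepted, which matches the examples listed in the question.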

Password Strength Pattern

Should I use a password pattern like a-zA-Z0-9 and also require at least one of each character class in the password, or simply allow anything inside the password?
What do sites allow the user to use as his/her password? Is there anything else I should consider?
a-zA-Z0-9 is overly limited. You should let me use any characters, and enforce minimum requirements (e.g. at least 8 characters, at least one letter and one number).
Password Entropy
The test of a good password is not the number of sets of characters represented but Entropy.
Testing for Entropy: The people at Dropbox have put together this fantastic tool called zxcvbn to do just that. I would highly recommend reading their write-up explaining it here.
Brief Explanation: Character classes (lower case, upper case, digits and special characters) and length are both important because together they raise password entropy (length does this much faster than character classes, though). However, users then tend toward predictable patterns, which lowers entropy.
This may be humour but it helpfully illustrates part of the point:
http://xkcd.com/936/
There should be no limit on what the user is able to use. Since you would hash the password before you store it anyway (I hope), it makes no difference what the password contains.
If you set requirements, they should be minimum requirements.
Password Regular Expression Pattern
((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})
Breakdown
( # Start of group
(?=.*\d) # must contain one digit from 0-9
(?=.*[a-z]) # must contain one lowercase character
(?=.*[A-Z]) # must contain one uppercase character
(?=.*[@#$%]) # must contain one special symbol from the list "@#$%"
. # match anything with previous condition checking
{6,20} # length at least 6 characters and maximum of 20
) # End of group
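In PHP, a commented breakdown like this can be kept in the pattern itself by using the x (extended) modifier, which ignores whitespace and # comments outside character classes. The sketch below makes two assumptions: the symbol list is taken to be @ # $ %, and ^ and $ anchors are added because preg_match() otherwise looks for a match anywhere in the string.

```php
<?php
// Extended-mode version of the pattern; comments after "#" are ignored by PCRE.
$pattern = '/^
    (?=.*\d)        # at least one digit
    (?=.*[a-z])     # at least one lowercase letter
    (?=.*[A-Z])     # at least one uppercase letter
    (?=.*[@#$%])    # at least one of the listed symbols
    .{6,20}         # 6 to 20 characters in total
$/x';

// Hypothetical sample passwords for illustration.
$results = [
    'abcD1@'   => preg_match($pattern, 'abcD1@') === 1,   // meets every rule
    'Abcdef1#' => preg_match($pattern, 'Abcdef1#') === 1, // meets every rule
    'abcdef'   => preg_match($pattern, 'abcdef') === 1,   // no digit, uppercase or symbol
    'Abc1@'    => preg_match($pattern, 'Abc1@') === 1,    // only 5 characters
];
```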
Related:
Regular Expression for Password
minimum 8 characters, preferable 12
at least one digit, at least one lower case, at least one upper case, at least one symbol (*/%...)

Editing a regex that isn't mine, not sure how to adjust it for needs

I have a regex that was written for me for passwords:
~^[a-z0-9!@#\$%\^&\*\(\)]{8,16}$~i
It's supposed to match strings of alphanumerics and symbols of 8-16 characters. Now I need to remove the min and max length requirement as I need to split the error messages for user friendliness - I tried to just take out the {8,16} portion but then it breaks it. How would I do this? Thanks ahead of time.
I take it you're doing separate checks for too-long or too-short strings, and this regex is only making sure there are no invalid characters. This should do it:
~^[a-z0-9!@#$%^&*()]+$~i
+ means one or more, * means zero or more; it probably doesn't matter which one you use.
I got rid of some unnecessary backslashes, too; none of those characters has any special meaning in a character class (inside the square brackets, that is).
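Splitting the checks so that each failure gets its own message could look roughly like this. The function name passwordErrors and the message texts are hypothetical, the symbol list is assumed to be !@#$%^&*(), and * is used instead of + so that an empty string only triggers the length message.

```php
<?php
/**
 * Returns one message per failed check; an empty array means the password is fine.
 * Hypothetical helper for illustration only.
 */
function passwordErrors(string $password): array
{
    $errors = [];
    $length = strlen($password);

    // Length is checked on its own so it can have its own messages.
    if ($length < 8) {
        $errors[] = 'Password is too short (minimum 8 characters).';
    } elseif ($length > 16) {
        $errors[] = 'Password is too long (maximum 16 characters).';
    }

    // Character check only: no length quantifier left in the pattern.
    if (preg_match('~^[a-z0-9!@#$%^&*()]*$~i', $password) !== 1) {
        $errors[] = 'Password contains characters that are not allowed.';
    }

    return $errors;
}
```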

In RegEx, how do you find a line that contains no more than 3 unique characters?

I am looping through a large text file and I'm looking for lines that contain no more than 3 different characters (those characters, however, can be repeated indefinitely). I am assuming the best way to do this would be some sort of regular expression.
All help is appreciated.
(I am writing the script in PHP, if that helps)
Regex optimisation fun time exercise for kids! Taking gnarf's regex as a starting point:
^(.)\1*(.)?(?:\1*\2*)*(.)?(?:\1*\2*\3*)*$
I noticed that there were nested and sequential *s here, which can cause a lot of backtracking. For example in 'abcaaax' it will try to match that last string of ‘a’s as a single \1* of length 3, a \1* of length two followed by a single \1, a \1 followed by a 2-length \1*, or three single-match \1s. That problem gets much worse when you have longer strings, especially when due to the regex there is nothing stopping \1 from being the same character as \2.
^(.)\1*(.)?(?:\1|\2)*(.)?(?:\1|\2|\3)*$
This was over twice as fast as the original, testing with Python's re module. (It's quicker than setting it up in PHP, sorry.)
This still has a problem in that (.)? can match nothing, and then carry on with the rest of the match. \1|\2 will still match \1 even if there is no \2 to match, resulting in potential backtracking trying to introduce the \1|\2 and \1|\2|\3 clauses earlier when they can't result in a match. This can be solved by moving the ? optionalness around the whole of the trailing clauses:
^(.)\1*(?:(.)(?:\1|\2)*(?:(.)(?:\1|\2|\3)*)?)?$
This was twice as fast again.
There is still a potential problem in that any of \1, \2 and \3 can be the same character, potentially causing more backtracking when the expression does not match. This would stop it by using a negative lookahead to not match a previous character:
^(.)\1*(?:(?!\1)(.)(?:\1|\2)*(?:(?!\1|\2)(.)(?:\1|\2|\3)*)?)?$
However in Python with my random test data I did not notice a significant speedup from this. Your mileage may vary in PHP dependent on test data, but it might be good enough already. Possessive-matching (*+) might have helped if this were available here.
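For anyone who wants to try the lookahead version from PHP, here is a minimal sketch. The sample lines are made up for illustration. The pattern goes in a single-quoted string so that \1, \2 and \3 reach PCRE as backreferences.

```php
<?php
// Final pattern from above: each new capture must differ from the earlier ones.
$pattern = '/^(.)\1*(?:(?!\1)(.)(?:\1|\2)*(?:(?!\1|\2)(.)(?:\1|\2|\3)*)?)?$/';

// Hypothetical sample lines; the last one contains four different characters.
$lines = ['aaaaa', 'abababcaaabac', 'aaadsdsdads', 'aasasasassa', 'aasdasdsadfasf'];

// Record whether each line matches, i.e. has at most 3 different characters.
$matches = [];
foreach ($lines as $line) {
    $matches[$line] = preg_match($pattern, $line) === 1;
}
```

Note that an empty line does not match this pattern, because the first (.) needs at least one character.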
No regex performed better than the easier-to-read Python alternative:
len(set(s))<=3
The analogous method in PHP would probably be with count_chars:
strlen(count_chars($s, 3))<=3
I haven't tested the speed but I would very much expect this to be faster than regex, in addition to being much, much nicer to read.
So basically I just totally wasted my time fiddling with regexes. Don't waste your time, look for simple string methods first before resorting to regex!
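Spelled out a little, the count_chars approach could be used to filter lines like this. The helper name and the sample lines are hypothetical, and the count is per byte, so multibyte characters would need different handling.

```php
<?php
// count_chars($line, 3) returns a string holding each distinct byte of $line exactly once,
// so its length is the number of different (single-byte) characters.
function hasAtMostThreeDistinct(string $line): bool
{
    return strlen(count_chars($line, 3)) <= 3;
}

// Hypothetical sample lines standing in for the large text file.
$lines = ['aaaaa', 'abababcaaabac', 'aaadsdsdads', 'aasasasassa', 'aasdasdsadfasf'];

// Keep only the lines with no more than 3 different characters.
$kept = array_values(array_filter($lines, 'hasAtMostThreeDistinct'));
```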
At the risk of getting downvoted, I will suggest regular expressions are not meant to handle this situation.
You can match a character or a set of characters, but you can't have it remember what characters of a set have already been found to exclude those from further match.
I suggest you maintain a character set: reset it before you begin a new line, and add elements to it while going over the line. As soon as the count of elements in the set exceeds 3, you drop the current line and proceed to the next.
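A minimal sketch of that idea, using array keys as the set. The function name and the $max parameter are hypothetical.

```php
<?php
// Walks the line once and gives up as soon as more than $max different characters are seen.
function lineHasAtMostDistinct(string $line, int $max = 3): bool
{
    $seen = []; // fresh set for every line; the keys are the characters seen so far
    for ($i = 0, $n = strlen($line); $i < $n; $i++) {
        $seen[$line[$i]] = true;
        if (count($seen) > $max) {
            return false; // early exit: no need to read the rest of the line
        }
    }
    return true;
}
```

The early exit means a long line with many different characters is rejected after its first few characters.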
Perhaps this will work:
preg_match("/^(.)\\1*(.)?(?:\\1*\\2*)*(.)?(?:\\1*\\2*\\3*)*$/", $string, $matches);
// aaaaa:Pass
// abababcaaabac:Pass
// aaadsdsdads:Pass
// aasasasassa:Pass
// aasdasdsadfasf:Fail
Explanation:
/
^ #start of string
(.) #match any character in group 1
\\1* #match whatever group 1 was 0 or more times
(.)? #match any character in group 2 (optional)
(?:\\1*\\2*)* #match group 1 or 2, 0 or more times, 0 or more times
#(non-capture group)
(.)? #match any character in group 3 (optional)
(?:\\1*\\2*\\3*)* #match group 1, 2 or 3, 0 or more times, 0 or more times
#(non-capture group)
$ #end of string
/
An added benefit: $matches[1], [2], [3] will contain the three characters you want. The regular expression looks for the first character, stores it and matches it until something other than that character is found, captures that as a second character, matches either of those characters as many times as it can, captures the third character, and matches all three until the match fails or the string ends and the test passes.
EDIT
This regexp will be much faster because of the way the parsing engine and backtracking work; read bobince's answer for the explanation:
/^(.)\\1*(?:(.)(?:\\1|\\2)*(?:(.)(?:\\1|\\2|\\3)*)?)?$/
For me, as a programmer with fair-enough regular expression knowledge, this does not sound like a problem you can solve using regexp only.
More likely you will need to build a hash map/array data structure (key: character, value: count) and iterate over the large text file, rebuilding the map for each line. At each new character, check whether the count of already-encountered characters is 3; if so, skip the current line.
But I'm keen to be surprised if some mad regexp hacker comes up with a solution.
