Regular expression fails to identify valid character at end of string [duplicate] - php

I found a script online that has a password regex in JavaScript. I still want to use it, but for more security I also want to validate the password in PHP, and I'm useless with regex.
The requirements:
Must be a minimum of 8 characters
Must contain at least 1 number
Must contain at least one uppercase character
Must contain at least one lowercase character
How can I construct a regex string to meet these requirements?

^\S*(?=\S{8,})(?=\S*[a-z])(?=\S*[A-Z])(?=\S*[\d])\S*$
From the fine folks over at Zorched.
^: anchored to beginning of string
\S*: any run of non-whitespace characters
(?=\S{8,}): of at least length 8
(?=\S*[a-z]): containing at least one lowercase letter
(?=\S*[A-Z]): and at least one uppercase letter
(?=\S*[\d]): and at least one number
$: anchored to the end of the string
To include special characters, just add (?=\S*[\W]), which is non-word characters.
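As a rough sketch of how the pattern above could be applied in PHP: the function name isStrongPassword() is made up for illustration, and the pattern is copied from the answer with / delimiters added.

```php
<?php
// Sketch only: wraps the lookahead pattern from the answer in preg_match().
// isStrongPassword() is a hypothetical helper name, not part of any library.
function isStrongPassword(string $password): bool
{
    $pattern = '/^\S*(?=\S{8,})(?=\S*[a-z])(?=\S*[A-Z])(?=\S*[\d])\S*$/';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $password) === 1;
}

var_dump(isStrongPassword('Passw0rdX'));   // meets all four requirements
var_dump(isStrongPassword('password1'));   // no uppercase letter
var_dump(isStrongPassword('Pass w0rdX'));  // contains whitespace, which \S rejects
```

Because every part of the pattern uses \S, a password containing a space is rejected even if it satisfies the other rules.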

I find that doing it in one big regex is a bit of a code maintenance nightmare. Splitting it up is far easier to figure out for someone else looking at your code, and it allows you to give more specific error messages as well.
$uppercase = preg_match('#[A-Z]#', $password);
$lowercase = preg_match('#[a-z]#', $password);
$number = preg_match('#[0-9]#', $password);
if (!$uppercase || !$lowercase || !$number || strlen($password) < 8) {
    // tell the user something went wrong
}
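To show what "more specific error messages" might look like, here is a minimal sketch that collects one message per failed rule. The function name passwordErrors() and the message wording are assumptions for illustration.

```php
<?php
// Sketch only: one check per rule, so each failure gets its own message.
// passwordErrors() and the message texts are hypothetical.
function passwordErrors(string $password): array
{
    $errors = [];

    if (strlen($password) < 8) {
        $errors[] = 'Password must be at least 8 characters long.';
    }
    if (!preg_match('#[A-Z]#', $password)) {
        $errors[] = 'Password must contain at least one uppercase letter.';
    }
    if (!preg_match('#[a-z]#', $password)) {
        $errors[] = 'Password must contain at least one lowercase letter.';
    }
    if (!preg_match('#[0-9]#', $password)) {
        $errors[] = 'Password must contain at least one number.';
    }

    return $errors; // an empty array means every rule passed
}

print_r(passwordErrors('password'));
```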

One possible regex pattern is:
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$/
As in this example.
But you really shouldn't limit passwords!
Admit it. As developers, we have done more to contribute to the failure of our customers' and users' online security because we are too stubborn or lazy to handle passwords properly. Just look at some of the fruit of our labor:
Password must be between 5 and 32 characters in length. Valid characters include letters, numbers, and underscore.
Password must be between 6 and 12 characters in length. Valid characters include letters and numbers.
Password must be a minimum of 8 characters and contain at least one capital letter, a number and a special character such as an underscore or exclamation point.
Then there is this gem. The original requirements were a minimum of 8 characters. Accidentally putting in 7 characters causes an error to appear before the user:
Password Limitation Gone Wrong
Note the tag line. Irony?
I could go on here, but I think you get the point. We have written code to support this nonsense, wrapping our heads around the right regex to account for every case. Agonizing over transmission, hashing and storage. We've talked about this so much the situation has even received proper pop culture status with its memorialization on xkcd.
There is no doubt our intentions were good. After all, users and customers cannot be expected to protect themselves properly. They don't create strong passwords; they use the word 'password' as their password more often than not. They don't heed the warnings, the news stories or the horror expressed by friends who have suffered through identity theft. The hacking of large retail chains fazes them very little. We, as developers, set out to help our users avoid these pitfalls. I will allege our attempts fell short and may have even contributed to the problem.
Very likely we've made it worse.
By placing arcane restrictions on passwords we have actually forced our users into a bad way of thinking and therefore made them seek the path of least resistance: simple, hackable passwords. We did this because we were used to restrictions on us. Sysadmins limited us to 8 characters, so we projected the limit on to the rest of the world. It is time we stopped and learned how to handle any length of password with any character included. We may want to exclude white spaces from the password, but other than that we shouldn't place any restrictions on passwords.
Then we can encourage good security practices like passphrases or random words. Users, once they discover this, will be blissfully happy they don't have to remember some goofy combination of letters and numbers like f#rtp00p.
I can see you rolling your eyes. It means you have to learn how to properly hash passwords and how to compare entered passwords with the hashes. You'll have to toss some really hard won regex. Heaven forbid you might have to refactor some code! Databases can hold very large hashed passwords and we should take advantage of the capability.
Keep in mind the general security of the data is on me, the developer along with the sysadmin and others. The security of a user's account is on them and I shouldn't do anything to hold them back. Personally I do not care what my users have for their passwords. All I do when users create their passwords is provide a strength meter and some basic guidelines:
"We have found using passphrases or multiple word combinations to be the most secure when it comes to preventing a hacker, who is trying to crack your login information, from being successful."
What should you do?
PHP's built-in functions handle password security perfectly, spaces, special characters and all. If you're using a PHP version older than 5.5, you can use the password_hash() compatibility pack.
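A minimal sketch of the built-in functions in question, password_hash() and password_verify(). The example passphrase is made up.

```php
<?php
// Sketch only: hash at registration time, verify at login time.
$password = 'correct horse battery staple'; // example input, spaces and all

// PASSWORD_DEFAULT lets PHP pick its current default algorithm and salt.
$hash = password_hash($password, PASSWORD_DEFAULT);

var_dump(password_verify($password, $hash));         // bool(true)
var_dump(password_verify('wrong password', $hash));  // bool(false)
```

Only $hash needs to be stored. Since the default algorithm can change between PHP versions, the PHP manual suggests a column wide enough for longer hashes (it mentions 255 characters).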
We need to remove the limitations on passwords and free up the users to own their security online. Are you in?

PHP regular expression for strong password validation
The link above looks like the regex you want. You could try something like the code below:
if (preg_match('/(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$/', $_POST['password'])):
    echo 'matched';
else:
    echo 'not matched';
endif;

This checks for at least one digit, no whitespace, and a length of 4 to 8 characters:
^(?=.*\d)(?!.*\s).{4,8}$

Related

convert multiple whitespaces value to tab value for password storing

I don't know; maybe this seems like a crazy or totally unprofessional newbie question. However, is it a good choice to convert four spaces to a tab for a password field?
Here is what I want to do: whenever the user puts a password in the password field, I want to trim the left and right whitespace (if any), and if the user put four spaces in the middle of the string, convert them to a TAB character (or vice versa?) and then hash the value.
I want to mention that the password field will accept whitespace and the password field is not only restricted to English character set.
Is it good practice?
Trimming the start and end is definitely good practice.
However converting whitespace characters to a tab would be a very bad idea. How would the user be able to log in? When they press the Tab button in the password box the browser will move the focus out of the password box to the next control on the page. There is no way for them to be able to type a Tab into the password!
Leave any spaces in the middle of the password as they are.
One can debate trimming the password; I myself think it is a good idea.
Altering the password, though, won't give you any benefit and can even be harmful. Assuming that you are properly hashing passwords before storing them, you can see the alteration as just an additional part of the hashing algorithm. Whatever changes you make, the entropy of the password cannot be increased by an algorithm, so the password cannot become any stronger. On the other hand, it can decrease the entropy. An easy example:
The same password, once with 4 spaces, once with a single tab will result in the same hash-value.
So go with trimmed passwords for convenience if you like, but leave the content of the password unaltered.
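The collision described above can be sketched in a few lines. The helper normalise() is hypothetical and stands in for the proposed "four spaces to tab" step.

```php
<?php
// Sketch only: a rewrite step that turns four spaces into a tab makes two
// distinct passwords identical before they ever reach the hash function.
// normalise() is a hypothetical helper for the conversion being discussed.
function normalise(string $password): string
{
    return str_replace('    ', "\t", trim($password));
}

$a = "correct    horse";  // four spaces in the middle
$b = "correct\thorse";    // one tab in the middle

var_dump($a === $b);                        // bool(false): different inputs
var_dump(normalise($a) === normalise($b));  // bool(true): same value gets hashed
```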
Great that you plan on removing the leading/trailing spaces; however, I don't see a reason to change those spaces to tabs, since it's just an extra step before hashing them.
If there's no good reason to put something in it's generally better to... not put it in
(edit: I'm assuming the same check would be in place on login)
(ps: this type of question isn't really fit for StackOverflow though since it involves personal opinions)

Password Strength Pattern

Should I use a password pattern like a-zA-Z0-9 and also require at least one of each character class in the password, or simply allow anything inside the password?
What do sites allow the user to use as his/her password? Is there anything else I should consider?
a-zA-Z0-9 is overly limited. You should let me use any characters, and enforce minimum requirements (e.g. at least 8 characters, at least one letter and one number).
Password Entropy
The test of a good password is not the number of sets of characters represented but Entropy.
Testing for Entropy: The people at Dropbox have put together this fantastic tool called zxcvbn to do just that. I would highly recommend reading their write-up explaining it here.
Brief Explanation: Character classes (lower case, upper case, digits and special characters) and length are both important because together they raise password entropy (length does this much faster than character classes, though), but users then tend toward predictable patterns, which lowers entropy.
This may be humour but it helpfully illustrates part of the point:
http://xkcd.com/936/
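To make the "length beats character classes" point concrete, here is a small sketch using the usual formula for a password picked uniformly at random: bits = length × log2(alphabet size). The function name is made up, and the figure is only an upper bound for passwords chosen by people, which is exactly the gap a tool like zxcvbn tries to estimate.

```php
<?php
// Sketch only: entropy in bits of a password drawn uniformly at random.
// randomPasswordEntropyBits() is a hypothetical helper; human-chosen
// passwords are more predictable and have less entropy than this.
function randomPasswordEntropyBits(int $length, int $alphabetSize): float
{
    return $length * log($alphabetSize, 2);
}

printf("%.1f bits\n", randomPasswordEntropyBits(8, 26));   // 8 lowercase letters
printf("%.1f bits\n", randomPasswordEntropyBits(8, 94));   // 8 printable ASCII characters (no space)
printf("%.1f bits\n", randomPasswordEntropyBits(16, 26));  // 16 lowercase letters
```

Doubling the length with a small alphabet gives more bits than widening the alphabet at the same length.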
There should be no limit on what the user is able to use. Since you would hash the password before you store it anyway (I hope), it makes no difference what the password contains.
If you set requirements, they should be minimum requirements.
Password Regular Expression Pattern
((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})
Breakdown
(                # Start of group
(?=.*\d)         # must contain one digit from 0-9
(?=.*[a-z])      # must contain one lowercase character
(?=.*[A-Z])      # must contain one uppercase character
(?=.*[@#$%])     # must contain one special symbol from the list "@#$%"
.                # match anything with previous condition checking
{6,20}           # length at least 6 characters and maximum of 20
)                # End of group
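In PHP the same breakdown can live inside the pattern itself by using PCRE's x (extended) modifier. The sketch below makes two assumptions that are not in the original pattern: the symbol class is taken to be [@#$%], and ^ and $ anchors are added, because preg_match() otherwise accepts a match anywhere in the string and the 20-character maximum would not be enforced. The function name is hypothetical.

```php
<?php
// Sketch only: the commented breakdown written as an extended-mode pattern.
// matchesPolicy() is a hypothetical name; the anchors and the [@#$%] symbol
// class are assumptions made for this example.
function matchesPolicy(string $password): bool
{
    $pattern = '/^
        (?=.*\d)        # at least one digit
        (?=.*[a-z])     # at least one lowercase letter
        (?=.*[A-Z])     # at least one uppercase letter
        (?=.*[@#$%])    # at least one symbol from the list
        .{6,20}         # 6 to 20 characters in total
    $/x';

    return preg_match($pattern, $password) === 1;
}

var_dump(matchesPolicy('Abc1@x'));                 // all rules satisfied
var_dump(matchesPolicy('Abc1x'));                  // no symbol, too short
var_dump(matchesPolicy(str_repeat('Aa1@', 6)));    // 24 characters, over the maximum
```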
Related:
Regular Expression for Password
minimum 8 characters, preferable 12
at least one digit, at least one lower case, at least one upper case, at least one symbol (*/%...)

Unicode in usernames (and passwords)? [closed]

After reviewing this I realised I still have a few questions left regarding the topic.
Are there any characters that should be 'left out' for legitimate security purposes? This includes all characters, such as brackets, commas, apostrophes, and parentheses.
While on this subject, I admittedly don't understand why admins seem to enjoy enforcing the "you can only use the alphabet, numbers, and spaces" rule. Does anything else have the potential to be a security flaw or break something I'm not aware of (even in ASCII)? As far as I've seen during my coding days there is absolutely no reason that any character should be barred from being in a username.
There's no security reason to not use certain characters. If you're properly handling all input, it doesn't make any difference whether you're only handling alphanumeric characters or Chinese.
It is easier to handle only alphanumeric usernames. You don't need to think about ambiguity with collations in your database, encoding usernames in URLs and things like that. But again, if you're properly handling it, there's no technical reason against it.
For practical reasons passwords are often only alphanumeric. Most password inputs don't accept IME input for example, so it's almost impossible to have a Japanese password. There's no security reason for disallowing non-alphanumeric characters though. On the contrary, the larger the usable alphabet, the better.
If your application handles Unicode input properly throughout, I'd certainly allow non-ASCII characters in usernames and passwords, with a few caveats:
If you use HTTP Basic Authentication, you can't properly support non-ASCII characters in usernames and passwords, because the process of passing those details involves an encode-to-bytes-in-base64 step that, currently, browsers don't agree on:
Safari uses ISO-8859-1, and breaks if there are any non-8859-1 characters present;
Mozilla uses the low byte of each character encoded to UTF-16 code units (same as ISO-8859-1 for those characters);
Opera and Chrome use UTF-8;
IE uses the ANSI code page on the system it's installed on, which could be anything, but never ISO-8859-1 or UTF-8. Characters that don't fit the encoding are arbitrarily mangled.
If you use cookies, you must ensure any Unicode characters are encoded in some way (eg URL-encoding), as once again trying to send non-ASCII characters gives vastly different results in different browsers.
"you can only use the alphabet, numbers, and spaces"
You get spaces? Luxury!
It is often exactly those characters that can be used to inject malicious code into your program. For example SQL injection (quotes, dashes, etc.), XSS/CSRF (quotes, angle brackets, etc.) or even programming language injection when eval() is used elsewhere in your code.
Those characters usually do no harm when you, as the developer, properly sanitize user-controlled input and output, i.e. everything that comes in with the HTTP request: the headers, parameters and body. For example, use parameterized queries (or mysql_real_escape_string() when inlining values in a SQL query) to prevent SQL injection, and htmlspecialchars() when inlining values in HTML to prevent XSS. But I can imagine that admins don't trust all developers, so they add those restrictions.
See also:
OWASP on PHP top 5 vulnerabilities
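As a rough illustration of "handle it properly instead of banning characters", the sketch below uses a PDO prepared statement for the SQL side and htmlspecialchars() for the HTML side. The helper names, the users table and its columns are assumptions made up for the example.

```php
<?php
// Sketch only: escape for the context the value is used in.
// findUserByName(), renderUsername() and the users table are hypothetical.
function findUserByName(PDO $pdo, string $username): ?array
{
    // The value is bound as a parameter, so quotes and dashes in it
    // cannot change the structure of the query.
    $stmt = $pdo->prepare('SELECT id, username FROM users WHERE username = :username');
    $stmt->execute([':username' => $username]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}

function renderUsername(string $username): string
{
    // Converts <, >, &, and both quote styles to HTML entities.
    return htmlspecialchars($username, ENT_QUOTES, 'UTF-8');
}

echo renderUsername('<b>O\'Brien</b>'), "\n"; // &lt;b&gt;O&#039;Brien&lt;/b&gt;
```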
I don't think there is a reason not to allow Unicode in usernames. Passwords are a different story: since you don't usually see the password when you type it into a form, allowing only ASCII makes sense to prevent possible confusion.
I think it makes sense to use the email address as the login credential rather than requiring the user to create a new username. Then the user can select any nickname, using any Unicode characters, and have that nick displayed next to the user's posts and comments.
Isn't this how it's done on Facebook?
I think that most of the time when things (usernames or passwords) are being forced down to ASCII, it's because someone is afraid that more complex character sets will cause breakage in some unknown component. Whether this fear is justified or not is case dependent, but trying to verify that your entire stack really does Unicode correctly in all cases might be difficult. It's getting better every day, but you can still find problems with Unicode in some places.
I personally keep my usernames and passwords all ASCII, and I even try not to use too much punctuation. One reason is that some input devices (like some mobile phones) make it kind of difficult to get to some of the more esoteric characters. Another reason is that I've more than once encountered a system that had no restrictions on the password contents, but then screwed up if you actually used something other than a letter or number.
There is a risk involved if some parts of your program assume strings with different bytes are different, but other parts of the program would compare strings according to unicode semantics and think they're the same.
For example filesystems on Mac OS X enforce uniform representation of Unicode characters, so two different filenames Ą ('A with ogonek') and A+̨ (latin A followed by 'combining ogonek') will refer to the same file.
Similarly one can produce invalid UTF-8 byte sequences where 1-byte codepoints are encoded using multiple bytes (called overlong sequences). If you normalize or reject UTF-8 input before processing it, it'll be safe, but if you use, e.g., a Unicode-ignorant programming language and a Unicode-aware database, these two will see different inputs.
So to avoid that:
You should filter UTF-8 input as early as possible. Reject invalid/overlong sequences.
When comparing Unicode strings, always convert both sides of the comparison to the same Unicode Normal Form. For usernames you might want NFKD to reduce the number of possible homograph attacks.
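The two steps above might look roughly like this in PHP. The helper names are hypothetical, the validity check relies on PCRE's u modifier refusing malformed UTF-8, and the Normalizer class is only available when the intl extension is installed.

```php
<?php
// Sketch only: reject invalid UTF-8 early, then normalise before comparing.
// isValidUtf8() and sameUsername() are hypothetical helpers; Normalizer
// comes from the intl extension, which is assumed to be available.
function isValidUtf8(string $input): bool
{
    // With the u modifier, preg_match() returns false for subjects that
    // are not valid UTF-8, including overlong encodings.
    return preg_match('//u', $input) === 1;
}

function sameUsername(string $a, string $b): bool
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required.');
    }
    if (!isValidUtf8($a) || !isValidUtf8($b)) {
        return false;
    }

    return Normalizer::normalize($a, Normalizer::FORM_KD)
        === Normalizer::normalize($b, Normalizer::FORM_KD);
}

var_dump(isValidUtf8("\xC0\xAF"));  // overlong encoding of "/", rejected
```

With intl installed, sameUsername("\u{0104}", "A\u{0328}") treats the precomposed "A with ogonek" and the A plus combining ogonek pair as the same name, even though the byte strings differ.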

php regular expression for validating password with wild characters

I am trying to add wild characters to my current alphanumeric only regular expression to make the password validation stronger. I am not trying to require the user to enter wild characters, just allowing them to enter wild characters.
'/^[a-z0-9]{8,16}$/i'
I am also using cakephp and doing the validation in the model if that helps, but not really needed for this answer.
'rule' => '/^[a-z0-9]{8,16}$/i',
'on' => 'create',
'allowEmpty' => true
Just add the characters you want to allow to the character class ([...]):
/^[a-z0-9!#$%&]{8,16}$/i
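A quick sketch to confirm that the widened class allows, but does not require, the extra characters. The sample passwords are made up.

```php
<?php
// Sketch only: the extended character class from the answer.
$pattern = '/^[a-z0-9!#$%&]{8,16}$/i';

var_dump(preg_match($pattern, 'abc#1234'));  // int(1): special character accepted
var_dump(preg_match($pattern, 'abcd1234'));  // int(1): still fine without one
var_dump(preg_match($pattern, 'abc 1234'));  // int(0): space is not in the class
```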
You are doing it totally wrong.
Never use a regexp for password fields. This way you don't allow the user anything; you just stop the user from entering whatever he wants to use as a password (maybe some special chars like & or { or whatever).
In any case, your approach hurts more than it helps.
What you should do is encourage the user to use special chars and more complex passwords, simply by displaying a "red-yellow-green" indicator beside the password field.
I also think you should allow "everything", thus remove the validation on content, and only forbid "empty" strings, or too short strings (ideally with a live javascript validation as an indicator so that people don't have to try 10 times before figuring out what works).
You shouldn't care what people type in, even in Japanese, as you are going to hash this string anyway (I hope!), using CakePHP's built-in function, with sha1 and md5 and salt, and you'll end up with something harmless in the end.
Use $this->Auth->password($string);

PHP and Regular Expressions question?

I was wondering if the codes below are the correct way to check for a street address, email address, password, city and url using preg_match using regular expressions?
And if not how should I fix the preg_match code?
preg_match ('/^[A-Z0-9 \'.-]{1,255}$/i', $trimmed['address']) //street address
preg_match ('/^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$/', $trimmed['email']) //email address
preg_match ('/^\w{4,20}$/', $trimmed['password']) //password
preg_match ('/^[A-Z \'.-]{1,255}$/i', $trimmed['city']) //city
preg_match("/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i", $trimmed['url']) //url
Your street address: ^[A-Z0-9 \'.-]{1,255}$
The backslash before the single quote is only needed because the pattern sits in a single-quoted PHP string; the regex itself does not require it.
The dot inside the character class is literal: it matches only a period, not any character, so it needs no escaping. The class as written allows letters, digits, space, apostrophe, period and hyphen.
You are allowing a minimum length of 1 and a maximum of 255. I would suggest increasing the minimum to something more than 1.
Your email regex: ^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$
Again, the . in the character classes is literal, so it only permits periods.
Your password regex: ^\w{4,20}$
It allows a password of length 4 to 20 that can contain only letters (upper and lower case), digits and underscore. I would suggest allowing special characters too, to make passwords stronger.
Your city regex: ^[A-Z \'.-]{1,255}$
It has a literal . in the character class, as above.
It allows a minimum length of 1 (if you want to allow city names one character long, this is fine).
EDIT:
Since you are very new to regex, spend some time on Regular-Expressions.info
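It is worth checking how a dot behaves inside a character class, since in PCRE it is a literal period there and only matches "anything" outside of brackets. A small sketch using the city pattern from the question, with made-up sample inputs:

```php
<?php
// Sketch only: inside [...] a dot is a literal period, not "any character".
$cityPattern = '/^[A-Z \'.-]{1,255}$/i';

var_dump(preg_match($cityPattern, 'St. Louis'));  // int(1): period is allowed
var_dump(preg_match($cityPattern, 'City#42'));    // int(0): # and digits are not in the class
var_dump(preg_match('/^.{1,255}$/', 'City#42'));  // int(1): a bare dot does match anything
```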
This seems overly complicated to me. In particular I can see a few things that won't work:
Your regex will fail for cities with non-ASCII letters in their names, such as "Malmö" or 서울, etc.
Your password validator doesn't allow for spaces in the password (which are useful for entering pass-phrases), and it doesn't allow punctuation other than the underscore, which many people like to put in their passwords for added security.
Your address validator won't allow for people who live in apartments (12/345 Foo St), since "/" is not in the character class.
(Note that "." inside a character class is a literal period; only outside a class does "." match anything.)
And so on. In general, I think over-reliance on regular expressions for validation is not a good thing. You're probably better off allowing anything for those fields and just validating them some other way.
For example, with email addresses: just because an address is valid according to the RFC standard doesn't mean you'll actually be able to send email to it (or that it's the correct email address for the person). The only reliable way to validate an email address is to actually send an email to it and get the person to click on a link or something.
Same thing with URLs: just because it's valid according to the standard doesn't actually mean there's a web page there. You can validate the URL by trying to do an actual request to fetch the page.
But my personal preference would be to just do the absolute minimum verification possible, and leave it at that. Let people edit their profile (or whatever it is you're verifying) in case they make a mistake.
There's not really a 'correct' way to check for any of those things. It depends on what exactly your requirements are.
For e-mail addresses and URLs, I'd recommend using filter_var instead of regexps - just pass it FILTER_VALIDATE_EMAIL or FILTER_VALIDATE_URL.
With the other regexps, the . inside the character classes is literal and does not need escaping, but you might want to consider that the City/Street ones would allow rubbish such as ''''', or just whitespace.
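A minimal sketch of the filter_var approach mentioned above. The wrapper names isValidEmail() and isValidUrl() are made up; FILTER_VALIDATE_EMAIL and FILTER_VALIDATE_URL are the built-in filters.

```php
<?php
// Sketch only: thin wrappers around filter_var(), which returns the
// filtered value on success and false on failure.
// isValidEmail() and isValidUrl() are hypothetical helper names.
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

function isValidUrl(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidEmail('user@example.com'));      // bool(true)
var_dump(isValidEmail('not-an-email'));          // bool(false)
var_dump(isValidUrl('https://example.com/path')); // bool(true)
var_dump(isValidUrl('example'));                 // bool(false)
```

This only checks syntax; as noted in the other answers, it says nothing about whether the mailbox or page actually exists.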
Please don't assume that you know how an address is made up. There are thousands of cities, towns and villages with characters like & and those from other alphabets.
Just DON'T try to validate an address unless you do it thru an API specific to a country (USPS for the US, for example).
And why would you want to limit the characters in a users password? Don't have ANY requirements on the password except for it existing.
Your site will be unusable if you use those regex.
