Would this regular expression work?

^([a-zA-Z0-9!##$%^&*|()_\-+=\[\]{}:;\"',<.>?\/~`]{4,})$
Would this regular expression work for these rules?
Must be at least 4 characters
Characters can be a mix of alphabet (capitalized/non-capitalized), numeric, and the following characters: ! # # $ % ^ & * ( ) _ - + = | [ { } ] ; : ' " , < . > ? /
It's intended to be a password validator. The language is PHP.

Yes?
Honestly, what are you asking for? Why don't you test it?
If, however, you want suggestions on improving it, some questions:
What is this regex checking for?
Why do you have such a large set of allowed characters?
Why don't you use /\w/ instead of /0-9a-zA-Z_/?
Why do you have the whole thing in ()s? You don't need to capture the whole thing, since you already have the whole thing, and they aren't needed to group anything.
What I would do is check the length separately, and then check against a regex to see if it has any bad characters. Your list of good characters seems to be sufficiently large that it might just be easier to do it that way. But it may depend on what you're doing it for.
EDIT: Now that I know this is PHP-centric, /\w/ is safe because PHP uses the PCRE library, which is not exactly Perl, and in PCRE, \w will not match Unicode word characters (at least as long as you don't add the u modifier). Thus, why not check for length and ensure there are no invalid characters:
if (strlen($string) >= 4 && preg_match('/[\s~\\\\]/', $string) == 0) {
    # valid password
}
Alternatively, use the little-used POSIX character class [[:graph:]]. It should work pretty much the same in PHP as it does in Perl. [[:graph:]] matches any alphanumeric or punctuation character, which sounds like what you want, and [[:^graph:]] should match the opposite. To test if all characters match graph:
preg_match('/^[[:graph:]]+$/', $string) == 1
To test if any characters don't match graph:
preg_match('/[[:^graph:]]/', $string) == 0
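The length check and the [[:graph:]] check above can be combined into one helper. This is only a minimal sketch, assuming ASCII-only passwords; the function name isValidPassword is hypothetical. Note that [[:graph:]] also admits the backslash, which the question's list did not.

```php
<?php
// Hypothetical helper: accepts strings of at least 4 visible ASCII characters.
function isValidPassword(string $password): bool
{
    // [[:graph:]] covers letters, digits and punctuation, but not spaces
    // or control characters. \z anchors at the very end of the string.
    return strlen($password) >= 4
        && preg_match('/^[[:graph:]]+\z/', $password) === 1;
}

var_dump(isValidPassword('abc1'));   // bool(true)
var_dump(isValidPassword('ab c1'));  // bool(false): contains a space
var_dump(isValidPassword('abc'));    // bool(false): too short
```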

You forgot the comma (,) and full stop (.) and added the tilde (~) and grave accent (`) that were not part of your specification. Additionally just a few characters inside a character set declaration have to be escaped:
^([a-zA-Z0-9!##$%^&*()|_\-+=[\]{}:;"',<.>?/~`]{4,})$
And that as a PHP string declaration for preg_match:
'/^([a-zA-Z0-9!##$%^&*()|_\\-+=[\\]{}:;"\',<.>?\\/~`]{4,})$/'

I noticed that you essentially have all of ASCII, except for backslash, space and the control characters at the start, so what about this one, instead?
^([!-\[\]-~]{4,})$

You are extra escaping and aren't using some predefined character classes (such as \w, or at least \d).
Apart from that, and given that you anchor at both the beginning and the end (so the regex only matches if the entire string matches), it looks correct:
^([a-zA-Z\d\-!$##$%^&*()|_+=\[\]{};,."'<>?/~`]{4,})$
If you really mean to use this as a password validator, it reeks of insecurity:
Why are you allowing 4-character passwords?
Why are you forbidding some characters? Can PHP not handle some of them? Why would you care? Let users enter whatever characters they please; after all, you'll just end up storing a salted hash of the password.

No. That regular expression would not work for the rules you state, for the simple reason that $ by default matches before the final character if it is a newline. You are allowing password strings like "1234\n".
The solution is simple. Either use \z instead of $, or apply the D modifier to the regex.
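To illustrate the difference, here is a small sketch that uses a simplified digits-only pattern rather than the full character class from the question:

```php
<?php
$input = "1234\n"; // four digits followed by a trailing newline

// Plain $ also matches just before a final newline, so this passes.
$dollar = preg_match('/^[0-9]{4,}$/', $input);

// \z matches only at the very end of the string.
$slashZ = preg_match('/^[0-9]{4,}\z/', $input);

// The D modifier (PCRE_DOLLAR_ENDONLY) makes $ behave like \z.
$dModifier = preg_match('/^[0-9]{4,}$/D', $input);

var_dump($dollar, $slashZ, $dModifier); // int(1), int(0), int(0)
```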

Related

PHP Preg_replace() pattern not work

I want to replace text using preg_replace, but my search string contains a /, which causes a problem.
How can I solve it?
$search='r/trtrt';
echo preg_replace('/\b'.addslashes($search).'\b/', 'ERTY', 'TG FRT');
I am getting error preg_replace(): Unknown modifier 'T'

Use a different delimiter and don't use addslashes. addslashes escapes characters that are not regex special characters (or a mix of regex and non-regex characters); I'd say the majority of the time you shouldn't use it for this.
$search='r/trtrt';
echo preg_replace('~\b'. $search.'\b~', 'ERTY', 'TG FRT');
You could use preg_quote as an alternative. Just changing the delimiter is the easiest solution though.

use ~ as delimiter:
$search='r/trtrt';
echo preg_replace('~\b'.addslashes($search).'\b~', 'ERTY', 'TG FRT');
I always use ~ as it is one of the least used characters in a string, but you can use any character you want. As long as the delimiter doesn't occur in your regexp, you won't need to escape it there!
You don't need addslashes() in your case but if you have a more complex regexp and you want to escape chars you should use preg_quote($search).

Why not escape it the way it is meant to be done?
$search='r/trtrt';
echo preg_replace('/\b'.preg_quote($search, '/').'\b/', 'ERTY', 'TG FRT');
http://php.net/manual/en/function.preg-quote.php
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
delimiter: If the optional delimiter is specified, it will also be escaped. This is useful for escaping the delimiter that is required by the PCRE functions. The / is the most commonly used delimiter.
addslashes() is not the function to use here. It only escapes quotes, backslashes and NUL bytes, so it does nothing for most of the special characters in regex.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Using the proper functions promotes readability of the code. If at some later point in time you or another coder see the ~ delimiter, they may just think it's part of a personal "style" or pay it little attention. However, seeing the input properly escaped will tell any experienced coder that the input could contain characters that conflict with regular expressions.
Personally, readability is at the top of my list whenever I write code. If you can't understand it at a glance, what good is it?
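A short sketch of what preg_quote() produces for the search string from the question. The subject string here is a made-up one that actually contains the search string, so the replacement is visible:

```php
<?php
$search  = 'r/trtrt';
$subject = 'TG r/trtrt FRT'; // hypothetical subject containing the search string

// Passing '/' as the second argument also escapes the delimiter.
$pattern = '/\b' . preg_quote($search, '/') . '\b/';

echo $pattern, "\n";                                  // /\br\/trtrt\b/
echo preg_replace($pattern, 'ERTY', $subject), "\n";  // TG ERTY FRT
```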

preg_match username validation regex allows > and < despite those characters not being whitelisted

I have this relatively simple regex for usernames
// Enforce that username has to be 3-100 characters, alphanumeric, and first character a letter.
// Possibility without begin/end characters and i: [a-z][a-z0-9#.+-_]{2,100}
// Allow for simple email usernames in the future...
return !!preg_match('#^[a-zA-Z][a-zA-Z0-9#.+-_]{2,100}$#', trim($username));
Which, unfortunately, allows these XSS-ready test strings:
'angle<bracket',
'angle>bracket',
'html<script>inside',
And I have no idea why since they already should explicitly be disallowed by the regex.
Here is a running test case:
http://ideone.com/od7dj
Anyone know why angle brackets are being allowed by a regex that doesn't explicitly allow for them? Am I supposed to escape one of those characters (.+-) as literals?

+-_ is your problem. You need to escape the - in a character class or move it to the end or beginning of the class.
For example:
/^[a-z][a-z0-9#.+_-]{2,100}\z/i

I think it's because of this: [+-_]
You are including all chars between '+' and '_', try changing the order to [+_-] (putting the dash at the end) or escape the dash.
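The effect of the accidental range can be checked in isolation. This is a reduced sketch that uses only the three characters involved, not the full username pattern:

```php
<?php
// Inside a character class, +-_ is a range from '+' (0x2B) to '_' (0x5F),
// which happens to include '<' (0x3C) and '>' (0x3E).
$asRange   = '/^[+-_]+\z/';
// With the hyphen last it is a literal, so only '+', '_' and '-' match.
$asLiteral = '/^[+_-]+\z/';

var_dump(preg_match($asRange, '<>'));    // int(1)
var_dump(preg_match($asLiteral, '<>'));  // int(0)
var_dump(preg_match($asLiteral, '+-_')); // int(1)
```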

REGEXP not catching some names correctly if certain values are at certain positions in the string

I have the following regex meant to test against valid name formats:
^[a-zA-Z]+(([\'\,\.\- ][a-zA-Z ])?[a-zA-Z]*)*$
it seems to work fine with all the expected odd name possibilities, including the following:
o'Bannon
Smith, Jr.
Double-barreled
I'm having a problem when I plug this into my PHP code. If the first character is a number it passes through as valid.
If the last character is a space, comma, full-stop or other special allowed character, it's failing as invalid.
My PHP code is :
$v = 'Tested Value';
$value = (filter_var($v, FILTER_VALIDATE_REGEXP,array("options"=>array("regexp"=>"^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^"))));
if (strlen($value) <2 && strlen($v) !=0) {
return "not valid";
}
What am I doing wrong here?

^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^
The carets (^) at the beginning and end of the regex are being interpreted as regex delimiters, not as anchors. The regex isn't really matching the digits at the beginning of the string; it's skipping over them so it can start matching at the first letter it finds. You can use almost any ASCII punctuation character as the regex delimiter, but most people use # or ~, which are relatively uncommon and have no special meaning in regexes.
As for not allowing punctuation at the end, that's how the regex is written. Specifically, [\'\,\.\- ][a-zA-Z ] requires that each apostrophe, comma, period or hyphen be followed by a letter or a space. If you really want to allow any of those characters at the end, it's pretty simple:
~^(?:[a-z]+[',. -]*)+$~i
Of course, that's not a particularly good regex for validating names, but I have nothing better to offer; it's a job for which regexes are particularly ill-suited. And do you really want to be the one to tell your users their own names are invalid?
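The delimiter effect is easy to reproduce with a cut-down pattern. This sketch calls preg_match directly with a letters-only regex instead of the full name pattern and filter_var:

```php
<?php
$value = '123abc';

// '^' is consumed as the delimiter here, so the pattern is really
// [a-zA-Z]+$ with no start anchor, and it matches the trailing "abc".
$caretDelimited = preg_match('^[a-zA-Z]+$^', $value, $m);

// With '~' as the delimiter, '^' is an anchor again and the match fails.
$tildeDelimited = preg_match('~^[a-zA-Z]+$~', $value);

var_dump($caretDelimited, $m[0], $tildeDelimited); // int(1), string(3) "abc", int(0)
```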

Your regex is way too complex.
/^[a-z]+[',. a-z-]*$/i
should do the same thing

Why does this regex not validate in the same way in PHP?

When I try preg_match with the following expression: /.{0,5}/, it still matches strings longer than 5 characters.
It does, however, work properly when I try it in an online regexp matcher.

The site you reference, myregexp.com, is focussed on Java.
Java has a specific function for matching an exact pattern, without needing to use anchor characters. This is the function which myregexp.com uses.
In most other languages, in order to match an exact pattern, you would need to add the anchoring characters ^ and $ at the start and end of the pattern respectively, otherwise the regex assumes it only needs to find the matched pattern somewhere within the string, rather than the whole string being the match.
This means that without the anchors, your pattern will match any string, of any length, because whatever the string, it will contain within it somewhere a match for "zero to five of any character".
So in PHP, and Perl, and virtually any other language, you need your pattern to look like this:
/^.{0,5}$/
Having explained all that, I would make one final observation though: this specific pattern really doesn't need to be a regular expression -- you could achieve the same thing with strlen(). In addition, the dot character in regex may not work exactly as you expect: it typically matches almost any character; some characters, including new line characters, are excluded by default, so if your string contains five characters, but one of them is a new line, it will fail your regex when you might have expected it to pass. With this in mind, strlen() would be a safer option (or mb_strlen() if you expect to have unicode characters).
If you need to match any character in regex, and the default behaviour of the dot isn't good enough, there are two options: One is to add the s modifier at the end of the expression (ie it becomes /^.{0,5}$/s). The s modifier tells regex to include new line characters in the dot "any character" match.
The other option (which is useful for languages that don't support the s modifier) is to use an expression and its negative together in a character class - eg [\s\S] - instead of the dot. \s matches any white space character, and \S is a negative of \s, so any character not matched by \s. So together in a character class they match any character. It's more long winded and less readable than a dot, but in some languages it's the only way to be sure.
You can find out more about this here: http://www.regular-expressions.info/dot.html
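The points above can be checked with a few calls. This is just a sketch with made-up test strings:

```php
<?php
// Unanchored: "zero to five of any character" is found inside any string.
var_dump(preg_match('/.{0,5}/', 'abcdefgh'));    // int(1)

// Anchored: the whole string must be at most five characters.
var_dump(preg_match('/^.{0,5}$/', 'abcdefgh'));  // int(0)
var_dump(preg_match('/^.{0,5}$/', 'abcde'));     // int(1)

// The dot skips newlines by default, so a five-character string with an
// embedded newline fails unless the s modifier is added.
var_dump(preg_match('/^.{0,5}$/', "ab\ncd"));    // int(0)
var_dump(preg_match('/^.{0,5}$/s', "ab\ncd"));   // int(1)

// A plain length check avoids the regex entirely.
var_dump(strlen("ab\ncd") <= 5);                 // bool(true)
```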
Hope that helps.

You need to anchor it with ^$. These symbols match the beginning and end of the string respectively, so it must be 0-5 characters between the beginning and end. Leaving out the anchors will match anywhere in the string so it could be longer.
/^.{0,5}$/
For better readability, I would probably also enclose the . in (), but that's kind of subjective.
/^(.){0,5}$/

How bad is my regex?

Ok so I managed to solve a problem at work with regex, but the solution is a bit of a monster.
The string to be validated must be:
zero or more: A-Z a-z 0-9, spaces, or these symbols: . - = + ' , : ( ) /
But, the first and/or last characters must not be a forward slash /
This was my solution (used preg_match php function):
"/^[a-z\d\s\.\-=\+\',:\(\)][a-z\d\s\.\-=\+\',\/:\(\)]*[a-z\d\s\.\-=\+\',:\(\)]$|^[a-z\d\s\.\-=\+\',:\(\)]$/i"
A colleague thinks this is too big and complicated. Well it works, so is it really that bad? Anyone in the mood for some regex-golf?

You can simplify your expression to this:
/^(?:[a-z\d\s.\-=+',:()]+(?:\/+[a-z\d\s.\-=+',:()]+)*)?$/i
The outer (?:…)? allows an empty string. The [a-z\d\s.\-=+',:()]+ lets the string start with one or more of the specified characters other than /. If one or more / follow, they must in turn be followed by one or more of the other specified characters ((?:\/+[a-z\d\s.\-=+',:()]+)*). The / is escaped because it is also the pattern delimiter.
Furthermore, inside a character set, you only need to escape the characters \, ], and depending on the position also ^ and -.

Try something like this instead
function validate($string) {
    // Anchored whitelist check; # as the delimiter means / needs no escaping.
    return preg_match("#^[a-zA-Z0-9\s.\-=+',:()/]*\z#", $string) === 1
        && substr($string, 0, 1) != '/'
        && substr($string, -1) != '/';
}
It's a lot simpler to check the first and last character specifically. Otherwise you're left dealing with a lot of overhead when it comes to empty strings and such. Your regex, for example, requires the string to be at least one character long; otherwise it doesn't validate, despite "" fitting your criteria.

'#^(?!/)[a-z\d .=+\',:()/-]*$(?<!/)#i'
As others have observed, most of those characters don't need to be escaped inside a character class. Additionally, the hyphen doesn't need to be escaped if it's the last thing listed, and the slash doesn't need to be escaped if you use a different character as the regex delimiter (# in this case, but ~ is a popular choice, too).
I also ditched the double-quotes in favor of single-quotes, which meant I had to escape the single-quote in the regex. That's worth it because single-quoted strings are so much simpler to work with: no $variable interpolation, no embedded executable {code}, and the only characters you have to escape for them are the single-quote and the backslash.
But the main innovation here is the use of lookahead and lookbehind to exclude the slash as the first or last character. That's not just a code-golf tactic, either; I would write the regex this way anyway, because it expresses my intent so much better. Why force the next guy to parse those almost-identical character classes, when you can just say what you mean? "...but the first and last character can't be slashes."
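To see the lookaround version in action, here is a quick sketch that runs the pattern against a handful of made-up candidate strings:

```php
<?php
$pattern = '#^(?!/)[a-z\d .=+\',:()/-]*$(?<!/)#i';

// Hypothetical candidates: empty string, slash in the middle, a longer mix,
// then a leading slash and a trailing slash.
foreach (['', 'a/b', 'St. John (1/2)', '/ab', 'ab/'] as $candidate) {
    printf("%-18s %s\n", var_export($candidate, true),
        preg_match($pattern, $candidate) ? 'valid' : 'invalid');
}
```

The first three candidates come out as valid; the last two are rejected by the lookahead and the lookbehind respectively.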
