I am insanely green with regular expressions. Here is what I am trying to accomplish: I want to require an 8 character password that allows upper and lowercase letters, numbers and !##$%^&*-_ characters. Here is what I have that doesn't appear to be working:
preg_match('([A-za-z0-9-_!##$%^&*]{8,})', $password)
Am I missing something really obvious?
Update: Yes, I was missing something really obvious - the open bracket [. However it still returns true when I use characters like a single quote or bracket. (Which are what I am trying to avoid.)
Basically you miss an opening [ character group bracket here:
↓
preg_match('([A-za-z0-9-_!##$%^&*()]{8,})', $password)
And you should also use delimiters. The parens will behave as such, but it's better to use a different pair to avoid ambiguity with a capture group:
preg_match('/^([A-za-z0-9-_!##$%^&*()]{8,})$/', $password)
This also adds start ^ and end $ assertions to match the whole string.
This may be easier to break into individual rules. I.e. a rule to check if the password is at least 8 characters long, another to check for an uppercase, another for lowercase, etc.
Also, it looks like you have some special characters in terms of regular expressions without any escaping. For example, the * character has special meaning in regular expressions other than some special cases. It either needs to be escaped \* or in brackets [*]
Related
I have a part of a string. I need to check if it would equal another string when I add a few symbols on. However, my use of delimiters (I believe) is not allowing for the matches to take place.
My IF statement:
if (preg_match("{" . "$words[$counter_words]" . "[<]N}", "$corpus[$counter_corpus]"))
My corpus:
{3(-)D[<]AN}
{dog[<]N}
{4(-)H(')er[<]N}
{4(-)H[<]A}
{A battery[<]h}
My partial array is as follows
dog
cat
3-D
plant
My goal is to match "dog" with "{dog[<]N}" (the [] and {} are delimiters). To try to compensate for this, I glue delimiters to the start and end of the string. Preg_match accepts it, but does not match the two together.
What would be the solution to this? I cannot find or think of a solution. Your help is greatly appreciated.
if (preg_match("{" . $words[$counter_words] . "\\[<\\]N}", $corpus[$counter_corpus]))
[ and ] have special meaning in regular expressions. If you don't want the special meaning you need to escape them, \[. But because this is inside a PHP string, to get a \ character, you must enter \\.
I have a part of a string. I need to check if it would equal another string when I add a few symbols on.
To check if a string equals another one, we don't need preg_match(); this would do:
if ("{{$words[$counter_words]}[<]N}" == $corpus[$counter_corpus])
If you use preg_match(), you have to heed the PCRE regex syntax:
When using the PCRE functions, it is required that the pattern is enclosed
by delimiters. A delimiter can be any non-alphanumeric,
non-backslash, non-whitespace character.
Often used delimiters are forward slashes (/), …
The added symbols { and } at the start and end of your string were taken as pattern-enclosing delimiters, while you meant them to be part of the pattern. You have to add actual delimiters:
if (preg_match("/{{$words[$counter_words]}\[<]N}/", $corpus[$counter_corpus]))
Also of interest:
Double quoted
Variable parsing / Complex (curly) syntax
I've thought that php's perl compatible regular expression (preg library) supports curly brackets as delimiters. This should be fine:
{ello {world}i // should match on Hello {World
The main point of curly brackets is that it only takes the most left and right ones, thus requiring no escaping for the inner ones. As far as I know, php requires the escaping
{ello \{world}i // this actually matches on Hello {World
Is this the expected behavior or bug in php preg implementation?
When in Perl you use for the pattern delimiter any of the four paired ASCII bracket types, you only need to escape unpaired brackets within the pattern. This is indeed the entire purpose of using brackets. This is documented in the perlop manpage under “Quote and Quote-like Operators”, which reads in part:
Non-bracketing delimiters use the same character fore and aft,
but the four sorts of brackets (round, angle, square, curly)
will all nest, which means that
q{foo{bar}baz}
is the same as
'foo{bar}baz'
Note, however, that this does not always work for quoting Perl code:
$s = q{ if($a eq "}") ... }; # WRONG
That’s why you often see people use m{…} or qr{…} in Perl code, especially for multiline patterns used with /x ᴀᴋᴀ (?x). For example:
return qr{
(?= # pure lookahead for conjunctive matching
\A # always from start
. *? # going only as far as we need to to find the pattern
(?:
${case_flag}
${left_boundary}
${positive_pattern}
${right_boundary}
)
)
}sxm;
Notice how those nested braces are no problem.
Expected behavior as far as I know, otherwise how else would the compiler allow group limiters? e.g.
[a-z]{1,5}
From http://lv.php.net/manual/en/regexp.reference.delimiters.php:
If the delimiter needs to be matched
inside the pattern it must be escaped
using a backslash. If the delimiter
appears often inside the pattern, it
is a good idea to choose another
delimiter in order to increase
readability.
So this is expected behavior, not a bug.
I found that no escaping is required in this case:
'ello {world'i
(ello {world)i
So my theory is, that the problem is with the '{' delimiters only. Also, the following two produce the same error:
{ello {world}i
(ello (world)i
Using starting/ending braces as delimiters may require to escape the given braces in the expression.
I got this issue figuring out how to build a regexp for verifying a netbios name. According to the ms standard these characters are illegal
\/:*?"<>|
So, thats what I'm trying to detect. My regex is looking like this
^[\\\/:\*\?"\<\>\|]$
But, that wont work.
Can anyone point me in the right direction? (not regexlib.com please...)
And if it matters, I'm using php with preg_match.
Thanks
Your regular expression has two problems:
you insist that the match should span the entire string. As Andrzej says, you are only matching strings of length 1.
you are quoting too many characters. In a character class (i.e. []), you only need to quote characters that are special within character classes, i.e. hyphen, square bracket, backslash.
The following call works for me:
preg_match('/[\\/:*?"<>|]/', "foo"); /* gives 0: does not include invalid characters */
preg_match('/[\\/:*?"<>|]/', "f<oo"); /* gives 1: does include invalid characters */
As it stands at the moment, your regex will match the start of the string (^), then exactly one of the characters in the square brackets (i.e. the illegal characters), then then end of the string ($).
So this likely isn't working because a string of length > 1 will trivially fail to match the regex, and thus be considered OK.
You likely don't need the start and end anchors (the ^ and $). If you remove these, then the regex should match one of the bracketed characters occurring anywhere on the input text, which is what you want.
(Depending on the exact regex dialect, you may canonically need less backslashes within the square brackets, but they are unlikely to do any harm in any case).
Question
What does it mean when a regular expression is surrounded by # symbols? Does that mean something different than being surround by slashes? What about when #x or #i are on the end? Now that I think about it, what do the surrounding slashes even mean?
Background
I saw this StackOverflow answer, posted by John Kugelman, in which he displays serious Regex skills.
Now, I'm used to seeing regexes surrounded by slashes as in
/^abc/
But he used a regex surrounded by # symbols:
'#
^%
(.{2}) # State, 2 chars
([^^]{0,12}.) # City, 13 chars, delimited by ^
([^^]{0,34}.) # Name, 35 chars, delimited by ^
([^^]{0,28}.) # Address, 29 chars, delimited by ^
\?$
#x'
In fact, it seems to be in the format:
#^abc#x
In the process of trying to google what that means (it's a tough question to google!), I also saw the format:
#^abc#i
It's clear the x and the i are not matched characters.
So what does it all mean???
Thanks in advance for any and all responses,
-gMale
The surrounding slashes are just the regex delimiters. You can use any character (afaik) to do that - the most commonly used is the /, other I've seen somewhat commonly used is #
So in other words, #whatever#i is essentially the same as /whatever/i (i is modifier for a case-insensitive match)
The reason you might want to use something else than the / is if your regex contains the character. You avoid having to escape it, similar to using '' for strings instead of "".
Found this from a "Related" link.
The delimiter can be any character that is not alphanumeric, whitespace or a backslash character.
/ is the most commonly used delimiter, since it is closely associated with regex literals, for instance in JavaScript where they are the only valid delimiter. However, any symbol can be used.
I have seen people use ~, #, #, even ! to delimit their regexes in a way that avoids using symbols that are also in the regex. Personally I find this ridiculous.
A lesser-known fact is that you can use a matching pair of brackets to delimit a regex in PHP. This has the tremendous advantage of having an obvious difference between the closing delimiter, and the symbol showing up in the pattern, and therefore don't need any escaping. My personal preference is this:
(^abc)i
By using parentheses, I remind myself that in a match, $m[0] is always the full match, and the subpatterns start at $m[1].
^([a-zA-Z0-9!##$%^&*|()_\-+=\[\]{}:;\"',<.>?\/~`]{4,})$
Would this regular expression work for these rules?
Must be atleast 4 characters
Characters can be a mix of alphabet (capitalized/non-capitalized), numeric, and the following characters: ! # # $ % ^ & * ( ) _ - + = | [ { } ] ; : ' " , < . > ? /
It's intended to be a password validator. The language is PHP.
Yes?
Honestly, what are you asking for? Why don't you test it?
If, however, you want suggestions on improving it, some questions:
What is this regex checking for?
Why do you have such a large set of allowed characters?
Why don't you use /\w/ instead of /0-9a-zA-Z_/?
Why do you have the whole thing in ()s? You don't need to capture the whole thing, since you already have the whole thing, and they aren't needed to group anything.
What I would do is check the length separately, and then check against a regex to see if it has any bad characters. Your list of good characters seems to be sufficiently large that it might just be easier to do it that way. But it may depend on what you're doing it for.
EDIT: Now that I know this is PHP-centric, /\w/ is safe because PHP uses the PCRE library, which is not exactly Perl, and in PCRE, \w will not match Unicode word characters. Thus, why not check for length and ensure there are no invalid characters:
if(strlen($string) >= 4 && preg_match('[\s~\\]', $string) == 0) {
# valid password
}
Alternatively, use the little-used POSIX character class [[:graph:]]. It should work pretty much the same in PHP as it does in Perl. [[:graph:]] matches any alphanumeric or punctuation character, which sounds like what you want, and [[:^graph:]] should match the opposite. To test if all characters match graph:
preg('^[[:graph:]]+$', $string) == 1
To test if any characters don't match graph:
preg('[[:^graph:]]', $string) == 0
You forgot the comma (,) and full stop (.) and added the tilde (~) and grave accent (`) that were not part of your specification. Additionally just a few characters inside a character set declaration have to be escaped:
^([a-zA-Z0-9!##$%^&*()|_\-+=[\]{}:;"',<.>?/~`]{4,})$
And that as a PHP string declaration for preg_match:
'/^([a-zA-Z0-9!##$%^&*()|_\\-+=[\\]{}:;"\',<.>?\\/~`]{4,})$/'
I noticed that you essentially have all of ASCII, except for backslash, space and the control characters at the start, so what about this one, instead?
^([!-\[\]-~]{4,})$
You are extra escaping and aren't using some predefined character classes (such as \w, or at least \d).
Besides of that and that you are anchoring at the beginning and at the end, meaning that the regex will only match if the string starts and ends matching, it looks correct:
^([a-zA-Z\d\-!$##$%^&*()|_+=\[\]{};,."'<>?/~`]{4,})$
If you really mean to use this as a password validator, it reeks of insecurity:
Why are you allowing 4 chars passwords?
Why are you forbidding some characters? PHP can't handle some? Why would you care? Let the user enter the characters he pleases, after all you'll just end up storing a hash + salt of it.
No. That regular expression would not work for the rules you state, for the simple reason that $ by default matches before the final character if it is a newline. You are allowing password strings like "1234\n".
The solution is simple. Either use \z instead of $, or apply the D modifier to the regex.