How to include hyphens and apostrophes for a PHP Regex? - php

I'm writing a password regex in PHP that should return false for any string that has at least one character that is not:
a lowercase letter a-z
an uppercase letter A-Z
a number 0-9
a whitespace " *"
a punctuation symbol :,.!().?";
So far I have this:
<?php
$password = 'azAZ0 giggles 9*":,.!() .?";';
$regex1 = '#^[a-zA-Z0-9" *":,.!().?";\']+$#i';
if (preg_match($regex1, $password)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
?>
Does this seem to be working as I intend it to, or do you see any glaring errors?
And what should I add to the regex so that the following character is allowed as well:
a hyphen -

Your regex is pretty close to the target, but not totally correct.
I would use this one:
$regex1 = '/^[a-z0-9 :,.!().?";\'-]+$/i';
Points of interest:
Moved the hyphen to the end of the list, so that it won't be mistaken for a character range delimiter
Included an apostrophe by escaping it with a backslash, as per PHP's string escaping rules
Removed the A-Z part since the regex includes the case-insensitive modifier
Replaced * (which in this context means "a space or an asterisk") with just a space -- if you want to also allow tabs and newlines as part of the password (unlikely), replace it with \s
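A quick way to sanity-check the pattern above is to run it against a few inputs. This is only a sketch: the helper name isAllowedPassword and the sample strings are made up for illustration and do not come from the question.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function isAllowedPassword(string $password): bool
{
    // Hyphen goes last so it is a literal; \' escapes the quote for PHP only.
    $regex1 = '/^[a-z0-9 :,.!().?";\'-]+$/i';
    return preg_match($regex1, $password) === 1;
}

// Made-up sample inputs.
foreach (["it's well-known", 'azAZ09 :,.!()?";', 'no*asterisk', ''] as $input) {
    echo var_export($input, true), ' => ',
        isAllowedPassword($input) ? 'match' : 'no match', PHP_EOL;
}
```

The first two inputs should match; the asterisk and the empty string should not.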

You simply need to escape ' using \. Try this
$regex1 = '#^[a-zA-Z0-9" *":,.!-().?";\']+$#i';
And you already seem to have - in the regex.

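Within a character class (denoted by square brackets in regex), a minus - introduces a range, as in [A-Z], unless it is the first or last character of the class or is escaped.
You have !-(, which is read as the range from ! to ( and therefore does not do what you think: it also lets through ", #, $, %, & and ', and it does not match a literal hyphen. Solution: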
Move the - to the start or the end of the character class: [-A-Z...] / [A-Z...-]
Escape the -: [A-Z\-...]
The other question you ask is "How do I get a single quote into a PHP string?" and really has nothing to do with regex. But "escape it" is the answer, of course.
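To see what a given character class really accepts, you can simply try every printable ASCII character against it. A minimal sketch; acceptedChars is a hypothetical helper name, and it assumes the class text contains no / (the delimiter used here).

```php
<?php
// List every printable ASCII character that a character class accepts.
function acceptedChars(string $class): string
{
    $out = '';
    for ($c = 0x20; $c <= 0x7E; $c++) {
        if (preg_match('/^[' . $class . ']$/', chr($c)) === 1) {
            $out .= chr($c);
        }
    }
    return $out;
}

echo acceptedChars('!-('), PHP_EOL;  // range from ! (0x21) to ( (0x28)
echo acceptedChars('!(-'), PHP_EOL;  // hyphen last: three literal characters
echo acceptedChars('!\-('), PHP_EOL; // escaped hyphen: the same three characters
```

The first call prints !"#$%&'( while the other two print !(- which is the behaviour the question was after.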

Related

I need a regex that will limit input to letters, numbers and a couple special chars like newline

I'd like to limit the users input to a string that contains between 2 and 1024 letters, numbers, spaces, periods, underscores, dashes, carriage returns (new lines) and tabs. The carriage returns and tabs do not work in my regex. I do realize that there are other ways to check the length.
if (!preg_match('/^[a-zA-Z0-9 ._-\r\t]{2,1024}$/', $userstring))
{
echo '<p>Bad string</p>';
}
Thanks ahead of time.
The page has a form with a control on it.
If I type: 1CR2 (that is 1 followed by a carriage return and then a 2), and submit the page, the error message will be displayed and the box will have 1rn2 in it.
As you're trying to match strings that may have line breaks in them, you need to enable multi-line in your regex using a Pattern Modifier. m will enable multi-line, e.g.:
if (!preg_match('/^[a-zA-Z0-9 ._-\r\t]{2,1024}/m', $userstring))
{
echo '<p>Bad string</p>';
}
The $ was also removed in case there is trailing white space. Suggestions that others have made to use \s instead of \r\t seem reasonable to me.
Reserved characters (including . and -) need to be escaped with a backslash (\). Try the following regex: ^[\w\s\.\-]{2,1024}$. \w matches word characters ([a-zA-Z0-9_]) and \s matches whitespace characters ([ \t\r\n\f]). That leaves you with the . and - that need to be escaped. Final PHP code:
if (!preg_match('/^[\w\s.\-]{2,1024}$/', $userstring))
{
echo '<p>Bad string</p>';
}
More info on shorthand classes and reserved characters.
Edit: . does not need to be escaped in a character class ([]), thanks Barmar.
Try:
if (!preg_match("/^[a-zA-Z0-9 ._\-\n\t]{2,1024}$/", $userstring))
{
echo '<p>Bad string</p>';
}
I believe you'll want your regular expression wrapped in double-quotes so that your newline (you should be using \n instead of \r) and tab characters are properly interpolated. Also, you should escape the - because it otherwise is used to define a range when used within brackets.
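For what it's worth, the pattern in the question does not compile at all, because _-\r is parsed as a range whose end point sorts before its start. The sketch below shows that and one possible fix. The helper name isValidUserString is hypothetical, and allowing \n in addition to \r is an assumption about what the form actually submits.

```php
<?php
// The original class fails to compile: "_-\r" is read as a range from
// _ (0x5F) down to \r (0x0D), which is out of order. preg_match() then
// returns false, so !preg_match(...) is true for every input.
var_dump(@preg_match('/^[a-zA-Z0-9 ._-\r\t]{2,1024}$/', 'ab')); // bool(false)

// Hypothetical helper: hyphen moved to the end so it is a literal.
// Allowing \n as well as \r is an assumption about the form input.
function isValidUserString(string $s): bool
{
    // PCRE itself understands \r, \n and \t, so single quotes are fine.
    return preg_match('/^[a-zA-Z0-9 ._\r\n\t-]{2,1024}$/', $s) === 1;
}

var_dump(isValidUserString("1\r\n2"));   // bool(true)
var_dump(isValidUserString('bad*char')); // bool(false)
```

The @ only hides the compile warning so the return value is easy to see; drop it while debugging.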

PCRE regex with lookahead and lookbehind always returns true

I’m trying to create a regex for form validation but it always returns true. The user must be able to add something like {user|2|S} as input but also use brackets if they are escaped with \.
This code checks for the left bracket { for now.
$regex = '/({(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)}))|[^{]|(?<=\\\){)*/';
if (preg_match($regex, $value)) {
return TRUE;
} else {
return FALSE;
}
A possible correct input would be:
Hello {user|1|S}, you have {amount|2|D2}
or
Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}
However, this should return false:
Hello {user|1|S}, you have {amount|2}
and this also:
Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}
A live example can be found here: http://regexr.com?37tpu Note that there is a \ in the lookbehind at the end, PHP was giving me error messages because I had to escape it an extra time in my code.
The main error is that you do not specify that the regex should match from the beginning to the end of the checked string. Use the ^ and $ assertions.
I think you have to escape { and } in your regex as they have special meaning. Together they form a quantifier.
The (?<=\\\) is better written as (?<=\\\\). The backslash has to be escaped twice because it has a special meaning both in a single-quoted string and in a PCRE regex. Using \\\ works too: if a single-quoted string contains any escape sequence other than \\ and \', PHP treats it as a literal backslash followed by the character, so \) is taken literally. But explicitly escaping the backslash twice seems easier to read to me.
The regex should be
$regex = '/^(\{(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\}))|[^{]|(?<=\\\\)\{)*$/';
But notice that the look-around assertions are not necessary. This regex should do the job too:
$regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';
Any non-{ characters are matched by the first alternative. When a { is read, one of the remaining two alternatives is used. Either the pattern for the brace thing matches, or the regex engine backtracks one character and tries to match \{ character sequence. If it fails, both ways, it backtracks further till it reaches string start and fails completely.
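As a rough check, the look-around-free pattern can be run against the four sample inputs from the question. This is a sketch only; isValidTemplate is a hypothetical helper name.

```php
<?php
// Hypothetical helper using the look-around-free pattern from the answer.
function isValidTemplate(string $value): bool
{
    $regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';
    return preg_match($regex, $value) === 1;
}

// The four sample inputs from the question.
$inputs = [
    'Hello {user|1|S}, you have {amount|2|D2}',
    'Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}',
    'Hello {user|1|S}, you have {amount|2}',
    'Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}',
];
foreach ($inputs as $input) {
    echo isValidTemplate($input) ? 'valid:   ' : 'invalid: ', $input, PHP_EOL;
}
```

The first two inputs come out as valid and the last two as invalid, which matches what the question asks for.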
Matching without lookbehind
You can write a regex for this without lookbehinds or lookaheads, which is usually the recommended approach.
For example, suppose the requirement is to match any character except { and }, unless the brace is preceded by a \. That can be restated as:
Match any character but a { and a } OR match a \{ or a \}. To match any character but a { and a } use:
[^{}]
To match a \{ use:
\\\{
One backslash is for escaping the { (which might not be necessary, depending on your regex compiler) and one backslash is for escaping the other backslash.
You would end up with this:
(?:
[^{}]
|
\\\{
|
\\\}
)+
I formatted this regex across several lines so that it's readable. If you want to use it in your code like this, make sure to use the PCRE_EXTENDED (x) modifier.
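In PHP the multi-line form could look roughly like the sketch below. A nowdoc is used so that no PHP-level escaping gets in the way. The ^ and $ anchors and the helper name hasOnlyEscapedBraces are additions for illustration, on the assumption that you want to validate the whole string.

```php
<?php
// Nowdoc: no PHP-level escaping, so the regex appears exactly as written.
// The ^ and $ anchors are an addition for whole-string validation.
$pattern = <<<'REGEX'
/^(?:
    [^{}]   # any character that is not a brace
  |
    \\\{    # a literal backslash followed by an opening brace
  |
    \\\}    # a literal backslash followed by a closing brace
)+$/x
REGEX;

// Hypothetical helper name.
function hasOnlyEscapedBraces(string $value, string $pattern): bool
{
    return preg_match($pattern, $value) === 1;
}

var_dump(hasOnlyEscapedBraces('in \{the_bracket_bank\}', $pattern)); // bool(true)
var_dump(hasOnlyEscapedBraces('in {the_bracket_bank}', $pattern));   // bool(false)
```

Keep the delimiter character out of the inline comments, since PHP looks for the closing delimiter before PCRE ever sees the pattern.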
Looks more like a job for a lookbehind to me:
/((?<!\\\\)\{[a-zA-Z0-9]+\|[0-9]+\|[SD][0-9]*\})/
However, the obfuscation factor is so high that I would rather recognize all bracketed strings and parse them later.

allow parentheses and other symbols in regex

I've made this regex:
^[a-zA-Z0-9_.-]*$
Supports:
letters [uppercase and lowercase]
numbers [from 0 to 9]
underscores [_]
dots [.]
hyphens [-]
Now, I want to add these:
spaces [ ]
comma [,]
exclamation mark [!]
parenthesis [()]
plus [+]
equal [=]
apostrophe [']
double quotation mark ["]
at [#]
dollar [$]
percent [%]
asterisk [*]
For example, this code accept only some of the symbols above:
^[a-zA-Z0-9 _.,-!()+=“”„#"$#%*]*$
Returns:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 16
Make sure to put the hyphen - either at the start or at the end of the character class; otherwise it needs to be escaped. Try this regex:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]*$
Also note that because of the * quantifier it will even match an empty string. If you don't want to match empty strings, use + instead:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]+$
Or better:
^[\w .,!()+=`,"#$#%*-]+$
TEST:
$text = "_.,!()+=,#$#%*-";
if(!preg_match('/\A[\w .,!()+=`,"#$#%*-]+\z/', $text)) {
echo "error.";
}
else {
echo "OK.";
}
Prints:
OK.
The hyphen is being treated as a range marker -- when the engine sees ,-! it thinks you're asking for the range of all characters that fall between , and ! (the same way that A-Z works). This isn't what you want, and because , comes after ! in ASCII the range is also out of order, which is what the warning is telling you.
Either make sure the hyphen is the last character in the character class, as it was before, or escape it with a backslash.
I would also point out that the quote characters you're using “”„ are part of an extended charset, and are not the same as the basic ASCII quotes "'. You may want to include both sets in your pattern. If you do need to include the non-ASCII characters in the pattern, you should also add the u modifier after the end of your pattern so it correctly picks up unicode characters.
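A minimal sketch of the u modifier in use, assuming the subject string is UTF-8. The helper name isAllowedText is hypothetical, and the exact set of allowed symbols is only an example. The curly quotes are written as \x{...} code points so the snippet does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper. \x{201C}, \x{201D} and \x{201E} are the code points
// of the curly quotes; the u modifier makes PCRE treat pattern and subject as UTF-8.
function isAllowedText(string $text): bool
{
    $pattern = '/^[a-zA-Z0-9 _.,!()+=\x{201C}\x{201D}\x{201E}\'"#$%*-]+$/u';
    return preg_match($pattern, $text) === 1;
}

var_dump(isAllowedText("say \u{201C}hi\u{201D}!")); // bool(true)
var_dump(isAllowedText('tilde~not~allowed'));        // bool(false)
```

The hyphen sits last in the class, so it is a literal here as well.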
Try escaping your regex: [a-zA-Z0-9\-\(\)\*]
Check if this helps you: How to escape regular expression special characters using javascript?
Inside of a character class [...] the hyphen - has a special meaning unless it is the first or last character, so you need to escape it:
^[a-zA-Z0-9 _.,\-!()+=“”„#"$#%*]*$
None of the other characters need to be escaped in the character class (except ]). You will also need to escape the quote indicating the string. e.g.
'/[\']/'
"/[\"]/"
Try this:
^[A-Z0-9][A-Z0-9*&!_^%$#!~#,=+,./\|}{)(~`?][;:\'""-]{0,8}$
The trick is that I reversed the order of the parentheses and the other braces, which took care of some problems. Square brackets must be escaped.

PHP regular expression pattern allows unwanted literal asterisks

I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods. Here is the pattern:
return mb_ereg_match("^[\w\s'-\.]+$", $name);
Problem is this pattern, for some reason, returns true when there are literal asterisks in $name. This shouldn't be possible unless I'm missing something. I've done multiple searches on literal asterisks and all I found was the "\*" pattern for intentionally matching them.
The same pattern in preg_match() also returns a match when passed a string like "*John".
What the heck am I missing?
You need a double-backslash in front of these codes. One to escape the backslash, one to escape the escape sequence.
You also need to escape the -, otherwise it accepts all characters "between" ' and ..
return mb_ereg_match("^[\\w\\s'\\-\\.]+$", $name);
Have a look at a working case (using preg_match): http://ideone.com/E8afAM
When enclosed in square-brackets, the hyphen acts as a special character to denote a range. In your case, it's matching all characters in the range ' to ..
Escaping the hyphen should return the desired result:
^[\w\s'\-\.]+$
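The effect is easy to reproduce with preg_match. In the sketch below the hyphen is moved to the end of the class instead of being escaped, which should be equivalent; isValidName is a hypothetical helper name.

```php
<?php
// Original class: '-\. is the range from ' (0x27) to . (0x2E),
// which covers ' ( ) * + , - and . so the asterisk slips through.
var_dump(preg_match('/^[\w\s\'-\.]+$/', '*John')); // int(1)

// Hypothetical helper with the hyphen moved last, where it is a literal.
function isValidName(string $name): bool
{
    return preg_match('/^[\w\s\'.-]+$/', $name) === 1;
}

var_dump(isValidName('*John'));            // bool(false)
var_dump(isValidName("O'Neil-Smith Jr.")); // bool(true)
```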
I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods.
You're missing that \w is not just a letter character. php.net says:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
And, the perl definition is:
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_").
The connecting punctuation character should mean only _, as I read it, but this may be a bug in the multibyte extension.
If you use mb_ereg_match only to handle Unicode input, give preg_match's /u modifier and the Unicode character properties feature a try (available since PHP 5.1.0).
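A sketch of that suggestion, assuming UTF-8 input: \p{L} replaces \w so that digits and the underscore are no longer accepted. The helper name isValidUnicodeName is made up for this example.

```php
<?php
// Hypothetical helper: \p{L} matches any Unicode letter, and the u modifier
// tells PCRE to treat the pattern and the subject as UTF-8.
function isValidUnicodeName(string $name): bool
{
    return preg_match('/^[\p{L}\s\'.-]+$/u', $name) === 1;
}

var_dump(isValidUnicodeName("Jos\u{E9} O'Brien")); // bool(true)
var_dump(isValidUnicodeName('John_Smith'));         // bool(false): underscore
var_dump(isValidUnicodeName('R2D2'));               // bool(false): digits
```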

REGEXP not catching some names correctly if certain values are at certain positions in the string

I have the following regex meant to test against valid name formats:
^[a-zA-Z]+(([\'\,\.\- ][a-zA-Z ])?[a-zA-Z]*)*$
it seems to work fine with all the expected odd name possibilities, including the following:
o'Bannon
Smith, Jr.
Double-barreled
I'm having a problem when I plug this into my PHP code. If the first character is a number it passes through as valid.
If the last character is a space, comma, full-stop or other special allowed character, it's failing as invalid.
My PHP code is :
$v = 'Tested Value';
$value = (filter_var($v, FILTER_VALIDATE_REGEXP,array("options"=>array("regexp"=>"^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^"))));
if (strlen($value) <2 && strlen($v) !=0) {
return "not valid";
}
What am I doing wrong here?
^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^
The carets (^) at the beginning and end of the regex are being interpreted as regex delimiters, not as anchors. The regex isn't really matching the digits at the beginning of the string, it's skipping over them so it can start matching at the first letter it finds. You can use almost any ASCII punctuation character as the regex delimiter, but most people use # or ~, which are relatively uncommon and have no special meaning in regexes.
As for not allowing punctuation at the end, that's how the regex is written. Specifically, [\'\,\.\- ][a-zA-Z ] requires that each apostrophe, comma, period or hyphen be followed by a letter or a space. If you really want to allow any of those characters at the end, it's pretty simple:
~^(?:[a-z]+[',. -]*)+$~i
Of course, that's not a particularly good regex for validating names, but I have nothing better to offer; it's a job for which regexes are particularly ill-suited. And do you really want to be the one to tell your users their own names are invalid?
Your regex is way too complex
/^[a-z]+[',. a-z-]*$/i
should do the same thing
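The delimiter problem is easy to reproduce in isolation. The sketch below first contrasts ^ and ~ as delimiters on a cut-down pattern, then plugs the simplified regex into filter_var. The helper name validateName is hypothetical.

```php
<?php
// With ^ as the delimiter, the leading ^ is no longer a start anchor,
// so the pattern is free to begin matching after the digit.
var_dump(preg_match('^[a-zA-Z]+$^', '1abc'));  // int(1)
var_dump(preg_match('~^[a-zA-Z]+$~', '1abc')); // int(0)

// Hypothetical helper built on the simplified pattern from the last answer.
// Returns the input string on success and false on failure.
function validateName(string $v)
{
    return filter_var($v, FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => '~^[a-z]+[\',. a-z-]*$~i'],
    ]);
}

var_dump(validateName('Smith, Jr.')); // string(10) "Smith, Jr."
var_dump(validateName('1abc'));       // bool(false)
```

Since filter_var returns the value itself on success, compare the result against false rather than checking its length.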
