PHP preg_match expression not matching correct characters - php

I'm trying to catch any characters that are not letters, numbers, or .-_ (period, dash, underscore)
My code is
return !preg_match('/[^A-Za-z0-9.-_]/', $strToFilter);
My hope is that it will return false when it finds an invalid character. As of now it allows ._ (period and underscore) but does not allow - (dash). It also does not detect characters like /, \, [, ], %, ^, etc. as invalid characters.
What is wrong with my expression?

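In regex character classes, a hyphen is treated as a literal only if it:
is immediately next to either bracket,
follows the negating caret (^), or
is escaped with a backslash (\).
Anywhere else it defines a range, so .-_ means every character from . (0x2E) to _ (0x5F), which includes /, [, \, ] and ^ but not - (0x2D).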
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen. This works in all flavors discussed in this tutorial. Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.
Source - See Metacharacters Inside Character Classes.

Just escape the dash:
return !preg_match('/[^A-Za-z0-9.\-_]/', $strToFilter);
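To see the range effect described above, here is a minimal sketch that runs the original and a corrected pattern side by side. The helper names isCleanBroken and isCleanFixed are hypothetical and exist only for this illustration; the corrected pattern moves the hyphen to the end of the class, which is equivalent to escaping it.

```php
<?php
// Hypothetical helper names, used only for this illustration.

// Original pattern: ".-_" is parsed as a range from "." (0x2E) to "_" (0x5F).
function isCleanBroken(string $s): bool
{
    return !preg_match('/[^A-Za-z0-9.-_]/', $s);
}

// Fixed pattern: the hyphen is last in the class, so it is a literal.
function isCleanFixed(string $s): bool
{
    return !preg_match('/[^A-Za-z0-9._-]/', $s);
}

foreach (['file_name.txt', 'my-file', 'a/b', 'a[b]'] as $input) {
    printf(
        "%-14s broken=%s fixed=%s\n",
        $input,
        var_export(isCleanBroken($input), true),
        var_export(isCleanFixed($input), true)
    );
}
```

With the original pattern, 'my-file' should be rejected while 'a/b' and 'a[b]' slip through; with the fixed pattern the results are the other way around.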

Related

PCRE2 Regex error escape sequence is invalid in character class

I have the following regex expression, for whatever reason I keep getting an error when using this with PCRE2. I'm unsure what would be causing the error.
/^.(?=.{1,})(?=.[A-Z])(?=.[0-9])(?=.[\d\X])(?=(?:.[!##$%^&()\\\-_=\+{}[\]|;:,.]){1,}).{8,}$/
The error in the log is:
exception: preg_match(): Compilation failed: escape sequence is invalid in character class at offset 43
As per this Red Hat Bugzilla bug, this is a documented PCRE2 behavior:
Escape sequences in character classes
All the sequences that define a single character value can be used both inside and outside character classes. In addition, inside a character class, \b is interpreted as the backspace character (hex 08).
When not followed by an opening brace, \N is not allowed in a character class. \B, \R, and \X are not special inside a character class. Like other unrecognized alphabetic escape sequences, they cause an error. Outside a character class, these sequences have different meanings.
To fix your regex, I'd suggest something like
if (preg_match('/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!##$%^&()\\\\_=+{}[\]|;:,.-]).{8,}$/', 'aB9!ssssddssdd')) {
    echo "yes";
}
where
^ - start of string
(?=.*[A-Z]) - at least one uppercase ASCII letter
(?=.*[a-z]) - at least one lowercase ASCII letter
(?=.*[0-9]) - at least one ASCII digit
(?=.*[!##$%^&()\\\\_=+{}[\]|;:,.-]) - at least one special char, !, #, #, $, %, ^, &, (, ), \, _, =, +, {, }, [, ], |, ;, :, ,, . and -
.{8,} - at least 8 chars, no line breaks
$ - end of string.
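As a rough check of the breakdown above, the following sketch runs the pattern, with every lookahead from the list included, against a few sample strings. The sample inputs are made up for illustration and are not from the original post.

```php
<?php
// Pattern assembled from the lookaheads listed in the breakdown above.
$pattern = '/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!##$%^&()\\\\_=+{}[\]|;:,.-]).{8,}$/';

// Hypothetical sample inputs.
$samples = [
    'aB9!ssssddssdd', // satisfies every lookahead
    'aB!ssssddssdd',  // no digit
    'ab9!ssssddssdd', // no uppercase letter
    'aB9ssssddssdd',  // no special character
    'aB9!s',          // shorter than 8 characters
];

foreach ($samples as $sample) {
    echo $sample, ' => ', preg_match($pattern, $sample) === 1 ? 'ok' : 'rejected', "\n";
}
```

Only the first sample should be reported as ok; each of the others fails exactly one of the conditions.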

matching dot in a range preg_match

I have been trying to match a literal dot . using preg_match(), but the regex engine complains at the position of the \..
What is wrong here and how can I fix it?
/^[A-Za-z0-9_-\.]{3,16}$/
You have a malformed character class. Use
/^[A-Za-z0-9_.-]{3,16}$/
You are getting a "range out of order" error because _-\. is read as a range from _ (0x5F) down to . (0x2E). You do not have to escape a hyphen at the final position in the character class. If you use it in the middle of the character class, you must escape it.
Inside the character class, almost all characters are treated as literals, except for closing bracket ], the backslash \, the caret ^, and the hyphen -. (see Metacharacters Inside Character Classes at Regular-Expressions.info).
The caret at the non-initial position will also be treated as a literal. If ] is at the initial position, it does not have to be escaped in PHP (but must be escaped in JavaScript!).
From PCRE reference:
Perl, when in warning mode, gives warnings for character classes such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no warning features, so it gives an error in these cases because they are almost certainly user mistakes.
The problem is the unescaped hyphen in the middle of the character class. Fix it with:
/^[A-Za-z0-9_.-]{3,16}$/
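A small sketch of both patterns in use may help here. The sample usernames are invented for illustration, and the @ operator is used only to keep the compile warning out of the output; it assumes the usual PHP behavior where preg_match returns false when the pattern fails to compile.

```php
<?php
$fixed  = '/^[A-Za-z0-9_.-]{3,16}$/';  // hyphen last: literal
$broken = '/^[A-Za-z0-9_-\.]{3,16}$/'; // "_-\." is a range that is out of order

var_dump(preg_match($fixed, 'john.doe-1')); // letters, digit, dot and hyphen
var_dump(preg_match($fixed, 'ab'));         // too short for {3,16}
var_dump(preg_match($fixed, 'john doe'));   // space is not in the class

// The broken pattern does not compile, so preg_match returns false
// instead of 0 or 1; @ suppresses the accompanying warning.
var_dump(@preg_match($broken, 'john.doe-1'));
```

Note that a compile failure produces false rather than 0, so a loose check such as if (!preg_match(...)) cannot tell "no match" apart from "invalid pattern".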

allow parentheses and other symbols in regex

I've made this regex:
^[a-zA-Z0-9_.-]*$
Supports:
letters [uppercase and lowercase]
numbers [from 0 to 9]
underscores [_]
dots [.]
hyphens [-]
Now, I want to add these:
spaces [ ]
comma [,]
exclamation mark [!]
parenthesis [()]
plus [+]
equal [=]
apostrophe [']
double quotation mark ["]
at [@]
dollar [$]
percent [%]
asterisk [*]
For example, this code accepts only some of the symbols above:
^[a-zA-Z0-9 _.,-!()+=“”„#"$#%*]*$
Returns:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 16
Make sure to put the hyphen - either at the start or at the end of the character class; otherwise it needs to be escaped. Try this regex:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]*$
Also note that because of the *, it will even match an empty string. If you don't want to match empty strings, use + instead:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]+$
Or better:
^[\w .,!()+=`,"#$#%*-]+$
TEST:
$text = "_.,!()+=,#$#%*-";
if (!preg_match('/\A[\w .,!()+=`,"#$#%*-]+\z/', $text)) {
    echo "error.";
} else {
    echo "OK.";
}
Prints:
OK.
The hyphen is being treated as a range marker: when the engine sees ,-! it thinks you're asking for the range of all characters that fall between , and ! (the same way that A-Z works). This isn't what you want, and because , (0x2C) comes after ! (0x21) the range is also out of order, which is what the error reports.
Either make sure the hyphen is the last character in the character class, as it was before, or escape it with a backslash.
I would also point out that the quote characters you're using “”„ are part of an extended charset, and are not the same as the basic ASCII quotes "'. You may want to include both sets in your pattern. If you do need to include the non-ASCII characters in the pattern, you should also add the u modifier after the end of your pattern so it correctly picks up Unicode characters.
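The effect of the u modifier can be sketched as follows. This is only an illustration under some assumptions: the curly quotes are written as code point escapes (PHP 7.0 or later) so the result does not depend on the file encoding, and the character U+2700 is a hypothetical test input chosen because its UTF-8 encoding happens to reuse bytes from the curly quotes.

```php
<?php
// Curly quotes as code point escapes: U+201C, U+201D, U+201E.
$curly = "\u{201C}\u{201D}\u{201E}";

// Allowed characters; the hyphen is placed last so it stays a literal.
$class = 'a-zA-Z0-9 _.,!()+=\'"#$%*' . $curly . '-';

$withU    = '/^[' . $class . ']*$/u'; // pattern and subject treated as UTF-8
$withoutU = '/^[' . $class . ']*$/';  // pattern and subject treated as raw bytes

$text  = "He said \u{201C}hi\u{201D}!";
$other = "\u{2700}"; // not in the class, but encoded with bytes E2 9C 80

var_dump(preg_match($withU, $text));     // curly quotes matched as characters
var_dump(preg_match($withU, $other));    // rejected: U+2700 is not in the class
var_dump(preg_match($withoutU, $other)); // accepted byte by byte, a false positive
```

Without the u modifier the class contains the individual bytes of the curly quotes, so any character assembled from those same bytes is accepted as well.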
Try escaping the special characters in your regex: [a-zA-Z0-9\-\(\)\*]
Check if this helps you: How to escape regular expression special characters using javascript?
Inside of a character class [...] the hyphen - has a special meaning unless it is the first or last character, so you need to escape it:
^[a-zA-Z0-9 _.,\-!()+=“”„#"$#%*]*$
None of the other characters need to be escaped in the character class (except ]). You will also need to escape the quote character that delimits the PHP string, e.g.
'/[\']/'
"/[\"]/"
Try this:
^[A-Z0-9][A-Z0-9*&!_^%$#!~#,=+,./\|}{)(~`?][;:\'""-]{0,8}$
use this link to test
The trick is that I reversed the order of the parentheses and other braces, which took care of some problems. Square brackets must be escaped.

PHP regular expression pattern allows unwanted literal asterisks

I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods. Here is the pattern:
return mb_ereg_match("^[\w\s'-\.]+$", $name);
Problem is this pattern, for some reason, returns true when there are literal asterisks in $name. This shouldn't be possible unless I'm missing something. I've done multiple searches on literal asterisks and all I found was the "\*" pattern for intentionally matching them.
The same pattern in preg_match() also returns a match when passed a string like "*John".
What the heck am I missing?
You need a double backslash in front of these escape codes: one to escape the backslash itself, and one to start the escape sequence.
You also need to escape the -, otherwise the class accepts all characters "between" ' and ., which is the range 0x27 to 0x2E and includes the asterisk (0x2A).
return mb_ereg_match("^[\\w\\s'\\-\\.]+$", $name);
Have a look at a working case (using preg_match): http://ideone.com/E8afAM
When enclosed in square brackets, the hyphen acts as a special character that denotes a range. In your case, it's matching all characters in the range ' to . (that is, ' ( ) * + , - .).
Escaping the hyphen should return the desired result:
^[\w\s'\-\.]+$
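The difference can be checked with a short sketch using preg_match, which the question says behaves the same way here. The sample names are invented for illustration, and the fixed pattern places the hyphen last instead of escaping it, which has the same effect.

```php
<?php
// Original class: "'-\." is a range from ' (0x27) to . (0x2E), which contains *.
$broken = '/^[\w\s\'-\.]+$/';

// Fixed class: hyphen moved to the end so it is a literal.
$fixed = '/^[\w\s\'.-]+$/';

var_dump(preg_match($broken, '*John'));              // asterisk is accepted
var_dump(preg_match($fixed, '*John'));               // asterisk is rejected
var_dump(preg_match($fixed, "Mary-Ann O'Neil Jr.")); // hyphen, quote and period still pass
```

Keep in mind that \w also accepts digits and underscores, so the fixed pattern is still broader than "letters only".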
I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods.
You are missing that \w does not match only letters. php.net says:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
And, the perl definition is:
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_").
The connecting punctuation character should mean only _ as far as I can tell, but this may be a bug in the multibyte extension.
If you use mb_ereg_match only for whole-string Unicode matches, try preg_match's /u modifier and the Unicode character properties feature, available since PHP 5.1.0.

Validating Regular Sentence Grammar with Regex

So I want to validate a string based on whether or not it contains only grammatical characters as well as numbers and letters (basically A-Z, a-z, 0-9, plus periods (.), commas (,), colons (:), semicolons (;), hyphens (-), single quotes ('), double quotes (") and parentheses). I am getting a PHP error that says "Compilation failed: range out of order in character class". What regex code should I be using?
This is the one I'm currently using:
^[a-zA-Z0-9_:;-()'\" ]*^
You need to escape the -, which gives ^[a-zA-Z0-9_:;\-()'\" ]* . The - has a special meaning inside a character set, so it needs to be escaped. The ^ at the end is also not necessary; if you want to anchor the end of the string, use $ there instead. The regex can also be simplified using \w like this:
^[\w:;()'"\s-]*
\w matches letters, digits, and underscores.
The problem is that you have a dash character in the regex, which the parser is interpreting as a range instead of as a literal dash. You can fix that by:
Escaping it with a backslash (^[a-zA-Z0-9_:;\-()'\" ]*^)
Putting it at the start (^[-a-zA-Z0-9_:;()'\" ]*^)
Putting it at the end (^[a-zA-Z0-9_:;()'\" -]*^)
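Here is a minimal sketch of the third option applied in PHP. It makes two assumptions that go beyond the answers above: the trailing ^ is replaced with $ so the whole string is checked, and period and comma are added to the class because the question lists them as allowed. The sample sentences are invented for illustration.

```php
<?php
// Asker's class with $ in place of the trailing ^: ";-(" is a range that is out of order.
$broken = '/^[a-zA-Z0-9_:;-()\'" ]*$/';

// Hyphen moved to the end; period and comma added per the question's description.
$fixed = '/^[a-zA-Z0-9_.,:;()\'" -]*$/';

var_dump(@preg_match($broken, 'anything'));                        // pattern fails to compile
var_dump(preg_match($fixed, "It's (almost) done: yes, really."));  // only allowed characters
var_dump(preg_match($fixed, 'a < b'));                             // "<" is not allowed
```

Because the quantifier is *, an empty string also passes; switch to + if at least one character is required.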
