matching dot in a range preg_match - php

I have been trying to match a literal dot (.) using preg_match(), but the regex engine complains at the position of the \. sequence.
What is wrong here and how can I fix it?
/^[A-Za-z0-9_-\.]{3,16}$/

You have a malformed character class. Use
/^[A-Za-z0-9_.-]{3,16}$/
You are getting a "range out of order in character class" error. You do not have to escape a hyphen at the final position in the character class. If you use it in the middle of the character class, you must escape it.
Inside the character class, almost all characters are treated as literals, except for closing bracket ], the backslash \, the caret ^, and the hyphen -. (see Metacharacters Inside Character Classes at Regular-Expressions.info).
The caret at the non-initial position will also be treated as a literal. If ] is at the initial position, it does not have to be escaped in PHP (but must be escaped in JavaScript!).
From PCRE reference:
Perl, when in warning mode, gives warnings for character classes
such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no warning features, so it gives an error in these cases because they are almost certainly user mistakes.

The problem is the presence of an unescaped hyphen in the middle of the character class. Fix it by moving the hyphen to the end:
/^[A-Za-z0-9_.-]{3,16}$/
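As a minimal sketch of the difference (the subject string 'john.doe' and the variable names are made up for illustration), the original pattern fails to compile because `_-\.` is read as a range running backwards from `_` (95) to `.` (46), while the version with the hyphen last compiles and matches:

```php
<?php
// Broken: "_-\." is parsed as a range from "_" (95) down to "." (46).
$broken = '/^[A-Za-z0-9_-\.]{3,16}$/';
// Fixed: the hyphen is the last character in the class, so it is a literal.
$fixed = '/^[A-Za-z0-9_.-]{3,16}$/';

// preg_match() returns false when the pattern cannot be compiled;
// the @ operator only hides the accompanying warning.
$brokenResult = @preg_match($broken, 'john.doe');
$fixedResult = preg_match($fixed, 'john.doe');

var_dump($brokenResult === false); // bool(true)
var_dump($fixedResult);            // int(1)
```

Comparing the return value against false with === matters here, because a valid pattern that simply does not match returns 0.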

Related

Meaning of a dash between mixed characters in regex?

I'm just getting my feet wet with regexes and I came across this within a PHP program that someone else had written:
[ -\w]. Note that the dash is not the first character, there is a space preceding it.
I can't make heads or tails of what it means. I know that the dash between characters inside brackets normally indicates a range, i.e. [a-z] matches any lowercase character "a" through "z", but what does it match when the dash is between characters of different types?
My first thought was that it just matches any space or alphanumeric character, but then the dash wouldn't be necessary. My second thought was that it's matching spaces, alphanumerics, and the dash; but then I realized that the dash would probably be either escaped or moved to the front or back for that.
I've googled around and can't find anything about using a dash in a character class with mixed characters. Maybe I'm using the wrong search terms.
This might help : http://www.regular-expressions.info/charclass.html in the section "Metacharacters Inside Character Classes" it says :
Hyphens at other positions in character classes where they can't
form a range may be interpreted as literals or as errors. Regex
flavors are quite inconsistent about this.
My guess would be that it is being interpreted as a literal, so the regexp would match a space, a hyphen, or \w.
For reference, it looks invalid in PCRE.
In the PCRE reference, §16, we find:
Perl, when in warning mode, gives warnings for character classes
such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals.
PCRE has no warning features, so it gives an error in these cases
because they are almost certainly user mistakes.
[ -\w] produces a warning in Perl but not in PHP.
Your regex [ -\w] looks like a mistake. If the - in the middle were read as a range from space (32) up to the first \w character (48), the class would only match characters like these:
[ !"#$%&'()*+,./-]
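Since the answers above disagree and the behaviour seems to depend on the PCRE version behind PHP, a quick check on your own installation may be the safest guide. The sketch below (test string chosen arbitrarily) only reports what happens for `[ -\w]` and makes no promise about the outcome; the second pattern spells the likely intent unambiguously by putting the hyphen last:

```php
<?php
// Depending on the PCRE version, this either fails to compile (false)
// or treats the hyphen as a literal; no particular result is assumed.
$ambiguous = @preg_match('/^[ -\w]+$/', 'a-b c');
echo $ambiguous === false
    ? "[ -\\w] does not compile here\n"
    : "[ -\\w] compiles here, result: $ambiguous\n";

// Unambiguous spelling: space, word characters, and a literal hyphen.
$explicit = preg_match('/^[ \w-]+$/', 'a-b c');
var_dump($explicit); // int(1)
```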

Differences between two regular expressions

Does anybody know why this regex:
/^(([a-zA-Z0-9\(\)áéíóúÁÉÍÓÚñÑ,\.°-]+ *)+)$/
works but this one doesn't:
/^(([a-zA-Z0-9áéíóúÁÉÍÓÚñÑ,\.°-\(\)]+ *)+)$/
The difference is where the parentheses are placed. I tried some online PHP regex testers and got the same result. The second one simply doesn't work.
PHP returns:
preg_match(): Compilation failed: range out of order in character class at offset 44 in...
This is not a critical question because I've managed to make it work, but I'm curious!
Maybe the unicode characters are changing something?
When the - character is used inside of brackets (indicating a character set) it indicates a range unless it is the last character in the set, first character in the set, or directly after the opening negating character. Then it means a literal dash. By moving it from the end to the middle you changed its meaning. If you want to keep it in the middle you will need to escape it: \-.
If the hyphen is placed as the first or last character in the character class, it is treated as a literal - (as opposed to a range), and as a result does not require escaping.
These are the positions where the hyphen does not need to be escaped:
right after the opening bracket ([), or
right before the closing bracket (]), or
right after the negating caret (^)
In the second regular expression, you're placing the hyphen in the middle, and the regular expression engine tries to create a range with the character before the hyphen, the character after the hyphen, and all characters that lie between them in numerical order. As such a range isn't possible, an error message is triggered. See asciitable.com for the character table.
Putting the hyphen last in the expression actually causes it to not require escaping, as it then can't be part of a range, however you might still want to get into the habit of always escaping it.
In your first regex you handled everything correctly, including the hyphen at the end of the class, which is exactly where it belongs. An unescaped literal hyphen has two possible places: the end of the character class or the beginning of it.
You guessed right! Anywhere else, you should escape it.
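The positions described in these answers can be compared side by side. This is a small sketch with made-up single-purpose patterns, not the original regex:

```php
<?php
$subject = 'a-z';

$first   = preg_match('/^[-az]+$/', $subject);  // hyphen right after [
$last    = preg_match('/^[az-]+$/', $subject);  // hyphen right before ]
$escaped = preg_match('/^[a\-z]+$/', $subject); // escaped hyphen in the middle
$range   = preg_match('/^[a-z]+$/', $subject);  // unescaped middle hyphen: range a..z
$negated = preg_match('/^[^-a]+$/', 'x-y');     // hyphen right after ^ is a literal

var_dump($first, $last, $escaped); // int(1) int(1) int(1)
var_dump($range);                  // int(0): "-" is not a letter between a and z
var_dump($negated);                // int(0): the literal "-" is excluded
```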

PHP preg_match expression not matching correct characters

I'm trying to catch any characters that are not letters, numbers, or .-_ (period, dash, underscore)
My code is
return !preg_match('/[^A-Za-z0-9.-_]/', $strToFilter);
My hope is that it will return false when it finds an invalid character. As of now it allows ._ (period and underscore) but does not allow - (dash). It also does not detect characters like /, \, [, ], %, ^, etc. as invalid characters.
What is wrong with my expression?
In regex character classes, a hyphen is only matched literally if it:
is immediately against either bracket,
follows the negating caret (^), or
is escaped using a backslash (\)
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen. This works in all flavors discussed in this tutorial. Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.
Source - See Metacharacters Inside Character Classes.
Just escape the dash:
return !preg_match('/[^A-Za-z0-9.\-_]/', $strToFilter);
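To see why the original class misbehaves, note that `.-_` is a valid ascending range from `.` (46) to `_` (95), which happens to cover `/`, `[`, `\`, `]` and `^` but not `-` (45). The sketch below wraps both variants in hypothetical helper functions (the names are not from the question) and moves the hyphen to the end as an alternative to escaping it:

```php
<?php
// Original class: ".-_" is a range covering codes 46 through 95.
function isValidBroken(string $s): bool
{
    return !preg_match('/[^A-Za-z0-9.-_]/', $s);
}

// Hyphen moved to the end, where it is a literal.
function isValidFixed(string $s): bool
{
    return !preg_match('/[^A-Za-z0-9._-]/', $s);
}

var_dump(isValidBroken('a/b')); // bool(true):  "/" (47) falls inside the range
var_dump(isValidBroken('a-b')); // bool(false): "-" (45) falls outside it
var_dump(isValidFixed('a/b'));  // bool(false)
var_dump(isValidFixed('a-b'));  // bool(true)
```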

allow parentheses and other symbols in regex

I've made this regex:
^[a-zA-Z0-9_.-]*$
Supports:
letters [uppercase and lowercase]
numbers [from 0 to 9]
underscores [_]
dots [.]
hyphens [-]
Now, I want to add these:
spaces [ ]
comma [,]
exclamation mark [!]
parenthesis [()]
plus [+]
equal [=]
apostrophe [']
double quotation mark ["]
at [#]
dollar [$]
percent [%]
asterisk [*]
For example, this code accept only some of the symbols above:
^[a-zA-Z0-9 _.,-!()+=“”„#"$#%*]*$
Returns:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 16
Make sure to put the hyphen - either at the start or at the end of the character class; otherwise it needs to be escaped. Try this regex:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]*$
Also note that because of the *, it will even match an empty string. If you don't want to match empty strings, use + instead:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]+$
Or better:
^[\w .,!()+=`,"#$#%*-]+$
TEST:
$text = "_.,!()+=,#$#%*-";
if(!preg_match('/\A[\w .,!()+=`,"#$#%*-]+\z/', $text)) {
echo "error.";
}
else {
echo "OK.";
}
Prints:
OK.
The hyphen is being treated as a range marker: when the engine sees ,-! it thinks you're asking for the range of all characters in the charset that fall between , and ! (i.e. the same way that A-Z works). This isn't what you want.
Either make sure the hyphen is the last character in the character class, as it was before, or escape it with a backslash.
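The "out of order" part of the error can be made concrete with character codes. In this sketch, isAscendingRange() is a hypothetical helper (not a PHP built-in) and the symbol list in the working pattern is a simplified ASCII-only assumption rather than the asker's exact set:

```php
<?php
// Hypothetical helper: a class range "x-y" is only valid when the code
// of x is not greater than the code of y.
function isAscendingRange(string $from, string $to): bool
{
    return ord($from) <= ord($to);
}

var_dump(ord(','), ord('!'));         // int(44) int(33)
var_dump(isAscendingRange(',', '!')); // bool(false): ",-!" runs backwards
var_dump(isAscendingRange('A', 'Z')); // bool(true)

// With the hyphen last in the class, no range is formed at all.
$pattern = '/^[a-zA-Z0-9 _.,!()+=\'"#$%*-]+$/';
$ok = preg_match($pattern, 'Hello, World! (1+1=2)');
var_dump($ok); // int(1)
```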
I would also point out that the quote characters you're using “”„ are part of an extended charset, and are not the same as the basic ASCII quotes "'. You may want to include both sets in your pattern. If you do need to include the non-ASCII characters in the pattern, you should also add the u modifier after the end of your pattern so it correctly picks up unicode characters.
Try escaping your regex: [a-zA-Z0-9\-\(\)\*]
Check if this helps: How to escape regular expression special characters using javascript?
Inside of a character class [...] the hyphen - has a special meaning unless it is the first or last character, so you need to escape it:
^[a-zA-Z0-9 _.,\-!()+=“”„#"$#%*]*$
None of the other characters need to be escaped in the character class (except ]). You will also need to escape the quote character that delimits the PHP string, e.g.
'/[\']/'
"/[\"]/"
Try this:
^[A-Z0-9][A-Z0-9*&!_^%$#!~#,=+,./\|}{)(~`?][;:\'""-]{0,8}$
The trick is that I reversed the order of the parentheses and other braces, which took care of some problems. Square brackets must be escaped.

PHP regular expression pattern allows unwanted literal asterisks

I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods. Here is the pattern:
return mb_ereg_match("^[\w\s'-\.]+$", $name);
The problem is that this pattern, for some reason, returns true when there are literal asterisks in $name. This shouldn't be possible unless I'm missing something. I've done multiple searches on literal asterisks and all I found was the "\*" pattern for intentionally matching them.
The same pattern in preg_match() also returns a match when passed a string like "*John".
What the heck am I missing?
You should double the backslashes in a double-quoted string: the first backslash escapes the second, so the regex engine receives a single backslash.
You also need to escape the -; otherwise the class accepts every character "between" ' and . (which includes *).
return mb_ereg_match("^[\\w\\s'\\-\\.]+$", $name);
Have a look at a working case (using preg_match): http://ideone.com/E8afAM
When enclosed in square brackets, the hyphen acts as a special character to denote a range. In your case, it's matching all characters in the range from ' to . (which includes the asterisk).
Escaping the hyphen should return the desired result:
^[\w\s'\-\.]+$
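A short check with preg_match() (which the question says behaves the same way) illustrates the effect. The range from ' (39) to . (46) contains * (42), so the original class accepts it. The test names are made up, and the fixed pattern moves the hyphen to the end instead of escaping it:

```php
<?php
// Original class: "'-\." is a range from ' (39) to . (46), which includes *.
$broken = "/^[\\w\\s'-\\.]+$/";
// Hyphen moved to the end, so it is a literal.
$fixed = "/^[\\w\\s'.-]+$/";

$brokenStar = preg_match($broken, '*John');
$fixedStar = preg_match($fixed, '*John');
$fixedName = preg_match($fixed, "O'Brien-Smith Jr.");

var_dump($brokenStar); // int(1): the asterisk slips through
var_dump($fixedStar);  // int(0)
var_dump($fixedName);  // int(1)
```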
I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods.
You are overlooking that \w matches more than letters. php.net says:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
And, the perl definition is:
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_").
As I read it, the connecting punctuation character should mean only _, but the extra matches may be a bug in the multibyte extension.
If you use mb_ereg_match only for Unicode matching, try preg_match's /u modifier and the Unicode character properties feature, available since PHP 5.1.0.
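As a sketch of that suggestion, the pattern below swaps \w for the Unicode property \p{L} (any letter) so that digits and underscores are rejected. It assumes UTF-8 input and a PHP version that supports the "\u{...}" string escape (7.0 or later); the sample names are invented:

```php
<?php
// \p{L} matches any Unicode letter; the /u modifier treats pattern and
// subject as UTF-8. The hyphen is last in the class, so it is a literal.
$lettersOnly = "/^[\\p{L}\\s'.-]+$/u";

$accented = preg_match($lettersOnly, "Jos\u{00E9} O'Neil");
$underscore = preg_match($lettersOnly, 'John_Doe');
$digit = preg_match($lettersOnly, 'John3');

var_dump($accented);   // int(1)
var_dump($underscore); // int(0): "_" is matched by \w but not by \p{L}
var_dump($digit);      // int(0)
```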
