Validating Regular Sentence Grammar with Regex - php

So I want to validate a string based on whether or not it contains only grammatical characters as well as numbers and letters (basically A-Z, a-z, 0-9, plus periods (.), commas (,), colons (:), semicolons (;), hyphens (-), single quotes ('), double quotes (") and parentheses ()). I am getting a PHP error that says "Compilation failed: range out of order in character class". What regex code should I be using?
This is the one I'm currently using:
^[a-zA-Z0-9_:;-()'\" ]*^

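You need to escape the -, which would then give ^[a-zA-Z0-9_:;\-()'\" ]* . The - has a special meaning inside a character set (it denotes a range), so it needs to be escaped. A trailing ^ is not an end anchor either (that would be $); in PHP, a pattern written as ^...^ actually uses ^ as its delimiter, so wrap the expression in real delimiters such as /^...$/ instead. The regex can also be simplified using \w like this: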
^[\w:;()'"\s-]*
\w matches letters, digits, and underscores.
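A small sketch of the \w version, assuming it is passed to preg_match() with / delimiters and ^...$ anchors (the snippets above show no delimiters). Without the /u modifier, \w here typically behaves like [A-Za-z0-9_]. Like the question's original pattern, this class has no . or ,, so a hypothetical variant with those two added is shown next to it.

```php
<?php
// The \w-based class from the answer, wrapped in assumed / delimiters and ^...$ anchors.
$answer = '/^[\w:;()\'"\s-]*$/';
// Hypothetical variant that also admits the periods and commas the question asked for.
$withPunct = '/^[\w.,:;()\'"\s-]*$/';

$samples = [
    'Plain words_and_digits 123',
    "It's (mostly) fine: yes; no - maybe \"quoted\"",
    'Ends with a period.',
];

foreach ($samples as $s) {
    // preg_match() returns 1 for a match and 0 for no match.
    printf("%-45s answer:%d withPunct:%d\n", $s, preg_match($answer, $s), preg_match($withPunct, $s));
}
```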

The problem is that you have a dash character in the regex, which the parser is interpreting as a range instead of as a literal dash. You can fix that by:
Escaping it with a backslash (^[a-zA-Z0-9_:;\-()'\" ]*^)
Putting it at the start (^[-a-zA-Z0-9_:;()'\" ]*^)
Putting it at the end (^[a-zA-Z0-9_:;()'\" -]*^)
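A minimal sketch comparing the three fixes, assuming the pattern is passed to preg_match() with / delimiters and ^...$ anchors. preg_match() returns false when a pattern fails to compile; the @ below only silences the accompanying warning for the demonstration.

```php
<?php
// The question's class: ";-(" is read as a range from ';' (0x3B) down to '(' (0x28),
// which is out of order, so compilation fails.
$broken = '/^[a-zA-Z0-9_:;-()\'" ]*$/';

// Three equivalent fixes: escape the hyphen, or move it to the start or end of the class.
$fixes = [
    'escaped' => '/^[a-zA-Z0-9_:;\-()\'" ]*$/',
    'first'   => '/^[-a-zA-Z0-9_:;()\'" ]*$/',
    'last'    => '/^[a-zA-Z0-9_:;()\'" -]*$/',
];

// Compile error: preg_match() returns false instead of 0 or 1.
var_dump(@preg_match($broken, 'abc'));

$sample = 'Say "hi" (twice); it\'s a re-run: 2 times';
foreach ($fixes as $name => $pattern) {
    echo $name, ': ', preg_match($pattern, $sample), PHP_EOL;
}
```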

Related

PHP preg_match expression not matching correct characters

I'm trying to catch any characters that are not letters, numbers, or .-_ (period, dash, underscore)
My code is
return !preg_match('/[^A-Za-z0-9.-_]/', $strToFilter);
My hope is that it will return false when it finds an invalid character. As of now it allows . and _ (period and underscore) but does not allow - (dash). It also does not detect characters like /, \, [, ], %, ^, etc. as invalid characters.
What is wrong with my expression?
In regex character classes, you can't reliably match a literal hyphen unless it:
sits immediately against either bracket,
directly follows the negating caret (^), or
is escaped with a backslash (\)
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen. This works in all flavors discussed in this tutorial. Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.
Source - See Metacharacters Inside Character Classes.
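The quoted rule can be checked directly with PHP's PCRE. This sketch anchors each class with ^...$ so it must match the whole one-character subject, then tries a hyphen, an x and an unrelated z. The last class, [w\-y], shows that an escaped hyphen is a literal rather than a range (w-y would include x).

```php
<?php
// Each class should treat '-' as a literal, never as a range operator.
$classes = ['[-x]', '[x-]', '[^-x]', '[^x-]', '[w\-y]'];

foreach ($classes as $class) {
    $pattern = '/^' . $class . '$/';
    printf("%-7s '-':%d 'x':%d 'z':%d\n", $class,
        preg_match($pattern, '-'), preg_match($pattern, 'x'), preg_match($pattern, 'z'));
}
```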
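Just escape the dash. Unescaped, .-_ is a range from . (0x2E) to _ (0x5F), which includes /, \, [, ], ^ and several other symbols, so those pass, while the dash itself (0x2D) falls just outside it: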
return !preg_match('/[^A-Za-z0-9.\-_]/', $strToFilter);
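A small side-by-side of the two classes. isClean() is a hypothetical wrapper around the question's return line, not part of the original code.

```php
<?php
// Hypothetical wrapper around the question's `return !preg_match(...)` line.
function isClean(string $pattern, string $s): bool
{
    return !preg_match($pattern, $s);
}

$buggy = '/[^A-Za-z0-9.-_]/';   // ".-_" is a range from 0x2E to 0x5F
$fixed = '/[^A-Za-z0-9.\-_]/';  // "\-" is a literal hyphen

foreach (['file-name', 'a/b', 'a\\b', 'a[1]', 'x^y', 'v1.2_final'] as $s) {
    printf("%-10s buggy:%s fixed:%s\n", $s,
        var_export(isClean($buggy, $s), true), var_export(isClean($fixed, $s), true));
}
```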

allow parentheses and other symbols in regex

I've made this regex:
^[a-zA-Z0-9_.-]*$
Supports:
letters [uppercase and lowercase]
numbers [from 0 to 9]
underscores [_]
dots [.]
hyphens [-]
Now, I want to add these:
spaces [ ]
comma [,]
exclamation mark [!]
parenthesis [()]
plus [+]
equal [=]
apostrophe [']
double quotation mark ["]
at [#]
dollar [$]
percent [%]
asterisk [*]
For example, this pattern, which includes only some of the symbols above:
^[a-zA-Z0-9 _.,-!()+=“”„#"$#%*]*$
produces:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 16
Make sure to put the hyphen - either at the start or at the end of the character class; otherwise it needs to be escaped. Try this regex:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]*$
Also note that because of the *, it will even match an empty string. If you don't want to match empty strings, use + instead:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]+$
Or better:
^[\w .,!()+=`,"#$#%*-]+$
TEST:
$text = "_.,!()+=,#$#%*-";
if (!preg_match('/\A[\w .,!()+=`,"#$#%*-]+\z/', $text)) {
    echo "error.";
} else {
    echo "OK.";
}
Prints:
OK.
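Two details of the patterns above, checked in a short sketch: the * versus + difference, and a plausible reason (the answer does not state one) for the TEST using \A and \z. Without the D modifier, $ also matches just before a final newline, so a ^...$ pattern accepts a trailing "\n".

```php
<?php
$star   = '/^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]*$/';
$plus   = '/^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]+$/';
$strict = '/\A[\w .,!()+=`,"#$#%*-]+\z/';

// "Zero or more" accepts an empty subject; "one or more" does not.
var_dump(preg_match($star, ''), preg_match($plus, ''));

// $ matches before a final "\n", so the ^...$ version accepts "OK.\n";
// \z only matches at the very end of the subject, so the \A...\z version rejects it.
var_dump(preg_match($plus, "OK.\n"), preg_match($strict, "OK.\n"));
```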
The hyphen is being treated as a range marker: when the parser sees ,-! it thinks you're asking for a range of all characters that fall between , and ! (the same way that A-Z works). Because , (0x2C) comes after ! (0x21), that range is out of order, hence the error. This isn't what you want anyway.
Either make sure the hyphen is the last character in the character class, as it was before, or escape it with a backslash.
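The character codes make the problem concrete. This sketch prints them, confirms that [,-!] fails to compile, and shows the quieter trap: reversing it to [!-,] compiles, but silently admits everything from ! (0x21) to , (0x2C), including " # $ % & ' ( ) * and +.

```php
<?php
printf("',' = 0x%02X, '!' = 0x%02X\n", ord(','), ord('!'));

// Backwards range: preg_match() returns false (warning silenced with @).
var_dump(@preg_match('/[,-!]/', 'a'));

// Forward range compiles, but matches far more than just "!" and ",".
$range = '/^[!-,]$/';
foreach (['!', '"', '&', '*', '+', ',', '-'] as $c) {
    echo $c, ' => ', preg_match($range, $c), PHP_EOL;
}
```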
I would also point out that the quote characters you're using, “”„, are non-ASCII characters and are not the same as the basic ASCII quotes " and '. You may want to include both sets in your pattern. If you do include the non-ASCII characters, you should also add the u modifier after the closing delimiter of your pattern so that it correctly handles Unicode (UTF-8) characters.
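A sketch of why the u modifier matters, assuming UTF-8 strings (the \u{...} escapes need PHP 7.0+). Without u, PCRE sees each curly quote as three separate bytes, so the class holds bytes rather than characters: a whole quote fails a one-character match, while a lone stray byte from it is accepted.

```php
<?php
$quotes  = "\u{201C}\u{201D}\u{201E}";   // the three curly quotes, as UTF-8
$bytes   = '/^[' . $quotes . '"\']$/';   // without u: a class of individual bytes
$unicode = '/^[' . $quotes . '"\']$/u';  // with u: a class of whole characters

$subject = "\u{201C}";                   // one left double quote, 3 bytes in UTF-8
var_dump(strlen($subject));
var_dump(preg_match($bytes, $subject));   // the class consumes only one byte, so no match
var_dump(preg_match($unicode, $subject)); // the class consumes one character, so a match
var_dump(preg_match($bytes, "\x9C"));     // a lone trailing byte of the left quote slips through
```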
Try escaping your regex: [a-zA-Z0-9\-\(\)\*]
Check whether this helps you: How to escape regular expression special characters using javascript?
Inside of a character class [...] the hyphen - has a special meaning unless it is the first or last character, so you need to escape it:
^[a-zA-Z0-9 _.,\-!()+=“”„#"$#%*]*$
None of the other characters in this class need to be escaped (] would, as would \ and the pattern delimiter). You will also need to escape the quote character that delimits the PHP string, e.g.:
'/[\']/'
"/[\"]/"
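These escapes happen at the PHP string level, before PCRE ever sees the pattern; the regex engine itself receives a plain quote. A quick way to confirm:

```php
<?php
$single = '/[\']/';   // single-quoted PHP string: \' becomes '
$double = "/[\"]/";   // double-quoted PHP string: \" becomes "

// The resulting pattern strings contain no backslash at all.
var_dump($single, $double);
var_dump(preg_match($single, "it's"), preg_match($double, 'say "hi"'));
```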
Try this:
^[A-Z0-9][A-Z0-9*&!_^%$#!~#,=+,./\|}{)(~`?][;:\'""-]{0,8}$
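The trick is that I reversed the order of the parentheses and other braces, which took care of some problems. (Inside a character class, the order of ( and ) makes no difference; what matters is that the hyphen comes last.) Square brackets, however, must be escaped.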

PHP regular expression pattern allows unwanted literal asterisks

I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods. Here is the pattern:
return mb_ereg_match("^[\w\s'-\.]+$", $name);
Problem is this pattern, for some reason, returns true when there are literal asterisks in $name. This shouldn't be possible unless I'm missing something. I've done multiple searches on literal asterisks and all I found was the "\*" pattern for intentionally matching them.
The same pattern in preg_match() also returns a match when passed a string like "*John".
What the heck am I missing?
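Doubling the backslashes is a safe habit: PHP leaves unknown escapes such as \w and \s unchanged in double-quoted strings, but writing \\ guarantees that a literal backslash reaches the regex engine.
The actual bug is the unescaped -: it makes the class accept every character between ' (0x27) and . (0x2E), which includes ( ) * + , and -.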
return mb_ereg_match("^[\\w\\s'\\-\\.]+$", $name);
Have a look at a working case (using preg_match): http://ideone.com/E8afAM
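A sketch of the same comparison with preg_match() (the question reports identical behaviour there) and / delimiters added, contrasting the unescaped and escaped hyphen. The sample names are made up.

```php
<?php
// The question's class: "'-\." is a range from ' (0x27) to . (0x2E).
$buggy = "/^[\\w\\s'-\\.]+$/";
$fixed = "/^[\\w\\s'\\-\\.]+$/";

// PHP keeps unknown escapes such as \w as-is in double-quoted strings.
var_dump("\w" === "\\w");

foreach (['*John', "Mary-Jane O'Neil Jr.", 'Ann (Smith)'] as $name) {
    printf("%-22s buggy:%d fixed:%d\n", $name, preg_match($buggy, $name), preg_match($fixed, $name));
}
```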
When enclosed in square brackets, the hyphen acts as a special character that denotes a range. In your case, the class matches every character in the range ' to ., which includes *.
Escaping the hyphen should return the desired result:
^[\w\s'\-\.]+$
"I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods."
What you are missing is that \w is not just a letter class. php.net says:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
And the Perl definition is:
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_").
As I read it, the connecting punctuation character should mean only _ here, so this may be a bug in the multibyte extension.
If you use mb_ereg_match only for Unicode-aware matching, try preg_match's /u modifier and its Unicode character properties feature, available since PHP 5.1.0.
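A minimal sketch of that suggestion for the question's requirement (letters, white space, single quotes, hyphens and periods), assuming UTF-8 input. \p{L} matches letters in any script but, unlike \w, excludes digits and the underscore. The pattern and sample names are illustrative, not from the thread.

```php
<?php
// Letters (any script), whitespace, apostrophe, period; hyphen last so it is literal.
$namePattern = '/^[\p{L}\s\'.-]+$/u';

foreach (["Jos\u{E9} O'Neil", 'Mary-Jane St. John', '*John', 'john_doe', 'R2D2'] as $name) {
    printf("%-20s %d\n", $name, preg_match($namePattern, $name));
}
```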

How do you allow '-' in regular expression?

I am trying to allow '-' in the regular expression for telephone numbers, but the - is usually used for ranges (e.g. A-Z). So how do I allow just the character? I tried escaping it using /-, but that's not working.
$reg_num = "/[^0-9+ ()]/";
You need to escape it with a backslash \. So it should be written as \-.
Put it at the end of the class instead of between two characters.
Very simplified example:
[0-9-] matches each character of 099-2233-3333: 0-9 is a range, and the - at the end is a separate, literal dash.
Put it first in the class, like [^-0-9+ ()]. A hyphen only defines a range when it sits between two characters; here it doesn't (the ^ is read as the negation marker, not as a character in the set), so it's just a character in the set like any other.
Escape it as \-, since \ is the escape character.
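Applied to the question's telephone class, the three placements give the same result. Because the class is negated, preg_match() returns 1 when it finds a disallowed character, so 0 means the number is acceptable. The sample numbers are made up.

```php
<?php
// The question's class "[^0-9+ ()]" with a literal hyphen added three ways.
$patterns = [
    'escaped' => '/[^0-9+ ()\-]/',
    'last'    => '/[^0-9+ ()-]/',
    'first'   => '/[^-0-9+ ()]/',
];

foreach (['+1 (555) 123-4567', '099-2233-3333', '555-CALL-NOW'] as $number) {
    foreach ($patterns as $name => $pattern) {
        printf("%-18s %-7s invalid char found: %d\n", $number, $name, preg_match($pattern, $number));
    }
}
```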

RegEx differences

Can someone please tell me the difference exactly between these 2 RegEx's?
'/[^a-zA-Z0-9\s]/'
and
'~[^A-Za-z0-9_]~'
Also, is there a syntax error for the space within the first Regex? Thinking it needs to be like this: /\s to be escaped properly.
Basically, I need a RegEx that only uses English A-Z, a-z, 0-9, and underscores only! Everything else will need to be replaced with an empty string ''. So, I know I need preg_replace to do this with, but which RegEx is better to use, and why?
The ^ at the start of a character class means NOT, so
[^a-zA-Z0-9]
matches any character that is not in a-z, A-Z or 0-9. If you want to replace every character outside those ranges while also keeping the '_', you have to use this statement:
$cleanString = preg_replace('/[^a-zA-Z0-9_]/', '', $theString);
The first character of a PCRE pattern string in PHP is the delimiter; the same character (or, for bracket-style delimiters such as {, the matching closing bracket) marks the end of the regular expression and the start of the modifier characters. The choice is arbitrary: you can use '/', '~' or another character, but if you need that character inside the expression itself, you will have to escape it.
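A short illustration of the delimiter point. preg_quote() with its optional second argument escapes regex metacharacters and the chosen delimiter, which helps when a pattern is built from literal text.

```php
<?php
// With / as the delimiter, a literal / in the expression must be escaped...
var_dump(preg_match('/a\/b/', 'a/b'));
// ...but with ~ as the delimiter it needs no escaping.
var_dump(preg_match('~a/b~', 'a/b'));

// preg_quote() escapes the "." and, because '/' is passed as the delimiter, the "/" too.
$pattern = '/' . preg_quote('a/b.c', '/') . '/';
var_dump($pattern);
var_dump(preg_match($pattern, 'a/b.c'), preg_match($pattern, 'a/bXc'));
```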
In a character class, \s means any whitespace character. Thus '/[^a-zA-Z0-9\\s]/' matches one character not in the set A-Z, a-z, 0-9, and whitespace characters. '~[^A-Za-z0-9_]~' matches one character not in the set A-Z, a-z, 0-9, and underscore ('_').
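The difference is easiest to see by running both classes through preg_replace() on the same (made-up) input:

```php
<?php
$input = "foo_bar baz-9!";

// Removes anything that is not a letter, digit or whitespace: "_" goes, the space stays.
$keepSpaces = preg_replace('/[^a-zA-Z0-9\s]/', '', $input);
// Removes anything that is not a letter, digit or underscore: the space goes, "_" stays.
$keepUnderscore = preg_replace('~[^A-Za-z0-9_]~', '', $input);

echo $keepSpaces, PHP_EOL;      // foobar baz9
echo $keepUnderscore, PHP_EOL;  // foo_barbaz9
```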
One pattern string that meets your requirements is '~[^A-Za-z0-9_]+~s':
<?php
$str = <<<STR
test_
one
two Three 45
STR;
echo preg_replace('~[^A-Za-z0-9_]+~s', '', $str);
which outputs:
test_onetwoThree45
http://codepad.org/Ycl1WvR8
