Help with php regex for limiting allowed characters - php

I'm working in php and want to set some rules for a submitted text field. I want to allow letters, numbers, spaces, and the symbols # ' , -
This is what I have:
/^(a-z,0-9+# )+$/i
That seems to work but when I add the ' or - symbols I get errors.

Almost there. What you're looking for is a character class, which is written with square brackets. For example:
/^[-a-z0-9+#,' ]+$/i
To include the hyphen character, it needs to be the first or last character in the class, or be escaped as \-.
Edit
As you want to include the single quote and you're using PHP where regular expressions must be represented as strings, be careful with how you quote the pattern. In this case, you can use either of
$pattern = "/^[-a-z0-9+#,' ]+\$/i"; // or
$pattern = '/^[-a-z0-9+#,\' ]+$/i';
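A minimal sketch of how the single-quoted pattern above might be used with preg_match. The helper name isValidField is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: returns true when $text contains only the allowed characters.
function isValidField(string $text): bool
{
    // Single-quoted PHP string, so the apostrophe inside the class is written as \'.
    $pattern = '/^[-a-z0-9+#,\' ]+$/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(isValidField("Apt #4, O'Neil-Smith")); // only allowed characters
var_dump(isValidField("50% off"));              // % is not in the class
```

The strict comparison with 1 matters because preg_match returns 0 for no match and false when the pattern fails to compile.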

You should use a character class: [a-zA-Z0-9 #',-]
Note that - should be placed first or last, or be escaped; otherwise it is treated as denoting a range and you will get errors.

I want to allow letters, numbers, spaces, and the symbols #, ', , and -.
Use this regex...
/^[-a-zA-Z\d ',#]+\z/
Note the \z. If you use $, you are allowing a trailing \n.
Be sure to escape the ' if you are using ' as your string delimiter.
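The difference between $ and \z can be checked with a short sketch like the one below. The variable names and the sample input are made up for illustration.

```php
<?php
// Same character class, two different end anchors.
$withDollar = '/^[-a-zA-Z\d \',#]+$/';
$withZ      = '/^[-a-zA-Z\d \',#]+\z/';

$input = "Suite 5\n"; // note the trailing newline

var_dump(preg_match($withDollar, $input)); // $ also matches just before a final \n
var_dump(preg_match($withZ, $input));      // \z matches only at the very end of the string
```

If you prefer to keep $, the D pattern modifier makes it match only at the very end of the subject as well.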

Please use /^[a-z,0-9+\#\-,\s]+$/i

Use this regex:
/^[-a-z0-9,# ']+$/i

Related

I need a regex that will limit input to letters, numbers and a couple special chars like newline

I'd like to limit the users input to a string that contains between 2 and 1024 letters, numbers, spaces, periods, underscores, dashes, carriage returns (new lines) and tabs. The carriage returns and tabs do not work in my regex. I do realize that there are other ways to check the length.
if (!preg_match('/^[a-zA-Z0-9 ._-\r\t]{2,1024}$/', $userstring))
{
echo '<p>Bad string</p>';
}
Thanks ahead of time.
The page has a form with a control on it.
If I type: 1CR2 (that is 1 followed by a carriage return and then a 2), and submit the page, the error message will be displayed and the box will have 1rn2 in it.
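As you're trying to match strings that may have line breaks in them, you can enable multi-line mode with a Pattern Modifier. Note that m only changes ^ and $ so that they match at the start and end of each line; it does not change what a character class matches. The hyphen also has to be moved to the end of the class (or escaped) so that it is not read as a range, e.g.:
if (!preg_match('/^[a-zA-Z0-9 ._\r\t-]{2,1024}/m', $userstring))
{
echo '<p>Bad string</p>';
}
The $ was also removed in case there is trailing white space, which means this pattern only checks the first 2 to 1024 characters of a line rather than the whole string. Suggestions that others have made to use \s instead of \r\t seem reasonable to me.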
Reserved characters such as - need to be escaped with a backslash (\) inside a character class unless they are placed first or last. Try the following regex: ^[\w\s.\-]{2,1024}$. \w matches word characters ([a-zA-Z0-9_]) and \s matches whitespace characters ([ \t\r\n\f]). That leaves you with the . and the -, of which only the - needs to be escaped. Final PHP code:
if (!preg_match('/^[\w\s.\-]{2,1024}$/', $userstring))
{
echo '<p>Bad string</p>';
}
More info on shorthand classes and reserved characters.
Edit: . does not need to be escaped in a character class ([]), thanks Barmar.
Try:
if (!preg_match("/^[a-zA-Z0-9 ._\-\n\t]{2,1024}$/", $userstring))
{
echo '<p>Bad string</p>';
}
I believe you'll want your regular expression wrapped in double quotes so that your newline (you should be using \n instead of \r) and tab escape sequences are properly interpreted. Also, you should escape the - because it is otherwise used to define a range inside brackets.
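Pulling the suggestions above together, here is a minimal sketch of a validator for the 2 to 1024 character rule. The helper name isValidUserString is hypothetical, and allowing both \r and \n is an assumption based on the 1rn2 example in the question.

```php
<?php
// Hypothetical helper for the 2-1024 character rule described in the question.
function isValidUserString(string $s): bool
{
    // The hyphen is last, so it is a literal. PCRE interprets \r, \n and \t itself,
    // which is why a single-quoted PHP string is enough here.
    return preg_match('/\A[a-zA-Z0-9 ._\r\n\t-]{2,1024}\z/', $s) === 1;
}

var_dump(isValidUserString("1\r\n2")); // digits separated by a CR LF line break
var_dump(isValidUserString("a"));      // too short
var_dump(isValidUserString("bad*"));   // * is not allowed
```

Anchoring with \A and \z checks the whole string, so no multi-line modifier is needed.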

allow parentheses and other symbols in regex

I've made this regex:
^[a-zA-Z0-9_.-]*$
Supports:
letters [uppercase and lowercase]
numbers [from 0 to 9]
underscores [_]
dots [.]
hyphens [-]
Now, I want to add these:
spaces [ ]
comma [,]
exclamation mark [!]
parenthesis [()]
plus [+]
equal [=]
apostrophe [']
double quotation mark ["]
at [@]
dollar [$]
percent [%]
asterisk [*]
For example, this code accept only some of the symbols above:
^[a-zA-Z0-9 _.,-!()+=“”„@"$#%*]*$
Returns:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 16
Make sure to put the hyphen - either at the start or at the end of the character class; otherwise it needs to be escaped. Try this regex:
^[a-zA-Z0-9 _.,!()+=`,"@$#%*-]*$
Also note that because of the * it will even match an empty string. If you don't want to match empty strings, use + instead:
^[a-zA-Z0-9 _.,!()+=`,"@$#%*-]+$
Or better:
^[\w .,!()+=`,"@$#%*-]+$
TEST:
$text = "_.,!()+=,@$#%*-";
if(!preg_match('/\A[\w .,!()+=`,"@$#%*-]+\z/', $text)) {
echo "error.";
}
else {
echo "OK.";
}
Prints:
OK.
The hyphen is being treated as a range marker: when it sees ,-! it thinks you're asking for a range of all characters in the charset that fall between , and ! (i.e. the same way that A-Z works). Because , comes after ! in ASCII, that range is out of order, which is what the warning reports. This isn't what you want.
Either make sure the hyphen is the last character in the character class, as it was before, or escape it with a backslash.
I would also point out that the quote characters you're using “”„ are part of an extended charset, and are not the same as the basic ASCII quotes "'. You may want to include both sets in your pattern. If you do need to include the non-ASCII characters in the pattern, you should also add the u modifier after the closing delimiter of your pattern so it correctly picks up Unicode characters.
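As a rough illustration of the u modifier, the sketch below allows both ASCII and curly quotes. It assumes the source file and the input are UTF-8 encoded. The shortened symbol list and the sample strings are made up for the example.

```php
<?php
// Assumes this file and the input are UTF-8. The hyphen is last, so it is literal.
// The u modifier makes PCRE treat “, ” and „ as single characters, not as raw bytes.
$pattern = '/\A[a-zA-Z0-9_ .,!()+=\'"“”„-]+\z/u';

var_dump(preg_match($pattern, 'He said “hi”')); // curly quotes are in the class
var_dump(preg_match($pattern, 'He said «hi»')); // guillemets are not
```

Without u, each multi-byte quote is added to the class as separate bytes, so unrelated characters that share those bytes can slip through.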
Try escaping your regex: [a-zA-Z0-9\-\(\)\*]
Check if this helps you: How to escape regular expression special characters using javascript?
Inside of a character class [...] the hyphen - has a special meaning unless it is the first or last character, so you need to escape it:
^[a-zA-Z0-9 _.,\-!()+=“”„@"$#%*]*$
None of the other characters need to be escaped in the character class (except ], \ and a leading ^). You will also need to escape the quote character that delimits the PHP string, e.g.
'/[\']/'
"/[\"]/"
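If hand-escaping feels error-prone, one possible alternative is to let PHP's preg_quote do it. This is only a sketch: the $symbols list is assumed from the question's wish list (reading "at" as @, and keeping #), and the variable names are hypothetical.

```php
<?php
// Hypothetical list of extra symbols to allow, in no particular order.
$symbols = ' _.,-!()+=\'"@$#%*';

// preg_quote() escapes regex metacharacters, including -, plus the delimiter passed
// as the second argument, so the position of the hyphen no longer matters.
$class   = preg_quote($symbols, '/');
$pattern = '/\A[a-zA-Z0-9' . $class . ']*\z/';

var_dump(preg_match($pattern, 'Hello, world! (50% + 1) = $5 @ #home-page'));
var_dump(preg_match($pattern, 'a < b')); // < is not in the list
```

preg_quote escapes more characters than a class strictly needs, but a backslash before punctuation is harmless inside brackets.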
try this
^[A-Z0-9][A-Z0-9*&!_^%$#!~#,=+,./\|}{)(~`?][;:\'""-]{0,8}$
The trick is that I reversed the order of the parentheses and other braces, which took care of some problems. Square brackets must be escaped.

How do you allow '-' in regular expression?

I am trying to allow '-' in the regular expression for telephone numbers, but the - is usually used for ranges (e.g. A-Z). So how do I allow just the character? I tried escaping using /-, but that's not working.
$reg_num = "/[^0-9+ ()]/";
You need to escape it with a backslash \. So it should be written as \-.
Write it at the end of the class instead of placing the '-' between two characters.
Very simplified example:
[0-9-] would match 099-2233-3333, where 0-9 is a range and the - at the end is a separate dash sign to match.
Put it first in the range, like [^-0-9+ ()]. The hyphen needs to separate two characters to define a range; if it isn't (in this case because the ^ is also interpreted as a modifier, not a character in the set), then it's just a character in the set like any other.
Escape it using \-; the backslash \ is the escape character!
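Applied to the pattern from the question, the advice above might look like the following sketch. The helper name hasOnlyPhoneChars and the sample numbers are hypothetical.

```php
<?php
// Negated class: matches any character that is NOT a digit, +, space, parenthesis or hyphen.
// The hyphen is last in the class, so it is a literal character rather than a range marker.
$reg_num = '/[^0-9+ ()-]/';

// Hypothetical helper: a number is acceptable when no disallowed character is found.
function hasOnlyPhoneChars(string $number, string $pattern): bool
{
    return preg_match($pattern, $number) === 0;
}

var_dump(hasOnlyPhoneChars('+1 (555) 099-2233', $reg_num));
var_dump(hasOnlyPhoneChars('555.099.2233', $reg_num)); // . is not allowed
```

Because the class is negated, an empty string also passes this check, so test for emptiness separately if that matters.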

Regular expression matching more than allowed characters

I am trying to validate that the given string contains only letters, numbers, spaces, and characters from a set of symbols (!-?():&,;+). Here is what I have so far:
/^[a-zA-Z0-9 !-?\(\):&,;\+]+$/
Now this works somewhat but it accepts other characters as well. For example, strings containing * or # validate. I thought that the ^ at the beginning of the expression and the $ at the end meant that it would match the whole string. What am I doing wrong?
Thanks.
/^[a-zA-Z0-9 !-?\(\):&,;\+]+$/
The - is not nice where you placed it! If you want to place - inside a character class be sure to either place it first or last e.g.
/^[a-zA-Z0-9 !?\(\):&,;\+-]+$/
Otherwise it is read as the range from ! to ?, which in ASCII includes characters such as #, * and the digits.
Finally, most special characters are not special inside character classes, so there is no need to escape them:
/^[a-zA-Z0-9 !?():&,;+-]+$/
You have specified a "range" within your character class:
[!-?]
Means all ASCII symbols between ! and ?
http://www.regular-expressions.info/charclass.html
You need to escape the minus - with a \ backslash. (OTOH the backslash is redundant before the + and ( and ) within a character class.)
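To see exactly what the accidental range lets through, a small sketch like this one lists every printable ASCII character that [!-?] matches. The loop bounds and variable names are chosen just for this illustration.

```php
<?php
// Collect every printable ASCII character matched by the accidental range [!-?].
$matched = '';
for ($code = 32; $code < 127; $code++) {
    $char = chr($code);
    if (preg_match('/^[!-?]$/', $char) === 1) {
        $matched .= $char;
    }
}
echo $matched, "\n"; // includes #, * and the digits, among others

// With the hyphen moved to the end it is literal, so * and # are rejected again.
var_dump(preg_match('/^[a-zA-Z0-9 !?():&,;+-]+$/', 'a*b#c'));
```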

regex: remove all text within "double-quotes" (multiline included)

I'm having a hard time removing text within double-quotes, especially those spread over multiple lines:
$file=file_get_contents('test.html');
$replaced = preg_replace('/"(\n.)+?"/m','', $file);
I want to remove ALL text within double-quotes (included). Some of the text within them will be spread over multiple lines.
I read that newlines can be \r\n and \n as well.
Try this expression:
"[^"]+"
Also make sure you replace globally. In PHP, preg_replace already replaces every match by default, so no g flag is needed.
Another edit: daalbert's solution is best: a quote followed by one or more non-quotes ending with a quote.
I would make one slight modification if you're parsing HTML: make it 0 or more non-quote characters...so the regex will be:
"[^"]*"
EDIT:
On second thought, here's a better one:
"[\S\s]*?"
This says: "a quote followed by either a non-whitespace character or white-space character any number of times, non-greedily, ending with a quote"
The one below uses capture groups when it isn't necessary...and the use of a wildcard here isn't explicit about showing that wildcard matches everything but the new-line char...so it's more clear to say: "either a non-whitespace char or whitespace char" :) -- not that it makes any difference in the result.
there are many regexes that can solve your problem but here's one:
"(.*?(\s)*?)*?"
this reads as:
find a quote optionally followed by: (any number of characters that are not new-line characters non-greedily, followed by any number of whitespace characters non-greedily), repeated any number of times non-greedily
Greedy means it will go to the end of the string and try matching it. If it can't find the match, it goes back one from the end and tries to match, and so on. Non-greedy means it will take as few characters as possible while still matching the criteria.
great link on regex: http://www.regular-expressions.info
great link to test regexes: http://regexpal.com/
Remember that your regex may have to change slightly based on what language you're using to search using regex.
You can use single line mode (also known as dotall) and the dot will match even newlines (whatever they are):
/".+?"/s
You are using multiline mode, which simply changes the meaning of ^ and $ from beginning/end of string to beginning/end of each line. You don't need it here.
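The effect of the two modifiers can be compared with a short sketch. The sample text and variable names are made up for the example.

```php
<?php
$text = "say \"one\ntwo\" now";

// No modifier: . does not match the newline, so the quoted text is never matched.
$plain = preg_replace('/".+?"/', '', $text);

// m only changes ^ and $, so it makes no difference to this pattern.
$multi = preg_replace('/".+?"/m', '', $text);

// s (dotall): . also matches newlines, so the whole quoted text is removed.
$dotall = preg_replace('/".+?"/s', '', $text);

var_dump($plain === $text, $multi === $text, $dotall);
```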
"[^"]+"
Something like below. s is dotall mode where . will match even newline:
/".+?"/s
$replaced = preg_replace('/"[^"]*"/s','', $file);
will do this for you. However, note that it won't allow for any escaped double quotes inside the quoted text (e.g. A "test \" quoted string" B will result in A quoted string" B with a leading space, not in A B as you might expect).
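If backslash-escaped quotes do need to be handled, one common approach is to let the pattern consume an escape sequence as a unit. This is a sketch under the assumption that the backslash is the escape character in your input. The variable names are hypothetical.

```php
<?php
// Inside the quotes, accept either a character that is neither " nor \,
// or a backslash followed by any character (s lets that character be a newline).
// In a single-quoted PHP string, \\\\ reaches PCRE as \\, i.e. one literal backslash.
$pattern = '/"(?:[^"\\\\]|\\\\.)*"/s';

$input  = 'A "test \" quoted string" B';
$result = preg_replace($pattern, '', $input);

var_dump($result);
```

Here the whole quoted section, including the escaped quote, is removed in one match, leaving the text on either side with both of its spaces.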
