PHP: only allow lowercase alphanumeric Latin characters and dash

I am using preg_match to validate an input text field that will be used for a subdomain name. I only want to allow lowercase alphanumeric Latin characters and the dash: no spaces or anything else.
Will the following be enough?
if(preg_match('/^[a-zA-Z0-9 \-]+$/', $instance)) {
return true;
}

The regex you currently have allows a-z, A-Z, 0-9, a space and - (the \ is just for escaping).
So your regex should be something like this (allowing only lowercase letters, digits and -):
if(preg_match('/^[a-z0-9\-]+$/', $instance)) {
return true;
}

The expression you have - ^[a-zA-Z0-9 \-]+$ - currently matches both upper- and lowercase Latin letters, Arabic digits, a space and a literal hyphen.
You say you do not want to allow any spaces or uppercase letters.
In this case, all you need to do is remove them from the character class:
/^[a-z0-9-]+$/
The regex breakdown:
^ - the beginning of a string
[a-z0-9-]+ - 1 or more characters that are lowercase Latin letters (a-z), digits (0-9), or a hyphen (a - at the end of a character class is treated as a literal in almost all regex flavors)
$ - end of string.
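The breakdown above can be wrapped in a small validation function. This is a minimal sketch, not the asker's code: the name isValidSubdomainLabel is made up for illustration, and it swaps $ for \z, because in PCRE a plain $ also matches just before a trailing newline.

```php
<?php
// Hypothetical helper name; not part of the original question.
function isValidSubdomainLabel(string $instance): bool
{
    // Lowercase ASCII letters, digits and hyphens only, at least one character.
    // \z anchors at the very end; $ would also accept a trailing newline.
    return preg_match('/^[a-z0-9-]+\z/', $instance) === 1;
}

var_dump(isValidSubdomainLabel('my-shop1'));  // bool(true)
var_dump(isValidSubdomainLabel('My Shop'));   // bool(false)
var_dump(isValidSubdomainLabel("shop\n"));    // bool(false)
```

Comparing the result with === 1 keeps a preg_match error (which returns false) from being mistaken for "no match".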

Related

PHP - check if a string is a multibyte alphanumeric character

I need to find out if a string contains exactly one alphanumeric character. The obvious solution would be to check the length and ASCII code (A-Z, a-z, 0-9) - but the problem is that I'm working with UTF-8 strings and accented letters like á, ř, č etc.
Is there a simple way to check if a UTF-8 character is alphanumeric (Latin alphabet letter, possibly accented, or a number)?
This is easily done with a regular expression:
$count = preg_match_all('/\w/u', $string);
if ($count === 1) {
echo "One alphanumeric character found";
}
\w will match any "word" character: letters, numbers, and the underscore. The u modifier treats the pattern and subject as UTF-8, so accented characters are included.
If matching underscores is a problem, you could use the POSIX class [[:alnum:]] (note the double brackets) instead.
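As a variation on the answer above, here is a sketch that uses Unicode properties instead of \w, so the underscore is excluded. The helper name isSingleAlnum is hypothetical. It also assumes the whole string must be a single character, which is stricter than counting matches with preg_match_all, and that the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true when $string is exactly one letter or digit.
// Assumes $string is valid UTF-8.
function isSingleAlnum(string $string): bool
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number; _ is excluded.
    return preg_match('/^[\p{L}\p{N}]\z/u', $string) === 1;
}

var_dump(isSingleAlnum("\u{00E1}"));  // precomposed "a with acute": bool(true)
var_dump(isSingleAlnum('7'));         // bool(true)
var_dump(isSingleAlnum('_'));         // bool(false)
var_dump(isSingleAlnum('ab'));        // bool(false)
```

The "\u{...}" escape requires PHP 7.0 or later; a decomposed character (base letter plus combining mark) is two code points and would not pass this check.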

PHP regular expression pattern allows unwanted literal asterisks

I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods. Here is the pattern:
return mb_ereg_match("^[\w\s'-\.]+$", $name);
Problem is this pattern, for some reason, returns true when there are literal asterisks in $name. This shouldn't be possible unless I'm missing something. I've done multiple searches on literal asterisks and all I found was the "\*" pattern for intentionally matching them.
The same pattern in preg_match() also returns a match when passed a string like "*John".
What the heck am I missing?
In a double-quoted PHP string it is safer to double the backslashes in front of these codes: one to escape the backslash, one for the regex escape sequence.
You also need to escape the -, otherwise it accepts all characters "between" ' and ..
return mb_ereg_match("^[\\w\\s'\\-\\.]+$", $name);
Have a look at a working case (using preg_match): http://ideone.com/E8afAM
When enclosed in square brackets, the hyphen acts as a special character that denotes a range. In your case, it matches all characters in the range ' to ., that is ' ( ) * + , - and ., so the asterisk is included.
Escaping the hyphen should return the desired result:
^[\w\s'\-\.]+$
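The effect of the unescaped hyphen can be checked directly. The sketch below uses preg_match with / delimiters rather than mb_ereg_match; the variable names $loose and $strict are made up for illustration.

```php
<?php
// Unescaped hyphen between ' and . forms a range (0x27 to 0x2E) that contains *.
$loose  = "/^[\\w\\s'-.]+$/";
// Escaped hyphen is a literal, so * is rejected.
$strict = "/^[\\w\\s'\\-.]+$/";

var_dump(preg_match($loose, '*John'));              // int(1)
var_dump(preg_match($strict, '*John'));             // int(0)
var_dump(preg_match($strict, "O'Neil-Smith Jr."));  // int(1)
```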
I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods.
You are overlooking that \w is not just letters. php.net says:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
And, the perl definition is:
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_").
As I read it, the connecting punctuation character should mean only _, but this may be a bug in the multibyte extension.
If you use mb_ereg_match only for whole-string Unicode matches, try preg_match with the /u modifier and the Unicode character properties feature, available since PHP 5.1.0.

preg_match some characters

I need a regex for my preg_match() that allows only the following characters:
String can contain only letters, numbers, and the following punctuation marks:
full stop (.)
comma (,)
dash (-)
underscore (_)
I have no idea how this can be done with a regex, but I think there is a way!
^[\p{L}\p{N}.,_-]*$
will match a string that contains only (Unicode) letters, digits or the "special characters" you mentioned. [...] is a character class, meaning "one of the characters contained here". You'll need to use the /u Unicode modifier for this to work:
preg_match('/^[\p{L}\p{N}.,_-]*$/u', $mystring);
If you only care about ASCII letters, it's easier:
^[\w.,-]*$
or, in PHP:
preg_match('/^[\w.,-]*$/', $mystring);
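For reference, a runnable sketch of the Unicode variant from this answer, wrapped in a function. The name hasOnlyAllowedChars is hypothetical. Note that the * quantifier also accepts an empty string; use + instead if at least one character is required.

```php
<?php
// Hypothetical helper: Unicode letters and numbers plus . , _ and - only.
function hasOnlyAllowedChars(string $mystring): bool
{
    // The hyphen is last in the class, so it is a literal, not a range.
    return preg_match('/^[\p{L}\p{N}.,_-]*$/u', $mystring) === 1;
}

var_dump(hasOnlyAllowedChars('foo_bar-1.0,x'));  // bool(true)
var_dump(hasOnlyAllowedChars("caf\u{00E9}"));    // bool(true)
var_dump(hasOnlyAllowedChars('foo bar'));        // bool(false), space not allowed
var_dump(hasOnlyAllowedChars('a*b'));            // bool(false)
```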

RegEx differences

Can someone please tell me the difference exactly between these 2 RegEx's?
'/[^a-zA-Z0-9\s]/'
and
'~[^A-Za-z0-9_]~'
Also, is there a syntax error for the space within the first Regex? Thinking it needs to be like this: /\s to be escaped properly.
Basically, I need a RegEx that only uses English A-Z, a-z, 0-9, and underscores only! Everything else will need to be replaced with an empty string ''. So, I know I need preg_replace to do this with, but Which RegEx is better to use, and why?
Thanks many guys!
A ^ at the start of a character class means NOT, so
[^a-zA-Z0-9]
matches any character that is not a-z, A-Z or 0-9. If you want to remove all characters outside those ranges (while also keeping '_'), use this statement:
$cleanString = preg_replace('/[^a-zA-Z0-9_]/', '', $theString);
The first character of the PCRE pattern string is a delimiter used to mark the end of the regular expression and the start of the modifier characters. The choice is arbitrary; you can use '/' or '~' or another character, but note that if you need the character in the expression part, then you will need to escape it.
In a character class, \s means any whitespace character. Thus '/[^a-zA-Z0-9\s]/' matches one character not in the set A-Z, a-z, 0-9, and whitespace characters. '~[^A-Za-z0-9_]~' matches one character not in the set A-Z, a-z, 0-9, and underscore ('_').
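To make the difference concrete, here is a small sketch that runs both patterns from the question through preg_replace on the same input. The sample string and variable names are made up for illustration.

```php
<?php
$input = 'a_b c!';

// Negated class without underscore but with \s: strips _ and !, keeps the space.
$first = preg_replace('/[^a-zA-Z0-9\s]/', '', $input);

// Negated class with underscore and no \s: strips the space and !, keeps _.
$second = preg_replace('~[^A-Za-z0-9_]~', '', $input);

var_dump($first);   // string(4) "ab c"
var_dump($second);  // string(4) "a_bc"
```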
One pattern string that meets your requirements is '~[^A-Za-z0-9_]+~s' (the s modifier only affects ., so it is optional here):
<?php
$str = <<<STR
test_
one
two Three 45
STR;
echo preg_replace('~[^A-Za-z0-9_]+~s', '', $str);
which outputs:
test_onetwoThree45
http://codepad.org/Ycl1WvR8

Regex which validate for all caps

I want a regular expression in PHP that checks whether a string is in all caps.
If every letter in the given string is a capital, regardless of numbers and other characters, it should match.
Since you want to match other characters too, look for lowercase letters instead of uppercase letters. If found, return false. (Or use tdammers' suggestion of a negative character class.)
return !preg_match('/[a-z]/', $str);
You can also skip regex and just compare strtoupper($str) with the original string; this leaves digits and symbols intact. Use a strict comparison, because == treats numeric strings such as "1e3" and "1E3" as equal:
return strtoupper($str) === $str;
Neither accounts for multi-byte strings, though; for that, you could try adding a u modifier to the regex and using mb_strtoupper() respectively (I've not tested either; could someone more experienced with Unicode verify this?).
if (preg_match('/^[^\p{Ll}]*$/u', $subject)) {
    # String doesn't contain any lowercase characters
} else {
    # String contains at least one lowercase character
}
\p{Ll} matches a Unicode lowercase letter; [^\p{Ll}] therefore matches any character that is not a lowercase letter.
Something like this maybe:
'/^[^a-z]*$/'
The trick is to use a negated character class: this one matches all characters that are not lowercase letters. Note that accented letters aren't checked.
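For strings with accented letters, the \p{Ll} idea from the earlier answer can be packaged as a function. This is a sketch under assumptions: the name hasNoLowercase is hypothetical, the input is assumed to be valid UTF-8, and it searches for a single lowercase letter rather than matching the whole string against a negated class.

```php
<?php
// Hypothetical helper: true when $str contains no lowercase letters.
// Digits, punctuation and whitespace are ignored.
function hasNoLowercase(string $str): bool
{
    // \p{Ll} matches any Unicode lowercase letter; /u enables UTF-8 mode.
    return preg_match('/\p{Ll}/u', $str) === 0;
}

var_dump(hasNoLowercase('HELLO 123!'));     // bool(true)
var_dump(hasNoLowercase('Hello'));          // bool(false)
var_dump(hasNoLowercase("\u{00C9}COLE"));   // uppercase E with acute: bool(true)
var_dump(hasNoLowercase("\u{00E9}cole"));   // lowercase e with acute: bool(false)
```

Because preg_match returns false on invalid UTF-8, such input is reported as not all caps rather than silently accepted.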
