Validate username as alphanumeric with underscores - php

On my registration page I need to validate the usernames as alphanumeric only, but also with optional underscores. I've come up with this:
function validate_alphanumeric_underscore($str)
{
return preg_match('/^\w+$/',$str);
}
Which seems to work okay, but I'm not a regex expert! Does anyone spot any problem?

The actual matched characters of \w depend on the locale that is being used:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
So it is better to specify explicitly which characters you want to allow:
/^[A-Za-z0-9_]+$/
This allows just alphanumeric characters and the underscore.
And if you want to allow the underscore only as a separator between alphanumeric runs, and require the username to start with a letter:
/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/
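To see how the two patterns above differ, it can help to run both against a few sample names. This is only a sketch; the function names is_simple_username and is_strict_username are made up for illustration.

```php
<?php
// Hypothetical helpers wrapping the two patterns from this answer.
function is_simple_username(string $str): bool
{
    // Letters, digits and underscores in any position.
    return preg_match('/^[A-Za-z0-9_]+$/', $str) === 1;
}

function is_strict_username(string $str): bool
{
    // Must start with a letter; single underscores only between alphanumeric runs.
    return preg_match('/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/', $str) === 1;
}

foreach (['john_doe', '_john', 'john_', '9lives'] as $name) {
    printf(
        "%-10s simple=%s strict=%s\n",
        $name,
        var_export(is_simple_username($name), true),
        var_export(is_strict_username($name), true)
    );
}
```

All four names pass the simple check, but only john_doe passes the strict one.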

Here's a custom function to validate the string by using the PHP ctype_alnum in conjunction with an array of allowed chars:
<?php
$str = "";
function validate_username($str) {
// each array entry is a special character allowed
// in addition to those accepted by ctype_alnum
$allowed = array(".", "-", "_");
if ( ctype_alnum( str_replace($allowed, '', $str ) ) ) {
return $str;
} else {
$str = "Invalid Username";
return $str;
}
}
?>

Try:
function validate_alphanumeric_underscore($str)
{
return preg_match('/^[a-zA-Z0-9_]+$/',$str);
}

Looks fine to me. Note that you make no requirement for the placement of the underscore, so "username_" and "___username" would both pass.
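Another edge case in the same spirit: in PCRE, $ also matches just before a final newline, so a name with a trailing line break passes the check. A small sketch, assuming you want to reject that; \z (or the D modifier) anchors at the very end of the string.

```php
<?php
$name = "username\n";

// `$` tolerates one trailing newline, so this still matches.
var_dump(preg_match('/^\w+$/', $name));   // int(1)

// `\z` only matches at the absolute end of the string.
var_dump(preg_match('/^\w+\z/', $name));  // int(0)

// The D (dollar-end-only) modifier gives `$` the same strictness.
var_dump(preg_match('/^\w+$/D', $name));  // int(0)
```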

I would take Gumbo's second regex, which only allows the underscore as a separator, but add a + after the _ so that a username like "special__username" is allowed. It's just a minor tweak:
/^[A-Za-z][A-Za-z0-9]*(?:_+[A-Za-z0-9]+)*$/
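A quick comparison of the original pattern and the tweaked one, as a sketch (the variable names $single and $multi are just for illustration):

```php
<?php
// $single: exactly one underscore between alphanumeric runs.
// $multi:  one or more underscores between alphanumeric runs.
$single = '/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/';
$multi  = '/^[A-Za-z][A-Za-z0-9]*(?:_+[A-Za-z0-9]+)*$/';

foreach (['special_username', 'special__username', 'special_'] as $name) {
    echo $name, ': ', preg_match($single, $name), ' / ', preg_match($multi, $name), "\n";
}
// special_username: 1 / 1
// special__username: 0 / 1
// special_: 0 / 0
```

Both patterns still reject a trailing underscore.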

Your own solution is perfectly fine.
preg_match uses Perl-like regular expressions, in which the character class \w is defined to match exactly what you need:
\w - Match a "word" character (alphanumeric plus "_")
(source)

Related

Using Preg Match to check if string contains an underscore

I am trying to check if a string contains an underscore. Can anyone explain what's wrong with the following code?
$str = '12_322';
if (preg_match('/^[1-9]+[_]$/', $str)) {
echo 'contains number underscore';
}
In your regex, [_]$ means the underscore must be at the end of the string. That's why it doesn't match your input.
If you only want to check for an underscore anywhere in the string, then:
if (preg_match('/_/', $str)) {
If you want to check that the string consists only of digits and underscores, then:
if (preg_match('/^[1-9_]+$/', $str)) { // 1-9 as in your pattern, so 0 is excluded
But for your sample input 12_322, this one can be handy too:
if (preg_match('/^[1-9]+_[1-9]+$/', $str)) {
You need to take $ out since underscore is not the last character in your input. You can try this regex:
'/^[1-9]+_/'
PS: The underscore is not a special regex character, so it doesn't need to be in a character class.
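The patterns suggested above can be compared side by side on the sample input. This is just a sketch that prints what preg_match returns for each one.

```php
<?php
$str = '12_322';

// Original pattern: the underscore must be the last character, so no match.
echo preg_match('/^[1-9]+[_]$/', $str), "\n";      // 0

// Underscore anywhere in the string.
echo preg_match('/_/', $str), "\n";                // 1

// Only the digits 1-9 and underscores, nothing else.
echo preg_match('/^[1-9_]+$/', $str), "\n";        // 1

// Digits, exactly one underscore, digits.
echo preg_match('/^[1-9]+_[1-9]+$/', $str), "\n";  // 1

// [1-9] excludes 0; use [0-9] or \d if zeros should be accepted.
echo preg_match('/^[1-9_]+$/', '10_5'), "\n";      // 0
```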

PHP preg_match: any letter but no numbers (and symbols)

I am trying to set a validation rule for a field in my form that checks that the input only contains letters.
At first I tried to make a function that returned true if there were no numbers in the string, for that I used preg_match:
function my_format($str)
{
return preg_match('/^([^0-9])$', $str);
}
It doesn't matter how many times I look at the php manual, it seems like I won't get to understand how to create the pattern I want. What's wrong with what I made?
But I'd like to extend the question: I want the input text to contain any letter but no numbers or symbols, like question marks, exclamation marks, and all those you can imagine. BUT the letters I want are not only a-z, I want letters with all kinds of accents, such as those used in Spanish, Portuguese, Swedish, Polish, Serbian, Icelandic...
I guess this is no easy task and hard or impossible to do with preg_match. Is there any library that covers my exact needs?
If you're using UTF-8 encoded input, use a Unicode regex with the u modifier.
This one would match a string that only consists of letters and any kind of whitespace/invisible separators:
preg_match('~^[\p{L}\p{Z}]+$~u', $str);
function my_format($str)
{
// the u modifier makes PCRE treat pattern and subject as UTF-8
return preg_match('/^\p{L}+$/u', $str);
}
Simpler than you might think!
\p{L} matches any kind of letter from any language
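A short sketch of the \p{L} approach with the u modifier, assuming the input is valid UTF-8. The helper names only_letters and only_letters_and_spaces are hypothetical.

```php
<?php
// Hypothetical helpers; both assume $str is valid UTF-8.
function only_letters(string $str): bool
{
    // One or more Unicode letters and nothing else.
    return preg_match('/^\p{L}+$/u', $str) === 1;
}

function only_letters_and_spaces(string $str): bool
{
    // Unicode letters plus any kind of separator (\p{Z} includes the plain space).
    return preg_match('/^[\p{L}\p{Z}]+$/u', $str) === 1;
}

var_dump(only_letters('Łukasz'));                 // bool(true)
var_dump(only_letters('José1'));                  // bool(false)
var_dump(only_letters('María José'));             // bool(false)
var_dump(only_letters_and_spaces('María José'));  // bool(true)
```

If the input is not valid UTF-8, preg_match returns false under the u modifier, so both helpers report false.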
First of all, Merry Christmas.
You are on the right track with the first one, just missing a + to match one or more non-number characters:
preg_match('/^([^0-9]+)$/', $str);
As you can see, 0-9 is a range, from number 0 to 9. This applies to some other cases, like a-z or A-Z: the '-' is special and indicates a range. For 0-9, you can use the shorthand \d:
preg_match('/^([^\d]+)$/', $str);
For symbols, if your list is the punctuation characters . , " ' ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ { } | ~, there is a shorthand:
preg_match('/^([^[:punct:]]+)$/', $str);
Combined you get:
preg_match('/^([^[:punct:]\d]+)$/', $str);
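A brief sketch of what the combined pattern accepts and rejects. Note that whitespace still passes, since a space is neither punctuation nor a digit.

```php
<?php
$pattern = '/^([^[:punct:]\d]+)$/';

foreach (['hello', 'hello world', 'hello!', 'abc123'] as $input) {
    echo var_export($input, true), ' => ', preg_match($pattern, $input), "\n";
}
// 'hello' => 1
// 'hello world' => 1
// 'hello!' => 0
// 'abc123' => 0
```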
Use the [:alpha:] POSIX class.
function my_format($str) {
return preg_match('/^[[:alpha:]]+$/u', $str);
}
The outer [] turns the POSIX class into a character class, and the + makes it match one or more alphabetic characters. With the u modifier, [:alpha:] matches accented characters as well. The ^ and $ anchors make sure the whole string is checked, not just part of it.
If you want to include whitespace, just add \s to the character class:
preg_match('/^[[:alpha:]\s]+$/u', $str);
EDIT: Sorry, I misread your question when I looked over it a second time and thought you wanted punctuation. I've taken it back out.

What am I doing wrong with this Regex?

To be honest, I don't really get RegEx. So I'm completely oblivious as to where I'm going wrong here.
I'm looking for a RegEx that accepts alphanumeric characters only (and underscores, it's for usernames). I've searched around here and found numerous example RegExes that I've tried and not one of them has worked.
Among others, which I've mostly gotten from answers around here, I've tried
^[a-zA-Z0-9_]*$
/[^a-z_\-0-9]/i
/^\w+$/
To match these, I've tried (with each of the regexes)
if(preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(!preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 1) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 0) {
//don't accept
}
etc...
Each and every single time it's accepting special characters (I've tried &, $, ^, and %).
What exactly am I doing wrong here? Is it the format of the RegEx? Is it how I'm asking it to check?
Also, what exactly is the return type I get if it's found special characters? (i.e. one I don't want to accept)
preg_match returns 1 if the input string matched the pattern you gave, and 0 if it didn't.
You want each character in your usernames to be alphanumeric (plus underscore). One PCRE way of expressing that is with a character class inside square brackets, like this one: [A-Za-z0-9_]. There are a couple of ways you could use this basic class to do what you want.
One way is a "negative" search: try to match a non-alphanumeric character, and if you do, then the test fails. For this, we just add a caret at the front of the character class. This means we're matching any character not in that set.
So, the following pattern matches "any non-alphanumeric, non-underscore character." Here, a match means an invalid username:
if (preg_match('/[^A-Za-z0-9_]/', $username)) {
// invalid username
}
Or, you could do the opposite kind of match, where you give a pattern for a valid username and check if you match that. This time, we don't change the character class itself at all, but we add the + quantifier after it, meaning we're matching one or more of the "good" characters.
Additionally, we wrap the ^ and $ beginning-and-end-of-string anchors around our pattern. (It's a little confusing, but a caret at the beginning of a pattern has a completely different meaning from a caret at the beginning of a character class, within the brackets.)
The end result is a pattern that means: "1 or more alphanumeric characters (plus underscore) and nothing else." A match on this one means a valid username:
if (preg_match('/^[A-Za-z0-9_]+$/', $username)) {
// valid username
}
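The two approaches can be put next to each other as a sketch. The function names has_bad_char and is_valid_username are made up here. They agree on every non-empty input without line breaks, but differ on the empty string: the negative search finds nothing to object to, while the positive pattern requires at least one character.

```php
<?php
// Negative search: true when any disallowed character is present.
function has_bad_char(string $username): bool
{
    return preg_match('/[^A-Za-z0-9_]/', $username) === 1;
}

// Positive match: true when the whole string is one or more allowed characters.
function is_valid_username(string $username): bool
{
    return preg_match('/^[A-Za-z0-9_]+$/', $username) === 1;
}

var_dump(has_bad_char('bob_10'), is_valid_username('bob_10'));  // false, true
var_dump(has_bad_char('bob&'), is_valid_username('bob&'));      // true, false
var_dump(has_bad_char(''), is_valid_username(''));              // false, false
```

So with the negative search you would need a separate check for an empty username.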
if (preg_match("/^[a-zA-Z0-9_]+$/", $username) === 1) {
// Good username
}
else {
// Bad username
}
The strict equality operator (===) compares what preg_match() returns against the integer 1, without type juggling. A return value of 0 means there was no match; a boolean false means an error occurred. Check out the page for preg_match for more information: http://php.net/manual/en/function.preg-match.php
Per the PHP manual, preg_match will return 0 if it can't find a successful match with your regex and FALSE if an error occurs. So if you want to make sure you're testing for 0, and not something which can evaluate to false, you should use the === operator.
If you only want letters and underscores you can use a character class of [a-z_] which specifies that the range of characters for a to z and the _ symbol will match. And the + following the class specifies that you want one or multiple of the same. The ^ says the pattern must match from the beginning of the text, while the $ says that the pattern must match up until the end of the text.
if (preg_match("/^[a-z_]+$/i", $text_variable) === 1) {
//"A match was found.";
} else {
//"A match was not found.";
}
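The difference between 0 and FALSE described above can be seen in a small sketch. The third call uses a deliberately broken pattern (an unterminated character class) to provoke an error; the @ only hides the warning for the demo.

```php
<?php
$match   = preg_match('/^[a-z_]+$/i', 'user_name');   // pattern matches
$noMatch = preg_match('/^[a-z_]+$/i', 'user-name');   // pattern does not match
$error   = @preg_match('/^[a-z_+$/i', 'user_name');   // invalid pattern: "[" is never closed

var_dump($match);    // int(1)
var_dump($noMatch);  // int(0)
var_dump($error);    // bool(false)

// A loose comparison cannot tell "no match" from "error"...
var_dump($error == 0);   // bool(true)
// ...but a strict comparison can.
var_dump($error === 0);  // bool(false)
```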
Regex is very easy to understand if you get the basics :)
I'll try to explain to you all three expressions you tried:
With ^[a-zA-Z0-9_]*$ string will be matched which:
^ // from the beginning...
[a-zA-Z0-9_] // contains only characters a-z or A-Z or 0-9 or _ sign
* // and has 0 or more of such characters
$ // to the end
Matched strings for example:
(empty string - since you asked for 0 or more characters)
abc09
fidjwieofoj4fio3j4fiojrfioj3ijfo
000000000000000000000
__________
and_many_many_more_as_long_as_they_contain_alpha_characters_and___sign
With /[^a-z_\-0-9]/i string will be matched which:
[^a-z_\-0-9]
// ^ means "the opposite" so that subset describes characters
// which are not included in it
// (are not a-z or _ sign, or - dash sign, or 0-9 numbers)
i modifier
// stands for case insensitive, all letters are treated as lowercase
You did not add * or ? or + after the subset so basically you are looking for one character only, and because you did not put your regexp between ^ and $ signs, this expression will finally match any text which contains at least one character which is not A-Z or a-z, or _ sign, or - dash sign, or 0-9 numbers.
Matched strings for example:
!
a>a
A<9
ffffffffff.dflskfdfd
00000,
]]]]]]]]]]]]]]]]]]
and so-on
With /^\w+$/ string will be matched which:
^ // from the beginning
\w // contains only characters a-z or A-Z or 0-9 or _ sign
+ // and the string must be at least 1 character long
$ // to the end
Probably the most useful regular expression. Remember, without the u modifier and in the default locale, \w is equivalent to [a-zA-Z0-9_]. This regexp will only match a whole string that is not empty and contains only alphanumeric characters and the _ sign.
Matched strings for example:
mike
alice
bob10
0000000000
1111
9
php
user_example
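One caveat on \w, as a small sketch: in PHP the u modifier makes \w Unicode-aware, so accented letters match too. If usernames must stay ASCII-only, the explicit class is the safer choice. The sample name is written with a \u{...} escape, which needs PHP 7 or later.

```php
<?php
$name = "jos\u{00E9}";  // "josé" encoded as UTF-8

// With the u modifier, \w also accepts non-ASCII letters.
var_dump(preg_match('/^\w+$/u', $name));            // int(1)

// The explicit class stays ASCII-only, with or without u.
var_dump(preg_match('/^[a-zA-Z0-9_]+$/u', $name));  // int(0)
var_dump(preg_match('/^[a-zA-Z0-9_]+$/', $name));   // int(0)
```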
Hope that helps. For you, the most useful expression, imvho, for matching valid usernames would be /^\w{3,15}$/ as it would match any string which is 3 to 15 characters long and consists only of alphanumeric characters and the underscore sign (a-z A-Z 0-9 _).
Try this:
<?php
function isValidUsername($username)
{
return preg_match('/^\w{3,15}$/', $username) == 1;
}
echo isValidUsername('mike999') ? 'Yes' : 'No' , '<br>';
echo isValidUsername('alice!') ? 'Yes' : 'No';
Cheers.

Function to return only alpha-numeric characters from string?

I'm looking for a php function that will take an input string and return a sanitized version of it by stripping away all special characters leaving only alpha-numeric.
I need a second function that does the same but only returns alphabetic characters A-Z.
Any help much appreciated.
Warning: Note that English is not restricted to just A-Z.
Try this to remove everything except a-z, A-Z and 0-9:
$result = preg_replace("/[^a-zA-Z0-9]+/", "", $s);
If your definition of alphanumeric includes letters in foreign languages and obsolete scripts then you will need to use the Unicode character classes.
Try this to leave only A-Z:
$result = preg_replace("/[^A-Z]+/", "", $s);
The reason for the warning is that words like résumé contain the letter é, which won't be matched by this. If you want to match a specific list of letters, adjust the regular expression to include those letters. If you want to match all letters, use the appropriate character classes as mentioned in the comments.
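A minimal sketch of the Unicode character class variant, assuming the input is valid UTF-8. The function name strip_non_alnum_unicode is hypothetical; \p{L} stands for any letter and \p{N} for any number.

```php
<?php
// Hypothetical helper: keeps Unicode letters and numbers, drops everything else.
function strip_non_alnum_unicode(string $s): string
{
    // preg_replace returns null on error (e.g. invalid UTF-8), so fall back to ''.
    return preg_replace('/[^\p{L}\p{N}]+/u', '', $s) ?? '';
}

echo strip_non_alnum_unicode('résumé #42!'), "\n";              // résumé42

// For comparison, the ASCII-only pattern drops the accented letters as well.
echo preg_replace('/[^a-zA-Z0-9]+/', '', 'résumé #42!'), "\n";  // rsum42
```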
Try this to keep accented Latin characters, assuming $s is UTF-8:
$result = preg_replace("/[^A-Za-zÀ-ú0-9]+/u", "", $s);
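Rather than preg_replace, you could use PHP's filter functions: filter_var() with FILTER_SANITIZE_STRING. Note, however, that this filter strips tags rather than every non-alphanumeric character, and it is deprecated as of PHP 8.1.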

Regex which validate for all caps

I want a regular expression in PHP that checks whether a string is all caps.
If the given string contains all capital letters irrespective of numbers and other characters then it should match them.
Since you want to match other characters too, look for lowercase letters instead of uppercase letters. If found, return false. (Or use tdammers' suggestion of a negative character class.)
return !preg_match('/[a-z]/', $str);
You can also skip regex and just compare strtoupper($str) with the original string, this leaves digits and symbols intact:
return strtoupper($str) == $str;
Both don't account for multi-byte strings though; for that, you could try adding a u modifier to the regex and using mb_strtoupper() respectively (I've not tested either — could someone more experienced with Unicode verify this?).
if (preg_match('/^[^\p{Ll}]*$/u', $subject)) {
# String doesn't contain any lowercase characters
} else {
# String contains at least one lowercase character
}
\p{Ll} matches a Unicode lowercase letter; [^\p{Ll}] therefore matches any character that is not a lowercase letter.
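Both multi-byte variants discussed above can be sketched as follows. The function names are made up, the input is assumed to be UTF-8, and the second function needs the mbstring extension.

```php
<?php
// Regex variant: all caps means "contains no Unicode lowercase letter".
function is_all_caps(string $str): bool
{
    return preg_match('/\p{Ll}/u', $str) === 0;
}

// mbstring variant: uppercasing the string must not change it.
function is_all_caps_mb(string $str): bool
{
    return mb_strtoupper($str, 'UTF-8') === $str;
}

var_dump(is_all_caps('ÉCOLE 42'));  // bool(true)
var_dump(is_all_caps('École'));     // bool(false)
```

Digits, spaces and symbols have no lowercase form, so they pass in both variants.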
Something like this maybe:
'/^[^a-z]*$/'
The trick is to use an exclusive character class: this one matches all characters that are not lower-case letters. Note that accented letters aren't checked.
