What am I doing wrong with this Regex? - php

To be honest, I don't really get RegEx. So I'm completely oblivious as to where I'm going wrong here.
I'm looking for a RegEx that accepts alphanumeric characters only (and underscores, it's for usernames). I've searched around here and found numerous example RegExes that I've tried and not one of them has worked.
Among others, which I've mostly gotten from answers around here, I've tried
^[a-zA-Z0-9_]*$
/[^a-z_\-0-9]/i
/^\w+$/
To match these, I've tried (with each of the regexes)
if(preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(!preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 1) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 0) {
//don't accept
}
etc...
Each and every single time it's accepting special characters (I've tried &, $, ^, and %).
What exactly am I doing wrong here? Is it the format of the RegEx? Is it how I'm asking it to check?
Also, what exactly is the return type I get if it's found special characters? (i.e One I don't want to accept)

preg_match returns 1 if the input string matched the pattern you gave, 0 if it didn't, and false if an error occurred.
You want each character in your usernames to be alphanumeric (plus underscore). One PCRE way of expressing that is with a character class inside square brackets, like this one: [A-Za-z0-9_]. There are a couple of ways you could use this basic class to do what you want.
One way is a "negative" search: try to match a non-alphanumeric character, and if you do, then the test fails. For this, we just add a caret at the front of the character class. This means we're matching any character not in that set.
So, the following pattern matches "any non-alphanumeric, non-underscore character." Here, a match means an invalid username:
if (preg_match('/[^A-Za-z0-9_]/', $username)) {
// invalid username
}
Or, you could do the opposite kind of match, where you give a pattern for a valid username and check if you match that. This time, we don't change the character class itself at all, but we add the + quantifier after it, meaning we're matching one or more of the "good" characters.
Additionally, we wrap the ^ and $ beginning-and-end-of-string anchors around our pattern. (It's a little confusing, but a caret at the beginning of a pattern has a completely different meaning from a caret at the beginning of a character class, within the brackets).
The end result is a pattern that means: "1 or more alphanumeric characters (plus underscore) and nothing else." A match on this one means a valid username:
if (preg_match('/^[A-Za-z0-9_]+$/', $username)) {
// valid username
}
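As a rough sketch, the two approaches above can be wrapped in small helpers and compared side by side. The function names hasInvalidChars and isAllowedUsername are made up for illustration; only the two patterns come from the answer. Note that the approaches disagree on the empty string: the negative search finds nothing wrong with it, while the positive match rejects it because + needs at least one character.

```php
<?php
// Hypothetical helper names; both checks use the character class from the answer above.

// Negative search: true when at least one disallowed character is present.
function hasInvalidChars(string $username): bool
{
    return preg_match('/[^A-Za-z0-9_]/', $username) === 1;
}

// Positive match: true when the whole string is one or more allowed characters.
function isAllowedUsername(string $username): bool
{
    return preg_match('/^[A-Za-z0-9_]+$/', $username) === 1;
}

foreach (['mike_99', 'alice!', ''] as $name) {
    printf(
        "%-10s invalid chars: %-5s allowed: %s\n",
        var_export($name, true),
        var_export(hasInvalidChars($name), true),
        var_export(isAllowedUsername($name), true)
    );
}
```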

if (preg_match("/^[a-zA-Z0-9_]+$/", $username) === 1) {
// Good username
}
else {
// Bad username
}
The strict equality operator (===) means we're comparing what preg_match() returns to 1, the integer, not to a merely truthy value. If it returns 0, there were no matches; if it returns a boolean false, an error occurred. Check out the page for preg_match for more information: http://php.net/manual/en/function.preg-match.php

Per the PHP manual *preg_match* will return 0 if it can't find a successful match with your regex and FALSE if an error occurs. So if you want to make sure you're testing for 0, and not something which can evaluate to false, you should use the === operator.
If you only want letters and underscores you can use a character class of [a-z_] which specifies that the range of characters for a to z and the _ symbol will match. And the + following the class specifies that you want one or multiple of the same. The ^ says the pattern must match from the beginning of the text, while the $ says that the pattern must match up until the end of the text.
if (preg_match("/^[a-z_]+$/i", $text_variable) === 1) {
//"A match was found.";
} else {
//"A match was not found.";
}
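To see why the strict comparison matters, here is a minimal sketch that distinguishes the three possible results. The helper name describeMatch is hypothetical; the last call deliberately passes a pattern without delimiters to provoke the error case.

```php
<?php
// Hypothetical helper: maps the three possible results of preg_match() to a label.
function describeMatch(string $pattern, string $subject): string
{
    // @ suppresses the warning PHP emits for a malformed pattern.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return 'error';
    }

    return $result === 1 ? 'match' : 'no match';
}

echo describeMatch('/^[a-z_]+$/i', 'some_name'), "\n";
echo describeMatch('/^[a-z_]+$/i', 'some name'), "\n";
// Without slashes, ^ is taken as the delimiter and never closed.
echo describeMatch('^[a-z_]+$', 'some_name'), "\n";
```

A loose check such as == 0 would treat the error case like "no match", which can hide a broken pattern.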

Regex is very easy to understand if you get the basics :)
I'll try to explain to you all three expressions you tried:
With ^[a-zA-Z0-9_]*$ string will be matched which:
^ // from the beginning...
[a-zA-Z0-9_] // contains only characters a-z or A-Z or 0-9 or _ sign
* // and has 0 or more of such characters
$ // to the end
Matched strings for example:
(empty string - since you told 0 or more characters)
abc09
fidjwieofoj4fio3j4fiojrfioj3ijfo
000000000000000000000
__________
and_many_many_more_as_long_as_they_contain_alpha_characters_and___sign
With /[^a-z_\-0-9]/i string will be matched which:
[^a-z_\-0-9]
// ^ means "the opposite" so that subset describes characters
// which are not included in it
// (are not a-z or _ sign, or - dash sign, or 0-9 numbers)
i modifier
// stands for case insensitive, all letters are treated as lowercase
You did not add * or ? or + after the subset so basically you are looking for one character only, and because you did not put your regexp between ^ and $ signs, this expression will finally match any text which contains at least one character which is not A-Z or a-z, or _ sign, or - dash sign, or 0-9 numbers.
Matched strings for example:
!
a>a
A<9
ffffffffff.dflskfdfd
00000,
]]]]]]]]]]]]]]]]]]
and so-on
With /^\w+$/ string will be matched which:
^ // from the beginning
\w // contains only characters a-z or A-Z or 0-9 or _ sign
+ // and the string must be at least 1 character long
$ // to the end
Probably the most useful regular expression. Remember, \w is shorthand for [a-zA-Z0-9_] in the default locale. This regexp will match only a whole string which is not empty and contains only alphanumeric characters and the _ sign.
Matched strings for example:
mike
alice
bob10
0000000000
1111
9
php
user_example
Hope that helps. For you, the most useful expression imvho to match valid usernames would be /^\w{3,15}$/, as it matches any string which is 3 to 15 characters long and consists only of alphanumeric characters and the underscore sign (a-z A-Z 0-9 _).
Try this:
<?php
function isValidUsername($username)
{
return preg_match('/^\w{3,15}$/', $username) == 1;
}
echo isValidUsername('mike999') ? 'Yes' : 'No' , '<br>';
echo isValidUsername('alice!') ? 'Yes' : 'No';
Cheers.
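One caveat worth knowing when anchoring with $: by default it also matches just before a single trailing newline, so a username that ends in "\n" still passes. A small sketch of the difference, using the same {3,15} pattern as above (the variable name is illustrative):

```php
<?php
$withNewline = "mike\n";

// $ also matches just before a final newline, so this reports a match.
var_dump(preg_match('/^\w{3,15}$/', $withNewline));

// \z matches only at the very end of the subject.
var_dump(preg_match('/^\w{3,15}\z/', $withNewline));

// The D (PCRE_DOLLAR_ENDONLY) modifier makes $ behave the same way.
var_dump(preg_match('/^\w{3,15}$/D', $withNewline));
```

If input may carry a stray newline (for example from a text file), either trim it first or use \z or the D modifier.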

Related

How do I check if a string is composed only of letters and numbers? (PHP) [duplicate]

Title says it all: I am checking to see if a user's username contains anything that isn't a number or letter, such as €{¥]^}+<€, punctuation, spaces or even things like âæłęč. Is this possible in php?
You can use the ctype_alnum() function in PHP.
From the manual..
Check for alphanumeric character(s)
Returns TRUE if every character in text is either a letter or a digit, FALSE otherwise.
var_dump(ctype_alnum("æøsads")); // false
var_dump(ctype_alnum("123asd")); // true
Live demo at https://3v4l.org/5etr7
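A couple of edge cases are easy to miss with ctype_alnum(): it returns false for an empty string, and it is intended for string arguments only. The wrapper below is a sketch with a made-up name (isAlnumString) that makes both behaviours explicit.

```php
<?php
// Hypothetical wrapper around ctype_alnum().
function isAlnumString($value): bool
{
    // ctype_alnum() is meant for strings; reject everything else up front.
    return is_string($value) && ctype_alnum($value);
}

var_dump(isAlnumString('abc123'));    // letters and digits only
var_dump(isAlnumString(''));          // empty string
var_dump(isAlnumString('user_name')); // underscore is not alphanumeric
var_dump(isAlnumString(123));         // an integer, not a string
```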
PHP does REGEX
What you want to do is fairly trivial, PHP has a number of regex functions
Testing a String For a Character
If all you want is to know IF a string contains non-alphanumeric characters, then just use preg_match():
preg_match( '/[^A-Za-z0-9]/', $userName );
This will return 1 if the username contains anything other than alphanumeric characters (A-Z, a-z or 0 to 9), and 0 if it doesn't contain a non-alphanumeric character.
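The quantifier matters for this kind of test. A negated class followed by * can match zero characters, so it succeeds on every input; without a quantifier it needs one real non-alphanumeric character. A quick sketch (the sample strings are invented):

```php
<?php
$clean = 'abc123';
$dirty = 'abc 123!';

// With *, the empty match at offset 0 counts, so even clean input "matches".
var_dump(preg_match('/[^A-Za-z0-9]*/', $clean));

// Without a quantifier, exactly one non-alphanumeric character is required.
var_dump(preg_match('/[^A-Za-z0-9]/', $clean));
var_dump(preg_match('/[^A-Za-z0-9]/', $dirty));
```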
Regex Pattern Elements
Regex PCRE patterns open and close with a delimiter such as a slash (/), and the whole pattern is passed as a quoted string: '/myPattern/'. Some other key features are:
[ brackets contain match sets ]
[a-z] // means match any lowercase letter
This pattern means check the current character in the $String relative to the pattern in these brackets, in this case match any lowercase letter a to z.
^ Caret (Meta-Character)
[^a-z] // means no lowercase letters
If the caret ^ (aka hat) is the first character inside brackets, it NEGATES the pattern inside brackets, so [^A7] means match anything EXCEPT uppercase A and the numeral 7. (Note: when outside brackets, the caret ^ means the start of the string.)
\w\W\d\D\s\S. Meta-Characters (WildCards)
\w // match all alphanumeric
An escaped (i.e. preceded by a backslash \ ) lowercase w means match any "word" character, i.e. alphanumeric and the underscore _. This is shorthand for [A-Za-z0-9_]. The uppercase \W is the NOT word character, equivalent to [^A-Za-z0-9_] or [^\w].
. // (dot) match ANY single character except return/newline
\w // match any word character [A-Za-z0-9_]
\W // NOT any word character [^A-Za-z0-9_]
\d // match any digit [0-9]
\D // NOT any digit [^0-9].
\s // match any whitespace (tab, space, newline)
\S // NOT any whitespace
.*+?| Meta-Characters (Quantifiers)
These modify the behavior outside of a set []
* // match previous character or [set] zero or more times,
// so .* means match everything (including nothing) until reaching a return/newline.
+ // match previous at least one or more times.
? // match previous only zero or one time (i.e. optional).
| // means logical OR eg.: com|net means match either literal "com" or "net"
Not shown: capture groups, backreferences, substitution (the real power of regex). See https://www.phpliveregex.com/#tab-preg-match for more including a live pattern-match playground that is based on the PHP functions, and delivers results as arrays.
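The quantifiers and alternation listed above can be tried out with a few throwaway patterns. This is only a sketch; the sample strings are invented for illustration.

```php
<?php
// ? makes the preceding "u" optional.
var_dump(preg_match('/^colou?r$/', 'color'));
var_dump(preg_match('/^colou?r$/', 'colour'));

// | is a logical OR between the alternatives in the group.
var_dump(preg_match('/\.(com|net)$/', 'example.net'));
var_dump(preg_match('/\.(com|net)$/', 'example.org'));

// .* stops at the first newline because the dot does not match it.
preg_match('/.*/', "first line\nsecond line", $m);
var_dump($m[0]);

// + needs at least one digit, * is satisfied by none at all.
var_dump(preg_match('/^\d+$/', ''));
var_dump(preg_match('/^\d*$/', ''));
```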
Back To Your StringCleaning
So for your pattern, to match any character that is not a letter or a number (underscores count as a match too), you need either: '/[^A-Za-z0-9]/' or '/[\W_]/'
Strip Search
If instead you want to STRIP all the non-alpha characters from a string then use preg_replace( $Regex, $Replacement, $StringToClean )
<?php
$username = 'Svéñ déGööfinøff';
echo preg_replace('/[\W_]+/', '', $username);
?>
The output is: SvdGfinff
If you'd prefer to replace certain accented letters with standard latin ones to keep the names reasonably readable, then I believe you'd need a lookup table (array). There is one ready to use at the PHP site
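If the goal is to keep accented letters rather than drop them, one possible alternative (not part of the answer above) is a Unicode-aware class with the u modifier: \p{L} matches any letter and \p{N} any number. This sketch assumes the input is valid UTF-8; the name is the same one as above, written with escapes so the file encoding does not matter.

```php
<?php
// "Svéñ déGööfinøff" written with Unicode escapes (PHP 7+).
$username = "Sv\u{e9}\u{f1} d\u{e9}G\u{f6}\u{f6}fin\u{f8}ff";

// Strip everything that is neither a letter nor a number in any script.
// The u modifier tells PCRE to treat pattern and subject as UTF-8.
$stripped = preg_replace('/[^\p{L}\p{N}]+/u', '', $username);

echo $stripped, "\n";
```

Here only the space is removed, since every other character is a letter.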

Not sure about my regular expression

I always get stuck with the preg_match function.
I want the input to match with a-z, A-Z, 0-9, ##&-_., and nothing else.
So if ! is in the input, it needs to return false.
What I have till now.
$string = "String-20";
return (preg_match("/[a-z][A-Z][0-9][##&-_.,]/i", $string)) ? true : false;
This should return true.
But it keeps returning false.
You'll want to have those as one set rather than breaking them up like that. Your pattern would match a string like "aA0#"
You're saying "One character a-z, then one A-Z, then one 0-9, then one of these special characters" but what you actually want is "Any number of these specific characters"
The ^ and $ mean start and end of the string so I think this should do what you want.
preg_match('/^[a-zA-Z0-9##&\-_.,]*$/i', $string)
Your regex matches a 4-character chunk anywhere inside a string (as preg_match can find partial matches): 2 letters, then a digit, then one character from the last class. That class includes uppercase letters, digits and a lot of punctuation, because &-_ declares a range from & (ASCII 38) to _ (ASCII 95).
Use
/^[a-z0-9##&_.,-]+$/i
or even (since \w here will match [a-zA-Z0-9_]):
/^[\w##&.,-]+$/
If an empty string is allowed, replace + (one or more occurrences) with * (zero or more occurrences).
The ^ anchor will make sure the engine starts matching at the beginning of the string and $ anchor will make sure the pattern should match up to the string end. The hyphen at the end of the character class will be parsed as a literal -.
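The effect of the accidental &-_ range is easy to check in isolation. In this sketch the special characters are reduced to # & _ . , and - as an assumption (the original list is partly garbled), and the sample strings are invented.

```php
<?php
// Inside a class, &-_ is a range from "&" (38) to "_" (95),
// which covers digits and uppercase letters too.
var_dump(preg_match('/^[&-_]$/', 'A'));
var_dump(preg_match('/^[&-_]$/', '5'));

// With the hyphen last it is a literal, so only & _ and - are in the class.
var_dump(preg_match('/^[&_-]$/', 'A'));
var_dump(preg_match('/^[&_-]$/', '-'));

// Whole-string validation with the hyphen moved to the end of the class.
$pattern = '/^[a-z0-9#&_.,-]+$/i';
var_dump(preg_match($pattern, 'String-20'));
var_dump(preg_match($pattern, 'String!20'));
```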

Regular Expression for alpha numeric, space, dash, and hash

I'm still a newbie for regular expressions. I want to create a regular expression with this rule:
if (preg_match('^[ A-Za-z0-9_-#]^', $text) == 1) {
return true;
}
else {
return false;
}
In short, I would like $text to accept letters, numbers, spaces, underscores, dashes, and hashes (#).
Is the above regular expression correct? It always returns true.
First off, you shouldn't use ^ as the expression boundaries, because they're also used for expression anchors; use /, ~ or # instead.
Second, the dash in the character set should be at the last position; otherwise _-# is parsed as a range from _ to #, which is out of order and makes the pattern fail to compile.
Third, the expression now only matches a single character; you will want to use a multiplier such as + or *.
Lastly, you should anchor the expression so that only those valid characters are present in the string:
/^[ \w#-]+$/
Btw, I've replaced A-Za-z0-9_ with \w as a shortcut.
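A short sketch of how the dash position plays out in practice. The first pattern keeps the dash between _ and # (with ordinary / delimiters), the second is the corrected expression from above; the sample strings are made up.

```php
<?php
// "_" (95) sorts after "#" (35), so _-# is an out-of-order range.
// The pattern fails to compile and preg_match() returns false.
// @ hides the compilation warning.
$broken = @preg_match('/^[ A-Za-z0-9_-#]+$/', 'Room #12');
var_dump($broken);

// With the dash last it is a literal character.
$valid = preg_match('/^[ \w#-]+$/', 'Room #12 - wing_B');
var_dump($valid);

// Any character outside the class makes the anchored match fail.
$invalid = preg_match('/^[ \w#-]+$/', 'Room #12!');
var_dump($invalid);
```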
Here is what you can do:
\w stands for [a-zA-Z0-9_]
the character - has a special meaning in a character class since it is used to define ranges, thus you must place it at the beginning or at the end of the class
the preg_match function returns 0 if there is no match or false when an error occurs, thus you don't need to test if it is equal to 1 (you can use what preg_match returns directly as the condition)
example:
if (preg_match('~^[\w #-]++$~', $subject))
...
else
...

A quick regular expression needed

I want a regular expression which ALLOWS only this:
letter a-z
case insensitive
allows underscores
allows any numbers
How should this be written?
Thanks
That would be
\w
if I'm not mistaken (As it turns out, it depends: In PHP the meaning of \w changes with the locale that's currently in effect). You can use a more explicit form to nail it down:
[A-Za-z0-9_]
To use it in context, add start-of-string and end-of-string anchors and a quantifier that defines how many characters you will allow:
^[A-Za-z0-9_]+$
PHP:
if (preg_match('/[^a-z0-9_]/i', $input)) {
// invalid input
} else {
// valid input
}
So [a-z0-9_] is a character set for your valid characters. Adding a ^ to the front ([^a-z0-9_]) negates it. The logic is, if any character matches something that ISN'T in the valid character set, the input is considered invalid.
The /i at the end makes the match case insensitive.
How should it be written? (breaking it into multiple lines)
/ # Start RegExp Pattern
^ # Match beginning of string only
[a-z0-9_]* # Match characters in the set [ a-z, 0-9 and _ ] * = Zero or more times
$ # Match end of string
/i # End Pattern - Case Insensitive Matching
Giving you
if (preg_match('/^[a-z0-9_]*$/i', $input)) {
// input is valid
}
You could also use a + instead of * if you want to force at least one character as well.
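PCRE can also take the pattern written across several lines, much like the breakdown above, if the x (extended) modifier is added: whitespace in the pattern is then ignored and # starts a comment that runs to the end of the line. A minimal sketch (the comments must not contain the / delimiter):

```php
<?php
// x modifier: whitespace and # comments inside the pattern are ignored.
$pattern = '/
    ^             # match beginning of string only
    [a-z0-9_]*    # characters in the set, zero or more times
    $             # match end of string
/ix';

var_dump(preg_match($pattern, 'User_01'));
var_dump(preg_match($pattern, 'user-01'));
var_dump(preg_match($pattern, ''));
```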
if(preg_match('/^[0-9a-z_]+$/i', $string)) {
//if it matches
}
else {
//if it doesn't match
}
[0-9a-z_] is a character class that defines the digits 0 through 9, the letters a through z and the underscore. The i at the end makes the match case-insensitive. ^ and $ are anchors that match the beginning and end of the string respectively. The + means 1 or more characters.

PHP Regular Expression [accept selected characters only]

I want to accept a list of characters as input from the user and reject the rest. I can accept a formatted string or find if a character/string is missing.
But how can I accept only a set of characters while rejecting all other characters? I would like to use preg_match to do this.
e.g. Allowable characters are: a..z, A..Z, -, ’ ‘
The user must be able to enter those characters in any order, but must not be allowed to use anything other than those characters.
Use a negated character class: [^A-Za-z -]
This will only match if the user enters something OTHER than what is in that character class.
if (preg_match('/[^A-Za-z -]/', $input)) { /* invalid character entered */ }
[a-zA-Z -]
[] brackets are used to group characters and behave like a single character, so you can also do stuff like [...]+ and so on.
Also, a-z, A-Z and 0-9 define ranges, so you don't have to write the whole alphabet.
You can use the following regular expression: ^[a-zA-Z -]+$.
The ^ matches the beginning of the string, which prevents it from matching the middle of the string 123abc. The $ similarly matches the end of the string, preventing it from matching the middle of abc123.
The brackets match every character inside of them; a-z means every character between a and z. To match the - character itself, put it at the end. ([19-] matches a 1, a 9, or a -; [1-9] matches every character between 1 and 9, and does not match -).
The + tells it to match one or more of the thing before it. You can replace the + with a *, which means 0 or more, if you also want to match an empty string.
For more information, see here.
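The hyphen placement described above can be checked with a few quick calls. This is just a sketch with invented sample strings.

```php
<?php
// [19-] is the three literal characters 1, 9 and -.
var_dump(preg_match('/^[19-]+$/', '1-9'));
var_dump(preg_match('/^[19-]+$/', '5'));

// [1-9] is the range of digits 1 through 9, without the hyphen itself.
var_dump(preg_match('/^[1-9]+$/', '5'));
var_dump(preg_match('/^[1-9]+$/', '1-9'));

// The suggested pattern: letters, space and a trailing literal hyphen.
var_dump(preg_match('/^[a-zA-Z -]+$/', 'Mary-Jane Smith'));
var_dump(preg_match('/^[a-zA-Z -]+$/', 'abc123'));
```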
You would be looking at a negated ^ character class [] that stipulates your allowed characters, then test for matches.
$pattern = '/[^A-Za-z\- ]/';
if (preg_match($pattern, $string_of_input)){
//return a fail
}
//Matt beat me to it...
