I always get stuck with the preg_match function.
I want the input to match with a-z, A-Z, 0-9, ##&-_., and nothing else.
So if ! is in the input, it needs to return false.
What I have so far:
$string = "String-20";
return (preg_match("/[a-z][A-Z][0-9][##&-_.,]/i", $string)) ? true : false;
This should return true.
But it keeps returning false.
You'll want to have those as one set rather than breaking them up like that. Your pattern would match a string like "aA0#"
You're saying "One character a-z, then one A-Z, then one 0-9, then one of these special characters" but what you actually want is "Any number of these specific characters".
The ^ and $ mean start and end of the string so I think this should do what you want.
preg_match('/^[a-zA-Z0-9##&\-_.,]*$/i', $string)
Your regex matches 4-character chunks anywhere inside a string (as preg_match can find partial matches): 2 letters, then a digit, then one character from a set that includes uppercase letters, because &-_ declares a range running from & to _ in ASCII order.
Use
/^[a-z0-9##&_.,-]+$/i
or even (since \w here will match [a-zA-Z0-9_]):
/^[\w##&.,-]+$/
If an empty string is allowed, replace + (one or more occurrences) with * (zero or more occurrences).
The ^ anchor will make sure the engine starts matching at the beginning of the string and $ anchor will make sure the pattern should match up to the string end. The hyphen at the end of the character class will be parsed as a literal -.
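To make both points concrete, here is a minimal sketch. The function names are made up for illustration; the second pattern is the one suggested above, with the character set copied from the question as written.

```php
<?php
// Inside a character class, &-_ is a range from "&" (0x26) to "_" (0x5F),
// so it admits digits, uppercase letters and characters such as "<".
function inAmpersandUnderscoreRange(string $char): bool
{
    return preg_match('/^[&-_]$/', $char) === 1;
}

// Hypothetical helper: true only if every character is in the allowed set.
// The hyphen is last in the class, so it is a literal "-".
function isAllowedInput(string $input): bool
{
    return preg_match('/^[a-z0-9##&_.,-]+$/i', $input) === 1;
}
```

One caveat: in PCRE, $ also matches just before a final newline, so isAllowedInput("abc\n") is still true. If that matters, \z or the D modifier closes the gap.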
Title says it all: I am checking to see if a user's username contains anything that isn't a number or letter, such as €{¥]^}+<€, punctuation, spaces or even things like âæłęč. Is this possible in php?
You can use the ctype_alnum() function in PHP.
From the manual..
Check for alphanumeric character(s)
Returns TRUE if every character in text is either a letter or a digit, FALSE otherwise.
var_dump(ctype_alnum("æøsads")); // false
var_dump(ctype_alnum("123asd")); // true
Live demo at https://3v4l.org/5etr7
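As a rough sketch of how this might be wrapped for a username check (the wrapper name is hypothetical, and I'm assuming the default "C" locale):

```php
<?php
// Hypothetical wrapper around ctype_alnum() for username checks.
function isPlainAlnum(string $text): bool
{
    // ctype_alnum() returns false for an empty string,
    // so no separate length check is needed here.
    return ctype_alnum($text);
}
```

Spaces and underscores are rejected as well, since they are neither letters nor digits.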
PHP does REGEX
What you want to do is fairly trivial, PHP has a number of regex functions
Testing a String For a Character
If all you want is to know IF a string contains non-alphanumeric characters, then just use preg_match():
preg_match( '/[^A-Za-z0-9]/', $userName );
This will return 1 if the username contains anything other than alphanumeric characters (A-Z, a-z or 0 to 9), and 0 if it contains no non-alphanumeric characters.
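A quick way to sanity-check a "contains a forbidden character" test is to run it on one clean and one dirty value. The sketch below uses a negated class with no quantifier; both helper names are hypothetical. The second helper is there only for comparison: a negated class followed by * can match zero characters, so it reports a match for every input.

```php
<?php
// Hypothetical helper: true if $userName has at least one character
// outside A-Z, a-z and 0-9.
function hasNonAlnum(string $userName): bool
{
    return preg_match('/[^A-Za-z0-9]/', $userName) === 1;
}

// For comparison only: the starred class may match the empty string,
// so this returns true no matter what $userName contains.
function starredAlwaysMatches(string $userName): bool
{
    return preg_match('/[^A-Za-z0-9]*/', $userName) === 1;
}
```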
Regex Pattern Elements
Regex PCRE patterns open and close with a delimiter such as a slash /, and the whole pattern needs to be treated like a string (quoted): '/myPattern/'. Some other key features are:
[ brackets contain match sets ]
[a-z] // means match any lowercase letter
This pattern means check the current character in the $String relative to the pattern in these brackets, in this case match any lowercase letter a to z.
^ Caret (Meta-Character)
[^a-z] // means no lowercase letters
If the caret ^ (aka hat) is the first character inside brackets, it NEGATES the pattern inside brackets so [^A7] means match anything EXCEPT uppercase A and the numeral 7. (Note: when outside brackets, the caret ^ means the start of the string.)
\w\W\d\D\s\S. Meta-Characters (WildCards)
\w // match all alphanumeric
An escaped (i.e. preceded by a backslash \ ) lowercase w means match any "word" character, i.e. alphanumeric and the underscore _; this is shorthand for [A-Za-z0-9_]. The uppercase \W is the NOT word character, equivalent to [^A-Za-z0-9_] or [^\w].
. // (dot) match ANY single character except return/newline
\w // match any word character [A-Za-z0-9_]
\W // NOT any word character [^A-Za-z0-9_]
\d // match any digit [0-9]
\D // NOT any digit [^0-9].
\s // match any whitespace (tab, space, newline)
\S // NOT any whitespace
.*+?| Meta-Characters (Quantifiers)
These modify the behavior outside of a set []
* // match previous character or [set] zero or more times,
// so .* means match everything (including nothing) until reaching a return/newline.
+ // match previous at least one or more times.
? // match previous only zero or one time (i.e. optional).
| // means logical OR eg.: com|net means match either literal "com" or "net"
Not shown: capture groups, backreferences, substitution (the real power of regex). See https://www.phpliveregex.com/#tab-preg-match for more including a live pattern-match playground that is based on the PHP functions, and delivers results as arrays.
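To try the meta-characters listed above, a small table of cases can help. This is just a sketch: the patterns and subjects are my own examples, and each row records the preg_match() result I'd expect.

```php
<?php
// Each entry: [pattern, subject, expected preg_match() result].
$cases = [
    ['/^\d+$/',        '2024',        1], // \d+ : one or more digits
    ['/^\d+$/',        '20a4',        0], // the "a" is not a digit
    ['/^\w*$/',        '',            1], // * also accepts zero characters
    ['/^colou?r$/',    'color',       1], // ? makes the "u" optional
    ['/^colou?r$/',    'colour',      1],
    ['/\.(com|net)$/', 'example.net', 1], // | means either "com" or "net"
    ['/\.(com|net)$/', 'example.org', 0],
    ['/^\S+\s\S+$/',   'two words',   1], // \s matches the single space
];

// Collect true/false per row: does the actual result equal the expected one?
$results = [];
foreach ($cases as [$pattern, $subject, $expected]) {
    $results[] = preg_match($pattern, $subject) === $expected;
}
```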
Back To Your StringCleaning
So for your pattern, to match anything that is not a letter or a number (underscores included), you need either '/[^A-Za-z0-9]/' or '/[\W_]/'.
Strip Search
If instead you want to STRIP all the non-alpha characters from a string then use preg_replace( $Regex, $Replacement, $StringToClean )
<?php
$username = 'Svéñ déGööfinøff';
echo preg_replace('/[\W_]*/', '', $username);
?>
The output is: SvdGfinff
If you'd prefer to replace certain accented letters with standard latin ones to keep the names reasonably readable, then I believe you'd need a lookup table (array). There is one ready to use at the PHP site
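The lookup-table idea could look roughly like this. The table below is a tiny illustrative one that I made up for this name only, and cleanUsername is a hypothetical helper; a real table would cover many more letters.

```php
<?php
// Tiny illustrative lookup table, not a complete transliteration map.
$translit = ['é' => 'e', 'ñ' => 'n', 'ö' => 'o', 'ø' => 'o'];

function cleanUsername(string $name, array $table): string
{
    // First map accented letters to plain ones, then drop everything
    // that is still not A-Z, a-z or 0-9.
    $ascii = strtr($name, $table);
    return preg_replace('/[^A-Za-z0-9]+/', '', $ascii);
}

$cleaned = cleanUsername('Svéñ déGööfinøff', $translit);
```

With this table the name comes out as SvendeGoofinoff instead of losing its accented letters.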
I understand that the regex pattern must match a string which starts with the combination and the repetition of the following characters:
a-z
A-Z
a white-space character
And there is no limitation to how the string may end!
First Case
So a string such as uoiui897868 (any string that only starts with space, a-z or A-Z) matches the pattern... (Sure it does)
Second Case
But the problem is a string like 76868678jugghjiuh (any string that only starts with a character other than space, a-z or A-Z) matches too! This should not happen!
I have checked using the PHP function preg_match() too, which returns true (i.e. the pattern matches the string).
Also have used other online tools like regex101 or regexr.com. The string does match the pattern.
Could anybody help me understand why the pattern matches the string described in the second case?
/^[a-zA-Z ]*/
Your regex will match strings that "begin with" any number (including zero) of letters or spaces.
^ means "start of string" and * means "zero or more".
Both uoiui897868 and 76868678jugghjiuh start with 0 or more letters/spaces, so they both match.
You probably want:
/^[a-zA-Z ]+/
The + means "one or more", so it won't match zero characters.
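You can see the difference by looking at what was actually matched. A minimal sketch, using the second string from the question (the variable names are mine):

```php
<?php
$subject = '76868678jugghjiuh';

// * may match zero characters, so the match succeeds with an empty result.
$starResult = preg_match('/^[a-zA-Z ]*/', $subject, $starMatch);

// + needs at least one letter or space at the start, so this one fails
// and $plusMatch is left as an empty array.
$plusResult = preg_match('/^[a-zA-Z ]+/', $subject, $plusMatch);
```

Here $starMatch[0] is the empty string: the pattern "matched", but it matched nothing.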
Your regex is completely useless: it will trivially match any string (empty, non-empty, with numbers, without,...), regardless of its structure.
This because
with ^, you anchor the match to the beginning of the string, and every string has a beginning.
You use a character class [A-Za-z ], but with a * operator, so 0 or more repetitions. Thus even if the string does not begin with a character from [A-Za-z ], the matcher will simply accept zero repetitions and report a match, ignoring the rest of the string.
You need to use + instead of * to enforce "at least one character".
The '*' quantifier on the end means zero or more matches of the character, so all strings will match. Perhaps you want to drop the wildcard quantifier, or change it to a '+' quantifier, and add a '$' on the end to test the whole string.
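The difference between testing the start of the string and testing the whole string might be sketched like this (both function names are hypothetical):

```php
<?php
// Prefix test: only the beginning of the string has to fit the class.
function startsWithLettersOrSpaces(string $s): bool
{
    return preg_match('/^[a-zA-Z ]+/', $s) === 1;
}

// Whole-string test: the added $ means nothing else may follow.
function isOnlyLettersOrSpaces(string $s): bool
{
    return preg_match('/^[a-zA-Z ]+$/', $s) === 1;
}
```

So uoiui897868 passes the first check but not the second.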
What you really want is to match one or more of the preceding characters.
For that you use +
/^[a-zA-Z ]+/
I would like to use php's preg_match to capture substrings which comprise:
A-Z, a-z, all accented chars
space
hyphen
It must not capture strings with anything else in them, including numeric chars.
This example is close but also catches strings containing numeric chars:
preg_match("/([\p{L} -]+)/u", $string)
A similar question already had an answer (the one above) but it doesn't work...
If I understand your problem correctly (which I might not have), then you simply want to use the ^ and $ characters to specify that "the match HAS to start here and the match HAS to end here":
/^([\p{L} -]+)$/u
 ^            ^
Then preg_match would only return true if the string had nothing else in it.
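As a small sketch of the anchored pattern in use (the function name is made up, and the sample names are my own):

```php
<?php
// Hypothetical helper: letters from any script, spaces and hyphens only.
// The u modifier makes PCRE treat the pattern and the subject as UTF-8.
function isLettersSpacesHyphens(string $s): bool
{
    return preg_match('/^[\p{L} -]+$/u', $s) === 1;
}
```

A string such as "José-María López" passes, while "Jean 2" is rejected because of the digit.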
Edit:
If hyphens/spaces are only allowed in the middle:
/^([\p{L}](?:[\p{L} -]*[\p{L}])?)$/u
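Here is a sketch of the "letters at both ends" idea. Note that this is my own variant: it drops the capture group and uses * for the middle run, so that one-letter and two-letter strings also pass. The function name is hypothetical.

```php
<?php
// Hypothetical helper: must start and end with a letter; spaces and
// hyphens are allowed only in between.
function isNameLike(string $s): bool
{
    return preg_match('/^\p{L}(?:[\p{L} -]*\p{L})?$/u', $s) === 1;
}
```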
To be honest, I don't really get RegEx. So I'm completely oblivious as to where I'm going wrong here.
I'm looking for a RegEx that accepts alphanumeric characters only (and underscores, it's for usernames). I've searched around here and found numerous example RegExes that I've tried and not one of them has worked.
Among others, which I've mostly gotten from answers around here, I've tried
^[a-zA-Z0-9_]*$
/[^a-z_\-0-9]/i
/^\w+$/
To match these, I've tried (with each of the regexes)
if(preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(!preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 1) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 0) {
//don't accept
}
etc...
Each and every single time it's accepting special characters (I've tried &, $, ^, and %).
What exactly am I doing wrong here? Is it the format of the RegEx? Is it how I'm asking it to check?
Also, what exactly is the return type I get if it's found special characters? (i.e One I don't want to accept)
preg_match returns 1 if the input string matched the pattern you gave, and 0 if it didn't.
You want each character in your usernames to be alphanumeric (plus underscore). One PCRE way of expressing that is with a character class inside square brackets, like this one: [A-Za-z0-9_]. There are a couple of ways you could use this basic class to do what you want.
One way is a "negative" search: try to match a non-alphanumeric character, and if you do, then the test fails. For this, we just add a caret at the front of the character class. This means we're matching any character not in that set.
So, the following pattern matches "any non-alphanumeric, non-underscore character." Here, a match means an invalid username:
if (preg_match('/[^A-Za-z0-9_]/', $username)) {
// invalid username
}
Or, you could do the opposite kind of match, where you give a pattern for a valid username and check if you match that. This time, we don't change the character class itself at all, but we add the + quantifier after it, meaning we're matching one or more of the "good" characters.
Additionally, we wrap the ^ and $ beginning-and-end-of-string anchors around our pattern. (It's a little confusing, but a caret at the beginning of a pattern has a completely different meaning from a caret at the beginning of a character class, within the brackets.)
The end result is a pattern that means: "1 or more alphanumeric characters (plus underscore) and nothing else." A match on this one means a valid username:
if (preg_match('/^[A-Za-z0-9_]+$/', $username)) {
// valid username
}
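The two approaches agree on ordinary input but differ on the empty string, which is worth knowing before picking one. A minimal sketch with hypothetical function names:

```php
<?php
// Negative search: is there any character outside the allowed set?
function hasInvalidChar(string $username): bool
{
    return preg_match('/[^A-Za-z0-9_]/', $username) === 1;
}

// Positive match: is the whole string one or more allowed characters?
function looksValid(string $username): bool
{
    return preg_match('/^[A-Za-z0-9_]+$/', $username) === 1;
}
```

For an empty string, hasInvalidChar('') is false (there is nothing invalid in it), yet looksValid('') is also false because + requires at least one character. With the negative search you would need a separate length check.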
if (preg_match("/^[a-zA-Z0-9_]+$/", $username) === 1) {
// Good username
}
else {
// Bad username
}
The use of the strict equality operator (===) means we're comparing what preg_match() returns to the integer 1, not to a boolean. If it returns 0, there were no matches; if it returns a boolean false, an error occurred. Check out the page for preg_match for more information: http://php.net/manual/en/function.preg-match.php
Per the PHP manual, preg_match will return 0 if it can't find a successful match with your regex and FALSE if an error occurs. So if you want to make sure you're testing for 0, and not something which can evaluate to false, you should use the === operator.
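To see all three return values side by side, something like the following sketch could be used. The helper name is made up; the error case relies on the u modifier rejecting a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper that names the three possible outcomes of preg_match().
function describeMatch(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // For example: an invalid UTF-8 subject under the u modifier.
        return 'error';
    }
    return $result === 1 ? 'match' : 'no match';
}
```

A loose check such as `if (!preg_match(...))` would treat 'error' and 'no match' as the same thing, which is why the strict comparison is useful.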
If you only want letters and underscores you can use a character class of [a-z_] which specifies that the range of characters for a to z and the _ symbol will match. And the + following the class specifies that you want one or multiple of the same. The ^ says the pattern must match from the beginning of the text, while the $ says that the pattern must match up until the end of the text.
if (preg_match("/^[a-z_]+$/i", $text_variable) === 1) {
//"A match was found.";
} else {
//"A match was not found.";
}
Regex is very easy to understand if you get the basics :)
I'll try to explain to you all three expressions you tried:
With ^[a-zA-Z0-9_]*$ string will be matched which:
^ // from the beginning...
[a-zA-Z0-9_] // contains only characters a-z or A-Z or 0-9 or _ sign
* // and has 0 or more of such characters
$ // to the end
Matched strings for example:
(empty string - since you told 0 or more characters)
abc09
fidjwieofoj4fio3j4fiojrfioj3ijfo
000000000000000000000
__________
and_many_many_more_as_long_as_they_contain_alpha_characters_and___sign
With /[^a-z_\-0-9]/i string will be matched which:
[^a-z_\-0-9]
// ^ means "the opposite" so that subset describes characters
// which are not included in it
// (are not a-z or _ sign, or - dash sign, or 0-9 numbers)
i modifier
// stands for case insensitive, all letters are treated as lowercase
You did not add * or ? or + after the subset so basically you are looking for one character only, and because you did not put your regexp between ^ and $ signs, this expression will finally match any text which contains at least one character which is not A-Z or a-z, or _ sign, or - dash sign, or 0-9 numbers.
Matched strings for example:
!
a>a
A<9
ffffffffff.dflskfdfd
00000,
]]]]]]]]]]]]]]]]]]
and so-on
With /^\w+$/ string will be matched which:
^ // from the beginning
\w // contains only characters a-z or A-Z or 0-9 or _ sign
+ // and the string must be at least 1 character long
$ // to the end
Probably the most useful regular expression. Remember, \w is just an alias for [a-zA-Z0-9_]. This regexp will match only whole string which is not empty and contains only alphanumeric characters and _ sign.
Matched strings for example:
mike
alice
bob10
0000000000
1111
9
php
user_example
Hope that helps. In my opinion, the most useful expression for matching valid usernames would be /^\w{3,15}$/, as it matches any string that is 3 to 15 characters long and consists only of alphanumeric characters and the underscore sign (a-z A-Z 0-9 _).
Try this:
<?php
function isValidUsername($username)
{
return preg_match('/^\w{3,15}$/', $username) == 1;
}
echo isValidUsername('mike999') ? 'Yes' : 'No' , '<br>';
echo isValidUsername('alice!') ? 'Yes' : 'No';
Cheers.
I want to accept a list of characters as input from the user and reject the rest. I can accept a formatted string or find out if a character/string is missing.
But how can I accept only a set of characters while rejecting all other characters? I would like to use preg_match to do this.
e.g. Allowable characters are: a..z, A..Z, -, ’ ‘
The user must be able to enter those characters in any order, but must not be allowed to use any other characters.
Use a negated character class: [^A-Za-z -]
This will only match if the user enters something OTHER than what is in that character class.
if (preg_match('/[^A-Za-z -]/', $input)) { /* invalid character entered */ }
[a-zA-Z -]
[] brackets are used to group characters into a set that behaves like a single character, so you can also do things like [...]+ and so on.
Also, a-z, A-Z and 0-9 define ranges, so you don't have to write out the whole alphabet.
You can use the following regular expression: ^[a-zA-Z -]+$.
The ^ matches the beginning of the string, which prevents it from matching the middle of the string 123abc. The $ similarly matches the end of the string, preventing it from matching the middle of abc123.
The brackets match every character inside of them; a-z means every character between a and z. To match the - character itself, put it at the end. ([19-] matches a 1, a 9, or a -; [1-9] matches every character between 1 and 9, and does not match -).
The + tells it to match one or more of the thing before it. You can replace the + with a *, which means 0 or more, if you also want to match an empty string.
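The hyphen placement rule is easy to check for yourself. A minimal sketch using the two classes mentioned above (the variable names are mine):

```php
<?php
// [19-]: the hyphen is last, so the class is the three characters 1, 9 and -.
$literalHyphen = '/^[19-]+$/';
// [1-9]: the hyphen sits between two characters, so it is the range 1 to 9.
$digitRange = '/^[1-9]+$/';

// Each entry is true when preg_match() gives the result noted on the right.
$checks = [
    preg_match($literalHyphen, '1-9') === 1, // 1, - and 9 are all in the class
    preg_match($literalHyphen, '5') === 0,   // 5 is not listed
    preg_match($digitRange, '5') === 1,      // 5 lies between 1 and 9
    preg_match($digitRange, '1-9') === 0,    // - is not in the range
];
```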
For more information, see here.
You would be looking at a negated ^ character class [] that stipulates your allowed characters, then test for matches.
$pattern = '/[^A-Za-z\- ]/';
if (preg_match($pattern, $string_of_input)){
//return a fail
}
//Matt beat me to it...