PHP Regular Expression [accept selected characters only] - php

I want to accept a fixed set of characters as input from the user and reject everything else. I already know how to accept a formatted string, or to find out whether a character or string is missing.
But how can I accept only a given set of characters while rejecting all others? I would like to use preg_match to do this.
E.g. the allowed characters are: a..z, A..Z, -, ' ' (space)
The user must be able to enter those characters in any order, but must not be allowed to use any other characters.

Use a negated character class: [^A-Za-z -]
This will only match if the user enters something OTHER than what is in that character class.
if (preg_match('/[^A-Za-z -]/', $input)) { /* invalid character entered */ }

[a-zA-Z -]
Square brackets [] group characters into a class that behaves like a single character, so you can also write things like [...]+ and so on.
Also, a-z, A-Z and 0-9 define ranges, so you don't have to write out the whole alphabet.

You can use the following regular expression: ^[a-zA-Z -]+$.
The ^ matches the beginning of the string, which prevents it from matching the middle of the string 123abc. The $ similarly matches the end of the string, preventing it from matching the middle of abc123.
The brackets match every character inside of them; a-z means every character between a and z. To match the - character itself, put it at the end. ([19-] matches a 1, a 9, or a -; [1-9] matches every character between 1 and 9, and does not match -).
The + tells it to match one or more of the thing before it. You can replace the + with a *, which means 0 or more, if you also want to match an empty string.
For more information, see here.
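As a minimal sketch of the anchored pattern described above (the function name isAllowedName is made up for illustration and is not from the original answer):

```php
<?php
// Hypothetical helper: true when $input is non-empty and consists only of
// ASCII letters, spaces and hyphens.
function isAllowedName(string $input): bool
{
    // ^ and $ force the class to cover the whole string.
    // The hyphen sits last in the class so it is taken literally.
    return preg_match('/^[a-zA-Z -]+$/', $input) === 1;
}

var_dump(isAllowedName('Mary-Jane Smith')); // bool(true)
var_dump(isAllowedName('abc123'));          // bool(false)
var_dump(isAllowedName(''));                // bool(false), because + needs one character
```

Swapping the + for a * would make the empty string pass as well.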

You want a negated (^) character class [] that lists your allowed characters; then test for a match, which signals invalid input.
$pattern = '/[^A-Za-z\- ]/';
if (preg_match($pattern, $string_of_input)){
//return a fail
}
//Matt beat me to it...
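A short sketch of the negated-class approach, assuming the allowed set is ASCII letters, space and hyphen (the helper name hasDisallowedChar is hypothetical):

```php
<?php
// Hypothetical helper: true when $input contains at least one character
// outside the allowed set (ASCII letters, space, hyphen).
function hasDisallowedChar(string $input): bool
{
    // No anchors needed: a single offending character anywhere is enough.
    return preg_match('/[^A-Za-z -]/', $input) === 1;
}

var_dump(hasDisallowedChar('Anne-Marie')); // bool(false)
var_dump(hasDisallowedChar('Anne_Marie')); // bool(true), underscore is not allowed
var_dump(hasDisallowedChar(''));           // bool(false), nothing to object to
```

Note that an empty string has no disallowed character, so it passes this test; if empty input should be rejected, that needs a separate check.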

Related

How do I check if a string is composed only of letters and numbers? (PHP) [duplicate]

Title says it all: I am checking to see if a user's username contains anything that isn't a number or letter, such as €{¥]^}+<€, punctuation, spaces or even things like âæłęč. Is this possible in php?
You can use the ctype_alnum() function in PHP.
From the manual..
Check for alphanumeric character(s)
Returns TRUE if every character in text is either a letter or a digit, FALSE otherwise.
var_dump(ctype_alnum("æøsads")); // false
var_dump(ctype_alnum("123asd")); // true
Live demo at https://3v4l.org/5etr7
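A small sketch of how this might be wrapped for a username check. The function name isAlnumUsername is an assumption, and the results shown assume the default "C" locale, where only ASCII letters and digits count:

```php
<?php
// Hypothetical wrapper around ctype_alnum() for username validation.
function isAlnumUsername(string $username): bool
{
    // ctype_alnum('') returns false, so empty input is rejected as well.
    return ctype_alnum($username);
}

var_dump(isAlnumUsername('user42'));  // bool(true)
var_dump(isAlnumUsername('user 42')); // bool(false), contains a space
var_dump(isAlnumUsername('user_42')); // bool(false), underscore is not alphanumeric
var_dump(isAlnumUsername(''));        // bool(false)
```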
PHP does REGEX
What you want to do is fairly trivial, PHP has a number of regex functions
Testing a String For a Character
If all you want is to know IF a string contains non-alphanumeric characters, then just use preg_match():
preg_match( '/[^A-Za-z0-9]/', $userName );
This returns 1 if the username contains anything other than an alphanumeric character (A-Z, a-z or 0-9), and 0 if it does not.
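One detail worth checking when writing this kind of test: a negated class followed by * can match zero characters, so such a pattern succeeds on every string. The comparison below is only an illustration with made-up sample values:

```php
<?php
// With *, the class may match the empty string, so preg_match() always finds a match.
var_dump(preg_match('/[^A-Za-z0-9]*/', 'abc123'));   // int(1), even though the input is clean

// Without a quantifier, one real non-alphanumeric character is required.
var_dump(preg_match('/[^A-Za-z0-9]/', 'abc123'));    // int(0)
var_dump(preg_match('/[^A-Za-z0-9]/', 'abc 123!'));  // int(1)
```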
Regex Pattern Elements
PCRE patterns open and close with a delimiter such as a slash /, and the whole pattern is passed as a quoted string: '/myPattern/'. Some other key features are:
[ brackets contain match sets ]
[a-z] // means match any lowercase letter
This pattern means check the current character in the $String relative to the pattern in these brackets, in this case match any lowercase letter a to z.
^ Caret (Meta-Character)
[^a-z] // means no lowercase letters
If the caret ^ (aka hat) is the first character inside brackets, it NEGATES the pattern inside brackets, so [^A7] means match anything EXCEPT uppercase A and the numeral 7. (Note: when outside brackets, the caret ^ means the start of the string.)
\w\W\d\D\s\S. Meta-Characters (WildCards)
\w // match all alphanumeric
An escaped (i.e. preceded by a backslash \ ) lowercase w means match any "word" character, i.e. alphanumeric and the underscore _; this is shorthand for [A-Za-z0-9_]. The uppercase \W is the NOT word character, equivalent to [^A-Za-z0-9_] or [^\w].
. // (dot) match ANY single character except return/newline
\w // match any word character [A-Za-z0-9_]
\W // NOT any word character [^A-Za-z0-9_]
\d // match any digit [0-9]
\D // NOT any digit [^0-9].
\s // match any whitespace (tab, space, newline)
\S // NOT any whitespace
.*+?| Meta-Characters (Quantifiers)
These modify the behavior outside of a set []
* // match previous character or [set] zero or more times,
// so .* means match everything (including nothing) until reaching a return/newline.
+ // match previous at least one or more times.
? // match previous only zero or one time (i.e. optional).
| // means logical OR eg.: com|net means match either literal "com" or "net"
Not shown: capture groups, backreferences, substitution (the real power of regex). See https://www.phpliveregex.com/#tab-preg-match for more including a live pattern-match playground that is based on the PHP functions, and delivers results as arrays.
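To make the elements above concrete, here are a few throwaway checks. The sample strings are invented, and the parentheses in the com|net pattern are only used to limit the scope of the OR:

```php
<?php
var_dump(preg_match('/^\d+$/', '2024'));               // int(1), one or more digits
var_dump(preg_match('/^\d+$/', ''));                   // int(0), + needs at least one
var_dump(preg_match('/^\d*$/', ''));                   // int(1), * allows zero
var_dump(preg_match('/^colou?r$/', 'color'));          // int(1), the u is optional
var_dump(preg_match('/^colou?r$/', 'colour'));         // int(1)
var_dump(preg_match('/\.(com|net)$/', 'example.net')); // int(1), literal dot then com or net
var_dump(preg_match('/\.(com|net)$/', 'example.org')); // int(0)
var_dump(preg_match('/^\S+$/', 'no spaces'));          // int(0), the space is whitespace
```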
Back To Your StringCleaning
So for your pattern, to match any character that is not a letter or a number (underscores count as a match too), you need either '/[^A-Za-z0-9]/' or '/[\W_]/'.
Strip Search
If instead you want to STRIP all the non-alpha characters from a string then use preg_replace( $Regex, $Replacement, $StringToClean )
<?php
$username = 'Svéñ déGööfinøff';
echo preg_replace('/[\W_]*/', '', $username);
?>
The output is: SvdGfinff
If you'd prefer to replace certain accented letters with standard Latin ones to keep the names reasonably readable, then I believe you'd need a lookup table (array). There is one ready to use at the PHP site.
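A rough sketch of the lookup-table idea follows. The table here is a deliberately tiny, hypothetical one covering only the letters in the sample name, and it assumes the script and the input use the same encoding (e.g. UTF-8):

```php
<?php
// Hypothetical, incomplete lookup table; a real one would cover far more letters.
const ACCENT_MAP = ['é' => 'e', 'ñ' => 'n', 'ö' => 'o', 'ø' => 'o'];

// Hypothetical helper: transliterate known accented letters, then strip
// everything that is still not an ASCII letter or digit.
function toPlainUsername(string $name): string
{
    $ascii = strtr($name, ACCENT_MAP);
    return preg_replace('/[^A-Za-z0-9]+/', '', $ascii) ?? '';
}

echo toPlainUsername('Svéñ déGööfinøff'), "\n"; // SvendeGoofinoff
```

Accented letters missing from the table are simply stripped, as in the original example.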

Not sure about my regular expression

I always get stuck with the preg_match function.
I want the input to match a-z, A-Z, 0-9, ##&-_., and nothing else.
So if ! is in the input, it needs to return false.
Here is what I have so far:
$string = "String-20";
return (preg_match("/[a-z][A-Z][0-9][##&-_.,]/i", $string)) ? true : false;
This should return true, but it keeps returning false.
You'll want to have those as one set rather than breaking them up like that. Your pattern would match a string like "aA0#"
You're saying "One character a-z, then one A-Z, then one 0-9, then one of these special characters" but what you actually want is "Any number of these specific characters".
The ^ and $ mean start and end of the string so I think this should do what you want.
preg_match('/^[a-zA-Z0-9##&\-_.,]*$/i', $string)
Your regex matches a 4-character chunk anywhere inside a string (as preg_match can find partial matches): 2 letters, then a digit, then one character from the last class. That class also accepts digits and uppercase letters, because &-_ declares a range running from & to _ in ASCII order.
Use
/^[a-z0-9##&_.,-]+$/i
or even (since \w here will match [a-zA-Z0-9_]):
/^[\w##&.,-]+$/
If an empty string is allowed, replace + (one or more occurrences) with * (zero or more occurrences).
The ^ anchor will make sure the engine starts matching at the beginning of the string and $ anchor will make sure the pattern should match up to the string end. The hyphen at the end of the character class will be parsed as a literal -.
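As a quick sketch, the suggested pattern can be wrapped and tried like this. The helper name isAllowedString is made up, and the character list is copied as it appears in the question:

```php
<?php
// Hypothetical helper using the anchored, single-class pattern from above.
function isAllowedString(string $s): bool
{
    return preg_match('/^[a-z0-9##&_.,-]+$/i', $s) === 1;
}

var_dump(isAllowedString('String-20')); // bool(true)
var_dump(isAllowedString('String!'));   // bool(false)

// Why hyphen placement matters: &-_ is a range that happens to include digits.
var_dump(preg_match('/^[&-_]$/', '7')); // int(1), 7 lies between & and _ in ASCII
var_dump(preg_match('/^[&_-]$/', '7')); // int(0), hyphen last means a literal -
```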

Difference between regular expressions

I'm trying to work out what the differences are between these two:
preg_match('-^[^'.$inv.']+\.?$-', $name);
preg_match('-['.$inv.']-', $name);
Thanks
To make it easier to exemplify, assume $inv = 'a'…
-^[^a]+\.?$- needs to match the whole string, because of the caret and the dollar signs. The string is expected to start with a character other than "a", followed by 0 or more characters that are still not "a"s. The last character in this string, however, can be a dot (hence the question mark after the dot)
-[a]- will match the first "a" in the string and it will stop looking as soon as it finds a match because you're using preg_match() and not preg_match_all().
Your first pattern does not make any sense, though, since already \. = [^a] (translated into English as: a dot is already not an "a")
[EDIT] The first pattern can actually mean something when there's a dot in the character class.
First off, be careful with $inv: depending on its content, it may be possible to inject arbitrary syntax into the regular expression. To avoid that issue, use preg_quote().
That said, the first regex will be :
^ <-- the given string must begin with
[ <-- one of those characters
^ <-- inverse the accepted characters (instead of accepted characters, the following characters will be those that are not accepted)
$inv <-- characters
] <-- end of the list of characters (here not accepted characters)
+ <-- at least one character must be matched, more are accepted
\. <-- a '.'
? <-- the previous '.' isn't mandatory
$ <-- the given string must end here
If $inv = 'abc.' it will match:
def
def.
d
d.
It won't match:
., because the . isn't accepted by the [^abc.] group, even though there is \.? later, at least one character must be before a .
de.s, because the . isn't accepted in the [^abc.] group, it is only possible to have it at the end of the given string thanks to \.?
a
deb
testc
teskopkl;;[!##$b., because of the b
an empty string, at least one character must be matched with '[^'.$inv.']+'
If $inv does not contain a dot, it could be simplified to '-^[^'.$inv.']+$-', because the dot is then already accepted by the negated class (don't forget the preg_quote though).
The second one will be:
[ <-- one of those characters
$inv <-- characters
] <-- end of the list of characters (here accepted characters)
If $inv = 'abc.' it will match
any string containing at least one of the letters a, b, c or .
It won't match any string which doesn't contain a, b, c or ..
In plain English, the first one is looking for an entire line which begins with one or more characters not included with the $inv string, and ending with an optional period.
The second one simply tries to match one character as specified by the value for $inv.
The first pattern matches a line containing none of the characters in $inv, optionally ending the line with a period.
The second pattern matches anything containing any of the characters in $inv.
- is the pattern delimiter, marking the beginning and end of the expression. It can technically be any character, but is most often /.
^ denotes the beginning of the string
[ ] encapsulates a set of characters to be matched
[^ ] encapsulates a set of characters that should not be matched, any other character is considered to be a match.
+ denotes that the previous character or set of characters should be matched one or more times.
. normally matches any character, which is why it is escaped as \. here to indicate a literal period character.
? denotes that the previous character should be matched zero or one time.
$ denotes the end of a string.
['.$inv.']
Let's go with the second one to begin with, since it's the simpler one.
This simply matches a string containing any single one of the characters contained within the string in the variable $inv.
It could contain anything else before or after that character from $inv.
^[^'.$inv.']+\.?$
Now the first one:
This matches a string that contains anything except the characters in $inv (the ^ inside the [] is a negative match).
The match that isn't part of $inv must be at the start of the string (the ^ outside the [] matches the start of the string).
The string can contain as many matching characters as it likes (one or more; that's the + sign after the [])
After that, it may optionally have a dot (the \.? is an optional dot character).
And nothing else after that (the $ matches the end of the string).
Note that in both cases, if $inv contains any regex reserved characters, it will fail (or do something unexpected). You should use preg_quote() to avoid this.
So... uh, they're completely different expressions. Not so much "what's the difference between them" as "what's the same about them". Answer: not much.
The first matches a string that consists, from start to end, of characters not in $inv, optionally followed by a single period.
The second matches any string containing at least one character from $inv.
So they are closer to opposites than to equivalents; on top of that, the first allows for a possible . at the end.
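The two patterns, with preg_quote() applied as suggested above, might be sketched as follows. The helper names are hypothetical, / is used as the delimiter instead of -, and $inv is assumed to be non-empty:

```php
<?php
// Hypothetical helper for the first pattern: true when $name uses none of
// the characters in $inv, apart from an optional trailing period.
function passesBlacklist(string $name, string $inv): bool
{
    // preg_quote() escapes regex-special characters, including ], ^, - and \,
    // which are the ones that matter inside a character class.
    $class = preg_quote($inv, '/');
    return preg_match('/^[^' . $class . ']+\.?$/', $name) === 1;
}

// Hypothetical helper for the second pattern: true when $name contains at
// least one character from $inv.
function containsAny(string $name, string $inv): bool
{
    return preg_match('/[' . preg_quote($inv, '/') . ']/', $name) === 1;
}

var_dump(passesBlacklist('def.', 'abc.')); // bool(true)
var_dump(passesBlacklist('de.s', 'abc.')); // bool(false), dot in the middle
var_dump(containsAny('deb', 'abc.'));      // bool(true), contains b
var_dump(passesBlacklist('a-b', ']^-'));   // bool(false), special characters are safe once quoted
```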

What am I doing wrong with this Regex?

To be honest, I don't really get RegEx. So I'm completely oblivious as to where I'm going wrong here.
I'm looking for a RegEx that accepts alphanumeric characters only (and underscores, it's for usernames). I've searched around here and found numerous example RegExes that I've tried and not one of them has worked.
Among others, which I've mostly gotten from answers around here, I've tried
^[a-zA-Z0-9_]*$
/[^a-z_\-0-9]/i
/^\w+$/
To match these, I've tried (with each of the regexes)
if(preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(!preg_match("/^\w+$/", $username)) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 1) {
//don't accept
}
and
if(preg_match("/^\w+$/", $username) == 0) {
//don't accept
}
etc...
Each and every single time it's accepting special characters (I've tried &, $, ^, and %).
What exactly am I doing wrong here? Is it the format of the RegEx? Is it how I'm asking it to check?
Also, what exactly is the return type I get if it's found special characters? (i.e One I don't want to accept)
preg_match returns 1 if the input string matched the pattern you gave, and 0 if it didn't.
You want each character in your usernames to be alphanumeric (plus underscore). One PCRE way of expressing that is with a character class inside square brackets, like this one: [A-Za-z0-9_]. There are a couple of ways you could use this basic class to do what you want.
One way is a "negative" search: try to match a non-alphanumeric character, and if you do, then the test fails. For this, we just add a caret at the front of the character class. This means we're matching any character not in that set.
So, the following pattern matches "any non-alphanumeric, non-underscore character." Here, a match means an invalid username:
if (preg_match('/[^A-Za-z0-9_]/', $username)) {
// invalid username
}
Or, you could do the opposite kind of match, where you give a pattern for a valid username and check if you match that. This time, we don't change the character class itself at all, but we add the + quantifier after it, meaning we're matching one or more of the "good" characters.
Additionally, we wrap the ^ and $ beginning-and-end-of-string anchors around our pattern. (It's a little confusing, but a caret at the beginning of a pattern has a completely different meaning from a caret at the beginning of a character class, within the brackets.)
The end result is a pattern that means: "1 or more alphanumeric characters (plus underscore) and nothing else." A match on this one means a valid username:
if (preg_match('/^[A-Za-z0-9_]+$/', $username)) {
// valid username
}
if (preg_match("/^[a-zA-Z0-9_]+$/", $username) === 1) {
// Good username
}
else {
// Bad username
}
The strict equality operator (===) compares what preg_match() returns to the integer 1 rather than to a boolean. If it returns 0, there was no match; if it returns boolean false, an error occurred. Check out the page for preg_match for more information: http://php.net/manual/en/function.preg-match.php
Per the PHP manual, preg_match will return 0 if it can't find a successful match with your regex and FALSE if an error occurs. So if you want to make sure you're testing for 0, and not something which merely evaluates to false, you should use the === operator.
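The three possible return values can be told apart with strict comparisons, roughly like this. The function name describeMatch is invented for the sketch, and the @ operator is used only to keep the warning from the deliberately broken pattern out of the output:

```php
<?php
// Hypothetical helper that labels the three outcomes of preg_match().
function describeMatch(string $pattern, string $subject): string
{
    $result = @preg_match($pattern, $subject);
    if ($result === false) {
        return 'error';            // e.g. the pattern could not be compiled
    }
    return $result === 1 ? 'match' : 'no match';
}

echo describeMatch('/^[a-zA-Z0-9_]+$/', 'mike_999'), "\n"; // match
echo describeMatch('/^[a-zA-Z0-9_]+$/', 'alice!'), "\n";   // no match
// Missing delimiters: ^ is taken as the opening delimiter and never closed.
echo describeMatch('^[a-zA-Z0-9_]+$', 'mike_999'), "\n";   // error
```

With a loose check such as if (!preg_match(...)), the last two cases would be indistinguishable, since both 0 and false are falsy.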
If you only want letters and underscores you can use a character class of [a-z_] which specifies that the range of characters for a to z and the _ symbol will match. And the + following the class specifies that you want one or multiple of the same. The ^ says the pattern must match from the beginning of the text, while the $ says that the pattern must match up until the end of the text.
if (preg_match("/^[a-z_]+$/i", $text_variable) === 1) {
//"A match was found.";
} else {
//"A match was not found.";
}
Regex is very easy to understand if you get the basics :)
I'll try to explain to you all three expressions you tried:
With ^[a-zA-Z0-9_]*$ string will be matched which:
^ // from the beginning...
[a-zA-Z0-9_] // contains only characters a-z or A-Z or 0-9 or _ sign
* // and has 0 or more of such characters
$ // to the end
Matched strings for example:
(empty string - since you told 0 or more characters)
abc09
fidjwieofoj4fio3j4fiojrfioj3ijfo
000000000000000000000
__________
and_many_many_more_as_long_as_they_contain_alpha_characters_and___sign
With /[^a-z_\-0-9]/i string will be matched which:
[^a-z_\-0-9]
// ^ means "the opposite" so that subset describes characters
// which are not included in it
// (are not a-z or _ sign, or - dash sign, or 0-9 numbers)
i modifier
// stands for case insensitive, all letters are treated as lowercase
You did not add * or ? or + after the subset so basically you are looking for one character only, and because you did not put your regexp between ^ and $ signs, this expression will finally match any text which contains at least one character which is not A-Z or a-z, or _ sign, or - dash sign, or 0-9 numbers.
Matched strings for example:
!
a>a
A<9
ffffffffff.dflskfdfd
00000,
]]]]]]]]]]]]]]]]]]
and so-on
With /^\w+$/ string will be matched which:
^ // from the beginning
\w // contains only characters a-z or A-Z or 0-9 or _ sign
+ // and the string must be at least 1 character long
$ // to the end
Probably the most useful regular expression. Remember, \w is just an alias for [a-zA-Z0-9_]. This regexp will match only whole string which is not empty and contains only alphanumeric characters and _ sign.
Matched strings for example:
mike
alice
bob10
0000000000
1111
9
php
user_example
Hope that helps. For you, the most useful expression imvho to match valid usernames would be /^\w{3,15}$/, as it matches any string which is 3 to 15 characters long and consists only of alphanumeric characters and the underscore sign (a-z A-Z 0-9 _).
Try this:
<?php
function isValidUsername($username)
{
return preg_match('/^\w{3,15}$/', $username) == 1;
}
echo isValidUsername('mike999') ? 'Yes' : 'No' , '<br>';
echo isValidUsername('alice!') ? 'Yes' : 'No';
Cheers.

Check the value entered by the user with a regular expression in PHP

In my PHP program, I want the user to enter nothing but letters of the alphabet,
like "dgdh" or "sgfdgdfg", but not numbers or anything else like "7657", "gfd(-" or "fd54".
I tested this function but it doesn't cover all cases:
preg_match("#[^0-9]#",$_POST['chaine'])
How can I achieve that? Thank you in advance.
The simplest can be
preg_match('/^[a-z]+$/i', $_POST['chaine'])
The i modifier makes the match case-insensitive. The + ensures that at least one letter is entered; you can change it to * if you want to allow an empty string. The anchors ^ and $ enforce that the whole string consists of nothing but letters (they represent the beginning and the end of the string, respectively).
If you want to allow whitespace, you can use:
Whitespace only at the beginning or end of string:
preg_match('/^\s*[a-z]+\s*$/i', $_POST['chaine'])
Anywhere:
preg_match('/^[a-z][\sa-z]*$/i', $_POST['chaine']) // must start with a letter
Only the space character is allowed but not other whitespace:
preg_match('/^[a-z][ a-z]*$/i', $_POST['chaine'])
Two things. Firstly, you match non-digit characters. That is obviously not the same as letter characters. So you could simply use [a-zA-Z] or [a-z] and the case-insensitive modifier instead.
Secondly you only try to find one of those characters. You don't assert that the whole string is composed of these. So use this instead:
preg_match("#^[a-z]*$#i",$_POST['chaine'])
Only match letters (no whitespace):
preg_match("#^[a-zA-Z]+$#",$_POST['chaine'])
Explanation:
^ # matches the start of the string
[a-zA-Z] # matches any letter (upper or lowercase)
+ # means the previous pattern must match at least once
$ # matches the end of the string
With whitespace:
preg_match("#^[a-zA-Z ]+$#",$_POST['chaine'])
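Pulling the variants above together, a small sketch might look like this. The function name isLettersOnly and its $allowSpaces flag are assumptions made for the example:

```php
<?php
// Hypothetical helper: letters only, optionally allowing plain spaces
// after the first letter.
function isLettersOnly(string $value, bool $allowSpaces = false): bool
{
    $pattern = $allowSpaces ? '/^[a-z][ a-z]*$/i' : '/^[a-z]+$/i';
    return preg_match($pattern, $value) === 1;
}

var_dump(isLettersOnly('dgdh'));          // bool(true)
var_dump(isLettersOnly('7657'));          // bool(false)
var_dump(isLettersOnly('gfd(-'));         // bool(false)
var_dump(isLettersOnly('fd54'));          // bool(false)
var_dump(isLettersOnly('abc def'));       // bool(false), spaces not allowed by default
var_dump(isLettersOnly('abc def', true)); // bool(true)
```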
