PHP - check if a string is a multibyte alphanumeric character - php

I need to find out if a string contains exactly one alphanumeric character. The obvious solution would be to check the length and ASCII code (A-Z, a-z, 0-9) - but the problem is that I'm working with UTF-8 strings and accented letters like á, ř, č etc.
Is there a simple way to check if a UTF-8 character is alphanumeric (latin alphabet letter, possibly accented, or a number)?

This is easily done with a regular expression:
$count = preg_match_all('/\w/u', $string);
if ($count === 1) {
    echo "One alphanumeric character found";
}
\w matches any "word" character: letters, digits, and the underscore. The u modifier makes PHP treat the pattern and subject as UTF-8 and makes \w Unicode-aware, so accented letters are included.
If matching underscores is a problem, you could use the POSIX class [[:alnum:]] instead (note the double brackets: [:alnum:] is only valid inside a bracket expression).
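The count above is also 1 for a longer string such as "a!", so if the goal is to confirm that the whole string is a single alphanumeric character, the pattern can be anchored instead. A minimal sketch, assuming UTF-8 input; the helper name isSingleAlnum is hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: true only when $string consists of exactly one
// Unicode letter or number (one code point).
function isSingleAlnum(string $string): bool
{
    // \A and \z anchor to the absolute start and end of the subject.
    // \p{L} is any letter, \p{N} any number; u enables UTF-8 mode.
    // preg_match returns false on invalid UTF-8, which also yields false here.
    return preg_match('/\A[\p{L}\p{N}]\z/u', $string) === 1;
}
```

With this sketch, 'á', 'ř' and '7' are accepted, while '_', 'ab', 'a!' and the empty string are rejected. A letter written as a base letter followed by a separate combining mark is two code points and would be rejected as well.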

Related

Russian character and alphanumeric converter

How can I remove non-alphanumeric characters from a string in PHP while keeping Russian characters like ч and г?
I tried to translate the string and then clean it with preg_replace, but this would remove the Russian characters.
You can do it with preg_replace. You just have to build a regular expression that matches what you desire.
If I understood your question correctly, this should work:
preg_replace('/[^\p{L}\p{N}\s]/u', '', $string);
Brief explanation:
[^...] matches any character that is not in the set.
\p{L} matches any letter (including the Cyrillic alphabet).
\p{N} matches any number.
\s matches any whitespace character.
/u adds Unicode support.
If you only want to match letters from the Cyrillic alphabet, you may want to use \p{Cyrillic} instead of \p{L}.
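To make the difference between \p{L} and \p{Cyrillic} concrete, here is a small sketch. The function names are hypothetical, and the input is assumed to be valid UTF-8:

```php
<?php
// Hypothetical helper: drop everything except letters, numbers and whitespace.
function keepLettersAndNumbers(string $string): string
{
    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to ''.
    return preg_replace('/[^\p{L}\p{N}\s]/u', '', $string) ?? '';
}

// Hypothetical variant: keep only Cyrillic letters, numbers and whitespace,
// so Latin letters are removed too.
function keepCyrillicAndNumbers(string $string): string
{
    return preg_replace('/[^\p{Cyrillic}\p{N}\s]/u', '', $string) ?? '';
}

echo keepLettersAndNumbers('Привет, мир! 123'), "\n"; // Привет мир 123
echo keepCyrillicAndNumbers('abc чг 1'), "\n";        // " чг 1" (the leading space is kept)
```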

How can I detect letters in other languages (not English) in the string?

Here is my code:
function isValid($string) {
    return strlen($string) >= 6 &&
        strlen($string) <= 40 &&
        preg_match("/\d/", $string) &&
        preg_match("/[a-zA-Z]/", $string);
}
// Negative test cases
assert(!isValid("hello"));
// Positive test cases
assert(isValid("abcde2"));
As you see, my script validates a string based on 4 conditions. Now I'm trying to develop this one:
preg_match("/[a-zA-Z]/", $string)
This condition returns true just for English letters. How can I also accept other letters like ا ب ث چ?
Note: Those characters aren't Arabic, they are Persian.
To match either an English or Persian letter, you may use
preg_match('/[\x{0600}-\x{06FF}A-Z]/iu', $string)
The \x{0600}-\x{06FF} range is the Unicode Arabic block, which contains the Persian letters (along with Arabic-script digits and punctuation in the same block). The A-Z range matches all ASCII letters, both upper- and lowercase, because the case-insensitive /i modifier is used. The /u modifier is necessary since you are working with Unicode characters.
Also, use mb_strlen rather than strlen when checking the length of a Unicode string: it counts Unicode code points, whereas strlen counts bytes.
As for "Your password should be containing at least a letter (that letter can be in any language)", you need to use
preg_match('/\p{L}/u', $string)
or
preg_match('/\p{L}\p{M}*+/u', $string)
             ^^^^^^^^^^^^
that will match any letter (even the one with a diacritic after it). \p{L} matches any base Unicode letter, and \p{M}*+ will possessively match 0+ diacritics after it. If the match value is not used, /\p{L}/u will suffice for the check.
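Putting these suggestions together, the question's isValid() could look roughly like the sketch below. It assumes the mbstring extension is available and the input is UTF-8, and it keeps the question's ASCII-only \d check for digits:

```php
<?php
// Sketch of the question's isValid() with mb_strlen and \p{L} applied.
// Assumes ext-mbstring is loaded and $string is UTF-8.
function isValid(string $string): bool
{
    $length = mb_strlen($string, 'UTF-8'); // code points, not bytes

    return $length >= 6
        && $length <= 40
        && preg_match('/\d/', $string) === 1      // at least one ASCII digit
        && preg_match('/\p{L}/u', $string) === 1; // at least one letter, any script
}
```

For example, "سلا12" is five code points but eight bytes, so a strlen-based length check would accept it while this version rejects it as too short.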

PHP Only allow alphanumerical Latin lowercase characters and dash

I am using preg_match to validate an input text field that will be used for a subdomain name. I only want to allow alphanumerical Latin lowercase characters and dash, no spaces or anything else.
Will the following be enough?
if (preg_match('/^[a-zA-Z0-9 \-]+$/', $instance)) {
    return true;
}
The regex you currently have allows a-z, A-Z, 0-9, a space and - (the \ is just for escaping).
So your regex would be something like this (only allowing lowercase letters, digits and -):
if (preg_match('/^[a-z0-9\-]+$/', $instance)) {
    return true;
}
The expression you have - ^[a-zA-Z0-9 \-]+$ - currently matches both upper- and lowercase Latin letters, Arabic digits, a space and a literal hyphen.
You say you do not want to allow any spaces or uppercase letters.
In this case, all you need to do is remove them from the character class:
/^[a-z0-9-]+$/
The regex breakdown:
^ - the beginning of a string
[a-z0-9-]+ - 1 or more characters that are either lowercase Latin letters (a-z), digits (0-9), or a hyphen (a - at the end of a character class is treated as a literal in almost all regex flavors)
$ - end of string.
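One detail worth knowing for validation: in PCRE, a plain $ (without the D modifier) also matches just before a trailing newline, so "abc\n" passes /^[a-z0-9-]+$/. A minimal sketch that uses \A and \z to anchor to the absolute start and end instead; the helper name isValidSubdomain is hypothetical:

```php
<?php
// Hypothetical helper for the subdomain check discussed above.
function isValidSubdomain(string $instance): bool
{
    // \A ... \z anchor to the absolute start and end of the subject,
    // so a trailing newline is rejected along with spaces and uppercase.
    return preg_match('/\A[a-z0-9-]+\z/', $instance) === 1;
}
```

With this sketch, 'my-site1' is accepted, while 'My-Site', 'my site', "abc\n" and the empty string are rejected.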

preg_match some characters

I need a regex for my preg_match(); it should allow the following characters:
String can contain only letters, numbers, and the following punctuation marks:
full stop (.)
comma (,)
dash (-)
underscore (_)
I have no idea how it can be done with a regex, but I think there is a way!
^[\p{L}\p{N}.,_-]*$
will match a string that contains only (Unicode) letters, digits or the "special characters" you mentioned. [...] is a character class, meaning "one of the characters contained here". You'll need to use the /u Unicode modifier for this to work:
preg_match('/^[\p{L}\p{N}.,_-]*$/u', $mystring);
If you only care about ASCII letters, it's easier:
^[\w.,-]*$
or, in PHP:
preg_match('/^[\w.,-]*$/', $mystring);
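As a quick illustration of the Unicode pattern, here is a minimal sketch wrapped in a hypothetical helper named isAllowedText. It uses \A and \z rather than ^ and $ so that a trailing newline is not accepted:

```php
<?php
// Hypothetical helper: true when $mystring contains only Unicode letters,
// numbers, full stop, comma, dash and underscore.
function isAllowedText(string $mystring): bool
{
    // The hyphen is placed last in the class so it is taken literally.
    return preg_match('/\A[\p{L}\p{N}.,_-]*\z/u', $mystring) === 1;
}
```

Because of the * quantifier, the empty string is accepted too; replacing * with + would require at least one character. A string such as 'héllo_wörld-1.0,x' passes, while 'hello world' and 'a!b' do not.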

Regex which validate for all caps

I want a regular expression in PHP that checks whether a string is in all caps.
If all the letters in the given string are capitals, irrespective of numbers and other characters, it should match.
Since you want to match other characters too, look for lowercase letters instead of uppercase letters. If found, return false. (Or use tdammers' suggestion of a negative character class.)
return !preg_match('/[a-z]/', $str);
You can also skip regex and just compare strtoupper($str) with the original string, this leaves digits and symbols intact:
// Strict comparison: with ==, numeric strings such as "1e3" and "1E3" compare equal.
return strtoupper($str) === $str;
Both don't account for multi-byte strings though; for that, you could try adding a u modifier to the regex and using mb_strtoupper() respectively (I've not tested either — could someone more experienced with Unicode verify this?).
if (preg_match('/^[^\p{Ll}]*$/u', $subject)) {
    # String doesn't contain any lowercase characters
} else {
    # String contains at least one lowercase character
}
\p{Ll} matches a Unicode lowercase letter; [^\p{Ll}] therefore matches any character that is not a lowercase letter.
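The same idea can be written as a search for a single lowercase letter, which mirrors the ASCII-only !preg_match('/[a-z]/', $str) approach. A minimal sketch, assuming UTF-8 input; the helper name hasNoLowercase is hypothetical:

```php
<?php
// Hypothetical helper: true when $str contains no Unicode lowercase letter.
function hasNoLowercase(string $str): bool
{
    // preg_match returns 0 when nothing matches and false on error
    // (for example invalid UTF-8), so compare against 0 explicitly.
    return preg_match('/\p{Ll}/u', $str) === 0;
}
```

Here 'ÀBC 123!' counts as all caps, while 'ÀbC' and 'ABC é' do not. The ASCII-only pattern /[a-z]/ would not notice the é in the last example.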
Something like this maybe:
'/^[^a-z]*$/'
The trick is to use a negated character class: this one matches all characters that are not lowercase ASCII letters. Note that accented letters aren't checked.
