I would like to have a regular expression that matches:
Arabic letters.
English alphanumeric.
3 Spaces maximum.
4 Underscores maximum.
Any order.
I tried various solutions but couldn't solve it.
Here is what I have now:
preg_match('#^([^\W_]*\s){0,3}[^\W_]*$#', $username)
The above expression allows:
3 spaces maximum
English alphanumerics
No underscore allowed
You can check if your Regex flavour supports this \p{Arabic} or \p{InArabic}.
Also experiment with mb_ereg_match() function: http://si2.php.net/manual/en/function.mb-ereg-match.php
If that doesn't work, the only other option is to write all Arabic characters into the expression explicitly. Messy, but it does the job.
Since you are using PHP, you can first list all Arabic characters in a string variable and then add that variable to the regex, to keep the code manageable.
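Assuming \p{Arabic} is available (PCRE with the u modifier), the limits from the question could be sketched with two negative lookaheads in front of a single character class. The function name isValidUsername is hypothetical, and "spaces" is taken to mean the plain space character. Note that \p{Arabic} covers the whole Arabic script, not only letters.

```php
<?php
// Hypothetical helper; the name and the exact limits are assumptions
// taken from the question (max 3 spaces, max 4 underscores, any order).
function isValidUsername(string $username): bool
{
    $pattern = '/^'
        . '(?!(?:[^ ]* ){4})'          // reject if a 4th space exists
        . '(?!(?:[^_]*_){5})'          // reject if a 5th underscore exists
        . '[\p{Arabic}a-zA-Z0-9 _]+'   // allowed characters, in any order
        . '\z/u';                      // \z = very end of string; u = UTF-8 mode
    return preg_match($pattern, $username) === 1;
}

var_dump(isValidUsername('a b c d'));   // three spaces
var_dump(isValidUsername('a b c d e')); // four spaces
```

The lookaheads only count; the character class after them does the actual filtering, which is why the order of characters does not matter.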
I don't know about Arabic characters, but the following regexp should match the others:
([a-zA-Z0-9]{1,})\s{0,3}_{0,4}
This will match
(Alphanumeric)(0-3 spaces)(0-4 underscores)
If there are more than 4 underscores, the last ones will be omitted
If there are more than 3 spaces then the part after the 3 spaces will be ignored.
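To see what "omitted" and "ignored" mean here: the pattern is unanchored, so preg_match still succeeds on over-long input and only captures a prefix of it. A small sketch, with sample inputs that are purely illustrative:

```php
<?php
$pattern = '/([a-zA-Z0-9]{1,})\s{0,3}_{0,4}/';

// Five spaces in the input: the match stops after the third one.
preg_match($pattern, 'abc     def', $spaces);

// Six underscores in the input: the match stops after the fourth one.
preg_match($pattern, 'abc______', $underscores);

var_dump($spaces[0], $underscores[0]);
```

Because the call still returns 1 in both cases, this pattern cannot reject such input on its own; it would need anchors (^ and $) to be used for validation.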
EDIT:
For Arabic letters: first declare a string containing all Arabic letters,
so you'll have
$arabic='all_arabic_letters';
Then your regexp string will be
$regex='/[' . $arabic . ']{1,}([a-zA-Z0-9]{1,})\s{0,3}_{0,4}/u'; // delimiters are required; u enables UTF-8 matching
And match it as follows:
preg_match($regex, $username);
Related
I am trying to create a regex to filter only alphabets or numbers from English and Japanese languages. This is what I have tried,
preg_match('/(?![\n\r])[\x00-\x1F\x80-\xFF][^\x4e00-\x9fa0)]/u', $value)
But I am not getting the desired result. What might I be doing wrong?
You should use Unicode character properties.
Also you may have a look on this website which contains some other regex examples http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
Updated the character list based on @Álvaro González's notice about the three alphabets.
This regex should do what you expect:
preg_match('/[\p{L}\p{N}\p{Katakana}\p{Hiragana}\p{Han}]+/u', $value)
\p{L} will match any letter, \p{N} any number and \p{Katakana} will match any Katakana char etc...
You may need to add word delimiters into the accepted characters if you are not matching single words
The following check is true when the line is not made up entirely of Japanese characters (hiragana, katakana, or kanji):
if(!preg_match('/^[\x{3041}-\x{3096}\x{30a1}-\x{30fc}\x{4e00}-\x{9faf}]+$/u', $line)){
// ...
}
You can find more in the document:
https://www.w3.org/International/questions/qa-forms-utf-8.html
I have got a forum site and I am currently working on the final piece, the registration form and I want to validate the username. It should only contain Arabic and English alphanumerics and a maximum of one space between words.
I've got the English alphanumeric part working, but not the Arabic part or the double spaces.
I am using the preg_match() function to match the username input with the RegEX.
What I currently have:
!preg_match('/\p{Arabic}/', $username) && !preg_match('/^[A-Za-z0-9]$/')
// This is currently inside an if statement, so if neither matches, the result is false.
You should put the unicode properties inside your regular regex because this can all be done with 1 regex. You also need to quantify that character class otherwise you only allow 1 character. This regex should do it.
^[\p{Arabic}a-zA-Z\p{N}]+\h?[\p{N}\p{Arabic}a-zA-Z]*$
Use the u modifier in PHP so unicode works as expected.
PHP Usage:
preg_match('/^[\p{Arabic}a-zA-Z\p{N}]+\h?[\p{N}\p{Arabic}a-zA-Z]*$/u', $string);
Demo: https://regex101.com/r/fsRchS/2/
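The pattern above permits at most one space in the whole name. If "a maximum of one space between words" is meant to allow several words, a possible variation repeats the "space plus word" group. This is a sketch with a hypothetical function name, not the pattern from the demo:

```php
<?php
// Hypothetical helper: words of Arabic/English letters or digits,
// separated by exactly one horizontal whitespace character.
function isValidForumName(string $name): bool
{
    $word = '[\p{Arabic}a-zA-Z\p{N}]+';
    $pattern = '/^' . $word . '(?:\h' . $word . ')*\z/u';
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidForumName('abc def ghi')); // single spaces between words
var_dump(isValidForumName('abc  def'));    // double space
```

Leading and trailing spaces are rejected too, because every space must be followed by another word.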
While there have been many questions regarding the non-english characters regex issue I have not been able to find a working answer. Moreover, there does not seem to be any simple PHP library which would help me to filter non-english input.
Could you please suggest a regular expression which would allow
all english alphabet characters (abc...)
all non-english alphabet characters (šýüčá...)
spaces
case insensitive
in validation as well as sanitization. Essentially, I want either preg_match to return false when the input contains anything else than the 4 points above or preg_replace to get rid of everything except these 4 categories.
I was able to create
'/^((\p{L}\p{M}*)|(\p{Cc})|(\p{Z}))+$/ui' from http://www.regular-expressions.info/unicode.html. This regular expression works well when validating input but not when sanitizing it.
EDIT:
The user enters 'český [jazyk]' as input. Using '/^[\p{L}\p{Zs}]+$/u' in preg_match, the script determines that the string contains disallowed characters (in this case '[' and ']'). Next I would like to use preg_replace to delete those unwanted characters. What regular expression should I pass to preg_replace to match all characters that are not allowed by the regular expression stated above?
I think all you need is a character class like:
^[\p{L}\p{Zs}]+$
It means: The whole string (or line, with (?m) option) can only contain Unicode letters or spaces.
Have a look at the demo.
$re = "/^[\\p{L}\\p{Zs}]+$/um";
$str = "all english alphabet characters (abc...)\nall non-english alphabet characters (šýüčá...)\nspace s\nšýüčá šýüčá šýüčá ddd\nšýüčá eee 4\ncase insensitive";
preg_match_all($re, $str, $matches);
To remove all symbols that are not Unicode letters or spaces, use this code:
$re = "/[^\\p{L}\\p{Zs}]+/u";
$str = "český [jazyk]";
echo preg_replace($re, "", $str);
The output of the sample program:
český jazyk
I'm trying to create a pattern in PHP that matches 2 or more upper case characters in a string.
I've tried the following, but it only matches 2 or more upper case characters in a row, not the entire string:
preg_match('/[A-Z]{2,}/', $string);
For example, the string "aBcDe" or "Red Apple" should return true.
You just have to allow other characters between your uppercase letters:
^(?:.*?\p{Lu}){2}
Demo
I used \p{Lu} here to include Unicode characters as well. If you don't want that just use [A-Z] instead like you did in your pattern.
This simply means:
^ from the start of the string
(?: group:
.*? match anything, but as few chars as possible
\p{Lu} match an uppercase letter
){2} ... two times
If all you need to do is identify that a string contains at least 2 uppercase characters then you can use the following:
[A-Z].*?[A-Z]
Try it here.
If you need to identify the specific uppercase characters in the string then things get more complicated.
UPDATE: As Lucas mentioned, you need a different regex if you want unicode support.
\p{Lu}.*?\p{Lu}
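If the individual uppercase letters are needed as well, one option is to collect them with preg_match_all and count the result. A minimal sketch, assuming UTF-8 input; the function name is hypothetical:

```php
<?php
// Hypothetical helper: returns every uppercase letter found in $text.
function uppercaseLetters(string $text): array
{
    // \p{Lu} matches any Unicode uppercase letter; u enables UTF-8 mode.
    preg_match_all('/\p{Lu}/u', $text, $matches);
    // Fall back to an empty list if matching failed (e.g. invalid UTF-8).
    return $matches[0] ?? [];
}

$found = uppercaseLetters('Red Apple');
var_dump(count($found) >= 2);
```

Swapping \p{Lu} for [A-Z] (and dropping the u modifier) gives the ASCII-only version.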
^.*[A-Z].*[A-Z].*$
A simple pattern stating the same would do. See demo.
https://regex101.com/r/pT4tM5/23
[A-Z].*[A-Z]
is about as simple as it gets - match an uppercase followed by anything repeated any number of times followed by any other uppercase letter.
If you need to match the whole line/string that has at least 2 upper case letters, you can also use
^(?=(?:.*[A-Z]){2}).+$
Demo here.
I am writing a simple form pre-validator. I am looking for a PHP regular expression to match any irregular symbols. That is, non-Latin and non-numeric characters, and characters which are not included in normal English punctuation (basically, match characters not in the second through fourth columns of this ASCII table). Any regexp wizards out there who could help me out?
The second to fourth column can be translated into a simple regexp:
/[^ -~]/
matches any characters not between space and tilde.
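As a sketch of how this could sit in a pre-validator (the function name is hypothetical), the check below flags any byte outside the printable ASCII range from space to tilde. It deliberately omits the u modifier, so multibyte UTF-8 characters are flagged as well:

```php
<?php
// Hypothetical helper: true if $input has anything outside ' ' .. '~'.
function hasIrregularSymbols(string $input): bool
{
    return preg_match('/[^ -~]/', $input) === 1;
}

var_dump(hasIrregularSymbols('Hello, world! 123'));
var_dump(hasIrregularSymbols("tab\there"));
```

Control characters such as tab and newline sit below the space character, so they count as irregular here too.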
Answer is over here.
The long and short of it is this PCRE, which matches any non-ASCII character: [^\x00-\x7F]