Preg Match for Letters in Arabic and Numbers in English - php

I want to let my users enter a record like this: letters in Arabic and numbers in English (عسا871).
I run a preg_match on the input first. I tried it this way:
if(!empty($_POST['number']))
if(!preg_match('/^[0-9]|[\p{Arabic}]+$/',$_POST["number"]))
die("Number Data Modification");
It still does not accept Arabic. What is the right way to do it?

The ^[0-9]|[\p{Arabic}]+$ regex accepts strings that have 1 ASCII digit at the start of the string (^[0-9]) or (|) one or more Arabic characters at the end ([\p{Arabic}]+$). It also lacks the u modifier, which PHP needs in order to treat the pattern and the subject as UTF-8.
Most probably you want to allow any string consisting only of ASCII digits and Arabic letters:
'/^[0-9\p{Arabic}]+$/u'
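A minimal sketch of how the corrected pattern behaves, assuming the input is UTF-8. The helper name is_valid_number is hypothetical and not part of the original question. Keep in mind that \p{Arabic} is a script property rather than a letters-only class.

```php
<?php
// Hypothetical helper: true when $value consists only of ASCII digits
// and characters from the Arabic script.
function is_valid_number(string $value): bool
{
    // The u modifier makes PCRE read the pattern and the subject as UTF-8,
    // so \p{Arabic} is tested against whole characters, not single bytes.
    return preg_match('/^[0-9\p{Arabic}]+$/u', $value) === 1;
}

var_dump(is_valid_number('عسا871')); // bool(true)
var_dump(is_valid_number('abc871')); // bool(false)
var_dump(is_valid_number(''));       // bool(false)
```

In PCRE, $ also matches just before a trailing newline, so the D modifier or \z may be worth adding if that matters for the form field.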

Related

PHP regex to accept Japanese and english languages

I am trying to create a regex that matches only letters or numbers from English and Japanese. This is what I have tried:
preg_match('/(?![\n\r])[\x00-\x1F\x80-\xFF][^\x4e00-\x9fa0)]/u', $value)
But I am not getting the desired result. What might I be doing wrong?
You should use Unicode character properties.
You may also want to look at this website, which contains some other regex examples: http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
Updated the character list based on @Álvaro González's note about the three alphabets.
This regex should do what you expect:
preg_match('/[\p{L}\p{N}\p{Katakana}\p{Hiragana}\p{Han}]+/u', $value)
\p{L} will match any letter, \p{N} any number, \p{Katakana} any Katakana character, and so on.
You may need to add word delimiters to the accepted characters if you are not matching single words.
The following check succeeds when the line does not consist entirely of Japanese characters (hiragana, katakana, and kanji):
if(!preg_match('/^[\x{3041}-\x{3096}\x{30a1}-\x{30fc}\x{4e00}-\x{9faf}]+$/u', $line)){
// ...
}
You can find more in the document:
https://www.w3.org/International/questions/qa-forms-utf-8.html

Regex - Match only unicode alphabet not numbers

I'm using PHP, and trying to write a regular expression that matches any alphabet in any language but not numbers.
I've tried /\p{L}+/, but it matches Unicode letters and numbers too. I'm checking against Arabic and English. English numbers don't pass, which is expected, but Arabic numbers pass, which is not.
Is there another regular expression that matches only letters in any language?
The regex engine needs to know that the target string is a Unicode string (to avoid interpretation errors). To do that you can use the u modifier, which has two functions:
it extends classical shorthand character classes like \w and \d to Unicode characters (and not only ASCII characters)
it forces the string to be seen as a Unicode string
So you can use: /\pL+/u
Note that in your particular case the first behavior is not needed; you can switch on only the second behavior with: /(*UTF8)\pL+/ ((*UTF8) must be placed at the very beginning of the pattern)
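A small sketch of the u modifier in action, assuming UTF-8 input. The ^ and $ anchors are an addition for this example, so that the whole string has to consist of letters. The unanchored /\pL+/u succeeds as soon as one letter is present.

```php
<?php
// With the u modifier, \pL is tested against whole UTF-8 characters.
$pattern = '/^\pL+$/u';

var_dump(preg_match($pattern, 'مرحبا')); // int(1): Arabic letters
var_dump(preg_match($pattern, 'hello')); // int(1): Latin letters
var_dump(preg_match($pattern, '١٢٣'));   // int(0): Arabic-Indic digits are numbers
var_dump(preg_match($pattern, '123'));   // int(0): ASCII digits
```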

Regex for word characters in any language

Testing the PHP regex engine, I see that it considers only [0-9A-Za-z_] to be word characters. Letters of non-ASCII languages, such as Hebrew, are not matched as word characters with [\w]. Are there any PHP or Perl regex escape sequences which will match a letter in any language? I could add ranges for each alphabet that I expect to be used, but users will always surprise us with unexpected languages!
Note that this is not for security filtering but rather for tokenizing a text.
Try [\pL_] - see the reference at
http://php.net/manual/en/regexp.reference.unicode.php
Try \p{L}. It matches any kind of letter from any language, and it works on its own if you don't want to use a character class [].
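For the tokenizing use case, a sketch along these lines may help. The tokenize helper is hypothetical, and the u modifier is an assumption needed for \pL to work on UTF-8 text.

```php
<?php
// Hypothetical helper: return the runs of letters and underscores in $text.
function tokenize(string $text): array
{
    // [\pL_]+ is the character class suggested above; u enables UTF-8 mode.
    preg_match_all('/[\pL_]+/u', $text, $matches);
    return $matches[0];
}

// Yields three tokens: the Hebrew word, "hello_world", and the Arabic word.
print_r(tokenize('שלום, hello_world! مرحبا'));
```

Digits are not part of [\pL_], so adding \pN to the class would be needed if numbers should count as word characters too.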

Match Arabic/English Alphanumeric using Regex

I would like to have a regular expression that matches:
Arabic letters.
English alphanumeric.
3 Spaces maximum.
4 Underscores maximum.
Any order.
I tried various solutions but couldn't solve it.
Here is what I have now:
preg_match('#^([^\W_]*\s){0,3}[^\W_]*$#', $username)
The above expression allows:
3 spaces maximum
English alphanumerics
No underscore allowed
You can check if your Regex flavour supports this \p{Arabic} or \p{InArabic}.
Also experiment with mb_ereg_match() function: http://si2.php.net/manual/en/function.mb-ereg-match.php
If that doesn't work, there is no other option than explicitly writing all Arabic characters into the expression. Messy, but it does the job.
Since you are using PHP, you can first list all Arabic characters in a string variable and then add that variable to the regex, to keep the code manageable.
I don't know about Arabic characters, but the following regexp should match the others:
([a-zA-Z0-9]{1,})\s{0,3}_{0,4}
This will match
(Alphanumeric)(0-3 spaces)(0-4 underscores)
If there are more than 4 underscores, the last ones will be omitted
If there are more than 3 spaces then the part after the 3 spaces will be ignored.
EDIT:
For Arabic letters: first declare a string containing all Arabic letters,
so you'll have
$arabic='all_arabic_letters';
Then your regexp string will be (with / delimiters and the u modifier, which preg_match needs for a UTF-8 pattern)
$regex='/[' . $arabic . ']{1,}([a-zA-Z0-9]{1,})\s{0,3}_{0,4}/u';
And match it as follows:
preg_match($regex, $username);
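One possible way to enforce all of the rules from the question in any order is to separate the character check from the counting. This is only a sketch: is_valid_username is a hypothetical helper, and it assumes \p{Arabic} is available (PCRE with Unicode support) and that "spaces" means the plain space character.

```php
<?php
// Hypothetical helper for the rules in the question: Arabic-script
// characters, ASCII letters and digits, at most 3 spaces, at most
// 4 underscores, in any order.
function is_valid_username(string $username): bool
{
    // Step 1: only allowed characters, and at least one of them.
    if (preg_match('/^[\p{Arabic}A-Za-z0-9 _]+$/u', $username) !== 1) {
        return false;
    }
    // Step 2: count the limited characters. Space and underscore are
    // single ASCII bytes, so the byte-based substr_count is safe on UTF-8.
    return substr_count($username, ' ') <= 3
        && substr_count($username, '_') <= 4;
}

var_dump(is_valid_username('محمد_ali 99')); // bool(true)
var_dump(is_valid_username('a b c d e'));   // bool(false): 4 spaces
var_dump(is_valid_username('a_b_c_d_e_f')); // bool(false): 5 underscores
```

Counting outside the regex keeps the pattern short. The same limits could also be expressed with lookaheads inside a single pattern.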

PHP and regexp to accept only Greek characters in form

I need a regular expression that accepts only Greek chars and spaces for a name field in my form (PHP).
I've tried several findings on the net but no luck. Any help will be appreciated.
Full letters solution, with accented letters (the u modifier makes PHP read the pattern as UTF-8):
/^[A-Za-zΑ-Ωα-ωίϊΐόάέύϋΰήώ]+$/u
I'm not too current on the Greek alphabet, but if you wanted to do this with the Roman alphabet, you would do this:
/^[a-zA-Z\s]*$/
So to do this with Greek, you replace a and z with the first and last letters of the Greek alphabet. If I remember right, those are α and ω. So the code would be (with the u modifier, so that multi-byte characters work in the ranges):
/^[α-ωΑ-Ω\s]*$/u
The other answers here didn't work for me. Greek Unicode characters are included in the following two blocks
Greek and Coptic U+0370 to U+03FF (normal Greek letters)
Greek Extended U+1F00 to U+1FFF (Greek letters with diacritics)
The following regex matches whole Greek words:
[\u0370-\u03ff\u1f00-\u1fff]+
I will let the reader translate that to whichever programming language format they may be using.
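For PHP, a translation might look like the sketch below. PCRE writes code points as \x{HHHH} rather than \uHHHH and needs the u modifier. The ^ and $ anchors are an addition so that the whole string must be Greek, and the "\u{1F00}" string escape assumes PHP 7 or later.

```php
<?php
// Greek and Coptic: U+0370 to U+03FF; Greek Extended: U+1F00 to U+1FFF.
$greek_word = '/^[\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]+$/u';

var_dump(preg_match($greek_word, 'Ελλάδα'));      // int(1)
var_dump(preg_match($greek_word, "\u{1F00}ρχή")); // int(1): first character is U+1F00
var_dump(preg_match($greek_word, 'Hellas'));      // int(0)
```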
See also
Unicode charts
To elaborate on leo pal's answer, an even more complete regex, which also accepts capital accented Greek characters, would be the following (the trailing u enables UTF-8 mode in PHP):
/^[α-ωΑ-ΩίϊΐόάέύϋΰήώΊΪΌΆΈΎΫΉΏ\s]+$/u
With this, you get:
α-ω - lowercase letters
Α-Ω - uppercase letters
ίϊΐόάέύϋΰήώ - lowercase letters with all (modern) diacritics
ΊΪΌΆΈΎΫΉΏ - uppercase letters with all (modern) diacritics
\s - any whitespace character
Note: The above does not take into account ancient Greek diacritics (ᾶ, ἀ, etc.).
What worked for me was /^[a-zA-Z\p{Greek}]+$/u
source: http://php.net/manual/fr/function.preg-match.php#105324
Greek and Coptic characters are in the U+0370 to U+03FF range. Be aware that a space, a hyphen (-), a period (.), etc. are not.
Just noticed at the excellent site https://regexr.com/ that the range of Greek characters runs from "Ά" (decimal 902, U+0386) to "ώ" (decimal 974, U+03CE), with 3 code points that are not alphabet characters: "·" (903, U+0387) and the unprintable code points 907 (U+038B) and 909 (U+038D).
So a range [Ά-ώ] will cover 99.99% of the cases!
(?![·\u038B\u038D])[Ά-ώ] covers 100%. (I haven't checked this in PHP, though, where the escapes would be written \x{038B} and \x{038D} with the u modifier.)
The modern Greek alphabet is in the U+0386 to U+03CE range in Unicode.
So the regex you need to accept only Greek characters is:
$regex_gr = '/^[\x{0386}-\x{03CE}]+$/u';
or (with spaces)
$regex_gr_with_spaces = '/^[\x{0386}-\x{03CE}\s]+$/u';
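A quick check of the two patterns, as a sketch with made-up sample names:

```php
<?php
$regex_gr = '/^[\x{0386}-\x{03CE}]+$/u';
$regex_gr_with_spaces = '/^[\x{0386}-\x{03CE}\s]+$/u';

$first = 'Γιώργος';             // sample first name
$full  = 'Γιώργος Παπαδόπουλος'; // sample full name, contains a space

var_dump(preg_match($regex_gr, $first));               // int(1)
var_dump(preg_match($regex_gr, $full));                // int(0): space not allowed
var_dump(preg_match($regex_gr_with_spaces, $full));    // int(1)
var_dump(preg_match($regex_gr_with_spaces, 'George')); // int(0)
```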
