PHP and regexp to accept only Greek characters in form - php

I need a regular expression that accepts only Greek chars and spaces for a name field in my form (PHP).
I've tried several patterns I found online, but with no luck. Any help will be appreciated.

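Full letters solution, with accented letters (the u modifier makes PHP read the pattern and the input as UTF-8; note that A-Za-z also lets Latin letters through):
/^[A-Za-zΑ-Ωα-ωίϊΐόάέύϋΰήώ]+$/u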

I'm not too current on the Greek alphabet, but if you wanted to do this with the Roman alphabet, you would do this:
/^[a-zA-Z\s]*$/
So to do this with Greek, you replace a and z with the first and last letters of the Greek alphabet. If I remember right, those are α and ω. Because these are multi-byte characters in UTF-8, PHP also needs the u modifier. So the code would be:
/^[α-ωΑ-Ω\s]*$/u
Note that these ranges do not include accented letters such as ά or ώ.
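Why the u modifier matters here can be sketched in a few lines. Without it, PCRE compiles the class byte by byte, so the Greek ranges turn into byte ranges that do not line up with whole characters. The snippet below is only an illustration; the variable names are made up, and the patterns are built from \u{...} escapes (PHP 7+) so that the result does not depend on the encoding of the source file.

```php
<?php
// Sketch only: the same character class compiled byte-wise and as UTF-8.
$class = "[\u{03B1}-\u{03C9}\u{0391}-\u{03A9}\\s]"; // i.e. [α-ωΑ-Ω\s]

$bytewise = '/^' . $class . '*$/';  // no modifier: class members are single bytes
$unicode  = '/^' . $class . '*$/u'; // u modifier: class members are code points

$subject = "\u{03C0}\u{03B1}"; // "πα"

// π is encoded as the bytes CF 80; 0x80 is outside the byte ranges that the
// byte-wise class ends up with, so only the /u version should accept it.
var_dump(preg_match($bytewise, $subject));
var_dump(preg_match($unicode, $subject));
```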

The other answers here didn't work for me. Greek Unicode characters are included in the following two blocks
Greek and Coptic U+0370 to U+03FF (normal Greek letters)
Greek Extended U+1F00 to U+1FFF (Greek letters with diacritics)
The following regex matches whole Greek words:
[\u0370-\u03ff\u1f00-\u1fff]+
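I will let the reader translate that to whichever programming language format they may be using. In PHP (PCRE), the \u0370 notation is not supported; the equivalent escape is \x{0370}, used together with the u modifier.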
See also
Unicode charts
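One possible PHP translation of the two-block range is sketched below. The function name is hypothetical, and the ^ and $ anchors are an assumption added here so that the whole string is validated rather than a single word being found.

```php
<?php
// Hypothetical helper; the name is illustrative and not from the answer above.
function contains_only_greek_blocks(string $text): bool
{
    // \x{...} is PCRE's code point escape and requires the u modifier.
    // U+0370-U+03FF: Greek and Coptic, U+1F00-U+1FFF: Greek Extended.
    return preg_match('/^[\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]+$/u', $text) === 1;
}

var_dump(contains_only_greek_blocks("\u{0395}\u{03BB}\u{03AD}\u{03BD}\u{03B7}")); // "Ελένη"
var_dump(contains_only_greek_blocks('Eleni'));
```

Spaces are not part of either block, so a name field would still need \s or a literal space added to the class.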

To elaborate on leo pal's answer, an even more complete regex, which would accept even capital accented Greek characters, would be the following (with the u modifier so that PHP reads the pattern as UTF-8):
/^[α-ωΑ-ΩίϊΐόάέύϋΰήώΊΪΌΆΈΎΫΉΏ\s]+$/u
With this, you get:
α-ω - lowercase letters
Α-Ω - uppercase letters
ίϊΐόάέύϋΰήώ - lowercase letters with all (modern) diacritics
ΊΪΌΆΈΎΫΉΏ - uppercase letters with all (modern) diacritics
\s - any whitespace character
Note: The above does not take into account ancient Greek diacritics (ᾶ, ἀ, etc.).

What worked for me was /^[a-zA-Z\p{Greek}]+$/u
source: http://php.net/manual/fr/function.preg-match.php#105324
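For the name field in the question, the \p{Greek} approach could be wrapped roughly as follows. This is a sketch under assumptions: the function name is invented, Latin letters are left out because the question asks for Greek only, and words are required to be separated by single spaces.

```php
<?php
// Hypothetical validator for a "Greek letters and spaces" name field.
function is_greek_name(string $name): bool
{
    // \p{Greek} matches any character of the Greek script (needs the u modifier).
    // D makes $ match only at the very end, so a trailing newline is rejected.
    return preg_match('/^\p{Greek}+(?: \p{Greek}+)*$/uD', $name) === 1;
}

var_dump(is_greek_name('Γιώργος Παπαδόπουλος'));
var_dump(is_greek_name('George'));
```

Requiring a letter on both sides of each space also rejects empty input and names that start or end with a space.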

The Greek and Coptic block in Unicode spans U+0370 to U+03FF. Be aware that a space, a hyphen, a period, etc. are not in that range, so add them to the character class explicitly if you need them.

I just noticed on the excellent site https://regexr.com/ that the Greek characters range from "Ά" (902, U+0386) to "ώ" (974, U+03CE), with a few code points in between that are not alphabet characters: "·" (903, U+0387) and the unassigned code points 907 (U+038B), 909 (U+038D) and 930 (U+03A2).
So a range [Ά-ώ] will cover 99.99% of the cases!
(?![·\u038B\u038D\u03A2])[Ά-ώ] covers 100%. Note that the \u escapes take hexadecimal code points, not the decimal values. (I haven't checked this in PHP, though.)
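The answer leaves the PHP side unchecked, so here is a rough sketch of how the lookahead idea might be written for preg_match. It assumes hexadecimal code points throughout: U+0387 (the middle dot) plus the unassigned U+038B, U+038D and U+03A2 are excluded from the U+0386 to U+03CE range. The function name is hypothetical.

```php
<?php
// Hypothetical helper: letters from "Ά" (U+0386) to "ώ" (U+03CE) only.
function is_modern_greek_letters(string $text): bool
{
    // The negative lookahead rejects the non-letter code points in the range
    // before the range itself consumes a character.
    $pattern = '/^(?:(?![\x{0387}\x{038B}\x{038D}\x{03A2}])[\x{0386}-\x{03CE}])+$/u';
    return preg_match($pattern, $text) === 1;
}

var_dump(is_modern_greek_letters("\u{0395}\u{03BB}\u{03AD}\u{03BD}\u{03B7}")); // "Ελένη"
var_dump(is_modern_greek_letters("\u{0395}\u{0387}\u{03B7}"));                 // contains "·"
```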

The modern Greek alphabet is in the Unicode range U+0386 - U+03CE.
So the regex you need to accept only Greek characters is:
$regex_gr = '/^[\x{0386}-\x{03CE}]+$/u';
or (with spaces)
$regex_gr_with_spaces = '/^[\x{0386}-\x{03CE}\s]+$/u';

Related

REGEX - how to do diacritic-insensitive in preg_match?

Is there a way to use preg_match (e.g. perhaps via a flag) to do diacritic-insensitive matches?
For example, say I'd like it to match:
cafe
café
I know I can do a regex like this: caf[eé]. This regex will work as long as I don't come across any other diacritic variations of e, like: ê è ë ē ĕ ě ẽ ė ẹ ę ẻ.
Of course, I could just list all of those diacritic variations in my regex, such as caf[eêéèëēĕěẽėẹęẻ]. And as long as I don't miss anything, I'll be good. I would just need to do this for all the letters in the alphabet, which is a tedious and prone-to-error solution.
It is not an option for me to find and replace the diacritic letters in the subject with their non-diacritic counterparts. I need to preserve the subject as-is.
The ideal solution for me is to have regex to be diacritic-insensitive. With the example above, I want my regex to simply be: cafe. Is this possible?
If you're open to matching a letter from any language (which includes characters with diacritics), then you could use \p{L} or \p{Letter} as shown here: https://regex101.com/r/UBGQI6/3
According to regular-expressions.info,
\p{L} or \p{Letter}: any kind of letter from any language.
\p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an uppercase variant.
\p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a lowercase variant.
\p{Lt} or \p{Titlecase_Letter}: a letter that appears at the start of a word when only the first letter of the word is capitalized.
\p{L&} or \p{Cased_Letter}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
\p{Lm} or \p{Modifier_Letter}: a special character that is used like a letter.
\p{Lo} or \p{Other_Letter}: a letter or ideograph that does not have lowercase and uppercase variants.
The only catch is that \p{L} matches any letter at all, so you can't tie the match to variants of one particular base letter (such as E, È, É), and you can't limit your search to English letters.
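Applied to the cafe example, the \p{L} suggestion might look like the sketch below. The function name is made up, and the accented input is written with a \u{...} escape so that it is the precomposed é (U+00E9); a decomposed e followed by a combining accent would need extra handling that is not shown here.

```php
<?php
// Hypothetical check: "caf" followed by exactly one letter of any script.
function matches_cafe_variant(string $word): bool
{
    return preg_match('/^caf\p{L}$/u', $word) === 1;
}

var_dump(matches_cafe_variant('cafe'));
var_dump(matches_cafe_variant("caf\u{00E9}")); // café
var_dump(matches_cafe_variant('cafx'));        // also accepted: the catch in practice
```

The last call shows the limitation: any letter is accepted in the final position, not just variants of e.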

Preg Match for Letters in Arabic and Numbers in English

I want to allow my users to enter such record - Letters in Arabic and Numbers in english. (عسا871)
I validate the input with preg_match; this is what I tried:
if(!empty($_POST['number']))
if(!preg_match('/^[0-9]|[\p{Arabic}]+$/',$_POST["number"]))
die("Number Data Modification");
It still does not accept Arabic. What is the right way to do it?
The ^[0-9]|[\p{Arabic}]+$ regex accepts strings that have 1 ASCII digit at the start of string (^[0-9]) or (|) Arabic letters at the end ([\p{Arabic}]+).
Most probably you want to allow any string consisting of either ASCII digits or Arabic letters:
'/^[0-9\p{Arabic}]+$/u'
See the regex demo
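The difference between the two patterns can be checked with a short sketch. The variable names are illustrative, and the Arabic sample from the question is written with \u{...} escapes (PHP 7+) to keep the file encoding out of the picture.

```php
<?php
// Sketch: the pattern from the question next to the corrected one.
$original = '/^[0-9]|[\p{Arabic}]+$/'; // "one digit at the start" OR "Arabic at the end"
$fixed    = '/^[0-9\p{Arabic}]+$/u';   // whole string: ASCII digits or Arabic letters

$sample = "\u{0639}\u{0633}\u{0627}871"; // the example record from the question

var_dump(preg_match($fixed, $sample));      // the intended input
var_dump(preg_match($fixed, '871<b>'));     // stray markup
var_dump(preg_match($original, '8<b>'));    // the alternation only looks at the first digit
```

Because the alternation in the original splits the pattern in two, a string only has to begin with a single digit to pass, whatever follows.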

Regular expression that allows letters (like "ñ") from any language

I'm trying to let users use special characters from other languages such as Spanish or French. I originally had this:
"/[^A-Za-z0-9\.\_\- ]/i"
and then changed it to
"/[^\p{L}\p{N}\.\_\-\(\) ]/i"
but it still doesn't work. Letters such as "ñ" should be allowed. Thanks.
Revision:
I found that adding (*UTF8) at the beginning helps solve the problem. So I'm using the following code: "/(*UTF8)[^\p{L}A-Za-z0-9\.\_\- ]/i"
Revision:
After looking at the answers I decided to use: "/[^\p{Xwd}. -]/u". Thanks. (It works even with Chinese characters.)
For languages written in the Latin script you can use the \p{Latin} character class:
/[^\p{Latin}0-9._ -]/u
But if you want all other letters and digits:
/[^\p{Xwd}. -]/u
The "u" modifier indicates that the pattern and the subject string must be read as Unicode (UTF-8) strings.
You could also look into specifying a Unicode range, e.g. [\w\x{00C0}-\x{024F}.-]+ with the u modifier (PCRE does not support the \u00C0 notation), to include Latin extended letters. But it's hard to try and restrict characters to such a broad subset; what about Chinese, Vietnamese, etc.? I'm with Dagon on this one: best to allow anything.
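A minimal sketch of the \p{Xwd} blacklist in use follows. The function name is hypothetical; the pattern is the one from the answer, which flags any character that is not a letter, a digit, an underscore, a dot, a space or a hyphen.

```php
<?php
// Hypothetical helper around the /[^\p{Xwd}. -]/u pattern.
function has_disallowed_chars(string $input): bool
{
    // \p{Xwd} is PCRE's "Perl word" property: any letter or number, plus "_".
    return preg_match('/[^\p{Xwd}. -]/u', $input) === 1;
}

var_dump(has_disallowed_chars("ni\u{00F1}o_2.txt")); // niño_2.txt
var_dump(has_disallowed_chars('a/b'));
```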

Change RegEx to allow for both English & Japanese characters

This is my regular expression code:
"onlyLetterSp": {
"regex": /^[a-zA-Z\ \']+$/,
"alertText": "* Letters only"
}
How can I change this to allow English characters as well as Japanese?
I found this link:
http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
There are apparently a few different character sets for different types of Japanese.
Hiragana for example is:
[\u3041-\u3096]
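You are probably looking for the u regex modifier, which stands for Unicode. In PHP (PCRE), it lets shorthand classes like \w match "word" characters from any script and enables properties such as \p{Hiragana}. In JavaScript, which the snippet above appears to use, \w stays ASCII-only even with the u flag, so use \p{L} or explicit ranges such as \u3041-\u3096 there.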
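If the same rule also has to be enforced on the server in PHP (an assumption, since the question only shows the client-side rule), script properties avoid hard-coding code point ranges. The sketch below extends the original "letters, space and apostrophe" rule with the Hiragana, Katakana and Han scripts; the function name is invented for illustration.

```php
<?php
// Hypothetical server-side counterpart of the "onlyLetterSp" rule.
function is_english_or_japanese(string $text): bool
{
    // ASCII letters, kana, kanji, plus the space and apostrophe from the original rule.
    $pattern = '/^[a-zA-Z\p{Hiragana}\p{Katakana}\p{Han} \']+$/u';
    return preg_match($pattern, $text) === 1;
}

var_dump(is_english_or_japanese("O'Neil"));
var_dump(is_english_or_japanese("\u{3084}\u{307E}\u{3082}\u{3068}")); // hiragana
var_dump(is_english_or_japanese('abc123'));
```

Characters shared between scripts, such as Japanese punctuation, are not covered by this sketch and would need to be added explicitly.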

regular expression for French characters

I need a function or a regular expression to validate strings which contain alpha characters (including French ones), minus sign (-), dot (.) and space (excluding everything else)
Thanks
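/^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/iu
The /i modifier gives case-insensitivity to make things simpler, and /u is needed in PHP so that the accented letters are read as UTF-8 characters rather than as separate bytes. If you don't want to allow empty strings, change * to +.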
Simplified solution:
/^[a-zA-ZÀ-ÿ. -]*$/u
Explanation:
^ Start of the string
[ ... ]* Zero or more of the following:
a-z Lowercase letters
A-Z Uppercase letters
À-ÿ Accented lowercase and uppercase letters (U+00C0 to U+00FF); the range also includes × and ÷
. Periods
(space) Spaces
- Dashes (placed last so that the hyphen is read literally)
$ End of the string
/u Treat the pattern and the subject as UTF-8 in PHP
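The À-ÿ range covers U+00C0 to U+00FF, which contains the multiplication sign × (U+00D7) and the division sign ÷ (U+00F7) alongside the accented letters. The sketch below compares that range with a split range that skips the two signs; the variable names are illustrative, and the patterns use \x{...} escapes so the example does not depend on the file encoding.

```php
<?php
// Sketch: full Latin-1 letter range versus a range that skips × and ÷.
$loose = '/^[a-zA-Z\x{C0}-\x{FF}. -]*$/u';
$tight = '/^[a-zA-Z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{FF}. -]*$/u';

$name = "Ren\u{00E9}e";    // Renée
$math = "a \u{00D7} b";    // a × b

var_dump(preg_match($loose, $name), preg_match($loose, $math));
var_dump(preg_match($tight, $name), preg_match($tight, $math));
```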
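Try:
/^[\p{L}. -]*$/u
This says:
^ Start of the string
[ ... ]* Zero or more of the following:
\p{L} Unicode letter characters
. periods
(space) spaces
- dashes (placed last; a hyphen directly after \p{L} can be rejected as an invalid range by newer PCRE versions)
$ End of the string
/u Enable Unicode mode in PHP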
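Wrapped in a function, the \p{L} approach could look like the sketch below. The function name is hypothetical, and the hyphen is placed at the end of the class so that it is always read as a literal character.

```php
<?php
// Hypothetical validator: letters of any language, dot, space and hyphen.
function is_valid_french_text(string $text): bool
{
    // * also accepts the empty string; use + to require at least one character.
    return preg_match('/^[\p{L}. -]*$/u', $text) === 1;
}

var_dump(is_valid_french_text("Fran\u{00E7}ois-Ren\u{00E9} de St. Malo"));
var_dump(is_valid_french_text('Jean2'));
```

Since \p{L} accepts letters from every script, this is broader than French alone; that may or may not be acceptable for a given form.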
The character class I've been using is the following:
[\wÀ-Üà-øoù-ÿŒœ]. This covers a slightly larger character set than only French, but excludes a large portion of Eastern European and Scandinavian diacriticals and letters that are not relevant to French. I find this a decent compromise between brevity and exclusivity.
To match/validate complete sentences, I use this expression:
[\w\s.,!?:;&#%’'"()«»À-Üà-øoù-ÿŒœ], which includes punctuation and French style quotation marks.
Simply use the following code (the \u notation is JavaScript syntax; in PHP write /[\x{00C0}-\x{017F}]/u instead):
/[\u00C0-\u017F]/
This regex passes through the entire French text of Cyrano de Bergerac
(you will need to remove markup characters first):
http://www.gutenberg.org/files/1256/1256-8.txt
^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+
All French and Spanish accents (in PHP, add the u modifier after the closing slash):
/^[a-zA-ZàâäæáãåāèéêëęėēîïīįíìôōøõóòöœùûüūúÿçćčńñÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ .-]*$/
This might suit:
/^[ a-zA-Z\xBF-\xFF\.-]+$/
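It lets a few extra chars in, like ÷, but it handles quite a few of the accented characters. Note that \xBF-\xFF are single bytes, so this works for ISO-8859-1 (Latin-1) input, not for UTF-8 strings.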
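/[A-Za-z-\.\s]/u was also suggested; the /u switch is for UTF-8 encoding. As written, though, it is unanchored and contains no accented letters, so it only checks that the string has at least one ASCII letter, hyphen, dot or whitespace character somewhere.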
