This is my regular expression code:
"onlyLetterSp": {
"regex": /^[a-zA-Z\ \']+$/,
"alertText": "* Letters only"
}
How can I change this to allow English characters as well as Japanese?
I found this link:
http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
There are apparently a few different character sets for different types of Japanese.
Hiragana for example is:
[\x3041-\x3096]
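You are probably looking for the u regex modifier, which stands for Unicode. In PHP (PCRE) it lets shorthand classes like \w (a Perl-style shorthand, not a POSIX class) match Unicode "word" characters. In JavaScript, \w stays ASCII-only even with the u flag, so use Unicode property escapes such as \p{L} instead.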
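For the JavaScript pattern in the question, here is a minimal sketch. It assumes an engine with Unicode property escapes (ES2018 or later) and that the validation plugin accepts a regex carrying the u flag. The names lettersOnly and isLettersOnly are made up for illustration.

```javascript
// Hypothetical helper: English letters, space and apostrophe (as in the
// original pattern), plus the three Japanese scripts.
// Property escapes such as \p{Script=Han} only work with the `u` flag.
const lettersOnly =
  /^[a-zA-Z '\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}]+$/u;

function isLettersOnly(value) {
  return lettersOnly.test(value);
}

console.log(isLettersOnly("John O'Neil")); // true
console.log(isLettersOnly("たなか"));       // true (hiragana)
console.log(isLettersOnly("John3"));       // false (digit)
```

One caveat: the katakana prolonged sound mark ー (U+30FC) has the script value Common, so it is not covered by \p{Script=Katakana}. Add \u30FC to the class if names such as サーバー should pass.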
I am trying to create a regex to filter only alphabets or numbers from English and Japanese languages. This is what I have tried,
preg_match('/(?![\n\r])[\x00-\x1F\x80-\xFF][^\x4e00-\x9fa0)]/u', $value)
But I am not getting the desired result. What might I be doing wrong?
You should use Unicode character properties.
You may also want to look at this website, which contains some other regex examples: http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
Updated the character list based on @Álvaro González's note about the three alphabets.
This regex should do what you expect:
preg_match('/[\p{L}\p{N}\p{Katakana}\p{Hiragana}\p{Han}]+/u', $value)
\p{L} matches any letter, \p{N} matches any number, \p{Katakana} matches any Katakana character, and so on.
You may need to add word delimiters to the accepted characters if you are not matching single words.
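Note that the pattern above is unanchored, so it succeeds as soon as the value contains one allowed character. To validate the whole value, anchor it with ^ and $. The same property escapes exist in JavaScript, so here is a small sketch of the anchored form (the helper name is hypothetical, and an ES2018+ engine is assumed):

```javascript
// Hypothetical helper: the whole string must consist of Unicode letters
// (\p{L}) and numbers (\p{N}); spaces and punctuation are rejected.
const lettersAndNumbers = /^[\p{L}\p{N}]+$/u;

function isAlphanumeric(value) {
  return lettersAndNumbers.test(value);
}

console.log(isAlphanumeric("abc123"));  // true
console.log(isAlphanumeric("日本語"));   // true
console.log(isAlphanumeric("abc def")); // false (space is not allowed)
```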
The following regex checks that the line is not written entirely in Japanese characters (hiragana, katakana, and kanji):
if(!preg_match('/^[\x{3041}-\x{3096}\x{30a1}-\x{30fc}\x{4e00}-\x{9faf}]+$/u', $line)){
// ...
}
You can find more in the document:
https://www.w3.org/International/questions/qa-forms-utf-8.html
I am trying to let users enter special characters from other languages such as Spanish or French. I originally had this:
"/[^A-Za-z0-9\.\_\- ]/i"
and then changed it to
"/[^\p{L}\p{N}\.\_\-\(\) ]/i"
but it still doesn't work. Letters such as "ñ" should be allowed. Thanks.
Revision:
I found that adding (*UTF8) at the beginning helps solve the problem, so I'm using the following code: "/(*UTF8)[^\p{L}A-Za-z0-9._ -]/i"
Revision:
After looking at the answers I decided to use "/[^\p{Xwd}. -]/u". Thanks. (It works even with Chinese characters.)
For Latin-script languages you can use the \p{Latin} script property:
/[^\p{Latin}0-9._ -]/u
But if you want all other letters and digits:
/[^\p{Xwd}. -]/u
The "u" modifier indicates that the pattern and the subject must be read as Unicode (UTF-8) strings.
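For comparison, here is a rough JavaScript sketch of the same "find a disallowed character" idea. JavaScript has no \p{Xwd}, so letters, numbers and underscore stand in for it (an approximation on my part). The constant and function names are invented for the example.

```javascript
// Each pattern matches ONE character that is not allowed.
// If test() returns false, the input contains only allowed characters.
const notLatin = /[^\p{Script=Latin}0-9._ -]/u;
// Stand-in for PCRE's \p{Xwd}: letters, numbers and underscore (assumption).
const notWordLike = /[^\p{L}\p{N}_. -]/u;

function hasDisallowed(value, pattern) {
  return pattern.test(value);
}

console.log(hasDisallowed("señor_1.0", notLatin)); // false
console.log(hasDisallowed("señor!", notLatin));    // true ("!" is rejected)
console.log(hasDisallowed("中文", notWordLike));    // false
```

The hyphen sits last in each class so it is read as a literal and not as a range operator.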
You could also look into specifying a Unicode range, i.e. [\w\u00C0-\u024F.-]+ to include Latin extended letters (PCRE has no \u escape, so in PHP the range would be written \x{00C0}-\x{024F} together with the u modifier). But it's hard to try and restrict characters to such a broad subset; what about Chinese, Vietnamese, etc.? I'm with Dagon on this one – best to allow anything.
I need a regular expression that accepts only Greek chars and spaces for a name field in my form (PHP).
I've tried several findings on the net but no luck. Any help will be appreciated.
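Full-letter solution, including accented letters (it also accepts Latin A-Z and a-z; in PHP the u modifier is needed so the Greek letters are read as UTF-8 characters):
/^[A-Za-zΑ-Ωα-ωίϊΐόάέύϋΰήώ]+$/u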
I'm not too current on the Greek alphabet, but if you wanted to do this with the Roman alphabet, you would do this:
/^[a-zA-Z\s]*$/
So to do this with Greek, you replace a and z with the first and last letters of the Greek alphabet. If I remember right, those are α and ω. In PHP, also add the u modifier so the pattern is read as UTF-8. So the code would be:
/^[α-ωΑ-Ω\s]*$/u
The other answers here didn't work for me. Greek Unicode characters are included in the following two blocks
Greek and Coptic U+0370 to U+03FF (normal Greek letters)
Greek Extended U+1F00 to U+1FFF (Greek letters with diacritics)
The following regex matches whole Greek words:
[\u0370-\u03ff\u1f00-\u1fff]+
I will let the reader translate that to whichever programming language format they may be using.
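As one possible translation, here is a JavaScript sketch that pulls the Greek words out of a mixed string using the two blocks listed above. The function name is hypothetical.

```javascript
// Greek and Coptic (U+0370 to U+03FF) plus Greek Extended (U+1F00 to U+1FFF).
// The g flag makes match() return every run of Greek characters.
const greekWord = /[\u0370-\u03ff\u1f00-\u1fff]+/g;

function findGreekWords(text) {
  // match() returns null when nothing is found, so fall back to [].
  return text.match(greekWord) || [];
}

console.log(findGreekWords("The word λόγος means 'word'"));
```

In PHP the same ranges would be written with \x{0370}-\x{03ff}\x{1f00}-\x{1fff} and the u modifier.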
See also
Unicode charts
To elaborate on leo pal's answer, an even more complete regex, which would accept even capital accented Greek characters, would be the following:
/^[α-ωΑ-ΩίϊΐόάέύϋΰήώΊΪΌΆΈΎΫΉΏ\s]+$/u
With this, you get:
α-ω - lowercase letters
Α-Ω - uppercase letters
ίϊΐόάέύϋΰήώ - lowercase letters with all (modern) diacritics
ΊΪΌΆΈΎΫΉΏ - uppercase letters with all (modern) diacritics
\s - any whitespace character
u - (PHP) read the pattern and the subject as UTF-8
Note: The above does not take into account ancient Greek diacritics (ᾶ, ἀ, etc.).
What worked for me was /^[a-zA-Z\p{Greek}]+$/u
source: http://php.net/manual/fr/function.preg-match.php#105324
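The script property is also available in JavaScript as \p{Script=Greek}. Here is a sketch for the original "Greek characters and spaces only" requirement, assuming an ES2018+ engine; the helper name is made up.

```javascript
// Hypothetical helper: only Greek-script characters and plain spaces.
const greekName = /^[\p{Script=Greek} ]+$/u;

function isGreekName(value) {
  return greekName.test(value);
}

console.log(isGreekName("Μαρία Παππά")); // true
console.log(isGreekName("Maria"));       // false
```

If the input might arrive in decomposed form (base letter plus combining accent), normalize it first with value.normalize("NFC"), because combining marks are not in the Greek script.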
Greek and Coptic characters in Unicode are in the U+0370 - U+03FF range. Be aware that a space, a hyphen, a period, etc. are not in that range.
I just noticed on the excellent site https://regexr.com/ that the modern Greek characters run from "Ά" (decimal 902, U+0386) to "ώ" (974, U+03CE), with 3 code points that are not alphabet characters: "·" (903, U+0387) and the unassigned code points 907 (U+038B) and 909 (U+038D).
So a range [Ά-ώ] will cover 99.99% of the cases!
(?![·\u038B\u038D])[Ά-ώ] covers 100%. (I haven't checked this in PHP, though.)
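Here is a sketch of that lookahead idea as a whole-string check in JavaScript. Note that the code points are written in hex (U+038B and U+038D are decimal 907 and 909). The helper name is invented.

```javascript
// Range U+0386 (Ά) to U+03CE (ώ), with a negative lookahead that rejects
// U+0387 (·) and the unassigned code points U+038B and U+038D.
const modernGreek = /^(?:(?![\u0387\u038B\u038D])[\u0386-\u03CE])+$/;

function isModernGreek(value) {
  return modernGreek.test(value);
}

console.log(isModernGreek("Άλφα"));  // true
console.log(isModernGreek("alpha")); // false
```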
The modern Greek alphabet in Unicode is in the U+0386 - U+03CE range.
So the regex you need to accept Greek-only characters is:
$regex_gr = '/^[\x{0386}-\x{03CE}]+$/u';
or (with spaces)
$regex_gr_with_spaces = '/^[\x{0386}-\x{03CE}\s]+$/u';
How do I validate a username using a regexp?
For English letters, numbers and spaces I am using:
/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){4,14}[^\s]$/
How can I add Arabic letters?
Well, that would depend on whether your characters are coming in as cp1256 or Unicode. If it's Unicode, you can use a range such as #([\x{0600}-\x{06FF}]+\s*) in your expression.
You would use Unicode regexes and match all letters:
/\pL+/u
(one or more letters)
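To keep the length and spacing rules from the question, one option is to add the Arabic block to the existing character classes. This is only a sketch in JavaScript syntax (in PHP the range would be \x{0600}-\x{06FF} with the u modifier), and the function name is hypothetical.

```javascript
// The question's pattern with the Arabic block U+0600 to U+06FF added:
// a letter first, then 4 to 14 letters/digits or single spaces,
// then one final non-space character.
const username =
  /^[a-zA-Z\u0600-\u06FF]([a-zA-Z0-9\u0600-\u06FF]|\s(?!\s)){4,14}[^\s]$/;

function isValidUsername(value) {
  return username.test(value);
}

console.log(isValidUsername("ahmed ali"));  // true
console.log(isValidUsername("ahmed  ali")); // false (two spaces in a row)
```

The whole block also contains Arabic punctuation and Arabic-Indic digits, so narrow the range if only letters should be accepted.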
I need a function or a regular expression to validate strings which contain alpha characters (including French ones), minus sign (-), dot (.) and space (excluding everything else)
Thanks
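/^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/iu
The /i modifier makes the match case-insensitive, which keeps the class short; /u (assuming UTF-8 input in PHP) makes the accented letters be treated as whole characters. If you don't want to allow empty strings, change * to +.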
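The same explicit list works as a JavaScript regex literal, where /i alone already folds the case of the accented letters. A small sketch, with a made-up helper name:

```javascript
// Explicit list of French letters plus space, dot and hyphen.
// The i flag lets É, Ç, Œ, etc. match their lowercase entries.
const frenchText = /^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/i;

function isFrenchText(value) {
  return frenchText.test(value);
}

console.log(isFrenchText("Jean-François")); // true
console.log(isFrenchText("ÉLODIE"));        // true
console.log(isFrenchText("Jean3"));         // false
```

Because of the *, an empty string also passes.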
Simplified solution:
/^[a-zA-ZÀ-ÿ-. ]*$/
Explanation:
^ Start of the string
[ ... ]* Zero or more of the following:
a-z lowercase letters
A-Z uppercase letters
À-ÿ accented lowercase and uppercase letters, including letters with an umlaut (this range, U+00C0 to U+00FF, also contains × and ÷, and it does not contain œ, Œ or Ÿ)
- dashes
. periods
" " spaces
$ End of the string
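To see what the À-ÿ range really lets through, here is a quick JavaScript check. The range is written with escapes so the code points are explicit, and the hyphen is moved to the end of the class; the helper name is invented.

```javascript
// \u00C0-\u00FF is the same range as À-ÿ in the simplified pattern.
const simplified = /^[a-zA-Z\u00C0-\u00FF. -]*$/;

function passesSimplified(value) {
  return simplified.test(value);
}

console.log(passesSimplified("François")); // true
console.log(passesSimplified("×÷"));       // true, although these are not letters
console.log(passesSimplified("cœur"));     // false, œ (U+0153) is outside the range
```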
Try:
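/^[\p{L}. -]*$/u
This says:
^ Start of the string
[ ... ]* Zero or more of the following:
\p{L} Unicode letter characters
. periods
" " spaces
- dashes (placed last so the hyphen cannot be read as a range; PHP 7.3+ may reject a hyphen directly after an escape such as \p{L})
$ End of the string
/u Enable Unicode mode in PHP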
The character class I've been using is the following:
[\wÀ-Üà-øoù-ÿŒœ]. This covers a slightly larger character set than only French, but excludes a large portion of Eastern European and Scandinavian diacriticals and letters that are not relevant to French. I find this a decent compromise between brevity and exclusivity.
To match/validate complete sentences, I use this expression:
[\w\s.,!?:;&#%’'"()«»À-Üà-øoù-ÿŒœ], which includes punctuation and French style quotation marks.
Simply use the following code:
/[\u00C0-\u017F]/
This regex passes the entire French text of Cyrano de Bergerac:
(you will need to remove markup language characters)
http://www.gutenberg.org/files/1256/1256-8.txt
^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+
All French and Spanish accents
/^[a-zA-ZàâäæáãåāèéêëęėēîïīįíìôōøõóòöœùûüūúÿçćčńñÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ .-]*$/
This might suit:
/^[ a-zA-Z\xBF-\xFF\.-]+$/
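It lets a few extra characters in, like ÷, but it handles quite a few of the accented characters. Note that \xBF-\xFF matches single bytes, so this assumes ISO-8859-1 (Latin-1) input rather than UTF-8.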
/[A-Za-z-\.\s]/u should work. The /u switch enables UTF-8 handling.