Regex for Chinese / Japanese letters - php

Okay, so I already have this regular expression for the names allowed on my website.
However, I also wish to add other possible letters that names use.
Does someone have a good regex or know how I can make this more complete? I have searched for quite a while now, and I can't find anything that suits my needs.
This is my current regex for checking names:
$regex = "/^([a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð ,.'-])+$/";
if(preg_match($regex, $fullname)){
// do something
}

As Lucas Trzesniewski has mentioned, \p{L} already includes [a-zA-Z], so I have removed that range from the pattern.
Combining it with the other characters from your example, the pattern looks like this: /^[\p{L}\s,.'-]+$/u
^[...]+$ anchors the match to the whole string, and the + requires one or more characters from the class
\p{L} matches any Unicode letter, in any script
\s,.'- matches whitespace, comma, period, single quote, and hyphen
u is the PCRE_UTF8 modifier; it turns on additional functionality of PCRE that is incompatible with Perl, and pattern and subject strings are treated as UTF-8.
if(preg_match("/^[\p{L}\s,.'-]+$/u", "お元気ですか你好吗how are you你好嗎,.'-") === 1) {
echo "match";
}
else {
echo "no match";
}
// match
if(preg_match("/^[\p{L}\s,.'-]+$/u", "お元気ですか你好吗how are you你好_嗎-,.'") === 1) {
echo "match";
}
else {
echo "no match";
}
// no match, because there is an underscore in 你好_嗎
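If \p{L} is broader than you want, PCRE also accepts Unicode script names. The following is only a sketch under that assumption: the helper name isLatinOrCjkName is made up for illustration, and the choice of scripts (Latin, Han, Hiragana, Katakana) is a guess at what "Chinese / Japanese letters" should cover.

```php
<?php
// Hypothetical helper (not from the original post): accept only names written
// in Latin, Han, Hiragana or Katakana script, plus the punctuation used above.
function isLatinOrCjkName(string $name): bool
{
    $pattern = '/^[\p{Latin}\p{Han}\p{Hiragana}\p{Katakana}\s,.\'-]+$/u';
    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example when $name is not valid UTF-8), so compare against 1.
    return preg_match($pattern, $name) === 1;
}

var_dump(isLatinOrCjkName('山田 太郎'));      // bool(true)
var_dump(isLatinOrCjkName("O'Brien-Smith")); // bool(true)
var_dump(isLatinOrCjkName('Иван'));          // bool(false), Cyrillic is not listed
```

Listing scripts explicitly keeps the allowlist narrow, at the cost of rejecting names in any script you did not think of.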

Related

PHP Regex Strip Away All Emojis

I am trying to strip away all non-allowed characters from a string using regex. Here is my current php code
$input = "👮";
$pattern = "[a-zA-Z0-9_ !##$%^&*();\\\/|<>\"'+\-.,:?=]";
$message = preg_replace($pattern,"",$input);
if (empty($message)) {
echo "The string is empty";
}
else {
echo $message;
}
When I run this, the emoji gets printed out, but I want it to print "The string is empty".
When I put my regex code into http://regexr.com/ it shows that the emoji is not matching, but when I run the code it gets printed out. Any suggestions?
This pattern should do the trick :
$filteredString = preg_replace('/([^-\p{L}\x00-\x7F]+)/u', '', $rawString);
Some of these sequences are fairly uncommon, so let's explain them:
\p{L} matches any kind of letter from any language
\x00-\x7F matches any single character in the ASCII range (code points 0 to 127)
the u modifier turns on additional functionality of PCRE that is incompatible with Perl; pattern and subject strings are treated as UTF-8.
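As a quick check of that pattern, here is a minimal sketch with a made-up sample string. Letters from any script and every ASCII character are kept, and everything else (the emoji here) is removed.

```php
<?php
// Sample input chosen for illustration: ASCII text, an emoji, and Han letters.
$rawString = "Hello 👮 世界";

// Remove every run of characters that is neither a letter nor ASCII.
$filteredString = preg_replace('/([^-\p{L}\x00-\x7F]+)/u', '', $rawString);

// The two spaces around the emoji are ASCII, so both of them remain.
var_dump($filteredString === "Hello  世界"); // bool(true)
```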
Your pattern is incorrect. If you want to strip away all the characters that are not in the list provided, you have to use a negated character class: [^...]. Also, [ and ] are currently being treated as the delimiters, which means the pattern isn't seen as a character class at all.
The pattern should be:
$pattern = "~[^a-zA-Z0-9_ !##$%^&*();\\\/|<>\"'+.,:?=-]~";
This should now strip away the emoji and print your message.
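Put together, the allowlist approach might look like the sketch below. The helper name stripDisallowed is hypothetical and the character list is shortened for readability, so substitute your own list.

```php
<?php
// Hypothetical helper; the allowlist is a shortened, illustrative subset.
function stripDisallowed(string $input): string
{
    // ~ is the delimiter, [^...] negates the class, u treats the input as UTF-8.
    $result = preg_replace('~[^a-zA-Z0-9_ !.,:?=-]~u', '', $input);
    return $result ?? ''; // preg_replace() returns null on error
}

$message = stripDisallowed("👮");
// Compare with '' rather than using empty(), because empty("0") is also true.
echo $message === '' ? "The string is empty" : $message;
```

With the emoji as the only input, this prints "The string is empty".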

PHP regex to allow newline didn't work

PHP preg_match to accept new line
I want to pass every post/string through PHP's preg_match function. I want to accept all alphanumerics and some special characters. Help me edit my syntax to allow newlines, since users fill in a textarea and press Enter. The following syntax does not allow a new line.
Please let me know whether the following special characters are handled properly:
/_:,.?#;-
if (preg_match("/^[0-9a-zA-Z \/_:,.?#;-]+$/", $string)) {
echo 'good';
} else {
echo 'bad';
}
You were almost there!
The DOTALL modifier mentioned by others is irrelevant to your regex.
To allow new lines, we just add \r\n to your character class. Your code becomes:
if (preg_match("/^[\r\n0-9a-zA-Z \/_:,.?#;-]+$/", $string)) {
echo 'good';
} else {
echo 'bad';
}
Note that this test and the regex can be written in a tidier way:
echo (preg_match("~^[\r\n\w /:,.?#;-]+$~",$string))? "***Good!***" : "Bad!";
See the result of the online demo at the bottom.
\w matches letters, digits and underscores, so we can get rid of them in the character class
Changing the delimiter to a ~ allows you to use a / slash without escaping it (you need to escape delimiters)
It's always safe to add a backslash before any non-alphanumeric character, so:
/^[0-9a-zA-Z \/\_\:\,\.\?\#\;\-]+$/
You can also use POSIX character classes:
/^[[:alnum:] \/\_\:\,\.\?\#\;\-]+$/
As for the new lines:
/^[[:alnum:] \r\n\/\_\:\,\.\?\#\;\-]+$/
To write that pattern as a PHP string, it is easier and safer to use single quotes and to double the backslashes in front of r and n:
'/^[[:alnum:] \\r\\n\/\_\:\,\.\?\#\;\-]+$/'
You can use an alternation to factor in the newlines:
/^(?:[0-9a-zA-Z \/_:,.?#;-]|\r?\n)+$/
Btw, you can shorten the expression a bit by replacing [A-Za-z0-9_] with \w, which already covers letters, digits and the underscore:
/^(?:[\w \/:,.?#;-]|\r?\n)+$/
So:
if (preg_match('/^(?:[\w \/:,.?#;-]|\r?\n)+$/', $string)) {
echo "good";
} else {
echo "bad";
}
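The quoting question comes up in several of the answers above, so here is a small sketch comparing the two forms. The sample strings are invented. Both patterns accept the newline, but for different reasons.

```php
<?php
$string = "first line\r\nsecond line";

// Double quotes: PHP itself turns \r\n into CR and LF before PCRE sees the pattern.
$doubleQuoted = "/^[\r\n0-9a-zA-Z \/_:,.?#;-]+$/";
// Single quotes: the backslashes survive, and PCRE interprets \r and \n.
$singleQuoted = '/^[\r\n0-9a-zA-Z \/_:,.?#;-]+$/';

var_dump(preg_match($doubleQuoted, $string));     // int(1)
var_dump(preg_match($singleQuoted, $string));     // int(1)
var_dump(preg_match($singleQuoted, "<b>no</b>")); // int(0), < and > are not allowed
```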

preg_match whitespace problem

I need to write a regex, but for some reason I cannot get \s to match whitespace.
I'm trying:
$name = "a ";
if( preg_match('^[a-z]+\s$', $name)){
echo "match";
} else {
echo "no match";
}
I've also tried
preg_match('^[a-z\s]+$');
and
preg_match('^[a-z]/\s/$');
I'm just getting no match.
The full regex needs to allow one or more lowercase letters, then a space, then an uppercase letter followed by one or more lowercase letters, but...
preg_match('^[a-z]+\s[A-Z][a-z]+$');
isn't working as I'd expect, going by the tutorials I've seen around the web (SitePoint etc.).
Can anyone shed any light on it?
You have to add delimiters to the expressions:
$name = "a ";
if( preg_match('/^[a-z]+\s$/', $name)){
// the first and the last / in the pattern are the delimiters
echo "match";
} else {
echo "no match";
}
Here I used a slash / but you are pretty much free in choosing a delimiter (ok, not that much: any non-alphanumeric, non-backslash, non-whitespace character).
Make sure you have error reporting set to include warnings, e.g. by adding error_reporting(E_ALL). Then you would have seen this warning:
Warning: preg_match(): No ending delimiter '^' found in ... on line ...
no match
PHP treats the first character in the string as delimiter, so in your case the ^.
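Applying the same fix to the full expression from the question gives something like the sketch below. The sample names are invented for illustration.

```php
<?php
// The asker's full pattern, now wrapped in / delimiters.
$pattern = '/^[a-z]+\s[A-Z][a-z]+$/';

var_dump(preg_match($pattern, 'john Smith')); // int(1)
var_dump(preg_match($pattern, 'john smith')); // int(0), no uppercase letter after the space

// Any other valid delimiter works the same way, for example ~
var_dump(preg_match('~^[a-z]+\s$~', 'a '));   // int(1)
```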

A solid nickname regexp

I want a regular expression to validate a nickname: 6 to 36 characters, it should contain at least one letter. Other allowed characters: 0-9 and underscores.
This is what I have now:
if(!preg_match('/^.*(?=\d{0,})(?=[a-zA-Z]{1,})(?=[a-zA-Z0-9_]{6,36}).*$/i', $value)){
echo 'bad';
}
else{
echo 'good';
}
This seems to work, but when I validate these strings, for example:
11111111111a > is not valid, but it should be
aaaaaaa!aaaa > is valid, but it shouldn't be
Any ideas to make this regexp better?
I would actually split your task into two regex:
to find out whether it's a valid word: /^\w{6,36}$/i
to find out whether it contains a letter /[a-z]/i
I think it's much simpler this way.
Try this:
'/^(?=.*[a-z])\w{6,36}$/i'
Here are some of the problems with your original regex:
/^.*(?=\d{0,})(?=[a-zA-Z]{1,})(?=[a-zA-Z0-9_]{6,36}).*$/i
(?=\d{0,}): What is this for??? This is always true and doesn't do anything!
(?=[a-zA-Z]{1,}): You don't need the {1,} part, you just need to find one letter, and i flag also allows you to omit A-Z
/^.*: You're matching these outside of the lookaround; it should be inside
(?=[a-zA-Z0-9_]{6,36}).*$: this means that as long as there are between 6-36 \w characters, everything else in the rest of the string matches! The string can be 100 characters long mostly containing illegal characters and it will still match!
You can do it easily using two calls to preg_match as:
if( preg_match('/^[a-z0-9_]{6,36}$/i',$input) && preg_match('/[a-z]/i',$input)) {
// good
} else {
// bad
}
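To check the single-pattern version against the two strings from the question, a minimal sketch could look like this. The function name isValidNickname is made up for illustration.

```php
<?php
// Hypothetical wrapper around the lookahead pattern suggested above.
function isValidNickname(string $value): bool
{
    // (?=.*[a-z]) requires at least one letter somewhere;
    // \w{6,36} then requires 6 to 36 letters, digits or underscores in total.
    return preg_match('/^(?=.*[a-z])\w{6,36}$/i', $value) === 1;
}

var_dump(isValidNickname('11111111111a')); // bool(true)
var_dump(isValidNickname('aaaaaaa!aaaa')); // bool(false), ! is not allowed
var_dump(isValidNickname('123456'));       // bool(false), no letter
var_dump(isValidNickname('abc'));          // bool(false), too short
```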

<?PHP, REGEX and me. A tragedy in three acts

long time listener. First time caller...
Not strictly a PHP question as it involves regular expressions but this one has got me tearing my hair out.
I have 3 regular expressions that I want to create, and only one is working correctly.
Now I am not sure whether this is due to the fact that:
I don't understand preg_match and ereg and their return codes, as I haven't worked in PHP for about 7 years.
My regular expressions are just plain wrong.
I am mentally disabled.
Either way here are the expressions and my feeble attempts in making them work.
1) Match any number starting with 2,3,4 or 5 then followed by 5 digits. (This one I think works)
code:
if (!ereg('/[2-5]\d{5}/', $_POST['packageNumber' )
{
echo "The package number is not the correct format.";
}
2) Match any number starting with 2,3,4 or 5 then followed by 5 digits then a period then a 1 or a 2.
if (!ereg("/[2-5]\d{5}\.[1-2]/", $_POST['packageModifier' )
{
echo "The package modifier is not the correct format.";
}
3) Match any combination of alphanumerics, spaces, periods and hyphens up to 50 characters.
if (!ereg("/[0-9a-zA-Z\s\-\.]{0,50}/", $_POST['customerNumber' )
{
echo "The customer number is not the correct format.";
}
If anyone can please tell me what I am doing wrong I'll give them my first born.
You are mixing up PCRE functions and POSIX regular expression functions. You are using a Perl-Compatible regular expression with a POSIX regular expression function.
So replace ereg by preg_match and it should work:
if (!preg_match('/^[2-5]\d{5}$/', $_POST['packageNumber'])) {
echo "The package number is not the correct format.";
}
if (!preg_match("/^[2-5]\d{5}\.[1-2]$/", $_POST['packageModifier'])) {
echo "The package modifier is not the correct format.";
}
if (!preg_match("/^[0-9a-zA-Z\s\-.]{0,50}$/", $_POST['customerNumber'])) {
echo "The customer number is not the correct format.";
}
Along with fixing the PHP syntax errors I added anchors for the start (^) and the end ($) of the string to be matched.
I'm assuming that you just missed off the closing ] on the $_POST lookups; I've added anchors for the start and end of the string and used preg_match.
If you don't anchor it, the pattern can match anywhere in the string, and the whole input is then accepted. For example:
"dfasfasfasfasf25555555as5f15sdsdasdsfghfsgihfughd54" would be matched if the first one was not anchored.
Number One
if (!preg_match('/^[2-5]\d{5}$/', $_POST['packageNumber'])) {
echo "The package number is not the correct format.";
}
Number Two
if (!preg_match('/^[2-5]\d{5}\.[1-2]$/', $_POST['packageModifier'])) {
echo "The package modifier is not the correct format.";
}
Number Three
if (!preg_match('/^[0-9a-zA-Z\s\-.]{0,50}$/', $_POST['customerNumber'])) {
echo "The customer number is not the correct format.";
}
Don't you need to anchor the regular expressions?
Otherwise '111111111111111211111111111' will match /[2-5]\d{5}/.
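The effect of the anchors is easy to verify with that same string. A minimal sketch:

```php
<?php
$subject = '111111111111111211111111111';

// Unanchored: the 2 in the middle, followed by five digits, is enough for a match.
var_dump(preg_match('/[2-5]\d{5}/', $subject));   // int(1)

// Anchored: the whole string must be exactly six digits starting with 2 to 5.
var_dump(preg_match('/^[2-5]\d{5}$/', $subject)); // int(0)
var_dump(preg_match('/^[2-5]\d{5}$/', '211111'));  // int(1)
```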
When using POSIX regular expressions (deprecated as of PHP 5.3 and removed in PHP 7.0) you should write the tests like this:
if (ereg('^[2-5][0-9]{5}$', $_POST['packageNumber']) === false)
{
echo "The package number is not the correct format.";
}
if (ereg('^[2-5][0-9]{5}\\.[1-2]$', $_POST['packageModifier']) === false)
{
echo "The package modifier is not the correct format.";
}
if (ereg('^[[:alnum:][:space:].-]{0,50}$', $_POST['customerNumber']) === false)
{
echo "The customer number is not the correct format.";
}
Note that I anchored the regular expressions -- otherwise the customerNumber will always match (with a zero-length match).
See the POSIX regex man page for more information.
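The zero-length match is easy to reproduce. The sketch below uses preg_match instead of ereg, on the assumption that you are running a PHP version where ereg no longer exists; the principle is the same.

```php
<?php
$unanchored = '/[0-9a-zA-Z\s.-]{0,50}/';
$anchored   = '/^[0-9a-zA-Z\s.-]{0,50}$/';

// {0,50} may match zero characters, so the unanchored pattern always succeeds.
var_dump(preg_match($unanchored, '!!!', $m)); // int(1)
var_dump($m[0] === '');                       // bool(true), the match is empty

// With anchors the whole string has to consist of allowed characters.
var_dump(preg_match($anchored, '!!!'));       // int(0)
var_dump(preg_match($anchored, 'AB-12 c.d')); // int(1)
```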
preg_match('/^[2-5]\d{5}$/', $str); // 1-st
preg_match('/^[2-5]\d{5}\.[1-2]$/', $str); // 2-nd
preg_match('/^[0-9a-z\s\-\.]{0,50}$/i', $str); // 3-rd
Your mistakes:
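you passed Perl-style patterns (with / delimiters and \d) to ereg, which expects POSIX syntax. Also be careful with backslashes in double-quoted strings: "\d" happens to reach the regex engine unchanged, but escapes that PHP recognizes (such as "\n" or "\\") are converted first.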
you have to anchor the regexp to the beginning and the end of the entire string with the ^ and $ symbols.
That's all ;)
PS: better use single quotes for string literals - they are faster and safer...
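Since the question also asks about return codes: preg_match returns 1 on a match, 0 on no match and false on error. A minimal sketch that puts the three checks together might look like this. The function name validatePackageForm is hypothetical; the field names follow the question.

```php
<?php
// Hypothetical helper: returns the names of the fields that fail validation.
function validatePackageForm(array $post): array
{
    $rules = [
        'packageNumber'   => '/^[2-5]\d{5}$/',
        'packageModifier' => '/^[2-5]\d{5}\.[1-2]$/',
        'customerNumber'  => '/^[0-9a-zA-Z\s.-]{0,50}$/',
    ];
    $errors = [];
    foreach ($rules as $field => $pattern) {
        $value = $post[$field] ?? '';
        // Treat anything other than an explicit 1 (no match, error, non-string) as invalid.
        if (!is_string($value) || preg_match($pattern, $value) !== 1) {
            $errors[] = $field;
        }
    }
    return $errors;
}

// Reports packageModifier only, because .3 is neither .1 nor .2.
print_r(validatePackageForm([
    'packageNumber'   => '234567',
    'packageModifier' => '234567.3',
    'customerNumber'  => 'AB-12 c.d',
]));
```

In a real form handler you would pass $_POST instead of the sample array.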
