Regex to match a string that may contain Chinese characters - php

I'm trying to write a regular expression which could match a string that possibly includes Chinese characters. Examples:
hahdj5454_fd.fgg"
example.com/list.php?keyword=关键字
example.com/list.php?keyword=php
I am using this expression:
$matchStr = '/^[a-z 0-9~%.:_\-\/[^x7f-xff]+$/i';
$str = "http://example.com/list.php?keyword=关键字";
if ( ! preg_match($matchStr, $str)){
exit('WRONG');
}else{
echo "RIGHT";
}
It matches plain English strings such as dasdsdsfds or http://example.com/list.php, but it doesn't match strings containing Chinese characters. How can I resolve this?

Assuming you want to extend the set of letters that this regex matches from ASCII to all Unicode letters, then you can use
$matchStr = '#^[\pL 0-9~%.:_/-]+$#u';
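I've removed the [^x7f-xff part, which didn't make any sense (in your regex, it would have matched an opening bracket, a caret, and some ASCII characters that were already covered by the a-z and 0-9 parts of that character class). The u modifier makes PCRE treat both the pattern and the subject as UTF-8, so \pL can match multi-byte letters such as Chinese characters.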
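One thing to watch: the character class above has no ? or =, so the full URL from the question is still rejected. Here is a minimal sketch of that, assuming the file is saved as UTF-8. The variable names and the extended class (which adds ?, = and &) are my own illustration, not part of the answer.

```php
<?php
// Pattern from the answer above.
$answerPattern = '#^[\pL 0-9~%.:_/-]+$#u';
// Assumed extension: the same class plus "?", "=" and "&" for query strings.
$extendedPattern = '#^[\pL 0-9~%.:_/?=&-]+$#u';

$url = "http://example.com/list.php?keyword=关键字";

var_dump(preg_match($answerPattern, $url));   // int(0): "?" and "=" are not in the class
var_dump(preg_match($extendedPattern, $url)); // int(1)
```

Add only the punctuation you actually expect in the input. Every character you allow widens what the check accepts.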

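This checks whether the string contains at least one Chinese character (the s, i and m modifiers have no effect on this pattern, so only u is needed):
$str = "http://mysite/list.php?keyword=关键字";
if (preg_match('/\p{Han}/u', $str)) {
    echo "Contains Chinese characters";
} else {
    exit('WRONG'); // Doesn't contain Chinese characters
}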

Related

PHP Regex String Allow Chinese Word, alphanumeric & few special characters

Can anyone help with the code below? It validates a username text field.
Allow Chinese words, alphanumeric characters, and the special characters "_" and "-" only.
Added:
I'm trying to create validation for a username text field that allows Chinese words, alphanumeric characters, "-" and "_". I tried the regex below, but it does not work as I expected. Can anyone help?
if (preg_match("/[~`!##$%^&*()+={}\[\]|\\:;\"'<>,.?\/]/", "小明#ah meng"))
{
echo "invalid";
}
else
{
echo "valid";
}
Unicode's Han script covers the unified CJK ideographs. Since PCRE supports Unicode properties with the \p token, you can match most Chinese characters with \p{Han}.
Code:
<?php
$str = "小明ahmeng";
$regex = '/^[-_A-Za-z0-9\p{Han}]+$/u'; // notice spaces are not included
if (preg_match( $regex, $str)) {
echo "valid";
} else {
echo "invalid";
}
?>
Also, don't forget to set the /u modifier when you're working with UTF-8 encoded strings.
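Here is the same pattern wrapped in a small function, as a sketch. The name isValidUsername and the extra test strings are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true when $name consists only of Han characters,
// ASCII letters, digits, "-" and "_".
function isValidUsername(string $name): bool
{
    return preg_match('/^[-_A-Za-z0-9\p{Han}]+$/u', $name) === 1;
}

var_dump(isValidUsername('小明ahmeng'));   // bool(true)
var_dump(isValidUsername('小明_ah-meng')); // bool(true)
var_dump(isValidUsername('小明#ah meng')); // bool(false): "#" and the space are not allowed
var_dump(isValidUsername(''));             // bool(false): "+" requires at least one character
```

Note that $ also matches just before a single trailing newline. If that matters for your form, use \z instead of $ or add the D modifier.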

PHP capture word that contains special char from string using RegEx

I have special words in a string that I would like to capture based on their prefix.
Example: Special words such as ^to_this should be caught.
I need the word this because of the special prefix ^to_.
Here is my attempt, but it is not working:
preg_match('/\b(\w*^to_\w*)\b/', $str, $specialWordArr);
It returns an empty array.
Your code would be:
<?php
$mystring = 'Special words such as ^to_this should be caught';
$regex = '~[_^;]\w+[_^;](\w+)~';
if (preg_match($regex, $mystring, $m)) {
    $yourmatch = $m[1];
    echo $yourmatch; // this
}
?>
Explanation:
[_^;] Put the special characters in this character class so that the word must begin with one of them.
\w+ One or more word characters must follow the special character.
[_^;] Those word characters must be followed by another special character.
(\w+) If these conditions are met, the following one or more word characters are captured into a group.
Without some additional examples this will work for what you've posted:
$str = 'Special words such as ^to_this should be caught';
preg_match('/\s\^to_(\w+)\s/', $str, $specialWordArr);
echo $specialWordArr[1]; //this
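The attempt in the question fails because an unescaped ^ is a start-of-string anchor, not a literal caret. If the string can hold several prefixed words, preg_match_all with an escaped caret collects all of them. A small sketch; the sample string with a second word (^to_that) is invented for illustration.

```php
<?php
$str = 'Special words such as ^to_this and ^to_that should be caught';

// \^ matches a literal caret; (\w+) captures the word after the "^to_" prefix.
preg_match_all('/\^to_(\w+)/', $str, $matches);

print_r($matches[1]); // contains "this" and "that"
```

Unlike the \s...\s version, this does not require whitespace on both sides, so it also works when the word sits at the very start or end of the string.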

php regex ctype

I need a regex to check whether $input contains ONLY alphabetic characters or whitespace, another to check whether $numInput contains ONLY numeric characters or whitespace, and one that combines both:
$alphabeticOnly = 'abcd adb';
$numericOnly = '1234 567';
$alphabeticNumeric = 'abcd 3232';
So in all of the above examples, only alphabetic characters, numeric characters, and whitespace are allowed, with NO symbols.
How can I get those 3 different regular expressions?
This should help you:
if (!preg_match('/^[\sa-zA-Z]+$/', $alphabeticOnly)) {
    die('alpha match fail!');
}
if (!preg_match('/^[\s0-9]+$/', $numericOnly)) {
    die('numeric match fail!');
}
if (!preg_match('/^[\sa-zA-Z0-9]+$/', $alphabeticNumeric)) {
    die('alphanumeric match fail!');
}
This is pretty basic
/^[a-z\s]+$/i - letter and spaces
/^[\d\s]+$/ - number and spaces
/^[a-z\d\s]+$/i - letter, number and spaces
Just use them in preg_match()
To be Unicode compatible, use the following (the u modifier is needed so that the subject is read as UTF-8):
/^[\pL\s]+$/u // Letters or spaces
/^[\pN\s]+$/u // Numbers or spaces
/^[\pL\pN\s]+$/u // Letters, numbers or spaces
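A quick sketch that runs the three Unicode-aware patterns against the strings from the question, assuming the file is saved as UTF-8. The $patterns array and the 'élan vital' sample are illustrative additions.

```php
<?php
$patterns = [
    'letters' => '/^[\pL\s]+$/u',
    'numbers' => '/^[\pN\s]+$/u',
    'both'    => '/^[\pL\pN\s]+$/u',
];

var_dump(preg_match($patterns['letters'], 'abcd adb'));   // int(1)
var_dump(preg_match($patterns['numbers'], '1234 567'));   // int(1)
var_dump(preg_match($patterns['both'], 'abcd 3232'));     // int(1)
var_dump(preg_match($patterns['letters'], 'abcd 3232'));  // int(0): digits are not letters
var_dump(preg_match($patterns['letters'], 'élan vital')); // int(1): "é" is a Unicode letter
var_dump(preg_match($patterns['both'], 'abcd #3232'));    // int(0): "#" is a symbol
```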

PHP preg_replace special characters

I want to replace every character that is not a letter or number (e.g. /&%#$) with an underscore (_) and replace every ' (single quote) with an empty string (so no underscore).
So "There wouldn't be any" (ignore the double quotes) would become "There_wouldnt_be_any".
I am useless at regular expressions, hence the post.
Cheers
If "letters and numbers" means more to you than [A-Za-z0-9] (i.e. you consider letters like åäö to be letters too) and you want to handle UTF-8 strings accurately, \p{L} and \p{N} will help.
\p{N} will match any "Number"
\p{L} will match any "Letter Character", which includes
Lower case letter
Modifier letter
Other letter
Title case letter
Upper case letter
Documentation PHP: Unicode Character Properties
$data = "Thäre!wouldn't%bé#äny";
$new_data = str_replace ("'", "", $data);
$new_data = preg_replace ('/[^\p{L}\p{N}]/u', '_', $new_data);
var_dump (
$new_data
);
output
string(23) "Thäre_wouldnt_bé_äny"
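The two steps from the \p{L}/\p{N} snippet above can be wrapped in one function. This is only a sketch: the name toUnderscored is made up, and the null check is my own addition because preg_replace returns null on failure (for example when the input is not valid UTF-8).

```php
<?php
// Hypothetical helper: drop single quotes, then turn every character that
// is not a Unicode letter or number into an underscore.
function toUnderscored(string $text): string
{
    $text = str_replace("'", '', $text);
    $result = preg_replace('/[^\p{L}\p{N}]/u', '_', $text);
    if ($result === null) {
        throw new InvalidArgumentException('preg_replace failed; is the input valid UTF-8?');
    }
    return $result;
}

echo toUnderscored("There wouldn't be any"); // There_wouldnt_be_any
```

The quotes are removed first so that they are not turned into underscores by the second step.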
$newstr = preg_replace('/[^a-zA-Z0-9\']/', '_', "There wouldn't be any");
$newstr = str_replace("'", '', $newstr);
I put them on two separate lines to make the code a little more clear.
Note: If you're looking for Unicode support, see the answer that uses \p{L} and \p{N}. It matches all characters that register as letters, in addition to A-Z and a-z.
Do this in two steps:
Replace the non-letter characters with this regex:
[\/\&%#\$]
Replace quotes with this regex:
[\"\']
Then use preg_replace:
$stringWithoutNonLetterCharacters = preg_replace("/[\/\&%#\$]/", "_", $yourString);
$stringWithQuotesReplacedWithSpaces = preg_replace("/[\"\']/", " ", $stringWithoutNonLetterCharacters);

PHP preg_match - only allow alphanumeric strings and - _ characters

I need the regex to check if a string only contains numbers, letters, hyphens or underscore
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if(preg_match('******', $string1)){
    echo "String 1 not acceptable";
    // String2 acceptable
}
Code:
if(preg_match('/[^a-z_\-0-9]/i', $string))
{
echo "not valid string";
}
Explanation:
[] => character class definition
^ => negate the class
a-z => chars from 'a' to 'z'
_ => underscore
- => hyphen '-' (You need to escape it)
0-9 => numbers (from zero to nine)
The 'i' modifier at the end of the regex makes it case-insensitive. Without it, you would need to add the upper-case characters to the class with A-Z.
if(!preg_match('/^[\w-]+$/', $string1)) {
    echo "String 1 not acceptable";
    // String2 acceptable
}
Here is one equivalent of the accepted answer for the UTF-8 world.
if (!preg_match('/^[\p{L}\p{N}_-]+$/u', $string)){
//Disallowed Character In $string
}
Explanation:
[] => character class definition
\p{L} => matches any kind of letter character from any language
\p{N} => matches any kind of numeric character
_- => matches underscore and hyphen
+ => quantifier: matches one or more times (greedy)
/u => Unicode modifier. Pattern and subject strings are treated as UTF-8, so
escape sequences such as \p{L} match Unicode characters
Note that if the hyphen is the last character in the class definition, it does not need to be escaped. If it appears between two other characters in the class, it needs to be escaped; otherwise it is read as a range operator rather than a literal hyphen.
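To see the hyphen rule from the note above in action, here is a short sketch. The patterns and test strings are chosen only for illustration. In [_-a] the hyphen forms the range from "_" (0x5F) to "a" (0x61), which happens to include the backtick (0x60) but not the hyphen itself.

```php
<?php
// Hyphen last in the class: literal hyphen.
var_dump(preg_match('/^[a-z_-]+$/', 'a-b'));  // int(1)

// Hyphen escaped in the middle of the class: literal hyphen.
var_dump(preg_match('/^[a-z\-_]+$/', 'a-b')); // int(1)

// Unescaped hyphen between two characters: a range from "_" to "a".
var_dump(preg_match('/^[_-a]+$/', '-'));      // int(0): "-" is not in the range
var_dump(preg_match('/^[_-a]+$/', '`'));      // int(1): the backtick lies between "_" and "a"
```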
\w\- is probably the best, but here is another alternative:
Use [:alnum:]
if(!preg_match("/[^[:alnum:]\-_]/",$str)) echo "valid";
Why use regex? PHP has built-in functions for this:
<?php
$valid_symbols = array('-', '_');
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if(preg_match('/\s/',$string1) || !ctype_alnum(str_replace($valid_symbols, '', $string1))) {
    echo "String 1 not acceptable";
}
?>
preg_match('/\s/',$string1) checks for whitespace
!ctype_alnum(str_replace($valid_symbols, '', $string1)) checks whether anything other than letters and digits remains once the valid symbols are removed
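As a sketch, the ctype approach can be packed into a function. The name isAcceptable is hypothetical. Since ctype_alnum already rejects whitespace, the separate \s check is left out here. One difference from the regex answers: a string made only of "-" and "_" is rejected, because ctype_alnum('') returns false.

```php
<?php
// Hypothetical helper: true when $s contains only ASCII letters, digits,
// "-" and "_", and at least one letter or digit.
function isAcceptable(string $s): bool
{
    return ctype_alnum(str_replace(['-', '_'], '', $s));
}

var_dump(isAcceptable('this_is-a-string'));  // bool(true)
var_dump(isAcceptable('This is a string*')); // bool(false): spaces and "*" remain
var_dump(isAcceptable('---'));               // bool(false): nothing is left after stripping
```

This assumes the default C locale, where ctype_alnum only accepts ASCII letters and digits.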
