When I run this code (and similar code) on some Chinese text, the ni (你) character (and maybe others) gets chopped off and broken.
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\s,]+/", $sample);
var_dump($parts);
//outputs
array(4) {
[0]=>
string(2) "�"
[1]=>
string(9) "不喜欢"
[2]=>
string(6) "香蕉"
[3]=>
string(3) "吗"
}
//in 我觉得 你很 麻烦
//out
array(4) {
[0]=>
string(9) "我觉得"
[1]=>
string(2) "�"
[2]=>
string(3) "很"
[3]=>
string(6) "麻烦"
}
Is my regex wrong?
If your string is in UTF-8, you must use the u modifier:
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\\s,]+/u", $sample);
var_dump($parts);
If it's in another encoding, see unicornaddict's answer.
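Looking at the bytes may make the failure easier to understand. Without u, PCRE reads the subject one byte at a time. 你 is encoded in UTF-8 as E4 BD A0, and the string(2) in the output above suggests that the trailing 0xA0 byte (a no-break space in Latin-1) was matched by \s and consumed as a separator. Whether \s matches 0xA0 in byte mode depends on the character tables PHP's PCRE is using, so this may not reproduce on every setup. A small illustrative sketch (not part of the original answer):

```php
<?php
$sample = "你不喜欢 香蕉 吗";

// 你 (U+4F60) is three bytes in UTF-8; the last byte is 0xA0.
$bytes = bin2hex("你");
echo $bytes, "\n"; // e4bda0

// With the u modifier, pattern and subject are treated as UTF-8,
// so \s can only match whole whitespace characters, never part of 你.
$parts = preg_split('/[\s,]+/u', $sample);
var_dump($parts);
```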
Since the input string is multi-byte, I guess you'll have to use mb_split in place of preg_split.
I've been trying for a few hours to find the right regular expression in PHP to match letters in any language, but without allowing spaces.
I have tried this:
[^\p{L}]
This seems OK, but it looks like it allows spaces.
Then I tried this:
[^\w_-]
and it still seems to allow spaces.
Can anyone help with this, please?
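You need to specify the Unicode modifier u to get Unicode character properties in PCRE.
Also note that [^\p{L}] is a negated class: it matches any single character that is not a letter, and a space is not a letter. To match the letters themselves, use the positive class [\p{L}]+ instead.
For example...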
$pattern = "/([\p{L}]+)/u";
$string = "你好,世界!Привет мир! !مرحبا بالعالم";
if (preg_match_all($pattern, $string, $match)) {
var_dump($match);
}
Gives us...
array(2) {
[0]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
[1]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
}
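If the goal is instead to check that a whole string consists only of letters (in any script) with no spaces, one option is to anchor the positive class at both ends. A minimal sketch; the helper name isLettersOnly is hypothetical. Scripts that rely on combining marks (for example Devanagari vowel signs) would also need \p{M} in the class.

```php
<?php
// Hypothetical helper: true if $input is non-empty and made only of letters.
// [^\p{L}] would do the opposite: it matches any single NON-letter,
// including a space, which is why it seemed to "allow" spaces.
function isLettersOnly(string $input): bool
{
    // \A and \z anchor the very start and end, so even a trailing "\n" fails.
    return preg_match('/\A\p{L}+\z/u', $input) === 1;
}

var_dump(isLettersOnly("Привет"));     // bool(true)
var_dump(isLettersOnly("Привет мир")); // bool(false)
var_dump(isLettersOnly("你好"));        // bool(true)
```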
I need to extract a string that is enclosed by both parentheses and single quotes. Currently, I am using two regex patterns to do the job: with the first I retrieve the string from the parentheses (it still contains the single quotes), and with the second I strip the single quotes from it. Now I would like to do this in a single step. For the past hour or so I have been experimenting with some patterns without any viable results; maybe it's due to my limited regex knowledge. Any feedback would be very helpful, and I also welcome solutions other than regular expressions.
Here is an example string that needs to be parsed.
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";
# Strip parentheses
preg_match_all('#\((.*?)\)#', $string, $match);
# I am using the first match here
echo $match[1][0]; // output = 'minute stroller workout', 9.0
# Strip single quotes and extract the string
preg_match('~(["\'])([^"\']+)\1~', $match[1][0], $matches);
echo $matches[2]; // output = minute stroller workout (i.e. what we are looking for)
If I understand you correctly:
preg_match_all('/\(\'([\s\w]*)\', ([\d.]*)\)/', $string, $match);
Output for your string:
array(3) {
[0]=>
array(7) {
[0]=>
string(32) "('minute stroller workout', 9.0)"
[1]=>
string(13) "('week', 1.0)"
[2]=>
string(16) "('leaving', 1.0)"
[3]=>
string(14) "('times', 1.0)"
[4]=>
string(14) "('guilt', 1.0)"
[5]=>
string(13) "('baby', 1.0)"
[6]=>
string(18) "('beginning', 1.0)"
}
[1]=>
array(7) {
[0]=>
string(23) "minute stroller workout"
[1]=>
string(4) "week"
[2]=>
string(7) "leaving"
[3]=>
string(5) "times"
[4]=>
string(5) "guilt"
[5]=>
string(4) "baby"
[6]=>
string(9) "beginning"
}
[2]=>
array(7) {
[0]=>
string(3) "9.0"
[1]=>
string(3) "1.0"
[2]=>
string(3) "1.0"
[3]=>
string(3) "1.0"
[4]=>
string(3) "1.0"
[5]=>
string(3) "1.0"
[6]=>
string(3) "1.0"
}
}
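If you want each word paired with its score, the two capture groups can be combined. A small sketch, assuming $string is the sample from the question and that the words are unique (array_combine() keeps only the last value for a duplicate key):

```php
<?php
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";

preg_match_all('/\(\'([\s\w]*)\', ([\d.]*)\)/', $string, $match);

// $match[1] holds the words, $match[2] the numbers (as strings);
// floatval converts the scores before pairing them with the words.
$scores = array_combine($match[1], array_map('floatval', $match[2]));

print_r($scores);
```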
You can use this single regex:
preg_match("#\('([^']+)#", $string, $matches);
echo $matches[1];
//=> minute stroller workout
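preg_match() stops after the first match. To collect every quoted string with the same pattern, preg_match_all() can be used instead; the captured strings then end up in $matches[1]:

```php
<?php
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";

// \(' finds the opening "('" and ([^']+) captures up to the next quote.
preg_match_all("#\('([^']+)#", $string, $matches);

print_r($matches[1]);
```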
<!--:en-->Apvalus šviestuvas<!--:-->
<!--:ru-->Круглый Светильник<!--:-->
<!--:lt-->Round lighting<!--:-->
I need to get the content between <!--:lt--> and <!--:-->.
I have tried:
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('<!--:lt-->+[a-zA-Z0-9]+<!--:-->$', $string, $match);
var_dump($match);
Something is wrong with the syntax and logic. How can I make this work?
preg_match("/<!--:lt-->([a-zA-Z0-9 ]+?)<!--:-->/", $string, $match);
added delimiters
added a match group
added ? to make it ungreedy
added [space] (there is a space in Round lighting)
Your result should be in $match[1].
A cooler and more generic variation is:
preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);
This matches all of them and gives:
array(3) {
[0]=>
array(3) {
[0]=>
string(37) "<!--:en-->Apvalus šviestuvas<!--:-->"
[1]=>
string(53) "<!--:ru-->Круглый Светильник<!--:-->"
[2]=>
string(32) "<!--:lt-->Round lighting<!--:-->"
}
[1]=>
array(3) {
[0]=>
string(2) "en"
[1]=>
string(2) "ru"
[2]=>
string(2) "lt"
}
[2]=>
array(3) {
[0]=>
string(19) "Apvalus šviestuvas"
[1]=>
string(35) "Круглый Светильник"
[2]=>
string(14) "Round lighting"
}
}
Use this pattern: (?<=<!--:lt-->)(.*)(?=<!--:-->)
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('~(?<=<!--:lt-->)(.*)(?=<!--:-->)~', $string, $match);
var_dump($match);
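The greedy .* in the lookaround pattern works here only because the lt entry comes last; if it did not, .* would run on to the final <!--:--> in the string. A sketch that combines the generic preg_match_all() idea with a lazy quantifier and builds a language code => text map (the variable name $translations is just for illustration):

```php
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";

// ([a-z]+) captures the language code; (.*?) is lazy, so each match ends at
// the nearest <!--:--> instead of the last one. The s flag lets . cross newlines.
preg_match_all('/<!--:([a-z]+)-->(.*?)<!--:-->/su', $string, $m);

// Build a language code => text map.
$translations = array_combine($m[1], $m[2]);

echo $translations['lt'], "\n";
```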
I would like to know how to convert a string to an array using
a pattern like mp3tag does:
%ALBUM% - %SOMETHING% - %SOMETHING%
The ' - ' separators are custom characters and are not fixed.
In case I didn't make myself clear:
I want to turn a custom string into an array,
but the pattern is custom, not fixed.
Is this possible in PHP, and if so, how?
$str = "%ALBUM% & %SOMETHING% (ノ゜-゜)ノ ︵ ┬──┬ %SOMETHING%,";
preg_match_all("/%([a-z]+)%/i", $str, $matches);
var_dump($matches);
Outputs
array(2) {
[0]=>
array(3) {
[0]=>
string(7) "%ALBUM%"
[1]=>
string(11) "%SOMETHING%"
[2]=>
string(11) "%SOMETHING%"
}
[1]=>
array(3) {
[0]=>
string(5) "ALBUM"
[1]=>
string(9) "SOMETHING"
[2]=>
string(9) "SOMETHING"
}
}
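Going one step further, the same idea can be used to pull values out of a string that follows the template, roughly like mp3tag's tag-from-filename feature. The following is a sketch under assumptions, not a definitive implementation: the helper parseWithTemplate is hypothetical, placeholders are assumed to be %LETTERS%, every field must be non-empty, and a name repeated in the template keeps only its last value.

```php
<?php
// Hypothetical helper: turn an mp3tag-style template such as
// "%ALBUM% - %ARTIST% - %TITLE%" into a regex and use it to pull the
// field values out of $input. Returns null if $input doesn't fit.
function parseWithTemplate(string $template, string $input): ?array
{
    // Split the template into literal separators and %NAME% placeholders;
    // PREG_SPLIT_DELIM_CAPTURE keeps the placeholders in the result.
    $tokens = preg_split('/(%[a-z]+%)/i', $template, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $regex = '';
    $names = [];
    foreach ($tokens as $token) {
        if (preg_match('/^%([a-z]+)%$/i', $token, $m)) {
            $names[] = $m[1];
            $regex .= '(.+?)';                 // lazy group for each field
        } else {
            $regex .= preg_quote($token, '/'); // separator, matched literally
        }
    }

    if (!preg_match('/\A' . $regex . '\z/u', $input, $match)) {
        return null;
    }

    // Pair names with captured values; a repeated name keeps its last value.
    return array_combine($names, array_slice($match, 1));
}

print_r(parseWithTemplate('%ALBUM% - %ARTIST% - %TITLE%', 'Abbey Road - The Beatles - Something'));
```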
I want to split a string on a regular expression, but I also want to keep the text I split on:
php > var_dump(preg_split("/(\^)/","category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ"), null, PREG_SPLIT_DELIM_CAPTURE);
array(4) {
[0]=> string(34) "category=Telecommunications & CATV"
[1]=> string(18) "ORcategory!=ORtest"
[2]=> string(16) "caused_byISEMPTY"
[3]=> string(2) "EQ"
}
NULL
int(2)
What I don't understand is why I'm not getting an array such as:
array(7) {
[0]=> "category=Telecommunications & CATV"
[1]=> "^"
[2]=> "ORcategory!=ORtest"
[3]=> "^"
[4]=> "caused_byISEMPTY"
[5]=> "^"
[6]=> "EQ"
}
Additionally, how could I change my regular expression to match both "^OR" and "^"? I was having trouble with a lookbehind assertion such as:
$regexp = "/(?<=\^)OR|\^/";
This will work as expected:
var_dump(preg_split('/(\^)/','category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ', -1, PREG_SPLIT_DELIM_CAPTURE));
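The closing parenthesis of preg_split() is in the wrong place. In your code, null and PREG_SPLIT_DELIM_CAPTURE are passed to var_dump() instead of preg_split(), which is why var_dump() also prints NULL and int(2) (the value of PREG_SPLIT_DELIM_CAPTURE).
For the additional question, put both alternatives in the capture group, with \^OR first so it is tried before the shorter \^: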
/(\^OR|\^)/
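Putting both fixes together: the sketch below passes the flag to preg_split() and uses the alternation pattern, so "^OR" and "^" both come back as separate elements.

```php
<?php
$input = 'category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ';

// The alternation tries \^OR before \^, so "^OR" is kept as one delimiter.
// PREG_SPLIT_DELIM_CAPTURE returns the captured delimiters as array elements.
$parts = preg_split('/(\^OR|\^)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
```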