I want to split a string on a regular expression, but I also want to keep the text it was split on:
php > var_dump(preg_split("/(\^)/","category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ"), null, PREG_SPLIT_DELIM_CAPTURE);
array(4) {
[0]=> string(34) "category=Telecommunications & CATV"
[1]=> string(18) "ORcategory!=ORtest"
[2]=> string(16) "caused_byISEMPTY"
[3]=> string(2) "EQ"
}
NULL
int(2)
What I do not understand is why I am not getting an array such as:
array(7) {
[0]=> "category=Telecommunications & CATV"
[1]=> "^"
[2]=> "ORcategory!=ORtest"
[3]=> "^"
[4]=> "caused_byISEMPTY"
[5]=> "^"
[6]=> "EQ"
}
Additionally, how could I change my regular expression to match "^OR" as well as "^"? I was having trouble with a lookbehind assertion such as:
$regexp = "/(?<=\^)OR|\^/";
This will work as expected:
var_dump(preg_split('/(\^)/','category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ', -1, PREG_SPLIT_DELIM_CAPTURE));
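The closing parenthesis of preg_split() was in the wrong place, so null and PREG_SPLIT_DELIM_CAPTURE were passed to var_dump() instead (hence the extra NULL and int(2) in your output).
For the additional question: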
/(\^OR|\^)/
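To make the two points above concrete, here is a minimal sketch that combines the corrected call with the /(\^OR|\^)/ pattern. The input is the sample string from the question; nothing else is assumed.

```php
<?php
// Sample input copied from the question.
$input = 'category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ';

// "\^OR" is listed before "\^" so the longer delimiter is tried first.
// -1 means "no limit"; the flag keeps the captured delimiters in the result.
$parts = preg_split('/(\^OR|\^)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
```

With this input the result has seven elements: the four text pieces, with "^OR" as one delimiter element after the first piece and "^" after the second and third. The "OR" inside "category!=ORtest" is left alone because it is not preceded by "^".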
I have been trying for a few hours to find the right regular expression in PHP to match letters of any language while not allowing spaces.
I tried this:
[^\p{L}]
This is OK, but it looks like it allows spaces.
Then I tried this:
[^\w_-]
and it still looks like it allows spaces.
Can anyone help with this, please?
You need to specify the Unicode modifier u to get Unicode character properties in PCRE.
For example...
$pattern = "/([\p{L}]+)/u";
$string = "你好,世界!Привет мир! !مرحبا بالعالم";
if (preg_match_all($pattern, $string, $match)) {
var_dump($match);
}
Gives us...
array(2) {
[0]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
[1]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
}
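For the original goal (letters of any language, no spaces), one possible approach is to anchor the pattern so the whole string must consist of letters. This is only a sketch: isLettersOnly is a hypothetical helper name, and it assumes the input is UTF-8.

```php
<?php
// Hypothetical helper: true only if $text is one or more letters and nothing else.
function isLettersOnly(string $text): bool
{
    // ^ and \z anchor the whole string, \p{L} is any Unicode letter,
    // and the u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/^\p{L}+\z/u', $text) === 1;
}

var_dump(isLettersOnly('Привет'));     // bool(true)
var_dump(isLettersOnly('Привет мир')); // bool(false), because of the space
```

Because the class is positive (\p{L}) rather than negated, a space can never slip through; anything that is not a letter makes the match fail.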
I need to extract a string that is enclosed by both parentheses and single quotes. Currently, I am using two regex patterns to do the job. With the first regex I retrieve the string from the parentheses, still containing the single quotes, and with the second regex I strip those single quotes from it. Now I would like to do this in a single step. For the past hour or so I have been experimenting with some patterns without any viable results; maybe it's due to my limited regex knowledge. Any feedback you can offer will be very helpful. I also welcome solutions that do not use regular expressions.
Here is an example string that needs to be parsed.
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";
# Strip parentheses
preg_match_all('#\((.*?)\)#', $string, $match);
# I am using the first match here
echo $match[1][0]; // output = 'minute stroller workout', 9.0
# Strip single quotes and extract the string
preg_match('~(["\'])([^"\']+)\1~', $match[1][0], $matches);
echo $matches[2]; // output = minute stroller workout (i.e. what we are looking for)
If I understand you correctly:
preg_match_all('/\(\'([\s\w]*)\', ([\d.]*)\)/', $string, $match);
Output for your string
array(3) {
[0]=>
array(7) {
[0]=>
string(32) "('minute stroller workout', 9.0)"
[1]=>
string(13) "('week', 1.0)"
[2]=>
string(16) "('leaving', 1.0)"
[3]=>
string(14) "('times', 1.0)"
[4]=>
string(14) "('guilt', 1.0)"
[5]=>
string(13) "('baby', 1.0)"
[6]=>
string(18) "('beginning', 1.0)"
}
[1]=>
array(7) {
[0]=>
string(23) "minute stroller workout"
[1]=>
string(4) "week"
[2]=>
string(7) "leaving"
[3]=>
string(5) "times"
[4]=>
string(5) "guilt"
[5]=>
string(4) "baby"
[6]=>
string(9) "beginning"
}
[2]=>
array(7) {
[0]=>
string(3) "9.0"
[1]=>
string(3) "1.0"
[2]=>
string(3) "1.0"
[3]=>
string(3) "1.0"
[4]=>
string(3) "1.0"
[5]=>
string(3) "1.0"
[6]=>
string(3) "1.0"
}
}
You can use this single regex:
preg_match("#\('([^']+)#", $string, $matches);
echo $matches[1];
//=> minute stroller workout
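If you want every quoted string rather than just the first one, the same idea should carry over to preg_match_all. A minimal sketch, using a shortened version of the sample string (the shortening is mine, for brevity):

```php
<?php
// Shortened sample input (first three tuples of the string in the question).
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0)]";

// Match "('" literally, then capture everything up to the next single quote.
preg_match_all("/\\('([^']+)'/", $string, $matches);

// $matches[1] holds only the captured texts, without parentheses or quotes.
print_r($matches[1]);
```

Here $matches[1] contains 'minute stroller workout', 'week' and 'leaving', so both steps happen in one pass.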
<!--:en-->Apvalus šviestuvas<!--:-->
<!--:ru-->Круглый Светильник<!--:-->
<!--:lt-->Round lighting<!--:-->
I need to get the content between <!--:lt--> and <!--:-->.
I have tried:
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('<!--:lt-->+[a-zA-Z0-9]+<!--:-->$', $string, $match);
var_dump($match);
Something is wrong with the syntax and logic. How can I make this work?
preg_match("/<!--:lt-->([a-zA-Z0-9 ]+?)<!--:-->/", $string, $match);
Changes:
- added delimiters
- added a capture group
- added ? to make the quantifier lazy
- added a space to the character class (there is a space in Round lighting)
Your result should be in $match[1].
A cooler and more generic variation is:
preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);
Which will match all of them. Gives:
array(3) {
[0]=>
array(3) {
[0]=>
string(37) "<!--:en-->Apvalus šviestuvas<!--:-->"
[1]=>
string(53) "<!--:ru-->Круглый Светильник<!--:-->"
[2]=>
string(32) "<!--:lt-->Round lighting<!--:-->"
}
[1]=>
array(3) {
[0]=>
string(2) "en"
[1]=>
string(2) "ru"
[2]=>
string(2) "lt"
}
[2]=>
array(3) {
[0]=>
string(19) "Apvalus šviestuvas"
[1]=>
string(35) "Круглый Светильник"
[2]=>
string(14) "Round lighting"
}
}
Use this pattern: (?<=<!--:lt-->)(.*)(?=<!--:-->)
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('~(?<=<!--:lt-->)(.*)(?=<!--:-->)~', $string, $match);
var_dump($match);
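One caveat with the greedy (.*) above: it works here because the lt block is the last one, but for an earlier block it would run on to the final <!--:-->. A lazy quantifier avoids that. The sketch below wraps this in a hypothetical helper, extractTranslation (the name and signature are my own, not from the answers), and assumes UTF-8 input.

```php
<?php
// Hypothetical helper: returns the text for one language tag, or null if absent.
function extractTranslation(string $text, string $lang): ?string
{
    // preg_quote() escapes $lang; (.*?) is lazy, so it stops at the first closing marker.
    // s lets . match newlines, u treats pattern and subject as UTF-8.
    $pattern = '~<!--:' . preg_quote($lang, '~') . '-->(.*?)<!--:-->~su';

    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";

var_dump(extractTranslation($string, 'en'));
var_dump(extractTranslation($string, 'lt'));
```

Unlike [a-zA-Z0-9 ]+, the (.*?) group also accepts non-ASCII text such as "šviestuvas" or the Russian entry.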
I was just helped in another thread with a regex that has been verified to work. I can see it actually working on Rubular but when I plug the regex into preg_match, I get absolutely nothing.
Here is the regex with my preg_match function:
preg_match('/^!!([0-9]{5}) +.*? +[MF] ([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) + ([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/', $res, $matches);
All I am getting is an empty array returned.
The problem is that you have added two extra spaces into the regular expression that should not be there and that cause the match to fail.
/^!!([0-9]{5}) +.*? +[MF] ([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) + ([A-Z])...
                         ^                                          ^
                         here                                   and here
Whitespace is significant (by default) in regular expressions. A space in a regular expression matches a space in the target string. Removing these two spaces fixes the problem.
See it working on ideone (this time it is a PHP example).
array(10) {
[0]=>
string(39) "!!92519 C 01 M600200BLNBRN D55420090205"
[1]=>
string(5) "92519"
[2]=>
string(3) "600"
[3]=>
string(3) "200"
[4]=>
string(3) "BLN"
[5]=>
string(3) "BRN"
[6]=>
string(1) "D"
[7]=>
string(4) "2009"
[8]=>
string(2) "02"
[9]=>
string(2) "05"
}
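For reference, this is roughly what the corrected call looks like with the two spaces removed (the one after [MF] and the one after the second " +"). The input line is an assumption on my part: it is copied from element [0] of the dump above, so the real record may be spaced differently.

```php
<?php
// Assumed input, taken from element [0] of the dump above.
$res = '!!92519 C 01 M600200BLNBRN D55420090205';

// Same pattern as in the question, minus the two stray spaces.
$pattern = '/^!!([0-9]{5}) +.*? +[MF]([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/';

if (preg_match($pattern, $res, $matches)) {
    print_r($matches);
} else {
    echo "no match\n";
}
```

If you would rather keep spaces in the pattern for readability, the x modifier makes PCRE ignore unescaped whitespace, but then every literal space has to be written as "\ " or "[ ]".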
When I run this code on some Chinese text, the ni (你) character (and maybe others) gets chopped off and broken.
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\s,]+/", $sample);
var_dump($parts);
//outputs
array(4) {
[0]=>
string(2) "�"
[1]=>
string(9) "不喜欢"
[2]=>
string(6) "香蕉"
[3]=>
string(3) "吗"
}
//in 我觉得 你很 麻烦
//out
array(4) {
[0]=>
string(9) "我觉得"
[1]=>
string(2) "�"
[2]=>
string(3) "很"
[3]=>
string(6) "麻烦"
}
Is my regex wrong?
If your string is in UTF-8, you must use the u modifier:
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\\s,]+/u", $sample);
var_dump($parts);
If it's in another encoding, see unicornaddict's answer.
Since the input string is multi-byte, I guess you'll have to use mb_split in place of preg_split.
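Putting the u-modifier answer into a reusable form, here is a small sketch. splitWords is a hypothetical helper name, and the UTF-8 check relies on the fact that preg_match() with the u modifier fails on invalid UTF-8 input.

```php
<?php
// Hypothetical helper: split on whitespace and commas, rejecting non-UTF-8 input.
function splitWords(string $text): array
{
    // An empty pattern with the u modifier only matches if $text is valid UTF-8.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // With u, PCRE works on code points, so the bytes of 你 are no longer
    // mistaken for whitespace. PREG_SPLIT_NO_EMPTY drops empty pieces.
    return preg_split('/[\s,]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitWords("你不喜欢 香蕉 吗"));
```

If the text turns out to be in another encoding, converting it to UTF-8 first (or using mb_split with the right mb_regex_encoding) is probably the safer route.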