I need to extract a string that is enclosed by both parentheses and single quotes. Currently, I am using two regex pattern to do the job. With the first regex I retrieve a string from parentheses while the string still contains single quotes, and with the second regex I can strip that single quotes from it. Now, I would like to do this job in a single step. For the past one hour and so I have been experimenting with some patterns without any viable results; may be its due to my limited regex knowledge. So, any feedback you offer to me will be very helpful. I also welcome any solutions apart from regular expressions.
Here is an example string that needs to be parsed.
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";
# Strip parentheses
preg_match_all('#\((.*?)\)#', $string, $match);
# I am using the first match here
echo $match[1][0]; // output = 'minute stroller workout', 9.0
# Strip single quotes and extract the string
preg_match('~(["\'])([^"\']+)\1~', $match[1][0], $matches);
echo $matches[2]; // output = minute stroller workout (i.e. what we are looking for)
If I understand you correctly
preg_match_all('/\(\'([\s\w]*)\', ([\d.]*)\)/', $string, $match);
Output for your string
array(3) {
[0]=>
array(7) {
[0]=>
string(32) "('minute stroller workout', 9.0)"
[1]=>
string(13) "('week', 1.0)"
[2]=>
string(16) "('leaving', 1.0)"
[3]=>
string(14) "('times', 1.0)"
[4]=>
string(14) "('guilt', 1.0)"
[5]=>
string(13) "('baby', 1.0)"
[6]=>
string(18) "('beginning', 1.0)"
}
[1]=>
array(7) {
[0]=>
string(23) "minute stroller workout"
[1]=>
string(4) "week"
[2]=>
string(7) "leaving"
[3]=>
string(5) "times"
[4]=>
string(5) "guilt"
[5]=>
string(4) "baby"
[6]=>
string(9) "beginning"
}
[2]=>
array(7) {
[0]=>
string(3) "9.0"
[1]=>
string(3) "1.0"
[2]=>
string(3) "1.0"
[3]=>
string(3) "1.0"
[4]=>
string(3) "1.0"
[5]=>
string(3) "1.0"
[6]=>
string(3) "1.0"
}
}
You can use this single regex:
preg_match("#\('([^']+)#", $string, $matches);
echo $matches[1];
//=> minute stroller workout
Related
im trying for few hours to find the right regular expression in php to match any language letters but to prevent it to allow space
i have try this
[^\p{L}]
this is ok but it look like it allow the space
then i have try this
[^\w_-]
and it still look that it allow space
anyone can help with this please ?
You need to specify the Unicode modifier u to get Unicode character properties in PCRE.
For example...
$pattern = "/([\p{L}]+)/u";
$string = "你好,世界!Привет мир! !مرحبا بالعالم";
if (preg_match_all($pattern, $string, $match)) {
var_dump($match);
}
Gives us...
array(2) {
[0]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
[1]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
}
I want to grab all IDs (integers) from several URLs within a text. These URLs could look like these:
http://url.tld/index.php/p1
http://url.tld/p2#abc
http://url.tld/index.php/Page/3-xxx
http://url.tld/Page/4
For this, I've built two regexes (the URLs are enclosed by an URL bbcode):
\[url\](http\://url\.tld/index\.php/p(\d+).*?\)[/url\]
\[url\](http\://url\.tld(?:/index\.php)?/Page/(\d+).*?\)[/url\]
However, if i do a preg_match_all with every single regex, I get an array that looks like this (and which is correct):
array(3) {
[0]=>
array(2) {
[0]=>
string(62) "[url]http://url.tld/index.php/Page/6-fdgfh/[/url]"
[1]=>
string(50) "[url]http://url.tld/Page/7[/url]"
}
[1]=>
array(2) {
[0]=>
string(51) "http://url.tld/index.php/Page/6-fdgfh/"
[1]=>
string(39) "http://url.tld/Page/7"
}
[2]=>
array(2) {
[0]=>
string(1) "6"
[1]=>
string(1) "7"
}
}
But if I combine both regexes with a pipe:
\[url\](http\://url\.tld/index\.php/p(\d+).*?|http\://url\.tld(?:/index\.php)?/Page/(\d+).*?)\[/url\]
it builds an array like this (which is wrong):
array(4) {
[0]=>
array(3) {
[0]=>
string(71) "[url]http://url.tld/index.php/p9-abc#hashtag[/url]"
[1]=>
string(62) "[url]http://url.tld/index.php/Page/6-fdgfh/[/url]"
[2]=>
string(50) "[url]http://url.tld/Page/7[/url]"
}
[1]=>
array(3) {
[0]=>
string(60) "http://url.tld/index.php/t9-abc#hashtag"
[1]=>
string(51) "http://url.tld/index.php/Page/6-fdgfh/"
[2]=>
string(39) "http://url.tld/Page/7"
}
[2]=>
array(3) {
[0]=>
string(1) "9"
[1]=>
string(0) ""
[2]=>
string(0) ""
}
[3]=>
array(3) {
[0]=>
string(0) ""
[1]=>
string(1) "6"
[2]=>
string(1) "7"
}
}
====
So, my question is: How can I fix this? What I need is the array structure from the first example, while using both regular expressions as one regular expression, because I need a consistent structure to do a preg_replace_callback later.
I think you're looking for the Branch Reset group:
\[url]((?|http://url\.tld/index\.php/p(\d+).*?|http://url\.tld(?:/index\.php)?/Page/(\d+).*?))\[/url]
Or, for the line-noise-challenged among us:
\[url]
(
(?|
http://url\.tld/index\.php/p(\d+)[^[]*
|
http://url\.tld(?:/index\.php)?/Page/(\d+)[^[]*
)
)
\[/url]
This captures the numbers in group #2, no matter which part of the regex matched it. The whole URL is still captured in group #1.
I have a form input field that accepts multiple "tags" from a user, a bit like the one on this site! So, for example a user could enter something like:
php mysql regex
...which would be nice & simple to separate up the multiple tags, as I could explode() on the spaces. I would end up with:
array('php', 'mysql', 'regex')
However things get a little more complicated as the user can separate tags with commas or
spaces & use double quotes for multi-word tags.
So a user could also input:
php "mysql" regex, "zend framework", another "a, tag with punc $^&!)(123 *note the comma"
All of which would be valid. This should produce:
array('php', 'mysql', 'regex', 'zend framework', 'another', 'a, tag with punc $^&!)(123 *note the comma')
I don't know how to write a regular expression that would firstly match everything in double quotes, then explode the string on commas or spaces & finally match everything else. I guess I would use preg_match_all() for this?
Could anyone point me in the right direction!? Many thanks.
Try this regex out. I tested it against your string, and it correctly pulled out the individual tags:
("([^"]+)"|\s*([^,"\s]+),?\s*)
This code:
$string = 'php "mysql" regex, "zend framework", another "a, tag with punc $^&!)(123 *note the comma"';
$re = '("([^"]+)"|\s*([^,"\s]+),?\s*)';
$matches = array();
preg_match_all($re, $string, $matches);
var_dump($matches);
Yielded the following result for me:
array(3) {
[0]=>
array(6) {
[0]=>
string(4) "php "
[1]=>
string(7) ""mysql""
[2]=>
string(8) " regex, "
[3]=>
string(16) ""zend framework""
[4]=>
string(9) " another "
[5]=>
string(44) ""a, tag with punc $^&!)(123 *note the comma""
}
[1]=>
array(6) {
[0]=>
string(0) ""
[1]=>
string(5) "mysql"
[2]=>
string(0) ""
[3]=>
string(14) "zend framework"
[4]=>
string(0) ""
[5]=>
string(42) "a, tag with punc $^&!)(123 *note the comma"
}
[2]=>
array(6) {
[0]=>
string(3) "php"
[1]=>
string(0) ""
[2]=>
string(5) "regex"
[3]=>
string(0) ""
[4]=>
string(7) "another"
[5]=>
string(0) ""
}
}
Hope that helps.
I was looking to split a string based on a regular expression but I also have interest in keeping the text we split on:
php > var_dump(preg_split("/(\^)/","category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ"), null, PREG_SPLIT_DELIM_CAPTURE);
array(4) {
[0]=> string(34) "category=Telecommunications & CATV"
[1]=> string(18) "ORcategory!=ORtest"
[2]=> string(16) "caused_byISEMPTY"
[3]=> string(2) "EQ"
}
NULL
int(2)
What I do not understand is why am I not getting an array such as:
array(4) {
[0]=> "category=Telecommunications & CATV"
[1]=> "^"
[2]=> "ORcategory!=ORtest"
[3]=> "^"
[4]=> "caused_byISEMPTY"
[5]=> "^"
[6]=> "EQ"
}
Additionally, how could I change my regular expression to match "^OR" and also "^". I was having trouble with a lookbehind assertion such as:
$regexp = "/(?<=\^)OR|\^/";
This will work as expected:
var_dump(preg_split('/(\^)/','category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ', -1, PREG_SPLIT_DELIM_CAPTURE));
the closing bracket of preg_split() is at the wrong place.
additional question:
/(\^OR|\^)/
When i run this code and similar some Chinese the ni (你) character (maybe others) gets chopped of and broken.
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\s,]+/", $sample);
var_dump($parts);
//outputs
array(4) {
[0]=>
string(2) "�"
[1]=>
string(9) "不喜欢"
[2]=>
string(6) "香蕉"
[3]=>
string(3) "吗"
}
//in 我觉得 你很 麻烦
//out
array(4) {
[0]=>
string(9) "我觉得"
[1]=>
string(2) "�"
[2]=>
string(3) "很"
[3]=>
string(6) "麻烦"
}
Is my regex wrong?
If your string is in UTF-8, you must use the u modifier:
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\\s,]+/u", $sample);
var_dump($parts);
If it's in another encoding, see unicornaddict's answer.
Since the input string is multi-byte, I guess you'll have to use mb_split in place of preg_split.