I want to extract all IDs (integers) from several URLs within a text. The URLs can look like this:
http://url.tld/index.php/p1
http://url.tld/p2#abc
http://url.tld/index.php/Page/3-xxx
http://url.tld/Page/4
For this, I've built two regexes (the URLs are enclosed in a [url] BBCode tag):
\[url\](http\://url\.tld/index\.php/p(\d+).*?)\[/url\]
\[url\](http\://url\.tld(?:/index\.php)?/Page/(\d+).*?)\[/url\]
If I run preg_match_all with each regex separately, I get an array that looks like this (which is correct):
array(3) {
[0]=>
array(2) {
[0]=>
string(62) "[url]http://url.tld/index.php/Page/6-fdgfh/[/url]"
[1]=>
string(50) "[url]http://url.tld/Page/7[/url]"
}
[1]=>
array(2) {
[0]=>
string(51) "http://url.tld/index.php/Page/6-fdgfh/"
[1]=>
string(39) "http://url.tld/Page/7"
}
[2]=>
array(2) {
[0]=>
string(1) "6"
[1]=>
string(1) "7"
}
}
But if I combine both regexes with a pipe:
\[url\](http\://url\.tld/index\.php/p(\d+).*?|http\://url\.tld(?:/index\.php)?/Page/(\d+).*?)\[/url\]
it builds an array like this (which is wrong):
array(4) {
[0]=>
array(3) {
[0]=>
string(71) "[url]http://url.tld/index.php/p9-abc#hashtag[/url]"
[1]=>
string(62) "[url]http://url.tld/index.php/Page/6-fdgfh/[/url]"
[2]=>
string(50) "[url]http://url.tld/Page/7[/url]"
}
[1]=>
array(3) {
[0]=>
string(60) "http://url.tld/index.php/p9-abc#hashtag"
[1]=>
string(51) "http://url.tld/index.php/Page/6-fdgfh/"
[2]=>
string(39) "http://url.tld/Page/7"
}
[2]=>
array(3) {
[0]=>
string(1) "9"
[1]=>
string(0) ""
[2]=>
string(0) ""
}
[3]=>
array(3) {
[0]=>
string(0) ""
[1]=>
string(1) "6"
[2]=>
string(1) "7"
}
}
====
So, my question is: How can I fix this? What I need is the array structure from the first example, while using both regular expressions as one regular expression, because I need a consistent structure to do a preg_replace_callback later.
I think you're looking for the Branch Reset group:
\[url]((?|http://url\.tld/index\.php/p(\d+).*?|http://url\.tld(?:/index\.php)?/Page/(\d+).*?))\[/url]
Or, for the line-noise-challenged among us:
\[url]
(
(?|
http://url\.tld/index\.php/p(\d+)[^[]*
|
http://url\.tld(?:/index\.php)?/Page/(\d+)[^[]*
)
)
\[/url]
This captures the numbers in group #2, no matter which part of the regex matched it. The whole URL is still captured in group #1.
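As a minimal sketch of how the branch reset group behaves in PHP, the snippet below runs the verbose pattern from above with the x (free-spacing) modifier. The "~" delimiter, the sample text, and the "[id=...]" replacement format are assumptions made for illustration only; they are not from the question.

```php
<?php
// Free-spacing (x) mode lets the pattern span several lines. "~" is used as
// the delimiter so the slashes in the URLs need no escaping.
$pattern = '~\[url]
    (
      (?|
          http://url\.tld/index\.php/p(\d+)[^[]*
        |
          http://url\.tld(?:/index\.php)?/Page/(\d+)[^[]*
      )
    )
    \[/url]~x';

// Sample input (assumed for illustration).
$text = '[url]http://url.tld/index.php/p9-abc#hashtag[/url] and '
      . '[url]http://url.tld/index.php/Page/6-fdgfh/[/url] and '
      . '[url]http://url.tld/Page/7[/url]';

preg_match_all($pattern, $text, $matches);

// Both alternatives write their ID into group 2, so there is no group 3.
$ids = $matches[2];

// The same group number works in a callback, whichever branch matched.
// The "[id=...]" output format is hypothetical.
$replaced = preg_replace_callback($pattern, function (array $m) {
    return '[id=' . $m[2] . ']';
}, $text);
```

Because the callback always reads $m[2], it does not need to know which URL form was matched.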
I have this code:
$a='-t40-';
preg_match('/^-t(.*?)-$/', $a,$match);
var_dump($match);
Result:
array(2) { [0]=> array(1) { [0]=> string(5) "-t40-" }
[1]=> array(1) { [0]=> string(2) "40" } }
If I add some text after the last "-", the pattern no longer matches.
If $a='-t40-some text'; I need a result similar to:
array(3) { [0]=> array(1) { [0]=> string(5) "-t40-" }
[1]=> array(1) { [0]=> string(2) "40" }
[2]=> array(1) { [0]=> string(9) "some text" }}
How can I edit the pattern to capture "some text"?
Thanks in advance.
$a='-t40-some text';
preg_match('/^-t(.*?)-(.*?)$/', $a,$match);
var_dump($match);
Output:
array(3) {
[0]=>
string(14) "-t40-some text"
[1]=>
string(2) "40"
[2]=>
string(9) "some text"
}
Explanation:
^ : beginning of line
-t : literally "-t"
(.*?) : group 1, 0 or more of any character except newline, not greedy
- : literally "-"
(.*?) : group 2, 0 or more of any character except newline, not greedy
$ : end of line
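A slightly stricter variant, sketched under the assumption that the value between "-t" and "-" is always made of digits (the question does not say so). It uses \d+ for the number and a plain (.*) for the trailing text, so the same pattern handles input with and without text after the last "-".

```php
<?php
// Assumption: the part after "-t" is numeric, so \d+ replaces the lazy (.*?).
$pattern = '/^-t(\d+)-(.*)$/';

// With trailing text: group 2 holds everything after the second "-".
preg_match($pattern, '-t40-some text', $withText);

// Without trailing text: group 2 still exists but is an empty string.
preg_match($pattern, '-t40-', $withoutText);

var_dump($withText, $withoutText);
```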
I have been trying for a few hours to find the right regular expression in PHP that matches letters of any language but does not allow spaces.
I tried this:
[^\p{L}]
It mostly works, but it seems to allow spaces.
Then I tried this:
[^\w_-]
and it still seems to allow spaces.
Can anyone help with this, please?
You need to specify the Unicode modifier u to get Unicode character properties in PCRE.
For example...
$pattern = "/([\p{L}]+)/u";
$string = "你好,世界!Привет мир! !مرحبا بالعالم";
if (preg_match_all($pattern, $string, $match)) {
var_dump($match);
}
Gives us...
array(2) {
[0]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
[1]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
}
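Note that [^\p{L}] is a negated class: it matches any character that is not a letter, which includes the space. To check that a whole string consists only of letters, one option is to anchor a positive class at both ends. The following is a minimal sketch; the function name isLettersOnly is hypothetical.

```php
<?php
// Hypothetical helper: true only if $s is one or more Unicode letters.
// \z anchors at the very end of the string (unlike $, which also matches
// before a final newline). The u modifier makes PCRE treat the pattern and
// subject as UTF-8.
function isLettersOnly(string $s): bool
{
    return preg_match('/^\p{L}+\z/u', $s) === 1;
}

var_dump(isLettersOnly('Привет'));      // letters only
var_dump(isLettersOnly('Привет мир'));  // contains a space
var_dump(isLettersOnly('你好'));         // letters only
var_dump(isLettersOnly('abc_1'));       // contains "_" and a digit
```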
I need to extract a string that is enclosed in both parentheses and single quotes. Currently, I am using two regex patterns to do the job. With the first regex I retrieve the string inside the parentheses, which still contains the single quotes, and with the second regex I strip those single quotes from it. I would like to do this in a single step. For the past hour or so I have been experimenting with patterns without any usable results; maybe it's due to my limited regex knowledge. Any feedback would be very helpful. I also welcome solutions that don't use regular expressions.
Here is an example string that needs to be parsed.
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";
# Strip parentheses
preg_match_all('#\((.*?)\)#', $string, $match);
# I am using the first match here
echo $match[1][0]; // output = 'minute stroller workout', 9.0
# Strip single quotes and extract the string
preg_match('~(["\'])([^"\']+)\1~', $match[1][0], $matches);
echo $matches[2]; // output = minute stroller workout (i.e. what we are looking for)
If I understand you correctly:
preg_match_all('/\(\'([\s\w]*)\', ([\d.]*)\)/', $string, $match);
Output for your string:
array(3) {
[0]=>
array(7) {
[0]=>
string(32) "('minute stroller workout', 9.0)"
[1]=>
string(13) "('week', 1.0)"
[2]=>
string(16) "('leaving', 1.0)"
[3]=>
string(14) "('times', 1.0)"
[4]=>
string(14) "('guilt', 1.0)"
[5]=>
string(13) "('baby', 1.0)"
[6]=>
string(18) "('beginning', 1.0)"
}
[1]=>
array(7) {
[0]=>
string(23) "minute stroller workout"
[1]=>
string(4) "week"
[2]=>
string(7) "leaving"
[3]=>
string(5) "times"
[4]=>
string(5) "guilt"
[5]=>
string(4) "baby"
[6]=>
string(9) "beginning"
}
[2]=>
array(7) {
[0]=>
string(3) "9.0"
[1]=>
string(3) "1.0"
[2]=>
string(3) "1.0"
[3]=>
string(3) "1.0"
[4]=>
string(3) "1.0"
[5]=>
string(3) "1.0"
[6]=>
string(3) "1.0"
}
}
You can use this single regex:
preg_match("#\('([^']+)#", $string, $matches);
echo $matches[1];
//=> minute stroller workout
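If every pair is needed rather than only the first one, a single preg_match_all call with PREG_SET_ORDER can collect them in one pass. This is a sketch under the assumption that the quoted text never contains a single quote and that the number is a plain decimal. The shortened input and the $scores variable name are made up for illustration.

```php
<?php
// Shortened sample input in the same shape as the question's string.
$string = "[('minute stroller workout', 9.0), ('week', 1.0)]";

// Group 1: text between the single quotes. Group 2: the number after the comma.
preg_match_all("/\('([^']*)',\s*([\d.]+)\)/", $string, $sets, PREG_SET_ORDER);

// Build a phrase => score map from the matched pairs.
$scores = [];
foreach ($sets as $set) {
    $scores[$set[1]] = (float) $set[2];
}

var_dump($scores);
```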
I have this string:
a[0]=a[27%a.length];
and this pattern
([a-z])+\[(\S)+\]\=([a-z])+\[+(.*?)+\%+([a-z])+\.length
The resulting preg_match array is:
array(6) {
[0]=>
string(18) "a[0]=a[27%a.length"
[1]=>
string(1) "a"
[2]=>
string(1) "0"
[3]=>
string(1) "a"
[4]=>
string(0) ""
[5]=>
string(1) "a"
}
Why is element 4 empty instead of holding the 27?
I think there's one "+" too many. This regex seems to work as you would expect:
([a-z])+\[(\S)+\]\=([a-z])+\[+(.*?)\%+([a-z])+\.length
The following is even cleaner, and it matches the input string:
([a-z])+\[(\S)+\]\=([a-z])+\[(.*?)\%([a-z])+\.length
The difference is that I removed the unnecessary "+" after the square brackets and the percent sign ("+" means "one or more", while there should be only one of each in those positions).
For an explanation, please refer to regex101, which has a nice formatting for the various parts of the expression: http://regex101.com/r/iB6nH1
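Because you wrote (.*?)+, the group is repeated, and a repeated group keeps only the text of its last iteration. Here the last iteration matches the empty string, which overwrites the "27" captured earlier.
Remove that + and you will get 27 in group 4.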
php > preg_match('/([a-z])+\[(\S)+\]\=([a-z])+\[+(.*?)\%+([a-z])+\.length/', $str, $matches);
php > var_dump($matches);
array(6) {
[0]=>
string(18) "a[0]=a[27%a.length"
[1]=>
string(1) "a"
[2]=>
string(1) "0"
[3]=>
string(1) "a"
[4]=>
string(2) "27"
[5]=>
string(1) "a"
}
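The same "last iteration wins" rule also applies to ([a-z])+ and (\S)+: with longer names or indexes they would capture only the final character. A possible tightening, assuming names are lowercase letters and indexes are decimal digits (the question does not state this), is to move the quantifier inside the parentheses. The longer sample input is made up for illustration.

```php
<?php
// Quantifier outside the group: only the last repetition is kept.
preg_match('/([a-z])+\[(\S)+\]/', 'arr[10]', $loose);

// Quantifier inside the group: the whole run is captured.
// Assumption: names are [a-z]+ and indexes are \d+.
$tight = '/([a-z]+)\[(\d+)\]=([a-z]+)\[(\d+)%([a-z]+)\.length/';
preg_match($tight, 'arr[10]=arr[27%arr.length];', $t);

var_dump($loose, $t);
```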
I would like to ask how to convert a string to an array using
a string pattern, like mp3tag does:
%ALBUM% - %SOMETHING% - %SOMETHING%,
The ' - ' separators are custom characters, not fixed.
In case that isn't clear:
I want to turn a custom string into an array,
but the pattern is user-defined rather than static.
Is this possible in PHP, and if so, how?
$str = "%ALBUM% & %SOMETHING% (ノ゜-゜)ノ ︵ ┬──┬ %SOMETHING%,";
preg_match_all("/%([a-z]+)%/i", $str, $matches);
var_dump($matches);
Outputs
array(2) {
[0]=>
array(3) {
[0]=>
string(7) "%ALBUM%"
[1]=>
string(11) "%SOMETHING%"
[2]=>
string(11) "%SOMETHING%"
}
[1]=>
array(3) {
[0]=>
string(5) "ALBUM"
[1]=>
string(9) "SOMETHING"
[2]=>
string(9) "SOMETHING"
}
}
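The snippet above extracts the placeholder names from the format itself. To go one step further and split an actual input string according to such a format, one possible approach is to turn the format into a regex with one named group per placeholder. This is only a sketch: the function name parseByFormat and the sample values are hypothetical, and it assumes each placeholder name appears only once in the format (PCRE rejects duplicate group names by default).

```php
<?php
// Hypothetical helper: build a regex from a format such as
// "%ARTIST% - %ALBUM%" and use it to split $input into named fields.
// Returns null when the input does not fit the format.
function parseByFormat(string $format, string $input): ?array
{
    // Split the format into %PLACEHOLDER% tokens and literal separators.
    $parts = preg_split(
        '/(%[a-z]+%)/i',
        $format,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $regex = '';
    foreach ($parts as $part) {
        if (preg_match('/^%([a-z]+)%$/i', $part, $m)) {
            // Placeholder: lazy capture under the placeholder's own name.
            $regex .= '(?<' . $m[1] . '>.+?)';
        } else {
            // Separator: matched literally, whatever characters it contains.
            $regex .= preg_quote($part, '/');
        }
    }

    if (!preg_match('/^' . $regex . '$/u', $input, $match)) {
        return null;
    }

    // preg_match returns numbered and named keys; keep only the named ones.
    return array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);
}

// Sample values, made up for illustration.
$fields = parseByFormat('%ARTIST% - %ALBUM% - %TITLE%', 'Artist A - Album B - Title C');
var_dump($fields);
```

Because the separators go through preg_quote, they can be any characters, which is what makes the format "custom" rather than fixed.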