Regex quantified capture - php

php > preg_match("#/m(/[^/]+)+/t/?#", "/m/part/other-part/t", $m);
php > var_dump($m);
array(2) {
[0]=>
string(20) "/m/part/other-part/t"
[1]=>
string(11) "/other-part"
}
php > preg_match_all("#/m(/[^/]+)+/t/?#", "/m/part/other-part/t", $m);
php > var_dump($m);
array(2) {
[0]=>
array(1) {
[0]=>
string(20) "/m/part/other-part/t"
}
[1]=>
array(1) {
[0]=>
string(11) "/other-part"
}
}
In this example I would like the capture to match both /part and /other-part. Unfortunately, the regex /m(/[^/]+)+/t/? doesn't capture both, as I expected it to.
The pattern should not be tied to this sample; it should capture any number of repetitions of the group, e.g. /m/part/other-part/and-another/more/t
UPDATE:
Given that this is expected behavior, my question stands: how can I capture every repetition?

Try this one out:
preg_match_all("#(?:/m)?/([^/]+)(?:/t)?#", "/m/part/other-part/another-part/t", $m);
var_dump($m);
It gives:
array(2) {
[0]=>
array(3) {
[0]=>
string(7) "/m/part"
[1]=>
string(11) "/other-part"
[2]=>
string(15) "/another-part/t"
}
[1]=>
array(3) {
[0]=>
string(4) "part"
[1]=>
string(10) "other-part"
[2]=>
string(12) "another-part"
}
}
//EDIT
IMO the best way to do what you want is to use the preg_match() from @stema and explode the result by / to get the list of parts you want.
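A minimal sketch of that combination, assuming the subject always has the /m/.../t shape from the question. The variable names and the longer sample path are only illustrative:

```php
<?php
$subject = '/m/part/other-part/and-another/more/t';
$parts = [];

// Capture the whole repeated run in a single group by making the inner group non-capturing.
if (preg_match('#/m((?:/[^/]+)+)/t/?#', $subject, $m)) {
    // $m[1] is "/part/other-part/and-another/more"; drop the leading slash, then split.
    $parts = explode('/', ltrim($m[1], '/'));
}

print_r($parts);
```

This keeps the regex responsible only for validating the overall shape, while the splitting is left to a plain string function.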

That's how capturing groups work: a repeated capturing group stores only its last match once the regex has finished. In your test that is "/other-part".
Try this instead
/m((?:/[^/]+)+)/t/?
See it here on Regexr; when you hover over the match, you can see the content of the capturing group.
Just make your group non-capturing by adding a ?: at the start and put another one around the whole repetition.
In php
preg_match_all("#/m((?:/[^/]+)+)/t/?#", "/m/part/other-part/t", $m);
var_dump($m);
Output:
array(2) {
[0]=> array(1) {
[0]=>
string(20) "/m/part/other-part/t"
}
[1]=> array(1) {
[0]=>
string(16) "/part/other-part"
}
}

As already noted in a comment, you can't do this in one pass, because preg_match keeps only the last match of a repeated subgroup and has no way to return the earlier ones (unlike .NET, which exposes every capture of a group; see Get repeated matches with preg_match_all()). So you can divide the operation into multiple steps:
Match the subject and extract the part you're interested in.
Run a second match over that part only.
Code:
$subject = '/m/part/other-part/t';
$subpattern = '/[^/]+';
$pattern = sprintf('~/m(?<path>(?:%s)+)/t/?~', $subpattern);
$r = preg_match($pattern, $subject, $matches);
if (!$r) return;
$r = preg_match_all("~$subpattern~", $matches['path'], $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(2) {
[0]=>
string(5) "/part"
[1]=>
string(11) "/other-part"
}
}

Related

php preg_match_all Not getting all results

I am using php preg_match_all to extract some parts of a message like this:
$customerMessage = '"message":"success:2,2;3,3;"' ;
preg_match_all('/("message":")([a-z0-9A-Z]+):([0-9]+,[0-9]+;)+/', $customerMessage, $matches);
var_dump($matches);
die;
this code output is:
array(4) {
[0]=>
array(1) {
[0]=>
string(27) ""message":"success:2,2;3,3;"
}
[1]=>
array(1) {
[0]=>
string(11) ""message":""
}
[2]=>
array(1) {
[0]=>
string(7) "success"
}
[3]=>
array(1) {
[0]=>
string(4) "3,3;"
}
}
Why can't I get the 2,2; part?
Thanks in advance!
You can only get the last match of a repeated group. To get all values like x,x; you can change your current regex a bit so that group 3 captures the whole list:
preg_match_all('/("message":")([a-z0-9A-Z]+):(.*)"/', $customerMessage, $matches);
// $matches[3][0] --> 2,2;3,3;
Now you can take group 3 from $matches[3][0] and match every x,x; in it with [0-9]+,[0-9]+;
preg_match_all('/[0-9]+,[0-9]+;/', $matches[3][0], $matches2);
// $matches2[0][0] --> 2,2;
// $matches2[0][1] --> 3,3;

php regex for detecting #number

I have the following regex, with which I am trying to detect #x, x being a number. I was able to get it working when there is nothing around match 2, but if there is, it breaks. Can someone help me make this work both ways?
/(\G|\s+|^)#(\d+)((?=\s+)|(?=::)|$)/i
that will work with the line
This is a test #1234 end test
but that will not work with
This is a test #1234end test
This is a test#1234 end test
This is a test.#1234 end test
This is a test #1234. End test
Does anyone know what needs to be changed to achieve this?
Edit: I am trying to allow anything but alphanumeric characters in the 3rd group; right now it allows :: and whitespace. Is there a way to combine these into one check that does not accept letters or numbers?
Running a preg match using /#\d+/i should get you what you are looking for. So running the following:
$items = [
"This is a test #1234end test",
"This is a test#1234 end test",
"This is a test.#1234 end test",
"This is a test #1234. End test"
];
foreach($items as $test){
preg_match("/#\d+/i", $test, $matches);
var_dump($matches);
}
You will get this result:
array(1) {
[0]=>
string(5) "#1234"
}
array(1) {
[0]=>
string(5) "#1234"
}
array(1) {
[0]=>
string(5) "#1234"
}
array(1) {
[0]=>
string(5) "#1234"
}
If you don't want the # in the results, you can add a subpattern: /#(\d+)/i
Which will then result in the following:
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
(\G|\s+|^)#(\d+)((?=[^[:alnum:]])|$)
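I wanted to keep the three groups that I had, so I only changed the 3rd group. I removed the :: and \s whitespace checks from the 3rd group and added a single NOT-alphanumeric check, since that covers both of those conditions as well.
(\G|\s+|^) # group 1: unchanged
#(\d+) # group 2: the number after the #
((?=[^[:alnum:]])|$) # group 3: next character is not alphanumeric, or end of string
[^[:alnum:]] # any character that is not a letter or digit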
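A quick way to check the revised third group is to run it over a few lines. This is only a sketch: the first, second and fourth lines are samples from the question, the third (number at the end of the string) is an added assumption, and the /i flag is dropped because the pattern contains no letters to match.

```php
<?php
$pattern = '/(\G|\s+|^)#(\d+)((?=[^[:alnum:]])|$)/';

$lines = [
    'This is a test #1234 end test',   // number followed by whitespace
    'This is a test #1234. End test',  // number followed by punctuation
    'This is a test #1234',            // number at the end of the string
    'This is a test #1234end test',    // number followed by a letter
];

$found = [];
foreach ($lines as $line) {
    // Group 2 holds the digits; null marks a line where the pattern did not match.
    $found[] = preg_match($pattern, $line, $m) ? $m[2] : null;
}

var_dump($found);
```

Only the last line is rejected, because the lookahead requires a non-alphanumeric character (or the end of the string) directly after the digits.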

PHP preg_match get content between

<!--:en-->Apvalus šviestuvas<!--:-->
<!--:ru-->Круглый Светильник<!--:-->
<!--:lt-->Round lighting<!--:-->
I need get the content between <!--:lt--> and <!--:-->
I have tried:
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('<!--:lt-->+[a-zA-Z0-9]+<!--:-->$', $string, $match);
var_dump($match);
Something is wrong with the syntax and logic. How can I make this work?
preg_match("/<!--:lt-->([a-zA-Z0-9 ]+?)<!--:-->/", $string, $match);
added delimiters
added a match group
added ? to make it ungreedy
added [space] (there is a space in Round lighting)
Your result should be in $match[1].
A cooler and more generic variation is:
preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);
Which will match all of them. Gives:
array(3) {
[0]=>
array(3) {
[0]=>
string(37) "<!--:en-->Apvalus šviestuvas<!--:-->"
[1]=>
string(53) "<!--:ru-->Круглый Светильник<!--:-->"
[2]=>
string(32) "<!--:lt-->Round lighting<!--:-->"
}
[1]=>
array(3) {
[0]=>
string(2) "en"
[1]=>
string(2) "ru"
[2]=>
string(2) "lt"
}
[2]=>
array(3) {
[0]=>
string(19) "Apvalus šviestuvas"
[1]=>
string(35) "Круглый Светильник"
[2]=>
string(14) "Round lighting"
}
}
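If the goal is to look up a translation by its language code, the two capture groups of the generic pattern can be paired up. A small sketch, assuming each language tag appears at most once in the string ($byLang is just an illustrative name):

```php
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";

preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);

// Group 1 holds the language codes, group 2 the texts; combine them into code => text.
$byLang = array_combine($match[1], $match[2]);

echo $byLang['lt'], "\n";
```

With the sample string this prints the text stored under the lt tag, and the other languages are available under their own keys.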
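Alternatively, use lookarounds so that only the content itself is matched: (?<=<!--:lt-->)(.*?)(?=<!--:-->). The lazy .*? stops at the first closing marker, so the pattern still works when the lt block is not the last one in the string.
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('~(?<=<!--:lt-->)(.*?)(?=<!--:-->)~', $string, $match);
var_dump($match);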

Why does preg_match_all() create the same answer multiple times?

The following code extracts #hashtags from a tweet and puts them in the variable $matches.
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";
preg_match_all("/(#\w+)/", $tweet, $matches);
var_dump( $matches );
Can someone please explain to me why the following results have 2 identical arrays instead of just 1?
array(2) {
[0]=>
array(3) {
[0]=>
string(8) "#hashtag"
[1]=>
string(8) "#badhash"
[2]=>
string(13) "#goodhash_tag"
}
[1]=>
array(3) {
[0]=>
string(8) "#hashtag"
[1]=>
string(8) "#badhash"
[2]=>
string(13) "#goodhash_tag"
}
}
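Because you use () to capture a subgroup. $matches[0] holds the full matches and $matches[1] holds what group 1 captured; since the parentheses wrap the whole pattern, the two arrays are identical.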
Try:
preg_match_all("/#\w+/", $tweet, $matches);
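To see the difference, the two patterns can be compared side by side. Index 0 always holds the full matches, and each pair of parentheses adds one more index with that group's text. The names $withGroup and $withoutGroup are only illustrative:

```php
<?php
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";

preg_match_all("/(#\w+)/", $tweet, $withGroup);
preg_match_all("/#\w+/", $tweet, $withoutGroup);

echo count($withGroup), "\n";     // 2: full matches plus group 1
echo count($withoutGroup), "\n";  // 1: full matches only
```

The parentheses only pay off when the group covers part of the pattern, for example /#(\w+)/ to get the tag names without the leading #.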
Why are you using () unless you want it to do exactly that. lol Sorry, that came out not so friendly :(
http://php.net/manual/en/function.preg-match-all.php Example 3
It's simple:
remove the () from your expression.
Hope it helps.

preg_match not returning expected results

I'm attempting to use a regexp to parse a search string that may from time to time contain special syntax. The syntax I'm looking for is [special keyword : value], and I want each match put into an array. Keep in mind that the search string will also contain other text that is not intended to be parsed.
$searchString = "[StartDate:2010-11-01] [EndDate:2010-11-31]";
$specialKeywords = array();
preg_match("/\[{1}.+\:{1}.+\]{1}/", $searchString, $specialKeywords);
var_dump($specialKeywords);
Output:
array(1) { [0]=> string(43) "[StartDate:2010-11-01] [EndDate:2010-11-31]" }
Desired Output:
array(2) { [0]=> string() "[StartDate:2010-11-01]"
[1]=> string() "[EndDate:2010-11-31]"}
Please let me know if I am not being clear enough.
Your .+ matches across the boundaries between the two [...] parts because it matches any character, and as many of them as possible. You could be more restrictive about which characters may be matched. Also {1} is redundant and can be dropped.
/\[[^:]*:[^\]]*\]/
should work more reliably.
Explanation:
\[ # match a [
[^:]* # match any number of characters except :
: # match a :
[^\]]* # match any number of characters except ]
\] # match a ]
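Applied with preg_match_all, which is needed to collect every bracketed pair rather than only the first, the pattern can be tried like this. The sample string with surrounding text is an assumption made for the sketch, not the exact input from the question:

```php
<?php
$searchString = "some text [StartDate:2010-11-01] more text [EndDate:2010-11-31]";
$specialKeywords = [];

// [^:]* stops at the first ":" and [^\]]* stops at the first "]", so each pair is matched on its own.
preg_match_all('/\[[^:]*:[^\]]*\]/', $searchString, $specialKeywords);

print_r($specialKeywords[0]);
```

Note that [^:]* can still run across an unrelated "[...]" that contains no colon, so tightening it to [^:\]]* may be worth considering if such text can occur.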
This:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
preg_match_all('/\[.*?\]/', $searchString, $match);
print_r($match);
gives the expected result, I'm not sure if it matches all the constraints.
Try the following:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match_all("/\[\w+:\d{4}-\d\d-\d\d\]/i", $searchString, $specialKeywords);
var_dump($specialKeywords[0]);
Outputs:
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
Use this regex: "/\[(.*?)\:(.*?)\]{1}/" and also use preg_match_all, it will return
array(3) {
[0]=>
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
[1]=>
array(2) {
[0]=>
string(9) "StartDate"
[1]=>
string(7) "EndDate"
}
[2]=>
array(2) {
[0]=>
string(10) "2010-11-01"
[1]=>
string(10) "2010-11-31"
}
}
/\[.+?\:.+?\]/
I suggest this method; it is less complex but handles the same as Tim's.
