I have a string with an arbitrary number of words, and I need to capture the random text between the words "one" and "two", and the random text after "two". I can get the text between "one" and "two", but after "two" I only get a single character. Please take a look at my code and let me know what I did wrong.
$string = 'one randomText two randomText';
preg_match_all('/one\\s+(.+?)\\s+two\\s+(.+?)/i', $string, $matches);
print_r($matches);
Expected output:
Array ( [0] => Array ( [0] => one randomText two randomText ) [1] => Array ( [0] => randomText ) [2] => Array ( [0] => randomText ) );
Actual output:
Array ( [0] => Array ( [0] => one randomText two r ) [1] => Array ( [0] => randomText ) [2] => Array ( [0] => r ) );
You can use:
preg_match_all('/one\s+(.+?)\s+two\s+(.+)/i', $string, $matches);
There is no need for a non-greedy (lazy) quantifier at the end of the pattern. A lazy quantifier matches as little as possible, and since nothing follows the last group, a single character is enough to satisfy it, so it captures only r.
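To see the difference side by side, here is a minimal sketch that runs the lazy pattern, the greedy pattern, and a third variant (my own addition, not from the answer above) that keeps the lazy quantifier but anchors it with $:

```php
<?php
$string = 'one randomText two randomText';

// Lazy (.+?) at the very end: nothing follows it, so one character is enough.
preg_match('/one\s+(.+?)\s+two\s+(.+?)/i', $string, $lazy);

// Greedy (.+) at the end: consumes the rest of the string.
preg_match('/one\s+(.+?)\s+two\s+(.+)/i', $string, $greedy);

// Lazy again, but the $ anchor forces it to expand up to the end.
preg_match('/one\s+(.+?)\s+two\s+(.+?)$/i', $string, $anchored);

var_dump($lazy[2]);     // string(1) "r"
var_dump($greedy[2]);   // string(10) "randomText"
var_dump($anchored[2]); // string(10) "randomText"
```

The anchored form is only useful if something must follow the captured text; for this input the greedy version is the simpler choice.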
Related
I have a string with data that looks like this:
$string = '
foo=bar
badge_name_foo=foo
bar_badge_name=bar
bar=baz
';
I want to match all *_badge_name and badge_name_* strings.
The regex I'm using is this:
preg_match_all('~(?:(\w+)_)?badge_name(?:_(\w+))?~', $string, $matches, PREG_SET_ORDER);
The result is:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] =>
[2] => foo
)
[1] => Array
(
[0] => bar_badge_name
[1] => bar
)
)
The *_badge_name case works fine, but for badge_name_* there is always an empty value in the result. How can I remove it with preg_match_all?
Expected result should be:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] => foo
)
[1] => Array
(
[0] => bar_badge_name
[1] => bar
)
)
It seems you need the branch reset feature:
Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don't use any alternation or capturing groups inside the branch reset group, then its special function doesn't come into play. It then acts as a non-capturing group.
Use
(?|(\w+)_badge_name|badge_name_(\w+))
^^^
See the regex demo.
PHP demo:
$re = '/(?|(\w+)_badge_name|badge_name_(\w+))/';
$str = 'foo=bar
badge_name_foo=foo
bar_badge_name=bar
bar=baz';
preg_match_all($re, $str, $matches);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] => bar_badge_name
)
[1] => Array
(
[0] => foo
[1] => bar
)
)
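The demo above uses the default PREG_PATTERN_ORDER, while the expected result in the question is grouped per match. As a sketch, passing PREG_SET_ORDER to the same branch reset pattern should give that shape (the variable name $sets is my own):

```php
<?php
$re  = '/(?|(\w+)_badge_name|badge_name_(\w+))/';
$str = "foo=bar\nbadge_name_foo=foo\nbar_badge_name=bar\nbar=baz";

// One sub-array per match: [full match, captured part].
preg_match_all($re, $str, $sets, PREG_SET_ORDER);

print_r($sets);
// $sets[0] is ['badge_name_foo', 'foo']
// $sets[1] is ['bar_badge_name', 'bar']
```

Because both alternatives share capture group 1, there is no empty placeholder entry in either sub-array.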
Got a number list with separator(s) like these (note: quotes not included):
"1"
"1$^20"
"23$^100$^250"
I want to write a regex that both validates the syntax of the numbers and separators and returns all the numbers in the list. The best attempt I have in PHP is this code segment:
preg_match_all("/(\d+)(?:\\$\\^){0,1}/", $s2, $n);
print_r($n);
but it returns:
Array
(
[0] => Array
(
[0] => 1
[1] => 20
)
[1] => Array
(
[0] => 1
[1] => 20
)
)
What I need is:
Array
(
[0] => 1
[1] => 20
)
or at least:
Array
(
[0] => Array
(
[0] => 1
[1] => 20
)
)
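You can just take the capture-group entry of your match array. Note the single quotes: inside a double-quoted PHP string, \$ collapses to a bare $, which the regex engine then reads as an end-of-string anchor instead of a literal dollar sign.
$s2 = '1$^20';
preg_match_all('/(\d+)(?:\$\^)?/', $s2, $n);
print_r($n[1]);
// Array ( [0] => 1 [1] => 20 )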
Or drop the group and just extract the numbers like this:
$s2 = "1$^20";
preg_match_all("/\d+/", $s2, $n);
print_r($n);
// Array ( [0] => Array ( [0] => 1 [1] => 20 ) )
Another alternative might be to use preg_split:
$s2 = "1$^20";
$n = preg_split('/\$\^/', $s2);
print_r($n);
// Array ( [0] => 1 [1] => 20 )
I thought about this question again. I need not only to split the list but also to check the syntax of the values. And what if it's a list of text items? A neat way to handle this came to mind, in PHP:
// Split and also check value validity of number separated list
$pattern1 = "/(\d+?)\\$\\^/";
$s1 = "1$^23";
$s1 .= "$^"; // Always append one trailing separator
preg_match_all($pattern1, $s1, $matches);
Changing \d to . also makes this work for a list of text items or a mixed list of numbers and text.
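Another way to combine the syntax check with the split, as a rough sketch: validate the whole string with one anchored pattern first, then split on the literal separator. The helper name parseNumberList is hypothetical and not taken from any of the posts above.

```php
<?php
// Hypothetical helper: returns the numbers of a "$^"-separated list,
// or null when the list does not follow the expected syntax.
function parseNumberList(string $list): ?array
{
    // The whole string must be: digits, then zero or more "$^digits" groups.
    // The D modifier keeps $ from matching before a trailing newline.
    if (preg_match('/^\d+(?:\$\^\d+)*$/D', $list) !== 1) {
        return null;
    }
    // Single quotes keep "$^" literal; explode() needs no regex escaping.
    return explode('$^', $list);
}

var_dump(parseNumberList('23$^100$^250')); // array of "23", "100", "250"
var_dump(parseNumberList('1$^'));          // NULL (dangling separator)
```

Note that the values come back as strings; cast them with array_map('intval', ...) if integers are needed.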
I fetched some results with a regular expression using:
$res = "there are many restaurants in the city. Restaurants like xyz,abc. one restaurant like.....";
$pattern = '/restaurants?/i';
preg_match_all($pattern, substr($res,10), $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]);
For this regular expression, my output is
Array
(
[0] => Array
(
[0] => restaurants
[1] => 5
)
[1] => Array
(
[0] => Restaurants
[1] => 30
)
[2] => Array
(
[0] => restaurant
[1] => 60
)
)
At index [0] I find the matched strings, but I don't know where the values at index [1], like 5, 30 and 60, come from. Please help me understand them.
PREG_OFFSET_CAPTURE
This flag returns each match together with its byte offset in the subject string. The numbers 5, 30 and 60 tell you where each match starts in substr($res, 10).
Please read the manual first.
http://php.net/preg_match_all
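As a small illustration, the offsets are relative to the string that was actually searched, so with the substr() call from the question you would add 10 to get positions in the original $res. The variable names $skip and $subject are my own:

```php
<?php
$res     = "there are many restaurants in the city. Restaurants like xyz,abc. one restaurant like.....";
$skip    = 10; // same value the question passes to substr()
$subject = substr($res, $skip);

preg_match_all('/restaurants?/i', $subject, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    // $offset counts bytes from the start of $subject; add $skip for $res.
    printf("%s: %d in the substring, %d in the original\n", $text, $offset, $offset + $skip);
}
// restaurants: 5 in the substring, 15 in the original
// Restaurants: 30 in the substring, 40 in the original
// restaurant: 60 in the substring, 70 in the original
```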
I have preg_match_all function:
preg_match_all('#<h2>(.*?)</h2>#is', $source, $output, PREG_SET_ORDER);
It works as intended, but it returns every item twice (the full match and the captured group) in a large multidimensional array. In this example it matched all 11 items as needed, but each one appears twice:
Array
(
[0] => Array
(
[0] => <h2>10. <em>Cruel</em> by St. Vincent</h2>
[1] => 10. <em>Cruel</em> by St. Vincent
)
[1] => Array
(
[0] => <h2>9. <em>Robot Rock</em> by Daft Punk</h2>
[1] => 9. <em>Robot Rock</em> by Daft Punk
)
[2] => Array
(
[0] => <h2>8. <em>Seven Nation Army</em> by the White Stripes</h2>
[1] => 8. <em>Seven Nation Army</em> by the White Stripes
)
[3] => Array
(
[0] => <h2>7. <em>Do You Want To</em> by Franz Ferdinand</h2>
[1] => 7. <em>Do You Want To</em> by Franz Ferdinand
)
[4] => Array
(
[0] => <h2>6. <em>Teenage Dream</em> by Katie Perry</h2>
[1] => 6. <em>Teenage Dream</em> by Katie Perry
)
[5] => Array
(
[0] => <h2>5. <em>Crazy</em> by Gnarls Barkley</h2>
[1] => 5. <em>Crazy</em> by Gnarls Barkley
)
[6] => Array
(
[0] => <h2>4. <em>Kids</em> by MGMT</h2>
[1] => 4. <em>Kids</em> by MGMT
)
[7] => Array
(
[0] => <h2>3. <em>Bad Romance</em> by Lady Gaga</h2>
[1] => 3. <em>Bad Romance</em> by Lady Gaga
)
[8] => Array
(
[0] => <h2>2. <em>Pumped Up Kicks</em> by Foster the People</h2>
[1] => 2. <em>Pumped Up Kicks</em> by Foster the People
)
[9] => Array
(
[0] => <h2>1. <em>Paradise</em> by Coldplay</h2>
[1] => 1. <em>Paradise</em> by Coldplay
)
[10] => Array
(
[0] => <h2>Song That Get Stuck In Your Head YouTube Playlist</h2>
[1] => Song That Get Stuck In Your Head YouTube Playlist
)
)
How can I convert this array into a simple one without the duplicated items? Thank you very much.
You will always get a multidimensional array back, however, you can get close to what you want like this:
if (preg_match_all('#<h2>(.*?)</h2>#is', $source, $output, PREG_PATTERN_ORDER))
$matches = $output[0]; // reduce the multi-dimensional array to the array of full matches only
And if you don't want the submatch at all, then use a non-capturing grouping:
if (preg_match_all('#<h2>(?:.*?)</h2>#is', $source, $output, PREG_PATTERN_ORDER))
$matches = $output[0]; // reduce the multi-dimensional array to the array of full matches only
Note that this call to preg_match_all is using PREG_PATTERN_ORDER instead of PREG_SET_ORDER:
PREG_PATTERN_ORDER Orders results so that $matches[0] is an array of
full pattern matches, $matches[1] is an array of strings matched by
the first parenthesized subpattern, and so on.
PREG_SET_ORDER Orders results so that $matches[0] is an array of first
set of matches, $matches[1] is an array of second set of matches, and
so on.
See: http://php.net/manual/en/function.preg-match-all.php
Use
#<h2>(?:.*?)</h2>#is
as your regex. If you use a non capturing group (which is what ?: signifies), a backreference won't show up in the array.
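If you would rather keep PREG_SET_ORDER, a possible alternative is to pull a single column out of the result with array_column. This is only a sketch; the $source value below is a shortened stand-in for the real page, not the asker's data.

```php
<?php
// Hypothetical, shortened input standing in for $source from the question.
$source = '<h2>2. <em>Pumped Up Kicks</em> by Foster the People</h2>'
        . '<h2>1. <em>Paradise</em> by Coldplay</h2>';

preg_match_all('#<h2>(.*?)</h2>#is', $source, $output, PREG_SET_ORDER);

$full  = array_column($output, 0); // whole matches, including the <h2> tags
$inner = array_column($output, 1); // only the text captured between the tags

print_r($inner);
// $inner[0] is '2. <em>Pumped Up Kicks</em> by Foster the People'
// $inner[1] is '1. <em>Paradise</em> by Coldplay'
```

Either column is a flat, one-dimensional array, so nothing appears twice.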
Using preg_match with named subpatterns always returns each capture under two keys with identical data: one keyed by the subpattern name and one keyed by number. Because I'm matching hundreds of thousands of lines at a few kilobytes per row, I'm afraid the numbered entries take up extra memory. Is there a proper way to stop the numbered entries from being returned?
Example:
<?php
header('Content-Type: text/plain');
$data = <<<START
I go to school.
He goes to funeral.
START;
preg_match_all('#^(?<who>.*?) go(es)* to (?<place>.*?)$#m', $data, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => I go to school.
[1] => He goes to funeral.
)
[who] => Array
(
[0] => I
[1] => He
)
[1] => Array
(
[0] => I
[1] => He
)
[2] => Array
(
[0] =>
[1] => es
)
[place] => Array
(
[0] => school.
[1] => funeral.
)
[3] => Array
(
[0] => school.
[1] => funeral.
)
)
From php.net, Subpatterns:
It is possible to name a subpattern using the syntax (?P<name>pattern). This subpattern will then be indexed in the matches array by its normal numeric position and also by name.
I see no option to give only the index by name.
So, I think, if you don't want this data two times, the only possibility is: don't use named groups.
Is this really an issue? IMO optimize this only if you run into problems, because of this additional memory usage! The improved readability should be worth the memory!
Update
It looks like go(es)* is only meant to match an optional "es". Here you can save memory by using a non-capturing group.
preg_match_all('#^(?<who>.*?) go(?:es)? to (?<place>.*?)$#m', $data, $matches);
Because the group starts with ?:, the matched content is not stored. I also replaced the *, which means 0 or more and would also match "goeseses", with ?, which means 0 or 1.
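If the numbered entries are mainly a nuisance when you store or pass the result on, one possible workaround is to drop them after the call. This is a sketch only: preg_match_all still builds both sets of keys internally, so it does not lower the peak memory of the call itself, only what you keep afterwards. The variable name $named is my own.

```php
<?php
$data = "I go to school.\nHe goes to funeral.";

preg_match_all('#^(?<who>.*?) go(?:es)? to (?<place>.*?)$#m', $data, $matches);

// Keep only entries whose key is a string (the group names);
// the full-match entry 0 and the numeric duplicates are discarded.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

print_r($named);
// $named['who']   is ['I', 'He']
// $named['place'] is ['school.', 'funeral.']
```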