Retrieving text outside square brackets in PHP - php

I need a way to capture the text outside square brackets. For example, given the following string:
My [ground]name[test]Jhon[random]petor [shorts].
I'm using the preg_match_all expression below, but it does not give the expected result:
preg_match_all("/\[[^\]]*\]/", $text, $matches);
It returns the text inside the square brackets instead.
Result:
Array (
[0] => [ground]
[1] => [test]
[2] => [random]
[3] => [shorts]
)
Expected output:
Array (
[0] => [My]
[1] => [name]
[2] => [Jhon]
[3] => [petor]
)
Any help would be great.

You can extend the pattern by adding \K, which discards what has been matched so far, and then using an alternation to match one or more word characters. Each bracketed part then yields an empty match, which array_filter removes.
\[[^][]+]\K|\w+
See a regex demo
$re = '/\[[^][]+]\K|\w+/';
$str = 'My [ground]name[test]Jhon[random]petor [shorts].';
preg_match_all($re, $str, $matches);
print_r(array_values(array_filter($matches[0])));
Output
Array
(
[0] => My
[1] => name
[2] => Jhon
[3] => petor
)
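As an alternative to filtering out the empty matches afterwards, the PCRE backtracking verbs (*SKIP)(*F) can discard the bracketed parts during matching. This is only a sketch, and it assumes that runs of word characters are all that is wanted from the text outside the brackets:

```php
<?php
$str = 'My [ground]name[test]Jhon[random]petor [shorts].';

// First alternative: match a bracketed chunk, then force a failure and
// skip past it, so nothing inside [...] can ever be reported.
// Second alternative: a run of word characters outside the brackets.
preg_match_all('/\[[^][]*](*SKIP)(*F)|\w+/', $str, $matches);

$words = $matches[0];
print_r($words);
```

With this pattern no empty strings are produced, so array_filter and array_values are not needed.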

Related

Split a string by forward slash but ignore the </ in the string

I have a string similar to this:
word1/word2/word3/<b>word3</b>
I want to explode this string by forward slash so that I get the following result:
Array = (
[0] => 'word1',
[1] => 'word2',
[2] => 'word3',
[3] => '<b>word3</b>'
);
But I'm unable to get the above result. Instead I'm getting the following:
Array = (
[0] => 'word1',
[1] => 'word2',
[2] => 'word3',
[3] => '<b>word3<',
[4] => 'b>'
);
What regular expression should I use with preg_split to achieve the expected result?
Use preg_split with a pattern containing a negative lookbehind:
$s = 'word1/word2/word3/<b>word3</b>';
$result = preg_split('~(?<!<)/~', $s);
print_r($result);
~ - used as the regex delimiter, so the forward slash needs no escaping
(?<!<)/ - negative lookbehind assertion, ensures that the forward slash / is not preceded by <
The output:
Array
(
[0] => word1
[1] => word2
[2] => word3
[3] => <b>word3</b>
)
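The lookbehind above only protects closing tags such as </b>. If the input may also contain self-closing tags such as <br/>, a negative lookahead can be added as well. The helper name splitOnSlash and the self-closing-tag input are assumptions for illustration, not part of the original question:

```php
<?php
// Hypothetical helper: split on "/" unless it is part of "</" or "/>".
function splitOnSlash(string $s): array
{
    // (?<!<) : the slash is not preceded by "<"
    // (?!>)  : the slash is not followed by ">"
    return preg_split('~(?<!<)/(?!>)~', $s);
}

$a = splitOnSlash('word1/word2/word3/<b>word3</b>');
$b = splitOnSlash('a/<br/>/b');

print_r($a);
print_r($b);
```

For markup more complex than this, a real HTML parser is likely a safer choice than a regex.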

PHP regex match before and after a specific word

I have a string with data that looks like this:
$string = '
foo=bar
badge_name_foo=foo
bar_badge_name=bar
bar=baz
';
I want to match all *_badge_name and badge_name_* strings.
The regex I'm using is this:
preg_match_all('~(?:(\w+)_)?badge_name(?:_(\w+))?~', $string, $matches, PREG_SET_ORDER);
The result is:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] =>
[2] => foo
)
[1] => Array
(
[0] => bar_badge_name
[1] => bar
)
)
The *_badge_name case works fine, but for badge_name_* there is always an empty value. How can I remove it with preg_match_all?
Expected result should be:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] => foo
)
[1] => Array
(
[0] => bar_badge_name
[1] => bar
)
)
It seems you need the branch reset feature:
Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don't use any alternation or capturing groups inside the branch reset group, then its special function doesn't come into play. It then acts as a non-capturing group.
Use
(?|(\w+)_badge_name|badge_name_(\w+))
^^^
See the regex demo.
PHP demo:
$re = '/(?|(\w+)_badge_name|badge_name_(\w+))/';
$str = 'foo=bar
badge_name_foo=foo
bar_badge_name=bar
bar=baz';
preg_match_all($re, $str, $matches);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] => bar_badge_name
)
[1] => Array
(
[0] => foo
[1] => bar
)
)
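The result above uses the default PREG_PATTERN_ORDER, while the question asked for one array per match. As a small sketch, passing PREG_SET_ORDER with the same branch reset pattern gives that shape, with no empty entries (the variable name $sets is just for illustration):

```php
<?php
$re  = '/(?|(\w+)_badge_name|badge_name_(\w+))/';
$str = "foo=bar\nbadge_name_foo=foo\nbar_badge_name=bar\nbar=baz";

// PREG_SET_ORDER groups the results per match: [full match, group 1].
// Both alternatives write to group 1 because of the branch reset (?|...).
preg_match_all($re, $str, $sets, PREG_SET_ORDER);

print_r($sets);
```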

Break a string with optional space and number and a dot

I'm trying to split a string at an optional space followed by a number and a dot.
$string = "1.1Kumar/Sandeep MR*T0148.4801 12.23Pal/Sandeep MR*T643.948";
$regex1 = "/(\s*[0-9]+\.)/";
$regex2 = "/(?<=\s)[0-9]+\./";
I need to split at 1. and 12.
The first regex gives:
Array
(
[0] =>
[1] => 1Kumar/Sandeep MR*T
[2] => 4801
[3] => 23Pal/Sandeep MR*T
[4] => 948
)
The second regex gives:
Array
(
[0] => 1.1Kumar/Sandeep MR*T0148.4801
[1] => 23Pal/Sandeep MR*T643.948
)
I am trying to get:
Array
(
[0] => 1Kumar/Sandeep MR*T0148.4801
[1] => 23Pal/Sandeep MR*T643.948
)
For your example string this will work:
\b\d+\.
Debuggex Demo
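The \b ensures there is a word boundary before the digits, which holds at the start of the string or after a space. In T0148. there is no boundary between T and 0, so that number is not treated as a split point.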
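A possible way to apply that pattern with preg_split is sketched below. The leading \s* and the PREG_SPLIT_NO_EMPTY flag are additions of this sketch, used to swallow the space before 12. and to drop the empty piece before 1.:

```php
<?php
$string = '1.1Kumar/Sandeep MR*T0148.4801 12.23Pal/Sandeep MR*T643.948';

// \s*  : optional whitespace before the number (consumed by the split)
// \b   : word boundary, so digits glued to a letter (T0148.) are ignored
// \d+\.: the number followed by a dot
$parts = preg_split('/\s*\b\d+\./', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```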

PHP - preg_match_all Same Group, Different Pattern

<?php
$str = '123.456.789.987,654,321,';
// preg_match_all('((?<grp1>\d{3})\.|(?<grp2>\d{3})\,)', $str, $matches);
preg_match_all('((?<grp1>\d{3})\.|(?<grp1>\d{3})\,)', $str, $matches);
print_r($matches);
?>
Based on the code above, I want to collect all the matches in a single group array called grp1, but it always fails with the error PHP Warning: preg_match_all(): Compilation failed: two named subpatterns have the same name at offset .... If I change one of the group names to grp2 it works.
Is it possible to use one group name instead of different group names in preg_match_all?
Update
There is a reason why I cannot use something like /(?<grp1>\d{3})[.,]/. Here is an example that makes the problem clearer:
<?php
$str = '
src="img1.png"
src="img2.jpg"
url(img3.png)
url(img4.jpg)
';
preg_match_all('/src=\"(?<img1>(\w+)(.png|.jpg))\"|url\((?<img2>(\w+)(.png|.jpg))\)/', $str, $matches);
print_r($matches);
?>
I want to collect img1.png, img2.jpg, img3.png and img4.jpg into a single array group named img, something like this:
[img] => Array
(
[0] => img1.png
[1] => img2.jpg
[2] => img3.png
[3] => img4.jpg
)
First of all, a regex in PHP needs to be wrapped in delimiters such as / or #.
Your regex also doesn't need to be this complex. It can be simplified to:
/(?<grp1>\d{3})[.,]/
Full Code:
$str = '123.456.789.987,654,321,';
preg_match_all('/(?<grp1>\d{3})[.,]/', $str, $matches);
print_r($matches['grp1']);
OUTPUT:
Array
(
[0] => 123
[1] => 456
[2] => 789
[3] => 987
[4] => 654
[5] => 321
)
UPDATE: As per your updated question:
$str = '
src="img1.png"
src="img2.jpg"
url(img3.png)
url(img4.jpg)
';
preg_match_all('/(?<=src="|url\()(?<img>[^")]+)/i', $str, $matches);
print_r($matches['img']);
OUTPUT:
Array
(
[0] => img1.png
[1] => img2.jpg
[2] => img3.png
[3] => img4.jpg
)
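Another option, closer to the original idea of two alternatives feeding one group, is a branch reset group, in which both alternatives write to the same group number. The sketch below uses a numbered group and stores it under the key img by hand. That manual step is a choice of this example, made so that it does not rely on how named groups behave inside a branch reset:

```php
<?php
$str = '
src="img1.png"
src="img2.jpg"
url(img3.png)
url(img4.jpg)
';

// (?|...) resets group numbering per alternative, so the file name is
// always captured as group 1, whichever alternative matched.
preg_match_all('/(?|src="([^"]+)"|url\(([^)]+)\))/', $str, $matches);

$result = ['img' => $matches[1]];
print_r($result);
```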

PHP regex preg_split - split by largest group only

I have the following regex
((\$|(\\\[)).*?(\$|(\\\])))
which should capture everything between a pair of $ or between \[ and \]. I tested it on http://gskinner.com/RegExr/ and it works.
The PHP variant (with doubled backslashes) is
((\$|(\\\\\[)).*?(\$|(\\\\\])))
and I would like to split my text based on that regex. How can I tell preg_split to use only the first (and largest) group and not the smaller ones?
preg_split('/((\$|(\\\\\[)).*?(\$|(\\\\\])))/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
So for the text This is my $test$ for something. I should get the array
[0] => This is my
[1] => $test$
[2] => for something.
But I get
[0] => This is my
[1] => $test$
[2] => $
[3] =>
[4] => $
[5] => for something.
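PREG_SPLIT_DELIM_CAPTURE returns every capturing group in the pattern, so keep a single capturing group around the whole delimiter. You would need something like this: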
$text = 'This is my $test$ for \[something\] new!';
print_r(preg_split('/(\$.*?\$|\\\\\[.*?\\\\\])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE));
Output:
Array
(
[0] => This is my
[1] => $test$
[2] => for
[3] => \[something\]
[4] => new!
)
IMHO, your regex is (probably) wrong: it would also accept mismatched delimiters in text like Hello $there\]. If you need to capture text between two $s or between a pair of \[ and \], then you need a regexp like:
          <-------------> Match text between \[ and \]
/(\$.*?\$|\\\\\[.*?\\\\\])/
  <-----> Match text between dollars
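If the structure of the original pattern has to be kept, the inner groups can instead be made non-capturing with (?:...), so that PREG_SPLIT_DELIM_CAPTURE only sees the outer group. This is a sketch of that variant. Note that it still accepts mismatched pairs such as $there\], which the pattern above avoids:

```php
<?php
$text = 'This is my $test$ for \[something\] new!';

// Only the outer (...) captures; the (?:...) groups merely group the
// alternatives, so they add no extra elements to the result.
$parts = preg_split(
    '/((?:\$|\\\\\[).*?(?:\$|\\\\\]))/',
    $text,
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);

print_r($parts);
```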
