Split __HELLO____HAPPY_BIRTHDAY__ to __HELLO__ and __HAPPY_BIRTHDAY__ - php

I have some php code like this:
$input = "
__HELLO__
__HAPPY_BIRTHDAY__
__HELLO____HAPPY_BIRTHDAY__";
preg_match_all('/__(\w+)__/', $input, $matches);
print_r($matches[0]);
Currently the result of $matches[0] is this:
Array
(
[0] => __HELLO__
[1] => __HAPPY_BIRTHDAY__
[2] => __HELLO____HAPPY_BIRTHDAY__
)
As you can see my regex is interpreting __HELLO____HAPPY_BIRTHDAY__ as one match, which I don't want.
I want the matches to return this:
Array
(
[0] => __HELLO__
[1] => __HAPPY_BIRTHDAY__
[2] => __HELLO__
[3] => __HAPPY_BIRTHDAY__
)
Where __HELLO____HAPPY_BIRTHDAY__ is split into __HELLO__ and __HAPPY_BIRTHDAY__. How can I do this?
(Each line will only ever have one underscore in between the outer underscores e.g. __HAPPY__BIRTHDAY__ is illegal)

Use the U (ungreedy) modifier. It inverts the greediness of quantifiers, so \w+ becomes lazy and stops at the first closing __ instead of the last one.
$input = "
__HELLO__
__HAPPY_BIRTHDAY__
__HELLO____HAPPY_BIRTHDAY__";
preg_match_all('/__(\w+)__/U', $input, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => __HELLO__
[1] => __HAPPY_BIRTHDAY__
[2] => __HELLO__
[3] => __HAPPY_BIRTHDAY__
)
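A minimal sketch of the same fix without the U modifier, assuming you would rather make only this one quantifier lazy. The variable name $lazy is illustrative and not from the original answer.

```php
<?php
// \w also matches "_", so a greedy \w+ runs straight through the inner "____".
// The lazy \w+? grows one character at a time and stops at the first "__".
$input = "
__HELLO__
__HAPPY_BIRTHDAY__
__HELLO____HAPPY_BIRTHDAY__";

preg_match_all('/__(\w+?)__/', $input, $lazy);

print_r($lazy[0]); // full tokens, including the surrounding underscores
print_r($lazy[1]); // captured names only
```

Note that U flips every quantifier in the pattern, so under U a `+?` would become greedy again; marking a single quantifier with `?` keeps the change local.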

Related

Array named capture using PHP regex

If named capture matches multiple times, is it possible to retrieve all matches?
Example
<?php
$string = 'TextToMatch [some][random][tags] SomeMoreMatches';
$pattern = "!(TextToMatch )(?P<tags>\[.+?\])+( SomeMoreMatches)!";
preg_match($pattern, $string, $matches);
print_r($matches);
Which results in
Array
(
[0] => TextToMatch [some][random][tags] SomeMoreMatches
[1] => TextToMatch
[tags] => [tags]
[2] => [tags]
[3] => SomeMoreMatches
)
Is it possible to get something like
Array
(
[0] => TextToMatch [some][random][tags] SomeMoreMatches
[1] => TextToMatch
[tags] => Array
(
[0] => [some]
[1] => [random]
[2] => [tags]
)
[2] => Array
(
[0] => [some]
[1] => [random]
[2] => [tags]
)
[3] => SomeMoreMatches
)
using only preg_match?
I am aware that I can explode the tags, but I wonder if I can do this with preg_match (or a similar function) only.
Other example
$input = "Some text [many][more][other][tags][here] and maybe some text here?";
Desirable output
Array
(
[0] => Some text [many][more][other][tags][here] and maybe some text here?
[1] => Some text
[tags] => Array
(
[0] => [many]
[1] => [more]
[2] => [other]
[3] => [tags]
[4] => [here]
)
[2] => Array
(
[0] => [many]
[1] => [more]
[2] => [other]
[3] => [tags]
[4] => [here]
)
[3] => and maybe some text here?
)
You need to use preg_match_all and modify the regular expression:
preg_match_all('/(?P<tags>\[.+?\])/', $string, $matches);
Just remove the + after the closing ) so the pattern matches a single tag; preg_match_all then performs a global search and returns every tag.
If you need the specific answer that you posted, try with:
$string = '[some][random][tags]';
$pattern = "/(?P<tags>\[.+?\])/";
preg_match_all($pattern, $string, $matches);
$matches = [
implode($matches['tags']), end($matches['tags'])
] + $matches;
print_r($matches);
You get:
Array
(
[0] => [some][random][tags]
[1] => [tags]
[tags] => Array
(
[0] => [some]
[1] => [random]
[2] => [tags]
)
)
Since you stated in your comments that you are not actually interested in the leading substring before the set of tags, and because you stated that you don't necessarily need the named capture group (I never use them), you really only need to remove the first bit, split the string on the space after the set of tags, then split each tag in the set of tags.
Code:
// strstr() trims off the leading substring;
// the limit of 2 tells explode() to stop after making 2 elements
$split = explode(' ', strstr($input, '['), 2);
var_export($split);
Produces:
array (
0 => '[many][more][other][tags][here]',
1 => 'and maybe some text here?',
)
Then the most direct/clean way to split those square bracketed tags, is to use the zero-width position between each closing bracket (]) and each opening bracket ([). Since only regex can isolate these specific positions as delimiters, I'll suggest preg_split().
// \K releases/forgets the previously matched character(s), so each "]" stays in its tag
$split[0] = preg_split('~]\K~', $split[0], -1, PREG_SPLIT_NO_EMPTY);
var_export($split);
This is the final output:
array (
0 =>
array (
0 => '[many]',
1 => '[more]',
2 => '[other]',
3 => '[tags]',
4 => '[here]',
),
1 => 'and maybe some text here?',
)
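The same split can also be written with lookarounds instead of \K. This is only a sketch of an equivalent pattern, reusing the $input from the question: it splits at the zero-width position that has ] behind it and [ ahead of it.

```php
<?php
$input = "Some text [many][more][other][tags][here] and maybe some text here?";

// Drop everything before the first "[", then cut once at the first space.
$split = explode(' ', strstr($input, '['), 2);

// Split between "]" and "[" without consuming either bracket.
$split[0] = preg_split('~(?<=\])(?=\[)~', $split[0]);

var_export($split);
```

Because the delimiter only exists between two tags, no empty trailing element is produced and PREG_SPLIT_NO_EMPTY is not needed here.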
No. As Wiktor stated, it is not possible using only preg_match.
Solution that just works
<?php
$string = 'TextToMatch [some][random][tags] SomeMoreMatches';
$pattern = "!(TextToMatch )(?P<tags>\[.+?\]+)( SomeMoreMatches)!";
preg_match($pattern, $string, $matches);
$matches[2] = $matches["tags"] = array_map(function($s){return "[$s]";}, explode("][", substr($matches["tags"],1,-1)));
print_r($matches);
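Another way to get the desired shape, sketched here as a two-step approach rather than a single preg_match call: capture the whole run of tags once, then let preg_match_all break that run into individual tags. The variable names $m and $tags are illustrative.

```php
<?php
$string = 'TextToMatch [some][random][tags] SomeMoreMatches';

// Step 1: the repetition sits INSIDE the named group, so the group
// captures the complete run "[some][random][tags]" instead of only
// the last iteration.
$pattern = '!(TextToMatch )(?P<tags>(?:\[.+?\])+)( SomeMoreMatches)!';

if (preg_match($pattern, $string, $m)) {
    // Step 2: split the captured run into single tags.
    preg_match_all('/\[[^\]]+\]/', $m['tags'], $tags);
    $m['tags'] = $m[2] = $tags[0];
}

print_r($m);
```

A repeated capture group only remembers its final iteration, which is why the repetition has to move inside the group before the second pass can see every tag.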

Wrong working regular expression for parsing short terms

I wrote a regular expression in PHP to parse abbreviations from a string.
My code:
$re = "/(([$]?+[А-Яа-я.]+[.]){1,})/";
$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
preg_match_all($re, $str, $matches);
And this script returns:
Array
(
[0] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
[1] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
[2] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
)
But I expect it to work like this:
[1]=>'ж.р.', [2]=>'ул.'
This means my regex matches only part of each abbreviation, though I need the full abbreviation.
On regex101.com, for example, it works fine: https://regex101.com/r/wQ7lR7/1
How can I get the full abbreviations ('г.','ж.р.','ул.')?
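You need to use the Unicode modifier, u (http://php.net/manual/en/reference.pcre.pattern.modifiers.php). Without it, PCRE treats the pattern and the subject as single bytes, so the multibyte UTF-8 Cyrillic letters are not matched as whole characters.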
Example:
$re = "/(([$]?+[А-Яа-я.]+[.]){1,})/u";
$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
preg_match_all($re, $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
[1] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
[2] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
)
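A small sketch of why the modifier matters, assuming the source file and the string are UTF-8 encoded. It counts how many units `.` sees in a single Cyrillic letter with and without u, then reruns the pattern from the question. The variable names are illustrative.

```php
<?php
// "ж" is two bytes in UTF-8. Without /u the regex engine walks bytes,
// with /u it walks whole characters.
$bytesSeen = preg_match_all('/./', 'ж');  // counts bytes
$charsSeen = preg_match_all('/./u', 'ж'); // counts characters

$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
preg_match_all('/(([$]?+[А-Яа-я.]+[.]){1,})/u', $str, $matches);

echo "bytes: $bytesSeen, characters: $charsSeen\n";
print_r($matches[0]);
```

In byte mode the class [А-Яа-я.] is built from the individual bytes of those letters, so it no longer describes "one Cyrillic letter", which explains the truncated results in the question.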

Unexpected preg_match result from pattern with "?:"

I tried this pattern
(?:(\d+)\/|)reports\/(\d+)-([\w-]+).html
with this string (preg_match with modifiers "Axu")
reports/683868-derger-gergewrger.html
and I expected this result (https://regex101.com/r/kX6yZ5/1):
[1] => 683868
[2] => derger-gergewrger
But I get this:
[1] =>
[2] => 683868
[3] => derger-gergewrger
Why? Where does the empty value (1) come from? I thought the "?:" group should not capture.
I have two cases:
"reports/683868-derger-gergewrger.html"
"757/reports/683868-derger-gergewrger.html"
In the first case I need two captures, but in the second case I need three.
You can use:
preg_match('~(?:\d+/)?reports/(\d+)-([\w-]+)\.html~',
'reports/683868-derger-gergewrger.html', $m);
print_r($m);
Array
(
[0] => reports/683868-derger-gergewrger.html
[1] => 683868
[2] => derger-gergewrger
)
EDIT: You probably want this behavior:
$s = '757/reports/683868-derger-gergewrger.html';
preg_match('~(?|(\d+)/reports/(\d+)-([\w-]+)\.html|reports/(\d+)-([\w-]+)\.html)~',
$s, $m); print_r($m);
Array
(
[0] => 757/reports/683868-derger-gergewrger.html
[1] => 757
[2] => 683868
[3] => derger-gergewrger
)
and:
$s = 'reports/683868-derger-gergewrger.html';
preg_match('~(?|(\d+)/reports/(\d+)-([\w-]+)\.html|reports/(\d+)-([\w-]+)\.html)~',
$s, $m); print_r($m);
Array
(
[0] => reports/683868-derger-gergewrger.html
[1] => 683868
[2] => derger-gergewrger
)
(?|...) is a branch reset group: a non-capturing group in which the capture groups of each alternative start numbering from the same index.
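To make the difference concrete, here is a short side-by-side sketch. It runs the question's pattern (with the dot before html escaped) and the (?|...) pattern on the same string; $plain and $reset are illustrative names.

```php
<?php
$s = 'reports/683868-derger-gergewrger.html';

// (?:...) itself does not capture, but the (\d+) nested inside it is
// still capture group 1. When the empty alternative is taken, group 1
// stays in the result as an empty string.
preg_match('~(?:(\d+)/|)reports/(\d+)-([\w-]+)\.html~', $s, $plain);

// With (?|...) both alternatives number their groups from 1, so the
// short form fills groups 1 and 2 and leaves no empty slot in front.
preg_match('~(?|(\d+)/reports/(\d+)-([\w-]+)\.html|reports/(\d+)-([\w-]+)\.html)~', $s, $reset);

print_r($plain);
print_r($reset);
```

The empty entry in the first result comes from numbering alone: groups are numbered by their opening parenthesis, regardless of which alternative ends up matching.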

Find all patterns in a string php

This is my string.
$str = '"additional_details":" {"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}"}],';
I want to find all patterns that start with "{" and end with "}".
I am trying this:
preg_match_all( '/"(\{.*\})"/', $str, $matches );
print_r($matches);
It gives me an output of:
Array
(
[0] => Array
(
[0] => "{"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}"
)
[1] => Array
(
[0] => {"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}
)
)
See the array key 1. It gives all matches in one key and other details too.
I want an array of all matches. Like
Array
(
[0] => Array
(
[0] => "{"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}"}],"additional_details":"{"mode_of_transport":"air"}"
)
[1] => Array
(
[0] => {"mode_of_transport":"air"},
[1] => {"mode_of_transport":"air"},
[2] => {"mode_of_transport":"air"}
)
)
What should I change in my pattern?
Thanks
You can use:
preg_match_all( '/({[^}]*})/', $str, $matches );
print_r($matches[1]);
Array
(
[0] => {"mode_of_transport":"air"}
[1] => {"mode_of_transport":"air"}
[2] => {"mode_of_transport":"air"}
)
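The reason this works is that [^}]* cannot run past a closing brace, whereas the greedy .* in the question runs to the last one. A minimal sketch on a shortened, made-up string of the same shape (the keys a, b, x, y are placeholders, not from the question):

```php
<?php
$str = '"a":"{"x":"1"}"}],"b":"{"y":"2"}"}],';

// Greedy: .* grabs as much as possible, so one match spans from the
// first "{" to the last "}".
preg_match_all('/\{.*\}/', $str, $greedy);

// Negated class: [^}]* stops before the first "}", giving one match
// per object.
preg_match_all('/\{[^}]*\}/', $str, $negated);

// Lazy: .*? gives the same result on this input.
preg_match_all('/\{.*?\}/', $str, $lazy);

print_r($greedy[0]);
print_r($negated[0]);
```

Neither the negated class nor the lazy quantifier handles nested braces; for real JSON, json_decode is the safer tool.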

preg_match_all and umlauts

I am using preg_match_all to filter out strings.
The string I pass to preg_match_all is
$text = "Friedric'h Wöhler";
after that I use
preg_match_all('/(\"[^"]+\"|[\\p{L}\\p{N}\\*\\-\\.\\?]+)/', $text, $arr, PREG_PATTERN_ORDER);
and the result I get when I print $arr is
Array
(
[0] => Array
(
[0] => friedric
[1] => h
[2] => w
[3] => ouml
[4] => hler
)
[1] => Array
(
[0] => friedric
[1] => h
[2] => w
[3] => ouml
[4] => hler
)
)
Somehow the ö character is replaced by ouml, and I am not sure why.
I am expecting the following result
Array
(
[0] => Array
(
[0] => Friedric'h
[1] => Wöhler
)
)
Per nhahtdh's comment:
$text = "Friedric'h Wöhler";
preg_match_all('/"[^"]+"|[\p{L}\p{N}*.?\\\'-]+/u', $text, $arr, PREG_PATTERN_ORDER);
echo "<pre>";
print_r($arr);
echo "</pre>";
Gives
Array
(
[0] => Array
(
[0] => Friedric'h
[1] => Wöhler
)
)
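The "ouml" fragment in the question suggests the text may have reached preg_match_all as an HTML entity (&ouml;) rather than as a literal ö. The question does not show that step, so treat the following as a sketch under that assumption: decode entities first, then match with the u modifier.

```php
<?php
// Assumption: the raw input is entity-encoded, which the question does
// not confirm.
$raw = "Friedric'h W&ouml;hler";

// Turn &ouml; back into a real UTF-8 "ö" before matching.
$text = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Same pattern as in the answer: quoted phrases, or runs of letters,
// digits, and a few punctuation characters including the apostrophe.
preg_match_all('/"[^"]+"|[\p{L}\p{N}*.?\'-]+/u', $text, $arr, PREG_PATTERN_ORDER);

print_r($arr[0]);
```

If the input is already plain UTF-8, the decoding step changes nothing for this string and the u modifier alone is enough.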
If you think preg_match_all() is messy, you could take a look at pattern():
$p = '"[^"]+"|[\p{L}\p{N}*.?\\\'-]+'; // automatic delimiters
$text = "Friedric'h Wöhler";
$result = pattern($p)->match($text)->all();
