$str = "[10:42-23:10]part1[11:30-13:20]part2";
I wish to split it into something like:
[1] 10:42-23:10
[2] part1
[3] 11:30-13:20
[4] part2
The best I managed to come up with is:
$parts = preg_split("/(\\[*\\])\w+/", $str );
But this returns
[0] => [10:42-23:10
[1] => [11:30-13:20
[2] =>
Also you can use regex in preg_match_all() instead of preg_split()
$str = "[10:42-23:10]part1[11:30-13:20]part2";
preg_match_all("/[^\[\]]+/", $str, $parts);
print_r($parts[0]);
See result in demo
Split on alternative between [ and ], and use the flag PREG_SPLIT_NO_EMPTY to not catch empty parts.
$str = "[10:42-23:10]part1[11:30-13:20]part2";
$parts = preg_split("/\[|\]/", $str, -1, PREG_SPLIT_NO_EMPTY );
print_r($parts);
Output:
Array
(
[0] => 10:42-23:10
[1] => part1
[2] => 11:30-13:20
[3] => part2
)
NB.
Thanks to @WiktorStribiżew: his regex /[][]/ is much more efficient. In my benchmark it ran about 66% faster (equivalently, the alternation \[|\] is about 40% slower).
$str = "[10:42-23:10]part1[11:30-13:20]part2";
$parts = preg_split("/[][]/", $str, -1, PREG_SPLIT_NO_EMPTY );
print_r($parts);
Here is the perl script I have used to do the benchmark:
#!/usr/bin/perl
use Benchmark qw(:all);
my $str = "[10:42-23:10]part1[11:30-13:20]part2";
my $count = -5;
cmpthese($count, {
    '[][]' => sub {
        my @parts = split(/[][]/, $str);
    },
    '\[|\]' => sub {
        my @parts = split(/\[|\]/, $str);
    },
});
Result: (2 runs)
>perl -w benchmark.pl
Rate \[|\] [][]
\[|\] 536640/s -- -40%
[][] 891396/s 66% --
>Exit code: 0
>perl -w benchmark.pl
Rate \[|\] [][]
\[|\] 530867/s -- -40%
[][] 885242/s 67% --
>Exit code: 0
Use a simple regex to match any [...] substring (\[[^][]*]) and wrap the whole pattern with a capturing group - then you can use it with preg_split and PREG_SPLIT_DELIM_CAPTURE flag to get both the captures and the substrings in between matches:
$re = '/(\[[^][]*])/';
$str = '[10:42-23:10]part1[11:30-13:20]part2';
$matches = preg_split($re, $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($matches);
See the PHP demo
With this approach, you have better control over what you match inside the square brackets, since you can adjust the pattern to match only time ranges, e.g.
(\[\d{2}:\d{2}-\d{2}:\d{2}])
A [10:42-23:10]part1[11:30-13:20]part2[4][5] will get split into [10:42-23:10], part1, [11:30-13:20] and part2[4][5] (note the [4][5] are not split out).
See this regex demo
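As a minimal sketch of the stricter variant, the time-range pattern above can be dropped into the same preg_split call. The function name split_time_ranges is only a hypothetical helper for illustration; the pattern and flags are the ones already shown.

```php
<?php
// Hypothetical helper: splits on [hh:mm-hh:mm] blocks only and keeps them in the result.
function split_time_ranges(string $str): array
{
    // Capturing group + PREG_SPLIT_DELIM_CAPTURE returns the delimiters as well.
    $re = '/(\[\d{2}:\d{2}-\d{2}:\d{2}])/';
    return preg_split($re, $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
}

$parts = split_time_ranges('[10:42-23:10]part1[11:30-13:20]part2[4][5]');
print_r($parts);
```

Here [4][5] stays attached to part2 because it does not look like a time range.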
Without regex, you can use strtok:
$result = [];
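$tok = strtok($str, '[]');
while ($tok !== false) {
    // strtok() already skips empty tokens, so no empty() check is needed
    // (empty() would also wrongly drop a token such as "0").
    $result[] = $tok;
    $tok = strtok('[]');
}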
$pattern = "/^(?<animal>DOG|CAT)?(?<color>BLUE|RED)?$/i";
$str = "DOG";
preg_match($pattern, $str, $matches);
$matches = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
// $str = "DOG" returns [animal] => DOG
// $str = "dogBLUE" returns [animal] => dog and [color] => BLUE
print_r($matches);
I have an example http://sandbox.onlinephpfunctions.com/code/2428bc68adcf9929557d86dc1ae72552c3681b58 too
Both named capture groups are optional and so keys will only be returned if a match is found.
My Question
How can I unconditionally return keys for all of my possible named capture groups? An empty string '' would be great if a group isn't found.
Input of "DOG" resulting in [animal] => 'DOG', [color] => '' is what I'm looking for.
I was hoping for a flag on preg_match to do this, but couldn't find anything.
Update: I just want to avoid doing isset($matches[OPTIONAL_GROUP])
Thanks!
Allow duplicate named captures (with J or with (?J) for PHP < 7.3) and initialize groups you need at the start of the pattern:
~ (?<animal>) (?<color>)
^ (?<animal>DOG|CAT)? (?<color>RED|BLUE)? $
~xiJ
Advantage: you can choose the order of groups in the result array.
Or, without changing your pattern:
$arr = ['animal' => '', 'color' => ''];
$result = array_intersect_key($matches, $arr) + $arr;
Notice: array_intersect_key($matches + $arr, $arr) produces exactly the same result.
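To avoid repeating this at every call site, the defaults trick can be wrapped in a small function. This is only a sketch: match_with_defaults is a hypothetical name, and returning null when nothing matches is an assumption made for the example.

```php
<?php
// Hypothetical wrapper: returns only the keys listed in $defaults,
// filling in the default value for any group that did not take part in the match.
function match_with_defaults(string $pattern, string $subject, array $defaults): ?array
{
    if (!preg_match($pattern, $subject, $matches)) {
        return null; // assumption: null signals "no match at all"
    }
    return array_intersect_key($matches, $defaults) + $defaults;
}

$pattern  = '/^(?<animal>DOG|CAT)?(?<color>BLUE|RED)?$/i';
$defaults = ['animal' => '', 'color' => ''];

$dog  = match_with_defaults($pattern, 'DOG', $defaults);
$blue = match_with_defaults($pattern, 'BLUE', $defaults);
print_r($dog);
print_r($blue);
```

Both calls return the animal and color keys, so no isset() check is needed afterwards.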
You can use andre at koethur dot de's solution:
$pattern = "/^(?<animal>DOG|CAT)?(?<color>BLUE|RED)?$/i";
$str = "DOG";
if (preg_match($pattern, $str, $matches)) {
$matches = array_merge(array('animal' => '', 'color' => ''), $matches);
$matches = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
print_r($matches);
}
See the PHP demo.
Output:
Array
(
[animal] => DOG
[color] =>
)
The idea is that you need to "assign a name to all subpatterns you are interested in, and merge $matches afterwards with an constant array containing some reasonable default values".
I need to extract a predefined set of hashtags from a blob of text, then extract what number follows right after it if any. Eg. I'd need to extract 30 from "Test string with #other30 hashtag". I assumed preg_match_all would be the right choice.
Some test code:
$hashtag = '#other';
$string = 'Test string with #other30 hashtag';
$matches = [];
preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => #other30
)
)
Perfect... Works as expected. Now to extract the number:
$string = $matches[0][0]; // #other30
$matches = [];
preg_match_all('/\d*/', $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
[4] =>
[5] =>
[6] => 30
[7] =>
)
)
What? Looks like it's trying to match every character?
I'm aware of some preg_match_all related answers (one, two), but they all use a parenthesized subpattern. According to documentation - it is optional.
What am I missing? How do I simply get all matches of such a basic regex as /\d*/ into an array? There doesn't seem to be a more appropriate function in PHP for that.
I never thought I'd be scratching my head with such a basic thing in PHP. Much appreciated.
You need to replace:
preg_match_all('/\d*/', $string, $matches);
with:
preg_match_all('/\d+/', $string, $matches);
Replace * with +
Because
* Match zero or more times.
+ Match one or more times.
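A quick side-by-side sketch of the two quantifiers on the string from the question may make the difference clearer. The variable names $star and $plus are just for illustration.

```php
<?php
$subject = '#other30';

// \d* also succeeds with zero digits, so it yields an empty match
// at every position where no digit starts, plus one at the end of the string.
preg_match_all('/\d*/', $subject, $star);

// \d+ needs at least one digit, so only the real number is reported.
preg_match_all('/\d+/', $subject, $plus);

echo count($star[0]), " matches with *\n";
echo count($plus[0]), " match with +\n";
```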
You can use a capturing group:
preg_match_all('/' . $hashtag . '(\d*)/', $string, $matches);
echo $matches[1][0] . "\n";
//=> 30
Here (\d*) will capture the number after $hashtag.
Also note that \K resets the start of the reported match, so you can return only the part of the match after a certain point. And of course you need \d+ instead of \d* to match one or more digits; otherwise the pattern also matches the empty string in the gaps between characters, where zero digits is a valid match.
So your code can be reduced to
$hashtag = '#other';
$string = 'Test string with #other30 #other31 hashtag';
preg_match_all('/' . $hashtag . '\K\d+/', $string, $matches);
print_r($matches[0]);
See the demo at eval.in and consider using preg_quote for $hashtag.
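Here is a rough sketch of the preg_quote suggestion. The function name numbers_after_hashtag and the '#c++' hashtag are made up for the example; the point is that a hashtag containing regex metacharacters only works once it is quoted.

```php
<?php
// Hypothetical helper: returns every number that directly follows $hashtag in $text.
function numbers_after_hashtag(string $hashtag, string $text): array
{
    // preg_quote() escapes regex metacharacters; '/' is passed because it is our delimiter.
    $pattern = '/' . preg_quote($hashtag, '/') . '\K\d+/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$a = numbers_after_hashtag('#other', 'Test string with #other30 #other31 hashtag');
$b = numbers_after_hashtag('#c++', 'Scores: #c++10 and #c++7');
print_r($a);
print_r($b);
```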
PHP Fiddle
<?php
$hashtag = '#other';
$string = 'Test string with #other30 hashtag';
$matches = [];
preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
preg_match_all('#\d+#', $matches[0][0], $m);
echo $m[0][0];
?>
I'm trying to split a string by one or more spaces, but the code below isn't working. Instead, it returns the whole string as a single-element array.
$str = 'I am a test';
$parts = preg_split('/\s+/', $str, PREG_SPLIT_NO_EMPTY);
print_r($parts);
Here is what it returns:
Array
(
[0] => I am a test
)
flags is the 4th parameter to preg_split, not the 3rd.
Remove PREG_SPLIT_NO_EMPTY flag: $parts = preg_split('/\s+/', $str);
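To illustrate why the original call returned the whole string: PREG_SPLIT_NO_EMPTY has the integer value 1, so in the third position it is read as a limit of 1. A small sketch comparing both calls (the variable names $wrong and $right are only for illustration):

```php
<?php
$str = 'I am a test';

// The flag lands in the $limit slot, so this means "return at most 1 piece".
$wrong = preg_split('/\s+/', $str, PREG_SPLIT_NO_EMPTY);

// Pass -1 (no limit) as the third argument, then the flag as the fourth.
$right = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($wrong);
print_r($right);
```

Keeping the flag is mostly useful when the string may start or end with whitespace, since that would otherwise produce empty elements.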
I have a string such as:
"0123456789"
And I need to split each character into an array.
I, for the hell of it, tried:
explode('', '123545789');
But it gave me the obvious warning: No delimiter defined in explode().
How would I go about this? I can't see any method offhand, ideally just a single function.
$array = str_split("0123456789bcdfghjkmnpqrstvwxyz");
str_split takes an optional 2nd param, the chunk length (default 1), so you can do things like:
$array = str_split("aabbccdd", 2);
// $array[0] = aa
// $array[1] = bb
// $array[2] = cc etc ...
You can also get at parts of your string by treating it as an array:
$string = "hello";
echo $string[1];
// outputs "e"
You can access characters in a string just like an array:
$s = 'abcd';
echo $s[0];
prints 'a'
Try this:
$str = '123456789';
$char_array = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
str_split can do the trick. Note that strings in PHP can be accessed just like a character array. In most cases, you won't need to split your string into a "new" array.
Here is an example that works with multibyte (UTF-8) strings.
$str = 'äbcd';
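$arr = [];
// PHP 5.4.8 allows null as the third argument of mb_substr()
// Compare against '' explicitly: a remaining string of "0" is falsy and would end the loop early.
while ( $str !== '' ) {
$arr[] = mb_substr( $str, 0, 1, 'utf-8' );
$str = mb_substr( $str, 1, mb_strlen( $str ), 'utf-8' );
}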
It can also be done with preg_split() (preg_split( '//u', $str, -1, PREG_SPLIT_NO_EMPTY )), but unlike the above example, which runs almost as fast regardless of the size of the string, preg_split() is fast with small strings but a lot slower with large ones.
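For comparison, here is a small sketch of why the multibyte-aware variants matter: str_split() works on bytes, so a UTF-8 character such as ä is cut in two, while the //u split keeps it whole. The string is written with a "\u{e4}" escape (PHP 7+) only so the example does not depend on the file's encoding.

```php
<?php
$str = "\u{e4}bcd"; // the same text as 'äbcd', encoded as UTF-8

// Byte-wise split: ä occupies two bytes, so this gives 5 elements.
$bytes = str_split($str);

// Character-wise split: the u modifier makes PCRE treat the subject as UTF-8.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), " bytes, ", count($chars), " characters\n";
```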
Try this:
$str = '546788';
$char_array = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
Try this:
$str = "Hello Friend";
$arr1 = str_split($str);
$arr2 = str_split($str, 3);
print_r($arr1);
print_r($arr2);
The above example will output:
Array
(
[0] => H
[1] => e
[2] => l
[3] => l
[4] => o
[5] =>
[6] => F
[7] => r
[8] => i
[9] => e
[10] => n
[11] => d
)
Array
(
[0] => Hel
[1] => lo
[2] => Fri
[3] => end
)
If you want to split the string, it's best to use:
$array = str_split($string);
When you have a delimiter that separates the parts of the string, you can use
explode($delimiter, $string);
where the delimiter is the first argument to explode (it must not be an empty string), such as:
explode(',', $string);
$array = str_split($string);
will actually work pretty well, but if you want to preserve the special characters in that string and manipulate them, then I would use
$array = [];
while ($string !== '') {
$array[] = mb_substr($string, 0, 1, 'utf-8');
$string = mb_substr($string, 1, mb_strlen($string), 'utf-8');
}
because in some of my personal uses it has proven more reliable when there is an issue with special characters.
UPDATE: I'm making progress, but this is hard!
The test text will be valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST].
(The real life text is required|valid[REGEX_EMAIL]|confirmed[emailconfirmation]|correct[not in|emailconfirmation|email confirmation].)
([^|]+) saves REGEX_EMAIL, REGEX_PASSWORD and REGEX_TEST in an array.
^[^[]+\[ matches valid[
\] matches ]
^[^[]+\[ + ([^|]+) + \] doesn't save REGEX_EMAIL, REGEX_PASSWORD and REGEX_TEST in an array.
How to solve?
Why is it important to try to do everything with a single regular expression? It becomes much easier if you extract the two parts first and then split the strings on | using explode:
$s = 'valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]';
$matches = array();
preg_match('/^([^[]++)\[([^]]++)\]$/', $s, $matches);
$left = explode('|', $matches[1]);
$right = explode('|', $matches[2]);
print_r($left);
print_r($right);
Output:
Array
(
[0] => valid
)
Array
(
[0] => REGEX_EMAIL
[1] => REGEX_PASSWORD
[2] => REGEX_TEST
)
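The same two-step idea can be stretched to the real-life text from the question. The following is only a sketch under some assumptions: parse_rules is a hypothetical name, rules are separated by | outside brackets, brackets are never nested, and a rule without brackets gets an empty argument list.

```php
<?php
// Hypothetical helper: maps each rule name to its list of bracketed arguments.
function parse_rules(string $text): array
{
    $rules = [];
    // Group 1: rule name; group 2 (optional): everything between [ and ].
    // Each rule must be followed by | or the end of the string.
    preg_match_all('/([^|\[\]]+)(?:\[([^\]]*)\])?(?:\||$)/', $text, $sets, PREG_SET_ORDER);
    foreach ($sets as $set) {
        // Group 2 is absent when the rule has no brackets.
        $hasArgs = isset($set[2]) && $set[2] !== '';
        $rules[$set[1]] = $hasArgs ? explode('|', $set[2]) : [];
    }
    return $rules;
}

$rules = parse_rules('required|valid[REGEX_EMAIL]|confirmed[emailconfirmation]|correct[not in|emailconfirmation|email confirmation]');
print_r($rules);
```

The regex only finds the rule boundaries; splitting the arguments is still left to explode, as in the answer above.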