PHP preg_split, split by same characters - php

I'm trying to split a string with preg_split. Here's an example of the string:
111235622411
I want the output to be like this:
$arr[0] = "111";
$arr[1] = "2";
$arr[2] = "3";
$arr[3] = "5";
$arr[4] = "6";
$arr[5] = "22";
$arr[6] = "4";
$arr[7] = "11";
So if the same character occurs several times in a row, I want those characters in the same "chunk". I just can't come up with the regular expression I should use. I'm sorry if some of the terms are wrong; it has been some time since I last coded in PHP.

I would use preg_match_all():
$string = '111235622411';
preg_match_all('/(.)\1*/', $string, $matches);
var_dump($matches[0]);
\1 references the previously captured group (.) (any single character). This feature is called a backreference. The regex repeats the previously matched character; the greedy * means it matches as many identical characters as possible, which is what the question asked for.
Output:
array(8) {
[0]=>
string(3) "111"
[1]=>
string(1) "2"
[2]=>
string(1) "3"
[3]=>
string(1) "5"
[4]=>
string(1) "6"
[5]=>
string(2) "22"
[6]=>
string(1) "4"
[7]=>
string(2) "11"
}
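Since the question asked about preg_split specifically, here is a minimal sketch of how the same grouping could be done with it, assuming a zero-width split point is acceptable. The lookbehind captures the previous character and the negative lookahead requires that the next character differs; the variable name $chunks is only illustrative.

```php
<?php
$string = '111235622411';

// Split at every position where the previous character (captured by the
// lookbehind) is NOT followed by the same character again.
// PREG_SPLIT_NO_EMPTY drops the empty piece produced at the end of the string.
$chunks = preg_split('/(?<=(.))(?!\1)/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($chunks);
```

The preg_match_all() version above is probably easier to read; this variant is mainly useful if the surrounding code already expects a split.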

You can use this regex:
(.)(?=\1)\1+|\d
And instead of splitting it, take the matches.
$matches = null;
$returnValue = preg_match_all('/(.)(?=\\1)\\1+|\\d/', '111235622411', $matches);
$matches[0] will then contain what you want. As @hek2mgl has suggested, you can also use the simpler /(\d)\1*/

The following is a simple solution based on preg_match_all:
The regex in this case is:
(\d)\1*
Meaning of the regex:
(\d): 1st capturing group. \d matches a digit [0-9].
\1 matches the same text as most recently matched by the 1st capturing group.
*: Quantifier that matches between zero and unlimited times.
The php code would be:
$re = "/(\\d)\\1*/";
$str = "111235622411";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
You can access, for example, the first matched run, which is "111", with $matches[0][0], the second, which is "2", with $matches[0][1], and so on.
Hope it's useful!
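As a small illustration of how the two result arrays relate, the sketch below pairs each run from $matches[0] with its repeated digit from $matches[1]. The $runs array and its 'digit' and 'length' keys are made up for this example.

```php
<?php
$str = '111235622411';
preg_match_all('/(\d)\1*/', $str, $matches);

// $matches[0] holds each full run, $matches[1] the single digit that repeats.
$runs = [];
foreach ($matches[0] as $i => $run) {
    $runs[] = ['digit' => $matches[1][$i], 'length' => strlen($run)];
}

print_r($runs);
```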

Related

What is the pattern to search for any string that matches the format "CEC0000-0000"?

The zeros can be incremented but it must be of four digits, so it could be CEC0152-2005
Of course with a "-" between them.
I used www.txt2re.com to generate this pattern, but it didn't help me.
Maybe,
^[A-Z]{3}[0-9]{4}-[0-9]{4}$
or,
^CEC[0-9]{4}-[0-9]{4}$
might work fine.
Test
$re = '/^[A-Z]{3}[0-9]{4}-[0-9]{4}$/m';
$str = 'CEC0152-2005
CEC0152-2019
CEC0152-1999
CEC0152-19991';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(3) {
[0]=>
array(1) {
[0]=>
string(12) "CEC0152-2005"
}
[1]=>
array(1) {
[0]=>
string(12) "CEC0152-2019"
}
[2]=>
array(1) {
[0]=>
string(12) "CEC0152-1999"
}
}
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel and shows how it matches against sample inputs.
If after the dash we'd have a four-digit year,
^[A-Z]{3}[0-9]{4}-[12][0-9]{3}$
^CEC[0-9]{4}-[12][0-9]{3}$
might also work fine, I guess.
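To validate a single value rather than scan a multi-line string, a check along these lines may be enough. This is only a sketch: the function name isCecCode is hypothetical, and it assumes the prefix is always the literal "CEC".

```php
<?php
// Hypothetical helper: returns true only for strings like "CEC0152-2005".
function isCecCode(string $value): bool
{
    // The D modifier makes $ match only at the very end of the subject,
    // so a value with a trailing newline is rejected as well.
    return preg_match('/^CEC[0-9]{4}-[0-9]{4}$/D', $value) === 1;
}

var_dump(isCecCode('CEC0152-2005'));   // bool(true)
var_dump(isCecCode('CEC0152-19991'));  // bool(false)
```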

Strange result of using asterisk * quantifier

I am trying to practice the asterisk * quantifier on a simple string, but although I have only two letters, the result contains a third match.
<?php
$x = 'ab';
preg_match_all("/a*/",$x,$m);
echo '<pre>';
var_dump($m);
echo '</pre>';
?>
the result came out:
array(1) {
[0]=>
array(3) {
[0]=> string(1) "a"
[1]=> string(0) ""
[2]=> string(0) ""
}
}
As I understand it, it first matched a, then matched nothing at b, so the result should be
array(1) {
[0]=>
array(2) {
[0]=> string(1) "a"
[1]=> string(0) ""
}
}
So what is the third match?
A regex demo tool shows that the first match is a, while the second and third matches are zero-width matches: one between a and b, and one between b and the end of the string.
Keep in mind that the behavior of preg_match_all is to repeatedly take the pattern a* and try to apply it sequentially to the entire input string.
I suspect that what you really want to use here is a+. If you examine this second demo, you will see that with a+ we only get a single match, for the single a letter in ab. So, I vote for using a+ here to resolve your problem.
Your regular expression '/a*/' matches zero or more consecutive a characters, so it can also match the empty string.
For example, if you apply '/a*/' to an empty string, it returns one match, because * means "zero or more".
preg_match_all keeps searching until it has processed the entire string: once a match is found, it continues from the end of that match and tries to apply the pattern again.
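One way to see where each of the three matches comes from is to ask preg_match_all for byte offsets. The sketch below reuses the pattern and input from the question and only adds the PREG_OFFSET_CAPTURE flag.

```php
<?php
$x = 'ab';

// PREG_OFFSET_CAPTURE returns each match as [text, byte offset].
preg_match_all('/a*/', $x, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$text, $offset]) {
    printf("offset %d: \"%s\"\n", $offset, $text);
}
// offset 0: "a"
// offset 1: ""
// offset 2: ""
```

The empty match at offset 1 sits just before b, and the one at offset 2 sits at the end of the string.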

Php preg_match issue not working

I am trying to find a php preg_match that can match:
"2-20 to 2-25"
from this text:
user levels 2-20 to 2-25 not ready
I tried
preg_match("/([0-9]+) to ([0-9]+)/", $vars[1] , $matchesto);
but the result is:
"20 to 2"
Any help appreciated.
Your pattern is almost correct; just include the dashes and adjust the capture group:
([-0-9]+ to [-0-9]+)
Example:
https://regex101.com/r/eD6lQ2/1
That's because [0-9]+ matches one or more digits but won't match a hyphen (-).
Try this:
$pattern = '~([0-9]+-[0-9]+) to ([0-9]+-[0-9]+)~';
preg_match($pattern, $vars[1] , $matchesto);
You can use "\d" to match the digits:
<?php
$str = 'user levels 2-20 to 2-25 not ready';
$matches = array();
preg_match('/(\d+-\d+) to (\d+-\d+)/', $str, $matches);
var_dump($matches);
Output:
array(3) {
[0]=>
string(12) "2-20 to 2-25"
[1]=>
string(4) "2-20"
[2]=>
string(4) "2-25"
}
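For comparison, here is a short sketch that runs the pattern from the question next to a hyphen-aware one on the same input. The variable names $before and $after are only illustrative.

```php
<?php
$str = 'user levels 2-20 to 2-25 not ready';

// Original attempt: [0-9]+ cannot cross the hyphen, so only "20 to 2" matches.
preg_match('/([0-9]+) to ([0-9]+)/', $str, $before);

// Allowing "digits-digits" on both sides captures each level range in full.
preg_match('/(\d+-\d+) to (\d+-\d+)/', $str, $after);

echo $before[0], "\n"; // 20 to 2
echo $after[0], "\n";  // 2-20 to 2-25
```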

PHP and RegEx: how to split a string including comma,space,colon to some substring

I'm trying to split a string that can be comma, space, or semi-colon delimited. It could also contain one or more spaces after each delimiter. For example:
chr1:22222-333333 or
chr1 22222 333333 or
chr1 22222 333333 or
chr1:22,222-33,333
Any one of these should produce an array with three values, ["chr1","22222","33333"]. I have tried several methods, but none handles every case, especially the fourth.
Thank you very much for your help.
$yourString = "chr1:22222-33333"; // for instance
$output = preg_split("/:| |;/", $yourString);
This acts as an equivalent of explode() for when you want multiple delimiters. Note that the pattern as written does not split on "-", so the sample string above yields "chr1" and "22222-33333"; add - to the alternation if the hyphen should be a delimiter too.
Explanation of the characters in the preg_split statement:
/ delimits the regular expression, marking where the pattern starts and ends
| acts as an OR operator: this OR this OR that
So in the end, /:| |;/ means: split on anything that is ":" or " " or ";"
If you want to practice or simply better understand the principles of RegEx, you can have a look at this nice collection of RegEx tutorials
You can use str_replace with explode:
$str = array('chr1:22222-333333', 'chr1 22222 333333', 'chr1 22222 333333', 'chr1:22,222-33,333');
foreach($str as $val){
var_dump(explode(" ", str_replace(array(',',':','-'), array('',' ', ' '), $val)));
}
This removes every comma, replaces : and - with a space, and then explodes the result using a space as the delimiter.
Output:
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(5) "33333"
}
If you value conciseness and want to keep things neat, preg_split is the best way to go, in my opinion.
In the following examples, I assume you want your input separated by commas, spaces or colons:
$splitted = preg_split("/[,: ]/", $string);
If you want to treat tabs as whitespace too, you can replace the single space character with \s, which will match tabs as well:
$splitted = preg_split("/[,:\s]/", $string);
Note: \s will match newlines too, in case your input may eventually be a multiline string.
Yet, if you don't trust your input (you don't, right?) and think that consecutive spaces and/or tabs should be collapsed and treated as a single delimiter, you can add a + quantifier:
$splitted = preg_split("/[,:\s]+/", $string);
The forms above split on commas, colons, and whitespace only; a hyphen, as in chr1:22222-333333, would need to be added to the character class. If you want to experiment with these, an online regex tester is a nice place to do so.
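For the fourth case in the question, where the commas look like thousands separators rather than delimiters, one possible approach is to strip the commas first and then split. This is a sketch under that assumption; the function name splitRegion is hypothetical.

```php
<?php
// Hypothetical helper: strips thousands separators, then splits on
// colons, hyphens, and any run of whitespace.
function splitRegion(string $input): array
{
    $withoutCommas = str_replace(',', '', $input);

    // PREG_SPLIT_NO_EMPTY guards against empty pieces from stray delimiters.
    return preg_split('/[:\s-]+/', trim($withoutCommas), -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitRegion('chr1:22,222-33,333'));   // chr1, 22222, 33333
print_r(splitRegion('chr1  22222   333333')); // chr1, 22222, 333333
```

If commas should act as delimiters instead, the str_replace step would have to go and the comma would belong in the character class.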

PHP Split String after specific occurances

I have the following string that I'm trying to split into different variables based on specific occurrences:
Brodel8DARK HORSE COMICS
I'd like my end result to be
$user = Brodel
$index = 8
$publisher = DARK HORSE COMICS
I've tried playing around with some regular expressions, but I'm a novice.
These conditions will always be true:
The user name will change (different number of characters, etc.)
The index will always be an integer but can grow to 3+ digits
The Publisher will always be in all caps
Thanks for any help
As long as the publisher doesn't start with a number, then this regex should work
/^([A-Za-z]+)(\d+)([A-Z\s]+)$/
It's one or more letters, followed by one or more digits, and finally one or more capital letters or whitespace characters.
<?php
$string = 'Brodel8DARK HORSE COMICS';
if(preg_match('/^([A-Za-z]+)(\d+)([A-Z\s]+)$/', $string, $matches) === 1){
var_dump($matches);
}
This outputs:
array(4) {
[0]=>
string(24) "Brodel8DARK HORSE COMICS"
[1]=>
string(6) "Brodel"
[2]=>
string(1) "8"
[3]=>
string(17) "DARK HORSE COMICS"
}
try this:
<?php
$string = 'Brodel8DARK HORSE COMICS';
preg_match("/^([^\d]+)(\d+)([A-Z\s]+)$/", $string, $match);
//print_r($match);
echo $publisher = $match[3];//DARK HORSE COMICS
?>
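To end up with the three variables named in the question, named capture groups are one option. The sketch below assumes the user name contains only letters; the group names user, index, and publisher are chosen here to mirror the question.

```php
<?php
$string = 'Brodel8DARK HORSE COMICS';

// Named groups make each part available under a readable key.
$pattern = '/^(?<user>[A-Za-z]+)(?<index>\d+)(?<publisher>[A-Z\s]+)$/';

if (preg_match($pattern, $string, $m) === 1) {
    $user = $m['user'];
    $index = (int) $m['index'];   // cast so the index is a real integer
    $publisher = $m['publisher'];

    echo "$user | $index | $publisher\n"; // Brodel | 8 | DARK HORSE COMICS
}
```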
