Convert string to array at different character occurrence - php

Suppose I have the string 'aaaabbbaaaaaabbbb'. I want to convert it to an array so that I get the following result:
$array = [
'aaaa',
'bbb',
'aaaaaa',
'bbbb'
];
How to go about this in PHP?

Regex: (.)\1{1,}
(.): match and capture a single character.
\1: backreference to the character captured by the first group.
\1{1,}: the same character repeated one or more times (so every match is at least two characters long).
<?php
ini_set("display_errors", 1);
$string="aaaabbbaaaaaabbbb";
preg_match_all('/(.)\1{1,}/', $string,$matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)
[1] => Array
(
[0] => a
[1] => b
[2] => a
[3] => b
)
)
Or:
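<?php
$string = "aaaabbbaaaaaabbbb";
$array = str_split($string);      // one element per character
$start = 0;
$end = strlen($string);
$indexValue = $array[0];          // character of the run currently being built
$result = array();                // characters of the current run
$resultantArray = array();        // finished runs
while ($start != $end)
{
    if ($indexValue === $array[$start])
    {
        // same character: extend the current run
        $result[] = $array[$start];
    }
    else
    {
        // new character: store the finished run and start a new one
        $resultantArray[] = implode("", $result);
        $result = array();
        $result[] = $indexValue = $array[$start];
    }
    $start++;
}
// store the last run
$resultantArray[] = implode("", $result);
print_r($resultantArray);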
Output:
Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)

I have written a one-liner using only preg_split() that generates the expected result with no wasted memory (no array bloat):
Code:
$string = 'aaaabbbaaaaaabbbb';
var_export(preg_split('/(.)\1*\K/', $string, 0, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => 'aaaa',
1 => 'bbb',
2 => 'aaaaaa',
3 => 'bbbb',
)
Pattern:
(.) #match any single character
\1* #match the same character zero or more times
\K #reset the start of the reported match, so everything matched so far is left out of it
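The real magic happens with \K: because everything matched before it is dropped from the reported match, preg_split() sees an empty delimiter right after each run, so no characters are lost in the split.
The 0 passed as the limit parameter of preg_split() means "no limit". Unlimited splitting is the default behavior (the default value is -1), but the argument must be supplied so that the next argument is used as the flags.
The final parameter, PREG_SPLIT_NO_EMPTY, removes empty pieces from the result (here, the empty string left after the last run).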
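To check that lone characters are handled too (the case discussed below), here is the same one-liner wrapped in a small function. This is only a sketch; splitRuns is a hypothetical name used for illustration:

```php
<?php
// splitRuns is a hypothetical helper wrapping the preg_split() one-liner above.
function splitRuns(string $s): array
{
    // (.)\1*  a character followed by any repeats of itself
    // \K      the reported match becomes empty, so the split point falls
    //         right after each run and no characters are removed
    return preg_split('/(.)\1*\K/', $s, 0, PREG_SPLIT_NO_EMPTY);
}

var_export(splitRuns('abbbaaaaaabbbb'));
```

A run of length one ('a' at the start) is kept as its own element because \1* also accepts zero repeats.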
Sahil's preg_match_all() method, preg_match_all('/(.)\1{1,}/', $string, $matches);, is a good attempt, but it is not perfect for two reasons:
The first issue is that preg_match_all() fills $matches with two subarrays (the full matches and the captured characters), which is double the necessary result.
The second issue is revealed when $string = "abbbaaaaaabbbb";: because \1{1,} requires at least one repeat, his method ignores the leading lone character. Here is its output:
Array (
[0] => Array
(
[0] => bbb
[1] => aaaaaa
[2] => bbbb
)
[1] => Array
(
[0] => b
[1] => a
[2] => b
)
)
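For comparison, a minimal sketch (not Sahil's code) that keeps the preg_match_all() approach but uses \1* instead of \1{1,}, so a run may be a single character, and returns only the full matches from $matches[0]. runsViaMatchAll is a hypothetical name:

```php
<?php
// runsViaMatchAll is a hypothetical helper for this comparison.
function runsViaMatchAll(string $s): array
{
    // \1* allows zero repeats, so a lone character still forms a match
    preg_match_all('/(.)\1*/', $s, $matches);
    // $matches[0] holds the full runs; $matches[1] only the captured characters
    return $matches[0];
}

var_export(runsViaMatchAll('abbbaaaaaabbbb'));
```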
Sahil's second attempt produces the correct output, but requires much more code. A more concise non-regex solution could look like this:
$string = 'aaaabbbaaaaaabbbb';
$array = str_split($string);
$last = "";
$result = [];
foreach ($array as $v) {
    // extend the current run while the same character repeats
    if ($last === "" || $last[0] === $v) {
        $last .= $v;
    } else {
        $result[] = $last;
        $last = $v;
    }
}
$result[] = $last;
var_export($result);

Related

PHP regex - count number of exclamation marks before and after a word

I need help to refine a regex, in PHP, intended to count the number of exclamation marks that appear before and after a word. Words, in this situation, can include any character except a space (even exclamation marks), as follows (I am showing the expected "before, after" counts):
!!!!Hi!! => 4, 2
!!!!Hi => 4, 0
!Hi!!! => 1, 3
!easdf.kjaf!! => 1, 2
!hjdfa!sdfk!jaf!! => 1, 2
!,!!!!!fdgsdfg!!sdgj => 1, 0
!!!,!ksfgfdg!jkft!!! => 3, 3
How can I write the regex so that, for the "before" count, it stops looking for consecutive exclamation marks as soon as a non-exclamation character is reached, and for the "after" count, it starts counting only when nothing but exclamation marks remains?
The tricky part is when punctuation characters appear within the word. These should be ignored; they are considered part of the word.
Here is where I am at:
preg_match_all('/(!*)\b(\S+)\b(!*)/', $w, $m);
$w is the word (as shown above) and $m is the match array.
As an example, "!!Hi!" would result in $m equal to
Array
(
[0] => Array
(
[0] => !!Hi!
)
[1] => Array
(
[0] => !!
)
[2] => Array
(
[0] => Hi
)
[3] => Array
(
[0] => !
)
)
That is correct and what I am looking for. However, things get thrown off when a punctuation character starts or ends the word: the word boundary \b does not treat that character as part of the word (as the word is defined in this exercise). Here is an example of a failure to parse the word "!!!!!!!!xd.sfgdx!!!,!!":
Array
(
[0] => Array
(
[0] => !!!!!!!!xd.sfgdx!!!
)
[1] => Array
(
[0] => !!!!!!!!
)
[2] => Array
(
[0] => xd.sfgdx
)
[3] => Array
(
[0] => !!!
)
)
Help, please.
You just need anchors (^ for the beginning and $ for the end) and basically anything in the middle. With anchors, a ! in the middle won't match unless it is at one of the ends. A first attempt might be:
/^(!*).*(!*)$/
The problem with the "anything in the middle" part (.*) is that it is greedy and takes precedence over the final group (!*): it would match everything up to the end, leaving the group empty. This is simple to fix: just make the middle lazy (un-greedy):
/^(!*).*?(!*)$/
Now it matches as many ! as possible at the beginning, then consumes the middle one character at a time until the rest of the pattern, (!*)$, can match the trailing ! marks.
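A minimal sketch applying that lazy pattern with preg_match() and turning the two captured groups into counts with strlen(). countBangs is a hypothetical helper name, and the sketch assumes each word is on a single line (. does not match a newline):

```php
<?php
// countBangs is a hypothetical helper: returns [before, after] counts of "!".
function countBangs(string $word): array
{
    // group 1: leading "!" (greedy); middle: lazy; group 2: trailing "!"
    preg_match('/^(!*).*?(!*)$/', $word, $m);
    return [strlen($m[1]), strlen($m[2])];
}

foreach (['!!!!Hi!!', '!,!!!!!fdgsdfg!!sdgj', '!!!,!ksfgfdg!jkft!!!'] as $w) {
    [$before, $after] = countBangs($w);
    echo "$w => $before, $after\n";
}
```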
Here is a quick non-regex solution, just because:
$test = ['!!!!Hi!!',
'!!!!Hi',
'!Hi!!!',
'!easdf.kjaf!!',
'!hjdfa!sdfk!jaf!!',
'!,!!!!!fdgsdfg!!sdgj',
'!!!,!ksfgfdg!jkft!!!'];
foreach($test as $str) {
$count = $rcount = 0;
for ($i = 0; $i < strlen($str); $i++) {
if ($str[$i] == '!') {
$count += 1;
continue;
}
break;
}
for ($i = strlen($str) - 1; $i >= $count; $i--) { // stop at the leading run so an all-"!" word is not counted twice
if ($str[$i] == '!') {
$rcount += 1;
continue;
}
break;
}
echo $str . ': ' . $count . ', ' . $rcount . '<br />';
}
Output:
!!!!Hi!!: 4, 2
!!!!Hi: 4, 0
!Hi!!!: 1, 3
!easdf.kjaf!!: 1, 2
!hjdfa!sdfk!jaf!!: 1, 2
!,!!!!!fdgsdfg!!sdgj: 1, 0
!!!,!ksfgfdg!jkft!!!: 3, 3
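Use this regexp (note that it requires at least two non-! characters in the word, so a one-character word such as "!a!" does not match):
preg_match_all('/^(!*)[^!].*[^!](!*)/', $w, $m);
For your examples, the outputs are: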
Array
(
[0] => Array
(
[0] => !!!!Hi!!
)
[1] => Array
(
[0] => !!!!
)
[2] => Array
(
[0] => !!
)
)
Array
(
[0] => Array
(
[0] => !!!,!ksfgfdg!jkft!!,!
)
[1] => Array
(
[0] => !!!
)
[2] => Array
(
[0] => !
)
)

Given a string, create a multidimensional array with named keys using a regex

I'm using PHP. Given, for example, the following string:
$str = "a2c4-8|a6c2,c3-5,c6[2],c8[4]-10,c14-21[5]|a30"
Exploding it by | gives these strings:
a2c4-8
a6c2,c3-5,c6[2],c8[4]-10,c14-21[5]
a30
Now I would like to separate the digits that follow the a from all the other characters, remove the letters a and c (keep dashes, commas and square brackets) and place the results in a multidimensional array as follows:
Array
(
[0] => Array
(
[a] => 2
[c] => 4-8
)
[1] => Array
(
[a] => 6
[c] => 2,3-5,6[2],8[4]-10,14-21[5]
)
[2] => Array
(
[a] => 30
[c] =>
)
)
a is always followed by digits, and after these digits there may or may not be a c followed by other comma-separated strings.
Notice that in the resulting array the letters a and c have been removed. All other characters have been kept. I tried to modify this answer by Casimir et Hippolyte but without success.
A bonus would be to avoid adding empty keys to the resulting array (like the last [c] above).
Consider the following solution using the preg_match_all() function with named subpatterns ((?P<a>...)) and the PREG_SET_ORDER flag, plus the array_map(), array_filter(), array_column() (available since PHP 5.5) and trim() functions:
$str = "a2c4-8|a6c2,c3-5,c6[2],c8[4]-10,c14-21[5]|a30";
$parts = explode("|", $str);
$result = array_map(function ($v) {
preg_match_all("/(?P<a>a\d+)?(?P<c>c[0-9-\[\]]+)?/", $v, $matches, PREG_SET_ORDER);
$arr = [];
$a_numbers = array_filter(array_column($matches, "a"));
$c_numbers = array_filter(array_column($matches, "c"));
if (!empty($a_numbers)) {
$arr['a'] = array_map(function($v){ return trim($v, 'a'); }, $a_numbers)[0];
}
if (!empty($c_numbers)) {
$arr['c'] = implode(",", array_map(function($v){ return trim($v, 'c'); }, $c_numbers));
}
return $arr;
}, $parts);
print_r($result);
The output:
Array
(
[0] => Array
(
[a] => 2
[c] => 4-8
)
[1] => Array
(
[a] => 6
[c] => 2,3-5,6[2],8[4]-10,14-21[5]
)
[2] => Array
(
[a] => 30
)
)
P.S. Empty array keys are also omitted.
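If every |-separated part always has the shape described in the question (a, digits, then optionally c and the rest), a shorter sketch is possible with one preg_match() per part and str_replace() to drop the remaining c letters. This is an alternative to the answer above, not its method, and parseParts is a hypothetical name:

```php
<?php
// parseParts is a hypothetical helper. Assumes each part is "a<digits>"
// optionally followed by "c<...>", as described in the question.
function parseParts(string $str): array
{
    $result = [];
    foreach (explode('|', $str) as $part) {
        // a = digits after the leading "a"; c = everything after the first "c"
        if (!preg_match('/^a(?P<a>\d+)(?:c(?P<c>.*))?$/', $part, $m)) {
            continue; // skip parts that do not fit the assumed format
        }
        $row = ['a' => $m['a']];
        if (isset($m['c']) && $m['c'] !== '') {
            // remove the other "c" letters; digits, dashes, commas and brackets stay
            $row['c'] = str_replace('c', '', $m['c']);
        }
        $result[] = $row; // no 'c' key at all when there is no c part
    }
    return $result;
}

print_r(parseParts("a2c4-8|a6c2,c3-5,c6[2],c8[4]-10,c14-21[5]|a30"));
```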

Regexp PHP Split number and string

I would like to split a string that contains some numbers and letters, like these:
ABCd Abhe123
123ABCd Abhe
ABCd Abhe 123
123 ABCd Abhe
I tried this:
<?php preg_split('#(?<=\d)(?=[a-z])#i', "ABCd Abhe 123"); ?>
But it doesn't work: I get only one cell in the array, containing "ABCd Abhe 123".
For example, I would like the numbers in cell 0 and the string in cell 1:
[0] => "123",
[1] => "ABCd Abhe"
Thank you for your help! ;)
Use preg_match_all instead
preg_match_all("/(\d+)*\s?([A-Za-z]+)*/", "ABCd Abhe 123", $match);
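After the call, $match holds one subarray per group, with one entry per match:
$match[0][$i] contains the matched segment
$match[1][$i] contains the numbers (empty string if none)
$match[2][$i] contains the letters (empty string if none)
Then collect them into separate arrays:
$numbers = [];
$letters = [];
for ($i = 0; $i < count($match[0]); $i++)
{
    if ($match[1][$i] != "")
        $numbers[] = $match[1][$i];
    if ($match[2][$i] != "")
        $letters[] = $match[2][$i];
}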
EDIT1
I've updated the regex. It now looks for either numbers or letters, with or without a whitespace.
EDIT2
The regex is correct, but the array handling wasn't. After preg_match_all(), $match is an array containing arrays, like:
Array
(
[0] => Array
(
[0] => Abc
[1] => aaa
[2] => 25
)
[1] => Array
(
[0] =>
[1] =>
[2] => 25
)
[2] => Array
(
[0] => Abc
[1] => aaa
[2] =>
)
)
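Maybe something like this?
$input = "ABCd Abhe 123";
$numbers = preg_replace('/[^\d]/', '', $input);     // keep only the digits
$letters = trim(preg_replace('/\d/', '', $input));  // drop the digits and any leftover edge spaces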

Split a single string into an array using specific Regex rules

I'm processing a single string which contains many pairs of data. Each pair is separated by a ; sign. Each pair contains a number and a string, separated by an = sign.
I thought it would be easy to process, but I've found that the string half of the pair can contain the = and ; signs, making simple splitting unreliable.
Here is an example of a problematic string:
123=one; two;45=three=four;6=five;
For this to be processed correctly I need to split it up into an array that looks like this:
'123', 'one; two'
'45', 'three=four'
'6', 'five'
I'm at a bit of dead end so any help is appreciated.
UPDATE:
Thanks to everyone for the help, this is where I am so far:
$input = '123=east; 456=west';
// split matches into array
preg_match_all('~(\d+)=(.*?);(?=\s*(?:\d|$))~', $input, $matches);
$newArray = array();
// extract the relevant data
for ($i = 0; $i < count($matches[2]); $i++) {
$type = $matches[2][$i];
$price = $matches[1][$i];
// add each key-value pair to the new array
$newArray[$i] = array(
'type' => "$type",
'price' => "$price"
);
}
Which outputs
Array
(
[0] => Array
(
[type] => east
[price] => 123
)
)
The second item is missing because it doesn't have a semicolon at the end; I'm not sure how to fix that.
I've now realised that the numeric part of the pair sometimes contains a decimal point, and that the last pair does not have a semicolon after it. Any hints would be appreciated, as I'm not having much luck.
Here is the updated string taking into account the things I missed in my initial question (sorry):
12.30=one; two;45=three=four;600.00=five
You need a look-ahead assertion for this; the look-ahead matches if a ; is followed by a digit or the end of your string:
$s = '12.30=one; two;45=three=four;600.00=five';
preg_match_all('/(\d+(?:\.\d+)?)=(.+?)(?=;\d|$)/', $s, $matches);
print_r(array_combine($matches[1], $matches[2]));
Output:
Array
(
[12.30] => one; two
[45] => three=four
[600.00] => five
)
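The question's update also wanted each pair stored under 'type' and 'price' keys, and its first sample ends with a ;. A sketch adapting the pattern above, assuming whitespace may follow a separating ; (as in '123=east; 456=west') and that one trailing ; may end the string; parsePairs is a hypothetical name:

```php
<?php
// parsePairs is a hypothetical helper for "number=text" pairs separated by ";".
function parsePairs(string $s): array
{
    // group 1: integer or decimal number
    // group 2: shortest text before "; <number>" or before an optional final ";"
    preg_match_all('/(\d+(?:\.\d+)?)=(.+?)(?=;\s*\d|;?$)/', $s, $matches, PREG_SET_ORDER);
    $pairs = [];
    foreach ($matches as $m) {
        $pairs[] = ['type' => $m[2], 'price' => $m[1]];
    }
    return $pairs;
}

print_r(parsePairs('12.30=one; two;45=three=four;600.00=five'));
```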
I think this is the regex you want:
\s*(\d+)\s*=(.*?);(?=\s*(?:\d|$))
The trick is to consider only the semicolon that's followed by a digit as the end of a match. That's what the lookahead at the end is for.
You can see a detailed visualization on www.debuggex.com.
You can use the following preg_match_all() code to capture that:
$str = '123=one; two;45=three=four;6=five;';
if (preg_match_all('~(\d+)=(.+?);(?=\d|$)~', $str, $arr))
print_r($arr);
Live Demo: http://ideone.com/MG3BaO
$str = '123=one; two;45=three=four;6=five;';
preg_match_all('/(\d+)=([a-zA-Z ;=]+)/', $str, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
o/p:
Array
(
[0] => Array
(
[0] => 123=one; two;
[1] => 45=three=four;
[2] => 6=five;
)
[1] => Array
(
[0] => 123
[1] => 45
[2] => 6
)
[2] => Array
(
[0] => one; two;
[1] => three=four;
[2] => five;
)
)
Then you can combine them:
echo '<pre>';
print_r(array_combine($matches[1],$matches[2]));
echo '</pre>';
o/p:
Array
(
[123] => one; two;
[45] => three=four;
[6] => five;
)
Try this; the code is written in C#, but you can translate it into PHP:
string[] res = Regex.Split("123=one; two;45=three=four;6=five;", @";(?=\d)");
--SJ

Regex for splitting on all unescaped semi-colons

I'm using PHP's preg_split to split up a string based on semi-colons, but I need it to split only on non-escaped semi-colons.
<?php
$str = "abc;def\\;abc;def";
$arr = preg_split("/;/", $str);
print_r($arr);
?>
Produces:
Array
(
[0] => abc
[1] => def\
[2] => abc
[3] => def
)
When I want it to produce:
Array
(
[0] => abc
[1] => def\;abc
[2] => def
)
I've tried "/(^\\)?;/" or "/[^\\]?;/" but they both produce errors. Any ideas?
This works.
<?php
$str = "abc;def\;abc;def";
$arr = preg_split('/(?<!\\\);/', $str);
print_r($arr);
?>
It outputs:
Array
(
[0] => abc
[1] => def\;abc
[2] => def
)
You need to make use of a negative lookbehind (read about lookarounds). Think of it as "match every ';' unless it is preceded by a '\'".
I am not really proficient with PHP regexes, but try this one:
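/(?<!\\);/
As a PHP single-quoted string this is written '/(?<!\\\\);/', since the backslash must be escaped once for PHP and once for the regex.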
Since Bart asks: Of course you can also use regex to split on unescaped ; and take escaped escape characters into account. It just gets a bit messy:
<?php
$str = "abc;def\;abc\\\\;def";
preg_match_all('/((?:[^\\\\;]|\\\.)*)(?:;|$)/', $str, $arr);
print_r($arr);
?>
Array
(
[0] => Array
(
[0] => abc;
[1] => def\;abc\\;
[2] => def
)
[1] => Array
(
[0] => abc
[1] => def\;abc\\
[2] => def
)
)
What this does is take a regular expression for “(any character except \ and ;) or (\ followed by any character)”, allow any number of those, and require them to be followed by a ; or the end of the string.
Note that in PHP (PCRE), $ matches at the end of the string or just before a final newline; use the D modifier (or \z) if a trailing newline should not count, or the m modifier if $ should match at the end of every line.
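The same idea can also be sketched with preg_split(): split on a ; that is preceded by an even number of backslashes (zero, two, ...), using \K so the backslashes stay in the piece. This assumes, as in the answer above, that a backslash escapes the next character; splitUnescaped is a hypothetical name:

```php
<?php
// splitUnescaped is a hypothetical helper: split on ";" unless it is escaped.
// Regex after PHP string unescaping: (?<!\\)(?:\\\\)*\K;
//   (?<!\\)    the backslash run must not be preceded by another backslash
//   (?:\\\\)*  an even number of backslashes (each pair is an escaped backslash)
//   \K         drop those backslashes from the match, so only ";" is removed
function splitUnescaped(string $s): array
{
    return preg_split('/(?<!\\\\)(?:\\\\\\\\)*\K;/', $s);
}

print_r(splitUnescaped('abc;def\;abc\\\\;def'));
```

Unlike the preg_match_all() version, this returns a flat array with no extra subarray for the delimiters.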
