I have an array of string values which sometimes form repeating value patterns ('a', 'b', 'c', 'd')
$array = array(
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd',
);
I would like to find duplicate patterns based on the array order and group them by that same order (to maintain it).
$patterns = array(
array('number' => 2, 'values' => array('a', 'b', 'c', 'd')),
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))
);
Notice that [a,b], [b,c], & [c,d] are not patterns by themselves because they are inside the larger [a,b,c,d] pattern and the last [c,d] set only appears once so it's not a pattern either - just the individual values 'c' and 'd'
Another example:
$array = array(
'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
//[.......] [.] [[......] [......]] [.]
);
which produces
$patterns = array(
array('number' => 2, 'values' => array('x')),
array('number' => 1, 'values' => array('y')),
array('number' => 2, 'values' => array('x', 'b')),
array('number' => 1, 'values' => array('a'))
);
How can I do this?
Character arrays are just strings. Regex is the king of string pattern matching. Add recursion and the solution is pretty elegant, even with the conversion back and forth from character arrays:
function findPattern($str){
$results = array();
if(is_array($str)){
$str = implode($str);
}
if(strlen($str) == 0){ //reached the end
return $results;
}
if(preg_match_all('/^(.+)\1+(.*?)$/',$str,$matches)){ //pattern found
$results[] = array('number' => (strlen($str) - strlen($matches[2][0])) / strlen($matches[1][0]), 'values' => str_split($matches[1][0]));
return array_merge($results,findPattern($matches[2][0]));
}
//no pattern found
$results[] = array('number' => 1, 'values' => array(substr($str, 0, 1)));
return array_merge($results,findPattern(substr($str, 1)));
}
You can test here : https://eval.in/507818 and https://eval.in/507815
If c and d can be grouped, this is my code:
<?php
$array = array(
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd',
);
$res = array();
foreach ($array AS $value) {
if (!isset($res[$value])) {
$res[$value] = 0;
}
$res[$value]++;
}
foreach ($res AS $key => $value) {
$fArray[$value][] = $key;
for ($i = $value - 1; $i > 0; $i--) {
$fArray[$i][] = $key;
}
}
$res = array();
foreach($fArray AS $key => $value) {
if (!isset($res[serialize($value)])) {
$res[serialize($value)] = 0;
}
$res[serialize($value)]++;
}
$fArray = array();
foreach($res AS $key => $value) {
$fArray[] = array('number' => $value, 'values' => unserialize($key));
}
echo '<pre>';
var_dump($fArray);
echo '</pre>';
Final result is:
array (size=2)
0 =>
array (size=2)
'number' => int 2
'values' =>
array (size=4)
0 => string 'a' (length=1)
1 => string 'b' (length=1)
2 => string 'c' (length=1)
3 => string 'd' (length=1)
1 =>
array (size=2)
'number' => int 1
'values' =>
array (size=2)
0 => string 'c' (length=1)
1 => string 'd' (length=1)
The following code will return the expected result, finding the longest portions with repeated values:
function pepito($array) {
$sz=count($array);
$patterns=Array();
for ($pos=0;$pos<$sz;$pos+=$len) {
$nb=1;
for ($len=floor($sz/2);$len>0;$len--) {
while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
$pos+=$len;
$nb++;
}
if ($nb>1) break;
}
if (!$len) $len=1;
$patterns[]=Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len));
}
return $patterns;
}
This will match with your examples:
{['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']}, ['c', 'd']
or {['x'], ['x']}, ['y'], {['x', 'b'], ['x', 'b']}, ['a']
The difficult part is more about examples like:
{['one', 'one', 'two'], ['one', 'one', 'two']}
Or the most difficult choice to make:
one, two, one, two, one, two, one, two
Because we can group this in both the form:
[one, two], [one, two], [one, two], [one, two]
[one, two, one, two], [one, two, one, two]
where there is no "evident" choice. My above algorithm will always consider the longest match, as this is the most easy implementation to consider any combination.
EDIT: You should also consider cases where the longest match is after a shorter one:
Example:
'one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three', 'four'
If you start from left to right, you might want to group as :
{['one', 'two'], ['one', 'two'],} 'three', 'four', 'one', 'two', 'three', 'four'
when you could group like:
'one', 'two', {['one', 'two', 'three', 'four'], ['one', 'two', 'three', 'four']}
This situation has to be resolved with recursive calls to get the better solution but this will result in longer execution time:
function pepito($array) {
if (($sz=count($array))<1) return Array();
$pos=0;
$nb=1;
for ($len=floor($sz/2);$len>0;$len--) {
while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
$pos+=$len;
$nb++;
}
if ($nb>1) break;
}
if (!$len) $len=1;
$rec1=pepito(array_slice($array, $pos+$len));
$rec2=pepito(array_slice($array, 1));
if (count($rec1)<count($rec2)+1) {
return array_merge(Array(Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len))), $rec1);
}
return array_merge(Array(Array('number'=>1, 'values'=>array_slice($array, 0, 1))), $rec2);
}
Definitions:
Pattern base: The sequence of elements that repeat within a pattern. (ie. For [a,b,a,b,c], [a,b] is the pattern base and [a,b,a,b] is the pattern.
We want to start searching for the longest pattern base, followed by the next longest, and so forth. It's important to understand that if we find a pattern, we don't need to check within it for the start of another pattern with a base of the same length.
Here's the proof.
Assume that A is a pattern base, and that we've encountered the pattern AA. Assume that B is a another pattern base, of the same length, that forms a pattern starting within A. Let Y be the overlapping elements. If A=XY, then AA=XYXY. Since B is the same length, it must be the case that B=YX because in order to complete B we must use the remaining elements in A. Moreover, since B forms a pattern we must have BB, which is YXYX. Since A starts before B, we have XYXYX=AAX=XBB. If B repeats again, we would have XBBB=XYXYXYX=AAAX. Therefore, B cannot repeat an additional time without A repeating an additional time. Thus, we do not need to check for longer patterns within those generated by A.
The longest pattern possible consists of half the elements in the whole list because the simplest pattern can occur exactly twice. Thus, we can start checking for patterns of half of the length and work our way down to patterns of size 2.
Assuming that we search the array from left to right, if a pattern is found, we only need to search on either side of it for additional patterns. To the left, there are no patterns with bases of the same length, or they would have been discovered beforehand. Thus, we search the left side for patterns using the next smallest base size. The elements to the right of the pattern haven't been searched so we continue searching for patterns using a base of the same size.
The function to do this is as follows:
function get_patterns($arr, $len = null) {
// The smallest pattern base length for which a pattern can be found
$minlen = 2;
// No pattern base length was specified
if ($len === null) {
// Use the longest pattern base length possible
$maxlen = floor(count($arr) / 2);
return get_patterns($arr, $maxlen);
// Base length is too small to find any patterns
} else if ($len < $minlen) {
// Compile elements into lists consisting of one element
$results = array();
$num = 1;
$elem = $arr[0];
for ($i=1; $i < count($arr); $i++) {
if ($elem === $arr[$i]) {
$num++;
} else {
array_push($results, array(
'number' => $num,
'values' => array( $elem )
));
$num = 1;
$elem = $arr[$i];
}
}
array_push($results, array(
'number' => $num,
'values' => array( $elem )
));
return $results;
}
// Cycle through elements until there aren't enough elements to fit
// another repition.
for ($i=0; $i < count($arr) - $len * 2 + 1; $i++) {
// Number of times pattern base occurred
$num_read = 1; // One means there is no pattern yet
// Current pattern base we are attempting to match against
$base = array_slice($arr, $i, $len);
// Check for matches using segments of the same length for the elements
// following the current pattern base
for ($j = $i + $len; $j < count($arr) - $len + 1; $j += $len) {
// Elements being compared to pattern base
$potential_match = array_slice($arr, $j, $len);
// Match found
if (has_same_elements($base, $potential_match)) {
$num_read++;
// NO match found
} else {
// Do not check again using currently selected elements
break;
}
}
// Patterns were encountered
if ($num_read > 1) {
// The total number of elements that make up the pattern
$pattern_len = $num_read * $len;
// The elements before the pattern
$before = array_slice($arr, 0, $i);
// The elements after the pattern
$after = array_slice(
$arr, $i + $pattern_len, count($arr) - $pattern_len - $i
);
$results = array_merge(
// Patterns of a SMALLER length may exist beforehand
count($before) > 0 ? get_patterns($before, $len-1) : array(),
// Patterns that were found
array(
array(
'number' => $num_read,
'values' => $base
)
),
// Patterns of the SAME length may exist afterward
count($after) > 0 ? get_patterns($after, $len) : array()
);
return $results;
}
}
// No matches were encountered
// Search for SMALLER patterns
return get_patterns($arr, $len-1);
}
The function has_same_elements, which was used to check if arrays with primitive keys are identical, is as follows:
// Returns true if two arrays have the same elements.
//
// Precondition: Elements must be primitive data types (ie. int, string, etc)
function has_same_elements($a1, $a2) {
// There are a different number of elements
if (count($a1) != count($a2)) {
return false;
}
for ($i=0; $i < count($a1); $i++) {
if ($a1[$i] !== $a2[$i]) {
return false;
}
}
return true;
}
In order to speed up the code, there are a few things that you could do. Instead of slicing the array, you can supply the function with indexes to the start and end position that are to be examined, along with the array. Also, using strings may be slow so you can create an array that maps strings to numbers and vice versa. Then you can convert the array of strings into an array of numbers and use it instead. After you get the result, you can convert the arrays of numbers back into strings.
I tested the function using the following code:
$tests = array(
'a,b,c,d',
'a',
'a,a,a,a',
'a,a,a,a,a',
'a,a,a,a,a,a',
'b,a,a,a,a,c',
'b,b,a,a,a,a,c,c',
'b,b,a,a,d,a,a,c,c',
'a,b,c,d,a,b,c,d,c,d',
'x,x,y,x,b,x,b,a'
);
echo '<pre>';
foreach ($tests as $test) {
echo '<div>';
$arr = explode(',',$test);
echo "$test<br /><br />";
pretty_print(get_patterns($arr));
echo '</div><br />';
}
echo '</pre>';
The function that I used to print the output, pretty_print is as follows:
function pretty_print($results) {
foreach ($results as $result) {
$a = "array('" . implode("','", $result['values']) . "')";
echo "array('number' => ${result['number']}, 'values' => $a)<br />";
}
}
The output from the test code is as follows:
a,b,c,d
array('number' => 1, 'values' => array('a'))
array('number' => 1, 'values' => array('b'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))
a
array('number' => 1, 'values' => array('a'))
a,a,a,a
array('number' => 2, 'values' => array('a','a'))
a,a,a,a,a
array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('a'))
a,a,a,a,a,a
array('number' => 2, 'values' => array('a','a','a'))
b,a,a,a,a,c
array('number' => 1, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('c'))
b,b,a,a,a,a,c,c
array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 2, 'values' => array('c'))
b,b,a,a,d,a,a,c,c
array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a'))
array('number' => 1, 'values' => array('d'))
array('number' => 2, 'values' => array('a'))
array('number' => 2, 'values' => array('c'))
a,b,c,d,a,b,c,d,c,d
array('number' => 2, 'values' => array('a','b','c','d'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))
x,x,y,x,b,x,b,a
array('number' => 2, 'values' => array('x'))
array('number' => 1, 'values' => array('y'))
array('number' => 2, 'values' => array('x','b'))
array('number' => 1, 'values' => array('a'))
OK, here is my take, the code below splits the whole original array into longest adjacent non-overlapping chunks.
So in the situation like this
'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd'
[ ] [ ] [ ] [ ] <-- use 2 long groups
[ ] [ ] [ ] [ ] [ ] [ ] <-- and not 4 short
it will prefer 2 long groups to 4 shorter groups.
Update: also tested with examples from another answer, works for these cases too:
one, two, one, two, one, two, one, two
[one two one two], [one two one two]
'one' 'two' 'one' 'two' 'three' 'four' 'one' 'two' 'three' 'four'
['one'] ['two'] ['one' 'two' 'three' 'four'] ['one' 'two' 'three' 'four']
Here is the code and tests:
<?php
/*
* Splits an $array into chunks of $chunk_size.
* Returns number of repeats, start index and chunk which has
* max number of ajacent repeats.
*/
function getRepeatCount($array, $chunk_size) {
$parts = array_chunk($array, $chunk_size);
$maxRepeats = 1;
$maxIdx = 0;
$repeats = 1;
$len = count($parts);
for ($i = 0; $i < $len-1; $i++) {
if ($parts[$i] === $parts[$i+1]) {
$repeats += 1;
if ($repeats > $maxRepeats) {
$maxRepeats = $repeats;
$maxIdx = $i - ($repeats-2);
}
} else {
$repeats = 1;
}
}
return array($maxRepeats, $maxIdx*$chunk_size, $parts[$maxIdx]);
}
/*
* Finds longest pattern in the $array.
* Returns number of repeats, start index and pattern itself.
*/
function findLongestPattern($array) {
$len = count($array);
for ($window = floor($len/2); $window >= 1; $window--) {
$num_chunks = ceil($len/$window);
for ($i = 0; $i < $num_chunks; $i++) {
list($repeats, $idx, $pattern) = getRepeatCount(
array_slice($array, $i), $window
);
if ($repeats > 1) {
return array($repeats, $idx+$i, $pattern);
}
}
}
return array(1, 0, [$array[0]]);
}
/*
* Splits $array into longest adjacent non-overlapping parts.
*/
function splitToPatterns($array) {
if (count($array) < 1) {
return $array;
}
list($repeats, $start, $pattern) = findLongestPattern($array);
$end = $start + count($pattern) * $repeats;
return array_merge(
splitToPatterns(array_slice($array, 0, $start)),
array(
array('number'=>$repeats, 'values' => $pattern)
),
splitToPatterns(array_slice($array, $end))
);
}
Tests:
function isEquals($expected, $actual) {
$exp_str = json_encode($expected);
$act_str = json_encode($actual);
$equals = $exp_str === $act_str;
if (!$equals) {
echo 'Equals check failed'.PHP_EOL;
echo 'expected: '.$exp_str.PHP_EOL;
echo 'actual : '.$act_str.PHP_EOL;
}
return $equals;
}
assert(isEquals(
array(1, 0, ['a']), getRepeatCount(['a','b','c'], 1)
));
assert(isEquals(
array(1, 0, ['a']), getRepeatCount(['a','b','a','b','c'], 1)
));
assert(isEquals(
array(2, 0, ['a','b']), getRepeatCount(['a','b','a','b','c'], 2)
));
assert(isEquals(
array(1, 0, ['a','b','a']), getRepeatCount(['a','b','a','b','c'], 3)
));
assert(isEquals(
array(3, 0, ['a','b']), getRepeatCount(['a','b','a','b','a','b','a'], 2)
));
assert(isEquals(
array(2, 2, ['a','c']), getRepeatCount(['x','c','a','c','a','c'], 2)
));
assert(isEquals(
array(1, 0, ['x','c','a']), getRepeatCount(['x','c','a','c','a','c'], 3)
));
assert(isEquals(
array(2, 0, ['a','b','c','d']),
getRepeatCount(['a','b','c','d','a','b','c','d','c','d'],4)
));
assert(isEquals(
array(2, 2, ['a','c']), findLongestPattern(['x','c','a','c','a','c'])
));
assert(isEquals(
array(1, 0, ['a']), findLongestPattern(['a','b','c'])
));
assert(isEquals(
array(2, 2, ['c','a']),
findLongestPattern(['a','b','c','a','c','a'])
));
assert(isEquals(
array(2, 0, ['a','b','c','d']),
findLongestPattern(['a','b','c','d','a','b','c','d','c','d'])
));
// Find longest adjacent non-overlapping patterns
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>1, 'values'=>array('b')),
array('number'=>1, 'values'=>array('c')),
),
splitToPatterns(['a','b','c'])
));
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>1, 'values'=>array('b')),
array('number'=>2, 'values'=>array('c','a')),
),
splitToPatterns(['a','b','c','a','c','a'])
));
assert(isEquals(
array(
array('number'=>2, 'values'=>array('a','b','c','d')),
array('number'=>1, 'values'=>array('c')),
array('number'=>1, 'values'=>array('d')),
),
splitToPatterns(['a','b','c','d','a','b','c','d','c','d'])
));
/* 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd', */
/* [ ] [ ] [ ] [ ] */
/* NOT [ ] [ ] [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>2, 'values'=>array('a','b','a','b')),
array('number'=>1, 'values'=>array('c')),
array('number'=>1, 'values'=>array('d')),
),
splitToPatterns(['a','b','a','b','a','b','a','b','c','d'])
));
/* 'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a' */
/* // [ ] [ ] [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>2, 'values'=>array('x')),
array('number'=>1, 'values'=>array('y')),
array('number'=>2, 'values'=>array('x','b')),
array('number'=>1, 'values'=>array('a')),
),
splitToPatterns(['x','x','y','x','b','x','b','a'])
));
// one, two, one, two, one, two, one, two
// [ ] [ ]
assert(isEquals(
array(
array('number'=>2, 'values'=>array('one', 'two', 'one', 'two')),
),
splitToPatterns(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])
));
// 'one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three', 'four'
// [ ] [ ] [ ] [ ]
assert(isEquals(
array(
array('number'=>1, 'values'=>array('one')),
array('number'=>1, 'values'=>array('two')),
array('number'=>2, 'values'=>array('one','two','three','four')),
),
splitToPatterns(['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three','four'])
));
/* 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/* [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>2, 'values'=>array('a','b','a','b')),
array('number'=>1, 'values'=>array('c')),
),
splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));
/* 'a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b' */
// [ ] [ ] [ ] [ ] [ ] [ ] [ ]
assert(isEquals(
array(
array('number'=>2, 'values'=>array('a', 'b')),
array('number'=>1, 'values'=>array('c')),
array('number'=>1, 'values'=>array('d')),
array('number'=>3, 'values'=>array('a','b')),
),
splitToPatterns(['a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b'])
));
/* 'a', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/* [ ] [ ] [ ] [ ] [ ] [ ] */
assert(isEquals(
array(
array('number'=>1, 'values'=>array('a')),
array('number'=>2, 'values'=>array('a','b','a','b')),
array('number'=>1, 'values'=>array('c')),
),
splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));
I started with this now but at the end my brain burn and I don't know where to start to compare the arrays... Enjoy!
$array = array(
'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
//[.......] [.] [[......] [......]] [.]
);
$arrayCount = count($array);
$res = array();
for($i = 0; $i < $arrayCount; $i++) {
for($j = 1; $j < $arrayCount; $j++) {
$res[$i][] = array_slice($array, $i, $j);
}
}
//echo '<pre>';
//var_dump($res);
//echo '</pre>';
//
//die;
$resCount = count($res);
$oneResCount = count($res[0]);
First make a function which will find the possible group matches in the array for a given group array, starting from a specific index in the array and will return the number of matches found.
function findGroupMatch($group, $array, $startFrom) {
$match = 0;
while($group == array_slice($array, $startFrom, count($group))) {
$match++;
$startFrom += count($group);
}
return $match;
}
Now, we need to iterate through each item to find possible groups and then send it to findGroupMatch() function to check if any match for the group exists in the next items. The trick to find a possible group is finding an item which matches any of the previous items. If so, we find a possible group taking all the previous items starting from the matched item. Otherwise we just increase the list of unmatched items and at the end we enter all unmatched items as single item groups. (In the given example, we have a, b, c, d, a.... When we find 2nd a in the array, it matches the previous a, So, we consider a, b, c, d a possible group and send it to function findGroupMatch(), to check how many more groups we can find in the next items.)
$array = array(
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd',
);
$unMatchedItems = array();
$totalCount = count($array);
$groupsArray = array();
for($i=0; $i < $totalCount; $i++) {
$item = $array[$i];
if(in_array($item, $unMatchedItems)) {
$matched_keys = array_keys($unMatchedItems, $item);
foreach($matched_keys as $key) {
$possibleGroup = array_slice($unMatchedItems, $key);
$matches = findGroupMatch($possibleGroup, $array, $i);
if ($matches) {
//Insert the items before group as single item group
if ($key > 0) {
for ($j = 0; $j < $key; $j++) {
$groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$j]));
}
}
//Insert the group array
$groupsArray[] = array('number' => $matches + 1, 'values' => $possibleGroup); //number includes initial group also so $matches + 1
$i += (count($possibleGroup) * $matches) - 1; //skip the matched items from next iteration
//Empty the unmatched array to start with a new group search
$unMatchedItems = array();
break;
}
}
//If there was no matches, add the item to the unMatched group
if(!$matches) $unMatchedItems[] = $item;
} else {
$unMatchedItems[] = $item;
}
}
//Insert the remaining items as single item group
for($k=0; $k<count($unMatchedItems); $k++) {
$groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$k]));
}
print_r($groupsArray);
The Result will be like this: (Check this PHP Fiddle for testing and also https://eval.in/507333 for your another input test.)
Array
(
[0] => Array
(
[number] => 2
[values] => Array
(
[0] => a
[1] => b
[2] => c
[3] => d
)
)
[1] => Array
(
[number] => 1
[values] => Array
(
[0] => c
)
)
[2] => Array
(
[number] => 1
[values] => Array
(
[0] => d
)
)
)
The first example is super easy with recursion. The second example... not so easy.
The example below works for only the first example by assuming no pattern should ever contain two of the same element. This will also handle all individual element patterns at the end of the original array and keep the pattern order (of the first pattern occurrence).
function find_pattern($input, &$result) {
$values = []; // currently processed elements
$pattern = ''; // the current element pattern
$dupe_found = false; // did we find a duplicate element?
// search the values for the first that matches a previous value
while ($next = array_shift($input)) {
// check if the element was already found
if (in_array($next, $values)) {
// re-add the value back into the input, since the next call needs it
array_unshift($input, $next);
// add the resulting pattern
add_pattern($pattern, $values, $result);
// find the next pattern with a recursive call
find_pattern($input, $result);
// a duplicate element was found!
$dupe_found = true;
// the rest of the values are handled by recursion, break the while loop
break;
} else {
// not already found, so store the element and keep going
$values[] = $next;
// use the element to produce a key for the result set
$pattern .= $next;
}
}
// if no duplicate was found, then each value should be an individual pattern
if (!$dupe_found) {
foreach ($values as $value) {
add_pattern($value, [$value], $result);
}
}
}
function add_pattern($pattern, $values, &$result) {
// increment the pattern count
$result[$pattern]['number'] = isset($result[$pattern]['number']) ?
result[$pattern]['number']+1 : 1;
// add the current pattern to the result, if not already done
if (!isset($result[$pattern]['values'])) {
$result[$pattern]['values'] = $values;
}
}
And an example usage:
$input = [
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd'
];
$result = [];
find_pattern($input, $result);
echo "<pre>";
print_r($result);
echo "</pre>";
The example output:
Array
(
[abcd] => Array
(
[number] => 2
[values] => Array
(
[0] => a
[1] => b
[2] => c
[3] => d
)
)
[c] => Array
(
[number] => 1
[values] => Array
(
[0] => c
)
)
[d] => Array
(
[number] => 1
[values] => Array
(
[0] => d
)
)
)
You could do something like this:
<?php
$array = array(
'a', 'b', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd'
);
// Call this function to get your patterns
function patternMatching(array $array) {
$patterns = array();
$belongsToPattern = array_fill(0, count($array), false);
// Find biggest patterns first
for ($size = (int) (count($array) / 2); $size > 0; $size--) {
// for each pattern: start at every possible point in the array
for($start=0; $start <= count($array) - $size; $start++) {
$p = findPattern($array, $start, $size);
if($p != null) {
/* Before we can save the pattern we need to check, if we've found a
* pattern that does not collide with patterns we've already found */
$hasConflict = false;
foreach($p["positions"] as $idx => $pos) {
$PatternConflicts = array_slice($belongsToPattern, $pos, $p["size"]);
$hasConflict = $hasConflict || in_array(true, $PatternConflicts);
}
if(!$hasConflict) {
/* Since we have found a pattern, we don't want to find more
* patterns for these positions */
foreach($p["positions"] as $idx => $pos) {
$replace = array_fill($pos, $p["size"], true);
$belongsToPattern = array_replace($belongsToPattern, $replace);
}
$patterns[] = $p;
// or only return number and values:
// $patterns[] = [ "number" => $p["number"], "values" => $p["values"]];
}
}
}
}
return $patterns;
}
function findPattern(array $haystack, $patternStart, $patternSize ) {
$size = count($haystack);
$patternCandidate = array_slice($haystack, $patternStart, $patternSize);
$patternCount = 1;
$patternPositions = [$patternStart];
for($i = $patternStart + $patternSize; $i <= $size - $patternSize; $i++) {
$patternCheck = array_slice($haystack, $i, $patternSize);
$diff = array_diff($patternCandidate, $patternCheck);
if(empty($diff)) {
$patternCount++;
$patternPositions[] = $i;
}
}
if($patternCount > 1 || $patternSize <= 1) {
return [
"number" => $patternCount,
"values" => $patternCandidate,
// Additional information needed for filtering, sorting, etc.
"positions" => $patternPositions,
"size" => $patternSize
];
} else {
return null;
}
}
$patterns = patternMatching($array);
print "<pre>";
print_r($patterns);
print "</pre>";
?>
The code might be far from being optimal in speed but it should do what you want to do for any sequence of strings in an array. patternMatching() returns the patterns ordered descending in size of the pattern and ascending by first occurence (You can use ['positions'][0] as a sorting criteria to achieve a different order).
This should do it:
<?php
$array = array(
'x', 'y', 'x', 'y', 'a',
'ab', 'c', 'd',
'a', 'b', 'c', 'd',
'c', 'd', 'x', 'y', 'b',
'x', 'y', 'b', 'c', 'd'
);
// convert the array to a string
$string = '';
foreach ($array as $a) {
$l = strlen($a)-1;
$string .= ($l) ? str_replace('::',':',$a[0] . ':' . substr($a,1,$l-1) . ':' . $a[$l]) . '-' : $a . '-';
}
// find patterns
preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
$temp = str_replace('--','-',$m[2].'-');
$patterns[] = ($temp[0]==='-') ? substr($temp,1) : $temp;
}
// remove empty values and duplicates
$patterns = array_keys(array_flip(array_filter($patterns)));
// sort patterns
foreach ($patterns as $p) {
$sorted[$p] = strlen($p);
}
arsort($sorted);
// find double or more occurences
$stringClone = $string;
foreach ($sorted as $s=>$n) {
$nA = substr_count($stringClone,':'.$s);
$nZ = substr_count($stringClone,$s.':');
$number = substr_count($stringClone,$s);
$sub = explode('-',substr($stringClone,strpos($stringClone,$s),$n-1));
$values = $sub;
if($nA>0 || $nZ>0){
$numberAdjusted = $number - $nA - $nZ;
if($numberAdjusted > 1) {
$temp = '';
while($n--){
$temp .= '#';
}
$position = strpos(str_replace(':'.$s,':'.$temp,str_replace($s.':',$temp.':',$string)),$s);
$stringClone = str_replace(':'.$s,':'.$temp,$stringClone);
$stringClone = str_replace($s.':',$temp.':',$stringClone);
$result['p'.sprintf('%09d', $position)] = array('number'=>$numberAdjusted,'values'=>$values);
$stringClone = str_replace($s,'',$stringClone);
$stringClone = str_replace($temp,$s,$stringClone);
}
} else if($number>1){
$position = strpos($string,$s);
$result['p'.sprintf('%09d', $position)] = array('number'=>$number,'values'=>$values);
$stringClone = str_replace($s,'',$stringClone);
}
}
// add the remaining items
$remaining = array_flip(explode('-',substr($stringClone,0,-1)));
foreach ($remaining as $r=>$n) {
$position = strpos($string,$r);
$result['p'.sprintf('%09d', $position)] = array('number'=>1,'values'=>str_replace(':','',$r));
}
// sort results
ksort($result);
$result = array_values($result);
print_r($result);
?>
Working example here.