finding common prefix of array of strings

finding common prefix of array of strings - php

I have an array like this:
$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);
I would like to find the longest common prefix of the string. In this instance, it would be 'Softball - '
I am thinking that I would follow this process
$i = 1;
// loop to the length of the first string
while ($i < strlen($sports[0]) {
// grab the left most part up to i in length
$match = substr($sports[0], 0, $i);
// loop through all the values in array, and compare if they match
foreach ($sports as $sport) {
if ($match != substr($sport, 0, $i) {
// didn't match, return the part that did match
return substr($sport, 0, $i-1);
}
} // foreach
// increase string length
$i++;
} // while
// if you got to here, then all of them must be identical
Questions
Is there a built in function or much simpler way of doing this ?
For my 5 line array that is probably fine, but if I were to do several thousand line arrays, there would be a lot of overhead, so I would have to be move calculated with my starting values of $i, eg $i = halfway of string, if it fails, then $i/2 until it works, then increment $i by 1 until we succeed. So that we are doing the least number of comparisons to get a result.
Is there a formula/algorithm out already out there for this kind of problem?

If you can sort your array, then there is a simple and very fast solution.
Simply compare the first item to the last one.
If the strings are sorted, any prefix common to all strings will be common to the sorted first and last strings.
sort($sport);
$s1 = $sport[0]; // First string
$s2 = $sport[count($sport)-1]; // Last string
$len = min(strlen($s1), strlen($s2));
// While we still have string to compare,
// if the indexed character is the same in both strings,
// increment the index.
for ($i=0; $i<$len && $s1[$i]==$s2[$i]; $i++);
$prefix = substr($s1, 0, $i);

I would use this:
$prefix = array_shift($array); // take the first item as initial prefix
$length = strlen($prefix);
// compare the current prefix with the prefix of the same length of the other items
foreach ($array as $item) {
// check if there is a match; if not, decrease the prefix by one character at a time
while ($length && substr($item, 0, $length) !== $prefix) {
$length--;
$prefix = substr($prefix, 0, -1);
}
if (!$length) {
break;
}
}
Update  
Here’s another solution, iteratively comparing each n-th character of the strings until a mismatch is found:
$pl = 0; // common prefix length
$n = count($array);
$l = strlen($array[0]);
while ($pl < $l) {
$c = $array[0][$pl];
for ($i=1; $i<$n; $i++) {
if ($array[$i][$pl] !== $c) break 2;
}
$pl++;
}
$prefix = substr($array[0], 0, $pl);
This is even more efficient as there are only at most numberOfStrings‍·‍commonPrefixLength atomic comparisons.

I implemented #diogoriba algorithm into code, with this result:
Finding the common prefix of the first two strings, and then comparing this with all following strings starting from the 3rd, and trim the common string if nothing common is found, wins in situations where there is more in common in the prefixes than different.
But bumperbox's original algorithm (except the bugfixes) wins where the strings have less in common in their prefix than different. Details in the code comments!
Another idea I implemented:
First check for the shortest string in the array, and use this for comparison rather than simply the first string. In the code, this is implemented with the custom written function arrayStrLenMin().
Can bring down iterations dramatically, but the function arrayStrLenMin() may itself cause ( more or less) iterations.
Simply starting with the length of first string in array seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
Get the maximum common prefix of strings in an array with as little iterations as possible (PHP)
Code + Extensive Testing + Remarks:
function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
$errArrZeroLength = -1; // Return value for error: Array is empty
$errOtherType = -2; // Return value for error: Found other type (than string in array)
$errStrNone = -3; // Return value for error: No strings found (in array)
$arrLength = count($arr);
if ($arrLength <= 0 ) { return $errArrZeroLength; }
$cur = 0;
foreach ($arr as $key => $val) {
if (is_string($val)) {
$min = strlen($val);
$strFirstFound = $key;
// echo("Key\tLength / Notification / Error\n");
// echo("$key\tFound first string member at key with length: $min!\n");
break;
}
else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
}
if (! isset($min)) { return $errStrNone; } // No string was found in array.
// SpeedRatio of foreach/for is approximately 2/1 as dicussed at:
// http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html
// If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!
if (! $forLoop) {
foreach ($arr as $key => $val) {
if (is_string($val)) {
$cur = strlen($val);
// echo("$key\t$cur\n");
if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
if ($cur < $min) { $min = $cur; }
}
// else { echo("$key\tNo string!\n"); }
}
}
// If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!
else {
for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
if (is_string($arr[$i])) {
$cur = strlen($arr[$i]);
// echo("$i\t$cur\n");
if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
if ($cur < $min) { $min = $cur; }
}
// else { echo("$i\tNo string!\n"); }
}
}
return $min;
}
function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
$arrLength = count($arr);
if ($arrLength < 2) { return false; }
// Determine loop length
/// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
/// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
else { $end = strlen($arr[0]); }
for ($i = 1; $i <= $end + 1; $i++) {
// Grab the part from 0 up to $i
$commonStrMax = substr($arr[0], 0, $i);
echo("Match: $i\t$commonStrMax\n");
// Loop through all the values in array, and compare if they match
foreach ($arr as $key => $str) {
echo(" Str: $key\t$str\n");
// Didn't match, return the part that did match
if ($commonStrMax != substr($str, 0, $i)) {
return substr($commonStrMax, 0, $i-1);
}
}
}
// Special case: No mismatch (hence no return) happened until loop end!
return $commonStrMax; // Thus entire first common string is the common prefix!
}
function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
$arrLength = count($arr);
if ($arrLength < 2) { return false; }
// Determine loop length
/// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
/// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
else { $end = strlen($arr[0]); }
for ($i = 0 ; $i <= $end + 1; $i++) {
// Grab char $i
$char = substr($arr[0], $i, 1);
echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
// Loop through all the values in array, and compare if they match
foreach ($arr as $key => $str) {
echo(" Str: $key\t$str\n");
// Didn't match, return the part that did match
if ($char != $str[$i]) { // Same functionality as ($char != substr($str, $i, 1)). Same efficiency?
return substr($arr[0], 0, $i);
}
}
}
// Special case: No mismatch (hence no return) happened until loop end!
return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}
function strCommonPrefixByNeighbour($arr) {
$arrLength = count($arr);
if ($arrLength < 2) { return false; }
/// Get the common string prefix of the first 2 strings
$strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
if ($strCommonMax === false) { return false; }
if ($strCommonMax == "") { return ""; }
$strCommonMaxLength = strlen($strCommonMax);
/// Now start looping from the 3rd string
echo("-----\n");
for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
echo(" STR: $i\t{$arr[$i]}\n");
/// Compare the maximum common string with the next neighbour
/*
//// Compare by char: Method unsuitable!
// Iterate from string end to string beginning
for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
// If you find the first mismatch from the end, break.
if ($arr[$i][$ii] != $strCommonMax[$ii]) {
$strCommonMaxLength = $ii - 1; break;
// BUT!!! We may falsely assume that the string from the first mismatch until the begining match! This new string neighbour string is completely "unexplored land", there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
}
}
*/
//// Compare by string
for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
echo("MATCH: $ii\t$strCommonMax\n");
if (substr($arr[$i],0,$ii) == $strCommonMax) {
break;
}
else {
$strCommonMax = substr($strCommonMax,0,$ii - 1);
$strCommonMaxLength--;
}
}
}
return substr($arr[0], 0, $strCommonMaxLength);
}
// Tests for finding the common prefix
/// Scenarios
$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);
$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);
$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);
$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);
$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);
$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);
$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);
$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);
$one = array ("/Volumes/1/a/a/aaaaa/1");
$empty = array ( );
// Test Results for finding the common prefix
/*
I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!
Considering iteration efficiency, I also performed tests:
I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^ Str:" | wc -l GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^ Str:" | wc -l PLUS | egrep "^MATCH:" | wc -l GIVES TOTAL ITERATION SUM.
My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.
*/
// Test Results Table
// Used Functions | Iteration amount | Remarks
// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!
// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!
echo("Common prefix of all members:\n");
var_dump($result);
// Tests for finding the shortest string in array
/// Arrays
// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);
/// Scenarios
// I list them from fewest to most iterations, which is not necessarily equivalent to slowest to fastest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!
//// Fewest iterations (immediate abort on "Found other type", use "for" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: Found other type!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Result: Found other type!
*/
//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: Found other type!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
0 3
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Result: Found other type!
*/
//// More iterations (No immediate abort on "Found other type", use "for" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: No strings found!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Key Length / Notification / Error
4 Found first string member at key with length: 4!
5 No string!
6 No string!
7 5
8 No string!
Result: 4
*/
//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: No strings found!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
0 3
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Key Length / Notification / Error
4 Found first string member at key with length: 4!
0 No string!
1 No string!
2 No string!
3 No string!
4 4
5 No string!
6 No string!
7 5
8 No string!
Result: 4
*/

Probably there is some terribly well-regarded algorithm for this, but just off the top of my head, if you know your commonality is going to be on the left-hand side like in your example, you could do way better than your posted methodology by first finding the commonality of the first two strings, and then iterating down the rest of the list, trimming the common string as necessary to achieve commonality or terminating with failure if you trim all the way to nothing.

I think you're on the right way. But instead of incrementing i when all of the string passes, you could do this:
1) Compare the first 2 strings in the array and find out how many common characters they have. Save the common characters in a separate string called maxCommon, for example.
2) Compare the third string w/ maxCommon. If the number of common characters is smaller, trim maxCommon to the characters that match.
3) Repeat and rinse for the rest of the array. At the end of the process, maxCommon will have the string that is common to all of the array elements.
This will add some overhead because you'll need to compare each string w/ maxCommon, but will drastically reduce the number of iterations you'll need to get your results.

I assume that by "common part" you mean "longest common prefix". That is a much simpler to compute than any common substring.
This cannot be done without reading (n+1) * m characters in the worst case and n * m + 1 in the best case, where n is the length of the longest common prefix and m is the number of strings.
Comparing one letter at a time achieves that efficiency (Big Theta (n * m)).
Your proposed algorithm runs in Big Theta(n^2 * m), which is much, much slower for large inputs.
The third proposed algorithm of finding the longest prefix of the first two strings, then comparing that with the third, fourth, etc. also has a running time in Big Theta(n * m), but with a higher constant factor. It will probably only be slightly slower in practice.
Overall, I would recommend just rolling your own function, since the first algorithm is too slow and the two others will be about equally complicated to write anyway.
Check out WikiPedia for a description of Big Theta notation.

Here's an elegant, recursive implementation in JavaScript:
function prefix(strings) {
switch (strings.length) {
case 0:
return "";
case 1:
return strings[0];
case 2:
// compute the prefix between the two strings
var a = strings[0],
b = strings[1],
n = Math.min(a.length, b.length),
i = 0;
while (i < n && a.charAt(i) === b.charAt(i))
++i;
return a.substring(0, i);
default:
// return the common prefix of the first string,
// and the common prefix of the rest of the strings
return prefix([ strings[0], prefix(strings.slice(1)) ]);
}
}

not that I know of
yes: instead of comparing the substring from 0 to length i, you can simply check the ith character (you already know that characters 0 to i-1 match).

Short and sweet version, perhaps not the most efficient:
/// Return length of longest common prefix in an array of strings.
function _commonPrefix($array) {
if(count($array) < 2) {
if(count($array) == 0)
return false; // empty array: undefined prefix
else
return strlen($array[0]); // 1 element: trivial case
}
$len = max(array_map('strlen',$array)); // initial upper limit: max length of all strings.
$prevval = reset($array);
while(($newval = next($array)) !== FALSE) {
for($j = 0 ; $j < $len ; $j += 1)
if($newval[$j] != $prevval[$j])
$len = $j;
$prevval = $newval;
}
return $len;
}
// TEST CASE:
$arr = array('/var/yam/yamyam/','/var/yam/bloorg','/var/yar/sdoo');
print_r($arr);
$plen = _commonprefix($arr);
$pstr = substr($arr[0],0,$plen);
echo "Res: $plen\n";
echo "==> ".$pstr."\n";
echo "dir: ".dirname($pstr.'aaaa')."\n";
Output of the test case:
Array
(
[0] => /var/yam/yamyam/
[1] => /var/yam/bloorg
[2] => /var/yar/sdoo
)
Res: 7
==> /var/ya
dir: /var

#bumperbox
Your basic code needed some correction to work in ALL scenarios!
Your loop only compares until one character before the last character!
The mismatch can possibly occur 1 loop cycle after the latest common character.
Hence you have to at least check until 1 character after your first string's last character.
Hence your comparison operator must be "<= 1" or "< 2".
Currently your algorithm fails
if the first string is completely included in all other strings,
or completely included in all other strings except the last character.
In my next answer/post, I will attach iteration optimized code!
Original Bumperbox code PLUS correction (PHP):
function shortest($sports) {
$i = 1;
// loop to the length of the first string
while ($i < strlen($sports[0])) {
// grab the left most part up to i in length
// REMARK: Culturally biased towards LTR writing systems. Better say: Grab frombeginning...
$match = substr($sports[0], 0, $i);
// loop through all the values in array, and compare if they match
foreach ($sports as $sport) {
if ($match != substr($sport, 0, $i)) {
// didn't match, return the part that did match
return substr($sport, 0, $i-1);
}
}
$i++; // increase string length
}
}
function shortestCorrect($sports) {
$i = 1;
while ($i <= strlen($sports[0]) + 1) {
// Grab the string from its beginning with length $i
$match = substr($sports[0], 0, $i);
foreach ($sports as $sport) {
if ($match != substr($sport, 0, $i)) {
return substr($sport, 0, $i-1);
}
}
$i++;
}
// Special case: No mismatch happened until loop end! Thus entire str1 is common prefix!
return $sports[0];
}
$sports1 = array(
'Softball',
'Softball - Eastern',
'Softball - North Harbour');
$sports2 = array(
'Softball - Wester',
'Softball - Western',
);
$sports3 = array(
'Softball - Western',
'Softball - Western',
);
$sports4 = array(
'Softball - Westerner',
'Softball - Western',
);
echo("Output of the original function:\n"); // Failure scenarios
var_dump(shortest($sports1)); // NULL rather than the correct 'Softball'
var_dump(shortest($sports2)); // NULL rather than the correct 'Softball - Wester'
var_dump(shortest($sports3)); // NULL rather than the correct 'Softball - Western'
var_dump(shortest($sports4)); // Only works if the second string is at least one character longer!
echo("\nOutput of the corrected function:\n"); // All scenarios work
var_dump(shortestCorrect($sports1));
var_dump(shortestCorrect($sports2));
var_dump(shortestCorrect($sports3));
var_dump(shortestCorrect($sports4));

How about something like this? It can be further optimised by not having to check the lengths of the strings if we can use the null terminating character (but I am assuming python strings have length cached somewhere?)
def find_common_prefix_len(strings):
"""
Given a list of strings, finds the length common prefix in all of them.
So
apple
applet
application
would return 3
"""
prefix = 0
curr_index = -1
num_strings = len(strings)
string_lengths = [len(s) for s in strings]
while True:
curr_index += 1
ch_in_si = None
for si in xrange(0, num_strings):
if curr_index >= string_lengths[si]:
return prefix
else:
if si == 0:
ch_in_si = strings[0][curr_index]
elif strings[si][curr_index] != ch_in_si:
return prefix
prefix += 1

I would use a recursive algorithm like this:
1 - get the first string in the array
2 - call the recursive prefix method with the first string as a param
3 - if prefix is empty return no prefix
4 - loop through all the strings in the array
4.1 - if any of the strings does not start with the prefix
4.1.1 - call recursive prefix method with prefix - 1 as a param
4.2 return prefix

// Common prefix
$common = '';
$sports = array(
'Softball T - Counties',
'Softball T - Eastern',
'Softball T - North Harbour',
'Softball T - South',
'Softball T - Western'
);
// find mini string
$minLen = strlen($sports[0]);
foreach ($sports as $s){
if($minLen > strlen($s))
$minLen = strlen($s);
}
// flag to break out of inner loop
$flag = false;
// The possible common string length does not exceed the minimum string length.
// The following solution is O(n^2), this can be improve.
for ($i = 0 ; $i < $minLen; $i++){
$tmp = $sports[0][$i];
foreach ($sports as $s){
if($s[$i] != $tmp)
$flag = true;
}
if($flag)
break;
else
$common .= $sports[0][$i];
}
print $common;

The solutions here work only for finding commonalities at the beginning of strings. Here is a function that looks for the longest common substring anywhere in an array of strings.
http://www.christopherbloom.com/2011/02/24/find-the-longest-common-substring-using-php/

The top answer seemed a bit long, so here's a concise solution with a runtime of O(n2).
function findLongestPrefix($arr) {
return array_reduce($arr, function($prefix, $item) {
$length = min(strlen($prefix), strlen($item));
while (substr($prefix, 0, $length) !== substr($item, 0, $length)) {
$length--;
}
return substr($prefix, 0, $length);
}, $arr[0]);
}
print findLongestPrefix($sports); // Softball -

For what it's worth, here's another alternative I came up with.
I used this for finding the common prefix for a list of products codes (ie. where there are multiple product SKUs that have a common series of characters at the start):
/**
* Try to find a common prefix for a list of strings
*
* #param array $strings
* #return string
*/
function findCommonPrefix(array $strings)
{
$prefix = '';
$chars = array_map("str_split", $strings);
$matches = call_user_func_array("array_intersect_assoc", $chars);
if ($matches) {
$i = 0;
foreach ($matches as $key => $value) {
if ($key != $i) {
unset($matches[$key]);
}
$i++;
}
$prefix = join('', $matches);
}
return $prefix;
}

This is an addition to the #Gumbo answer. If you want to ensure that the chosen, common prefix does not break words, use this. I am just having it look for a blank space at the end of the chosen string. If that exists we know that there was more to all of the phrases, so we truncate it.
function product_name_intersection($array){
$pl = 0; // common prefix length
$n = count($array);
$l = strlen($array[0]);
$first = current($array);
while ($pl < $l) {
$c = $array[0][$pl];
for ($i=1; $i<$n; $i++) {
if (!isset($array[$i][$pl]) || $array[$i][$pl] !== $c) break 2;
}
$pl++;
}
$prefix = substr($array[0], 0, $pl);
if ($pl < strlen($first) && substr($prefix, -1, 1) != ' ') {
$prefix = preg_replace('/\W\w+\s*(\W*)$/', '$1', $prefix);
}
$prefix = preg_replace('/^\W*(.+?)\W*$/', '$1', $prefix);
return $prefix;
}

Sharing a Typescript solution for this question. I split it into 2 methods, just to keep it clean while at it.
function longestCommonPrefix(strs: string[]): string {
let output = '';
if(strs.length > 0) {
output = strs[0];
if(strs.length > 1) {
for(let i=1; i <strs.length; i++) {
output = checkCommonPrefix(output, strs[i]);
}
}
}
return output;
};
function checkCommonPrefix(str1: string, str2: string): string {
let output = '';
let len = Math.min(str1.length, str2.length);
let i = 0;
while(i < len) {
if(str1[i] === str2[i]) {
output += str1[i];
} else {
i = len;
}
i++;
}
return output;
}

Related

Count number of leading characters of a specific character at the beginning of a string?

Given a string such as:
$a = '00023407283';
$b = 'f045602345';
Is there a built in function that can count the number of occurrences of a specific character starting at the beginning and continuing until it finds a different character that is not specified?
Given the above, and specifying zero (0) as the character, the expected result would be:
$a = '00023407283'; // 3 (the other zeros don't count)
$b = 'f0045602345'; // 0 (It does not start with zero)

This should do the trick:
function count_leading($haystack,$value) {
$i = 0;
$mislead = false;
while($i < strlen($haystack) && !$mislead) {
if($haystack[$i] == $value) {
$i += 1;
} else {
$mislead = true;
}
}
return $i;
}
//examples
echo count_leading('aaldfkjlk','a'); //returns 2
echo count_leading('dskjheelk','c'); //returns 0

I don't think there's any built-in functions that could do that (it's too specific) but you could write a method to do that
function repeatChar($string, $char) {
$pos = 0;
while($string{$pos} == $char) $pos++;
return $pos;
}

Yes, you want strspn, which counts the number of characters from the second argument at the beginning of the first argument:
echo strspn($a, '0'); // === 3
echo strspn($b, '0'); // === 0
See it live at 3v4l.org. Besides being a built-in (read "fast"), this also accepts any number of single characters to look at the beginning. However, note that the function is byte-oriented, so it will not work as expected for multi-byte characters.

How to tell if a comma delimited list of numbers obeys the natural order of numbers

I have a comma delimited list of numbers which i am converting into an array and what i want to know about the list of numbers is if the numbers listed obey a natural ordering of numbers,you know,have a difference of exactly 1 between the next and the previous.
If its true the list obeys the natural ordering,i want to pick the first number of the list and if not the list obeys not the natural order,i pick the second.
This is my code.
<?php
error_reporting(0);
/**
Analyze numbers
Condition 1
if from number to the next has a difference of 1,then pick the first number in the list
Condition 2
if from one number the next,a difference of greater than 1 was found,then pick next from first
Condition 3
if list contains only one number,pick the number
*/
$number_picked = null;
$a = '5,7,8,9,10';
$b = '2,3,4,5,6,7,8,9,10';
$c = '10';
$data = explode(',', $b);
$count = count($data);
foreach($data as $index => $number)
{
/**
If array has exactly one value
*/
if($count == 1){
echo 'number is:'.$number;
exit();
}
$previous = $data[($count+$index-1) % $count];
$current = $number;
$next = $data[($index+1) % $count];
$diff = ($next - $previous);
if($diff == 1){
$number_picked = array_values($data)[0];
echo $number_picked.'correct';
}
elseif($diff > 1){
$number_picked = array_values($data)[1];
echo $number_picked.'wrong';
}
}
?>
The problem i am having is to figure out how to test the difference for all array elements.

No loops are needed, a little bit of maths will help you here. Once you have your numbers in an array:
$a = explode(',', '5,7,8,9,10');
pass them to this function:-
function isSequential(array $sequence, $diff = 1)
{
return $sequence[count($sequence) - 1] === $sequence[0] + ($diff * (count($sequence) - 1));
}
The function will return true if the numbers in the array follow a natural sequence. You should even be able to adjust it for different spacings between numbers, eg 2, 4, 6, 8, etc using the $diff parameter, although I haven't tested that thoroughly.
See it working.
Keep in mind that this will only work if your list of numbers is ordered from smallest to largest.

Try using a function to solve this... Like so:
<?php
error_reporting(0);
/**
Analyze numbers
Condition 1
if from number to the next has a difference of 1,then pick the first number in the list
Condition 2
if from one number the next,a difference of greater than 1 was found,then pick next from first
Condition 3
if list contains only one number,pick the number
*/
$number_picked = null;
$a = '5,7,8,9,10';
$b = '2,3,4,5,6,7,8,9,10';
$c = '10';
function test($string) {
$data = explode(',', $string);
if(count($data) === 1){
return 'number is:'.$number;
}
foreach($data as $index => $number)
{
$previous = $data[($count+$index-1) % $count];
$current = $number;
$next = $data[($index+1) % $count];
$diff = ($next - $previous);
if($diff == 1){
$number_picked = array_values($data)[0];
return $number_picked.'correct';
}
elseif($diff > 1){
$number_picked = array_values($data)[1];
return $number_picked.'wrong';
}
}
}
echo test($a);
echo test($b);
echo test($c);
?>

You already know how to explode the list, so I'll skip that.
You already handle a single item, so I'll skip that as well.
What is left, is checking the rest of the array. Basically; there's two possible outcome values: either the first element or the second. So we'll save those two first:
$outcome1 = $list[0];
$outcome2 = $list[1];
Next, we'll loop over the items. We'll remember the last found item, and make sure that the difference between the new and the old is 1. If it is, we continue. If it isn't, we abort and immediately return $outcome2.
If we reach the end of the list without aborting, it's naturally ordered, so we return $outcome1.
$lastNumber = null;
foreach( $items as $number ) {
if($lastNumber === null || $number - $lastNumber == 1 ) {
// continue scanning
$lastNumber = $number;
}
else {
// not ordened
return $outcome2;
}
}
return $outcome1; // scanned everything; was ordened.
(Note: code not tested)

To avoid the headache of accessing the previous or next element, and deciding whether it still is inside the array or not, use the fact that on a natural ordering the item i and the first item have a difference of i.
Also the corner case you call condition 3 is easier to handle outside the loop than inside of it. But easier still, the way we characterize a natural ordered list holds for a 1-item list :
$natural = true;
for($i=1; $i<$count && $natural; $i++)
$natural &= ($data[$i] == $data[0] + $i)
$number = $natural ? $data[0] : $data[1];
For $count == 1 the loop is never entered and thus $natural stays true : you select the first element.

Search for pattern in a string

Pattern search within a string.
for eg.
$string = "111111110000";
FindOut($string);
Function should return 0
function FindOut($str){
$items = str_split($str, 3);
print_r($items);
}

If I understand you correctly, your problem comes down to finding out whether a substring of 3 characters occurs in a string twice without overlapping. This will get you the first occurence's position if it does:
function findPattern($string, $minlen=3) {
$max = strlen($string)-$minlen;
for($i=0;$i<=$max;$i++) {
$pattern = substr($string,$i,$minlen);
if(substr_count($string,$pattern)>1)
return $i;
}
return false;
}
Or am I missing something here?

What you have here can conceptually be solved with a sliding window. For your example, you have a sliding window of size 3.
For each character in the string, you take the substring of the current character and the next two characters as the current pattern. You then slide the window up one position, and check if the remainder of the string has what the current pattern contains. If it does, you return the current index. If not, you repeat.
Example:
1010101101
|-|
So, pattern = 101. Now, we advance the sliding window by one character:
1010101101
|-|
And see if the rest of the string has 101, checking every combination of 3 characters.
Conceptually, this should be all you need to solve this problem.
Edit: I really don't like when people just ask for code, but since this seemed to be an interesting problem, here is my implementation of the above algorithm, which allows for the window size to vary (instead of being fixed at 3, the function is only briefly tested and omits obvious error checking):
function findPattern( $str, $window_size = 3) {
// Start the index at 0 (beginning of the string)
$i = 0;
// while( (the current pattern in the window) is not empty / false)
while( ($current_pattern = substr( $str, $i, $window_size)) != false) {
$possible_matches = array();
// Get the combination of all possible matches from the remainder of the string
for( $j = 0; $j < $window_size; $j++) {
$possible_matches = array_merge( $possible_matches, str_split( substr( $str, $i + 1 + $j), $window_size));
}
// If the current pattern is in the possible matches, we found a duplicate, return the index of the first occurrence
if( in_array( $current_pattern, $possible_matches)) {
return $i;
}
// Otherwise, increment $i and grab a new window
$i++;
}
// No duplicates were found, return -1
return -1;
}
It should be noted that this certainly isn't the most efficient algorithm or implementation, but it should help clarify the problem and give a straightforward example on how to solve it.

Looks like you more want to use a sub-string function to walk along and check every three characters and not just break it into 3
function fp($s, $len = 3){
$max = strlen($s) - $len; //borrowed from lafor as it was a terrible oversight by me
$parts = array();
for($i=0; $i < $max; $i++){
$three = substr($s, $i, $len);
if(array_key_exists("$three",$parts)){
return $parts["$three"];
//if we've already seen it before then this is the first duplicate, we can return it
}
else{
$parts["$three"] = i; //save the index of the starting position.
}
}
return false; //if we get this far then we didn't find any duplicate strings
}

Based on the str_split documentation, calling str_split on "1010101101" will result in:
Array(
[0] => 101
[1] => 010
[2] => 110
[3] => 1
}
None of these will match each other.
You need to look at each 3-long slice of the string (starting at index 0, then index 1, and so on).
I suggest looking at substr, which you can use like this:
substr($input_string, $index, $length)
And it will get you the section of $input_string starting at $index of length $length.

quick and dirty implementation of such pattern search:
function findPattern($string){
$matches = 0;
$substrStart = 0;
while($matches < 2 && $substrStart+ 3 < strlen($string) && $pattern = substr($string, $substrStart++, 3)){
$matches = substr_count($string,$pattern);
}
if($matches < 2){
return null;
}
return $substrStart-1;

Counting possibilities in a char. combination using a consecutive repetition criterion

In PHP, given
the final string length
the range of characters it can use
min consecutive repetition count possible
how can you calculate the number of matches that fits these criteria?To draw a better picture…
$range = array('a','b','c');
$length = 2; // looking for 2 digit results
$minRep = 2; // with >=2 consecutive characters
// aa,bb,cc = 3 possibilities
another one:
$range = array('a','b','c');
$length = 3; // looking for 3 digit results
$minRep = 2; // with >=2 consecutive characters
// aaa,aab,aac,baa,caa
// bbb,bba,bbc,abb,cbb
// ccc,cca,ccb,acc,bcc
// 5 + 5 + 5 = 15 possibilities
// note that combos like aa,bb,cc are not included
// because their length is smaller than $length
last one:
$range = array('a','b','c');
$length = 3; // looking for 3 digit results
$minRep = 3; // with >=3 consecutive characters
// aaa,bbb,ccc = 3 possibilities
So basically, in the 2nd example the 3rd criterion made it catch e.g. [aa]b in aab because a was repeating consecutively more than once, whereas [a]b[a] wouldn't be a match because those a's are separate.
Needless to say, none of the variables is static.

Got it. All credit to leonbloy #mathexchange.com.
/* The main function computes the number of words that do NOT contain
* a character repetition of length $minRep (or more). */
function countStrings($rangeLength, $length, $minRep, &$results = array())
{
if (!isset($results[$length]))
{
$b = 0;
if ($length < $minRep)
$b = pow($rangeLength, $length);
else
{
for ($i = 1; $i < $minRep; $i++)
$b += countStrings($rangeLength, $length - $i, $minRep, $results);
$b *= $rangeLength - 1;
}
$results[$length] = $b;
}
return $results[$length];
}
/* This one answers directly the question. */
function printNumStringsRep($rangeLength, $length, $minRep)
{
$n = (pow($rangeLength, $length)
- countStrings($rangeLength, $length, $minRep));
echo "Size of alphabet : $rangeLength<br/>"
. "Size of string : $length<br/>"
. "Minimal repetition : $minRep<br/>"
. "<strong>Number of words : $n</strong>";
}
/* Prints :
*
Size of alphabet : 3
Size of string : 3
Minimal repetition : 2
Number of words : 15
*
*/
printNumStringsRep(3, 3, 2);

I think it is best to handle this with math.
$range = array('a','b','c');
$length = 3; // looking for 3 digit results
$minRep = 2; // with >=2 consecutive characters
$rangeLength = count($range);
$count = (pow($rangeLength,$length-$minRep+1) * ($length-$minRep+1)) - ($rangeLength * ($length-$minRep)); // is the result
Now, $count is getting true result for three situation. But it may not be general formula and need to improve.
Try to explain it:
pow($rangeLength,$length-$minRep+1)
in this, we count repetitive characters like as one. For instance, in second example that you gave, we think in aab, aa is a one character. Because, two characters need to change together. We think now there is two character like xy. So there is same possibilities for both character a, b, and c namely 3 ($rangeLength) possible value for two characters($length-$minRep+1). So 3^2=9 is possible situations for second example.
We calculate 9 is for just xy not yx. For this, we multiply length of xy ($length-$minRep+1). And then we have 18.
It can be seemed that we calculated the result, but there is a repeat in our calculation. We didn't reckon with this situation: xy => aaa and yx => aaa. For this, we calculate and substract repeated results
- ($rangeLength * ($length-$minRep))
So after this, we get result.
As i said begining of the description, this formula may need to improve.

With Math, work becomes really complex. But, there is always a way, even not beautiful as much as Math. We can create all possible strings with php and control them with regexp like below:
$range = array('a','b','c');
$length = 3;
$minRep = 2;
$rangeLength = count($range);
$createdStrings = array();
$matchedStrings = array();
function calcIndex(){
global $range;
global $length;
global $rangeLength;
static $ret;
$addTrigger = false;
// initial values
if(is_null($ret)){
$ret = array_fill(0, $length, 0);
return $ret;
}
for($i=$length-1;$i>=0;$i--){
if($ret[$i] == ($rangeLength-1)) {
if($i==0) return false;
$ret[$i] = 0;
}
else {
$ret[$i]++;
break;
}
}
return $ret;
}
function createPattern()
{
global $minRep;
$patt = '/(.)\\1{'.($minRep-1).'}/';
return $patt;
}
$pattern = createPattern();
while(1)
{
$index = calcIndex();
if($index === false) break;
$string = '';
for($i=0;$i<$length;$i++)
{
$string .= $range[$index[$i]];
}
if(!in_array($string, $createdStrings)){
$createdStrings[] = $string;
if(preg_match($pattern, $string)){
$matchedStrings[] = $string;
}
}
}
echo count($createdStrings).' is created:';
var_dump($createdStrings);
echo count($matchedStrings).'strings is matched:';
var_dump($matchedStrings);

Wrongly asked or am I stupid?

There's a blog post comment on codinghorror.com by Paul Jungwirth which includes a little programming task:
You have the numbers 123456789, in that order. Between each number, you must insert either nothing, a plus sign, or a multiplication sign, so that the resulting expression equals 2001. Write a program that prints all solutions. (There are two.)
Bored, I thought, I'd have a go, but I'll be damned if I can get a result for 2001. I think the code below is sound and I reckon that there are zero solutions that result in 2001. According to my code, there are two solutions for 2002. Am I right or am I wrong?
/**
* Take the numbers 123456789 and form expressions by inserting one of ''
* (empty string), '+' or '*' between each number.
* Find (2) solutions such that the expression evaluates to the number 2001
*/
$input = array(1,2,3,4,5,6,7,8,9);
// an array of strings representing 8 digit, base 3 numbers
$ops = array();
$numOps = sizeof($input)-1; // always 8
$mask = str_repeat('0', $numOps); // mask of 8 zeros for padding
// generate the ops array
$limit = pow(3, $numOps) -1;
for ($i = 0; $i <= $limit; $i++) {
$s = (string) $i;
$s = base_convert($s, 10, 3);
$ops[] = substr($mask, 0, $numOps - strlen($s)) . $s;
}
// for each element in the ops array, generate an expression by inserting
// '', '*' or '+' between the numbers in $input. e.g. element 11111111 will
// result in 1+2+3+4+5+6+7+8+9
$limit = sizeof($ops);
$stringResult = null;
$numericResult = null;
for ($i = 0; $i < $limit; $i++) {
$l = $numOps;
$stringResult = '';
$numericResult = 0;
for ($j = 0; $j <= $l; $j++) {
$stringResult .= (string) $input[$j];
switch (substr($ops[$i], $j, 1)) {
case '0':
break;
case '1':
$stringResult .= '+';
break;
case '2':
$stringResult .= '*';
break;
default :
}
}
// evaluate the expression
// split the expression into smaller ones to be added together
$temp = explode('+', $stringResult);
$additionElems = array();
foreach ($temp as $subExpressions)
{
// split each of those into ones to be multiplied together
$multplicationElems = explode('*', $subExpressions);
$working = 1;
foreach ($multplicationElems as $operand) {
$working *= $operand;
}
$additionElems[] = $working;
}
$numericResult = 0;
foreach($additionElems as $operand)
{
$numericResult += $operand;
}
if ($numericResult == 2001) {
echo "{$stringResult}\n";
}
}

Further down the same page you linked to.... =)
"Paul Jungwirth wrote:
You have the numbers 123456789, in
that order. Between each number, you
must insert either nothing, a plus
sign, or a multiplication sign, so
that the resulting expression equals
2001. Write a program that prints all solutions. (There are two.)
I think you meant 2002, not 2001. :)
(Just correcting for anyone else like
me who obsessively tries to solve
little "practice" problems like this
one, and then hit Google when their
result doesn't match the stated
answer. ;) Damn, some of those Perl
examples are ugly.)"

The number is 2002.
Recursive solution takes eleven lines of JavaScript (excluding string expression evaluation, which is a standard JavaScript function, however it would probably take another ten or so lines of code to roll your own for this specific scenario):
function combine (digit,exp) {
if (digit > 9) {
if (eval(exp) == 2002) alert(exp+'=2002');
return;
}
combine(digit+1,exp+'+'+digit);
combine(digit+1,exp+'*'+digit);
combine(digit+1,exp+digit);
return;
}
combine(2,'1');

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

finding common prefix of array of strings - php

not that I know of yes: instead of comparing the substring from 0 to length i, you can simply check the ith character (you already know that characters 0 to i-1 match).

The solutions here work only for finding commonalities at the beginning of strings. Here is a function that looks for the longest common substring anywhere in an array of strings. http://www.christopherbloom.com/2011/02/24/find-the-longest-common-substring-using-php/

Related

Count number of leading characters of a specific character at the beginning of a string?

How to tell if a comma delimited list of numbers obeys the natural order of numbers

Search for pattern in a string

Counting possibilities in a char. combination using a consecutive repetition criterion

Wrongly asked or am I stupid?

Categories

Resources