As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Just tanked a job interview where I was asked to implement a function with this signature:
function justify($str_in, $desired_length)
It needs to mimic what HTML's text-align: justify would do, here's some examples (desired_length = 48)
hello world there ok then = hello......world......there.......ok.......then
hello = .....................hello.....................
ok then = ok.........................................then
this string is almost certainly longer than 48 I think = this.string.is.almost.certainly.longer.than.48.
two words = two.......................................words
three ok words = three.................ok..................words
1 2 3 4 5 6 7 8 9 = 1....2....3.....4.....5.....6.....7.....8.....9
(I replaced the spaces with periods to illustrate)
The length of spaces between words may never differ by more than one.
I have written a PHP solution, but I am more interested in what algorithms people can come up with to solve the problem. It was my first whiteboard question at a job interview ever, and I'm afraid a combination of factors made me take way longer than I should have.
Here's what I came up with. I added the optional $char parameter so you can see what it's outputting - Of course you can pull it inside the function so the prototype matches the requirement.
function justify($str_in, $desired_length, $char = '_') {
// Some common vars and simple error checking / sanitation
$return = '';
$str_in = trim( $str_in);
$desired_length = intval( $desired_length);
// If we've got invalid input, we're done
if( $desired_length <= 0)
return $str_in;
// If the input string is greater than the length, we need to truncate it WITHOUT splitting words
if( strlen( $str_in) > $desired_length) {
$str = wordwrap($str_in, $desired_length);
$str = explode("\n", $str);
$str_in = $str[0];
}
$words = explode( ' ', $str_in);
$num_words = count( $words);
// If there's only one word, it's a simple edge case
if( $num_words == 1) {
$length = ($desired_length - strlen( $words[0])) / 2;
$return .= str_repeat( $char, floor( $length)) . $words[0] . str_repeat( $char, ceil( $length));
} else {
$word_length = strlen( implode( '', $words));
// Calculate the number of spaces to distribute over the words
$num_words--; // We're going to eliminate the last word
$spaces = floor( ($desired_length - $word_length) / $num_words);
$remainder = $desired_length - $word_length - ($num_words * $spaces);
$last = array_pop( $words);
foreach( $words as $word) {
// If we didn't get an even number of spaces to distribute, just tack it on to the front
$spaces_to_add = $spaces;
if( $remainder > 0) {
$spaces_to_add++;
$remainder--;
}
$return .= $word . str_repeat( $char, $spaces_to_add);
}
$return .= $last;
}
return $return;
}
And the test cases:
$inputs = array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
);
foreach( $inputs as $x) {
$ret = justify( $x, 48);
echo 'Inp: ' . $x . " - strlen(" . strlen( $x) . ")\n";
echo 'Out: ' . $ret . " - strlen(" . strlen( $ret) . ")\n\n";
}
And the output:
Inp: hello world there ok then - strlen(25)
Out: hello_______world_______there_______ok______then - strlen(48)
Inp: hello - strlen(5)
Out: _____________________hello______________________ - strlen(48)
Inp: ok then - strlen(7)
Out: ok__________________________________________then - strlen(48)
Inp: this string is almost certainly longer than 48 I think - strlen(54)
Out: this_string_is_almost_certainly_longer_than_48_I - strlen(48)
Inp: two words - strlen(9)
Out: two________________________________________words - strlen(48)
Inp: three ok words - strlen(14)
Out: three__________________ok__________________words - strlen(48)
Inp: 1 2 3 4 5 6 7 8 9 - strlen(17)
Out: 1_____2_____3_____4_____5_____6_____7_____8____9 - strlen(48)
And a demo!
Edit: Cleaned up the code, and it still works :).
Made it a personal challenge to not use any loops/recursion or regex with callbacks. I used a single explode() and a single implode() to achieve this. Great success!
The Code
function justify($str, $maxlen) {
$str = trim($str);
$strlen = strlen($str);
if ($strlen >= $maxlen) {
$str = wordwrap($str, $maxlen);
$str = explode("\n", $str);
$str = $str[0];
$strlen = strlen($str);
}
$space_count = substr_count($str, ' ');
if ($space_count === 0) {
return str_pad($str, $maxlen, ' ', STR_PAD_BOTH);
}
$extra_spaces_needed = $maxlen - $strlen;
$total_spaces = $extra_spaces_needed + $space_count;
$space_string_avg_length = $total_spaces / $space_count;
$short_string_multiplier = floor($space_string_avg_length);
$long_string_multiplier = ceil($space_string_avg_length);
$short_fill_string = str_repeat(' ', $short_string_multiplier);
$long_fill_string = str_repeat(' ', $long_string_multiplier);
$limit = ($space_string_avg_length - $short_string_multiplier) * $space_count;
$words_split_by_long = explode(' ', $str, $limit+1);
$words_split_by_short = $words_split_by_long[$limit];
$words_split_by_short = str_replace(' ', $short_fill_string, $words_split_by_short);
$words_split_by_long[$limit] = $words_split_by_short;
$result = implode($long_fill_string, $words_split_by_long);
return $result;
}
Short (348 chars)
function j($s,$m){$s=trim($s);$l=strlen($s);if($l>=$m){$s=explode("\n",wordwrap($s,$m));$s=$s[0];$l=strlen($s);}$c=substr_count($s,' ');if($c===0)return str_pad($s,$m,' ',STR_PAD_BOTH);$a=($m-$l+$c)/$c;$h=floor($a);$i=($a-$h)*$c;$w=explode(' ',$s,$i+1);$w[$i]=str_replace(' ',str_repeat(' ',$h),$w[$i]);return implode(str_repeat(' ',ceil($a)),$w);}
Algorithm / Code explanation
Handle the two exceptions (string longer than max length or only one word).
Find the average space needed between each word ($space_string_avg_length).
Create a long and short fill string for use between the words, based on ceil() and floor() of the $space_string_avg_length, respectively.
Find out how many long fill strings we need. ($limit+1).
Split the text based on how many long fill strings we need.
Replace spaces in the last part of the array, made by the split, with the short fill strings.
Join the split text back together with the long fill strings.
Testing
$tests = array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
);
foreach ($tests as $test) {
$len_before = strlen($test);
$processed = str_replace(' ', '_', justify($test, 48));
$len_after = strlen($processed);
echo "IN($len_before): $test\n";
echo "OUT($len_after): $processed\n";
}
Results
IN(25): hello world there ok then
OUT(48): hello_______world_______there_______ok______then
IN(5): hello
OUT(48): _____________________hello______________________
IN(7): ok then
OUT(48): ok__________________________________________then
IN(54): this string is almost certainly longer than 48 I think
OUT(48): this_string_is_almost_certainly_longer_than_48_I
IN(9): two words
OUT(48): two________________________________________words
IN(14): three ok words
OUT(48): three__________________ok__________________words
IN(17): 1 2 3 4 5 6 7 8 9
OUT(48): 1_____2_____3_____4_____5_____6_____7_____8____9
See it run!
Here's my solution with no pesky loops
function justify( $str_in, $desired_length=48 ) {
if ( strlen( $str_in ) > $desired_length ) {
$str_in = current( explode( "\n", wordwrap( $str_in, $desired_length ) ) );
}
$string_length = strlen( $str_in );
$spaces_count = substr_count( $str_in, ' ' );
$needed_spaces_count = $desired_length - $string_length + $spaces_count;
if ( $spaces_count === 0 ) {
return str_pad( $str_in, $desired_length, ' ', STR_PAD_BOTH );
}
$spaces_per_space = ceil( $needed_spaces_count / $spaces_count );
$spaced_string = preg_replace( '~\s+~', str_repeat( ' ', $spaces_per_space ), $str_in );
return preg_replace_callback(
sprintf( '~\s{%s}~', $spaces_per_space ),
function ( $m ) use( $spaces_per_space ) {
return str_repeat( ' ', $spaces_per_space-1 );
},
$spaced_string,
strlen( $spaced_string ) - $desired_length
);
}
Comments and output...
https://gist.github.com/2939068
Find out how many spaces there are
Find out how many spaces are needed
Replace existing spaces with the amount of spaces (evenly distributed) needed to meet or just exceed desired line length
Use preg_replace_callback to replace the amount of \s{spaces_inserted} with \s{spaces_inserted-1} necessary to meet the desired line length
I wanted to see which algorithm was the most efficient, so I ran some benchmarks. I did 100k iterations of all 7 test cases. (Ran it in a single core Ubuntu VM)
The results of #ppsreejith and #Kristian Antonsen's code are omitted, because their code crashed when I tried to run it. #PhpMyCoder's code ran as long as I didn't do the formatting to 48 length after object construction. Therefore the test result is incomplete. (Fixed)
Benchmark results
$ php justify.bench.php
Galen(justify1): 5.1464750766754
nickb(justify2): 3.8629620075226
Paolo Bergantino(justify3): 4.3705048561096
user381521(justify5): 8.5988481044769
vlzvl(justify7): 6.6795041561127
Alexander(justify8): 6.7060301303864
ohaal(justify9): 2.9896130561829
PhpMyCoder: 6.1514630317688 (Fixed!)
justify.bench.php
<?php
$tests = array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
);
$testers = array(
'Galen' => 'justify1',
'nickb' => 'justify2',
'Paolo Bergantino' => 'justify3',
// 'Kristian Antonsen' => 'justify4',
'user381521' => 'justify5',
// 'ppsreejith' => 'justify6',
'vlzvl' => 'justify7',
'Alexander' => 'justify8',
'ohaal' => 'justify9'
);
// ppsreejith and Kristian Antonsen's code crashed and burned when I tried to run it
// PhpMyCoder is a special case, but his code also crashed when doing $jus->format(48);
foreach ($testers as $tester => $func) {
$b=microtime(true);
for($i=0;$i<100000;$i++)
foreach ($tests as $test)
$func($test,48);
$a=microtime(true);
echo $tester.'('.$func.'): '.($a-$b)."\n";
}
echo "\n";
// Fixed!
$jus = new Justifier($tests);
$b=microtime(true);
for($i=0;$i<100000;$i++) {
$jus->format(54);
}
$a=microtime(true);
echo 'PhpMyCoder: '.($a-$b)." (Fixed!)\n";
// ALGORITHMS BELOW
// Galen
function justify1( $str_in, $desired_length=48 ) {
if ( strlen( $str_in ) > $desired_length ) {
$str_in = current( explode( "\n", wordwrap( $str_in, $desired_length ) ) );
}
$string_length = strlen( $str_in );
$spaces_count = substr_count( $str_in, ' ' );
$needed_spaces_count = $desired_length - $string_length + $spaces_count;
if ( $spaces_count === 0 ) {
return str_pad( $str_in, $desired_length, ' ', STR_PAD_BOTH );
}
$spaces_per_space = ceil( $needed_spaces_count / $spaces_count );
$spaced_string = preg_replace( '~\s+~', str_repeat( ' ', $spaces_per_space ), $str_in );
return preg_replace_callback(
sprintf( '~\s{%s}~', $spaces_per_space ),
function ( $m ) use( $spaces_per_space ) {
return str_repeat( ' ', $spaces_per_space-1 );
},
$spaced_string,
strlen( $spaced_string ) - $desired_length
);
}
// nickb
function justify2($str_in, $desired_length, $char = '_') {
// Some common vars and simple error checking / sanitation
$return = '';
$str_in = trim( $str_in);
$desired_length = intval( $desired_length);
// If we've got invalid input, we're done
if( $desired_length <= 0)
return $str_in;
// If the input string is greater than the length, we need to truncate it WITHOUT splitting words
if( strlen( $str_in) > $desired_length) {
$str = wordwrap($str_in, $desired_length);
$str = explode("\n", $str);
$str_in = $str[0];
}
$words = explode( ' ', $str_in);
$num_words = count( $words);
// If there's only one word, it's a simple edge case
if( $num_words == 1) {
$length = ($desired_length - strlen( $words[0])) / 2;
$return .= str_repeat( $char, floor( $length)) . $words[0] . str_repeat( $char, ceil( $length));
} else {
$word_length = strlen( implode( '', $words));
// Calculate the number of spaces to distribute over the words
$num_words--; // We're going to eliminate the last word
$spaces = floor( ($desired_length - $word_length) / $num_words);
$remainder = $desired_length - $word_length - ($num_words * $spaces);
$last = array_pop( $words);
foreach( $words as $word) {
// If we didn't get an even number of spaces to distribute, just tack it on to the front
$spaces_to_add = $spaces;
if( $remainder > 0) {
$spaces_to_add++;
$remainder--;
}
$return .= $word . str_repeat( $char, $spaces_to_add);
}
$return .= $last;
}
return $return;
}
// Paolo Bergantino
function justify3($str, $to_len) {
$str = trim($str);
$strlen = strlen($str);
if($str == '') return '';
if($strlen >= $to_len) {
return substr($str, 0, $to_len);
}
$words = explode(' ', $str);
$word_count = count($words);
$space_count = $word_count - 1;
if($word_count == 1) {
return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
}
$space = $to_len - $strlen + $space_count;
$per_space = $space/$space_count;
if(is_int($per_space)) {
return implode($words, str_pad('', $per_space, ' '));
}
$new_str = '';
$spacing = floor($per_space);
$new_str .= $words[0] . str_pad('', $spacing);
foreach($words as $x => $word) {
if($x == $word_count - 1 || $x == 0) continue;
if($x < $word_count - 1) {
$diff = $to_len - strlen($new_str) - (strlen(implode('', array_slice($words, $x))));
$new_str .= $word . str_pad('', floor($diff/($space_count - $x)), ' ');
}
}
$new_str .= $words[$x];
return $new_str;
}
// Kristian Antonsen
function justify4($str_in, $desired_length)
{
foreach ($str_in as &$line) {
$words = explode(' ', $line);
$word_count = count($words) - 1;
$spaces_to_fill = $desired_length - strlen($line) + $word_count;
if (count($words) == 1) {
$line = str_repeat('_', ceil($spaces_to_fill/2)) . $line
. str_repeat('_', floor($spaces_to_fill/2));
continue;
}
$next_space = floor($spaces_to_fill/$word_count);
$leftover_space = $spaces_to_fill % $word_count;
$line = array_shift($words);
foreach($words as $word) {
$extra_space = ($leftover_space) ? ceil($leftover_space / $word_count) : 0;
$leftover_space -= $extra_space;
$line .= str_repeat('_', $next_space + $extra_space) . $word;
}
}
return $str_in;
}
// user381521
function justify5 ($str, $len)
{
// split by whitespace, remove empty strings
$words = array_diff (preg_split ('/\s+/', $str), array (""));
// just space if no words
if (count ($words) == 0)
return str_repeat (" ", $len);
// add empty strings if only one element
if (count ($words) == 1)
$words = array ("", $words[0], "");
// get number of words and spaces
$wordcount = count ($words);
$numspaces = $wordcount - 1;
// get number of non-space characters
$numchars = array_sum (array_map ("strlen", $words));
// get number of characters remaining for space
$remaining = $len - $numchars;
// return if too little spaces remaining
if ($remaining <= $numspaces)
return substr (implode (" ", $words), 0, $len);
// get number of spaces per space
$spaces_per_space = $remaining / $numspaces;
$spaces_leftover = $remaining % $numspaces;
// make array for spaces, spread out leftover spaces
$spaces = array_fill (0, $numspaces, $spaces_per_space);
while ($spaces_leftover--)
$spaces[$numspaces - $spaces_leftover - 1]++;
$spaces[] = 0; // make count ($words) == count ($spaces)
// join it all together
$result = array ();
foreach ($words as $k => $v)
array_push ($result, $v, str_repeat (" ", $spaces[$k]));
return implode ($result);
}
// ppsreejith
function justify6($str, $to_len) {
$str = trim($str);
$strlen = strlen($str);
if($str == '') return '';
if($strlen >= $to_len) {
return substr($str, 0, $to_len);
}
$words = explode(' ', $str);
$word_count = count($words);
$space_count = $word_count - 1;
if($word_count == 1) {
return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
}
$space = $to_len - $strlen + $space_count;
$per_space = floor($space/$space_count);
$spaces = str_pad('', $per_space, ' ');
$curr_word = implode($words, $spaces);
while(strlen($curr_word) < $to_len){
$curr_word = substr($curr_word,0,preg_match("[! ][".$spaces."][! ]",$curr_word)." ".preg_match("[! ][".$spaces."][! ]",$curr_word));
}
return $curr_word;
}
// vlzvl
function justify7($str_in, $desired_length)
{
$str_in = preg_replace("!\s+!"," ",$str_in); // get rid of multiple spaces
$words = explode(" ",$str_in); // break words
$num_words = sizeof($words); // num words
if ($num_words==1) {
return str_pad($str_in,$desired_length,"_",STR_PAD_BOTH);
}
else {
$num_chars = 0; $lenwords = array();
for($x=0;$x<$num_words;$x++) { $num_chars += $lenwords[$x] = strlen($words[$x]); }
$each_div = round(($desired_length - $num_chars) / ($num_words-1));
for($x=0,$sum=0;$x<$num_words;$x++) { $sum += ($lenwords[$x] + ($x<$num_words-1 ? $each_div : 0)); }
$space_to_addcut = ($desired_length - $sum);
for($x=0;$x<$num_words-1;$x++) {
$words[$x] .= str_repeat("_",$each_div+($each_div>1? ($space_to_addcut<0?-1:($space_to_addcut>0?1:0)) :0));
if ($each_div>1) { $space_to_addcut += ($space_to_addcut<0 ? 1 : ($space_to_addcut>0?-1:0) ); }
}
return substr(implode($words),0,$desired_length);
}
}
// Alexander
function justify8($str, $length) {
$words = explode(' ', $str);
if(count($words)==1) $words = array("", $str, "");
$spaces = $length - array_sum(array_map("strlen", $words));
$add = (int)($spaces / (count($words) - 1));
$left = $spaces % (count($words) - 1);
$spaced = implode(str_repeat("_", $add + 1), array_slice($words, 0, $left + 1));
$spaced .= str_repeat("_", max(1, $add));
$spaced .= implode(str_repeat("_", max(1, $add)), array_slice($words, $left + 1));
return substr($spaced, 0, $length);
}
// ohaal
function justify9($s,$m){$s=trim($s);$l=strlen($s);if($l>=$m){$s=explode("\n",wordwrap($s,$m));$s=$s[0];$l=strlen($s);}$c=substr_count($s,' ');if($c===0)return str_pad($s,$m,' ',STR_PAD_BOTH);$a=($m-$l+$c)/$c;$h=floor($a);$i=($a-$h)*$c;$w=explode(' ',$s,$i+1);$w[$i]=str_replace(' ',str_repeat(' ',$h),$w[$i]);return implode(str_repeat(' ',ceil($a)),$w);}
// PhpMyCoder
class Justifier {
private $text;
public function __construct($text) {
if(!is_string($text) && !is_array($text)) {
throw new InvalidArgumentException('Expected a string or an array of strings, instead received type: ' . gettype($text));
}
if(is_array($text)) {
// String arrays must be converted to JustifierLine arrays
$this->text = array_map(function($line) {
return JustifierLine::fromText($line);
}, $text);
} else {
// Single line of text input
$this->text = $text;
}
}
public function format($width = NULL) {
// Strings have to be broken into an array and then jusitifed
if(is_string($this->text)) {
if($width == null) {
throw new InvalidArgumentException('A width must be provided for separation when an un-split string is provided');
}
if($width <= 0) {
throw new InvalidArgumentException('Expected a positive, non-zero width, instead received width of ' . $width);
}
// Break up a JustifierLine of all text until each piece is smaller or equal to $width
$lines = array(JustifierLine::fromText($this->text));
$count = 0;
$newLine = $lines[0]->breakAtColumn($width);
while($newLine !== null) {
$lines[] = $newLine;
$newLine = $lines[++$count]->breakAtColumn($width);
}
} else {
$lines = $this->text;
// Allow for fluid width (uses longest line with single space)
if($width == NULL) {
$width = -1;
foreach($lines as $line) {
// Width of line = Sum of the lengths of the words and the spaces (number of words - 1)
$newWidth = $line->calculateWordsLength() + $line->countWords() - 1;
if($newWidth > $width) { // Looking for the longest line
$width = $newWidth;
}
}
}
}
// Justify each element of array
//$output = array_map(function($line) use ($width) {
// return $this->justify($line, $width);
//}, $lines);
$output = array();
foreach($lines as $line) {
$output[] = $this->justify($line, $width);
}
// If a single-line is passed in, a single line is returned
if(count($output)) {
return $output[0];
}
return $output;
}
private function justify(JustifierLine $line, $width) {
// Retrieve already calculated line information
$words = $line->extractWords();
$spaces = $line->countWords() - 1;
$wordLens = $line->findWordLengths();
$wordsLen = $line->calculateWordsLength();
$minWidth = $wordsLen + $spaces;
$output = '';
if($minWidth > $width) {
throw new LengthException('A minimum width of ' . $minWidth . ' was required, but a width of ' . $width . ' was given instead');
}
// No spaces means only one word (center align)
if($spaces == 0) {
return str_pad($words[0], $width, ' ', STR_PAD_BOTH);
}
for(;$spaces > 0; $spaces--) {
// Add next word to output and subtract its length from counters
$output .= array_shift($words);
$length = array_shift($wordLens);
$wordsLen -= $length;
$width -= $length;
if($spaces == 1) { // Last Iteration
return $output . str_repeat(' ', $width - $wordsLen) . $words[0];
}
// Magic padding is really just simple math
$padding = floor(($width - $wordsLen) / $spaces);
$output .= str_repeat(' ', $padding);
$width -= $padding;
}
}
}
class JustifierLine {
private $words;
private $numWords;
private $wordLengths;
private $wordsLength;
public static function fromText($text) {
// Split words into an array
preg_match_all('/[^ ]+/', $text, $matches, PREG_PATTERN_ORDER);
$words = $matches[0];
// Count words
$numWords = count($words);
// Find the length of each word
$wordLengths = array_map('strlen', $words);
//And Finally, calculate the total length of all words
$wordsLength = array_reduce($wordLengths, function($result, $length) {
return $result + $length;
}, 0);
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
private function __construct($words, $numWords, $wordLengths, $wordsLength) {
$this->words = $words;
$this->numWords = $numWords;
$this->wordLengths = $wordLengths;
$this->wordsLength = $wordsLength;
}
public function extractWords() { return $this->words; }
public function countWords() { return $this->numWords; }
public function findWordLengths() { return $this->wordLengths; }
public function calculateWordsLength() { return $this->wordsLength; }
public function breakAtColumn($column) {
// Avoid extraneous processing if we can determine no breaking can be done
if($column >= ($this->wordsLength + $this->numWords - 1)) {
return null;
}
$width = 0;
$wordsLength = 0;
for($i = 0; $i < $this->numWords; $i++) {
// Add width of next word
$width += $this->wordLengths[$i];
// If the line is overflowing past required $width
if($width > $column) {
// Remove overflow at end & create a new object with the overflow
$words = array_splice($this->words, $i);
$numWords = $this->numWords - $i;
$this->numWords = $i;
$wordLengths = array_splice($this->wordLengths, $i);
$tempWordsLength = $wordsLength;
$wordsLength = $this->wordsLength - $wordsLength;
$this->wordsLength = $tempWordsLength;
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
$width++; // Assuming smallest spacing to fit
// We also have to keep track of the total $wordsLength
$wordsLength += $this->wordLengths[$i];
}
return null;
}
}
This is my solution. No pesky regular expressions :)
function justify($str, $length) {
$words = explode(' ', $str);
if(count($words)==1) $words = array("", $str, "");
$spaces = $length - array_sum(array_map("strlen", $words));
$add = (int)($spaces / (count($words) - 1));
$left = $spaces % (count($words) - 1);
$spaced = implode(str_repeat("_", $add + 1), array_slice($words, 0, $left + 1));
$spaced .= str_repeat("_", max(1, $add));
$spaced .= implode(str_repeat("_", max(1, $add)), array_slice($words, $left + 1));
return substr($spaced, 0, $length);
}
This is powered by PHP array functions.
Here is the working example.
Just so no one thinks I'm trying to have them do my homework for me, this is my (working, I think) solution.
I'm not sure I could have possibly been expected to write this much code on a whiteboard on demand, however, so I'm mostly curious to see how others would tackle it without looking at my code. (I made it to around the foreach in the interview before they called 'time' on me, so to speak)
function justify($str, $to_len) {
$str = trim($str);
$strlen = strlen($str);
if($str == '') return '';
if($strlen >= $to_len) {
return substr($str, 0, $to_len);
}
$words = explode(' ', $str);
$word_count = count($words);
$space_count = $word_count - 1;
if($word_count == 1) {
return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
}
$space = $to_len - $strlen + $space_count;
$per_space = $space/$space_count;
if(is_int($per_space)) {
return implode($words, str_pad('', $per_space, ' '));
}
$new_str = '';
$spacing = floor($per_space);
$new_str .= $words[0] . str_pad('', $spacing);
foreach($words as $x => $word) {
if($x == $word_count - 1 || $x == 0) continue;
if($x < $word_count - 1) {
$diff = $to_len - strlen($new_str) - (strlen(implode('', array_slice($words, $x))));
$new_str .= $word . str_pad('', floor($diff/($space_count - $x)), ' ');
}
}
$new_str .= $words[$x];
return $new_str;
}
$tests = array(' hello world there ok then ', 'hello', 'ok then', 'this string is almost certainly longer than 48 I think', 'two words', 'three ok words', '1 2 3 4 5 6 7 8 9');
foreach($tests as $word) {
print $word . ' = ' . str_replace(' ', '_', justify($word, 48)) . '<br>';
}
I miss my list comprehensions in Python ...
<?php
function justify ($str, $len)
{
// split by whitespace, remove empty strings
$words = array_diff (preg_split ('/\s+/', $str), array (""));
// just space if no words
if (count ($words) == 0)
return str_repeat (" ", $len);
// add empty strings if only one element
if (count ($words) == 1)
$words = array ("", $words[0], "");
// get number of words and spaces
$wordcount = count ($words);
$numspaces = $wordcount - 1;
// get number of non-space characters
$numchars = array_sum (array_map ("strlen", $words));
// get number of characters remaining for space
$remaining = $len - $numchars;
// return if too little spaces remaining
if ($remaining <= $numspaces)
return substr (implode (" ", $words), 0, $len);
// get number of spaces per space
$spaces_per_space = $remaining / $numspaces;
$spaces_leftover = $remaining % $numspaces;
// make array for spaces, spread out leftover spaces
$spaces = array_fill (0, $numspaces, $spaces_per_space);
while ($spaces_leftover--)
$spaces[$numspaces - $spaces_leftover - 1]++;
$spaces[] = 0; // make count ($words) == count ($spaces)
// join it all together
$result = array ();
foreach ($words as $k => $v)
array_push ($result, $v, str_repeat (" ", $spaces[$k]));
return implode ($result);
}
?>
Here's my attempt.
function justify($str_in, $desired_length)
{
foreach ($str_in as &$line) {
$words = explode(' ', $line);
$word_count = count($words) - 1;
$spaces_to_fill = $desired_length - strlen($line) + $word_count;
if (count($words) == 1) {
$line = str_repeat('_', ceil($spaces_to_fill/2)) . $line
. str_repeat('_', floor($spaces_to_fill/2));
continue;
}
$next_space = floor($spaces_to_fill/$word_count);
$leftover_space = $spaces_to_fill % $word_count;
$line = array_shift($words);
foreach($words as $word) {
$extra_space = ($leftover_space) ? ceil($leftover_space / $word_count) : 0;
$leftover_space -= $extra_space;
$line .= str_repeat('_', $next_space + $extra_space) . $word;
}
}
return $str_in;
}
I've tried to keep it relatively concise, which has impacted the readability. But here's how it works:
For each entry, we split up the words into an array $words. Because we might want spaces before and after the word, we also add an empty string to the beginning and ending of the array.
We calculate the left-over amount of spaces $leftover_space (that is, the spaces we need to insert somewhere), and divide it by number of words $word_count, so we know the average of how many spaces to put in between each word.
Whenever we add a word, we also add a few spaces $extra_space, depending on how many are left. After that, we remove the amount added from the $leftover_space.
Sample output
$data = justify($data, 48);
print_r($data);
Array
(
[0] => 123456789012345678901234567890123456789012345678
[1] => hello_______world_______there_______ok______then
[2] => ______________________hello_____________________
[3] => ok__________________________________________then
[4] => this__string__is_almost_certainly_longer_than_48
[5] => two________________________________________words
[6] => three__________________ok__________________words
[7] => 1_____2_____3_____4_____5_____6_____7_____8____9
)
I think this is fully working: (the "_" is just keeping the space visible)
function justify($str_in, $desired_length)
{
$str_in = preg_replace("!\s+!"," ",$str_in); // get rid of multiple spaces
$words = explode(" ",$str_in); // break words
$num_words = sizeof($words); // num words
if ($num_words==1) {
return str_pad($str_in,$desired_length,"_",STR_PAD_BOTH);
}
else {
$num_chars = 0; $lenwords = array();
for($x=0;$x<$num_words;$x++) { $num_chars += $lenwords[$x] = strlen($words[$x]); }
$each_div = round(($desired_length - $num_chars) / ($num_words-1));
for($x=0,$sum=0;$x<$num_words;$x++) { $sum += ($lenwords[$x] + ($x<$num_words-1 ? $each_div : 0)); }
$space_to_addcut = ($desired_length - $sum);
for($x=0;$x<$num_words-1;$x++) {
$words[$x] .= str_repeat("_",$each_div+($each_div>1? ($space_to_addcut<0?-1:($space_to_addcut>0?1:0)) :0));
if ($each_div>1) { $space_to_addcut += ($space_to_addcut<0 ? 1 : ($space_to_addcut>0?-1:0) ); }
}
return substr(implode($words),0,$desired_length);
}
}
EDITED:
Function now get rid of multiple spaces between words as well.
How it works (in short):
removes continuous spaces between words
count words so if one (the 'hello' example) just padding both and echo it.
..otherwise count the characters of the used words
calculate the global and partial space to add (the '_' in example).
calculate the extra space to add (string len < desired) OR remove (string len > desired) and apply it to padding.
final, reduce the final string to desired length.
TESTING:
$tests = array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'three ok words',
'1 2 3 4 5 6 7 8 9',
'Lorem Ipsum is simply dummy text'
);
$arr = array();
foreach($tests as $key=>$val) {
$arr[$key] = justify($val,50);
$arr[$key] .= " - (chars: ".strlen($arr[$key]).")";
}
echo "<pre>".print_r($arr,TRUE)."</pre>";
AND THE RESULT:
Array
(
[0] => hello________world_______there_______ok_______then - (chars: 50)
[1] => ______________________hello_______________________ - (chars: 50)
[2] => ok____________________________________________then - (chars: 50)
[3] => this_string_is_almost_certainly_longer_than_48_I_t - (chars: 50)
[4] => three___________________ok___________________words - (chars: 50)
[5] => 1______2_____3_____4_____5_____6_____7_____8_____9 - (chars: 50)
[6] => Lorem____Ipsum____is_____simply_____dummy_____text - (chars: 50)
)
THAT WAS TOUGH :)
EDITED 2:
Function is now about 20% faster, because that benchmark touched me :)
The (Semi-Long) Solution
It's taken me a while to perfect (probably much, much longer than an interviewer would have allowed for), but I've come up with an elegant, 162 line OOP solution to this problem. I included functionality to allow for the justifying of a single string, array of strings (already separated into lines) or a long string that needs to be broken up into lines of a maximum width first. Demos follow the code block.
Important Note: This class will only work in PHP 5.4. I realized this when running a version on my own server PHP (5.3.6) to get profiling stats with XDebug. PHP 5.3 complains about my use of $this in the anonymous function. A quick check of the docs on anonymous functions reveals that $this could not be used in the context of an anonymous function until 5.4. If anyone can find a clean workaround to this, please drop it in the comments. Added support for PHP 5.3!
<?php
class Justifier {
private $text;
public function __construct($text) {
if(!is_string($text) && !is_array($text)) {
throw new InvalidArgumentException('Expected a string or an array of strings, instead received type: ' . gettype($text));
}
if(is_array($text)) {
// String arrays must be converted to JustifierLine arrays
$this->text = array_map(function($line) {
return JustifierLine::fromText($line);
}, $text);
} else {
// Single line of text input
$this->text = $text;
}
}
public function format($width = null) {
// Strings have to be broken into an array and then jusitifed
if(is_string($this->text)) {
if($width == null) {
throw new InvalidArgumentException('A width must be provided for separation when an un-split string is provided');
}
if($width <= 0) {
throw new InvalidArgumentException('Expected a positive, non-zero width, instead received width of ' . $width);
}
// Break up a JustifierLine of all text until each piece is smaller or equal to $width
$lines = array(JustifierLine::fromText($this->text));
$count = 0;
$newLine = $lines[0]->breakAtColumn($width);
while($newLine !== null) {
$lines[] = $newLine;
$newLine = $lines[++$count]->breakAtColumn($width);
}
} else {
$lines = $this->text;
// Allow for fluid width (uses longest line with single space)
if($width == NULL) {
$width = -1;
foreach($lines as $line) {
// Width of line = Sum of the lengths of the words and the spaces (number of words - 1)
$newWidth = $line->calculateWordsLength() + $line->countWords() - 1;
if($newWidth > $width) { // Looking for the longest line
$width = $newWidth;
}
}
}
}
// Justify each element of array (PHP 5.4 ONLY)
//$output = array_map(function($line) use ($width) {
// return $this->justify($line, $width);
//}, $lines);
// Support for PHP 5.3
$output = array();
foreach($lines as $line) {
$output = $this->justify($line, $width);
}
// If a single-line is passed in, a single line is returned
if(count($output)) {
return $output[0];
}
return $output;
}
private function justify(JustifierLine $line, $width) {
// Retrieve already calculated line information
$words = $line->extractWords();
$spaces = $line->countWords() - 1;
$wordLens = $line->findWordLengths();
$wordsLen = $line->calculateWordsLength();
$minWidth = $wordsLen + $spaces;
$output = '';
if($minWidth > $width) {
throw new LengthException('A minimum width of ' . $minWidth . ' was required, but a width of ' . $width . ' was given instead');
}
// No spaces means only one word (center align)
if($spaces == 0) {
return str_pad($words[0], $width, ' ', STR_PAD_BOTH);
}
for(;$spaces > 0; $spaces--) {
// Add next word to output and subtract its length from counters
$output .= array_shift($words);
$length = array_shift($wordLens);
$wordsLen -= $length;
$width -= $length;
if($spaces == 1) { // Last Iteration
return $output . str_repeat(' ', $width - $wordsLen) . $words[0];
}
// Magic padding is really just simple math
$padding = floor(($width - $wordsLen) / $spaces);
$output .= str_repeat(' ', $padding);
$width -= $padding;
}
}
}
class JustifierLine {
private $words;
private $numWords;
private $wordLengths;
private $wordsLength;
public static function fromText($text) {
// Split words into an array
preg_match_all('/[^ ]+/', $text, $matches, PREG_PATTERN_ORDER);
$words = $matches[0];
// Count words
$numWords = count($words);
// Find the length of each word
$wordLengths = array_map('strlen', $words);
//And Finally, calculate the total length of all words
$wordsLength = array_reduce($wordLengths, function($result, $length) {
return $result + $length;
}, 0);
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
private function __construct($words, $numWords, $wordLengths, $wordsLength) {
$this->words = $words;
$this->numWords = $numWords;
$this->wordLengths = $wordLengths;
$this->wordsLength = $wordsLength;
}
public function extractWords() { return $this->words; }
public function countWords() { return $this->numWords; }
public function findWordLengths() { return $this->wordLengths; }
public function calculateWordsLength() { return $this->wordsLength; }
public function breakAtColumn($column) {
// Avoid extraneous processing if we can determine no breaking can be done
if($column >= ($this->wordsLength + $this->numWords - 1)) {
return null;
}
$width = 0;
$wordsLength = 0;
for($i = 0; $i < $this->numWords; $i++) {
// Add width of next word
$width += $this->wordLengths[$i];
// If the line is overflowing past required $width
if($width > $column) {
// Remove overflow at end & create a new object with the overflow
$words = array_splice($this->words, $i);
$numWords = $this->numWords - $i;
$this->numWords = $i;
$wordLengths = array_splice($this->wordLengths, $i);
$tempWordsLength = $wordsLength;
$wordsLength = $this->wordsLength - $wordsLength;
$this->wordsLength = $tempWordsLength;
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
$width++; // Assuming smallest spacing to fit
// We also have to keep track of the total $wordsLength
$wordsLength += $this->wordLengths[$i];
}
return null;
}
}
Demos
Original Question (Justifying Lines of Text to width = 48)
You can pass in an array of many strings or just one string to Justifier. Calling Justifier::format($desired_length) will always return an array of justified lines *if an array of strings or string that required segmentation was passed to the constructor. Otherwise, a string will be returned. (Codepad Demo)
$jus = new Justifier(array(
'hello world there ok then',
'hello',
'ok then',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
));
print_r( $jus->format(48) );
Output
Array
(
[0] => hello world there ok then
[1] => hello
[2] => ok then
[3] => two words
[4] => three ok words
[5] => 1 2 3 4 5 6 7 8 9
)
You may notice I omitted one of the OP's test lines. This is because it was 54 characters and would exceed the $desired_length passed to Justifier::format(). The function will throw an IllegalArgumentException for widths that aren't positive, non-zero numbers that exceed or equal to the minimum width. The minimum width is calculated by finding the longest line (of all the lines passed to the constructor) with single spacing.
Fluid Width Justifying With An Array of Strings
If you omit the width, Justifier will use the width of the longest line (of those passed to the constructor) when single spaced. This is the same calculation as finding the minimum width in the previous demo. (Codepad Demo)
$jus = new Justifier(array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
));
print_r( $jus->format() );
Output
Array
(
[0] => hello world there ok then
[1] => hello
[2] => ok then
[3] => this string is almost certainly longer than 48 I think
[4] => two words
[5] => three ok words
[6] => 1 2 3 4 5 6 7 8 9
)
Justifying a Single String of Text (width = 48)
I've also included a feature in the class which allows you to pass a single, non-broken string to the constructor. This string can be of any length. When you call Justifier::format($desired_length) the string is broken into lines such that each line is filled with as much text as possible and justified before starting a new line. The class will complain with an InvalidArgumentException because you must provide a width into which it can break the string. If anyone can think of a sensible default or way to programmatically determine a default for a string, I'm completely open to suggestions. (Codepad Demo)
$jus = new Justifier(
'hello world there ok then hello ok then this string is almost certainly longer than 48 I think two words three ok words 1 2 3 4 5 6 7 8 9'
);
print_r( $jus->format(48) );
Output
Array
(
[0] => hello world there ok then hello ok then this
[1] => string is almost certainly longer than 48 I
[2] => think two words three ok words 1 2 3 4 5 6 7 8 9
)
Here's my solution. For what it's worth, it took me about 20 minutes to make both the justify function and acceptance tests for it; 5 of those minutes debugging the justify function. Also, I used notpad++ instead of a more robust IDE to try to simulate to some extent the interview environment.
I think this might be too large of a problem for a whiteboard interview question, unless the interviewer lets you write in pseudocode and is more interested in your thought process that what you are putting on the board.
<?php
function justify($str_in, $desired_length) {
$words = preg_split("/ +/",$str_in);
// handle special cases
if(count($words)==0) { return str_repeat(" ",$desired_length); }
// turn single word case into a normal case
if(count($words)==1) { $words = array("",$words[0],""); }
$numwords = count($words);
$wordlength = strlen(join("",$words));
// handles cases where words are longer than the desired_length
if($wordlength>($desired_length-$numwords)) {
return substr(join(" ",$words),0,$desired_length);
}
$minspace = floor(($desired_length-$wordlength)/($numwords-1));
$extraspace = $desired_length - $wordlength - ($minspace * ($numwords-1));
$result = $words[0];
for($i=1;$i<$numwords;$i++) {
if($extraspace>0) {
$result.=" ";
$extraspace--;
}
$result.=str_repeat(" ",$minspace);
$result.=$words[$i];
}
return $result;
}
function acceptance_justify($orig_str, $just_str, $expected_length) {
// should be the correct length
if(strlen($just_str)!=$expected_length) { return false; }
// should contain most of the words in the original string, in the right order
if(preg_replace("/ +/","",substr($orig_str,0,$expected_length)) != preg_replace("/ +/","",substr($just_str,0,$expected_length))) { return false; }
//spacing should be uniform (+/- 1 space)
if(!preg_match("/( +)/",$just_str,$spaces)) { return false; }
$space_length=strlen($spaces[0]);
$smin=$space_length;
$smax=$space_length;
for($i=1;$i<count(#spaces);$i++) {
$smin=min($smin,strlen($spaces));
$smax=max($smax,strlen($spaces));
}
if(($smax-$smin)>1) { return false; }
return true;
}
function run_test($str,$len) {
print "<pre>";
print "$str ==> \n";
$result = justify($str,$len);
print preg_replace("/ /",".",$result) . "\n";
print acceptance_justify($str,$result,$len)?"passed":"FAILED";
print "\n\n</pre>";
}
run_test("hello world there ok then",48);
run_test("hello",48);
run_test("this string is almost certainly longer than 48 I think",48);
run_test("two words",48);
run_test("three ok words",48);
run_test("1 2 3 4 5 6 7 8 9",48);
Here's a little bit different implementation just towards the end.
<?php
function justify($str, $to_len) {
$str = trim($str);
$strlen = strlen($str);
if($str == '') return '';
if($strlen >= $to_len) {
return substr($str, 0, $to_len);
}
$words = explode(' ', $str);
$word_count = count($words);
$space_count = $word_count - 1;
if($word_count == 1) {
return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
}
$space = $to_len - $strlen + $space_count;
$per_space = floor($space/$space_count);
$spaces = str_pad('', $per_space, ' ');
$curr_word = implode($words, $spaces);
while(strlen($curr_word) < $to_len){
$curr_word = substr($curr_word,0,preg_match("[! ][".$spaces."][! ]",$curr_word))." ".preg_match("[! ][".$spaces."][! ]",$curr_word));
}
return $curr_word;
?>
I'm not sure about the regexp, I just meant $spaces and not next space.
I have a small search engine doing its thing, and want to highlight the results. I thought I had it all worked out till a set of keywords I used today blew it out of the water.
The issue is that preg_replace() is looping through the replacements, and later replacements are replacing the text I inserted into previous ones. Confused? Here is my pseudo function:
public function highlightKeywords ($data, $keywords = array()) {
$find = array();
$replace = array();
$begin = "<span class=\"keywordHighlight\">";
$end = "</span>";
foreach ($keywords as $kw) {
$find[] = '/' . str_replace("/", "\/", $kw) . '/iu';
$replace[] = $begin . "\$0" . $end;
}
return preg_replace($find, $replace, $data);
}
OK, so it works when searching for "fred" and "dagg" but sadly, when searching for "class" and "lass" and "as" it strikes a real issue when highlighting "Joseph's Class Group"
Joseph's <span class="keywordHighlight">Cl</span><span <span c<span <span class="keywordHighlight">cl</span>ass="keywordHighlight">lass</span>="keywordHighlight">c<span <span class="keywordHighlight">cl</span>ass="keywordHighlight">lass</span></span>="keywordHighlight">ass</span> Group
How would I get the latter replacements to only work on the non-HTML components, but to also allow the tagging of the whole match? e.g. if I was searching for "cla" and "lass" I would want "class" to be highlighted in full as both the search terms are in it, even though they overlap, and the highlighting that was applied to the first match has "class" in it, but that shouldn't be highlighted.
Sigh.
I would rather use a PHP solution than a jQuery (or any client-side) one.
Note: I have tried to sort the keywords by length, doing the long ones first, but that means the cross-over searches do not highlight, meaning with "cla" and "lass" only part of the word "class" would highlight, and it still murdered the replacement tags :(
EDIT: I have messed about, starting with pencil & paper, and wild ramblings, and come up with some very unglamorous code to solve this issue. It's not great, so suggestions to trim/speed this up would still be greatly appreciated :)
public function highlightKeywords ($data, $keywords = array()) {
$find = array();
$replace = array();
$begin = "<span class=\"keywordHighlight\">";
$end = "</span>";
$hits = array();
foreach ($keywords as $kw) {
$offset = 0;
while (($pos = stripos($data, $kw, $offset)) !== false) {
$hits[] = array($pos, $pos + strlen($kw));
$offset = $pos + 1;
}
}
if ($hits) {
usort($hits, function($a, $b) {
if ($a[0] == $b[0]) {
return 0;
}
return ($a[0] < $b[0]) ? -1 : 1;
});
$thisthat = array(0 => $begin, 1 => $end);
for ($i = 0; $i < count($hits); $i++) {
foreach ($thisthat as $key => $val) {
$pos = $hits[$i][$key];
$data = substr($data, 0, $pos) . $val . substr($data, $pos);
for ($j = 0; $j < count($hits); $j++) {
if ($hits[$j][0] >= $pos) {
$hits[$j][0] += strlen($val);
}
if ($hits[$j][1] >= $pos) {
$hits[$j][1] += strlen($val);
}
}
}
}
}
return $data;
}
I've used the following to address this problem:
<?php
$protected_matches = array();
function protect(&$matches) {
global $protected_matches;
return "\0" . array_push($protected_matches, $matches[0]) . "\0";
}
function restore(&$matches) {
global $protected_matches;
return '<span class="keywordHighlight">' .
$protected_matches[$matches[1] - 1] . '</span>';
}
preg_replace_callback('/\x0(\d+)\x0/', 'restore',
preg_replace_callback($patterns, 'protect', $target_string));
The first preg_replace_callback pulls out all matches and replaces them with nul-byte-wrapped placeholders; the second pass replaces them with the span tags.
Edit: Forgot to mention that $patterns was sorted by string length, longest to shortest.
Edit; another solution
<?php
function highlightKeywords($data, $keywords = array(),
$prefix = '<span class="hilite">', $suffix = '</span>') {
$datacopy = strtolower($data);
$keywords = array_map('strtolower', $keywords);
$start = array();
$end = array();
foreach ($keywords as $keyword) {
$offset = 0;
$length = strlen($keyword);
while (($pos = strpos($datacopy, $keyword, $offset)) !== false) {
$start[] = $pos;
$end[] = $offset = $pos + $length;
}
}
if (!count($start)) return $data;
sort($start);
sort($end);
// Merge and sort start/end using negative values to identify endpoints
$zipper = array();
$i = 0;
$n = count($end);
while ($i < $n)
$zipper[] = count($start) && $start[0] <= $end[$i]
? array_shift($start)
: -$end[$i++];
// EXAMPLE:
// [ 9, 10, -14, -14, 81, 82, 86, -86, -86, -90, 99, -103 ]
// take 9, discard 10, take -14, take -14, create pair,
// take 81, discard 82, discard 86, take -86, take -86, take -90, create pair
// take 99, take -103, create pair
// result: [9,14], [81,90], [99,103]
// Generate non-overlapping start/end pairs
$a = array_shift($zipper);
$z = $x = null;
while ($x = array_shift($zipper)) {
if ($x < 0)
$z = $x;
else if ($z) {
$spans[] = array($a, -$z);
$a = $x;
$z = null;
}
}
$spans[] = array($a, -$z);
// Insert the prefix/suffix in the start/end locations
$n = count($spans);
while ($n--)
$data = substr($data, 0, $spans[$n][0])
. $prefix
. substr($data, $spans[$n][0], $spans[$n][1] - $spans[$n][0])
. $suffix
. substr($data, $spans[$n][1]);
return $data;
}
I had to revisit this subject myself today and wrote a better version of the above. I'll include it here. It's the same idea only easier to read and should perform better since it uses arrays instead of concatenation.
<?php
function highlight_range_sort($a, $b) {
$A = abs($a);
$B = abs($b);
if ($A == $B)
return $a < $b ? 1 : 0;
else
return $A < $B ? -1 : 1;
}
function highlightKeywords($data, $keywords = array(),
$prefix = '<span class="highlight">', $suffix = '</span>') {
$datacopy = strtolower($data);
$keywords = array_map('strtolower', $keywords);
// this will contain offset ranges to be highlighted
// positive offset indicates start
// negative offset indicates end
$ranges = array();
// find start/end offsets for each keyword
foreach ($keywords as $keyword) {
$offset = 0;
$length = strlen($keyword);
while (($pos = strpos($datacopy, $keyword, $offset)) !== false) {
$ranges[] = $pos;
$ranges[] = -($offset = $pos + $length);
}
}
if (!count($ranges))
return $data;
// sort offsets by abs(), positive
usort($ranges, 'highlight_range_sort');
// combine overlapping ranges by keeping lesser
// positive and negative numbers
$i = 0;
while ($i < count($ranges) - 1) {
if ($ranges[$i] < 0) {
if ($ranges[$i + 1] < 0)
array_splice($ranges, $i, 1);
else
$i++;
} else if ($ranges[$i + 1] < 0)
$i++;
else
array_splice($ranges, $i + 1, 1);
}
// create substrings
$ranges[] = strlen($data);
$substrings = array(substr($data, 0, $ranges[0]));
for ($i = 0, $n = count($ranges) - 1; $i < $n; $i += 2) {
// prefix + highlighted_text + suffix + regular_text
$substrings[] = $prefix;
$substrings[] = substr($data, $ranges[$i], -$ranges[$i + 1] - $ranges[$i]);
$substrings[] = $suffix;
$substrings[] = substr($data, -$ranges[$i + 1], $ranges[$i + 2] + $ranges[$i + 1]);
}
// join and return substrings
return implode('', $substrings);
}
// Example usage:
echo highlightKeywords("This is a test.\n", array("is"), '(', ')');
echo highlightKeywords("Classes are as hard as they say.\n", array("as", "class"), '(', ')');
// Output:
// Th(is) (is) a test.
// (Class)es are (as) hard (as) they say.
OP - something that's not clear in the question is whether $data can contain HTML from the get-go. Can you clarify this?
If $data can contain HTML itself, you are getting into the realms attempting to parse a non-regular language with a regular language parser, and that's not going to work out well.
In such a case, I would suggest loading the $data HTML into a PHP DOMDocument, getting hold of all of the textNodes and running one of the other perfectly good answers on the contents of each text block in turn.