Justify string algorithm [closed]

Closed 10 years ago.
Just tanked a job interview where I was asked to implement a function with this signature:
function justify($str_in, $desired_length)
It needs to mimic what CSS's text-align: justify would do. Here are some examples (desired_length = 48):
hello world there ok then = hello......world......there.......ok.......then
hello = .....................hello.....................
ok then = ok.........................................then
this string is almost certainly longer than 48 I think = this.string.is.almost.certainly.longer.than.48.
two words = two.......................................words
three ok words = three.................ok..................words
1 2 3 4 5 6 7 8 9 = 1....2....3.....4.....5.....6.....7.....8.....9
(I replaced the spaces with periods to illustrate)
The length of spaces between words may never differ by more than one.
I have written a PHP solution, but I am more interested in what algorithms people can come up with to solve the problem. It was my first whiteboard question at a job interview ever, and I'm afraid a combination of factors made me take way longer than I should have.
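The rule that gaps may never differ by more than one comes down to a quotient and a remainder. The sketch below is only an illustration and is not taken from any of the answers; the helper name gap_widths() is made up for this example. It assumes the extra characters go to the leftmost gaps, which is an arbitrary choice.

```php
<?php
// Hypothetical helper (not from the original post): split $total fill
// characters across $gaps gaps so that no two widths differ by more than one.
function gap_widths($total, $gaps) {
    $base  = (int) floor($total / $gaps); // width that every gap gets
    $extra = $total % $gaps;              // number of gaps that get one more
    $widths = array();
    for ($i = 0; $i < $gaps; $i++) {
        $widths[] = $base + ($i < $extra ? 1 : 0);
    }
    return $widths;
}

// 'hello world there ok then' has 21 letters, so 48 - 21 = 27 fill
// characters have to be spread over 4 gaps.
print_r(gap_widths(27, 4));
```

Which gaps receive the extra character is not fixed by the requirement, so different solutions can produce different but equally valid lines.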

Here's what I came up with. I added the optional $char parameter so you can see what it's outputting; of course, you can pull it inside the function so the prototype matches the requirement.
function justify($str_in, $desired_length, $char = '_') {
    // Some common vars and simple error checking / sanitation
    $return = '';
    $str_in = trim( $str_in);
    $desired_length = intval( $desired_length);
    // If we've got invalid input, we're done
    if( $desired_length <= 0)
        return $str_in;
    // If the input string is greater than the length, we need to truncate it WITHOUT splitting words
    if( strlen( $str_in) > $desired_length) {
        $str = wordwrap($str_in, $desired_length);
        $str = explode("\n", $str);
        $str_in = $str[0];
    }
    $words = explode( ' ', $str_in);
    $num_words = count( $words);
    // If there's only one word, it's a simple edge case
    if( $num_words == 1) {
        $length = ($desired_length - strlen( $words[0])) / 2;
        $return .= str_repeat( $char, floor( $length)) . $words[0] . str_repeat( $char, ceil( $length));
    } else {
        $word_length = strlen( implode( '', $words));
        // Calculate the number of spaces to distribute over the words
        $num_words--; // We're going to eliminate the last word
        $spaces = floor( ($desired_length - $word_length) / $num_words);
        $remainder = $desired_length - $word_length - ($num_words * $spaces);
        $last = array_pop( $words);
        foreach( $words as $word) {
            // If we didn't get an even number of spaces to distribute, just tack it on to the front
            $spaces_to_add = $spaces;
            if( $remainder > 0) {
                $spaces_to_add++;
                $remainder--;
            }
            $return .= $word . str_repeat( $char, $spaces_to_add);
        }
        $return .= $last;
    }
    return $return;
}
And the test cases:
$inputs = array(
    'hello world there ok then',
    'hello',
    'ok then',
    'this string is almost certainly longer than 48 I think',
    'two words',
    'three ok words',
    '1 2 3 4 5 6 7 8 9'
);
foreach( $inputs as $x) {
    $ret = justify( $x, 48);
    echo 'Inp: ' . $x . " - strlen(" . strlen( $x) . ")\n";
    echo 'Out: ' . $ret . " - strlen(" . strlen( $ret) . ")\n\n";
}
And the output:
Inp: hello world there ok then - strlen(25)
Out: hello_______world_______there_______ok______then - strlen(48)
Inp: hello - strlen(5)
Out: _____________________hello______________________ - strlen(48)
Inp: ok then - strlen(7)
Out: ok__________________________________________then - strlen(48)
Inp: this string is almost certainly longer than 48 I think - strlen(54)
Out: this_string_is_almost_certainly_longer_than_48_I - strlen(48)
Inp: two words - strlen(9)
Out: two________________________________________words - strlen(48)
Inp: three ok words - strlen(14)
Out: three__________________ok__________________words - strlen(48)
Inp: 1 2 3 4 5 6 7 8 9 - strlen(17)
Out: 1_____2_____3_____4_____5_____6_____7_____8____9 - strlen(48)
And a demo!
Edit: Cleaned up the code, and it still works :).

Made it a personal challenge to not use any loops/recursion or regex with callbacks. I used a single explode() and a single implode() to achieve this. Great success!
The Code
function justify($str, $maxlen) {
    $str = trim($str);
    $strlen = strlen($str);
    // Too long: keep only the first wrapped line so that no word is split
    if ($strlen >= $maxlen) {
        $str = wordwrap($str, $maxlen);
        $str = explode("\n", $str);
        $str = $str[0];
        $strlen = strlen($str);
    }
    $space_count = substr_count($str, ' ');
    // Single word: center it
    if ($space_count === 0) {
        return str_pad($str, $maxlen, ' ', STR_PAD_BOTH);
    }
    $extra_spaces_needed = $maxlen - $strlen;
    $total_spaces = $extra_spaces_needed + $space_count;
    $space_string_avg_length = $total_spaces / $space_count;
    $short_string_multiplier = (int) floor($space_string_avg_length);
    $long_string_multiplier = (int) ceil($space_string_avg_length);
    $short_fill_string = str_repeat(' ', $short_string_multiplier);
    $long_fill_string = str_repeat(' ', $long_string_multiplier);
    // Number of long gaps. The integer modulo equals
    // ($space_string_avg_length - $short_string_multiplier) * $space_count
    // but avoids float rounding (32 / 3 would otherwise give 1.99... and truncate to 1).
    $limit = $total_spaces % $space_count;
    $words_split_by_long = explode(' ', $str, $limit + 1);
    $words_split_by_short = $words_split_by_long[$limit];
    $words_split_by_short = str_replace(' ', $short_fill_string, $words_split_by_short);
    $words_split_by_long[$limit] = $words_split_by_short;
    $result = implode($long_fill_string, $words_split_by_long);
    return $result;
}
Short (348 chars)
function j($s,$m){$s=trim($s);$l=strlen($s);if($l>=$m){$s=explode("\n",wordwrap($s,$m));$s=$s[0];$l=strlen($s);}$c=substr_count($s,' ');if($c===0)return str_pad($s,$m,' ',STR_PAD_BOTH);$a=($m-$l+$c)/$c;$h=floor($a);$i=($a-$h)*$c;$w=explode(' ',$s,$i+1);$w[$i]=str_replace(' ',str_repeat(' ',$h),$w[$i]);return implode(str_repeat(' ',ceil($a)),$w);}
Algorithm / Code explanation
1. Handle the two exceptions (string longer than max length or only one word).
2. Find the average space needed between each word ($space_string_avg_length).
3. Create a long and short fill string for use between the words, based on ceil() and floor() of the $space_string_avg_length, respectively.
4. Find out how many long fill strings we need ($limit); the text is therefore split into $limit+1 pieces.
5. Split the text based on how many long fill strings we need.
6. Replace spaces in the last part of the array, made by the split, with the short fill strings.
7. Join the split text back together with the long fill strings.
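To make the steps concrete, here is a sketch that traces the intermediate values for the first test string at a width of 48. It is a walk-through rather than a replacement for the function; the shortened variable names are chosen for this example, and the number of long gaps ($limit) is computed here with an integer modulo.

```php
<?php
$str    = 'hello world there ok then';
$maxlen = 48;

$space_count  = substr_count($str, ' ');               // 4 gaps
$total_spaces = $maxlen - strlen($str) + $space_count; // 48 - 25 + 4 = 27
$short = (int) floor($total_spaces / $space_count);    // 6
$long  = (int) ceil($total_spaces / $space_count);     // 7
$limit = $total_spaces % $space_count;                 // 3 long gaps

// Split off the first $limit gaps; the last piece keeps its single spaces.
$parts = explode(' ', $str, $limit + 1);
// $parts is now array('hello', 'world', 'there', 'ok then')

// Widen the gaps in the last piece to the short width,
// then join everything with the long fill string.
$parts[$limit] = str_replace(' ', str_repeat(' ', $short), $parts[$limit]);
$result = implode(str_repeat(' ', $long), $parts);

echo str_replace(' ', '_', $result), "\n";
```

Because explode() with a limit leaves the remainder of the string untouched in the last element, one str_replace() and one implode() are enough, which is why no loop is needed.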
Testing
$tests = array(
    'hello world there ok then',
    'hello',
    'ok then',
    'this string is almost certainly longer than 48 I think',
    'two words',
    'three ok words',
    '1 2 3 4 5 6 7 8 9'
);
foreach ($tests as $test) {
    $len_before = strlen($test);
    $processed = str_replace(' ', '_', justify($test, 48));
    $len_after = strlen($processed);
    echo "IN($len_before): $test\n";
    echo "OUT($len_after): $processed\n";
}
Results
IN(25): hello world there ok then
OUT(48): hello_______world_______there_______ok______then
IN(5): hello
OUT(48): _____________________hello______________________
IN(7): ok then
OUT(48): ok__________________________________________then
IN(54): this string is almost certainly longer than 48 I think
OUT(48): this_string_is_almost_certainly_longer_than_48_I
IN(9): two words
OUT(48): two________________________________________words
IN(14): three ok words
OUT(48): three__________________ok__________________words
IN(17): 1 2 3 4 5 6 7 8 9
OUT(48): 1_____2_____3_____4_____5_____6_____7_____8____9
See it run!

Here's my solution with no pesky loops
function justify( $str_in, $desired_length=48 ) {
    if ( strlen( $str_in ) > $desired_length ) {
        $str_in = current( explode( "\n", wordwrap( $str_in, $desired_length ) ) );
    }
    $string_length = strlen( $str_in );
    $spaces_count = substr_count( $str_in, ' ' );
    $needed_spaces_count = $desired_length - $string_length + $spaces_count;
    if ( $spaces_count === 0 ) {
        return str_pad( $str_in, $desired_length, ' ', STR_PAD_BOTH );
    }
    $spaces_per_space = ceil( $needed_spaces_count / $spaces_count );
    $spaced_string = preg_replace( '~\s+~', str_repeat( ' ', $spaces_per_space ), $str_in );
    return preg_replace_callback(
        sprintf( '~\s{%s}~', $spaces_per_space ),
        function ( $m ) use( $spaces_per_space ) {
            return str_repeat( ' ', $spaces_per_space-1 );
        },
        $spaced_string,
        strlen( $spaced_string ) - $desired_length
    );
}
Comments and output...
https://gist.github.com/2939068
1. Find out how many spaces there are.
2. Find out how many spaces are needed.
3. Replace the existing spaces with the number of spaces (evenly distributed) needed to meet or just exceed the desired line length.
4. Use preg_replace_callback to replace as many runs of \s{spaces_inserted} with \s{spaces_inserted-1} as necessary to meet the desired line length.
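The overshoot-then-shrink idea in these steps can be traced on one input. This is a sketch under assumptions, not the code from the gist: a plain preg_replace() with a limit stands in for preg_replace_callback(), and the variable names $per_gap and $overshoot are invented for this example.

```php
<?php
$str_in = 'hello world there ok then';
$desired_length = 48;

$spaces_count = substr_count($str_in, ' ');                  // 4
$needed = $desired_length - strlen($str_in) + $spaces_count; // 27
$per_gap = (int) ceil($needed / $spaces_count);              // 7

// Step 3: widen every gap to the rounded-up width; this may overshoot.
$spaced = preg_replace('~\s+~', str_repeat(' ', $per_gap), $str_in);
$overshoot = strlen($spaced) - $desired_length;              // 49 - 48 = 1

// Step 4: shrink the first $overshoot gaps by one character each.
// The zero case is handled explicitly instead of passing a limit of 0.
$result = $overshoot > 0
    ? preg_replace('~ {' . $per_gap . '}~', str_repeat(' ', $per_gap - 1), $spaced, $overshoot)
    : $spaced;

echo str_replace(' ', '_', $result), "\n";
echo strlen($result), "\n";
```

With this approach the narrower gaps end up at the start of the line, because the limited replacement works from left to right.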

I wanted to see which algorithm was the most efficient, so I ran some benchmarks: 100k iterations of all 7 test cases, on a single-core Ubuntu VM.
The results of @ppsreejith's and @Kristian Antonsen's code are omitted, because their code crashed when I tried to run it. @PhpMyCoder's code ran as long as I didn't do the formatting to 48 length after object construction, so that test result was incomplete. (Fixed)
Benchmark results
$ php justify.bench.php
Galen(justify1): 5.1464750766754
nickb(justify2): 3.8629620075226
Paolo Bergantino(justify3): 4.3705048561096
user381521(justify5): 8.5988481044769
vlzvl(justify7): 6.6795041561127
Alexander(justify8): 6.7060301303864
ohaal(justify9): 2.9896130561829
PhpMyCoder: 6.1514630317688 (Fixed!)
justify.bench.php
<?php
$tests = array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
);
$testers = array(
'Galen' => 'justify1',
'nickb' => 'justify2',
'Paolo Bergantino' => 'justify3',
// 'Kristian Antonsen' => 'justify4',
'user381521' => 'justify5',
// 'ppsreejith' => 'justify6',
'vlzvl' => 'justify7',
'Alexander' => 'justify8',
'ohaal' => 'justify9'
);
// ppsreejith and Kristian Antonsen's code crashed and burned when I tried to run it
// PhpMyCoder is a special case, but his code also crashed when doing $jus->format(48);
foreach ($testers as $tester => $func) {
$b=microtime(true);
for($i=0;$i<100000;$i++)
foreach ($tests as $test)
$func($test,48);
$a=microtime(true);
echo $tester.'('.$func.'): '.($a-$b)."\n";
}
echo "\n";
// Fixed!
$jus = new Justifier($tests);
$b=microtime(true);
for($i=0;$i<100000;$i++) {
$jus->format(54);
}
$a=microtime(true);
echo 'PhpMyCoder: '.($a-$b)." (Fixed!)\n";
// ALGORITHMS BELOW
// Galen
function justify1( $str_in, $desired_length=48 ) {
if ( strlen( $str_in ) > $desired_length ) {
$str_in = current( explode( "\n", wordwrap( $str_in, $desired_length ) ) );
}
$string_length = strlen( $str_in );
$spaces_count = substr_count( $str_in, ' ' );
$needed_spaces_count = $desired_length - $string_length + $spaces_count;
if ( $spaces_count === 0 ) {
return str_pad( $str_in, $desired_length, ' ', STR_PAD_BOTH );
}
$spaces_per_space = ceil( $needed_spaces_count / $spaces_count );
$spaced_string = preg_replace( '~\s+~', str_repeat( ' ', $spaces_per_space ), $str_in );
return preg_replace_callback(
sprintf( '~\s{%s}~', $spaces_per_space ),
function ( $m ) use( $spaces_per_space ) {
return str_repeat( ' ', $spaces_per_space-1 );
},
$spaced_string,
strlen( $spaced_string ) - $desired_length
);
}
// nickb
function justify2($str_in, $desired_length, $char = '_') {
// Some common vars and simple error checking / sanitation
$return = '';
$str_in = trim( $str_in);
$desired_length = intval( $desired_length);
// If we've got invalid input, we're done
if( $desired_length <= 0)
return $str_in;
// If the input string is greater than the length, we need to truncate it WITHOUT splitting words
if( strlen( $str_in) > $desired_length) {
$str = wordwrap($str_in, $desired_length);
$str = explode("\n", $str);
$str_in = $str[0];
}
$words = explode( ' ', $str_in);
$num_words = count( $words);
// If there's only one word, it's a simple edge case
if( $num_words == 1) {
$length = ($desired_length - strlen( $words[0])) / 2;
$return .= str_repeat( $char, floor( $length)) . $words[0] . str_repeat( $char, ceil( $length));
} else {
$word_length = strlen( implode( '', $words));
// Calculate the number of spaces to distribute over the words
$num_words--; // We're going to eliminate the last word
$spaces = floor( ($desired_length - $word_length) / $num_words);
$remainder = $desired_length - $word_length - ($num_words * $spaces);
$last = array_pop( $words);
foreach( $words as $word) {
// If we didn't get an even number of spaces to distribute, just tack it on to the front
$spaces_to_add = $spaces;
if( $remainder > 0) {
$spaces_to_add++;
$remainder--;
}
$return .= $word . str_repeat( $char, $spaces_to_add);
}
$return .= $last;
}
return $return;
}
// Paolo Bergantino
function justify3($str, $to_len) {
$str = trim($str);
$strlen = strlen($str);
if($str == '') return '';
if($strlen >= $to_len) {
return substr($str, 0, $to_len);
}
$words = explode(' ', $str);
$word_count = count($words);
$space_count = $word_count - 1;
if($word_count == 1) {
return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
}
$space = $to_len - $strlen + $space_count;
$per_space = $space/$space_count;
if(is_int($per_space)) {
return implode(str_pad('', $per_space, ' '), $words); // separator first; the legacy argument order was removed in PHP 8
}
$new_str = '';
$spacing = floor($per_space);
$new_str .= $words[0] . str_pad('', $spacing);
foreach($words as $x => $word) {
if($x == $word_count - 1 || $x == 0) continue;
if($x < $word_count - 1) {
$diff = $to_len - strlen($new_str) - (strlen(implode('', array_slice($words, $x))));
$new_str .= $word . str_pad('', floor($diff/($space_count - $x)), ' ');
}
}
$new_str .= $words[$x];
return $new_str;
}
// Kristian Antonsen
function justify4($str_in, $desired_length)
{
foreach ($str_in as &$line) {
$words = explode(' ', $line);
$word_count = count($words) - 1;
$spaces_to_fill = $desired_length - strlen($line) + $word_count;
if (count($words) == 1) {
$line = str_repeat('_', ceil($spaces_to_fill/2)) . $line
. str_repeat('_', floor($spaces_to_fill/2));
continue;
}
$next_space = floor($spaces_to_fill/$word_count);
$leftover_space = $spaces_to_fill % $word_count;
$line = array_shift($words);
foreach($words as $word) {
$extra_space = ($leftover_space) ? ceil($leftover_space / $word_count) : 0;
$leftover_space -= $extra_space;
$line .= str_repeat('_', $next_space + $extra_space) . $word;
}
}
return $str_in;
}
// user381521
function justify5 ($str, $len)
{
// split by whitespace, remove empty strings
$words = array_diff (preg_split ('/\s+/', $str), array (""));
// just space if no words
if (count ($words) == 0)
return str_repeat (" ", $len);
// add empty strings if only one element
if (count ($words) == 1)
$words = array ("", $words[0], "");
// get number of words and spaces
$wordcount = count ($words);
$numspaces = $wordcount - 1;
// get number of non-space characters
$numchars = array_sum (array_map ("strlen", $words));
// get number of characters remaining for space
$remaining = $len - $numchars;
// return if too little spaces remaining
if ($remaining <= $numspaces)
return substr (implode (" ", $words), 0, $len);
// get number of spaces per space
$spaces_per_space = $remaining / $numspaces;
$spaces_leftover = $remaining % $numspaces;
// make array for spaces, spread out leftover spaces
$spaces = array_fill (0, $numspaces, $spaces_per_space);
while ($spaces_leftover--)
$spaces[$numspaces - $spaces_leftover - 1]++;
$spaces[] = 0; // make count ($words) == count ($spaces)
// join it all together
$result = array ();
foreach ($words as $k => $v)
array_push ($result, $v, str_repeat (" ", $spaces[$k]));
return implode ($result);
}
// ppsreejith
function justify6($str, $to_len) {
$str = trim($str);
$strlen = strlen($str);
if($str == '') return '';
if($strlen >= $to_len) {
return substr($str, 0, $to_len);
}
$words = explode(' ', $str);
$word_count = count($words);
$space_count = $word_count - 1;
if($word_count == 1) {
return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
}
$space = $to_len - $strlen + $space_count;
$per_space = floor($space/$space_count);
$spaces = str_pad('', $per_space, ' ');
$curr_word = implode($words, $spaces);
while(strlen($curr_word) < $to_len){
$curr_word = substr($curr_word,0,preg_match("[! ][".$spaces."][! ]",$curr_word)." ".preg_match("[! ][".$spaces."][! ]",$curr_word));
}
return $curr_word;
}
// vlzvl
function justify7($str_in, $desired_length)
{
$str_in = preg_replace("!\s+!"," ",$str_in); // get rid of multiple spaces
$words = explode(" ",$str_in); // break words
$num_words = sizeof($words); // num words
if ($num_words==1) {
return str_pad($str_in,$desired_length,"_",STR_PAD_BOTH);
}
else {
$num_chars = 0; $lenwords = array();
for($x=0;$x<$num_words;$x++) { $num_chars += $lenwords[$x] = strlen($words[$x]); }
$each_div = round(($desired_length - $num_chars) / ($num_words-1));
for($x=0,$sum=0;$x<$num_words;$x++) { $sum += ($lenwords[$x] + ($x<$num_words-1 ? $each_div : 0)); }
$space_to_addcut = ($desired_length - $sum);
for($x=0;$x<$num_words-1;$x++) {
$words[$x] .= str_repeat("_",$each_div+($each_div>1? ($space_to_addcut<0?-1:($space_to_addcut>0?1:0)) :0));
if ($each_div>1) { $space_to_addcut += ($space_to_addcut<0 ? 1 : ($space_to_addcut>0?-1:0) ); }
}
return substr(implode($words),0,$desired_length);
}
}
// Alexander
function justify8($str, $length) {
$words = explode(' ', $str);
if(count($words)==1) $words = array("", $str, "");
$spaces = $length - array_sum(array_map("strlen", $words));
$add = (int)($spaces / (count($words) - 1));
$left = $spaces % (count($words) - 1);
$spaced = implode(str_repeat("_", $add + 1), array_slice($words, 0, $left + 1));
$spaced .= str_repeat("_", max(1, $add));
$spaced .= implode(str_repeat("_", max(1, $add)), array_slice($words, $left + 1));
return substr($spaced, 0, $length);
}
// ohaal
function justify9($s,$m){$s=trim($s);$l=strlen($s);if($l>=$m){$s=explode("\n",wordwrap($s,$m));$s=$s[0];$l=strlen($s);}$c=substr_count($s,' ');if($c===0)return str_pad($s,$m,' ',STR_PAD_BOTH);$a=($m-$l+$c)/$c;$h=floor($a);$i=($a-$h)*$c;$w=explode(' ',$s,$i+1);$w[$i]=str_replace(' ',str_repeat(' ',$h),$w[$i]);return implode(str_repeat(' ',ceil($a)),$w);}
// PhpMyCoder
class Justifier {
private $text;
public function __construct($text) {
if(!is_string($text) && !is_array($text)) {
throw new InvalidArgumentException('Expected a string or an array of strings, instead received type: ' . gettype($text));
}
if(is_array($text)) {
// String arrays must be converted to JustifierLine arrays
$this->text = array_map(function($line) {
return JustifierLine::fromText($line);
}, $text);
} else {
// Single line of text input
$this->text = $text;
}
}
public function format($width = NULL) {
// Strings have to be broken into an array and then justified
if(is_string($this->text)) {
if($width == null) {
throw new InvalidArgumentException('A width must be provided for separation when an un-split string is provided');
}
if($width <= 0) {
throw new InvalidArgumentException('Expected a positive, non-zero width, instead received width of ' . $width);
}
// Break up a JustifierLine of all text until each piece is smaller or equal to $width
$lines = array(JustifierLine::fromText($this->text));
$count = 0;
$newLine = $lines[0]->breakAtColumn($width);
while($newLine !== null) {
$lines[] = $newLine;
$newLine = $lines[++$count]->breakAtColumn($width);
}
} else {
$lines = $this->text;
// Allow for fluid width (uses longest line with single space)
if($width == NULL) {
$width = -1;
foreach($lines as $line) {
// Width of line = Sum of the lengths of the words and the spaces (number of words - 1)
$newWidth = $line->calculateWordsLength() + $line->countWords() - 1;
if($newWidth > $width) { // Looking for the longest line
$width = $newWidth;
}
}
}
}
// Justify each element of array
//$output = array_map(function($line) use ($width) {
// return $this->justify($line, $width);
//}, $lines);
$output = array();
foreach($lines as $line) {
$output[] = $this->justify($line, $width);
}
// If a single-line is passed in, a single line is returned
if(count($output) == 1) {
return $output[0];
}
return $output;
}
private function justify(JustifierLine $line, $width) {
// Retrieve already calculated line information
$words = $line->extractWords();
$spaces = $line->countWords() - 1;
$wordLens = $line->findWordLengths();
$wordsLen = $line->calculateWordsLength();
$minWidth = $wordsLen + $spaces;
$output = '';
if($minWidth > $width) {
throw new LengthException('A minimum width of ' . $minWidth . ' was required, but a width of ' . $width . ' was given instead');
}
// No spaces means only one word (center align)
if($spaces == 0) {
return str_pad($words[0], $width, ' ', STR_PAD_BOTH);
}
for(;$spaces > 0; $spaces--) {
// Add next word to output and subtract its length from counters
$output .= array_shift($words);
$length = array_shift($wordLens);
$wordsLen -= $length;
$width -= $length;
if($spaces == 1) { // Last Iteration
return $output . str_repeat(' ', $width - $wordsLen) . $words[0];
}
// Magic padding is really just simple math
$padding = floor(($width - $wordsLen) / $spaces);
$output .= str_repeat(' ', $padding);
$width -= $padding;
}
}
}
class JustifierLine {
private $words;
private $numWords;
private $wordLengths;
private $wordsLength;
public static function fromText($text) {
// Split words into an array
preg_match_all('/[^ ]+/', $text, $matches, PREG_PATTERN_ORDER);
$words = $matches[0];
// Count words
$numWords = count($words);
// Find the length of each word
$wordLengths = array_map('strlen', $words);
//And Finally, calculate the total length of all words
$wordsLength = array_reduce($wordLengths, function($result, $length) {
return $result + $length;
}, 0);
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
private function __construct($words, $numWords, $wordLengths, $wordsLength) {
$this->words = $words;
$this->numWords = $numWords;
$this->wordLengths = $wordLengths;
$this->wordsLength = $wordsLength;
}
public function extractWords() { return $this->words; }
public function countWords() { return $this->numWords; }
public function findWordLengths() { return $this->wordLengths; }
public function calculateWordsLength() { return $this->wordsLength; }
public function breakAtColumn($column) {
// Avoid extraneous processing if we can determine no breaking can be done
if($column >= ($this->wordsLength + $this->numWords - 1)) {
return null;
}
$width = 0;
$wordsLength = 0;
for($i = 0; $i < $this->numWords; $i++) {
// Add width of next word
$width += $this->wordLengths[$i];
// If the line is overflowing past required $width
if($width > $column) {
// Remove overflow at end & create a new object with the overflow
$words = array_splice($this->words, $i);
$numWords = $this->numWords - $i;
$this->numWords = $i;
$wordLengths = array_splice($this->wordLengths, $i);
$tempWordsLength = $wordsLength;
$wordsLength = $this->wordsLength - $wordsLength;
$this->wordsLength = $tempWordsLength;
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
$width++; // Assuming smallest spacing to fit
// We also have to keep track of the total $wordsLength
$wordsLength += $this->wordLengths[$i];
}
return null;
}
}

This is my solution. No pesky regular expressions :)
function justify($str, $length) {
    $words = explode(' ', $str);
    if(count($words)==1) $words = array("", $str, "");
    $spaces = $length - array_sum(array_map("strlen", $words));
    $add = (int)($spaces / (count($words) - 1));
    $left = $spaces % (count($words) - 1);
    $spaced = implode(str_repeat("_", $add + 1), array_slice($words, 0, $left + 1));
    $spaced .= str_repeat("_", max(1, $add));
    $spaced .= implode(str_repeat("_", max(1, $add)), array_slice($words, $left + 1));
    return substr($spaced, 0, $length);
}
This is powered by PHP array functions.
Here is the working example.

Just so no one thinks I'm trying to have them do my homework for me, this is my (working, I think) solution.
I'm not sure I could have possibly been expected to write this much code on a whiteboard on demand, however, so I'm mostly curious to see how others would tackle it without looking at my code. (I made it to around the foreach in the interview before they called 'time' on me, so to speak)
function justify($str, $to_len) {
    $str = trim($str);
    $strlen = strlen($str);
    if($str == '') return '';
    if($strlen >= $to_len) {
        return substr($str, 0, $to_len);
    }
    $words = explode(' ', $str);
    $word_count = count($words);
    $space_count = $word_count - 1;
    if($word_count == 1) {
        return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
    }
    $space = $to_len - $strlen + $space_count;
    $per_space = $space/$space_count;
    if(is_int($per_space)) {
        // Separator first: the legacy (array, separator) order was removed in PHP 8
        return implode(str_pad('', $per_space, ' '), $words);
    }
    $new_str = '';
    $spacing = floor($per_space);
    $new_str .= $words[0] . str_pad('', $spacing);
    foreach($words as $x => $word) {
        if($x == $word_count - 1 || $x == 0) continue;
        if($x < $word_count - 1) {
            $diff = $to_len - strlen($new_str) - (strlen(implode('', array_slice($words, $x))));
            $new_str .= $word . str_pad('', floor($diff/($space_count - $x)), ' ');
        }
    }
    $new_str .= $words[$x];
    return $new_str;
}
$tests = array(' hello world there ok then ', 'hello', 'ok then', 'this string is almost certainly longer than 48 I think', 'two words', 'three ok words', '1 2 3 4 5 6 7 8 9');
foreach($tests as $word) {
    print $word . ' = ' . str_replace(' ', '_', justify($word, 48)) . '<br>';
}

I miss my list comprehensions in Python ...
<?php
function justify ($str, $len)
{
    // split by whitespace, dropping empty strings so the keys stay 0-based
    // (array_diff would keep the original keys when the input starts with a space)
    $words = preg_split ('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    // just space if no words
    if (count ($words) == 0)
        return str_repeat (" ", $len);
    // add empty strings if only one element
    if (count ($words) == 1)
        $words = array ("", $words[0], "");
    // get number of words and spaces
    $wordcount = count ($words);
    $numspaces = $wordcount - 1;
    // get number of non-space characters
    $numchars = array_sum (array_map ("strlen", $words));
    // get number of characters remaining for space
    $remaining = $len - $numchars;
    // return if too little spaces remaining
    if ($remaining <= $numspaces)
        return substr (implode (" ", $words), 0, $len);
    // get number of spaces per space (whole number, so str_repeat gets an int)
    $spaces_per_space = (int) floor ($remaining / $numspaces);
    $spaces_leftover = $remaining % $numspaces;
    // make array for spaces, spread out leftover spaces
    $spaces = array_fill (0, $numspaces, $spaces_per_space);
    while ($spaces_leftover--)
        $spaces[$numspaces - $spaces_leftover - 1]++;
    $spaces[] = 0; // make count ($words) == count ($spaces)
    // join it all together
    $result = array ();
    foreach ($words as $k => $v)
        array_push ($result, $v, str_repeat (" ", $spaces[$k]));
    return implode ($result);
}
?>

Here's my attempt.
function justify($str_in, $desired_length)
{
    foreach ($str_in as &$line) {
        $words = explode(' ', $line);
        $word_count = count($words) - 1;
        $spaces_to_fill = $desired_length - strlen($line) + $word_count;
        if (count($words) == 1) {
            $line = str_repeat('_', ceil($spaces_to_fill/2)) . $line
                  . str_repeat('_', floor($spaces_to_fill/2));
            continue;
        }
        $next_space = floor($spaces_to_fill/$word_count);
        $leftover_space = $spaces_to_fill % $word_count;
        $line = array_shift($words);
        foreach($words as $word) {
            $extra_space = ($leftover_space) ? ceil($leftover_space / $word_count) : 0;
            $leftover_space -= $extra_space;
            $line .= str_repeat('_', $next_space + $extra_space) . $word;
        }
    }
    return $str_in;
}
I've tried to keep it relatively concise, which has impacted the readability. But here's how it works:
For each entry (the function takes an array of lines), we split the line into an array $words. A line with a single word is centered by putting half of the padding before it and half after it.
We calculate the total number of spaces to insert, $spaces_to_fill, and divide it by the number of gaps between words (stored in $word_count). The quotient $next_space is the base width of each gap, and the remainder $leftover_space is what still has to be handed out.
Whenever we add a word, we also add an extra space $extra_space as long as any are left over. After that, we subtract the amount added from $leftover_space.
Sample output
$data = justify($data, 48);
print_r($data);
Array
(
[0] => 123456789012345678901234567890123456789012345678
[1] => hello_______world_______there_______ok______then
[2] => ______________________hello_____________________
[3] => ok__________________________________________then
[4] => this__string__is_almost_certainly_longer_than_48
[5] => two________________________________________words
[6] => three__________________ok__________________words
[7] => 1_____2_____3_____4_____5_____6_____7_____8____9
)

I think this is fully working: (the "_" is just keeping the space visible)
function justify($str_in, $desired_length)
{
    $str_in = preg_replace("!\s+!"," ",$str_in); // get rid of multiple spaces
    $words = explode(" ",$str_in); // break words
    $num_words = sizeof($words); // num words
    if ($num_words==1) {
        return str_pad($str_in,$desired_length,"_",STR_PAD_BOTH);
    }
    else {
        $num_chars = 0; $lenwords = array();
        for($x=0;$x<$num_words;$x++) { $num_chars += $lenwords[$x] = strlen($words[$x]); }
        $each_div = round(($desired_length - $num_chars) / ($num_words-1));
        for($x=0,$sum=0;$x<$num_words;$x++) { $sum += ($lenwords[$x] + ($x<$num_words-1 ? $each_div : 0)); }
        $space_to_addcut = ($desired_length - $sum);
        for($x=0;$x<$num_words-1;$x++) {
            $words[$x] .= str_repeat("_",$each_div+($each_div>1? ($space_to_addcut<0?-1:($space_to_addcut>0?1:0)) :0));
            if ($each_div>1) { $space_to_addcut += ($space_to_addcut<0 ? 1 : ($space_to_addcut>0?-1:0) ); }
        }
        return substr(implode($words),0,$desired_length);
    }
}
EDITED:
The function now gets rid of multiple spaces between words as well.
How it works (in short):
1. Remove runs of consecutive spaces between words.
2. Count the words; if there is only one (the 'hello' example), pad it on both sides and return it.
3. Otherwise, count the characters of the words.
4. Calculate the total space to add and the share for each gap (the '_' in the example).
5. Calculate the extra space to add (string length < desired) or remove (string length > desired) and apply it to the padding.
6. Finally, cut the resulting string to the desired length.
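The round-then-correct idea behind these steps can be sketched for one of the test strings at a width of 50. This is a simplified illustration, not the function itself: the names $gaps, $correction and $widths are invented here, and the $each_div > 1 guard from the function is left out.

```php
<?php
$words = explode(' ', '1 2 3 4 5 6 7 8 9');
$desired_length = 50;

$gaps      = count($words) - 1;                                   // 8
$num_chars = array_sum(array_map('strlen', $words));              // 9
$each_div  = (int) round(($desired_length - $num_chars) / $gaps); // round(5.125) = 5
$sum       = $num_chars + $gaps * $each_div;                      // 49
$correction = $desired_length - $sum;                             // +1: one gap must grow

$widths = array();
for ($x = 0; $x < $gaps; $x++) {
    // Nudge one gap at a time towards the target until the correction is used up.
    $step = $correction > 0 ? 1 : ($correction < 0 ? -1 : 0);
    $widths[] = $each_div + $step;
    $correction -= $step;
}
print_r($widths);
```

Because round() can land above or below the exact average, the correction may be positive or negative, which is why the function handles both adding and removing a character.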
TESTING:
$tests = array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'three ok words',
'1 2 3 4 5 6 7 8 9',
'Lorem Ipsum is simply dummy text'
);
$arr = array();
foreach($tests as $key=>$val) {
$arr[$key] = justify($val,50);
$arr[$key] .= " - (chars: ".strlen($arr[$key]).")";
}
echo "<pre>".print_r($arr,TRUE)."</pre>";
AND THE RESULT:
Array
(
[0] => hello________world_______there_______ok_______then - (chars: 50)
[1] => ______________________hello_______________________ - (chars: 50)
[2] => ok____________________________________________then - (chars: 50)
[3] => this_string_is_almost_certainly_longer_than_48_I_t - (chars: 50)
[4] => three___________________ok___________________words - (chars: 50)
[5] => 1______2_____3_____4_____5_____6_____7_____8_____9 - (chars: 50)
[6] => Lorem____Ipsum____is_____simply_____dummy_____text - (chars: 50)
)
THAT WAS TOUGH :)
EDITED 2:
Function is now about 20% faster, because that benchmark touched me :)

The (Semi-Long) Solution
It's taken me a while to perfect (probably much, much longer than an interviewer would have allowed for), but I've come up with an elegant, 162 line OOP solution to this problem. I included functionality to allow for the justifying of a single string, array of strings (already separated into lines) or a long string that needs to be broken up into lines of a maximum width first. Demos follow the code block.
Important Note: This class originally worked only in PHP 5.4. I realized this when running a version on my own server's PHP (5.3.6) to get profiling stats with XDebug. PHP 5.3 complains about my use of $this in the anonymous function. A quick check of the docs on anonymous functions reveals that $this could not be used in the context of an anonymous function until 5.4. If anyone can find a clean workaround to this, please drop it in the comments. Update: added support for PHP 5.3!
<?php
class Justifier {
private $text;
public function __construct($text) {
if(!is_string($text) && !is_array($text)) {
throw new InvalidArgumentException('Expected a string or an array of strings, instead received type: ' . gettype($text));
}
if(is_array($text)) {
// String arrays must be converted to JustifierLine arrays
$this->text = array_map(function($line) {
return JustifierLine::fromText($line);
}, $text);
} else {
// Single line of text input
$this->text = $text;
}
}
public function format($width = null) {
// Strings have to be broken into an array and then justified
if(is_string($this->text)) {
if($width == null) {
throw new InvalidArgumentException('A width must be provided for separation when an un-split string is provided');
}
if($width <= 0) {
throw new InvalidArgumentException('Expected a positive, non-zero width, instead received width of ' . $width);
}
// Break up a JustifierLine of all text until each piece is smaller or equal to $width
$lines = array(JustifierLine::fromText($this->text));
$count = 0;
$newLine = $lines[0]->breakAtColumn($width);
while($newLine !== null) {
$lines[] = $newLine;
$newLine = $lines[++$count]->breakAtColumn($width);
}
} else {
$lines = $this->text;
// Allow for fluid width (uses longest line with single space)
if($width == NULL) {
$width = -1;
foreach($lines as $line) {
// Width of line = Sum of the lengths of the words and the spaces (number of words - 1)
$newWidth = $line->calculateWordsLength() + $line->countWords() - 1;
if($newWidth > $width) { // Looking for the longest line
$width = $newWidth;
}
}
}
}
// Justify each element of array (PHP 5.4 ONLY)
//$output = array_map(function($line) use ($width) {
// return $this->justify($line, $width);
//}, $lines);
// Support for PHP 5.3
$output = array();
foreach($lines as $line) {
// Append each justified line (assigning with = would overwrite the array)
$output[] = $this->justify($line, $width);
}
// If a single-line is passed in, a single line is returned
if(count($output) == 1) {
return $output[0];
}
return $output;
}
private function justify(JustifierLine $line, $width) {
// Retrieve already calculated line information
$words = $line->extractWords();
$spaces = $line->countWords() - 1;
$wordLens = $line->findWordLengths();
$wordsLen = $line->calculateWordsLength();
$minWidth = $wordsLen + $spaces;
$output = '';
if($minWidth > $width) {
throw new LengthException('A minimum width of ' . $minWidth . ' was required, but a width of ' . $width . ' was given instead');
}
// No spaces means only one word (center align)
if($spaces == 0) {
return str_pad($words[0], $width, ' ', STR_PAD_BOTH);
}
for(;$spaces > 0; $spaces--) {
// Add next word to output and subtract its length from counters
$output .= array_shift($words);
$length = array_shift($wordLens);
$wordsLen -= $length;
$width -= $length;
if($spaces == 1) { // Last Iteration
return $output . str_repeat(' ', $width - $wordsLen) . $words[0];
}
// Magic padding is really just simple math
$padding = floor(($width - $wordsLen) / $spaces);
$output .= str_repeat(' ', $padding);
$width -= $padding;
}
}
}
class JustifierLine {
private $words;
private $numWords;
private $wordLengths;
private $wordsLength;
public static function fromText($text) {
// Split words into an array
preg_match_all('/[^ ]+/', $text, $matches, PREG_PATTERN_ORDER);
$words = $matches[0];
// Count words
$numWords = count($words);
// Find the length of each word
$wordLengths = array_map('strlen', $words);
//And Finally, calculate the total length of all words
$wordsLength = array_reduce($wordLengths, function($result, $length) {
return $result + $length;
}, 0);
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
private function __construct($words, $numWords, $wordLengths, $wordsLength) {
$this->words = $words;
$this->numWords = $numWords;
$this->wordLengths = $wordLengths;
$this->wordsLength = $wordsLength;
}
public function extractWords() { return $this->words; }
public function countWords() { return $this->numWords; }
public function findWordLengths() { return $this->wordLengths; }
public function calculateWordsLength() { return $this->wordsLength; }
public function breakAtColumn($column) {
// Avoid extraneous processing if we can determine no breaking can be done
if($column >= ($this->wordsLength + $this->numWords - 1)) {
return null;
}
$width = 0;
$wordsLength = 0;
for($i = 0; $i < $this->numWords; $i++) {
// Add width of next word
$width += $this->wordLengths[$i];
// If the line is overflowing past required $width
if($width > $column) {
// Remove overflow at end & create a new object with the overflow
$words = array_splice($this->words, $i);
$numWords = $this->numWords - $i;
$this->numWords = $i;
$wordLengths = array_splice($this->wordLengths, $i);
$tempWordsLength = $wordsLength;
$wordsLength = $this->wordsLength - $wordsLength;
$this->wordsLength = $tempWordsLength;
return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
}
$width++; // Assuming smallest spacing to fit
// We also have to keep track of the total $wordsLength
$wordsLength += $this->wordLengths[$i];
}
return null;
}
}
Demos
Original Question (Justifying Lines of Text to width = 48)
You can pass in an array of many strings or just one string to Justifier. Calling Justifier::format($desired_length) will always return an array of justified lines if an array of strings, or a string that required segmentation, was passed to the constructor. Otherwise, a string will be returned. (Codepad Demo)
$jus = new Justifier(array(
'hello world there ok then',
'hello',
'ok then',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
));
print_r( $jus->format(48) );
Output
Array
(
[0] => hello world there ok then
[1] => hello
[2] => ok then
[3] => two words
[4] => three ok words
[5] => 1 2 3 4 5 6 7 8 9
)
You may notice I omitted one of the OP's test lines. This is because it was 54 characters and would exceed the $desired_length passed to Justifier::format(). The function throws a LengthException for any width smaller than the minimum width. The minimum width is the length of the longest line (of all the lines passed to the constructor) with single spacing.
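The padding math in Justifier::justify() can be looked at in isolation. The sketch below uses a hypothetical helper, gap_widths(), which is not part of the class; it assumes the words already fit in the given width and returns only the width of each gap:

```php
<?php
// Hypothetical helper (not part of the class above): the width of each gap,
// left to right, for one line justified to $width columns.
function gap_widths(array $words, $width) {
    // Total number of spaces that still have to be handed out.
    $remaining = $width - array_sum(array_map('strlen', $words));
    $widths = array();
    for ($gaps = count($words) - 1; $gaps > 0; $gaps--) {
        // Give this gap an even share of what is left, rounded down.
        $pad = (int) floor($remaining / $gaps);
        $widths[] = $pad;
        $remaining -= $pad;
    }
    return $widths;
}

print_r(gap_widths(array('hello', 'world', 'there', 'ok', 'then'), 48));
// gap widths: 6, 7, 7, 7
```

Because the share is rounded down at every step, leftover spaces drift toward the rightmost gaps, and no two gaps differ by more than one.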
Fluid Width Justifying With An Array of Strings
If you omit the width, Justifier will use the width of the longest line (of those passed to the constructor) when single spaced. This is the same calculation as finding the minimum width in the previous demo. (Codepad Demo)
$jus = new Justifier(array(
'hello world there ok then',
'hello',
'ok then',
'this string is almost certainly longer than 48 I think',
'two words',
'three ok words',
'1 2 3 4 5 6 7 8 9'
));
print_r( $jus->format() );
Output
Array
(
[0] => hello world there ok then
[1] => hello
[2] => ok then
[3] => this string is almost certainly longer than 48 I think
[4] => two words
[5] => three ok words
[6] => 1 2 3 4 5 6 7 8 9
)
Justifying a Single String of Text (width = 48)
I've also included a feature in the class which allows you to pass a single, non-broken string to the constructor. This string can be of any length. When you call Justifier::format($desired_length) the string is broken into lines such that each line is filled with as much text as possible and justified before starting a new line. The class will complain with an InvalidArgumentException if you omit the width, because it needs a width at which to break the string. If anyone can think of a sensible default or way to programmatically determine a default for a string, I'm completely open to suggestions. (Codepad Demo)
$jus = new Justifier(
'hello world there ok then hello ok then this string is almost certainly longer than 48 I think two words three ok words 1 2 3 4 5 6 7 8 9'
);
print_r( $jus->format(48) );
Output
Array
(
[0] => hello world there ok then hello ok then this
[1] => string is almost certainly longer than 48 I
[2] => think two words three ok words 1 2 3 4 5 6 7 8 9
)

Here's my solution. For what it's worth, it took me about 20 minutes to write both the justify function and the acceptance tests for it; 5 of those minutes went to debugging the justify function. Also, I used Notepad++ instead of a more robust IDE to try to simulate the interview environment to some extent.
I think this might be too large a problem for a whiteboard interview question, unless the interviewer lets you write in pseudocode and is more interested in your thought process than in what you are putting on the board.
<?php
function justify($str_in, $desired_length) {
$words = preg_split("/ +/",$str_in);
// handle special cases
if(count($words)==0) { return str_repeat(" ",$desired_length); }
// turn single word case into a normal case
if(count($words)==1) { $words = array("",$words[0],""); }
$numwords = count($words);
$wordlength = strlen(join("",$words));
// handles cases where words are longer than the desired_length
if($wordlength>($desired_length-$numwords)) {
return substr(join(" ",$words),0,$desired_length);
}
$minspace = floor(($desired_length-$wordlength)/($numwords-1));
$extraspace = $desired_length - $wordlength - ($minspace * ($numwords-1));
$result = $words[0];
for($i=1;$i<$numwords;$i++) {
if($extraspace>0) {
$result.=" ";
$extraspace--;
}
$result.=str_repeat(" ",$minspace);
$result.=$words[$i];
}
return $result;
}
function acceptance_justify($orig_str, $just_str, $expected_length) {
// should be the correct length
if(strlen($just_str)!=$expected_length) { return false; }
// should contain most of the words in the original string, in the right order
if(preg_replace("/ +/","",substr($orig_str,0,$expected_length)) != preg_replace("/ +/","",substr($just_str,0,$expected_length))) { return false; }
//spacing should be uniform (+/- 1 space)
// collect every run of spaces, not just the first one
if(!preg_match_all("/ +/",$just_str,$spaces)) { return false; }
$space_length=strlen($spaces[0][0]);
$smin=$space_length;
$smax=$space_length;
for($i=1;$i<count($spaces[0]);$i++) {
$smin=min($smin,strlen($spaces[0][$i]));
$smax=max($smax,strlen($spaces[0][$i]));
}
if(($smax-$smin)>1) { return false; }
return true;
}
function run_test($str,$len) {
print "<pre>";
print "$str ==> \n";
$result = justify($str,$len);
print preg_replace("/ /",".",$result) . "\n";
print acceptance_justify($str,$result,$len)?"passed":"FAILED";
print "\n\n</pre>";
}
run_test("hello world there ok then",48);
run_test("hello",48);
run_test("this string is almost certainly longer than 48 I think",48);
run_test("two words",48);
run_test("three ok words",48);
run_test("1 2 3 4 5 6 7 8 9",48);

Here's a little bit different implementation just towards the end.
<?php
function justify($str, $to_len) {
$str = trim($str);
$strlen = strlen($str);
if($str == '') return '';
if($strlen >= $to_len) {
return substr($str, 0, $to_len);
}
$words = explode(' ', $str);
$word_count = count($words);
$space_count = $word_count - 1;
if($word_count == 1) {
return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
}
$space = $to_len - $strlen + $space_count;
$per_space = floor($space/$space_count);
$spaces = str_pad('', $per_space, ' ');
$curr_word = implode($spaces, $words);
// Widen the leftmost gaps by one space each until the target length is reached
$extra = $to_len - strlen($curr_word);
for($i = 0; $i < $extra; $i++){
$curr_word = preg_replace('/(?<=\S)' . $spaces . '(?=\S)/', $spaces . ' ', $curr_word, 1);
}
return $curr_word;
}
?>
I'm not sure about the regexp, I just meant $spaces and not next space.

Related

How to capitalize first letter of word in php without ucfirst() function

I am trying to capitalize the first letter of each word in PHP without using the ucfirst() function, but I am struggling with it. Please tell me how to do it.
<?php
$str ="the resources of earth make life possible on it";
$str[0] = chr(ord($str[0])-32);
$length = strlen($str);
for($pos=0; $pos<$length; $pos++){
if($str[$pos]==' '){
$str[$pos+1] = chr(ord($str[$pos+1])-32);
}
}
echo $str;
?>

Without using the function ucfirst, you can do it like this:
$firstLetter = substr($word, 0, 1);
$restOfWord = substr($word, 1);
$firstLetter = strtoupper($firstLetter);
$restOfWord = strtolower($restOfWord);
print "{$firstLetter}{$restOfWord}\n";
To do it for each word, use explode(' ', $string) to get an array of words, or preg_split('#\\s+#', $string, -1, PREG_SPLIT_NO_EMPTY) for better results.
I would advise against just subtracting 32 from the first character of the next word:
you do not know it is a letter
you do not know it isn't already capitalized
you do not know it exists
you do not know it is not another space
At the very least, check that its ord() value lies between ord('a') and ord('z'), since subtracting 32 is only correct for a lowercase letter.
To do this all without case-changing functions, you'd do
$text = implode(' ',
array_map(
function($word) {
$firstLetter = substr($word, 0, 1);
if ($firstLetter >= 'a' && $firstLetter <= 'z') {
$firstLetter = chr(ord($firstLetter)-32);
}
$restOfWord = substr($word, 1);
$len = strlen($restOfWord);
for ($i = 0; $i < $len; $i++) {
if ($restOfWord[$i] >= 'A' && $restOfWord[$i] <= 'Z') {
$restOfWord[$i] = chr(ord($restOfWord[$i])+32);
}
}
return $firstLetter . $restOfWord;
},
preg_split('#\\s+#', $originalText, -1, PREG_SPLIT_NO_EMPTY)
)
);

as such...
$str ="the resources of earth make life possible on it";
$words=array_map(static fn($a) => ucfirst($a), explode(' ', $str));
echo implode(' ', $words);
with ord and chr
$ordb=ord('b'); //98
$capitalB=chr(98-32); //B
$ordA=ord('a'); //97
$capitalA=chr(97-32); //A
//so
function capitalize(string $word)
{
$newWord = '';
$previousCharIsEmpty = true;
$length = strlen($word);
for ($a = 0; $a < $length; $a++) {
if ($word[$a] === ' ') {
$newWord .= ' ';
$previousCharIsEmpty = true;
} else {
if ($previousCharIsEmpty === true) {
$ord = ord($word[$a]);
$char = chr($ord - 32);
$newWord .= $char;
$previousCharIsEmpty = false;
} else {
$newWord .= $word[$a];
}
$previousCharIsEmpty = false;
}
} // end of the for loop
return $newWord;
}
$word = 'this for example by dilo abininyeri';
echo capitalize($word);
and output
This For Example By Dilo Abininyeri

We cannot do this without any function at all; we have to use some functions, just as you used strlen() along with the for loop.
<?php
$str ="the resources of earth make life possible on it";
$str[0] = chr(ord($str[0])-32);
$length = strlen($str);
for($pos=0; $pos<$length; $pos++){
if($str[$pos]==' '){
$str[$pos+1] = chr(ord($str[$pos+1])-32);
}
}
echo $str;
?>

keyword highlight is highlighting the highlights in PHP preg_replace()

I have a small search engine doing its thing, and want to highlight the results. I thought I had it all worked out till a set of keywords I used today blew it out of the water.
The issue is that preg_replace() is looping through the replacements, and later replacements are replacing the text I inserted into previous ones. Confused? Here is my pseudo function:
public function highlightKeywords ($data, $keywords = array()) {
$find = array();
$replace = array();
$begin = "<span class=\"keywordHighlight\">";
$end = "</span>";
foreach ($keywords as $kw) {
$find[] = '/' . str_replace("/", "\/", $kw) . '/iu';
$replace[] = $begin . "\$0" . $end;
}
return preg_replace($find, $replace, $data);
}
OK, so it works when searching for "fred" and "dagg" but sadly, when searching for "class" and "lass" and "as" it strikes a real issue when highlighting "Joseph's Class Group"
Joseph's <span class="keywordHighlight">Cl</span><span <span c<span <span class="keywordHighlight">cl</span>ass="keywordHighlight">lass</span>="keywordHighlight">c<span <span class="keywordHighlight">cl</span>ass="keywordHighlight">lass</span></span>="keywordHighlight">ass</span> Group
How would I get the latter replacements to only work on the non-HTML components, but to also allow the tagging of the whole match? e.g. if I was searching for "cla" and "lass" I would want "class" to be highlighted in full as both the search terms are in it, even though they overlap, and the highlighting that was applied to the first match has "class" in it, but that shouldn't be highlighted.
Sigh.
I would rather use a PHP solution than a jQuery (or any client-side) one.
Note: I have tried to sort the keywords by length, doing the long ones first, but that means the cross-over searches do not highlight, meaning with "cla" and "lass" only part of the word "class" would highlight, and it still murdered the replacement tags :(
EDIT: I have messed about, starting with pencil & paper, and wild ramblings, and come up with some very unglamorous code to solve this issue. It's not great, so suggestions to trim/speed this up would still be greatly appreciated :)
public function highlightKeywords ($data, $keywords = array()) {
$find = array();
$replace = array();
$begin = "<span class=\"keywordHighlight\">";
$end = "</span>";
$hits = array();
foreach ($keywords as $kw) {
$offset = 0;
while (($pos = stripos($data, $kw, $offset)) !== false) {
$hits[] = array($pos, $pos + strlen($kw));
$offset = $pos + 1;
}
}
if ($hits) {
usort($hits, function($a, $b) {
if ($a[0] == $b[0]) {
return 0;
}
return ($a[0] < $b[0]) ? -1 : 1;
});
$thisthat = array(0 => $begin, 1 => $end);
for ($i = 0; $i < count($hits); $i++) {
foreach ($thisthat as $key => $val) {
$pos = $hits[$i][$key];
$data = substr($data, 0, $pos) . $val . substr($data, $pos);
for ($j = 0; $j < count($hits); $j++) {
if ($hits[$j][0] >= $pos) {
$hits[$j][0] += strlen($val);
}
if ($hits[$j][1] >= $pos) {
$hits[$j][1] += strlen($val);
}
}
}
}
}
return $data;
}

I've used the following to address this problem:
<?php
$protected_matches = array();
function protect($matches) {
global $protected_matches;
return "\0" . array_push($protected_matches, $matches[0]) . "\0";
}
function restore($matches) {
global $protected_matches;
return '<span class="keywordHighlight">' .
$protected_matches[$matches[1] - 1] . '</span>';
}
preg_replace_callback('/\x0(\d+)\x0/', 'restore',
preg_replace_callback($patterns, 'protect', $target_string));
The first preg_replace_callback pulls out all matches and replaces them with nul-byte-wrapped placeholders; the second pass replaces them with the span tags.
Edit: Forgot to mention that $patterns was sorted by string length, longest to shortest.
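The two-pass idea can also be written without globals. The following is only a sketch: the function name highlight_two_pass is hypothetical, and it assumes the keywords are combined into one alternation instead of the separate $patterns array used above.

```php
<?php
// Hypothetical wrapper around the two-pass placeholder idea.
function highlight_two_pass($text, array $keywords) {
    // Longest keywords first, so "class" wins over "lass" and "as".
    usort($keywords, function ($a, $b) { return strlen($b) - strlen($a); });
    $quoted = array_map(function ($kw) { return preg_quote($kw, '/'); }, $keywords);
    $pattern = '/' . implode('|', $quoted) . '/i';

    $protected = array();
    // Pass 1: swap every match for a NUL-wrapped index into $protected.
    $masked = preg_replace_callback($pattern, function ($m) use (&$protected) {
        $protected[] = $m[0];
        return "\0" . (count($protected) - 1) . "\0";
    }, $text);

    // Pass 2: swap the placeholders for the highlighted original text.
    return preg_replace_callback('/\x00(\d+)\x00/', function ($m) use ($protected) {
        return '<span class="keywordHighlight">' . $protected[(int) $m[1]] . '</span>';
    }, $masked);
}

echo highlight_two_pass("Joseph's Class Group", array('class', 'lass', 'as'));
// Joseph's <span class="keywordHighlight">Class</span> Group
```

Since the markup is only inserted in the second pass, no keyword can match inside a span tag that was added earlier.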
Edit: another solution
<?php
function highlightKeywords($data, $keywords = array(),
$prefix = '<span class="hilite">', $suffix = '</span>') {
$datacopy = strtolower($data);
$keywords = array_map('strtolower', $keywords);
$start = array();
$end = array();
foreach ($keywords as $keyword) {
$offset = 0;
$length = strlen($keyword);
while (($pos = strpos($datacopy, $keyword, $offset)) !== false) {
$start[] = $pos;
$end[] = $offset = $pos + $length;
}
}
if (!count($start)) return $data;
sort($start);
sort($end);
// Merge and sort start/end using negative values to identify endpoints
$zipper = array();
$i = 0;
$n = count($end);
while ($i < $n)
$zipper[] = count($start) && $start[0] <= $end[$i]
? array_shift($start)
: -$end[$i++];
// EXAMPLE:
// [ 9, 10, -14, -14, 81, 82, 86, -86, -86, -90, 99, -103 ]
// take 9, discard 10, take -14, take -14, create pair,
// take 81, discard 82, discard 86, take -86, take -86, take -90, create pair
// take 99, take -103, create pair
// result: [9,14], [81,90], [99,103]
// Generate non-overlapping start/end pairs
$a = array_shift($zipper);
$z = $x = null;
while ($x = array_shift($zipper)) {
if ($x < 0)
$z = $x;
else if ($z) {
$spans[] = array($a, -$z);
$a = $x;
$z = null;
}
}
$spans[] = array($a, -$z);
// Insert the prefix/suffix in the start/end locations
$n = count($spans);
while ($n--)
$data = substr($data, 0, $spans[$n][0])
. $prefix
. substr($data, $spans[$n][0], $spans[$n][1] - $spans[$n][0])
. $suffix
. substr($data, $spans[$n][1]);
return $data;
}

I had to revisit this subject myself today and wrote a better version of the above. I'll include it here. It's the same idea only easier to read and should perform better since it uses arrays instead of concatenation.
<?php
function highlight_range_sort($a, $b) {
$A = abs($a);
$B = abs($b);
if ($A == $B)
return $a < $b ? 1 : ($a > $b ? -1 : 0);
else
return $A < $B ? -1 : 1;
}
function highlightKeywords($data, $keywords = array(),
$prefix = '<span class="highlight">', $suffix = '</span>') {
$datacopy = strtolower($data);
$keywords = array_map('strtolower', $keywords);
// this will contain offset ranges to be highlighted
// positive offset indicates start
// negative offset indicates end
$ranges = array();
// find start/end offsets for each keyword
foreach ($keywords as $keyword) {
$offset = 0;
$length = strlen($keyword);
while (($pos = strpos($datacopy, $keyword, $offset)) !== false) {
$ranges[] = $pos;
$ranges[] = -($offset = $pos + $length);
}
}
if (!count($ranges))
return $data;
// sort offsets by abs(); on ties, positive (start) comes before negative (end)
usort($ranges, 'highlight_range_sort');
// combine overlapping ranges by keeping lesser
// positive and negative numbers
$i = 0;
while ($i < count($ranges) - 1) {
if ($ranges[$i] < 0) {
if ($ranges[$i + 1] < 0)
array_splice($ranges, $i, 1);
else
$i++;
} else if ($ranges[$i + 1] < 0)
$i++;
else
array_splice($ranges, $i + 1, 1);
}
// create substrings
$ranges[] = strlen($data);
$substrings = array(substr($data, 0, $ranges[0]));
for ($i = 0, $n = count($ranges) - 1; $i < $n; $i += 2) {
// prefix + highlighted_text + suffix + regular_text
$substrings[] = $prefix;
$substrings[] = substr($data, $ranges[$i], -$ranges[$i + 1] - $ranges[$i]);
$substrings[] = $suffix;
$substrings[] = substr($data, -$ranges[$i + 1], $ranges[$i + 2] + $ranges[$i + 1]);
}
// join and return substrings
return implode('', $substrings);
}
// Example usage:
echo highlightKeywords("This is a test.\n", array("is"), '(', ')');
echo highlightKeywords("Classes are as hard as they say.\n", array("as", "class"), '(', ')');
// Output:
// Th(is) (is) a test.
// (Class)es are (as) hard (as) they say.
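The range-combining step can also be expressed as a classic interval merge over [start, end] pairs. The sketch below is a variation on the idea, not the code above; merge_ranges is a hypothetical name, and the input reuses the offsets from the EXAMPLE comment in the earlier version.

```php
<?php
// Hypothetical helper: merge overlapping or touching [start, end] pairs.
function merge_ranges(array $ranges) {
    // Sort by start offset.
    usort($ranges, function ($a, $b) { return $a[0] - $b[0]; });
    $merged = array();
    foreach ($ranges as $range) {
        $last = count($merged) - 1;
        if ($last >= 0 && $range[0] <= $merged[$last][1]) {
            // Overlaps (or touches) the previous range: extend it.
            $merged[$last][1] = max($merged[$last][1], $range[1]);
        } else {
            $merged[] = $range;
        }
    }
    return $merged;
}

print_r(merge_ranges(array(
    array(9, 14), array(10, 14),
    array(81, 86), array(82, 86), array(86, 90),
    array(99, 103),
)));
// merged pairs: [9,14], [81,90], [99,103]
```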

OP - something that's not clear in the question is whether $data can contain HTML from the get-go. Can you clarify this?
If $data can contain HTML itself, you are getting into the realm of attempting to parse a non-regular language with a regular-language parser, and that's not going to work out well.
In such a case, I would suggest loading the $data HTML into a PHP DOMDocument, getting hold of all of the textNodes and running one of the other perfectly good answers on the contents of each text block in turn.
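A rough sketch of that DOMDocument approach follows. The function name highlight_text_nodes is hypothetical, it handles a single keyword with a plain regular expression, and it assumes the DOM extension is available and the input is ASCII (loadHTML needs extra care for other encodings).

```php
<?php
// Hypothetical helper: highlight $keyword only inside the text nodes of $html.
function highlight_text_nodes($html, $keyword) {
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    $xpath = new DOMXPath($doc);
    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';

    // Copy the node list first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query('//body//text()')) as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 == 1) { // odd indexes hold the captured matches
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'keywordHighlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of <body>.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlight_text_nodes('<p class="lass">My class</p>', 'lass');
```

The class attribute is left alone because only text nodes are visited; attribute values never reach the regular expression.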

How to search array of string in another string in PHP?

First, note that what I need is the reverse of PHP's in_array function.
I need to search a string for every item of an array: if any of them is found, the function returns true; otherwise it returns false.
I need the fastest solution to this problem. Of course, this can be achieved by iterating over the array and using the strpos function.
Any suggestions are welcome.
Example Data:
$string = 'Alice goes to school every day';
$searchWords = array('basket','school','tree');
returns true
$string = 'Alice goes to school every day';
$searchWords = array('basket','cat','tree');
returns false
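For reference, the plain strpos loop mentioned above could look like the following sketch. The name contains_any is hypothetical, and the match is a case-sensitive substring match, so 'cat' would also be found inside 'category'.

```php
<?php
// Hypothetical helper: returns true as soon as one of the words is found.
function contains_any($string, array $words) {
    foreach ($words as $word) {
        // Compare with false explicitly: a match at offset 0 is also a hit.
        if (strpos($string, $word) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains_any('Alice goes to school every day', array('basket', 'school', 'tree'))); // bool(true)
var_dump(contains_any('Alice goes to school every day', array('basket', 'cat', 'tree')));    // bool(false)
```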

You should try with a preg_match:
if (preg_match('/' . implode('|', $searchWords) . '/', $string)) return true;
After some comments, here is a properly escaped solution:
function contains($string, Array $search, $caseInsensitive = false) {
$exp = '/'
. implode('|', array_map(function($s) { return preg_quote($s, '/'); }, $search))
. ($caseInsensitive ? '/i' : '/');
return preg_match($exp, $string) ? true : false;
}

function searchWords($string,$words)
{
foreach($words as $word)
{
if(stristr($string," " . $word . " ")) //spaces either side to force a word
{
return true;
}
}
return false;
}
Usage:
$string = 'Alice goes to school every day';
$searchWords = array('basket','cat','tree');
if(searchWords($string,$searchWords))
{
//matches
}
Also take note that the function stristr is used to make it not case-sensitive

As per the example of malko, but with properly escaping the values.
function contains( $string, array $search ) {
return 1 === preg_match(
'/' . implode( '|', array_map( function( $s ) { return preg_quote( $s, '/' ); }, $search ) ) . '/',
$string
);
}

If the string can be split on spaces, the following will work:
var_dump(array_intersect(explode(' ', $str), $searchWords) != null);
OUTPUT: for 2 examples you've provided:
bool(true)
bool(false)
Update:
If the string cannot be split on the space character, use code like this to split it at every word boundary:
var_dump(array_intersect(preg_split('~\b~', $str), $searchWords) != null);

There is always debate over what is faster so I thought I'd run some tests using different methods.
Tests Run:
strpos
preg_match with foreach loop
preg_match with regex or
indexed search with string to explode
indexed search as array (string already exploded)
Two sets of tests were run: one on a large text document (114,350 words) and one on a small text document (120 words). Within each set, all tests were run 100 times and then an average was taken. Tests did not ignore case (doing so would have made them all faster). Tests that searched an index used a pre-built index. I wrote the indexing code myself, and I'm sure it was less efficient than it could be, but indexing took 17.92 seconds for the large file and 0.001 seconds for the small file.
Terms searched for included: gazerbeam (NOT found in the document), legally (found in the document), and target (NOT found in the document).
Results in seconds to complete a single test, sorted by speed:
Large File:
0.0000455808639526 (index without explode)
0.0009979915618897 (preg_match using regex or)
0.0011657214164734 (strpos)
0.0023632574081421 (preg_match using foreach loop)
0.0051533532142639 (index with explode)
Small File
0.000003724098205566 (strpos)
0.000005958080291748 (preg_match using regex or)
0.000012607574462891 (preg_match using foreach loop)
0.000021204948425293 (index without explode)
0.000060625076293945 (index with explode)
Notice that strpos is faster than preg_match (using regex or) for small files, but slower for large files. Other factors, such as the number of search terms will of course affect this.
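The "run 100 times and average" procedure can be sketched with a small harness. This is an illustration only: average_time is a hypothetical helper, and the sample text is made up rather than taken from the files used for the numbers above.

```php
<?php
// Hypothetical harness: average wall-clock seconds per call over $runs calls.
function average_time($fn, $runs = 100) {
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $t = microtime(true);
        $fn();
        $total += microtime(true) - $t;
    }
    return $total / $runs;
}

// Made-up sample text with the search term at the very end.
$text = str_repeat('lorem ipsum dolor sit amet ', 1000) . 'legally';
$avg = average_time(function () use ($text) {
    return strpos($text, 'legally') !== false;
});
printf("%.10f seconds per run\n", $avg);
```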
Algorithms Used:
//strpos
$str = file_get_contents('text.txt');
$t = microtime(true);
foreach ($search as $word) if (strpos($str, $word) !== false) break;
$strpos += microtime(true) - $t;
//preg_match
$str = file_get_contents('text.txt');
$t = microtime(true);
foreach ($search as $word) if (preg_match('/' . preg_quote($word) . '/', $str)) break;
$pregmatch += microtime(true) - $t;
//preg_match (regex or)
$str = file_get_contents('text.txt');
// quote each term separately so the | separators stay regex alternation
$orstr = implode('|', array_map('preg_quote', $search));
$t = microtime(true);
if (preg_match('/' . $orstr . '/', $str)) {};
$pregmatchor += microtime(true) - $t;
//index with explode
$str = file_get_contents('textindex.txt');
$t = microtime(true);
$ar = explode(" ", $str);
foreach ($search as $word) {
$start = 0;
$end = count($ar);
do {
$diff = $end - $start;
$pos = floor($diff / 2) + $start;
$temp = $ar[$pos];
if ($word < $temp) {
$end = $pos;
} elseif ($word > $temp) {
$start = $pos + 1;
} elseif ($temp == $word) {
$found = 'true';
break;
}
} while ($diff > 0);
}
$indexwith += microtime(true) - $t;
//index without explode (already in array)
$str = file_get_contents('textindex.txt');
$found = 'false';
$ar = explode(" ", $str);
$t = microtime(true);
foreach ($search as $word) {
$start = 0;
$end = count($ar);
do {
$diff = $end - $start;
$pos = floor($diff / 2) + $start;
$temp = $ar[$pos];
if ($word < $temp) {
$end = $pos;
} elseif ($word > $temp) {
$start = $pos + 1;
} elseif ($temp == $word) {
$found = 'true';
break;
}
} while ($diff > 0);
}
$indexwithout += microtime(true) - $t;
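The two "index" variants are a binary search over a sorted word list. A compact sketch of that search, with explicit bounds, might look like this; sorted_contains is a hypothetical name, and it assumes the array was sorted with byte-wise string comparison so that it agrees with strcmp().

```php
<?php
// Hypothetical helper: binary search over a sorted, zero-indexed array of words.
function sorted_contains(array $sorted, $word) {
    $low = 0;
    $high = count($sorted) - 1;
    while ($low <= $high) {
        $mid = (int) (($low + $high) / 2);
        $cmp = strcmp($word, $sorted[$mid]);
        if ($cmp === 0) {
            return true;
        }
        if ($cmp < 0) {
            $high = $mid - 1; // look in the lower half
        } else {
            $low = $mid + 1;  // look in the upper half
        }
    }
    return false;
}

$index = array('Alice', 'goes', 'to', 'school', 'every', 'day');
sort($index, SORT_STRING); // same ordering that strcmp() uses
var_dump(sorted_contains($index, 'school')); // bool(true)
var_dump(sorted_contains($index, 'cat'));    // bool(false)
```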

try this:
$string = 'Alice goes to school every day';
$words = explode(" ", $string); // split() was removed in PHP 7
$searchWords = array('basket','school','tree');
for($x = 0,$l = count($words); $x < $l;) {
if(in_array($words[$x++], $searchWords)) {
//....
}
}

The function below counts how many words of the string are found in the array (it stops at the first match unless $matches is true):
function inString($str, $arr, $matches=false)
{
$str = explode(" ", $str);
$c = 0;
for($i = 0; $i<count($str); $i++)
{
if(in_array($str[$i], $arr) )
{$c++;if($matches == false)break;}
}
return $c;
}

The link below should help; you just need to customize it as required.
Check if array element exists in string
customized:
function result_arrayInString($prdterms,208){
if(arrayInString($prdterms,208)){
return true;
}else{
return false;
}
}
This may be helpful to you.

Reverse letters in each word of a string without using native splitting or reversing functions [duplicate]

This question already has answers here:
Reverse the letters in each word of a string
(6 answers)
Closed 1 year ago.
This task has already been asked/answered, but I recently had a job interview that imposed some additional challenges to demonstrate my ability to manipulate strings.
Problem: How to reverse words in a string? You can use strpos(), strlen() and substr(), but not other very useful functions such as explode(), strrev(), etc.
Example:
$string = "I am a boy"
Answer:
I ma a yob
Below is my working coding attempt that took me 2 days [sigh], but there must be a more elegant and concise solution.
Intention:
1. get number of words
2. based on word count, grab each word and store into array
3. loop through array and output each word in reverse order
Code:
$str = "I am a boy";
echo reverse_word($str) . "\n";
function reverse_word($input) {
//first find how many words in the string based on whitespace
$num_ws = 0;
$p = 0;
while(strpos($input, " ", $p) !== false) {
$num_ws ++;
$p = strpos($input, ' ', $p) + 1;
}
echo "num ws is $num_ws\n";
//now start grabbing word and store into array
$p = 0;
for($i=0; $i<$num_ws + 1; $i++) {
$ws_index = strpos($input, " ", $p);
//if no more ws, grab the rest
if($ws_index === false) {
$word = substr($input, $p);
}
else {
$length = $ws_index - $p;
$word = substr($input, $p, $length);
}
$result[] = $word;
$p = $ws_index + 1; //move onto first char of next word
}
print_r($result);
//append reversed words
$str = '';
for($i=0; $i<count($result); $i++) {
$str .= reverse($result[$i]) . " ";
}
return $str;
}
function reverse($str) {
$a = 0;
$b = strlen($str)-1;
while($a < $b) {
swap($str, $a, $b);
$a ++;
$b --;
}
return $str;
}
function swap(&$str, $i1, $i2) {
$tmp = $str[$i1];
$str[$i1] = $str[$i2];
$str[$i2] = $tmp;
}

$string = "I am a boy";
$reversed = "";
$tmp = "";
for($i = 0; $i < strlen($string); $i++) {
if($string[$i] == " ") {
$reversed .= $tmp . " ";
$tmp = "";
continue;
}
$tmp = $string[$i] . $tmp;
}
$reversed .= $tmp;
print $reversed . PHP_EOL;
>> I ma a yob

Whoops! Mis-read the question. Here you go (Note that this will split on all non-letter boundaries, not just space. If you want a character not to be split upon, just add it to $wordChars):
function revWords($string) {
//We need to find word boundaries
$wordChars = 'abcdefghijklmnopqrstuvwxyz';
$buffer = '';
$return = '';
$len = strlen($string);
$i = 0;
while ($i < $len) {
$chr = $string[$i];
$ord = ord($chr); // the bit tests need the byte value, not the string
if (($ord & 0xC0) == 0xC0) {
//UTF8 Character!
if (($ord & 0xF0) == 0xF0) {
//4 Byte Sequence
$chr .= substr($string, $i + 1, 3);
$i += 3;
} elseif (($ord & 0xE0) == 0xE0) {
//3 Byte Sequence
$chr .= substr($string, $i + 1, 2);
$i += 2;
} else {
//2 Byte Sequence
$i++;
$chr .= $string[$i];
}
}
if (stripos($wordChars, $chr) !== false) {
$buffer = $chr . $buffer;
} else {
$return .= $buffer . $chr;
$buffer = '';
}
$i++;
}
return $return . $buffer;
}
Edit: Now it's a single function, and stores the buffer naively in reversed notation.
Edit2: Now handles UTF8 characters (just add "word" characters to the $wordChars string)...

My answer is to count the string length, split the letters into an array and then, loop it backwards. This is also a good way to check if a word is a palindrome. This can only be used for regular string and numbers.
preg_split can be changed to explode() as well.
/**
* Code snippet to reverse a string (LM)
*/
$words = array('one', 'only', 'apple', 'jobs');
foreach ($words as $d) {
$strlen = strlen($d);
$splits = preg_split('//', $d, -1, PREG_SPLIT_NO_EMPTY);
$reverse = '';
// the last valid index is $strlen - 1
for ($i = $strlen - 1; $i >= 0; $i=$i-1) {
$reverse .= $splits[$i];
}
echo "Regular: {$d}".PHP_EOL;
echo "Reverse: {$reverse}".PHP_EOL;
echo "-----".PHP_EOL;
unset($reverse);
}

Without using any function.
$string = 'I am a boy';
$newString = '';
$temp = '';
$i = 0;
while(isset($string[$i]))
{
if($string[$i] == ' ') {
$newString .= $temp . ' ';
$temp = '';
}
else {
$temp = $string[$i] . $temp;
}
$i++;
}
$newString .= $temp;
echo $newString;
Output: I ma a yob

PHP Multiple Occurrences Of Words Within A String

I need to check a string to see if any word in it occurs more than once. So basically I will accept:
"google makes love"
but I don't accept:
"google makes google love" or "google makes love love google" etc.
Any ideas? Really don't know any way to approach this, any help would be greatly appreciated.

Based on Wicked Flea's code:
function single_use_of_words($str) {
$words = explode(' ', trim($str)); //Trim to prevent any extra blank
if (count(array_unique($words)) == count($words)) {
return true; //Same amount of words
}
return false;
}

Try this:
function single_use_of_words($str) {
$words = explode(' ', $str);
$words = array_unique($words);
return implode(' ', $words);
}

No need for loops or arrays:
<?php
$needle = 'cat';
$haystack = 'cat in the cat hat';
if ( occursMoreThanOnce($haystack, $needle) ) {
echo 'Success';
}
function occursMoreThanOnce($haystack, $needle) {
return strpos($haystack, $needle) !== strrpos($haystack, $needle);
}
?>

<?php
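// The pattern needs delimiters, and the flags go in the 4th argument (the 3rd is the limit).
// Splitting on runs of non-word characters keeps only the words themselves.
$words = preg_split('/\W+/', $string, -1, PREG_SPLIT_NO_EMPTY);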
$wordsUnique = array_unique($words);
if (count($words) != count($wordsUnique)) {
echo 'Duplicate word found!';
}
?>

The regular expression way would definitely be my choice.
I did a little test on a string of 320 words with Veynom's function and a regular expression
function preg( $txt ) {
return !preg_match( '/\b(\w+)\b.*?\1/', $txt );
}
Here's the test
$time['preg'] = microtime( true );
for( $i = 0; $i < 1000; $i++ ) {
preg( $txt );
}
$time['preg'] = microtime( true ) - $time['preg'];
$time['veynom-thewickedflea'] = microtime( true );
for( $i = 0; $i < 1000; $i++ ) {
single_use_of_words( $txt );
}
$time['veynom-thewickedflea'] = microtime( true ) - $time['veynom-thewickedflea'];
print_r( $time );
And here's the result I got
Array
(
[preg] => 0.197616815567
[veynom-thewickedflea] => 0.487532138824
)
Which suggests that the RegExp solution, as well as being a lot more concise, is more than twice as fast (for a string of 320 words and 1000 iterations).
When I run the test over 10 000 iterations I get
Array
(
[preg] => 1.51235699654
[veynom-thewickedflea] => 4.99487900734
)
The non RegExp solution also uses a lot more memory.
So.. Regular Expressions for me cos they've got a full tank of gas
EDIT
The text I tested against has duplicate words. If it doesn't, the results may be different. I'll post another set of results.
Update
With the duplicates stripped out (now 186 words), the results for 1000 iterations are:
Array
(
[preg] => 0.235826015472
[veynom-thewickedflea] => 0.2528860569
)
About even.

function Accept($str)
{
$words = explode(" ", trim($str));
$len = count($words);
for ($i = 0; $i < $len; $i++)
{
for ($p = 0; $p < $len; $p++)
{
if ($p != $i && $words[$i] == $words[$p])
{
return false;
}
}
}
return true;
}
EDIT
Entire test script. Note that when echoing false, PHP prints nothing, while true is printed as "1".
<?php
function Accept($str)
{
$words = explode(" ", trim($str));
$len = count($words);
for ($i = 0; $i < $len; $i++)
{
for ($p = 0; $p < $len; $p++)
{
if ($p != $i && $words[$i] == $words[$p])
{
return false;
}
}
}
return true;
}
echo Accept("google makes love"), ", ", Accept("google makes google love"), ", ",
Accept("google makes love love google"), ", ", Accept("babe health insurance babe");
?>
Prints the correct output:
1, , ,

This seems fairly fast. It would be interesting to see (for all the answers) how the memory usage and time taken increase as you increase the length of the input string.
function check($str) {
//remove double spaces
$c = 1;
while ($c) $str = str_replace('  ', ' ', $str, $c);
//split into array of words
$words = explode(' ', $str);
foreach ($words as $key => $word) {
//remove current word from array
unset($words[$key]);
//if it still exists in the array it must be duplicated
if (in_array($word, $words)) {
return false;
}
}
return true;
}
Edit
Fixed issue with multiple spaces. I'm not sure whether it is better to remove these at the start (as I have) or check each word is non-empty in the foreach.
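As a rough way to look at how time and memory grow with input length, something like the sketch below could be used. Everything in it is an assumption for illustration: timeCheck and $checker are hypothetical names, the generated input contains no duplicates (the worst case for an early-exit check), and $checker is only a stand-in that you would swap for whichever answer you want to measure.

```php
<?php
// Hypothetical harness: runs $fn repeatedly on a generated string of
// $wordCount distinct words and returns the elapsed time in seconds.
function timeCheck(callable $fn, int $wordCount, int $iterations = 100): float
{
    // Build "w0 w1 w2 ..." so that no word repeats
    $words = array();
    for ($i = 0; $i < $wordCount; $i++) {
        $words[] = 'w' . $i;
    }
    $input = implode(' ', $words);

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

// Stand-in checker, for illustration only; replace with the function under test
$checker = function (string $str): bool {
    $words = explode(' ', $str);
    return count(array_unique($words)) === count($words);
};

foreach (array(100, 1000, 10000) as $n) {
    printf("%6d words: %.4f s, peak memory %d bytes\n",
        $n, timeCheck($checker, $n), memory_get_peak_usage());
}
```

Note that memory_get_peak_usage() reports the peak for the whole script so far, so to compare answers fairly each one should be measured in a separate run.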

The simplest method is to loop through each word and check against all previous words for duplicates.
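A minimal sketch of that idea, under some assumptions: the function name hasNoRepeatedWord is made up, words are taken to be separated by whitespace, and the comparison is case-sensitive. Instead of rescanning the earlier words each time, it records them as keys of an associative array, which gives the same result with a cheaper lookup.

```php
<?php
// Hypothetical function name, for illustration only.
// Returns true when no word appears more than once in $str.
function hasNoRepeatedWord(string $str): bool
{
    $seen = array();
    // Split on runs of whitespace so repeated spaces do not create empty words
    foreach (preg_split('/\s+/', trim($str), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // $seen holds every earlier word as a key
        if (isset($seen[$word])) {
            return false;
        }
        $seen[$word] = true;
    }
    return true;
}

var_dump(hasNoRepeatedWord('google makes love'));        // bool(true)
var_dump(hasNoRepeatedWord('google makes google love')); // bool(false)
```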

Regular expression with backreferencing
http://www.regular-expressions.info/php.html
http://www.regular-expressions.info/named.html
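For example, a sketch of the backreference approach (hasRepeatedWord is a hypothetical name, and "word" here means a run of \w characters): the group captures one whole word, and \b\1\b then requires the same text to appear again later as a whole word, so "the" is not matched inside "then".

```php
<?php
// Hypothetical function name, for illustration only.
// \b(\w+)\b captures a whole word; .* skips ahead (the s flag lets it cross
// newlines); \b\1\b demands the captured word again, bounded on both sides.
function hasRepeatedWord(string $txt): bool
{
    return preg_match('/\b(\w+)\b.*\b\1\b/s', $txt) === 1;
}

var_dump(hasRepeatedWord('google makes love'));        // bool(false)
var_dump(hasRepeatedWord('google makes google love')); // bool(true)
```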
