Need Help Optimizing a PHP String Difference Function

I created this function and it works on small strings, but on longer strings it times out. I'm looking for a way to make the function faster so it doesn't time out, or a better way to accomplish what I want.
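function find_diffs($string1, $string2)
{
// Split both strings into tokens on word boundaries
$array1 = preg_split("/\b/", $string1);
$array2 = preg_split("/\b/", $string2);
$array3 = array();
$end = false; // initialise the flag tested below to avoid an undefined-variable notice
// badchars() is a helper not shown here; it appears to report whether a token should be skipped
for($i=0, $j=0; $i < count($array1) || $j < count($array2); $i++, $j++)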
{
while(badchars($array1, $i))
{
$i++;
}
while(badchars($array2, $j))
{
$j++;
}
if($array1[$i] != $array2[$j])
{
//-------------------------Find Subtractions--------------------//
$k = $i;
while($array1[$i] != $array2[$j])
{
$i++;
if($i == count($array1))
{
$end = true;
break;
}
while(badchars($array1, $i))
{
$i++;
}
}
if($end)
{
//-------------------------Find Additions--------------------//
$end = false;
$i = $k;
$k = $j;
while($array1[$i] != $array2[$j])
{
$j++;
if($j == count($array2))
{
$end = true;
break;
}
while(badchars($array2, $j))
{
$j++;
}
}
if($end)
{
//-------------------------Find Changes--------------------//
$end = false;
$j = $k;
$l = $i;
while($array1[$i] != $array2[$j])
{
$k = $j;
while($array1[$i] != $array2[$j])
{
$j++;
if($j == count($array2))
{
$end = true;
break;
}
while(badchars($array2, $j))
{
$j++;
}
}
if($end)
{
$j = $k;
$i++;
while(badchars($array1, $i))
{
$i++;
}
while(badchars($array2, $j))
{
$j++;
}
}
else
{
$array3[] = array($l,$i,'-');
$array3[] = array($k,$j,'+');
}
if($i == count($array1))
{
$end = true;
break;
}
if($j == count($array2))
{
$end = true;
break;
}
$end=false;
}
if($end)
{
break;
}
else
{
$array3[] = array($l,$i,'-');
$array3[] = array($k,$j,'+');
}
//---------------------End Find Changes--------------------//
}
else
{
$array3[] = array($k,$j,'+');
}
}
else
{
$array3[] = array($k,$i,'-');
}
}
}
$array3[] = array(0,count($array1),'=');
return array($array1,$array2,$array3);
}

Don't reinvent the wheel. This is the sort of thing that is easy to get wrong and hard to get right.
Check out the Text_Diff PEAR package. I have used it for this sort of thing and it is very well done.
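For readers curious about what a diff library does underneath, here is a minimal sketch of a word-level diff built on the classic longest-common-subsequence (LCS) table. It is not Text_Diff's code, and the function name word_diff() and the whitespace tokenisation are assumptions for illustration. It avoids the repeated rescanning in find_diffs(), but still uses O(n·m) time and memory, so very long inputs remain a reason to prefer a tuned library.

```php
<?php
// Sketch: word-level diff via an LCS table (hypothetical helper, not Text_Diff's API).
// Returns a list of [op, word] pairs: '=' unchanged, '-' only in $a, '+' only in $b.
function word_diff(string $a, string $b): array
{
    // Assumption: split on whitespace rather than the asker's \b split
    $x = preg_split('/\s+/', trim($a), -1, PREG_SPLIT_NO_EMPTY);
    $y = preg_split('/\s+/', trim($b), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($x);
    $m = count($y);

    // $L[$i][$j] = length of the LCS of $x[$i..] and $y[$j..]
    $L = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $L[$i][$j] = ($x[$i] === $y[$j])
                ? $L[$i + 1][$j + 1] + 1
                : max($L[$i + 1][$j], $L[$i][$j + 1]);
        }
    }

    // Walk the table forwards, following the longer remaining subsequence
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($x[$i] === $y[$j]) {
            $ops[] = ['=', $x[$i]];
            $i++;
            $j++;
        } elseif ($L[$i + 1][$j] >= $L[$i][$j + 1]) {
            $ops[] = ['-', $x[$i++]];
        } else {
            $ops[] = ['+', $y[$j++]];
        }
    }
    // Whatever is left over exists in only one of the inputs
    while ($i < $n) {
        $ops[] = ['-', $x[$i++]];
    }
    while ($j < $m) {
        $ops[] = ['+', $y[$j++]];
    }
    return $ops;
}

print_r(word_diff("the quick brown fox", "the slow brown dog"));
```

Each '-'/'+' pair next to each other corresponds to what find_diffs() calls a "change".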


Solution for the Missing Integer problem of Codility in PHP with a 100% score

Problem: Find the smallest positive integer that does not occur in a given sequence.
What is the best implementation in PHP for this Codility problem?
The solution below scores 66% because of performance issues.
function solution($A)
{
sort($A);
$end = count($A);
$flag = false;
for ($k = 0; $flag == false; $k++, $flag = false) {
for ($i = 0; $i < $end; $i++) {
if ($k + 1 == $A[$i]) {
$flag = $A[$i];
break;
}
}
if($flag == false){
return $k +1;
}
}
}
A simple solution using an associative array as a set:
function solution($A) {
$set = array_flip($A);
for ($n = 1; ; ++$n) {
if (!isset($set[$n])) {
return $n;
}
}
}
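Here is a solution for the Codility problem implemented in PHP that scores 100%. It sorts the array, then scans it once, advancing the candidate $k each time it is found:
function solution($A)
{
    sort($A);
    $end = count($A);
    $k = 1; // smallest positive integer not yet seen
    for ($i = 0; $i < $end; $i++) {
        if ($A[$i] == $k) {
            $k++;        // $k is present, try the next one
        } elseif ($A[$i] > $k) {
            return $k;   // sorted order: $k can no longer appear
        }
        // values below $k (negatives, zero, duplicates) are skipped
    }
    return $k;
}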
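Both answers above rely on sort() (O(n log n)) or on array_flip(). A further option, sketched here as an assumption rather than anything from the answers, uses the fact that n values can cover at most the integers 1..n, so the answer always lies in 1..n+1. Marking which of those values occur gives an O(n) pass; the name solution_linear() is hypothetical.

```php
<?php
// Sketch: O(n) time, O(n) extra space (hypothetical alternative to the sort-based answer).
// The answer is always in 1..n+1, so only values in 1..n need to be recorded.
function solution_linear(array $A): int
{
    $n = count($A);
    $seen = array_fill(1, $n + 1, false); // keys 1..n+1, all initially false
    foreach ($A as $v) {
        if ($v >= 1 && $v <= $n) {
            $seen[$v] = true;
        }
    }
    for ($k = 1; $k <= $n + 1; $k++) {
        if (!$seen[$k]) {
            return $k;
        }
    }
    return $n + 1; // not reached: $seen[$n + 1] is never set
}

echo solution_linear([1, 3, 6, 4, 1, 2]); // 5
```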

Optimize substrings anagram compare algorithm

I'm trying to solve a challenge where you have to find all pairs of substrings of a string that are anagrams of each other. For example, for S=abba the anagrammatic pairs are: {S[1,1],S[4,4]}, {S[1,2],S[3,4]}, {S[2,2],S[3,3]} and {S[1,3],S[2,4]}.
The problem is that my string has 100 characters and the execution time must be below 9 seconds, while mine takes around 50 seconds. My code is below. I would appreciate any advice; directions or pseudocode are even better than full code.
$time1 = microtime(true);
$string = 'abdcasdabvdvafsgfdsvafdsafewsrgsdcasfsdfgxccafdsgccafsdgsdcascdsfsdfsdgfadasdgsdfawdascsdsasdasgsdfs';
$arr = [];
$len = strlen($string);
for ($i = 0; $i < strlen($string); $i++) {
if ($i === 0) {
for ($j = 1; $j <= $len - 1; $j++) {
$push = substr($string, $i, $j);
array_push($arr, $push);
}
} else {
for ($j = 1; $j <= $len - $i; $j++) {
$push = substr($string, $i, $j);
array_push($arr, $push);
}
}
}
$br = 0;
$arrLength = count($arr);
foreach ($arr as $key => $val) {
if ($key === count($arr) - 1) {
break;
}
for ($k = $key + 1; $k < $arrLength; $k++) {
if (is_anagram($val, $arr[$k]) === true) {
$br++;
}
}
}
echo $br."<br>";
function is_anagram($a, $b)
{
$result = (count_chars($a, 1) == count_chars($b, 1));
return $result;
}
$time2 = microtime(true);
echo "Script execution time: ".($time2-$time1);
Edit:
Hi again, today I had some time, so I tried to optimize it but couldn't crack this. This is my new code, but I think it got worse. Any advanced suggestions?
<?php
$string = 'abdcasdabvdvafsgfdsvafdsafewsrgsdcasfsdfgxccafdsgccafsdgsdcascdsfsdfsdgfadasdgsdfawdascsdsasdasgsdfs';
$arr = [];
$len = strlen($string);
for ($i = 0; $i < strlen($string); $i++) {
if ($i === 0) {
for ($j = 1; $j <= $len - 1; $j++) {
$push = substr($string, $i, $j);
array_push($arr, $push);
}
} else {
for ($j = 1; $j <= $len - $i; $j++) {
$push = substr($string, $i, $j);
array_push($arr, $push);
}
}
}
$br = 0;
$arrlen = count ($arr);
foreach ($arr as $key => $val) {
if (($key === $arrlen - 1)) {
break;
}
for ($k = $key + 1; $k < $arrlen; $k++) {
$result = stringsCompare($val,$arr[$k]);
if ($result === true)
{
$br++;
}
}
}
echo $br."\n"; // print the total once, after the loop
function stringsCompare($a,$b)
{
$lenOne = strlen($a);
$lenTwo = strlen ($b);
if ($lenOne !== $lenTwo)
{
return false;
}
else {
$fail = 0;
if ($lenOne === 1) {
if ($a === $b) {
return true;
}
else
{
return false;
}
}
else
{
for ($x = 0; $x < $lenOne; $x++)
{
$position = strpos($b,$a[$x]);
if($position === false)
{
$fail = 1;
break;
}
else
{
$b[$position] = 0;
$fail = 0;
}
}
if ($fail === 1)
{
return false;
}
else
{
return true;
}
}
}
}
?>
You should look for another property that all anagrams of a given string share, for example the number of occurrences of each character.
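One way to apply that hint, sketched under assumptions (the function name count_anagram_pairs() is hypothetical, and sorted letters are used as the shared property): give every substring a canonical key, its characters sorted. Two substrings are anagrams exactly when their keys match, so instead of comparing all pairs you count how many substrings share each key; a group of size k contributes k·(k−1)/2 pairs. For 100 characters that is about 5,050 substrings, each handled once.

```php
<?php
// Sketch: count anagrammatic substring pairs by grouping on a canonical key.
// Key = the substring's characters in sorted order; equal keys <=> anagrams.
function count_anagram_pairs(string $s): int
{
    $len = strlen($s);
    $groups = []; // key => number of substrings with that key
    for ($i = 0; $i < $len; $i++) {
        for ($j = 1; $i + $j <= $len; $j++) {
            $chars = str_split(substr($s, $i, $j));
            sort($chars);
            $key = implode('', $chars);
            $groups[$key] = ($groups[$key] ?? 0) + 1;
        }
    }
    // A group of k mutually anagrammatic substrings yields k*(k-1)/2 pairs
    $pairs = 0;
    foreach ($groups as $k) {
        $pairs += intdiv($k * ($k - 1), 2);
    }
    return $pairs;
}

echo count_anagram_pairs('abba'); // 4
```

A 26-entry letter-count vector (as the answer suggests) would work equally well as the key; sorting is just the shortest to write.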

Sum of Prime numbers

I have the following code to output all prime numbers from an array. I would like to get the sum of the output, e.g. 2+3+5 = 10. Any hints on how to do that?
$n = array(1,2,3,4,5,6);
function prime($n){
for($i=0;$i<= count($n);$i++){
$counter = 0;
for($j=1;$j<=$i;$j++){
if($i % $j==0){
$counter++;
}
}
if($counter == 2){
print $i."<br/>";
}
}
}
print prime($n);
This should work for you.
(Here I used $sum, which I initialized before the foreach loop, and then used the += operator to add each prime to it.)
<?php
$n = array(1,2,3,4,5,6);
function prime($n){
$sum = 0;
foreach($n as $k => $v) {
$counter = 0;
for($j = 1; $j <= $v; $j++) {
if($v % $j == 0)
$counter++;
}
if($counter == 2) {
echo $v."<br/>";
$sum += $v;
}
}
echo "Sum: " . $sum;
}
prime($n);
?>
Output:
2
3
5
Sum: 10
As @IMSoP commented above, one option is to compile the list of primes into a new array:
$m = [];
// looping code...
// If prime:
array_push( $m, $primeNumber );
Then, when you're done, you can do your printing mechanism:
print implode( "<br />", $m );
And then you can do your summing mechanism:
print "<p>Sum: " . array_sum( $m ) . "</p>";
The added benefit here is that you can split each piece of functionality into its own function or method (which you should do for a good design).
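Assembling the fragments above into one runnable piece might look like the following sketch. The helper names isPrimeNumber() and collectPrimes() are hypothetical; the primality test is plain trial division up to the square root, which is an assumption rather than part of the answer.

```php
<?php
// Sketch of the collect-then-sum approach: gather primes into $m,
// then print them with implode() and total them with array_sum().
function isPrimeNumber(int $v): bool
{
    if ($v < 2) {
        return false; // 0, 1 and negatives are not prime
    }
    for ($d = 2; $d * $d <= $v; $d++) {
        if ($v % $d === 0) {
            return false;
        }
    }
    return true;
}

function collectPrimes(array $n): array
{
    $m = [];
    foreach ($n as $v) {
        if (isPrimeNumber($v)) {
            $m[] = $v; // same effect as array_push($m, $v)
        }
    }
    return $m;
}

$m = collectPrimes([1, 2, 3, 4, 5, 6]);
print implode("<br />", $m);
print "<p>Sum: " . array_sum($m) . "</p>";
```

Keeping the collection, printing and summing separate is exactly what makes each step reusable.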
Try this:
<?php
define('N', 200);
function isPrime($num)
{
if ($num == 2 || $num == 3) { return 1; }
if (!($num%2) || $num<2) { return 0; } // even numbers, 1 and below are not prime
for ($n = 3; $n <= $num/2; $n += 2) {
if (!($num%$n)) {
return 0;
}
}
return 1;
}
$sum = 0; // initialise before accumulating
for ($i = 2; $i <= N; $i++) {
if (isPrime($i)) {
$sum += $i;
}
}
echo $sum;
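When the goal is the sum of all primes up to a limit like N, a sieve of Eratosthenes avoids testing each number separately. This is a swapped-in technique, not the answer's code, and the function name sumPrimesUpTo() is hypothetical.

```php
<?php
// Sketch: sum all primes <= $limit with a sieve of Eratosthenes
// instead of trial-dividing every candidate.
function sumPrimesUpTo(int $limit): int
{
    if ($limit < 2) {
        return 0;
    }
    // $isPrime[$k] starts true for every k in 0..$limit; 0 and 1 are cleared
    $isPrime = array_fill(0, $limit + 1, true);
    $isPrime[0] = $isPrime[1] = false;
    for ($p = 2; $p * $p <= $limit; $p++) {
        if ($isPrime[$p]) {
            // Cross off multiples of $p, starting at $p squared
            for ($q = $p * $p; $q <= $limit; $q += $p) {
                $isPrime[$q] = false;
            }
        }
    }
    $sum = 0;
    foreach ($isPrime as $k => $flag) {
        if ($flag) {
            $sum += $k;
        }
    }
    return $sum;
}

echo sumPrimesUpTo(200); // 4227
```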
You can try something like this:
function isPrime($n){
if($n < 2) return false; // 0, 1 and negatives are not prime
if($n == 2) return true;
for($x = 2; $x <= sqrt($n); $x++){
if($n % $x == 0) return false;
}
return true;
}
$sum = 0;
$n = array(1,2,3,4,5,6);
foreach($n as $val){
if(isPrime($val)) {
echo $val . "<br />";
$sum += $val;
}
}
echo "Sum: " . $sum;
<?php
$num = 100;
for($j=2;$j<$num;$j++)
{
for($k=2;$k<$j;$k++)
{
if($j%$k==0)
{
break;
}
}
if($k==$j)
{
$prime_no[]=$j;
}
}
echo "<pre>";
print_r($prime_no);
echo "</pre>";
for($j=0;$j<count($prime_no)-1;$j++) // stop one early so $prime_no[$j+1] exists
{
$myprimeAdd = $prime_no[$j] + $prime_no[$j+1];
if(in_array($myprimeAdd,$prime_no))
{
echo "Resultant Prime No:-", $myprimeAdd;
echo nl2br("\n");
break;
}
}
?>

PHP Array Building - How to write it better

I've been racking my brain over how to write this better AND make it loop depending on the $count value:
if($count == 2){
$thenode = $tree[$splitnode[0]][$splitnode[1]];
} elseif($count == 3){
$thenode = $tree[$splitnode[0]][$splitnode[1]][$splitnode[2]];
}
Any ideas? Thanks!
$thenode = $tree;
for ($i = 0; $i < $count; $i++) {
$thenode = $thenode[$splitnode[$i]];
}
var_dump($thenode);
or
$thenode = array_reduce(
range(0, $count - 1),
function ($thenode, $i) use ($splitnode) { return $thenode[$splitnode[$i]]; },
$tree
);
or maybe
$thenode = $tree;
foreach ($splitnode as $i => $sn) {
if ($i >= $count) {
break;
}
$thenode = $thenode[$sn];
}
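All three versions assume every key along the path exists; a missing key raises an "undefined index/array key" warning. If some paths may be absent, a guarded variant such as this sketch returns null instead. The function name getNode() and the sample $tree are hypothetical.

```php
<?php
// Sketch: walk $tree along the first $count keys of $splitnode,
// returning null (instead of a warning) when a key is missing.
function getNode(array $tree, array $splitnode, int $count)
{
    $node = $tree;
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($node) || !array_key_exists($splitnode[$i], $node)) {
            return null;
        }
        $node = $node[$splitnode[$i]];
    }
    return $node;
}

$tree = ['a' => ['b' => ['c' => 42]]];
var_dump(getNode($tree, ['a', 'b', 'c'], 3)); // int(42)
var_dump(getNode($tree, ['a', 'x'], 2));      // NULL
```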

PDF to XML conversion using PHP

I need help converting PDF to XML using PHP.
Some sites claim to do this, but they charge for it.
I have to write my own code in PHP for that.
Being a novice in PHP, I don't know how to approach this task.
So if anyone has worked on this, please help me.
Any help would be highly appreciated.
PDFX does PDF-to-XML conversion and it's free to use. It might be helpful in your case as it can extract things like images and captions separately.
Example input/output can be found here.
The usage page includes a simple PHP client example.
(Disclosure: It is my system.)
PDF2HTML will convert to HTML or XML (using the -xml flag), but the result is a bit of a mess. You get lots of small chunks of information about the location of fragments of text, which is no good if you want to extract paragraphs or sections of text. You may be able to isolate images with a suitable XPath expression.
If you do need paragraphs or sections of text, it appears you have to do it heuristically. Geert's blog has an interesting approach:
Isolating text runs in different zones (like header and footer)
Gathering text runs on the same ‘line’ (ignoring columns here)
Translate indentation to hierarchy (helps finding lists, provides bare table/column handling)
Merging of lines to build paragraphs
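To make the last step concrete, here is a deliberately naive sketch that takes already-extracted plain text, merges consecutive non-empty lines into paragraphs (a blank line ends a paragraph), and emits XML with PHP's DOM extension. The function name textToXml() and the element names <document> and <paragraph> are assumptions; real PDF text would need the zone and indentation steps above first.

```php
<?php
// Sketch: naive "merge lines into paragraphs" step, output as XML.
// Consecutive non-empty lines form one <paragraph>; a blank line closes it.
function textToXml(string $text): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->formatOutput = true;
    $root = $dom->appendChild($dom->createElement('document'));

    $buffer = [];
    // Append blank lines so the final paragraph is flushed as well
    foreach (preg_split('/\R/', $text . "\n\n") as $line) {
        $line = trim($line);
        if ($line !== '') {
            $buffer[] = $line;
            continue;
        }
        if ($buffer) {
            $p = $dom->createElement('paragraph');
            // A text node is escaped (&, <, >) automatically on output
            $p->appendChild($dom->createTextNode(implode(' ', $buffer)));
            $root->appendChild($p);
            $buffer = [];
        }
    }
    return $dom->saveXML();
}

echo textToXml("First line\ncontinues here.\n\nSecond paragraph & more.");
```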
You can use this class to parse a PDF into a string and then work with it:
<?php class PDF2Text2 {
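var $multibyte = 4; // hex digits per character code: 4 for Unicode, 2 otherwise (see setUnicode())
var $convertquotes = ENT_QUOTES; // flags passed to html_entity_decode()
var $showprogress = true; // flush output while decoding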
var $filename = '';
var $decodedtext = '';
function setFilename($filename) {
// Reset
$this->decodedtext = '';
$this->filename = $filename;
}
function output($echo = false) {
if($echo) echo $this->decodedtext;
else return $this->decodedtext;
}
function setUnicode($input) {
// 4 for unicode. But 2 should work in most cases just fine
if($input == true) $this->multibyte = 4;
else $this->multibyte = 2;
}
function decodePDF() {
// Read the data from pdf file
$infile = @file_get_contents($this->filename);
if (empty($infile))
return "";
// Get all text data.
$transformations = array();
$texts = array();
// Get the list of all objects.
preg_match_all("#obj[\n\r](.*)endobj[\n\r]#ismU", $infile . "endobj\r", $objects);
$objects = @$objects[1];
// Select objects with streams.
for ($i = 0; $i < count($objects); $i++) {
$currentObject = $objects[$i];
// Prevent time-out
@set_time_limit(0); // 0 = no time limit
if($this->showprogress) { // echo ". ";
flush(); ob_flush();
}
// Check if an object includes data stream.
if (preg_match("#stream[\n\r](.*)endstream[\n\r]#ismU", $currentObject . "endstream\r", $stream )) {
$stream = ltrim($stream[1]);
// Check object parameters and look for text data.
$options = $this->getObjectOptions($currentObject);
if (!(empty($options["Length1"]) && empty($options["Type"]) && empty($options["Subtype"])) )
if ( $options["Image"] && $options["Subtype"] )
if (!(empty($options["Length1"]) && empty($options["Subtype"])) )
continue;
// Hack, length doesnt always seem to be correct
unset($options["Length"]);
// So, we have text data. Decode it.
$data = $this->getDecodedStream($stream, $options);
if (strlen($data)) {
if (preg_match_all("#BT[\n\r](.*)ET[\n\r]#ismU", $data . "ET\r", $textContainers)) {
$textContainers = @$textContainers[1];
$this->getDirtyTexts($texts, $textContainers);
} else
$this->getCharTransformations($transformations, $data);
}
}
}
// Analyze text blocks taking into account character transformations and return results.
$this->decodedtext = $this->getTextUsingTransformations($texts, $transformations);
}
function decodeAsciiHex($input) {
$output = "";
$isOdd = true;
$isComment = false;
for($i = 0, $codeHigh = -1; $i < strlen($input) && $input[$i] != '>'; $i++) {
$c = $input[$i];
if($isComment) {
if ($c == "\r" || $c == "\n") // double quotes, so these are real control characters
$isComment = false;
continue;
}
switch($c) {
case "\0": case "\t": case "\r": case "\f": case "\n": case ' ': break;
case '%':
$isComment = true;
break;
default:
$code = hexdec($c);
if($code === 0 && $c != '0')
return "";
if($isOdd)
$codeHigh = $code;
else
$output .= chr($codeHigh * 16 + $code);
$isOdd = !$isOdd;
break;
}
}
if($i >= strlen($input) || $input[$i] != '>') // input ended without the '>' terminator
return "";
if($isOdd)
$output .= chr($codeHigh * 16);
return $output;
}
function decodeAscii85($input) {
$output = "";
$isComment = false;
$ords = array();
for($i = 0, $state = 0; $i < strlen($input) && $input[$i] != '~'; $i++) {
$c = $input[$i];
if($isComment) {
if ($c == "\r" || $c == "\n")
$isComment = false;
continue;
}
if ($c == "\0" || $c == "\t" || $c == "\r" || $c == "\f" || $c == "\n" || $c == ' ')
continue;
if ($c == '%') {
$isComment = true;
continue;
}
if ($c == 'z' && $state === 0) {
$output .= str_repeat(chr(0), 4);
continue;
}
if ($c < '!' || $c > 'u')
return "";
$code = ord($input[$i]) & 0xff;
$ords[$state++] = $code - ord('!');
if ($state == 5) {
$state = 0;
for ($sum = 0, $j = 0; $j < 5; $j++)
$sum = $sum * 85 + $ords[$j];
for ($j = 3; $j >= 0; $j--)
$output .= chr($sum >> ($j * 8));
}
}
if ($state === 1)
return "";
elseif ($state > 1) {
for ($i = 0, $sum = 0; $i < $state; $i++)
$sum += ($ords[$i] + ($i == $state - 1)) * pow(85, 4 - $i);
for ($i = 0; $i < $state - 1; $i++) {
// Emit the high-order bytes of the partial group; & 0xff keeps each in 0..255
$output .= chr(($sum >> ((3 - $i) * 8)) & 0xff);
}
}
return $output;
}
function decodeFlate($data) {
return @gzuncompress($data);
}
function getObjectOptions($object) {
$options = array();
if (preg_match("#<<(.*)>>#ismU", $object, $options)) {
$options = explode("/", $options[1]);
@array_shift($options);
$o = array();
for ($j = 0; $j < @count($options); $j++) {
$options[$j] = preg_replace("#\s+#", " ", trim($options[$j]));
if (strpos($options[$j], " ") !== false) {
$parts = explode(" ", $options[$j]);
$o[$parts[0]] = $parts[1];
} else
$o[$options[$j]] = true;
}
$options = $o;
unset($o);
}
return $options;
}
function getDecodedStream($stream, $options) {
$data = "";
if (empty($options["Filter"]))
$data = $stream;
else {
$length = !empty($options["Length"]) ? $options["Length"] : strlen($stream);
$_stream = substr($stream, 0, $length);
foreach ($options as $key => $value) {
if ($key == "ASCIIHexDecode")
$_stream = $this->decodeAsciiHex($_stream);
elseif ($key == "ASCII85Decode")
$_stream = $this->decodeAscii85($_stream);
elseif ($key == "FlateDecode")
$_stream = $this->decodeFlate($_stream);
elseif ($key == "Crypt") { // TO DO
}
}
$data = $_stream;
}
return $data;
}
function getDirtyTexts(&$texts, $textContainers) {
for ($j = 0; $j < count($textContainers); $j++) {
if (preg_match_all("#\[(.*)\]\s*TJ[\n\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
elseif (preg_match_all("#T[dwmf]\s*(\(.*\))\s*Tj[\n\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
elseif (preg_match_all("#T[dwmf]\s*(\[.*\])\s*Tj[\n\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
}
}
function getCharTransformations(&$transformations, $stream) {
preg_match_all("#([0-9]+)\s+beginbfchar(.*)endbfchar#ismU", $stream, $chars, PREG_SET_ORDER);
preg_match_all("#([0-9]+)\s+beginbfrange(.*)endbfrange#ismU", $stream, $ranges, PREG_SET_ORDER);
for ($j = 0; $j < count($chars); $j++) {
$count = $chars[$j][1];
$current = explode("\n", trim($chars[$j][2]));
for ($k = 0; $k < $count && $k < count($current); $k++) {
if (preg_match("#<([0-9a-f]{2,4})>\s+<([0-9a-f]{4,512})>#is", trim($current[$k]), $map))
$transformations[str_pad($map[1], 4, "0")] = $map[2];
}
}
for ($j = 0; $j < count($ranges); $j++) {
$count = $ranges[$j][1];
$current = explode("\n", trim($ranges[$j][2]));
for ($k = 0; $k < $count && $k < count($current); $k++) {
if (preg_match("#<([0-9a-f]{4})>\s+<([0-9a-f]{4})>\s+<([0-9a-f]{4})>#is", trim($current[$k]), $map)) {
$from = hexdec($map[1]);
$to = hexdec($map[2]);
$_from = hexdec($map[3]);
for ($m = $from, $n = 0; $m <= $to; $m++, $n++)
$transformations[sprintf("%04X", $m)] = sprintf("%04X", $_from + $n);
} elseif (preg_match("#<([0-9a-f]{4})>\s+<([0-9a-f]{4})>\s+\[(.*)\]#ismU", trim($current[$k]), $map)) {
$from = hexdec($map[1]);
$to = hexdec($map[2]);
$parts = preg_split("#\s+#", trim($map[3]));
for ($m = $from, $n = 0; $m <= $to && $n < count($parts); $m++, $n++)
$transformations[sprintf("%04X", $m)] = sprintf("%04X", hexdec($parts[$n]));
}
}
}
}
function getTextUsingTransformations($texts, $transformations) {
$document = "";
for ($i = 0; $i < count($texts); $i++) {
$isHex = false;
$isPlain = false;
$hex = "";
$plain = "";
for ($j = 0; $j < strlen($texts[$i]); $j++) {
$c = $texts[$i][$j];
switch($c) {
case "<":
$hex = "";
$isHex = true;
$isPlain = false;
break;
case ">":
$hexs = str_split($hex, $this->multibyte); // 2 or 4 (UTF8 or ISO)
for ($k = 0; $k < count($hexs); $k++) {
$chex = str_pad($hexs[$k], 4, "0"); // Add tailing zero
if (isset($transformations[$chex]))
$chex = $transformations[$chex];
$document .= html_entity_decode("&#x".$chex.";");
}
$isHex = false;
break;
case "(":
$plain = "";
$isPlain = true;
$isHex = false;
break;
case ")":
$document .= $plain;
$isPlain = false;
break;
case "\\":
$c2 = $texts[$i][$j + 1];
if (in_array($c2, array("\\", "(", ")"))) $plain .= $c2;
elseif ($c2 == "n") $plain .= "\n";
elseif ($c2 == "r") $plain .= "\r";
elseif ($c2 == "t") $plain .= "\t";
elseif ($c2 == "b") $plain .= "\x08"; // backspace: PHP has no "\b" escape
elseif ($c2 == "f") $plain .= "\f";
elseif ($c2 >= '0' && $c2 <= '9') {
$oct = preg_replace("#[^0-9]#", "", substr($texts[$i], $j + 1, 3));
$j += strlen($oct) - 1;
$plain .= html_entity_decode("&#".octdec($oct).";", $this->convertquotes);
}
$j++;
break;
default:
if ($isHex)
$hex .= $c;
elseif ($isPlain)
$plain .= $c;
break;
}
}
$document .= "\n";
}
return $document;
}}?>
