Compare two large text files with each other and display the differences (PHP)

Does anyone have a good way of comparing two large files (9000+ lines) and highlighting the differences between them?
The few tools I found online seem to choke and die when I throw large files at them.

You can use the PEAR Text_Diff package to compare two text files.
There is also the PECL xdiff extension, whose xdiff_file_diff() function you can use like this:
xdiff_file_diff('old_file.txt', 'new_file.txt', 'diff.txt');
Here diff.txt receives the resulting unified diff of the two files.
The same function works on any text file, PHP source included; the optional fourth argument sets the number of context lines:
$old_version = 'my_script.php';
$new_version = 'my_new_script.php';
xdiff_file_diff($old_version, $new_version, 'my_script.diff', 2);
// the call above writes a unified diff of the two PHP files, with 2 lines of context, to my_script.diff
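If neither PEAR nor the PECL xdiff extension can be installed, a line-based diff can also be written in plain PHP. The sketch below is a textbook longest-common-subsequence (LCS) diff, not necessarily the algorithm Text_Diff or xdiff uses; the function name lineDiff() and its [op, line] output format are hypothetical. It first trims the common head and tail, so the O(n*m) table only covers the region that actually changed. That keeps memory small when two large files differ in a few places, but for two 9000-line files that differ almost everywhere the table would be far too big, and xdiff or an external diff remains the better choice.

```php
<?php
// Hypothetical helper: textbook LCS line diff with common prefix/suffix
// trimming. Expects 0-indexed lists of lines (as returned by file()).
// Returns [op, line] pairs: ' ' = unchanged, '-' = only in $old, '+' = only in $new.
function lineDiff(array $old, array $new): array
{
    $oldCount = count($old);
    $newCount = count($new);

    // Skip the shared head: those lines are unchanged by definition.
    $start = 0;
    while ($start < $oldCount && $start < $newCount && $old[$start] === $new[$start]) {
        $start++;
    }
    // Skip the shared tail the same way.
    $oldEnd = $oldCount;
    $newEnd = $newCount;
    while ($oldEnd > $start && $newEnd > $start && $old[$oldEnd - 1] === $new[$newEnd - 1]) {
        $oldEnd--;
        $newEnd--;
    }

    // Only the differing middle part goes into the O(n*m) LCS table.
    $a = array_slice($old, $start, $oldEnd - $start);
    $b = array_slice($new, $start, $newEnd - $start);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = length of the LCS of $a[$i..] and $b[$j..]
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = $a[$i] === $b[$j]
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    $out = [];
    for ($k = 0; $k < $start; $k++) {
        $out[] = [' ', $old[$k]];
    }
    // Walk the table from the top-left, preferring deletions on ties.
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $out[] = [' ', $a[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $out[] = ['-', $a[$i++]];
        } else {
            $out[] = ['+', $b[$j++]];
        }
    }
    while ($i < $n) {
        $out[] = ['-', $a[$i++]];
    }
    while ($j < $m) {
        $out[] = ['+', $b[$j++]];
    }
    for ($k = $oldEnd; $k < $oldCount; $k++) {
        $out[] = [' ', $old[$k]];
    }
    return $out;
}

// Usage (FILE_IGNORE_NEW_LINES strips the line endings):
// foreach (lineDiff(file('old_file.txt', FILE_IGNORE_NEW_LINES),
//                   file('new_file.txt', FILE_IGNORE_NEW_LINES)) as [$op, $line]) {
//     echo $op, ' ', $line, "\n";
// }
```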

Related

How to avoid php parsing the whole php file and includes and make it only parse what will be used?

I read somewhere that PHP parses the whole .php file every time it is executed. A solution was proposed there (not OPcache), but I lost the website and couldn't find it again.
Now I have an enormous PHP website with many long functions that are often used on their own, and execution needs to be fast.
To avoid having PHP parse all the other functions that won't be used, I was thinking of a modular design in which the functions are stored in separate PHP files and only included when they are actually used. But I haven't been able to confirm that PHP will not parse an include inside a function or inside a conditional statement unless it is reached. Does PHP parse those includes?
Example:
<?php
$func_to_execute = $_GET['func'] ?? '';
$parameter = $_GET['parameter'] ?? '';
$output = '';
switch ($func_to_execute)
{
case 'a':
include 'func_a.php';
$output = func_a($parameter);
break;
case 'b':
include 'func_b.php';
$output = func_b($parameter);
break;
case 'c':
include 'func_c.php';
$output = func_c($parameter);
break;
}
echo $output;
?>
In this example, I would like PHP to parse only func_a if I request a, only func_b if I request b, and so on. In practice there are many more than 3 functions, and each is a very long algorithm that also contains very long strings and arrays.
As an alternative to includes, I was thinking of making independent PHP files and executing them with shell_exec, retrieving their output only when they are required. But that would introduce other complexities, such as formatting the parameters (I have no idea how I would pass a very long string with special characters, or a JSON document, as a shell argument) and calling the right function from the shell. Would those complexities make it slower than just letting PHP parse the whole file?
I know about OPcache. Would it be enough, even if the opcodes of all the functions are processed each time?
Are there other ways to make a PHP website modular, so that PHP does not parse every PHP file in full every time?
Thank you.
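Since PHP uses many optimizations and opcode caching (APC in older versions, OPcache, which is bundled since PHP 5.5), you don't need to worry about this.
An include won't be parsed at load time; it works more like reading the file (as with file_get_contents()) and executing it in the same scope, and with an opcode cache enabled the compiled result is reused on later requests.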
http://php.net/manual/en/intro.apc.php
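Whether that caching actually applies depends on the server configuration. As a hedged sketch, the hypothetical helper below reports whether OPcache is loaded and enabled for the current process; opcache_get_status() is the real OPcache function and returns false when the cache is not active. The CLI usually runs without OPcache unless opcache.enable_cli is set, so the answer from a command-line run may differ from a web request.

```php
<?php
// Hypothetical helper: is OPcache loaded and enabled for this process?
function opcacheIsActive(): bool
{
    // The OPcache extension may not be installed at all.
    if (!function_exists('opcache_get_status')) {
        return false;
    }
    // Passing false skips the per-script details, which can be large.
    // The result is false when the cache is disabled, otherwise an array.
    $status = opcache_get_status(false);
    return is_array($status) && !empty($status['opcache_enabled']);
}

echo opcacheIsActive() ? "OPcache is active\n" : "OPcache is not active\n";
```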
I ran a benchmark, and it seems that PHP really does not parse conditional includes that are never reached. I used the example script from the question, with the included files defined as follows:
func_a: only assigns the short string 'war and peace' to the variable $x.
$x = 'war and peace';
func_b: only assigns the whole text of the novel War and Peace, approximately 3.2 MB long, to $x (the whole text was pasted into the PHP file). This makes a very long file to parse.
$x = 'War and Peace, by Leo Tolstoy...(the whole novel...)...';
func_c: contains a syntax error, which would immediately raise a parse error if PHP parsed the file. This guarantees that PHP is not parsing files that are not included.
I measured the execution time by running the script from another PHP script with shell_exec(), so the times also include starting a new PHP process. The results were (in seconds):
func_a ≈ 0.122
func_b ≈ 0.152
func_c ≈ 0.119
Therefore I conclude that:
- Includes inside a switch statement are not parsed unless they are actually executed.
- A syntax error in an include (inside a switch statement) will not raise any error if the include is never executed, because the file is never parsed.
- In any case, the difference in processing time is very small (about 0.03 extra seconds for about 3.2 MB of extra text; crudely, roughly 0.01 extra seconds per MB of text to parse). However, this might matter if many users request the website at the same time, so it might be useful to split the script into modules (includes) if it really is that big. Also, because a badly written include that is not required is never parsed, it won't raise errors when it isn't relevant.
So this seems to me a good way to design a modular PHP application whose modules are extremely big.
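The central observation can also be checked without timing anything. In the sketch below, the file names and the run_module() helper are hypothetical: it writes one valid module and one with a deliberate syntax error to the temp directory. Running the 'a' branch never touches the broken file; only the 'c' branch makes PHP parse it, which PHP 7 and later report as a catchable ParseError.

```php
<?php
// Hypothetical demo files in the system temp directory.
$dir  = sys_get_temp_dir();
$good = $dir . '/func_a_demo.php';
$bad  = $dir . '/func_c_demo.php';

// A valid module and a module with a deliberate syntax error.
file_put_contents($good, "<?php\nreturn 'war and peace';\n");
file_put_contents($bad, "<?php\nreturn 'missing semicolon'\nthis is not php\n");

// Remove the demo files when the script ends.
register_shutdown_function(function () use ($good, $bad) {
    foreach ([$good, $bad] as $file) {
        if (is_file($file)) {
            unlink($file);
        }
    }
});

function run_module(string $which, string $good, string $bad): string
{
    switch ($which) {
        case 'a':
            // The file is read and parsed only when this line executes.
            return include $good;
        case 'c':
            try {
                // The syntax error surfaces only here, as a ParseError.
                return include $bad;
            } catch (ParseError $e) {
                return 'parse error: ' . $e->getMessage();
            }
        default:
            return 'unknown';
    }
}

echo run_module('a', $good, $bad), "\n";
```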

PHP library to generate code diff (github style)?

I'm looking for a free PHP library that can generate HTML code diffs, basically just like GitHub's code diff pages.
I've been searching all around and can't find anything. Does anyone know of anything out there that does what I'm looking for?
It looks like I found what I was looking for after doing more Google searches with different wording.
php-diff seems to do exactly what I want: a PHP function that accepts two strings and generates all the HTML to display the diff in a web page.
To add my two cents here...
Unfortunately, there are no really good diff libraries for displaying/generating diffs in PHP. That said, I recently did find a circuitous way to do this using PHP. The solution involved:
A pure JavaScript approach for rendering the Diff
Shelling out to git with PHP to generate the Diff to render
First, there is an excellent JavaScript library for rendering GitHub-style diffs called diff2html. It renders diffs very cleanly and with modern styling. However, diff2html requires a true git diff as input, as it is intended to render git diffs, just like GitHub.
If we let diff2html handle the rendering of the diff, then all we have left to do is create the git diff to have it render.
To do that in PHP, you can shell out to the git binary installed on the server. git can calculate a diff of two arbitrary files with the --no-index option, and the -U option sets how many lines of context to show before and after each change.
On the server it would look something like this:
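// Write the data to diff into unique temporary files
// (fixed names like /tmp/fileA.txt collide when requests run concurrently)
$leftFile = tempnam(sys_get_temp_dir(), 'diffA');
$rightFile = tempnam(sys_get_temp_dir(), 'diffB');
file_put_contents($leftFile, $leftData);
file_put_contents($rightFile, $rightData);
// Generate the git diff; escapeshellarg() quotes the paths for the shell.
// shell_exec() returns null when there is no output (identical files), hence the cast.
$diff = (string) shell_exec('git diff -U1000 --no-index '
    . escapeshellarg($leftFile) . ' ' . escapeshellarg($rightFile));
// Strip off the first line of output ("diff --git ...") including its newline
$newline = strpos($diff, "\n");
$diff = $newline === false ? $diff : substr($diff, $newline + 1);
// Delete the files we just created
unlink($leftFile);
unlink($rightFile);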
Then you need to get $diff back to the front-end. You should review the diff2html docs, but the end result will look something like this in JavaScript (assuming you pass $diff as diffString):
function renderDiff(el, diffString) {
var diff2htmlUi = new Diff2HtmlUI({diff: diffString});
diff2htmlUi.draw(el);
}
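To hand $diff to that JavaScript function, the PHP page has to embed the string safely in a script element. A minimal sketch, assuming the diff is drawn into an element with a hypothetical id and using a hypothetical helper name diffScriptTag(): json_encode() produces a valid JavaScript string literal, and the JSON_HEX_* flags keep a "</script>" sequence inside the diff from closing the script element early.

```php
<?php
// Hypothetical helper: emit a <script> tag that calls renderDiff()
// with the git diff produced on the server.
function diffScriptTag(string $diff, string $elementId): string
{
    // JSON_HEX_TAG turns < and > into \u003C and \u003E, so markup in the
    // diff cannot end the script element; JSON_THROW_ON_ERROR (PHP 7.3+)
    // fails loudly instead of silently returning false on invalid UTF-8.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR;
    $jsDiff = json_encode($diff, $flags);
    $jsId = json_encode($elementId, $flags);
    return "<script>renderDiff(document.getElementById($jsId), $jsDiff);</script>";
}

// Usage, with a hypothetical <div id="diff"></div> earlier in the page:
// echo diffScriptTag($diff, 'diff');
```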
I think what you're looking for is xdiff.
The xdiff extension enables you to create and apply patch files containing the differences between revisions of files.
This extension supports two modes of operation (on strings and on files) and two patch formats (unified and binary). Unified patches are excellent for text files, as they are human-readable and easy to review. For binary files such as archives or images, binary patches are the adequate choice, as they are binary-safe and handle non-printable characters well.

How do you EXPLODE CSV line with a comma in value?

"AMZN","Amazon.com, Inc.",211.22,"11/9/2011","4:00pm","-6.77 - -3.11%",4673052
Amazon.com, Inc. is being treated as 2 values instead of one.
I tried this $data = explode( ',', $s);
How do I modify this to avoid the comma in the value issue?
You should probably look into str_getcsv() (or fgetcsv() if you're loading the CSV from a file).
These read the CSV contents into an array without any need for explode() etc.
Edit: to expand on the point made by Pekka, str_getcsv() is not available before PHP 5.3, but there's an interesting approach here which reproduces its functionality for older versions, and another approach here which uses fgetcsv() after creating a temporary file.
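A short sketch using the line from the question with the two built-in functions mentioned above. The empty fourth argument is a choice made for this example (accepted since PHP 7.4): it turns off PHP's backslash escape handling, so only the standard doubled-quote escaping applies. The php://memory stream merely stands in for a real CSV file.

```php
<?php
$line = '"AMZN","Amazon.com, Inc.",211.22,"11/9/2011","4:00pm","-6.77 - -3.11%",4673052';

// str_getcsv() honours the quotes, so the comma inside "Amazon.com, Inc."
// stays part of a single field.
$fields = str_getcsv($line, ',', '"', '');

// For a file, fgetcsv() reads and parses one record per call.
$handle = fopen('php://memory', 'r+');
fwrite($handle, $line . "\n");
rewind($handle);
$rows = [];
// Length 0 means "no maximum line length".
while (($row = fgetcsv($handle, 0, ',', '"', '')) !== false) {
    $rows[] = $row;
}
fclose($handle);

print_r($fields);
```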
Use a dedicated CSV library. It's been explained over and over that parsing file formats like CSV manually is asking for trouble, because you don't know all the variations of CSV and all the rules to do it right.

compare 4 or more files

Is there a command line utility or a PHP/Python script that will generate an HTML diff so that 4 or more files can be compared at once?
Each of my files has at most 10k lines.
Note: these are plain text files, not HTML. They contain only A-Za-z0-9=., characters and no HTML tags.
It depends on what type of data you're comparing/analyzing.
The basic solution is:
file_get_contents() gives you strings of the file data
strcmp() does a binary-safe comparison of the data (it returns 0 only when the strings are identical)
You will probably want to explode() your data on some delimiter and compare sections of the data.
Another option is to delimit the data, loop through it, and compute a "comparison coefficient" indicating how closely each file matches a reference. For example, if file 1 has cc=3 and file 4 has cc=8, file 4 is the closer match.
A final problem you'll run into is the memory limit on the server. You can raise it with the memory_limit setting in php.ini.
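The "comparison coefficient" idea could look like the sketch below. The scoring rule (count how many of the reference file's distinct lines also occur in each other file) and the function names are assumptions for illustration, not a standard metric; higher means a closer match, as in the example above. array_flip() turns each candidate into a lookup table, so files of around 10k lines are scored in linear time.

```php
<?php
// Hypothetical scoring rule: number of distinct reference lines that also
// appear somewhere in the candidate text. Higher = closer match.
function comparisonCoefficient(string $reference, string $candidate): int
{
    $refLines = array_unique(explode("\n", $reference));
    // Lines become array keys, giving constant-time membership checks.
    $candidateSet = array_flip(explode("\n", $candidate));
    $score = 0;
    foreach ($refLines as $line) {
        if (isset($candidateSet[$line])) {
            $score++;
        }
    }
    return $score;
}

// Rank every text against the first one, best match first.
// Keys are labels such as file names; values are file_get_contents() output.
function rankAgainstFirst(array $texts): array
{
    $reference = array_shift($texts);
    $scores = [];
    foreach ($texts as $name => $text) {
        $scores[$name] = comparisonCoefficient($reference, $text);
    }
    arsort($scores);
    return $scores;
}

// Usage:
// $names = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt'];
// print_r(rankAgainstFirst(array_combine($names, array_map('file_get_contents', $names))));
```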
//EDIT
Just noticed the diff tag, but I'll leave this up anyway in case it helps somehow.

Find differences between 2 HTML files

Is there a way to display differences between two HTML documents?
There is a PHP class called daisdiff, but it has no documentation. Can anyone show how to use it, or suggest an alternative?
I advise you to use the PEAR Text_Diff package. It comes with several classes and is easily extensible: you can write your own diff renderer, so it's easy to adapt and much easier than parsing the output of the diff command.
Here is a short code snippet to compare two text files:
include_once "Text/Diff.php";
include_once "Text/Diff/Renderer.php";
// define files to compare
$file1 = "data1.txt";
$file2 = "data2.txt";
// perform diff, print output
// "&new" is a parse error since PHP 7; plain "new" is required
$diff = new Text_Diff(file($file1), file($file2));
$renderer = new Text_Diff_Renderer();
echo $renderer->render($diff);
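Writing your own renderer mostly means turning diff operations into escaped HTML. The sketch below does not use Text_Diff's renderer classes; it assumes a hypothetical input format of [op, line] pairs, where op is ' ', '-' or '+', and wraps each line in a span that CSS can colour. htmlspecialchars() matters here because, when comparing HTML documents, the lines themselves contain markup that should be shown rather than rendered.

```php
<?php
// Hypothetical renderer: turns [op, line] pairs into highlighted HTML.
// op is ' ' (unchanged), '-' (removed) or '+' (added).
function renderDiffHtml(array $ops): string
{
    $classes = [' ' => 'same', '-' => 'del', '+' => 'ins'];
    $html = '<pre class="diff">';
    foreach ($ops as [$op, $line]) {
        $class = $classes[$op] ?? 'same';
        // Escape the line so the markup being compared is displayed as text.
        $html .= '<span class="' . $class . '">'
            . htmlspecialchars($op . ' ' . $line, ENT_QUOTES, 'UTF-8')
            . "</span>\n";
    }
    return $html . '</pre>';
}

// Example styling for the assumed class names:
// .diff .del { background: #fdd; }  .diff .ins { background: #dfd; }
echo renderDiffHtml([[' ', '<p>Hello</p>'], ['-', '<b>old</b>'], ['+', '<b>new</b>']]);
```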
There is a UNIX program called diff which is meant just for that purpose. You use it like this:
diff -crB file1 file2
c stands for context. It shows some extra lines around the changed lines so that you can find them more easily.
r stands for recursive. That way you can specify directories as file1 and file2, with all the files therein being compared to each other, too.
B makes it ignore blank lines and their changes.
Let me go find the Windows solution just in case.
Here is a pure php implementation of diff, http://www.holomind.de/phpnet/diff.src.php. If you skip to the bottom of the page there is an example of how to use it.
