PHP library to generate code diff (github style)? - php

I'm looking for a free PHP library that can generate code diff HTML, basically just like GitHub's code diff pages.
I've been searching all around and can't find anything. Does anyone know of anything out there that does what I'm looking for?

It looks like I found what I'm looking for after doing more Google searches with different wording.
php-diff seems to do exactly what I want: just a PHP function that accepts two strings and generates all the HTML to display the diff in a web page.

To add my two cents here...
Unfortunately, there are no really good diff libraries for displaying/generating diffs in PHP. That said, I recently did find a circuitous way to do this using PHP. The solution involved:
A pure JavaScript approach for rendering the Diff
Shelling out to git with PHP to generate the Diff to render
First, there is an excellent JavaScript library for rendering GitHub-style diffs called diff2html. This renders diffs very cleanly and with modern styling. However diff2html requires a true git diff to render as it is intended to literally render git diffs--just like GitHub.
If we let diff2html handle the rendering of the diff, then all we have left to do is create the git diff to have it render.
To do that in PHP, you can shell out to the local git binary running on the server. You can use git to calculate a diff on two arbitrary files using the --no-index option. You can also specify how many lines before/after the found diffs to return with the -U option.
On the server it would look something like this:
// File names to save data to diff in
$leftFile = '/tmp/fileA.txt';
$rightFile = '/tmp/fileB.txt';
file_put_contents($leftFile, $leftData);
file_put_contents($rightFile, $rightData);
// Generate git diff and save shell output
$diff = shell_exec('git diff -U1000 --no-index ' . escapeshellarg($leftFile) . ' ' . escapeshellarg($rightFile));
// Strip off first line of output
$diff = substr($diff, strpos($diff, "\n"));
// Delete the files we just created
unlink($leftFile);
unlink($rightFile);
Then you need to get $diff back to the front-end. You should review the docs for diff2html but the end result will look something like this in JavaScript (assuming you pass $diff as diffString):
function renderDiff(el, diffString) {
var diff2htmlUi = new Diff2HtmlUI({diff: diffString});
diff2htmlUi.draw(el);
}
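One way to hand $diff to the front end is to wrap it in a JSON response. This is only a sketch: the helper name diffToJson and the "diff" key are assumptions for illustration, not part of diff2html.

```php
<?php
// Hypothetical helper: wraps the diff text in a JSON payload for the front end.
function diffToJson(string $diff): string
{
    // JSON_INVALID_UTF8_SUBSTITUTE keeps json_encode() from failing on stray bytes.
    return json_encode(['diff' => $diff], JSON_INVALID_UTF8_SUBSTITUTE);
}

// In an endpoint you might then do:
// header('Content-Type: application/json');
// echo diffToJson($diff);
```

The JavaScript side would read the "diff" property of the response and pass it on as diffString.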

I think what you're looking for is xdiff.
xdiff extension enables you to create and apply patch files containing differences between different revisions of files.
This extension supports two modes of operation, on strings and on files, as well as two different patch formats, unified and binary. Unified patches are excellent for text files as they are human-readable and easy to review. For binary files like archives or images, binary patches are an adequate choice as they are binary safe and handle non-printable characters well.

Related

PHP - Check if pdf contains given text - TcpdfFpdi / pdftk / fpdi

I have a PDF document and I want to check whether a specific text occurs in it (tags that I put in while generating the PDF). However, using these libraries (TcpdfFpdi, pdftk or fpdi) I couldn't figure out whether it's possible or how to do it.
$str = "{hello}";
$pdf = new TcpdfFpdi();
$pdf->setSourceFile($filePath);
$pdf->searchForText($str); // something like this which returns boolean
If I try dd(file_get_contents($filePath)) without any library, it returns very long output that doesn't seem to contain the text I want, so I think it's better to use one of those libraries.
Just an idea…
It's not a pure PHP solution, but you could use a tool like pdftotext, which I know from this post (where a PDF file is converted into a string to count its words): https://superuser.com/a/221367/535203
You can install it, play around with the command, and call it from within your PHP application.
As far as I remember (it has been a long time since I used pdftotext) the output text is not exactly the PDF's content, but for searching a few tags in it, it's at least worth a try.
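Calling pdftotext from PHP could look roughly like the sketch below. It assumes the pdftotext binary (from poppler-utils) is installed and on the PATH; the function name pdfContainsText is made up for this example. Tags may not survive extraction intact if the PDF splits them across text runs.

```php
<?php
// Hypothetical helper; assumes the pdftotext binary is available on the PATH.
function pdfContainsText(string $pdfPath, string $needle): bool
{
    if (!is_readable($pdfPath)) {
        return false;
    }
    // "-" as the output file makes pdftotext write the extracted text to stdout.
    $text = shell_exec('pdftotext ' . escapeshellarg($pdfPath) . ' -');
    return is_string($text) && strpos($text, $needle) !== false;
}

// Usage: pdfContainsText($filePath, '{hello}') returns true or false.
```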

php Compare two large text files with each other displaying the difference

Does anyone have a good way of comparing two large files (9000+ lines) and highlighting the differences between the two?
The few things I found online seem to choke and die when I throw large files at them.
You can use the Text_Diff pear package for comparing the difference between 2 text files.
There is also the xdiff extension available that you can use with xdiff_file_diff function like below:
xdiff_file_diff('old_file.txt', 'new_file.txt', 'diff.txt');
Where diff.txt would be the resulting file with the comparison between the two files.
Also you can use xdiff_file_diff function for comparing PHP files like below:
$old_version = 'my_script.php';
$new_version = 'my_new_script.php';
xdiff_file_diff($old_version, $new_version, 'my_script.diff', 2);
// above code makes unified diff of two php files with context length of 2.

Implementing internationalization (language strings) in a PHP application

I want to build a CMS that can handle fetching locale strings to support internationalization. I plan on storing the strings in a database, and then placing a key/value cache like memcache in between the database and the application to prevent performance drops for hitting the database each page for a translation.
This is more complex than using PHP files with arrays of strings - but that method is incredibly inefficient when you have 2,000 translation lines.
I thought about using gettext, but I'm not sure that users of the CMS will be comfortable working with the gettext files. If the strings are stored in a database, then a nice administration system can be set up to allow them to make changes whenever they want, and the caching in RAM will ensure that fetching those strings is as fast as, or faster than, gettext. I also don't feel safe using the PHP extension considering not even the Zend Framework uses it.
Is there anything wrong with this approach?
Update
I thought perhaps I would add more food for thought. One of the problems with string translations is that they don't support dates, money, or conditional statements. However, thanks to intl, PHP now has MessageFormatter, which is what really needs to be used anyway.
// Load string from gettext file
$string = _("{0} resulted in {1,choice,0#no errors|1#single error|1<{1, number} errors}");
// Format using the current locale
msgfmt_format_message(setlocale(LC_ALL, 0), $string, array('Update', 3));
On another note, one of the things I don't like about gettext is that the text is embedded into the application all over the place. That means that the team responsible for the primary translation (usually English) has to have access to the project source code to make changes in all the places the default statements are placed. It's almost as bad as applications that have SQL spaghetti-code all over.
So, it makes sense to use keys like _('error.404_not_found') which then allow the content writers and translators to just worry about the PO/MO files without messing in the code.
However, in the event that a gettext translation doesn't exist for the given key, there is no way to fall back to a default (like you could with a custom handler). This means that you either have the writer mucking around in your code, or have "error.404_not_found" shown to users that don't have a locale translation!
In addition, I am not aware of any large projects which use PHP's gettext. I would appreciate any links to well-used (and therefore tested), systems which actually rely on the native PHP gettext extension.
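The missing-key fallback described above can be approximated with a thin wrapper, because gettext returns the message ID unchanged when it finds no translation. The sketch below is an assumption-laden illustration: translateKey and the $defaults array are hypothetical, and the lookup is passed in as a callable so the idea can be tried without the gettext extension.

```php
<?php
/**
 * Hypothetical wrapper: translate a key, falling back to a default text
 * when the lookup hands the key back unchanged.
 *
 * @param string   $key      message key such as 'error.404_not_found'
 * @param array    $defaults key => default text (for example the English wording)
 * @param callable $lookup   translation function; with the gettext extension
 *                           loaded this could be 'gettext'
 * @return string
 */
function translateKey(string $key, array $defaults, callable $lookup): string
{
    $translated = $lookup($key);
    if ($translated !== $key) {
        return $translated;
    }
    // No translation found: use the default text, or the key as a last resort.
    return $defaults[$key] ?? $key;
}
```

With this in place the defaults can live in a single file maintained by the content writers rather than being scattered through the code.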
Gettext uses a compiled binary catalog format (.mo files) that is quite quick. Also the gettext implementation is usually simpler as it only requires echo _('Text to translate');. It also has existing tools for translators to use and they're proven to work well.
You can store them in a database but I feel it would be slower and a bit overkill, especially since you'd have to build the system to edit the translations yourself.
If only you could actually cache the lookups in a dedicated memory portion in APC, you'd be golden. Sadly, I don't know how.
For those that are interested, it seems full support for locales and i18n in PHP is finally starting to take place.
// Set the current locale to the one the user agent wants
$locale = Locale::acceptFromHttp(getenv('HTTP_ACCEPT_LANGUAGE'));
// Default Locale
Locale::setDefault($locale);
setlocale(LC_ALL, $locale . '.UTF-8');
// Default timezone of server
date_default_timezone_set('UTC');
// iconv encoding
iconv_set_encoding("internal_encoding", "UTF-8");
// multibyte encoding
mb_internal_encoding('UTF-8');
There are several things that need to be considered, and detecting the timezone/locale and then using it to correctly parse and display input and output is important. There is a PHP I18N library that was just released which contains lookup tables for much of this information.
Processing user input is important to make sure your application has clean, well-formed UTF-8 strings from whatever input the user enters. iconv is great for this.
/**
* Convert a string from one encoding to another encoding
* and remove invalid bytes sequences.
*
* @param string $string to convert
* @param string $to encoding you want the string in
* @param string $from encoding that string is in
* @return string
*/
function encode($string, $to = 'UTF-8', $from = 'UTF-8')
{
// ASCII is already valid UTF-8
if($to == 'UTF-8' AND is_ascii($string))
{
return $string;
}
// Convert the string
return @iconv($from, $to . '//TRANSLIT//IGNORE', $string);
}
/**
* Tests whether a string contains only 7bit ASCII characters.
*
* @param string $string to check
* @return bool
*/
function is_ascii($string)
{
return ! preg_match('/[^\x00-\x7F]/S', $string);
}
Then just run the input through these functions.
$utf8_string = normalizer_normalize(encode($_POST['text']), Normalizer::FORM_C);
Translations
As Andre said, it seems gettext is the smart default choice for writing applications that can be translated.
Gettext uses a compiled binary catalog format (.mo files) that is quite quick.
The gettext implementation is usually simpler as it only requires _('Text to translate')
Existing tools for translators to use and they're proven to work well.
When you reach facebook size then you can work on implementing RAM-cached, alternative methods like the one I mentioned in the question. However, nothing beats "simple, fast, and works" for most projects.
However, there are also additional things that gettext cannot handle, such as displaying dates, money, and numbers. For those you need the intl extension.
/**
* Return an IntlDateFormatter object using the current system locale
*
* @param string $locale string
* @param integer $datetype IntlDateFormatter constant
* @param integer $timetype IntlDateFormatter constant
* @param string $timezone Time zone ID, default is system default
* @return IntlDateFormatter
*/
function __date($locale = NULL, $datetype = IntlDateFormatter::MEDIUM, $timetype = IntlDateFormatter::SHORT, $timezone = NULL)
{
return new IntlDateFormatter($locale ?: setlocale(LC_ALL, 0), $datetype, $timetype, $timezone);
}
$now = new DateTime();
print __date()->format($now);
$time = __date()->parse($string);
In addition you can use strftime to format dates taking the current locale into consideration.
Sometimes you need the values for numbers and dates inserted correctly into locale messages:
/**
* Format the given string using the current system locale
* Basically, it's sprintf on i18n steroids.
*
* @param string $string to parse
* @param array $params to insert
* @return string
*/
function __($string, array $params = NULL)
{
return msgfmt_format_message(setlocale(LC_ALL, 0), $string, $params);
}
// Multiple choices (can also just use ngettext); the only argument is index 0
print __(_("{0,choice,0#no errors|1#single error|1<{0, number} errors}"), array(4));
// Show time in the correct way; arguments must be passed as an array
print __(_("It is now {0,time,medium}"), array(time()));
See the ICU format details for more information.
Database
Make sure your connection to the database is using the correct charset so that nothing gets corrupted on storage.
String Functions
You need to understand the difference between the string, mb_string, and grapheme functions.
// 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5) normalization form "D"
$char_a_ring_nfd = "a\xCC\x8A";
var_dump(grapheme_strlen($char_a_ring_nfd));
var_dump(mb_strlen($char_a_ring_nfd));
var_dump(strlen($char_a_ring_nfd));
// 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_A_ring = "\xC3\x85";
var_dump(grapheme_strlen($char_A_ring));
var_dump(mb_strlen($char_A_ring));
var_dump(strlen($char_A_ring));
Domain name TLD's
The IDN functions from the INTL library are a big help processing non-ascii domain names.
There are a number of other SO questions and answers similar to this one. I suggest you search and read them as well.
My advice? Use an existing solution like gettext or XLIFF, as it will save you lots of grief when you hit the translation edge cases: right-to-left text, date formats, and different text volumes (French, for example, is about 30% more verbose than English) that break formatting. Even better advice: don't do it. If users want a translation, they will make a clone and translate it. Because localisation is more about look and feel and using colloquial language, this is usually what happens. To give another example, Anglo-Saxon culture likes cool web colours and sans-serif typefaces, while Hispanic culture likes bright colours and serif or cursive type. To cater for both you would need different layouts per language.
Zend actually caters for the following adapters for Zend_Translate, and it is a useful list.
Array:- Use PHP arrays for Small pages; simplest usage; only for programmers
Csv:- Use comma separated (.csv/.txt) files for Simple text file format; fast; possible problems with unicode characters
Gettext:- Use binary gettext (*.mo) files for GNU standard for linux; thread-safe; needs tools for translation
Ini:- Use simple INI (*.ini) files for Simple text file format; fast; possible problems with unicode characters
Tbx:- Use termbase exchange (.tbx/.xml) files for Industry standard for inter application terminology strings; XML format
Tmx:- Use tmx (.tmx/.xml) files for Industry standard for inter application translation; XML format; human readable
Qt:- Use qt linguist (*.ts) files for Cross platform application framework; XML format; human readable
Xliff:- Use xliff (.xliff/.xml) files for A simpler format as TMX but related to it; XML format; human readable
XmlTm:- Use xmltm (*.xml) files for Industry standard for XML document translation memory; XML format; human readable
Others:- *.sql for Different other adapters may be implemented in the future
I'm using the ICU stuff in my framework and really finding it simple and useful to use. My system is XML-based with XPath queries and not a database as you're suggesting to use. I've not found this approach to be inefficient. I played around with Resource bundles too when researching techniques but found them quite complicated to implement.
The Locale functionality is a godsend. You can do so much more, easily:
// Available translations
$languages = array('en', 'fr', 'de');
// The language the user wants
$preference = (isset($_COOKIE['lang'])) ?
$_COOKIE['lang'] : ((isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) ?
Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']) : '');
// Match preferred language to those available, defaulting to generic English
$locale = Locale::lookup($languages, $preference, false, 'en');
// Construct path to dictionary file
$file = $dir . '/' . $locale . '.xsl';
// Check that dictionary file is readable
if (!file_exists($file) || !is_readable($file)) {
throw new RuntimeException('Dictionary could not be loaded');
}
// Load and return dictionary file
$dictionary = simplexml_load_file($file);
I then perform word lookups using a method like this:
$selector = '/i18n/text[@label="' . $word . '"]';
$result = $dictionary->xpath($selector);
$text = array_shift($result);
if ($formatted && isset($text)) {
return new MessageFormatter($locale, $text);
}
The bonus for my system is that the template system is XSL-based which means I can use the same translation XML files directly in my templates for simple messages that don't need any i18n formatting.
Stick with gettext, you won't find a faster alternative in PHP.
Regarding the how, you can use a database to store your catalog and allow other users to translate the strings using a friendly gui. When the new changes are reviewed/approved, hit a button, compile a new .mo file and deploy.
Some resources to get you on track:
http://code.google.com/p/simplepo/
http://www.josscrowcroft.com/2011/code/php-mo-convert-gettext-po-file-to-binary-mo-file-php/
https://launchpad.net/php-gettext/
http://sourceforge.net/projects/tcktranslator/
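The "store in a database, then compile" workflow mentioned above could start from something like the sketch below, which turns approved key/translation pairs into the text of a .po file. The function name buildPo and the shape of the input array are assumptions; fetching the rows from your database is left out.

```php
<?php
/**
 * Build the text of a .po catalog from key => translation pairs,
 * e.g. approved rows read from a database table.
 */
function buildPo(array $translations): string
{
    $escape = function (string $s): string {
        // Escape backslashes, double quotes and newlines for PO string syntax.
        return addcslashes($s, "\\\"\n");
    };
    // Header entry declaring the charset of the catalog.
    $po = "msgid \"\"\nmsgstr \"Content-Type: text/plain; charset=UTF-8\\n\"\n\n";
    foreach ($translations as $msgid => $msgstr) {
        $po .= sprintf(
            "msgid \"%s\"\nmsgstr \"%s\"\n\n",
            $escape((string) $msgid),
            $escape((string) $msgstr)
        );
    }
    return $po;
}
```

The resulting file can then be compiled with the GNU gettext tool, for example msgfmt -o messages.mo messages.po, and deployed.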
What about CSV files (which can be easily edited in many apps) and caching to memcache (wincache, etc.)? This approach works well in Magento. All language phrases in the code are wrapped in the __() function, for example
<?php echo $this->__('Some text') ?>
Then, for example before a new version release, you run a simple script which parses the source files, finds all text wrapped in __() and puts it into a .csv file. You load the CSV files and cache them to memcache. In the __() function you look into your memcache where the translations are cached.
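The extraction script described above might look roughly like this. It is a minimal sketch and not Magento's own tooling: both function names are made up, the regular expression only handles plain string literals, and the two-column "original,translation" layout is an assumption.

```php
<?php
/**
 * Collect the unique string literals passed to __() in a piece of source code.
 */
function extractTranslatableStrings(string $source): array
{
    // Matches __('...') or __("...") and captures the literal's contents.
    $pattern = '/__\(\s*([\'"])((?:\\\\.|(?!\1).)*)\1/s';
    preg_match_all($pattern, $source, $matches);
    // stripslashes() is a rough unescape: fine for escaped quotes,
    // not for double-quoted sequences such as a newline escape.
    $strings = array_map('stripslashes', $matches[2]);
    return array_values(array_unique($strings));
}

/**
 * Write "original,translation" rows, with the translation column left
 * equal to the original for translators to overwrite.
 */
function writeTranslationCsv(array $strings, string $csvPath): void
{
    $handle = fopen($csvPath, 'w');
    foreach ($strings as $string) {
        fputcsv($handle, [$string, $string], ',', '"', '\\');
    }
    fclose($handle);
}
```

At runtime the CSV rows would be loaded once into the cache, and __() would look the phrase up there.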
In a recent project, we considered using gettext, but it turned out to be easier to just write our own functionality. It really is quite simple: Create a JSON file per locale (e.g. strings.en.json, strings.es.json, etc.), and create a function somewhere called "translate()" or something, and then just call that. That function will determine the current locale (from the URI or a session var or something), and return the localized string.
The only thing to remember is to make sure any HTML you output is encoded in UTF-8, and marked as such in the markup (e.g. in the doctype, etc.)
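A minimal sketch of that JSON-per-locale approach follows. The file naming (strings.<locale>.json) comes from the description above; the function names, the flat key => text layout of the JSON, and the fallback to the key itself are assumptions.

```php
<?php
/**
 * Load the string table for a locale from strings.<locale>.json.
 * Returns an empty array if the locale looks unsafe or the file is unusable.
 */
function loadStrings(string $dir, string $locale): array
{
    // Only allow simple locale names so user input cannot escape $dir.
    if (!preg_match('/^[A-Za-z_-]+$/', $locale)) {
        return [];
    }
    $path = $dir . '/strings.' . $locale . '.json';
    if (!is_readable($path)) {
        return [];
    }
    $strings = json_decode(file_get_contents($path), true);
    return is_array($strings) ? $strings : [];
}

/** Return the localized string for $key, or $key itself when none exists. */
function translate(string $key, array $strings): string
{
    return isset($strings[$key]) ? (string) $strings[$key] : $key;
}
```

The locale passed to loadStrings() would come from the URI or a session variable, as described above.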
Maybe not really an answer to your question, but maybe you can get some ideas from the Symfony translation component? It looks very good to me, although I must confess I haven't used it myself yet.
The documentation for the component can be found at
http://symfony.com/doc/current/book/translation.html
and the code for the component can be found at
https://github.com/symfony/Translation.
It should be easy to use the Translation component, because Symfony components are intended to be able to be used as standalone components.
On another note, one of the things I don't like about gettext is that
the text is embedded into the application all over the place. That
means that the team responsible for the primary translation (usually
English) has to have access to the project source code to make changes
in all the places the default statements are placed. It's almost as
bad as applications that have SQL spaghetti-code all over.
This isn't actually true. You can have a header file (sorry, ex C programmer), such as:
<?php
define('MSG_404_NOT_FOUND', 'error.404_not_found');
?>
Then whenever you want a message, use _(MSG_404_NOT_FOUND). This is much more flexible than requiring developers to remember the exact syntax of the non-localised message every time they want to spit out a localised version.
You could go one step further, and generate the header file in a build step, maybe from CSV or database, and cross-reference with the translation to detect missing strings.
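The build step suggested above could be sketched as follows. Everything here is illustrative: the function names are invented, and the naming rule produces MSG_ERROR_404_NOT_FOUND rather than the shorter MSG_404_NOT_FOUND used in the example, so adapt it to your own convention.

```php
<?php
/** Hypothetical naming rule: 'error.404_not_found' becomes MSG_ERROR_404_NOT_FOUND. */
function constantNameFor(string $key): string
{
    return 'MSG_' . strtoupper(preg_replace('/[^A-Za-z0-9]+/', '_', $key));
}

/** Generate the PHP source of the constants file from a list of message keys. */
function generateConstantsFile(array $keys): string
{
    $source = "<?php\n";
    foreach ($keys as $key) {
        $source .= sprintf(
            "define(%s, %s);\n",
            var_export(constantNameFor($key), true),
            var_export($key, true)
        );
    }
    return $source;
}

/** Keys that have no entry in a translation catalog (key => text). */
function missingTranslations(array $keys, array $catalog): array
{
    return array_values(array_diff($keys, array_keys($catalog)));
}
```

Running missingTranslations() against each locale's catalog during the build gives a list of strings that still need translating.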
I have a Zend plugin that works very well for this.
<?php
/** dependencies **/
require 'Zend/Loader/Autoloader.php';
require 'Zag/Filter/CharConvert.php';
Zend_Loader_Autoloader::getInstance()->setFallbackAutoloader(true);
//filter
$filter = new Zag_Filter_CharConvert(array(
'replaceWhiteSpace' => '-',
'locale' => 'en_US',
'charset'=> 'UTF-8'
));
echo $filter->filter('ééé ááá 90');//eee-aaa-90
echo $filter->filter('óóó 10aáééé');//ooo-10aaeee
If you do not want to use the Zend Framework, you can use the plugin on its own.

convert txt or doc to pdf using php

Has anyone come across PHP code that converts text or doc files into PDF?
It has to follow the same format as the original txt or doc file, meaning the line feeds as well as new paragraphs.
Converting from DOC to PDF is possible using phpLiveDocx:
$phpLiveDocx = new Zend_Service_LiveDocx_MailMerge();
$phpLiveDocx->setUsername('username')
->setPassword('password');
$phpLiveDocx->setLocalTemplate('document.doc');
// necessary as of LiveDocx 1.2
$phpLiveDocx->assign('dummyFieldName', 'dummyFieldValue');
$phpLiveDocx->createDocument();
$document = $phpLiveDocx->retrieveDocument('pdf');
file_put_contents('document.pdf', $document);
unset($phpLiveDocx);
For text to PDF, you can use the pdf extension in PHP.
You can view the examples here.
Have a look at this SO question. Using OpenOffice in command line mode for conversions can be done, though you'd have to search a bit for the conversion macros. I'm not saying it's lightweight though :)
See HTML_ToPDF. It also works for text.
It has been a long time since I touched PHP, but if you can make web service calls from it then try this product. It provides excellent conversion fidelity. It also supports additional formats including Infopath, Excel, PowerPoint etc as well as Watermarking support.
Please note that I have worked on this product so the usual disclaimers apply.

Find differences between 2 HTML files

Is there a way to display differences between two HTML documents?
There is a PHP class called daisdiff, but it has no documentation. Can anyone show how to use it, or any alternative?
I advise you to use the PEAR Text_Diff package. The package comes with several classes and is easily extensible: you can write your own diff renderer, so it's easy to adapt and a lot easier than parsing the output of the diff command.
Here is a short code snippet to compare two text files:
include_once "Text/Diff.php";
include_once "Text/Diff/Renderer.php";
// define files to compare
$file1 = "data1.txt";
$file2 = "data2.txt";
// perform diff, print output
$diff = new Text_Diff(file($file1), file($file2));
$renderer = new Text_Diff_Renderer();
echo $renderer->render($diff);
There is a UNIX program called diff which is meant just for that purpose. You use it like this:
diff -crB file1 file2
c stands for context. It shows some extra lines around the changed lines so that you can find them more easily.
r stands for recursive. That way you can specify directories as file1 and file2, with all the files therein being compared to each other, too.
B makes it ignore blank lines and their changes.
Let me go find the Windows solution just in case.
Here is a pure php implementation of diff, http://www.holomind.de/phpnet/diff.src.php. If you skip to the bottom of the page there is an example of how to use it.
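To give an idea of what a pure PHP diff involves, here is a small line-based sketch built on a longest-common-subsequence table. It is not the code from the linked page; the function names and CSS class names are made up, and because the table needs memory proportional to the product of the two line counts, it suits small inputs rather than files with thousands of lines.

```php
<?php
/**
 * Line-based diff via a longest-common-subsequence table.
 * Returns a list of [op, line] pairs where op is ' ', '-' or '+'.
 */
function lineDiff(array $old, array $new): array
{
    $n = count($old);
    $m = count($new);
    // $lcs[$i][$j] holds the LCS length of $old[$i..] and $new[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($old[$i] === $new[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }
    // Walk the table from the start, emitting kept, removed and added lines.
    $ops = [];
    $i = $j = 0;
    while ($i < $n && $j < $m) {
        if ($old[$i] === $new[$j]) {
            $ops[] = [' ', $old[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $old[$i++]];
        } else {
            $ops[] = ['+', $new[$j++]];
        }
    }
    while ($i < $n) { $ops[] = ['-', $old[$i++]]; }
    while ($j < $m) { $ops[] = ['+', $new[$j++]]; }
    return $ops;
}

/** Render the operations as one HTML div per line, escaped for output. */
function renderDiffHtml(array $ops): string
{
    $classes = [' ' => 'ctx', '-' => 'del', '+' => 'ins'];
    $html = '';
    foreach ($ops as [$op, $line]) {
        $html .= sprintf("<div class=\"%s\">%s %s</div>\n", $classes[$op], $op, htmlspecialchars($line));
    }
    return $html;
}
```

For two files you could call lineDiff(file($file1, FILE_IGNORE_NEW_LINES), file($file2, FILE_IGNORE_NEW_LINES)) and style the del and ins classes with red and green backgrounds.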
