how to get html text differences like svn?

how to get html text differences like svn? - php

I am using PEAR text_diff class to get comparison of text. It works correct for plain text, but when I try to compare text with HTML tags, It gives wrong result
is there any way to compare two HTML blocks and in result display text that pre-serv its HTML and show differences like svn

In my experience, these two are fantastic:
https://github.com/cygri/htmldiff (Python)
http://www.w3.org/People/Bos/#jpegxmp (C, must be compiled)
Yes, none of the programs are written in plain PHP. You just run them via PHP:
// Python script:
$html_diff = shell_exec ('python /path/to/htmldiff version1.html version2.html');
// C program:
$html_diff = shell_exec ('/path/to/htmldiff --start-delete="<span class=\'delete\'>" --end-delete="</span>" --start-insert="<span class=\'insert\'>" --end-insert="</span>" version1.html version2.html');
Since they aren't written in PHP, you can enjoy the incredible high speed :)

I'm not sure which OS you're on, but I always use Meld on Ubuntu for diff:ing non-versioned files. It doesn't have any problems diffing HTML code (or anything else afaik):
http://meldmerge.org/

You can do this right in javascript itself. Check out google-diff-match-patch.
Diff demo here.

Related

modulus behaving differently on different servers? (with / without spaces)

The following code
<?php
echo ((12+1)%12)."<br/>";
echo ((12+1) % 12)."<br/>";
?>
leads to an unexpected result (13,1) instead of (1,1) on phpfiddle.org but it runs as expected on my server.
http://phpfiddle.org/lite/code/qdb-s4t
Is this an error in their sandbox or does it have to do with different php versions? How is the case without spaces interpreted?
I was just looking at some code for quite a long time and couldn't understand what was the difference.
i know i could use fmod or other sandboxes like http://ideone.com/.

PHPFiddle is just a website that is attempting to provide an easy way to execute PHP code samples from the browser. This isn't going to give you native behavior, simply because the code is going to be processed by JavaScript first using whatever logic the people at PHPFiddle seem fit. This leads to the possibility of bugs that have nothing to do with PHP and that is what is going on here. If you turn those same lines of codes into full strings, you will see the output still isn't even correct.

PHP - Syntax of exec() function to call another php file

This question is in reference to:
Free (preferably) PHP RTF to HTML converter?
I'm trying to execute that last line of code in my php:
exec(rtf2htm file.rtf file.html)
I understand what parameters need to go within the parentheses, I just do not know how to write it. I've looked at multiple examples along with the php documentation and still I remain confused, so could someone show me how it is written? rtf2htm refers to a PHP file which converts RTF to HTML.
Ultimately what I am trying to do is convert the content of numerous RTF docs to HTML, maintaining the formatting, while not creating tags such as<head> or <body> which programs like Word or TextEdit generate when converting to HTML.

rtf2htm is not a php script, it is a program installed on the server. exec() is used to call external applications.
EDIT: After looking up this script, it seems that it is indeed a php script. But it has been coded to be usable from the command line only.
This should work:
<?php
exec('php /path/to/rtf2htm /path/to/source.rtf /path/to/output.html');
?>

Extract data from HTML in PHP or Python

I need to extract this data and display a simple graph out of it.
Something like Equity Share Capital -> array (30.36, 17, 17 .... etc) would help.
<html:tr>
<html:td>Equity Share Capital</html:td>
<html:td class="numericalColumn">30.36</html:td>
<html:td class="numericalColumn">17.17</html:td>
<html:td class="numericalColumn">15.22</html:td>
<html:td class="numericalColumn">9.82</html:td>
<html:td class="numericalColumn">9.82</html:td>
</html:tr>
How do I go about this task in PHP or Python?

A good place to start looking would be the python module BeautifulSoup which extracts the text and places it into a table.
Assuming you've loaded the data into a variable called raw:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(raw)
for x in soup.findAll("html:td"):
if x.string == "Equity share capital":
VALS = [y.string for y in x.parent.findAll() if y.has_key("class")]
print VALS
This gives:
[u'30.36', u'17.17', u'15.22', u'9.82', u'9.82']
Which you'll note is a list of unicode strings, make sure to convert them to whatever type you desire before processing.
There are many ways to do this via BeautifulSoup. The nice thing I've found however is the quick hack is often good enough (TM) to get the job done!

BeautifulSoup

Don't forget lxml in Python. It also works well to extract data. It's harder to install but faster. http://pypi.python.org/pypi/lxml/2.2.8

'echo' or drop out of 'programming' write HTML then start PHP code again

For the most part, when I want to display some HTML code to be actually rendered I would use a 'close PHP' tag, write the HTML, then open the PHP again. eg
<?php
// some php code
?>
<p>HTML that I want displayed</p>
<?php
// more php code
?>
But I have seen lots of people who would just use echo instead, so they would have done the above something like
<?php
// some php code
echo("<p>HTML that I want displayed</p>");
// more php code
?>
Is their any performance hit for dropping out and back in like that? I would assume not as the PHP engine would have to process the entire file either way.
What about when you use the echo function in the way that dose not look like a function, eg
echo "<p>HTML that I want displayed</p>"
I would hope that this is purely a matter of taste, but I would like to know if I was missing out on something. I personally find the first way preferable (dropping out of PHP then back in) as it helps draw a clear distinction between PHP and HTML and also lets you make use of code highlighting and hinting for your HTML, which is always handy.

The first type is preferable, exactly for the reasons you mentioned.
Actually, echoing out whole chunks of html is considered bad practice.

No, there's no performance increase that would be visible.
Sometimes its just simply easier to output content using echo (for example, when inside a while or for loop) than to close the php tag.

I think there's a preprocessor which converts the same form into the second. That's what happens in ASP.NET, anyway. And in both ASP.NET and classic ASP, loops can actually stretch across raw-HTML regions.

There's no performance difference at all.
Just the style that produces the most readable code. Depending on the actual situation that can be either of the two.
But mixing HTML and PHP should be avoided where possible anyway. THis can be accomplished by using a template system for your views.

"Safe" markdown processor for PHP?

Is there a PHP implementation of markdown suitable for using in public comments?
Basically it should only allow a subset of the markdown syntax (bold, italic, links, block-quotes, code-blocks and lists), and strip out all inline HTML (or possibly escape it?)
I guess one option is to use the normal markdown parser, and run the output through an HTML sanitiser, but is there a better way of doing this..?
We're using PHP markdown Extra for the rest of the site, so we'd already have to use a secondary parser (the non-"Extra" version, since things like footnote support is unnecessary).. It also seems nicer parsing only the *bold* text and having everything escaped to <a href="etc">, than generating <b>bold</b> text and trying to strip the bits we don't want..
Also, on a related note, we're using the WMD control for the "main" site, but for comments, what other options are there? WMD's javascript preview is nice, but it would need the same "neutering" as the PHP markdown processor (it can't display images and so on, otherwise someone will submit and their working markdown will "break")
Currently my plan is to use the PHP-markdown -> HTML santiser method, and edit WMD to remove the image/heading syntax from showdown.js - but it seems like this has been done countless times before..
Basically:
Is there a "safe" markdown implementation in PHP?
Is there a HTML/javascript markdown editor which could have the same options easily disabled?
Update: I ended up simply running the markdown() output through HTML Purifier.
This way the Markdown rendering was separate from output sanitisation, which is much simpler (two mostly-unmodified code bases) more secure (you're not trying to do both rendering and sanitisation at once), and more flexible (you can have multiple sanitisation levels, say a more lax configuration for trusted content, and a much more stringent version for public comments)

PHP Markdown has a sanitizer option, but it doesn't appear to be advertised anywhere. Take a look at the top of the Markdown_Parser class in markdown.php (starts on line 191 in version 1.0.1m). We're interested in lines 209-211:
# Change to `true` to disallow markup or entities.
var $no_markup = false;
var $no_entities = false;
If you change those to true, markup and entities, respectively, should be escaped rather than inserted verbatim. There doesn't appear to be any built-in way to change those (e.g., via the constructor), but you can always add one:
function do_markdown($text, $safe=false) {
$parser = new Markdown_Parser;
if ($safe) {
$parser->no_markup = true;
$parser->no_entities = true;
}
return $parser->transform($text);
}
Note that the above function creates a new parser on every run rather than caching it like the provided Markdown function (lines 43-56) does, so it might be a bit on the slow side.

JavaScript Markdown Editor Hypothesis:
Use a JavaScript-driven Markdown Editor, e.g., based on showdown
Remove all icons and visual clues from the Toolbar for unwanted items
Set up a JavaScript filter to clean-up unwanted markup on submission
Test and harden all JavaScript changes and filters locally on your computer
Mirror those filters in the PHP submission script, to catch same on the server-side.
Remove all references to unwanted items from Help/Tutorials
I've created a Markdown editor in JavaScript, but it has enhanced features. That took a big chunk of time and SVN revisions. But I don't think it would be that tough to alter a Markdown editor to limit the HTML allowed.

How about running htmlspecialchars on the user entered input, before processing it through markdown? It should escape anything dangerous, but leave everything that markdown understands.
I'm trying to think of a case where this wouldn't work but can't think of anything off hand.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

how to get html text differences like svn? - php

I am using PEAR text_diff class to get comparison of text. It works correct for plain text, but when I try to compare text with HTML tags, It gives wrong result is there any way to compare two HTML blocks and in result display text that pre-serv its HTML and show differences like svn

I'm not sure which OS you're on, but I always use Meld on Ubuntu for diff:ing non-versioned files. It doesn't have any problems diffing HTML code (or anything else afaik): http://meldmerge.org/

You can do this right in javascript itself. Check out google-diff-match-patch. Diff demo here.

Related

modulus behaving differently on different servers? (with / without spaces)

PHP - Syntax of exec() function to call another php file

Extract data from HTML in PHP or Python

'echo' or drop out of 'programming' write HTML then start PHP code again

"Safe" markdown processor for PHP?

Categories

Resources