How to show difference between original text and edited text? - php

I have to show the difference between two sentences or paragraphs. The text can be anything.
It should work like this site does for an original question and an edited question.
For example, the original sentence is "I love apple" and the edited sentence is "I do not love banana". What I need is "do not" and "banana", the parts that differ from the original.
How can I do this in PHP?

What you're asking about is the longest common subsequence problem. It is typically solved with dynamic programming, and it is the basis of comparison tools such as the diff utility (and the diffs you see in svn or git, for example).
Luckily, PHP's PECL repository has an xdiff extension with such functions already available, such as xdiff_string_diff.
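If installing a PECL extension is not an option, the longest common subsequence idea can be sketched in plain PHP. The function name addedWords is made up for this illustration, and comparing word by word (split on whitespace) is an assumption. It is a minimal sketch, not what xdiff does internally.

```php
<?php
// Hypothetical helper: returns the words of $edited that are not part of the
// longest common subsequence (LCS) of words shared with $original.
function addedWords(string $original, string $edited): array
{
    $a = preg_split('/\s+/', trim($original), -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', trim($edited), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($a);
    $m = count($b);

    // $len[$i][$j] holds the LCS length of $a[$i..] and $b[$j..].
    $len = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $len[$i][$j] = ($a[$i] === $b[$j])
                ? $len[$i + 1][$j + 1] + 1
                : max($len[$i + 1][$j], $len[$i][$j + 1]);
        }
    }

    // Walk the table from the start; words of $b that get skipped were added.
    $added = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $i++;                 // word is common to both texts
            $j++;
        } elseif ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
            $i++;                 // word was removed from the original
        } else {
            $added[] = $b[$j++];  // word was added in the edit
        }
    }
    // Whatever remains in $b was appended at the end.
    while ($j < $m) {
        $added[] = $b[$j++];
    }
    return $added;
}

print_r(addedWords('I love apple', 'I do not love banana'));
```

For the example from the question this yields the words do, not and banana. The table needs memory proportional to the product of the two word counts, so for long texts a dedicated library is the better choice.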

Here is a nice library you can use:
finediff.php
http://www.raymondhill.net/finediff/finediff-code.php
your-file.php
include 'finediff.php';
$opcodes = FineDiff::getDiffOpcodes($original, $edited);

Related

In search for PEAR HTML_Table replacement

I have some trouble using PEAR HTML_Table (it raises a Strict error, and the bug still seems to be open).
I want to find a standard way to create HTML output that produces a table from associative arrays, where the key goes in column 1 and the value in column 2. If the value is nested, it should become a sub-table if possible; if not, just indent the sub-key.
Also, if possible, it would be nice to have formatting options such as alternating row colors and row hover highlighting, but that's optional.
Important: I would like "plain PHP code" rather than an extension that requires a DLL, due to update restrictions on the PHP server I use.
Any hints or tips for doing this without writing my own code?
There are lots of table generation classes out there.
https://github.com/search?l=PHP&q=datagrid&type=Repositories
Just to name a few:
https://github.com/donquixote/cellbrush
https://github.com/naomik/htmlgen
https://github.com/intrip/bootstrap-table-generator
DataGrid Classes for Zend Framework - http://modules.zendframework.com/?query=datagrid
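For comparison, the table described in the question is small enough to sketch in plain PHP without any extension. The function name arrayToTable and the class names even and odd are invented for this illustration, and the sketch assumes keys and scalar values can be cast to strings.

```php
<?php
// Hypothetical helper: renders an associative array as a two-column HTML
// table (key in column 1, value in column 2). Nested arrays are rendered
// recursively as a nested table inside the value cell.
function arrayToTable(array $data): string
{
    $html = "<table>\n";
    $row = 0;
    foreach ($data as $key => $value) {
        // Alternate a class name so a stylesheet can stripe the rows.
        $class = ($row++ % 2 === 0) ? 'even' : 'odd';
        $cell = is_array($value)
            ? arrayToTable($value)
            : htmlspecialchars((string) $value);
        $html .= '<tr class="' . $class . '"><th>'
            . htmlspecialchars((string) $key)
            . '</th><td>' . $cell . "</td></tr>\n";
    }
    return $html . "</table>\n";
}

echo arrayToTable([
    'name' => 'Example',
    'tags' => ['first' => 'a', 'second' => 'b & c'],
]);
```

Row striping and hover highlighting can then be handled purely in CSS, for instance with rules on tr.odd and tr:hover, so the PHP side stays free of presentation logic.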

Extract URL containing /find/ from numerous URL's?

I'm a complete novice at RegEx and could do with some help.
I have a long string containing lots of URLs and other text, and one of the URLs has /find/ in it, i.e.:
1. http://www.example.com/not/index.html
2. http://www.example.com/sat/index.html
3. http://www.example.com/find/index.html
4. http://www.example.com/rat/mine.html
5. http://www.example.com/mat/find.html
What sort of RegEx would I use to return the URL that is number 3 in that list but not return me number 5 as well? I suppose basically what I'm looking for is a way of returning a whole word that contains a specific set of letters and / in order.
TIA
I would assume you want preg_match("%/find/%",$input); or similar.
EDIT: To get the full line, pass a third argument to receive the match (it ends up in $matches[0]):
preg_match("%^.*?/find/.*$%m",$input,$matches);
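If only the URL itself is wanted rather than the whole line, a variation on the same idea is to match a run of non-whitespace characters around /find/. The function name findUrl is made up for this sketch, and it assumes URLs start with http:// or https:// and contain no spaces.

```php
<?php
// Hypothetical helper: returns the first URL in $input that contains the
// path segment /find/, or null when there is none.
function findUrl(string $input): ?string
{
    // \S* cannot cross whitespace, so the match stays within one URL.
    if (preg_match('%https?://\S*/find/\S*%', $input, $matches)) {
        return $matches[0];
    }
    return null;
}

$input = "1. http://www.example.com/not/index.html\n"
       . "3. http://www.example.com/find/index.html\n"
       . "5. http://www.example.com/mat/find.html\n";

echo findUrl($input), "\n";
```

With the sample list this prints the third URL only. The fifth one is skipped because find.html is not followed by a slash.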
I suggest using RegExr to build regular expressions.
You can type in a sample list (like the one above), use a palette to create a RegExp, and test it in real time. The program is available both online and as a downloadable Adobe AIR package.
Unfortunately I cannot access their site right now, so I'm attaching the AIR package of the downloadable version.
I really recommend it, since it helped a RegExp newbie like me design even very complex patterns.
However, for your question, I think that just
\/find\/
works well if you want a yes/no result (i.e. whether or not the input contains /find/); to obtain the full line instead, use
.*\/find\/.*
In addition to Kolink's answer, in case you wanted to regex match the whole URI:
This is by no means an exhaustive regex for URIs, but this is a good starting point. I threw in a few options at key points, like .com, .net, and .org. In reality you'll have a fairly hard time matching URIs with regular expressions due to the lack of conformity, but you can come very close
The regex from the above link:
/(https?:\/\/)?(www\.)?([a-zA-Z0-9-_]+)\.(com|org|net)\/(find)\/([a-zA-Z0-9-_]+)\.(html|php|aspx)?/is

How to determine if a sentence is talking about a specific subject?

I have a set of predefined words and would like to know whether the primary subject of a sentence is one of them.
Example:
Predefined words:
iPhone, Nexus, HTC
Sentence:
I like the new design of iPhone - primary subject is iPhone
I am listening to Nirvana on my Nexus. - primary subject is not in predefined words
The HTC phone is better than iPhone - primary subject is HTC
I would like to do this in PHP, or in something that has a PHP interface.
Alias-i has a natural language parser for PHP.
Edit: this page says Alias-i's parser is written in PHP, but Alias-i's website says it is written in Java.
The short version: By Keywords.
This method works only with a limited set of Keywords.
A related question might be: Using preg_match to find all words in a list
The long version: By parsing the language and making the computer system understand it.
The latter is something linguists do. They develop such systems, and it takes years. You can probably find some implementations, but I do not know any from memory. I would need to ask a friend.
Try to come up with good heuristics and evaluate them.
Examples:
1. The keyword is at the beginning of the sentence.
2. There is only one keyword in the text.
3. A continuous verb form such as "listening" is present, which usually indicates a subjective or uninformative message.
Write a classifier on top of those features. I would recommend Mallet.
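To make those heuristics concrete, here is a minimal sketch of the feature extraction step in PHP. The function name keywordFeatures and the feature names are invented for this illustration, and treating any word ending in "ing" as a continuous verb form is a deliberately crude assumption. The classifier itself is left out.

```php
<?php
// Hypothetical feature extractor: for every predefined keyword found in the
// sentence, collect simple features a classifier could be trained on.
function keywordFeatures(string $sentence, array $keywords): array
{
    // Split into words on anything that is not a letter or digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $sentence, -1, PREG_SPLIT_NO_EMPTY);
    $lower = array_map('strtolower', $words);

    // Position (word index) of the first occurrence of each keyword.
    $found = [];
    foreach ($keywords as $keyword) {
        $pos = array_search(strtolower($keyword), $lower, true);
        if ($pos !== false) {
            $found[$keyword] = $pos;
        }
    }

    // Crude stand-in for "continuous form": any word ending in "ing".
    $hasIngWord = (bool) preg_grep('/ing$/', $lower);

    $features = [];
    foreach ($found as $keyword => $pos) {
        $features[$keyword] = [
            'position'         => $pos,
            'is_first_keyword' => $pos === min($found),
            'is_only_keyword'  => count($found) === 1,
            'has_ing_word'     => $hasIngWord,
        ];
    }
    return $features;
}

print_r(keywordFeatures('The HTC phone is better than iPhone', ['iPhone', 'Nexus', 'HTC']));
```

These features alone do not decide anything. The iPhone and Nexus example sentences both have a single keyword at the end, so labeled examples and a trained classifier are still needed to tell them apart.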

Calling wordnet from php (Wordnet class or API for PHP)

I am trying to write a program to find the similarity between two documents. Since I'm only working with English, I decided to use WordNet, but I cannot find a way to link WordNet with PHP, and I cannot find any WordNet API for PHP.
I saw on the forum that someone (Spudley) said he called WordNet from PHP using the shell_exec() function:
Thesaurus class or API for PHP [edited]
I would really like to know the method used, or see some example code or a tutorial, to start using WordNet with PHP.
Many thanks
The PHP extension which is linked to from the WordNet site is very old and out of date -- it claims to work with PHP4, so I don't think it's been looked at in years.
There aren't any other APIs available for WordNet->PHP, so I rolled my own solution.
WordNet can be run from the command-line, so PHP's shell_exec() function can read the output.
If you run WordNet from the command-line (cd to Wordnet's directory, then just wn) without any parameters, it will show you a list of possible functions that Wordnet supports.
Still in the command-line, if you then try one/some of those functions, you'll see how Wordnet outputs its results. For example, if you want synonyms for the word 'star', you could try the -synsn function:
wn star -synsn
This will produce output that looks a bit like this:
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun star
8 senses of star
Sense 1 star
=> celestial body, heavenly body
Sense 2 ace, adept, champion, sensation, maven, mavin, virtuoso, genius, hotshot, star, superstar, whiz, whizz, wizard, wiz
=> expert
Sense 3 star
=> celestial body, heavenly body
Sense 4 star
=> plane figure, two-dimensional figure
Sense 5 star, principal, lead
=> actor, histrion, player, thespian, role player
Sense 6 headliner, star
=> performer, performing artist
Sense 7 asterisk, star
=> character, grapheme, graphic symbol
Sense 8 star topology, star
=> topology, network topology
In PHP, you can read this same output using the shell_exec() function.
$result = shell_exec('/path/to/wn '.escapeshellarg($word).' -synsn');
Now $result should contain the block of text quoted above.
At this point, you have to do some proper coding. You'll need to take that block of text and parse it for the data you want.
This is where it gets tricky. Because the data is presented in a format designed to be read by a human rather than by a program, it is tricky to parse accurately.
It is important to note that different search options present their output slightly differently. And, some of the results that are returned can be somewhat esoteric. I ended up writing a weighting system to score the results, but it was fairly specific to my needs, so you'll need to experiment with it to come up with your own system.
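As a starting point for the parsing step, here is a minimal sketch that handles only the -synsn layout quoted above: a "Sense N" line with comma-separated synonyms, followed by "=>" lines with hypernyms. The function names parseSynsets and splitList are made up for this example, and the layout is an assumption. Real wn output may break lines differently, and other search options will need their own rules.

```php
<?php
// Hypothetical parser for the "-synsn" output layout quoted above.
// Returns an array keyed by sense number with 'synonyms' and 'hypernyms'.
function parseSynsets(string $output): array
{
    $senses = [];
    $current = null;
    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        if (preg_match('/^Sense (\d+)\s*(.*)$/', $line, $m)) {
            // Start of a new sense; the rest of the line lists synonyms.
            $current = (int) $m[1];
            $senses[$current] = [
                'synonyms'  => splitList($m[2]),
                'hypernyms' => [],
            ];
        } elseif ($current !== null && strpos($line, '=>') === 0) {
            // A "=>" line lists hypernyms of the current sense.
            $senses[$current]['hypernyms'] = array_merge(
                $senses[$current]['hypernyms'],
                splitList(substr($line, 2))
            );
        }
        // Header lines such as "8 senses of star" are ignored.
    }
    return $senses;
}

// Splits a comma-separated list into trimmed, non-empty items.
function splitList(string $list): array
{
    return preg_split('/\s*,\s*/', trim($list), -1, PREG_SPLIT_NO_EMPTY);
}
```

Calling parseSynsets($result) on the block quoted above would give one entry per sense, which is a more convenient shape for any scoring or weighting you add afterwards.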
I hope that's enough help for you. :)
I know it's rather late, but I recently wrote a library to scratch my own itch:
Wordnet php wrapper

What does wikipedia use for text and revision diffs

I'm trying to take two versions of a text (10 pages long) and compare them to produce the difference. I know Wikipedia has a similar feature to compare revisions. Does anyone know what they use? I'm hoping they're using a PHP-driven solution.
There is an implementation of diff in PHP. I haven't used it, but it's a start. There is also something called PHP inline diff that you can check out.
