String/Paragraph/Document comparison in PHP

I'm trying to add a feature that generates a difference report between two 20,000-character sections of text. After some Googling I heard about PEAR's diff library, which has been discontinued, and found this: https://github.com/paulgb/simplediff/blob/5bfe1d2a8f967c7901ace50f04ac2d9308ed3169/simplediff.php
Ideally I'd like to see what was removed, edited, or added, and be able to show that to the user. Are there any libraries or simple ways of accomplishing this that you know of?

I use this code in a live project:
http://svn.geograph.org.uk/svn/branches/british-isles/libs/3rdparty/simplediff.inc.php
Example use:
http://svn.geograph.org.uk/svn/branches/british-isles/public_html/article/diff.php
The calling code is very simple:
$a1 = explode("\n",$file1);
$a2 = explode("\n",$file2);
print diff2table($a1,$a2);
(The code accepts the two inputs as arrays of lines and outputs an HTML table, but diff2table can be customised.)
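To show what a line-based diff of this kind does under the hood, here is a minimal sketch built on a longest-common-subsequence table. It is not the simplediff implementation; the function name lineDiff and the ' ', '-', '+' markers are assumptions made for illustration.

```php
<?php
// Minimal line diff based on a longest-common-subsequence (LCS) table.
// Returns a list of [marker, line] pairs: ' ' unchanged, '-' removed, '+' added.
function lineDiff(array $old, array $new): array
{
    $n = count($old);
    $m = count($new);

    // $lcs[$i][$j] holds the LCS length of $old[$i..] and $new[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($old[$i] === $new[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table from the start, emitting one operation per step.
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($old[$i] === $new[$j]) {
            $ops[] = [' ', $old[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $old[$i]];
            $i++;
        } else {
            $ops[] = ['+', $new[$j]];
            $j++;
        }
    }
    // Whatever remains on either side is a pure removal or addition.
    while ($i < $n) {
        $ops[] = ['-', $old[$i++]];
    }
    while ($j < $m) {
        $ops[] = ['+', $new[$j++]];
    }
    return $ops;
}

foreach (lineDiff(explode("\n", "a\nb\nc"), explode("\n", "a\nx\nc")) as [$marker, $line]) {
    echo $marker, ' ', $line, "\n";
}
```

The table needs memory proportional to the product of the two line counts, which should be acceptable for texts of around 20,000 characters split into lines, but would not scale to very large files.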

Related

PHP pdf form parse regex

I have two PDF forms that I'd like to fill in with values using PHP. There don't seem to be any open source solutions. The only solution seems to be SetaSign, which costs over $400. So instead I'm trying to dump the data as a string, parse it using a regex, and then save it. This is what I have so far:
$pdf = file_get_contents("../forms/mypdf.pdf");
$decode = utf8_decode($pdf);
$re = "/(\d+)\s(?:0 obj <>\/AP<>\/)(.*)(?:>> endobj)/U";
preg_match_all($re, $decode, $matches);
print_r($matches);
However, my print_r is empty even after testing here. The matches on the right are first a numerical identifier for the field (I think) and then V(XX1) where "XX1" is the text I've manually entered into the form and saved (as a test to find how and where that data is stored). I'm assuming (but haven't tested) that N<>>>/AS/Off is a checkbox.
Is there something I need to change in my regex to find matches like (2811 0 obj <>/AP<>/V(XX2)>> endobj) where the first find will be a key and the second find is the value?
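As a rough sketch of the key/value extraction described above, the following pattern captures the object number and the text inside V(...) for each object. It assumes the PDF source is uncompressed, that values are plain literal strings without escaped parentheses, and that objects look like the sample in the question; the function name extractFieldValues is hypothetical.

```php
<?php
// Hypothetical helper: maps PDF object numbers to the literal string in /V(...).
// Assumes uncompressed objects and values without nested or escaped parentheses.
function extractFieldValues(string $pdfSource): array
{
    // (\d+)               object number
    // \s+0\s+obj\b        generation 0 followed by the "obj" keyword
    // (?:(?!endobj).)*?   anything, but never past the end of this object
    // /V\s*\(([^)]*)\)    the /V entry and its literal string value
    $pattern = '~(\d+)\s+0\s+obj\b(?:(?!endobj).)*?/V\s*\(([^)]*)\)~s';

    $fields = [];
    if (preg_match_all($pattern, $pdfSource, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $fields[$match[1]] = $match[2];
        }
    }
    return $fields;
}

$sample = "2811 0 obj <>/AP<>/V(XX2)>> endobj\n"
        . "2812 0 obj <>/AP<>/N<>>>/AS/Off>> endobj\n"
        . "2813 0 obj <>/AP<>/V(XX3)>> endobj";

print_r(extractFieldValues($sample));
```

Objects without a /V entry, such as the checkbox-like one in the sample, are simply skipped.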
Part 1 - Extract text from PDF
Download class.pdf2text.php from http://pastebin.com/dvwySU1a (updated on 5 April 2014) or http://www.phpclasses.org/browse/file/31030.html (registration required).
Usage:
include('class.pdf2text.php');
$a = new PDF2Text();
$a->setFilename('test.pdf');
$a->decodePDF();
echo $a->output();
The class doesn't work with every PDF I've tested, but give it a try and you may get lucky :)
Part 2 - Write to PDF
To write the PDF contents, use TCPDF, which is an enhanced and maintained version of FPDF.
Thanks to those who've looked into this. Since I'm not doing this as a batch, I decided to convert the PDFs into SVG files. This online converter kept the form fields, and with some small edits I've made them printable. Now I'll be able to populate the values and have a visual representation of the PDF. I may try TCPDF if I want to turn it into an actual PDF again, though I'm assuming it won't keep the form fields.

Generating an HTML color code for event category in ai1ec

Although this question relates to a particular WordPress plugin, the All-in-One Event Calendar by Time.ly, it can also be treated as a general PHP question.
I am trying to modify a theme to use the event colours selected in the ai1ec back end, and would like to produce a simple HTML colour code, i.e. "#f2f2f2".
The plugin itself has loads of functions and PHP shortcodes that pull a wealth of information from each event, such as the one listed below.
<?php echo $event->get_category_text_color(); ?> prints style="color: #f2a011;"
It can also print style="background-color: #f2a011;" by using $event->get_category_bg_color();
Now the real meat of the question:
All I want to do is get that HTML colour so I can also code it into buttons and other visual elements. I have scoured blogs and code trying to find something that does this, to no avail.
What I was wondering is whether you could write a filter of some sort that takes just the colour code, such as "#f2f2f2", out of that output. I know it's not called a filter, as searches for "php filter" return information about something completely different. I'm a self-taught PHP programmer, so searching for terms I don't know can be pretty tough!
As pointed out in another answer, substr would be a great solution, but it wouldn't handle the case where the color code is expressed in this format:
#FFF;
instead of:
#FFFFFF;
Therefore, this regex should do the job quite well:
'/(?<=#)[0-9a-fA-F]+(?=;)/'
Example:
$matches = array();
preg_match('/(?<=#)[0-9a-fA-F]+(?=;)/', $event->get_category_text_color(), $matches);
$colorCode = "#{$matches[0]}";
You could use the substr() function like so:
echo substr($event->get_category_text_color(),14,-2);
For the example output above, this would return #f2a011.
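Both approaches can be wrapped in a small helper that does not depend on fixed offsets. This is only a sketch: the function name extractHexColor is hypothetical, and it assumes the plugin output contains a 3-digit or 6-digit hex colour as in the examples above.

```php
<?php
// Hypothetical helper: returns the first hex colour (#RGB or #RRGGBB) found
// in a style attribute string, or null when none is present.
function extractHexColor(string $styleAttribute): ?string
{
    // Try the 6-digit form first so "#f2a011" is not cut down to "#f2a".
    if (preg_match('/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/', $styleAttribute, $matches)) {
        return $matches[0];
    }
    return null;
}

echo extractHexColor('style="color: #f2a011;"'), "\n";
echo extractHexColor('style="background-color: #FFF;"'), "\n";
```

In a theme this could be called as extractHexColor($event->get_category_text_color()), with a fallback colour for the null case.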

Perf. issue / Too many calls to string manipulation functions

This question is about optimizing a part of a program that I add to many projects as a common tool.
This 'template parser' takes a text pattern containing HTML code (or anything else) with several specific tags, and replaces those tags with developer-supplied values when rendered.
The few classes involved do a great job and work as expected; they make it possible to isolate design elements and to adapt or replace design blocks easily.
The patterns I use look like this (nothing exceptional, I admit):
<table class="{class}" id="{id}">
<block_row>
<tr>
<block_cell>
<td>{content}</td>
</block_cell>
</tr>
</block_row>
</table>
(The example code below consists of adapted extracts.)
The parsing does things like this:
// Variables are sorted by position in pattern string
// Position is read once and stored in cache to avoid
// multiple calls to strpos or str_replace
foreach ($this->aVars as $oVar) {
$sString = substr($sString, 0, $oVar->start) .
$oVar->value .
substr($sString, $oVar->end);
}
// Once pattern loaded, blocks look like --¤(<block_name>)¤--
foreach ($this->aBlocks as $sName=>$oBlock) {
$sBlockData = $oBlock->parse();
$sString = str_replace('--¤(' . $sName . ')¤--', $sBlockData, $sString);
}
Through the class instance I use methods like 'addBlock' or 'setVar' to fill my pattern with data.
This system has several disadvantages, among them the number of objects in memory (one for each block instance) and the many calls to string manipulation functions during parsing (preg_replace in the past, now just a bunch of substr and pals).
The program I'm working on makes heavy use of these templates, and they are about to reach their limits.
My questions are the following (no need for code, just ideas or a lead to follow):
Should I conclude that I've overused this approach and try to arrange things so that I need fewer calls to these templates (for instance by improving the cache, or using only simple view scripts)?
Do you know of a technical solution for feeding a structure with data that would not consume as many resources as what I wrote? While writing this I thought of XSLT: would it be suitable and, if so, could it improve performance?
Thanks in advance for your advice.
Use the XDebug extension to profile your code and find out exactly which parts of the code are taking the most time.
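Before setting up a full profiler, a quick timing loop can give a first impression of how an alternative behaves. The sketch below times a single-pass replacement of {name} placeholders with strtr(); the helper name fillPlaceholders and the sample data are assumptions, and it does not handle the block tags from the question. hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: fills {name} placeholders in one pass with strtr(),
// instead of splicing the string once per variable.
function fillPlaceholders(string $pattern, array $vars): string
{
    $map = [];
    foreach ($vars as $name => $value) {
        $map['{' . $name . '}'] = (string) $value;
    }
    return strtr($pattern, $map);
}

$pattern = '<table class="{class}" id="{id}"><tr><td>{content}</td></tr></table>';
$vars = ['class' => 'grid', 'id' => 't1', 'content' => 'Hello'];

// Rough micro-benchmark: render the same pattern many times and time it.
$start = hrtime(true);
for ($i = 0; $i < 10000; $i++) {
    $html = fillPlaceholders($pattern, $vars);
}
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("10000 renders took %.2f ms\n", $elapsedMs);
```

Numbers from a loop like this only hint at relative cost; a profiler run on the real application shows where the time actually goes.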

Setting language for user

I would like my PHP website to be multilingual. I thought of using:
echo $lang[$_SESSION['lang']]['WellcomeMessage'];
but I found that I will need to format the text, for example for male/female forms or to insert values from the DB. So I suspect that simple strings might not be enough for formatting.
I know #define might have worked in C, as the string translates to code, but I don't know how PHP handles that. For example:
define ($lang['en']['credit_left'],'you have $credits_left');
define ($lang['sp']['credit_left'],'tienes $credits_left creditos mas');
Any suggestions?
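One possible direction, sketched under assumptions: keep the messages in the $lang array as format strings and fill in the values with vsprintf() at output time. The function name translate is hypothetical, and the message wording has been adjusted to use %d placeholders instead of the $credits_left variable.

```php
<?php
// Hypothetical message catalogue: same keys as in the question, but the
// messages are sprintf-style format strings rather than constants.
$lang = [
    'en' => ['credit_left' => 'you have %d credits left'],
    'sp' => ['credit_left' => 'tienes %d creditos mas'],
];

// Looks up the message for a locale and fills in the placeholders.
// Falls back to the key itself when no translation exists.
function translate(array $lang, string $locale, string $key, ...$args): string
{
    $format = $lang[$locale][$key] ?? $key;
    return vsprintf($format, $args);
}

echo translate($lang, 'sp', 'credit_left', 3), "\n";
```

In the site itself the locale argument would come from $_SESSION['lang'], and gender-specific wording could be handled with separate keys per form.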

I need help with php parsing of xml for insertion into mysql database

Aloha everyone,
I apologize in advance for the many questions, but I've been asked to develop a database and have no experience with PHP and MySQL. I thought it would be a good exercise for me to attempt to learn a little bit about them and try to develop a concept database for my work at the same time. Basically this is a database that uses SYDI to obtain WMI information from our Windows-based computers to use for patch management. The way I envision this working is like this:
1. SYDI is run and an XML file is generated with the information.
2. Using the PHP front end to our patch database, the XML report is parsed and the desired information is inserted into the MySQL database.
3. Reports are generated from the database to compare with the latest known baseline for the activity. If computers are found to be below the baseline, the patch server is used to deliver the needed patches to the delinquent computer(s).
There are a couple of formats used in the XML report from SYDI: one with attributes in a single tag, and another where a single parent tag contains several child tags with attributes. I have figured out how to parse the first. Here's a sample of the data and the code for that (it's really pretty basic stuff) with the resulting output:
<machineinfo manufacturer="Dell Inc." productname="Precision M90" identifyingnumber="87ZGFD1" chassis="Portable" />
$xml = simplexml_load_file("sydiTest.xml");
foreach($xml->machineinfo[0]->attributes() as $a => $b)
{
echo $b, "<br />";
}
Dell Inc.
Precision M90
87ZGFD1
Portable
I didn't need the name of the attribute, only the value, so I only echo'd $b there. For the second, here's a sample of the data itself as well as the code and output for the parse:
<patches>
<patch description="Microsoft .NET Framework 1.1 Security Update (KB2416447)" hotfixid="M2416447" installdate="04-Feb-11" />
<patch description="Microsoft .NET Framework 1.1 Service Pack 1 (KB867460)" hotfixid="S867460" installdate="04-Feb-11" />
<patch description="Windows Management Framework Core" hotfixid="KB968930" installdate="2/4/2011" />
<patch description="Security update for MSXML4 SP2 (KB954430)" hotfixid="Q954430" installdate="04-Feb-11" />
<patch description="Security update for MSXML4 SP2 (KB973688)" hotfixid="Q973688" installdate="04-Feb-11" />
<patch description="Microsoft Internationalized Domain Names Mitigation APIs" hotfixid="IDNMitigationAPIs" installdate="6/30/2008" />
</patches>
foreach ($xml->patches->patch[0]->attributes() as $a => $b)
{
echo $b, "<br />";
}
Microsoft .NET Framework 1.1 Security Update (KB2416447)
M2416447
04-Feb-11
As you can see, I only got the first patch, not the rest of them. I figure that 'patch[0]' is most likely the issue, as it only references the first child tag. How can I get it to reference the rest of the children?
The results raise another issue. Is there any way to pick out specific attributes and disregard the rest? For example, in the first parse, the machineinfo parse gets all the information I need. In the second parse, I only need the description and hotfixid. Once I get the correct syntax for the parse, assuming it runs like the first one, I would most likely get all of the attributes. I don't need the install date.
Lastly, how can I assign the retrieved values to variables? The first parse results in the data I need, but not in the correct order. My table structure is like this:
CREATE TABLE InventoryItems
(InvSerNum VARCHAR(20) NOT NULL,
Make VARCHAR(20),
Model VARCHAR(20),
Platform VARCHAR(12),
CONSTRAINT Inventory_PK PRIMARY KEY (InvSerNum));
I need the identifyingnumber (InvSerNum) first. Of course, I could always reorder the fields in the table to match the XML, but I'd rather leave it as is. My thinking is that I can use an INSERT statement and just use the variables for the values to be input.
I'm trying to do all of this on my own, but got stuck on the XML parsing part. If anyone can assist me in understanding the process, I would be in your debt.
Try using RapidXML in PHP. Makes XML parsing a bit easier. It's still not that intuitive: you'll need a good debugger to get to the bottom of it.
The rest of your questions require a bit of research into the mysql_(function_name) bindings in PHP (note that these were removed in PHP 7; mysqli and PDO are the current alternatives). There are heaps of articles out there about this.
I figured out the second parse question. I used the following code:
foreach ($xml->patches->patch as $patch1) {
foreach ($patch1->attributes() as $a => $b) {
echo $b, "<br />";
}
}
and it worked like a charm! I still need to omit the last attribute, assign the values to variables, and then use the INSERT statement to get them into the database, but at least I'm that much closer to a resolution.
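For the remaining steps, a possible sketch: SimpleXML lets you read attributes by name with array syntax, so specific values can be picked out and put in whatever order the table needs. The <computer> root element in the sample is an assumption (the question does not show it), the function name insertInventoryItem is hypothetical, and PDO is used here in place of the mysql_ functions.

```php
<?php
// Sample shaped like the SYDI excerpts in the question. The <computer> root
// element is an assumption; the real report may use a different name.
$sample = <<<XML
<computer>
  <machineinfo manufacturer="Dell Inc." productname="Precision M90" identifyingnumber="87ZGFD1" chassis="Portable" />
  <patches>
    <patch description="Windows Management Framework Core" hotfixid="KB968930" installdate="2/4/2011" />
    <patch description="Security update for MSXML4 SP2 (KB954430)" hotfixid="Q954430" installdate="04-Feb-11" />
  </patches>
</computer>
XML;

$xml = simplexml_load_string($sample);

// Read attributes by name and cast to string, in the table's column order.
$info = $xml->machineinfo[0];
$item = [
    'InvSerNum' => (string) $info['identifyingnumber'],
    'Make'      => (string) $info['manufacturer'],
    'Model'     => (string) $info['productname'],
    'Platform'  => (string) $info['chassis'],
];

// Keep only description and hotfixid for each patch; installdate is ignored.
$patches = [];
foreach ($xml->patches->patch as $patch) {
    $patches[] = [
        'description' => (string) $patch['description'],
        'hotfixid'    => (string) $patch['hotfixid'],
    ];
}

// Hypothetical helper: inserts one row into InventoryItems with a prepared
// statement, so the values are never concatenated into the SQL string.
function insertInventoryItem(PDO $pdo, array $item): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO InventoryItems (InvSerNum, Make, Model, Platform)
         VALUES (:InvSerNum, :Make, :Model, :Platform)'
    );
    $stmt->execute([
        ':InvSerNum' => $item['InvSerNum'],
        ':Make'      => $item['Make'],
        ':Model'     => $item['Model'],
        ':Platform'  => $item['Platform'],
    ]);
}
```

With a real connection, something like insertInventoryItem(new PDO($dsn, $user, $password), $item) would perform the insert, where $dsn, $user, and $password stand for your own connection settings.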
