PHP PDF form parse regex

I have two PDF forms that I'd like to fill in with values using PHP. There don't seem to be any open source solutions; the only one I've found is SetaSign, which costs over $400. So instead I'm trying to dump the data as a string, parse it with a regex, and then save it. This is what I have so far:
$pdf = file_get_contents("../forms/mypdf.pdf");
$decode = utf8_decode($pdf);
$re = "/(\d+)\s(?:0 obj <>\/AP<>\/)(.*)(?:>> endobj)/U";
preg_match_all($re, $decode, $matches);
print_r($matches);
However, print_r() shows empty match arrays, even though the pattern matches when I test it in an online regex tester. There, each match captures first a numerical identifier for the field (I think) and then V(XX1), where "XX1" is the text I manually entered into the form and saved (as a test to find how and where that data is stored). I'm assuming (but haven't tested) that N<>>>/AS/Off is a checkbox.
Is there something I need to change in my regex to find matches like (2811 0 obj <>/AP<>/V(XX2)>> endobj), where the first capture is the key and the second is the value?
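One likely reason the pattern finds nothing (an assumption, since the raw bytes of the file aren't shown): in the file itself, field dictionaries are delimited by << and >> and often span several lines, so a literal pattern copied from a tester rarely lines up with them. Below is a looser sketch: it captures each "N 0 obj ... endobj" block (with the s modifier so "." also matches newlines) and then looks for a literal /V(...) value inside it. It reads the bytes as-is; utf8_decode() isn't needed for binary data and is deprecated as of PHP 8.2. It only works on PDFs whose field objects are stored uncompressed, and extractFieldValues is a hypothetical helper name.

```php
<?php
// Minimal sketch: map object numbers to literal /V(...) values found in
// UNCOMPRESSED PDF field objects. Compressed object streams will not match.
function extractFieldValues(string $pdf): array
{
    $fields = [];
    // "<num> 0 obj ... endobj", matched lazily; /s lets "." cross newlines.
    preg_match_all('/(\d+)\s+0\s+obj\b(.*?)endobj/s', $pdf, $objects, PREG_SET_ORDER);
    foreach ($objects as $obj) {
        // A literal-string value such as /V(XX2). Values containing ")"
        // (escaped or balanced) are not handled by this simple pattern.
        if (preg_match('/\/V\s*\(([^)]*)\)/', $obj[2], $value)) {
            $fields[$obj[1]] = $value[1];
        }
    }
    return $fields;
}

// Usage (path taken from the question):
// $pdf = file_get_contents("../forms/mypdf.pdf");
// print_r(extractFieldValues($pdf));
```

Keying the result by object number mirrors what the question asks for; a field's human-readable name usually lives in its /T entry, which could be captured the same way.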

Part 1 - Extract text from PDF
Download class.pdf2text.php from http://pastebin.com/dvwySU1a (updated on 5 April 2014) or http://www.phpclasses.org/browse/file/31030.html (registration required).
Usage:
include('class.pdf2text.php');
$a = new PDF2Text();
$a->setFilename('test.pdf');
$a->decodePDF();
echo $a->output();
The class doesn't work with every PDF I've tested, but give it a try and you may get lucky :)
Part 2 - Write to PDF
To write the PDF contents, use TCPDF, which is an enhanced and maintained version of FPDF.

Thanks to those who looked into this. Since I'm not doing this as a batch job, I decided to convert the PDFs into SVG files. An online converter kept the form fields, and with some small edits I've made them printable. Now I'll be able to populate the values and have a visual representation of the PDF. I may try TCPDF if I want to turn it back into an actual PDF, though I'm assuming it won't keep the form fields.
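Populating the converted SVG could be done with PHP's built-in DOM extension. This sketch rests on a loud assumption: that the converter emits each form field as an element (for example a text element) whose id attribute is the field name. Converters differ, so check the actual SVG first; fillSvgFields is a hypothetical helper name.

```php
<?php
// Sketch: replace the text content of SVG elements identified by id.
// ASSUMPTION: each form field in the converted SVG is an element whose
// id attribute is the field name, e.g. <text id="name">...</text>.
// ASSUMPTION: ids contain no double quotes (they are spliced into XPath).
function fillSvgFields(string $svg, array $values): string
{
    $doc = new DOMDocument();
    $doc->loadXML($svg);
    $xpath = new DOMXPath($doc);
    foreach ($values as $id => $value) {
        // "*" matches elements in any namespace, including the SVG one.
        $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));
        foreach ($nodes as $node) {
            // Remove the old content, then add the value as a text node
            // (saveXML() escapes characters such as "&").
            while ($node->firstChild !== null) {
                $node->removeChild($node->firstChild);
            }
            $node->appendChild($doc->createTextNode((string) $value));
        }
    }
    return $doc->saveXML();
}
```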

Related

Extract info from a JPEG with PHP

I want to extract variable-length pieces of information from a JPEG file using PHP, but it is not EXIF data.
If I open the JPEG in a plain text editor, I can see that the information I want is at the end of the file, separated by \00.
Like this:
\00DATA\00DATA\00DATA\00DATA\000\00DATA
Now if I use PHP's file_get_contents() to load the file into a string, the dividers \00 are gone and other symbols show up.
Like so:
ÿëžDATADATADATADATADATA ÿÙ
Could somebody please explain:
Why do the \00 dividers vanish?
How can I get the information using PHP?
EDIT
The question is solved, but for those looking for a smarter solution, here is the file I'm trying to extract the DATA parts from: https://www.dropbox.com/s/5cwnlh2kadvi6f7/test-img.jpg?dl=0 (yes, I know it's corrupted)
Use exif_read_data() instead: $data = exif_read_data("PATH/some.jpg") returns all the header data for the image. See the manual: http://php.net/manual/en/function.exif-read-data.php
I came up with a solution on my own. It may not be pretty, but it works for me.
Using urlencode(file_get_contents()), I was able to retrieve the \00 parts as %00.
So now it reads like this:
%00DATA%00DATA%00DATA%00DATA%000%00DATA
I can split the string at the %00 parts.
I will accept this answer once SO lets me, unless somebody comes up with a better solution.
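As for why the \00 dividers seem to vanish: file_get_contents() is binary-safe and keeps NUL bytes; a browser or terminal most likely just doesn't draw them. If that's what happened here, the urlencode() round-trip isn't needed, because you can split on the NUL byte directly. A sketch; splitOnNul is a hypothetical helper name, and the 200-byte tail window in the usage comment is an arbitrary assumption about where the data sits:

```php
<?php
// Sketch: split a binary string on NUL (0x00) bytes without urlencode().
// Requires PHP 7.4+ for the arrow function.
function splitOnNul(string $bytes): array
{
    // "\0" in a double-quoted PHP string is a single NUL byte.
    $parts = explode("\0", $bytes);
    // Drop empty pieces produced by leading or consecutive separators,
    // then reindex so the result is a plain list.
    return array_values(array_filter($parts, fn(string $p): bool => $p !== ''));
}

// Hypothetical usage: only look at the last 200 bytes of the file.
// $tail = substr(file_get_contents('test-img.jpg'), -200);
// print_r(splitOnNul($tail));
```

Splitting the whole file would also cut the image data wherever it contains a 0x00 byte, which is why the usage limits the input to the region where the DATA parts live.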

Using non-standard characters in an associative array

Good day all! I am working on a parser for a chat room that colors text based on who was talking, for archive purposes. I have it working perfectly, except that now the administrator wants to be able to replace the "fancy" names of some of their regulars with more readable versions.
The chat room allows an extended range of letters and symbols, which may not transfer fully to an RTF file.
I can't get it to work, and I don't see any reason why it shouldn't.
This is an example of what I have:
$nameconvert = array(
"îrúål__Þħōþħ" => "Eriel__Thoth",
);
// ... script that parses the uploaded text file line by line,
// splitting each line into an array on spaces, so the name of
// the person talking is $row_data[0] ...
$name = $row_data[0];
$name = $nameconvert[$name];
// ... code to put everything back together ...
This is just a simplified snippet, but for whatever reason it does not work. If I use $name = $nameconvert['îrúål__Þħōþħ'], then it does work, which tells me that the name I'm putting in the script and the name being pulled from my text file are two different strings, even though they look identical.
HELP!
I have found the answer and want to share my solution with others.
This is the modified code
$nameconvert = array(
"0123456789abcdef" => "Eriel__Thoth",
);
// ... script that parses the uploaded text file line by line,
// splitting each line into an array on spaces, so the name of
// the person talking is $row_data[0] ...
$name = $row_data[0];
// Look the readable name up by the hex dump of the name's UTF-8 bytes.
$name = $nameconvert[bin2hex(mb_convert_encoding($name, "UTF-8"))];
// ... code to put everything back together ...
The expression bin2hex(mb_convert_encoding($name, "UTF-8")) takes the name from the file, ensures it is in UTF-8, and converts it to its hexadecimal representation. That hex string is then used as the array key to look up the easier-to-read name.
It works just the way I am wanting!
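One common way for two strings to look identical yet differ (an assumption here; the asker didn't say which case applied) is Unicode normalization: é can be stored as the single code point U+00E9 or as e followed by the combining accent U+0301. The sketch below isolates the hex-key idea from this answer and shows why it works: the key is built from the exact bytes in the file. convertName is a hypothetical helper; it leaves out mb_convert_encoding() to stay minimal and, unlike the snippet above, falls back to the original name when no mapping exists.

```php
<?php
// Sketch: look up a display name by the hex encoding of its raw bytes.
// Keys are hex strings taken from the uploaded file (e.g. printed once via
// bin2hex($row_data[0])), so they match byte for byte even when two names
// render identically but are encoded differently.
function convertName(string $name, array $hexMap): string
{
    $key = bin2hex($name);
    // Keep the original name if there is no readable replacement.
    return $hexMap[$key] ?? $name;
}
```

For example, a map built from "e\u{0301}" converts that exact byte sequence, while the visually identical "\u{00E9}" passes through unchanged.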

Create value with specific parts of a text file

OK, I am working on a flatfile shoutbox, and I am trying to get the username from the flatfile into a variable, so I can use it in a database call that checks whether the user is an admin, so they can delete/ban users directly from the shoutbox.
This is an example line in the flatfile
<div><i><div class='date'>12/08/2012 18:56 pm </div></i> <div class='groupAdmin'><b>Admin</b></div><b>kira423:</b> hiya :D</div>
So I want to take the username, which is kira423 in this case, and store it in a variable such as $shoutname, so that $shoutname equals kira423.
I have tried a Google search and looked around on here, but I was unable to find an answer, so I am hoping to get some insight on how to do this by asking my own question.
Thanks,
Kira
You can use preg_match_all() for tasks like this ($data holds the contents of the flatfile):
preg_match_all('|<div class=\'date\'>(?P<date>.*?) .*<b>(?P<user>[^<:]+):</b>|i', $data, $matches);
var_dump($matches);
Iterating through all the matched usernames:
foreach ($matches['user'] as $key => $user) {
var_dump($user);
}
I think you should just parse each line in the flatfile as HTML (it only uses simple HTML tags), as described in PHP Parse HTML code (or search Google for "php parse HTML"). Then you can access the username (kira423) from an array or similar.
PS: HTML is not the best format for storing messages to display. Even CSV seems better: a line would be "kira423;date;some text", which is easier to read and makes each part easier to access. When displaying, use the standard decorator pattern.
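A minimal sketch of the parse-as-HTML approach using PHP's built-in DOM extension. It assumes every line has the format shown in the question, where the username is the text of the last b element followed by a colon; shoutUsername is a hypothetical helper name.

```php
<?php
// Sketch: extract the username from one shoutbox line by parsing it as HTML.
// ASSUMPTION: the line follows the question's format, where the last <b>
// element holds "username:" and any earlier <b> holds the group label.
function shoutUsername(string $line): ?string
{
    $doc = new DOMDocument();
    // Silence libxml warnings about the fragment not being a full document,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($line);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bolds = $doc->getElementsByTagName('b');
    if ($bolds->length === 0) {
        return null;
    }
    // Take the last <b> and strip whitespace and the trailing colon.
    $text = $bolds->item($bolds->length - 1)->textContent;
    return rtrim(trim($text), ':');
}
```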
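The delimited-line format from the PS could be written and read back with fputcsv() and str_getcsv(), which take care of quoting if a message itself contains a semicolon. The field order (user;date;text) follows the example in the PS; formatShout and parseShout are hypothetical names, and the escape character is passed explicitly so the calls behave the same across PHP versions.

```php
<?php
// Sketch: one shout per line, fields separated by ";".
function formatShout(string $user, string $date, string $text): string
{
    // fputcsv() needs a stream, so write to an in-memory one.
    $fh = fopen('php://memory', 'r+');
    fputcsv($fh, [$user, $date, $text], ';', '"', '\\');
    rewind($fh);
    $line = stream_get_contents($fh);
    fclose($fh);
    return $line; // ends with "\n", ready to append to the flatfile
}

function parseShout(string $line): array
{
    [$user, $date, $text] = str_getcsv(rtrim($line, "\r\n"), ';', '"', '\\');
    return ['user' => $user, 'date' => $date, 'text' => $text];
}
```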

Read the content of a PDF with PHP?

I need to read certain parts of a complex PDF. I searched the net, and some people say FPDF is good, but it can't read PDFs; it can only write them. Is there a library that can extract certain content from a given PDF?
If not, what's a good way to read certain parts of a given PDF?
Thanks!
I see two solutions here:
1. Convert your PDF file into something else first (text, HTML).
2. Use a library to do it; the bad news is that most of them are written in Java.
https://whatisprymas.wordpress.com/2010/04/28/lucene-how-to-index-pdf-files/
What about this?
http://www.phpclasses.org/package/702-PHP-Searches-pdf-documents-for-text.html
PS: I haven't tested this class, only read the description.
$result = pdf2text ('sample.pdf');
echo "<pre>$result</pre>";
How to get “clean” text (source code of pdf2text):
http://webcheatsheet.com/php/reading_clean_text_from_pdf.php
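Classes like pdf2text broadly work by decompressing the page content streams and collecting the strings passed to text-showing operators. A heavily simplified sketch of that idea, assuming FlateDecode (zlib) or raw streams and plain (text) Tj operators only; it ignores TJ arrays, hex strings, escapes and font encodings, so treat it as an illustration rather than a working extractor. pdfStreamText is a hypothetical name.

```php
<?php
// Heavily simplified sketch: collect "(text) Tj" strings from PDF content
// streams. Handles raw or FlateDecode (zlib) streams only.
function pdfStreamText(string $pdf): array
{
    $texts = [];
    // Everything between "stream" + EOL and EOL + "endstream".
    preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdf, $streams);
    foreach ($streams[1] as $data) {
        // FlateDecode data is zlib-formatted; if it doesn't inflate,
        // fall back to the raw bytes (the warning is suppressed).
        $decoded = @gzuncompress($data);
        $content = $decoded === false ? $data : $decoded;
        // Literal strings shown with the Tj operator, e.g. "(Hello) Tj".
        preg_match_all('/\(([^)]*)\)\s*Tj/', $content, $m);
        foreach ($m[1] as $text) {
            $texts[] = $text;
        }
    }
    return $texts;
}

// Usage:
// print_r(pdfStreamText(file_get_contents('sample.pdf')));
```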

String/Paragraph/Document comparison in php

I'm trying to add a feature that generates a difference report between two 20,000-character sections of text. I've done some Googling and heard about PEAR's diff library, which has been discontinued, and found this: https://github.com/paulgb/simplediff/blob/5bfe1d2a8f967c7901ace50f04ac2d9308ed3169/simplediff.php
Ideally I'd like to see what was removed, edited, or added and be able to show that to the user. Are there any libraries or simple ways of accomplishing this that you may know of?
I use this code in a live project:
http://svn.geograph.org.uk/svn/branches/british-isles/libs/3rdparty/simplediff.inc.php
Example use:
http://svn.geograph.org.uk/svn/branches/british-isles/public_html/article/diff.php
But the code is very simple:
$a1 = explode("\n",$file1);
$a2 = explode("\n",$file2);
print diff2table($a1,$a2);
(The code just accepts the input as arrays and outputs an HTML table, but diff2table can be customised.)
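The idea behind simplediff (the paulgb version linked in the question and, presumably, the geograph copy) is short enough to sketch: find the longest run of items the two arrays share, keep it, and recurse on what comes before and after it. This is an independent re-implementation, not the geograph file. It returns a list where unchanged items appear as-is and each change is an array with 'd' (deleted) and 'i' (inserted) entries; simpleDiff is a hypothetical name.

```php
<?php
// Sketch of the simplediff idea: longest common run, then recurse.
// Expects 0-indexed lists, e.g. from explode("\n", $text).
function simpleDiff(array $old, array $new): array
{
    if ($old === [] && $new === []) {
        return [];
    }
    $lengths = []; // $lengths[$o][$n]: length of the common run ending at old[$o]/new[$n]
    $maxLen = 0;
    $oStart = 0;
    $nStart = 0;
    foreach ($old as $o => $value) {
        foreach (array_keys($new, $value, true) as $n) {
            $len = ($lengths[$o - 1][$n - 1] ?? 0) + 1;
            $lengths[$o][$n] = $len;
            if ($len > $maxLen) {
                $maxLen = $len;
                $oStart = $o - $len + 1;
                $nStart = $n - $len + 1;
            }
        }
    }
    if ($maxLen === 0) {
        // Nothing in common: all old items deleted, all new items inserted.
        return [['d' => $old, 'i' => $new]];
    }
    return array_merge(
        simpleDiff(array_slice($old, 0, $oStart), array_slice($new, 0, $nStart)),
        array_slice($new, $nStart, $maxLen),
        simpleDiff(array_slice($old, $oStart + $maxLen), array_slice($new, $nStart + $maxLen))
    );
}

// Usage, line by line as in the answer:
// $changes = simpleDiff(explode("\n", $file1), explode("\n", $file2));
```

Each entry in the result is either an unchanged line or a change array, which is the shape a renderer such as diff2table would walk to build its rows.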
