I need to find a certain key in a PDF file. As far as I know, the only way to do that is to read the PDF as a text file. I want to do this in PHP without installing an add-on, framework, etc.
Thanks
You can certainly open a PDF file as text. The PDF file format is actually a collection of objects. The first line is a header that tells you the version. From there you go to the end of the file to find the offset of the xref table, which records where all the objects are located. The contents of individual objects in the file, like graphics, are often binary and compressed. The 1.7 specification can be found here.
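As a rough sketch of the header and trailer lookup described above: the function name `pdfInspect` and the 1024-byte tail window are assumptions made for illustration, not part of any library, and the code does not parse the xref table itself or decompress any objects.

```php
<?php
/**
 * Read the version from the PDF header and the xref offset from the trailer.
 * Returns ['version' => ?string, 'startxref' => ?int]; null means "not found".
 */
function pdfInspect(string $data): array
{
    $version = null;
    $startxref = null;

    // Header: the file should begin with "%PDF-x.y".
    if (preg_match('/^%PDF-(\d+\.\d+)/', $data, $m)) {
        $version = $m[1];
    }

    // Trailer: "startxref", a byte offset, then "%%EOF" near the end of the file.
    // Only the last 1024 bytes are searched (an arbitrary window for this sketch).
    $tail = substr($data, -1024);
    if (preg_match_all('/startxref\s+(\d+)\s+%%EOF/', $tail, $m) > 0) {
        // Incrementally updated files can contain several; use the last one.
        $startxref = (int) end($m[1]);
    }

    return ['version' => $version, 'startxref' => $startxref];
}
```

Calling `pdfInspect(file_get_contents($sourcePath))` would give you the version string and the byte offset at which to start reading the xref table.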
I found this function, hope it helps.
http://community.livejournal.com/php/295413.html
You can't just open the file as text: it is a binary dump of the objects used to create the PDF display, including encoding, fonts, text, and images. I wrote a blog post explaining how text is stored at http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams
Thank you all for your help. I owe you this piece of code:
// Proceed if file exists
if (file_exists($sourcePath)) {
    $pdfFile = fopen($sourcePath, "rb");
    $data = fread($pdfFile, filesize($sourcePath));
    fclose($pdfFile);
    // Check if file is encrypted or not.
    // stripos() returns 0 for a match at the start and false for no match,
    // so compare against false explicitly.
    if (stripos($data, $searchFor) !== false) { // $searchFor = "/Encrypt"
        $counterEncrypted++;
    } else {
        $counterNotEncrypted++;
    }
} else {
    $counterNotExisting++;
}
I have a series of base64 PDF Files that I would like to merge together. Currently I am using file_get_contents() and with PHPMailer can attach each of them separately.
$woFile = file_get_contents($url);
$invoiceFile = file_get_contents($invPDF64);
$tsFile = file_get_contents($tsPDF64);
...
$mail->AddStringAttachment($woFile, "1.pdf", "base64", "application/pdf");
$mail->AddStringAttachment($invoiceFile, "2.pdf", "base64", "application/pdf");
$mail->AddStringAttachment($tsFile, "3.pdf", "base64", "application/pdf");
All the examples I've seen online such as FPDF require the file to be locally downloaded, at least from what I saw. Is there a way to append each of these PDF files into one, and then have that attached to the email?
Thanks in advance!
I'm not sure if you specifically need to merge the PDFs into one PDF, or if you just want one file. Here are options for both:
If you want to merge all PDFs into a single PDF file, then this is a duplicate question. You mention not wanting to have a local file, but this may be an unreasonable constraint (e.g., memory issues with large PDFs). Use temporary files as appropriate and clean up after yourself.
If you just want a single file, consider putting the files into a ZIP archive and sending that. You might also like the ZipStream library for this purpose. Here's some minimal code using the native library:
$attachmentArchiveFilename = tempnam('tmp', 'zip');
$zip = new ZipArchive();
# error checking is omitted here for brevity; do not omit it in production
$zip->open($attachmentArchiveFilename, ZipArchive::OVERWRITE);
$zip->addFromString('PDFs/first.pdf', $woFile);
$zip->addFromString('PDFs/second.pdf', $invoiceFile);
$zip->addFromString('PDFs/third.pdf', $tsFile);
$zip->close();
$mail->addAttachment($attachmentArchiveFilename, 'InvoicePDFs.zip');
# be sure to unlink/delete/remove your temporary file
unlink( $attachmentArchiveFilename );
I've run into this weird issue with PDF file handling. Not sure if SO is the right place to ask this, but I couldn't find any specific sites for this. I hope that someone can shed some light on the issue.
This happens only with the following specific process; if any of the steps is omitted, the issue is not observed.
I have a PHP application that serves PDF files to users. These files are created by authors in MS Word 2007, then printed to protected PDF (using pdf995, most likely, I can confirm if needed).
I'll call this initial PDF file the 'source' from here on.
Upon request, the source file is processed in PHP the following way:
we decrypt it using qpdf:
qpdf --decrypt "source.pdf" "tmp_output.pdf"
Then we add a security label / watermark to it, encrypt it, and output it to the browser using mPDF 6.0:
$mpdf = new mPDF();
$mpdf->SetImportUse();
$pagecount = $mpdf->SetSourceFile($fpath);
if ($pagecount) {
    for ($i = 1; $i <= $pagecount; $i++) {
        $tplId = $mpdf->ImportPage($i);
        $mpdf->UseTemplate($tplId);
        $html = '[security label / watermark contents...]';
        $mpdf->WriteHTML($html);
    }
}
$mpdf->SetProtection(array('copy','print'), '', 'password',128);
$mpdf->Output('final_output.pdf','I');
With the exact steps described above, images in the output that were pasted in the Word doc appear as follows:
In the source PDF, tmp_output (qpdf decrypted file) the pasted images look correct:
The distortion doesn't take place if any of the following occurs:
Word doc printed to PDF without protection
mPDF output is not protected.
As you can see, there are too many factors, so I don't know where to look for the bug.
Each component works correctly on its own, and I cannot find any info on the issue. Any insights are greatly appreciated.
EDIT 1
After some more testing, it appears that this only happens to screenshots taken from web browser, Windows explorer, MS Word. Cannot reproduce this with screenshots from Gimp.
It appears that something along the way attempts to convert white to alpha and fails.
The current version (6.1) of mPDF has a bug: it does not handle escaped PDF strings (imported via FPDI) correctly when they need to be encrypted.
A pull request that fixes this issue is available here.
I am extracting text from PDF files. This is the code:
<?php
require("PdfToText.php");
$file = 'SamplePF' ;
$pdf = new PdfToText ( "$file.pdf" ) ;
echo ( $pdf -> Text ) ;
?>
This class works fine for some PDF files.
The problems with this class are:
For some PDF files it takes text from random pages/lines rather than in page order.
For some PDF files it does not show any result.
For some PDF files it extracts only one or two lines.
Please suggest a solution. Thank you!
I am not sure this is the exact cause, but I ran into something similar when extracting data from PDFs. Some PDF files are locked with an owner password, which places restrictions on the document (no changes, no content copying or extraction, and so on) to protect its copyright. Check this link for more info on owner passwords.
So first try removing the owner password, then extract the text from such PDFs. A number of tools for removing owner passwords are available online; choose whichever suits you best.
So I'm making a notepad app in PHP, but I want to add the ability to share the file amongst your peers or something.
It's based on AJAX, and it saves the file automatically, and the file is named to what your IP address is after being hashed in md5.
What I want to do is maybe go to /view/837ec5754f503cfaaee0929fd48974e7, while the actual text file is located at /notes/837ec5754f503cfaaee0929fd48974e7.txt
I know I'll have to use file_get_contents(), but I don't know how to display it on a page.
I could just have it link to the .txt file, but I don't want it raw. I want it to have some style.
How would I go about doing this? Where can I start?
First you need a way to pass a variable (the file name) in the URL. This is most easily done using the query string.
So the link to a file for your user to see would be '/view/?file=MYFILENAME'
Your PHP would then turn this into a path to retrieve the text file from (this could also be wrapped in AJAXy goodness).
view/index.php
//Fetch the file based on the get variable
//Note the relative path
$file = file_get_contents('../notes/'.$_GET['file'].'.txt');
//Print the file. You can also dress it up or wrap it in HTML tags
echo $file;
When displaying the text file, there are some built-in functions that will help. Most notable is nl2br(), which takes the newline characters in a text file and turns them into HTML <br> tags.
More reading on the GET array can be found here
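One caveat with reading `$_GET['file']` straight into a path is that a crafted value such as `../../somefile` could reach files outside the notes folder. Below is a minimal sketch of a stricter variant, assuming the note names are always md5 hashes as in the question; `renderNote` and its `$notesDir` parameter are hypothetical names chosen for this example.

```php
<?php
/**
 * Turn a requested note name into HTML, or return null if the name is
 * invalid or the file cannot be read. $notesDir is a hypothetical parameter.
 */
function renderNote(string $name, string $notesDir): ?string
{
    // Accept only a 32-character lowercase hex string (an md5 hash).
    // This rejects path tricks such as "../secret".
    if (!preg_match('/\A[a-f0-9]{32}\z/', $name)) {
        return null;
    }

    $path = $notesDir . '/' . $name . '.txt';
    if (!is_file($path)) {
        return null;
    }

    $text = file_get_contents($path);
    if ($text === false) {
        return null;
    }

    // Escape HTML special characters first, then convert newlines to <br> tags.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}
```

In view/index.php this could be used as `$html = renderNote($_GET['file'] ?? '', __DIR__ . '/../notes');`, echoing `$html` inside whatever styled page markup you like and showing a "not found" message when it is null.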
I am using PHPRtfLite library (http://sigma-scripts.de/phprtflite/docs/index.html) to produce an RTF file using PHP and Yii.
So far, I've made a simple "Hello world" function.
Yii::import('ext.phprtf.PHPRtfLite');
Yii::registerAutoloader(array('PHPRtfLite','registerAutoloader'), true);
$rtf = new PHPRtfLite();
$sect = $rtf->addSection();
$sect->writeText('Hello world!', new PHPRtfLite_Font(), new PHPRtfLite_ParFormat());
//save rtf document
$rtf->sendRtf('takis.rtf');
The file is created successfully, but when I open it (in either WordPad or MS Word) I do not see the actual content of the file but the raw RTF code:
{\rtf\ansi\deff0\fs20
{\fonttbl{\f0 Times New Roman;}}
{\colortbl;\red0\green0\blue0;}
{\info
}
\paperw11907 \paperh16840 \deftab1298 \margl1701 \margr1701 \margt567 \margb1134 \pgnstart1\ftnnar \aftnnrlc \ftnstart1 \aftnstart1
\pard \ql {\fs20 Hello world!}
}
Do you have any idea on how to solve this?
Thank you very much in advance.
To answer my own question, in case someone has the same issue in the future...
It seems to be a problem with the sendRtf function. Now I save the created file locally:
$rtf->save('takis.rtf');
and then generate a link for the user to download the file. This works pretty well.
I have experienced the same thing myself. I'm not sure whether your cause was the same, but in my case there was an extra newline at the beginning of the PHP file, before the <?php tag. When I used sendRtf to download the file from the browser, that newline also ended up in the RTF file, making it invalid, and as a result the raw RTF code was displayed. When using save, such extra characters don't reach the file.
So one thing to check in similar situations: open the RTF file in Notepad and examine the beginning of the file.
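The same check can be scripted. This is a small sketch, assuming (as in the output shown in the question) that a valid file begins with `{\rtf`; both function names are made up for this example and are not part of PHPRtfLite.

```php
<?php
/**
 * Report whether a byte string starts the way the RTF output in the
 * question does, i.e. with "{\rtf" and nothing before it.
 */
function startsLikeRtf(string $bytes): bool
{
    return strncmp($bytes, '{\\rtf', 5) === 0;
}

/**
 * Return any stray bytes found before the RTF signature as a hex string,
 * or null if there are none (or no signature at all).
 */
function strayPrefixHex(string $bytes): ?string
{
    $pos = strpos($bytes, '{\\rtf');
    if ($pos === false || $pos === 0) {
        return null;
    }
    return bin2hex(substr($bytes, 0, $pos));
}
```

Running `strayPrefixHex(file_get_contents('takis.rtf'))` on a downloaded file would show `0a` for a stray newline, or `efbbbf` if a UTF-8 byte order mark slipped in ahead of the document.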