I have a quite complicated HTML/CSS layout which I would like to convert to PDF on my server. I already have tryed DOMPDF, unfortunately it did not convert the HTML with correct layout. I have considered HTMLDOC but I have heard that it ignores CSS to a large extent, so I suppose the layout would break apart with that tool too.
My question therefor is - are there any online demos for other tools (like wkhtmltopdf i.e.) that I could use to verify how my HTML is converted? Before spending the rest of my life installing & testing one by one?
Unfortunately, I can't change the HTML layout to fit those tools. Or better said - I could, if any of them would get close to an acceptable result...
Not really an answer but for the question above, but I'll try to provide some of my experience, maybe it will help someone somwhere in the future.
wkthmltopdf is really THE ONLY solution that worked for me that could produce what I call acceptable results. Still, some minor modifications to the CSS had to be made, however, it worked really well when it comes to rendering the content. All the other packages are really only suitable if you have a rather simply document with one basic table etc. No chance to get them to produce fair results on complex docs with design elements, css, multiple overlapping images etc. If complex documents are in game - do not spend the time (like I did) - go straight to wkhtmltopdf.
Beware - the wkhtmltopdf installation is tricky. It was not so easy for me as the guys said in their comments (one of the reasons might be that I am not too familiar with Linux). The static binary did not work for me for some reason I can't explain. I suspect that there were problems with the version - apparently there is a difference between versions for different OS and processors, maybe I have the vrong version. For installing the non-static version first of all you have to have root access to the server, that's obvious. I installed it with apt-get using PuTTy, went quite well. I was lucky that my server already had all the predispositions to install wkhtmltopdf. So this was the easy part for me :) (btw, you don't have to care for symbolic links or wrappers as many tutorials tell you - I spent hours trying to figure out how to do that part, in the end I gave it up and everything works well though)
After the install I got the quite famous Cannot connect to X server error. This is due to the fact that we need to run wkhtmltopdf headless on a 'virtual' x server. Getting around this was also quite simple (if one does not care for the symbolic links). I installed it with apt-get install xvfb. This also went quite well for me, no problems.
After completing this I was able to run wkhtmltopdf. Beware - it took me some time to figure out that trying to run xvfb was the wrong way - instead you have to run xvfb-run. My PHP code now looks like this exec("xvfb-run wkhtmltopdf --margin-left 16 /data/web/example.com/source.html /data/web/example.com/target.pdf"); (notice the --margin-left 16 command line option for wkhtmltopdf - it makes my content more centered; I left it in place to demonstrate how you can use command line options).
I also wanted to protect the generated PDF files from editing (in my case, print protect is also possible). After doing some research I found this class from ID Security Suite. First of all I have to say - IT'S OLD (I am running PHP 5+). However, I made some improvements to it. First of all - it's a wrapper around the FPDF library, so there is a file called fpdf.php in the package. I replaced this file from the latest FPDF version I got from here. It made my PHP warnings look more sustainable. I also changed the $pdf =& new FPDI_Protection(); and removed the & sign as I was getting an deprecated warning for it. However, there are more of those to come. Instead of searching and modifying the code I just turned the error reporting lvl to 0 with error_reporting(0); (although turning off the warnings only should be sufficient). Now someone will say that this is not "good practice". I am using this whole stuff on an internal system, so I do not really have to care. For sure the scripts could be modifiyed to match latest requirements. For me I didn't want to spend another hours working on it. Be careful where the script says $pdf->SetProtection(array('print'), '', $password); (I allowed printing my documents as you can see). It took me a while to figure out that the first argument is the permissions. The second is the USER PASSWORD - if you provide this then the docs will require a password to open (I left this blank). The third is the OWNER PASSWORD - this is what you need to make the docs "secured" against editing, copying etc.
My whole code now looks like:
// get the HTML content of the file we want to convert
$invoice = file_get_contents("http://www.example.com/index.php?s=invoices-print&invoice_no=".$_GET['invoice_no'];
// replace the CSS style from a print version to a specially modified PDF version
$invoice = str_replace('href="design/css/base.print.css"','href="design/css/base.pdf.css"',$invoice);
// write the modified file to disk
file_put_contents("docs/invoices/tmp/".$_GET['invoice_no'].".html", $invoice);
// do the PDF magic
exec("xvfb-run wkhtmltopdf --margin-left 16 /data/web/domain.com/web/docs/invoices/tmp/".$_GET['invoice_no'].".html /data/web/domain.com/web/docs/invoices/".$_GET['invoice_no'].".pdf");
// delete the temporary HTML data - we do not need that anymore since our PDF is created
unlink("docs/invoices/tmp/".$_GET['invoice_no'].".html");
// workaround the warnings
error_reporting(0);
// script from ID Security Suite
function pdfEncrypt ($origFile, $password, $destFile){
require_once('libraries/fpdf/FPDI_Protection.php');
$pdf = new FPDI_Protection();
$pdf->FPDF('P', 'in');
//Calculate the number of pages from the original document.
$pagecount = $pdf->setSourceFile($origFile);
//Copy all pages from the old unprotected pdf in the new one.
for ($loop = 1; $loop <= $pagecount; $loop++) {
$tplidx = $pdf->importPage($loop);
$pdf->addPage();
$pdf->useTemplate($tplidx);
}
//Protect the new pdf file, and allow no printing, copy, etc. and
//leave only reading allowed.
$pdf->SetProtection(array('print'), '', $password);
$pdf->Output($destFile, 'F');
return $destFile;
}
//Password for the PDF file (I suggest using the email adress of the purchaser).
$password = md5(date("Ymd")).md5(date("Ymd"));
//Name of the original file (unprotected).
$origFile = "docs/invoices/".$_GET['invoice_no'].".pdf";
//Name of the destination file (password protected and printing rights removed).
$destFile = "docs/invoices/".$_GET['invoice_no'].".pdf";
//Encrypt the book and create the protected file.
pdfEncrypt($origFile, $password, $destFile );
Hope this helps someone to save some time in the future. This whole solution took me like 12 hours to implement into our invoicing system. If there was better info on wkhtmltopdf for users like me, who are not that familiar with Linux/UNIX, I could have saved some of the hours spent on this.
However - what doesn't kill you makes you stronger :) So I am a bit more perfect now that I made this run :)
Related
So my problems stems from trying to generate large PDF files but being hit by the memory limit / execution timeouts in PHP, the data is too great in volume to simply extend these limits so that solution is out of the question.
I have a background shell task running which handles all of this rendering and then alerts the user once the PDF has been completed.
In theory I would have a loop within this shell which would take in a chunk of data and render it to file, then take the next chunk and do the same. Once out of data to render, the file would then be written and completed ready to be served. This way the memory limit of PHP would not be hit as a manageable chunk will only ever be loaded.
I am currently using the CakePDF(v.3.5) plugin for CakePHP 3 (v.3.5.13) but am sturggling to find a solution which allows the rendering of some data and then adding more data to the same pdf.
Has anyone managed this before or is it out of scope of the plugin? Would another solution be to create multiple PDF files and then merge them together after all separate PDF's have been created?
This is more of a theoretical question if this would work and if anyone has managed it before. I don't have much code to show but if more detail is required then give me a shout and I will try and get something for you or some example code!
Thanks
I don't have direct experience with that version CakePdf, but under CakePHP 2.x I use the wkhtmltopdf engine which takes an .html output to produce the PDF.
If your shell generates such .html in chunks, it is easy to append.
Of course wkhtmltopdf is likely to put some load on the machine to produce the PDF, but since it's a binary, it happens outside of PHP's memory/time contraints.
That's certainly out of the scope of the plugin, it's built around the idea of rendering a single view to a single file, the interface doesn't support chunked creation of a single file, and if I'm not mistaken, none of the supported engines do support that either, at least not in a straightforward and efficient manner when it comes to large documents.
There's certainly lots of ways to do this, creating multiple PDFs and merging/concatenating them afterwards might be one of them, generating the source content in chunks, and passing it to a PDF renderer that can handle lots of content efficiently might be another one, and surely there also might be libraries out there that do explicitly support chunked creation of PDFs...
I thought I would post what I ended up doing for anyone in the future.
I used CakePDF to generate smaller PDF's which I stored in a tmp directory these are all under the limit of PHP's execution time and memory limits as I don't believe altering those provides a good solution. In this step I also saved the names of all of the PDF's generated for use in the next step.
The code for this looked something like:
while (!is_last_pdf) {
// Generate pdf in here with a portion of the data
$CakePdf = new CakePdf();
$CakePdf->template('page', 'default');
$CakePdf->viewVars(compact('data', 'other_stuff'));
// Save file name to array
$tmp_file_list[] = $file_name;
// Update the is_last_pdf variable
is_last_pdf = check_for_more_data();
}
From this I used GhostScript from within the Shell to merge all of the PDF files, the code for this looked something like this:
$output_path = 'output.pdf';
$file_list = '';
// Create a string of all the files to merge
foreach ($tmp_file_list as $file) {
$file_list .= $file . ' ';
}
// Execute GhostScript to merge all the files into the `output.pdf` file
exec('gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=' . $output_path . ' ' . $file_list);
All of the code here was in the Shell file responsible for creating the PDF.
Hope this helps someone :)
I've run into this weird issue with PDF file handling. Not sure if SO is the right place to ask this, but I couldn't find any specific sites for this. I hope that someone can shed some light on the issue.
This happens with the following specific process, if some of steps are omitted - the issue is not observed.
I have a PHP application that serves PDF files to users. These files are created by authors in MS Word 2007, then printed to protected PDF (using pdf995, most likely, I can confirm if needed).
I'll call this initial PDF file as 'source' hereinafter.
Upon request, the source file is processed in PHP the following way:
we decrypt it using qpdf:
qpdf --decrypt "source.pdf" "tmp_output.pdf"
Then we add security label / wartermark to it, encrypt and output to browser using mPDF 6.0:
$mpdf = new mPDF();
$mpdf->SetImportUse();
$pagecount = $mpdf->SetSourceFile($fpath);
if ($pagecount) {
for ($i=1;$i<=$pagecount;$i++){
$tplId = $mpdf->ImportPage($i);
$mpdf->UseTemplate($tplId);
$html = '[security label / watermark contents...]';
$mpdf->WriteHTML($html);
}
}
$mpdf->SetProtection(array('copy','print'), '', 'password',128);
$mpdf->Output('final_output.pdf','I');
With the exact steps described above, images in the output that were pasted in the Word doc appear as follows:
In the source PDF, tmp_output (qpdf decrypted file) the pasted images look correct:
The distortion doesn't take place if any of the following occurs:
Word doc printed to PDF without protection
mPDF output is not protected.
As you can see there too many factors, so I don't know where to look for a bug.
Each component works correctly on it's own and I cannot find any info on the issue. Any insights are greatly appreciated.
EDIT 1
After some more testing, it appears that this only happens to screenshots taken from web browser, Windows explorer, MS Word. Cannot reproduce this with screenshots from Gimp.
It appears that something along the way attempts to convert white to alpha and fails.
The current version (6.1) of Mpdf has a bug which does not handle escaped PDF strings (imported via FPDI) correct if they should be encrypted.
A pull request, which fixes this issue is available here.
I have been trying to find a simple way to create OpenOffice calc files with no success.
I have tried:
openTBS - Seems to work writing an xml and a template file but can't find anything about how the xml file format.
Ods php generator - I tried this one as it provides clear examples, but when I copy the files to my server I always get corrupted files
Php doc writer - Tried an example and got an sxw file. I don't even know what that is
ODS-PHP - No documentation, only one example for creating 4 cells
Everything looks old, stalled and undocumented. ¿Any suggestion?
I have used opentbs successfully.
You can generate both excel and calc files. It also nice that you can "reuse" your html implementation so to speak.
Maybe this thread could get you going http://www.tinybutstrong.com/forum.php?thr=3069
Do the html version first.. then edit for calc/excel
Spout from Box works well enough for me. There are some missing features but it is simple to use, has a fluent API, and has no dependencies (it supports composer but you can use it standalone and its dependency graph has zero depth 😉 ).
Here's my "array of objects to ODS" pipeline, using Spout:
(I'm not using their recommended use import because all this code fits in a much larger file that I didn't want to contaminate and the $factory pattern looks cleaner to me anyway)
$factory = 'Box\Spout\Writer\Common\Creator\WriterEntityFactory';
$factory::createODSWriter()
->openToBrowser('filename.ods')
->addRow($factory::createRow([
$factory::createCell(__('Heading 1')),
$factory::createCell(__('Heading 2')),
$factory::createCell(__('Heading 3')),
]))
->addRows(array_map(function($row) use ($factory) {
return $factory::createRow([
$factory::createCell($row->first_val),
$factory::createCell($row->second_val),
$factory::createCell($row->third_val),
]);
}, loadDataFromSomewhere()))
->close();
I'm trying to build a simple PDF document signing routine, using PHP, openssl, and the Zend framework (for pdf redering/handling).
I've found this, but it simply won't work, Zend is not able to open any pdf's, not even Zend's own test pdf and Zend will not report why, only that it 'cannot'.
I'm pretty sure I would be able to create the keys/certs effectively as that is well documented, but is there a solid approach to attaching the generated certificate to the PDF as the above Zend extension suggests it once did?
function DigiSignPDF ($pdf_sign_request) {
if (get_magic_quotes_gpc()) {
$new_pdf = stripslashes($pdf_sign_request['raw_pdf']);
} else {
$new_pdf = $pdf_sign_request['raw_pdf'];
}
$test_pdf = stripslashes(file_get_contents('path/Some.pdf'));
$test_pdf2 = ('path/Some.pdf');
$pdf = Zend_Pdf::load($new_pdf2);
//below is the signing code, from another library, it works as long as Zend_Pdf works
$certificate = file_get_contents('path/certificate.p12');
$certificatePassword = 'test123';
if (empty($certificate)) {
throw new Zend_Pdf_Exception('Cannot retrieve/generate the certificate.');
}
$pdf->attachDigitalCertificate($certificate,$certificatePassword);
$eSig_pdf = $pdf->render();
file_put_contents('path/signed_pdf.pdf', $eSig_pdf);
}
Edit, adding code: The above only works if I use 'test_pdf2' as the input for Zend_Pdf. It recognizes the cert as binary with no problems, but I need to be able to pass the PDF without ever writing it to disk.
TCPDF supports signing of pdf files. Maybe you find something useful in the source code.
Adding my solution as answer, per halfer's advice: Solved this one, because I was passing the content to Zend_Pdf as a string, i should have been using Zend_Pdf::parse($new_pdf);, as it very likely says in the manual. (oops)
Further; I solved pretty much ALL of my problems with digitally signing PDFs of various versions and form constituents by moving to TCPDF, as several of the articles here suggest. A similar caveat was met with TCPDF though, when using strings, ensure that you are using TCPDF's 'writeHTMLCell' instead of 'writeHTML'. And watch for PHPs 'magic_quotes', errant whitespace, encoding, and goblins.
I am using PDF Jam for manipulating pdf. I need to add a text line at the bottom of generated file. I tried it but not able to made it.
Can anybody guide me how to do it?
I did in my php code as
$command = '-----------------';
exec($command);
As you know, PDFJAM is for manipulating pds. It is a small collection of shell scripts which provide a simple interface to much of the functionality of the excellent pdf pages. See the Ubuntu Manual
pdfjam - A shell script for manipulating PDF files
You should create your sheet as your doing (5x6) and create a separate sheet of minimal page size with required information than merge both the file into one.
Else in first step create your sheet and use pdflib to add text as second step. It very good tool. I hope its a good solution of your problem.
I love pdftk and so wanted to find a solution using that. The following worked for me.
pdfjam --preamble '\usepackage{fancyhdr} \topmargin 85pt \oddsidemargin 140pt \cfoot{\thepage}' --pagecommand '\thispagestyle{plain}' --landscape --nup 2x1 --frame false --clip true --trim ".5in 0.5in 0.5in .65in" --delta '-0.25in 0' tmp.pdf
I cribbed it from: Page Numbering with the "{page} of {pages}", removing the "of pages" part.
Command converts pdf to 2x1, trims margins, and crops. Output is landscape.
\topmargin and \oddsidemargin seem to tell pdflatex where to put the numbers.