PDFParser (TCPdf) connection reset - php

I'm using PDFParser by Smalot.
Some of my PDFs can be converted without any issues using PDF2Text. Unfortunately, it often returns empty content or drops the line breaks, for example. The demo of PDFParser from Smalot, on the other hand, converts all my PDFs without any problem. After installing it with Composer, however, I get a Connection Reset (ERR_CONNECTION_RESET) when I use it.
I've tried set_time_limit to increase my execution time and raised the memory limit to 1024M, but neither helped.
Calling my PHP script from the command line works fine.
I've also tried to convert these PDFs on http://pdfparser.org/demo, and there they get converted without any problem.
I'm not doing anything special in my index file.
Here's the content, in case it is in any way useful:
// Include Composer autoloader if not already done.
include 'vendor/autoload.php';
// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('test.pdf');
$text = $pdf->getText();
echo $text;

Related

Get PDF-page-size in points with php

I'd like to add an item on the last page of an existing PDF. The item is added by a commercial, non-modifiable API.
The position and size are defined relative to the page. To support different page formats, I have to get the page size of the existing PDF.
I've tried it with FPDI/FPDF, but there are warnings, and I'm not allowed to update the version.
My next attempt, with TCPDF, doesn't throw an exception, but after the import the function getNumPages returns 0.
Is there a way to get the page size of a PDF without using a library or a command-line tool?
I solved the problem by fixing the existing FPDI library. There were only small compatibility problems with newer PHP versions, which I fixed. I could then use the following snippet:
$oPDF = new fpdi();
$iPageCount = $oPDF->setSourceFile($sFileLocation);
$mTemplateId = $oPDF->importPage($iPageCount);
$aTemplateSize = $oPDF->getTemplateSize($mTemplateId);
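The size returned by getTemplateSize is expressed in the document's unit rather than in points. A minimal sketch of the conversion, assuming the FPDF default unit of millimetres (the snippet above constructs fpdi without arguments); mmToPoints is a hypothetical helper, not part of FPDI or FPDF:

```php
<?php
// Hypothetical helper (not part of FPDI/FPDF): convert a length from
// millimetres to PDF points. 1 inch = 25.4 mm = 72 pt.
function mmToPoints(float $mm): float
{
    return $mm * 72.0 / 25.4;
}

// Example with an A4 page (210 x 297 mm), the kind of values a template
// size would hold when the document unit is millimetres (an assumption).
$widthPt  = mmToPoints(210.0);
$heightPt = mmToPoints(297.0);

printf("%.2f x %.2f pt\n", $widthPt, $heightPt); // prints "595.28 x 841.89 pt"
```

If the document was created with a different unit, the factor has to change accordingly.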

(Inline) PHP in domPDF 7.0

I switched from TCPDF to domPDF because it seems more convenient for creating invoices from HTML to PDF (I am not much of a pro at PHP :)). Now that I have turned the HTML file into a PDF, I noticed that the PDF does not contain any of the PHP output. Since the data from my SQL databases is supposed to fill the PDF, that is something of a problem.
I saw that you can enable PHP in the options.php included in the src folder, and I tried to do it as described in the manual (and also tried various other lines of code), but it just doesn't work:
$root = realpath($_SERVER["DOCUMENT_ROOT"]);
require_once ("$root/../xxx/dompdf/autoload.inc.php");
use Dompdf\Dompdf;
use Dompdf\Options;
$options = new Options();
$options->setIsPhpEnabled('true');
$dompdf = new Dompdf($options);
$dompdf->loadHtml(file_get_contents("testdomhtml.php"));
$dompdf->setPaper('A4', 'portrait');
$dompdf->render();
$dompdf->stream("bla",array("Attachment"=>0));
The PDF is shown but without the input from any PHP code.
If someone would be so kind, I would also like to know why, and to what extent, enabling PHP is a security risk, since I actually want to use this for my business. Would it be more advisable to wrap everything up in the main PHP file without loading external HTML and CSS files?
Thanks a lot in advance!
You could do something like this (not tested the code). Replace
$dompdf->loadHtml(file_get_contents("testdomhtml.php"));
With
ob_start();
include 'testdomhtml.php';
$output = ob_get_clean();
$dompdf->loadHtml($output);
For more options, see "How to execute and get content of a .php file in a variable?"
Your file_get_contents("testdomhtml.php") call reads the raw content of the file and does not execute any code inside it. Instead, make the file web accessible and pass the URL of the page:
$dompdf->loadHtmlFile('http://yourdomain.ext/testdomhtml.php');

dompdf and php (mysql data)

I would like to pull data from a MySQL database.
This data is then inserted into an HTML file, which is then converted to a PDF using dompdf.
The template is perfect and displays well when I call dompdf.
However, as soon as I insert PHP code, the template still shows perfectly, but the PHP code outputs nothing. If I open the page in a browser, the output shows, so I know the PHP works.
In the options file I have done this:
private $isPhpEnabled = true;
My PHP file to call the template (LeaseBase.php):
<?php
$options = new Options();
$options->set('isPhpEnabled','true');
$leasefile = file_get_contents("Leases/LeaseBase.php");
$dompdf = new Dompdf($options);
$dompdf->loadHtml($leasefile);
$dompdf->stream();
$output = $dompdf->output();
file_put_contents('Leases/NewLeases.pdf', $output);
?>
I also can't seem to pick up anything in the log files.
Any assistance is appreciated.
"However as soon as I try and insert php code, the template still shows perfectly, how ever the php code is displays nothing."
Answer: It shows nothing because when a PHP page is executed, it outputs HTML (and not the PHP code). If the script has no echo, print, or other code that generates HTML, the page will in fact be blank.
It's important to remember that PHP is server-side code which can generate HTML, as long as you instruct it accordingly.
With versions of Dompdf prior to 0.6.1 you could load a PHP document and the PHP would be processed prior to rendering. Starting with version 0.6.1 Dompdf no longer parses PHP at run time. This means that if you have a PHP-based document you have to pre-render it to HTML, which does not happen when using file_get_contents().
You have two options:
First: Use output buffering to capture the rendered PHP.
ob_start();
require "Leases/LeaseBase.php";
$leasefile = ob_get_contents();
ob_end_clean();
Second: Fetch the PHP file via your web server:
$leasefile = file_get_contents('http://example.com/Leases/Leasebase.php');
...though, actually, if I were loading the file into a variable and feeding it to dompdf without doing any further manipulation I would use dompdf to fetch the file instead. In this way you are less likely to have to deal with external resource (images, stylesheets) reference problems:
$dompdf->loadHtmlFile('http://example.com/Leases/Leasebase.php');
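The output-buffering option can be wrapped in a small function that also hands data (for example, a row fetched from MySQL) to the template. This is only a sketch: renderPhpTemplate and the $tenant variable are hypothetical names, and the temporary file stands in for a real template such as Leases/LeaseBase.php.

```php
<?php
// Hypothetical helper: run a PHP template and return the HTML it prints.
function renderPhpTemplate(string $templatePath, array $vars = []): string
{
    // Expose the array entries as local variables inside the template,
    // without overwriting this function's own variables.
    extract($vars, EXTR_SKIP);

    ob_start();
    try {
        include $templatePath;
    } catch (\Throwable $e) {
        // Discard partial output before passing the error on.
        ob_end_clean();
        throw $e;
    }
    return ob_get_clean();
}

// Demo with a throwaway template file; a real template would live on disk.
$template = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($template, '<p>Tenant: <?= htmlspecialchars($tenant) ?></p>');

$html = renderPhpTemplate($template, ['tenant' => 'Jane & Co']);
unlink($template);

echo $html, "\n"; // prints "<p>Tenant: Jane &amp; Co</p>"
```

The resulting string is plain HTML, so it can be passed to $dompdf->loadHtml() in place of the file_get_contents() result.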

Generate PDF from .docx generated by PHPWord

I am creating .docx files from a template using PHPWord. It works fine but now I want to convert the generated file to PDF.
First I tried using TCPDF in combination with PHPWord:
$wordPdf = \PhpOffice\PhpWord\IOFactory::load($filename.".docx");
\PhpOffice\PhpWord\Settings::setPdfRendererPath(dirname(__FILE__)."/../../Office/tcpdf");
\PhpOffice\PhpWord\Settings::setPdfRendererName('TCPDF');
$pdfWriter = \PhpOffice\PhpWord\IOFactory::createWriter($wordPdf , 'PDF');
if (file_exists($filename.".pdf")) unlink($filename.".pdf");
$pdfWriter->save($filename.".pdf");
But when I try to load the file in order to convert it to PDF, I get the following exception during loading:
Fatal error: Uncaught exception 'BadMethodCallException' with message 'Cannot add PreserveText in Section.'
After some research I found that some others also have this bug (phpWord - Cannot add PreserveText in Section)
EDIT
After experimenting some more, I found that the exception only occurs when the document contains mail merge fields. Once I removed them, the exception no longer came up, but the converted PDF files look horrible. All style information is gone and I can't use the result, so I still need an alternative.
I thought about generating the PDF some other way, but I could only find 4 ways:
1. Using OpenOffice - Impossible, as I cannot install any software on the server. The approach mentioned here did not work either, as my hoster (Strato) uses SunOS as the OS and this needs Linux.
2. Using phpdocx - I do not have any budget to pay for it, and the demo cannot create PDFs.
3. Using PHPLiveDocx - This works, but it is limited to 250 documents per day and 20 per hour, and I have to convert around 300 documents at once, maybe even multiple times a day.
4. Using PHP-Digital-Format-Convert - The output looks better than with PHPWord and TCPDF, but it is still not usable, as images are missing, along with most (not all!) of the styles.
Is there a 5th way to generate the PDF? Or is there any way to make the generated PDF documents look nice?
I used Gears/pdf to convert the docx file generated by phpword to PDF:
$success = Gears\Pdf::convert(
'file_path/file_name.docx',
'file_path/file_name.pdf');
You're trying to unlink the PDF file before saving it, and you also have to unlink the DOCX document, not the PDF one.
Try this:
$pdfWriter = \PhpOffice\PhpWord\IOFactory::createWriter($wordPdf , 'PDF');
$pdfWriter->save($filename.".pdf");
// unlink() expects a file path, not the PhpWord object
unlink($filename.".docx");
I don't think I'm correct..
Save the document as HTML content:
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'HTML');
After that, read the HTML file content and write it out as a PDF file with the help of mPDF, TCPDF, or FPDF.
Try this:
// name of the input MS Word file
$inputFile = "C:\\PHP\\Test1.docx";
// name of the output PDF file
$outputFile = "C:\\PHP\\Test1.pdf";
try
{
    $oLoader = new COM("easyPDF.Loader.8");
    $oPrinter = $oLoader->LoadObject("easyPDF.Printer.8");
    $oPrintJob = $oPrinter->PrintJob;
    // convert the Word document to PDF
    $oPrintJob->PrintOut($inputFile, $outputFile);
    print "Success";
}
catch (com_exception $e)
{
    print "error code " . $e->getCode() . "\n";
    print $e->getMessage();
}

PDF Parser PHP Library Not Working

I'm using the PDF Parser PHP library to parse the text from several PDFs. It works perfectly for the majority of them, but it seems to just time out and stop working for certain PDFs.
This is the code I'm using (straight from their demo page):
<?php
include 'vendor/autoload.php';
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');
$text = $pdf->getText();
echo $text;
?>
When I replace 'document.pdf' with the URL to this file, it works perfectly as expected.
However, when I replace 'document.pdf' with the URL to this file, it just times out with a blank page.
Any ideas why it would work for one file and not the other?
Thanks in advance for any advice!
Yes, I saw this "ghost" error too: nothing shows up in error_log, and nothing is caught by try/catch, so it is very hard to diagnose. If you increase memory_limit in php.ini, it goes away. It is either poor garbage collection on the developers' part or ballooning memory use. I think it is the latter, because my loop failed after 4 PDFs, but when I quadrupled the available RAM it still had not failed after 60.
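A rough way to check the memory theory is to raise the limit for the parsing job, measure peak usage, and log fatal errors that try/catch cannot intercept. The sketch below makes several assumptions: runWithMemoryLimit is a hypothetical helper, the '1024M' value is only an example, and the closure is a stand-in for the real parseFile()/getText() call.

```php
<?php
// Record a fatal error (such as "Allowed memory size exhausted") that
// try/catch cannot intercept. This will not fire if the process itself crashes.
register_shutdown_function(function (): void {
    $error = error_get_last();
    if ($error !== null && $error['type'] === E_ERROR) {
        error_log(sprintf('Fatal: %s in %s:%d', $error['message'], $error['file'], $error['line']));
    }
});

// Hypothetical helper: raise memory_limit, run a job, report peak memory.
function runWithMemoryLimit(string $limit, callable $job): array
{
    ini_set('memory_limit', $limit);
    $result = $job();
    // Ask PHP to free reference cycles left behind by the job.
    gc_collect_cycles();
    return ['result' => $result, 'peakBytes' => memory_get_peak_usage(true)];
}

$report = runWithMemoryLimit('1024M', function (): int {
    // Stand-in for: (new \Smalot\PdfParser\Parser())->parseFile($path)->getText()
    return strlen(str_repeat('x', 1000000));
});

printf("result=%d peak=%d bytes\n", $report['result'], $report['peakBytes']);
```

Comparing peakBytes across the PDFs that work and the ones that fail should show whether memory really is the limiting factor.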
