FPDF Error: Error while decompressing stream - php

I've built a web application incorporating the fpdf library which allows clients to upload pdf files which my system then combines into a monthly report (adding a cover, contents page etc.).
Last month I got this error:
FPDF Error: Error while decompressing stream
I've googled it and the only people who have encountered it before seem to be German!
The error handler is at line 241 of fpdi_pdf_parser.php and refers to "case '/FlateDecode':" and other things I don't understand.
I traced the problem to a single pdf file which appeared normal but consistently caused the problem. I created a new version of the pdf by screen grabbing from the old one and when I uploaded that everything worked.
As I say, I got round the problem, but I don't really understand how and don't want to run into the same thing again.
Any ideas what was going on?
Thanks in advance.

PDF files can be compressed in different ways with different algorithms. If your application accepts arbitrary uploads, it is possible that you received a corrupt file that FPDF was not able to decompress. Even in such scenarios (I mean corrupt files), other PDF parsers/readers may be able to recover the file and show the content (or some part of it), but that does not mean the file is valid.
It is also possible that this file contains some specific feature from the PDF specification that is not supported by FPDF. If it is an option for you to post the offending file it might be possible to narrow down the issue a bit more.
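As a rough illustration of what the parser runs into: a /FlateDecode stream in a PDF is zlib-compressed data, and PHP's own zlib functions show what happens when those bytes are damaged. This is a toy sketch with made-up data, not code from FPDI itself:

```php
<?php
// A valid /FlateDecode stream is zlib-compressed data; if the bytes are
// truncated or mangled (upload glitch, bad storage), zlib cannot inflate
// them, which is the condition FPDI surfaces as
// "FPDF Error: Error while decompressing stream".

$original = str_repeat("Hello PDF stream. ", 50);
$stream   = gzcompress($original);          // stand-in for a valid stream

// A healthy stream inflates back to the original content.
var_dump(gzuncompress($stream) === $original);   // bool(true)

// Corrupt the tail of the stream and inflation fails.
$corrupt = substr($stream, 0, -10) . "??????????";
var_dump(@gzuncompress($corrupt));               // bool(false)
```

This is why a screen-grabbed re-export "fixed" it: the new file contains freshly compressed streams, while the original carried a damaged one that most viewers silently tolerated.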

In such cases it usually helps to install or update PHP's zlib extension. The problem can also arise from images inserted into the PDF document (see the image requirements at http://www.fpdf.org/en/doc/image.htm).


How to add header in an uploaded PDF using barryvdh/dompdf

I am using barryvdh/dompdf in my project to deal with PDFs. Yes, as you might be thinking: with PHP and Laravel.
In the backend, I have a form which uploads a PDF, and from the frontend we can download it.
This functionality is working perfectly.
But now I want to add a header in that uploaded pdf to add my project's name.
I have worked with pdf generation by this package so I know how this package works but I can't find anything about editing a pdf.
I have been doing some research on google but can't find anything related to my problem.
I haven't done anything so far related to this, so I have no more details or code to show. To clear up any confusion, let me just list the points you need to keep in mind, and forget anything you read until now.
I am using barryvdh/dompdf to deal with pdfs.
Want to edit a pdf which is uploaded in the system and will be downloaded afterwards.
I Want to add a header.
I just have a rough idea (maybe I am wrong) that it can be done at either of two stages:
1. At upload time
2. At download time
[Note]: Please don't close this as off-topic or duplicate or something similar. Please do read the whole post before doing anything like that, because, like I said, I have done some research and haven't found anything.
This can't be done in dompdf.
I guess the solution is either to use native-ish PHP commands, another package like FPDI, or to drop the idea altogether.
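If FPDI is an option, the usual pattern is to import each page of the uploaded file as a template and draw the header text on top of it. A minimal sketch, assuming the setasign/fpdi package (with FPDF) is installed via Composer; the file paths and the header text are placeholders:

```php
<?php
// Sketch: stamping a header onto an existing PDF with FPDI.
// Requires: composer require setasign/fpdi setasign/fpdf

require 'vendor/autoload.php';

use setasign\Fpdi\Fpdi;

$pdf = new Fpdi();
$pageCount = $pdf->setSourceFile('uploaded.pdf');   // placeholder path

for ($pageNo = 1; $pageNo <= $pageCount; $pageNo++) {
    $tplId = $pdf->importPage($pageNo);             // import original page
    $size  = $pdf->getTemplateSize($tplId);

    $pdf->AddPage($size['orientation'], [$size['width'], $size['height']]);
    $pdf->useTemplate($tplId);                      // draw original content

    // Overlay the header text on top of the imported page.
    $pdf->SetFont('Helvetica', 'B', 10);
    $pdf->SetXY(10, 5);
    $pdf->Cell(0, 5, 'My Project Name');
}

$pdf->Output('F', 'stamped.pdf');                   // placeholder path
```

Whether you run this at upload time (store the stamped copy once) or at download time (stamp on the fly) is mostly a trade-off between storage and CPU per request.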

Create a PDF file on the fly and stream it while it is not yet finished?

We want to merge a lot of PDF files into one big file and send it to the client. However, the resources on our production server are very restricted, so merging all files in memory first and then sending the finished PDF file results in our script being killed because it exhausts its available memory.
The only solution (besides getting a better server, obviously) would be starting to stream the PDF file before it is fully created to bypass the memory limit.
However I wonder if that is even possible. Can PDF files be streamed before they're fully created? Or doesn't the PDF file format allow streaming unfinished files because some headers or whatever have to be set after the full contents are certain?
If it is possible, which PDF library supports creating a file as a stream? Most libraries that I know of (like TCPDF) seem to create the full file in memory and only at the end output this finished result somewhere (e.g. via the $tcpdf->Output() method).
The PDF file format is entirely streamable; there's certainly nothing in the format that will prevent it.
As an example, we recently had a customer who required reading a single page over an HTTP connection from a remote PDF, without downloading or reading the whole PDF. We were able to do this by making many small HTTP requests for specific content within the PDF. We use the trailer at the end of the PDF and the cross-reference table to find the required content without having to parse the whole PDF.
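The trailer lookup described above can be sketched in a few lines: the byte offset of the cross-reference table is recorded after the `startxref` keyword near the end of the file, so a reader only needs the file's tail to start navigating objects. A toy illustration (the "PDF" here is a hand-written stand-in, not a real file):

```php
<?php
// Locate the cross-reference table offset from a PDF's trailer.
// The spec requires "startxref <offset> %%EOF" within the last bytes
// of the file, so reading the tail is enough.

function findStartXref(string $pdfBytes): ?int
{
    $tail = substr($pdfBytes, -1024);   // only the tail is needed
    if (preg_match('/startxref\s+(\d+)\s+%%EOF/s', $tail, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Tiny hand-written stand-in for a real PDF tail:
$fakePdf = "%PDF-1.4\n...objects...\nxref\n0 1\ntrailer\n<< /Size 1 >>\nstartxref\n42\n%%EOF\n";
var_dump(findStartXref($fakePdf));   // int(42)
```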
If I understand your problem, it looks like your current library you're using loads each PDF in memory before creating or streaming out the merged document.
If we look at this problem a different way, the better solution would be for the PDF library to only take references to the PDFs to be merged, then when the merged PDF is being created or streamed, pull in the content and resources from the PDFs to be merged, as-and-when required.
I'm not sure how many PHP libraries there are that can do this as I'm not too up-to-date with PHP, but I know there are probably a few C/C++ libraries that may be able to do this. I understand PHP can use extensions to call these libraries. Only downside is that they'll likely have commercial licenses.
Disclaimer: I work for the Mako SDK R&D group, hence why I know for sure there are some libraries which will do this. :)

DOMPDF not working for data:image on one computer, fine on another

I have a bit of an interesting one here. In our team there are two people working on one project on Macs, both using MAMP Pro and the same codebase. Why, then, can one machine happily produce PDF documents with images in the data:image... format while the other can't?
Both can see the data URI and both can generate the HTML with it in; both are happy with normal images, but one of them throws a file_put_contents(): Filename cannot be empty error.
I'm at a loss with this so can anyone please shed some light on it?
Many thanks everyone,
Gareth
Quick requirements for image processing:
Read/Write access to DOMPDF_TEMP_DIR
Availability of the GD PHP extension
For images fetched via web server, allow_url_fopen enabled
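A quick script to compare the two machines against those three requirements might look like this (a sketch; DOMPDF_TEMP_DIR defaults to the system temp dir unless you've overridden it in your dompdf config):

```php
<?php
// Environment check for the three dompdf image requirements above.

$tempDir = defined('DOMPDF_TEMP_DIR') ? DOMPDF_TEMP_DIR : sys_get_temp_dir();

$checks = [
    'temp dir writable'   => is_writable($tempDir),
    'GD extension loaded' => extension_loaded('gd'),
    'allow_url_fopen on'  => (bool) ini_get('allow_url_fopen'),
];

foreach ($checks as $label => $ok) {
    printf("%-22s %s\n", $label, $ok ? 'OK' : 'MISSING');
}
```

If the temp-dir check fails, dompdf's image handling can end up calling file_put_contents() with an empty path, which would match the "Filename cannot be empty" error above; that is the likely difference between the two machines.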

Check if a PDF file is corrupted with PHP

I was wondering if there is a way for PHP to check whether a PDF file stored locally on the server is corrupted or not. We have a PHP application that deals with a lot of scanned documents converted to PDF, and it would be nice to check which of them are corrupted so we can alert the user.
I tried to look around but with no luck.
There are versions of pdflib available which can read PDFs - you could simply try to open and read each page with that.
The problem is there are many ways a PDF file can be corrupt.
Maybe your best solution would be to find a PDF reading lib and try to extract the first word from each page or something. That would at least catch some basic types of corruption.
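Before reaching for a full parsing library, a cheap first-pass check catches the most common damage (truncated uploads, zero-byte files, non-PDF content). A well-formed PDF starts with "%PDF-" and carries an "%%EOF" marker near the end. This sketch only rules out gross corruption; subtler problems (like a damaged stream inside an otherwise intact file) still need a real parser:

```php
<?php
// Cheap sanity check: header magic at the start, %%EOF near the end.
// Catches truncated or garbage files, but NOT every kind of corruption.

function looksLikePdf(string $path): bool
{
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $head = fread($fh, 8);          // should begin "%PDF-1.x"
    fseek($fh, -64, SEEK_END);      // %%EOF sits in the last bytes
    $tail = fread($fh, 64);
    fclose($fh);

    return strncmp($head, '%PDF-', 5) === 0
        && strpos($tail, '%%EOF') !== false;
}
```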

PDFLib in PHP hogging resources and not flushing to file

I just inherited a PHP project that generates large PDF files and usually chokes after a few thousand pages and several gigs of server memory. The project was using PDFLib to generate these files 'in memory'.
I was tasked with fixing this, so the first thing I did was send PDFLib output to a file instead of building it in memory. The problem is, it still seems to be building the PDFs in memory, and much of that memory never seems to be returned to the OS. Eventually, the whole thing chokes and dies.
When I task the program with building only snippets of the large PDFs, it seems that the data is not fully flushed to the file on end_document(). I get no errors, yet the PDF is not readable and opening it in a hex editor makes it obvious that the stream is incomplete.
I'm hoping that someone has experienced similar difficulties.
Solved! Needed to call PDF_delete_textflow() on each textflow, as they are given document scope and don't go away until the document is closed, which was never since all available memory was exhausted before that point.
You have to make sure that you are closing each page as well as closing the document. This would be done by calling the "end_page_ext" at the end of every written page.
Additionally, if you are importing pages from another PDF, you have to call "close_pdi_page" after each imported page and "close_pdi_document" when you're done with each imported document.
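Putting both answers together, the overall shape of a memory-flat merge loop would be roughly the following. This is only a sketch: it requires the commercial PDFlib extension (so it is not runnable as-is), and $sourceFiles and the output name are placeholders:

```php
<?php
// Sketch of the cleanup pattern described above: release every per-page
// and per-import resource as you go, so memory stays flat across
// thousands of pages. Requires the PDFlib PHP extension.

$sourceFiles = ['scan1.pdf', 'scan2.pdf'];          // placeholder list

$p = PDF_new();
PDF_begin_document($p, "big_report.pdf", "");       // write to file, not memory

foreach ($sourceFiles as $file) {
    $doc   = PDF_open_pdi_document($p, $file, "");
    $pages = PDF_pcos_get_number($p, $doc, "length:pages");

    for ($i = 0; $i < $pages; $i++) {
        $page = PDF_open_pdi_page($p, $doc, $i + 1, "");
        PDF_begin_page_ext($p, 0, 0, "width=a4.width height=a4.height");
        PDF_fit_pdi_page($p, $page, 0, 0, "adjustpage");
        PDF_end_page_ext($p, "");                   // close EVERY page
        PDF_close_pdi_page($p, $page);              // release the imported page
    }
    PDF_close_pdi_document($p, $doc);               // release the imported doc
}

// Textflows have document scope: delete each one you create with
// PDF_delete_textflow($p, $tf), or they accumulate until end_document.
PDF_end_document($p, "");
PDF_delete($p);
```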
