'insufficient data for an image' message when opening PDF - php

My PHP CLI application creates PDFs using the TCPDF library. In most cases the PDFs are created successfully, but sometimes a PDF is created that makes Adobe Reader pop up the error: 'insufficient data for an image'.
I obviously did some research on this message, but none of the named causes or suggested solutions are relevant in my case or solve the problem. Although Adobe products are the only ones that pop up an error, and other PDF viewers can open the corrupt file and display it correctly, this doesn't mean it's an Adobe-related problem: an advanced PDF editor, Nitro 9, can display the corrupt files but also detects the issue and pops up an alert.
By using Nitro I am able to fix the PDF file. The steps are: extract the image from the corrupt image object in the PDF, then replace the image in the PDF file with the saved image...
The specific images that trigger the error/alert aren't of one type (e.g. JPEG 2000). BMP, PNG and GIF images have triggered the error/alert as well.
I read in a few similar topics on stackoverflow that the 'XOBJECT stream' might be malformed? However, I have no idea how to check this.
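One rough way to check for a malformed image XObject stream is to compare each stream's declared /Length with the position where "endstream" actually appears. The sketch below is only an approximation, not a real PDF parser: the function name is made up for illustration, it assumes the objects are stored uncompressed in the file body (not inside object streams), and it skips indirect lengths such as "/Length 12 0 R". It also catches only one kind of damage; an image can have a correct /Length and still decode to too little data.

```php
<?php
// Rough check (hypothetical helper): for every image XObject with a direct
// /Length, verify that "endstream" follows the declared number of bytes.
function findTruncatedImageStreams(string $pdf): array
{
    $problems = [];
    $offset = 0;
    // "12 0 obj", a dictionary that does not cross "endobj", then "stream" + EOL.
    $pattern = '/(\d+)\s+(\d+)\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream(\r\n|\n)/s';

    while (preg_match($pattern, $pdf, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $dict   = $m[3][0];
        $start  = $m[4][1] + strlen($m[4][0]); // first byte of the stream data
        $offset = $start;

        if (!preg_match('~/Subtype\s*/Image\b~', $dict)) {
            continue; // not an image XObject
        }
        // Possessive \d++ so "12 0 R" is not half-matched as the length "1".
        if (!preg_match('~/Length\s+(\d++)(?!\s+\d+\s+R\b)~', $dict, $len)) {
            continue; // indirect or missing /Length: not checked here
        }

        $declared = (int) $len[1];
        $tail = (string) substr($pdf, $start + $declared, 20);
        if (preg_match('/^\s*endstream/', $tail)) {
            $offset = $start + $declared; // skip the binary data
        } else {
            $problems[] = ['object' => (int) $m[1][0], 'declared' => $declared];
        }
    }
    return $problems;
}
```

Calling findTruncatedImageStreams(file_get_contents('naamloos1_bad.pdf')) would list the object numbers whose declared length does not line up with the data, which at least narrows down where to look.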
I hope one of you guys knows where to look.
I tried to look for similar topics on the TCPDF forum as well, but the creator tells the topic starters this is an Adobe issue or a PDF issue and that he can't help them.
Attached files
I have uploaded two pdf files: one with a broken 'image' that triggers an error (naamloos1_bad.pdf) and one that is fixed (naamloos1_fixed.pdf) by using Nitro. You can download them here
I hope someone with knowledge of the PDF file type can compare these and let me know details on what's going wrong so I know what to look for in my code and that of the TCPDF library in order to fix this issue.
The bottom-right image in the PDF file is the one that triggers the alert/error.
Thanks!

Problem solved for me: when I add an image to a PDF with TCPDF, I get the message "insufficient data for an image" in Adobe Reader when I open the PDF (but the same file is OK in Chrome).
When I opened the same image in Photoshop, I got a 'bad ICC profile' error,
so I removed the ICC profile with Imagick (stripImage, which strips all profiles and comments from the image):
$sPathImg= 'something.jpg';
$image = new Imagick($sPathImg);
$image->stripImage();
$image->writeImage($sPathImg);
Ta-da: no more error when I open my PDF with Adobe Reader.

Okay, so at least I found a workaround that I can automate. I learned nothing more about this issue, but in case this topic goes dead and people read it in the future, here it is:
I'm running Linux, and calling Ghostscript like this:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -dUseCIEColor -sOutputFile=output.pdf input.pdf
repairs the PDF in such a way that the insufficient data error goes away. (Note that some arguments are set because I need to create PDFs for print. If you don't need that, set -dPDFSETTINGS to /screen, for example.)
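Since the original application is a PHP CLI script, the workaround could be automated along these lines. This is only a sketch: the function names are made up, the flags are simply the ones from the command above, and it assumes gs is on the PATH of a Linux host.

```php
<?php
// Build the Ghostscript call from the post, with every argument shell-escaped.
// (Function names are illustrative, not part of any library.)
function buildGsRepairCommand(string $input, string $output, string $pdfSettings = '/printer'): string
{
    $args = [
        '-sDEVICE=pdfwrite',
        '-dCompatibilityLevel=1.7',
        '-dPDFSETTINGS=' . $pdfSettings,   // e.g. /printer or /screen
        '-dNOPAUSE',
        '-dQUIET',
        '-dBATCH',
        '-dUseCIEColor',
        '-sOutputFile=' . $output,
        $input,
    ];
    return 'gs ' . implode(' ', array_map('escapeshellarg', $args));
}

// Run the command and report whether a repaired file was produced.
function repairPdf(string $input, string $output): bool
{
    exec(buildGsRepairCommand($input, $output) . ' 2>&1', $lines, $exitCode);
    return $exitCode === 0 && is_file($output);
}
```

Writing to a separate output file, rather than over the input, keeps the original around in case Ghostscript fails halfway.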

Jerome had the answer for me.
In C#:
MagickReadSettings settings = new MagickReadSettings();
ImageMagick.MagickImageCollection images = new MagickImageCollection();
images.Read(sourceFile, settings);
images[0].Strip();
images.Write(targetFile);

Related

PDF manipulation - images are distorted after few consecutive operations on PDF file

I've run into this weird issue with PDF file handling. Not sure if SO is the right place to ask this, but I couldn't find any specific sites for this. I hope that someone can shed some light on the issue.
This happens with the following specific process; if any of the steps are omitted, the issue is not observed.
I have a PHP application that serves PDF files to users. These files are created by authors in MS Word 2007, then printed to protected PDF (using pdf995, most likely; I can confirm if needed).
I'll refer to this initial PDF file as the 'source' from here on.
Upon request, the source file is processed in PHP the following way:
we decrypt it using qpdf:
qpdf --decrypt "source.pdf" "tmp_output.pdf"
Then we add a security label / watermark to it, encrypt it, and output it to the browser using mPDF 6.0:
$mpdf = new mPDF();
$mpdf->SetImportUse();
$pagecount = $mpdf->SetSourceFile($fpath);
if ($pagecount) {
for ($i=1;$i<=$pagecount;$i++){
$tplId = $mpdf->ImportPage($i);
$mpdf->UseTemplate($tplId);
$html = '[security label / watermark contents...]';
$mpdf->WriteHTML($html);
}
}
$mpdf->SetProtection(array('copy','print'), '', 'password',128);
$mpdf->Output('final_output.pdf','I');
With the exact steps described above, images in the output that were pasted in the Word doc appear as follows:
In the source PDF, tmp_output (qpdf decrypted file) the pasted images look correct:
The distortion doesn't take place if any of the following occurs:
Word doc printed to PDF without protection
mPDF output is not protected.
As you can see, there are too many factors, so I don't know where to look for a bug.
Each component works correctly on its own and I cannot find any info on the issue. Any insights are greatly appreciated.
EDIT 1
After some more testing, it appears that this only happens to screenshots taken from web browser, Windows explorer, MS Word. Cannot reproduce this with screenshots from Gimp.
It appears that something along the way attempts to convert white to alpha and fails.
The current version (6.1) of mPDF has a bug: escaped PDF strings (imported via FPDI) are not handled correctly when they should be encrypted.
A pull request, which fixes this issue is available here.

Saving images from sources without a specific mime type

As an example, say I'm trying to download App Icons from the Google Play store for a service. Here is an example URL:
https://lh6.ggpht.com/1eVPA6Iukw-F4i5xq1ZWicaKBzmprLGw98YhdG20E-wlsHHg3PcKJqbY_fWLdJeGRw=w512-rw
There is no mime type associated with the data provided, and when the file is saved any image viewers (or at least the ones I've tried) will say the file is corrupt. They will show up in Chrome and a couple of other things, but when inspecting the data, it's clear there's simply no mime type. This is an issue, because I am further using these data streams in other scripts which require that they be recognized as a specific type. (namely PNG)
I've tried things such as:
imagepng(imagecreatefromstring($icondata), $finaldir.'/icon.png');
Where $icondata is simply a curl response for the image. This will return an error saying that the data is of an unknown format for the imagecreatefromstring function. Of course, I've also tried:
file_put_contents($finaldir.'/icon.png', $icondata);
To no avail. It creates the file, but as I said, the image is not recognized as an image in most applications and in various analyzing functions. Is there a way to specifically set the mime type of a given string of data? Or some other workaround I'm not quite seeing?
Edit: Also, to note, there is nothing wrong with the $icondata variable. I have tried manually saving the image to a file through my web browser, and the same problem arises.
I have developed a small CMS for myself and was facing the same issue. After trying a lot, I have found a solution. It is working for me and I hope you will find it useful for your project too.
As for the file appearing corrupt: for a Chrome user-agent, Google's servers send icons in WebP format, and you need a PHP library that can handle images of this type. For other user-agents (like Firefox), the images are sent as PNG.
If you compare the two URLs for the same icon in Firefox and Chrome, you will notice that the image paths generated for Chrome end in -rw, while the same URL in Firefox doesn't contain that -rw.
Without digging any deeper, simply remove the -rw from the end of the URL and copy the image. You will get a PNG image. Here is a hint:
<?php
$image_path = "https://lh5.ggpht.com/8PODwBXKk4L201m4IO1wifRDfbn4Q1JxNxOzj-5TXPJ85_S-vOqntLi7TsVyeFQM0w4=w300-rw"; // Firefox app on Google Play
$png_path = substr($image_path, 0, -3);
copy($png_path, 'file.png');
?>
This will save the image as PNG. Please note that I have used substr() function to remove -rw from the end. To make it precise, you may use any other way to fix that part of the path.
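A slightly more precise variant might look like the sketch below. It only strips the suffix when the URL really ends in -rw, so URLs that are already in the PNG form pass through unchanged. The function name is made up, and that -rw is what selects WebP is taken from the observation above rather than from any documented behaviour.

```php
<?php
// Remove a trailing "-rw" (and nothing else) from an icon URL.
function stripWebpSuffix(string $url): string
{
    // \z anchors at the very end of the string, so "-rw" elsewhere is kept.
    return preg_replace('/-rw\z/', '', $url);
}
```

The call copy(stripWebpSuffix($image_path), 'file.png') would then replace the substr() line.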
P.S. You may also try sending a custom user-agent (i.e. Firefox) with your CURL request to receive the PNG path so you will not need to fix it by yourself :)
You can use HttpResponse::getContentType to determine type of content you're getting from URL
This is not a PNG but a WebP image.
You can use it in PHP with
imagecreatefromwebp( string $filename );
More information:
- https://developers.google.com/speed/webp/
- http://php.net/manual/function.imagecreatefromwebp.php
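If you would rather not rely on the URL or on response headers, the format can also be read from the first bytes of the data itself. Here is a minimal sketch that checks the well-known file signatures of PNG, JPEG, GIF and WebP; the function name is made up, and any other format is simply reported as unknown.

```php
<?php
// Guess the image type from the leading "magic" bytes of the data.
function sniffImageType(string $data): string
{
    if (strncmp($data, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return 'image/png';
    }
    if (strncmp($data, "\xFF\xD8\xFF", 3) === 0) {
        return 'image/jpeg';
    }
    if (strncmp($data, 'GIF87a', 6) === 0 || strncmp($data, 'GIF89a', 6) === 0) {
        return 'image/gif';
    }
    // WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if (substr($data, 0, 4) === 'RIFF' && substr($data, 8, 4) === 'WEBP') {
        return 'image/webp';
    }
    return 'application/octet-stream';
}
```

Running sniffImageType($icondata) on the cURL response tells you whether the data needs converting to PNG before it is saved as icon.png.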

How to convert a PDF into an image exactly like the PDF with PHP/ImageMagick/Ghostscript

I'm generating PDF documents with PHP (TCPDF is the library behind it). To display them I convert them to images using Ghostscript and show the previews, but the preview doesn't actually look like the PDF document.
The code I'm using to convert is here:
$pdf = 'my_report.pdf';
$output = 'my_preview.jpg';
$quality=90;
$res='300x300';
$exportPath=$output;
set_time_limit(900);
exec("'gs' '-dNOPAUSE' '-sDEVICE=jpeg' '-dUseCIEColor' '-dTextAlphaBits=4' '-dGraphicsAlphaBits=4' '-o$exportPath' '-r$res' '-dJPEGQ=$quality' '$pdf'",$output);
and the preview generated with the code for this document is right below
where as my actual PDF file looks like below
You can see a lot of differences between them; I need a way to produce an exact copy.
I'm sure there is nothing wrong with the PDF report: I tried uploading it to Google Mail, which gave a perfect image, and I also converted the PDF to JPEG here:
http://pdf2jpg.net/
That too gave a perfect copy of the document; only ImageMagick/Ghostscript is unable to generate an exact one.
Any help would be appreciated.
What are you using to view the 'correct' display of the PDF ? Does Ghostscript issue you any warnings when rendering ?
It looks to me like there 'may' be fonts missing in your original PDF file, which will lead to font substitution.
Why are you using -dUseCIEColor ? This will almost certainly lead to colour shifts, which I also see in your images. If you have a good reason for using this, what is it ? If you don't have a good reason, don't do that.
Is the second image a JPEG ? The first clearly is, and jpeg is a lossy compression, have you tried using TIFF instead ?
It is always useful with these sorts of questions to post a link to the original PDF file, so that some investigation can be done, without that, this is all guesswork I'm afraid.
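To try the suggestions above, the conversion could be rerun without -dUseCIEColor, with a lossless TIFF device, and with Ghostscript's messages captured so any font substitution warnings become visible. This is a sketch under assumptions: the function names are made up, tiff24nc is just one of Ghostscript's TIFF devices picked for illustration, and gs is assumed to be on the PATH.

```php
<?php
// Build a preview command: no -dUseCIEColor, lossless TIFF instead of JPEG.
function buildGsPreviewCommand(string $pdf, string $outFile, int $dpi = 300, string $device = 'tiff24nc'): string
{
    $args = [
        '-dNOPAUSE',
        '-dBATCH',
        '-sDEVICE=' . $device,
        '-dTextAlphaBits=4',
        '-dGraphicsAlphaBits=4',
        '-r' . $dpi,
        '-sOutputFile=' . $outFile,
        $pdf,
    ];
    return 'gs ' . implode(' ', array_map('escapeshellarg', $args));
}

// Run it and return everything Ghostscript printed (stdout and stderr).
function renderPreview(string $pdf, string $outFile): array
{
    exec(buildGsPreviewCommand($pdf, $outFile) . ' 2>&1', $lines, $exitCode);
    return ['exitCode' => $exitCode, 'messages' => $lines];
}
```

Printing the 'messages' entry after a run shows what Ghostscript reported while rendering, which the original exec() call never looked at.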

With PHP, how can I check if a PDF file has errors

I have a DB system built in PHP/MySql. I'm fairly new at this. The system allows the user to upload an invoice. Others give permission to pay the invoice. The accounting person uploads the check. After check is uploaded, it generates a PDF as a cover, then uses PDFTK (using Ben Squire's PDFTK-PHP-Library) to combine all of the files together and present the user with a single PDF to download.
Some users upload PDF files which cause PDFTK to hang indefinitely when it tries to combine the PDF with others (but most of the time it works fine). No returned error, just hangs. In order to get back onto the system, the user must clear the cache and log in again. There are no error messages logged by the server, it just freezes. The only difference I can find in the files that do or do not work in looking at them with Acrobat is that the bad files are legal sized (8.5 x 14) ... but if I create my own legal sized file and try that, it works fine.
Using Putty I've gone to command line and replicated the same problem, PDFTK can't read the file, it hangs on the command line as well. I tried using PDFMerge which uses FPDF to combine the files and get an error with the file as well (The error I get back from this is: FPDF error: Unable to find object (4, 0) at expected location). On the command line I was able to use ImageMagick to convert PDF to JPG, but it gives me an error: "Warning: File has an invalid xref entry: 2. Rebuilding xref table." and then it converts it to a jpg but gives a few other less helpful warnings.
If I could get PHP to check the PDF file to determine whether it is valid without hanging the system, I could use ImageMagick to convert the file and then convert it back to a PDF, but I don't want to do this to all files. How can I get it to check the validity of the file when uploaded to see if it needs to be converted without causing the system to hang?
Here is a link to a file that is causing problems: http://www.cssc-testing.org/accounting/school_9/20130604-a1atransportation-1.pdf
Thanks in advance for any guidance you can offer!
My Code (which I'm guessing is not very clean, as I'm new):
$pdftk = new pdftk();
if($create_cover) { $pdftk->setInputFile(array("filename" => $cover_page['server'])); }
// Load a list of attachments
$sql = "SELECT * FROM actg_attachments WHERE trans_id = {$trans_id}";
$attachments = Attachment::find_by_sql($sql);
foreach($attachments as $attachment) {
// Check if the file exists from the attachments
$attachment->set_variables();
$file = $attachment->abs_path . DS . $attachment->filename;
if(file_exists($file)){
// Use the pdftk tool to attach the documents to this PDF
$pdftk->setInputFile(array("filename" => $file));
}
}
$pdftk->setOutputFile($save_file);
$pdftk->_renderPdf();
the $pdftk class it is calling is from: https://github.com/bensquire/php-pdtfk-toolkit
You could possibly use Ghostscript using exec() to check the file.
The non-accepted answer here may help:
How can you find a problem with a programmatically generated PDF?
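A Ghostscript check via exec() might be sketched as follows. The assumptions here are that GNU coreutils' timeout command is available on the server (so a stuck process cannot hang the request), that gs is on the PATH, and that any message Ghostscript prints while reading the file is worth treating as a sign of damage. The function names are made up for illustration.

```php
<?php
// Render to the "nullpage" device: every page is processed, nothing is written.
function buildPdfCheckCommand(string $file, int $timeoutSeconds = 30): string
{
    return sprintf(
        'timeout %d gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage %s 2>&1',
        $timeoutSeconds,
        escapeshellarg($file)
    );
}

// Turn the exit code and output into a simple verdict.
function classifyPdfCheck(int $exitCode, array $outputLines): string
{
    if ($exitCode === 124) {
        return 'timeout';   // GNU timeout exits with 124 when it kills the command
    }
    if ($exitCode !== 0) {
        return 'error';
    }
    // gs can exit with 0 and still print repair warnings (e.g. a rebuilt xref).
    $nonEmpty = array_filter($outputLines, function ($line) {
        return trim($line) !== '';
    });
    return count($nonEmpty) > 0 ? 'warnings' : 'ok';
}

function checkPdf(string $file): string
{
    exec(buildPdfCheckCommand($file), $lines, $exitCode);
    return classifyPdfCheck($exitCode, $lines);
}
```

Only files that come back as anything other than 'ok' would then need the ImageMagick round trip before being handed to PDFTK.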
I won't say this is an appropriate/best fix, but it may resolve your problem.
In: pdf_parser.php, comment out the line:
$this->error("Unable to find object ({$obj_spec[1]}, {$obj_spec[2]}) at expected location");
It should be near line 544.
You'll also likely need to replace:
if (!is_array($kids))
$this->error('Cannot find /Kids in current /Page-Dictionary');
with:
if (!is_array($kids)){
// $this->error('Cannot find /Kids in current /Page-Dictionary');
return;
}
in the fpdi_pdf_parser.php file
Hope that helps. It worked for me.

FPDF Image from Dynamic Source. php/png/pdf. Can't get image to work

http://babymoments.co/preview/highres%20preview/5_357/
According to the FPDF documentation here: http://www.fpdf.org/en/doc/image.htm
You are supposed to be able to use an image from a dynamic source. However, as the first link shows, I'm getting an fopen error.
Any suggestions?
Code Snippet:
// Overlay Text & Images
$pdf->Image($conf['rbase'].'page_maker/image_hr.php?id=5&side=1&bg=cover_pink&lo=0_1&imgtxt=0|0|u5_1310329746.jpg##1|1|Elina\'s Puppies 9/2/2010|15|arial_bi.ttf|db0ddb|fedfe4&applet_type=cover',$sx,$sh,(0-$dpi), 0, 'png');
You are trying to open a local php file with get parameters - try instead to open the image file as a url. For example :
http://domain.com/image.php?id=5
Or in your case...
http://babymoments.co/preview/page_maker/image_hr.php?id=5&side=1&bg=cover_pink&lo=0_1&imgtxt=0|0|u5_1310329746.jpg##1|1|Elina's%20Puppies%209/2/2010|15|arial_bi.ttf|db0ddb|fedfe4&applet_type=cover
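Note that the imgtxt value contains characters such as #, | and spaces. In a URL, everything after a raw # is treated as a fragment and is not sent to the server, so it is safer to let PHP encode the query string. A minimal sketch, with a made-up helper name and the parameter values copied from the question:

```php
<?php
// Build a fully encoded URL for the dynamic image script.
function buildImageUrl(string $baseUrl, array $params): string
{
    // RFC 3986 encoding: spaces become %20, '#' becomes %23, '|' becomes %7C.
    return $baseUrl . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = buildImageUrl('http://babymoments.co/preview/page_maker/image_hr.php', [
    'id'          => 5,
    'side'        => 1,
    'bg'          => 'cover_pink',
    'lo'          => '0_1',
    'imgtxt'      => "0|0|u5_1310329746.jpg##1|1|Elina's Puppies 9/2/2010|15|arial_bi.ttf|db0ddb|fedfe4",
    'applet_type' => 'cover',
]);

// The result can then be passed to FPDF as in the question:
// $pdf->Image($url, $sx, $sh, (0 - $dpi), 0, 'png');
```

PHP decodes the values again on the receiving side, so image_hr.php should see the same imgtxt string in $_GET as before.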
I wouldn't fetch the preview over HTTP but would include the required code and generate the image in the same script that generates the PDF. That way there are no problems with setups that have disabled fopen() usage with URLs.
I was implementing a barcode for a label printer. So I was using the 'barcodegen' library and 'fpdf' library for this project, but I was having problems including dynamically the image generated with barcodegen, giving me the following error:
FPDF error: Not a PNG file: ./misc/barcodegen/mostrar-codigo-bcgcode39.php
After that, I used one of the answers described here and solved the problem by using the full URL of the image, like this:
$pdf->Image("http://localhost/caaf/misc/barcodegen/mostrar-codigo-bcgcode39.php",0,0,20,0,'PNG');
And it worked for me.
