Error converting .docx file (with .emf image background) to PDF - php

A Laravel-based application converts documents (.doc, .docx, .pdf, .png, .odt, HTML, etc.) to PDF so that they can all be merged into a master PDF document. It uses a combination of libraries such as PHPWord and the DOMPDF Wrapper to load and create the files. Every once in a while, the process fails with an error caused by a Word file.
ERROR: PhpOffice\PhpWord\Exception\InvalidImageException: Invalid image: zip:// ... #word/media/image2.emf
The error is caused by a background image in the document that acts like a watermark. The exception is thrown by PHPWord's PhpOffice\PhpWord\Element\Image->checkImage() method while the file is being loaded.
use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\Settings;
Settings::setPdfRendererName(Settings::PDF_RENDERER_DOMPDF);
$pdfWord = IOFactory::load(storage_path() . '/app/uploads/randomfile.docx', 'Word2007');
How can the application convert a Word document, with an EMF image embedded, to a PDF?
For more code and information on how to reproduce the error, see these issues in the PHPWord GitHub repository:
Support EMF image #1480
Read docx error when contains image from remote url #1173
The environment-related information:
Server: Windows / IIS
PHP: 7.2.11
Laravel: 5.7.15
PHPWord: 0.15.0
EDIT:
I also tried to come at this from a different angle, to no avail. I used PHP's ZipArchive to unzip the docx file, remove the EMF image from the document (ZipArchive::deleteName()), remove the reference to the EMF image in [Content_Types].xml (ZipArchive::getFromName()), and then zip the docx file back up, but that did not work. I can open the new docx file and see that the image is gone, but the PHPWord error still persists in the application.

It looks like PHPWord has a feature request open to solve this issue.
https://github.com/PHPOffice/PHPWord/issues/1480
I think you're on the right path with the file alteration. There is probably a reference to the image somewhere that you are missing and that PHPWord is still trying to access.
I would unzip the file on your local drive and grep the extracted directory (that is, search the contents of every extracted file) for the name of the image. This will show you where else the image is referenced, so you can remove those references too.
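The search suggested above can also be done from PHP without extracting the archive. The following is a minimal sketch, not a confirmed fix: the function name findPartsReferencing is hypothetical, and it only assumes that references to the image live in the package's XML and .rels parts.

```php
<?php
declare(strict_types=1);

/**
 * List every XML or .rels part of a .docx package whose content mentions
 * $needle (for example "image2.emf" or just "emf").
 * Hypothetical helper for illustration; not part of PHPWord.
 *
 * @return string[] names of the matching parts inside the archive
 */
function findPartsReferencing(string $docxPath, string $needle): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open {$docxPath} as a zip archive");
    }

    $hits = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Skip binary parts such as the image itself; only text parts hold references.
        if ($name === false || !preg_match('/\.(xml|rels)$/i', $name)) {
            continue;
        }
        $content = $zip->getFromIndex($i);
        if ($content !== false && stripos($content, $needle) !== false) {
            $hits[] = $name;
        }
    }
    $zip->close();

    return $hits;
}
```

Calling findPartsReferencing($path, 'emf') returns the part names that still mention the image. In a typical .docx these tend to be relationship files under word/_rels/ (including the ones for headers, where Word stores watermarks) in addition to [Content_Types].xml.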

Related

Laravel Streamed Response From Azure (Multiple Files)

I've been working on this setup, but I cannot save the downloaded streamed response to my zip file. I am using the ZipArchive package. Currently, when I return the response directly (the line marked with the yellow arrow), it returns the correct PDF, but when I try to put it in the zip archive, it is not recognized as a PDF file and returns null. I need to save multiple PDF files in the zip file, but for now I'm testing with only one PDF.
Got it. What I did was convert the response to raw PDF contents and add those to the archive using ZipArchive::addFromString().
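A minimal sketch of that approach, assuming the PDF contents are already available as raw strings (how they are fetched from Azure is not shown in the post). The function name zipRawPdfs and the entry names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Write in-memory PDF contents into a zip archive.
 * Hypothetical helper for illustration.
 *
 * @param array<string,string> $pdfs map of entry name => raw PDF bytes
 */
function zipRawPdfs(string $zipPath, array $pdfs): void
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create {$zipPath}");
    }

    foreach ($pdfs as $entryName => $bytes) {
        // addFromString() stores the bytes directly; no temporary file is needed.
        if (!$zip->addFromString($entryName, $bytes)) {
            $zip->close();
            throw new RuntimeException("Cannot add {$entryName}");
        }
    }

    $zip->close();
}
```

Because the archive is built from strings, the same loop handles one PDF or many.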

PHPExcel Writer .xls file generates Excel error ... how can I change the extension to .csv?

I'm using a WordPress plugin, wpdatatables, that uses PHPExcel to produce PDF, CSV, and Excel file exports.
When I download an .xls file from my website and open it, Excel gives me the error “The file format and extension of “blahblah.xls” don’t match. The file could be corrupted or unsafe. Unless you trust its source, don’t open it.”
Of course the file still opens fine, but I'd like to get rid of this error. One thing I noticed is the .CSV export is seemingly identical in all ways except the file extension, and opens without an error.
As someone who is not terribly familiar with PHP sadly, what direction should I look in to make PHPExcel produce .csv files only? Is there a specific function or directory in PHPExcel responsible for the Excel writer file output?

PHP not detecting XLS file as zip

I am using a library to read XLS files which internally uses PHP's zip_open() function. When creating the files locally and then uploading to my test server everything works fine. However, when I use the XLS files downloaded from a website (normal download via browser), it does not work, instead returning Error 19 meaning that the file is not seen as a zip file, which is incorrect. Excel opens the file without problems. If I re-save the file locally as an XLSX file and then upload it, I get the same error (in this instance the file is opened by the PHP's ZipArchive class). Any ideas what the reason could be? I checked that the files are not read only, possibly some Unix permissions could be set that are not displayed in Windows? (Doubt this, as the error code indicates that the file could be accessed, but could not be identified as XLS)
Using:
Apache under Windows (WAMP)
PHP 5.4.12
It seems I had misread a line of code: the zip check is only done to determine whether the XLS file is an incorrectly named XLSX file. The actual problem with the XLS file is that parsing it returns no sheets, which I still need to look into. I do not know why saving the XLS file as an XLSX file (using Excel) results in an incorrect ZIP archive, but I am guessing it is related.
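One way to see which container a spreadsheet file really uses, independent of its extension, is to inspect its first bytes. This is a minimal sketch and not the library's own check: the function name detectSpreadsheetContainer is hypothetical. It relies on two well-known signatures: zip archives (and therefore XLSX files) start with "PK\x03\x04", while legacy XLS files are OLE2 compound documents starting with D0 CF 11 E0 A1 B1 1A E1.

```php
<?php
declare(strict_types=1);

/**
 * Report the container format of a file based on its leading bytes.
 * Hypothetical helper for illustration.
 *
 * @return string 'zip' (e.g. XLSX), 'ole2' (e.g. legacy XLS), or 'unknown'
 */
function detectSpreadsheetContainer(string $path): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot read {$path}");
    }
    $head = (string) fread($handle, 8);
    fclose($handle);

    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    if ($head === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'ole2';
    }

    return 'unknown';
}
```

An 'unknown' result for a file that Excel opens without complaint would suggest that the download is something else entirely, for example an HTML or CSV export saved with an .xls extension.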

ZendService\LiveDocx returns zip file instead of docx

I'm using the ZendService\LiveDocx library, but when I run this on our staging server (Linux) and request the docx format, it returns a zip file rather than the actual document. The zip file consists of XML files describing the document. If I request PDF, it works fine. Generating a docx document also works fine in my local development environment (Windows 7).
Any ideas why the LiveDocx service would return a zip file instead of the actual document?
The zip file consists of XML files describing the document.
That's how .docx files work.
A .docx file is in fact a zip archive, so you can simply rename it according to your needs.
You can verify this by taking a known-good docx file, renaming it to .zip, and extracting it.
The solution to your problem is to rename the file from .zip to .docx before offering it to the user as a download.
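Before renaming, it may be worth confirming that the returned archive really is a Word package. The following is a minimal sketch: the function name looksLikeDocx is hypothetical, and it assumes that the presence of [Content_Types].xml and word/document.xml is enough to identify a .docx.

```php
<?php
declare(strict_types=1);

/**
 * Check whether a zip archive contains the parts expected in a .docx package.
 * Hypothetical helper for illustration; it does not validate the XML itself.
 */
function looksLikeDocx(string $path): bool
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return false; // not a readable zip archive at all
    }

    // locateName() returns the entry index, or false when the entry is missing.
    $hasTypes = $zip->locateName('[Content_Types].xml') !== false;
    $hasDocument = $zip->locateName('word/document.xml') !== false;
    $zip->close();

    return $hasTypes && $hasDocument;
}
```

If the check passes, PHP's rename() can give the file its .docx extension before it is sent to the user.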

PHP - Grab PDF with URL that does not have the .pdf extension

I'm using Filepicker.io to upload PDFs to my application. I have all those URLs, and now I am trying to merge some of those PDFs using the PDF Tool Kit PHP library. It was not working for me, so I ran some tests using PHP's file_exists() and it kept returning false.
I think this has to do with the fact that the URL does not have a ".pdf" extension at the end. This is what they look like: "https://www.filepicker.io/api/file/LCvbgpqEQLGwt8bfnqc1"
Does anyone know how I can pull the PDF using PHP in order to merge those files using the PDF Toolkit Library?
Thanks!
Alain F.
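file_exists() doesn't work with http:// or https:// URLs, only with local files (and the few stream wrappers that support stat()). Instead, download the file to the temp directory using copy(), which can read from a URL when allow_url_fopen is enabled.
If the file can't be downloaded, copy() returns false.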
$downloaded = copy('https://www.filepicker.io/api/file/LCvbgpqEQLGwt8bfnqc1', '/tmp/example.pdf');
if (!$downloaded) {
    throw new Exception("PDF could not be downloaded");
}
Use the downloaded file in the PDF Tool Kit.
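Since the URL carries no extension, it can help to confirm that the downloaded file really is a PDF before merging it. This is a minimal sketch: the function name isPdfFile is hypothetical, and the check only looks for the "%PDF-" marker that PDF files begin with.

```php
<?php
declare(strict_types=1);

/**
 * Return true when a local file starts with the "%PDF-" header.
 * Hypothetical helper for illustration; it does not validate the whole document.
 */
function isPdfFile(string $path): bool
{
    if (!is_file($path)) {
        return false;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $head = (string) fread($handle, 5);
    fclose($handle);

    return $head === '%PDF-';
}
```

Running isPdfFile('/tmp/example.pdf') after the copy() call above would catch cases where the service returned an error page instead of the document.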
EDIT: This does not solve this particular problem but does address the theory that it didn't work because "the URL does not have a ".pdf" extension at the end."
You can add things to the end of the filepicker URL with a trailing +
The following urls are equivalent:
https://www.filepicker.io/api/file/LCvbgpqEQLGwt8bfnqc1
https://www.filepicker.io/api/file/LCvbgpqEQLGwt8bfnqc1+name.pdf
