check valid docx from linux command line


I generate docx files in a PHP script, but sometimes they are corrupted. The server does not detect this and returns the docx file to the user, who then discovers that it is corrupted, which makes for a very bad experience.
Is there a way to check from the Linux CLI whether a docx is corrupted? Then I could be more resilient, either trying to fix the file or giving the user a proper response.
So far I'm experimenting with:
libreoffice --headless --convert-to html corrupted.docx
But since the file is not corrupted in most cases, this check mostly just increases the response time.
You can debug with this corrupted file.

You could call a PHP script that opens the document with PHPWord, which can report success or failure. See this example:
include_once 'Sample_Header.php';
// Read contents
$name = basename(__FILE__, '.php');
$source = __DIR__ . "/resources/{$name}.docx";
echo date('H:i:s'), " Reading contents from `{$source}`", EOL;
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source);
return $phpWord instanceof PhpOffice\PhpWord\PhpWord;
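Because a .docx file is a ZIP container, a cheaper first pass than a full load or conversion might be a structural check. The following is a minimal sketch, not a full validator: the function name `looksLikeValidDocx` is hypothetical, and it assumes the PHP zip and dom extensions are available. It only confirms that the archive opens with consistency checks, that it contains `[Content_Types].xml` and the conventional main part `word/document.xml`, and that this part is well-formed XML. A file can pass this check and still be rejected by Word.

```php
<?php
// Hypothetical helper: cheap structural check of a .docx (a ZIP container).
function looksLikeValidDocx(string $path): bool
{
    $zip = new ZipArchive();
    // CHECKCONS asks libzip to run extra consistency checks while opening.
    if ($zip->open($path, ZipArchive::CHECKCONS) !== true) {
        return false;
    }

    // Parts a WordprocessingML package is conventionally expected to contain.
    foreach (['[Content_Types].xml', 'word/document.xml'] as $part) {
        if ($zip->locateName($part) === false) {
            $zip->close();
            return false;
        }
    }

    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false || $xml === '') {
        return false;
    }

    // Well-formedness check only; libxml errors are collected, not printed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}
```

If this returns false, the script could regenerate the file or fall back to a slower check such as the LibreOffice conversion from the question.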

Related

Convert .HEIC to .JPG with ImageMagick in PHP

I am looking to try and create a small image converter that would convert HEIC files that are uploaded to a php web document to .JPG (or any other generic file format).
I am running PHP off a unix server and have ImageMagick installed on the server. The following command line code works from the server:
mogrify -format jpg *.HEIC
I'd like to convert this command line code to PHP.
I currently have the following code set up in a basic HTML + PHP form. The file being converted is newly uploaded and not yet stored on the server. If necessary, I can upload it to the server first and then read it from there.
if ($_SERVER["REQUEST_METHOD"] == "POST")
{
    if (empty($_FILES['image_url']['name']))
    {
        echo "No File uploaded";
    }
    else {
        $uploadedImage = fopen($_FILES['image_url']['tmp_name'], 'rb');
        $image_to_convert = new Imagick();
        $image_to_convert->readImageFile($uploadedImage);
        $image_to_convert->setFormat("jpg");
        $image_to_convert->setFileName('test.jpg');
        header('Content-Type: image/jpg');
        header('Content-disposition: attachment; filename='.$image_to_convert->getFileName());
        header("Content-Description: File Transfer");
        readfile($image_to_convert);
    }
}
This code downloads a "test.jpg" file, but when I try to open it in Windows image viewer it displays a "It looks like we don't support this file format" message. I'm relatively new to PHP so I don't know all the tricks for output/input streams so if my code is horrible let me know.
Any and all help is welcome. Thanks!

I think you need to specify 'jpeg' rather than 'jpg' for your format.
$image_to_convert->setFormat("jpeg");
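Combining that suggestion with the upload handling might look like the sketch below. It assumes the Imagick extension is installed and that the underlying ImageMagick build can decode HEIC; the helper name `convertToJpegBlob` is hypothetical. It sends the encoded bytes obtained from `getImageBlob()` instead of passing the Imagick object to `readfile()`, which expects a file name.

```php
<?php
// Hypothetical helper: read an image file and return it re-encoded as JPEG bytes.
function convertToJpegBlob(string $sourcePath): string
{
    $image = new Imagick();
    $image->readImage($sourcePath);
    $image->setImageFormat('jpeg');   // 'jpeg' rather than 'jpg', as suggested above
    $blob = $image->getImageBlob();
    $image->clear();
    return $blob;
}

// Usage inside the upload handler (field name taken from the question):
// $jpeg = convertToJpegBlob($_FILES['image_url']['tmp_name']);
// header('Content-Type: image/jpeg');
// header('Content-Disposition: attachment; filename="test.jpg"');
// header('Content-Length: ' . strlen($jpeg));
// echo $jpeg;
```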

I managed to get it working by switching away from image formats such as PNG and JPEG and instead downloading the converted HEIC image as a PDF file.
I also had to write the PDF file to the server first before I was able to read it back and download it through the website.

PHP - Converting Excel to PDF (Phpspreadsheet) - Operation not permitted

I am trying to convert an Excel file to PDF (Base64).
This is the code that converts the Excel to PDF:
$spreadsheet = $this->objPHPExcel = \PhpOffice\PhpSpreadsheet\IOFactory::load("MyExcelFile.xlsx");
$class = \PhpOffice\PhpSpreadsheet\Writer\Pdf\Mpdf::class;
\PhpOffice\PhpSpreadsheet\IOFactory::registerWriter('Pdf', $class);
$this->objPHPExcel = \PhpOffice\PhpSpreadsheet\IOFactory::createWriter($spreadsheet, 'Pdf');
$this->objPHPExcel->writeAllSheets();
// Save the file. (THIS IS WHERE THE ERROR OCCURS)
$this->objPHPExcel->save(storage_path() . '/app/temp_files/' . $newFileName);
Everything works locally, but whenever I run the same code on my Laravel Forge server, I get the error below:
unlink(/tmp/imagick-3.4.0.tgz): Operation not permitted
If I trace the error, it is in this specific line:
$this->objPHPExcel->save(storage_path() . '/app/temp_files/' . $newFileName);
As said, this code runs fine locally. The temp file $newFileName is created inside my /temp_files folder.
What am I doing wrong?

OK, so the solution to this was rather tricky. I found out that it had nothing to do with PhpSpreadsheet but rather with Mpdf.
The problem was that the file imagick-3.4.0.tgz had its permissions set to read-only, meaning that unlink could not remove it. This goes all the way back to when I first installed the imagick library.
The solution was to go to the /tmp folder and delete the imagick-3.4.0.tgz file manually. This file should actually have been deleted during the imagick installation.

I can't make ImageMagick work from a browser

I'm trying to convert a PDF file to JPG files.
The PDF is created using Prince, and immediately after, I call the function that calls ImageMagick. Here's the content of the said function:
if (!file_exists(Settings::getPDFFilePath())) {
    Log::l(true, "ERROR: File \"" . Settings::getPDFFilePath() . "\" doesn't exist.");
    throw new SeverityException(SeverityException::MISSINGPDFFILE);
}
if ((!file_exists(Settings::getPreviewJPGDirectory())) && (!mkdir(Settings::getPreviewJPGDirectory()))) {
    Log::l(true, "ERROR: Preview JPG directory couldn't be created.");
    throw new SeverityException(SeverityException::UNWRITABLE_BOOK_DIRECTORY);
} elseif (!chmod(Settings::getPreviewJPGDirectory(), 0777)) {
    /** Files might be modified by other script/user. */
    Log::l(true, "WARNING: Access rights could not be modified for Preview JPG directory. Any further modification might become impossible.");
    Settings::submitException(new SeverityException(SeverityException::JPG_DIR_RIGHTS_UNMODIFIABLE));
}
$convert = "/usr/local/bin/convert -quality 100 -density 100x100 /path/to/pdf/file.pdf /path/to/jpg/file.jpg 2>&1";
exec($convert, $output, $res);
Here's the thing:
When I call ImageMagick from a command-line with my user or the user _www, it works.
When I call the php script from a command-line, using my user or the user _www, it works.
But when I call the php script from a browser (ImageMagick is then called by the user _www, I've checked) I get this error:
convert: no images defined `/path/to/jpg/file.jpg\' @ error/convert.c/ConvertImageCommand/3187.'
The pdf file permissions are 666 and the jpg's destination folder's permissions are 777.
I doubt that the problem comes from Prince, and it seems obvious to me that it's not access-rights related either.
Both command-line mode and Apache use the same php.ini file (/etc/php.ini).
I may have missed something, but I really don't know what...
Edit: Oh, and I'm using MacOS Maverick, but I don't think that's relevant.
Edit2: I just tried with pdftopng (look XPDF up for more info) and it works fine. So the problem definitely comes from ImageMagick.

File is not readable with Excelwriter and phpExcelReader 2

I use Harish Chauhan's Excel Writer to generate an Excel (xls) file.
Then I use phpExcelReader 2 to read the file created by the Excel Writer class, but I get this error every time:
The filename myXls.xls is not readable
I can open the "myXls.xls" file with MS Excel, and if I save the file under another name from Excel, it can be read successfully.
Exploring the code, it seems that the error is raised by:
if (substr($this->data, 0, 8) != IDENTIFIER_OLE) {
    //echo 'Error';
    $this->error = 1;
    return false;
}
IDENTIFIER_OLE is defined as:
define('IDENTIFIER_OLE', pack("CCCCCCCC",0xd0,0xcf,0x11,0xe0,0xa1,0xb1,0x1a,0xe1));
I don't have any idea how to fix it. Please help.
Thanks for your time!

The file generated by Harish Chauhan's ExcelWriter class is not an actual OLE BIFF .xls file, but a mix of HTML markup and some elements from SpreadSheetML, the XML format defined by Microsoft as an alternative to BIFF in Excel 2003. It never proved particularly popular; but the later versions of MS Excel itself can still read and write this format. MS Excel is also very forgiving about reading HTML markup, though the latest version will give you a notice informing you if a file format does not match its extension.
phpExcelReader 2 is written to read Excel BIFF files, therefore it is incapable of reading the non-OLE/non-BIFF files generated by Harish Chauhan's class.
If you want to write and read files in the correct format, then I suggest you use PHPExcel, or one of the many other PHP libraries that work with genuine Excel files.
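The check that phpExcelReader performs can be reproduced in isolation to see which kind of file you actually have. This is a small sketch with a hypothetical function name, `startsWithOleSignature`; it compares the first eight bytes of the file against the same byte sequence used in the IDENTIFIER_OLE definition quoted in the question.

```php
<?php
// Hypothetical helper: true if the file begins with the 8-byte OLE signature.
function startsWithOleSignature(string $path): bool
{
    // Same bytes as IDENTIFIER_OLE in the question.
    $signature = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $head = fread($handle, 8);
    fclose($handle);

    return $head === $signature;
}
```

A file written by the ExcelWriter class described above would be expected to return false here, since it starts with markup rather than the OLE signature.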

I had the same problem. The task was to parse a very old XLS file (Excel 2). I could not find any PHP library that works with such an old format.
So the solution was to convert it to XLSX with the LibreOffice command line (CSV works too) and then parse it with any more modern Excel parser.
We have LibreOffice installed on our server, and this is the command to convert:
libreoffice --headless --convert-to xlsx original_source.xls
or
libreoffice --headless --convert-to csv original_source.xls
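When these commands are run from PHP, the file name should be escaped before it reaches the shell. Below is a minimal sketch, assuming `libreoffice` is on the PATH of the web server user. The function name `buildConvertCommand` and the example output directory are hypothetical; `--outdir` is added to control where the converted file is written.

```php
<?php
// Hypothetical helper: build the conversion command with shell-escaped arguments.
function buildConvertCommand(string $source, string $format, string $outDir): string
{
    return sprintf(
        'libreoffice --headless --convert-to %s --outdir %s %s 2>&1',
        escapeshellarg($format),
        escapeshellarg($outDir),
        escapeshellarg($source)
    );
}

// Usage: run the command, then check both the exit code and the output file.
// exec(buildConvertCommand('original_source.xls', 'xlsx', '/tmp/converted'), $output, $exitCode);
// $converted = ($exitCode === 0) && file_exists('/tmp/converted/original_source.xlsx');
```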

Dynamically created zip files by ZipStream in PHP won't open in OSX

I have a PHP site with a lot of media files and users need to be able to download multiple files at a time as a .zip. I'm trying to use ZipStream to serve the zips on the fly with "store" compression so I don't actually have to create a zip on the server, since some of the files are huge and it's prohibitively slow to compress them all.
This works great, and the resulting files can be opened without errors by every zip program I've tried except OS X's default unzipping program, Archive Utility. You double-click the .zip file, and Archive Utility decides it doesn't look like a real zip and instead compresses it into a .cpgz file.
Using unzip or ditto in the OS X terminal or StuffIt Expander unzips the file with no problem but I need the default program (Archive Utility) to work for the sake of our users.
What sort of things (flags, etc.) in otherwise acceptable zip files can trip Archive Utility into thinking a file isn't a valid zip?
I've read this question, which seems to describe a similar issue but I don't have any of the general purpose bitfield bits set so it's not the third bit issue and I'm pretty sure I have valid crc-32's because when I don't, WinRAR throws a fit.
I'm happy to post some code or a link to a "bad" zip file if it would help but I'm pretty much just using ZipStream, forcing it into "large file mode" and using "store" as the compression method.
Edit - I've tried the "deflate" compression algorithm as well and get the same results, so I don't think it's the "store". It's also worth pointing out that I'm pulling down the files one at a time from a storage server and sending them out as they arrive, so a solution that requires all the files to be downloaded before sending anything isn't going to be viable. (An extreme example is 5GB+ of 20MB files: the user can't wait for all 5GB to transfer to the zipping server before their download starts, or they'll think it's broken.)
Here's a 140 byte, "store" compressed, test zip file that exhibits this behavior: http://teknocowboys.com/test.zip

The problem was in the "version needed to extract" field, which I found by doing a hex diff on a file created by ZipStream vs a file created by Info-zip and going through the differences, trying to resolve them.
ZipStream by default sets it to 0x0603. Info-zip sets it to 0x000A. Zip files with the former value don't seem to open in Archive Utility. Perhaps it doesn't support the features at that version?
Forcing the "version needed to extract" to 0x000A made the generated files open as well in Archive Utility as they do everywhere else.
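To inspect this field without a hex editor, the first local file header can be decoded directly. This is a minimal sketch with a hypothetical function name, `versionNeededToExtract`. It relies on the ZIP local file header layout, in which the 4-byte signature `PK\x03\x04` is followed by the 2-byte little-endian "version needed to extract" value, and it only looks at the first entry in the archive.

```php
<?php
// Hypothetical helper: return the "version needed to extract" value of the
// first local file header, or null if the data does not start with one.
function versionNeededToExtract(string $zipBytes): ?int
{
    if (strlen($zipBytes) < 6 || substr($zipBytes, 0, 4) !== "PK\x03\x04") {
        return null;
    }
    // 'v' = unsigned 16-bit integer, little-endian.
    return unpack('v', substr($zipBytes, 4, 2))[1];
}

// Usage: printf('0x%04X', versionNeededToExtract(file_get_contents('test.zip')));
```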
Edit: Another cause of this issue is if the zip file was downloaded using Safari (user agent version >= 537) and you under-reported the file size when you sent out your Content-Length header.
The solution we employ is to detect Safari >= 537 server side and if that's what you're using, we determine the difference between the Content-Length size and the actual size (how you do this depends on your specific application) and after calling $zipStream->finish(), we echo chr(0) to reach the correct length. The resulting file is technically malformed and any comment you put in the zip won't be displayed, but all zip programs will be able to open it and extract the files.
IE requires the same hack if you're misreporting your Content-Length but instead of downloading a file that doesn't work, it just won't finish downloading and throws a "download interrupted".
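The padding step described above can be sketched as follows. The function name `zeroPaddingFor` is hypothetical, and how the number of bytes actually sent is tracked is left to the application, as noted in the answer.

```php
<?php
// Hypothetical helper: NUL bytes needed to reach the declared Content-Length.
// Returns an empty string if the declared length was already reached or exceeded.
function zeroPaddingFor(int $declaredLength, int $bytesSent): string
{
    return str_repeat("\0", max(0, $declaredLength - $bytesSent));
}

// Usage, after $zipStream->finish():
// echo zeroPaddingFor($declaredLength, $bytesSent);
```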

Use ob_clean() and flush().
Example:
$file = __UPLOAD_PATH . $projectname . '/' . $fileName;
$zipname = "watherver.zip";
$zip = new ZipArchive();
$zip_full_path_name = __UPLOAD_PATH . $projectname . '/' . $zipname;
$zip->open($zip_full_path_name, ZIPARCHIVE::CREATE);
$zip->addFile($file); // Adding one file for testing
$zip->close();
if (file_exists($zip_full_path_name)) {
    header('Content-type: application/zip');
    header('Content-Disposition: attachment; filename="'.$zipname.'"');
    ob_clean();
    flush();
    readfile($zip_full_path_name);
    unlink($zip_full_path_name);
}

I've had this exact issue but with a different cause.
In my case the PHP-generated zip would open from the command line, but not via Finder in OS X.
I had made the mistake of allowing some HTML content into the output buffer prior to creating the zip file and sending that back as the response.
<some html></....>
<?php
// Output a zip file...
The command line unzip program was evidently tolerant of this but the Mac unarchive function was not.

No idea. If the external ZipStream class doesn't work, try another option. The PHP ZipArchive extension won't help you, since it doesn't support streaming but only ever writes to files.
But you could try the standard Info-zip utility. It can be invoked from within PHP like this:
#header("Content-Type: archive/zip");
passthru("zip -0 -q -r - *.*");
That would result in an uncompressed zip file being sent directly back to the client.
If that doesn't help, then the MacOS zip frontend probably doesn't like uncompressed stuff. Remove the -0 flag then.

The InfoZip commandline tool I'm using, both on Windows and Linux, uses version 20 for the zip's "version needed to extract" field. This is needed on PHP as well, as the default compression is the Deflate algorithm. Thus the "version needed to extract" field should really be 0x0014. If you alter the "(6 << 8) +3" code in the referenced ZipStream class to just "20", you should get a valid Zip file across platforms.
The author is basically telling you that the zip file was created in OS/2 using the HPFS file system, and the Zip version needed predates InfoZip 1.0. Not many implementations know what to do about that one any longer ;)

For those using ZipStream in Symfony, here's your solution: https://stackoverflow.com/a/44706446/136151
use Symfony\Component\HttpFoundation\StreamedResponse;
use Aws\S3\S3Client;
use ZipStream;
//...
/**
 * @Route("/zipstream", name="zipstream")
 */
public function zipStreamAction()
{
    //test file on s3
    $s3keys = array(
        "ziptestfolder/file1.txt"
    );
    $s3Client = $this->get('app.amazon.s3'); //s3client service
    $s3Client->registerStreamWrapper(); //required
    $response = new StreamedResponse(function() use($s3keys, $s3Client)
    {
        // Define suitable options for ZipStream Archive.
        $opt = array(
            'comment' => 'test zip file.',
            'content_type' => 'application/octet-stream'
        );
        //initialise zipstream with output zip filename and options.
        $zip = new ZipStream\ZipStream('test.zip', $opt);
        //loop keys useful for multiple files
        foreach ($s3keys as $key) {
            // Get the file name in S3 key so we can save it to the zip
            //file using the same name.
            $fileName = basename($key);
            //concatenate s3path.
            $bucket = 'bucketname';
            $s3path = "s3://" . $bucket . "/" . $key;
            //addFileFromStream
            if ($streamRead = fopen($s3path, 'r')) {
                $zip->addFileFromStream($fileName, $streamRead);
            } else {
                die('Could not open stream for reading');
            }
        }
        $zip->finish();
    });
    return $response;
}
If your controller action response is not a StreamedResponse, you are likely going to get a corrupted zip containing html as I found out.

This is an old question, but I'll leave what worked for me in case it helps someone else.
When setting the options, you need to set the zero header option to true and disable Zip64 (this will limit the archive to 4 GB, though):
$options->setZeroHeader(true);
$options->setEnableZip64(false);
Everything else as described by Forer.
Solution found on https://github.com/maennchen/ZipStream-PHP/issues/71
