Error converting docx to pdf using Unoconv - php

I am trying to convert .docx files to .pdf files using Unoconv. Libreoffice is installed on my server and the script works for another website on the server.
Using the line use Unoconv\Unoconv; results in an HTTP ERROR 500.
Does someone know why I get an HTTP ERROR 500?
Here is my script:
<?php
require './Unoconv.php';
use Unoconv\Unoconv;
$originFilePath = './uf/invoice/17/word/202100021.docx';
$outputDirPath = './uf/invoice/17/pdf/202100021.pdf';
Unoconv::convertToPdf($originFilePath, $outputDirPath);
header("Content-type:application/pdf");
header("Content-Disposition:attachment;filename=202100021.pdf");
?>
Here is my Unoconv.php script:
<?php
namespace Unoconv;
class Unoconv {
public static function convert($originFilePath, $outputDirPath, $toFormat)
{
$command = 'unoconv --format %s --output %s %s';
$command = sprintf($command, $toFormat, $outputDirPath, $originFilePath);
system($command, $output);
return $output;
}
public static function convertToPdf($originFilePath, $outputDirPath)
{
return self::convert($originFilePath, $outputDirPath, 'pdf');
}
public static function convertToTxt($originFilePath, $outputDirPath)
{
return self::convert($originFilePath, $outputDirPath, 'txt');
}
}
?>

@Alex is correct about wrapping in try/catch first, but should the syntax be:
...
} catch(\Exception $e){
...

Start by wrapping your code in try...catch to get the error message first:
<?php
require 'Unoconv.php';
// "use" must sit at the top level of the file; inside try { } it is a parse error
use Unoconv\Unoconv;
try {
$map1 = $_SESSION['companyid'];
$filename = $result1['filename'];
$originFilePath = './uf/doc/'.$map1.'/word/'.$filename.'.docx';
$outputDirPath = './uf/doc/'.$map1.'/pdf/'.$filename.'.pdf';
Unoconv::convertToPdf($originFilePath, $outputDirPath);
header("Content-type:application/pdf");
header("Content-Disposition:attachment;filename=".$filename.".pdf");
readfile($outputDirPath);
} catch (\Exception $e) {
die($e->getMessage());
}
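Note that system() reports failure through its exit status and does not throw, so the catch block above will not see a failed conversion. A minimal sketch of a variant that escapes the arguments and turns a non-zero exit status into an exception follows. The helper names buildUnoconvCommand and runConversion are hypothetical and not part of the Unoconv class from the question; only the --format and --output flags are taken from it.

```php
<?php
// Hypothetical helper: builds the same command line as Unoconv::convert(),
// but quotes every argument so spaces and shell metacharacters are safe.
function buildUnoconvCommand(string $input, string $output, string $format): string
{
    return sprintf(
        'unoconv --format %s --output %s %s 2>&1',
        escapeshellarg($format),
        escapeshellarg($output),
        escapeshellarg($input)
    );
}

// Hypothetical helper: runs the command and throws if unoconv reports failure,
// so that a surrounding try...catch has something to catch.
function runConversion(string $input, string $output, string $format = 'pdf'): void
{
    $lines = [];
    $status = 0;
    exec(buildUnoconvCommand($input, $output, $format), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException(
            "unoconv exited with status $status: " . implode("\n", $lines)
        );
    }
}
```

Calling runConversion($originFilePath, $outputDirPath) inside the try block would then surface the command's own error output in $e->getMessage().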

I've observed that LibreOffice can be a little quirky when doing conversions, especially when running in headless mode from a webserver account.
The simplest thing to try is to modify unoconv to use the same Python binary that is shipped with LibreOffice:
#!/usr/bin/env python
should be (after checking where libreoffice is installed)
#!/opt/libreoffice7.1/program/python
Otherwise, I have worked around the problem by invoking libreoffice directly (without Unoconv):
$dir = dirname($docfile);
// Libreoffice saves here
$pdf = $dir . DIRECTORY_SEPARATOR . basename($docfile, '.docx').'.pdf';
$ret = shell_exec("export HOME={$dir} && /usr/bin/libreoffice --headless --convert-to pdf --outdir '{$dir}' '{$docfile}' 2>&1");
if (file_exists($pdf)) {
rename($pdf, $realPDFName);
} else {
return false;
}
return true;
Note the export HOME={$dir} directive, which ensures that temporary lock files are saved in the current directory where, presumably, the web server has full permissions. If this requirement isn't met, LibreOffice fails without any visible error. I could not find an error message anywhere and only discovered what was going on by using strace.
So your code would become:
$originFilePath = './uf/invoice/17/word/202100021.docx';
$outputDirPath = './uf/invoice/17/pdf/202100021.pdf';
$dir = dirname($originFilePath);
$pdf = $dir . DIRECTORY_SEPARATOR . basename($originFilePath, '.docx').'.pdf';
$ret = shell_exec("export HOME={$dir} && /usr/bin/libreoffice --headless --convert-to pdf --outdir '{$dir}' '{$originFilePath}' 2>&1");
// $ret will contain any errors
if (!file_exists($pdf)) {
die("Conversion error: " . htmlentities($ret));
}
rename($pdf, $outputDirPath);
header("Content-type:application/pdf");
header("Content-Disposition:attachment;filename=202100021.pdf");
readfile($outputDirPath);
I assume that libreoffice is reachable through the usual alternatives link "/usr/bin/libreoffice"; otherwise you need to find its path with the terminal command "which libreoffice". Or, from a PHP script:
<?php
header('Content-Type: text/plain');
print "If this works:\n";
system('which libreoffice 2>&1');
print "\n-- otherwise a different attempt, returning too much information --\n";
system('locate libreoffice');
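As a small variation on the code above, the path LibreOffice will write to can be computed up front, and the shell arguments can be quoted. This is only a sketch: expectedPdfPath and buildLibreOfficeCommand are hypothetical names, and the binary path is whatever "which libreoffice" reports on your server.

```php
<?php
// Hypothetical helper: LibreOffice names the output after the input file,
// replacing the extension with .pdf, inside the --outdir directory.
function expectedPdfPath(string $docfile, string $outDir): string
{
    return rtrim($outDir, '/\\')
        . DIRECTORY_SEPARATOR
        . pathinfo($docfile, PATHINFO_FILENAME)
        . '.pdf';
}

// Hypothetical helper: same command as in the answer, with quoted arguments.
// HOME points at the output directory so lock files land somewhere writable.
function buildLibreOfficeCommand(string $binary, string $docfile, string $outDir): string
{
    return sprintf(
        'export HOME=%s && %s --headless --convert-to pdf --outdir %s %s 2>&1',
        escapeshellarg($outDir),
        escapeshellarg($binary),
        escapeshellarg($outDir),
        escapeshellarg($docfile)
    );
}
```

If the web server may write to the pdf directory, passing that directory as $outDir would make the later rename() unnecessary.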


HTMLDOC does not execute from PHP

I am trying to create a PDF from an HTML file from a PHP page (Apache, LAMP). The weird thing is that when I execute the script from the command line, it works and creates the PDF as expected. However, when I browse to the page in my browser, it does nothing. I think it's a permissions issue somewhere, but I'm stumped! Here's the code. (NOTE: the ls command DOES produce output in the browser, so it's not just an issue of PHP not being allowed to execute shell commands.)
<?php
$htmlName = ("output2/alex" . time() . ".html");
$pdfName = ("output2/alex" . time() . ".pdf");
$html = "<html><body><h1>Hello, World!</h1></body></html>";
$fileHandle = fopen($htmlName, "w+");
fwrite($fileHandle, $html);
fclose($fileHandle);
$command= "htmldoc -t pdf --browserwidth 1000 --embedfonts --header ... --footer t./ --headfootsize 5.0 --fontsize 9 --bodyfont Arial --size letter --top 4 --bottom 25 --left 28 --right 30 --jpeg --webpage $options '$htmlName' -f '$pdfName'";
echo "OUTPUT: \r\n";
$X=passthru($command);
echo "TESTING LS:";
$y=passthru("ls -al");
if(file_exists($htmlName) && file_exists($pdfName)) {
echo "Success.";
} else {
echo "Sorry, it did not create a PDF";
}
?>
When I execute the script from the command line it produces the expected output, and creates a PDF file like it's supposed to:
> php alextest.php
Zend OPcache requires Zend Engine API version 220131226.
The Zend Engine API version 220100525 which is installed, is outdated.
OUTPUT:
PAGES: 1
BYTES: 75403
TESTING LS:total 2036
drwxr-xr-x 9 ----- and so on...
When I browse the page in Chrome, it outputs only the LS command.
help!?
You might try using a full path, as your PHP file may be executing in a different directory than the one it is saved in, depending on how it is loaded (i.e. via include, require, .htaccess, or directly by Apache).
For example:
$htmlName = ("/home/alex/html/output2/alex" . time() . ".html");
$pdfName = ("/home/alex/html/output2/alex" . time() . ".pdf");
I agree with the comments that using a package like http://dompdf.github.io/ or https://tcpdf.org/ would be best though.
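To see why the command fails only under Apache, it can help to capture stderr and the exit status instead of relying on passthru(). A rough sketch, assuming a Unix shell; runAndCapture is a hypothetical helper name:

```php
<?php
// Hypothetical helper: runs a shell command, merges stderr into stdout,
// and returns both the exit status and the combined output.
function runAndCapture(string $command): array
{
    $lines = [];
    $status = 0;
    exec($command . ' 2>&1', $lines, $status);
    return ['status' => $status, 'output' => implode("\n", $lines)];
}

// __DIR__ is the directory of this script, regardless of how it was loaded,
// so paths built from it do not depend on the current working directory.
$outputDir = __DIR__ . '/output2';
```

Running the htmldoc command through runAndCapture() and echoing the result in the browser should show any permission or path error that passthru() was hiding.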
I've seen the same issue, and for the life of me I simply couldn't find the answer to why it wouldn't do it from a web based call, but never a problem from the command line. So, instead of fighting my way to a solution on that front, I created a Perl proxy to allow me to parse PDFs from the command line making it useful for virtually any given purpose. For, with Perl, I've never had a problem parsing PDFs, and I've been doing it for decades now.
So, here's what you do. PHP Code:
exec("/usr/local/bin/perl5 -s /path/to/perl/executable/parse-pdf-from-html-document.pl -document=markup-file.html",$output);
foreach ($output as $aline) {
#- WAS SUCCESSFUL
if (strstr($aline,'Successful!') == TRUE) {
#- no feedback, win silently
}
#- NOT SUCCESSFUL
else {
echo $aline . "\n";
}
}
With $output holding the results of running exec.
Now let's look at the Perl code for parse-pdf-from-html-document.pl:
#!/usr/local/bin/perl5 -s
#- $document coming from commandline variable, via caller: PHP script
$myDocumentLocale = "/path/to/markup/document/".$document;
if (-e $myDocumentLocale) {
$documentHTML = $myDocumentLocale;
$documentPDF = $documentHTML;
$documentPDF =~ s/\.html/\.pdf/gi;
$myDocumentHTML = `cat $myDocumentLocale`;
$badPDF = 0;
$myPDFDocumentLocale = $myDocumentLocale;
$myPDFDocumentLocale =~ s/\.html/\.pdf/gi;
$badPDF = &parsePDF($myDocumentLocale, $myPDFDocumentLocale);
if ($badPDF == 0) {
print "Successful!";
}
else {
print "Error: No PDF Created.";
}
exit;
}
else {
print "Error: No document found.";
exit;
}
sub parsePDF {
my ($Ihtml, $Ipdf) = @_;
$wasBad = 0;
#- create PDF
$ENV{HTMLDOC_NOCGI} = 1;
$commandline="/usr/local/bin/htmldoc -t pdf13 --pagemode document --header ... --footer ... --left 1cm --size Letter --webpage -f $Ipdf $Ihtml";
select(STDOUT);
$| = 1;
#print "Content-Type: application/pdf\n\n";
system($commandline);
if (-e $Ipdf) {
$wasBad = 0;
}
else {
$wasBad = 1;
}
return $wasBad;
}
exit;
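The Perl script above sets the HTMLDOC_NOCGI environment variable before calling htmldoc. As far as I can tell, that is the part that matters: when htmldoc is started from a web server it can switch to CGI mode, and this variable disables that. If so, the same thing can be done from PHP without the Perl proxy. A minimal sketch, with runHtmldoc as a hypothetical helper name:

```php
<?php
// Hypothetical helper: sets HTMLDOC_NOCGI for child processes, then runs
// the given command line and returns its exit status and output lines.
function runHtmldoc(string $command): array
{
    // putenv() changes the environment inherited by commands run via exec()
    putenv('HTMLDOC_NOCGI=1');
    $lines = [];
    $status = 0;
    exec($command . ' 2>&1', $lines, $status);
    return [$status, $lines];
}
```

Passing the $command string from the question to runHtmldoc() instead of passthru() would be the equivalent of what the Perl proxy does.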

Can PHP decompress a taz file? (.tar.Z)

I have tried to use Zlib to decompress the file, but it just said "Data error" and gave me an empty file.
This is the code I tried:
// Open a new temp file to write new file to
$tempFile = fopen("tempFile", "w");
// Make sure tempFile is empty
ftruncate($tempFile, 0);
// Write new decompressed file
fwrite($tempFile, zlib_decode(file_get_contents($path))); // $path = absolute path to data.tar.Z
// close temp file
fclose($tempFile);
I have also tried to decompress it in parts, going from .tar.Z to .tar to just a file. I tried using lzw functions to take off the .Z, but I was unable to make it work. Is there a way to do this?
EDIT:
Here is some more code I have tried. Just to make sure the file_get_contents was working. I still get a "data error".
$tempFile = fopen("tempFile.tar", "w");
// Make sure tempFile is empty
ftruncate($tempFile, 0);
// Write new decompressed file
$contents = file_get_contents($path);
if ($contents) {
fwrite($tempFile, gzuncompress($contents));
}
// close temp file
fclose($tempFile);
EDIT2: I think the reason why LZW was not working is because the contents of the .tar.Z file looks like this:
��3dЀ��0p���a�
H�H��ŋ3j��#�6l�
The LZW functions I have tried both use ASCII characters in their dictionaries. What kind of characters are these?
So you want to decompress a taz file natively with PHP? Give my new extension a try!
lzw_decompress_file('3240_05_1948-1998.tar.Z', '3240_05_1948-1998.tar');
$archive = new PharData('/tmp/3240_05_1948-1998.tar');
mkdir('unpacked');
$archive->extractTo('unpacked');
Also note, the reason the zlib functions aren't working is because you need LZW compression, not gzip compression.
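The unreadable characters in the question are simply binary data. The first two bytes tell the formats apart: files written by compress (.Z) start with 0x1F 0x9D, while gzip files start with 0x1F 0x8B. A small sketch for checking this before picking a decoder; detectCompression is a hypothetical helper name:

```php
<?php
// Hypothetical helper: identifies the compression format from the
// two-byte magic number at the start of the data.
function detectCompression(string $bytes): string
{
    $magic = substr($bytes, 0, 2);
    if ($magic === "\x1f\x9d") {
        return 'compress'; // LZW, as produced by the Unix compress tool
    }
    if ($magic === "\x1f\x8b") {
        return 'gzip';
    }
    return 'unknown';
}
```

For a file on disk, detectCompression(file_get_contents($path, false, null, 0, 2)) reads only the two bytes that are needed.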
According to https://kb.iu.edu/d/acsy, you can try:
<?php
$file = '/tmp/archive.z';
shell_exec("uncompress $file");
If you don't have a Unix-like OS, check https://kb.iu.edu/d/abck for an appropriate program.
The file is compressed with LZW compression, and I tried a few but there seems to be no reliable method for decompressing these in PHP. Cosmin's answer contains the correct first step but after using your system's uncompress utility to decompress the file, you still have to extract the TAR file. This can be done with PHP's built-in tools for handling its custom PHAR files.
// the file we're getting
$url = "ftp://ftp.ncdc.noaa.gov/pub/data/hourly_precip-3240/05/3240_05_2011-2011.tar.Z";
// where to save it
$output_dir = ".";
// get a temporary file name
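// sys_get_temp_dir() usually returns no trailing slash, so add the separator
$tempfile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . basename($url);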
// get the file
$compressed_data = file_get_contents($url);
if (empty($compressed_data)) {
echo "error getting $url";
exit;
}
// save it to a local file
$result = file_put_contents($tempfile, $compressed_data);
if (!$result) {
echo "error saving data to $tempfile";
exit;
}
// run the system uncompress utility
exec("/usr/bin/env uncompress $tempfile", $foo, $return);
if ($return == 0) {
// uncompress strips the .Z off the filename
$tempfile = preg_replace("/\.Z$/", "", $tempfile);
// remove .tar from the filename for use as a directory
$tempdir = preg_replace("/\.tar$/", "", basename($tempfile));
try {
// extract the tar file
$tarchive = new PharData($tempfile);
$tarchive->extractTo("$output_dir/$tempdir");
// loop through the files
$dir = new DirectoryIterator("$output_dir/$tempdir");
foreach ($dir as $file) {
if (!$file->isDot()) {
echo $file->getFileName() . "\n";
}
}
} catch (Exception $e) {
echo "Caught exception untarring: " . $e->getMessage();
exit;
}
} else {
echo "uncompress returned error code $return";
exit;
}
Please try this.
<?php
try {
$phar = new PharData('myphar.tar');
$phar->extractTo('/full/path'); // extract all files
$phar->extractTo('/another/path', 'file.txt'); // extract only file.txt
$phar->extractTo('/this/path',
array('file1.txt', 'file2.txt')); // extract 2 files only
$phar->extractTo('/third/path', null, true); // extract all files, and overwrite
} catch (Exception $e) {
// handle errors
}
?>
Source : http://php.net/manual/en/phardata.extractto.php
I haven't tested it, but I hope it will work for you.

PDF to HTML with PHP

I need to convert some pdf files into HTML. I downloaded pdftohtml for PHP but I don't know how to use it. I am trying to run it with this code:
<?php
include 'pdf-to-html-master/src/Gufy/PdfToHtml.php';
$pdf = new \Gufy\PdfToHtml;
$pdf->open('1400.pdf');
$pdf->generate();
?>
This results in a blank web page.
What do I need to modify? What is the correct code to run this script?
First option is using poppler utils
<?php
// if you are using composer, just use this
include 'vendor/autoload.php';
// if not, use this
include 'src/Gufy/PdfToHtml.php';
// initiate
$pdf = new \Gufy\PdfToHtml;
// opening file
$pdf->open('file.pdf');
// set different output directory for generated html files
$pdf->setOutputDirectory('/your/absolute/directory/path');
// do this if you want to convert in the same directory as file.pdf
$pdf->generate();
// if the generated files get in your way, do this to remove all of them
$pdf->clearOutputDirectory();
?>
Download library from here
Second option could be using pdf.js
PDFJS.getDocument('helloworld.pdf')
I'm the maintainer of the package. The package has been updated. Are you already using the latest version? If you're using Windows, please read the docs again. Also, please do not download directly from GitHub; use Composer instead.
include 'vendor/autoload.php';
use Gufy\PdfToHtml\Pdf;
use PHPHtmlParser\Dom;
use DateTime;
public function parsepdf(Request $request)
{
$pdf = new Pdf($request->file('csv_file'));
$html = $pdf->html();
$dom = new Dom;
$total_pages = $pdf->getPages();
if ($total_pages == 1) {
$html->goToPage(1);
$dom->load($html);
$paragraphs = $dom->find('p');
$paragraphs = collect($paragraphs);
foreach($paragraphs as $p){
$datestring = preg_replace('/\xc2\xa0/', ' ', trim($p->text));
echo $datestring;
}
}
}
The code above converts a PDF to HTML in Laravel; the parsepdf method belongs inside a controller class.
Install the package with Composer:
composer require gufy/pdftohtml-php:~2
It also needs Poppler-Utils. On Ubuntu, just install it from apt:
sudo apt-get install poppler-utils
I use wkhtmltopdf and it works okay. You can download it from here: http://wkhtmltopdf.org/downloads.html
I installed it in Linux and I use it like this:
$url = "https://www.google.com";
$command = "/usr/bin/wkhtmltopdf --load-error-handling ignore --disable-smart-shrinking -T 5mm -B 5mm -L 2mm -R 2mm --page-size Letter --encoding utf-8 --quiet";
$filename = '[file path].pdf';
if (file_exists($filename)) {
unlink($filename);
}
$output = shell_exec($command . " $url " . $filename);
echo $output;
Hope this helps.

Can't Use unlink function with path defined by variable

I can't unlink a file when its path is built from variables.
$absPath = realpath('./');
//$absPath = /home/abc/domains/server2.abc.com/public_html/mockup
//$oldPath = /project_image/easy/Desert.jpg
$npath = $absPath."".$oldPath; // $oldPath comes from the image element's src="xxx" attribute
$npath will return this:
/home/abc/domains/server2.abc.com/public_html/mockup/project_image/easy/Desert.jpg
When I unlink it:
unlink($npath);
PHP returns:
No such file or directory in /home/../mockup/update_file.php (pointing at the unlink($npath); line)
When I hard-code the path instead:
$npath = '/home/abc/domains/server2.abc.com/public_html/mockup/project_image/easy/Desert.jpg';
unlink($npath);
it succeeds.
I want to know how to combine variables into a path and then unlink it.
Sorry for my poor English
Are you sure your script is producing the exact same path? If you are building the path correctly and unlink is still failing, you can try the shell instead. This sometimes works where PHP's unlink does not, for example on Samba mounts.
$absPath = realpath('./');
$npath = $absPath."".$oldPath;
// Bail if the path doesn't exist
if(!file_exists($npath)) die('Failed to build correct path');
// Try unlink via PHP
if(!unlink($npath) || file_exists($npath)) {
// PHP couldn't cut it, try unlink via shell
exec('rm ' . escapeshellarg($npath), $output, $return);
if($return > 0)
die('Failed to delete file: ' . $npath);
}
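Since the hard-coded path works, the built path probably differs in a way that is hard to see on screen. Two guesses, neither confirmed by the question: stray whitespace around $oldPath, or URL-encoding left over from the src attribute (for example %20 for a space). A sketch that normalizes both; joinPath is a hypothetical helper name:

```php
<?php
// Hypothetical helper: joins a base directory and a path taken from an
// src attribute. It trims surrounding whitespace, decodes URL-encoding,
// and makes sure exactly one slash separates the two parts.
function joinPath(string $base, string $relative): string
{
    $relative = rawurldecode(trim($relative));
    return rtrim($base, '/') . '/' . ltrim($relative, '/');
}
```

Comparing var_dump($npath) with var_dump() of the hard-coded string is a quick way to check, because var_dump prints the string length and any hidden characters change it.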

PHP image wrong mimetype

I'm uploading images from my Android app to my server. The app uses the android camera intent and upload via PHP script is ok.
I want to verify if the uploaded files are real images, I'm not checking the extension but the mimetype (I suppose this is the best way to do it, tell me if I'm wrong).
I'm using a Slackware Linux Apache server and I'm trying this code:
....
$finfo = finfo_open(FILEINFO_MIME, '/etc/httpd/magic');
....
fwrite($fp, finfo_file($finfo, "file.jpg"));
....
But I'm getting "application/octet-stream; charset=binary" instead of "image/jpeg; charset=binary" which is given from "file -i file.jpg" (shell command).
What's the problem?
Solved using $finfo = finfo_open(FILEINFO_MIME); instead of the other line. I think the default magic file is not the same as the one I was specifying.
As referred to on www.php.net/manual/en/ref.fileinfo.php:
<?php
function is_jpg($fullpathtoimage){
if(file_exists($fullpathtoimage)){
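// escapeshellarg() keeps spaces and shell metacharacters in the path safe
exec("/usr/bin/identify -format %m " . escapeshellarg($fullpathtoimage), $out);
//using system() echos STDOUT automatically
if(!empty($out)){
//identify returns an empty result to php
//if the file is not an image
//exec() fills $out with an array of output lines, so compare the first line
if($out[0] == 'JPEG'){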
return true;
}
}
}
return false;
}
?>
Alternately, if you've got execution rights and want to use a "hacky" solution, you can simply do what you've already done (using file -i path with shell_exec):
<?php
function shell_get_mime_type($path) {
if (is_readable($path)) {
$command = 'file -i "' . realpath($path) . '"';
$shellOutput = trim(shell_exec($command));
//eg. "nav_item.png: image/png; charset=binary"
if (!empty($shellOutput)) {
$colonPosition = strpos($shellOutput, ':');
if ($colonPosition !== false) {
return rtrim(substr($shellOutput, $colonPosition + 1));
}
return $shellOutput;
}
}
return false;
}
?>
Try to use function mime_content_type().
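If only the type is needed, without the "; charset=binary" suffix, the fileinfo extension can be asked for it directly with FILEINFO_MIME_TYPE and the default magic database. A minimal sketch; detectMimeType and isAllowedImage are hypothetical helper names, and the allow-list is just an example:

```php
<?php
// Hypothetical helper: returns e.g. "image/jpeg" for a file, or null on
// failure. No magic file is passed, so PHP uses its bundled database.
function detectMimeType(string $path): ?string
{
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    if ($finfo === false) {
        return null;
    }
    $mime = finfo_file($finfo, $path);
    return $mime === false ? null : $mime;
}

// Hypothetical helper: strict comparison against an allow-list of types.
function isAllowedImage(?string $mime, array $allowed = ['image/jpeg', 'image/png']): bool
{
    return $mime !== null && in_array($mime, $allowed, true);
}
```

An upload handler could then accept the file only when isAllowedImage(detectMimeType($uploadedPath)) is true.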
