I want to display documents on my website. The server is hosted on a Debian machine. I was thinking I could allow the upload of support documents, then use a Linux or PHP app to convert each document into PDF and display it in an HTML page. Are there any APIs or binaries that allow me to do this?
If it is an office document, one option would be to use OpenOffice in headless mode. See here for a Python script that shows how: http://www.oooninja.com/2008/02/batch-command-line-file-conversion-with.html
If it is any other kind of document (e.g. your own XML document), then you would need to do a bit more work. I have had some success using XSL to define a translation to DocBook format, then using the DocBook tools to generate the PDF (and various other formats). You could also use XSL (via XSL-FO) to go straight to PDF if you need more precise control over how things look.
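A minimal sketch of the XSL step in PHP, assuming the xsl extension (XSLTProcessor) is installed. The `<note>` input format and the stylesheet are hypothetical examples, not the author's actual files; the sketch only produces DocBook XML, and turning that into PDF is still left to the DocBook tools.

```php
<?php
// Apply an XSL stylesheet to an XML string (requires ext-xsl).
function xsl_transform(string $xml, string $xsl): string
{
    $source = new DOMDocument();
    $source->loadXML($xml);
    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($stylesheet);
    $result = $proc->transformToXML($source);
    if (!is_string($result)) {
        throw new RuntimeException('XSL transformation failed');
    }
    return $result;
}

// Hypothetical stylesheet: maps <note><title/><body/></note> to a DocBook <article>.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/note">
    <article>
      <title><xsl:value-of select="title"/></title>
      <para><xsl:value-of select="body"/></para>
    </article>
  </xsl:template>
</xsl:stylesheet>
XSL;

$docbook = xsl_transform('<note><title>Manual</title><body>Hello</body></note>', $xsl);
echo $docbook;
```

Keeping the mapping in a stylesheet means the PHP side stays generic: supporting another input format only needs another `.xsl` file.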
You can create a PDF print-to-file printer (for example with CUPS-PDF) and send any number of documents to it via lpr.
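function lpr($STR, $PRN = null, $TITLE = null) {
    // Use the given printer, else the default (C_DEFAULTPRN must be defined elsewhere)
    $prn = ($PRN !== null && strlen($PRN)) ? $PRN : C_DEFAULTPRN;
    $title = ($TITLE !== null) ? $TITLE : "stdin" . rand();
    // Escape both arguments so a printer name or title cannot inject shell commands
    $CMDLINE = "lpr -P " . escapeshellarg($prn) . " -T " . escapeshellarg($title);
    $pipe = popen($CMDLINE, 'w');
    if (!$pipe) { print "pipe failed."; return false; }
    fwrite($pipe, $STR); // the document is fed to lpr on stdin
    return pclose($pipe) === 0; // lpr exits with status 0 when the job was queued
} // lpr()
//open document...
//read into $source
lpr($source, "PDF", $title); //print to the "PDF" print-to-file printer
exit();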
Also HTMLDOC can convert your HTML into a PDF.
A relatively new project called phpLiveDocx can convert DOC to PDF (in addition to a number of other formats). It is a SOAP-based service and can be used completely free of charge. For sample code to convert a DOC to PDF using phpLiveDocx, take a look at this recent blog post:
http://www.phplivedocx.org/2009/02/06/convert-doc-to-pdf-in-php/
Of course, as it is SOAP-based, it can be used on all operating systems that support PHP :-)
An alternative method is to generate an HTML file that contains what you need in the PDF, then use htmldoc to convert it to a PDF:
http://www.easysw.com/htmldoc/
This is actually much easier than directly manipulating the objects in a PDF document.
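A small sketch of calling htmldoc from PHP. It only builds the command line, using `escapeshellarg()` so that file names cannot inject shell syntax; the paths are hypothetical, and the `exec()` call is commented out because it needs htmldoc installed. `--webpage` makes htmldoc treat the input as a single web page rather than a book, and `-f` names the output file.

```php
<?php
// Build an htmldoc command that converts one HTML file into a PDF.
function htmldoc_command(string $htmlFile, string $pdfFile): string
{
    return 'htmldoc --webpage -f ' . escapeshellarg($pdfFile) . ' ' . escapeshellarg($htmlFile);
}

// Hypothetical paths: write the generated HTML first, then convert it.
$cmd = htmldoc_command('/tmp/report.html', '/tmp/report.pdf');
// exec($cmd, $output, $status); // $status === 0 on success
echo $cmd, PHP_EOL;
```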
PEAR has a PHP PDF class. See:
http://pear.php.net/package/File_PDF
http://pear.php.net/package/File_PDF/docs/latest/apidoc/File_PDF/File_PDF.html
Related
I am trying to convert a DOCX file to PDF with PHPWord. When I execute the script, it looks like some style elements are not converted. The DOCX file contains one image and two tables (with 1px borders and with hidden borders), and I am using tabs.
When I execute the script, I get a PDF file without the image, all the tabs are replaced with spaces, and all the tables have a 3px border.
Does someone know why I am missing these styles?
Here is my script:
while ($data2 = mysql_fetch_array($rsSql)){
$countLines=$countLines+1;
$templateProcessor->setValue('quantity#'.$countLines, $data2['quantity']);
$templateProcessor->setValue('name#'.$countLines, $data2['name']);
$templateProcessor->setValue('price#'.$countLines, "€ " .$data2['price'] ."");
}
\PhpOffice\PhpWord\Settings::setPdfRenderer('./dompdf');
\PhpOffice\PhpWord\Settings::setPdfRendererPath('./dompdf');
\PhpOffice\PhpWord\Settings::setPdfRendererName('DOMPDF');
$temp_file = tempnam(sys_get_temp_dir(), 'Word');
$templateProcessor->saveAs($temp_file);
$phpWord = \PhpOffice\PhpWord\IOFactory::load($temp_file);
$xmlWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord , 'PDF');
$xmlWriter->save('result.pdf');
header("Content-Type: application/pdf");
header('Content-Disposition: attachment; filename="result.pdf"');
readfile("result.pdf");
After looking at the source code, it seems that PHPWord first converts the document into an HTML representation and then lets dompdf, a separate converter, save that HTML as PDF.
The open issue #1139 confirms this, and it also deals with missing styles:
The PDF writers being used are taking in the HTML output, which also lacks the styling. The classes are being defined in the <style> tag, but they are just not being used.
Also the last message adds:
This still seems to be an issue. html and pdf outputs do not replicate the some styles in docx (header / footers).
Concerning your border problem, another SO question shows a similar issue in an HTML -> PDF conversion. One solution was to edit the CSS, which you cannot do in your sample code unless you first convert the document to HTML yourself.
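One way to try that route is to let PHPWord write HTML (`IOFactory::createWriter($phpWord, 'HTML')`), patch the styles, and then hand the HTML to dompdf yourself. The sketch below covers only the patching step, as a pure function; the sample HTML and the border rule are illustrative assumptions, and whether an override like this fixes your 3px borders depends on the markup PHPWord actually emits.

```php
<?php
// Insert an extra <style> block just before </head> so its rules come after
// (and can override) the styles the HTML writer produced. If there is no
// </head>, the block is simply prepended.
function inject_css(string $html, string $css): string
{
    $style = "<style>\n" . $css . "\n</style>\n";
    $pos = stripos($html, '</head>');
    if ($pos === false) {
        return $style . $html;
    }
    return substr($html, 0, $pos) . $style . substr($html, $pos);
}

// Illustrative override: force thin 1px table borders.
$patched = inject_css(
    '<html><head><title>x</title></head><body><table></table></body></html>',
    'table, td, th { border: 1px solid #000; }'
);
echo $patched, PHP_EOL;
```

The patched string could then be passed to dompdf's `loadHtml()`, followed by `render()` and `output()`, instead of using PHPWord's PDF writer.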
In conclusion, you may not be able to solve your problem in the short term. Unless you want to get involved in development yourself, you could submit a bug report to the PHPWord team (not to dompdf: it is only an HTML-to-PDF converter, so the problem is outside its scope). GitHub lets you attach DOCX files to an issue report.
Alternatives
You could check out SO question 204860 about server-side PDF editing libraries. Below are two alternatives: one is free software, the other is closed source and paid.
LibreOffice
Another way is to use LibreOffice in headless mode (command line execution without interface):
libreoffice --headless --convert-to pdf <filename_to_convert>
A PHP wrapper for LibreOffice, Office Converter, is also available if you don't want to call libreoffice through exec() yourself.
Check whether LibreOffice conversion suits your needs (it may not cover every case, but it may be enough for yours).
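If you do call LibreOffice through `exec()`, a minimal sketch could build the command like this. It adds `--outdir` so the PDF lands in a known directory, where LibreOffice names it after the input file with a `.pdf` extension. The upload path is hypothetical, the binary name is assumed to be a trusted value (some installs call it `soffice`), and the `exec()` line is commented out because it needs LibreOffice installed.

```php
<?php
// Build a headless LibreOffice command converting $input to PDF into $outDir.
function libreoffice_pdf_command(string $input, string $outDir, string $binary = 'libreoffice'): string
{
    return escapeshellcmd($binary)
        . ' --headless --convert-to pdf --outdir ' . escapeshellarg($outDir)
        . ' ' . escapeshellarg($input);
}

// Path of the PDF that LibreOffice is expected to write for $input.
function expected_pdf_path(string $input, string $outDir): string
{
    return rtrim($outDir, '/') . '/' . pathinfo($input, PATHINFO_FILENAME) . '.pdf';
}

$input = '/var/uploads/report 2024.docx'; // hypothetical uploaded file
$cmd = libreoffice_pdf_command($input, '/tmp/pdf');
// exec($cmd . ' 2>&1', $output, $status); // $status === 0 on success
echo $cmd, PHP_EOL, expected_pdf_path($input, '/tmp/pdf'), PHP_EOL;
```

Under a web server, LibreOffice may also need a writable HOME directory for its user profile, so check that first if the command works in a terminal but not from PHP.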
Aspose
The best converter I have ever used at work is Aspose, a family of APIs covering documents with the Aspose.Words package, worksheets with Aspose.Cells, presentations with Aspose.Slides and so on. But it's closed source and pretty expensive (and you'll pay for updates if you want them after your license expires).
There is a way to use it from PHP through Java (Aspose.Words and Aspose.Cells) or through .NET (Aspose.Words, and apparently Aspose.Cells as well).
I have a website now and I want to create a button on it to convert this page to PDF.
Is there any code to make this happen? I cannot find it on the internet.
So I want a button that, when pressed, converts the page to a .PDF file.
I do not want to use a third-party website to generate the PDFs. I want to use it for internal purposes, generating the files with PHP, so I need code that can make a PDF of each page.
I use wkhtmltopdf (http://code.google.com/p/wkhtmltopdf/). It works very well, and there is a PHP wrapper.
Update, based on the comments below about usage:
How to use the integration class:
require_once('wkhtmltopdf/wkhtmltopdf.php'); // Ensure this path is correct !
$html = file_get_contents("http://www.google.com");
$pdf = new WKPDF();
$pdf->set_html($html);
$pdf->render();
$pdf->output(WKPDF::$PDF_EMBEDDED,'sample.pdf');
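For the button itself, one option is a small form that posts the page name to a handler script, which then runs the wkhtmltopdf command line (`wkhtmltopdf <url> <output.pdf>`). This is a sketch under assumptions: the page list, the localhost URLs and the `pdf.php` handler name are hypothetical. Only whitelisted page keys are accepted, so raw user input never reaches the shell.

```php
<?php
// Hypothetical whitelist: page key => URL that wkhtmltopdf should render.
const PDF_PAGES = [
    'about'   => 'http://localhost/about.php',
    'pricing' => 'http://localhost/pricing.php',
];

// Return the wkhtmltopdf command for a whitelisted page, or null if unknown.
function wkhtmltopdf_command(string $page, string $outFile): ?string
{
    if (!array_key_exists($page, PDF_PAGES)) {
        return null;
    }
    return 'wkhtmltopdf ' . escapeshellarg(PDF_PAGES[$page]) . ' ' . escapeshellarg($outFile);
}

// Button on each page (posts to the hypothetical pdf.php handler):
// <form method="post" action="pdf.php"><input type="hidden" name="page" value="about"><button>PDF</button></form>
$cmd = wkhtmltopdf_command($_POST['page'] ?? 'about', '/tmp/page.pdf');
echo $cmd ?? 'unknown page', PHP_EOL;
```

After running the command with `exec()`, the handler could send the file with `header('Content-Type: application/pdf')` and `readfile()`. A fixed `/tmp/page.pdf` would collide between concurrent requests, so a real handler would probably use `tempnam()` instead.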
Use FPDF. It's a well-respected PDF-generating library for PHP that is written in pure PHP (so installing it should be dead simple for you).
Try this:
http://www.macronimous.com/resources/Converting_HTML2PDF_using_PHP.asp
It will convert HTML to a PDF using FPDF and the HTML2PDF class.
Also found this:
http://www.phpclasses.org/package/3168-PHP-Generate-PDF-documents-from-HTML-pages.html
I would like to merge multiple DOC or RTF files into a single file in the same format as the input files.
What I mean is: if a user selects multiple RTF template files from a list box and clicks a button on the web page, the output should be a single RTF file that combines those templates. I need to use PHP for this.
I haven't decided on the format of the template files yet, but it will be either RTF or DOC, and I assume the templates will contain some images as well.
I have spent many hours researching libraries for this, but still haven't found one.
Please help me out here!! :(
Thanks in advance.
If you are searching for a solution for handling RTF documents only, you can find a PHP package to merge multiple RTF documents here:
www.rtftools.com
Here is a short example on how to merge multiple documents together :
include ( 'path/to/RtfMerger.phpclass' ) ;
$merger = new RtfMerger ( 'sample1.rtf', 'sample2.rtf' ) ; // You can specify docs to be merged to the class constructor...
$merger -> Add ( 'sample3.rtf' ) ; // or by using the Add() method
$merger [] = 'sample4.rtf' ; // or by using the array access methods
$merger -> SaveTo ( 'output.rtf' ) ; // Will save files 'sample1' to 'sample4' into 'output.rtf'
This package allows you to handle documents that are bigger than the available memory.
I've been working on a similar project and haven't managed to find any PHP (or other open-source) libraries for manipulating MS Word files. My approach is somewhat complicated, but it works. Here's how I would do it (assuming you have a Linux server):
Setup:
Install JODConverter and OpenOffice
Start OpenOffice as a server (see http://www.artofsolving.com/node/10)
Approach (ie. what to do in your PHP code):
Convert your MSWord or RTF files into ODT format by calling JODConverter via backticks or exec()
Unzip each file into a temporary directory of its own
Read the content.xml file from each unzipped document using a DOM parser
Extract the <office:text> contents from each, and concatenate
Put this concatenated xml back into the right spot in one of the content.xml files
Re-zip the contents of that temporary directory and give it an .odt extension
Use JODConverter to convert this file back to MSWord again
As I said, it's not pretty, but it does the job.
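The "extract and concatenate" steps above could look roughly like this with PHP's DOM extension. It is a sketch that works on `content.xml` strings only: automatic styles are not merged, so style names used by the later documents may be missing or clash, and the sample XML is a stripped-down illustration rather than a real ODT file.

```php
<?php
const OFFICE_NS = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';

// Append the <office:text> children of every later content.xml to the first one.
function merge_content_xml(array $contentXmls): string
{
    $base = new DOMDocument();
    $base->loadXML(array_shift($contentXmls));
    $target = $base->getElementsByTagNameNS(OFFICE_NS, 'text')->item(0);

    foreach ($contentXmls as $xml) {
        $doc = new DOMDocument();
        $doc->loadXML($xml);
        $source = $doc->getElementsByTagNameNS(OFFICE_NS, 'text')->item(0);
        foreach ($source->childNodes as $child) {
            // importNode(..., true) deep-copies the node into the base document
            $target->appendChild($base->importNode($child, true));
        }
    }
    return $base->saveXML();
}

// Stripped-down sample content.xml bodies (illustrative only).
$tpl = '<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
     . 'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"><office:body><office:text>'
     . '<text:p>%s</text:p></office:text></office:body></office:document-content>';
$merged = merge_content_xml([sprintf($tpl, 'First'), sprintf($tpl, 'Second')]);
echo $merged;
```

For the unzip and re-zip steps, one option (assuming the zip extension is installed) is to copy the first ODT, read each `content.xml` with `ZipArchive::getFromName()`, and write the merged XML back into the copy with `ZipArchive::addFromString()`, which leaves the other entries of the package in place.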
If you're looking to go down the RTF route, this question may also help: Concatenate RTF files in PHP (REGEX)
I want to add a Word import function to our CMS; the only problem is that I can't seem to find a good library for reading DOCX files (Word 2007).
Does anyone have recommendations? The library should be able to extract the content of the document and basic styling like italic, bold and superscript.
Thanks for your help
DOCX files are actually just ZIP containers for the document's XML. You should be able to unzip the DOCX file, go to the word folder inside, and open document.xml, which holds the actual text. Things like named styles and fonts are in other XML files in the container (e.g. word/styles.xml), though, so you'll probably want to experiment a bit to figure out what is what and how to match it up (start with the namespaces, I bet).
But yeah, unzip the file, then use SimpleXML to turn it into something you can actually work with.
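A sketch of the parsing side, aimed at the bold/italic/superscript requirement. It swaps SimpleXML for DOM plus DOMXPath, because namespaced XPath queries are simpler there. It takes the contents of `word/document.xml` as a string (which you could read with `ZipArchive::getFromName('word/document.xml')`, assuming the zip extension), only looks at direct run formatting (not styles from styles.xml), and skips paragraphs inside tables.

```php
<?php
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Turn the top-level paragraphs of word/document.xml into simple HTML,
// keeping only bold (<w:b>), italic (<w:i>) and superscript runs.
function docx_xml_to_html(string $documentXml): string
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xp = new DOMXPath($dom);
    $xp->registerNamespace('w', W_NS);

    // A toggle like <w:b/> is on unless its w:val is "0", "false" or "off".
    $on = function (DOMElement $run, string $prop) use ($xp): bool {
        $el = $xp->query("w:rPr/w:$prop", $run)->item(0);
        return $el !== null
            && !in_array($el->getAttributeNS(W_NS, 'val'), ['0', 'false', 'off'], true);
    };

    $html = '';
    foreach ($xp->query('//w:body/w:p') as $p) {
        $html .= '<p>';
        foreach ($xp->query('.//w:r', $p) as $run) {
            $text = '';
            foreach ($xp->query('w:t', $run) as $t) {
                $text .= $t->textContent;
            }
            $text = htmlspecialchars($text);
            if ($xp->query('w:rPr/w:vertAlign[@w:val="superscript"]', $run)->length > 0) {
                $text = "<sup>$text</sup>";
            }
            if ($on($run, 'i')) { $text = "<em>$text</em>"; }
            if ($on($run, 'b')) { $text = "<strong>$text</strong>"; }
            $html .= $text;
        }
        $html .= "</p>\n";
    }
    return $html;
}

// Hand-written sample: plain run, superscript run, bold run with italic switched off.
$sample = '<w:document xmlns:w="' . W_NS . '"><w:body><w:p>'
    . '<w:r><w:t xml:space="preserve">E = mc</w:t></w:r>'
    . '<w:r><w:rPr><w:vertAlign w:val="superscript"/></w:rPr><w:t>2</w:t></w:r>'
    . '<w:r><w:rPr><w:b/><w:i w:val="0"/></w:rPr><w:t xml:space="preserve"> bold</w:t></w:r>'
    . '</w:p></w:body></w:document>';
echo docx_xml_to_html($sample);
```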
PHPDocX PRO includes a TransformDoc class that can read .docx (zip) files and generate XHTML (or PDF) from it:
...
require_once 'phpdocx_pro/classes/TransformDoc.inc';
$doc = new TransformDoc();
$doc->setStrFile($file->filepath);
$doc->generateXHTML();
$html = $doc->getStrXHTML();
There is a library for this, but it works with Zend Framework; maybe it will help you.
It is called phpLiveDocx: http://www.phplivedocx.org/downloads/
The library is licensed under the New BSD license.
I have just found a library that supports both reading and writing; check it out on the CodePlex forge at http://openxmlapi.codeplex.com. It is licensed under GPLv2.
Or, since you requested a library, you may want to look into something like Docvert. I was just looking around based on your question, and it's my favorite so far for PHP. You input the word file location, it transforms it into something simple with the attributes and all that good stuff.
Convert the DOCX document to ODT using OpenOffice, then use eZ Components to do the parsing and import. They use this import in their own CMS, eZ Publish.
Here is a simple working solution I found
http://webcheatsheet.com/php/reading_the_clean_text_from_docx_odt.php
How can I convert 2 TIFF images to PDF? I already know how to get an image out of the DB, and I print it using echo after setting the MIME type.
But right now I need to use a duplex printer option, so I need a way to generate a PDF from inside my PHP page. That PDF must contain both TIFF images (one per page). How can I do that, and what do I need for PHP to work with such a library?
Thank you very much.
EDIT:
It is a self-hosted app; I own the server (actually I'm using WAMP 2).
I extract the images from the MySQL DB (stored using LONGBLOBS).
There is a very simple PHP script that interfaces with ImageMagick:
How to convert multipage TIFF to PDF in PHP
I haven't used it myself but it looks all right.
For this you will need
ImageMagick installed
Ghostscript installed
The linked article describes how to install those in an Ubuntu Linux environment.
Another road to take would be inserting the images directly into an auto-generated PDF file without ImageMagick. The best-known PDF generation library, FPDF, can do this, but only for JPEG, PNG and GIF.
Maybe one of these works for you.
What you really need is a library that brings you a PDF composition engine. And of course you need that engine to support image insertions (specifically TIFF).
The best option is iText (a Java library).
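// iText 5: classes from com.itextpdf.text.* and com.itextpdf.text.pdf.PdfWriter,
// plus java.io.FileOutputStream and java.io.IOException
public void createPdf(String filename) throws DocumentException, IOException
{
    // step 1: create the document
    Document document = new Document();
    // step 2: attach a writer that streams the PDF to the file
    PdfWriter.getInstance(document, new FileOutputStream(filename));
    // step 3: open the document
    document.open();
    // step 4: add a title
    document.add(new Paragraph("PDF Title"));
    // step 5: add the TIFF images, one per page (Image is abstract, so use getInstance)
    document.add(Image.getInstance("First TIFF image path..."));
    document.newPage();
    document.add(Image.getInstance("Second TIFF image path..."));
    // step 6: close the document
    document.close();
}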
Hope it helps!
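Using the Imagick extension, the solution below worked for me:
// Load the TIFF; each of its frames becomes a page of the PDF
$document = new Imagick($path . "/" . $fileName . ".tiff");
$document->setImageFormat("pdf");
// true = adjoin: write all frames into one multi-page PDF
$document->writeImages($path . "/" . $fileName . ".pdf", true);
// For images stored as BLOBs, start from new Imagick() and call readImageBlob($blob) once per image instead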