Best way to convert files into pdf files using php - php

What is, according to you, the best way to convert uploaded files of any kind (.doc, .docx,...) into a pdf-file using nothing but php. Is it even possible to do so?
I looked at FPDF, but this creates the pdf files from text.
An other solution previously given was to use the PDFlib library on your server, but unfortunately, my server doesn't support this library...
What is the best way to convert to files my users upload on my site to pdf files?

A simpler approach would be to restrict uploads to .PDF format programmatically and require your users to only upload .pdf files. Provide a link on the upload page to a free and open source pdf printer (e.g. Cuteftp) that the user can install to create .pdf documents from any file that can be printed.
Trying to do it through PHP will be problematic because the uploads could be generated from many different programs that would be impossible to cater for in their entirety. e.g. How would it handle Scribus or ABC Flowcharter or any other 'non-standard' application someone used to create a document?
Much better to filter the upload upfront.

The best server-side PDF generator from those I tried was, so far, wkhtmltopdf, a WebKit-based, self-contained invisible browser that can render any HTML+CSS and generate a PDF from it. Reasonably fast and fairly reliable, has some useful PDF options, such as page size, orientation, etc.
The second part of the job in your case is to convert documents to HTML prior to feeding them to wkhtmltopdf. If possible, have your users upload the docs in HTML (Word and Co. can export (crappy) HTML). If this is not an option, you will have to find a tool just for that, which, in my opinion, is much easier than finding a tool that converts Word docs directly into PDF.
Good thing about wkhtmltopdf is also that you can feed the output of your PHP script to it using the ob_xxx() functions.

PHP Excel best simple way to create doc, docx, xls, xlsx, pdf files with PHP. Its lot easier with clear documentation.

Use Microsoft Office to render Microsoft Office documents, if you care about accuracy at all. This is easily done by invoking Office over COM.
Get access to your server, and install what you need. Doing so would be far easier than monkeying around with sub-par solutions.

Well... I can think of one way of doing it quite easily, but it doesn't involve using PHP.
Upload your documents to a folder on your server, that are browsable by your users.
EG: http://mysite.com/docs/
Then get your users to install a virtual printer driver such as Primo PDF
http://www.primopdf.com/index.aspx
then they can load the document into their browser, and print to PDF for offline browsing.
If this is not an option, and your dealing with office documents that conform to the openXML standard, you could attempt to parse the XML doc into a PHP page for display in the browser, then use JavaScript to trigger a print.
Unfortunately, it does still depend on your user having a PDF printer installed.
Alternatively, you could just load the docs natively, and print to your own PDF printer, then upload the PDF's to the web server for download.
I can't think of any easy way of doing this otherwise, without installing all sorts of different document parser tool-kits and doing a huge amount of behind the scenes work.

Related

Create a PDF file on the fly and stream it while it is not yet finished?

We want to merge a lot of PDF files into one big file and send it to the client. However, the resources on our production server are very restricted, so merging all files in memory first and then sending the finished PDF file results in our script being killed because it exhausts its available memory.
The only solution (besides getting a better server, obviously) would be starting to stream the PDF file before it is fully created to bypass the memory limit.
However I wonder if that is even possible. Can PDF files be streamed before they're fully created? Or doesn't the PDF file format allow streaming unfinished files because some headers or whatever have to be set after the full contents are certain?
If it is possible, which PDF library supports creating a file as a stream? Most libraries that I know of (like TCPDF) seem to create the full file in memory and then in the end output this finished result somewhere (i. e. via the $tcpdf->Output() method).
The PDF file format is entirely able to be streamed. There's certainly nothing that'll prevent it anyway.
As an example, we recently had a customer that required reading a single page over a HTTP connection to a remote PDF, without downloading or reading the whole PDF. We're able to do this by making many small HTTP requests for specific content within the PDF. We use the trailer at the end of the PDF and the cross reference table to find the required content without having to parse the whole PDF.
If I understand your problem, it looks like your current library you're using loads each PDF in memory before creating or streaming out the merged document.
If we look at this problem a different way, the better solution would be for the PDF library to only take references to the PDFs to be merged, then when the merged PDF is being created or streamed, pull in the content and resources from the PDFs to be merged, as-and-when required.
I'm not sure how many PHP libraries there are that can do this as I'm not too up-to-date with PHP, but I know there are probably a few C/C++ libraries that may be able to do this. I understand PHP can use extensions to call these libraries. Only downside is that they'll likely have commercial licenses.
Disclaimer: I work for the Mako SDK R&D group, hence why I know for sure there are some libraries which will do this. :)

Displaying word documents, excel sheet, power point in browser

Is there is any way to displaying world document,excel sheet and power point in browser with out downloading.
I assume that you are going to use php for this, so you can try checking some libraries such as PHPWord by Microsoft for example.
If you wish to only display the document content, it is possible to do using some scripting language such as php. Basically office 2007+ formats are zipped XML documents with changed extension. Make a simple word 2007+ document, save it and change extension from .docx to .zip, than you can extract it and see what it's made of. You can find a lot of details here. Now displaying content may be a little tricky. As mentioned, there are libraries out there to handle this, but how will they handle the documents, I am not really sure. Most of them are abandoned, PHPword is in beta since 2011.
There are some indications that Apache is working on cloud version of Open office, but there is no release date yet. Once done, you will have a full featured office suite web app.
If you feel really creative you could use cron job (or scheduled task if you like Windows) to open a document, take a screenshot and basically make .jpg or .png version of the document (works fine with short documents, longer ones may be problematic), displaying it in a browser without much complication. It is also possible to schedule export to .pdf - all browsers do have Adobe PDF plugins.
To sum up, using php for parsing simple documents should be fine, but getting complex docs to display properly, may be much more difficult task and possibly not worth your time. I would go for cron export to pdf, to preserve most if not all of the document's structure.

Converting images to pdfs server side

I'm currently working on a project involving converting form results to a downloadable PDF, which is simple enough. I was recently asked, however, to add attachment functionality. I'm using dompdf to convert the form results to PDF, but is there a way to convert the attachments separately (can be jpg, png, doc/x, or pdf) to a PDF file and then append the attachment file to the dompdf output?
I can handle the implementation; are there any free libraries that will support anything like this? I found FPDF, which supports images, but it does not support Word files.
First of all, you will need to find a library for every kind of conversion you need (you mentioned jpg, png, and doc/x, but you didn't say if that was all of them.)
For common office formats, you can launch a headless (meaning it can run on a server without a graphical display) instance of OpenOffice or LibreOffice. Then you can interact with it from various programming languages, or you can use a ready-made commandline tool such as pyodconverter, to ask it to convert between various file formats. This is the best way to convert doc and docx files to pdf, by the way, short of spending money on Microsoft software.
As for "appending the attachment file", by which I take it you mean concatenating a bunch of PDF files together, you can use the free tool pdftk.

.docx, .xlsx, .pdf to .pdf using PHP

I have relatively sensitive data in .docx, .xlsx and PDF files that all need to be converted to a single PDF file locally. Sending these files off to phpdocx or Google Docs or anything like this is not an option.
The only other option I am seeing is OpenOffice / LibreOffice but I am not satisfied with how they are converting the documents.
Is there any other alternative anyone is aware of? Thanks!
Definitely a difficult task. The very recent release of LibreOffice 3.6 has fixes to it's docx processing if that might help, but you haven't specified what the actual problems you encountered when you tried OpenOffice.
If you have time to experiment (and bring in any tools/languages you need to get the job done) you could try LibreOffice to produce PDFS, then use one of the many PDF libs to stitch the PDFs into the single file you require.
You could also look at ODFConverter which has traditionally been much better with DOCX than either OpenOffice or LibreOffice. This would allow you docx -> odt -> pdf. I think it can do the xlsx also. Then do the PDF stitching again.
I suggest testing the stages manually at first and if promising, try something like JODConverter (requires Java) to allow you to automate the process via scripts.
Good luck.

converting a given file to PDF

hey all,
is there any way to convert a given file (this could be of any type) in to a pdf file in .net or php?
eg: suppose there is a upload link to upload your file of any type(word,excel,autocad,images..) and once the upload button is clicked the uploaded file should be converted into a pdf.
i checked out fpdf.but according to my knowledge all file types cannot be converted.a module to plugin to the CMS would also be fine.
FPDF does support images. I know because I have used it recently.
If you are wanting a pure PHP solution, you can use the PHP COM functions along with Word or Excel on the server to open up those files then copy the data out.
If I were you though, I would use Google. Load the doc into Google Docs then export it as a new format with the API.
There are a couple of 3rd party solutions available such as this one, which is optimised for use on the server and accessible from any web services capable environment, including .net. Supports loads of file types including MS-Office based documents.
Disclaimer, I worked on this product so consider me biased. Having said that, it works very well.

Categories