I want to generate a thumbnail (of the first page) for the following file formats:
PDF
DOC/DOCX [MS OFFICE]
PPT/PPTX [MS OFFICE]
For PDF I found many libraries, and ImageMagick with Ghostscript did the job for me.
But for the other formats (ppt, pptx, doc and docx) I can't find any lead to a solution.
My preferred language is PHP, but I am open to any language that can run on Linux. Thanks a lot.
You can use a service like Post2Preview. It can generate thumbnails and OCR for hundreds of file types and doesn't require any third-party libraries, just a simple POST request. Disclaimer: I work for Post2Preview.
I want to convert scanned PDF files to text-searchable PDF files.
The input is a scanned PDF and the expected output is a searchable PDF.
There are a few tools that extract the text from a scanned PDF, but I want a text-searchable PDF file as output, not just the text.
I have searched for this and found one solution here, but my production server runs Amazon CentOS and that tool only installs on Ubuntu, not on Amazon CentOS.
I am ready to pay for it if required. Please point me to any open source web API, paid web API service or tool that can convert a scanned PDF to a text-searchable PDF.
I am using PHP in my web application.
There are several commercial web API services that will convert scanned PDFs (or scanned images generally) to searchable PDF. Of these, I would recommend trying ABBYY's Cloud OCR SDK. They've been in the OCR space for decades and use their own OCR engine, which, from my own observations and from what I've heard from others, tends to give better OCR results than APIs built on other technologies (e.g. Tesseract).
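For comparison with the commercial route, here is a minimal sketch of the open source alternative mentioned above: Ghostscript rasterizes the scanned PDF to a multi-page TIFF, and Tesseract's `pdf` output mode writes a searchable PDF from it. It assumes `gs` and `tesseract` are installed on the server; the function names and file paths are made up for illustration, and the OCR quality may well be lower than what the answer reports for ABBYY. The same two commands can be run from PHP with `exec()`.

```python
import shutil
import subprocess
from pathlib import Path


def rasterize_cmd(pdf_path, tiff_path, dpi=300):
    """Ghostscript command that renders every page of a PDF into one multi-page TIFF."""
    return ["gs", "-dBATCH", "-dNOPAUSE", "-dSAFER", "-sDEVICE=tiff24nc",
            f"-r{dpi}", f"-sOutputFile={tiff_path}", str(pdf_path)]


def ocr_cmd(tiff_path, output_base, language="eng"):
    """Tesseract command; the trailing "pdf" makes it write output_base + ".pdf"."""
    return ["tesseract", str(tiff_path), str(output_base), "-l", language, "pdf"]


def make_searchable(pdf_path, work_dir):
    """Run both steps and return the path of the searchable PDF (hypothetical helper)."""
    for tool in ("gs", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed")
    work_dir = Path(work_dir)
    tiff_path = work_dir / "scan.tif"
    output_base = work_dir / "searchable"
    subprocess.run(rasterize_cmd(pdf_path, tiff_path), check=True)
    subprocess.run(ocr_cmd(tiff_path, output_base), check=True)
    return output_base.with_suffix(".pdf")
```

Rasterizing at 300 dpi is a common choice for OCR input, not a requirement; lower values run faster but usually recognize less text.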
I am working on a PHP web application which programmatically generates some DOCX files.
I want these files to be converted to PDF, but their layout is so complex that no PHP PDF generator library (domPDF, TCPDF, etc.) handles them well. Each one produces a poorly formatted PDF.
In this situation, I have decided to let Google Drive do the conversion. For this, I have to:
Upload the DOCX files to GDrive
And then export them in PDF...
I have read through all of the GDrive API documentation, but it is very poor. I only want to execute one single PHP script which:
Uploads the file to GDrive
Downloads its exported PDF version
Lets the PDF be downloaded when the script is finished...
I am searching for the optimal way to achieve this behaviour (with or without GDrive, since the LibreOffice/OpenOffice CLI command is not an option: I am on shared web hosting and can't install any software).
Have you considered using a file conversion service to do this for you?
For complete transparency, I work for Zamzar (an online file conversion website). We have recently released a developer API - https://developers.zamzar.com/ - that would allow you to convert your DOCX files to PDF with little or no loss of formatting.
This would then eradicate your need to convert your file(s) using the Google Drive API. Check out our fairly extensive docs here - https://developers.zamzar.com/docs.
When users upload certain files to my site (such as .doc, .xls, .pdf, etc) I'd like to be able to generate a preview thumbnail (of the first page of the document). I'm working with PHP in a LAMP stack but would be happy with any library or command-line tool that can do the job (Linux highly preferred).
It's not easy to convert certain document formats to an image. PHP alone cannot do this.
The 'proper' way to do this is, first of all, to install on your server a program that can open documents in that format.
For example, for .doc documents you can use OpenOffice;
it can also open most other document formats.
You then need to set up OpenOffice to work in 'headless' mode, sending its output to a virtual display (Xvfb is what you are going to need on Linux).
Your PHP script will then call OpenOffice, passing the path to the uploaded doc. OpenOffice will actually open that doc. Then you need to create an image from the screen buffer. You can use ImageMagick for that.
Then, once you have the capture of your screen, you can resize it to a thumbnail.
Look at this link for more details
http://www.mysql-apache-php.com/website_screenshot.htm
The best way is to have all your documents converted to PDF first;
after that you can make the preview thumbnail from the PDF.
How to do that is explained simply here:
How do I convert a PDF document to a preview image in PHP?
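The two-step route described above (document to PDF, then PDF to image) can be sketched as follows. This is only an illustration under assumptions: it presumes LibreOffice (`soffice`) and Ghostscript (`gs`) are installed on the server, and the function names and paths are hypothetical. From PHP, the same two commands could be run with `exec()`.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_conversion_cmd(doc_path, out_dir):
    """Command asking headless LibreOffice to convert a document to PDF in out_dir."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(doc_path)]


def first_page_png_cmd(pdf_path, png_path, dpi=72):
    """Ghostscript command that renders only page 1 of a PDF to a PNG file."""
    return ["gs", "-dBATCH", "-dNOPAUSE", "-dSAFER", "-sDEVICE=png16m",
            f"-r{dpi}", "-dFirstPage=1", "-dLastPage=1",
            f"-sOutputFile={png_path}", str(pdf_path)]


def make_thumbnail(doc_path, out_dir):
    """Convert doc_path to PDF if needed, then render its first page (hypothetical helper)."""
    doc_path, out_dir = Path(doc_path), Path(out_dir)
    if shutil.which("gs") is None:
        raise RuntimeError("gs is not installed")
    if doc_path.suffix.lower() == ".pdf":
        pdf_path = doc_path  # already a PDF, skip LibreOffice
    else:
        if shutil.which("soffice") is None:
            raise RuntimeError("soffice is not installed")
        subprocess.run(pdf_conversion_cmd(doc_path, out_dir), check=True)
        pdf_path = out_dir / (doc_path.stem + ".pdf")
    png_path = out_dir / (doc_path.stem + ".png")
    subprocess.run(first_page_png_cmd(pdf_path, png_path), check=True)
    return png_path
```

The PNG produced at 72 dpi is a full-page render; scaling it down to the final thumbnail size is left to ImageMagick or any image library.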
What is, according to you, the best way to convert uploaded files of any kind (.doc, .docx,...) into a pdf-file using nothing but php. Is it even possible to do so?
I looked at FPDF, but it only creates PDF files from text.
Another solution given previously was to use the PDFlib library on the server, but unfortunately my server doesn't support this library...
What is the best way to convert the files my users upload on my site to PDF files?
A simpler approach would be to restrict uploads to the .pdf format programmatically and require your users to upload only .pdf files. Provide a link on the upload page to a free PDF printer (e.g. CutePDF) that the user can install to create .pdf documents from any file that can be printed.
Trying to do it through PHP will be problematic because the uploads could come from many different programs, and it would be impossible to cater for all of them. For example, how would it handle Scribus or ABC Flowcharter or any other 'non-standard' application someone used to create a document?
Much better to filter the upload upfront.
The best server-side PDF generator of those I have tried so far is wkhtmltopdf, a WebKit-based, self-contained headless browser that can render any HTML+CSS and generate a PDF from it. It is reasonably fast and fairly reliable, and has some useful PDF options, such as page size, orientation, etc.
The second part of the job in your case is to convert documents to HTML prior to feeding them to wkhtmltopdf. If possible, have your users upload the docs in HTML (Word and Co. can export (crappy) HTML). If this is not an option, you will have to find a tool just for that, which, in my opinion, is much easier than finding a tool that converts Word docs directly into PDF.
Another good thing about wkhtmltopdf is that you can feed it the output of your PHP script, captured with the ob_xxx() output-buffering functions.
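As a rough sketch of feeding generated HTML straight into wkhtmltopdf: the tool accepts `-` as the input to read HTML from stdin and `-` as the output to write the PDF to stdout, so no temporary files are needed. The helper names below are hypothetical and the code assumes the `wkhtmltopdf` binary is on the PATH; in PHP the equivalent would be to capture the page with the output-buffering functions and pipe it to the same command.

```python
import shutil
import subprocess


def wkhtmltopdf_cmd(page_size="A4", orientation="Portrait"):
    """Build the command line; the two "-" arguments mean stdin in, stdout out."""
    return ["wkhtmltopdf", "--quiet",
            "--page-size", page_size,
            "--orientation", orientation,
            "-", "-"]


def html_to_pdf(html, page_size="A4", orientation="Portrait"):
    """Return the PDF bytes rendered from an HTML string."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not installed")
    result = subprocess.run(wkhtmltopdf_cmd(page_size, orientation),
                            input=html.encode("utf-8"),
                            stdout=subprocess.PIPE,
                            check=True)
    return result.stdout
```

A call such as `html_to_pdf("<h1>Invoice</h1>", orientation="Landscape")` would return the PDF as bytes, ready to be sent to the browser or written to disk.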
PHPExcel is a simple way to create xls, xlsx and pdf files with PHP (its sibling project PHPWord handles Word documents). It's a lot easier thanks to its clear documentation.
Use Microsoft Office to render Microsoft Office documents, if you care about accuracy at all. This is easily done by invoking Office over COM.
Get access to your server, and install what you need. Doing so would be far easier than monkeying around with sub-par solutions.
Well... I can think of one way of doing it quite easily, but it doesn't involve using PHP.
Upload your documents to a folder on your server that is browsable by your users.
E.g.: http://mysite.com/docs/
Then get your users to install a virtual printer driver such as PrimoPDF
http://www.primopdf.com/index.aspx
then they can load the document into their browser, and print to PDF for offline browsing.
If this is not an option, and you're dealing with Office documents that conform to the Office Open XML standard, you could attempt to parse the XML doc into a PHP page for display in the browser, then use JavaScript to trigger a print.
Unfortunately, it does still depend on your user having a PDF printer installed.
Alternatively, you could just load the docs natively, print them to your own PDF printer, then upload the PDFs to the web server for download.
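To give an idea of what parsing the XML involves, here is a minimal sketch. A .docx file is a ZIP archive whose main body lives in `word/document.xml`; the code pulls out the text of each paragraph and wraps it in `<p>` tags. It deliberately ignores styles, tables layout, images and everything else that makes a complex document complex, and the function names are made up for illustration. PHP's ZipArchive and SimpleXML could do the same job.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# Namespace of WordprocessingML elements such as <w:p> and <w:t>.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the plain text of every paragraph in a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in paragraph.iter(W + "t"))
        paragraphs.append(text)
    return paragraphs


def paragraphs_to_html(paragraphs):
    """Wrap each paragraph in <p> tags, escaping HTML special characters."""
    return "\n".join(f"<p>{escape(text)}</p>" for text in paragraphs)
```

The resulting HTML is only a text-level preview, which is why this route suits simple documents far better than heavily formatted ones.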
I can't think of any easy way of doing this otherwise, without installing all sorts of different document parser tool-kits and doing a huge amount of behind the scenes work.