I've been searching for quite a long time for a way to convert a Word document (.doc and .docx) to PDF. My application takes Word documents from clients and then converts them to PDF, with changes (such as a header and footer) added to the original document.
Any suggestions are welcome.
Thank you
Look at http://www.phplivedocx.org/ for converting DOC to PDF.
Example code with nusoap, http://www.phplivedocx.org/articles/using-livedocx-with-nusoap/
You can call the Docmosis Cloud Services from PHP using an HTTP POST or a curl command. It supports DOC and DOCX input and PDF output. It's intended more for document manipulation than for plain conversion, but perhaps that suits you even better. Please note I work for the company that created Docmosis.
Hope that helps.
This question is a year old, but for people still looking, I found a pretty decent API that you can look at to see if it works for your needs.
https://cloudconvert.org/page/api#overview
Essentially, you use their API with a key they provide when you sign up for an account. You send all the data for conversion to their servers and then get the result back. You can then store it on your own computer, the user's computer, or a server of your choosing. There is some minimal setup and sign-up involved, and their GitHub account walks you through it pretty easily.
Check it out and hopefully it is a good choice.
To convert DOC files to PDF, you need to install the following on the server:
LibreOffice or OpenOffice, plus the unoconv library
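As a rough illustration of the answer above, here is a minimal sketch in Python that builds the command line for a headless conversion, assuming `soffice` (LibreOffice) or `unoconv` is installed and on the PATH. The function names and the example paths are hypothetical; from PHP the same command could be passed to `exec()`.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, tool="soffice"):
    """Return the argument list for a headless DOC/DOCX to PDF conversion."""
    source, out_dir = Path(source), Path(out_dir)
    if tool == "soffice":
        # LibreOffice writes <out_dir>/<source stem>.pdf
        return ["soffice", "--headless", "--convert-to", "pdf",
                "--outdir", str(out_dir), str(source)]
    if tool == "unoconv":
        # unoconv is given the full output file name via -o
        target = out_dir / (source.stem + ".pdf")
        return ["unoconv", "-f", "pdf", "-o", str(target), str(source)]
    raise ValueError(f"unsupported tool: {tool}")


def convert_to_pdf(source, out_dir, tool="soffice"):
    """Run the conversion and return the expected path of the PDF."""
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} is not installed or not on PATH")
    subprocess.run(build_convert_command(source, out_dir, tool), check=True)
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

Adding a header or footer is not covered here; that would have to happen before conversion (by editing the document) or afterwards (by stamping the PDF).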
This is just a speculative idea for a client who has a lot of PDF files.
Algolia say in their FAQs that to search PDF files you first need to extract the text from the file. How would you go about this?
The way I envisage the system working would be:
Client uploads PDF via CMS
CMS calls some service / program to extract the text
Algolia indexes the extracted text and it's somehow linked to the original PDF
It would need to be an automated system as the client shouldn't have to tell it to index.
It would be built in PHP, probably Laravel running on Ubuntu.
What software / service could do the text extraction from the PDFs and is any magic needed to 'link' this with the PDF file?
I'm also happy to have suggestions on other search services which may handle this.
Fortunately, text extraction from PDFs is a subject that has been covered many times. On the command line, you could use pdftotext (available on Linux or Mac), or in your code a library such as Apache Tika (for which you can find a PHP wrapper).
To avoid having too much noise in your records, I'd recommend that you then split the text and create one record per paragraph. You can then use Algolia's distinct feature to deduplicate the results.
You should already have the links to your files somewhere, so just store them in your records. Then, in your front end, you'll easily be able to create links to them using, for instance, autocomplete.js or instantsearch.js.
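The approach in this answer could be sketched roughly as follows, in Python and assuming the `pdftotext` command is installed. The record field names (`pdf_url`, `paragraph`, `content`) and the helper names are hypothetical choices for illustration; only `objectID` is a name Algolia itself expects.

```python
import hashlib
import re
import subprocess


def extract_text(pdf_path):
    """Run pdftotext; the '-' argument sends the text to stdout."""
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            check=True, capture_output=True, text=True)
    return result.stdout


def paragraph_records(text, pdf_url):
    """Split text on blank lines and build one record per paragraph."""
    records = []
    # pdftotext separates pages with form feeds; treat them as paragraph breaks
    for paragraph in re.split(r"\n\s*\n", text.replace("\f", "\n\n")):
        content = " ".join(paragraph.split())
        if not content:
            continue
        position = len(records)
        key = hashlib.sha1(f"{pdf_url}#{position}".encode("utf-8")).hexdigest()
        records.append({
            "objectID": key,        # stable ID so re-indexing overwrites
            "pdf_url": pdf_url,     # the 'link' back to the original file
            "paragraph": position,
            "content": content,
        })
    return records
```

With records shaped like this, setting the index's `attributeForDistinct` to `pdf_url` is one way to have each PDF appear only once in the results, whichever paragraph matched.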
For anyone still looking for a solution, I put together a GitHub repository that does exactly that: https://github.com/PDFTron/pdftron-document-search.
The text extraction happens client-side as the user uploads the document using React + Firebase + Algolia.
You can check out a quick video walking you through the sample app: https://youtu.be/IQATnzHTp7Q.
Let me know if you have any questions.
I am currently trying to get the downloadURL from a response sent via my server, but whenever $file->getdownloadUrl() is used it returns ['downloadURL'] =>
My question is, is it possible to download Google Documents in the application/vnd.google-apps.document MIME Type?
My assumption is that these would contain a link to the online version of the document, but it would be good to be able to edit the document in the correct format so that any formatting is retained when it is re-uploaded to Drive.
Regards,
Nope, you cannot download Google Documents in the application/vnd.google-apps.document MIME type. You can only export them to other formats.
Some workarounds:
Apps Script Document Services provide slightly better control over the document, but you won't be able to get full control over all formatting for now.
Export the file in a known format such as Microsoft Word and edit it. When you upload it back to Drive, you can request that it be converted back to Google Docs format, although you might lose or corrupt some formatting.
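The export workaround can be sketched as below. This assumes the Drive API v3 `files.export` endpoint rather than the older `downloadUrl` field used in the question; the helper names are hypothetical, and the request itself would still need an OAuth bearer token in its `Authorization` header.

```python
from urllib.parse import quote, urlencode

# Target format for the export (Microsoft Word .docx)
DOCX_MIME = ("application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document")
GOOGLE_APPS_PREFIX = "application/vnd.google-apps."


def is_google_native(mime_type):
    """True for Google-native types, which have no directly downloadable bytes."""
    return mime_type.startswith(GOOGLE_APPS_PREFIX)


def export_url(file_id, mime_type=DOCX_MIME):
    """Build the Drive v3 export URL for a Google-native file."""
    return ("https://www.googleapis.com/drive/v3/files/"
            + quote(file_id, safe="")
            + "/export?" + urlencode({"mimeType": mime_type}))
```

A caller would check `is_google_native()` on the file's MIME type first and only fall back to a plain download for files that are not Google Docs, Sheets, or Slides.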
I want to make a web app that can get the values from a commonly used file type (such as xls or ppt) so that I can convert it into a custom format (like Google Drive). With an xls (Excel) file, for example, I want to be able to get the value of each cell. I would be fine with getting HTML for a file (like the HTML code that would display a Word document), because values can be extracted from that. I would like to do this on the client side, but I am okay with doing it on the server side with PHP.
Another approach would be to import the file as XML. PHP has great support for XML and could make short work of this. If you can get the files uploaded in OpenDocument Format, you can parse just about any of the types you listed (XLS, PPT, DOC, etc.).
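To give an idea of what the XML route looks like, here is a minimal sketch in Python that reads cell text from an OpenDocument spreadsheet (.ods), which is a ZIP archive containing a `content.xml` file. The function names are hypothetical, and the sketch makes simplifying assumptions: rows from all sheets are returned in one list, and repeated cells (`table:number-columns-repeated`) are not expanded.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the OpenDocument specification
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def rows_from_content_xml(xml_bytes):
    """Return the cell text of every table row as a list of lists."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for row in root.iter("{%s}table-row" % NS["table"]):
        cells = []
        for cell in row.findall("table:table-cell", NS):
            # A cell holds zero or more text:p paragraphs
            paragraphs = cell.findall("text:p", NS)
            cells.append("\n".join("".join(p.itertext()) for p in paragraphs))
        rows.append(cells)
    return rows


def rows_from_ods(path):
    """Open the .ods archive and parse its content.xml."""
    with zipfile.ZipFile(path) as archive:
        return rows_from_content_xml(archive.read("content.xml"))
```

In PHP the equivalent pieces would be `ZipArchive` for the archive and `SimpleXML` or `DOMDocument` for the parsing.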
A pretty easy way to get data out of an Excel sheet online is to use a Google Apps Script. The process would be a lot to explain here, but with a bit of Google searching you can find all your answers.
As for a PPT, I can't think of an easy way.
As for documents (i.e. pdf, doc, docx), you can use Google Apps Script as well.
Although, if you're making your own tool for this, you may want to just research how the data is stored in the file and work from there.
Is there any way to convert PDF to HTML? I need text from the file, and when I tried the PDFtoText library I got the text, but it was unsorted and had no structure I could parse.
I noticed that some online PDFtoHTML services work great with the file. So, any tips please? Here is the PDF file; I need only one specific row in the right column.
Try integrating the PDFtoHTML from the poppler project; that should support table recognition.
pdftohtml works fine: it is fast and stable, but the HTML result is ugly at best. I have used it for quite some time for a website that has many job resumes.
It is a good solution for extracting textual content however.
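One way to get ordered text out of pdftohtml is its XML mode, which emits positioned text fragments instead of styled HTML. The sketch below is in Python and assumes the poppler `pdftohtml` binary is installed and that its `-xml` output uses `<page>` elements containing `<text top="…" left="…">` elements; the function names and the `min_left` threshold are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def pdf_to_xml(pdf_path):
    """Run pdftohtml in XML mode, ignoring images, writing to stdout."""
    result = subprocess.run(["pdftohtml", "-xml", "-i", "-stdout", pdf_path],
                            check=True, capture_output=True)
    return result.stdout


def text_lines(xml_bytes, min_left=0):
    """Return text fragments sorted by page, then top, then left position.

    Fragments whose left edge is below min_left are dropped, which is a
    crude way to keep only a right-hand column.
    """
    root = ET.fromstring(xml_bytes)
    fragments = []
    for page_index, page in enumerate(root.iter("page")):
        for node in page.iter("text"):
            left = int(float(node.get("left", 0)))
            top = int(float(node.get("top", 0)))
            # itertext() also picks up text inside nested <b>/<i> tags
            content = "".join(node.itertext()).strip()
            if content and left >= min_left:
                fragments.append((page_index, top, left, content))
    fragments.sort()
    return [content for _, _, _, content in fragments]
```

The right value for `min_left` depends on the layout of the specific PDF, so it would have to be found by inspecting the XML for that file.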
I would give the Scribd API a try:
http://www.scribd.com/developers/api
Or try the Google Apps document API. Google does a great job of displaying and converting PDF files.
I recently worked on a project in which I need to convert a page into a Microsoft Word document (.doc file) and offer the document for download, all using PHP. But I can't solve this problem.
Please help me. Thank you very much, Arif
This is not easy to solve.
First off, if you want to write real Word documents, you will have to do it on Windows. You can use COM to talk to Word, and this is how you get good results. I've tried all the Unix/Linux-based solutions and the results were not so great.
Otherwise, I'd suggest you write RTF -- which is just as good. In the end, you can give the RTF file a .doc extension and no one will notice. RTF has a couple of limitations (formatting), but on the flip side -- it's all ASCII and the RTF standard is pretty comprehensive and well documented.
There's a class which does it pretty nicely -- phpLiveDocx (this is a great introduction). And this class also claims to write PDF and DOC -- but I haven't tried those yet. I use another solution for PDF.
I would recommend using the RTF format instead of the .doc - it's much simpler to write to, and all text editors understand it. Similar recommendation for .csv when you want to output an Excel file.
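To show how little is needed for the RTF suggestion made in the answers above, here is a minimal sketch in Python that writes plain paragraphs as RTF. The function names and the choice of Arial at 12 pt are assumptions made for illustration; it covers only escaping and paragraph breaks, not formatting.

```python
def rtf_escape(text):
    """Escape text for use inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF control characters
        elif ch == "\n":
            out.append("\\par\n")          # paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: \uN? with N as a signed 16-bit UTF-16 code unit
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def make_rtf(paragraphs):
    """Return a complete RTF document with one \\par per paragraph."""
    body = "".join("%s\\par\n" % rtf_escape(p) for p in paragraphs)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 "
    return header + body + "}"
```

Because the result is pure ASCII, it can be written to a file or sent straight to the browser as a download.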
Perhaps not the answer you seek, but still interesting to note: there is an open source word processor called AbiWord that has a CLI (Command Line Interface). You can use it to easily convert between document formats. I know that at least one website uses it to convert text files into various formats.
It is actively developed and could easily be used as a third-party black-box solution for converting documents server side.
Here is a blog from one of the developers on how to integrate it with PHP
Server-Side AbiWord
abiword home page