Generate ODT documents with dynamic images in PHP

I maintain a couple of web databases based on PHP and MySQL on a shared hosting package.
The databases have a mechanism for the user to upload OpenOffice documents with placeholders:
[person.name] [person.address] [person.postcode]
I then use this great PHP tool to run through the OpenOffice document and insert values from the database into it. The result is, again, an OpenOffice document.
What it can't do is dynamic images.
Does anybody know a - preferably PHP-only - solution to insert images into OpenOffice documents?
I know PUNO. Can't use it in this context because it's shared hosting.
I know OpenOffice can be run as a daemon - ditto.
I know phpDocWriter. It was great for SXW files but is dead now.
I know OpenDocument is a collection of XML files in a ZIP file. I once tried to programmatically add a caption to every image in an ODT document. It drove me fricking crazy. I look with admiration upon developers who work with the format, but it's not for me.
I would really appreciate any hints on existing solutions.

I think odtPHP might be what you're looking for.
It seems to be able to insert images at a placeholder in the document, and simply reads from an array to see which image to place.
http://www.odtphp.com/index.php?i=tutorials&p=tutorial5
Now, if you do this as a post-process after your current code, or simply use it instead of TBS, you've got everything you need, IMHO.
Alternatively, you can include a default image with a certain filename in your document, and simply replace that image file in the archive.
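The image-swap idea can be tried with nothing but PHP's bundled ZipArchive class, which usually works on shared hosting. This is only a rough sketch, assuming the template contains a placeholder picture whose path inside the archive you already know (ODT files normally keep images under Pictures/). The function name and the example entry name are made up for illustration.

```php
<?php
/**
 * Copy an ODT template and swap one embedded picture for another.
 * Assumption: $entryName is the placeholder's path inside the archive
 * (hypothetical example: "Pictures/placeholder.png").
 */
function replaceOdtImage(
    string $templatePath,
    string $outputPath,
    string $entryName,
    string $newImagePath
): void {
    // Work on a copy so the uploaded template stays untouched.
    if (!copy($templatePath, $outputPath)) {
        throw new RuntimeException("Cannot copy template to $outputPath");
    }

    $bytes = file_get_contents($newImagePath);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $newImagePath");
    }

    $zip = new ZipArchive();
    if ($zip->open($outputPath) !== true) {
        throw new RuntimeException("Cannot open $outputPath as a ZIP archive");
    }

    // Refuse to add a picture the document does not reference.
    if ($zip->locateName($entryName) === false) {
        $zip->close();
        throw new RuntimeException("No entry named $entryName in the document");
    }

    // Adding under an existing name replaces that entry, so the
    // reference in content.xml keeps pointing at a valid picture.
    $zip->addFromString($entryName, $bytes);
    $zip->close();
}
```

The new image should have the same type as the placeholder, and it will be shown at the placeholder's size, because the frame dimensions live in content.xml and are not touched here.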

There is a new version of TbsOOo called OpenTBS, and it has a feature for inserting or changing a picture in the file.
http://www.tinybutstrong.com/opentbs.php

Did you try to use the AddFileToDoc method to add an image to the document?
The documentation on this method is here:
http://www.tinybutstrong.com/tbsooo.php#AddFileToDoc

Related

Searching (extracting text) PDF files with Algolia

This is just a speculative idea for a client who has a lot of PDF files.
Algolia say in their FAQs that to search PDF files you first need to extract the text from the file. How would you go about this?
The way I envisage the system working would be:
Client uploads PDF via CMS
CMS calls some service / program to extract the text
Algolia indexes the extracted text, which is somehow linked to the original PDF
It would need to be an automated system as the client shouldn't have to tell it to index.
It would be built in PHP, probably Laravel running on Ubuntu.
What software / service could do the text extraction from the PDFs and is any magic needed to 'link' this with the PDF file?
I'm also happy to have suggestions on other search services which may handle this.
Fortunately, text extraction from PDFs is a subject that has been covered multiple times. On the command line, you could use pdftotext (available on Linux or Mac), or in your code a library such as Apache Tika (for which you can find a PHP wrapper).
To avoid having too much noise in your records, I'd recommend then splitting the text and creating one record per paragraph. You can then use Algolia's distinct feature to deduplicate the results.
You should already have the links to your files somewhere, so just store them in your records. In your front-end you'll then easily be able to create links to them using, for instance, autocomplete.js or instantsearch.js.
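The extract, split and link steps could look roughly like this in PHP. It is a sketch under a few assumptions: pdftotext (from poppler-utils) is installed on the server, paragraphs are separated by blank lines in its output, and the function names and the `pdf_url` and `content` attribute names are invented here. `objectID` is the identifier Algolia expects on each record. Sending the records to Algolia is left to the official client.

```php
<?php
/** Run the pdftotext CLI and return the extracted text. */
function extractPdfText(string $pdfPath): string
{
    // "-" as the output file makes pdftotext write to stdout.
    $cmd = 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
    $text = shell_exec($cmd);
    if (!is_string($text)) {
        throw new RuntimeException("pdftotext produced no output for $pdfPath");
    }
    return $text;
}

/**
 * Turn extracted text into one record per paragraph.
 * Every record carries the link back to the original PDF.
 */
function buildRecords(string $text, string $pdfUrl): array
{
    // Split wherever two line breaks are separated only by whitespace.
    $paragraphs = preg_split('/\R\s*\R/', trim($text));
    $records = [];
    foreach ($paragraphs as $i => $paragraph) {
        // Collapse the line wrapping inside a paragraph to single spaces.
        $content = trim(preg_replace('/\s+/', ' ', $paragraph));
        if ($content === '') {
            continue;
        }
        $records[] = [
            'objectID' => md5($pdfUrl) . '-' . $i, // stable per file and paragraph
            'pdf_url'  => $pdfUrl,                 // hypothetical attribute name
            'content'  => $content,
        ];
    }
    return $records;
}
```

With records shaped like this, `pdf_url` would be the attribute to configure as `attributeForDistinct` in the index settings, so that each PDF shows up once in the results even when several of its paragraphs match.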
For anyone still looking for a solution, I put together a GitHub repository that does exactly that: https://github.com/PDFTron/pdftron-document-search.
The text extraction happens client-side as the user uploads the document using React + Firebase + Algolia.
You can check out a quick video walking you through the sample app: https://youtu.be/IQATnzHTp7Q.
Let me know if you have any questions.

Extracting text from PDFs in PHP

I'm creating a PHP-based web application which allows the user to upload a PDF file. This file will then be read and checked for certain data (text).
The problem is I can't figure out how to even open a PDF file in PHP. There are some PDF libraries, mainly for creating PDFs, but they don't seem to be very good at reading them.
An alternative would be to use an already available solution in Python or something else (as described in other threads on this site), but I'd really like to stay in PHP as much as possible, as I intend to later export the data to MySQL, etc.
Any input on how to read a PDF and extract data from it would be much appreciated.
I personally haven't tried this out, but it looks like this one works: http://www.pdfparser.org/documentation
It's just a matter of downloading and telling your code to include it, just like the documentation shows.
Or you could try the class.pdf2text.php found at http://www.phpclasses.org/browse/file/31030.html

Pdf book reader in php?

I have to upload PDF files and need to display them on my website in a book-style reader, like a presentation. Please show me the possible ways to achieve this.
Thank you
I've been using FlexPaper. I use pdf2swf to convert the PDF to SWF because I use the Flash version, but there is a JavaScript version too.
One possible solution would be to use Scribd. You simply upload your document to their website and embed their reader on your website. This is the easiest way, and you get things like searchability. Their reader also works like Adobe's Acrobat Reader.
The downside is that you are uploading your documents onto a public website, so everyone will be able to view them. They might have settings where you can lock your documents so that only certain people can see them.
The next solution is to roll your own. You can use turn.js. In this case, you will need to find a way to convert your PDF files to HTML files or perhaps image files. With images, your text won't be selectable, and they won't be discoverable by search engines. Again, converting PDF to HTML can also be difficult as you might lose formatting in the process.
But it depends entirely on your use case. Personally, I would go with Scribd, as their platform works very well, and you won't have to worry about implementing your own system.

Is there a way with PHP to access a file on a server and save only the first half of the file?

I want to give users a preview of certain files on my site and will be using the Scribd API. Does anyone know how I can access the full file from my server and save the file under a different name, which I will then show to users? I can't think of a way to do this with PHP for .docx and image files. Help is much appreciated.
For "splitting" images, use an image processing library like GD to crop the image (lots of examples of how to do that can be found all over the place). For Word documents, use a library like PHPWord (or one of the other myriad such libraries) to open the document, remove/extract as much text as you need, then save that into a new Word file.
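For the image case, a crop with GD might look like the sketch below. It assumes the GD extension is enabled and that "first half" means the top half of the picture. The function name is made up, and the preview is always written as PNG regardless of the source format.

```php
<?php
/**
 * Save the top half of an image as a PNG preview under a new name.
 * Assumption: "first half" is taken to mean the top half.
 */
function saveTopHalf(string $sourcePath, string $previewPath): void
{
    $bytes = file_get_contents($sourcePath);
    // imagecreatefromstring() detects the format (PNG, JPEG, GIF, ...) itself.
    $image = ($bytes === false) ? false : imagecreatefromstring($bytes);
    if ($image === false) {
        throw new RuntimeException("Cannot read $sourcePath as an image");
    }

    // Keep the full width and the upper half of the height (at least 1px).
    $cropped = imagecrop($image, [
        'x'      => 0,
        'y'      => 0,
        'width'  => imagesx($image),
        'height' => max(1, intdiv(imagesy($image), 2)),
    ]);

    if ($cropped === false || !imagepng($cropped, $previewPath)) {
        throw new RuntimeException("Cannot write preview to $previewPath");
    }
}
```

The original file stays as it is on the server. Only the cropped copy at `$previewPath` would be exposed to users.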
For other file types, find the appropriate method that allows you to manipulate that format, then do whatever you need to do with it.

How can I convert a PHP page into a .doc file with PHP

I recently worked on a project where I need to convert a page into a Microsoft Word document (.doc file) and offer the document for download, all using PHP. But I can't solve this problem.
Please help me. Thank you very much, Arif
This is not easy to solve.
First off, if you want to write real Word documents, you will have to do it on Windows. You can use COM to talk to Word, and this is how you get good results. I've tried all the Unix/Linux-based solutions and the results were not so great.
Otherwise, I'd suggest you write RTF, which is just as good. In the end, you can give the .rtf file a .doc extension and no one will notice. RTF has a couple of limitations (formatting), but on the flip side, it's all ASCII, and the RTF standard is pretty comprehensive and well documented.
There's a class which does it pretty nicely -- phpLiveDocx (this is a great introduction). And this class also claims to write PDF and DOC -- but I haven't tried those yet. I use another solution for PDF.
I would recommend using the RTF format instead of the .doc - it's much simpler to write to, and all text editors understand it. Similar recommendation for .csv when you want to output an Excel file.
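To show how little is needed for RTF, here is a minimal sketch that wraps plain text in an RTF document. It assumes ASCII input only (non-ASCII characters would need RTF's `\u` escapes, which are not handled here). The function name, the Arial font and the 12pt size are arbitrary choices for the example.

```php
<?php
/**
 * Wrap plain ASCII text in a minimal RTF document.
 * Limitation: non-ASCII characters are not escaped.
 */
function textToRtf(string $text): string
{
    // strtr() with an array works in a single pass, so the backslashes
    // added for braces are not escaped a second time.
    $escaped = strtr($text, [
        '\\'   => '\\\\',     // literal backslash
        '{'    => '\\{',      // braces delimit RTF groups
        '}'    => '\\}',
        "\r\n" => "\\par\n",  // each line break ends a paragraph
        "\n"   => "\\par\n",
    ]);

    // Header: RTF version 1, ANSI charset, one font, 12pt (\fs24 = half-points).
    return '{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs24 ' . $escaped . '}';
}
```

To offer the result for download, you would send `Content-Type: application/rtf` and a `Content-Disposition: attachment` header naming the file, then echo the returned string.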
Perhaps not the answer you seek, but still interesting to note: there is an open source word processor called AbiWord that has a CLI (command line interface). You can use it to easily convert between document formats. I know that at least one website uses it to convert text files into various formats.
It is under active development and could easily be used as a third-party black box solution for converting documents server-side.
Here is a blog post from one of the developers on how to integrate it with PHP:
Server-Side AbiWord
AbiWord home page
