Convert HTML & CSS to DOC(X)? - php

Is there some utility that could be called via command line to produce a doc(x) file? The source file would be HTML and CSS.
I am trying to generate Word documents on the fly with PHP. The only library I am aware of is PHPDocX, which is very low level and not much use to me (I already have one poor implementation of Word document generation).
What I need from a document:
TOC
Images
Footers/Headers (they could be manually made on each HTML page)
Table
Lists
Page breaks (being able to decide what goes on which page, e.g. one HTML file per page, joining multiple HTML files to produce the entire document)
Paragraphs
Basic bold/etc styles

I didn't find PHPDocX very useful either. An alternative could be PHPWord; I think it covers what you need. According to the website it can do these things:
Insert and format document sections
Insert and format Text elements
Insert Text breaks
Insert Page breaks
Insert and format Images and binary OLE-Objects
Insert and format watermarks (new)
Insert Header / Footer
Insert and format Tables
Insert native Titles and Table-of-contents
Insert and format List elements
Insert and format hyperlinks
Very simple template system (new)
In your case that isn't enough, but there is a plugin available that converts (basic) HTML to DOCX, and in my opinion it works very well: http://htmltodocx.codeplex.com/
I have been using it for a year or two now and am happy with it, although I have to add that the HTML can't be too complex.

The way I usually do this is to have a Word document template file in which the parts I want to replace are marked with keywords (usually something like "{FIRSTNAME}").
This allows you to read the file via PHP then simply do str_replace on all the parts you want to replace, then write that to another file.
Dynamic tables using this method are a bit more tricky, as you need a sub template for a row, which you can then include inside the main template as many times as required.
I'm not sure if this is the best solution, it's always seemed very fiddly to me and every time I'm asked to do this I get frustrated with it, but I guess it works. So if anyone knows a better solution I'd love to hear it too!
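The keyword-replacement idea described above can be sketched roughly as follows. This is only an illustration on plain strings: the helper names fillTemplate and fillRows, the {ROWS} keyword and the sample data are made up for the example, and the XML escaping assumes the template is an XML-based format (such as Word XML) rather than plain text.

```php
<?php
// Hypothetical helper: replace {KEYWORD} placeholders in a template string.
function fillTemplate(string $template, array $values): string
{
    $search = [];
    $replace = [];
    foreach ($values as $key => $value) {
        $search[] = '{' . $key . '}';
        // Escape for XML-based templates; drop this for plain-text formats.
        $replace[] = htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8');
    }
    return str_replace($search, $replace, $template);
}

// Hypothetical helper: render the row sub-template once per data row.
function fillRows(string $rowTemplate, array $rows): string
{
    $out = '';
    foreach ($rows as $row) {
        $out .= fillTemplate($rowTemplate, $row);
    }
    return $out;
}

// Made-up main template and row sub-template.
$main = "Dear {FIRSTNAME},\n{ROWS}Total: {TOTAL}\n";
$row  = "{ITEM}: {PRICE}\n";

$rowsText = fillRows($row, [
    ['ITEM' => 'Paper', 'PRICE' => '4.50'],
    ['ITEM' => 'Ink',   'PRICE' => '12.00'],
]);

// Insert the pre-rendered rows first, then fill the remaining keywords.
$document = str_replace('{ROWS}', $rowsText, $main);
$document = fillTemplate($document, ['FIRSTNAME' => 'Ann', 'TOTAL' => '16.50']);

echo $document;
```

In a real setup the two template strings would be read from files with file_get_contents and the result written back with file_put_contents.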

Related

Grabbing selective code from external HTML file with PHP

I am beginning to create a pattern library from BareBones. It uses PHP to pull in external HTML files to create the different sections. I would like to be able to have a heading and description for each section. BareBones seems to use a separate text file for the usage descriptions, which I could do for the heading too. This would mean I would have 3 files for each section though. Is there a better way to do this? Some thoughts I had, but haven't been able to figure out are:
Have heading as commented out first line of HTML file and then pull the first line via PHP
Create a variable in the HTML doc somehow and use it to store the heading and call it from the main file. I think to do this I would need to make all of the snippets PHP files though.
Any thoughts?
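The first idea (heading stored as a comment on the first line of the HTML file) could look something like the sketch below. The function name extractHeading and the sample snippet are invented for illustration; nothing here is part of BareBones itself.

```php
<?php
// Hypothetical helper: return the heading stored in an HTML comment on the
// first line of a snippet, or null when the first line is not a comment.
function extractHeading(string $html): ?string
{
    // Look at the first line only.
    $firstLine = explode("\n", $html, 2)[0];
    if (preg_match('/^\s*<!--\s*(.*?)\s*-->/', $firstLine, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Made-up snippet; in practice this would come from file_get_contents($path).
$snippet = "<!-- Primary buttons -->\n<button class=\"btn\">Save</button>\n";

echo extractHeading($snippet), "\n";
```

Since the heading is an ordinary HTML comment, the snippet can still be included as-is and the browser will simply ignore that line.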

Best method to create a PDF from MySQL: TCPDF/FPDF or FDF?

Our company allows its clients to view reports via our website. The pages are php based and the data is collected from MySQL. These reports were written a long time ago and include inline css. The pages themselves look fine, but the print version is lacking. I want to take the reports and create visually appealing "printable" pages that contain our branding.
I have found three solutions so far.
@media print stylesheets
This is the easiest method, but does not give me complete layout control. I want landscape mode and need to control where the page breaks occur so this method has been eliminated from my list of possible solutions. The reports are built by looping through PHP data, so while I can always put a page break after a or for example, I can't stop the page from breaking before it gets to the next set of data.
TCPDF/FPDF
From what I have seen, these classes will give me all of the control I need to customize a PDF. The challenge is that this appears to be a little more advanced than my programming skills allow, and all of the inline CSS contained within the HTML tables may throw off the formatting.
FDF
I am leaning towards this method, if I understand it correctly. First I would create a PDF form and define all of the fields to be populated by the MySQL data. Then I would create an FDF file that populates the form template with the data from the database. It seems easier to me to design a visually pleasing form as a PDF and then populate it this way, rather than create the entire PDF from scratch using method 2.
Does it sound like I am on the right track? Are any of these methods "easier" than the other?
Any help is greatly appreciated.
TCPDF has the most control of each page which is what I am looking for. It is extremely sensitive when writing HTML, but that is the only downside I have found so far.
There's this excellent answer on SO already.
If you're looking for easy, my money is on mPDF. I found it to be the easiest, and essentially an out-of-the-box solution (often zero server configuration to do).
I think you should try out wkhtmltopdf.
https://code.google.com/p/wkhtmltopdf/
As for the TCPDF/FPDF pagination issue, you can see this other question for the solution provided and use the flow in it to sort yours out.
TCPDF / FPDF - Page break issue
Just found this other solution as well and think you'll need it
Convert HTML + CSS to PDF with PHP?
For me personally, FPDF works great: I fetch data from my database, feed it into the FPDF class and dynamically create PDFs for customers.
I see some people want to write HTML/CSS to create PDFs, but you will always have differences, as the browser parses the HTML/CSS differently from a PDF library.
Using FPDF's built-in methods, I have been able to get exactly what I wanted and haven't seen any issues (yet).

Bulk template based pdf generation in PHP using pdftk

I am doing a bulk generation of pdf files based on templates and I ran into big performance issues pretty fast.
My current scenario is as follows:
get data to be filled from db
create fdf based on single data row and pdf form
write .fdf file to disk
merge the pdf with fdf using pdftk (fill_form with flatten command)
continue iterating over rows until all .pdf's are generated
all the generated files are merged together in the end and the single pdf is given to the client
I use passthru to give the raw output to the client (which saves the time of writing a file), but this is only a small performance improvement. The total operation time is about 50 seconds for 200 records, and I would like to get it down to 10 seconds or less.
The ideal scenario would be to handle all these PDFs in memory instead of writing every one of them to a separate file, but then I don't see how to pass that data to an external tool like pdftk.
One other idea was to generate one big .fdf file with all those rows, but it looks like that is not allowed.
Am I missing something very trivial here?
I'd be thankful for any advice.
PS. I know I could use some good library like pdflib but I am considering only open licensed libraries now.
EDIT:
I am trying to figure out the syntax for building an .fdf file with multiple pages that uses the same PDF as a template. I spent a few hours on it and couldn't find any good documentation.
After being faced with the same problem for a long time (I wanted to generate my PDFs from LaTeX), I finally decided to switch to another crude but effective technique.
I generate my PDFs in two steps: first I generate HTML with a template engine like Twig or Smarty; second, I use mPDF to generate PDFs from it. I tried many other HTML-to-PDF frameworks and ended up using mPDF; it is very mature and has been developed for a long time (frequent updates, rich functionality). The benefit of this technique is that you can use CSS to design your documents (mPDF has broad CSS support), which brings the usual CSS benefits (http://www.csszengarden.com), and you can generate dynamic tables very easily.
mPDF parses the HTML tables, looks for the thead and tfoot elements, and repeats them on each page if a table is longer than one page. You can also define page header and page footer elements with dynamic entities such as the page number.
I know this detour looks like a workaround, but to be honest, no LaTeX or PDF engine is as strong and simple as HTML!
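To illustrate the first of those two steps without pulling in a template engine, here is a rough sketch in plain PHP that builds a table with a thead section. The function name renderTable and the sample data are made up; the second step is only hinted at in a comment, because it needs the mPDF package installed (assumed here to be the namespaced \Mpdf\Mpdf class of recent versions).

```php
<?php
// Step 1 only: build the HTML that would later be handed to an HTML-to-PDF
// library. renderTable is a hypothetical helper, not part of any library.
function renderTable(array $headers, array $rows): string
{
    $esc = fn($v) => htmlspecialchars((string) $v, ENT_QUOTES, 'UTF-8');

    // The header row goes into <thead> so a PDF engine can repeat it per page.
    $html = "<table>\n<thead><tr>";
    foreach ($headers as $header) {
        $html .= '<th>' . $esc($header) . '</th>';
    }
    $html .= "</tr></thead>\n<tbody>\n";

    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            $html .= '<td>' . $esc($cell) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</tbody>\n</table>\n";
}

$html = renderTable(['Name', 'Amount'], [['Ann', '10'], ['Bob & Co', '20']]);
echo $html;

// Step 2 (not run here, requires the mPDF package):
//   $mpdf = new \Mpdf\Mpdf();
//   $mpdf->WriteHTML($html);
//   $mpdf->Output('report.pdf', 'F');
```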
Try a different, less complex library like FPDF (http://www.fpdf.org/).
I find it quite good and lightweight.
Always find libraries that are small and only do what you need them to do.
The bigger the library the more resources it consumes.
This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.
You may be able to send the .fdf data to the pdftk process via its stdin, in order to avoid having to write it to disk.
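A rough sketch of that stdin idea, assuming pdftk is on the PATH and PHP 7.4 or later (for the array form of proc_open). The function names buildFdf, fdfEscape and fillForm are made up for this example, the FDF writer is deliberately minimal (plain ASCII text fields only), and it has not been measured against the 200-record case from the question.

```php
<?php
// Escape characters that are special inside a PDF literal string.
function fdfEscape(string $value): string
{
    return strtr($value, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']);
}

// Build a minimal FDF document for one record (field name => value).
// Assumption: ASCII values only; other encodings need extra handling.
function buildFdf(array $fields): string
{
    $entries = '';
    foreach ($fields as $name => $value) {
        $entries .= '<< /T (' . fdfEscape((string) $name) . ') /V ('
            . fdfEscape((string) $value) . ") >>\n";
    }
    return "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n" . $entries
        . "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
}

// Fill one form by piping the FDF to pdftk's stdin and reading the
// flattened PDF from its stdout, so no .fdf file is written to disk.
function fillForm(string $templatePdf, array $fields): string
{
    $cmd  = ['pdftk', $templatePdf, 'fill_form', '-', 'output', '-', 'flatten'];
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];

    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start pdftk');
    }

    fwrite($pipes[0], buildFdf($fields));
    fclose($pipes[0]);

    $pdf = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    if (proc_close($proc) !== 0) {
        throw new RuntimeException('pdftk exited with an error');
    }
    return $pdf;
}

// Example call (form.pdf is a placeholder path):
//   $pdfBytes = fillForm('form.pdf', ['firstname' => 'Ann']);
```

This still starts one pdftk process per row, so it only removes the disk writes, not the process start-up cost.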

edit uploaded docx file in PHP

I need to open an uploaded .docx file and be able to change values in it. I know that a .docx file consists of XML files, so the main question is: does anybody know a good web-based WYSIWYG XML editor?
I know one called XOPUS, but I have no idea how to configure it. Maybe somebody knows other alternatives for this task, or can advise how to put an XML file into a text field where I could change the values.
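Since a .docx file is a ZIP archive whose main text lives in word/document.xml, simple value changes can be done without an editor. The following is a minimal sketch assuming PHP's zip extension is available; replaceInDocx is a made-up helper name. One known limitation: Word often splits visible text across several XML runs, so a search string is only found if it is stored contiguously in the XML.

```php
<?php
// Hypothetical helper: apply search => replace pairs to the main document
// part of a .docx file, in place. Requires PHP's zip extension.
function replaceInDocx(string $path, array $replacements): void
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }

    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }

    // Escape replacement values so the part stays well-formed XML.
    $escaped = [];
    foreach ($replacements as $search => $replace) {
        $escaped[$search] = htmlspecialchars((string) $replace, ENT_QUOTES | ENT_XML1, 'UTF-8');
    }

    // Adding an entry under an existing name replaces that entry.
    $zip->addFromString('word/document.xml', strtr($xml, $escaped));
    $zip->close();
}

// Example call (upload.docx is a placeholder path):
//   replaceInDocx('upload.docx', ['{NAME}' => 'Ann']);
```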
There are a couple of PHP toolkits that you can use for this task. First off, there's one in early development on CodePlex:
http://openxmlapi.codeplex.com/
However, you may be better off with one of the more mature ones:
http://holloway.co.nz/docvert/index.html
http://www.phpdocx.com/
Both of these can convert from docx to most of the popular formats, HTML included.
Once you've converted to something like HTML, you can use an on-screen editor such as TinyMCE to provide in-page rich editing, before finally using the above toolkits to convert back to DOCX or any other applicable format:
http://www.tinymce.com/
Update February 2014
Since I first wrote this reply, things have moved on. The Open XML kits I mentioned above are still valid; however, in-page editing is now more of a possibility than ever, using the new HTML5 contenteditable and edit-mode attributes.
It's now very easy to add your own buttons (using something like Bootstrap) above a div that has a contenteditable attribute attached to it.
Connecting the buttons to "document.execCommand" then lets you apply bold, italic and underline, create links and images, insert lists and use all manner of other HTML construction methods directly on this div, without needing TinyMCE or another in-page editor anymore.
Full details are available on the Mozilla Developer Network, and I am planning to do a blog post on using this stuff very soon.
Have you tried PHPWord?
One may use the DocxUtilities class of PHPDocX to do some partial editing of an existing Word document.
This class allows:
searching for and replacing a particular string of text
searching for a string of text and removing the containing paragraph or section
highlighting predefined strings (search and highlight)
full merging of docx files (text, images, charts, footnotes, ...)
If that is not enough for your purposes, you should prepare a PHPDocX template to fully customize an existing Word document.

What's the best way to generate a Word document from a Word template in PHP?

I have a template for a style of Avery stationery in a Word document. What I'd like to do is fill in the template with images (in this case, QR codes) for easy printing and labeling of objects.
I'm wondering, what would be the easiest way to do this? I saved the template as a Word XML file, but looking at the file, I feel hopeless. I also tried converting the template to HTML, but unsurprisingly it screwed up the formatting. I'm not sure where to go next. Any ideas?
There are a couple of good options out there to do this. I would recommend PHP LiveDocX (free) or PHP DocX (free basic version).
Check out the PHP Class, MsDoc Generator. It will let you create and add elements (text, tables, images) dynamically.
