So, I have the following scenario.
I am working on a system for academic papers. It has several inputs for things like author name, coauthors, title, type of paper, introduction, objectives and so on, and I store all that information in a database. The user has a Preview button which, when clicked, generates a Word file asynchronously and sends the file location back to the browser, where the file is then shown in an iframe using Google Doc Viewer.
There's a specific use case where the user/author of the paper can attach a .docx file with a table, or a .jpeg file for a figure. That table/figure has to be included inside the final .docx file.
For the .docx generation process I am using PHPWord.
So up until this point everything works fine, but my issues start when I try to mix everything and put together the .docx file.
Approach Number One
My first approach was to do everything with PHPWord. I create the file, add the text where required and, in the case of the image, just insert the image followed by the figure caption below it.
Things get tricky, though, when I try doing the same thing with the .docx table file. My only option was to get the table XML using this. It did the trick, but when I opened the resulting Word file, the table was there but had lost all of its styling and had transparent borders. Because of those transparent borders, the borders were ignored when converting the file to PDF, and the table content came out as scrambled text.
Approach Number Two (current one)
After fighting with Approach Number One and just complicating stuff more, I decided to do something different. Since I already generated one docx file with the main paper information and I needed to add another docx file, I decided to use the DocX Merge Library.
So, what I basically did was generate three Word files: one for the main paper information, one for the table and one for the table caption. The caption gets its own file mainly to avoid overcomplicating the order of the information, and because that data is not in the table .docx file.
Then I run this:
$dm->merge( [
'paper-info.docx',
'attached-table.docx',
'attached-table-caption.docx'
], 'complete-file.docx');
So, afterwards, I check and the Word file is generated just as I need it with the table maintaining its original styles and dimensions.
If I open it in LibreOffice though, I get this error message:
Then, if I continue and open the file, it opens correctly with all the data, except that it no longer respects the fonts as they appear in Word.
The real problem comes in the next step. I need to present a preview of the file using Google Doc Viewer, with this syntax:
<iframe src="https://docs.google.com/gview?embedded=true&hl=es_LA&url=https://usersite.net/complete-file.docx?pid=explorer&efh=false&a=v&chrome=false&embedded=true" width="100%" height="600" style="border: none;"></iframe>
The document loads fine, but when I review it, it only shows the content of the first file, paper-info.docx, and ends right where the table and table caption should appear. If I open the exact same file in Word, it shows the table and caption.
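One thing that may be worth ruling out, independently of the merge, is the encoding of the url parameter in the iframe above: the viewer options are appended after the document URL with a second "?", so it is not obvious which parameters belong to which URL. The sketch below only shows how the viewer address could be built with the document URL percent-encoded. buildViewerUrl() is a hypothetical helper, the extra viewer options are left out for brevity, and this is not confirmed to change what the viewer renders.

```php
<?php
// Build a Google Docs Viewer address with the document URL percent-encoded.
// buildViewerUrl() is a hypothetical helper; host and file name come from the question.
function buildViewerUrl(string $documentUrl, string $language = 'es_LA'): string
{
    // http_build_query() percent-encodes each value, so any "?" or "&" inside
    // $documentUrl stays inside the "url" parameter.
    $query = http_build_query([
        'embedded' => 'true',
        'hl'       => $language,
        'url'      => $documentUrl,
    ], '', '&');

    return 'https://docs.google.com/gview?' . $query;
}

$src = buildViewerUrl('https://usersite.net/complete-file.docx');

// htmlspecialchars() turns "&" into "&amp;" so the attribute is valid HTML.
echo '<iframe src="' . htmlspecialchars($src) . '" width="100%" height="600" style="border: none;"></iframe>', PHP_EOL;
```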
The other issue is when I try to convert the file to PDF.
If I use PHPWord's PDF conversion in combination with DomPDF, I get the exact same issue as with the Google Docs Viewer: only the content of the first file appears. This is the code:
$phpWordPDF = \PhpOffice\PhpWord\IOFactory::load('complete-file.docx');
$xmlWriterPDF = \PhpOffice\PhpWord\IOFactory::createWriter($phpWordPDF, 'PDF');
$xmlWriterPDF->save('complete-file-pdf');
So my only other viable route was to use LibreOffice's command line using this command:
soffice --headless --convert-to pdf complete-file.docx
This converts the file correctly, but it has the issue mentioned above when opening the .docx file in LibreOffice: the font styles are broken.
The other weird part is that if I try to run this from my PHP script:
shell_exec('soffice --headless --convert-to pdf complete-file.docx');
Nothing happens.
I am running Apache 2.4.25, PHP 7.4.11 on Windows 10 x64.
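To see why the call appears to do nothing, it may help to capture the exit code and the error stream, since shell_exec() returns null both when the command fails and when it prints nothing, and it does not capture stderr. The following is a minimal diagnostic sketch; runCommand() is a hypothetical helper, not part of PHP or LibreOffice.

```php
<?php
// Run a shell command and return its combined output and exit code.
// runCommand() is a hypothetical helper written for this illustration.
function runCommand(string $command): array
{
    $lines = [];
    $exitCode = -1;
    // "2>&1" merges stderr into stdout so error messages are not lost.
    exec($command . ' 2>&1', $lines, $exitCode);

    return ['output' => implode("\n", $lines), 'exitCode' => $exitCode];
}

// Sanity check with a harmless command first.
$result = runCommand('echo hello');
echo $result['exitCode'], ': ', $result['output'], PHP_EOL;

// Then the real conversion (left commented out because it needs LibreOffice):
// $result = runCommand('soffice --headless --convert-to pdf complete-file.docx');
// var_dump($result);
```

A non-zero exit code or a "not recognized" message would suggest that soffice is not on the PATH seen by the Apache process, in which case an absolute path to the executable is worth trying. That is only a guess until the captured output confirms it.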
Conclusion
Until now my best result was by merging the files, but it also caused this issue. So maybe the issue is coming from the merging process I am using. What would be ideal is to be able to just insert the table with styles and everything using PHPWord, but I haven't been able to and haven't found any examples on how to do that.
Another option that I've seen is this library, but the merge feature is only included in the license that costs $599 USD, and I am not sure it would solve my issue. Since I am pretty close to solving this, I'd invest in it if it does, because I need to get this done ASAP, but I wanted to check with you guys what your recommendations would be for this case. Maybe another merging library, or doing everything via PHPWord.
Help is appreciated!
After a lot of attempts to fix it, I wasn't able to achieve what I wanted with PHPWord and the merging library I mentioned.
Since I needed to fix this I decided to invest in the paid library I mentioned in my question. It was an expensive purchase, but for those who are interested, it does exactly what was required and it does it perfectly.
The two main functions I required were document merging and importing of content to a .docx file.
So I had to purchase the Premium package. Once there, the library literally does everything for you.
Example code for merging docx files:
require_once 'classes/MultiMerge.php';
$merge = new MultiMerge();
$merge->mergeDocx('document.docx', array('second.docx', 'other.docx'), 'output.docx', array());
Example of how to import a table from another docx file:
require_once 'classes/CreateDocx.php';
$docx = new CreateDocxFromTemplate('document.docx');
// import tables
$referenceNode = array(
'type' => 'table',
);
$docx->importContents('document_1.docx', $referenceNode);
$docx->createDocx('output');
As you can see it is pretty easy. This answer is by no means an ad for this library, but for those that have the same problem as me, this is a life saver.
Auto-generate multiple PHP files from Template and excel files
I am looking for suggestions on how to efficiently auto-generate multiple PHP (and HTML) files from a template document, pulling the fields from an Excel file.
What I want to do is:
1. Populate the fields in the template file from an excel file, each excel row will generate a new file
2. Save it as a PHP file
3. Name each generated file based on a specified field in the corresponding row
I am intentionally trying to create multiple, separate static HTML and PHP pages. I have been using mail merge with Word and Excel, but it takes too long to resave each Word file as PHP, rename it, and so on. I am not sure how to do this programmatically, and my skill set is limited.
Open to different approaches to handle this, appreciate any help and thoughts.
Thanks!
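One possible way to do the three steps above without Word is sketched here. It assumes the Excel sheet has first been exported to CSV with a header row, and that the template marks fields as {column_name}. The function name generateFiles(), the placeholder syntax and the sample columns (slug, title, body) are all made up for the illustration.

```php
<?php
// Generate one file per spreadsheet row from a template with {placeholder} fields.
// Assumes a CSV export with a header row; generateFiles() is a hypothetical helper.
function generateFiles(string $csvPath, string $template, string $nameField, string $outDir): array
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    $header = fgetcsv($handle, 0, ',', '"', '\\');   // first row holds the field names
    if ($header === false) {
        fclose($handle);
        throw new RuntimeException("No header row in $csvPath");
    }
    $written = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($row) !== count($header)) {
            continue;                                 // skip blank or malformed rows
        }
        $fields = array_combine($header, $row);
        $pairs = [];
        foreach ($fields as $name => $value) {
            // Escape values so they are safe inside HTML markup.
            $pairs['{' . $name . '}'] = htmlspecialchars($value);
        }
        // Keep only safe characters in the file name taken from the row.
        $base = preg_replace('/[^A-Za-z0-9_-]/', '_', $fields[$nameField]);
        $path = $outDir . DIRECTORY_SEPARATOR . $base . '.php';
        file_put_contents($path, strtr($template, $pairs));
        $written[] = $path;
    }
    fclose($handle);

    return $written;
}

// Usage with a tiny made-up data set written to a temporary directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'pages_' . uniqid();
mkdir($dir);
$csv = $dir . DIRECTORY_SEPARATOR . 'rows.csv';
file_put_contents($csv, "slug,title,body\nabout,About us,Hello & welcome\ncontact,Contact,Write to us\n");
$template = '<html><head><title>{title}</title></head><body><p>{body}</p></body></html>';
$files = generateFiles($csv, $template, 'slug', $dir);
print_r($files);
```

strtr() replaces all placeholders in a single pass, so a value that happens to contain "{title}" is not substituted again. If the template contains PHP code that should receive raw values, the htmlspecialchars() call would need to be reconsidered.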
I have to create a document from user-entered data, and include the contents of a .rtf document, in a web page layout I created (HTML + CSS, with PHP for scripting).
My problem is that I can't find any way to obtain the full content of the .rtf document.
Since it is a technical document, symbols, tables, graphs and images are very often included. With the methods I've found I could obtain the text and symbols with decent formatting, but I had no luck with images.
So what I need is a way to obtain the full content of a .rtf file, if possible keeping the document formatting, so I can display and organize it in a web page. Pure PHP is preferred, but using JS or executables called from PHP is fine.
I've tried:
- RTF to HTML converters: the best I could get was plain text and symbols, but no images;
- using the COM extension to open the .rtf in MS Word and save it as .html (I noticed that if I open the .rtf and save it as a web page in Word by hand, it creates a perfect HTML page): it only changed the extension and didn't create an HTML page;
- extracting text and images separately: this works, but since it is a technical document, image placement is very important.
It's my first question here, after a lot of research; please bear with me in case of errors.
Using TinyButStrong and OpenTBS I created a PHP script that opens multiple docx templates and replaces a lot of variables with values from a database. In a nutshell, clients can download their unique files, add information and pictures, and upload them again. This works excellently. But of course I wouldn't post here if there wasn't some sort of problem.
Because of the barcodes (I am using barcode fonts and embedding them in Word, because the documents will be scanned much later in the process), the documents get huge. Instead of 100 KB on average, they easily reach 7 MB. This is a problem, because about 20.000 documents will be scanned per year. That's an extra +/- 130 GB per year.
It's a long story but we need docx, so we can't simply replace it with some sort of PHP / MySQL template that would be far more efficient.
Word has the option to embed only the font symbols that are actually used, to cut down the size. But that isn't an option, because the main template needs to have all characters available. It's also not an option to send the font to the users, since there are +/- 20.000 new ones each year.
Is there another solution to cut the file size or use compression? Perhaps in Word, PHP, FTP or Apache?
I'm afraid the option "Embed fonts in the file" with "Embed only characters used in the document" cannot be used here. MS Word saves the font in a special format with the extension ODTTF (for example, you have it in "word\fonts\font1.odttf"). This format is binary and seems badly documented, so in practice it remains a proprietary format. Only MS Word will be able to build such a sub-file.
Since you don't have a lighter font for the barcode, the only solution I can see is to use an image instead of a font for your barcode:
OpenTBS has a feature to easily replace a picture inside a DOCX file (parameter "ope=changepic").
Barcode-to-image tools are easy to find for PHP, for example Barcode Generator.
Then you only have to code your process like this:
1. Load the DOCX template.
2. Create the temporary image of the barcode.
3. Change the image inside the template.
4. Merge the template, and save or send the result.
5. Delete the temporary image.
It's important to delete the temporary image only after the final merge of the template, because OpenTBS actually inserts the image only when method $tbs->Show() is called.
It's also important to use a different temporary file for each merge, because many merges can occur at the same time.
If temporary files have a prefix or are saved into a dedicated directory, then it is advisable to clean up old temporary images regularly.
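The two points about temporary files can be sketched in plain PHP as follows. This only covers naming and cleanup, not the OpenTBS calls themselves. The "barcode_" prefix, the .png extension, the one-hour limit and both helper names are assumptions made for the illustration.

```php
<?php
// Return a unique path for one merge's temporary barcode image.
// newBarcodeTempPath() and the "barcode_" prefix are hypothetical.
function newBarcodeTempPath(string $dir): string
{
    // 16 random hex characters make collisions between concurrent merges
    // practically impossible.
    return $dir . DIRECTORY_SEPARATOR . 'barcode_' . bin2hex(random_bytes(8)) . '.png';
}

// Delete temporary barcode images older than $maxAgeSeconds; returns the count.
function purgeOldBarcodeTemps(string $dir, int $maxAgeSeconds): int
{
    clearstatcache();
    $deleted = 0;
    $files = glob($dir . DIRECTORY_SEPARATOR . 'barcode_*.png') ?: [];
    foreach ($files as $file) {
        if (is_file($file) && time() - filemtime($file) > $maxAgeSeconds) {
            unlink($file);
            $deleted++;
        }
    }

    return $deleted;
}

// Usage: two placeholder files, one of them made to look two hours old.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'barcode_tmp_' . uniqid();
mkdir($dir);
$old = newBarcodeTempPath($dir);
$fresh = newBarcodeTempPath($dir);
file_put_contents($old, 'placeholder image bytes');
file_put_contents($fresh, 'placeholder image bytes');
touch($old, time() - 7200);

$deleted = purgeOldBarcodeTemps($dir, 3600);
echo "Deleted $deleted stale file(s)", PHP_EOL;
```

The age limit should be comfortably longer than the slowest merge, so that the cleanup never removes an image that a running merge has created but not yet inserted.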
I am trying to read a .doc file and find tokens like {name}, {phone}, {address}, etc. I then want to display each token with a text box so the user can enter the actual data, and replace the tokens in the .doc file with that data. How can I do this using PHP? The colors, fonts and styles of the .doc should not be changed.
thanks....
This will be very tricky if you are using old-style Word documents (.doc). The new Word documents (.docx) are saved as a Zip archive of XML files and are therefore much easier to edit.
You can extract these files and, with some knowledge of the contents and of WordprocessingML (the XML format Word uses), edit the contents of the file.
It is much easier to make use of the PHPDocX library. We are using it in a project and it works like a charm. The only disadvantage is that it only works with .docx files.
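For the do-it-yourself route on .docx files, a rough sketch could look like the code below. It is not how PHPDocX works. It assumes the zip extension is available for the last function, that the body text lives in word/document.xml, and that each token sits inside a single text run. Word often splits text across several runs, in which case a plain search like this will miss the token. All three function names are hypothetical.

```php
<?php
// List the distinct {token} names found in a chunk of document XML.
function findTokens(string $xml): array
{
    preg_match_all('/\{([A-Za-z0-9_]+)\}/', $xml, $matches);

    return array_values(array_unique($matches[1]));
}

// Replace {token} placeholders with XML-escaped user values.
function fillTokens(string $xml, array $values): string
{
    $pairs = [];
    foreach ($values as $name => $value) {
        $pairs['{' . $name . '}'] = htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    }

    return strtr($xml, $pairs);
}

// Copy a .docx and apply fillTokens() to its main body part.
// Needs the zip extension; only the text is touched, so styles stay as they are.
function fillDocx(string $source, string $target, array $values): void
{
    if (!copy($source, $target)) {
        throw new RuntimeException("Cannot copy $source");
    }
    $zip = new ZipArchive();
    if ($zip->open($target) !== true) {
        throw new RuntimeException("Cannot open $target as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }
    $zip->addFromString('word/document.xml', fillTokens($xml, $values));
    $zip->close();
}

// Usage on a small XML fragment (no real .docx needed for this part).
$sample = '<w:p><w:r><w:t>Name: {name}, phone: {phone}</w:t></w:r></w:p>';
$tokens = findTokens($sample);          // names to show as text boxes
$filled = fillTokens($sample, ['name' => 'Ann & Bob', 'phone' => '555-0100']);
print_r($tokens);
echo $filled, PHP_EOL;
```

findTokens() gives the list of fields to present to the user, and fillDocx() would then be called with the submitted values. Escaping the values matters because an unescaped "&" or "<" would make the document XML invalid and Word would refuse to open the file.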