Generating pdf files from docx templates from mysql database - php

I've tried just about everything i could find so far. What i'm trying to do is save docx files in a mysql database, in which i want to replace a few strings and then convert it to a pdf file without any saving and put that in an html object element with application/pdf in the data tag.
I've tried phpword and gears\pdf but it was mostly outdated and even when i patched them up it still required saving the files in the website directory. I've looked at phpdocx, but as far as i can tell it's not exactly what i need and i'd rather not pay that much money without being sure it is.
So my questions are:
Is there any way at all to do this without having to buy phpdocx?
If not, is phpdoxc what i need?
If not, are there any resources i can use to figure out how these files are made up so i can make this conversion myself without any libraries?
I hope someone can help me, i'm getting really desperate

Related

Issue converting to .pdf a merged .docx file that opens fine in Word

So, I have the following scenario.
I am working on a system for academical papers. I have several inputs that are for stuff like author name, coauthors, title, type of paper, introduction, objectives and so on. I store all that information in a database. The user has a Preview button which when clicked, generates a Word asynchronously and sends the file location back to the user and that file is afterwards shown to the user in an iframe using Google Doc Viewer.
There's a specific use case where the user/author of the paper can attach a .docx file with a table, or a .jpeg file for a figure. That table/figure has to be included inside the final .docx file.
For the .docx generation process I am using PHPWord.
So up until this point everything works fine, but my issues start when I try to mix everything and put together the .docx file.
Approach Number One
My first approach on doing this was to do everything with PHPWord. I create the file, add the texts where required and in the case of the image just insert the image and after that the figure caption below the image.
Things get tricky though, when I try doing the same thing with the .docx table file. My only option was to get the table XML using this. It did the trick, but the problem I ran into was that when I opened the resulting Word file, the table was there, but had lost all of its styling and had transparent borders. Because of those transparent borders, afterwards when converting it to PDF the borders were ignored and the table info is just scrambled text.
Approach Number Two (current one)
After fighting with Approach Number One and just complicating stuff more, I decided to do something different. Since I already generated one docx file with the main paper information and I needed to add another docx file, I decided to use the DocX Merge Library.
So, what i basically did was I have three generated word files, one for the main paper information, one for the table and one for the table caption (that last one is mainly to not overcomplicated the order of information). Also, that data is not in the table .docx file.
Then I run this:
$dm->merge( [
'paper-info.docx',
'attached-table.docx',
'attached-table-caption.docx'
], 'complete-file.docx');
So, afterwards, I check and the Word file is generated just as I need it with the table maintaining its original styles and dimensions.
If I open it in LibreOffice though, I get this error message:
Then if I continue and open the file, the file opens correctly with all the data with the only exception that it no longer respects the fonts of the file as they appear in Word.
So, the problem comes in the next step. Since I need to present a preview of the file using Google Doc Viewer using this syntax:
<iframe src="https://docs.google.com/gview?embedded=true&hl=es_LA&url=https://usersite.net/complete-file.docx?pid=explorer&efh=false&a=v&chrome=false&embedded=true" width="100%" height="600" style="border: none;"></iframe>
The document gets loaded fine, but when I review it what I see is that it only shows the content of the first paper-info.docx file and ends right where the table and table caption should appear. I open the exact same file in Word and it shows the table and caption.
The other issue is when I try to convert the file to PDF.
If I use PHPWord's method of conversion in combination with DomPDF I get the exact same issue as with the Google Docs Viewer, I just have the content of the first file, using this code:
$phpWordPDF = \PhpOffice\PhpWord\IOFactory::load('complete-file.docx');
$xmlWriterPDF = \PhpOffice\PhpWord\IOFactory::createWriter($phpWordPDF, 'PDF');
$xmlWriterPDF->save('complete-file-pdf');
So my only other viable route was to use LibreOffice's command line using this command:
soffice --headless --convert-to pdf complete-file.docx
This converts the file correctly, but has the issue mentioned when trying to open the .docx file in LibreOffice, the font styles are disconfigured.
Also weird part is that if I try to run this in my PHP script:
shell_exec('soffice --headless --convert-to pdf complete-file.docx');
Nothing happens.
I am running Apache 2.4.25, PHP 7.4.11 on Windows 10 x64.
Conclusion
Until now my best result was by merging the files, but it also caused this issue. So maybe the issue is coming from the merging process I am using. What would be ideal is to be able to just insert the table with styles and everything using PHPWord, but I haven't been able to and haven't found any examples on how to do that.
Another option that I've seen is this library, but the merge features is only in the license that's $599 USD, and since I am pretty close to solving this, I am not sure if it would solve my issue. If it does, I'd invest in it since I need to get this done ASAP, but I wanted to check with you guys what your recommendations would be for this case. Maybe another merging library or doing everything via PHPWord.
Help is appreciated!
After a lot of attempts to fix it, I wasn't able to achieve what I wanted with PHPWord and the merging library I mentioned.
Since I needed to fix this I decided to invest in the paid library I mentioned in my question. It was an expensive purchase, but for those who are interested, it does exactly what was required and it does it perfectly.
The two main functions I required were document merging and importing of content to a .docx file.
So I had to purchase the Premium package. Once there, the library literally does everything for you.
Example for docx files merge code:
require_once 'classes/MultiMerge.php';
$merge = new MultiMerge();
$merge->mergeDocx('document.docx', array('second.docx', 'other.docx'), 'output.docx', array());
Example for how to import a table from another docx file
require_once 'classes/CreateDocx.php';
$docx = new CreateDocxFromTemplate('document.docx');
// import tables
$referenceNode = array(
'type' => 'table',
);
$docx->importContents('document_1.docx', $referenceNode);
$docx->createDocx('output');
As you can see it is pretty easy. This answer is by no means an ad for this library, but for those that have the same problem as me, this is a life saver.

How to make phpExcel to PDF doc fit in one page

I'm trying to save docs made by phpExcel into pdf format. Everything is fine, except for the fact that my file is very wide and doesn't fit in one page (so part of the data is lost behind the doc border)
I tried to use:
$phpExcel->getActiveSheet()->getPageSetup()->setFitToPage(true);
$phpExcel->getActiveSheet()->getPageSetup()->setFitToWidth(1);
$phpExcel->getActiveSheet()->getPageSetup()->setFitToHeight(0);
But nothing changed. (And as I understand these only works for print version, not for converting to PDF). Does anyone has any ideas how to improve that?
I solve this issue with the following command:
$phpExcel->getActiveSheet()->getPageSetup()->setPaperSize(PHPExcel_Worksheet_PageSetup::PAPERSIZE_A3);

Extracting text from PDFs in PHP

I'm creating a php based web application which allows the user to upload a PDF file. This file will then be read and checked for certain data (text).
The problem is I can't figure out how to even open a PDF file in PHP. There are some PDF libraries mainly for creating PDF's, but they don't seem to be very good at reading them.
An alternative solution would be to use an already available solution in Python or something else (as described in other threads on this site) but I'd really like to stay as much as possible in PHP as I intend to later export the data to mysql, etc.
Any input on how to read a PDF and extract data from it would be much appreciated.
I personally haven't tried this out, but it looks like this one works: http://www.pdfparser.org/documentation
It's just a matter of downloading and telling your code to include it, just like the documentation shows.
Or you could try the class.pdf2text.php found in http://www.phpclasses.org/browse/file/31030.html

Submit HTML form to PDF

We have a high-resolution PDF (for printing) which has some form fields on it. We would like to have an HTML form which submits to the PDF, which is then placed into the respective fields.
I found a solution on google: http://koivi.com/fill-pdf-form-fields/
However, with that solution you only get an FDF file... And the demo does not work for me, opening the FDF file simply downloads another FDF file.
Since this PDF will be available to the public we would like to keep it as simple as possible. If we must open our original PDF and import this FDF file, we need a different solution (which I'm not sure is what the FDF file is for, since it didn't work).
A related post talking about .net framework had the same idea, but there were only paid commercial solutions: From HTML form to PDF
The PHP solutions I have found so far are for creating a new PDF, which is not what I need. Our PDF is created with Adobe Illustrator (or a similar adobe product) and is high-res with embedded fonts, svg and image content.
The form elements are in place, we just need to get the data to there.
Update April 11, 2013:
Since posting this question I have been utilizing FPDF on multiple projects where I needed to accomplish this goal. Although it cannot seem to "merge" template PDFs with the provided data, it can create the PDF from scratch.
One example I have used, I had a high resolution PNG for printing (similar to initial question) which we had to write the customer's name and today's date clearly in the center. I simply made the background of the PDF using FPDF->Image() and write the text afterwards using FPDF->Text().
It was very simple after all, you will need to look up the paper sizes to determine the X,Y,W,H of the image and then base your text fields relative to those numbers.
There was even a Form Filling extension, but I couldn't get it to work.
It seems as though I should answer my own question, although Visions answer may be better (seems to be deleted?). I used Vasiliy Faronov's link which was a comment to my main question: https://stackoverflow.com/a/1890835/200445
Here I found how to install pdftk and run a command to merge (flatten) my FDF and PDF files. I still used the "hacky" way to generate an FDF using Koivi's FDF Generator but it works for the most part.
One caveat is that some characters, like single and double quotes are not inserted correctly. It may be an issue of escaping the fields, but I could not find an answer.
Regardless, my PDF form generator is working, but anyone with a similar issue should look for a better solution.
There are number of tools which are not paid like itextsharp. try the following https://web.archive.org/web/20211020001747/https://www.4guysfromrolla.com/articles/030211-1.aspx Hope this code will help you. I have tried it its worked for me. If you can pay then there are number of paid tools which convert the HtML to PDF like ABCPDF etc.This example is in Asp.net and i am sure if you can convert it in PHP it will work for you too.

Parsing PDF file using PHP to submit values to MySQL

I want to know if it is possible to view a PDF file in a webapp of some kind, then be able to trace over lines in the PDF file and submit the lengths to an MySQL database.
Basically import a PDF file, trace around a shape in that PDF file and save those values.
I work with House plans and this would make my job a lot easier.
Is it possible? and can someone point me in the right direction on where to read more.
I know its possible to pull text from a PDF file, I've seen heaps of info on how to do that using various libraries, but nothing on a Vector shape (house plan).
If it's not possible in PHP which language should I look into, the only reason I ask is because I have basic skills in PHP and have already written some simple apps that help me out, this would just be the final touch to make my job easier.
Thanks
Parsing PDFs is a world of pain... And PHP is probably as good language to jump into this as any other.
I would recommend you to start your research with GhostScript, it has a CLI interface which could be accessed by php

Categories