docx to html with phpword issue - php

I'm encountering an issue when converting a docx document into HTML with the PHPWord library (https://github.com/PHPOffice/PHPWord).
Here is the code snippet I use:
$phpWord = \PhpOffice\PhpWord\IOFactory::load('test.docx');
$htmlWriter = new \PhpOffice\PhpWord\Writer\HTML($phpWord);
$htmlWriter->save('test.html');
The issue is that each block of text is wrapped in <p> tags regardless of whether I defined titles (headings) in the docx document. I would expect <h1>, <h2>, ... tags to be generated. Bullet lists are lost too.
Does it work as designed or did I miss something?
Thank you for your feedback.
Regards

IOFactory::load in PHPWord can produce results like the ones you describe, depending on which application, or which version of Microsoft Word, saved the file. If PHPWord cannot find the encoding and tags it expects in the docx file, it produces unexpected results.
Your code is fine; the problem lies in the library.
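One way to see what the file actually contains is to list the paragraph styles referenced in word/document.xml. This is a minimal sketch, assuming the zip extension is available; both helper names are hypothetical and not part of PHPWord. If the heading paragraphs carry no w:pStyle value such as Heading1 (the exact style IDs depend on what saved the file), a converter has little to map to <h1>.

```php
<?php
// Hypothetical helper: list the distinct paragraph style IDs referenced in a
// document.xml string (the main part of a .docx archive).
function listParagraphStyles(string $documentXml): array
{
    preg_match_all('/<w:pStyle\b[^>]*\bw:val="([^"]*)"/', $documentXml, $m);
    return array_values(array_unique($m[1]));
}

// Hypothetical helper: read word/document.xml out of a .docx file.
// Returns null if the archive or the entry cannot be read.
function readDocumentXml(string $docxPath): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return null;
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? null : $xml;
}

// Usage, with the file name from the question:
// print_r(listParagraphStyles(readDocumentXml('test.docx') ?? ''));
```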

Related

OpenTBS Excel number format

I'm trying to format numbers in an Excel file using PHP and OpenTBS.
Here's the template code I'm working from:
[gross_pay_names;block=begin;sub1=departments][gross_pay_names.name]
[gross_pay_names_sub1;block=begin]
[gross_pay_names_sub1.val; ope=tbs:num]
[gross_pay_names_sub1;block=end]
[gross_pay_names;block=end]
The problem is in the third line:
[gross_pay_names_sub1.val; ope=tbs:num]
It always renders with an apostrophe at the beginning ('0.00), so I can't use it in other formulas in the file.
OK, I found a solution myself. In case anybody needs it in the future, here's the template code I ended up using:
[gross_pay_names.name;block=tbs:row;sub1=departments] [gross_pay_names_sub1.val;block=tbs:cell;ope=tbs:num]
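For reference, here is a sketch of the data shape that template would expect on the PHP side, assuming the block is merged with TinyButStrong's MergeBlock and that sub1=departments points at a nested array in each record. The sample names and amounts are illustrative only; the values are kept as floats rather than formatted strings.

```php
<?php
// Illustrative data only: one record per row, with a nested "departments"
// array that feeds the gross_pay_names_sub1 sub-block.
$grossPayNames = [
    [
        'name'        => 'Salary',
        'departments' => [['val' => 1200.50], ['val' => 0.0]],
    ],
    [
        'name'        => 'Bonus',
        'departments' => [['val' => 300.0], ['val' => 75.25]],
    ],
];

// Assumed merge call, with $TBS being a TinyButStrong instance that has the
// OpenTBS plugin installed and the template loaded:
// $TBS->MergeBlock('gross_pay_names', $grossPayNames);
```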

Convert HTML code to doc using PHP and PHPWord

I am using PHPWord to load a docx template and replace tags like {test}. This is working perfectly fine.
But I want to replace a value with HTML code. Directly replacing it into the template is not possible. There is no way to do this using PHPWord, as far as I know.
I looked at htmltodocx, but it seems it will not work either. Is it possible to transform a piece of code like <p>Test<b>test</b><br>test</p> into working doc markup? I only need the basic code, no styling, but line breaks have to work.
Here is the link to the GitHub project, Html-Docx-js. It works fine.
A demo is also available here.
Another option is this link.
$toOpenXML = HTMLtoOpenXML::getInstance()->fromHTML("<p>te<b>s</b>t</p>");
$templateProcessor->setValue('test', $toOpenXML);
The other answers propose H2OXML, which only supports:
Bold, italic and underlined text
Bulleted lists
This is as described in their docs, and their last update was in 2012.
I did some research and found a pretty nice solution:
$var = 'Some text';
$xml = "<w:p><w:r><w:rPr><w:strike/></w:rPr><w:t>". $var."</w:t></w:r></w:p>";
$templateProcessor->setValue('param_1', $xml);
The example above shows how to produce struck-through text. Instead of "w:strike" you can use "w:i" for italic or "w:b" for bold, and so on. I'm not sure whether it works with all tags.
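One caveat with building the XML by string concatenation: if $var contains <, > or &, the resulting document XML is no longer well-formed. Here is a minimal sketch that escapes the text first. buildParagraphXml is a hypothetical helper, not part of PHPWord, and it only allows the three formatting tags mentioned above.

```php
<?php
// Hypothetical helper (not part of PHPWord): build a one-run WordprocessingML
// paragraph with optional formatting, escaping the text for XML.
function buildParagraphXml(string $text, array $formats = []): string
{
    // Only the run properties mentioned above are allowed here.
    $allowed = ['b', 'i', 'strike'];
    $props = '';
    foreach (array_intersect($allowed, $formats) as $tag) {
        $props .= "<w:{$tag}/>";
    }
    $rPr = $props === '' ? '' : "<w:rPr>{$props}</w:rPr>";
    // Escape &, < and > (and quotes) so the text cannot break the XML.
    $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return "<w:p><w:r>{$rPr}<w:t xml:space=\"preserve\">{$escaped}</w:t></w:r></w:p>";
}

// Usage with the template processor from the example above:
// $templateProcessor->setValue('param_1', buildParagraphXml($var, ['strike']));
```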
Thanks for your answer, Varun.
The simple PHP library H2OXML works for me https://h2openxml.codeplex.com/
$toOpenXML = HTMLtoOpenXML::getInstance()->fromHTML("<p>te<b>s</b>t</p>");
$templateProcessor->setValue('test', $toOpenXML);
I can now convert HTML code and insert it using PHPWord.
$content = '<p>Test<b>test</b><br>test</p>';
// Call this before IOFactory::createWriter():
\PhpOffice\PhpWord\Shared\Html::addHtml($section, $content);

PHP pdf form parse regex

I have two PDF forms that I'd like to fill in with values using PHP. There don't seem to be any open source solutions. The only solution seems to be SetaSign, which costs over $400. So instead I'm trying to dump the data as a string, parse it using a regex and then save it. This is what I have so far:
$pdf = file_get_contents("../forms/mypdf.pdf");
$decode = utf8_decode($pdf);
$re = "/(\d+)\s(?:0 obj <>\/AP<>\/)(.*)(?:>> endobj)/U";
preg_match_all($re, $decode, $matches);
print_r($matches);
However, my print_r is empty even after testing here. The matches on the right are first a numerical identifier for the field (I think) and then V(XX1) where "XX1" is the text I've manually entered into the form and saved (as a test to find how and where that data is stored). I'm assuming (but haven't tested) that N<>>>/AS/Off is a checkbox.
Is there something I need to change in my regex to find matches like (2811 0 obj <>/AP<>/V(XX2)>> endobj) where the first find will be a key and the second find is the value?
Part 1 - Extract text from PDF
Download class.pdf2text.php from http://pastebin.com/dvwySU1a (updated on 5 April 2014) or http://www.phpclasses.org/browse/file/31030.html (registration required).
Usage:
include('class.pdf2text.php');
$a = new PDF2Text();
$a->setFilename('test.pdf');
$a->decodePDF();
echo $a->output();
The class doesn't work with all the PDFs I've tested; give it a try and you may get lucky :)
Part 2 - Write to PDF
To write the PDF contents, use TCPDF, which is an enhanced and maintained version of FPDF.
Thanks to those who've looked into this. I decided to convert the PDFs (since I'm not doing this as a batch) into SVG files. This online converter kept the form fields, and with some small edits I've made them printable. Now I'll be able to populate the values and have a visual representation of the PDF. I may try TCPDF in the event I want to make it an actual PDF again, though I'm assuming it won't keep the form fields.
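Returning to the regex in the question: one likely stumbling block is that "." does not match newlines by default, so an object whose dictionary spans several lines never matches. Here is a minimal sketch that first splits the file into objects and then looks for a literal /V(...) value inside each one. extractFieldValues is a hypothetical helper, and it assumes an uncompressed PDF with literal-string values. It is not a real PDF parser. It also works on the raw bytes, without utf8_decode (which is deprecated as of PHP 8.2).

```php
<?php
// Hypothetical helper: map object number => literal /V value for every
// object in an uncompressed PDF string that has one.
function extractFieldValues(string $pdf): array
{
    $values = [];
    // The "s" modifier lets "." span newlines; ".*?" is lazy, so each match
    // stops at the first "endobj".
    preg_match_all('/(\d+)\s+\d+\s+obj\b(.*?)\bendobj\b/s', $pdf, $objects, PREG_SET_ORDER);
    foreach ($objects as $object) {
        // Look for "/V(...)" inside this object only; backslash-escaped
        // characters such as "\)" are allowed inside the parentheses.
        if (preg_match('/\/V\s*\(((?:\\\\.|[^\\\\)])*)\)/s', $object[2], $m)) {
            $values[(int) $object[1]] = $m[1];
        }
    }
    return $values;
}

// Usage, with the path from the question:
// print_r(extractFieldValues(file_get_contents('../forms/mypdf.pdf')));
```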

Read the content of a PDF with PHP?

I need to read certain parts of a complex PDF. I searched the net and some say FPDF is good, but it can't read PDFs; it can only write them. Is there a library out there that can extract certain content from a given PDF?
If not, what's a good way to read certain parts of a given PDF?
Thanks!
I see two solutions here:
Convert your PDF file into something else first: text or HTML.
Use a library to do so. The bad news here is that most of them are written in Java.
https://whatisprymas.wordpress.com/2010/04/28/lucene-how-to-index-pdf-files/
What about this?
http://www.phpclasses.org/package/702-PHP-Searches-pdf-documents-for-text.html
PS: I haven't tested this class; I only read the description.
$result = pdf2text ('sample.pdf');
echo "<pre>$result</pre>";
How to get “clean” text (pdf2text source code):
http://webcheatsheet.com/php/reading_clean_text_from_pdf.php

How to save the content into the file with neat alignment?

I have a textbox into which I have loaded an XML file.
After editing and saving the XML content back into the file, the content is not in the right format.
When I load it again, it is no longer formatted as XML.
How can I save the content into the file with neat alignment?
Please help me.
For example, I need to save it like the following:
<section>
<value>a</value>
<value>b</value>
</section>
But after saving it looks like this:
<section><value>a</value><value>b</value></section>
Thanks, Praveen J
As Gordon says, your question makes no sense: the XML fragment is still "well-formed" (though it's far from complete), so it is in the right format.
Do you mean you want to preserve the format it was submitted in? In that case, output it using <pre>...</pre> tags. On the other hand, there are standard tools out there which will format XML according to specific standards, e.g. geshi.
C.
I think the issue is that it doesn't preserve whitespace, so opening the XML file later shows it all on a single line as opposed to spaced/tabbed as originally created.
You can try white-space: pre as a CSS property on your textarea. Alternatively, you can try adding the attribute/value pair wrap="hard" to your textarea declaration. Both methods should preserve whitespace.
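If the file is written from PHP, another option is to re-indent the XML just before saving it. This is a minimal sketch using the DOM extension. prettyPrintXml is a hypothetical helper name, and it assumes the submitted text is well-formed XML.

```php
<?php
// Hypothetical helper: re-indent a well-formed XML string.
// Returns null if the input cannot be parsed.
function prettyPrintXml(string $xml): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $dom = new DOMDocument('1.0');
    $dom->preserveWhiteSpace = false; // drop the existing layout whitespace
    $dom->formatOutput = true;        // let libxml add line breaks and indentation
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $out = $dom->saveXML();
    return $out === false ? null : $out;
}

// Usage: format the submitted text, then write it to the file.
// $pretty = prettyPrintXml('<section><value>a</value><value>b</value></section>');
// if ($pretty !== null) { file_put_contents('data.xml', $pretty); }
```

Both properties are set before loadXML is called, because preserveWhiteSpace only takes effect at load time.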
