Create an image from a table - PHP

I have the following problem. I have a txt file here: http://ch1zra.com/d2/runes.txt
I use PHP to loop through the file and generate this table: http://ch1zra.com/d2/runes.php
The table uses some basic styles, and I like it that way.
The txt file is generated and uploaded via Python. I would like to create an image that looks like that table. Is there any way to do so using Python or PHP?
Any image format that is acceptable on the web is fine; PNG would be especially welcome.
I've read somewhere that Python's reportlab can make styled tables with alignment and so on, so that could be a good start, but reportlab generates PDF. Of course, if that is just an intermediate step it is also acceptable (if I could do the PDF > image conversion on my machine). Also, IIRC every PDF contains a "screenshot" of each page for fast browsing, so that would also be cool.
All in all, I have this txt file and this HTML table that I want as an image. If anyone can help, that would be great :)
Thanks in advance!

I can only speak for PHP.
You could try to build the image by hand with PHP's image functions: http://www.php.net/manual/en/book.image.php
Or you could try executing an external script like: http://marginalhacks.com/Hacks/html2jpg/
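As a rough sketch of the first option: this assumes the GD extension is enabled and that runes.txt is tab-separated (the file format isn't shown here, so the delimiter is a guess). `parseRows` and `renderTablePng` are made-up names, and GD's built-in bitmap font will not reproduce the table's CSS styling.

```php
<?php
// Hypothetical helper: splits the text file into rows of cells.
// The tab delimiter is an assumption about the file format.
function parseRows(string $text, string $delimiter = "\t"): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if ($line !== '') {
            $rows[] = explode($delimiter, $line);
        }
    }
    return $rows;
}

// Hypothetical helper: draws the rows as a bordered grid and returns PNG bytes.
// Requires the GD extension; uses GD's built-in bitmap font number 3.
function renderTablePng(array $rows, int $padding = 6): string
{
    $font  = 3;
    $charW = imagefontwidth($font);
    $charH = imagefontheight($font);
    $rows  = array_map('array_values', array_values($rows));

    // Each column is as wide as its longest cell, measured in characters.
    $colChars = [];
    foreach ($rows as $row) {
        foreach ($row as $i => $cell) {
            $colChars[$i] = max($colChars[$i] ?? 0, strlen((string) $cell));
        }
    }

    $rowH  = $charH + 2 * $padding;
    $width = 1; // one extra pixel for the right-hand border
    foreach ($colChars as $chars) {
        $width += $chars * $charW + 2 * $padding;
    }
    $height = count($rows) * $rowH + 1; // one extra pixel for the bottom border

    $img   = imagecreatetruecolor($width, $height);
    $white = imagecolorallocate($img, 255, 255, 255);
    $black = imagecolorallocate($img, 0, 0, 0);
    imagefilledrectangle($img, 0, 0, $width - 1, $height - 1, $white);

    foreach ($rows as $r => $row) {
        $x = 0;
        $y = $r * $rowH;
        foreach ($colChars as $i => $chars) {
            $cellW = $chars * $charW + 2 * $padding;
            imagerectangle($img, $x, $y, $x + $cellW, $y + $rowH, $black);
            imagestring($img, $font, $x + $padding, $y + $padding, (string) ($row[$i] ?? ''), $black);
            $x += $cellW;
        }
    }

    // imagepng() writes to the output buffer when no file name is given.
    ob_start();
    imagepng($img);
    return ob_get_clean();
}
```

Something like `file_put_contents('runes.png', renderTablePng(parseRows(file_get_contents('runes.txt'))));` would then write the image. For colors or nicer fonts you would have to extend the drawing code yourself, for example with imagettftext.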

Related

Issue converting to .pdf a merged .docx file that opens fine in Word

So, I have the following scenario.
I am working on a system for academic papers. I have several inputs for things like author name, coauthors, title, type of paper, introduction, objectives and so on. I store all that information in a database. The user has a Preview button which, when clicked, generates a Word file asynchronously and sends the file location back to the user. That file is then shown to the user in an iframe using Google Doc Viewer.
There's a specific use case where the user/author of the paper can attach a .docx file with a table, or a .jpeg file for a figure. That table/figure has to be included inside the final .docx file.
For the .docx generation process I am using PHPWord.
So up until this point everything works fine, but my issues start when I try to mix everything and put together the .docx file.
Approach Number One
My first approach on doing this was to do everything with PHPWord. I create the file, add the texts where required and in the case of the image just insert the image and after that the figure caption below the image.
Things get tricky though, when I try doing the same thing with the .docx table file. My only option was to get the table XML using this. It did the trick, but the problem I ran into was that when I opened the resulting Word file, the table was there, but had lost all of its styling and had transparent borders. Because of those transparent borders, afterwards when converting it to PDF the borders were ignored and the table info is just scrambled text.
Approach Number Two (current one)
After fighting with Approach Number One and just complicating stuff more, I decided to do something different. Since I already generated one docx file with the main paper information and I needed to add another docx file, I decided to use the DocX Merge Library.
So, what I basically did was generate three Word files: one for the main paper information, one for the table, and one for the table caption (that last one is mainly to avoid overcomplicating the order of the information). Also, that data is not in the table .docx file.
Then I run this:
$dm->merge( [
'paper-info.docx',
'attached-table.docx',
'attached-table-caption.docx'
], 'complete-file.docx');
So, afterwards, I check and the Word file is generated just as I need it with the table maintaining its original styles and dimensions.
If I open it in LibreOffice though, I get this error message:
Then if I continue and open the file, the file opens correctly with all the data with the only exception that it no longer respects the fonts of the file as they appear in Word.
The problem comes in the next step, where I need to present a preview of the file with Google Doc Viewer using this syntax:
<iframe src="https://docs.google.com/gview?embedded=true&hl=es_LA&url=https://usersite.net/complete-file.docx?pid=explorer&efh=false&a=v&chrome=false&embedded=true" width="100%" height="600" style="border: none;"></iframe>
The document gets loaded fine, but when I review it what I see is that it only shows the content of the first paper-info.docx file and ends right where the table and table caption should appear. I open the exact same file in Word and it shows the table and caption.
The other issue is when I try to convert the file to PDF.
If I use PHPWord's method of conversion in combination with DomPDF I get the exact same issue as with the Google Docs Viewer, I just have the content of the first file, using this code:
$phpWordPDF = \PhpOffice\PhpWord\IOFactory::load('complete-file.docx');
$xmlWriterPDF = \PhpOffice\PhpWord\IOFactory::createWriter($phpWordPDF, 'PDF');
$xmlWriterPDF->save('complete-file.pdf');
So my only other viable route was to use LibreOffice's command line using this command:
soffice --headless --convert-to pdf complete-file.docx
This converts the file correctly, but it has the same issue I mentioned when opening the .docx file in LibreOffice: the font styles are not preserved.
The other weird part is that if I try to run this in my PHP script:
shell_exec('soffice --headless --convert-to pdf complete-file.docx');
Nothing happens.
I am running Apache 2.4.25, PHP 7.4.11 on Windows 10 x64.
Conclusion
Until now my best result was by merging the files, but it also caused this issue. So maybe the issue is coming from the merging process I am using. What would be ideal is to be able to just insert the table with styles and everything using PHPWord, but I haven't been able to and haven't found any examples on how to do that.
Another option that I've seen is this library, but the merge feature is only included in the license that costs $599 USD, and since I am pretty close to solving this, I am not sure it would solve my issue. If it does, I'd invest in it since I need to get this done ASAP, but I wanted to check with you guys what your recommendations would be for this case. Maybe another merging library, or doing everything via PHPWord.
Help is appreciated!
After a lot of attempts to fix it, I wasn't able to achieve what I wanted with PHPWord and the merging library I mentioned.
Since I needed to fix this I decided to invest in the paid library I mentioned in my question. It was an expensive purchase, but for those who are interested, it does exactly what was required and it does it perfectly.
The two main functions I required were document merging and importing of content to a .docx file.
So I had to purchase the Premium package. Once there, the library literally does everything for you.
Example code for merging docx files:
require_once 'classes/MultiMerge.php';
$merge = new MultiMerge();
$merge->mergeDocx('document.docx', array('second.docx', 'other.docx'), 'output.docx', array());
Example of how to import a table from another docx file:
require_once 'classes/CreateDocx.php';
$docx = new CreateDocxFromTemplate('document.docx');
// import tables
$referenceNode = array(
'type' => 'table',
);
$docx->importContents('document_1.docx', $referenceNode);
$docx->createDocx('output');
As you can see it is pretty easy. This answer is by no means an ad for this library, but for those that have the same problem as me, this is a life saver.

Extracting text from PDFs in PHP

I'm creating a PHP-based web application which allows the user to upload a PDF file. This file will then be read and checked for certain data (text).
The problem is I can't figure out how to even open a PDF file in PHP. There are some PDF libraries, mainly for creating PDFs, but they don't seem to be very good at reading them.
An alternative would be to use an existing solution in Python or something else (as described in other threads on this site), but I'd really like to stay in PHP as much as possible, as I intend to later export the data to MySQL, etc.
Any input on how to read a PDF and extract data from it would be much appreciated.
I personally haven't tried this out, but it looks like this one works: http://www.pdfparser.org/documentation
It's just a matter of downloading and telling your code to include it, just like the documentation shows.
Or you could try the class.pdf2text.php found in http://www.phpclasses.org/browse/file/31030.html
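A minimal sketch of how the first option might look, assuming the library behind that link is the Composer package smalot/pdfparser and that it was installed with `composer require smalot/pdfparser`. The helper names `textContains` and `pdfContains` are invented for illustration; check the documentation for the current API before relying on this.

```php
<?php
// Assumes the parser was installed with Composer next to this script.
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require_once __DIR__ . '/vendor/autoload.php';
}

// Hypothetical helper: case-insensitive search that ignores line breaks and
// runs of whitespace, since extracted PDF text is often wrapped oddly.
function textContains(string $text, string $needle): bool
{
    $normalize = function (string $s): string {
        return strtolower(trim((string) preg_replace('/\s+/', ' ', $s)));
    };
    $needle = $normalize($needle);
    return $needle !== '' && strpos($normalize($text), $needle) !== false;
}

// Hypothetical helper: extracts all text from an uploaded PDF and searches it.
function pdfContains(string $pdfPath, string $needle): bool
{
    $parser = new \Smalot\PdfParser\Parser();
    $text   = $parser->parseFile($pdfPath)->getText();
    return textContains($text, $needle);
}
```

In an upload handler you could then call something like `pdfContains($_FILES['pdf']['tmp_name'], 'invoice number')`. Scanned PDFs that contain only images will yield no text this way.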

Is there a way with PHP to access a file on a server and save only the first half of the file?

I want to give users a preview of certain files on my site and will be using the Scribd API. Does anyone know how I can access the full file on my server and save part of it under a different name, which I will then show to users? I can't think of a way to do this with PHP for .docx and image files. Help is much appreciated.
For "splitting" images, use an image processing library like gd to crop the image (lots of examples to be found on how to do that all over the place). For Word documents, use a library like PHPWord (or one of the other myriad such libraries) to open the document, remove/extract as much text as you need, then save that into a new Word file.
For other file types, find the appropriate method that allows you to manipulate that format, then do whatever you need to do with it.
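For the image case, a small sketch with GD could look like the following. It assumes the GD extension is loaded and keeps the top half of the picture as the "preview"; `topHalfRect` and `savePreview` are hypothetical names.

```php
<?php
// Hypothetical helper: the rectangle covering the top half of a W x H image.
function topHalfRect(int $width, int $height): array
{
    return [
        'x'      => 0,
        'y'      => 0,
        'width'  => $width,
        'height' => max(1, intdiv($height, 2)),
    ];
}

// Hypothetical helper: writes a PNG containing only the top half of $source.
// Requires the GD extension; $source must be in a format GD can read.
function savePreview(string $source, string $target): bool
{
    $data = @file_get_contents($source);
    if ($data === false || $data === '') {
        return false;
    }

    $img = @imagecreatefromstring($data);
    if ($img === false) {
        return false;
    }

    // imagecrop() returns a new image, or false on failure.
    $cropped = imagecrop($img, topHalfRect(imagesx($img), imagesy($img)));
    return $cropped !== false && imagepng($cropped, $target);
}
```

Calling `savePreview('full.jpg', 'preview.png')` would leave the original untouched and write the cropped copy under the new name.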

pdf text extracter class in php

Is there any class available in PHP that extracts all text from a PDF file, so I can store it in a MySQL database? My PDF has many elements: images, tables, plain text, form elements, charts, etc.
I have looked at many classes over the last two days that extract text, but none of them extracts the complete text from the PDF.
I want to extract all text from a given PDF file, even if the text is in a table, etc.
Does anyone know of one? :)
Thanks a lot. Have a nice day :)
See this question:
Reading the clean text from PDF with PHP
If you are running this on a Linux server, you could try using a pdf2text tool, calling it via exec and then grabbing the contents of the output file.
Note that a few PDF-to-text scripts are around and you'll get different mileage from each.
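A sketch of the exec approach, under the assumption that the tool is `pdftotext` from poppler-utils (the answer above does not name a specific binary, so this is a substitution) and that it is on the server's PATH. The function names are made up.

```php
<?php
// Builds the shell command. pdftotext (poppler-utils) is an assumed tool;
// -layout asks it to keep the physical layout, which helps with tables.
function pdfToTextCommand(string $pdfPath, string $txtPath): string
{
    return 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' ' . escapeshellarg($txtPath);
}

// Runs the converter and returns the extracted text, or null on failure.
function extractPdfText(string $pdfPath): ?string
{
    $txtPath = tempnam(sys_get_temp_dir(), 'pdf');
    if ($txtPath === false) {
        return null;
    }

    exec(pdfToTextCommand($pdfPath, $txtPath), $output, $exitCode);

    $text = $exitCode === 0 ? file_get_contents($txtPath) : false;
    unlink($txtPath);

    return $text === false ? null : $text;
}
```

The returned string can then be stored in MySQL with a prepared statement. Always pass file names through escapeshellarg, as above, when they come from user uploads.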
I've tested many command-line programs, but none gave a 100% result.
So I've started my own library in PHP:
https://github.com/smalot/pdfparser
Currently it's text oriented, but image support is planned.
If you encounter issues, please send me your PDF and, if possible, a description of how you made it.

PHP : Creating image from html table

Can I get some sample code in PHP for converting an HTML table to an image (.gif, .jpg or any format)? I am using XAMPP on Windows.
Yes, the table is coming from the database.
The best way is to convert it first to .ps, then to jpg, pdf, or whatever else you need.
I can suggest two links:
html2ps
wkhtmltopdf
I tested both, and both work perfectly. html2ps is a little slow (~30 sec for a 3-page PDF; I don't know about jpg) but more customizable.
Give them a look.
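As a hedged example of calling such a tool from PHP: the wkhtmltopdf package also ships a `wkhtmltoimage` binary that writes images directly. The sketch below assumes that binary is installed and on the PATH (on XAMPP for Windows you would likely need the full path to the .exe); the function names are hypothetical.

```php
<?php
// Builds the command line for wkhtmltoimage (assumed to be on the PATH).
function htmlToImageCommand(string $htmlPath, string $imagePath, string $format = 'png'): string
{
    return 'wkhtmltoimage --format ' . escapeshellarg($format) . ' '
        . escapeshellarg($htmlPath) . ' ' . escapeshellarg($imagePath);
}

// Writes $html to a temporary .html file, renders it, and cleans up.
function renderHtmlToImage(string $html, string $imagePath): bool
{
    $tmp = tempnam(sys_get_temp_dir(), 'tbl');
    if ($tmp === false) {
        return false;
    }

    // The .html extension is added so the tool treats the input as HTML.
    $htmlPath = $tmp . '.html';
    rename($tmp, $htmlPath);
    file_put_contents($htmlPath, $html);

    // 2>&1 captures error messages in $output for debugging.
    exec(htmlToImageCommand($htmlPath, $imagePath) . ' 2>&1', $output, $exitCode);
    unlink($htmlPath);

    return $exitCode === 0 && is_file($imagePath);
}
```

You would build the table HTML from your database rows as usual and then call `renderHtmlToImage($html, 'table.png')`.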
Would you like a screenshot of an HTML table / HTML code?
That's not possible with PHP alone.
You need a web browser or an HTML renderer and a program to take a screenshot.
Look at
http://www.thumbshots.org/ (online service)
or
http://www.intellitamper.com/webswoon/ (python tool)
At first glance, that's quite a tall order. Here are some pointers:
You'll probably want to get cosy with the GD library
Where is this table coming from? If it's coming from your database originally, this would be easier to work with. Otherwise...
You'll need to get the remote page (I recommend curl)
Then you'll need to extract the table data
The complexity of the second step really depends on how similar each page and table is going to be. Regex is probably going to be useful though.
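The fetch and extract steps could be sketched like this. Note that it uses PHP's DOMDocument instead of regex for the extraction, which tends to cope better with messy markup; it assumes the curl and dom extensions are enabled, and `fetchPage` and `extractTableRows` are illustrative names only.

```php
<?php
// Fetches a page body with cURL; returns null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Returns the first table in $html as an array of rows of trimmed cell text.
function extractTableRows(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            if ($node instanceof DOMElement && in_array($node->tagName, ['td', 'th'], true)) {
                $cells[] = trim($node->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

The resulting array of rows is what you would then hand to the GD drawing code.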
Hope this helps,
Tom
As DrFuture mentions, using PHP alone is probably not the best way to convert a table into an image. I modified http://www.zubrag.com/scripts/website-thumbnail-generator.php to convert my HTML into an image (put the HTML on another web page and pass that URL to the script). I would suggest going the 'screenshot' route instead of using GD and other drawing tools.
