Compress MS Word DOCX documents using barcode fonts in a PHP script - php

Using TinyButStrong and OpenTBS, I created a PHP script that opens multiple DOCX templates and replaces a lot of variables with values from a database. In a nutshell, clients can download their unique files, add information and pictures, and upload them again. This works excellently. But of course I wouldn't post here if there weren't some sort of problem.
Because of the barcodes (I am using barcode fonts and embedding them in Word, because the documents will be scanned much later in the process), the documents get huge. Instead of 100 KB on average, they easily reach 7 MB. This is a problem, because about 20,000 documents will be scanned per year. That's an extra 130 GB or so per year.
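A quick check of that estimate, assuming decimal units and that each of the 20,000 documents grows from about 100 KB to about 7 MB:

```latex
20\,000 \times (7\ \text{MB} - 0.1\ \text{MB}) = 138\,000\ \text{MB} \approx 138\ \text{GB per year}
```

That is in the same range as the 130 GB figure; the exact number depends on the real per-document sizes.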
It's a long story, but we need DOCX, so we can't simply replace it with some sort of PHP / MySQL template that would be far more efficient.
Word has an option to embed only the characters actually used in the document, which cuts the size. But that isn't an option here, because the main template needs to have all characters available. Sending the font to the users isn't an option either, since there are roughly 20,000 new ones each year.
Is there another way to cut the file size or apply compression? Perhaps in Word, PHP, FTP, or Apache?

I'm afraid the option "Embed fonts in the file" combined with "Embed only characters used in the document" cannot be exploited. MS Word saves the font in a special format with the extension ODTTF (for example, "word\fonts\font1.odttf"). This format is binary and seems poorly documented, so it effectively remains proprietary. Only MS Word can build such a sub-file.
Since you don't have a lighter font for the barcode, the only solution I can see is to use an image instead of a font for your barcode:
OpenTBS has a feature to easily replace a picture inside a DOCX file (parameter "ope=changepic").
Barcode-to-image tools are easy to find for PHP, for example Barcode Generator.
Then you only have to code your process like this:
Load the DOCX template.
Create the temporary image of the barcode.
Change the image inside the template.
Merge the template, and save or send the result.
Delete the temporary image.
It's important to delete the temporary image only after the final merge of the template, because OpenTBS actually inserts the image only when the $tbs->Show() method is called.
It's also important to use a different temporary file for each merge, because many merges can occur at the same time.
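The two ordering rules above (delete only after Show(), one temporary file per merge) can be sketched in plain PHP. This is a minimal sketch, not OpenTBS code: the two callables are hypothetical stand-ins, one for whatever barcode library writes the image and one for the OpenTBS load/changepic/merge/Show() sequence, and the bc_ prefix is an arbitrary choice.

```php
<?php
/**
 * Runs one template merge with its own temporary barcode image.
 * Hypothetical callables:
 *  - $renderBarcode($code, $pngPath) writes the barcode image;
 *  - $mergeAndShow($pngPath) loads the DOCX template, changes the picture,
 *    merges the fields and calls $tbs->Show().
 */
function mergeWithTempBarcode(string $code, callable $renderBarcode, callable $mergeAndShow): void
{
    // tempnam() atomically reserves a unique name. Keeping that file until the end
    // means "<name>.png" cannot be picked by another merge using the same scheme.
    $reserved = tempnam(sys_get_temp_dir(), 'bc_');
    if ($reserved === false) {
        throw new RuntimeException('Could not reserve a temporary file name');
    }
    $png = $reserved . '.png';
    try {
        $renderBarcode($code, $png);  // create the temporary image
        $mergeAndShow($png);          // load, change picture, merge, Show()
    } finally {
        // Runs only after Show() has returned (or thrown), never before.
        // Note: exit() skips finally blocks, so the callback must return normally.
        if (is_file($png)) {
            unlink($png);
        }
        if (is_file($reserved)) {
            unlink($reserved);
        }
    }
}

// Usage with stand-in callables:
mergeWithTempBarcode(
    'ABC123',
    function (string $code, string $path): void {
        file_put_contents($path, "fake image for $code");
    },
    function (string $path): void {
        echo is_file($path) ? "image present during merge\n" : "image missing\n";
    }
);
```

If your real merge ends the script (for example with exit() right after a download), this cleanup will not run, which is one more reason to also purge old temporary images periodically.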
If temporary files have a prefix or are saved in a dedicated directory, it is advisable to clean up old temporary images regularly.
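A periodic clean-up of old temporary images could look like this. It is a sketch that assumes all temporary images share a known prefix (bc_ here, a hypothetical choice) in one directory; it could be run from cron or at the start of each request.

```php
<?php
/**
 * Deletes files in $dir whose name starts with $prefix and whose
 * modification time is older than $maxAgeSeconds. Returns how many were removed.
 */
function cleanupOldTempImages(string $dir, string $prefix, int $maxAgeSeconds): int
{
    $cutoff = time() - $maxAgeSeconds;
    $removed = 0;
    // Only names with our own prefix are matched, so unrelated files are left alone.
    $pattern = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $prefix . '*';
    foreach (glob($pattern) ?: [] as $path) {
        if (is_file($path) && filemtime($path) < $cutoff) {
            // Another worker may have deleted the file already, hence the @.
            if (@unlink($path)) {
                $removed++;
            }
        }
    }
    return $removed;
}
```

Choose $maxAgeSeconds well above the longest expected merge, so an image that is still in use is never removed.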

Related

Generating PDFs with images way too slow

A big bottleneck I have at the moment is PDF generation each time someone places an order. It's not a big deal for a single order, but when there are a lot in a short time frame, this process is very slow.
The PDF needs text information, a QR code, a bar code, a logo, and 1 or more (up to 20+) 1/4-width images.
Current process with DOMPDF:
QR code image created with PHP and saved as PNG
Bar code image created and saved as PNG
DOMPDF generates the PDF
New thought:
HTML2PDF creates the PDF and uses its QR code and bar code tags to generate both codes
That theoretically would take care of the QR and Barcode images, but still, the rest of the images make it too slow.
Doing it this way, without any images other than the QR code and bar code, a PDF can be generated in about 500 ms, but as soon as I start adding images, it goes up to 2, 3, 4, 5+ seconds each.
When running tests with about 10k orders placed within a few minutes, it was still processing the PDFs around 12 hours later, when I finally shut it down in frustration.
The PDF is generated in a separate queue process, so the customer doesn't have to wait when ordering, but still, it can't take 5+ hours for them to receive their confirmation PDF during high traffic.
Questions / TLDR:
How can I make my process of creating PDFs with a dynamic qr code, a dynamic bar code, dynamic text, and 1-20 static images (images are same across all PDFs) faster?
Are there other potential things I haven't thought of? Maybe making a template PDF and somehow use PHP to just fill in the dynamic spots?
I would strongly advise you to use the TCPDF library. It's quite fast and can easily be integrated into CakePHP. You can find many examples of how to include images, barcodes, and QR codes in a PDF on the TCPDF examples page.
To further improve performance, use the tips from this page:
Install and configure a PHP opcode cacher like XCache (on PHP 5.5 and later, the bundled OPcache extension fills this role);
Edit the php.ini file and increase the maximum amount of memory a script may consume (memory_limit);
Edit the php.ini file and increase the maximum execution time of each script (max_execution_time);
Edit the config/tcpdf_config.php file: manually set the $_SERVER['DOCUMENT_ROOT'], K_PATH_MAIN and K_PATH_URL constants, and remove the automatic calculation part;
If you are not using the Thai language, edit the config/tcpdf_config.php file and set the K_THAI_TOPCHARS constant to false;
If you do not need extended chars, edit the config/tcpdf_config.php file and set the default fonts to core fonts;
If you do not need UTF-8 Unicode, set the $unicode parameter on TCPDF constructor to false and the $encoding parameter to 'ISO-8859-1' or other character map.
By default, TCPDF enables font subsetting to reduce the size of embedded Unicode TTF fonts; this process, which is very slow and requires a lot of memory, can be turned off with the setFontSubsetting(false) method;
Use core fonts instead of embedded fonts whenever possible;
Avoid using the HTML syntax (writeHTML and writeHTMLCell methods) if not strictly required;
Split large HTML blocks in smaller pieces;
Avoid using transactions if not strictly required;
Restart the webserver after changes.
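For the memory_limit and max_execution_time items above, PHP also allows runtime overrides, which keeps the higher limits confined to the PDF worker script instead of the whole server. A minimal sketch; the 512M and 300-second values are placeholders, not recommendations:

```php
<?php
// Runtime overrides for the PDF worker only; the values are placeholders.
// Both settings are changeable at runtime (PHP_INI_ALL).
if (ini_set('memory_limit', '512M') === false) {
    error_log('Could not raise memory_limit');
}
// set_time_limit() is the runtime counterpart of max_execution_time.
// It may be disabled on some shared hosts, in which case it returns false.
if (!set_time_limit(300)) {
    error_log('Could not change max_execution_time');
}
```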
If that does not bring performance to an acceptable level, you can install your CakePHP application (or just the PDF-generating script, if it doesn't use CakePHP) on a second server with more available resources and use that server only for PDF generation.
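A back-of-the-envelope estimate shows why a single sequential worker falls behind, using the question's own figures (about 10,000 PDFs at 2 to 5 seconds each). The sketch assumes independent jobs and workers that do not slow each other down, which only holds if the machine has spare CPU cores; the worker count of 8 is hypothetical.

```php
<?php
/**
 * Rough wall-clock time, in hours, to drain a queue of PDF jobs.
 * Assumes independent jobs and workers that don't slow each other down.
 */
function estimateQueueHours(int $jobs, float $secondsPerJob, int $workers = 1): float
{
    if ($workers < 1) {
        throw new InvalidArgumentException('Need at least one worker');
    }
    // Each worker processes its share of the jobs one after another.
    $jobsPerWorker = (int) ceil($jobs / $workers);
    return $jobsPerWorker * $secondsPerJob / 3600;
}

printf("%.1f h\n", estimateQueueHours(10000, 4.0));     // one worker: 11.1 h
printf("%.1f h\n", estimateQueueHours(10000, 4.0, 8));  // 8 workers: 1.4 h
```

At 4 seconds per PDF, one worker needs over 11 hours, which matches the backlog described in the question; faster PDF generation and more parallel capacity both shrink that number.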
You can try to use JPEG instead of PNG files if you don't need transparency.
For example, in TCPDF I had to generate a PDF with a big PNG in the background (18 cm x 18 cm, 300 dpi). It took 11 seconds to generate the file.
I replaced the image with a JPEG of the same size and DPI, and it took less than 1 second.

How do I create a large pdf, semi quickly

I need to generate a large PDF, 2480 pages to be exact.
Currently I am using InDesign, and the output is exactly what I want, but I would rather not be involved in the document creation process.
It takes 31 minutes for InDesign to execute the data merge, generate the PDF, save the PDF, and save the pdf.indd file. (I don't really need the pdf.indd file, but I would rather not have to recreate the data merge if something were to happen to the PDF.)
I am hoping for a php, or similar solution. Currently my data is stored in MySQL.
The majority of the pdf is static text, with 19 dynamically driven text fields.
There is one image in the PDF, 75x100 px @ 72 dpi.
The output needs to be exact, the pdf file is printed and cut in half at 4.25 inches.
I have tried TCPDF; while it is fast at generating up to 50 pages, beyond that it would rather die than give me any output. I have also played with mPDF and found it to be... not as friendly. I have also considered generating many small files and using some utility to merge the smaller PDFs into one large PDF, though that seems like driving around the mountain.
Any thoughts would be helpful.
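The split-and-merge idea from the question could be organized as follows. This sketch only plans the page ranges and builds a merge command: it assumes 50 pages per part (the size TCPDF reportedly handled well), a hypothetical renderRange() that writes one part with your PDF library, and that the pdfunite tool from poppler-utils is installed. The command is built, not executed.

```php
<?php
/**
 * Splits $totalPages into consecutive ranges of at most $batchSize pages.
 * Returns [first, last] pairs, 0-based and inclusive.
 */
function planBatches(int $totalPages, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }
    $batches = [];
    for ($start = 0; $start < $totalPages; $start += $batchSize) {
        $batches[] = [$start, min($start + $batchSize, $totalPages) - 1];
    }
    return $batches;
}

$parts = [];
foreach (planBatches(2480, 50) as $i => [$first, $last]) {
    $parts[] = sprintf('part_%03d.pdf', $i + 1);
    // renderRange($first, $last, $parts[$i]);  // hypothetical: one small PDF per range
}

// pdfunite (poppler-utils) concatenates PDFs in the given order; built here, not run.
$cmd = 'pdfunite ' . implode(' ', array_map('escapeshellarg', $parts))
     . ' ' . escapeshellarg('merged.pdf');
```

Each part is rendered in a fresh run, so memory use stays bounded by the part size rather than the full 2480 pages.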
You certainly can create documents directly with PHP, but it can be difficult. One method is to use one of the various PDF classes to create the document, as you have found. Another is to create images (using ImageMagick, GD, etc.) and convert those to PDF. (This method is less efficient, because you are creating raster graphics, turning each PDF page into one giant image.)
However, I think you should consider simply scripting InDesign. InDesign has the capability to read data in via XML and create the document. This way, the design of your document isn't dependent on your programming abilities and you can still have the power of programmatically creating the document.
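On the PHP side, feeding InDesign could be as simple as exporting the MySQL rows to XML. A sketch, assuming the rows arrive as associative arrays; the element names are hypothetical and would have to match the tags mapped in the InDesign template, and every field name must be a valid XML element name.

```php
<?php
/**
 * Turns associative rows into <records><record><field>value</field>...</record></records>.
 * The element names are hypothetical; they must match the InDesign template's tags.
 */
function rowsToXml(array $rows): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement('records'));
    foreach ($rows as $row) {
        $record = $root->appendChild($doc->createElement('record'));
        foreach ($row as $field => $value) {
            $el = $doc->createElement($field);
            // A text node is escaped automatically (&, <, >) when saved.
            $el->appendChild($doc->createTextNode((string) $value));
            $record->appendChild($el);
        }
    }
    return $doc->saveXML();
}

// In practice $rows could come from PDOStatement::fetchAll(PDO::FETCH_ASSOC).
$rows = [
    ['name' => 'Alice & Co', 'city' => 'Sofia'],
    ['name' => 'Bob', 'city' => 'Plovdiv'],
];
$xml = rowsToXml($rows);
```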
When it comes to a huge number of PDF pages, LaTeX is always the best answer. Nothing can really handle huge PDF generation as fast, accurately, and elegantly as LaTeX.
Check this question to see how to retrieve your data from the database.
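If you go the LaTeX route, PHP only needs to write a .tex file that pdflatex then compiles. A minimal sketch; the record fields (name, amount) and the page layout are hypothetical placeholders for the real static text and 19 fields. The important part is escaping LaTeX's special characters in the database values.

```php
<?php
/** Escapes LaTeX special characters in plain text. */
function latexEscape(string $text): string
{
    // strtr() replaces all keys in a single pass, so braces it inserts are not re-escaped.
    return strtr($text, [
        '\\' => '\\textbackslash{}',
        '&'  => '\\&',
        '%'  => '\\%',
        '$'  => '\\$',
        '#'  => '\\#',
        '_'  => '\\_',
        '{'  => '\\{',
        '}'  => '\\}',
        '~'  => '\\textasciitilde{}',
        '^'  => '\\textasciicircum{}',
    ]);
}

/** Builds a .tex source with one page per record (hypothetical layout). */
function buildLatex(array $records): string
{
    $pages = [];
    foreach ($records as $r) {
        $pages[] = 'Name: ' . latexEscape($r['name']) . "\\\\\n"
                 . 'Amount: ' . latexEscape($r['amount']);
    }
    return "\\documentclass{article}\n\\begin{document}\n"
         . implode("\n\\newpage\n", $pages)
         . "\n\\end{document}\n";
}

$tex = buildLatex([
    ['name' => 'Smith_J', 'amount' => '50%'],
    ['name' => 'Doe & Sons', 'amount' => '$20'],
]);
```

Writing $tex to a file and running pdflatex on it once produces a single PDF with all pages.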

HTML To PDF Converter - Html2PDF - how to reduce the size of the result PDF?

I am using the HTML2PDF converter to export a web page to a PDF file. The issue I am facing is that the resulting PDF is too large (more than 1 MB). I want to reduce it; here is basically what it contains:
2 images (100 KB both)
1 Courier Bulgarian font - added
3 tables with a lot of inline styles on each cell
Could these things lead to the large size of the output PDF? And could anyone share some experience and best practices with the library for getting a smaller PDF?
Thanks in advance.
If you have access to Adobe Acrobat, it'll let you peek in the PDF and see which areas use what percentage of space. This would tell you how much of the space is taken up by images, fonts etc... If you make a sample PDF available I'll be happy to take a look at it.
How much space in the PDF is used by different objects really depends on how the PDF was written and how efficiently it uses different compression algorithms. For example, images can be ZIP, JPEG or even JPEG-2000 compressed in PDF, the question is what HTML2PDF does with your images.
Fonts can be big as well, depending on the size of the original font.
Page content (your tables with inline styles, etc.) is written in a condensed textual format (for example, 1 0 0 rg is the instruction to set the fill color for text and line-art to red). Efficient PDF writers normally write all of these textual instructions and then ZIP-compress them, which means they don't take up much space in the file.
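The compression point can be demonstrated with PHP's zlib functions, since the PDF /FlateDecode filter uses the same zlib format that gzcompress() produces. The repeated content stream below is a made-up example of table-cell drawing operators; real ratios depend on the actual page content, and this does not tell you what HTML2PDF itself does.

```php
<?php
// A made-up content stream: the same cell-drawing operators repeated,
// as heavily styled table cells tend to produce.
$cell = "1 0 0 rg\n10 10 100 20 re\nf\n0 0 0 rg\nBT /F1 10 Tf 12 14 Td (cell) Tj ET\n";
$stream = str_repeat($cell, 500);

// gzcompress() emits zlib-format data, which is what /FlateDecode expects.
$flate = gzcompress($stream, 9);

printf("raw: %d bytes, flate: %d bytes (%.1f%%)\n",
    strlen($stream), strlen($flate), 100 * strlen($flate) / strlen($stream));

// How the compressed bytes would be wrapped as a PDF stream object
// (the object number 5 is arbitrary).
$object = "5 0 obj\n<< /Length " . strlen($flate) . " /Filter /FlateDecode >>\nstream\n"
        . $flate . "\nendstream\nendobj\n";
```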
So you'd really need to take a look at the PDF file itself to see where the space is used. That will allow you to start looking at the library to see if it can be made more efficient.
To start with...
Have a separate page for printing, distinct from the one used for display.
For printing, use a print.css stylesheet and include as few classes as possible.
In those three tables, keep only the required CSS classes. Use compressed images for printing; e.g., if you display a BMP, use a PNG for printing.
Keep the tables simple (plain TR/TD), make them less colorful, and do not use inline CSS.
This may help you produce a smaller, compressed PDF.
I have managed to find what causes the large size of my output PDF files: it was the fonts.
For my task I used Courier Bulgarian (italic, normal, and bold), which means I embedded three additional fonts.
Moreover, and I do not know why, Helvetica is the default font for this PHP library, and although I am not using it, it is embedded in the PDFs too.
If you run into the same issue, open all the library files that contain "helvetica" and replace the "helvetica" font with the one you are using.
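Before patching library files, it helps to list where the font name actually appears. A read-only sketch; the library path in the example call is hypothetical, and it is worth checking whether your library version offers a default-font setting before editing its source.

```php
<?php
/**
 * Lists .php files under $dir whose contents mention $needle (case-insensitive).
 * Read-only: it only reports paths so they can be reviewed before editing.
 */
function findFilesMentioning(string $dir, string $needle): array
{
    $hits = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isFile() && strtolower($file->getExtension()) === 'php') {
            $contents = file_get_contents($file->getPathname());
            if ($contents !== false && stripos($contents, $needle) !== false) {
                $hits[] = $file->getPathname();
            }
        }
    }
    sort($hits);
    return $hits;
}

// Example (hypothetical path): print_r(findFilesMentioning('vendor/html2pdf', 'helvetica'));
```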

Is there a way with PHP to access a file on a server and save only the first half of the file?

I want to give users a preview of certain files on my site and will be using the Scribd API. Does anyone know how I can access the full file on my server and save the first half of it under a different name, which I will then show to users? I can't think of a way to do this with PHP for .docx and image files. Help is much appreciated.
For "splitting" images, use an image processing library like GD to crop the image (there are lots of examples of how to do that). For Word documents, use a library like PHPWord (or one of the myriad other such libraries) to open the document, remove or extract as much text as you need, then save that into a new Word file.
For other file types, find the appropriate method that allows you to manipulate that format, then do whatever you need to do with it.

Create Font from images? [duplicate]

Are there existing libraries for generating a .ttf font from images (say, a series of images) using PHP? There are several references about creating GDF fonts from images, but I haven't yet found examples of TTF font creation via PHP.
N.B. There are also several online resources that let you upload an image (write a letter in each box on an image template) to be instantly converted to a TTF. http://www.yourfonts.com is one of them
To my knowledge, there is no such tool.
Creating a TrueType font is a hugely difficult enterprise. A font consists of a lot of very complex information (see the "technical notes" in the Wikipedia article to get a tiny impression). It won't do to just paste a series of images together.
Depending on what you want to do, I suppose you could work around this by building a faux "bitmap font", with one image file per character, and gluing the correct images together to form a sentence. The results will probably be less than perfect, though, because there will be no kerning.
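A minimal sketch of that workaround, gluing the glyph images together in HTML rather than into one bitmap. It assumes a hypothetical glyphs/ directory with one PNG per character, named after the character's UTF-8 bytes in hex (so glyphs/41.png for "A"); as noted, there is no kerning, since every glyph keeps its own fixed box.

```php
<?php
/**
 * Renders text as a row of per-character images (a faux "bitmap font").
 * Hypothetical naming scheme: <glyphDir>/<hex of the character's UTF-8 bytes>.png
 */
function textToGlyphHtml(string $text, string $glyphDir = 'glyphs'): string
{
    $html = '';
    // The u flag splits a UTF-8 string into characters; invalid UTF-8 yields false,
    // which is treated as "no characters".
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        $src = $glyphDir . '/' . bin2hex($char) . '.png';
        $html .= '<img src="' . htmlspecialchars($src, ENT_QUOTES, 'UTF-8')
               . '" alt="' . htmlspecialchars($char, ENT_QUOTES, 'UTF-8') . '">';
    }
    return $html;
}

echo textToGlyphHtml('Hi'), "\n";
```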
