Missing characters in filled pdf using PDFTk with encoding UTF-8 - php

I'm trying to fill PDF documents using PDFTk. The script works fine and fills the form inputs, but I don't get special characters [Polish charset: UTF-8 or ISO-8859-2].
Script: https://github.com/mikehaertl/php-pdftk
The weird thing is that the generated PDF actually shows the Polish characters when I click on a field.
Before click: [screenshot]
After click on field: [screenshot]
The default encoding is set to UTF-8. The problem is that PDFTk can't use characters outside standard ASCII with FDF form fill; it doesn't allow multi-byte characters.
What I did:
Added fonts to the PDF files (checked that the files contain the fonts)
Created the fields in the PDF files with a default font (Arial)
Changed the encoding in the script (the fillForm function) to ISO-8859-2 (see the sketch after this list)
Changed the encoding of the data values (iconv or mb_convert_encoding)
Changed both the function encoding and the data value encoding to ISO-8859-2
Flattened the PDF after filling the form
Read all topics about this problem on Stack Overflow and Google
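Roughly, the fill call looks like this (a simplified sketch; the template path and field names are placeholders, and the second fillForm() argument is the FDF encoding mentioned above, assuming a php-pdftk version that accepts it there):
use mikehaertl\pdftk\Pdf;

$pdf = new Pdf('form.pdf');                    // template with the AcroForm fields
$result = $pdf
    ->fillForm(
        ['first_name' => 'Łukasz', 'city' => 'Łódź'],  // placeholder field names
        'UTF-8'                                        // the encoding I also tried as ISO-8859-2
    )
    ->saveAs('filled.pdf');

if ($result === false) {
    echo $pdf->getError();
}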
UPDATE (25.03.2016): Found out that the PDF documents work fine on some computers. Some people see the Polish characters and others don't. All of us have the right fonts (with the Polish charset). I used the default Arial or Times New Roman. The fonts are also embedded in the file.
Any ideas?

You need to run pdftk with need_appearances as an argument.
Kudos to the guys from this issue on GitHub.
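On the raw command line the flag goes at the end of the output section; with php-pdftk the same thing should be available as a chained call (a sketch, assuming your php-pdftk version exposes needAppearances(), with placeholder file and field names):
use mikehaertl\pdftk\Pdf;

// CLI equivalent:
//   pdftk form.pdf fill_form data.fdf output filled.pdf need_appearances
$pdf = new Pdf('form.pdf');
$pdf->fillForm(['first_name' => 'Żaneta'])   // placeholder field
    ->needAppearances()                      // tells the viewer to regenerate field appearances
    ->saveAs('filled.pdf');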

I had a similar issue.
I solved it with the utf8_decode function, e.g. utf8_decode('Łukasz')
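In other words, run every value through utf8_decode() before the fill (a sketch with made-up field names; note that utf8_decode() targets ISO-8859-1, so this only helps when the form's encoding matches that):
$data = [
    'first_name' => 'Łukasz',
    'last_name'  => 'Żółć',
];
// Convert each UTF-8 value to ISO-8859-1 before filling
$data = array_map('utf8_decode', $data);

$pdf = new \mikehaertl\pdftk\Pdf('form.pdf');
$pdf->fillForm($data)->saveAs('filled.pdf');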

The best results (without flattening) I got when I was creating the FDF file with the UTF-8 values encoded into UTF-16BE:
chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\(', '\)'), mb_convert_encoding($string, 'UTF-16BE'));
Your library works quite well, but e.g. when I open a PDF generated with it directly in Safari on macOS it does not show the Polish chars until I click the field. When I open it with Adobe Reader, it works fine.
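Spelled out as a hand-written FDF (a sketch only; the field name (name) and the file paths are placeholders, and pdftk is then run on the generated file):
// Encode a UTF-8 string as a UTF-16BE PDF text string for an FDF /V value
function fdfValue($string)
{
    $utf16 = mb_convert_encoding($string, 'UTF-16BE', 'UTF-8');
    // Escape the bytes PDF treats specially inside (...) strings
    $escaped = str_replace(['\\', '(', ')'], ['\\\\', '\(', '\)'], $utf16);
    return chr(0xFE) . chr(0xFF) . $escaped;   // BOM marks the string as UTF-16BE
}

$fdf = "%FDF-1.2\n"
     . "1 0 obj << /FDF << /Fields [\n"
     . "<< /T (name) /V (" . fdfValue('Łukasz Żółć') . ") >>\n"
     . "] >> >> endobj\n"
     . "trailer << /Root 1 0 R >>\n"
     . "%%EOF";

file_put_contents('data.fdf', $fdf);
// Then fill as usual: pdftk form.pdf fill_form data.fdf output filled.pdf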

I could not find how to change the font, so my solution was to use iText: https://itextpdf.com/en/resources/examples/itext-5/filling-out-forms
I wrote this for my project: https://github.com/dddeeemmmooonnn/pdf_form_filler

Related

How to export emoji to a PDF document using PHP?

I am trying to export to PDF using the FPDF and TCPDF PHP libraries. I found that emojis like 😁 😀 💃🏻 ❤️ 🥳 were not converted; only some rectangle boxes show up in the generated PDF. I also tried tFPDF.
$text = "There is my text 😁 , 😀 and emojis 💃🏻 ❤️ 🥳";
require('tfpdf/tfpdf.php');
$pdf = new tFPDF();
$pdf->AddPage();
//Add a Unicode font (uses UTF-8)
$pdf->AddFont('Segoe UI Symbol','','seguisym.ttf',true); // DejaVuSans.ttf
$pdf->SetFont('Segoe UI Symbol','',12);
$pdf->Write(8,$text);
$pdf->Output();
I also tried different fonts, but it didn't work for me. Can anyone help me in this regard?
Sadly, neither FPDF, TCPDF nor tFPDF can print those characters. The issue is that these characters are not part of the BMP; they are expressed with surrogate pairs, meaning they behave like multiple characters in UTF-16 (which is why one emoticon is printed as two rectangle boxes, not one), and their code points are above 65535. However, all the mentioned PDF libraries rely on the code point index being <= 65535, as does the TFontFile class that reads TTF files.
You would also need to add a TTF file containing the complete Unicode charset, or at least the emoticons; most fonts do not have it. This brings another issue for the PDF library, which would probably need support for a fallback font to be used when a code point is not found in the main font (for example, you want to print text in Gotham, but since that does not include emoji, another font is used for them). By the way, the emoji font "Noto Color Emoji", for example, has a 23 MB TTF file, so it gets big easily.
Anyway, all of the above can be added to the PDF libraries, but it will require some effort. I am planning to do it for my own needs at some point; I think it will take roughly one man-day.
Alternatively, you might try something more robust like mPDF, but that library is huge and slow and requires a complete rewrite of your FPDF code. I also can't guarantee it can print emojis.
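For completeness, the mPDF route looks roughly like this (a sketch assuming mPDF 7+ installed via Composer; as said above, whether the emoji glyphs actually render still depends on a font containing them being available to mPDF):
require_once __DIR__ . '/vendor/autoload.php';

$text = "There is my text 😁 , 😀 and emojis 💃🏻 ❤️ 🥳";

$mpdf = new \Mpdf\Mpdf(['mode' => 'utf-8']);
// WriteHTML expects HTML, so escape the raw text first
$mpdf->WriteHTML('<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</p>');
$mpdf->Output();   // streams the PDF to the browser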

Unable to save file with Thai characters in filename

I'm currently trying to save a PDF file using the mPDF library. My problem is that when I output a file with an English filename, the filename is displayed correctly, but if the filename contains any Thai characters it becomes garbled.
My mPDF output code:
$save_file = $s_code.'_'.$classroom.'.pdf';
$mpdf->Output('../../../upload/'.$save_file,'F');
With an English filename it displays correctly:
t10024_201.pdf
With Thai characters it doesn't:
เธ—เธช10024_201.pdf
I can't figure out what causes the problem.
The filename is restricted to the character set supported by the device (in this case, the server where mPDF is generating your PDFs) and doesn't actually reflect a problem with mPDF itself. [If you can add/write Thai characters within the PDF, it's just the filename that doesn't reflect the Thai characters.]
You may need to configure the Content-Disposition header of the webserver's response that carries the PDF file. As an example, see this blog post describing how a ColdFusion application developer dealt with saving files with French characters.
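If you end up streaming the PDF to the browser instead of (or in addition to) saving it on the server, the non-ASCII filename can be passed in the Content-Disposition header using the RFC 5987 filename* form. A minimal sketch, reusing the variables from the question:
$thaiName = $s_code . '_' . $classroom . '.pdf';   // UTF-8 filename

header('Content-Type: application/pdf');
// Plain "filename" gets an ASCII fallback; "filename*" carries the UTF-8 name
header("Content-Disposition: attachment; filename=\"report.pdf\"; "
     . "filename*=UTF-8''" . rawurlencode($thaiName));

// Output('', 'S') returns the generated PDF as a string
echo $mpdf->Output('', 'S');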
Thank you so much, Anson W Han.
It's about character encoding. I finally found a solution: I simply convert the filename to the Thai encoding using "iconv" and it displays correctly.
The code:
$mpdf->Output('../../../upload/'.iconv("UTF-8", "TIS-620",$save_file),'F');

HTML entity decode issue using html2pdf

I'm trying to display strings of text fetched from a database correctly in a PDF document. What I can't figure out is the following.
I'm using FPDF and html2pdf to generate the PDF document. After I fetch my information from the DB I use:
iconv('UTF-8', 'windows-1252', $data);
This displays correctly in the PDF document if I use:
$pdf->Cell();
But when I use:
$pdf->WriteHtmlCell();
it seems to have decoding issues. The text appears to be in another charset, because ù turns into Ã¹ and Ä into Ã„, and so on. I have tried converting it to UTF-8 (which it originally is) or ISO, but I keep getting the same result. When I run a
mb_detect_encoding();
on the string, it always comes back as ASCII (is that UTF-8?).
Is WriteHtmlCell(); using another encoding?
Try this:
html_entity_decode($your_data, ENT_XHTML,"ISO-8859-1");
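Roughly where that fits relative to the code in the question (a sketch; $row['description'] is a hypothetical DB column, and writeHTMLCell() is assumed to follow the TCPDF signature the question appears to be using):
$data = $row['description'];          // UTF-8 string fetched from the DB

// Plain cells: the windows-1252 conversion from the question still works
$pdf->Cell(0, 8, iconv('UTF-8', 'windows-1252', $data));

// HTML cells: decode the entities first, as suggested above
$clean = html_entity_decode($data, ENT_XHTML, 'ISO-8859-1');
$pdf->writeHTMLCell(0, 0, '', '', $clean);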

Special symbols in PDF

I'm developing a PDF generation script with PHP using the FPDF library. It's all fine for text and images, but when I put in currency symbols like the pound or euro sign, I get some special symbols instead. I could solve the same problem in normal pages by setting the character encoding of the webpage, but I'm not sure how to set the character encoding for a PDF document.
You are running into encoding/decoding issues. Add this line at the top of your script:
<?php
header('Content-Type: text/html;charset=utf-8');
followed by your FPDF generation code.
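An alternative (not from the answer above) that usually works with FPDF's built-in fonts is to convert the text itself to windows-1252, which is the encoding those core fonts expect and which contains both £ and €. A minimal sketch:
require('fpdf.php');

$pdf = new FPDF();
$pdf->AddPage();
$pdf->SetFont('Arial', '', 12);

// FPDF's core fonts are windows-1252, not UTF-8, so convert before writing
$text = iconv('UTF-8', 'windows-1252//TRANSLIT', 'Price: £10 or €12');
$pdf->Write(8, $text);

$pdf->Output();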

Encoding issue with Apache, displaying diamond characters in browser

I'm asking for your help setting up an Apache server on CentOS. It looks like some encoding issue, but I have not been able to resolve it yet.
Instead of rendering the HTML content it displays the HTML source (in Chrome and Firefox); IE 9 works fine. It displays a � character after each "<" symbol.
http://pdf.gen.in/index1.htm
The second problem is with PHP. It displays the source code of the PHP file http://pdf.gen.in/index.php with similar diamond characters wherever it encounters a "<" character. The PHP issue seems related to the first one.
Those files are encoded in UTF-16LE. For the static HTML page, you might be able to get it to work by setting the charset correctly in the MIME type (it's currently text/html; charset=UTF-8). I don't know how strong PHP's Unicode support is. Try using UTF-8 instead; it's generally better supported due to its partial overlap with ASCII.
You should use a decent text editor and always set the encoding of PHP/HTML files to "UTF-8 without BOM".
Create a file named "test.php", paste the code below, and save it with "UTF-8 without BOM" encoding; then it will work just fine.
<?php
phpinfo();
?>
