I'm trying to correctly display strings of text fetched from a database in a PDF document. What I can't figure out is the following.
I'm using FPDF and html2pdf to generate the PDF document. After fetching my information from the DB I use:
iconv('UTF-8', 'windows-1252', $data);
This displays correctly in the PDF document if I use:
$pdf->Cell();
But when I use:
$pdf->WriteHtmlCell();
it seems to have decoding issues. It seems to be in another charset, because ù turns into ù, Ä into Ã„, and so on. I have tried converting it to UTF-8 (which it is originally in) or ISO, but I keep getting the same result. When I run a
mb_detect_encoding();
on the string, it always comes back as ASCII (which is a subset of UTF-8, right?).
Is WriteHtmlCell(); using another encoding?
Try this:
html_entity_decode($your_data, ENT_XHTML,"ISO-8859-1");
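A minimal sketch of that suggestion in context; $pdf and WriteHtmlCell() are taken from the question, fetch_from_db() is a hypothetical placeholder, and the assumption is that the troublesome strings carry HTML entities:

$data = fetch_from_db(); // hypothetical helper returning the raw string
$html = html_entity_decode($data, ENT_XHTML, 'ISO-8859-1'); // decode entities into ISO-8859-1
$pdf->WriteHtmlCell($html);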
I'm trying to fill PDF documents using PDFTk. The script works fine and fills the form inputs, but I don't get special characters [Polish charset: UTF-8 or ISO-8859-2].
Script: https://github.com/mikehaertl/php-pdftk
The weird thing is that the generated PDF actually has the Polish characters when I click on a field.
[Screenshots: the field before the click is missing the Polish characters; after clicking on the field, they appear.]
Default encoding is set to UTF-8. The problem is that PDFTk can't use characters outside standard ASCII with FDF form fill. It doesn't allow multi-byte characters.
What I did:
Add fonts to the PDF files (checked: the files have the fonts)
Create fields in the PDF files with a default font (Arial)
Change the encoding in the script (function fillForm) to ISO-8859-2
Change the data values' encoding (iconv or mb_convert_encoding)
Change both the function's encoding and the data values' encoding to ISO-8859-2
Flatten the PDF after filling the form
Read all the topics about this problem on Stack Overflow and Google
UPDATE (25.03.2016): Found out that the PDF documents work fine on some computers. Some people see the Polish characters and others don't. All of us have the right fonts (with the Polish charset). I used the default Arial or Times New Roman. The fonts are also embedded in the file.
Any ideas?
You need to run pdftk with need_appearances as an argument.
Kudos to the guys from this issue on GitHub.
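A minimal sketch with mikehaertl/php-pdftk; needAppearances() is the wrapper's method for pdftk's need_appearances flag (verify it against the version you use):

use mikehaertl\pdftk\Pdf;

$pdf = new Pdf('form.pdf');
$pdf->fillForm(array('field_name' => 'Łukasz')) // UTF-8 values
    ->needAppearances() // tell the viewer to regenerate field appearances
    ->saveAs('filled.pdf');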
I had a similar issue. Solved it with the utf8_decode() function, e.g. utf8_decode('Łukasz').
The best results (without flattening) I got when creating the FDF file with UTF-8 values encoded into UTF-16BE:
chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\(', '\)'), mb_convert_encoding($string, 'UTF-16BE', 'UTF-8'));
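For context, a sketch of that expression wrapped into a helper (the function name is illustrative); the 0xFE 0xFF prefix is the UTF-16BE byte order mark, which tells PDF readers the string is UTF-16:

function fdf_escape_utf16($string) {
    // Convert UTF-8 input to UTF-16BE, then escape the FDF string delimiters
    $utf16 = mb_convert_encoding($string, 'UTF-16BE', 'UTF-8');
    $escaped = str_replace(array('\\', '(', ')'), array('\\\\', '\(', '\)'), $utf16);
    // Prepend the UTF-16BE byte order mark (0xFE 0xFF)
    return chr(0xfe) . chr(0xff) . $escaped;
}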
Your library works quite well, but when I open the PDF generated with it directly in Safari on macOS, for example, it does not show the Polish characters until I click the field. When I open it with Adobe Reader, it works fine.
I could not find how to change the font, so my solution was to use iText: https://itextpdf.com/en/resources/examples/itext-5/filling-out-forms
I wrote https://github.com/dddeeemmmooonnn/pdf_form_filler for my project.
I'm trying to output an XML file using PHP, and everything is right except that the file that is created isn't UTF-8 encoded, it's ANSI. (I see that when I open the file and do Save as....)
I was using
$dom = new DOMDocument('1.0', 'UTF-8');
but I figured out that non-English characters don't appear in the output.
I searched for a solution and first tried adding
header("Content-Type: application/xml; charset=utf-8");
at the beginning of the PHP script, but it says:
Extra content at the end of the document
Below is a rendering of the page up to the first error.
I've tried some other suggestions, like not including 'UTF-8' when creating the document but setting it separately:
$doc->encoding = 'UTF-8';, but the result was the same.
I used
$doc->save("filename.xml");
to save the file, and I've tried changing it to
$doc->saveXML();
but the non-English characters didn't appear.
Any ideas?
ANSI is not a real encoding. It's a word that basically means "whatever encoding my Windows computer is configured to use". Getting ANSI is a clear sign of relying on default encoding somewhere.
In order to generate valid UTF-8 output, you have to feed all XML functions with proper UTF-8 input. The most straightforward way to do it is to save your PHP source code as UTF-8 and then just type some non-English letters. If you are reading data from external sources (such as a database) you need to ensure that the complete toolchain makes proper use of encodings.
In any case, using "Save as" in an undisclosed piece of software is not a reliable way to determine the file encoding.
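A minimal sketch under those constraints, assuming the PHP source file itself is saved as UTF-8 (the element name and text are illustrative):

$dom = new DOMDocument('1.0', 'UTF-8');
$item = $dom->createElement('data');
$item->appendChild($dom->createTextNode('Müller')); // non-English text, typed as UTF-8
$dom->appendChild($item);
$dom->save('filename.xml'); // the saved bytes are UTF-8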
I want to export some of my data to CSV with the help of PHP. The code works correctly, but some of the keywords that I want to export were stored in an encoded form, which I had saved to the database using
urlencode('אריה דרעי');
This saved the text in the database in this format:
%26%231488%3B%26%231512%3B%26%231497%3B%26%231492%3B+%26%231491%3B%26%231512%3B%26%231506%3B%26%231497%3B
The main problem: when I display it in an HTML page it displays fine, but when I try to export it to CSV it shows the same encoded text.
I tried to use the following function
urldecode('%26%231488%3B%26%231512%3B%26%231497%3B%26%231492%3B+%26%231491%3B%26%231512%3B%26%231506%3B%26%231497%3B');
But it again generated HTML entities:
&#1488;&#1512;&#1497;&#1492; &#1491;&#1512;&#1506;&#1497;
Then I tried to decode it further using
htmlspecialchars_decode();
But it still shows &#1488;&#1512;&#1497;&#1492; &#1491;&#1512;&#1506;&#1497; in the CSV files.
I hope I make sense.
Try using htmlspecialchars() instead of urlencode():
http://www.php.net/manual/en/function.htmlspecialchars.php
and http://www.php.net/manual/en/function.htmlspecialchars-decode.php
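For decoding the values that are already stored, note that htmlspecialchars_decode() only handles the few named entities such as &amp; and &lt;, not numeric ones like &#1488;, while html_entity_decode() handles both. A sketch using the stored string from the question:

$stored = '%26%231488%3B%26%231512%3B%26%231497%3B%26%231492%3B+%26%231491%3B%26%231512%3B%26%231506%3B%26%231497%3B';
$entities = urldecode($stored); // "&#1488;&#1512;..." with a space for the +
$text = html_entity_decode($entities, ENT_QUOTES, 'UTF-8'); // "אריה דרעי"
echo $text;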
Hi all, I need your help with my problem.
I'm trying to display Korean text from a .txt file, but the output comes out wrong.
I have a .txt file that contains Korean characters like this:
냐는 한국을 사랑
but when I try:
$str= file_get_contents($path."result.txt");
echo $str;
in the browser the result comes out like this: �먮뒗 �쒓뎅�� �щ옉
but it's OK when I just echo "냐는 한국을 사랑" directly.
Is there something wrong?
Thanks for your help.
Either use header("Content-Type: text/html; charset=UTF-8") in your PHP file or a meta tag in your HTML: <meta charset='utf-8'>. And make sure the font you are using supports the Unicode characters you need.
Apparently the character encoding of the file is different from the character encoding of the HTML document that your code is generating.
You could dynamically convert the text data in PHP, or you could just use a suitable conversion program to convert the text file. You could just open the text file in a text editor and use Save As to save it as UTF-8 encoded (without BOM), assuming that your PHP is generating a UTF-8 encoded document.
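A sketch of the dynamic conversion, assuming the file turns out to be EUC-KR encoded (a common legacy Korean encoding; verify that before relying on it):

header('Content-Type: text/html; charset=UTF-8');
$str = file_get_contents($path . "result.txt");
if (!mb_check_encoding($str, 'UTF-8')) {
    // Assumed source encoding: EUC-KR
    $str = mb_convert_encoding($str, 'UTF-8', 'EUC-KR');
}
echo $str;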
I struggled a while with this problem until I discovered the following, which works perfectly for me:
echo mb_convert_encoding("행동 방식", 'HTML-ENTITIES', 'UTF-8');
I'm starting out with some XML that looks like this (simplified):
<?xml version="1.0" encoding="UTF-8"?>
<alldata>
<data name="Forsetì" />
</alldata>
But after I've parsed it with simplexml_load_string the special character (the ì) becomes: ì, which is obviously pretty mangled.
Is there a way to prevent this from happening?
I know for a fact the XML is fine; when saved as .txt and viewed in the browser the characters are fine. When I use simplexml_load_string on the XML and then save values as a text file, or to the database, it's mangled.
It looks like SimpleXML is creating a UTF-8 string, which is then rendered as ISO-8859-1 (Latin-1) or something close to it, like CP-1252.
When you save the result to a file and serve that file via a web server, the browser will use the encoding declared in the file.
Including in a web page
Since your web page encoding is not UTF-8, you need to convert the string to whatever encoding you are using, eg ISO-8859-1 (latin-1).
This is easily done with iconv():
$xmlout = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $xmlout);
Saving to database
Your database column is not using a UTF-8 collation, so you should use iconv to convert the string to the charset that your database uses.
Assuming your database collation is the same as the encoding that you render in, you will not have to do anything when reading from the database.
Explanation
In UTF-8, a 0xC2 prefix byte is used to access the top half of the "Latin-1 Supplement" block, which includes characters such as accented letters, currency symbols, fractions, superscript 2 and 3, the copyright and registered trademark symbols, and the non-breaking space.
However, in ISO-8859-1 the byte 0xC2 represents an Â. So when your UTF-8 string is misinterpreted as one of those encodings, you get Â followed by some other nonsense character.
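A quick way to reproduce the effect (the iconv call deliberately misreads the UTF-8 bytes as Latin-1):

$utf8 = '©'; // UTF-8: bytes 0xC2 0xA9
echo iconv('ISO-8859-1', 'UTF-8', $utf8); // prints "©": each byte re-read as Latin-1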
It's very likely that the XML is fine, but the character gets mangled when stored or output.
If you're outputting data on an HTML page: make sure it's encoded in UTF-8 as well. If your HTML page is in ISO-8859-1, you can use utf8_decode as a quick fix; using UTF-8 is the better option in the long run.
If you're storing the data in MySQL, you need to have UTF-8 selected as the encoding all the way through: as the connection's encoding, in the table, and in the column(s) you insert the data into.
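For the MySQL part, a minimal sketch of forcing the connection encoding (credentials are placeholders):

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->set_charset('utf8mb4'); // connection encoding; the table and columns must use a UTF-8 collation too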
I've also had some problems with this, and it came from the PHP script's encoding. Make sure it's set to UTF-8.
If it's still not good, try printing the variable using utf8_encode or utf8_decode.
XML is strict when it comes to entities: & should be &amp; and ì should be &#236;.
So you will need a translation table.
function xml_entity_decode($_string) {
    // Build a translation table that maps numeric entities back to characters.
    // ISO-8859-1 is requested explicitly so ord() sees single-byte characters.
    $_xml = array();
    $_xl8 = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
    foreach ($_xl8 as $_key => $_entity) {
        $_xml['&#' . ord($_key) . ';'] = $_key;
    }
    return strtr($_string, $_xml);
}
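Example usage (the output is ISO-8859-1 bytes, matching the table built above):

echo xml_entity_decode('Forset&#236;'); // "Forsetì", encoded as ISO-8859-1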
Late to the party... but I've faced this and solved it as below.
You have declared the encoding in the XML, so if you load the XML file using DOMDocument it won't cause any issue.
But in case it happens in another use case, you can use html_entity_decode like below:
html_entity_decode($xml->saveXML());
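For example, assuming UTF-8 throughout:

echo html_entity_decode('Forset&#236;', ENT_QUOTES, 'UTF-8'); // "Forsetì"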