Library Mpdf (php): Set utf-8 and use WriteHTML with utf-8 - php

I need help with the php Mpdf library. I am generating content for a pdf, it is in a div tag, and sent by jquery to the php server, where Mpdf is used to generate the final file.
In the generated PDF file the UTF-8 characters come out wrong, for example "generaciÃ³n" instead of "generación".
I detail how they are implemented:
HTML
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Sending content for pdf (jquery)
$('#pdf').click(function() {
$.post(
base_url,
{
contenido_pdf: $("#div").html(),
},
function(datos) {
}
);
});
Reception content (PHP)
$this->pdf = new mPDF();
$this->pdf->allow_charset_conversion = true;
$this->pdf->charset_in = 'iso-8859-1';
$contenido_pdf = $this->input->post('contenido_pdf');
$contenido_pdf_formateado = mb_convert_encoding($contenido_pdf, 'UTF-8', 'windows-1252');
$this->m_pdf->pdf->WriteHTML($contenido_pdf_formateado);
Other tested options:
1.
$this->pdf->charset_in = 'UTF-8';
Get error:
Severity: Notice --> iconv(): Detected an illegal character in input string
2.
$contenido_pdf_formateado = mb_convert_encoding($contenido_pdf, 'UTF-8', 'UTF-8');
or
3.
$contenido_pdf_formateado = utf8_encode($contenido_pdf);
Get incorrect characters, like the original case.
What is wrong or what is missing to see the text well? Thanks

Solution
$contenido_pdf_formateado = utf8_decode($contenido_pdf);
$this->m_pdf->pdf->WriteHTML($contenido_pdf_formateado);
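A note for newer PHP versions: utf8_decode() is deprecated as of PHP 8.2. Below is a minimal sketch of the same UTF-8 to ISO-8859-1 conversion using mb_convert_encoding(), assuming the mbstring extension is loaded. The helper name toLatin1 is made up for illustration. The conversion presumably helps in the question's setup because charset_in was set to 'iso-8859-1' there, so mPDF expects Latin-1 input.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to ISO-8859-1,
// which is the conversion utf8_decode() performs.
function toLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// "ó" is two bytes in UTF-8 (C3 B3) and one byte in ISO-8859-1 (F3).
$latin1 = toLatin1("generaci\u{00F3}n");
echo bin2hex($latin1), PHP_EOL;
```

The result would then be passed to WriteHTML() in place of the utf8_decode() output.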

I had used the mode on object creation
$mpdf = new Mpdf(['mode' => 'UTF-8']);
Use this if you are sure that your html is utf-8
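If you are not sure, one way to check is to test the string before handing it to mPDF. The sketch below assumes the mbstring extension is available and that any non-UTF-8 input is Windows-1252; both the helper name ensureUtf8 and that fallback encoding are assumptions for illustration, not part of mPDF.

```php
<?php
// Hypothetical helper: leave valid UTF-8 untouched, otherwise convert
// from an assumed fallback encoding (Windows-1252 by default).
function ensureUtf8(string $html, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}

// Already UTF-8: returned as-is, so it is not encoded twice.
$clean = ensureUtf8("generaci\u{00F3}n");

// A single F3 byte is not valid UTF-8, so it is converted.
$converted = ensureUtf8("generaci\xF3n");
```

Converting a string that is already UTF-8 a second time is what typically produces sequences like "Ã³", so the validity check comes before any conversion.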

A combination of this
$mpdf = new Mpdf(['mode' => 'UTF-8']);
and the below worked for me.
$mpdf->autoScriptToLang = true;
$mpdf->autoLangToFont = true;

The only thing you have to do to activate UTF-8 is to add a default font.
I did this and it worked very well, so try it and let the others know whether it is a good solution.
Just add a default font and see:
$mpdf = new \Mpdf\Mpdf([
'default_font_size' => 9,
'default_font' => 'Aegean.otf' ]);

Related

UTF-8 conflict with TCPDF in CakePHP 3

Hi community, I'm using the CakePdf plugin with the TCPDF library, and when generating the PDF it shows me the following error.
Error:
Warning: htmlspecialchars() [function.htmlspecialchars]: charset `ASCII' not supported, assuming utf-8 in G:\Trabajos_Web_PHP\diplomas\vendor\cakephp\cakephp\src\Core\functions.php on line 69
Warning: htmlspecialchars() [function.htmlspecialchars]: charset `ASCII' not supported, assuming utf-8 in G:\Trabajos_Web_PHP\diplomas\vendor\cakephp\cakephp\src\Core\functions.php on line 69
Warning: htmlspecialchars() [function.htmlspecialchars]: charset `ASCII' not supported, assuming utf-8 in G:\Trabajos_Web_PHP\diplomas\vendor\cakephp\cakephp\src\Core\functions.php on line 69
Warning: htmlspecialchars() [function.htmlspecialchars]: charset `ASCII' not supported, assuming utf-8 in G:\Trabajos_Web_PHP\diplomas\vendor\cakephp\cakephp\src\Core\functions.php on line 69
Warning: htmlspecialchars() [function.htmlspecialchars]: charset `ASCII' not supported, assuming utf-8 in G:\Trabajos_Web_PHP\diplomas\vendor\cakephp\cakephp\src\Core\functions.php on line 69
Warning (2): htmlspecialchars() [<a href='http://php.net/function.htmlspecialchars'>function.htmlspecialchars</a>]: charset `ASCII' not supported, assuming utf-8 [CORE\src\Core\functions.php, line 69]
my configuration is like this
Plugin::load('CakePdf', ['bootstrap' => true]);
Configure::write('CakePdf', [
'engine' => 'CakePdf.Tcpdf',
'encoding' => 'UTF-8',
'download' => true
]);
The action that generates the PDF looks like this:
public function pdfdo($names = null) {
$file = new File(WWW_ROOT.'bd/'.'base_datos_do.json');
$json = $file->read(TRUE,'r');
$config = json_decode($json,TRUE);
$this->set('config',$config);
$persons = explode(',', $names);
$this->set('lastnames',$persons);
$this->viewBuilder()->setLayout('ajax');
$this->viewBuilder()->setTemplate('pdf/pdfdo');
$this->response = $this->response->withType('application/pdf');
}
Inside my template the configuration is as follows. I also call mb_internal_encoding('UTF-8'); to reset the encoding, but the error still continues.
$pdf = new TCPDF('L',PDF_UNIT,PDF_PAGE_FORMAT,TRUE,'UTF-8',FALSE);
$pdf->SetCreator(PDF_CREATOR);
$pdf->setPrintHeader(false);
$pdf->setPrintFooter(false);
$pdf->SetAutoPageBreak(TRUE, PDF_MARGIN_BOTTOM);
$pdf->setImageScale(PDF_IMAGE_SCALE_RATIO);
// build my pdf
// finalization of my pdf
mb_internal_encoding('UTF-8');
$pdf->Output('Diplomas-DO.pdf', 'D');
header('Content-Type: application/pdf; charset=utf-8');
Please help, I have been stuck on this problem for several days. Thanks.
I recently made a pdf with TCPDF and had the same problem. It looks like you're building your PDF with the TCPDF engine directly.
This error happens when CakePHP throws an error before the PDF output can begin... for example, it could be a "Trying to get a property of a non-object in...." error or something like that. You should be able to see the specific error message info below the htmlspecialchars() warnings.
I suggest checking to make sure your pdf is working correctly first... instead of your //build my pdf code, make a simple line like
$pdf->setXY(13, 13);
$pdf->Write(5, 'Test Hello');
If that works, then your configuration is working and the error is likely in your variables somewhere, so start building your pdf piece by piece, testing as you go.
I'll also add that I also chose to use the TCPDF engine directly, so I didn't use the CakePDF plugin (which works great but didn't meet my needs for this particular problem). I can provide more info on this if needed.
EDIT:
I'll provide some info on how I used TCPDF directly in my project without CakePDF in case you or anyone finds it helpful.
First, I wanted to use TCPDF engine directly for a few reasons:
Precise control of headers and footers
Ability to use the text scaling and FIT CELL functions of TCPDF
More precise absolute positioning of elements
Avoiding CSS
So I installed TCPDF directly with composer
composer require tecnickcom/tcpdf
Added this to app/vendor/cakephp-plugins.php
'Tecnickcom/Tcpdf' => $baseDir . '/vendor/tecnickcom/tcpdf/'
Then in app/config/bootstrap.php
Plugin::load('Tecnickcom/Tcpdf', ['bootstrap' => true]);
Then in app/config/routes.php
Router::extensions(['pdf']);
Then in app/src/controller/mycontroller.php, I created the method outputpdf. In that method, I set all the data collections to be used in the pdf, then
$this->viewBuilder()->template('mypdf');
Then in app/src/template/mycontroller/pdf/ i created the mypdf.php. This file contains only this code:
header("Content-type:application/pdf");
$this->layout = 'mypdf';
Then in app/src/template/layout/pdf/ I created the file mypdf.php. In this file I built my PDF with the data from the controller.
header("Content-type:application/pdf");
// Extend the TCPDF class to create custom Header and Footer
class MYPDF extends TCPDF {
//And build the header and footer in here
}
$pdf = new MYPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);
//And make all the body content here
$pdf->Output('mypdf.pdf', 'I');
One downside with this approach is with foreign language fonts, you need to add and use the fonts you need in the app/vendor/tecnickcom/tcpdf/fonts folder, and those are all that are available for your pdf.
Please feel free to critique or advise on improvements to this approach.
I found that the error is related to the images I use inside the PDF: one is used as the page background and the other is a small logo.
$pdf = new TCPDF('L',PDF_UNIT,PDF_PAGE_FORMAT,TRUE,'UTF-8',FALSE);
$pdf->SetCreator(PDF_CREATOR);
$pdf->setPrintHeader(false);
$pdf->setPrintFooter(false);
$pdf->SetAutoPageBreak(TRUE, PDF_MARGIN_BOTTOM);
$pdf->setImageScale(PDF_IMAGE_SCALE_RATIO);
$fontname = TCPDF_FONTS::addTTFfont(WWW_ROOT.'font'.DS.'Mada'.DS.'Mada-Regular.ttf', 'TrueTypeUnicode', '', 96);
$tagvs = array(
'div'=> array(
0 => array('h'=>0,'n' => 0),
1 => array('h'=>0,'n' => 0)),
'p'=> array(
0 => array('h'=>0,'n' => 0),
1 => array('h'=>0,'n' => 0)),
'h2' => array(
0 => array('h'=>0,'n' => 0),
1 => array('h'=>0,'n' => 0)),
'img' => array(
0 => array('h'=>0,'n' => 0),
1 => array('h'=>0,'n' => 0)
)
);
//variable that has small image
$imglogo = WWW_ROOT.'logos'.DS.'logoempresa.png';
foreach ($lastnames as $names) {
$pdf->AddPage();
$bMargin = $pdf->getBreakMargin();
$auto_page_break = $pdf->getAutoPageBreak();
$pdf->SetAutoPageBreak(false, 0);
//image for background
$img = WWW_ROOT.'img'.DS.'Diploma_DO.png';
$pdf->Image($img, 0, 0, 300, 210, 'png', '', '', false, 600, '', false, false, 0);
$pdf->SetAutoPageBreak($auto_page_break, $bMargin);
$pdf->setPageMark();
$pdf->setHtmlVSpace($tagvs);
$html_title = '<table cellspacing="0">'
. '<tr style="text-align:center;line-height:11px">'
. '<td style="font-size: 37pt;font-weight: 600;color: #034bdb;color:#003275">'.$names.'</td>'
. '</tr>'
. '</table>';
$html_text_content = '<div style="text-align: center">'
. '<p style="color:#333;font-size: 16px;text-align: center">Ha completado con éxito el '.$config["Nombre-Taller-Curso"].',</p>'
. '<p style="color:#333;font-size: 16px;text-align: center">efectuada el '.$config["Fecha-Inicio-Fin"].' de '.$config["Mes-Ano"].' con una duración de '.$config["Horas"].' Horas.</p>'
. '</div>';
$html_text_content_bussines = '<div style="text-align: center">'
. '<p style="color:#333;font-size: 16px;text-align: center">Este taller ha sido diseñado especialmente para '.$config["Empresa"].'.</p>'
. '</div>';
$html_text_content_close = '<div style="text-align: center">'
. '<p style="color:#333;font-size: 16px;">'.$config["Fecha-Curso-Ubicacion"].'</p>'
. '</div>';
//img tag that contains the small image
$html_logo_bussines = '<img src="'.$imglogo.'" width="150" height="100">';
$pdf->SetFont($fontname, 'B', 26, '',false);
$pdf->writeHTMLCell(300,0,0,78,$html_title, '', 1, 0, true, 'C',true);
$pdf->SetFont($fontname,'',14,'',false);
$pdf->writeHTMLCell(300, 0, 0, 88, $html_text_content, '', 1, 0, true, 'C', true);
$pdf->writeHTMLCell(300, 0, 0, 109, $html_text_content_bussines, '', 1, 0, true, 'C', true);
$pdf->writeHTMLCell(300,0,0,125,$html_text_content_close,'',1,0,true,'C',true);
//use of the small image
$pdf->writeHTMLCell(300,0,0,155,'<div style="text-align:center">'.$html_logo_bussines.'<div>',0,0,0,true,'C',true);
$pdf->lastPage();
}
The error continues. It stops showing only when I comment out the line that writes the small image:
$pdf->writeHTMLCell(300,0,0,155,'<div style="text-align:center">'.$html_logo_bussines.'<div>',0,0,0,true,'C',true);
I do not know what I'm doing wrong. I read the documentation, and this function does accept the img tag.
Can you debug your $imglogo variable to see if the file path is correct?
Or, try displaying the image with the $pdf->Image() function?
Note that TCPDF has a configuration option in vendor\tecnickcom\tcpdf\config\tcpdf_config.php:
define ('K_PATH_IMAGES', 'C:\\windowsfolder\\htdocs\\app\\webroot\\img\\');
So you can call the image in the PDF via:
$image_file = K_PATH_IMAGES.'imagefile.jpg';
See if that works...
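A quick way to do the path check suggested above, before involving TCPDF at all, is sketched below. The helper name describeImagePath is made up for illustration; it uses only standard PHP functions.

```php
<?php
// Hypothetical helper: report why an image path might fail before it
// reaches the PDF library.
function describeImagePath(string $path): string
{
    if (!is_file($path)) {
        return 'missing';
    }
    if (!is_readable($path)) {
        return 'unreadable';
    }
    // getimagesize() returns false when the file is not a recognised image;
    // @ silences the notice some PHP versions emit for very short files.
    return @getimagesize($path) === false ? 'not an image' : 'ok';
}

// Example call with a path that does not exist.
echo describeImagePath(__DIR__ . '/no-such-logo.png'), PHP_EOL;
```

In the template above this would be called with $imglogo before building the img tag, so a wrong path shows up as a clear message instead of a failure inside writeHTMLCell().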

Cyrillic text with dompdf, print pdf scale error

I need help with this one. I use PHP and dompdf to create an invoice in Cyrillic.
On screen the PDF looks fine, but when I try to print it the scale is wrong,
and one extra blank page is added. Do you know how this can be fixed?
The example can be seen at: http://projects.stanislavstankov.com/php-pdf/pdfview.php
and the source can be downloaded at: http://projects.stanislavstankov.com/php-pdf/php-pdf.rar
PHP code:
<?php
$data = file_get_contents("HTML Invoice Template.html");
$data = iconv('UTF-8', 'UTF-8//IGNORE', $data);
$data = preg_replace('/<body>/', "", $data);
$data = preg_replace('/<\/body>/', "", $data);
$data = preg_replace('/<html>/', "", $data);
$data = preg_replace('/<\/html>/', "", $data);
// inhibit DOMPDF's auto-loader
define('DOMPDF_ENABLE_AUTOLOAD', false);
//include the DOMPDF config file (required)
require 'extensions/dompdf/dompdf_config.inc.php';
//if you get errors about missing classes please also add:
require_once('extensions/dompdf/include/autoload.inc.php');
//$data = mb_convert_encoding($data, 'HTML-ENTITIES', 'UTF-8');
//$data = utf8_decode($data);
//$data = iconv('Windows-1251','UTF-8', $data);
//
//generate some PDFs!
$dompdf = new DOMPDF(); //if you use namespaces you may use new \DOMPDF()
$dompdf->set_paper('a4', 'portrait');
$dompdf->load_html($data);
//$dompdf->set_paper(array(0,0,595,842));
$dompdf->render();
$dompdf->stream($log_id.".pdf", array("Attachment"=>0));
?>

Get html source of external webpage without header/encode

I just want to know if it's possible to extract UTF-8 encoded content from an HTML file that is served without an encoding header.
My specific case is this website:
http://www.metal-archives.com/band/discography/id/203/tab/all
I want to extract all the info but, as you can see, this word for example, looks bad:
MotÃ¶rhead
I tried to use file_get_html, htmlentities, utf8_decode, utf8_encode and mixes of them with different options, but I can't find a solution...
Edit:
I just want to see the same website with correct format with this simple code:
$html_discos = file_get_html("http://www.metal-archives.com/band/discography/id/223/tab/all");
//some transform/decode here
print_r($html_discos);
I want the content in correct format in a string or DOM object to get some parts later.
Edit 2:
file_get_html is a function of the "simple html dom" library:
http://simplehtmldom.sourceforge.net/
which contains this code:
function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
// We DO force the tags to be terminated.
$dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
// For sourceforge users: uncomment the next line and comment the retreive_url_contents line 2 lines down if it is not already done.
$contents = file_get_contents($url, $use_include_path, $context, $offset);
// Paperg - use our own mechanism for getting the contents as we want to control the timeout.
//$contents = retrieve_url_contents($url);
if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
{
return false;
}
// The second parameter can force the selectors to all be lowercase.
$dom->load($contents, $lowercase, $stripRN);
return $dom;
}
The Content-Type of the URL
http://www.metal-archives.com/band/discography/id/203/tab/all
is:
Content-Type: text/html
This will default to ISO-8859-1. But instead you want to use UTF-8. Change the Content-Type so this is correctly signaled:
Content-Type: text/html; charset=utf-8
See: Setting the HTTP charset parameter
header('Content-Type: text/html; charset=utf-8');
echo file_get_contents('http://www.metal-archives.com/band/discography/id/203/tab/all');
As long as you are emitting as UTF-8, the raw data will work properly.
Try using html_entity_decode http://php.net/manual/en/function.html-entity-decode.php (the source of that page has encoded characters)
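As a small illustration of that suggestion, the sketch below decodes a named entity into UTF-8. The wrapper name decodeEntities and the sample string are assumptions for illustration; whether the page in question actually uses entities for this band name is not confirmed here.

```php
<?php
// Hypothetical wrapper: decode HTML entities into a UTF-8 string.
// Passing the charset explicitly avoids depending on the default_charset setting.
function decodeEntities(string $html): string
{
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$name = decodeEntities('Mot&ouml;rhead');
echo $name, PHP_EOL;
```

For the result to display correctly in a browser, the page still has to be sent with charset=utf-8 as described in the other answer.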

PHP: converting html file to pdf

I have an html file named welcomemailtemplate.html and I need to convert that file to a PDF.
I first read this file using the following method provided by Yii framework:
$filename = Yii::app()->basePath.'\views\email\welcomemailtemplate.html';
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
$name = $model->name;
fclose($handle);
$message = str_replace ( "[username]", $name, $contents );
Then, to generate the PDF file, the following parameters are set:
Yii::import('application.vendors.*');
require_once('tcpdf/tcpdf.php');
require_once('tcpdf/config/lang/eng.php');
$pdf = new TCPDF();
$pdf->SetCreator("Torget");
$pdf->SetAuthor('test name');
$pdf->SetTitle('Savani Test');
$pdf->SetSubject(' Torget Order Confirmation');
$pdf->SetKeywords(' Torget, Order, Confirmation');
//$pdf->SetHeaderData('', 0, PDF_HEADER_TITLE, '');
$pdf->SetHeaderData('', 0, "Torget Order", '');
$pdf->setHeaderFont(Array('helvetica', '', 8));
$pdf->setFooterFont(Array('helvetica', '', 6));
$pdf->SetMargins(15, 18, 15);
$pdf->SetHeaderMargin(5);
$pdf->SetFooterMargin(10);
$pdf->SetAutoPageBreak(TRUE, 0);
$pdf->SetFont('dejavusans', '', 7);
$pdf->AddPage();
If I pass the content as follows, it creates the PDF:
$pdf->writeHTML("<span>Hello World!</span>", true, false, true, false, '');
But if I pass the content read from the HTML file, as follows, it gives an error:
$pdf->writeHTML($message, true, false, true, false, '');
$pdf->LastPage();
Error message:
Undefined index: thead
Try to validate the file using the w3c validator http://validator.w3.org/.
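Besides the online validator, a rough local check can be done with PHP's DOM extension before the markup is handed to TCPDF. The sketch below is only an approximation: libxml is lenient, the messages differ between libxml versions, and an empty result does not guarantee that TCPDF will accept the HTML. The helper name htmlParseMessages is made up for illustration.

```php
<?php
// Hypothetical helper: collect the parse messages libxml reports for an
// HTML fragment. Requires the dom extension.
function htmlParseMessages(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input on PHP 8
    }
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Example: print whatever the parser reports for a fragment.
foreach (htmlParseMessages('<table><tr><td>cell</span></td></tr></table>') as $message) {
    echo $message, PHP_EOL;
}
```

In the question's code this would be called with $message before writeHTML(), to spot obviously mismatched tags in the template.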
I've worked with TCPDF before, but I gave it up because it didn't seem reliable. You can also try the wkhtmltopdf binary (only if your hosting allows you to use proc_open/proc_close). It seems a little more stable to me. It also has a PHP class to help you use it.
CutyCapt also seems to be a very good option for you. It's very easy to integrate.

Zend_Cache And UTF-8 Problem

I'm trying to save UTF-8 characters with Zend_Cache (like Ť, š etc) but Zend_Cache is messing them up and saves them as Å, ¾ and other weird characters.
Here is a snippet of my code that saves the data to the cache (the UTF-8 characters are messed up only online, when I try it on my PC on localhost it works ok):
// cache the external data
$data = array('nextRound' => $nextRound,
'nextMatches' => $nextMatches,
'leagueTable' => $leagueTable);
$cache = Zend_Registry::get('cache');
$cache->save($data, 'externalData');
Before I save the cached data, I purify it with HTMLPurifier and do some parsing with DOM, something like this:
// fetch the HTML from external server
$html = file_get_contents('http://www.example.com/test.html');
// purify the HTML so we can load it with DOM
include BASE_PATH . '/library/My/htmlpurifier-4.0.0-standalone/HTMLPurifier.standalone.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Doctype', 'XHTML 1.0 Strict');
$purifier = new HTMLPurifier($config);
$html = $purifier->purify($html);
$dom = new DOMDocument();
// hack to preserve UTF-8 characters
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
$dom->preserveWhiteSpace = false;
// some parsing here
Here is how I initialize Zend_Cache in the bootstrap file:
protected function _initCache()
{
$frontend= array('lifetime' => 7200,
'automatic_serialization' => true);
$backend= array('cache_dir' => 'cache');
$this->cache = Zend_Cache::factory('core',
'File',
$frontend,
$backend);
}
Any ideas? It works on localhost (where I have support for the foreign language used in the HTML) but not on the server.
I had a similar problem with an FPDF deployment. There, the HTML space entity &nbsp; was being converted into the same Å character that you're getting here. It was fine on my local Windows machine, but did not work in my Linux server environment.
Try this:
$str = iconv('UTF-8', 'windows-1252', html_entity_decode($str));
