I Have the some problem als this post from 2 year ago on this website. I tried everything, but nothing help.
Does anybody over here a working solutions for this problem?
html_entity_decode in FPDF(using tFPDF extention)
I am using tFPDF to generate a PDF. The php file is UTF-8 encoded. I want © for example, to be output in the pdf as the copyright symbol.
I have tried iconv, html_entity_decode, htmlspecialchars_decode. When I take the string I am trying to decode and hard-code it in to a different file and decode it, it works as expected. So for some reason it is not being output in the PDF. I have tried output buffering. I am using DejaVuSansCondensed.ttf (true type fonts).
Link to tFPDF: http://fpdf.org/en/script/script92.php
I am out of ideas. I tried double decoding, I checked everywhere to make sure it was not being encoded anywhere else.
Help!
Related
Since yesterday, my server is adding some garbage characters to any script in PHP. If I look at my code with view source, I see some spaces and if I display a JSON string, it is considered invalid.
If I take this simple example:
<?PHP
echo "hello";
?>
It displays hello but in the source code I see a blank line before hello. The encoding of the file is in UTF8 without BOM (did it with Notepad++)
If I use file_get_contents to load the PHP file and then use rawurlencode before outputting the content, I get the following garbage before hello:
%EF%BB%BF%EF%BB%BF%EF%BB%BF
My first thought was that it was an encoding issue but I checked the concerned PHP file(s) and they are all in UTF8 without BOM. The only solution I have found is to remove this string of garbage each time before treating the content of a file.
I'm using Wordpress and the problem suddenly appeared yesterday while I had not modified any file.
Do you have any idea?
Thanks
Laurent
First off, I know this problem was signaled before, but the solutions do not apply to my case
Here is the url
http://www.astagiudiziaria.com/beni/porzione_di_rustico_e_terreni_agricoli/index.html
The page says its charset is ISO-8859-1, but it cannot be since it has the EURO sign on it. Chrome browser identifies it as windows-1252
I used
$file = str_replace('charset=iso-8859-1', 'charset=utf-8', $file);
$file = iconv('windows-1252', 'UTF-8', $file);
and save it and my text editor says it is UTF-8 encoded
Then I use
$doc2->loadHTML($file);
$doc2->saveHTMLFile('ggg.html');
and also my text editor says it is UTF-8 encoded
But http://i-tools.org/charset says this file, ggg.html is actually ASCII !
Nonetheless, inside it things look as expected, even though they are using html encodings , like Pré or proprietà
The xpath queries return garbage data, like
instead of Pré is Pré
instead of € is €Â
I have tried the solutions suggested around here without any success
I think it's about how php is dealing with libxml, since in ruby it works flawlessly - also using libxml through curb gem - problem being that my client wants a php script
I took a quick glance, and the way I see it the site outputs mixed encoding.
It is iso-8859-1 with a mixed-in windows-1252 € sign (I think).
Thats why the browser gets confused (but somehow handles it).
No idea how you would proceed here, apart from asking them to fix their site or alternativly do some bit-fiddling.
the Pré is Pré breaks because you attemt to windows-1252->utf8 transcode what actually is iso-8859-1 stuff (I suppose).
I have a stupid problem. I use a software for export .csv files, and the result is a strange formated text. When I try to deal them in PHP, everything goes wrong.
I copy and paste the text in MS WORD : there is a strange character between each letter.
In php I tried to convert it using utf8_decode/utf8_encode, iconv("ISO-8859-1", "WINDOWS-1252", $str)... in vain.
I guess it's an utf16 encoded text, but I'm not sure of it. I tried some functions to decode utf16, in vain too.
Is someone has a solution to fix this ?
Your guess it correct:
file -i NL_JGFR_130326_bac.csv
NL_JGFR_130326_bac.csv: text/plain; charset=utf-16le
You can probably use the PHP MultiByte extension to work with UTF-16:
http://php.net/manual/en/ref.mbstring.php
I'm actually working on a web application coded in php with zend framework. I need to translate every pages in french and english so I use csv file to do it.
My problem is when a word start with an accentued letter like É or À, the letter just disappear, but the rest of the word is displayed.
For example, if my csv file contains Écriture, it displays criture. But if I have exécution, it displays exécution without any problems.
Everytime I want to display text in my view, I just call <?php echo $this->translate('line to call in csv'); ?> and my text is displayed.
Like I said ,my application is encoded with UTF-8, and I don't have any problems withs specials characters, except when they're first. I googled it but couldn't find anything for now.
Thanks already for your help !
UPDATE
I forgot to say that when I execute my application in zend browser to debug it, everything's fine, my É displays. It's only in broswers like IE or FF that I have the problem.
UPDATE #2
I just found another post talking about fgetcsv, and it looks like the function I use to translate from my csv file is using fgetcsv() ... could it be the problem ? And if it is, how can I fix it ? It's coded like that in Zend Translate library I'm not sure I want to start changing things there ...
UPDATE #3
I continued my research and I found issues in PHP when encoded UTF-8. But Zend Framework is encoded UTF-8 by default so I'm sure there is a way to make this work.. I'm still searching but I hope someone has the solution !
I had the same problem, I tried AJ's solution and it worked:
Missing first character of fields in csv
The problem seems to be that fgetcsv() uses locale settings, just use
setlocale(LC_ALL, 'en_US.UTF-8');
In .csv file content try to use
; as delimiter
and
" as enclosure.
something like this inside .csv file
"key1";"value1" ##first line
"key1";"value1" ##second line
"key1";"value1" ##fird line
this solve like ussue for me
view csv file using hex editor and make sure it is encoded in the right way
"É" is 0xC3 0x89,
"À" is 0xC3 0x80
Did you have some strtoupper() or ucfirst() or similar functions in your code? In that case try mb_strtoupper($str, 'UTF-8')
Well, I give up.
I've been messing around with all I could think of to retrieve data from a target website that has information in traditional Chinese encoding (charset=GB2312).
I've been using the simple_html_parser like always but it doesn't seem to return the Chinese characters, in fact all I get are some weird question marks embedded inside a rhomboid shape.
("�������ѯ�ؼ��֣�" Like so)
Declaring the encoding for the php file didn't do anything except of getting rid of some unwanted character showing at the start of the page.
By declaring it I mean:
header('Content-Type', 'text/html; charset=GB2312');
I can't get any data that's written in Chinese, also tried file_get_contents with the same luck. I'm probably missing something obvious since I can't find any related discussion elsewhere.
Thanks in advance.
Have you tried converting the encoding with mb_convert_encoding or iconv, e.g.
$str = mb_convert_encoding($content, 'UTF-8', 'GB2312');
or
$str = iconv("UTF-8", "GB2312//IGNORE", $content);
Get it in whatever character set the source uses, then convert it to something usable locally, such as UTF-8. Then send it to the browser.
set header('Content-Type: text/html; charset=utf-8');
It's working for me