Thai, Vietnamese language not supported in Excel - php

I have created an Excel file that contains Thai and Vietnamese text. My problem is that these characters show up as question marks.
My code is below:
$worksheet->write($i, 5, iconv("UTF-8", "ISO-8859-1//TRANSLIT", html_entity_decode($text)), $mainquest);
I have also tried all the other ISO standards. I used ISO-8859-1 for French language support. I also tried mb_convert_encoding, but made no progress.
Are there any solutions for this?

Vietnamese and Thai do not use the same character set as French.
For Vietnamese (Windows):
- charset=windows-1258
For Thai (Windows):
- charset=windows-874
So for Thai:
$worksheet->write($i, 5, iconv("UTF-8", "windows-874",html_entity_decode($text)), $mainquest);
and for Vietnamese:
$worksheet->write($i, 5, iconv("UTF-8", "windows-1258",html_entity_decode($text)), $mainquest);
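The two calls above differ only in the target charset, so they can be folded into one helper. This is a minimal sketch, not part of the original answer: legacyCharsetFor() and toLegacyCell() are hypothetical names, the language codes are an assumed convention, and it presumes the local iconv build knows the charset names used in the answer.

```php
<?php
// Hypothetical helper: map a language code to the charset named in the answer above.
function legacyCharsetFor(string $lang): string
{
    $map = [
        'th' => 'windows-874',  // Thai
        'vi' => 'windows-1258', // Vietnamese
        'fr' => 'ISO-8859-1',   // French, as in the question
    ];
    if (!isset($map[$lang])) {
        throw new InvalidArgumentException("No charset configured for '$lang'");
    }
    return $map[$lang];
}

// Hypothetical helper: decode HTML entities, then convert the UTF-8 text
// to the legacy charset for the given language.
// Returns false if iconv cannot convert the text.
function toLegacyCell(string $utf8Text, string $lang)
{
    $decoded = html_entity_decode($utf8Text, ENT_QUOTES, 'UTF-8');
    return iconv('UTF-8', legacyCharsetFor($lang), $decoded);
}
```

With the question's variables, the Thai call would then read $worksheet->write($i, 5, toLegacyCell($text, 'th'), $mainquest);. Whether the cell displays correctly still depends on the spreadsheet being opened with the matching code page.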

If switching the library is an option, I'd suggest using PHPExcel. It is UTF-8 based, so you shouldn't run into character-encoding trouble at all (provided everything else in your workflow is consistently set to UTF-8: database, files, web server). I have never had any problems with this library so far, and I have generated spreadsheets with all kinds of special characters.

You must use UTF-8 encoding for this problem.

Related

Unable to convert file from ANSI to UTF-8, using PHP

I have a file that contains some Cyrillic characters. When I open this file in Notepad++, I see that it has ANSI encoding. If I manually convert it to UTF-8 using Notepad++, everything is absolutely fine: I can use this file in my parsers and get results. But what I want is to do it programmatically, using PHP. This is what I tried after searching through SO and the documentation:
file_put_contents($file, utf8_encode(file_get_contents($file)));
In this case when my algorithm parses the resulting files, it meets such letters as "è", "í" , "â". In other words, in this case I get some rubbish. I also tried this:
file_put_contents($file, iconv('WINDOWS-1252', 'UTF-8', file_get_contents($file)));
But it produces the very same rubbish. So I really wonder how I can achieve programmatically what Notepad++ does. Thanks!
Notepad++ may report your encoding as ANSI but this does not necessarily equate to Windows-1252. 1252 is an encoding for the Latin alphabet, whereas 1251 is designed to encode Cyrillic script. So use
file_put_contents($file, iconv('WINDOWS-1251', 'UTF-8', file_get_contents($file)));
to convert from 1251 to utf-8 with iconv.
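To see why the source charset matters, here is a small sketch on raw bytes instead of a file. The function name cp1251ToUtf8() is made up for illustration, and the "already valid UTF-8" check is only a heuristic to avoid converting the same data twice.

```php
<?php
// Hypothetical helper: convert Windows-1251 bytes to UTF-8.
// Input that is already valid UTF-8 is returned untouched (heuristic guard
// against double conversion; pure ASCII also passes through unchanged).
function cp1251ToUtf8(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    $converted = iconv('WINDOWS-1251', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return $converted;
}

// The bytes CF F0 E8 E2 E5 F2 spell a Russian word in Windows-1251.
// Read as Windows-1252, the same bytes are accented Latin letters,
// which is the kind of rubbish described in the question.
$raw = "\xCF\xF0\xE8\xE2\xE5\xF2";
echo cp1251ToUtf8($raw), "\n";
```

For a whole file, the same function can wrap file_get_contents($file) before writing the result back with file_put_contents.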

PHP Uploaded file name: Japanese character encoding

When uploading a file with a Japanese name, some characters cause problems.
On a Windows system, I want to save the file under its name as uploaded. So I have to use
mb_convert_encoding($name, "SJIS", "AUTO");
which works fine in most cases.
Though, some characters like ① as in 0423図表① totally disappear at the end. It seems that when uploaded the name of the file is already "wrong":
it looks like "0423å³è¡¨â .pptx" in UTF-8 and if I change the header charset with
header('Content-Type: text/html; charset=SJIS');
it looks like
"0423テ・ツ崢ウティツ。ツィテ「ツ堕.pptx"
I am not sure what I can do in this case. I tried to replace the ① character but I cannot even find it with strpos() before or after the encoding conversion.
To qualify my answer (to the downvoter):
Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?
A: There is a lot of misinformation floating around about the support
of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard
supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X
0221, or JIS X 0213, for example, and many more. This is true no
matter which encoding form of Unicode is used: UTF-8, UTF-16, or
UTF-32.
Unicode supports over 80,000 CJK characters right now, and work is
underway to encode further additions. The International Standard
ISO/IEC 10646 and the Unicode Standard are completely synchronized in
repertoire and content. And that means that Unicode has the same
repertoire as GB 18030, since that also is synchronized with ISO 10646
— although with a different ordering and byte format.
From: The Unicode Consortium.
My Answer:
Rather than strpos, use mb_strpos (or the case-insensitive mb_stripos) from the PHP multibyte string functions to find and replace characters. This should help your script detect and translate the non-Latin characters.
If the uploaded file name ($_FILES['var']['name']) is already incorrect in the PHP script (from output such as print_r($_FILES)) then you need to ensure you are correctly encoding the HTML form with accept-charset='UTF-8' (or SJIS, etc.). I would hope you're already well ahead of me on this.
It may also be advisable to set a few defaults at the top of your PHP page, again using the PHP mb_ functions:
mb_internal_encoding('UTF-8'); // or whatever character set works for you
mb_http_output('SJIS');
mb_regex_encoding('UTF-8');
// mb_http_input() only reports the detected input encoding; it does not set one
Out of interest:
http://www.unicode.org/reports/tr37/
and
http://david.latapie.name/blog/shift-jis-utf-8/
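As a rough illustration of the points above, the following sketch locates ① by character offset and converts the name for a Japanese Windows file system. It rests on two assumptions that are not confirmed in the question: that the uploaded name arrives as UTF-8 (the "å³è¡¨" output looks like UTF-8 bytes read as Latin-1), and that the target is CP932, which mbstring calls SJIS-win. Plain Shift_JIS as defined by JIS X 0208 has no ①; CP932 adds it as a vendor extension, which may be why the character vanishes with "SJIS". The function name is hypothetical.

```php
<?php
mb_internal_encoding('UTF-8');

// Sample name from the question, written here as a UTF-8 literal.
$name = '0423図表①.pptx';

// Character offset of the circled digit (not a byte offset).
$pos = mb_strpos($name, '①');

// Hypothetical helper: convert a UTF-8 file name to CP932 with an explicit
// source encoding instead of "AUTO".
function toWindowsJapaneseName(string $utf8Name): string
{
    return mb_convert_encoding($utf8Name, 'SJIS-win', 'UTF-8');
}

$sjisName = toWindowsJapaneseName($name);
```

Passing the source encoding explicitly avoids relying on detection, which is a common source of surprises with short strings such as file names.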

Convert the Chinese Characters From ISO-8859-1 To UTF-8

I have a system whose HTML encoding was previously set to ISO-8859-1, which caused all the Chinese characters to be stored in the format "&#36830;&#34915;&#35033;".
So my question is: how can I convert the format above back into Chinese characters in UTF-8?
For your information, I tried utf8_decode and iconv, but neither of them worked. :(
Thank you very much.
The current text encoding of that string is largely irrelevant. What you have there are HTML entities; they have little to do with the underlying "physical" encoding like ISO-8859 or UTF-8. What you want is to decode those HTML entities into a byte representation of the characters in a specific encoding, in this case UTF-8. Therefore:
echo html_entity_decode('&#36830;&#34915;&#35033;', ENT_COMPAT, 'UTF-8');
// 连衣裙
You need to use:
utf8_encode($data);
and not decode, to convert your current ISO-8859-1 to UTF-8.
Some native PHP functions such as strtolower(), strtoupper() and ucfirst() do not always function correctly with UTF-8 strings. Possible solutions: convert to Latin first or add the following line to your code:
setlocale(LC_CTYPE, 'C');
Make sure not to save your PHP files with a UTF-8 BOM (byte order mark); otherwise your browser might show the BOM characters between PHP pages on your site.
Just for your reference:
ISO-8859-1 => Albanian, Brazilian, Catalan, Danish, Dutch, English, Finnish, French, German, Portuguese, Norwegian, Spanish, Swedish
UTF-8 => Chinese (simplified), Chinese (traditional), Japanese, Persian
There are many tools that can convert character references to characters, and writing such a tool is rather straightforward, especially if you know the references are all decimal. So the answer really depends on the software environment.
For example, to do such a conversion for an individual HTML document, you could use the BabelPad editor: command Convert → Numeric Character References (NCR) → NCR to Unicode, and save the result as UTF-8.
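If the references are known to be decimal, such a tool fits in a few lines of PHP. The following is a sketch under that assumption; decodeDecimalNcr() is a made-up name, and named or hexadecimal references are deliberately left untouched.

```php
<?php
// Hypothetical helper: replace decimal numeric character references
// such as &#36830; with the corresponding UTF-8 characters.
function decodeDecimalNcr(string $text): string
{
    $result = preg_replace_callback(
        '/&#(\d+);/',
        static function (array $m): string {
            $char = mb_chr((int) $m[1], 'UTF-8');
            // Leave the reference as-is if the code point is not valid.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo decodeDecimalNcr('&#36830;&#34915;&#35033;'), "\n";
```

For stored data that may also contain named entities, html_entity_decode() with an explicit 'UTF-8' target covers more cases.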

Support for norwegian characters in FPDF

I'm using FPDF to create some PDFs from an HTML form.
Everything is working fine, except that the Norwegian characters ÆØÅ don't work. They simply don't show. Because I am making this for Norwegians, those characters are very important to make it useful.
How can I add support for ÆØÅ?
Please try iconv:
Standard FPDF fonts use ISO-8859-1 or Windows-1252. You can try iconv to change character encoding.
Example:
$str = iconv('UTF-8', 'windows-1252', $str);
And if you can change your PDF generation code, then please look at mPDF, which supports UTF-8 and multilingual text.
Hope this helps!
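A small wrapper keeps the conversion in one place so every string passed to FPDF goes through it. This is only a sketch: toFpdfText() is a hypothetical name, and it assumes the page content is UTF-8 and that one of the standard FPDF fonts is in use.

```php
<?php
// Hypothetical helper: convert UTF-8 text to Windows-1252 for the standard FPDF fonts.
function toFpdfText(string $utf8): string
{
    $converted = iconv('UTF-8', 'windows-1252', $utf8);
    if ($converted === false) {
        // The text contains characters that Windows-1252 cannot represent.
        throw new RuntimeException('Text is not representable in Windows-1252');
    }
    return $converted;
}

// Usage with an existing FPDF instance (assumed to be in $pdf):
// $pdf->Cell(0, 10, toFpdfText('ÆØÅ'));
```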

PHP writeexcel and UTF-8 support

Has anyone of you ever used php_writeexcel (http://www.bettina-attack.de/jonny/view.php/projects/php_writeexcel/)?
I would like to know if there is an easy way to enable utf-8 support. php_writeexcel exports html to Microsoft Excel documents, yet it can't display certain characters:
http://pastebin.com/AgVpph7F
Perhaps I could solve this with some php functions?
Thanks for your help!
For fields with special characters (e.g. French), I use utf8_decode() to get the special characters to show up correctly.
Php_writeexcel is a port of the Perl module Spreadsheet::WriteExcel. However, the port is from a time when Unicode strings weren't supported in the underlying Excel file format.
Later (2.xx) versions of Spreadsheet::WriteExcel have native support for Unicode but they haven't been ported to PHP.
As such you won't be able to handle Unicode strings with php_writeexcel.
It isn't a perfect solution, but iconv will convert some of those characters.
http://www.php.net/manual/en/function.iconv.php
Depending how you want the unsupported characters to be handled:
iconv('UTF-8', 'ISO-8859-1//IGNORE','ėčščįęščūųüó');
output: üó
iconv('UTF-8', 'ISO-8859-1//TRANSLIT','ėčščįęščūųüó');
output: ??????????üó
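Before choosing between //IGNORE and //TRANSLIT, it can help to know which characters will be affected. ISO-8859-1 covers exactly the code points U+0000 to U+00FF, so a regular expression is enough to list everything outside that range. The function name below is made up for this sketch.

```php
<?php
// Hypothetical helper: return the characters of a UTF-8 string that
// ISO-8859-1 cannot represent (anything above U+00FF), in order of appearance.
function charsOutsideLatin1(string $utf8): array
{
    $found = preg_match_all('/[^\x{00}-\x{FF}]/u', $utf8, $matches);
    if ($found === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $matches[0];
}

// Lists the ten characters that the iconv calls above drop or replace.
print_r(charsOutsideLatin1('ėčščįęščūųüó'));
```

An empty result means the string can be converted to ISO-8859-1 without loss.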
