I work on a website that has different language interfaces, so far I use english and german.
when the german text is loaded, it shows weird characters like the following screenshot
though I use
header('Content-type: text/html; charset=utf-8');
and also in the html header
<META http-equiv="content-type" content="text/html; charset=utf-8">
what else can I do to solve it ?
Thanks
The content of the page needs to also be in UTF-8. Your content was probably made using MS Word, which uses Windows 1251 encoding. You need to re-save your document as UTF-8.
UTF-8 does not convert formats for you.
If those strings are saved in a file, the file has to be encoded in UTF-8 too.
If you're getting them from a database, they'll have to be stored as UTF-8 and you'll have to set the connection charset to utf-8.
You could also check whether your text is UTF-8 and if not, convert it with utf8_encode.
Related
I have these Chinese characters:
汉字/漢字''test
If I do
echo utf8_encode($chinesevar);
it displays
??/??''test
Or even if I just do a simple
echo $chinesevar
it still displays some weird characters...
So how am I going to display these Chinese characters without using the <meta> tag with the UTF-8 thingy .. or the ini_set UTF-8 thing or even the header() thing with UTF-8?
Simple:
save your source code in UTF-8
output an HTTP header to specify to your browser that it should interpret the page using UTF-8:
header('Content-Type: text/html; charset=utf-8');
Done.
utf8_encode is for converting Latin-1 encoded strings to UTF-8. You don't need it.
For more details, see Handling Unicode Front To Back In A Web App.
Look that your file is in UTF8 without BOM and that your webserver deliver your site in UTF-8
HTML:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
in PHP:
header('Content-Type: text/html; charset=utf-8');
And if you work with a database look that your database is in UTF-8 if you read the text from your database.
$chinesevarOK = mb_convert_encoding($chinesevar, 'HTML-ENTITIES', 'UTF-8');
Perhaps take a look at the following solutions:
Your database, table and field COLLATE should be utf8_unicode_ci
Check if your records are showing the correct characters within the database...
Set your html to utf8
Add the following line to your php after connecting to the database
mysqli_set_charset($con,"utf8");
http://www.w3schools.com/php/func_mysqli_set_charset.asp
save your source code in UTF-8 No BOM
I have these Chinese characters:
汉字/漢字''test
If I do
echo utf8_encode($chinesevar);
it displays
??/??''test
Or even if I just do a simple
echo $chinesevar
it still displays some weird characters...
So how am I going to display these Chinese characters without using the <meta> tag with the UTF-8 thingy .. or the ini_set UTF-8 thing or even the header() thing with UTF-8?
Simple:
save your source code in UTF-8
output an HTTP header to specify to your browser that it should interpret the page using UTF-8:
header('Content-Type: text/html; charset=utf-8');
Done.
utf8_encode is for converting Latin-1 encoded strings to UTF-8. You don't need it.
For more details, see Handling Unicode Front To Back In A Web App.
Look that your file is in UTF8 without BOM and that your webserver deliver your site in UTF-8
HTML:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
in PHP:
header('Content-Type: text/html; charset=utf-8');
And if you work with a database look that your database is in UTF-8 if you read the text from your database.
$chinesevarOK = mb_convert_encoding($chinesevar, 'HTML-ENTITIES', 'UTF-8');
Perhaps take a look at the following solutions:
Your database, table and field COLLATE should be utf8_unicode_ci
Check if your records are showing the correct characters within the database...
Set your html to utf8
Add the following line to your php after connecting to the database
mysqli_set_charset($con,"utf8");
http://www.w3schools.com/php/func_mysqli_set_charset.asp
save your source code in UTF-8 No BOM
Quick Background: I inherited a large sql dump file containing a combination of english and arabic text and (I think) it was originally exported using 'latin1'. I changed all occurrences of 'latin1' to 'utf8' prior to importing the file. The the arabic text didn't appear correctly in phpmyadmin (which I guess is normal), but when I loaded the text to a web page with the following...
<meta http-equiv='Content-Type' content='text/html; charset=windows-1256'/>
...everything looked good and the arabic text displayed perfectly.
Problem: My client is really really really picky and doesn't want to change his...
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
...to the 'Windows-1256' equivalent. I didn't think this would be a problem, but when I changed the charset value to 'UTF-8', all of the arabic characters appeared as diamonds with question marks. Shouldn't UTF-8 display arabic text correctly?
Here are a few notes about my database configuration:
Database charset is 'utf8'
Database connection collation is 'utf8_general_ci'
All databases, tables, and applicable fields have been collated as 'utf8_general_ci'
I've been scouring stack overflow and other forums for anything the relates to my issue. I've found similar problems, but not of the solutions seem to work for my specific situation. Hope someone can help!
If the document looks right when declared as windows-1256 encoded, then it most probably is windows-1256 encoded. So it was apparently not exported using latin1—which would have been impossible, since latin1 has no Arabic letters.
If this is just about a single file, then the simplest way is to convert it from windows-1256 encoding to utf-8 encoding, using e.g. Notepad++. (Open the file in it, change the encoding, via File format menu, to Arabic, windows-1256. Then select Convert to UTF-8 in the File format menu and do File → Save.)
Windows-1256 and UTF-8 are completely different encodings, so data gets all messed up if you declare windows-1256 data as UTF-8 or vice versa. Only ASCII characters, such as English letters, have the same representation in both encodings.
We can't find the error in your code if you don't show us your code, so we're very limited in how we can help you.
You told the browser to interpret the document as being UTF-8 rather than Windows-1256, but did you actually change the encoding used from Windows-1256 to UTF-8?
For example,
$ cat a.pl
use strict;
use warnings;
use feature qw( say );
use charnames ':full';
my $enc = $ARGV[0] or die;
binmode STDOUT, ":encoding($enc)";
print <<"__EOI__";
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=$enc">
<title>Foo!</title>
</head>
<body dir="rtl">
\N{ARABIC LETTER ALEF}\N{ARABIC LETTER LAM}\N{ARABIC LETTER AIN}\N{ARABIC LETTER REH}\N{ARABIC LETTER BEH}\N{ARABIC LETTER YEH}\N{ARABIC LETTER TEH MARBUTA}
</body>
</html>
__EOI__
$ perl a.pl UTF-8 > utf8.html
$ perl a.pl Windows-1256 > cp1256.html
I think you need to go back to square one. It sounds like you have a database dump in Win-1256 encoding and you want to work with it in UTF-8 from now on. It also sounds like you are using PHP but you have lots of irrelevant tags on your question and are missing the most important one, PHP.
First, you need to convert the text dump into UTF-8 and you should be able to do that with PHP. Chances are that your conversion script will have two steps, first read the Win-1256 bytes and decode them into internal Unicode text strings, then encode the Unicode text strings into UTF-8 bytes for output to a new text file.
Once you have done that, redo the database import as you did before, but now you have correctly encoded the input data as UTF-8.
After that it should be as simple as reading the database and rendering a web page with the correct UTF-8 encoding.
P.S. It is actually possible to reencode the data every time you display it, but that does not solve the problem of having a database full of incorrectly encoded data.
inorder to display arabic characters correctly , you need to convert your php file to utf-8 without Bom
this happened with me, arabic characters was displayed diamonds, but conversion to utf-8 without bom will solve this problem
I seems that the db is configured as UTF8, but the data itself is extended ascii. If the data is converted to UTF8, it will display correctly in content type set to UTF8
I never had this problem before, it was usually my database or the html page. But now i think its my php. I import text from a csv or from a text area and in both ways it goes wrong.
for example é changes to é. I used htmlentities to fix this but it didn't work. The htmlentities function didn't return é in html but é in html entities, so it already loses the real characters before htmlentities comes in to place... So does that mean my php file has the wrong encoding or something?
I hope someone can help me out..
Thanks!
Chris
A file is usually ISO-8859-1 (Latin) or UTF-8 ... ISO-8859-1 is 1 byte per char, UTF-8 is 1-4 bytes per char. So if you get 2 chars when you expect one, then you are reading UTF-8 and showing it as ISO-8859-1 ... if you get strange chars, then you are reading ISO-8859-1 and showing it as UTF-8.
If you provide more details, it would be easier to pinpoint, but in short, you have inconsistent charsets and need to convert one or the other so they're all the same. But from what it seems, you're using ISO-8859-1 in your project, but you are reading some UTF-8 from somewhere... use utf8_decode($text) if that data should be indeed be stored as UTF-8, or find the data and convert it manually.
EDIT: If you are using AJAX somewhere, then you will ALWAYS get UTF-8 from it, and you'll have to decode it yourself with utf8_decode() if you want to keep using ISO-8859-1.
Try opening your php file and change the encoding to UTF-8
if that doesn't help, add this to your php:
header('Content-Type: text/html; charset=utf-8');
Or this to your html:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Take a look at PHP's iconv().
I'm using Kohana 3, which has full support for Unicode.
I have this as the first child of my <head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The Unicode character I am inserting into is é as in Café.
However, I am getting the triangle with a ? (as in could not decode character).
As far as I can tell in my own code, I am not doing any string manipulation on the text.
In fact, I have placed the accent straight into a view's PHP file and it is still not working.
I copied the character from this page: http://www.fileformat.info/info/unicode/char/00e9/index.htm
I've only just started examining PHP's Unicode limitations, so I could be doing something horribly wrong.
So, how do I display this character? Do I need to resort to the HTML entity?
Update
So this works
Caf<?php echo html_entity_decode('é', ENT_NOQUOTES, 'UTF-8'); ?>
Why does that work? If I copy the output accented e from that script and insert it into my document, it doesn't work.
View the http headers. You should see something like
Content-Type: text/html; charset=UTF-8
Browsers don't pay much attention to meta tags, if there was a real http header stating a different encoding.
update
Whatcha get from this?
echo bin2hex('é');
echo chr(0xc3) . chr(0xa9);
You should get c3a9é, otherwise I'd say file encoding issue.
I guess, you see �, the replacement character for invalid UTF-8 byte sequences. Your text is not UTF-8 encoded. Check your editor’s settings to control the encoding of the PHP file.
If you’re not sure about the encoding of your sources, you can enforce UTF-8 compatibilty as described here (German text): Force UTF-8.
You should never need entities except the basic ones.