Unicode Character Display Problem - php

I am trying to show नेपाल on my page, but it shows नेपाल. What is causing the Unicode to render like this?

It's caused by something (likely the web browser) interpreting the characters as something other than Unicode. Browsers are quite bad at guessing the proper encoding, so it must be explicitly declared. Perhaps you should have something like this in the head section:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
It's also possible that the font being used does not cover those characters.

Write this at the top of the script:
<?php header('Content-Type: text/html; charset=UTF-8'); ?>

If the data comes from a database, then this should help:
$mysqli->query('set character set utf8');
Put it inside your DB connection code :-)

Reading from unicode.org:
If you are unable to read some Unicode characters in your browser, it may be because your system is not properly configured. Here are some basic instructions for doing that. There are two basic steps:
Install fonts that cover the characters you need.
Configure your browser to use them.

Related

How can I solve my PHP web page's language encoding problem?

I have a language encoding problem in PHP: my PHP file should display both English and Arabic characters.
Some parts of the page are static and others are dynamic (the data comes from a Sybase database). The database's encoding is fine, as the data is displayed correctly in it.
My page has some dynamic drop-down lists, but they display the data in a strange format that is neither English nor Arabic, like squares and unknown symbols.
I checked the possible causes and tried many solutions, such as:
Changing the encoding of the PHP script:
Saving the file (name: WebPage1, type: PHP) with the encoding ANSI, UTF-8, or Unicode.
Changing the HTML encoding declaration:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=windows-1256" />
Changing the PHP encoding declaration:
header('Content-Type: text/html; charset=UTF-8');
header('Content-Type: text/html; charset=windows-1256');
Changing the database tables' font and language:
Arial (Arabic).
The problem still exists and I do not know what I can do to solve that.
Can you suggest any solution?
Check your database connection: make sure sybase_connect() connects with UTF-8 as the charset.
See http://php.net/manual/en/function.sybase-connect.php
Given your comment that you are using ODBC to connect: there seems to be an issue with PHP/ODBC and UTF-8. Some suggestions are mentioned in this thread: Php/ODBC encoding problem
Always use UTF-8.
Your first header is correct, except that you should use a single = instead of ==. Make sure you call the header() function before sending any output to the browser.
Open your files in a Unicode-capable editor such as EditPlus or Notepad++, and when saving each source code or HTML file, use Save As and choose UTF-8 in the Save As dialog. If you use Eclipse, import your project into Eclipse, right-click it, go to the project settings, and set the charset to UTF-8 for all source code.
If something is wrong with data coming from a MySQL database, use an appropriate collation on every text column (VARCHAR, TEXT, etc.). Those are the usual suggestions. If you use Sybase, look up its collation settings.
And don't change your font to Arabic; Arial already supports it.
You seem to be confusing something. Neither UTF-8 nor windows-1256 describe languages; they denote character sets/encodings. Although the character sets may contain characters that are typically used in certain languages, their use doesn’t say anything about the language.
Now, as all the characters of Windows-1256 are contained in Unicode’s character set and can therefore be encoded with UTF-8, you should choose UTF-8 for both languages.
And if you want to declare the language for your contents, read the W3C’s tutorial on Declaring Language in XHTML and HTML.
In your case you could declare your primary document language as en (English) and parts of your document as ar (Arabic):
header('Content-Language: en');
header('Content-Type: text/html;charset=UTF-8');
echo '<p>The following is in Arabic: <span lang="ar">العربية</span></p>';
Make sure to use UTF-8 for both.
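To make the point about Windows-1256 concrete, here is a small sketch that stores the same Arabic word in both encodings and converts it back. It assumes the iconv extension is available and that its converter knows the name `CP1256` (glibc and GNU libiconv both do). The bytes differ, but the round trip through UTF-8 loses nothing.

```php
<?php
// Sketch: the same Arabic word in Windows-1256 and in UTF-8.
// Assumes the iconv extension and a converter that knows "CP1256".
$utf8 = 'العربية'; // this source file is saved as UTF-8

// Convert to the legacy single-byte Windows-1256 encoding...
$cp1256 = iconv('UTF-8', 'CP1256', $utf8);

// ...and back again. Every Windows-1256 character exists in Unicode,
// so nothing is lost on the round trip.
$roundTrip = iconv('CP1256', 'UTF-8', $cp1256);

echo strlen($cp1256), " bytes in Windows-1256\n"; // one byte per letter
echo strlen($utf8), " bytes in UTF-8\n";          // two bytes per Arabic letter
echo $roundTrip === $utf8 ? "round trip OK\n" : "round trip failed\n";
```

Since the English text is plain ASCII, which is encoded identically in both, UTF-8 covers both languages in a single encoding.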

weird characters

So I'm parsing a web page with a parser that I created, and when I parse the page and echo the content, I get characters like these: † Why is it doing that? It's supposed to be ... or some other character like -- instead.
The weird characters are caused by encoding problems. Your best bet is to convert them to UTF-8 (make sure your page is also in UTF-8) before you echo them.
You can use the function utf8_encode() for that, provided the source text is ISO-8859-1, which is the only encoding it converts from.
Here is a very complete answer on how to do that successfully: Detect encoding and make everything UTF-8
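Symptoms like these typically come from UTF-8 bytes being decoded as a single-byte Windows encoding. The sketch below reproduces that for "…", assuming the mbstring extension and assuming the page was UTF-8 while something read it as Windows-1252 (the exact garbage in the question may differ). It also shows that reversing the wrong conversion repairs the text.

```php
<?php
// Sketch: how "…" turns into garbage when UTF-8 bytes are read as
// Windows-1252. Requires the mbstring extension.
$ellipsis = "\u{2026}"; // "…", stored as the three UTF-8 bytes E2 80 A6

// Misread those bytes as Windows-1252 and re-encode the result as UTF-8:
// 0xE2 -> "â", 0x80 -> "€", 0xA6 -> "¦"
$mojibake = mb_convert_encoding($ellipsis, 'UTF-8', 'Windows-1252');
echo $mojibake, "\n"; // â€¦

// Undoing the mistaken conversion recovers the original bytes.
$repaired = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');
echo $repaired === $ellipsis ? "repaired\n" : "still broken\n";
```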
Usually those types of characters come from bad character encoding. Off the top of my head, your best solution is to check the web page you are parsing for the meta tag that declares its character encoding. Something like this:
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
And making sure you supply the same character encoding on your end.
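A minimal sketch of that advice for a parser: read the charset the fetched page declares in its `<meta>` tag, then convert the page to UTF-8 before echoing it. `detectMetaCharset()` and `toUtf8()` are hypothetical helper names, the regex is a rough heuristic rather than a real HTML parser, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: the charset a page declares in a <meta> tag, or null.
// Handles <meta charset="..."> and content="text/html; charset=...".
function detectMetaCharset(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset=["\']?([A-Za-z0-9_:.-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: convert the page to UTF-8, assuming UTF-8 if
// no charset is declared.
function toUtf8(string $html): string
{
    $charset = detectMetaCharset($html) ?? 'UTF-8';
    return mb_convert_encoding($html, 'UTF-8', $charset);
}

// A page declared as ISO-8859-1 containing "é" as the single byte 0xE9.
$page = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
      . "Caf\xE9";
echo toUtf8($page), "\n";
```

If you echo the whole converted page, its `<meta>` tag still names the old charset, so you would want to replace that declaration too. An unrecognised charset name makes `mb_convert_encoding()` fail (a `ValueError` in PHP 8), so real input may need a fallback.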
I got this solved with iconv("UTF-8", "ISO-8859-1", $string); it does the job. Thanks, guys.

Browser displays � instead of ´

I have a PHP file which has the following text:
<div class="small_italic">This is what you´ll use</div>
On one server, it appears as:
This is what you´ll use
And on another, as:
This is what you�ll use
Why would there be a difference and what can I do to make it appear properly (as an apostrophe)?
Note to all (for future reference)
I implemented Gordon's / Gumbo's suggestion, except I implemented it at the server level rather than the application level. Note that (a) I had to restart the Apache server and, more importantly, (b) I had to replace the existing "bad data" with the corrected data in the right encoding.
/etc/php.ini
default_charset = "iso-8859-1"
You have to make sure the content is served with the proper character set:
Either send the content with a header that includes
<?php header("Content-Type: text/html; charset=[your charset]"); ?>
or, if the HTTP charset headers don't exist, insert a <META> element into the <head>:
<meta http-equiv="Content-Type" content="text/html; charset=[your charset]" />
As the attribute name suggests, http-equiv is the equivalent of an HTTP response header, and user agents should use it when the corresponding HTTP header is not set.
As Hannes already suggested in the comments on the question, you can look at the headers returned by your web server to see which encoding it serves. There is likely a discrepancy between the two servers, so change the [your charset] part above to that of the "working" server.
For a more elaborate explanation about the why, see Gumbo's answer.
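To compare the two servers, something like the following sketch can pull the charset out of each server's `Content-Type` header. `charsetFromContentType()` is a hypothetical helper. `get_headers()` is PHP's built-in function; it is shown commented out because it needs network access, and the URL is a placeholder.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type header value,
// e.g. "text/html; charset=UTF-8" -> "UTF-8". Returns null if none is given.
function charsetFromContentType(string $contentType): ?string
{
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Run against each server (network access required; placeholder URL).
// The header name's case depends on the server, and redirects can
// turn the value into an array:
// $headers = get_headers('http://example.com/page.php', true);
// echo charsetFromContentType($headers['Content-Type'] ?? '');

echo charsetFromContentType('text/html; charset=UTF-8'), "\n"; // UTF-8
```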
The display of the REPLACEMENT CHARACTER � (U+FFFD) most likely means that you’re specifying your output to be Unicode but your data isn’t.
In this case, if the ACUTE ACCENT ´ is for example encoded using ISO 8859-1, it’s encoded with the byte sequence 0xB4 as that’s the code point of that character in ISO 8859-1. But that byte sequence is illegal in a Unicode encoding like UTF-8. In that case the replacement character U+FFFD is shown.
So to fix this, make sure that you’re specifying the character encoding properly according to your actual one (or vice versa).
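The byte-level explanation above can be checked directly. This sketch, assuming the mbstring extension, shows that the single ISO 8859-1 byte 0xB4 is rejected as UTF-8, and that converting it yields the two-byte UTF-8 sequence C2 B4.

```php
<?php
// The acute accent (U+00B4) as a single ISO 8859-1 byte is not valid UTF-8,
// which is why a page declared as UTF-8 shows U+FFFD in its place.
$latin1 = "you\xB4ll";                          // bytes as a Latin-1 editor saves them
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// Fix option 1: convert the data so it matches a UTF-8 declaration.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), "\n"; // 796f75c2b46c6c: 0xB4 became C2 B4

// Fix option 2 (instead): keep the bytes and declare their real encoding,
// e.g. header('Content-Type: text/html; charset=ISO-8859-1');
```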
To sum it up:
Make sure the FILE saved on the web server has the right encoding
Make sure the web server also delivers it with the right encoding
Make sure the HTML meta tag is set to the right encoding
Make sure to use "standard" special characters, i.e. use ' instead of ´ if you want to write something like "Luke Skywalker's code"
For encoding, UTF-8 might be good for you.
The simple solution is to write special characters as HTML character references, which consist only of ASCII characters.
The ASCII apostrophe ' is &#39;, and the typographic apostrophe ’ is &#8217;. Try putting one of these references in your HTML, and it should display properly.
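One way to apply this automatically is `mb_encode_numericentity()`, which rewrites every non-ASCII character as a numeric reference. This sketch assumes your strings are UTF-8 and that mbstring is available; `toAsciiEntities()` is a hypothetical name. Note that it works around a charset mismatch, whereas the other answers fix the mismatch itself.

```php
<?php
// Hypothetical helper: turn every non-ASCII character into a numeric
// character reference, so the output is pure ASCII and displays the same
// under any ASCII-compatible charset declaration. Input must be UTF-8.
function toAsciiEntities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask]
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo toAsciiEntities("you\u{00B4}ll"), "\n"; // you&#180;ll
echo toAsciiEntities("you\u{2019}ll"), "\n"; // you&#8217;ll
```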
Declare the page's character set explicitly, so the browser doesn't have to guess:
For example,
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
This is probably caused by the data you're inserting into the page with PHP being in a different character encoding from the page itself (the most common combination is one being Latin-1 and the other UTF-8).
Check the encoding being used for the page, and for your database. Chances are there will be a mismatch.
Create an .htaccess file in the root directory:
AddDefaultCharset utf-8
AddCharset utf-8 *
<IfModule mod_charset.c>
CharsetSourceEnc utf-8
CharsetDefault utf-8
</IfModule>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Unicode and PHP - am I doing something wrong?

I'm using Kohana 3, which has full support for Unicode.
I have this as the first child of my <head>:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The Unicode character I am inserting is é, as in Café.
However, I am getting the triangle with a ? (as in a character that could not be decoded).
As far as I can tell in my own code, I am not doing any string manipulation on the text.
In fact, I have placed the accent straight into a view's PHP file and it is still not working.
I copied the character from this page: http://www.fileformat.info/info/unicode/char/00e9/index.htm
I've only just started examining PHP's Unicode limitations, so I could be doing something horribly wrong.
So, how do I display this character? Do I need to resort to the HTML entity?
Update
So this works
Caf<?php echo html_entity_decode('&eacute;', ENT_NOQUOTES, 'UTF-8'); ?>
Why does that work? If I copy the accented e output by that script and insert it into my document, it doesn't work.
View the HTTP headers. You should see something like
Content-Type: text/html; charset=UTF-8
Browsers ignore the meta tag if a real HTTP header states a different encoding.
Update
What do you get from this?
echo bin2hex('é');
echo chr(0xc3) . chr(0xa9);
You should get c3a9é; otherwise I'd say it's a file encoding issue.
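The same test can be wrapped in a small helper. This is a sketch with a hypothetical `describeBytes()` function, assuming mbstring for `mb_check_encoding()`. UTF-8 stores "é" as C3 A9, whereas ISO 8859-1 and Windows-1252 store it as the single byte E9, which is not valid UTF-8 on its own.

```php
<?php
// Hypothetical helper: show a string's bytes and whether they form valid UTF-8.
function describeBytes(string $s): string
{
    $hex = bin2hex($s);
    // mb_check_encoding() rejects malformed UTF-8 byte sequences.
    return mb_check_encoding($s, 'UTF-8') ? "$hex (valid UTF-8)" : "$hex (not UTF-8)";
}

echo describeBytes("\xC3\xA9"), "\n"; // c3a9 (valid UTF-8)
echo describeBytes("\xE9"), "\n";     // e9 (not UTF-8)
// In your view, pass the literal exactly as typed in the file: describeBytes('é')
```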
I guess you see �, the replacement character for invalid UTF-8 byte sequences. Your text is not UTF-8 encoded. Check your editor’s settings to control the encoding of the PHP file.
If you’re not sure about the encoding of your sources, you can enforce UTF-8 compatibility as described here (German text): Force UTF-8.
You should never need entities except the basic ones.
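To illustrate "only the basic ones": with UTF-8 used end to end, `htmlspecialchars()` escapes just the HTML-special characters and leaves "é" alone, while `htmlentities()` would also turn it into a named entity. A short sketch:

```php
<?php
$text = "Caf\u{00E9} & <b>bar</b>";

// htmlspecialchars() escapes only & " ' < > and leaves "é" as UTF-8 text.
$basic = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
echo $basic, "\n"; // Café &amp; &lt;b&gt;bar&lt;/b&gt;

// htmlentities() also turns "é" into &eacute;, which is unnecessary once
// the file and the Content-Type header agree on UTF-8.
$all = htmlentities($text, ENT_QUOTES, 'UTF-8');
echo $all, "\n"; // Caf&eacute; &amp; &lt;b&gt;bar&lt;/b&gt;
```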

PHP Japanese strings getting set to question marks?

I have a PHP file with one simple echo function:
echo 'アクセスは撥ねりません。';
but when I access that page I get this:
????????????
Can someone help me?
I also have my page encoding set to UTF-8, and I know it is, because all of the browsers I used said so.
I also do this before the echo function:
mb_internal_encoding('UTF-8');
What does this do?
Does it help me?
All I need is to be able to echo a static Japanese string.
Thanks!
There are a few places where this could go wrong.
Firstly, if you aren't setting the output encoding in PHP with header():
header('Content-type: text/html; charset=utf-8');
or in your html with a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
you will need to check the php.ini setting default_charset. Chances are it is set to iso-8859-1. (Since PHP 5.6, the built-in default is UTF-8.)
Secondly, you may also need to check which encoding you are saving the PHP script in. If you are saving it as ASCII or some other Latin charset, it will munge the characters.
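For the php.ini check, a sketch like this reads the effective `default_charset` and overrides it for the current script. It assumes the setting may be changed at runtime (the directive can be set anywhere, including with `ini_set()`); setting it in php.ini or the server configuration is the more permanent fix.

```php
<?php
// Sketch: inspect default_charset and force UTF-8 for this script.
// PHP uses this value for the charset of the Content-Type header it sends
// when the script does not send its own, so set it before any output.
$current = (string) ini_get('default_charset');
echo "default_charset was: ", ($current === '' ? '(empty)' : $current), "\n";

if (strcasecmp($current, 'UTF-8') !== 0) {
    ini_set('default_charset', 'UTF-8');
}
```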
I got it.
I just had to set the mbstring extension's settings to handle internal strings in UTF-8. That extension is standard in my build of PHP 5.3.0.
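On the question of what `mb_internal_encoding('UTF-8')` does: it sets the encoding the `mb_*` functions assume when none is passed, so they count and cut characters rather than bytes. On its own it does not re-encode what `echo` sends, so the fix described above may have involved other mbstring settings as well. A sketch, assuming mbstring:

```php
<?php
// mb_internal_encoding() sets the default encoding that mb_* functions
// assume for their string arguments; it does not change echo output.
mb_internal_encoding('UTF-8');

$s = 'アクセスは撥ねりません。';
echo strlen($s), "\n";          // 36: bytes (3 per character in UTF-8)
echo mb_strlen($s), "\n";       // 12: characters, counted as UTF-8
echo mb_substr($s, 0, 4), "\n"; // アクセス
```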
Maybe you are printing Japanese characters contained in UTF-16 (extended set of chars)?
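To see why UTF-16 data would break on a page served as UTF-8, this sketch (assuming mbstring) encodes the katakana ア (U+30A2) both ways; the UTF-16 bytes are not valid UTF-8.

```php
<?php
// The katakana "ア" (U+30A2) in two encodings.
$utf8 = "\u{30A2}"; // PHP's \u{} escape produces UTF-8 bytes
$utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

echo bin2hex($utf8), "\n";  // e382a2
echo bin2hex($utf16), "\n"; // 30a2
var_dump(mb_check_encoding($utf16, 'UTF-8')); // bool(false)
```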
I just did a quick test and your example works for me, so it's most likely one of these:
Your file is not saved in UTF-8, but in some other encoding, such as Shift-JIS. A decent editor should let you see which encoding it used.
Your server is sending bad HTTP headers. Can you use some tool to check the headers and paste the results? Or the results you got from the browser?
The browser is using an incompatible font.
I saved a file in UTF-8, pasted your code into it, and my server serves the file with Content-Type: text/html; charset=utf-8, and it shows up just fine. I did not need to use the mb_ functions or anything else.
