How can I know what encoding will be used by PHP when sending data to the browser? I.e. with the Content-Type header, for instance: iso-8859-1.
Usually Apache + PHP servers of webhosters are configured to send out NO charset header.
The quickest ways to test how your server is configured are these:
Use an HTTP header viewer to inspect the response headers for any page on your website.
If the Content-Type header includes a charset, your server is sending it; usually it won't contain one.
Another way is to run this simple script on your server: <?php echo ini_get('default_charset'); ?> As said above, this usually prints an empty string; otherwise it shows the charset PHP is configured to send.
The second approach assumes Apache is not configured with AddDefaultCharset some_charset. Apache usually isn't configured that way, but if it is, I'm afraid the Apache setting might override PHP's default_charset ini directive.
You can use the header() solution that William suggested; however, if you are running Apache and the Apache config sets a default charset, that will win every time (Internet Explorer will go crazy). See: AddDefaultCharset
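A slightly fuller version of that check, as a minimal sketch: describe_content_type() is a hypothetical helper name, not part of PHP. It prints the Content-Type value PHP would build from default_charset and then lists the headers PHP itself has queued. Note that PHP 5.6 and later ship with default_charset set to UTF-8, so the "usually empty" observation applies to older setups.

```php
<?php
// Hypothetical helper: builds the Content-Type value that corresponds to a
// given default_charset setting (an empty string means "no charset").
function describe_content_type(string $mimeType, string $charset): string
{
    return $charset === ''
        ? $mimeType
        : $mimeType . '; charset=' . $charset;
}

// ini_get() returns the configured value as a string.
$charset = (string) ini_get('default_charset');
echo describe_content_type('text/html', $charset), "\n";

// headers_list() only shows headers queued by PHP; anything the web server
// adds afterwards (e.g. via AddDefaultCharset) will not appear here.
print_r(headers_list());
```

To see what actually reaches the browser, including anything Apache adds, you still need to look at the response from the outside (browser developer tools or a header viewer).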
Keep in mind that content-types and encodings are two different things. text/html is a content-type; ISO-8859-1 and UTF-8 are encodings.
The HTTP response header that the server sends typically looks like this:
Content-Type: text/html; charset=utf-8
"charset" is actually the character encoding. It's not in a separate header; however there is a header called "Content-Encoding" which actually specifies what kind of compression the response uses (e.g. gzip).
If you want to change the character encoding to UTF-8, in a file that contains HTML:
<?php
// Must run before any output is sent to the client.
header("Content-Type: text/html; charset=utf-8");
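To make the distinction between the content type and the charset concrete, here is a small sketch that splits a Content-Type value into its two parts. parse_content_type() is a hypothetical helper written for illustration; it only handles the simple "type; charset=value" form shown above.

```php
<?php
// Hypothetical helper: splits a Content-Type value into media type and charset.
function parse_content_type(string $value): array
{
    $parts = array_map('trim', explode(';', $value));
    $result = ['type' => strtolower(array_shift($parts)), 'charset' => null];

    foreach ($parts as $param) {
        // Parameter names are case-insensitive, so compare without case.
        if (stripos($param, 'charset=') === 0) {
            $result['charset'] = trim(substr($param, 8), '"');
        }
    }
    return $result;
}

print_r(parse_content_type('text/html; charset=utf-8'));
```

A value without a charset parameter, such as plain text/html, comes back with charset set to null, which is the "no charset specified" case discussed in this thread.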
You can set your own with header('Content-type: xxx/yyy');, but I believe that text/html is sent by default.
AFAIK, PHP sends strings bytewise. That is, if your variables hold UTF-8, it will send UTF-8. If you have iso-8859-1, it will send that too. If you mix them, it won't be pretty.
If your server is not configured to have a default content or charset, and neither is PHP, PHP will send only Content-Type: text/html - it won't specify a charset at all, and will send the bytes as it sees them in the script.
If a browser receives a page without charset specified, various things can happen:
most browsers have an "Encoding/Charset" menu; if the user explicitly selects one, the browser will try to apply it. Doesn't happen too often, so:
some browsers try to render it with a default charset (which is locale-dependent, e.g. for FF and cs_CZ it used to be iso-8859-2; YMMV)
IE will try to determine the charset heuristically (it will take a guess, based on character distribution - and many times it gets it right; sometimes it gets it wrong and you get a page in Romanian interpreted as Chinese text, which usually means "unreadable")
some old browsers will fall back on us-ascii
If, with this procedure, the PHP script's charset and the browser's charset match, the text will (accidentally) be readable. If not, there will be weird signs and similar phenomena.
Related
I am creating a site with html and php.
When I run my PHP page in the browser using localhost (XAMPP server), some symbols (ï»¿) are displayed, but when I check my HTML/PHP code, no symbol or script like ¿ or » is found.
If I am wrong somewhere, please let me know.
That's a UTF-8 byte-order marker. You should configure your editor to save UTF-8 without BOM. It isn't mandatory for the UTF-8 encoding; in fact, its use is discouraged and it only causes problems.
Additionally, make sure your web server is sending an appropriate Content-Type HTTP header:
Content-Type: text/plain; charset=utf-8
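If you can't re-save every file right away, the BOM can also be stripped in code. This is only a sketch: strip_utf8_bom() is a hypothetical helper, and it assumes you have the offending content in a string (for example, a template read from disk), not a BOM sitting in front of the opening PHP tag of the script itself.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 byte-order mark, if present.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF"; // the three bytes that show up as "ï»¿" in ISO-8859-1
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

$template = "\xEF\xBB\xBF<html><body>Hello</body></html>";
echo strip_utf8_bom($template), "\n";
```

A BOM placed before the opening tag of the executing script is sent as output before any PHP code runs, so for that case fixing the editor setting remains the real solution.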
¿ and » are characters that have HTML entities, so they can look different in the PHP code and in the browser. You can look them up in any HTML entity reference. You possibly also have an issue with a BOM.
My best guess: you have an issue with encoding (UTF vs. ISO). Look up the encoding your editor uses when saving, and send it to the browser, e.g. with header("Content-type:text/html;charset=UTF-8")
Sounds like you're dealing with a character encoding problem.
Try to declare the encoding in your headers.
header("Content-Type: text/html; charset=UTF-8");
This needs to be called before any text is sent to the client.
Currently I'm using a php web service to retrieve the information from MySQL.
I'm dealing with multiple languages including chinese/japanese/french characters, I'm having issues displaying chinese/japanese and a few other languages.
<?php
echo "你好";
?>
For example, when I'm trying to echo simple chinese characters, what shows is "ä½ å¥½" instead, I'm not sure how to proceed.
Please advise.
Thank you
You should probably set the character encoding.
This is traditionally done by setting the HTTP Content-Type header. The default is usually:
Content-Type: text/html; charset=ISO-8859-1
You can change this via php by using the header() function.
header('Content-Type: text/html; charset=utf-8');
Some other resources for you:
http://www.w3.org/International/questions/qa-html-encoding-declarations
http://en.wikipedia.org/wiki/Character_encodings_in_HTML
1) Make sure the file is saved as UTF-8 (without BOM)
2) Tell the browser that it's UTF-8 (as hafichuk explained)
3) Make sure the browser is using a font that has Chinese/Japanese/etc characters (a ton of fonts do not have them -- if you've done 1 and 2, this could very well be the problem)
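For what it's worth, the "ä½ å¥½" in the question is exactly what you get when the UTF-8 bytes of 你好 are read as ISO-8859-1. The sketch below reproduces that; misread_as_latin1() is a hypothetical helper, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: shows what UTF-8 bytes look like when a browser
// wrongly treats every byte as one ISO-8859-1 character.
function misread_as_latin1(string $utf8Bytes): string
{
    // Interpret each byte as ISO-8859-1, then re-encode as UTF-8 for display.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

$hello = "\xE4\xBD\xA0\xE5\xA5\xBD"; // the UTF-8 bytes of 你好
echo misread_as_latin1($hello), "\n";
echo bin2hex(misread_as_latin1($hello)), "\n";
```

So the bytes leaving PHP are fine; the page is just being decoded with the wrong charset, which is why declaring UTF-8 in the header fixes it.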
I've got a program on which I have non-ASCII characters which do not show properly on ISO-8859-1. Is there a way to use PHP and change the browser encoding somehow, and also allow the characters to display properly in the browser even though the encoding is ISO-8859-1?
Much Appreciated.
Use the header function to send an (explicit) HTTP Content-Type response header.
header('Content-Type: text/html; charset=ISO-8859-1');
… replacing ISO-8859-1 with whatever encoding you are actually using. Hopefully that will be UTF-8.
you should use the header function
header( 'Content-Type: text/html; charset=ISO-8859-1');
Note: you should make sure no content has been sent to the browser, or you can't modify the headers anymore, so I advise you to run this code as early as possible in your script.
The browser itself doesn't have an encoding. It supports many encodings and uses the one you tell it to. If you specify (in headers and/or HTML) that the encoding is ISO-8859-1, then your document should be in that encoding and you should make sure that all characters you send are in the right encoding. So you should actually send ISO-8859-1 characters. You cannot send a document that uses different encodings for different sections of the document.
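One way to guard against the "too late" case is to check headers_sent() first. A minimal sketch, where send_charset_header() is a hypothetical wrapper and not a PHP built-in:

```php
<?php
// Hypothetical wrapper: sends the Content-Type header only if that is still
// possible, and reports where output started if it is not.
function send_charset_header(string $charset = 'UTF-8'): bool
{
    if (headers_sent($file, $line)) {
        error_log("Headers already sent (output started at $file:$line)");
        return false;
    }
    header('Content-Type: text/html; charset=' . $charset);
    return true;
}

send_charset_header('ISO-8859-1');
echo 'Hello';
```

The file and line reported by headers_sent() are usually the quickest way to find the stray whitespace or echo that started the output.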
For some characters, you may send an HTML entity instead. For instance, é can be sent as &eacute;. This will work regardless of encoding.
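PHP can do that conversion for you with htmlentities(). A short sketch, assuming your source string is UTF-8; to_entities() is just a hypothetical convenience wrapper:

```php
<?php
// Hypothetical wrapper: converts characters that have named HTML entities
// into those entities. The third argument tells PHP how the input is encoded.
function to_entities(string $utf8Text): string
{
    return htmlentities($utf8Text, ENT_QUOTES, 'UTF-8');
}

// "caf\xC3\xA9" is "café" written as explicit UTF-8 bytes, so the example
// does not depend on how this file was saved.
echo to_entities("caf\xC3\xA9"), "\n"; // prints caf&eacute;
```

The output is pure ASCII, so it displays the same whether the page is declared as ISO-8859-1 or UTF-8. Characters without a named entity are left untouched by htmlentities(), so this does not cover everything.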
If you have the choice, I'd opt to use UTF-8. It supports any character and you don't have to worry about escaping diacritics or other special characters, except those that are special to HTML/XML itself.
Like others have said, using the header function:
header('Content-type: text/html; charset=ISO-8859-1');
or, if you want to serve valid XHTML files instead of the standard HTML:
header('Content-type: application/xhtml+xml; charset=ISO-8859-1');
It is possible to call the header later on in the script, unlike what RageZ said, but you will need to have enabled output buffering for that, using ob_start().
I am not that good with encoding but I am even falling over with the basics here.
I am trying to create a file that is recognised as UTF-8
header("Content-Type: text/plain; charset=utf-8");
header("Content-disposition: attachment; filename=test.txt");
echo "test";
exit();
also tried
header("Content-Type: text/plain; charset=utf-8");
header("Content-disposition: attachment; filename=test.txt");
echo utf8_encode("test");
exit();
I then open the file with Notepad++ and it says its current encoding is ANSI, not UTF-8. What am I missing? How should I be outputting this file?
I will eventually be outputting an XML file of products for the Affiliate Window program.
Also if it helps My webserver is Centos, Apache2, PHP 5.2.8.
Thanks in advance for any help!
As Filip said, encoding is not an intrinsic attribute of a file; it's implicit. This means that unless you know what encoding a file is to be interpreted in, there is no way to determine it. The best you can do is make a guess. This is presumably what programs such as Notepad++ do. Since the actual data that you have sent can be interpreted in many different encodings, it just picks the candidate that it likes best. For Notepad++ this appears to be ANSI (which in itself is a rather inaccurate classification), while other programs might default to something else.
The reason why you have to specify the charset in a HTTP-header is exactly because the file itself doesn't contain this information, so the browser needs to be informed about it. Once you have saved the file to disk, this information is thus unavailable.
If the file you're going to serve is an XML-document, you have the option of putting the encoding information inside the actual document. That way it is preserved after the file is saved to disk. Eg. if you are using utf-8, you should put this at the top of your document:
<?xml version="1.0" encoding="utf-8" ?>
Note that apart from getting the meta-information about the charset across, you also need to make sure that the data you are serving is actually utf-8 encoded. This is much the same scenario: You need to know implicitly what encoding your data are in. The function utf8_encode is (despite the name) explicitly meant for converting iso-8859-1 into utf-8. Thus, if you use it on already utf-8 encoded data, you'll get it double-encoded, with the result of garbled data.
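To illustrate the double-encoding trap, here is a sketch that converts only when the input is not already valid UTF-8. ensure_utf8() is a hypothetical helper, it assumes the mbstring extension, and it uses mb_convert_encoding() from ISO-8859-1 in place of utf8_encode() (which does the same conversion but is deprecated in recent PHP versions).

```php
<?php
// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8, but leaves
// input alone if it is already valid UTF-8.
function ensure_utf8(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";            // "café" in ISO-8859-1
$once   = ensure_utf8($latin1); // converted to UTF-8
$twice  = ensure_utf8($once);   // unchanged, no double encoding

echo bin2hex($once), "\n";
echo bin2hex($twice), "\n";

// What blind double conversion produces instead:
echo bin2hex(mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1')), "\n";
```

Keep in mind the validity check is a heuristic: a few ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so knowing the real encoding of your data is still better than guessing.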
Charsets aren't that complicated in themselves. The problem is that if you aren't careful about keeping things straight, you'll mess it up. Whenever you have a string, you should be absolutely certain that you know which encoding it is in. Otherwise it's not a string - it's just a blob of binary data.
test is all ASCII. So there is no need to use UTF-8 for that.
But in fact, the first 128 characters of the Unicode charset are the same as ASCII's charset. And UTF-8 uses the same codes for those characters as ASCII does. See Wikipedia's description of UTF-8 for further information.
Once you download the file it no longer carries the information about the encoding, so Notepad++ has to guess it from the contents. There's a thing called Byte-Order-Mark which allows specifying the UTF encodings by prefix in the contents.
See question "When a BOM is used, is it only in 16-bit Unicode text?".
I would imagine using something like echo "\xEF\xBB\xBF" before writing the actual contents will force Notepad++ to recognize the file correctly.
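Put together with the headers from the question, that idea might look like the sketch below. with_utf8_bom() is a hypothetical helper; whether a given editor honors the BOM is an assumption you would need to verify, and since a BOM is discouraged for web pages this only makes sense for downloaded files.

```php
<?php
// Hypothetical helper: prefixes UTF-8 text with a byte-order mark so that
// editors which sniff the first bytes can detect the encoding.
function with_utf8_bom(string $utf8Text): string
{
    return "\xEF\xBB\xBF" . $utf8Text;
}

header('Content-Type: text/plain; charset=utf-8');
header('Content-Disposition: attachment; filename=test.txt');
echo with_utf8_bom('test');
```

For the XML feed mentioned in the question, the encoding attribute in the XML declaration already carries this information, so the BOM should not be needed there.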
There is no such thing as headers for downloaded txt-files. As you try to create XML files in the end anyway, and you can specify the charset in the XML declaration, try creating a simple XML structure and save / open that, then it should work, as long as the OS has utf-8 support, which any modern Linux distribution should have.
I refer you to Joel's Absolute minimum every software developer should know about Unicode
I refer you to What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text
I have a PHP file with one simple echo function:
echo 'アクセスは撥ねりません。';
but when I access that page i get this:
????????????
Can someone help me?
I also have my page encoding set to UTF-8, and I know it, because all of the browsers i used said so.
I also do this before the echo function:
mb_internal_encoding('UTF-8');
What does this do?
Does it help me?
All I need is to be able to echo a static Japanese string.
Thanks!
There are a few places where this could go wrong.
Firstly, if you aren't setting the output encoding in php with header()
header('Content-type: text/html; charset=utf-8');
or in your html with a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
you will need to check the php.ini setting default_charset. Chances are this is defaulted to iso-8859-1
Secondly, you may also need to check the content encoding you are saving the php script as. If you are saving it as ASCII or some other latin charset, it will munge the characters.
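As for what mb_internal_encoding('UTF-8') does: it sets the default encoding that mbstring functions assume when you don't pass one explicitly. It does not change the bytes that echo sends. A quick way to check whether the script file itself was saved correctly is to inspect the string, as in this sketch (describe_string() is a hypothetical helper, mbstring is assumed, and the literal only works if this file is saved as UTF-8):

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and how its
// byte length compares to its character count.
function describe_string(string $text): array
{
    return [
        'valid_utf8' => mb_check_encoding($text, 'UTF-8'),
        'bytes'      => strlen($text),             // raw byte count
        'characters' => mb_strlen($text, 'UTF-8'), // code point count
    ];
}

$text = 'アクセスは撥ねりません。';
print_r(describe_string($text));
```

If the file is saved as UTF-8, this string is valid and takes three bytes per character. If the editor saved it in an encoding that cannot represent Japanese, the characters are typically replaced with literal question marks at save time, and no header can bring them back.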
I got it.
I just had to set the mbstring extension settings to handle internal strings in UTF-8. That extension is standard with my build of PHP 5.3.0.
Maybe you are printing Japanese characters contained in UTF-16 (extended set of chars)?
I just did a quick test and your example works for me, so it's most likely one of these:
Your file is not saved in UTF-8, but some other encoding, such as Shift-JIS. A decent editor should be able to let you see what encoding it used
Your server is sending bad http headers. Can you use some tool to check the headers and paste the results? Or the results you got from the browser?
The browser is using an incompatible font
I saved a file in UTF-8, pasted your code into it, and my server is serving the file with Content-Type: text/html; charset=utf-8 and it shows up just fine. Did not need to use the mb_ function or anything else.