I had a problem displaying UTF-8 content (Tamil text).
<?php
// SAMPLE CODE
header('Content-Type: text/plain; charset=UTF-8');
echo 'Hello Loréane !';
After some googling, I changed the file's encoding in my editor from 'ANSI' to 'UTF-8'. That solved the problem, and I now get the correct content in the browser.
My question is:
Why does it work only after I changed the file's encoding, when sending the UTF-8 header on its own did not work?
header('Content-Type: text/plain; charset=UTF-8');
This just informs the browsers what kind of content you're going to send it and how it should treat it. It does not set the encoding of the actual content you're sending. It's completely up to you to fulfil your own promise. Your content is not going to magically transform from whatever to UTF-8 just because you set that header. If you tell the browser to treat the content as UTF-8, but you're sending it Latin-1 encoded data, of course it will break.
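To make that concrete, here is a minimal sketch (assuming the mbstring extension is loaded; the variable names are only for illustration) comparing the bytes an editor writes for "é" in Latin-1/Windows-1252 and in UTF-8. The header never touches these bytes; only an explicit conversion or re-saving the file does.

```php
<?php
// "é" as an editor saving in Latin-1 / Windows-1252 ("ANSI") stores it: one byte.
$latin1 = "\xE9";
// The same character saved as UTF-8: two bytes.
$utf8 = "\xC3\xA9";

// A lone 0xE9 is not a valid UTF-8 sequence, so a browser that was told
// "charset=UTF-8" cannot decode it and shows a replacement character.
$latin1IsValidUtf8 = mb_check_encoding($latin1, 'UTF-8');
$utf8IsValidUtf8 = mb_check_encoding($utf8, 'UTF-8');

// An explicit conversion is what actually changes the bytes.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1), ' ', bin2hex($utf8), ' ', bin2hex($converted), "\n";
// prints: e9 c3a9 c3a9
```

Changing the editor's encoding to UTF-8 does the same thing as the conversion above, once, at save time.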
Related
I need to get the content of a remote file in UTF-8 encoding. The file is in UTF-8. When I display that file on screen, it has the proper encoding:
http://www.parfumeriafox.sk/source_file.html
(notice the ň and č characters, for example, these are alright).
When I run this code:
<?php
$url = 'http://parfumeriafox.sk/source_file.html';
$csv = file_get_contents_utf8($url);
header('Content-type: text/html; charset=utf-8');
print $csv;
function file_get_contents_utf8($fn) {
$content = file_get_contents($fn);
return mb_convert_encoding($content, 'utf-8');
}
(you can run it using http://www.parfumeriafox.sk/encoding.php), then I get question marks instead of those special characters. I have researched this extensively: I tried the standard file_get_contents function, I tried a PHP stream context, and I also tried fopen and fread to read the file at the binary level. Nothing seems to work. I have tried it with and without sending the header. This is supposed to be perfectly simple, so what am I doing wrong? When I check the string with an encoding-detection function, it returns UTF-8.
You can see which character set your browser decided the document was by opening the developer console and looking at document.characterSet:
> document.characterSet
"windows-1250"
With this knowledge we can ask iconv to convert from "windows-1250" to utf-8 for us:
<?php
$text = file_get_contents("source_file.csv");
$text = iconv("windows-1250", "utf-8", $text);
print($text);
The output is valid utf-8, and levanduľa is displayed correctly as well.
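As a small self-contained check of that conversion, here is a sketch that does not need the remote file. It assumes the iconv extension is available and that your iconv implementation knows the name windows-1250 (glibc does); the sample string is just the one word from the output.

```php
<?php
// "levanduľa" as windows-1250 bytes: ľ is the single byte 0xBE in that code page.
$cp1250 = "levandu\xBEa";

// iconv(from, to, string) returns the converted string, or false on failure.
$utf8 = iconv('windows-1250', 'UTF-8', $cp1250);

// In UTF-8, ľ (U+013E) is the two bytes 0xC4 0xBE.
echo bin2hex($utf8), "\n";
// prints: 6c6576616e6475c4be61
```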
How about this one?
For this one I used header('Content-Type: text/plain; charset=Windows-1250');
bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
This code works for me
<?php
header('Content-Type: text/plain;charset=Windows-1250');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
The problem is not with file_get_contents().
I saved $data to a file, and the characters were saved correctly but were still not displayed correctly by my text editor. See image below.
$data = file_get_contents('http://www.parfumeriafox.sk/source_file.html');
file_put_contents('doc.txt',$data);
UPDATE
There seems to be one problematic character, as shown here.
It can also be seen in the HTML image below, where it renders as ¾.
Its hex value is 0xBE (190 decimal).
I tried these two character sets. Neither worked.
header('Content-Type: text/plain; charset=ISO 8859-1');
header('Content-Type: text/plain; charset=ISO 8859-2');
END OF UPDATE
It works by adding a header WITHOUT charset=utf-8.
These two headers work
header('Content-Type: text/plain');
header('Content-Type: text/html');
These two headers do NOT work
header('Content-Type: text/plain; charset=utf-8');
header('Content-Type: text/html; charset=utf-8');
This code is tested and displayed all characters.
<?php
header('Content-Type: text/plain');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
<?php
header('Content-Type: text/html');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
These are some of the problematic characters with their Hex values.
This is the saved file viewed in Notepad++ with UTF-8 Encoding.
Check the Hex values against these character sets.
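From the above table I saw that the character set was a Latin 2 (Central European) one.
I went to the Wikipedia article on Windows code pages and found that the Windows code page for Latin 2 is Windows-1250. Note that Windows-1250 is similar to ISO-8859-2 but not identical; the two disagree on byte 0xBE, among others.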
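The hex lookup can also be done in code. This is only a sketch, assuming the iconv extension and that it recognises all three charset names; it decodes the problematic byte 0xBE under each candidate charset so you can see which one yields the expected ľ.

```php
<?php
// The single problematic byte from the downloaded file.
$byte = "\xBE";

// Candidate charsets to try; the last one is the Windows "Latin 2" code page.
$charsets = ['ISO-8859-1', 'ISO-8859-2', 'windows-1250'];

$decoded = [];
foreach ($charsets as $charset) {
    // Interpret the byte in the candidate charset and re-encode it as UTF-8.
    $decoded[$charset] = iconv($charset, 'UTF-8', $byte);
}

foreach ($decoded as $charset => $char) {
    echo $charset, ': ', $char, ' (', bin2hex($char), ")\n";
}
// prints:
// ISO-8859-1: ¾ (c2be)
// ISO-8859-2: ž (c5be)
// windows-1250: ľ (c4be)
```

Only windows-1250 turns 0xBE into ľ, which matches the word levanduľa in the expected output.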
bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
I am creating a site with HTML and PHP.
When I run my PHP page in the browser using localhost (XAMPP server), some symbols (ï»¿) are displayed, but when I check my HTML/PHP code, no symbol like ¿ or » is found.
If I am wrong somewhere, please let me know.
That's a UTF-8 byte order mark (BOM). You should configure your editor to save UTF-8 without BOM. The BOM isn't mandatory for the UTF-8 encoding; in fact, its use is discouraged and it only causes problems.
Additionally, make sure your web server is sending an appropriate Content-Type HTTP header:
Content-Type: text/plain; charset=utf-8
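If you want to confirm that a BOM is the culprit, a sketch like the following may help. The function name strip_utf8_bom is hypothetical (it is not a PHP built-in), and the sample string stands in for the contents of your own file.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    // strncmp() compares only the first three bytes of both strings.
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

// Stand-in for a file that was saved as "UTF-8 with BOM".
$withBom = "\xEF\xBB\xBF<html>";

echo bin2hex(substr($withBom, 0, 3)), "\n"; // prints: efbbbf
echo strip_utf8_bom($withBom), "\n";        // prints: <html>
```

Read as Latin-1, the three bytes EF BB BF display as ï»¿, which is what ends up at the top of the page. Stripping the BOM in code is a workaround; re-saving the file without a BOM is the actual fix.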
¿ and » can also be written as HTML entities (&iquest; and &raquo;), so they look different in the PHP code than in the browser. You can find them, for example, here. You may also have an issue with a BOM.
My best guess: you have an encoding issue (UTF vs. ISO). Look up the encoding your editor uses when saving, and send it to the browser, e.g. header("Content-type:text/html;charset=UTF-8")
Sounds like you're dealing with a character encoding problem.
Try declaring the encoding in your headers.
header("Content-Type: text/html; charset=UTF-8");
This needs to be called before any output is sent to the client.
I set up both PHP 5 and Apache to use UTF-8 encoding.
I tried to show in my browser the result of this PHP code:
echo "Trying to visualize the letter ü";
and it shows me this result:
Trying to visualize the letter �
Why?
Try this :
<?php
header('Content-Type: text/plain; charset=utf-8');
echo "Trying to visualize the letter ü";
If this doesn't work, then your file is saved in an encoding other than UTF-8.
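One way to check how the script itself was saved is to dump the bytes of the literal. This is just a sketch; put it in the same file (saved by the same editor) as the echo above.

```php
<?php
// bin2hex() shows the bytes the editor actually wrote for the literal.
$hex = bin2hex('ü');

// c3bc means the file was saved as UTF-8;
// fc means it was saved as ISO-8859-1 or Windows-1252.
echo $hex, "\n";
```

If you see fc, the header is promising UTF-8 while the script is sending a single Latin-1 byte, which the browser shows as �.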
What's different between UTF-8 and UTF-8 without BOM?
Change File Encoding to utf-8 via vim in a script
Make sure you set your document's "Content-Type" HTTP header, and set the charset to the encoding that you're using:
header("Content-Type: text/html; charset=utf-8");
German characters are not displayed correctly when using print_r( $data->sheets[0]['cells'] ); I used UTF-8 but it does not work, e.g. "Stra�e, Wedemarkstra�e".
Depending on where you get $data from, you probably have to set another charset, either in your database or by saving the .php script itself in UTF-8. This also has to match the charset the browser is told to use.
If that still doesn't work, check out mb_convert_encoding here.
One more thing you can do is to set the HTTP content type including a charset.
header('Content-Type: text/html; charset=utf-8');
I've got a program on which I have non-ASCII characters which do not show properly on ISO-8859-1. Is there a way to use PHP and change the browser encoding somehow, and also allow the characters to display properly in the browser even though the encoding is ISO-8859-1?
Much Appreciated.
Use the header function to send an (explicit) HTTP Content-Type response header.
header('Content-Type: text/html; charset=ISO-8859-1');
… replacing ISO-8859-1 with whatever encoding you are actually using. Hopefully that will be UTF-8.
You should use the header function:
header( 'Content-Type: text/html; charset=ISO-8859-1');
Note: make sure no content has been sent to the browser yet, or you can't modify the headers anymore, so I advise you to call this as early as possible in your script.
The browser itself doesn't have an encoding. It supports many encodings and uses the one you tell it to. If you specify (in headers and/or HTML) that the encoding is ISO-8859-1, then your document should be in that encoding and you should make sure that all characters you send are in the right encoding. So you should actually send ISO-8859-1 characters. You cannot send a document that uses different encodings for different sections of the document.
For some characters, you may send an HTML entity instead. For instance, é can be sent as &eacute;. This will work regardless of encoding.
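A rough sketch of the entity approach, using PHP's built-in htmlentities(). The input is written as explicit UTF-8 bytes so the example does not depend on how the script file is saved; the sample text is only an illustration.

```php
<?php
// UTF-8 input: "café Straße", with é as C3 A9 and ß as C3 9F.
$text = "caf\xC3\xA9 Stra\xC3\x9Fe";

// htmlentities() replaces characters that have a named HTML entity
// with a pure-ASCII entity reference.
$escaped = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: caf&eacute; Stra&szlig;e
```

The result contains only ASCII bytes, so it displays the same under ISO-8859-1, UTF-8, or any other ASCII-compatible charset. Note the third argument: htmlentities() has to know the encoding of its input, so this does not remove the need to know what encoding your data is in.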
If you have the choice, I'd opt to use UTF-8. It supports any character and you don't have to worry about escaping diacritics or other special characters, except those that are special to HTML/XML itself.
Like others have said, using the header function:
header('Content-type: text/html; charset=ISO-8859-1');
or, if you want to serve valid XHTML files instead of the standard HTML:
header('Content-type: application/xhtml+xml; charset=ISO-8859-1');
It is possible to call the header later on in the script, unlike what RageZ said, but you will need to have enabled output buffering for that, using ob_start().
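A minimal sketch of the output-buffering approach, assuming output buffering is not already being flushed elsewhere in your script. The variable $sentTooEarly is only there to show that nothing has reached the client yet.

```php
<?php
// Start buffering: output is held in memory instead of being sent to the client.
ob_start();

echo "<p>Output produced before the header call</p>\n";

// Because the echo above only went into the buffer, headers have not been sent.
$sentTooEarly = headers_sent();

// So the header can still be set at this point.
header('Content-Type: text/html; charset=ISO-8859-1');

// Send the buffered output (and, with it, the headers) to the client.
ob_end_flush();
```

Without the ob_start() call, the echo would send the headers immediately and the later header() call would fail with a "headers already sent" warning.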