Encoding problems using PHP Gettext - php

I am trying to start using Gettext for my php project.
However, I have some encoding problems. If I use UTF-8 encoding in the .mo files and use
"bind_textdomain_codeset('messages', 'UTF-8');"
I don't see the accents properly in the browser. In Firefox, to see them correctly I have to change the browser's character encoding to UTF-8 (it is not the default encoding). As I can't expect my visitors to change their browser encoding, what should I do?
I also tried changing everything to ISO-8859-15 and, although the accents work (even with the browser's default encoding), the € sign doesn't. I have also read that there are problems with languages like Russian, so it doesn't seem to be the right way.
How should I proceed?
Thank you :)

You should instruct the browser that the page you are sending is encoded in UTF-8. Do this using header before you actually output any content:
header('Content-Type: text/html; charset=utf-8');
Of course this assumes that the page is in UTF-8 in the first place.
In general, the one law that you can never disregard is that all content in your page must be in the same encoding (and that's the encoding you use when declaring the Content-Type).
If all sources for the content (e.g. your hardcoded stuff, what comes from gettext, what comes from a database) are in that encoding, everything is fine. If not then you have to manually convert all content from sources that diverge to the encoding of the page, which is possible through iconv or mb_convert_encoding.
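A minimal sketch of that manual conversion, assuming the page is served as UTF-8 and that you know the encoding of each source; the helper name to_page_encoding() is hypothetical. It also shows why ISO-8859-15 text has to be converted rather than just relabelled: byte 0xA4 is the euro sign in ISO-8859-15 but the currency sign ¤ in ISO-8859-1.

```php
<?php
// Hypothetical helper: convert text from a source whose encoding differs from
// the page encoding (UTF-8 assumed here) before it is output.
function to_page_encoding(string $text, string $sourceEncoding, string $pageEncoding = 'UTF-8'): string
{
    if (strcasecmp($sourceEncoding, $pageEncoding) === 0) {
        return $text; // already in the page encoding, nothing to convert
    }
    $converted = iconv($sourceEncoding, $pageEncoding, $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $sourceEncoding to $pageEncoding");
    }
    return $converted;
}

// Declare the page encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// 0xA4 is the euro sign in ISO-8859-15 (and the currency sign in ISO-8859-1).
$euro = to_page_encoding("\xA4", 'ISO-8859-15');
$cafe = to_page_encoding("Caf\xE9", 'ISO-8859-1');

echo $euro, ' ', $cafe, "\n";
```

mb_convert_encoding($text, 'UTF-8', $sourceEncoding) would work the same way here; iconv() is used because it is mentioned above.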

Related

I want to correct PHP encoding problems on PHPLIST for SHIFT-JIS and UTF-8 foreign fonts

I have PHPList on my server, and it garbles Japanese text.
I installed the foreign language pack, but it still cannot handle Shift_JIS and UTF-8.
How do I correct PHP's encoding, i.e. add encoding declaration lines to the files so that each page PHP generates uses the correct encoding?
I think the problem is that the program's scripts do not define an encoding for each page, possibly since the encoding changes in this version of the program.
There are two possible sources of problems: the PHP script itself and the database behind it (if any).
By sending the appropriate header before any content has been sent to the client (simply put, headers must be sent as early in your code as possible), the encoding can be "defined" with:
<?php header('Content-Type: text/html; charset=utf-8');?>
EDIT!
Esailija reviewed and corrected my answer (see comments below), which does not actually address your question. As Esailija suggested, you should check the transmission encoding rather than the storage encoding itself.
My original answer is kept here as a "hall of shame".
Note that if you are using a DBMS like MySQL, the encoding in the database should be set properly as well (utf8_general_ci is recommended). Back up your data completely before applying any changes to existing data, and try it on an independent test server first, as changing the encoding of a database can be a disaster.
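On the PHP side, the connection to MySQL should also use the same encoding as the pages. A sketch assuming PDO with the pdo_mysql driver, which accepts a charset parameter in the DSN (PHP 5.3.6 and later); the helper names and the host/database values are placeholders. utf8mb4 is MySQL's full UTF-8 (MySQL 5.5.3+); plain utf8 covers only the Basic Multilingual Plane, which includes the common Japanese characters.

```php
<?php
// Build a pdo_mysql DSN that sets the connection character set, so that
// PHP<->MySQL transmission uses the same encoding as the pages.
function mysql_utf8_dsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Open the connection (not called here; it needs a reachable MySQL server).
function connect_utf8(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(mysql_utf8_dsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo mysql_utf8_dsn('localhost', 'phplist'), "\n";
```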

Characters with accents keep appearing as "�"

I'm using a simple php script to scour an RSS feed, store the scoured data to a temporary cache flat file, then display it along the side of my website. However all the characters with accents appear as "�" What is causing this and how can I fix it?
You're having a problem with your character encoding. Depending on which encoding the feed uses, you have to use the same to display your data, or try to convert it to the encoding you're using on your website. PHP offers iconv() for that purpose, for example.
In case the encoding is UTF-8 (or any other multibyte encoding), you also have to make sure you use multibyte-safe functions/methods in your PHP scripts, in case you process the feed in your application.
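A short illustration of why byte-based functions are a problem with UTF-8 data; the sample string is made up, and the mb_* functions require the mbstring extension.

```php
<?php
// Byte-based vs character-based string functions on UTF-8 text.
$title = "Caf\u{E9} cr\u{E8}me"; // "Café crème" as UTF-8

$bytes = strlen($title);             // counts bytes: é and è take 2 bytes each
$chars = mb_strlen($title, 'UTF-8'); // counts characters

// Cutting after 4 bytes splits "é" in half and leaves invalid UTF-8,
// which the browser then shows as a replacement character.
$badCut  = substr($title, 0, 4);
$goodCut = mb_substr($title, 0, 4, 'UTF-8');

var_dump($bytes, $chars, mb_check_encoding($badCut, 'UTF-8'), $goodCut);
```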
To deliver your content in UTF-8, for example, you have to send the appropriate content header before any other output.
Example:
header('Content-Type: text/html; charset=utf-8');
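If the site is UTF-8 but the feed is not, one option is to convert the feed once, before it is written to the cache file. A minimal sketch with a hypothetical helper, assuming the feed declares its encoding in the XML declaration (XML defaults to UTF-8 when it does not); UTF-16 feeds are not handled.

```php
<?php
// Hypothetical helper: re-encode a raw RSS/XML feed as UTF-8 before caching it.
// The source encoding is taken from the XML declaration; without one, XML
// defaults to UTF-8. (UTF-16 feeds, detected by BOM, are not handled here.)
function feed_to_utf8(string $raw): string
{
    $encoding = 'UTF-8';
    if (preg_match('/^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']/', $raw, $m)) {
        $encoding = strtoupper($m[1]);
    }
    if ($encoding === 'UTF-8') {
        return $raw;
    }
    $converted = iconv($encoding, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Could not convert feed from $encoding");
    }
    // Keep the declaration truthful now that the bytes are UTF-8.
    return preg_replace('/^(<\?xml[^>]*\bencoding=["\'])[A-Za-z0-9._-]+/', '${1}UTF-8', $converted, 1);
}

$feed = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<rss><title>Caf\xE9</title></rss>";
echo feed_to_utf8($feed), "\n";
```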

Project conversion from ISO 8859-1 to UTF-8

I coded a PHP project in ISO 8859-1, and for some technical reasons I want to convert the project to UTF-8. What is the best way to do it? I am afraid of losing special characters like French accents and so on. Thanks for your advice.
You should try using the shell command iconv to encode the php files from latin1 (ISO-8859-1) to UTF-8.
After that, make sure that PHP uses UTF-8 as the default encoding (the default_charset directive in php.ini). If it doesn't, you can set it with ini_set() for your project.
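The same conversion can be done from PHP instead of the iconv shell command. A sketch with a hypothetical helper name, assuming every non-UTF-8 file in the project really is ISO-8859-1; files that are already valid UTF-8 (including pure ASCII) are skipped, so running it twice is harmless.

```php
<?php
// Hypothetical helper: convert one Latin-1 (ISO-8859-1) source file to UTF-8 in
// place. Files that are already valid UTF-8 (including pure ASCII) are skipped.
// Back up the project before using it.
function convert_file_latin1_to_utf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return false; // already UTF-8, leave untouched
    }
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $bytes);
    if ($utf8 === false || file_put_contents($path, $utf8) === false) {
        throw new RuntimeException("Cannot convert $path");
    }
    return true;
}

// Runtime equivalent of default_charset = "UTF-8" in php.ini.
ini_set('default_charset', 'UTF-8');
```

To apply it to a directory, something like foreach (glob('src/*.php') as $f) { convert_file_latin1_to_utf8($f); } would do; the src/ path is a placeholder. A Latin-1 file whose accented bytes happen to form valid UTF-8 would be skipped, which is rare for real text but worth checking.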
After that you should convert your database to UTF-8 or use a quickfix like this (for MySQL):
mysql_query("SET NAMES 'utf8'");
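Of course, replace mysql_query() with whatever your framework uses (if you use one). Note that the mysql_* functions were removed in PHP 7.0; with mysqli, $mysqli->set_charset('utf8') is the recommended equivalent.
Put it into your primary file, the one that includes all the classes and other shared code.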
Transcode all the files with iconv. Change any and all HTTP headers or meta tags. Profit.
Here's my take on your question: you want the generated HTML (via PHP) to be UTF-8 encoded? Be aware that HTML 4.x pages served without a declared charset have traditionally been treated as ISO-8859-1, whereas XML (and XHTML served as XML) defaults to UTF-8 or UTF-16.
(1) So the first piece of the puzzle is to select the DOCTYPE for your rendered HTML.
(2) Make sure you add the character set meta tag (charset=utf-8), etc.
(3) Take the rendered PHP/HTML string and send it through iconv either via the shell using a system call or through some PHP API method.
The resulting rendered HTML will be UTF-8 encoded. The client browser needs to render the HTML as UTF-8 and not Western Latin-1; otherwise you get a strange character (such as a stray non-breaking space) in the upper left-hand corner of the page.
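Step (3) can also be done without a shell call, assuming the templates are still saved as Latin-1: buffer the whole page with ob_start() and convert it in a callback. The callback name is hypothetical, and the meta tag from step (2) must then say utf-8. Once the files themselves are converted, this shim is no longer needed.

```php
<?php
// Output-buffer callback: the page is rendered in Latin-1 and converted to
// UTF-8 in one place just before it is sent.
function latin1_page_to_utf8(string $buffer): string
{
    // ISO-8859-1 -> UTF-8 cannot fail: every Latin-1 byte maps to one code point.
    return iconv('ISO-8859-1', 'UTF-8', $buffer);
}

if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

ob_start('latin1_page_to_utf8');
echo "<meta charset=\"utf-8\">\n";
echo "<p>Br\xFBl\xE9 Lake</p>\n"; // Latin-1 bytes, as a Latin-1 template would emit
ob_end_flush(); // runs the callback and sends the converted page
```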

Why are extended ASCII characters (â, é, etc.) getting replaced with <?> characters?

I attached a pic... but I am using PHP to pull the data from MySQL, and some of these locations have extended characters... I am using the Font Arial.
You can see the screen shot here: http://img269.imageshack.us/i/funnychar.png/
Still happening after the suggestions, here is what I did:
My firefox (view->encoding) is set to UTF-8 after adding the line, however, the text inside the option tags is still showing the funny character instead of the actual accented one. What should I look for now?
UPDATE:
I have the following in the PHP program that is giving me those <?> characters:
ini_set( 'default_charset', 'UTF-8' );
And right after my zend db object creation, I am setting the following query:
$db->query("SET NAMES utf8;");
I changed all my tables over to UTF-8 and reinserted all the data (waste of time) as it never helped. It was latin1 prior.
Also STATUS is reporting:
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 4 days 20 hours 59 min 41 sec
Looking at the source of the page, I see
<option value="Br�l� Lake"> Br�l� Lake
OK- NEW UPDATE-
I changed everything in my PHP and HTML to:
and
header('Content-Type: text/html; charset=latin1');
Now it works, what gives?? How do I convert it all to UTF-8?
That's what the browser shows when the bytes it receives are not valid in the encoding it is using to display the page. Make sure you specify the encoding of the text you send to the client, either in the HTTP headers or in a meta tag in the markup.
In HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
In PHP (before any other content is sent to the client):
header('Content-Type: text/html; charset=utf-8');
I'm assuming you'll want UTF-8 encoding. If your site uses another encoding for text, then you should replace UTF-8 with the encoding you're using.
One thing to note about using HTML to specify the encoding is that the browser may restart rendering the page once it sees the Content-Type meta tag, so you should put the <meta> tag immediately after the opening <head> tag so the browser doesn't do more extra processing than it needs to.
Another common charset is "iso-8859-1" (Latin-1, Western European), which you may want to use instead of UTF-8. You can find more detailed info in this awesome article on character encodings and the web. You can also get an exhaustive list of character encodings here if you need a specific one.
If nothing else works, another (rare) possibility is that you don't have a font installed on your computer with the characters needed to display the page. I've tried reproducing your results on my own server with no luck, possibly because I have a lot of fonts installed, so the browser can always substitute a character missing from one font with another font.
What I did notice by investigating further is that if text is sent in an encoding different from the one the browser is told to use, non-ASCII characters can render unexpectedly. To work around this, I used the HTML character entity representation of special characters, so â becomes &acirc; in my HTML and é becomes &eacute;. Once I did this, no matter what encoding I reported, my characters rendered correctly.
Obviously you don't want to modify your database to HTML encode Unicode characters. Your best option if you must do this is to use a PHP function, htmlentities(). You should use this function on any data-driven text you expect to have Unicode characters in. This may be annoying to do, but if specifying the encoding doesn't help, this is a good last resort for forcing Unicode characters to work.
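A short sketch of htmlentities() using the option text from the question. The key point is that its third argument must name the encoding the text is actually in; if it doesn't, you get exactly the replacement characters being debugged here (or an empty string). The variable names are illustrative.

```php
<?php
// htmlentities() must be told which encoding its *input* is in.
$fromUtf8   = "Br\u{FB}l\u{E9} Lake"; // "Brûlé Lake" as UTF-8 bytes
$fromLatin1 = "Br\xFBl\xE9 Lake";     // the same text as ISO-8859-1 bytes

$a = htmlentities($fromUtf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$b = htmlentities($fromLatin1, ENT_QUOTES | ENT_SUBSTITUTE, 'ISO-8859-1');

// Wrong input charset: invalid bytes become U+FFFD (the "?" diamond), or the
// whole result is '' when ENT_SUBSTITUTE is not given.
$wrong = htmlentities($fromLatin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$empty = htmlentities($fromLatin1, ENT_QUOTES, 'UTF-8');

echo '<option value="', $a, '">', $a, "</option>\n";
```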
There is no such standard called "extended ASCII", just a bunch of proprietary extensions.
Anyway, there are a variety of possible causes, but it's not your font. You can start by checking the character set in MySQL, and then see what PHP is doing. As Dan said, you need to make sure PHP is specifying the character encoding it's actually using.
As others have mentioned, this is a character-encoding question. You should read Joel Spolsky's article about character encoding.
Setting
header('Content-Type: text/html; charset=utf-8');
will fix your problem if your php page is writing UTF-8 characters to the browser. If the text is still garbled, it's possible your text is not UTF-8; in that case you need to use the correct encoding name in the Content-Type header. If you have a choice, always use UTF-8 or some other Unicode encoding.
Simplest fix
ini_set( 'default_charset', 'UTF-8' );
this way you don't have to worry about manually sending the Content-Type header yourself.
EDIT
Make sure you are actually storing data as UTF-8 - sending non-UTF-8 data to the browser as UTF-8 is just as likely to cause problems as sending UTF-8 data as some other character set.
SELECT table_collation
FROM information_schema.`TABLES` T
WHERE table_name=[Table Name];
SELECT default_character_set_name
, default_collation_name
FROM information_schema.`SCHEMATA` S
WHERE schema_name=[Schema Name];
Check those values
There are two transmission encodings, PHP<->browser and Mysql<->PHP, and they need to be consistent with each other. Setting up the encoding for Mysql<->PHP is dealt with in the answers to the questions below:
Special characters in PHP / MySQL
How to make MySQL handle UTF-8 properly
php mysql character set: storing html of international content
The quick answer is "SET NAMES UTF8".
The slow answer is to read the articles recommended in the other answers: it's a lot better to understand what's going on and make one precise change than to apply trial and error until things seem to work. This isn't just a cosmetic UI issue; bad encoding configurations can mess up your data very badly. Think about the Simpsons episode where Lisa gets chewing gum in her hair, which Marge tries to get out by putting peanut butter on it.
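One common result of inconsistent transmission encodings is double encoding: UTF-8 bytes are read as Latin-1 somewhere along the way and encoded again, so "é" ends up stored as "Ã©". This sketch only reproduces that with iconv and shows that one layer can be reversed; it is a demonstration, not a database repair tool, and any repair should be tried on a copy of the data first.

```php
<?php
// Reproduce double encoding: correct UTF-8 text misread as Latin-1, re-encoded.
$original = "Br\u{FB}l\u{E9}";                       // "Brûlé" in UTF-8
$doubled  = iconv('ISO-8859-1', 'UTF-8', $original); // what ends up stored

echo $doubled, "\n"; // shows "BrÃ»lÃ©" in a UTF-8 terminal

// Reverse one layer: encoding the mojibake back to Latin-1 yields the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $doubled);
var_dump($repaired === $original);
```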
You should encode all special chars into HTML entities instead of depending on the charset.
htmlentities() will do the work for you.
I changed all my tables over to UTF-8 and reinserted all the data (waste of time) as it never helped. It was latin1 prior.
If your original data was latin1, then inserting it into a UTF-8 database won't convert it to UTF-8, AFAIK, it will insert the same data but now believe it's UTF-8, thus breaking.
If you've got a SQL dump, I'd suggest running it through a tool to convert to UTF-8. Notepad++ does this pretty well - simply open the file, check that the accented characters are displaying correctly, then find "convert to UTF-8" in the menu.
These replacement characters generally appear because of the non-ASCII ("extended") characters. If the page is UTF-8, you can usually eliminate them by adding:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
to your meta tags

Coding in UTF-8 problem

I am using notepad++ for php coding.
I don't have any problem with format set up using Encode in ANSI.
However when I use Encode in UTF-8, either I have a strange character at the top or not showing anything.
Q1. Am I supposed to use ANSI?
Q2. Why am I not able to display anything when I use UTF-8?
My source code for the header is the following:
<html>
<head>
<title>Hello, PHPlot!</title>
</head>
Is that because I am not using UTF-8 in the header?
It's probably a Byte Order Mark. You can use the 'Encode in UTF-8 without BOM' mode in notepad++.
This question has some helpful information about using UTF-8 with PHP. You will also (as you suggested) need to set the content type in either the header or a meta tag in order for the browser to interpret it correctly.
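Why a BOM causes both symptoms: its three bytes (EF BB BF) are sent to the browser before anything else, show up as "ï»¿" when the page is decoded as Latin-1/Windows-1252, and count as output, so later header() calls fail with "headers already sent". A sketch for checking and stripping it from existing files; the helper names are hypothetical.

```php
<?php
// The UTF-8 byte order mark: three bytes some editors put at the start of a file.
const UTF8_BOM = "\xEF\xBB\xBF";

function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: rewrite a file without its BOM; returns true if one was removed.
function strip_bom_from_file(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (!has_utf8_bom($bytes)) {
        return false; // nothing to strip
    }
    return file_put_contents($path, substr($bytes, 3)) !== false;
}
```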
It sounds like you are using UTF-8 with a BOM (which has issues) and your server is failing to specify the encoding correctly.
IIRC, BOM is unavoidable in Notepad, so I would suggest using a better editor. I'm fond of Komodo Edit myself.
(Also note that a doctype is required in HTML documents.)
As Tom Haigh says, it's probably the BOM. It's not necessary for UTF-8 encoding, so you can safely leave it out.
However I should point out that PHP has very weak support for UTF-8 - be prepared for a bumpy ride. Take a look at this page for some details on problems you might encounter.
