This is really driving me crazy. I'm building a Spanish website (meaning a lot of accented Latin characters) and I'm using Zend Framework.
When I save an á it displays like Ã¡. I know it is a charset encoding problem, but I don't understand why.
My head charset is <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
My database is utf8_general_ci . I've changed the charset to others and always the same problem.
When I look in my database, what is saved is an á
Any idea why this is happening? Thanks!
Here goes the recipe to get rid of encoding headaches:
Your DB, tables and fields must use collation utf8_unicode_ci.
Be sure your code (HTML, PHP... all) is UTF-8 encoded.
Always use this meta tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This is long:
If you have access to the MySQL configuration file my.cnf, just add this line:
init-connect='SET NAMES utf8'
This tells MySQL to return results in UTF-8 for each connection.
If you cannot edit my.cnf but you are using PDO, open the connection this way:
$pdo = new PDO($dsn, $usr, $pwd, array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8'));
If you aren't using PDO... start using it, what are you waiting for? Meanwhile you can execute this after each connection if you use ext/MySQLi:
$mysql->set_charset('utf8');
Or this if you use plain old ext/MySQL:
mysql_set_charset('utf8', $connection);
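To see why the connection charset matters, here is a minimal sketch of the mismatch, written in Python only because it needs no database. The function names are made up for illustration; the idea is that UTF-8 bytes read back through a connection that assumes Latin-1 turn á into Ã¡.

```python
# Sketch of a connection charset mismatch. The helper names are
# hypothetical and only illustrate the byte-level effect.

def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Reverse the misreading: back to bytes as Latin-1, decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_utf8_as_latin1("á")
    print(garbled)                   # Ã¡
    print(repair_mojibake(garbled))  # á
```

Once every layer (file, connection, table, page) agrees on UTF-8, no such reinterpretation happens and no repair step is needed.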
The character á and other characters outside the ASCII range can be represented in Unicode.
Since you are storing the data in a table with the UTF-8 charset, the characters are being stored properly.
I would suggest that you run the query SET NAMES utf8 (note: utf8, not UTF-8):
Before you store data
Before you fetch data
This makes the connection exchange UTF-8 data, which takes care of the characters outside the ASCII range.
When outputting the data, make sure that you direct the browser to use the UTF-8 charset to display it.
I have a PHP file encoded with UTF-8 like this :
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
And before using sql request, I've added this line :
mysql_query("SET NAMES 'utf8'");
My database is encoded in UTF-8, with a varchar column using latin1_swedish_ci
The result is like this picture :
Change the code to this:
mysql_query("SET NAMES UTF8");
Save file like this
This is due to the incorrect collation / charset of your column. Even if your database character set is UTF-8, the column's character set takes precedence. To handle characters such as Japanese or Chinese, your column should use a UTF-8 collation such as utf8_bin instead of latin1_swedish_ci.
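As a rough illustration of why the column charset matters, the sketch below (Python, no database involved) converts text into a target charset the way a column would have to. The helper store_in_column is hypothetical, and cp1252 is used as a stand-in for what MySQL calls latin1. Characters that the charset cannot represent come out as question marks.

```python
# Sketch: what fits into a latin1 (cp1252) column and what does not.
# store_in_column is a made-up helper, not a MySQL API.

def store_in_column(text: str, column_charset: str) -> bytes:
    """Convert text to the column charset, replacing what cannot be stored with '?'."""
    return text.encode(column_charset, errors="replace")


if __name__ == "__main__":
    print(store_in_column("é", "cp1252"))        # b'\xe9'
    print(store_in_column("日本語", "cp1252"))    # b'???'
    # A UTF-8 column keeps the text intact:
    print(store_in_column("日本語", "utf-8").decode("utf-8"))
```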
Some Steps to Follow:
Meta tag for UTF8.
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Set your PHP to use UTF8.
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
For MySQL, you have to convert your table to UTF8.
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci
Also run:
SET NAMES UTF8
as the first query after establishing a connection which will convert your DB connection to UTF8.
The HTML encoding in the meta tag gets ignored if an encoding is already sent via the HTTP headers.
header('Content-Type: text/html; charset=utf-8');
Check in your browser what it thinks the current page encoding is:
Chrome: Tools - Encoding
Firefox: View - Character encoding
MySQL should be OK with just "SET NAMES utf8"; table data is converted automatically to the MySQL connection's encoding.
I have some texts in French (containing accented characters such as "é"), stored in a MySQL table whose collation is utf8_unicode_ci (both the table and the columns), that I want to output on an HTML5 page.
The HTML page charset is UTF-8 (<meta charset="utf-8" />) and the PHP files themselves are encoded as "UTF-8 without BOM" (I use Notepad++ on Windows). I use PHP5 to request the database and generate the HTML.
However, on the output page, the special characters (such as "é") appear garbled and are replaced by "�".
When I browse the database (via phpMyAdmin) those same accented characters display just fine.
What am I missing here?
(Note: changing the page encoding (through Firefox's "web developer" menu) to ISO-8859-1 solves the problem... except for the special characters that appear directly in the PHP files, which then become corrupted. But anyway, I'd rather understand why it doesn't work as UTF-8 than change the encoding without understanding why it works. ^^;)
I experienced the same problem before, and what I did was the following:
1) Use Notepad++ (it can handle almost any encoding) or Eclipse, and be sure to save or open the file as UTF-8 without BOM.
2) set the encoding in PHP header, using header('Content-type: text/html; charset=UTF-8');
3) remove any extra spaces at the start and end of my PHP files.
4) set the collation of all my tables and columns to utf8mb4_general_ci or utf8mb4_unicode_ci via phpMyAdmin or any MySQL client you have. A comparison of the two collations is available here
5) set mysql connection charset to UTF-8 (I use PDO for my database connection )
PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET utf8"
or just execute the SQL queries before fetching any data
6) use a meta tag <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
7) use a certain language code for French
<meta http-equiv="Content-language" content="fr" />
8) change the html element lang attribute to the desired language
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">
I will be updating this further, because I really had a hard time solving this problem when I was dealing with Japanese characters in past projects.
9) Some fonts are not available on the client PC; you may need to use Google Fonts and include them in your CSS
10) Don't end your PHP source file with ?>
NOTE:
If everything I said above doesn't work, try adjusting your encoding to the character set you really want to display. For me, setting everything to Shift-JIS displayed all my Japanese characters and it works fine. But using UTF-8 should be your priority.
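The symptom in the question (é shown as "�", fixed by switching the browser to ISO-8859-1) is consistent with the database connection delivering Latin-1 bytes into a page declared as UTF-8. Here is a small sketch of that reading in Python; page_shows is a made-up helper that decodes bytes the way a browser would for a given declared charset.

```python
# Sketch: how a browser interprets bytes under a declared page charset.
# page_shows is a hypothetical helper, not a browser or PHP API.

def page_shows(raw: bytes, page_charset: str) -> str:
    """Decode raw bytes as the page charset; invalid sequences become U+FFFD."""
    return raw.decode(page_charset, errors="replace")


if __name__ == "__main__":
    from_db = "café".encode("latin-1")    # what a latin1 connection would deliver
    in_php_file = "é".encode("utf-8")     # a literal typed in a UTF-8 source file

    print(page_shows(from_db, "utf-8"))           # caf�
    print(page_shows(from_db, "iso-8859-1"))      # café
    print(page_shows(in_php_file, "iso-8859-1"))  # Ã©
```

This also matches the note in the question: forcing ISO-8859-1 fixes the database text but breaks the literals in the UTF-8 PHP files, so the cleaner fix is to set the connection charset (step 5).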
This works for me
Make your database utf8_general_ci
Save your files in N++ as UTF-8 without BOM
Put $mysqli->query('SET NAMES utf8'); after the connection to the database in your PHP file
Put <meta charset="utf-8" /> in your HTML files
Works perfect.
If your php.ini default_charset is not set to UTF-8, you need to send a Content-Type header to declare the encoding of your output. Apply the following header at the top of your file(s):
header("Content-type: text/html; charset=utf-8");
If you still have trouble with encoding, the cause may be one of the following:
a database server charset problem (check encoding of your server)
a database client charset problem (check encoding of your connection)
a database table charset problem (check encoding of your table)
a PHP default encoding problem (check the default_charset parameter in php.ini)
a misconfigured multibyte extension (see the mbstring parameters in php.ini)
a <form> charset problem (check that it is sent as utf-8)
a <html> charset problem (where no charset is declared in your HTML file)
a Content-Type: problem (where the wrong charset is sent by Apache).
SET NAMES worked for me.
My issue was that in one of my editing pages the field with the foreign characters would not display; on the production web pages there was no problem.
I know you already have an answer. That's great. But strangely none of these answers solved my issue. I'd like to share my answer for the benefit of the others who may encounter the same issues.
I also had the same problems as the OP, with regards to French accents in a multi-lingual application.
But I encountered this issue for the first time when I had to pass (French accented) data as segments in AJAX calls.
Yes, we must have the database set to work with UTF8. But because the AJAX calls had query strings (in my case segments, since I'm using CodeIgniter), I simply had to encode the French text.
To do this on the client side, use the JavaScript encodeURI() function with your data.
And to reverse it in PHP, just use urldecode($MyStr) where data was received as parameters.
Hope this helps.
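For anyone curious what that encoding step does to the bytes, here is a sketch of the same round trip in Python. urllib.parse.quote and unquote are used as stand-ins for encodeURI() and urldecode(); they are not identical in which reserved characters they leave alone, but all of them percent-encode accented characters as UTF-8 bytes.

```python
# Sketch: percent-encoding accented text for a URL segment and decoding it back.
# quote/unquote stand in for JavaScript encodeURI() and PHP urldecode().
from urllib.parse import quote, unquote


def encode_segment(text: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes."""
    return quote(text)


def decode_segment(segment: str) -> str:
    """Turn the percent-encoded segment back into text."""
    return unquote(segment)


if __name__ == "__main__":
    segment = encode_segment("été")
    print(segment)                  # %C3%A9t%C3%A9
    print(decode_segment(segment))  # été
```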
Type some text full of French accented characters in your (PHP) file
Save that file as UTF-8
Paste line beneath into your website header
header('Content-Type: text/html; charset=utf-8');
Page (file) should look good.
If it looks good, go here for the MySQL behavior after SET NAMES.
I'm using CakePHP with App.encoding set to UTF-8, <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> present in my <head> and my MySQL database set to UTF-8 Unicode Encoding and utf8_general_ci collation. I also have "encoding"=>"UTF8" in my database.php connection details.
When I store a '£' symbol in the database table and view it using command line MySQL, the character displays correctly.
If I use CakePHP to fetch the rows from the database table and output them in my website, I see Â£ instead of my intended £ symbol.
However if I then use utf8_decode() to output my data, it displays correctly.
Is this correct? I have tried using htmlentities() to convert the £ symbol into &pound; but it outputs &Acirc;&pound; instead! Even when I use the additional parameters for charset.
Perhaps someone can help - I must have missed something here, but I thought that the characters should display correctly (in things like textarea HTML tags) if all your headers, meta tags etc were consistently UTF-8?
It sounds like the data in your database is wrong: the character £ is actually stored as the two characters Â£. You can confirm this by going to the database and using the hex and charset functions:
select charset(MyColumn), hex(MyColumn) from MyTable;
If the column is encoded in UTF-8, for the value '£' you should see output identical to this:
+---------------+-----------+
| utf8 | C2A3 |
+---------------+-----------+
If you see anything else, like if the charset column reports latin1 or if hex column reports C382C2A3, the data in the table is wrong. It can be fixed though, but the fix depends on the kind of error the data has. What do you get from charset and hex?
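The two hex values mentioned above can be reproduced without a database. This is a small sketch in Python (hex_of is a made-up helper) showing that a correctly stored £ is C2A3, while a £ that was encoded to UTF-8, misread as Latin-1 and encoded to UTF-8 again is C382C2A3.

```python
# Sketch: hex dumps of a correctly stored and a double-encoded pound sign.
# hex_of is a hypothetical helper mirroring what MySQL's HEX() would show
# for a UTF-8 column.

def hex_of(text: str, charset: str = "utf-8") -> str:
    """Return the uppercase hex dump of text encoded in the given charset."""
    return text.encode(charset).hex().upper()


def double_encode(text: str) -> str:
    """Simulate the bug: UTF-8 bytes misread as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print(hex_of("£"))                 # C2A3
    print(double_encode("£"))          # Â£
    print(hex_of(double_encode("£")))  # C382C2A3
```

This is also why utf8_decode() appears to fix the output: it undoes one of the two encoding steps.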
You can use htmlentities with the third parameter to safely encode UTF-8:
htmlentities("£", ENT_COMPAT, "UTF-8")
If all is in UTF8 remove the "encoding"=>"UTF8" in your database.php connection details:
$conn = mysql_connect($server, $username, $password);
//mysql_set_charset("UTF8", $conn); // REMOVED. ;)
mysql_select_db($database, $conn);
I have a page that has a translation function here. My problem here is that, when I translate the language into French, the words are cut because the page didn't interpret the words correctly. I checked posts related to my problem but none of them work.
In my page, I put the following:
header ('Content-Type:text/html; charset=WINDOWS-1252'); -> This is just to enforce the encoding on start up. I think this one is optional but I still use it.
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
Equivalent translations are fetched from a database table named labels. The labels table is InnoDB with utf8 -- UTF-8 Unicode as its default character set.
Characters after é are being cut. Is there anything that I need to do to display the characters correctly? Thanks!
I don't see any point in using Unicode on the backend and a code page in the frontend of a multilingual application. You either use the same encoding throughout your project, or you manually convert back and forth between UTF-8 and windows-1252.
I don't think you have a problem with reading. The labels come truncated from the DB, otherwise your browser would display garbage characters. So this is not an issue with PHP/HTML, but with MySQL. In the case of èéàòì and the like, MySQL is certainly able to convert from UTF-8 to CP1252 (latin1). If it were not able to (for example, if we tried to convert the same string from UTF-8 to CP1251), MySQL would show a question mark ? instead.
In your case I think it's an input problem, ie the labels are truncated in the DB. How is this possible? You may have a UTF8 PHP and MySQL, but your browser sends windows-1252 strings when it submits a form from a page loaded with such a charset. In your PHP script you should transcode this string to UTF-8 before inserting it in the db, or connect to MySQL with SET NAMES 'CP1252'. Since you don't do so, you end up trying to insert a bunch of invalid UTF-8 bytes, so MySQL truncates the string and your labels are empty. Attached is a test case. Here is the test table
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8
Here is the PHP part. Note that this script is UTF-8 encoded, so every literal string appearing in it has the same encoding.
// This is a UTF-8 file, so my editor uses UTF-8 and thus each literal
// string is a UTF-8 string, since PHP only has binary strings.
$label = "Référence";
// Now let's translate this string as if it came from a browser submitting
// a form loaded from a cp1252 encoded page
$src = mb_convert_encoding($label, "CP1252", "UTF-8");
// But connect as if I were UTF-8
$db = new PDO('mysql:host=localhost;dbname=test;charset=utf8',
'test', 'test');
// Insert the string
$stmt = $db->prepare('INSERT INTO test (name) VALUES ( ? )');
$stmt->bindValue(1, $src);
$stmt->execute();
// Read it
header("content-type: text/plain; charset=windows-1252");
foreach($db->query('SELECT * FROM test') as $row)
echo $row['name'] . "\n";
How do you recover? Either you connect to MySQL with the cp1252 charset and let MySQL translate for you, or you transcode the string in your script.
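To make the truncation and the second recovery option concrete, here is a sketch in Python that needs no database. valid_utf8_prefix is a made-up helper that roughly mimics the truncation described above by keeping only the bytes before the first invalid UTF-8 sequence; the actual MySQL behavior may depend on the server's SQL mode.

```python
# Sketch: CP1252 bytes treated as UTF-8 are cut at the first accented letter;
# transcoding to UTF-8 first keeps the whole string.
# valid_utf8_prefix is a hypothetical helper, not a MySQL or PHP function.

def valid_utf8_prefix(raw: bytes) -> str:
    """Return the text up to (not including) the first invalid UTF-8 byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return raw[:err.start].decode("utf-8")


def transcode_cp1252_to_utf8(raw: bytes) -> bytes:
    """Recovery option: convert the submitted CP1252 bytes to UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    submitted = "Référence".encode("cp1252")  # what a cp1252 page would send
    print(valid_utf8_prefix(submitted))                            # R
    print(valid_utf8_prefix(transcode_cp1252_to_utf8(submitted)))  # Référence
```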
After correctly getting the data in, you'll have to extract it and put it on an HTML page. This time you'll have the same problem, but reversed: showing a UTF-8 string in a CP1252 document. The bytes in the DB are unsuitable, because UTF-8 is a variable-length encoding, whilst in CP1252 a char is exactly 1 byte long. If you put these bytes directly into the page, the browser will show gibberish for the extra bytes. So, again, you either connect to the db specifying the CP1252 charset so that MySQL takes care of the conversion and gives you the right bytes, or you transcode the bytes yourself on the PHP side.
Or, better, do yourself a favor: use the same encoding everywhere. I suggest UTF-8 because today it is the right thing to do, but you can successfully opt for CP1252 because it can represent English and French chars (and saves some storage, but I don't consider this an issue).
My suggestion is to use the same encoding through the whole process. Use UTF-8 as the charset both in header and the meta tag.
It seems to me that your data is not stored correctly in the database. If you are working with mysqli, you can try to set the charset of the connection object before reading from or writing to the database.
// tells the mysqli connection to deliver UTF-8 encoded strings.
$db = new mysqli($dbHost, $dbUser, $dbPassword, $dbName);
$db->set_charset('utf8');
For other databases see UTF-8 for PHP and MySQL. It may be necessary to insert the French texts again (with this setting), because the existing texts could be invalid now.
Your linked example page is correctly encoded with UTF-8 (the file format), though your meta tag is a bit incorrect:
<!meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
The <! does not comment the tag out; you would have to write <!-- ... --> instead. The best option is to declare the charset only once, as UTF-8, and remove the other meta tags.
I'm working on a website which uses / stores accented characters in the database. I have the page template set so that the config.php charset variable matches the setting, e.g.:
<meta charset="<?php echo $this->config->item('charset');?>">
The problem I'm having is, when $config['charset'] is set to UTF-8, the form validation fails and it's as if no characters were submitted if an accented character was included. So, for example, a required field will bounce back if á is included anywhere in the string. The string minus the á works fine.
I've managed to get this working by changing the $config['charset'] to ISO-8859-1 and converting text to UTF-8 before inserting / after retrieving from the database with php's utf8_encode() and utf8_decode(). Is this the best way or am I missing something needed in order to get UTF-8, with accented characters, working in CodeIgniter?
Any advice appreciated.
You have to make sure that you use UTF-8 everywhere, and that both PHP and MySQL are configured to handle UTF-8.
In the html, add the meta-tag:
<meta charset="utf-8" />
And save it in UTF-8 format. Here is how to do that in Notepad++.
Define the MySQL tables to support UTF-8, create table with:
DEFAULT CHARSET=utf8;
And set the connection to:
mysql_set_charset('utf8', $con);
Enable UTF-8 in the php.ini:
default_charset = "utf-8"
For a full manual check Handling Unicode Front To Back In A Web App