I have a MySQL database and want to select several columns from a table.
I must return a string built by concatenating the selected column values, but one of the columns contains accented letters such as é.
How should I encode the returned column value?
NB : I already wrote header('Content-Type: text/plain; charset=utf-8'); at the beginning of the PHP file.
Setting the connection charset with SET NAMES 'utf8' may help (MySQL spells the charset name without a hyphen).
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
What encoding is your database table in? On a lot of installations, MySQL defaults to LATIN-1. Make sure the table stores its data as UTF-8, then make sure that the connection between MySQL and PHP is in UTF-8. The easy way to do that is running the query SET NAMES utf8 after connecting, but you can also set a default encoding.
Next, the UTF-8 header should be sent from the server to the browser, but you've already done that by adding the header() call.
If your database table is currently not encoded as UTF-8, you might need to re-enter your data after changing it.
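To see why a Latin-1 value looks broken on a page declared as UTF-8, it helps to look at the bytes. This is only a rough sketch in Python (chosen because it runs without a database; the variable names are made up for illustration):

```python
# The same character has different byte representations in the two encodings.
text = "\u00e9"                          # é
latin1_bytes = text.encode("latin-1")    # one byte
utf8_bytes = text.encode("utf-8")        # two bytes

print(latin1_bytes.hex())  # e9
print(utf8_bytes.hex())    # c3a9

# A client told "charset=utf-8" that receives the Latin-1 byte cannot decode
# it, so it falls back to the replacement character U+FFFD.
shown = latin1_bytes.decode("utf-8", errors="replace")
print(shown == "\ufffd")   # True
```

The lone byte e9 is not a valid UTF-8 sequence, which is why the table, the connection and the header all have to agree.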
Maybe the Multibyte String Module can be of some help to you, part of the Human Language and Character Encoding Support.
okay, this is stupid that I can't figure it out.
The MySQL database is set to the utf8_general_ci collation. The field I'm having problems with is of type longtext.
Characters added to the database as é or other accented characters come back as �.
I run the output through stripslashes, and I've tried both with and without html_entity_decode, but I can find no change in the output. What am I doing wrong?
Cheers
What character encoding does the string have that you try to insert? If it is in ISO-8859-1 you can use the PHP function utf8_encode() to encode it to UTF-8 before inserting it into the database.
http://php.net/manual/en/function.utf8-encode.php
Getting encoding right is really tricky - there are too many layers:
Browser
Page
PHP
MySQL
The SQL command "SET CHARSET utf8" from PHP will ensure that the client side (PHP) will get the data in utf8, no matter how they are stored in the database. Of course, they need to be stored correctly first.
DDL definition vs. real data
The encoding defined for a table or column doesn't guarantee that the data really are in that encoding. If you have a table defined as utf8 but the data were stored in a different encoding, MySQL will still treat them as utf8 and you're in trouble. That means you have to fix this first.
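A small illustration of "declared encoding versus real bytes", written as a Python sketch rather than against a real table (the names raw and declared are hypothetical stand-ins for the stored bytes and the column definition):

```python
# Bytes that were actually written, here in windows-1250.
raw = "caf\u00e9".encode("cp1250")   # 63 61 66 e9
# What the column definition claims the bytes are.
declared = "utf-8"

try:
    raw.decode(declared)
    readable = True
except UnicodeDecodeError:
    readable = False

print(readable)                # False: the label does not match the bytes
print(raw.decode("cp1250"))    # café: the bytes are fine under their real encoding
```

Changing the label alone never changes the bytes, which is why the stored data must be checked separately from the DDL.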
What to check
You need to check in what encoding the data flow at each layer.
Check the HTTP headers.
Check what's really sent in body of the request.
Don't forget that MySQL has encoding almost everywhere:
Database
Tables
Columns
Server as a whole
Client
Make sure that there's the right one everywhere.
Conversion
If you receive data in e.g. windows-1250, and want to store in utf-8, then use this SQL before storing:
SET NAMES 'cp1250';
If you have data in the DB as windows-1250 and want to retrieve it as utf8, use:
SET CHARSET 'utf8';
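Conceptually, the conversion the server performs between the connection encoding and the storage encoding is a decode followed by an encode. A minimal Python sketch of that idea (transcode is a hypothetical helper, not a MySQL or PHP function):

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Reinterpret bytes from one encoding as characters, then re-encode them."""
    return data.decode(source).encode(target)

# Bytes as a windows-1250 client would send them.
incoming = "\u017elu\u0165ou\u010dk\u00fd".encode("cp1250")   # žluťoučký

# What ends up in a utf8 column once the server knows the source encoding.
stored = transcode(incoming, "cp1250", "utf-8")
print(stored.decode("utf-8"))

# Going the other way returns the original windows-1250 bytes.
print(transcode(stored, "utf-8", "cp1250") == incoming)   # True
```

The conversion only works because the source encoding is stated correctly, which is the job of SET NAMES on the MySQL side.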
Last note:
Don't rely on overly "smart" tools to show the data. For example, phpMyAdmin handles (or at least handled, when I was using it) encoding really badly, and because it goes through all the layers it's hard to tell where things go wrong. Internet Explorer also had the really stupid behavior of "guessing" the encoding based on weird rules. Use simple editors where you can switch the encoding. I also recommend MySQL Workbench.
I am storing Unicode text لاہور in MySQL, I have set tables and columns to utf8_general_ci. The text لاہور is displaying correctly in MySQL. However if I echo that with PHP it shows ?????? on the browser window.
One thing to mention here: I have the whole document in Unicode and all words are displaying correctly, but they are written directly i.e. not coming from MySQL.
Even if I try
$p="لاہور";
echo $p;
It displays لاہور in the browser. Things go wrong only when retrieving from MySQL.
One common cause is that your PHP script is saved in another encoding (for example ASCII). Make sure the PHP script is also saved as UTF-8, or whatever encoding you use in your database.
Another possible cause is that MySQL is not returning proper Unicode characters to your script. You can run mysql_query("SET NAMES utf8"), or whatever encoding you want to use, before processing your queries. A good way to troubleshoot this is to convert the strings to their Unicode code points and compare them to see if they're the same.
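The "compare the code points" idea can be sketched like this. It is written in Python purely as an illustration, and describe is a hypothetical helper name:

```python
def describe(text: str) -> list[str]:
    """Return the Unicode code point of every character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

expected = "\u00e9"   # é, as typed in the script
received = "?"        # what came back from the database in the failing case

print(describe(expected))   # ['U+00E9']
print(describe(received))   # ['U+003F']
print(describe(expected) == describe(received))   # False
```

If the value read from the database already contains U+003F, the character was lost before it reached the script, so the browser is not the culprit.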
It may not always be sufficient to set the content type using meta tags, I usually set it via the header directive as well as below.
header('Content-Type: text/html; charset=utf-8');
Most likely your MySQL connection (as opposed to storage) has not been set to UTF-8, causing the UTF-8 data retrieved from MySQL to be converted to Latin1 (or similar), which cannot represent those characters and they are replaced with a ?.
If you are using mysql_:
mysql_set_charset( 'utf8' );
If you are using mysqli_:
$mysqli->set_charset( 'utf8' );
before you make any queries
If you are using PDO, add charset=utf8 to the connection string.
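The "?" substitution is what any lossy conversion to Latin-1 produces for characters it cannot represent. A short Python sketch of the effect (this imitates the conversion, it does not call MySQL):

```python
# The Urdu word from the question, written with escapes.
text = "\u0644\u0627\u06c1\u0648\u0631"

# Converting to Latin-1 with replacement mimics a non-UTF-8 connection:
# every character that Latin-1 lacks becomes a question mark.
as_latin1 = text.encode("latin-1", errors="replace")
print(as_latin1)   # b'?????'
```

Once the characters have been replaced, nothing on the PHP or browser side can bring them back, so the connection charset has to be fixed.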
I have the following problem: on a very simple php-mysqli query:
if ( $result = $mysqli->query( $sqlquery ) )
{
$res = $result->fetch_all();
$result->close();
}
I get strings that are wrongly encoded as Western strings, although the database, the table and the column all use the utf8_general_ci collation. The PHP script itself is saved as UTF-8, and the parts of the script that do not involve MySQL are encoded correctly. So echo "ő" works perfectly, but echo $res[0] from the previous example outputs the bytes EF BF BD when the page is viewed in the correct UTF-8 encoding. If I manually switch the browser's encoding to Western, the mysqli-sourced strings are decoded properly, except that the non-Western characters are replaced with "?".
What makes it even stranger is that on my development environment this isn't happening, while on my webserver it is. The developer environment is a LAMP stack (The Uniform Server), while the webserver uses nginx.
In this case, I entered the data in the database using phpMyAdmin, and inside phpMyAdmin it displays perfectly. phpMyAdmin's collation is UTF-8 too. I believe the problem must be somewhere around here, because on the same webserver, for another site where I enter data through PHP (using POST), the same problem doesn't happen. In that case, the data is displayed correctly both while entering it and while viewing it (I mean in the PHP-generated web pages), but the special characters are not correct in phpMyAdmin.
Can you help me start where to debug? Is it connected to php or mysql or nginx or phpMyAdmin?
Use mysqli_set_charset to change the client encoding to UTF-8 just after you connect:
$mysqli->set_charset("utf8");
The client encoding is what MySql expects your input to be in (e.g. when you insert user-supplied text to a search query) and what it gives you the results in (so it has to match your output encoding in order for echo to display things correctly).
You need to have it match the encoding of your web page to account for the two scenarios above and the encoding of the PHP source file (so that the hardcoded parts of your queries are interpreted correctly).
Update: How to convert data inserted using latin-1 to utf-8
Regarding data that have already been inserted using the wrong connection encoding there is a convenient solution to fix the problem. For each column that contains this kind of data you need to do:
ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET latin1;
ALTER TABLE table_name MODIFY column_name BLOB;
ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET utf8;
The placeholders table_name, column_name and existing_column_type should be replaced with the correct values from your database each time.
What this does is:
1) Tell MySql that it needs to store data in that column in latin1. This character set contains only a small subset of utf8 so in general this conversion involves data loss, but in this specific scenario the data was already interpreted as latin1 on input so there will be no side effects. However, MySql will internally convert the byte representation of your data to match what was originally sent from PHP.
2) Convert the column to a binary type (BLOB) that has no associated encoding information. At this point the column will contain raw bytes that are a proper utf8 character string.
3) Convert the column to its previous character type, telling MySql that the raw bytes should be considered to be in utf8 encoding.
WARNING: You can only use this indiscriminate approach if the column in question contains only incorrectly inserted data. Any data that has been correctly inserted will be truncated at the first occurrence of any non-ASCII character!
Therefore it's a good idea to do it right now, before the PHP side fix goes into effect.
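The three ALTER TABLE steps can be imitated on a single value to see why they work. This is only a Python sketch under the assumption that MySQL's latin1 behaves like cp1252 for the character used here; it does not touch a database:

```python
correct = "\u00e9"   # é

# How the value was damaged: UTF-8 bytes went over a latin1 connection,
# so each byte was stored as a separate latin1 character.
damaged = correct.encode("utf-8").decode("cp1252")
print(len(damaged))   # 2 characters where there should be 1

# Steps 1 and 2: converting to latin1 and then to BLOB yields the raw bytes
# that PHP originally sent.
raw = damaged.encode("cp1252")

# Step 3: declare those raw bytes to be utf8.
repaired = raw.decode("utf-8")
print(repaired == correct)   # True

# The warning: a value that was stored correctly does not survive the same trip.
try:
    correct.encode("cp1252").decode("utf-8")
    survives = True
except UnicodeDecodeError:
    survives = False
print(survives)   # False
```

Python raises an error where the answer says MySQL truncates the value, but the cause is the same: the single latin1 byte for é is not valid utf8.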
Use mysqli::set_charset function.
$mysqli->set_charset('utf8'); //returns false if the encoding was not valid... won't happen
http://php.net/manual/en/mysqli.set-charset.php
I haven't used mysqli for some time, but if things are the same, connections use the latin1 charset by default (the one behind the "latin1_swedish_ci" collation, roughly ISO-8859-1).
I will consider your page is already using utf8 encoding by having:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Inside the <head> tag.
If you have strings that are already in the latin1 encoding, you can use mb_convert_encoding:
http://php.net/manual/en/function.mb-convert-encoding.php
$fixedStr = mb_convert_encoding($wrongStr, 'UTF-8', 'ISO-8859-1');
iconv does something very similar. Truth be told, I don't know the difference, but here's the link to the function reference:
http://php.net/manual/en/function.iconv.php
I just realized that you might have some strings in utf8 and others in latin swedish. You can use mb_detect_encoding for that: http://php.net/manual/en/function.mb-detect-encoding.php
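For a mix of UTF-8 and Latin-1 strings, one common heuristic is to try strict UTF-8 first and fall back to Latin-1. The sketch below shows that idea in Python. It is not how mb_detect_encoding is implemented, and to_text is a hypothetical helper:

```python
def to_text(data: bytes) -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, otherwise as Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")

utf8_value = "\u00e9".encode("utf-8")      # c3 a9
latin1_value = "\u00e9".encode("latin-1")  # e9

print(to_text(utf8_value) == to_text(latin1_value))   # True: both become é
```

This is a guess, not a proof: a Latin-1 string whose bytes happen to form valid UTF-8 would be misread, so it is safer on short accented words than on arbitrary binary-looking data.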
You can also dump the database and use iconv (cmd line) if you have it installed:
iconv -f latin1 -t utf-8 < currentdb.sql > fixeddb.sql
What is the best Collation for the column that can allow to store accented letters and parse them out perfectly without any encoding error, because whenever I add an accented letter such as é, å, it shows out with an encoding problem on the PHP side, but in the MySQL side it's fine...
How do I get the accented letters display properly?
You get them correctly by matching the encoding on both ends, ie. both your PHP output and your DB should use the same encoding. For European languages I would suggest using UTF-8 for both your scripts and the DB. Just remember that you still have to initialize UTF-8 collation in MySQL using SET NAMES 'utf8' COLLATE 'utf8_general_ci' (so run this query just after you make a connection to the DB and you should be ok).
Perhaps your problem isn't within the database, but within however you're displaying things from PHP? What content encoding are you specifying in your output? You might need to manually send a header to specify that the content is UTF-8 if that's what you're trying to output.
For instance: header("Content-Type: text/html; charset=UTF-8");
There was a table in latin1 and site in cp1252
I want to have table in utf8 and site in utf-8
I've done:
1) on web page: Content-Type: text/html;charset=utf-8
2) Mysql: ALTER TABLE XXX CONVERT TO CHARACTER SET utf8
This SQL doesn't work as I want - it doesn't convert ä & ü characters in database to their multibyte equivalents
Please Help.
Thanks
As this blog post says, using MySQL's ALTER TABLE CONVERT syntax is A Bad Idea [TM]. Export your data, convert the table and then reimport the data, as described in the blog post.
Another idea: have you set your default client connection charset via /etc/my.cnf or mysqli::set_charset?
I've been a fool. SET NAMES was missing.
What I know now:
1) Every time the charset of a column is changed, the actual data is ALWAYS recoded! Change the field to binary to see that.
2) The charset of a column takes priority; the table and database charsets come after it. They are used mainly for setting defaults (not 100% sure about that last sentence).
3) SET NAMES is very important. German characters can arrive in latin1 and be stored correctly in a utf8 table (silently recoded by MySQL) when you SET NAMES correctly. The server can send data to a web page in whatever encoding you want, no matter what the table encoding is; it is recoded for output.
4) If there is a column in encoding A and a column in encoding B, and you compare them (or use LIKE), MySQL silently converts them so that they behave as if they were in one encoding.
5) MySQL is smart. It never treats text as bytes unless the type is binary; it always works with characters. As long as it knows the data's encoding, it wants ё in latin1 to equal ё in utf8.
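The difference between comparing bytes and comparing characters (points 4 and 5) can be shown outside MySQL. This is a Python sketch with illustrative variable names, using ä because it exists in both encodings:

```python
in_latin1 = "\u00e4".encode("latin-1")   # ä as one byte: e4
in_utf8 = "\u00e4".encode("utf-8")       # ä as two bytes: c3 a4

# Compared as raw bytes (what a binary column does), the values differ.
print(in_latin1 == in_utf8)   # False

# Compared as characters, with each side decoded by its own known encoding,
# they are equal.
print(in_latin1.decode("latin-1") == in_utf8.decode("utf-8"))   # True
```

The character comparison only works when each side's encoding is known, which is why a wrong label on a column or connection breaks it.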
Since you claim you now get s**t back, it suggests that the characters were modified in the database.
How are you accessing the data in mysql? If you are using a programming interface such as PHP, then you may need to tell that interface what character encoding to expect.
For example, in PHP you will need to call something like mysql_set_charset("utf8"); this can also be done with the SQL query SET NAMES utf8.
You will then also need to make sure that whatever is displaying the text knows it is utf8 and renders it with the appropriate encoding. For example, on a web page you would need to set the content type to UTF-8, with something like Content-Type: text/html;charset=utf-8