I often see something similar to the line below in PHP scripts that use MySQL:
query("SET NAMES utf8");
I have never had to do this for any project yet, so I have a couple of basic questions about it.
Is this something that is done with PDO only?
If it is not PDO-specific, then what is the purpose of doing it? I realize it sets the encoding for MySQL, but I have never had to use it, so why would I want to?
It is needed whenever you want to send data to the server that contains characters that cannot be represented in pure ASCII, like 'ñ' or 'ö'.
That is, if the MySQL instance is not configured to expect UTF-8 from client connections by default (many are, depending on your location and platform).
Read http://www.joelonsoftware.com/articles/Unicode.html in case you aren't aware of how Unicode works.
Read "Whether to use SET NAMES" to see alternatives to SET NAMES and what exactly it is about.
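One likely reason the asker never needed it: ASCII-only text is byte-for-byte identical in UTF-8, latin1 and the other ASCII-compatible charsets, so a charset mismatch stays invisible until a non-ASCII character shows up. A minimal PHP sketch of that difference (has_non_ascii is a hypothetical helper; mb_convert_encoding assumes the mbstring extension is enabled):

```php
<?php
// Hypothetical helper: true if the string has any byte outside 7-bit ASCII,
// i.e. the connection character set starts to matter for it.
function has_non_ascii(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 1;
}

$utf8   = "\u{00F1}";                                        // 'ñ' encoded as UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // the same character as latin1

var_dump(has_non_ascii("hello")); // bool(false): same bytes in every ASCII-compatible charset
var_dump(has_non_ascii($utf8));   // bool(true)
echo bin2hex($utf8), "\n";        // c3b1: two bytes in UTF-8
echo bin2hex($latin1), "\n";      // f1: one byte in latin1
```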
From the manual:
SET NAMES indicates what character set the client will use to send SQL statements to the server.
More elaborately (and, once again, gratuitously lifted from the manual):
SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'cp1251' tells the server, “future incoming messages from this client are in character set cp1251.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)
Getting encoding right is really tricky - there are too many layers:
Browser
Page
PHP
MySQL
The SQL command "SET CHARSET utf8" (a synonym for SET CHARACTER SET) from PHP ensures that the client side (PHP) gets the data in utf8, no matter how it is stored in the database. Of course, it needs to be stored correctly first.
DDL definition vs. real data
Encoding defined for a table/column doesn't really mean that the data are in that encoding. If you happen to have a table defined as utf8 but the data are actually stored in a different encoding, MySQL will treat them as utf8 and you're in trouble, which means you have to fix this first.
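A classic case of data whose real encoding doesn't match the column definition is UTF-8 text that went through a latin1 connection, so a utf8 column holds "Ã©" instead of "é". Below is a hedged PHP sketch of repairing single values, assuming the mbstring extension; fix_double_encoded is a hypothetical helper, not a library function. It is only a heuristic: it handles pure ISO-8859-1 double encoding (MySQL's latin1 is really cp1252, so text with characters such as € or curly quotes needs more care), and a string that legitimately contains "Ã©" would be "repaired" as well, so work on a backup.

```php
<?php
// Hypothetical repair helper for UTF-8 text that was encoded a second time as if it
// were ISO-8859-1 (e.g. "é" stored as "Ã©"). Heuristic: only applies a lossless fix.
function fix_double_encoded(string $s): string
{
    // Undo one layer: map each character back to the single latin1 byte it came from
    $candidate = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

    // Accept only if nothing was lost (no '?' substitutions) and the result is valid UTF-8;
    // otherwise the string probably wasn't double-encoded, so leave it alone
    $lossless = mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') === $s;
    if ($lossless && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }
    return $s;
}

echo fix_double_encoded("\u{00C3}\u{00A9}"), "\n"; // é  (was "Ã©")
echo fix_double_encoded("\u{00E9}"), "\n";         // é  (already correct, unchanged)
```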
What to check
You need to check the encoding in which the data flows at each layer.
Check the HTTP headers.
Check what's really sent in the body of the request.
Don't forget that MySQL has encoding almost everywhere:
Database
Tables
Columns
Server as a whole
Client
Make sure that there's the right one everywhere.
Conversion
If you receive data in, e.g., windows-1250 and want to store it in UTF-8, use this SQL before storing:
SET NAMES 'cp1250';
If you have data in the DB as windows-1250 and want to retrieve it as UTF-8, use:
SET CHARSET 'utf8';
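Alternatively, the conversion can be done in PHP before the data reaches MySQL, so the connection itself can stay UTF-8. A sketch assuming the mbstring extension; to_utf8 is a hypothetical helper. mbstring's list of encodings may not include Windows-1250 (check mb_list_encodings()), so the demo uses ISO-8859-2, the closely related Central European ISO charset; iconv('CP1250', 'UTF-8', $s) is an option where the iconv build supports CP1250.

```php
<?php
// Hypothetical helper: convert bytes from a known legacy charset to UTF-8 on the PHP
// side, instead of declaring that charset to MySQL with SET NAMES.
function to_utf8(string $bytes, string $fromEncoding): string
{
    return mb_convert_encoding($bytes, 'UTF-8', $fromEncoding);
}

// In ISO-8859-2 (Latin-2), 0xA9 is 'Š' and 0xB9 is 'š'
echo to_utf8("\xA9koda", 'ISO-8859-2'), "\n"; // Škoda
echo to_utf8("\xB9", 'ISO-8859-2'), "\n";     // š
```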
A few more notes:
Don't rely on overly "smart" tools to show the data. E.g. phpMyAdmin handles (or did, when I was using it) encoding really badly. And since it goes through all the layers, it's hard to find out where things go wrong.
Also, Internet Explorer had a really stupid behavior of "guessing" the encoding based on weird rules.
Use simple editors where you can switch the encoding. I recommend MySQL Workbench.
This query should be executed before any query that creates or updates data in the database. It looks like this:
mysql_query("set names 'utf8'");
Note that you should also declare the encoding you are using in the page head; for example, if you are using UTF-8, add it to the head like this, or it will cause problems with Internet Explorer.
So your page looks like this:
<html>
<head>
<title>page title</title>
<meta charset="UTF-8" />
</head>
<body>
<?php
// note: the mysql_* functions were removed in PHP 7.0; on current PHP use mysqli or PDO
mysql_query("set names 'utf8'");
$sql = "INSERT INTO ..... ";
mysql_query($sql);
?>
</body>
</html>
The solution is
$conn->set_charset("utf8");
Instead of doing this via an SQL query, use the PHP function mysqli::set_charset
(procedural style: mysqli_set_charset)
Note:
This is the preferred way to change the charset. Using mysqli_query() to set it (such as SET NAMES utf8) is not recommended.
See the MySQL character set concepts section for more information.
from http://www.php.net/manual/en/mysqli.set-charset.php
Thanks #all!
Don't use query("SET NAMES utf8"); this is connection setup, not a query. Put it right after the connection is opened, using setCharset() (or a similar method, such as mysqli's set_charset()).
A small example from practice:
Situation:
the MySQL server talks latin1 by default
your whole app is in UTF-8
the connection is made without anything extra (so: latin1; no SET NAMES utf8 ..., no set_charset() method/function)
Storing and reading data is no problem as long as MySQL can handle the characters.
If you look in the DB (e.g. using phpMyAdmin), you will already see garbage in it.
So far this is not a problem! (It's wrong, but it often works, at least in Europe.)
...unless another client/program, or an updated library that works correctly, reads or saves the data. Then you are in big trouble!
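The situation above can be simulated in PHP without a database, which shows why the broken setup round-trips fine for its own app while everyone else sees garbage. A sketch assuming the mbstring extension; the two function names are hypothetical stand-ins for the conversions the server performs, and ISO-8859-1 stands in for MySQL's latin1 (really cp1252), which gives the same result for characters like ö.

```php
<?php
// Simulation only (assumed setup: app sends UTF-8, connection left at latin1, column is utf8).

// On INSERT the server believes the incoming bytes are latin1 and converts them to utf8
function store_via_latin1_connection(string $appBytes): string
{
    return mb_convert_encoding($appBytes, 'UTF-8', 'ISO-8859-1');
}

// On SELECT the server converts the utf8 column back to the connection's latin1
function read_via_latin1_connection(string $stored): string
{
    return mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');
}

$name   = "J\u{00F6}rg";                        // "Jörg" as UTF-8
$stored = store_via_latin1_connection($name);   // what actually sits in the column

echo $stored, "\n";                             // JÃ¶rg: what a correctly configured client shows
echo read_via_latin1_connection($stored), "\n"; // Jörg: the misconfigured app reads it back fine
```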
Not only PDO. If SQL results come back as '????' symbols, setting your charset up front (hopefully UTF-8) is really recommended:
if (!$mysqli->set_charset("utf8"))
{ printf("Can't set utf8: %s\n", $mysqli->error); }
or, procedural style: mysqli_set_charset($db, "utf8")
I am transferring the database from one server to another using phpMyAdmin. I transferred it successfully, but I'm having an issue with Swedish characters. The Swedish characters display properly within the tables, but in the PHP pages they are wrong; it looks like double encoding or some other problem. Can anyone help?
The problem could be lying in different parts. Welcome to the world of Unicode!
Make sure the collation for the columns in MySQL is utf8_* (I personally prefer utf8_bin).
Make sure the PHP page is telling the client that the contents are encoded with UTF8. That can/should be done in two ways:
Set the following header: header('Content-Type: text/html; charset=utf-8');
In your HTML <head> add the correct meta tag: <meta charset="utf-8">
(note: while in theory it's not strictly necessary to do both, as they're equivalent for the client, it's better to be redundant!)
Make sure the connection with MySQL uses UTF8. That can be done by executing a simple query right after connecting to the database: SET NAMES 'utf8' (e.g. mysqli_query($link, "SET NAMES 'utf8'"); adapt it accordingly if you're using PDO or the MySQLi OOP APIs).
Bonus: if you're using UTF8 in your PHP script, make sure you treat everything in a Unicode-safe way. So prefer the mb_* functions for manipulating strings, use the u flag with preg_* functions, etc. And remember that UTF-8 characters vary in the number of bytes they use, from 1 to 4!
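A short demonstration of the variable byte length and of the u flag, assuming PHP 7.2+ (for mb_ord) with mbstring enabled. Note that MySQL's utf8 charset (utf8mb3) stores at most 3 bytes per character, so 4-byte characters such as emoji need utf8mb4 for both the columns and the connection.

```php
<?php
// UTF-8 uses 1 to 4 bytes per character: a, å, €, 😀
foreach (['a', "\u{00E5}", "\u{20AC}", "\u{1F600}"] as $ch) {
    printf("U+%04X: %d byte(s)\n", mb_ord($ch, 'UTF-8'), strlen($ch));
}

$word = "\u{00C5}sa"; // "Åsa"
var_dump(strlen($word));                  // int(4): bytes
var_dump(mb_strlen($word, 'UTF-8'));      // int(3): characters
var_dump(preg_match('/^.{3}$/', $word));  // int(0): without /u, "." matches a single byte
var_dump(preg_match('/^.{3}$/u', $word)); // int(1): with /u, "." matches a whole character
```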
I have the same settings for both websites; the only problem is with the database after transferring it to another server. The encoding of the pages is the same on both sites.
You can check it here:
http://www.abswheels.se
http://www.dackis.se/abs/
You can see the difference. Any suggestions?
Also, everything is fine inside the database. I don't know why there is a problem when I fetch data with special characters from the database. You can see it in the title bar of both websites. Everything is the same on the client side: same encoding, same settings.
Possible Duplicate:
UTF-8 all the way through
Okay, this is stupid, but I can't figure it out.
The MySQL database is set to utf8_general_ci collation. The field I'm having problems with is of type longtext.
Characters added to the database as é or other accented characters are returned as �.
I run the output through stripslashes, and I've tried it both with and without html_entity_decode, but I see no change in the output. What am I doing wrong?
Cheers
What character encoding does the string have that you try to insert? If it is in ISO-8859-1 you can use the PHP function utf8_encode() to encode it to UTF-8 before inserting it into the database.
http://php.net/manual/en/function.utf8-encode.php
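utf8_encode() has since been deprecated (as of PHP 8.2). A sketch of the equivalent mbstring call, assuming the input really is ISO-8859-1; running it on text that is already UTF-8 would double-encode it.

```php
<?php
$latin1 = "caf\xE9";                                           // "café" as ISO-8859-1 bytes
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1'); // same result as utf8_encode($latin1)

echo bin2hex($latin1), "\n"; // 636166e9
echo bin2hex($utf8), "\n";   // 636166c3a9
```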
I have a form in my page for users to leave a comment.
I'm currently using this charset:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
but when retrieving the comment from the DB, accents are not displayed correctly (e.g. è => Ã¨).
Which parameters should I care about for correct handling of accents?
SOLVED
changed the meta tag to charset='utf-8'
changed the MySQL character set (ALTER TABLE comments CONVERT TO CHARACTER SET utf8)
changed the connection character set both when inserting and when retrieving records ($conn->query('SET NAMES utf8'))
Now accents are displayed correctly.
thanks
Luca
Character sets can be complicated and a pain to debug in LAMP web applications. At each stage where one piece of software talks to another, there's scope for incorrect charset translation or incorrect storage of data.
The places you need to look out for are:
- Between the browser and the web server (which you've listed already)
- Between PHP and the MySQL server
The characters you've listed look like normal European characters that are included in the ISO-8859-1 charset.
Things to check for:
Even though you're specifying the character set in a meta header, have a look in your browser to be sure which character set it is actually using. If you've specified it, the browser should use that charset to render the page, but in some cases I've seen it attempt to auto-detect the charset and fail. Most browsers have an "encoding" menu (perhaps under "View") that lets you choose the charset. Ensure that it says ISO-8859-1 (Western European).
MySQL can happily convert between character sets if required, but in most cases you want your tables and client connection to use the same encoding. When configured this way, MySQL won't attempt any encoding conversion and will just write the data you input into the table byte for byte. When read, it'll come out the same way, byte for byte.
You haven't said whether you're reading the data back out of the database with the same web app or with some other client. I'd suggest you try reading it with the same web application, using the same meta charset header (again, check that the browser is really using it), and see what is displayed in the browser.
To debug these issues, you need to be really sure whether the client/console you're using is doing any conversion too; sometimes the safest way is to get the data into a hex editor, where you can be sure that nothing else is messing with the translation.
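If a hex editor isn't handy, the bytes can be dumped from PHP itself, so no console or browser gets a chance to convert them (MySQL's HEX() function does the same on the server side, e.g. SELECT HEX(column) FROM table). A minimal sketch; hex_bytes is a hypothetical helper:

```php
<?php
// Hypothetical helper: show raw bytes as space-separated hex pairs
function hex_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

echo hex_bytes("\u{00E8}"), "\n";         // c3 a8:       'è' as UTF-8
echo hex_bytes("\xE8"), "\n";             // e8:          'è' as ISO-8859-1
echo hex_bytes("\u{00C3}\u{00A8}"), "\n"; // c3 83 c2 a8: 'è' double-encoded (displays as Ã¨)
```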
If it doesn't look like a browser-side problem, could you please include the output of the following commands against your database:
Run from a connection that your web-app makes (not from some other MySQL client):
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Run from any MySQL client:
SHOW CREATE TABLE myTable;
(where myTable is the table you're reading/writing data from/to)
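To be sure the variables come from the web app's own connection, the queries can be run from PHP over that connection. A sketch assuming the app uses mysqli; charset_report is a hypothetical helper, and $mysqli stands for the application's existing connection object.

```php
<?php
// Hypothetical helper: run both diagnostic queries over the given mysqli connection
// and return Variable_name => Value pairs.
function charset_report(mysqli $db): array
{
    $report  = [];
    $queries = ["SHOW VARIABLES LIKE 'character_set%'", "SHOW VARIABLES LIKE 'collation%'"];
    foreach ($queries as $sql) {
        $result = $db->query($sql);
        if ($result === false) {
            throw new RuntimeException($db->error);
        }
        while ($row = $result->fetch_row()) {
            $report[$row[0]] = $row[1];
        }
        $result->free();
    }
    return $report;
}

// Usage, with the connection object the application already uses:
// print_r(charset_report($mysqli));
```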
The ISO-8859-1 character set is for Latin characters only. Try UTF-8, and make sure that the database columns these characters come from are also UTF-8.
I have a table of data encoded in the latin5 charset, and all the columns in the table are also latin5. From the mysql console, when I enter "SET NAMES 'latin5'" and query the table, the results are OK, and when I delete or insert/update rows, the encoding of all the new data is perfect. But when I try to insert ISO-8859 data (I also verified this with mb_detect_encoding) without "SET NAMES", insert/update/select don't use the proper encoding. When I use "SET NAMES 'latin5'", insert/update don't work properly, but selects are OK: latin5 data comes back in the proper encoding only with SET NAMES 'latin5'. When I use SET NAMES 'utf8', the select queries are badly encoded, but insert/update are OK.
The reason I ask is that we are going to production, and this makes me think about possible future problems.
mb_detect_encoding doesn't know what encoding your string is. It makes a qualified guess, but there are no guarantees that it will guess right. Especially not if the candidates are all single-byte encodings, as in the case of latin1 and latin5.
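To illustrate why the guess can't be reliable: latin1 (ISO-8859-1) and latin5 (ISO-8859-9, Turkish) differ in only six byte positions, and every byte is valid in both. A sketch assuming mbstring, using one of those positions:

```php
<?php
// The same byte decodes to a different character in latin1 and latin5
$byte = "\xFD";

$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1'); // ý (U+00FD)
$asLatin5 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-9'); // ı (U+0131, dotless i)

// Both readings are valid, so a detector can only guess which one was meant
var_dump(mb_check_encoding($byte, 'ISO-8859-1')); // bool(true)
var_dump(mb_check_encoding($byte, 'ISO-8859-9')); // bool(true)
```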
There really is no substitute for knowing what you're doing, if you want to get charsets right. I suggest that you read these pages at least a couple of times:
http://www.phpwact.org/php/i18n/charsets
http://www.nicknettleton.com/zine/php/php-utf-8-cheatsheet
In particular, note that a web page is served with an HTTP header that specifies the charset the page is encoded in. Unless you explicitly set this from your PHP script, you'll get the web server's default, which may vary from server to server.
Also, be careful to actually understand what is going on, rather than relying on trial and error. The latter can easily get you something that works in some contexts, but not in every context.
And lastly: if you have any choice at all, I seriously suggest that you use UTF-8 for everything. latin5 is going to get you lots of grief.
It often happens that characters such as é get transformed to Ã©, even though the collation for the MySQL DB, table and field is set to utf8_general_ci. The encoding in the Content-Type for the page is also set to UTF-8.
I know about utf8_encode/decode, but I'm not quite sure about where and how to use it.
I have read the "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" article, but I need some MySQL / PHP specific pointers.
How do I ensure that user entered data containing international characters doesn't get corrupted?
At first look at http://www.nicknettleton.com/zine/php/php-utf-8-cheatsheet, I think that one important thing is missing (perhaps I overlooked it).
Depending on your MySQL installation and/or configuration, you have to set the connection encoding so that MySQL knows what encoding to expect on the client side (meaning the client side of the MySQL connection, which should be your PHP script). You can do this by manually issuing a
SET NAMES utf8
query prior to any other query you send to the MySQL server.
If you're using PDO on the PHP side, you can set up the connection to automatically issue this query on every (re)connect by using
$db=new PDO($dsn, $user, $pass);
$db->setAttribute(PDO::MYSQL_ATTR_INIT_COMMAND, "SET NAMES utf8");
when initializing your db connection.
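With PDO, the connection charset can also be given in the DSN (honoured by pdo_mysql since PHP 5.3.6). A sketch; mysql_dsn, the host and the database name are placeholders, and utf8mb4 is chosen here because MySQL's utf8 only covers characters up to 3 bytes long.

```php
<?php
// Hypothetical helper: build a pdo_mysql DSN that carries the connection charset
function mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = mysql_dsn('localhost', 'shop');
echo $dsn, "\n"; // mysql:host=localhost;dbname=shop;charset=utf8mb4

// With a reachable server this would then be:
// $db = new PDO($dsn, $user, $pass, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```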
Collation and charset are not the same thing. Your collation needs to match the charset, so if your charset is UTF-8, the collation should be a UTF-8 one too. Picking the wrong collation won't garble your data, though; it will just make string comparison/sorting work incorrectly.
That said, there are several places where you can set charset settings in PHP. I would recommend that you use UTF-8 throughout, if possible. Places that need the charset specified are:
The database. This can be set on database, table and field level, and even on a per-query level.
Connection between PHP and database.
HTTP output; Make sure that the HTTP-header Content-Type specifies utf-8. You can set default values in PHP and in Apache, or you can use PHP's header function.
HTTP input. Generally, forms will be submitted in the same charset as the page was served in, but to make sure, you should specify the accept-charset attribute. Also make sure that URLs are UTF-8 encoded, or avoid using non-ASCII characters in URLs (and GET parameters).
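Since accept-charset is only a request to the browser, it may help to check what actually arrived before storing it. A sketch assuming mbstring; the form field name comment is hypothetical.

```php
<?php
// Check incoming data before it reaches the database ('comment' is a hypothetical field name)
$comment = $_POST['comment'] ?? '';

if (!mb_check_encoding($comment, 'UTF-8')) {
    http_response_code(400); // reject rather than store bytes of unknown encoding
    exit('Request is not valid UTF-8');
}

// What the check distinguishes:
var_dump(mb_check_encoding("G\u{00F6}teborg", 'UTF-8')); // bool(true)
var_dump(mb_check_encoding("G\xF6teborg", 'UTF-8'));     // bool(false): 'ö' sent as ISO-8859-1
```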
utf8_encode/decode functions are a little strangely named. They specifically convert between latin1 (ISO-8859-1) and utf-8. If everything in your application is utf-8, you won't have to use them much.
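Because the target of utf8_decode() is latin1, any character latin1 doesn't have is lost for good. A sketch using the equivalent mbstring call, assuming mbstring's default substitute character (?):

```php
<?php
$text   = "50 \u{20AC}, caf\u{00E9}";                        // "50 €, café" as UTF-8
$latin1 = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8'); // € has no ISO-8859-1 code point
$back   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo $back, "\n"; // 50 ?, café
```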
There are at least two gotchas with regard to UTF-8 and PHP. The first is that PHP's built-in string functions expect strings to be single-byte. For a lot of operations this doesn't matter, but it means that you can't rely on strlen and other functions. There is a good run-down of the limitations at this page. Usually it's not a big problem, but especially when using third-party libraries, you need to be aware that things could blow up on this. One option is to use the mbstring extension, which had the option to replace all troublesome functions with UTF-8-aware alternatives (function overloading, deprecated in PHP 7.2 and removed in PHP 8.0); its mb_* functions can still be called directly. It's still not a 100% bulletproof solution, but it'll work for most cases.
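A quick comparison of byte-oriented functions with their mbstring counterparts on a UTF-8 string; it assumes mbstring is enabled and the default C locale (since PHP 8.2, strtoupper() is ASCII-only regardless of locale).

```php
<?php
$city = "\u{00E5}rhus"; // "århus" as UTF-8

// Byte-oriented functions: only ASCII letters change, and substr() can split a character
echo strtoupper($city), "\n";                              // åRHUS
var_dump(mb_check_encoding(substr($city, 0, 1), 'UTF-8')); // bool(false): half of 'å'

// mbstring counterparts work on whole characters
echo mb_strtoupper($city, 'UTF-8'), "\n";                  // ÅRHUS
echo mb_substr($city, 0, 1, 'UTF-8'), "\n";                // å
```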
Another problem is that some installations of PHP still have the magic_quotes setting turned on (it was removed entirely in PHP 5.4). This problem is orthogonal to UTF-8 but can lead to some head scratching. Turn it off, for your own sanity's sake.
Things you should do:
Make sure Apache puts out UTF-8 content. Do this in your httpd.conf, or use PHP's header()-function to do it manually.
Make sure your database connection is UTF8. SET NAMES utf8 does the trick.
Make sure all your tables are set to UTF8.
Make sure all your PHP and template files are encoded as UTF8 if you store international characters in them.
You usually don't have to do much with the mbstring or utf8_encode/decode functions when you do this.
For better Unicode correctness, you should use utf8_unicode_ci (though the documentation is a little vague on the differences). You should also make sure the following MySQL server options are set correctly:
character-set-server=utf8    # older servers accepted default-character-set=utf8 here; MySQL 5.5 removed that server option
skip-character-set-client-handshake    # important, so the client doesn't enforce another encoding
Those can be set in the MySQL configuration file (under the [mysqld] section). The server character set can also be changed at run time (SET GLOBAL character_set_server), but the handshake option only takes effect at server startup.
Regardless of the language it's written in, if you were to create an app that allows a wide array of encodings, handle it in pieces:
Identify the encoding
Somehow you need to find out what kind of encoding you're dealing with; otherwise it's pretty pointless to consider it further. You'll end up with junk characters.
Handle your bytes
Think of these strings less as 'strings' of characters and more as lists of bytes.
PHP is especially sneaky. Don't let it truncate your data on the fly. If you're running a regex on a UTF-8 string, make sure you identify it as such (the u modifier).
Store for the LCD
Again, you don't want to truncate data. If you're storing a sentence in English, can you also store a set of Mandarin glyphs? How about Arabic? Which of these is going to require the most space? Account for it.
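A worked example of how storage needs differ, plus one way to truncate to a byte budget without cutting a character in half. A sketch assuming mbstring; the sample strings are just greetings ("Hello", 你好, مرحبا).

```php
<?php
// The same greeting in UTF-8: character count and byte count diverge
$samples = [
    'English'  => 'Hello',
    'Mandarin' => "\u{4F60}\u{597D}",                         // 你好
    'Arabic'   => "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}", // مرحبا
];
foreach ($samples as $lang => $text) {
    printf("%-8s %d chars, %d bytes\n", $lang, mb_strlen($text, 'UTF-8'), strlen($text));
}
// English: 5 chars, 5 bytes; Mandarin: 2 chars, 6 bytes; Arabic: 5 chars, 10 bytes

// If a byte limit forces truncation, cut on a character boundary
$title = "\u{4F60}\u{597D}\u{4F60}\u{597D}"; // 4 chars, 12 bytes
$cut   = mb_strcut($title, 0, 7, 'UTF-8');   // a 7-byte budget yields 6 bytes (2 whole chars)
var_dump(strlen($cut), mb_check_encoding($cut, 'UTF-8')); // int(6) bool(true)
```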