I use ODBC to connect to SQL Server from PHP.
In PHP I read string data (an nvarchar column) from SQL Server and then want to insert it into a MySQL database. When I try to insert such a value into a MySQL table, I get this MySQL error:
Incorrect string value: '\xB3\xB9ow...' for column 'name' at row 1
For strings with only ASCII characters everything is fine; the problem occurs when non-ASCII characters (from some European languages) are present.
So, in more general terms: there is a Unicode string in an MS SQL Server database, which is retrieved by PHP through ODBC. It is then put into an SQL insert query (as the value for a utf-8 varchar column), which is executed against a MySQL database.
Can someone explain to me what is happening in this situation in terms of encoding? At which step may which character encoding conversions take place?
I use PHP 5.2.5, MySQL 5.0.45-community-nt, and MS SQL Server 2005.
PHP has to run on a Linux platform.
UPDATE: The error doesn't occur when I call utf8_encode($s) on the string and use that value in the MySQL insert query, but the inserted string then doesn't display correctly in MySQL (so utf8_encode only forced the value to be valid UTF-8; the characters themselves are wrong).
First you have the encoding of the DB. Then you have the encoding used by the ODBC client.
If the encoding of your ODBC client connection does not match the one of the DB, the ODBC layer will automatically transcode your data, in some cases.
The trick here is to force the encoding of the ODBC client connection.
For an "all UTF-8" setup:
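// Open the ODBC connection (DB_DSN, DB_USR, DB_PWD are defined elsewhere).
$conn = odbc_connect(DB_DSN, DB_USR, DB_PWD);
// Ask the server to use UTF-8 for this client connection.
odbc_exec($conn, "SET NAMES 'UTF8'");
// PostgreSQL-specific setting for the same purpose.
odbc_exec($conn, "SET client_encoding='UTF-8'");
// processing here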
This works perfectly with PostgreSQL + PHP 5.x.
The exact syntax and options depend on the DB vendor.
You can find very useful and clear additional info for MySQL here: http://dev.mysql.com/doc/refman/5.0/fr/charset-connection.html
Hope this helps.
Maybe you can use the PDO extension, if it will make any difference?
There is a user-contributed comment here that suggests changing the data types in SQL Server to something else; if that is not possible, look at that user's class, which casts the fields.
I have no experience with ODBC via PHP, but with the mysql functions PHP seems to default to ASCII and UTF8 connections need to be made explicit if you want to avoid trouble.
Are you sure PHP and the MySQL server are communicating in UTF8? Until PHP 6, the Unicode support tends to be annoyingly inconsistent like that.
I remember that the MySQL docs mention a connection string parameter to tweak the Unicode encoding.
From your description it sounds like PHP is treating the connection as ASCII-only.
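As a rough illustration of why utf8_encode() silences the error but yields the wrong characters: the bytes from the error message, \xB3\xB9, are consistent with a single-byte Central European code page. The sketch below assumes Windows-1250 (a guess; the question does not confirm it) and uses iconv() in place of utf8_encode(), which always treats its input as ISO-8859-1.

```php
<?php
// Bytes taken from the MySQL error message in the question.
$raw = "\xB3\xB9ow";

// What utf8_encode() effectively does: treat the input as ISO-8859-1.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

// Assumption: the ODBC driver actually delivered Windows-1250 bytes.
$asCp1250 = iconv('CP1250', 'UTF-8', $raw);

echo bin2hex($asLatin1), "\n"; // c2b3c2b96f77 (superscript 3, superscript 1, "ow")
echo bin2hex($asCp1250), "\n"; // c582c4856f77 ("łąow")
```

Both results are valid UTF-8, so MySQL accepts either one; only the conversion that starts from the real source encoding produces the intended letters.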
Related
I noticed that when doing database queries in PHP (e.g. Zend_db, mysqli...), you can set the character set. For example: mysqli_set_charset($con,"utf8"); I'm a little foggy as to what this actually does behind the scenes.
If I use php to do a database SELECT query, and I indicate a character set, what happens if it's not the same character set that the column was defined as in the database?
I mean, the database returns a binary sequence, but what is actually returned if the character is not encoded the same way in the two character sets? Will MySQL take the internal binary data and return it "as-is"?
Or will MySQL try to convert it to the binary sequence that's the equivalent in the character set you indicated?
I guess the gist of my question is that I would like to know how the data is encoded when PHP is sending in the query, how it's transmitted back from MySQL, and whether there's another step of translation after PHP gets it back and stores it into a string in PHP internal memory.
Similarly, if you're doing an INSERT or UPDATE, how does the data get sent from PHP to MySQL? Does PHP convert it to the correct binary encoding and THEN send it to MySQL?
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Update:
Thanks to Raymond Nijland, I was able to fix my bug. But I did notice that for nonstandard characters, the charset does seem to matter.
I did a select statement using $db = new \PDO("mysql:host=$host;dbname=$database;charset=latin1", $dbuser, $dbpassword);. First, I tried latin1, then I tried utf8.
My problem was that I had a column with encrypted data, which I guess had some weird characters in it. If I did an MD5 on that field directly in the database query, it gave me a hash that began with 889... But then I tried to pull it into PHP with a SELECT statement. If I used PDO with a charset of latin1 and then did an md5() inside PHP, it gave me the same hash (889...), which implies that PHP has an exact copy of the bytes in the database. But if I used PDO with charset 'UTF-8' and then did an md5() in PHP, it gave me a hash beginning with 087... So somewhere, a conversion must be taking place.
At this point, my original bug is fixed, but I'm still curious as to what's happening. Is MySQL doing the conversion before returning the data to PHP, or does PDO do some sort of conversion on the PHP side?
mysqli_set_charset($con,"utf8"); (or the equivalent call in other client libraries) declares to MySQL that the client's encoding is MySQL's CHARACTER SET utf8. If bytes in a different encoding are sent to MySQL (think INSERT), garbage or errors will result.
That setting also declares that the client desires that encoding from SELECTs.
The CHARACTER SET on each column in each table may be something else (eg, "latin1"). If so, MySQL will attempt to convert the encoding during the transmission.
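That conversion would also explain the differing MD5 values seen in the question. A minimal sketch, assuming the column held the latin1 byte 0xE9 ("é"); iconv() stands in for the conversion MySQL performs, and the sample value is made up:

```php
<?php
// Hypothetical column content: "café" stored as latin1 bytes.
$stored = "caf\xE9";

// Connection charset latin1: the bytes reach PHP unchanged.
$viaLatin1 = $stored;

// Connection charset utf8: the server converts before sending.
$viaUtf8 = iconv('ISO-8859-1', 'UTF-8', $stored);

echo bin2hex($viaLatin1), "\n"; // 636166e9
echo bin2hex($viaUtf8), "\n";   // 636166c3a9

// Different bytes, so the hashes differ as well.
var_dump(md5($viaLatin1) === md5($viaUtf8)); // bool(false)
```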
Caution: MySQL's CHARACTER SET utf8 is a subset of the well-known UTF-8. To get the latter, use CHARACTER SET utf8mb4 in tables and mysqli_set_charset($con,"utf8mb4"); when connecting.
Going forward, utf8mb4 is preferred in most situations.
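To illustrate the difference: MySQL's utf8 stores at most three bytes per character, so any character outside the Basic Multilingual Plane needs utf8mb4. The helper below is hypothetical (needsUtf8mb4 is not a library function), is written for PHP 7 or later, and simply looks for code points above U+FFFF:

```php
<?php
// Hypothetical helper: true if $s contains a character that takes
// four bytes in UTF-8 and therefore does not fit MySQL's "utf8".
function needsUtf8mb4(string $s): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

$accented = "\xC3\xA1nimo";      // "ánimo", two bytes for the first letter
$emoji    = "\xF0\x9F\x98\x80";  // U+1F600, four bytes in UTF-8

var_dump(needsUtf8mb4($accented)); // bool(false)
var_dump(needsUtf8mb4($emoji));    // bool(true)
```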
Non-text stuff ("as-is") should be put into BLOB or VARBINARY columns -- this bypasses any checking of the encoding. (Think a .jpg or AES_ENCRYPT.)
MySQL's MD5() function returns a hex string. UNHEX(MD5('...')) returns binary data and must be stored in, say, a BINARY(16) column.
Many forms of garbled text are discussed in Trouble with UTF-8 characters; what I see is not what I stored.
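The same distinction exists on the PHP side, which may make it easier to see: md5() returns 32 hex characters by default and 16 raw bytes when its second argument is true. A small sketch with an arbitrary input:

```php
<?php
$hex = md5('example');       // 32 hex characters, safe in any text column
$raw = md5('example', true); // 16 raw bytes, comparable to UNHEX(MD5(...))

var_dump(strlen($hex));           // int(32)
var_dump(strlen($raw));           // int(16)
var_dump(bin2hex($raw) === $hex); // bool(true)
```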
I'm having big trouble storing extended characters from UTF-8 in MS SQL Server tables.
Every extended character is stored in the extended-ASCII (8-bit) range. I haven't found any way to store the data without distorting it or losing the extended characters.
My current server setup is:
IIS 7.5
PHP 5.6.11 (x86_64 VC11 Fast-CGI)
SQL Server Native Client for SQL Server 2008 R2 8 (Used via ODBC)
Why am I not using the SQL Server driver extension for PHP? Simple: Microsoft doesn't have a build of that driver available for 64-bit.
What I have tried so far:
My fields are nvarchar and nchar data types.
Used utf8_decode() to prepare the data for INSERT/UPDATE statements
Used utf8_encode() when I retrieve the data with SELECTs
Used mb_convert_encoding($string,[new_encoding],'UTF-8'); where the new encoding was Windows-1252, UCS-2, or ASCII (as a substitute for utf8_decode)
Used iconv() with encoding CP437 (as a substitute for utf8_decode)
No success in any case.
What I'm experiencing: when the data is stored and I run a SELECT directly against the database, the extended characters in the strings are shown as garbage.
Let's say:
I write "ánimo", that word is stored as "ánimo", and when I retrieve it from the table it is displayed on screen as "ánimo"
input ----> utf8_decode() ---> MS-SQL ----> utf8_encode() ---> screen
Any ideas?
I am working on a website with MySQL database on a Linux server.
Using phpMyAdmin, on the database, it says
MyISAM is the default storage engine on this MySQL server
latin1_swedish_ci
However, I have created all the tables with InnoDB and utf8_unicode_ci. I also checked that the fields of all tables are utf8_unicode_ci.
Yet when I call mysql_fetch_array and echo the result, I get gibberish. I had to explicitly call mysql_set_charset('utf8') for the text to appear correctly.
PHP version is 5.3.9; MySQL version is 5.1.70-cll - MySQL Community Server (GPL).
This is the first time I have encountered this problem; I never had to set the charset before.
What caused the text fetched by PHP's mysql_* functions to be gibberish? Under what circumstances is it necessary to call mysql_set_charset?
EDIT: This question is not meant to attract suggestions to use an alternative library, e.g. mysqli or PDO. I just want to understand the behavior of MySQL and charsets in this situation. Thanks.
When exchanging data between two systems, there's always the question "what encoding will text be sent in?" "Text" is represented simply as binary data, just long strings of 1s and 0s. These could mean anything at all. There are hundreds of encoding schemes to encode different characters into different sequences of 1 and 0. If a system just receives a string of those without being told what encoding they represent, the system cannot know what characters those supposedly are.
Therefore, for any interface between two systems, there needs to be a specification of what encoding strings are in. For MySQL, that's the API call mysql_set_charset. This is the way to tell MySQL what encoding the strings PHP sends to it will be in, and what encoding MySQL should use for the strings it returns to PHP. Without setting this explicitly, some default encoding is assumed, which may not be the encoding you're expecting, creating a mismatch and garbage characters.
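A sketch of such a mismatch, without a database: it assumes the table holds "é" as UTF-8, the connection default is latin1, and the page is served as UTF-8. iconv() stands in for the conversion the server would apply, and isValidUtf8 is a hypothetical helper.

```php
<?php
// Hypothetical helper: the /u modifier makes preg_match() reject
// subjects that are not well-formed UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$stored = "\xC3\xA9"; // "é" as UTF-8 in the table

// Connection charset latin1: the server converts to latin1 before sending.
$received = iconv('UTF-8', 'ISO-8859-1', $stored);

echo bin2hex($received), "\n";    // e9
var_dump(isValidUtf8($received)); // bool(false): garbage on a UTF-8 page
var_dump(isValidUtf8($stored));   // bool(true): what a utf8 connection returns
```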
Read What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text and Handling Unicode Front To Back In A Web App for more information.
It's wise to always call it once the connection is established, to ensure your app is not affected by broken server settings. You can have your tables in, e.g., UTF-8 and send your data in UTF-8, but if the connection is not UTF-8 (because of, e.g., my.ini settings) you end up with a mess. So either call mysql_set_charset() or execute a SET NAMES charset query, and you will be on safe ground. And since it is done once per connection, it costs practically nothing.
The mysql_set_charset function sets the default character set for the current connection. Even though your data is stored as Unicode on the server, a compatible connection character set is still required to transmit it accurately.
If you execute the statement SHOW VARIABLES LIKE 'character\_set\_%' in MySQL, it will show the various character sets used by the server and the current connection. Ideally they should all match and be utf8.
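One way to act on that output, as a sketch: the helper below (hypothetical, not part of any MySQL API) takes the rows as a name => value array and reports the variables that differ from the expected charset. It skips character_set_filesystem, which defaults to binary; the sample values are made up.

```php
<?php
// Hypothetical helper: list charset variables that differ from $expected.
function findCharsetMismatches(array $vars, string $expected): array
{
    $mismatches = [];
    foreach ($vars as $name => $value) {
        // character_set_filesystem defaults to "binary"; leave it alone.
        if ($name === 'character_set_filesystem') {
            continue;
        }
        if ($value !== $expected) {
            $mismatches[$name] = $value;
        }
    }
    return $mismatches;
}

// Made-up sample of what the SHOW VARIABLES statement might return.
$vars = [
    'character_set_client'     => 'latin1',
    'character_set_connection' => 'latin1',
    'character_set_database'   => 'utf8',
    'character_set_filesystem' => 'binary',
    'character_set_results'    => 'latin1',
    'character_set_server'     => 'latin1',
    'character_set_system'     => 'utf8',
];

$mismatches = findCharsetMismatches($vars, 'utf8');
print_r($mismatches);
```

With this sample, the client, connection, results and server variables are reported, which is the kind of situation where calling mysql_set_charset('utf8') changes what PHP receives.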
More information: MySQL Connection Character Sets.
I have a MySQL database with all tables collated as 'utf8_unicode_ci'.
All the data I wrote to the database with PHP was also encoded in utf8.
But I forgot to set the MySQL connection encoding to utf8, so it probably defaulted to ISO-8859.
For a long time this was not a problem. Although special characters were displayed wrong in tools like phpMyAdmin, the data was correct when loaded into my PHP application, as long as I kept using the wrong connection encoding.
But now I need to use my database from another application, which (correctly) does not use ISO-8859 as its connection encoding and therefore gets broken special characters.
Now I want to convert my database so I can use the right connection encoding.
I already tried this:
mysql wrong connection encoding
But it does not help in my case. The closest I got to a solution was 'utf8_decode(utf8_decode($data))'.
But this breaks fields that start with a special character.
Additional Information:
So what might happen is the following:
My application sends some utf8 Data to the database.
MySQL gets the data but thinks (due to the connection encoding) that it is not utf8, and converts it to fit the 'utf8_unicode_ci' collation.
When my php application reads the data from the database mysql seems to undo the previous conversion so everything looks fine again from my php app.
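If that description is right, each stored value has been encoded to UTF-8 twice. The sketch below reproduces the effect and its reversal in PHP for a single string. It assumes the connection charset was MySQL's latin1, which is close to Windows-1252 (hence CP1252); a real repair would be run on the database itself and tested on a copy first.

```php
<?php
$original = "\xC3\xA9"; // "é" as UTF-8, as sent by the application

// What the server did: read the bytes as latin1 and convert them to utf8.
$doubleEncoded = iconv('CP1252', 'UTF-8', $original);
echo bin2hex($doubleEncoded), "\n"; // c383c2a9

// Reversal: convert back once, which recovers the original UTF-8 bytes.
$repaired = iconv('UTF-8', 'CP1252', $doubleEncoded);
var_dump($repaired === $original); // bool(true)
```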
I have a simple PHP web app that accepts icon images via file upload and stores them in a MEDIUMBLOB column.
On my machine (Windows) plus two Linux servers, this works fine. On a third Linux server, the inserted image is corrupted: unreadable after a SELECT, and the length of the column data as reported by the MySQL length() function is about 40% larger than the size of the uploaded file.
(Each server connects to a separate instance of MySQL.)
Of course, this leads me to think about encoding and character set issues. BLOB columns have no associated charsets, so it seems like the most likely culprit is PDO and its interpretation of the parameter value for that column.
I've tried using bindValue with PDO::PARAM_LOB, to no effect.
I've verified that the images are being received on the server correctly (i.e. am reading them post-upload with no problem), so it's definitely a DB/PDO issue.
I've searched for obvious configuration differences between the servers, but I'm not an expert in PHP configuration so I might have missed something.
The insert code is pretty much as follows:
$imagedata = file_get_contents($_FILES["icon"]["tmp_name"]);
$stmt = $pdo->prepare('insert into foo (theimage) values (:theimage)');
$stmt->bindValue(':theimage', $imagedata, PDO::PARAM_LOB);
$stmt->execute();
Any help will be really appreciated.
UPDATE: The default MySQL charset on the problematic server is utf8; it's latin1 on the others.
The problem is "solved" by adding PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES latin1 COLLATE latin1_general_ci" to the PDO constructor.
This seems like poor design to me: why should the charset of the connection have any effect on data for a binary column, particularly when it's been identified as binary to PDO itself with PARAM_LOB?
Note that the DB tables are defined as latin1 in all cases: it's only the servers' default charsets that are inconsistent.
This seems like a bug to me: why should the charset of the connection have any effect on data for a binary column, particularly when it's been identified as binary to PDO itself with PARAM_LOB?
I do not think that this has to be a bug. I can imagine that whenever the client tells the server that the following command is in UTF-8 and the server needs it in Latin-1, the query might get re-encoded prior to parsing and execution. So this is an encoding issue in the transport of the data. Because the whole query is affected by this re-encoding before it is parsed, the binary data for the BLOB column gets changed as well.
From the Mysql manual:
What character set should the server translate a statement to after receiving it?
For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.
Or on the way back: Latin1 data from the store will get converted into UTF-8 because the client told the server that it prefers UTF-8 for the transportation.
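This also fits the size increase reported in the question. As a sketch: if binary data is treated as latin1 text and converted to UTF-8 somewhere along the way, every byte from 0x80 upward becomes at least two bytes. The example uses ISO-8859-1 via iconv() as an approximation of MySQL's latin1, and a synthetic payload instead of a real image:

```php
<?php
// Synthetic "binary" payload: every byte value from 0x00 to 0xFF once.
$binary = '';
for ($i = 0; $i < 256; $i++) {
    $binary .= chr($i);
}

// What a latin1 -> utf8 conversion of the statement would do to it.
$converted = iconv('ISO-8859-1', 'UTF-8', $binary);

echo strlen($binary), "\n";    // 256
echo strlen($converted), "\n"; // 384: the 128 high bytes doubled
```

Real image data has a different byte distribution, so the exact growth varies, but an increase of this order is what such a conversion would produce.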
The PDO flag you mention appears to be something entirely different:
PDO::PARAM_LOB tells PDO to map the data as a stream, so that you can manipulate it using the PHP Streams API. (Ref)
I'm no MySQL expert but I would explain it this way. Client and server need to negotiate which charsets they are using and I assume they do this for a reason.