I have successfully installed the iSeries ODBC driver on a Linux box, and I am querying a DB2 iSeries (6) database. Everything runs smoothly until I try to pull data from a column defined as CDESC VARCHAR(3000). When the value is 255 characters or shorter I get no issues, but when it is longer than 255 characters the query fails and breaks the app. The data in the table is well over 255 characters; I just cannot pull it back out. I have tried CAST(CDESC AS TEXT) AS DESC, but this does not work. Any thoughts on the driver settings or on changing the column type? Thanks in advance.
VARCHAR is a data type that stores single-byte character set [SBCS] data, not double-byte [DBCS] data. As such, it cannot store characters with code values above 255.
If you need to support double byte characters, you might look at NVARCHAR which handles Unicode character sets.
Perhaps the issue is in translating to your character set. Remember that DB2 for i stores SBCS data in EBCDIC-based character sets, not ASCII-related ones. Which CCSIDs are you using on your end, and which CCSID is the data stored in?
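To make the EBCDIC point concrete, here is a minimal sketch in Python (chosen only because it runs without a database connection). It assumes code page cp037 as a stand-in for whatever CCSID the table actually uses; the real CCSID is not stated in the question.

```python
# Assumption: cp037 (an EBCDIC code page) stands in for the table's real CCSID.
text = "CDESC"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")

# The same five characters are stored as different byte values.
print(ascii_bytes.hex())
print(ebcdic_bytes.hex())

# Reading EBCDIC bytes with the wrong code page does not give the text back...
misread = ebcdic_bytes.decode("latin-1")
# ...while decoding with the code page they were written in does.
roundtrip = ebcdic_bytes.decode("cp037")
```

If the driver and the client disagree about which of these code pages is in use, every non-trivial string has to be translated somewhere along the way, which is why the CCSIDs on both ends matter.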
I noticed that when doing database queries in PHP (e.g. Zend_db, mysqli...), you can set the character set. For example: mysqli_set_charset($con,"utf8"); I'm a little foggy as to what this actually does behind the scenes.
If I use PHP to do a database SELECT query, and I indicate a character set, what happens if it's not the same character set that the column was defined with in the database?
I mean, the database returns a binary sequence, but what is actually returned if the character is not encoded the same way in the two character sets? Will MySQL take the internal binary data and return it "as-is"?
Or will MySQL try to convert it to the binary sequence that's the equivalent in the character set you indicated?
I guess the gist of my question is that I would like to know how the data is encoded when PHP is sending in the query, how it's transmitted back from MySQL, and whether there's another step of translation after PHP gets it back and stores it into a string in PHP internal memory.
Similarly, if you're doing an INSERT or UPDATE, how does it get sent from PHP to MySQL? Does PHP convert it to the correct binary encoding and THEN send it to MySQL?
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Update:
Thanks to Raymond Nijland. I was able to fix my bug. But I did notice that for nonstandard characters, the charset does seem to matter.
I did a select statement using $db = new \PDO("mysql:host=$host;dbname=$database;charset=latin1", $dbuser, $dbpassword);. First, I tried latin1, then I tried utf8.
My problem was that I had a column with encrypted data, which I guess had some weird characters in it. If I did an MD5 on that field directly in the database query, it gave me a hash that began with 889... BUT then I tried to pull it into PHP with a SELECT statement. If I used PDO with a charset of latin1, then did an MD5() inside of PHP, it gave me the same hash (889...), which implies that PHP has an exact copy of the binary that's in the database. BUT if I used PDO with charset 'UTF-8', then did an MD5() in PHP, it gave me a hash beginning with 087... So somewhere, a conversion must be taking place.
At this point, my original bug is fixed, but I'm still curious as to what's happening. Is MySQL doing the conversion before returning it to PHP, or does PDO do some sort of conversion on the PHP side?
mysqli_set_charset($con,"utf8"); (or the equivalent call in other client libraries) declares to MySQL that the client's bytes are encoded in MySQL's CHARACTER SET utf8. If bytes in a different encoding are sent to MySQL (think INSERT), garbage or errors will result.
That setting also declares that the client desires that encoding from SELECTs.
The CHARACTER SET on each column in each table may be something else (eg, "latin1"). If so, MySQL will attempt to convert the encoding during the transmission.
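The differing MD5 results described in the question's update are what you would expect from such a conversion. Below is a minimal sketch in Python (used only because it runs without a database). The three stored bytes are hypothetical, and Python's "latin-1" codec is an approximation of MySQL's latin1, which is closer to cp1252.

```python
import hashlib

# Hypothetical column content: three bytes, two of them outside ASCII.
stored = bytes([0x88, 0xE9, 0x41])

# Connection charset equals the column charset (latin1): no conversion,
# so the client receives the stored bytes unchanged.
as_latin1_client = stored

# Connection charset utf8, column charset latin1: every character is
# re-encoded, so each byte >= 0x80 becomes a two-byte UTF-8 sequence.
as_utf8_client = stored.decode("latin-1").encode("utf-8")

print(len(as_latin1_client), len(as_utf8_client))

# Hashing the two client-side values gives different digests,
# even though both represent "the same" column value.
same_hash = hashlib.md5(as_latin1_client).hexdigest() == hashlib.md5(stored).hexdigest()
changed_hash = hashlib.md5(as_utf8_client).hexdigest() != hashlib.md5(stored).hexdigest()
```

In this model the conversion happens on the server side before the bytes reach the client, which matches the description above of MySQL converting during transmission.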
Caution: MySQL's CHARACTER SET utf8 is a subset of the well-known UTF-8. To get the latter, use CHARACTER SET utf8mb4 in tables and mysqli_set_charset($con,"utf8mb4"); when connecting.
Going forward, utf8mb4 is preferred in most situations.
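As a rough illustration of why utf8 is only a subset, the Python sketch below counts the UTF-8 bytes per character. The helper names utf8_len and needs_utf8mb4 are made up for this example; the rule they encode is simply that anything above U+FFFF takes four bytes and therefore does not fit MySQL's three-byte utf8.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in real UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above the BMP (U+FFFF), i.e. needs 4 bytes."""
    return any(ord(ch) > 0xFFFF for ch in text)


samples = ["A", "\u00e9", "\u4e2d", "\U0001F604"]  # A, é, 中, 😄
lengths = [utf8_len(ch) for ch in samples]
print(lengths)

print(needs_utf8mb4("plain text, \u4e2d\u6587"))   # BMP only
print(needs_utf8mb4("smile \U0001F604"))            # contains a 4-byte character
```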
Non-text stuff ("as-is") should be put into BLOB or VARBINARY columns -- this bypasses any checking of the encoding. (Think a .jpg or AES_ENCRYPT.)
MySQL's MD5() function returns a hex string. UNHEX(MD5('...')) returns binary data, which must be stored in, say, a BINARY(16) column.
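The same hex-versus-binary distinction can be seen outside MySQL. This Python sketch uses hashlib with an arbitrary input value to show the two forms and their sizes.

```python
import hashlib

value = b"example"  # arbitrary input, only for illustration

hex_form = hashlib.md5(value).hexdigest()   # text: 32 hex characters
binary_form = hashlib.md5(value).digest()   # raw bytes: 16 of them

print(len(hex_form), len(binary_form))

# Converting the hex text back to bytes (what UNHEX does) gives the raw digest.
unhexed = bytes.fromhex(hex_form)
```

The hex form is plain ASCII and is safe in any text column; the 16-byte form is arbitrary binary and is not valid text in most character sets.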
Many forms of garbled text are discussed in "Trouble with UTF-8 characters; what I see is not what I stored".
We are developing an iPhone app that would send emoticons from iPhone to server-side PHP and insert into MySQL tables. I am doing the server-side work.
But after the INSERT statement executes successfully, the inserted value is blank.
Plain text is inserted into the field (varchar) correctly, but once the value includes emoticons,
only the text is inserted and the emoticons are cut off automatically.
Someone advised me to set the field type to BLOB so that it could store image data.
But the inserted value does not always include emoticons, and its size is small.
*I am using mysql_real_escape_string when inserting the value.
Most iOS emojis use code points above the Basic Multilingual Plane of the Unicode table. For example, 😄 (SMILING FACE WITH OPEN MOUTH AND SMILING EYES) is at U+1F604.
Now, see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html.
MySQL before version 5.5 only supports UTF-8 for the BMP, which includes characters between U+0000 and U+FFFF (i.e. only a subset of actual UTF-8; MySQL's utf8 is not real UTF-8). It cannot store the character at code point U+1F604 or other similar "high characters". MySQL 5.5+ supports utf8mb4 (actual UTF-8), utf16 and utf32, which are able to encode these characters. If you're using MySQL 5.5+, use one of these column character sets and make sure you're using the same charset for your connection encoding to/from PHP. If you are on MySQL < 5.5, you'll have to use a BLOB column type. That type stores raw bytes without caring about the "characters" in it. The downside is that you won't be able to efficiently search or index the text.
Some of the emoji characters work with older, non-BLOB MySQL configurations because their code points encode to 3 bytes in UTF-8, and MySQL can store a 3-byte character. If you can neither upgrade MySQL nor use BLOBs for whatever reason, you can scrub out the 4-byte characters and keep the 3-byte ones.
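A minimal sketch of that scrubbing step, written in Python for brevity; the function name strip_supplementary is made up for this example, and the same filter would have to be ported to the server-side language before the INSERT.

```python
def strip_supplementary(text: str) -> str:
    """Drop every character above U+FFFF (the ones that need 4 bytes in UTF-8)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


message = "hi \U0001F604 \u2764"  # 4-byte smiley, then a 3-byte heart
cleaned = strip_supplementary(message)

# The heart survives, the smiley is removed (the spaces around it remain).
print(cleaned.encode("utf-8"))
```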
If your computer has emoji capabilities, here is a list of the 3 byte iOS emoji characters:
☺❤✨❕❔✊✌✋☝☀☔☁⛄⚡☎➿✂⚽⚾⛳♠♥♣♦〽☕⛪⛺⛲⛵✈⛽⚠♨1⃣2⃣3⃣4⃣5⃣6⃣7⃣8⃣9⃣0⃣#⃣⬆⬇⬅➡↗↖↘↙◀▶⏪⏩♿㊙㊗✳✴♈♉♊♋♌♍♎♏♐♑♒♓⛎⭕❌©®™
I've been having some trouble figuring out the best way to handle UTF-8 characters in PHP. I'm able to load UTF-8 data (Chinese characters) into Vertica just fine, and can see them there when using a JDBC client, so I know the data is being recorded correctly.
However, when I query via PHP, strings that contain UTF-8 characters come through as nulls. As a workaround, I can wrap the UTF-8 field in a URI_PERCENT_ENCODE function and then do a urldecode on the data in PHP, which outputs the characters correctly.
Are there any ODBC driver settings, or PHP settings that you can recommend to handle UTF8 more gracefully?
We are running PHP 5.3, 64 bits.
For whatever it's worth, when working with the Vertica 64-bit ODBC driver for Windows and calling SQLDescribeColW to describe a table with a Chinese name and Chinese column names (i.e. describing an SQL statement like 'select * from mytable'), the names are returned encoded in "funky UTF-8".
The "funky UTF-8" or FUTF-8 encoding uses wchar_t[] (on Windows it is an array of 16-bit values) where in each entry in the array, there is a single real-UTF-8 byte.
For example, if the column name was "时髦" whose UTF-16 encoding is 65f6h,9ae6h (two characters, 16 bits each) and its UTF-8 encoding is e6h, 97h, b6h, e9h, abh, a6h (two characters, 3 bytes each) then in FUTF-8 you'd get: 00e6h, 0097h, 00b6h, 00e9h, 00abh, 00a6h (6 characters, 16 bits each).
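The byte values above can be checked, and the "funky" form undone, with a few lines of Python. The helper name decode_futf8 is made up here; it simply narrows each 16-bit unit back to one byte and decodes the result as ordinary UTF-8.

```python
def decode_futf8(units):
    """Each 16-bit unit carries one real UTF-8 byte; pack them and decode."""
    return bytes(units).decode("utf-8")  # bytes() rejects any unit above 0xFF


# The six 16-bit values from the example above.
futf8_units = [0x00E6, 0x0097, 0x00B6, 0x00E9, 0x00AB, 0x00A6]

name = decode_futf8(futf8_units)
print(name)

# Cross-check against the UTF-16 and UTF-8 encodings quoted above.
utf16_hex = name.encode("utf-16-be").hex()
utf8_hex = name.encode("utf-8").hex()
```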
I guess that this is what puts in null for PHP. I'd call it a bug of the ODBC driver.
I have a MySQL table and fields that are all set to UTF-8. The thing is, a previous PHP script, which was in charge of writing to the database, was using some other encoding; I am not sure whether that was set in the script itself, the MySQL connection or somewhere else. The result is that although the table and fields are set to UTF-8, we see the wrong characters instead of Chinese.
It looks like that:
Now, the previous scripts (which were in charge of the writing and corrupted the data) can read it fine for some reason, but my new script, which is all encoded in UTF-8, shows characters like ½©. How can that be fixed?
By the sound of it, you have a utf8 column but you are writing to it and reading from it using a latin1 connection, so what is actually being stored in the table is mis-encoded. Your problem is that when you read from the table using a utf8 connection, you see the data that's actually stored there, which is why it looks wrong. You can fix the mis-encoded data in the table by converting to latin1, then back to utf8 via the binary character set (three steps in total).
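The mis-encoding and its repair can be modelled in a few lines of Python. This is only a sketch of the byte-level effect, not the SQL itself; the sample text is made up, and Python's "latin-1" codec stands in for MySQL's latin1 (which is really closer to cp1252).

```python
original = "\u4e2d\u6587"  # 中文, hypothetical value written by the old script

# Old script: sends UTF-8 bytes over a latin1 connection. MySQL treats each
# byte as a latin1 character and converts it to utf8 for the utf8 column.
stored_bytes = original.encode("utf-8").decode("latin-1").encode("utf-8")

# New script on a utf8 connection sees what is really stored: mojibake.
seen = stored_bytes.decode("utf-8")

# Repair: back to latin1 bytes, then reinterpret those bytes as UTF-8.
repaired = seen.encode("latin-1").decode("utf-8")

print(len(original.encode("utf-8")), len(stored_bytes))
print(repaired == original)
```

This also shows why the old script reads its own data fine: reading over a latin1 connection applies the reverse conversion and hands back the original UTF-8 bytes.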
The original database was in a Chinese encoding – GB-18030 or similar, not Latin-1 – and the bytes that make up these characters, when displayed in UTF-8, show up as a bunch of Latin diacritics. Read each string as GB-18030, convert it to UTF-8, and save.
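If that diagnosis is the right one, the conversion is a single decode and re-encode per string. A sketch in Python, using a made-up two-character sample, since the actual bytes from the question are not available:

```python
# Hypothetical raw bytes as the old system would have written them.
raw = "\u4e2d\u6587".encode("gb18030")  # 中文 in GB-18030
print(raw.hex())

# Viewed through a Latin-style single-byte encoding they look like diacritics.
misread = raw.decode("latin-1")
print(misread)

# Read as GB-18030, then save as UTF-8.
fixed_text = raw.decode("gb18030")
utf8_bytes = fixed_text.encode("utf-8")
```

Which of the two explanations applies depends on the bytes actually stored, so it is worth checking a few rows by hand before converting the whole table.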
I use ODBC to connect to SQL Server from PHP.
In PHP I read some string (nvarchar column) data from SQL Server and then want to insert it to mysql database. When I try to insert such value to mysql database table I get this mysql error:
Incorrect string value: '\xB3\xB9ow...' for column 'name' at row 1
For string with all ASCII characters everything is fine, the problem occurs when non-ASCII characters (from some European languages) exist.
So, in more general terms: there is a Unicode string in an MS SQL Server database, which is retrieved by PHP through ODBC. It is then put into an SQL INSERT query (as the value for a UTF-8 varchar column), which is executed against a MySQL database.
Can someone explain to me what is happening in this situation in terms of encoding? At which steps might character encoding conversions take place?
I use: PHP 5.2.5, MySQL 5.0.45-community-nt, MS SQL Server 2005.
PHP has to run on a Linux platform.
UPDATE: The error doesn't occur when I call utf8_encode($s) on this string and use that value in the MySQL INSERT query, but then the inserted string doesn't display correctly in the MySQL database (so utf8_encode only produced a valid UTF-8 string; it did not preserve the correct characters).
First you have the encoding of the DB. Then you have the encoding used by the ODBC client.
If the encoding of your ODBC client connection does not match the one of the DB, the ODBC layer will automatically transcode your data, in some cases.
The trick here is to force the encoding of the ODBC client connection.
For an "all UTF-8" setup:
$conn=odbc_connect(DB_DSN,DB_USR,DB_PWD);
odbc_exec($conn, "SET NAMES 'UTF8'");
odbc_exec($conn, "SET client_encoding='UTF-8'");
// processing here
This works perfectly with PostgreSQL + PHP 5.x.
The exact syntax and options depend on the DB vendor.
You can find very useful and clear additional info for MySQL here: http://dev.mysql.com/doc/refman/5.0/fr/charset-connection.html
Hope this helps.
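To see why the bytes in the error message are rejected, and why utf8_encode only hides the problem, here is a small Python sketch. It assumes the ODBC layer handed PHP the string in Windows-1250; that is a guess based on the bytes \xB3\xB9 and the mention of European languages, not something stated in the question.

```python
raw = b"\xb3\xb9ow"  # the bytes quoted in the MySQL error message


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# MySQL rejects the value because it is not UTF-8 at all.
print(is_valid_utf8(raw))

# utf8_encode() assumes ISO-8859-1, so it produces valid UTF-8
# for the wrong characters.
as_latin1 = raw.decode("latin-1")

# Assumption: the real source encoding is Windows-1250.
as_cp1250 = raw.decode("cp1250")

print(as_latin1, as_cp1250)
good_utf8 = as_cp1250.encode("utf-8")
```

Under that assumption, either the ODBC connection should be made to deliver UTF-8, or the string should be converted from its actual encoding rather than from ISO-8859-1.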
Maybe you can use the PDO extension, if it will make any difference?
There is a user-contributed comment here that suggests changing the data types in SQL Server to something else; if this is not possible, look at the user's class that casts fields.
I have no experience with ODBC via PHP, but with the mysql functions PHP seems to default to ASCII and UTF8 connections need to be made explicit if you want to avoid trouble.
Are you sure PHP and the MySQL server are communicating in UTF-8? Until PHP 6, Unicode support tends to be annoyingly inconsistent like that.
I remember that the MySQL docs mention a connection string parameter to tweak the Unicode encoding.
From your description it sounds like PHP is treating the connection as ASCII-only.