We are developing an iPhone app that sends emoticons from the iPhone to server-side PHP, which inserts them into MySQL tables. I am doing the server-side work.
However, after the INSERT statement executes successfully, the inserted value is blank.
Plain text is inserted into the field (varchar) correctly, but when the value includes emoticons,
only the text is inserted and the emoticons are cut off automatically.
Someone advised me to set the field type to BLOB, since it can store image data.
But the inserted values do not always include emoticons, and they are small.
*I am using mysql_real_escape_string on the value being inserted.
Most iOS emojis use code points above the Basic Multilingual Plane of the Unicode table. For example, 😄 (SMILING FACE WITH OPEN MOUTH AND SMILING EYES) is at U+1F604.
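To make the BMP distinction concrete, here is a minimal sketch in Python (used purely for illustration; the app in question is PHP). The helper names `utf8_length` and `is_outside_bmp` are made up for this example. It compares the U+1F604 emoji with a symbol that sits inside the BMP.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def is_outside_bmp(ch: str) -> bool:
    """True when the code point lies above U+FFFF, i.e. outside the BMP."""
    return ord(ch) > 0xFFFF


smiley = "\U0001F604"  # the emoji mentioned above
heart = "\u2764"       # HEAVY BLACK HEART, a BMP symbol

for ch in (smiley, heart):
    where = "outside BMP" if is_outside_bmp(ch) else "inside BMP"
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} bytes in UTF-8, {where}")
```

The emoji needs four UTF-8 bytes while the BMP symbol needs three, and that difference is what matters in the rest of this answer.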
Now, see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html.
MySQL before version 5.5 only supports UTF-8 for the BMP, which includes characters between U+0000 and U+FFFF (i.e. only a subset of actual UTF-8; MySQL's utf8 is not real UTF-8). It cannot store the character at code point U+1F604 or other similar "high characters".
MySQL 5.5+ supports utf8mb4 (actual UTF-8), utf16 and utf32, which are able to encode these characters. If you're using MySQL 5.5+, use one of these column character sets and make sure the connection encoding to/from PHP can carry the same characters. In practice that means utf8mb4 for the connection, since utf16 and utf32 cannot be used as client character sets.
If you are on MySQL < 5.5, you'll have to use a BLOB column type. That type stores raw bytes without caring about the "characters" in it. The downside is that you won't be able to efficiently search or index the text.
Some emoji characters work with older, non-BLOB MySQL configurations because they are encoded as 3-byte UTF-8 sequences, which MySQL's utf8 can store. If you can neither upgrade MySQL nor use BLOBs, you can scrub out the 4-byte characters and keep the 3-byte ones.
If your computer has emoji capabilities, here is a list of the 3 byte iOS emoji characters:
☺❤✨❕❔✊✌✋☝☀☔☁⛄⚡☎➿✂⚽⚾⛳♠♥♣♦〽☕⛪⛺⛲⛵✈⛽⚠♨1⃣2⃣3⃣4⃣5⃣6⃣7⃣8⃣9⃣0⃣#⃣⬆⬇⬅➡↗↖↘↙◀▶⏪⏩♿㊙㊗✳✴♈♉♊♋♌♍♎♏♐♑♒♓⛎⭕❌©®™
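The scrubbing idea can be sketched as follows, again in Python for illustration only. `strip_non_bmp` is a hypothetical helper name; it relies on the fact that exactly the code points above U+FFFF need four bytes in UTF-8.

```python
def strip_non_bmp(text: str) -> str:
    """Drop every character that would need 4 bytes in UTF-8.

    Code points above U+FFFF are the ones a 3-byte-only utf8 column
    cannot store, so everything at or below U+FFFF is kept.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


message = "hi \U0001F604 \u2764"   # text, a 4-byte emoji, a 3-byte symbol
cleaned = strip_non_bmp(message)

# The 4-byte emoji is gone; the text and the 3-byte heart survive.
print(cleaned.encode("utf-8"))
```

This loses data silently, so it only makes sense as a last resort when neither utf8mb4 nor a BLOB column is an option.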
Related
I've been having some trouble figuring out the best way to handle UTF-8 characters in PHP. I'm able to load UTF-8 data (Chinese characters) into Vertica just fine, and I can see them there when using a JDBC client, so I know the data is being recorded correctly.
However, when I query via PHP, strings that contain UTF-8 characters come through as nulls. As a workaround, I can wrap the UTF-8 field in a URI_PERCENT_ENCODE function and then run urldecode on the data in PHP, which outputs the characters correctly.
Are there any ODBC driver settings, or PHP settings that you can recommend to handle UTF8 more gracefully?
We are running 64-bit PHP 5.3.
For whatever it's worth, when working with the Vertica 64-bit ODBC driver for Windows and calling SQLDescribeColW to describe a table with a Chinese name and Chinese column names (i.e. describing an SQL statement like 'select * from mytable'), the names are returned encoded in "funky UTF-8".
The "funky UTF-8" or FUTF-8 encoding uses wchar_t[] (on Windows, an array of 16-bit values) in which each array entry holds a single real UTF-8 byte.
For example, if the column name was "时髦" whose UTF-16 encoding is 65f6h,9ae6h (two characters, 16 bits each) and its UTF-8 encoding is e6h, 97h, b6h, e9h, abh, a6h (two characters, 3 bytes each) then in FUTF-8 you'd get: 00e6h, 0097h, 00b6h, 00e9h, 00abh, 00a6h (6 characters, 16 bits each).
I guess this is what produces the nulls in PHP. I'd call it a bug in the ODBC driver.
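If you receive such a FUTF-8 array, the real text can be recovered by treating each 16-bit entry as one byte and decoding the result as UTF-8. The following is a minimal Python sketch of that idea, not a driver fix; `repair_futf8` is a hypothetical name and the input values are the ones from the example above.

```python
from typing import Sequence


def repair_futf8(units: Sequence[int]) -> str:
    """Rebuild a string from 16-bit units that each hold one UTF-8 byte."""
    if any(u > 0xFF for u in units):
        # A value above 0xFF means the input is not FUTF-8 after all.
        raise ValueError("unit does not fit in one byte; not FUTF-8")
    return bytes(units).decode("utf-8")


# The six 16-bit values from the column-name example.
units = [0x00E6, 0x0097, 0x00B6, 0x00E9, 0x00AB, 0x00A6]
name = repair_futf8(units)
print(name, [f"U+{ord(ch):04X}" for ch in name])

# Same repair when the units already arrived as a (wrongly decoded) string.
as_text = "".join(chr(u) for u in units)
assert as_text.encode("latin-1").decode("utf-8") == name
```

The Latin-1 round trip works because Latin-1 maps U+0000 to U+00FF onto the byte values 0x00 to 0xFF one to one.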
Having trouble getting foreign characters and Emoji to display.
Edited to clarify
A user types an Emoji character into a text field which is then sent to the server (php) and saved into the database (mysql). When displaying the text we grab a JSON encoded string from the server, which is parsed and displayed on the client side.
QUESTION: the character for a "trophy" emoji saved in the DB reads as
%uD83C%uDFC6
When that is sent back to the client we don't see the emoji picture, we actually see the raw encoded text.
How would we get the client side to read that text as an emoji character and display the image?
(all on an iphone / mobile safari)
Thanks!
Check the encodings used by your client, your web server, and your database table. Make sure they are all using encodings that can handle the characters you are concerned about.
Looks like the problem is my MySQL encoding. utf8mb4 would allow it, but unfortunately it's unavailable before MySQL 5.5.
the character for a "trophy" emoji saved in the DB reads as %uD83C%uDFC6
Then your data are already mangled. %u escapes are specific to the JavaScript escape() function, which should generally never be used. Make sure your textarea->PHP handling uses standards-compliant encoding, e.g. encodeURIComponent if you need to get a JS variable into a URL query.
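For rows that were already stored in the %u form, a one-off cleanup could look like the sketch below (Python, for illustration; `decode_percent_u` is a hypothetical helper). It only handles %uXXXX escapes, not the %XX escapes that escape() emits for other characters, and it raises an error if a surrogate is left unpaired.

```python
import re
from urllib.parse import quote

_U_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_percent_u(text: str) -> str:
    """Turn %uXXXX escapes into characters, joining surrogate pairs."""
    # Step 1: each %uXXXX becomes one UTF-16 code unit (maybe a lone surrogate).
    with_units = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: round-trip through UTF-16 so surrogate pairs merge into one
    # code point; an unpaired surrogate makes decode() raise.
    return with_units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


trophy = decode_percent_u("%uD83C%uDFC6")
print(f"U+{ord(trophy):04X}")

# What a standards-compliant percent-encoding of the same character looks like
# (UTF-8 bytes, as encodeURIComponent would send them).
print(quote(trophy, safe=""))
```

The second print shows four percent-encoded UTF-8 bytes, which PHP's usual URL decoding turns back into a raw UTF-8 string.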
Then, having proper raw UTF-8 strings in your PHP layer, you can worry about getting MySQL to store characters like the emoji that are outside of the Basic Multilingual Plane. The best way is to use columns with the utf8mb4 character set; if that is not available, try binary columns, which let you store any byte sequence (treating it as UTF-8 when it comes back out). With binary columns, however, you won't get case-insensitive comparisons.
I need to store ASCII character codes like these in a MySQL database table, for example (★⋰⋱★⋰⋱★⋰⋱★⋰⋱★).
Should I manipulate the data before saving it to my db (using JavaScript) so that it is stored as HTML codes (&heart;), or should I change the type the data is stored as and let MySQL handle everything (UTF-8)?
If your database field must be encoded in ASCII, I would definitely store those esoteric characters as, say, &heart;, as you suggested, because ASCII does not extend to those characters (ASCII uses only 7 bits per character).
Nonetheless, I would recommend using UTF-8 for your database field. UTF-8 allows for a far wider range of characters.
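The trade-off between the two options can be seen in a small Python sketch (illustration only; `fits_ascii` is a made-up helper). It shows that the sample characters cannot be encoded as ASCII, what an ASCII-only fallback using numeric character references looks like, and how the same text is represented directly in UTF-8.

```python
import html


def fits_ascii(text: str) -> bool:
    """True when every character can be stored in a 7-bit ASCII field."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


stars = "\u2605\u22f0\u22f1\u2605"  # a short piece of the example string

print(fits_ascii(stars))  # the symbols are not ASCII

# ASCII-only fallback: numeric character references such as &#9733;
as_entities = stars.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)

# The references decode back to the original text ...
assert html.unescape(as_entities) == stars

# ... whereas UTF-8 stores each of these symbols directly in 3 bytes.
print(len(stars.encode("utf-8")))
```

With entities, every reader of the column has to remember to decode them, which is one reason a UTF-8 field is usually the simpler choice.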
The manual clearly states "ucs2 cannot be used as a client character set, which means that it does not work for SET NAMES or SET CHARACTER SET". So how can I insert, for example, the codepoint U+2193? I am using PHP 5.3 + PDO.
If you want to use Unicode for communicating with a MySQL server, your only option is to use UTF-8.
If you're working with UCS-2 or UTF-16 strings in PHP now, you'll have to convert them to UTF-8 before trying to store them. Also note that MySQL will give you back UTF-8 if that's what you set your client character set to, so you'll need to convert query results as well if you're committed to working with UCS-2 on the PHP side. (If you're in a position to make bigger changes, you'd likely be better off simply using UTF-8 everywhere than doing all this extra conversion.)
As for storing the codepoint U+2193, no worries: UTF-8 can represent every Unicode codepoint (in this specific case, it'd be 0xE2 0x86 0x93).
Technically, this is fudging a little, since MySQL's utf8 and ucs2 character sets only cover a subset of Unicode called the Basic Multilingual Plane (BMP). The world of Unicode charsets is expanded in MySQL 5.5 to move beyond the BMP, but you still can't use ucs2, the new utf16 or utf32 charsets as client charsets, leaving you still stuck with UTF-8.
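The conversion step described above can be sketched like this (Python for illustration; `ucs2_to_utf8` is a hypothetical helper). It assumes big-endian UCS-2 input and uses the UTF-16 codec as a stand-in, since the two encodings are identical for BMP characters.

```python
def ucs2_to_utf8(raw: bytes) -> bytes:
    """Convert big-endian UCS-2 bytes to UTF-8 bytes.

    Assumption: the input holds only BMP characters, for which UCS-2 and
    UTF-16 (big-endian) are byte-for-byte the same.
    """
    return raw.decode("utf-16-be").encode("utf-8")


arrow_ucs2 = b"\x21\x93"            # U+2193 as two UCS-2 bytes
arrow_utf8 = ucs2_to_utf8(arrow_ucs2)

print(arrow_utf8.hex(" "))           # the three bytes quoted in the answer
print(arrow_utf8.decode("utf-8"))    # the down arrow itself
```

Results read back over a UTF-8 connection would need the reverse conversion if the application insists on UCS-2 internally.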
For posterity: CREATE TABLE test (encoding varchar(255) CHARACTER SET ucs2); and then INSERT INTO test VALUES (CHAR(0x2193));. If I then run SELECT * FROM test, I see a down arrow.