MySQL: saving emojis to an InnoDB table [duplicate] - php

We are developing an iPhone app that sends emoticons to server-side PHP, which inserts them into MySQL tables. I am doing the server-side work.
But after the INSERT statement executes successfully, the inserted value is blank.
Plain text is inserted into the field (varchar) correctly, but once the value includes emoticons,
only the text is inserted and the emoticons are cut off automatically.
Someone advised me to set the field type to BLOB, since it can store image data.
But the inserted values do not always include emoticons, and they are small.
*I am using mysql_real_escape_string to escape the inserted value.

Most iOS emojis use code points above the Basic Multilingual Plane of the Unicode table. For example, 😄 (SMILING FACE WITH OPEN MOUTH AND SMILING EYES) is at U+1F604.
Now, see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html.
MySQL before version 5.5 only supports UTF-8 for the BMP, which includes characters between U+0000 and U+FFFF (i.e. only a subset of actual UTF-8; MySQL's utf8 is not real UTF-8). It cannot store the character at code point U+1F604 or other similar "high characters". MySQL 5.5+ supports utf8mb4 (actual UTF-8), utf16 and utf32, which are able to encode these characters. If you're using MySQL 5.5+, use one of these column character sets and make sure you're using the same charset for your connection encoding to/from PHP. If you are on MySQL < 5.5, you'll have to use a BLOB column type. That type stores raw bytes without caring about the "characters" in it. The downside is that you won't be able to efficiently search or index the text.
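
As a rough illustration of the point above, the following sketch checks whether a string contains code points above U+FFFF, which are the ones that need 4 bytes in UTF-8 and do not fit MySQL's 3-byte utf8. The function name hasNonBmpChars is made up for this example, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP or MySQL): returns true if $text,
// assumed to be valid UTF-8, contains a code point above U+FFFF.
function hasNonBmpChars($text)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$smile = "\xF0\x9F\x98\x84"; // U+1F604 written as raw UTF-8 bytes
$heart = "\xE2\x9D\xA4";     // U+2764, which lies inside the BMP

var_dump(strlen($smile));               // int(4): strlen() counts bytes
var_dump(hasNonBmpChars("hi $smile"));  // bool(true)
var_dump(hasNonBmpChars("hi $heart"));  // bool(false)
```

If the column is utf8mb4, the connection has to use utf8mb4 as well, for example via mysqli's set_charset('utf8mb4') or the charset=utf8mb4 parameter of a PDO MySQL DSN.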

Some emoji characters work with older, non-BLOB MySQL configurations because they are encoded as 3-byte UTF-8 sequences, and MySQL's utf8 can store characters of up to 3 bytes. If you can neither upgrade MySQL nor use BLOBs for whatever reason, you can scrub out the 4-byte characters and keep the 3-byte ones.
If your computer has emoji capabilities, here is a list of the 3 byte iOS emoji characters:
☺❤✨❕❔✊✌✋☝☀☔☁⛄⚡☎➿✂⚽⚾⛳♠♥♣♦〽☕⛪⛺⛲⛵✈⛽⚠♨1⃣2⃣3⃣4⃣5⃣6⃣7⃣8⃣9⃣0⃣#⃣⬆⬇⬅➡↗↖↘↙◀▶⏪⏩♿㊙㊗✳✴♈♉♊♋♌♍♎♏♐♑♒♓⛎⭕❌©®™
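
The scrubbing described above could look roughly like this. It is only a sketch: stripNonBmpChars is a name invented here, the input is assumed to be UTF-8, and returning an empty string for invalid UTF-8 is one possible choice among several.

```php
<?php
// Hypothetical helper: removes every code point above U+FFFF (the ones
// that take 4 bytes in UTF-8) and keeps everything else untouched.
function stripNonBmpChars($text)
{
    $clean = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);

    // preg_replace() returns null when $text is not valid UTF-8. This
    // sketch then returns an empty string instead of unvalidated bytes.
    return $clean === null ? '' : $clean;
}

$input = "ok \xF0\x9F\x98\x84\xE2\x9D\xA4"; // "ok " + U+1F604 + U+2764
var_dump(stripNonBmpChars($input) === "ok \xE2\x9D\xA4"); // bool(true)
```

The 3-byte heart survives and only the 4-byte smiley is dropped, so the result fits a column that uses the 3-byte utf8 character set.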

Related

Need help querying UTF8 strings from Vertica with PHP ODBC driver

I've been having some trouble figuring out the best way to handle UTF-8 characters in PHP. I'm able to load UTF-8 data (Chinese characters) into Vertica just fine, and can see them there when using a JDBC client, so I know the data is being recorded correctly.
However, when I query via PHP, strings that contain UTF-8 characters come through as nulls. As a workaround, I can wrap the UTF-8 field in a URI_PERCENT_ENCODE function and then call urldecode on the data in PHP, which outputs the characters correctly.
Are there any ODBC driver settings or PHP settings that you can recommend to handle UTF-8 more gracefully?
We are running 64-bit PHP 5.3.
For whatever it's worth, when working with the Vertica 64-bit ODBC driver for Windows and calling SQLDescribeColW to describe a table with a Chinese name and Chinese column names (i.e. describing an SQL statement like 'select * from mytable'), the names are returned encoded in "funky UTF-8".
The "funky UTF-8" or FUTF-8 encoding uses wchar_t[] (on Windows it is an array of 16-bit values) where in each entry in the array, there is a single real-UTF-8 byte.
For example, if the column name was "时髦" whose UTF-16 encoding is 65f6h,9ae6h (two characters, 16 bits each) and its UTF-8 encoding is e6h, 97h, b6h, e9h, abh, a6h (two characters, 3 bytes each) then in FUTF-8 you'd get: 00e6h, 0097h, 00b6h, 00e9h, 00abh, 00a6h (6 characters, 16 bits each).
I guess this is what produces the nulls in PHP. I'd call it a bug in the ODBC driver.
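
To make the "funky UTF-8" layout concrete, here is a minimal sketch that turns such a sequence back into ordinary UTF-8. It assumes the 16-bit entries are already available in PHP as an array of integers, which the driver does not necessarily provide in that form, and futf8ToUtf8 is a name invented for this example.

```php
<?php
// Hypothetical helper: each entry of $units is one 16-bit value that
// carries a single UTF-8 byte in its low 8 bits, as described above.
function futf8ToUtf8(array $units)
{
    $bytes = '';
    foreach ($units as $unit) {
        if ($unit < 0 || $unit > 0xFF) {
            throw new InvalidArgumentException("Not a single byte: $unit");
        }
        $bytes .= chr($unit); // keep only the byte value
    }
    return $bytes;
}

// The six 16-bit entries from the column-name example.
$name = futf8ToUtf8(array(0x00e6, 0x0097, 0x00b6, 0x00e9, 0x00ab, 0x00a6));
var_dump(bin2hex($name)); // string(12) "e697b6e9aba6", the UTF-8 bytes of the name
```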

Emoji Support in PHP & MySQL

I've been asked to enable emoji support for an app backed by a PHP API. The app is currently iPhone only (I don't have one, but I'm assuming it has emoji on it?).
Anyway, I noticed the database for some reason uses latin1_swedish_ci everywhere. Since I wasn't sure whether utf8 could support the 4-byte characters required for the full emoji range, I started googling, but couldn't really get a full answer from the results.
So:
To support emoji, do the charsets/collations need to be set to utf8 in MySQL, or to utf8mb4?
If the charset needs to be utf8mb4, what is the difference between utf8 and utf8mb4 (utf8 supports up to 4 bytes anyway, doesn't it?). Does utf8mb4 force characters to be stored in a fixed-width 4-byte representation (i.e. requiring 4x more storage space per character even in the standard ASCII range, which would normally take 1 byte)?
Can utf8 be compared to utf8mb4 in MySQL queries? What if I try to do a full-text search, or a WHERE clause, on a utf8mb4 column against a utf8 column of another table?
Does PHP support 4-byte characters in strings without a special library like mbstring? I.e. can I just assign $var = $_POST['text'] and do things like $emoji_var == 'xxxx', or do I have to change all strings in PHP to use mbstring and change all comparisons, etc.?
I'm just trying to work out how much work is involved in adding emoji support, and any caveats of doing so. Any help would be great.
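
On the fixed-width and comparison questions, a small sketch may help. UTF-8 (and with it utf8mb4) is a variable-width encoding, and PHP strings are plain byte sequences, so assignment and equality checks work on emoji without mbstring. The variable names below are illustrative only.

```php
<?php
// Characters written as raw UTF-8 bytes, mapped to their expected byte length.
$samples = array(
    'a'                => 1, // U+0061
    "\xC3\xA9"         => 2, // U+00E9
    "\xE2\x82\xAC"     => 3, // U+20AC
    "\xF0\x9F\x98\x84" => 4, // U+1F604
);

foreach ($samples as $char => $expectedBytes) {
    // strlen() counts bytes, not characters.
    printf("%s: %d byte(s)\n", bin2hex($char), strlen($char));
}

$posted = "\xF0\x9F\x98\x84"; // stand-in for a value like $_POST['text']
var_dump($posted === "\xF0\x9F\x98\x84"); // bool(true): byte-wise comparison
```

The caveat is that byte-oriented functions such as strlen() and substr() count bytes, so code that cares about character counts or positions needs the mbstring equivalents.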

Store ASCII codes in mysql DB

I need to store in a MySQL database table ASCII character codes like this for example (★⋰⋱★⋰⋱★⋰⋱★⋰⋱★)
Should I manipulate the data before saving it to my database (using JavaScript) so that it is stored as HTML entities (&hearts;), or should I change the character set of the column (to UTF-8) and let MySQL handle everything?
If your database field must be encoded in ASCII, I would definitely store those esoteric characters as, say, &hearts;, as you suggested, because ASCII does not extend to those characters (ASCII uses only 7 bits per character).
Nonetheless, I would recommend using UTF-8 for your database field. UTF-8 allows for a far wider range of characters.

How do I insert UCS-2 data with PHP PDO into MySQL?

The manual clearly states "ucs2 cannot be used as a client character set, which means that it does not work for SET NAMES or SET CHARACTER SET". So how can I insert, for example, the code point U+2193? I am using PHP 5.3 + PDO.
If you want to use Unicode for communicating with a MySQL server, your only option is to use UTF-8.
If you're working with UCS-2 or UTF-16 strings in PHP now, you'll have to convert them to UTF-8 before trying to store them. Also note that MySQL will give you back UTF-8 if that's what you set your client character set to, so you'll need to convert query results as well if you're committed to working with UCS-2 on the PHP side. (If you're in a position to make bigger changes, you'd likely be better off simply using UTF-8 everywhere than doing all this extra conversion.)
As for storing the codepoint U+2193, no worries: UTF-8 can represent every Unicode codepoint (in this specific case, it'd be 0xE2 0x86 0x93).
Technically, this is fudging a little, since MySQL's utf8 and ucs2 character sets only cover a subset of Unicode called the Basic Multilingual Plane (BMP). The world of Unicode charsets is expanded in MySQL 5.5 to move beyond the BMP, but you still can't use ucs2, the new utf16 or utf32 charsets as client charsets, leaving you still stuck with UTF-8.
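
The conversion described above might be sketched as follows. This assumes the mbstring extension is loaded and that the UCS-2 data is big-endian; the two function names are invented for this example.

```php
<?php
// Hypothetical helpers, assuming mbstring is available and the UCS-2
// data uses big-endian byte order.
function ucs2ToUtf8($ucs2)
{
    return mb_convert_encoding($ucs2, 'UTF-8', 'UCS-2BE');
}

function utf8ToUcs2($utf8)
{
    return mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8');
}

$arrow = "\x21\x93";          // U+2193 as big-endian UCS-2
$utf8  = ucs2ToUtf8($arrow);  // convert before sending to MySQL

var_dump(bin2hex($utf8));               // string(6) "e28693"
var_dump(utf8ToUcs2($utf8) === $arrow); // bool(true): convert results back
```

With the client character set set to UTF-8, the converted string can be bound to a PDO statement as usual, and MySQL converts it to the column's character set on insert.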
For posterity: CREATE TABLE test (encoding varchar(255) CHARACTER SET ucs2); and then INSERT INTO test VALUES (CHAR(0x2193));. If I then run SELECT * FROM test, I see a down arrow.
