Store ASCII codes in mysql DB - php

I need to store character codes like these in a MySQL database table, for example: (★⋰⋱★⋰⋱★⋰⋱★⋰⋱★)
Should I manipulate the data (using JavaScript) before saving it to my DB, so that it is stored as HTML entities (e.g. &hearts;), or should I change the character set the data is stored in (UTF-8) and let MySQL handle everything?

If your database field must be encoded in ASCII, I would definitely store those esoteric characters as HTML entities such as &hearts;, as you said, because ASCII does not cover those characters at all (ASCII uses only 7 bits per character, i.e. code points 0 to 127).
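Nonetheless, I would recommend using UTF-8 for your database field. UTF-8 can encode every Unicode character, so no conversion is needed. In MySQL 5.5.3 and later, use the utf8mb4 character set, since MySQL's older utf8 only covers characters up to U+FFFF.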
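To make the two options concrete, here is a minimal Python sketch (Python is used only for illustration; the question itself involves JavaScript and PHP). It assumes the "HTML codes" route would use numeric character references such as &#9733;, which cover every code point, rather than named entities. The helper names `to_ascii_entities` and `to_utf8_bytes` are hypothetical; `str.encode` with the `xmlcharrefreplace` error handler and `html.unescape` are standard library.

```python
import html

def to_ascii_entities(text: str) -> str:
    """Pure-ASCII form: every non-ASCII character becomes &#NNNN;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def to_utf8_bytes(text: str) -> bytes:
    """What a UTF-8 (utf8mb4) column stores: the text's raw UTF-8 bytes."""
    return text.encode("utf-8")

sample = "\u2605\u22f0\u22f1"                # ★⋰⋱
as_entities = to_ascii_entities(sample)
as_utf8 = to_utf8_bytes(sample)

print(as_entities)                           # &#9733;&#8944;&#8945;
print(len(as_entities), len(as_utf8))        # 21 9
print(html.unescape(as_entities) == sample)  # True
```

The entity form is more than twice as long here and has to be decoded again before searching or comparing. It also makes a literal "&#9733;" typed by a user indistinguishable from a real ★ unless "&" is escaped too, which is one more reason to prefer a UTF-8 column.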

Related

MySQL save emojis to InnoDB table [duplicate]

We are developing an iPhone app that sends emoticons from the iPhone to server-side PHP, which inserts them into MySQL tables. I am doing the server-side work.
But after the INSERT statement executes successfully, the inserted value is blank.
Plain text is inserted into the field (VARCHAR) correctly, but when the value includes emoticons, only the text is inserted and the emoticons are cut off automatically.
Someone advised me to set the field type to BLOB so that it could store image data.
But the inserted values do not always include emoticons, and they are small.
I am using mysql_real_escape_string to escape the inserted values.
Most iOS emojis use code points above the Basic Multilingual Plane of the Unicode table. For example, 😄 (SMILING FACE WITH OPEN MOUTH AND SMILING EYES) is at U+1F604.
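A quick way to see why the BMP boundary matters is to look at the UTF-8 bytes directly. This Python sketch (the helper name `utf8_info` is hypothetical) prints each character's code point and its UTF-8 encoding; anything above U+FFFF needs 4 bytes, one more than MySQL's legacy utf8 allows per character.

```python
from typing import Tuple

def utf8_info(ch: str) -> Tuple[int, bytes, bool]:
    """Return (code point, UTF-8 bytes, outside the BMP?) for one character."""
    cp = ord(ch)
    return cp, ch.encode("utf-8"), cp > 0xFFFF

# "a", WHITE SMILING FACE (U+263A), SMILING FACE WITH OPEN MOUTH AND SMILING EYES (U+1F604)
for ch in ["a", "\u263a", "\U0001F604"]:
    cp, raw, astral = utf8_info(ch)
    print(f"U+{cp:04X}: {raw.hex()} ({len(raw)} byte(s), outside BMP: {astral})")

# U+0061: 61 (1 byte(s), outside BMP: False)
# U+263A: e298ba (3 byte(s), outside BMP: False)
# U+1F604: f09f9884 (4 byte(s), outside BMP: True)
```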
Now, see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html.
MySQL before version 5.5.3 supports UTF-8 only for the BMP, which covers characters between U+0000 and U+FFFF (i.e. only a subset of actual UTF-8; MySQL's utf8 is not real UTF-8). It cannot store the character at code point U+1F604 or other similar "high characters".
MySQL 5.5.3+ supports utf8mb4 (actual UTF-8), utf16 and utf32, which can encode these characters. If you're on 5.5.3+, use one of these column character sets (utf8mb4 is the usual choice) and set your PHP connection encoding to utf8mb4 as well; utf16 and utf32 cannot be used as connection character sets, but MySQL converts between utf8mb4 and them.
If you are on MySQL < 5.5.3, you'll have to use a BLOB column type. That type stores raw bytes without caring about the "characters" in it. The downside is that you won't be able to efficiently search or index the text.
Some emoji characters work with older, non-BLOB MySQL configurations because they lie in the BMP, so their UTF-8 encoding is at most 3 bytes, which MySQL's legacy utf8 can store. If you can neither upgrade MySQL nor use BLOBs for whatever reason, you can strip out the characters that need 4 bytes and keep the others.
If your system can display emoji, here is a list of iOS emoji characters that need no more than 3 bytes per character in UTF-8:
☺❤✨❕❔✊✌✋☝☀☔☁⛄⚡☎➿✂⚽⚾⛳♠♥♣♦〽☕⛪⛺⛲⛵✈⛽⚠♨1⃣2⃣3⃣4⃣5⃣6⃣7⃣8⃣9⃣0⃣#⃣⬆⬇⬅➡↗↖↘↙◀▶⏪⏩♿㊙㊗✳✴♈♉♊♋♌♍♎♏♐♑♒♓⛎⭕❌©®™
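If you do go the scrubbing route, a minimal Python sketch could look like the following. It assumes "4-byte codepoints" means every character above U+FFFF (exactly the characters whose UTF-8 encoding is 4 bytes); `strip_4byte` is a hypothetical helper, and the optional `replacement` parameter is an addition so you can keep a visible marker instead of silently dropping characters.

```python
import re

# Every code point above U+FFFF: exactly the characters whose UTF-8
# encoding takes 4 bytes, which MySQL's legacy utf8 cannot store.
FOUR_BYTE = re.compile("[\U00010000-\U0010FFFF]")

def strip_4byte(text: str, replacement: str = "") -> str:
    """Remove (or replace) characters that do not fit in legacy MySQL utf8."""
    # A function replacement avoids re interpreting backslashes in `replacement`.
    return FOUR_BYTE.sub(lambda _m: replacement, text)

print(strip_4byte("Nice \u263a \U0001F604!"))       # Nice ☺ !
print(strip_4byte("Nice \u263a \U0001F604!", "?"))  # Nice ☺ ?!
```

Dropping characters loses user data, so this is only a stopgap compared with moving to utf8mb4.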

MySQL: Different charsets for different text contents, is it worth it?

I have my database with utf8mb4 in all tables and all char/varchar/text columns. Everything is working fine, but I was wondering if I really need it for all columns. I have columns that will contain user text that requires utf8mb4, since users can type in any language, insert emoticons, and so on. However, other columns will contain other kinds of strings, like user access tokens, country codes, and user nicknames, that do not contain special characters.
Is it worth changing the charset of these columns to something like ascii or latin1? Would it improve storage space or efficiency? My feeling is that setting a charset like utf8mb4 for something that will never contain non-ASCII characters is a waste of 'something'... but I really do not know how this is managed internally by MySQL.
On the other hand, I am connecting to this database from PHP and setting the connection charset to utf8mb4, so I suppose that all non-utf8 columns will be converted automatically. I suppose that is not a problem, as utf8 is a superset of ascii and latin1.
Any tips? pros and contras? Thanks!
The short answer is to make all your columns and tables default to the same thing, UTF-8.
The long answer: because of the way UTF-8 is encoded, ASCII characters map 1:1 onto UTF-8 and don't incur the extra storage overhead you would see with UTF-16 or UTF-32, so it's not a big deal. If you're storing non-ASCII characters, they will take more space, but if you're storing those, you need the support anyway.
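To put numbers on the storage argument, this small Python sketch encodes a couple of ASCII-only values of the kind the question mentions (a country code and a token; both sample values are made up) in several encodings. It counts encoded bytes only and ignores MySQL's length prefixes and row overhead. The `-le` codecs are used so Python does not prepend a byte-order mark.

```python
ENCODINGS = ("ascii", "utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(text: str) -> dict:
    """Number of bytes `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}

for value in ["US", "a1b2c3d4e5f6"]:  # hypothetical country code and access token
    print(value, encoded_sizes(value))

# US {'ascii': 2, 'utf-8': 2, 'utf-16-le': 4, 'utf-32-le': 8}
# a1b2c3d4e5f6 {'ascii': 12, 'utf-8': 12, 'utf-16-le': 24, 'utf-32-le': 48}
```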
Having mixed character sets in your tables is just asking for trouble. The only exception is when defining BINARY or BLOB type columns that are not UTF-8 but instead binary.
Even the documentation makes it clear that the only place this overhead matters is CHAR columns, not VARCHAR, and it's not really a good idea to use CHAR columns in the first place.
ASCII is a strict subset of UTF-8, so if you store no special characters, there is exactly zero space gain from using ASCII instead of UTF-8. There is a marginal improvement in space efficiency if you use Latin-1 instead of UTF-8 for Latin-derived text (special characters that take 2 bytes in UTF-8 take just one byte in Latin-1), but you gain a lot of headaches along the way, and you lose compatibility with wider character sets.
For example, ñ is stored as 0xC3 0xB1 in UTF-8, whereas Latin-1 stores it as 0xF1. On the other hand, a is 0x61 in both encodings. UTF-8 was deliberately designed this way: Latin-1 saves you a single byte, and only for special characters.
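The byte values in that example can be checked with a short Python sketch; `byte_cost` and `fits_latin1` are hypothetical helper names. Python's "latin-1" codec is ISO-8859-1, which agrees with MySQL's latin1 (really cp1252) for these characters but not for every byte from 0x80 to 0x9F.

```python
def byte_cost(text: str) -> dict:
    """Raw bytes for `text` in UTF-8 and in Latin-1 (ISO-8859-1)."""
    return {"utf-8": text.encode("utf-8"), "latin-1": text.encode("latin-1")}

def fits_latin1(text: str) -> bool:
    """Latin-1 only covers U+0000..U+00FF; anything else cannot be stored."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

for ch in ["a", "\u00f1"]:  # a, ñ
    enc = byte_cost(ch)
    print(ch, enc["utf-8"].hex(), enc["latin-1"].hex())
# a 61 61
# ñ c3b1 f1

print(len("ma\u00f1ana".encode("utf-8")), len("ma\u00f1ana".encode("latin-1")))  # 7 6
print(fits_latin1("\u2605"))  # False: ★ has no Latin-1 encoding
```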
TL;DR Use UTF-8 for everything. If you have to ask, you don't need anything else.

Proper handling of foreign characters / emoji

Having trouble getting foreign characters and Emoji to display.
Edited to clarify
A user types an Emoji character into a text field which is then sent to the server (php) and saved into the database (mysql). When displaying the text we grab a JSON encoded string from the server, which is parsed and displayed on the client side.
QUESTION: the character for a "trophy" emoji saved in the DB reads as
%uD83C%uDFC6
When that is sent back to the client we don't see the emoji picture, we actually see the raw encoded text.
How would we get the client side to read that text as an emoji character and display the image?
(all on an iPhone / Mobile Safari)
Thanks!
Check the encodings used by your client, your web server, and your database table. Make sure they are all using encodings that can handle the characters you are concerned about.
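One way to narrow down which layer is at fault: the JSON step itself carries emoji correctly. This Python sketch uses the standard `json` module as a stand-in for the server's JSON encoder (the real server is PHP, so this is an analogy, not its exact output). A real U+1F3C6 survives a round trip whether it is written as a surrogate-pair escape or as the raw character, while text already stored as the literal string "%uD83C%uDFC6" comes out exactly as it went in, which matches the symptom in the question.

```python
import json

trophy = "\U0001F3C6"     # U+1F3C6 TROPHY
mangled = "%uD83C%uDFC6"  # what the question says is stored in the DB

escaped = json.dumps(trophy)                  # ASCII-only JSON with a surrogate-pair escape
raw = json.dumps(trophy, ensure_ascii=False)  # JSON containing the character itself

print(escaped)                                                   # "\ud83c\udfc6"
print(json.loads(escaped) == trophy, json.loads(raw) == trophy)  # True True
print(json.loads(json.dumps(mangled)))                           # %uD83C%uDFC6
```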
Looks like the problem is my MySQL encoding... utf8mb4 would allow it, but unfortunately it's unavailable before MySQL 5.5.3.
the character for a "trophy" emoji saved in the DB reads as %uD83C%uDFC6
Then your data are already mangled. %u escapes are specific to the JavaScript escape() function, which should generally never be used. Make sure your textarea-to-PHP handling uses standards-compliant encoding, e.g. encodeURIComponent if you need to get a JS variable into a URL query.
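The real fix is on the client side: stop using escape(). If rows have already been saved in the %u form, though, they can be repaired. This Python sketch assumes the stored values are exactly what JavaScript's escape() produces (%uXXXX for UTF-16 code units above U+00FF, %XX for other escaped characters) and mimics JavaScript's unescape(); `js_unescape` is a hypothetical helper. `urllib.parse.quote(..., safe="")` is shown for comparison: it matches encodeURIComponent for this emoji, though the two differ on a few ASCII punctuation characters such as `!` and `(`.

```python
import re
from urllib.parse import quote

# escape() emits %uXXXX for UTF-16 code units above U+00FF and %XX otherwise.
JS_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")

def js_unescape(text: str) -> str:
    """Reverse JavaScript escape(); raises UnicodeDecodeError on unpaired surrogates."""
    # Turn each escape into one UTF-16 code unit (surrogates may appear alone here).
    units = JS_ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)
    # Round-trip through UTF-16 so each surrogate pair becomes one real code point.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

fixed = js_unescape("%uD83C%uDFC6")
print(fixed == "\U0001F3C6")  # True: U+1F3C6 TROPHY
print(quote(fixed, safe=""))  # %F0%9F%8F%86 (the UTF-8 form encodeURIComponent sends)
```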
Then, once you have proper raw UTF-8 strings in your PHP layer, you can worry about getting MySQL to store characters like emoji that are outside the Basic Multilingual Plane. The best way is columns using the utf8mb4 character set; if that is not available, try binary columns, which let you store any byte sequence (treat it as UTF-8 when it comes back out). That way, however, you won't get case-insensitive comparisons.
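A rough Python illustration of the binary-column approach; `to_column` and `from_column` are hypothetical names for whatever your data layer does on write and read. The column just holds bytes, so the application encodes to UTF-8 on the way in and decodes on the way out, and comparisons happen byte by byte, which is why case-insensitive matching is lost. Here `str.casefold()` stands in for doing that matching in application code instead.

```python
def to_column(text: str) -> bytes:
    """Value written to a VARBINARY/BLOB column: the UTF-8 bytes."""
    return text.encode("utf-8")

def from_column(raw: bytes) -> str:
    """Value read back: decode the stored bytes as UTF-8."""
    return raw.decode("utf-8")

a = to_column("Trophy \U0001F3C6")
b = to_column("TROPHY \U0001F3C6")

print(from_column(a) == "Trophy \U0001F3C6")                   # True: round-trips
print(a == b)                                                  # False: byte-wise comparison
print(from_column(a).casefold() == from_column(b).casefold())  # True
```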

PHP MySQL Chinese UTF-8 Issue

I have a MySQL table and fields that are all set to UTF-8. The thing is, a previous PHP script, which was in charge of writing to the database, was using some other encoding; I'm not sure whether it was set in the script itself, in the MySQL connection, or somewhere else. As a result, although the table and fields are set to UTF-8, we see the wrong characters instead of Chinese.
It looks like this:
Now, the previous scripts (which did the writing and corrupted the data) can somehow read it fine, but my new script, which is all encoded in UTF-8, shows characters like ½©. How can that be fixed?
By the sound of it, you have a utf8 column but you are writing to it and reading from it over a latin1 connection, so what is actually stored in the table is mis-encoded. The old scripts read it back correctly because they reverse the same mistake; when you read from the table over a utf8 connection, you see the data that's actually stored there, which is why it looks wrong. You can fix the mis-encoded data in the table by converting the column to latin1, then to the binary character set, and then back to utf8 (three steps in total).
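The idea behind the three-step fix can be sketched in Python (this shows the byte logic only; inside MySQL you would do the column conversions described above). It assumes the stored text is UTF-8 that was misread as Latin-1 and re-encoded; `fix_latin1_mojibake` is a hypothetical helper. Python's "latin-1" is ISO-8859-1, while MySQL's latin1 is really cp1252, so the two disagree for bytes 0x80 to 0x9F and this sketch will not reproduce MySQL's result for every string.

```python
def fix_latin1_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were misread as Latin-1 and stored as characters."""
    raw = text.encode("latin-1")  # steps 1-2: back to latin1, then to plain bytes
    return raw.decode("utf-8")    # step 3: read those bytes as the UTF-8 they were

print(fix_latin1_mojibake("\u00c3\u00b1"))  # "Ã±" becomes ñ

# Simulate the original mistake for Chinese text, then repair it.
mojibake = "\u4e2d\u6587".encode("utf-8").decode("latin-1")  # 中文 misread as Latin-1
print(fix_latin1_mojibake(mojibake) == "\u4e2d\u6587")       # True
```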
The original database was in a Chinese encoding (GB 18030 or similar, not Latin-1), and the bytes that make up these characters show up as a bunch of Latin diacritics when they are interpreted as Latin-1 and displayed as UTF-8. Read each string's bytes as GB 18030, convert them to UTF-8, and save.
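A minimal Python sketch of that repair, assuming the GB 18030 bytes were interpreted as Latin-1 characters on the way in (`fix_gb18030_mojibake` is a hypothetical helper, and the same ISO-8859-1 versus MySQL-latin1 caveat applies to bytes 0x80 to 0x9F). The result is an ordinary Python string; written back through a utf8mb4 connection, it would be stored as UTF-8. If the source turns out to be another Chinese encoding, only the codec name changes.

```python
def fix_gb18030_mojibake(text: str) -> str:
    """Recover Chinese text whose GB 18030 bytes were shown as Latin-1 characters."""
    raw = text.encode("latin-1")  # get the original GB 18030 bytes back
    return raw.decode("gb18030")  # decode them as what they really are

gb_bytes = "\u4e2d\u6587".encode("gb18030")  # 中文
print(gb_bytes.hex())                        # d6d0cec4
mojibake = gb_bytes.decode("latin-1")
print(mojibake)                              # ÖÐÎÄ
print(fix_gb18030_mojibake(mojibake))        # 中文
```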
