PHP CodeIgniter: saving Turkish characters to MySQL - php

I am writing a web app using PHP CodeIgniter. I receive input that can be in any language and I save it in my DB.
The MySQL DB collation is set to utf8_unicode_ci.
For CodeIgniter, I have set this in database.php:
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_unicode_ci';
When I run the following insert on my DB:
insert into user (name,id) values ('John Temirtaş', 2)
I get this error:
Incorrect string value: '\xC5\x9F' for column 'name' at row 1
The problem is with the ş, which is a Turkish character.
So far I have tried this while debugging
print_r($name)
John Temirtaş
print_r("Encoded Name: ".utf8_encode($name))
Encoded Name: John TemirtaÅ
print_r("Decoded Name: ".utf8_decode($name))
Decoded Name: John Temirta?
print_r("Decoded-Encode Name: ".utf8_decode(utf8_encode($name)))
Decoded-Encode Name: John Temirtaş
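The debug output above is consistent with $name already holding UTF-8 bytes. As a minimal sketch (Python is used only for illustration, and it assumes that utf8_encode and utf8_decode convert between ISO-8859-1 and UTF-8), the three results can be reproduced like this:

```python
name = "John Temirtaş"
utf8_bytes = name.encode("utf-8")

# ş (U+015F) is the two bytes C5 9F in UTF-8, the same bytes named in the MySQL error.
assert utf8_bytes.endswith(b"\xc5\x9f")

# Equivalent of utf8_encode(): the input is treated as ISO-8859-1 and re-encoded
# as UTF-8, so input that is already UTF-8 ends up double-encoded.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

# Equivalent of utf8_decode(): UTF-8 is converted to ISO-8859-1. ş does not exist
# in ISO-8859-1, so it is replaced by "?".
decoded = name.encode("latin-1", errors="replace")

# Equivalent of utf8_decode(utf8_encode($name)): the round trip restores the bytes.
round_trip = double_encoded.decode("utf-8").encode("latin-1")

print(decoded)
print(round_trip == utf8_bytes)
```

If this reading is right, the value needs no re-encoding in PHP at all; the bytes are already valid UTF-8 and the mismatch is on the connection or column side.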
I have tried saving John TemirtaÅ in the DB and it works fine, so I think I might need to utf8_encode($name) before saving it in the DB and utf8_decode it before displaying it. (Just doing the latter will not work. You also need to add mb_internal_encoding("UTF-8"); to the top of your PHP script.)
How do I encode the data properly so that it is inserted?
Thank you everyone for your help. Here is what worked:
Open MySQL Workbench. Set the character encoding and collation of the user table (charset: utf8, collation: utf8_unicode_ci). Set the collation of the name column to utf8_unicode_ci. Done. The insert should work.
Thank you for all your help.

Before running your insert query, try running the following query first:
mysql_query("SET NAMES 'utf8'");
and see if that helps.

You did everything okay. I am from Turkey, and in every project I develop I check these things:
Is my PHP file saved with UTF-8 encoding?
Are my tables and their fields collated with utf8_unicode_ci?
Do not change CodeIgniter's "char_set" and "dbcollat" options.
If you do these, there should be no problem.

Try this before inserting the record into the database.
$this->db->db_set_charset('latin1', 'latin1_swedish_ci');
// -> latin1- Charset &
// -> latin1_swedish_ci - Collation
Make sure you have the same settings in the DB for the table as well as for the table column.

Maybe I cannot answer exactly what you are asking, but I can give a few general tips because I use UTF-8 myself.
Try entering this character by hand from some client (HeidiSQL is a free one).
When I worked with GET, things didn't work well, but with functions such as htmlencode and htmlspecialchars it worked fine.
Later I only worked with POST, so no encoding was required, and I have implemented a UTF-8 autocomplete that works fine.
I think it is worth trying the encoding approach and var_dump-ing the SQL just before the insert, to make sure what is being entered.
In some complex situations, quotes stored in variables have come in very handy for me, for example:
$dbq="\"";
$sql=$dbq .'some sql '. $dbq;
I work with PHP PDO and have never had a problem with UTF-8 so far. Hope any of this is of some help!
I forgot: you said CodeIgniter. Entering a value by hand will at least tell you where to focus, the DB or the framework (luckily I use my own MVC). Just dig into the MVC code and see what the SQL looks like just before insertion.

Related

General error: 1366 Incorrect string value: '\xE0\xAE\x95\xE0\xAF\x8A...' for column 'occupation' in Laravel [duplicate]

After noticing an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to use the utf8 column charset and the default column collation (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting SQL errors when it hit non-Latin emails, too.
Despite this, some of the emails are still causing the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is a MEDIUMTEXT datatype which uses the utf8 column charset and the utf8_general_ci column collation. There are no flags that I can toggle in this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
How can I fix it?
What are the likely effects of such a fix?
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.
UPDATE to the below answer:
At the time the question was asked, "UTF8" in MySQL meant utf8mb3. In the meantime, utf8mb4 was added, but to my knowledge MySQL's "UTF8" was not switched to mean utf8mb4.
That means you need to specify "utf8mb4" explicitly if you mean it (and you should use utf8mb4).
I'll keep this here instead of just editing the answer, to make clear there is still a difference when saying "UTF8".
Original
I would not suggest Richie's answer, because you are screwing up the data inside the database. You would not fix your problem but only try to "hide" it, and you would not be able to perform essential database operations on the mangled data.
If you encounter this error, either the data you are sending is not UTF-8 encoded or your connection is not UTF-8. First, verify that the data source (a file, ...) really is UTF-8.
Then, check your database connection, you should do this after connecting:
SET NAMES 'utf8mb4';
SET CHARACTER SET utf8mb4;
Next, verify that the tables where the data is stored have the utf8mb4 character set:
SELECT
`tables`.`TABLE_NAME`,
`collations`.`character_set_name`
FROM
`information_schema`.`TABLES` AS `tables`,
`information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
`tables`.`table_schema` = DATABASE()
AND `collations`.`collation_name` = `tables`.`table_collation`
;
Last, check your database settings:
mysql> show variables like '%colla%';
mysql> show variables like '%charac%';
If source, transport and destination are utf8mb4, your problem is gone;)
MySQL’s utf-8 types are not actually proper utf-8 – it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no Emoji, no astral plane, etc.).
If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.
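To make the three-byte limit concrete, here is a small sketch in Python. The helper names needs_utf8mb4 and max_utf8_bytes_per_char are hypothetical and not part of any library; they only check whether a string contains characters outside the Basic Multilingual Plane, which take four bytes in UTF-8.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


def max_utf8_bytes_per_char(text: str) -> int:
    """Return the largest number of UTF-8 bytes used by a single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


turkish = "John Temirtaş"
emoji = "ok \U0001F600"  # U+1F600 is an emoji outside the BMP

print(needs_utf8mb4(turkish), max_utf8_bytes_per_char(turkish))
print(needs_utf8mb4(emoji), max_utf8_bytes_per_char(emoji))
```

A Turkish name like the one in the first question fits in the three-byte utf8 charset; only strings for which the check returns True require utf8mb4.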
The table and fields have the wrong encoding; however, you can convert them to UTF-8.
ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;
"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python 3:
>>> b"\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: invalid continuation byte
If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is about as permissive as encodings get: nearly every byte value is a valid code point, and MySQL's latin1 (its cp1252 variant) accepts all 256 byte values.
Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.
The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details of the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other arguments on the web discussing these two.
First check if your default_character_set_name is utf8.
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";
If the result is not utf8, you must convert your database. First, save a dump.
To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:
mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
You can now retry writing UTF-8 characters into your database. This solution helped me when I tried to upload a 200000-row CSV file into my database.
Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.
ALTER TABLE table_name MODIFY COLUMN column_name VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit server's collation for some reason.
And mysql's default is (at least on Ubuntu) latin-1 with swedish collation.
Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
default-character-set=utf8
And this must list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And some of variables listed by this should also have utf-8-* (no latin-1 or other encoding):
show variables like 'char%';
I got a similar error (Incorrect string value: '\xD0\xBE\xD0\xB2. ...' for 'content' at row 1). I tried changing the character set of the column to utf8mb4, and after that the error changed to 'Data too long for column 'content' at row 1'.
It turned out that MySQL was showing me the wrong error. I changed the character set of the column back to utf8 and changed the type of the column to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way, MariaDB in the same case (I tested the same INSERT there) simply truncated the text without an error.
That error means that either you have the string with incorrect encoding (e.g. you're trying to enter ISO-8859-1 encoded string into UTF-8 encoded column), or the column does not support the data you're trying to enter.
In practice, the latter problem is caused by MySQL's utf8 implementation, which only supports Unicode characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details. The trick is to use the utf8mb4 character set instead of utf8, which doesn't actually support all of UTF-8 despite the name. utf8mb4 is the correct choice for all UTF-8 strings.
In my case, Incorrect string value: '\xCC\x88'..., the problem was that an o-umlaut was in its decomposed state. This question-and-answer helped me understand the difference between o¨ and ö. In PHP, the fix for me was to use PHP's Normalizer library. E.g., Normalizer::normalize('o¨', Normalizer::FORM_C).
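The same normalization can be sketched in Python with the standard unicodedata module, shown here only to illustrate what composing to form C does to the bytes. The decomposed string is an "o" followed by the combining diaeresis U+0308, whose UTF-8 encoding is exactly the CC 88 from the error message.

```python
import unicodedata

decomposed = "o\u0308"  # "o" + COMBINING DIAERESIS, renders as ö
composed = unicodedata.normalize("NFC", decomposed)

print(decomposed.encode("utf-8"))  # two code points, three bytes
print(composed.encode("utf-8"))    # one code point, two bytes
print(len(decomposed), len(composed))
```

After normalization the string holds the single precomposed character U+00F6, so the combining mark no longer appears as a separate byte sequence.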
The solution for me when running into this Incorrect string value: '\xF8' for column error using Scriptcase was to make sure that my database is set up for utf8_general_ci and so are my field collations. Then, when I import data from a CSV file, I load the CSV into UE Studio and save it formatted as UTF-8, and voilà! It works like a charm: 29000 records in there, no errors. Previously I was trying to import an Excel-created CSV.
I have tried all of the above solutions (which all bring valid points), but nothing was working for me.
Until I found that my MySQL table field mappings in C# was using an incorrect type: MySqlDbType.Blob . I changed it to MySqlDbType.Text and now I can write all the UTF8 symbols I want!
p.s. My MySQL table field is of the "LongText" type. However, when I autogenerated the field mappings using MyGeneration software, it automatically set the field type as MySqlDbType.Blob in C#.
Interestingly, I have been using the MySqlDbType.Blob type with UTF8 characters for many months with no trouble, until one day I tried writing a string with some specific characters in it.
Hope this helps someone who is struggling to find a reason for the error.
If you happen to process the value with some string function before saving, make sure the function can properly handle multibyte characters. String functions that cannot do that and are, say, attempting to truncate might split one of the single multibyte characters in the middle, and that can cause such string error situations.
In PHP for instance, you would need to switch from substr to mb_substr.
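The failure mode can be reproduced in a few lines of Python. This is a sketch under the assumption that the truncation is done on bytes; truncate_utf8 is a hypothetical helper, not a library function. Cutting the encoded bytes at a fixed length can leave half of a multibyte character behind, while decoding with errors="ignore" drops that incomplete tail.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Truncate text to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")[:max_bytes]
    # The input was valid, so the only possible invalid sequence is an
    # incomplete character at the very end; "ignore" discards it.
    return raw.decode("utf-8", errors="ignore")


name = "Temirtaş"                      # 7 ASCII bytes + 2 bytes for ş
naive_cut = name.encode("utf-8")[:8]   # ends with a lone 0xC5 lead byte

try:
    naive_cut.decode("utf-8")
    naive_cut_is_valid = True
except UnicodeDecodeError:
    naive_cut_is_valid = False

print(naive_cut_is_valid)
print(truncate_utf8(name, 8))
```

The naive cut produces bytes that a UTF-8 column would reject, whereas the helper returns a shorter but still valid string.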
I added binary before the column name and that solved the charset error.
insert into tableA values(binary stringcolname1);
Hi, I also got this error when I used my online databases on a GoDaddy server.
I think it has MySQL version 5.1 or later. When I did the same on my localhost server (version 5.7) it was fine. After that I created the table on the local server and copied it to the online server using SQLyog. I think the problem is with the character set.
To fix this error I upgraded my MySQL database to utf8mb4 which supports the full Unicode character set by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encodings after which you have to modify field types).
There's good answers in here. I'm just adding mine since I ran into the same error but it turned out to be a completely different problem. (Maybe on the surface the same, but a different root cause.)
For me the error happened for the following field:
@Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;
This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j), it blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:
package redacted;
import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;
import static java.lang.String.format;
public class UriConverter implements AttributeConverter<URI, String> {
    @Override
    public String convertToDatabaseColumn(URI attribute) {
        // Store the URI as plain text instead of a binary serialization.
        return attribute.toString();
    }
    @Override
    public URI convertToEntityAttribute(String field) {
        try {
            return new URI(field);
        }
        catch (URISyntaxException e) {
            // Keep the original exception as the cause so the stack trace is not lost.
            throw new RuntimeException(format("could not convert database field to URI: %s", field), e);
        }
    }
}
Used as follows:
@Column(nullable = false, columnDefinition = "VARCHAR(255)")
@Convert(converter = UriConverter.class)
private URI consulUri;
As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).
In my case that problem was solved by changing the MySQL column encoding to 'binary' (the data type will be changed automatically to VARBINARY). I will probably not be able to filter or search on that column, but I have no need for that.
In my case, I first saw '???' on my website, so I checked MySQL's character set, which was latin1, and changed it to utf8. After restarting my project I got the same error as you. Then I found that I had forgotten to change the database's charset; once I changed that to utf8 as well, it worked.
I tried almost every step mentioned here. None worked. I downloaded MariaDB and it worked. I know this is not a solution, yet it might help somebody identify the problem quickly or serve as a temporary solution.
Server version: 10.2.10-MariaDB - MariaDB Server
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
I had a table with a varbinary column that I wanted to convert to utf8mb4 varchar. Unfortunately some of the existing data was invalid UTF-8 and the ALTER query returned Incorrect string value for various rows.
I tried every suggestion I could find regarding cast / convert / char_length = length etc., but nothing in SQL detected the erroneous values, other than the ALTER query returning bad rows one by one. I would love a pure SQL solution to remove the bad values. Sadly, this solution is not pretty.
I ended up SELECT *-ing the entire table into PHP, where the erroneous rows could be detected en masse by:
if (empty(htmlspecialchars($row['whatever'])))
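The same kind of scan can be sketched in Python, assuming the varbinary values have already been fetched as bytes. The helper names is_valid_utf8 and find_invalid_rows and the sample rows are hypothetical, and no database driver is involved.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_invalid_rows(rows):
    """rows is an iterable of (row_id, raw_bytes); return the ids that fail to decode."""
    return [row_id for row_id, raw in rows if not is_valid_utf8(raw)]


# Hypothetical sample data standing in for the fetched table.
sample_rows = [
    (1, "John Temirtaş".encode("utf-8")),
    (2, b"\xE4\xC5\xCC\xC9\xD3\xD8"),  # the invalid bytes from the question above
    (3, b"plain ascii"),
]

print(find_invalid_rows(sample_rows))  # [2]
```

The ids returned are the rows to repair or remove before retrying the ALTER.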
The problem can also be caused by the client if its charset is not set to utf8mb4. So even if every database, table and column is set to utf8mb4, you will still get an error, for instance in PyCharm.
For Python, set the charset of the connection in the MySQL Connector connect method:
mydb = mysql.connector.connect(
    host="IP or Host",
    user="<user>",
    passwd="<password>",
    database="<yourDB>",
    # set charset to utf8mb4 to support emojis
    charset='utf8mb4'
)
I know I'm late to the party, but someone else might come across the problem I had and be happy to read my workaround.
I ran into this problem with French characters. It turned out that the text I was copying encoded the accents on some characters as two characters and on others as a single character.
I couldn't find how to set my table to accept the strings, so I ended up changing the diacritics in my text import.
Here is a list of them as double characters, so you can search for them in your texts:
ùòìàè
áéíóú
ûôêâî
ç
1 - You have to declare the UTF-8 encoding property on your connection: http://php.net/manual/en/mysqli.set-charset.php.
2 - If you are using the mysql command line to execute a script, you have to use the flag, like:
Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql

MySQL Error when entering large string [duplicate]

After noticing an application tended to discard random emails due to incorrect string value errors, I went though and switched many text columns to use the utf8 column charset and the default column collate (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting sql errors when it hit non-latin emails, too.
Despite this, some of the emails are still causing the program to hit incorrect string value errrors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is a MEDIUMTEXT datatybe which uses the utf8 column charset and the utf8_general_ci column collate. There are no flags that I can toggle in this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
How can I fix it?
What are the likely effects of such a fix?
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.
UPDATE to the below answer:
The time the question was asked, "UTF8" in MySQL meant utf8mb3. In the meantime, utf8mb4 was added, but to my knowledge MySQLs "UTF8" was not switched to mean utf8mb4.
That means, you'd need to specifically put "utf8mb4", if you mean it (and you should use utf8mb4)
I'll keep this here instead of just editing the answer, to make clear there is still a difference when saying "UTF8"
Original
I would not suggest Richies answer, because you are screwing up the data inside the database. You would not fix your problem but try to "hide" it and not being able to perform essential database operations with the crapped data.
If you encounter this error either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify, that the data source (a file, ...) really is UTF-8.
Then, check your database connection, you should do this after connecting:
SET NAMES 'utf8mb4';
SET CHARACTER SET utf8mb4;
Next, verify that the tables where the data is stored have the utf8mb4 character set:
SELECT
`tables`.`TABLE_NAME`,
`collations`.`character_set_name`
FROM
`information_schema`.`TABLES` AS `tables`,
`information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
`tables`.`table_schema` = DATABASE()
AND `collations`.`collation_name` = `tables`.`table_collation`
;
Last, check your database settings:
mysql> show variables like '%colla%';
mysql> show variables like '%charac%';
If source, transport and destination are utf8mb4, your problem is gone;)
MySQL’s utf-8 types are not actually proper utf-8 – it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no Emoji, no astral plane, etc.).
If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.
The table and fields have the wrong encoding; however, you can convert them to UTF-8.
ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;
"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:
>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is the most permissive encoding there is - every byte value is a valid code point.
Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.
The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details of the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other arguments on the web discussing these two.
First check if your default_character_set_name is utf8.
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";
If the result is not utf8 you must convert your database. At first you must save a dump.
To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:
mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
You can now retry to to write utf8 character into your database. This solution help me when i try to upload 200000 row of csv file into my database.
Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.
ALTER TABLE tabale_name MODIFY COLUMN column_name VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit server's collation for some reason.
And mysql's default is (at least on Ubuntu) latin-1 with swedish collation.
Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
default-character-set=utf8
And this must list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And some of variables listed by this should also have utf-8-* (no latin-1 or other encoding):
show variables like 'char%';
I got a similar error (Incorrect string value: '\xD0\xBE\xDO\xB2. ...' for 'content' at row 1). I have tried to change character set of column to utf8mb4 and after that the error has changed to 'Data too long for column 'content' at row 1'.
It turned out that mysql shows me wrong error. I turned back character set of column to utf8 and changed type of the column to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way MariaDB in same case (I have tested the same INSERT there) just cut a text without error.
That error means that either you have the string with incorrect encoding (e.g. you're trying to enter ISO-8859-1 encoded string into UTF-8 encoded column), or the column does not support the data you're trying to enter.
In practice, the latter problem is caused by MySQL UTF-8 implementation that only supports UNICODE characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details. The trick is to use column type utf8mb4 instead of type utf8 which doesn't actually support all of UTF-8 despite the name. The former type is the correct type to use for all UTF-8 strings.
In my case, Incorrect string value: '\xCC\x88'..., the problem was that an o-umlaut was in its decomposed state. This question-and-answer helped me understand the difference between o¨ and ö. In PHP, the fix for me was to use PHP's Normalizer library. E.g., Normalizer::normalize('o¨', Normalizer::FORM_C).
The solution for me when running into this Incorrect string value: '\xF8' for column error using scriptcase was to be sure that my database is set up for utf8 general ci and so are my field collations. Then when I do my data import of a csv file I load the csv into UE Studio then save it formatted as utf8 and Voila! It works like a charm, 29000 records in there no errors. Previously I was trying to import an excel created csv.
I have tried all of the above solutions (which all bring valid points), but nothing was working for me.
Until I found that my MySQL table field mappings in C# was using an incorrect type: MySqlDbType.Blob . I changed it to MySqlDbType.Text and now I can write all the UTF8 symbols I want!
p.s. My MySQL table field is of the "LongText" type. However, when I autogenerated the field mappings using MyGeneration software, it automatically set the field type as MySqlDbType.Blob in C#.
Interestingly, I have been using the MySqlDbType.Blob type with UTF8 characters for many months with no trouble, until one day I tried writing a string with some specific characters in it.
Hope this helps someone who is struggling to find a reason for the error.
If you happen to process the value with some string function before saving, make sure the function can properly handle multibyte characters. String functions that cannot do that and are, say, attempting to truncate might split one of the single multibyte characters in the middle, and that can cause such string error situations.
In PHP for instance, you would need to switch from substr to mb_substr.
I added binary before the column name and solve the charset error.
insert into tableA values(binary stringcolname1);
Hi i also got this error when i use my online databases from godaddy server
i think it has the mysql version of 5.1 or more. but when i do from my localhost server (version 5.7) it was fine after that i created the table from local server and copied to the online server using mysql yog i think the problem is with character set
Screenshot Here
To fix this error I upgraded my MySQL database to utf8mb4 which supports the full Unicode character set by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encodings after which you have to modify field types).
There's good answers in here. I'm just adding mine since I ran into the same error but it turned out to be a completely different problem. (Maybe on the surface the same, but a different root cause.)
For me the error happened for the following field:
#Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;
This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j), it blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:
package redacted;
import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;
import static java.lang.String.format;
public class UriConverter implements AttributeConverter<URI, String> {
#Override
public String convertToDatabaseColumn(URI attribute) {
return attribute.toString();
}
#Override
public URI convertToEntityAttribute(String field) {
try {
return new URI(field);
}
catch (URISyntaxException e) {
throw new RuntimeException(format("could not convert database field to URI: %s", field));
}
}
}
Used as follows:
#Column(nullable = false, columnDefinition = "VARCHAR(255)")
#Convert(converter = UriConverter.class)
private URI consulUri;
As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).
In my case that problem was solved by changing Mysql column encoding to 'binary' (data type will be changed automatically to VARBINARY). Probably I will not be able to filter or search with that column, but I'm no need for that.
In my case ,first i've meet a '???' in my website, then i check Mysql's character set which is latin now ,so i change it into utf-8,then i restart my project ,then i got the same error with you , then i found that i forget to change the database's charset and change into utf-8, boom,it worked.
I tried almost every steps mentioned here. None worked. Downloaded mariadb. It worked. I know this is not a solution yet this might help somebody to identify the problem quickly or give a temporary solution.
Server version: 10.2.10-MariaDB - MariaDB Server
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
I had a table with a varbinary column that I wanted to convert to utf8mb4 varchar. Unfortunately some of the existing data was invalid UTF-8 and the ALTER query returned Incorrect string value for various rows.
I tried every suggestion I could find regarding cast / convert / char_length = length etc. but nothing in SQL detected the erroneous values, other than the ALTER query returning bad rows one by one. I would love a pure SQL solution to remove the bad values. Sadly this solution is not pretty
I ended up select *'ing the entire table into PHP, where the erroneous rows could be detected en-masse by:
if (empty(htmlspecialchars($row['whatever'])))
The problem can also be caused by the client if its connection charset is not set to utf8mb4. Even if every database, table and column is set to utf8mb4, you will still get the error, for instance when connecting from PyCharm.
For Python, set the charset of the connection in the MySQL Connector connect method:
import mysql.connector

mydb = mysql.connector.connect(
    host="IP or Host",
    user="<user>",
    passwd="<password>",
    database="<yourDB>",
    # set charset to utf8mb4 to support emojis
    charset='utf8mb4'
)
I know I'm late to the party, but someone else might come across the problem I had and be happy to read my workaround.
I ran into this problem with French characters. It turned out that the text I was copying had the accents on some characters encoded as two characters (base letter plus combining accent) and on others as a single character.
I couldn't find how to set up my table to accept these strings, so I ended up changing the diacritics in my text import.
Here is a list of them as double characters, so you can search for them in your texts.
ùòìàè
áéíóú
ûôêâî
ç
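Instead of searching and replacing by hand, Unicode normalization can do the same job. This is a small Python sketch using the standard unicodedata module; it assumes the "double characters" are a base letter followed by a combining accent (NFD form), which I have not verified against the original import file.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: looks like one letter, is two code points
decomposed = "caf" + "e\u0301"

# NFC composes base letter + combining mark into a single precomposed character
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "caf\u00e9")
```

Running the text through NFC before the import turns every two-character accent into its single-character form.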
1 - You have to declare the UTF-8 encoding property on your connection. http://php.net/manual/en/mysqli.set-charset.php.
2 - If you are using the mysql command line to execute a script, you have to use the flag, like:
Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql

Special characters are correctly displayed only when inserted from PHP

I have a table Users (with the UTF8 character set and utf8_general_ci collation) which has a lastName column (same character set and collation as the table). When I insert my name (Štěrba) into this column directly from Navicat for MySQL, it's displayed correctly in Navicat, but incorrectly in the browser (the output document is UTF8 and after mysql_connect() I also use SET CHARACTER SET utf8):
But when I do this insert from PHP with this query:
INSERT INTO users (firstName, lastName) values ('Pavel', 'Štěrba');
it's displayed correctly in browser, but in Navicat it's saved like this:
Obviously, I can't edit it directly from Navicat because I would break it... Do you have any idea why this happens? Did I miss an encoding setting somewhere? Or is it an issue with Navicat? Thanks for any tips!
Chances are high YOU are doing it wrong, and not the tools everyone uses.
Do not fiddle with the encoding setting via queries! Use mysql_set_charset().
You have to repair all entries in your database that got there via PHP.
Note that "SET CHARACTER SET" is wrong, although it sounds like the right thing. If you cannot use the above PHP function, all you should do is use "SET NAMES utf8" only!
Also run SET NAMES utf8 after SET CHARACTER SET utf8.
I hope this helps.

Page with UTF-8 encoding sends data to MySQL with UTF-8 encoding but entry is scrambled

I realize there's a dozen similar questions, but none of the solutions suggested there work in this case.
I have a PHP variable on a page, initialized as:
$hometeam="Крылья Советов"; //Cyrillic string
When I print it out on the page, it prints out correctly. So echo $hometeam displays the string Крылья Советов, as it should.
The content meta tag in the header is set as follows:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
And, at the very beginning of the page, I have the following (as suggested in one of the solutions found in my search):
ini_set('default_charset', 'utf-8');
So that should be all good.
The MySQL table I'm trying to save this to, and the column in question, have utf8_bin as their collation. When I go to phpMyAdmin and manually enter Крылья Советов, it saves properly in the field.
However, when I try to save it through a query on the page, using the following basic query:
mysql_query("insert into tablename (round,hometeam) values ('1','$hometeam') ");
The mysql entry looks like this:
c390c5a1c391e282acc391e280b9c390c2bbc391c592c391c28f20c390c2a1c390c2bec390c2b2c390c2b5c391e2809ac390c2bec390c2b2
So what's going on here? If everything is ok on the page, and everything is ok with MySQL itself, where is the issue? Is there something I should add to the query itself to make it keep the string UTF-8 encoded?
Note that I have set mysql_set_charset('utf8'); after connecting to the database (at the top of the page).
EDIT: Running the query SHOW VARIABLES LIKE "%character_set%" gives the following:
Variable_name Value
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
Seems like there could be something here, since there are 2 latin1's in that list. What do you think?
Also, when I type a Cyrillic string directly into phpMyAdmin, it appears fine at first (it displays correctly after I save it). But reloading the table, it displays in HEX like the inserted ones. I apologize for the misinformation regarding this in the question. As it turns out, this should mean the problem is with phpMyAdmin or the database itself.
EDIT #2: this is what show create table tablename returns:
CREATE TABLE `tablename` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `round` int(11),
  `hometeam` varchar(32) COLLATE utf8_bin NOT NULL,
  `competition` varchar(32) CHARACTER SET latin1 NOT NULL DEFAULT 'Russia',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=119 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Do you get this hex string in phpMyAdmin? I suppose when you SELECT the inserted value by e.g. PHP or the MySQL console client, you would be given the expected cyrillic UTF8 string.
If so, it's a configuration issue with phpMyAdmin, see e.g. here: http://theyouri.blogspot.ch/2010/12/phpmyadmin-collated-db-in-utf8bin-shows.html
phpMyAdmin collated db in utf8_bin shows hex data instead of UTF8 text
$cfg['DisplayBinaryAsHex'] = false;
Moreover, please don't use mysql_query that way, since you're totally open to SQL injections. I'm also not sure if you really want to use utf8_bin, see e.g. this discussion: utf8_bin vs. utf_unicode_ci or this: UTF-8: General? Bin? Unicode?
EDIT There's something weird going on. If you translate the given hex string to UTF8 characters, you get this: "ÐšÑ€Ñ‹Ð»ÑŒÑ Ð¡Ð¾Ð²ÐµÑ‚Ð¾Ð²" (see e.g. http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder). If you utf8_decode this, you get the desired "Крылья Советов". So, it seems that it's at least utf8 encoded twice (besides the problem that it somewhere shows up as hex characters).
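To check the double-encoding theory without a database, here is a minimal Python sketch. It assumes the extra round trip went through MySQL's latin1, which behaves like Windows-1252 with the few undefined bytes passed straight through. The function name undo_double_encoding is made up for illustration.

```python
def undo_double_encoding(mojibake: str) -> str:
    """Reverse one UTF-8 -> latin1 -> UTF-8 round trip (the assumed cause)."""
    raw = bytearray()
    for ch in mojibake:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Code points such as U+008F have no cp1252 byte in Python;
            # MySQL's latin1 maps them to the byte with the same value.
            raw += ch.encode("latin-1")
    return raw.decode("utf-8")


# The hex string from the question, split in two only for line length
stored_hex = (
    "c390c5a1c391e282acc391e280b9c390c2bbc391c592c391c28f20"
    "c390c2a1c390c2bec390c2b2c390c2b5c391e2809ac390c2bec390c2b2"
)

once_decoded = bytes.fromhex(stored_hex).decode("utf-8")  # still mojibake
repaired = undo_double_encoding(once_decoded)
print(repaired)
```

If the repaired value is the original Cyrillic string, the data was UTF-8 encoded twice on its way into the table.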
Could you please provide the complete script? Do you utf8_encode your string anywhere? If your script is this and only this (besides a valid, opened MySQL connection):
<?php
$hometeam="Крылья Советов"; //Cyrillic string
// open mysql connection here
mysql_set_charset('utf8');
mysql_query("INSERT INTO tablename (round, hometeam) VALUES ('1', '$hometeam')");
$result = mysql_query("SELECT * FROM tablename WHERE round = '1'");
$row = mysql_fetch_assoc($result);
echo $row['hometeam'];
?>
And you call the page, what is the result (in the page source of the browser, not what is displayed in the browser)?
Also, please check what happens if you change the collation to utf8_unicode_ci, as suggested in another answer here. That at least covers phpMyAdmin issues when displaying binary data and is probably what you'll want anyway (since you probably want ORDER BY clauses to perform as expected, see the discussions in the SO questions I linked above).
EDIT2 Perhaps you could also provide some snippets like SHOW CREATE TABLE tablename or SHOW VARIABLES LIKE "%character_set%". Might help.
Also, when I type a Cyrillic string directly into phpMyAdmin, it
appears fine at first (it displays correctly after I save it). But
reloading the table, it displays in HEX like the inserted ones.
This almost certainly looks like there is a problem in your table! Run show create table tablename. I bet there is latin1 instead of utf8, because you have it set as the default in the character_set_database variable.
To change this, run the following command:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
This will convert all your varchar fields to utf8. But be careful with the records you already have in the table: they are already malformed, and converting them to UTF8 will leave them malformed. The best idea may be to create the database again, adding the following at the end of the table definition:
CREATE TABLE `tablename` (
....
) ENGINE=<whatever you use> DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
1) Try to save the entry to the database with the PhpMyAdmin and then also look at the result in PhpMyAdmin. Does it look OK? If yes, database is created and set up properly.
2) Try to use utf8_general_ci instead. This shouldn't matter, but give it a try.
3) Tune all necessary settings on the PHP side - follow this post:
http://blog.loftdigital.com/blog/php-utf-8-cheatsheet . Especially try this trick:
echo htmlentities($hometeam, ENT_QUOTES, 'UTF-8')
As I saw in the comments, you don't seem to be able to update your database configuration, is that right?
I guess the encoding is misconfigured, based on what I read in the official MySQL documentation.
I can propose a PHP workaround. Since there are so many encoding problems, you can transform the string before inserting it into the database. You have to find a common language for PHP and the database to talk in.
The one I tried in another project consists of transforming the string using urlencode($string) and urldecode($string).

MySQL charsets and collations: accent insensitive doesn't work

I know that the answer is very simple, but I'm going bananas. I think I've tried every solution available. Here we go...
I have a database with charset latin1. Yeah, I should have it in utf8, but I have several running projects on it, so I don't want to mess them up.
The issue comes with SELECT with LIKE "%...%"
The table is utf8 with COLLATE utf8_general_ci. The fields are also utf8 with utf8_general_ci collation. My script files (php) are utf-8 encoded, and the server also serves files in utf-8. So, everything is utf-8.
Ok, as everything is collated with utf8_general_ci, I should be able to search case-insensitively and accent-insensitively. For example:
Having in table providers...
id providerName
1 Jose
2 José
I should be able to do...
SELECT * FROM providers WHERE providerName LIKE "%jose%"
or
SELECT * FROM providers WHERE providerName LIKE "%josé%"
And have, in both cases, the two rows returned. But, with the first query, I only get row 1; and with second query, I only get row two. Case insensitive search seems to work well, but accent insensitive does not.
So I tried adding COLLATE utf8_general_ci after the LIKE "%...%". Same result.
Then I discovered that the connection was being made in latin1 (via the PHP function mysql_client_encoding()). So I added a query every time a connection was made, telling it to use utf8. I used both SET NAMES UTF8 COLLATE utf8_general_ci and PHP's mysql_set_charset(). With this configuration, the first query returns row 1, but the second query does not return any result. In addition, all results contain strange characters (you know, like í°, even though everything was set to utf8).
This is puzzling me. Everything is set to UTF8, but it doesn't work as (I) expect.
MySQL Server 5.0.95
PHP 5.2.14
Win7
Stop the machines!!
I found out that I was doing everything OK and it DID respond as expected. The only problem was that, even though the table, fields, files and server were in utf8, when the table was populated (some time in the past), the connection was being made with latin1.
So I re-populated the table, now with utf8 connection, and it worked just fine.
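For anyone wondering why the old rows did not match, here is a rough Python illustration. The fold function is only a stand-in for a case- and accent-insensitive comparison, not MySQL's actual utf8_general_ci algorithm, and the cp1252 step assumes the latin1 connection reinterpreted the UTF-8 bytes the way MySQL's latin1 does.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip combining accents (rough stand-in for a _ci collation)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


correct = "José"
# UTF-8 bytes sent over a connection declared as latin1 end up stored like this
stored_wrong = correct.encode("utf-8").decode("cp1252")

print(stored_wrong)
print("jose" in fold(correct))
print("jose" in fold(stored_wrong))
```

The correctly stored name folds to "jose" and matches; the row written through the latin1 connection holds two unrelated characters in place of the é, so no accent-insensitive collation can match it.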
Thank you guys!
I don't have the setup to test this properly, but here's a possible solution. So many places to set UTF8! :)
