I have a site implemented in Cakephp 2 years before for my client. At that time i was not aware about the site will be used world wide. Due to different country special characters have been used in the site. At that time cakephp utf8 option was not enabled and characters are saved in encoded form like ?? in database.
Now when ever we try to download the CSV these characters created problem and not appeared correctly in CSV. I have tried a lot to resolve this but did not succeed.
Please help me how to resolve this.
You must ensure 3 things-
1. enable `'encoding' => 'utf8',` on database settings at `app/Config/database.php`
2. Table column Collation must be set to 'utf8_general_ci' or `utf8_unicode_ci`
3. Html page character set must set as `utf-8`
Use the Below Query to get the Current Collation information for your Tables,
select TABLE_NAME,TABLE_TYPE,ENGINE,TABLE_COLLATION from information_schema.TABLES where TABLE_SCHEMA like 'YOURDATABASENAME';
The Column 'TABLE_COLLATION' will give you the collation info. If it's set to utf8, then almost all the characters could be saved in your DB and can be retrieved back. The issue you currently facing could be because of PHP or Browser encoding problems. But if your DB has different collation, which doesn't supports all characters, then the data saved in that DB is probably lost. It's almost impossible to identify encoding type and retrieve it back.
For future cases, you have two choices,
You could set UTF-8 as your DB Collation, but if you have indexed your string data, then for each and every char, MySQL process will hold 4 Bytes, even though UTF-8 is variable length encoding.So this will possibly increase your Memory usage.
Or
You could set latin1 as your DB Collation and you have to url-encode the characters from UTF-8 to latin and save them in DB. This will decrease your memory usage, but you would have the over head of Encoding/Decoding. If English is the major language in your DB, i would say you can go for this.
It depends on the Language you need to save in your DB and when showing them back in browser, the Browser must have set the supported encoding. In your case, if you are downloading them as CSV,it depends on the encoding format set for the file.
Related
After migration from PHP 5.3 to PHP 5.6 I have encoding problem. My MySQL database is latin1 and my PHP files are in windows-1251. Now everything is displayed like "ñëåäíèòå àäðåñè" or "�����".
It should be display something in Cyrillic like "кирилица". I've tried mysqli_set_charset but it didn't solve my problem.
First, let's see what you have in the table. Do SELECT col, HEX(col)... to see how these are encoded. Here is the HEX that should be there if it is correctly utf8-encoded:
ñëå --> C3B1C3ABC3A5; кир --> D0BAD0B8D180
If you don't get those, then the problem was on inserting, and we may (or may not) be able to repair the data. If you have C390C2BAC390C2B8C391E282AC for the Cyrillic, then you have "double encoding", and it will take some work to 'fix'.
utf8 needs to be established in about 4 places.
The column(s) in the database -- Use SHOW CREATE TABLE to verify that they are explicitly set to utf8, or defaulted from the table definition. (It is not enough to change the database default.)
The connection between the client and the server. See SET NAMES utf8.
The bytes you have. (This is probably the case.)
If you are displaying the text in a web page, check the <meta> tag.
Halfer is right. Change both your PHP and MySQL encoding, first the PHP with
mb_internal_encoding ("UTF-8");
mb_http_output("UTF-8");
to UTF-8, at the top of your PHP pages.
If you miss out the "UTF-8" and print the output from these finctions, it will show you your current PHP encoding - probably windows-1251
Also note that with MySQL you need to change the character encoding on the row in the table as well as on the table itself overall and on the database itself overall, as the defaults will remain latin1 so any new fields you add would be latin1 without being carefully checked.
If you are trying to save Cryllic text to the database you will need the correct Cryllic character set in the database, rather than latin1
According to the official MySQL manual the collation used defines the order of records when sorting alphabetically:
http://dev.mysql.com/doc/refman/5.0/en/charset-general.html
However: I have a PHP script (UTF-8) and I save some foreign characters in my MySQL database it's saved all weird (first row). This is when the collation I choose is latin1_swedish_ci. When I change the collation to utf8_unicode_ci all is good (second row).
When saving this data everything is exactly the same except for the collation.
So how about that "collation is used solely for sorting records"?
How someone can clarify this for me :-) Thanks in advance!
It appears that the charset of your connection is not set right, therefore the conversion from the programming language charset to the database is not correct.
You should set the charset in your connection, then both will workfine.
as pointed out in the comments a little explanation on how things work.
when you have not set the character set in your connections, the server assumes it to be the same as the collocation of the database. when data is recieved in a another encoding, the data is written nevertheless. just with wrong or other characters than they have been in the encoding of the data from the script.
as long as nothing changes, the script gets back the same data as it has written and everything appears to be fine.
however when either the connection encoding or the database encoding is changed at this point, the already stored data gets converted to the new encoding. the problem here is that the source data is not in the encoding that is assumend when converting.
all encodings share the ascii set with the same bits, thats why ascii charactes dont mess up. only special charaters do.
so you have to set your conneciton encoding in order to dont produce the mess that you are already in.
now what can you do about the data you already have?
you can make a dump of your database using mysqldump and use the --skip-set-charset option. then you get a plaintext file. in this plane text file replace all occurences of the actual database charset with the one the data is really in (the one you had in your script when you wrote the data).
then save the file and make sure your editor does not do any conversion (i recommend vim).
then import that file and you will get a database with data in the correct encoding. then you can change the encoding however you like and as long as your conneciton charset gets set also you will be fine from now on.
also make sure that the mysql server has the charsets installed, but it should have that already.
this is only my approach, i have cleaned up a lot of messed up installations like that. most of which at some point have garbled characters in their projects (after switching server, updating or restoring a backup...).
turns out not setting the connection charset is something that is very often forgotten.
I recenly had problem in importing latin1_swedish database into new one. Somone made Latin1 Database to store Latin2 characters. It was all working till I made database dump and wanted to import it to another database.
It's really complicated. In the end I corrected sql dump to proper ISO-8859-2 Encoded file with all characters displaying correctly. Still import into tables with Latin2 encoding didn't work, all special characters were lost (maybe its a PHPMyAdmin bug?).
Converting file to UTF-8 encoding and changing table encoding to utf8_general_ci imported everything correctly.
Next, whole PHP site uses and displays ISO-8859-2 characters (its old PHPBB forum).
While connecting to Database I use "SET NAMES latin2" command to change encoding.
To my surprise, page displays as proper ISO-8859-2.
If table is UTF-8 and Set names is latin2. Does MySQL connection convert characters into ISO-8859-2 before returning them???
(didnt know if I shoud write it all or not. Edit it if I put too much not needed info)
SET NAMES effectively sets how the data is translated before being stored or after recalled, prior to presenting to the client. For the case of storage, the character set definition of the column is the ultimate determining factor (if it differs from table, and database character set definition). See this informative blog post about encoding in MySQL.
I have a MySQL table & fields that are all set to UTF-8. The thing is, a previous PHP script, which was in charge of the database writing, was using some other encoding, not sure whether it is in the script itself, the MySQL connection or somewhere else. The result is that although the table & fields are set to UTF-8, we see the wrong chars instead of Chinese.
It looks like that:
Now, the previous scripts (which were in charge of the writing and corrupted the data) can read it well for some reason, but my new script which all encoded in UTF-8, shows chars like ½©. How can that be fixed?
By the sound of it, you have a utf8 column but you are writing to it and reading from it using a latin1 connection, so what is actually being stored in the table is mis-encoded. Your problem is that when you read from the table using a utf8 connection, you see the data that's actually stored there, which is why it looks wrong. You can fix the mis-encoded data in the table by converting to latin1, then back to utf8 via the binary character set (three steps in total).
The original database was in a Chinese encoding – GB-18030 or similar, not Latin-1 – and the bytes that make up these characters, when displayed in UTF-8, show up as a bunch of Latin diacritics. Read each string as GB-18030, convert it to UTF-8, and save.
I am running into a very strange issue with a site that I am working on. The site is basically a job board where the owner or users can create job listings including a description that ends up being stored into a MySQL text field. What we are experiencing is this, whenever listings from certain sources are entered, they initially end up with the "Black Diamond" with a question mark inside character in place of apostrophes and double spaces. This part I know is an encoding issue and can correct. The real question is this, these black diamonds show when the record is displayed in a MySQL admin tool and when the job listing is viewed in a web browser (simple select statement displays the listing in a PHP app), but after the first time it is viewed, then the problem somehow fixes itself. It is like the running the select then displaying the record updates the job description field and fixes the encoding issues. How could this be? Has anyone ever heard of this or anything similar? I cannot understand how a database field would change without running an update statement...
How are the job listings entered? Are they entered via a web page? If so, what character encoding does the web page use? (This should determine the character encoding of the submitted data AFAIK.) What character set is the connection used to communicate with MySQL? What is the character set of the column the data is stored in? Finally, what is the character encoding of the web page(s) on which the entered data is reviewed?
Here is what I do: I declare all of my pages as UTF-8 encoded, using the following tag at the start of the <head> section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I issue the following command immediately when I connect to MySQL, so as to make sure that MySQL understands the data I send to it will be UTF-8 encoded:
SET NAMES uft8
(Depending on the database abstraction method you use, a special function might be recommended in order to set the connection character set, like mysqli's mysqli_set_charset().)
I also make sure that those columns in which I intend to store UTF-8 data are declared to be UTF-8. You can find out what the character set of a column is by issuing SHOW CREATE TABLE table_name. The character set of the table (which by default is the character set for any column in the table) is displayed at the end. If the character set for the column is different to the default character set for the table then it is displayed as part of the column definition. If you wish to change the character set of a column then you can do so using ALTER TABLE.
If you have not previously taken the steps to handle character sets in your app then you may find that the tables are all using the latin1 character set. If you naively store UTF-8-encoded data (for example) into these columns, you may run into character encoding issues. Changing the column character set using ALTER TABLE does not necessarily fix your old data, because MySQL reads your old data assuming it to be valid latin1-encoded text and converts it to the eqivalent UTF-8 (correctly converting what it has read, but not giving the result you want).
The above steps would hopefully mean that future data will be correctly encoded and correctly displayed, but you may have data already mis-encoded in your database, so be aware that if you follow the above steps and still see older data displaying incorrectly, this may be why. Good luck.
Run into this problem a few years ago... I remember finding those notorious characters, and replacing them in php with a single quote or a double quote... Ofcourse with escaping... A simple preg_replace for those characters will do the trick... Its just an encoding issue...
This page, though geared for wordpress might help
http://codex.wordpress.org/Converting_Database_Character_Sets
I had the same issue (mysql encoding and webpage encoding set to UTF-8 but black diamonds showing up in my query results. I found this snippet while googling but cannot for the life of me find its source to give proper attribution:
if( function_exists('mysql_set_charset') ){
mysql_set_charset('utf8', $db_connection);
}else{
mysql_query("SET NAMES 'utf8'", $db_connection);
}
Anyway, it cleared up the issue for me.