I'm redesigning a website and I have a problem with the existing database:
The database collation is set to utf8_unicode_ci, but the column I'm querying seems to be set to latin1_swedish_ci. The characters stored in it are Japanese, but even in phpMyAdmin you see other characters (I guess because of latin1_swedish_ci).
When I print the result of the query I get a bunch of ???. Now, using
mysql_query('SET NAMES utf8');
mysql_set_charset('utf8',$conn);
the output is:
2009â€N10ŒŽÂ†2009?N10???2009â€N11ŒŽÂ†2009?N11???
Any ideas?
Because the table was set to use latin1_swedish_ci (a latin1 collation), it was unable to correctly store the UTF-8 data that was entered. You need to switch that table to the utf8 character set (for example with the utf8_unicode_ci collation) for data going forward, but any existing data is essentially corrupted. You would have to re-enter the existing records after switching the character set to get the correct Japanese characters.
You need to change the charset to utf8. The collation does not need to be changed to display Japanese characters (but to be able to sort and compare text correctly, it might be a good idea to change it to utf8_general_ci).
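The first answer's point, that a latin1 column cannot hold Japanese characters, can be illustrated with a short Python sketch (Python is used only as a neutral illustration; the sample string is hypothetical). It assumes the text reaches MySQL as real characters over a utf8 connection, and that MySQL in non-strict SQL mode replaces characters with no latin1 equivalent by `?`, which Python's `errors="replace"` imitates:

```python
# Sketch: what happens when Japanese characters are converted into latin1.
# Assumption: MySQL in non-strict mode replaces unconvertible characters with "?",
# which errors="replace" imitates here.

def store_in_latin1(text: str) -> bytes:
    """Convert text to latin1 bytes, replacing anything latin1 cannot hold with '?'."""
    return text.encode("latin-1", errors="replace")

original = "2009年10月"
stored = store_in_latin1(original)
print(stored)                    # b'2009?10?'
print(stored.decode("latin-1"))  # 2009?10?  (the Japanese characters are gone)
```

Once characters have been replaced like this, nothing in the column remembers what they were, which is why that answer treats such data as corrupted.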
Hi all, thanks for your replies. Here is what happened: I couldn't really change anything in the DB, since another version of the site still uses that DB and will stay up. So the solution I found was the following:
Scenario:
The DB is set to use UTF-8 (utf8_general_ci), but the fields (at least the ones I needed) were set to latin1_swedish_ci.
Solution:
After mysql_connect I put the following:
mysql_query("SET NAMES 'Shift_JIS'",$conn);
mysql_set_charset('Shift_JIS',$conn);
Then in the PHP file:
$titleJP = $row['titleJP'];
$titleJP = mb_convert_encoding($titleJP, "UTF-8", mb_detect_encoding($titleJP,"Shift_JIS,JIS,SJIS,eucjp-win"));
That worked perfectly: the characters are now displayed as correct Japanese.
I tried every other solution I could think of with no luck (the utf8_decode/utf8_encode PHP functions, etc.).
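The `N` and `ŒŽ` fragments in the question's garbled output fit a specific explanation: the legacy site wrote raw Shift_JIS bytes into the latin1 column, so reading them back untouched and decoding them as Shift_JIS (as the solution above does) recovers the text. The Python sketch below illustrates that hypothesis. It is an inference from the output, not something the poster confirmed, and it assumes MySQL's latin1 maps these particular bytes like Windows-1252:

```python
# Hypothesis sketch: Shift_JIS bytes were written into a latin1 column unchanged.
japanese = "2009年10月"

# What the legacy site presumably sent: Shift_JIS-encoded bytes.
sjis_bytes = japanese.encode("shift_jis")
print(sjis_bytes)   # b'2009\x94N10\x8c\x8e'

# What a latin1 column "sees": one Western character per byte.
# Assumption: MySQL's latin1 maps these bytes like Windows-1252.
as_latin1 = sjis_bytes.decode("cp1252")
print(as_latin1)    # 2009”N10ŒŽ  (compare the question's output)

# The fix in miniature: get the raw bytes back and decode them as Shift_JIS.
recovered = as_latin1.encode("cp1252").decode("shift_jis")
print(recovered)    # 2009年10月
```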
I have a large system based on MySQL 5.5.57 and PHP 5.6.37.
Currently the whole system works in utf8, including SET NAMES utf8 at the beginning of each DB connection.
I need to support emoji in one of the tables, so I need to switch it to utf8mb4. I don't want to switch the other tables.
My question is: if I change to SET NAMES utf8mb4 for all connections (both utf8 and utf8mb4) and switch only that specific table to utf8mb4 (and only write mb4 data to this table), will the rest of the system work as before?
Can there be any issue with using SET NAMES utf8mb4 against the utf8 tables/data/connections?
I think there should be no problem using SET NAMES utf8mb4 for all connections.
(utf8mb3 is a synonym of utf8 in MySQL; I'll use the former for clarity.)
utf8mb3 is a subset of utf8mb4, so your client's bytes will be happy either way (except for Emoji, which need utf8mb4). When the bytes get to (or come from) a column that is declared only utf8mb3, there will be a check to verify that you are not storing Emoji or certain Chinese characters (anything that needs 4 bytes in UTF-8), but otherwise it goes through with minimal fuss.
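The check described above boils down to whether every character fits in three UTF-8 bytes (code points up to U+FFFF). A minimal Python sketch of that rule; the function name is hypothetical, and whether MySQL then rejects the value or substitutes `?` depends on its SQL mode:

```python
# Sketch: would a string fit in a utf8mb3 (MySQL "utf8") column?
# utf8mb3 stores at most 3 bytes per character; Emoji and some CJK
# ideographs need the 4-byte form that only utf8mb4 accepts.

def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to 3 or fewer UTF-8 bytes."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

print(fits_utf8mb3("Grüße, 日本"))            # True
print(fits_utf8mb3("thumbs up \U0001F44D"))  # False: U+1F44D needs 4 bytes
print(len("\U0001F44D".encode("utf-8")))     # 4
```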
I suggest
ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4
as the 'right' way to convert a table. However, it converts all varchar/text columns. This may be bad...
If you JOIN a converted table to an unconverted table, you will be comparing a utf8mb3 string to a utf8mb4 string. MySQL will throw up its hands and convert all rows from one to the other; that is, no INDEX will be useful.
So be sure to at least be consistent about any columns that are involved in JOINs.
I created a table with columns of different collations, including:
utf8_persian_ci
cp1256
Why different collations? Because some of the columns hold addresses, and the charset PHP needs in order to create Persian folders/files is windows-1256, so I thought I needed to set the charset to cp1256 for saving paths into MySQL.
When I fetch rows from the table to show in PHP, it shows ???? instead of Farsi characters. My default charset is set to UTF-8.
So what is the problem: are the rows stored as ???, or does PHP show ??? instead of the Persian words?
The following screenshot is a good introduction to the issue:
It is from phpMyAdmin and shows the topics table of phpBB3. In the same table there are two columns: one, topic_title, renders its text in the wrong encoding, while the other, topic_first_poster_name, renders its text correctly.
In the conversion script I set $encoding to windows-1256, as advised, because my previous VB forum was using windows-1256.
The table in the screenshot has utf8_bin collation; the topic_title column's collation is utf8_unicode_ci, while topic_first_poster_name's is utf8_bin.
What I need is to convert the text of topic_title so that it renders correctly, because right now phpBB3 renders it wrong.
I tried the hint in this article about fixing column encoding, but I was unable to determine which encoding I have to use:
UPDATE table SET column=CONVERT(CONVERT(CONVERT(column USING binary) USING utf8) USING cp1251) WHERE id=123;
I tried the following with cp1256, but got no result:
UPDATE t_topics SET topic_title=CONVERT(CONVERT(CONVERT(topic_title USING binary) USING utf8) USING cp1256) WHERE topic_id=2;
Update:
When I alter the charset, i.e. change it to cp1256 first and then to utf8, the field text becomes like the following, which is also wrong:
Update 2:
Using the following in the application's viewtopic.php solves the problem in the browser window:
'TOPIC_TITLE' => iconv( "UTF-8","Windows-1256//TRANSLIT", utf8_encode($topic_data['topic_title']))
However, what does this indicate about how to fix the issue in the database field itself?
I have a MySQL database (not mine). In this database all the encodings are set to UTF-8, and I connect with charset UTF-8. But when I try to read from the database I get this:
×¢×?ק 1
בית ×ª×•×’× ×” העוסק במספר שפות ×ª×•×›× ×”
× × ×œ× ×œ×¤× ×•×ª ×חרי 12 בלילה ..
What I am supposed to get:
עסק 1
בית תוגנה העוסק במספר שפות תוכנה
נא לא לפנות אחרי 12 בלילה ..
When I look in phpMyAdmin, I see the same thing (the connection in phpMyAdmin is set to UTF-8).
I know that the data is supposed to be in Hebrew. Does anyone have an idea how to fix this?
You appear to have UTF-8 data that was treated as Windows-1252 and subsequently converted to UTF-8 (sometimes referred to as "double-encoding").
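A minimal Python sketch of how such double-encoding arises. It assumes, as MySQL documents for its latin1 character set, that MySQL's latin1 is Windows-1252 with the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) mapped straight to the same code points. The helper names are hypothetical:

```python
# Sketch of "double-encoding": UTF-8 bytes misread as latin1, then re-encoded as UTF-8.
# Assumption: MySQL's latin1 = cp1252, plus pass-through for cp1252's undefined bytes.

def mysql_latin1_decode(raw: bytes) -> str:
    """Decode bytes roughly the way MySQL's latin1 would."""
    chars = []
    for b in raw:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))  # undefined in cp1252: code point equals the byte
    return "".join(chars)

def double_encode(text: str) -> bytes:
    """UTF-8 encode, misread the bytes as latin1, then UTF-8 encode again."""
    return mysql_latin1_decode(text.encode("utf-8")).encode("utf-8")

print("עסק 1".encode("utf-8").hex().upper())  # D7A2D7A1D7A72031
print(double_encode("עסק 1").hex().upper())   # C397C2A2C397C2A1C397C2A72031
```

The second value is the kind of thing `SELECT HEX(...)` shows for a double-encoded column; the first is what correctly stored UTF-8 looks like.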
The first thing that you need to determine is at what stage the conversion took place: before the data was saved in the table, or upon your attempts to retrieve it? The easiest way is often to SELECT HEX(the_column) FROM the_table WHERE ... and manually inspect the byte-encoding as it is currently stored:
If, for the data above, you see C397C2A2... then the data is stored erroneously (an incorrect connection character set at the time of data insertion is the most common culprit); it can be corrected as follows (being careful to use data types of sufficient length in place of TEXT and BLOB as appropriate):
Undo the conversion from Windows-1252 to UTF-8 that caused the data corruption:
ALTER TABLE the_table MODIFY the_column TEXT CHARACTER SET latin1;
Drop the erroneous encoding metadata:
ALTER TABLE the_table MODIFY the_column BLOB;
Add corrected encoding metadata:
ALTER TABLE the_table MODIFY the_column TEXT CHARACTER SET utf8;
See it on sqlfiddle.
Be careful to insert any future data correctly, or else the table will be partly encoded one way and partly another (which can be a nightmare to fix).
If you're unable to modify the database schema, the records can be transcoded to the correct encoding on the fly with CONVERT(BINARY CONVERT(the_column USING latin1) USING utf8) (see it on sqlfiddle), but I strongly recommend that you fix the database when possible instead of leaving it containing broken data.
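The on-the-fly expression above can be mirrored in Python to check a repair before running it against the table. This is a sketch under the same assumption about MySQL's latin1 (Windows-1252 plus five pass-through bytes); `undo_double_encoding` is a hypothetical helper, and the sample bytes are the double-encoded form of `עסק 1`:

```python
# Sketch mirroring CONVERT(BINARY CONVERT(col USING latin1) USING utf8):
# turn each character back into its single latin1 byte, then read the bytes as UTF-8.

PASS_THROUGH = (0x81, 0x8D, 0x8F, 0x90, 0x9D)  # undefined in cp1252; MySQL maps them 1:1

def mysql_latin1_encode(text: str) -> bytes:
    """Encode text roughly as MySQL's latin1 would (raises if a character has no byte)."""
    out = bytearray()
    for ch in text:
        if ord(ch) in PASS_THROUGH:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")
    return bytes(out)

def undo_double_encoding(stored: bytes) -> str:
    """Given double-encoded column bytes, recover the original text."""
    misread = stored.decode("utf-8")                     # what the column "contains" now
    return mysql_latin1_encode(misread).decode("utf-8")  # original bytes, read correctly

stored = bytes.fromhex("C397C2A2C397C2A1C397C2A72031")
print(undo_double_encoding(stored))  # עסק 1
```

A value that does not survive this round trip (an encode or decode error) is a hint that the row was not double-encoded in the first place.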
However, if you see D7A2D73F... then the data is stored correctly and the corruption is taking place upon data retrieval; you will have to perform further tests to identify the exact cause. See UTF-8 all the way through for guidance.
I have a database that contains Spanish characters. To populate the database, I get the values from a client page whose character encoding is UTF-8. When I insert the values into the MySQL database, the rows contain altered data: for example, if I insert 'México', the entry in the database shows up garbled instead of as 'México'. The impact is that when I query the table for 'México', I get no results. My question is: how do you insert Spanish or other Latin accented characters into a MySQL database? I have tried all the collations, htmlentities(), etc., but nothing works!
When making the MySQL query, I checked what data is being sent and it is in its correct form, 'México', but when I look at the table entry in phpMyAdmin, it's altered!
Change your MySQL database/table/column encoding to UTF-8 (and also set the collation to a compatible value).
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE mytable
MODIFY country CHAR(50)
CHARACTER SET utf8 COLLATE utf8_general_ci;
Also specify the character set on the PHP side when connecting.
mysql_set_charset('utf8',$conn);
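The symptom in the question matches UTF-8 bytes being stored as if they were latin1 characters, which is why both the column and the connection need to be utf8. A small Python sketch of that mismatch, assuming the page sends UTF-8 while the connection is treated as latin1 (approximated here with Windows-1252):

```python
# Sketch: 'México' sent as UTF-8 but interpreted as latin1 (cp1252 here).
typed = "México"
utf8_bytes = typed.encode("utf-8")       # b'M\xc3\xa9xico' (é is two bytes)
as_latin1 = utf8_bytes.decode("cp1252")  # how a latin1 connection reads those bytes
print(as_latin1)                         # MÃ©xico
print(as_latin1 == typed)                # False: the stored text is no longer 'México'

# With utf8 on both sides, the text round-trips unchanged.
print(utf8_bytes.decode("utf-8") == typed)  # True
```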
Take a look at this article for further info and a script to batch change every table / column in a database.
I just wasted 4 hours on the same problem as you.
Everything is in UTF-8
E.g.:
<meta charset="UTF-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
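The MySQL database is in UTF-8, and PHP is in UTF-8.
After four hours, and on many occasions tearing my hair out, I found a solution that works:
// Read the row's bytes as ISO-8859-1 and turn accented characters into HTML entities
$content = htmlentities($row['mystring'], ENT_QUOTES, "ISO-8859-1");
// Decode the entities back into characters, this time encoded as UTF-8
$content = html_entity_decode($content, ENT_QUOTES, "UTF-8");
It takes the accented characters, converts them to HTML entities, and then converts them back into UTF-8.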
This is a great hack and it works perfectly.
For some inexplicable reason, the data in my MySQL database is not in UTF-8 format, even though I went to extreme measures like exporting the data to a text file in phpMyAdmin, saving it as UTF-8 in a text editor, and re-importing it. None of this worked.
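In effect, the two calls in that hack convert ISO-8859-1 bytes into UTF-8 text, which suggests (an assumption, not confirmed in the post) that the rows reach PHP as ISO-8859-1, for example over a connection that was never switched to utf8. The Python sketch below mimics the round trip with the standard `html` module; `htmlentities_latin1` is a hypothetical, rough analogue of PHP's `htmlentities()` with `ENT_QUOTES` and `"ISO-8859-1"`, not a port of it:

```python
import html
from html.entities import codepoint2name

def htmlentities_latin1(raw: bytes) -> str:
    """Rough analogue of htmlentities($s, ENT_QUOTES, "ISO-8859-1"):
    read the bytes as ISO-8859-1 and replace characters that have a named entity."""
    out = []
    for ch in raw.decode("iso-8859-1"):
        if ch == "'":
            out.append("&#039;")  # ENT_QUOTES also escapes single quotes
        elif ord(ch) in codepoint2name:
            out.append(f"&{codepoint2name[ord(ch)]};")
        else:
            out.append(ch)
    return "".join(out)

row_value = b"M\xe9xico"                 # hypothetical ISO-8859-1 bytes from the database
entities = htmlentities_latin1(row_value)
print(entities)                          # M&eacute;xico
decoded = html.unescape(entities)        # analogue of html_entity_decode()
print(decoded.encode("utf-8"))           # b'M\xc3\xa9xico'

# Same result as a direct ISO-8859-1 -> UTF-8 conversion:
print(decoded == row_value.decode("iso-8859-1"))  # True
```

If that diagnosis holds, setting the connection charset (for example with `mysql_set_charset('utf8', $conn)`) should make the workaround unnecessary.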
Check two things first:
Are you inserting the data as UTF-8? If the data is coming from a web page, make sure the page's encoding is set to UTF-8 (the encoding meta tag is set in the page header).
Are you sure the data is not saved as Unicode? This is the reverse situation: if phpMyAdmin uses something other than UTF-8, you'd see garbled characters when it displays the contents.