I am migrating a MySQL database from one site to another.
Its encoding: utf8
Its connection encoding: utf8_unicode_ci
The encoding used in the php files of that site: utf-8 without BOM
The encoding in the headers for every page in that site: utf-8
Everything works fine in that site.
Then I exported the database using phpmyadmin.
It generated a .sql file, encoded with utf-8, and when I open it everything is fine.
Then I copied that file to the new site, which uses the same encoding for everything, and imported it.
When I display the old site's data on the new one, through a web page, it shows broken characters. E.g.: ™ => �.
If I turn the encoding of the browser from utf-8 to iso-8859-1, I see the correct symbol.
Everything else in the new site works fine; I have no encoding problems after saving data to the database and pulling it back. The only strange thing is that when I browse the stored data, phpmyadmin shows broken characters, but I don't have that problem when displaying the content on the website.
I did the import with two different programs: phpmyadmin and webmin.
So I have no clue about what is wrong here, any thoughts?
How should I have configured the encodings so that this wouldn't have happened?
Maybe on the first site you didn't set the connection encoding (see the output of mysql_client_encoding() in PHP).
If that is the problem, you stored your data in the wrong format, but you were also converting it back with the same misbehaviour, so it still displayed correctly.
P.S. utf8_unicode_ci is not an encoding, it is a collation (how your strings are ordered).
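As a rough sketch (credentials are placeholders), with mysqli you can check what charset the connection is actually using and pin it to utf8 before running any queries:

$db = new mysqli('localhost', 'user', 'password', 'mydb');   // placeholder credentials
echo $db->character_set_name(), "\n";   // e.g. "latin1" here would explain broken characters
$db->set_charset('utf8');               // force the connection to utf8 before reading or writing
echo $db->character_set_name(), "\n";   // should now print "utf8"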
There is a pretty good FAQ on charsets and encodings in PHP.
This happened to me before with an old site: it was in a different collation, so once I tried to change it to utf8 it showed strange symbols all over.
Sometimes the back end in phpmyadmin shows the strange characters, but the site displays fine.
I have a website with Arabic content that has been migrated from a different server. On the old server everything displayed correctly; supposedly everything was encoded in UTF-8.
On the current server, the data started displaying incorrectly, showing garbled characters in place of Arabic strings such as نبذة عن.
The application is built on the CakePHP framework.
After many trials, I changed the 'encoding' parameter in the MySQL connection array to 'latin1'. (For people who don't know CakePHP, this sets MySQL's connection encoding.) Setting this value to UTF8 did not change anything, even after the steps described below.
Some of the records started showing correctly in Arabic, while others remained gibberish.
I have already gone through all the database and server checks, confirming that:
The database created is UTF-8.
The table is UTF-8.
The columns are not explicitly set to any encoding, thus encoded in UTF-8.
Default Character set in PHP is UTF-8
mysql.cnf settings default to UTF-8
After that, I retrieved my data and looped through it, printing the encoding of each string (from each row) using mb_detect_encoding. The rows that display correctly return UTF-8, while the corrupt rows return nothing.
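A sketch of that kind of check (table and column names are made up; passing true as the third argument makes mb_detect_encoding return false instead of guessing):

$db = new mysqli('localhost', 'user', 'password', 'mydb');   // placeholder credentials
$db->set_charset('utf8');
$result = $db->query("SELECT id, title FROM articles");      // hypothetical table and columns
while ($row = $result->fetch_assoc()) {
    $enc = mb_detect_encoding($row['title'], array('UTF-8', 'ISO-8859-1'), true);
    echo $row['id'], ': ', ($enc === false ? 'nothing detected' : $enc), "\n";
}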
The data of the website has been edited at multiple times, possibly with different encodings; this is something I cannot know for sure. What I can confirm, though, is that the only two encodings this data might have passed through are UTF-8 and latin1.
Is there any possible way to recover the data when mb_detect_encoding is not returning anything and the current dataset is unknown?
UPDATE: I have found out that while the database was active on the new server, the my.cnf was updated.
The following directive was changed from:
character-set-server=utf8
to:
default-character-set=utf8
I am not sure how much this makes a difference though.
Checking the modified dates, I can conclude with a fair degree of certainty that the data I could recover was not edited on the new server, while the data I could not recover has been edited there.
Try to fix the problem from the DB side, not from PHP or the DB connection.
I advise you to go to your old server and export your DB again with the character set UTF8.
Then, after importing it on the new server, make sure you can see the Arabic characters inside the tables (with phpmyadmin).
If your tables look fine, then you can move on to check the next things:
DB connection
PHP file encoding
the header encoding in the HTML
As far as I know, if the problem is in the DB, there is no way around exporting the data again from the old server.
Edit:
If you do not have access to your old DB, please check this answer; it may help you.
You were expecting نبذة عن? Mojibake. See duplicate for discussion and solution, including how to recover the data via a pair of ALTER TABLEs.
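For reference, a sketch of what such a pair of ALTERs typically looks like, assuming the underlying problem is a latin1-declared column that already holds UTF-8 bytes (table, column and length are placeholders; back up the table first):

$db = new mysqli('localhost', 'user', 'password', 'mydb');   // placeholder credentials
// Step 1: switch the column to a binary type so the bytes stop being interpreted as latin1.
$db->query("ALTER TABLE articles MODIFY body VARBINARY(255)");
// Step 2: re-declare the same bytes as utf8; nothing is converted, only the declared charset changes.
$db->query("ALTER TABLE articles MODIFY body VARCHAR(255) CHARACTER SET utf8");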
I had a similar problem when migrating database tables encoded with utf8 from a public server to localhost. The fix was setting the connection encoding on the localhost server from PHP with
$db->set_charset("utf8")
right after the mysqli connection.
Now it works properly.
My site is to be in Turkish, and I've created a locale file in app/Locale/tur/LC_MESSAGES/default.po
I've set the configuration Configure::write('Config.language','tr'); in my App controller's beforeFilter. It is read from the intended .po file. However, the characters are garbled when shown. Example: Ürünler is shown as �r�nler.
I've set the character encoding to utf8 in the page headers. The database encoding works fine. If I echo Ürünler as a literal string it also works fine. It is only when the text comes from the PO file that there is a problem.
I am developing my site in CakePHP 2.3.2. I've done many many multilingual sites in Cake but never faced this problem.
My PO file is okay; I even tried one of the PO files that works fine in my past projects, and it still doesn't work.
Any help appreciated. Thanks!!
It is not enough to set your headers to utf8. You also need to save the files that contain utf8 characters as utf8. So check your file and make sure this is the case (utf8 without BOM!).
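If it helps, a quick way to check for the BOM from PHP (the path is the one from the question):

$bom = file_get_contents('app/Locale/tur/LC_MESSAGES/default.po', false, null, 0, 3);
if ($bom === "\xEF\xBB\xBF") {
    echo "default.po starts with a UTF-8 BOM -- re-save it without one.\n";
} else {
    echo "No UTF-8 BOM found.\n";
}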
Please make sure that default.po is saved as Unicode (UTF-8).
In Dreamweaver CS6: open the file -> Modify -> Page Properties -> Document encoding -> select Unicode (UTF-8).
We imported a website from another server to our server. The code and database is 100% the same.
But the text on the website seems to have a wrong encoding.
Example:
In the database the word "Australië" is stored as "AustraliĂŤ", while on the website it is shown as Australi??.
I can fix the ?? by adding mysql_set_charset("utf8", $this->db); after the database connection.
But then it is shown the way it appears in the database, "AustraliĂŤ", which is incorrect. I tried different encodings in Apache, after the database connection, and in meta tags.
The easiest way would be to change the data in the database, but there is too much data in it to do this.
Does anyone have a solution for this problem? I have been searching and trying a lot of things for hours.
You could try to:
set the MySQL connection collation to utf8_general_ci in the database
run SET NAMES 'utf8' and SET COLLATION_CONNECTION=utf8_unicode_ci in your PHP files (see the sketch after this list)
make sure all your PHP files are saved with UTF-8 encoding and do not feature a BOM
make sure the cells in your table are utf8_general_ci
make sure that MySQL charset is UTF-8 Unicode (utf8)
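A rough sketch of the SET NAMES / SET COLLATION_CONNECTION step with mysqli (connection details are placeholders):

$db = new mysqli('localhost', 'user', 'password', 'mydb');   // placeholder credentials
$db->query("SET NAMES 'utf8'");
$db->query("SET collation_connection = 'utf8_unicode_ci'");
// mysqli's own call does the same and also informs the client library:
$db->set_charset('utf8');

In practice $db->set_charset('utf8') is usually preferable to a bare SET NAMES, because it also updates the charset mysqli uses for mysqli_real_escape_string().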
This is what I have. With this setup I see all characters in the database (phpMyAdmin) as they really appear on the website itself.
I encountered a similar issue when I had a mismatch of encodings, i.e. I was saving data to a UTF-8 database from an ISO-8859-1 encoded site...
Hope this helps you.
I have two tables here. One is in UTF-8 and holds Arabic text that reads correctly. The other one has a different encoding; its content is Arabic as well, but in the database it is displayed as
ÈöÓúãö Çááøåö ÇáÑøóÍúãóäö ÇáÑøóÍöíãö
I have to show data from both tables on the same page. The page is UTF-8 encoded, but I'm not sure whether this can be done at all. What do I do? My database is MySQL and I'm using PHP.
Is it possible to convert the encoding of the contents of the other table into UTF-8, by the way?
You have to use mb_convert_encoding() first, on everything, to make sure it's all in UTF-8 to begin with. http://us3.php.net/manual/en/function.mb-convert-encoding.php Then it should display, assuming your HTML's charset is UTF-8 and the users have the appropriate fonts installed.
Also, virtually all consoles and a great many free online SQL front ends (like phpMyAdmin) are not UTF-8 aware and print out gibberish. I have not yet found a free SSH client that supports UTF-8; if it's a big deal, invest in SecureCRT.
EDIT:
Excuse me. I don't read Arabic at all, but I did get Arabic back. please tell me if this is the correct text, and if so, accept this answer ;_)
ب?س?ك? افف?م? افر??ح?ك?ل? افر??ح?ٍك?
The code I used to get this was:
header('Content-Type: text/html;charset=utf-8');
echo mb_convert_encoding('ÈöÓúãö Çááøåö ÇáÑøóÍúãóäö ÇáÑøóÍöíãö', 'utf-8', 'iso-8859-6');
I found the Arabic encoding via this page: http://a4esl.org/c/charset.html
Cheers!
I am trying to debug a nasty utf-8 problem, and do not know where to start.
A page contains the word 'categorieën', which should be categorieën. Clearly something is wrong with the UTF-8. This happens with all such multibyte characters. I have scanned the gazillion topics here on UTF-8, but they mostly cover the basics, not this situation where everything appears to be configured correctly, yet clearly is not.
The pages are served by Drupal, from a MySQL database.
The database was migrated (not by me) by SQL-dumping and importing through phpmyadmin. There is a good chance something went wrong there, because there was no problem before, and because the problem occurs only on older, imported items. Editing these items or inserting new ones, and fixing the wrongly encoded characters by hand, fixes the problem, though I cannot see a difference in the database.
Content re-edited through Drupal does not have this problem.
When I read that text out with MySQL on the CLI, I get the correct ë character, for both the articles that render "correct" characters and those that render "incorrect" ones.
The tables have collation utf8_general_ci
Headers appear to be sent with correct encoding: Vary Accept-Encoding and Content-Type text/html; charset=utf-8
HTML head contains a <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
the HTTP headers tell me there is a Varnish proxy in between. Could that cause UTF-8 conversion/breakage?
content is served gzipped, which is normal in Drupal, and I have never seen this UTF-8 issue related to gzipping, but you never know.
It appears the import is the culprit and I would like to know
a) what went wrong.
b) why I cannot see a difference in the mysql cli client between "wrong" and "correct" characters
c) how to fix the database, or where to start looking and learning on how to fix it.
The dump file was probably output as UTF-8, but interpreted as latin1 during import.
The ë you see is the two-byte UTF-8 sequence for ë read as latin1 characters, and those characters are now physically stored in your tables as UTF-8 data.
Seeing as you have a mix of intact and broken data, this will be tough to fix in a general way, but usually, this dirty workaround* will work well:
UPDATE table SET column = REPLACE(column, 'ë', 'ë');
Unless you are working with languages other than dutch, the range of broken characters should be extremely limited and you might be able to fix it with a small number of such statements.
Related questions with the same problem:
Detecting utf8 broken characters in MySQL
I need help fixing Broken UTF8 encoding
* (of course, don't forget to make backups before running anything like this!)
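Before running such an UPDATE, it can be reassuring to count the affected rows first (a sketch; table and column names are placeholders):

$db = new mysqli('localhost', 'user', 'password', 'mydb');   // placeholder credentials
$db->set_charset('utf8');
$result = $db->query("SELECT COUNT(*) AS broken FROM articles WHERE body LIKE '%ë%'");
$row = $result->fetch_assoc();
echo $row['broken'], " rows still contain the broken sequence\n";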
Nothing should have gone AWOL when exporting and importing a Drupal dump, unless the person doing it somehow managed to set the export to something other than UTF-8. We export/import dumps a lot and have never bumped into such a problem.
Hopefully Pekka's answer will help you resolve the issue if it is in the DB, but I also thought you could check whether the data shown on the web page is being run through some PHP functions that aren't multibyte friendly.
Here are the multibyte equivalents of the normal string functions: http://php.net/manual/en/ref.mbstring.php
P.S. If you have recently moved your site to another server (so it's not just a DB import), you should check what headers your site is sending out with a tool such as http://www.webconfs.com/http-header-check.php
Make sure the last row of that output (the Content-Type header) says UTF-8.
You mention that the import might be the problem. In that case it's possible that the connection between the client and the MySQL server wasn't using UTF-8 during the import. I've had this problem a couple of times in the past, so I'd like to share these MySQL settings (in my.cnf) with you:
Under the server settings add these:
# UTF 8
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake
And under the client settings add:
default-character-set=utf8
This might save you some headache the next time.
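To confirm the server really picked up those settings, you can list the character-set variables from PHP (a quick check; connection details are placeholders):

$db = new mysqli('localhost', 'user', 'password', 'mydb');   // placeholder credentials
$result = $db->query("SHOW VARIABLES LIKE 'character_set_%'");
while ($row = $result->fetch_assoc()) {
    // client, connection, database, results and server should all say utf8
    echo $row['Variable_name'], ' = ', $row['Value'], "\n";
}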
To be absolutely sure you have utf8 from start to end (a combined sketch follows this list):
- source code files in utf8 without BOM
- database with utf8 collation
- database tables with utf8 collation
- database connection in utf8 (query it with 'SET CHARSET UTF8')
- pages header set to utf8 (the ajax ones too)
- meta tag to set page in utf8
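Put together, the PHP side of that list can be as small as this (connection details are placeholders):

// Page header in utf8 -- send it before any output; the ajax endpoints need it too.
header('Content-Type: text/html; charset=utf-8');

// Database connection in utf8 (placeholder credentials).
$db = new mysqli('localhost', 'user', 'password', 'mydb');
$db->set_charset('utf8');

// And in the HTML itself:
// <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />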