Website/Database text encoding issue - php

We imported a website from another server to our server. The code and database is 100% the same.
But the text on the website seems to have a wrong encoding.
Example:
In the database the word "Australië" is "AustraliĂŤ" while on the website its shown as Australi??.
I can fix the ?? with adding mysql_set_charset("utf8",$this->db); after the database connection.
But then its shown like in the database like "AustraliĂŤ" wich is incorrect. I tried different encodings in apache, after database and in meta tags.
The easiest way would be to change the data in the database but there is to much data in it to do this.
Anyone has a solution for this problem? Have been searching and trying a lot off things for hours.

You could try to:
set the MySQL connection collation to uft8_general_ci in the database
run SET NAMES 'utf8' and SET COLLATION_CONNECTION=utf8_unicode_ci in your PHP files
make sure all your PHP files are saved with UTF-8 encoding and do not feature a BOM
make sure the cells in your table are utf8_general_ci
make sure that MySQL charset is UTF-8 Unicode (utf8)
This is what I have. With this setup I see all characters in the database (phpMyAdmin) as they really appear on the website itself.
I have encountered a similar issue when I had a mismatch of encodings, i.e. I was saving data to a UTF-8 database by a ISO-8859-1 encoded site...
Hope this helps you.

Related

PHP/MySQL Encoding

I have a website, with arabic content which has been migrated from a different server. On the old server, everything was displaying correctly, supposedly everything was encoded with UTF-8.
On the current server, the data started displaying incorrectly, showing نبذة عن and similar characters.
The application is build on the CakePHP Framework.
After many trials, I changed the 'encoding' parameter in the MySql connection array to become 'latin1'. For the people who don't know CakePHP, this sets MySql's connection encoding. Setting this value to UTF8 did not change anything, even after the steps described below.
Some of the records started showing correctly in Arabic, while others remained gibberish.
I have already gone through all the database and server checks, confirming that:
The database created is UTF-8.
The table is UTF-8.
The columns are not explicitly set to any encoding, thus encoded in UTF-8.
Default Character set in PHP is UTF-8
mysql.cnf settings default to UTF-8
After that, I retrieved my data and looped through it, printing the encoding of each string (from each row) using mb_detect_encoding. The rows that are displaying correctly are returning UTF8 while it is returning nothing for the rows that are corrupt.
The data of the website has been edited on multiple types, possibly with different encodings, this is something I cannot know for sure. What I can confirm though, is that the only 2 encodings that this data might have passed through are UTF-8 and latin1.
Is there any possible way to recover the data when mb_detect_encoding is not returning anything and the current dataset is unknown?
UPDATE: I have found out that while the database was active on the new server, the my.cnf was updated.
The below directive was changed:
character-set-server=utf8
To
default-character-set=utf8
I am not sure how much this makes a difference though.
Checking the modified dates, I can conclude to a certain degree of certainty that the data I could recover was not edited on the new server, while the data I couldn't retrieve has been edited.
Try to fix the problem from DB side .. not from php or DB connection
I advice you to go to your old server and export your DB again with character set UTF8
then after import it to a new server .. be sure that you can see the arabic characters inside the tables(with phpmyadmin)
if your tables looks fine ..
then you can move to check the next
DB connection
php file encoding
the header encoding in html
as I know if the problem from the DB .. there is no way without export the data again from the old server
Edit:
if you do not have access to your old DB please check this answer it can help you
You were expecting نبذة عن? Mojibake. See duplicate for discussion and solution, including how to recover the data via a pair of ALTER TABLEs.
I had a similar problem with migrating database tables encoded with utf8 from a public server to localhost. The resolution was in setting the localhost server encoding using PHP
$db->set_charset("utf8")
right after the mysqli connection.
Now it works properly.

Is MySQL's collation solely used for sorting?

According to the official MySQL manual the collation used defines the order of records when sorting alphabetically:
http://dev.mysql.com/doc/refman/5.0/en/charset-general.html
However: I have a PHP script (UTF-8) and I save some foreign characters in my MySQL database it's saved all weird (first row). This is when the collation I choose is latin1_swedish_ci. When I change the collation to utf8_unicode_ci all is good (second row).
When saving this data everything is exactly the same except for the collation.
So how about that "collation is used solely for sorting records"?
How someone can clarify this for me :-) Thanks in advance!
It appears that the charset of your connection is not set right, therefore the conversion from the programming language charset to the database is not correct.
You should set the charset in your connection, then both will workfine.
as pointed out in the comments a little explanation on how things work.
when you have not set the character set in your connections, the server assumes it to be the same as the collocation of the database. when data is recieved in a another encoding, the data is written nevertheless. just with wrong or other characters than they have been in the encoding of the data from the script.
as long as nothing changes, the script gets back the same data as it has written and everything appears to be fine.
however when either the connection encoding or the database encoding is changed at this point, the already stored data gets converted to the new encoding. the problem here is that the source data is not in the encoding that is assumend when converting.
all encodings share the ascii set with the same bits, thats why ascii charactes dont mess up. only special charaters do.
so you have to set your conneciton encoding in order to dont produce the mess that you are already in.
now what can you do about the data you already have?
you can make a dump of your database using mysqldump and use the --skip-set-charset option. then you get a plaintext file. in this plane text file replace all occurences of the actual database charset with the one the data is really in (the one you had in your script when you wrote the data).
then save the file and make sure your editor does not do any conversion (i recommend vim).
then import that file and you will get a database with data in the correct encoding. then you can change the encoding however you like and as long as your conneciton charset gets set also you will be fine from now on.
also make sure that the mysql server has the charsets installed, but it should have that already.
this is only my approach, i have cleaned up a lot of messed up installations like that. most of which at some point have garbled characters in their projects (after switching server, updating or restoring a backup...).
turns out not setting the connection charset is something that is very often forgotten.

PHP mysql fixed connection to utf8, but now existing greek data is useless

I have a mysql database storing some fields in greek characters. In my html I have charset=utf-8 and my database columns are defined with encoding utf_general_ci. But I was not setting the connection encoding so far. As a result I have a database that doesn't display the greek characters well, but when reading back in PHP, it all shows well.
Now I try to do this the right way, so I added also in my database functions.
$mysqli->set_charset("utf8");
This works great for new entries.
But for existing entries, the problem is that when I read data in PHP, it comes garbled, since now the connection encoding has changed.
Is there a way to fix my data and make them useful again? I can continue working my old way, but I know it's wrong and can cause me more problems in the future.
I solved this issue as follows:
in a PHP script, retrieve the information as I do now, i.e without setting the connection. This way the mistake will be inverted and corrected and in your php file you will have the characters in the correct utf-8 format.
in the same PHP script, write back the information with setting the connection to utf-8
at this point the correct characters are in the database
I changed all my read/write functions of your site to use the utf-8 from now on

Mysql display chinese characters

I have been using php + mysql (phpmyadmin) to construct websites with Chinese contents (utf-8) for a long time.
When inputting forms, and also generate output php from db, the Chinese Words display well; but when I look at the database, although sometimes they are normal chinese characters, but something they are not (become strange strings), that made me notice that, the way that mysql handle and input data is not always utf-8.
Some experts on web mentioned, mysql were used to record the input data by latin1; nevertheless, I note that the existing charset in phpmyadmin is utf-8...
Will there be any solid way to detect the encoding format of chinese characters appeared in a phpmyadmin table cell?
Also, apart from mentioning at header of the page, will there be any method so that I can make sure the data entered to the db is utf-8 but not others?
Thank you.
The biggest problem that people encounter in this regard is that they don't tell MySQL that they're sending/expecting UTF-8 encoded data when connecting to the database, so MySQL thinks it's supposed to handle latin1 encoded data and converts it accordingly. Issue the command SET NAMES utf8 after connecting to the db or use mysql_set_charset.
in my case, it just because htmlentities(); Solution is change echo htmlentities($email_db); to echo htmlentities($email_db, ENT_COMPAT, 'UTF-8');

UTF8 characters not printed as such in Drupals HTML

I am trying to debug a nasty utf-8 problem, and do not know where to start.
A page contains the word 'categorieën', wich should be categorieën. Clearly something is wrong with the UTF-8. This happens with all these multibite characters. I have scanned the gazillion topics here on UTF8, but they mostly cover the basics, not this situation where everything appears to be configured and set correct, but clearly is not.
The pages are served by Drupal, from a MySQL database.
The database was migrated (not by me) by sql-dumping and -importing trough phpmyadmin. Good chance something went wrong there, because before, there was no problem. And because the problem occurs only on older, imported items. Editing these items or inserting new ones, and fixxing the wrongly encoded characters by hand, fixes the problem. Though I cannot see a difference in the database.
Content re-edited trough Drupal does not have this problem.
When, on the CLI, using MySQL, I can read out that text and get the correct ë character. On both The articles that render "correct "and "incorrect" characters.
The tables have collation utf8_general_ci
Headers appear to be sent with correct encoding: Vary Accept-Encoding and Content-Type text/html; charset=utf-8
HTML head contains a <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
the HTTP headers tell me there is a Varnish proxy inbetween. Could that cause UTF8-conversion/breakage
content is served Gzipped, normal in Drupal, and I have never seen this UTF8 issie wrt the gzipping, but you never know.
It appears the import is the culprit and I would like to know
a) what went wrong.
b) why I cannot see a difference in the mysql cli client between "wrong" and "correct" characters
c) how to fix the database, or where to start looking and learning on how to fix it.
The dump file was probably output as UTF-8, but interpreted as latin1 during import.
The ë, the latin1 two-byte representation of UTF-8's ë, is physically in your tables as UTF-8 data.
Seeing as you have a mix of intact and broken data, this will be tough to fix in a general way, but usually, this dirty workaround* will work well:
UPDATE table SET column = REPLACE("ë", "ë", column);
Unless you are working with languages other than dutch, the range of broken characters should be extremely limited and you might be able to fix it with a small number of such statements.
Related questions with the same problem:
Detecting utf8 broken characters in MySQL
I need help fixing Broken UTF8 encoding
* (of course, don't forget to make backups before running anything like this!)
There should have not gone anything awol in exporting and importing a Drupal dump, unless the person doing this somehow succeeded into setting the export as something else than UTF8. We export/import dumps a lot and have never bumped into a such problem.
Hopefully Pekkas answers will help you to resolve the issue, if it is in the DB, but I also thought that you could check wether the data being shown on the web page is being ran through some php functions that arent multibyte friendly.
Here are some equivalents of normal functions in mb: http://php.net/manual/en/ref.mbstring.php
ps. If you have recently moved your site to another server (so it's not just a db import), you should check what headers your site is sending out with a tool such as http://www.webconfs.com/http-header-check.php
Make sure the last row has UTF8 in it.
You mention that the import might be the problem. In that case it's possible that during import the connection with the client and the MySQL server wasn't using UTF-8. I've had this problem a couple of times in the past, so I'd like to share with you these MySQL settings (in my.conf):
Under the server settings add these:
# UTF 8
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake
And under the client settings add:
default-character-set=utf8
This might save you some headache the next time.
To be absolutely sure you have utf8 from start to end:
- source code files in utf8 without BOM
- database with utf8 collation
- database tables with utf8 collation
- database connection in utf8 (query it with 'SET CHARSET UTF8')
- pages header set to utf8 (the ajax ones too)
- meta tag to set page in utf8

Categories