Character Encoding & Databases - php

I am having a big problem with character encoding accross my domains. The big thing for me really is that I don't understand it. I set all my websites to be utf-8 using the meta tag:
<meta charset="UTF-8">
Which seemed to have solved a few problems a while back. Now I am seeing problems with between the website and the database, when a user enters their first or last name, and it has an accent in it, it doesnt display correctly. However I ran the following test.
I created a test table called 'test' (imaginative I know)
I wrote a very tiny script to take the value from a text box and put it into this table and then display the contents of this table each time this page displays, so I could see what is going on.
Here are some screen shots, first, from the output of the page:
And then a screenshot of the database itself:
So the type of column first is VARCHAR(50), I just left the settings as I would do normally, and the character encoding was latin_swedish1 or something. After id 4, I changed it utf8_bin, but that still didnt make a difference.
The problem is, the data still display okay on the website, but looks terrible in the database. Is that a problem, is this how it should be done? I think the problems I the users are complaining about are when it is put into emails and PDFs etc, which I don't think I set character encoding on.
Any help and advice would be greatly welcomed.

Make sure to have the charset is utf8 when you created the table.
create table my_table (
id int primary key,
.....
) engine=innodb charset=utf8;
and make sure that your connection is setting the charset for utf8. it depends on each framework you're using.
You can set internal character to utf-8 /* Set internal character encoding to UTF-8 */
mb_internal_encoding("UTF-8"); check http://php.net/manual/en/function.mb-internal-encoding.php

Try to use utf8_general_ci. This will solve troubles.

When the3 column type was assigned, how was it assigned? I would ensure that it was along the lines of VARCHAR(n) CHARSET utf8 as that would make your column type correct for the UTF8 standard, more normally I see the UCS2 (VARCHAR(n) CHARSET ucs2) which can then cause problems later down the line.
you can set this using NVARCHAR ( see http://dev.mysql.com/doc/refman/5.0/en/charset-national.html) since sql 5.
To change the type of a column on its own you can use this type of command:
Alter table tablenamehere MODIFY columnidhere newdatatypehere;
for example
Alter table mytabelforphp MODIFY oldvarcharcolumnId nvarchar(1024);
Let me know if you need more info:)

also add your character encoding to database connection
if you have mysqli connection use :
mysqli::set_charset
if you use PDO follow this:
PDO_MYSQL DSN

Related

Strings stored in a wrong charset on database

I have a problem with varchar elements stored in a database.
In order to be printed in the correct way they have been stored in a strange collapse that i can not understand, see this example:
Data Stored incorrectly
You can notice that €™ in the strings. It should be an "è". Same things happen for ' and for "à". How can i change the character set of the database in order to display properly the varchar elements?
Currently settings are:
Storage engine: MyISAM
Collation: latin1_swedish_ci
Thank you very much. I really need help. I tried a lot of collations but nothing seems to work.
As far as I know, collation is for ordering and comparison, not recording.
You might be looking for encoding instead as it is responsible for the characters itself.
I believe that your issue is that you have the wrong encoding. Before you you set it to UTF8, please change your collation to utf8_unicode_ci.
Please attempt to run this:
SET NAMES utf8;
That should resolve your issue for new records. The old ones will remain untouched.

PHP 5.6 encoding latin1 MySQL encoding

After migration from PHP 5.3 to PHP 5.6 I have encoding problem. My MySQL database is latin1 and my PHP files are in windows-1251. Now everything is displayed like "ñëåäíèòå àäðåñè" or "�����".
It should be display something in Cyrillic like "кирилица". I've tried mysqli_set_charset but it didn't solve my problem.
First, let's see what you have in the table. Do SELECT col, HEX(col)... to see how these are encoded. Here is the HEX that should be there if it is correctly utf8-encoded:
ñëå --> C3B1C3ABC3A5; кир --> D0BAD0B8D180
If you don't get those, then the problem was on inserting, and we may (or may not) be able to repair the data. If you have C390C2BAC390C2B8C391E282AC for the Cyrillic, then you have "double encoding", and it will take some work to 'fix'.
utf8 needs to be established in about 4 places.
The column(s) in the database -- Use SHOW CREATE TABLE to verify that they are explicitly set to utf8, or defaulted from the table definition. (It is not enough to change the database default.)
The connection between the client and the server. See SET NAMES utf8.
The bytes you have. (This is probably the case.)
If you are displaying the text in a web page, check the <meta> tag.
Halfer is right. Change both your PHP and MySQL encoding, first the PHP with
mb_internal_encoding ("UTF-8");
mb_http_output("UTF-8");
to UTF-8, at the top of your PHP pages.
If you miss out the "UTF-8" and print the output from these finctions, it will show you your current PHP encoding - probably windows-1251
Also note that with MySQL you need to change the character encoding on the row in the table as well as on the table itself overall and on the database itself overall, as the defaults will remain latin1 so any new fields you add would be latin1 without being carefully checked.
If you are trying to save Cryllic text to the database you will need the correct Cryllic character set in the database, rather than latin1

Different Characters returned via remote database connection utf8 than local connection

I am trying to retrieve foreign language UTF8 data via a remote mysql database connection. When I retrieve the data remotely, the utf8 doesn't appear properly in the browser. However, when I retrieve the data via a local database connection, both on the live site, and on the local testing machine, the characters appear correctly in the browser.
My remote connection is from wamp local server to the online live website.
For every page I have set:
header('Content-Type: text/html; charset=utf-8');
I've also tried to set UTF-8 meta tag. I also have UTF8 specified in .htaccess as the default charset.
It's an older website so am still using mysqli. I have also tried setting:
$mysqli->set_charset("utf8");
For example, with remote connection Français is appearing as Français.
I have no idea what to do with this. I have spent hours trying to figure it out, but to no avail. I know it's the norm to ask for code, but there is just so much code, that I can't include it all here.
Thanks!
And your solution is: on the remote database, the data is encoded to utf8 twice, which yields incorrect results. There is no problem in your code, that database is at fault. You can fix it there (if it's a varchar, make a backup first!): convert it to latin1 first, then to binary then to utf8. An working sql fiddle to show you how is here, I'll paste the code here too in case sqlfiddle removes it somewhere in the future:
-- database column correctly defined as utf8
CREATE TABLE base (col VARCHAR(128) CHARSET utf8);
-- wrong data is entered:
INSERT INTO base SELECT UNHEX('4672616EC383C2A7616973');
-- first, convert back to latin-1, we have now proper utf-8 data, but in a latin1 column
ALTER TABLE base MODIFY COLUMN col VARCHAR(128) CHARSET latin1;
-- convert to binary first, so MySQL leaves the bytes as is without conversion
ALTER TABLE base MODIFY COLUMN col VARBINARY(128);
-- then convert to the proper character set, which will leave the bytes once again intact
ALTER TABLE base MODIFY COLUMN col VARCHAR(128) CHARSET utf8;
I made it work by adding the following to the script that calls the remote database:
$mysqli->set_charset("latin1");
I don't know if it's a bit of a hack, because it still means the chars are probably not encoded or collated correctly, but it works. Thanks Wrikken for showing me the character set modifications, I can try to use those here in the future to correct things properly.

UTF8 lithuanian characters unrecognized in MySQL database

I have well known but quite difficult to sort out problem here. And yes I was searching on forum but those threads are old enough so I decided to create new post.
So I built a website using WP and included html FORM in one page. When user fills the form (in his/her language) the values of the fields' go into MySQL database table reg_form.
Everything works, the values are saved, BUT some characters (specific in that language) are not recognized. I tried a lot of different methods to solve this, but nothing can help.
The strangest thing is that if you look at WordPress tables you can find those specific characters are recognizable but not in reg_form table which I created.
I was trying to solve this problem and finally I decided to approach in somehow ridiculous way. I created NEW database, new tables, installed new wordpress, created new form etc.
That‘s what I was doing:
I used this suggestion first:
http://tympanus.net/codrops/2009/08/31/solving-php-mysql-utf-8-issues/
Yes, my files are saved using UTF8 encoding (without BOM). Yes, meta tags are ok. Yes, the FORM uses accept-charset='UTF-8'. Yes, all tables in database use UTF8. Yes, server, database and tables collation is the same “utf8_general_ci”.
Then I tried to insert in my code this:
$conn = mysql_connect($server, $username, $password);
mysql_set_charset("UTF8", $conn);
Then I tried this suggestion
link here: akrabat.com/php/utf8-php-and-mysql/
Then I tried to set Apache's AddDefaultCharset in .htaccess file using this link here: httpd.apache.org/docs/2.0/mod/core.html#AddDefaultCharset
BUT… still the problem remains. I can’t see those specific characters properly – only weird hieroglyphic.
The problem you face has to do with a little specific detail in database character encoding settings and Wordpress.
While Wordpress has a general character encoding setting that normally takes care about database tables as well, it does not care about the default character encoding setting of the database those tables are in.
So when your plugin/code adds a database table your own, you need to take care about the encoding settings as well - because by default they will be the database default you create the table in, which most likely is latin-1 which does not work well for your language.
To set the default character set for the database (replace "wpdb" with your database name if it varies):
ALTER DATABASE wpdb CHARACTER SET utf8 COLLATE utf8_general_ci;
To change the character set for your existing table *"reg_form"*:
ALTER TABLE reg_form CONVERT TO CHARACTER SET charset_name;
Note: Backup your database first.
HOLLY SHIT!! FINALLY! : ))))))))
The problem was that I was using mysqli_ queries. Now I tried to change to mysql_ (notice the change!) queries and it worked!! Two weeks of haaaaard working and researches... Phew!
Now who can explain me properly the reasons of this phenomena? : ))

Black Diamonds that are Fixing themselves in MySQL

I am running into a very strange issue with a site that I am working on. The site is basically a job board where the owner or users can create job listings including a description that ends up being stored into a MySQL text field. What we are experiencing is this, whenever listings from certain sources are entered, they initially end up with the "Black Diamond" with a question mark inside character in place of apostrophes and double spaces. This part I know is an encoding issue and can correct. The real question is this, these black diamonds show when the record is displayed in a MySQL admin tool and when the job listing is viewed in a web browser (simple select statement displays the listing in a PHP app), but after the first time it is viewed, then the problem somehow fixes itself. It is like the running the select then displaying the record updates the job description field and fixes the encoding issues. How could this be? Has anyone ever heard of this or anything similar? I cannot understand how a database field would change without running an update statement...
How are the job listings entered? Are they entered via a web page? If so, what character encoding does the web page use? (This should determine the character encoding of the submitted data AFAIK.) What character set is the connection used to communicate with MySQL? What is the character set of the column the data is stored in? Finally, what is the character encoding of the web page(s) on which the entered data is reviewed?
Here is what I do: I declare all of my pages as UTF-8 encoded, using the following tag at the start of the <head> section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I issue the following command immediately when I connect to MySQL, so as to make sure that MySQL understands the data I send to it will be UTF-8 encoded:
SET NAMES uft8
(Depending on the database abstraction method you use, a special function might be recommended in order to set the connection character set, like mysqli's mysqli_set_charset().)
I also make sure that those columns in which I intend to store UTF-8 data are declared to be UTF-8. You can find out what the character set of a column is by issuing SHOW CREATE TABLE table_name. The character set of the table (which by default is the character set for any column in the table) is displayed at the end. If the character set for the column is different to the default character set for the table then it is displayed as part of the column definition. If you wish to change the character set of a column then you can do so using ALTER TABLE.
If you have not previously taken the steps to handle character sets in your app then you may find that the tables are all using the latin1 character set. If you naively store UTF-8-encoded data (for example) into these columns, you may run into character encoding issues. Changing the column character set using ALTER TABLE does not necessarily fix your old data, because MySQL reads your old data assuming it to be valid latin1-encoded text and converts it to the eqivalent UTF-8 (correctly converting what it has read, but not giving the result you want).
The above steps would hopefully mean that future data will be correctly encoded and correctly displayed, but you may have data already mis-encoded in your database, so be aware that if you follow the above steps and still see older data displaying incorrectly, this may be why. Good luck.
Run into this problem a few years ago... I remember finding those notorious characters, and replacing them in php with a single quote or a double quote... Ofcourse with escaping... A simple preg_replace for those characters will do the trick... Its just an encoding issue...
This page, though geared for wordpress might help
http://codex.wordpress.org/Converting_Database_Character_Sets
I had the same issue (mysql encoding and webpage encoding set to UTF-8 but black diamonds showing up in my query results. I found this snippet while googling but cannot for the life of me find its source to give proper attribution:
if( function_exists('mysql_set_charset') ){
mysql_set_charset('utf8', $db_connection);
}else{
mysql_query("SET NAMES 'utf8'", $db_connection);
}
Anyway, it cleared up the issue for me.

Categories