Strings stored in a wrong charset on database - php

I have a problem with varchar elements stored in a database.
In order to be printed in the correct way they have been stored in a strange collapse that i can not understand, see this example:
Data Stored incorrectly
You can notice that €™ in the strings. It should be an "è". Same things happen for ' and for "à". How can i change the character set of the database in order to display properly the varchar elements?
Currently settings are:
Storage engine: MyISAM
Collation: latin1_swedish_ci
Thank you very much. I really need help. I tried a lot of collations but nothing seems to work.

As far as I know, collation is for ordering and comparison, not recording.
You might be looking for encoding instead as it is responsible for the characters itself.
I believe that your issue is that you have the wrong encoding. Before you you set it to UTF8, please change your collation to utf8_unicode_ci.
Please attempt to run this:
SET NAMES utf8;
That should resolve your issue for new records. The old ones will remain untouched.

Related

can not synch PHP and sql charsets

I have a confusing problem with php and css.
I created a database in phpmyadmin. All fields collation is utf8_general_ci. Also my table and database use the same collation as the fields. I tried every way to store data in database like using VALUES (N'content').
the most confusing problem is when I insert data using php my admin it shows at the webpage with question marks. When I store with PHP with the same SQL code it shows fine at the webpage but in the database strings shows like this:
تست Ùارسی
I really tried everything, can anyone help me through this problem?
utf8_general_ci isn't a character set, it's a collation. The collation determines how MySQL will decide how to order things when you ask it to sort alphabetically, it doesn't set the actual charset. Look at the table and the database character set, it should be UTF8 or UTF8MB4. If it's not then you've got a character set mismatch.

Character Encoding & Databases

I am having a big problem with character encoding accross my domains. The big thing for me really is that I don't understand it. I set all my websites to be utf-8 using the meta tag:
<meta charset="UTF-8">
Which seemed to have solved a few problems a while back. Now I am seeing problems with between the website and the database, when a user enters their first or last name, and it has an accent in it, it doesnt display correctly. However I ran the following test.
I created a test table called 'test' (imaginative I know)
I wrote a very tiny script to take the value from a text box and put it into this table and then display the contents of this table each time this page displays, so I could see what is going on.
Here are some screen shots, first, from the output of the page:
And then a screenshot of the database itself:
So the type of column first is VARCHAR(50), I just left the settings as I would do normally, and the character encoding was latin_swedish1 or something. After id 4, I changed it utf8_bin, but that still didnt make a difference.
The problem is, the data still display okay on the website, but looks terrible in the database. Is that a problem, is this how it should be done? I think the problems I the users are complaining about are when it is put into emails and PDFs etc, which I don't think I set character encoding on.
Any help and advice would be greatly welcomed.
Make sure to have the charset is utf8 when you created the table.
create table my_table (
id int primary key,
.....
) engine=innodb charset=utf8;
and make sure that your connection is setting the charset for utf8. it depends on each framework you're using.
You can set internal character to utf-8 /* Set internal character encoding to UTF-8 */
mb_internal_encoding("UTF-8"); check http://php.net/manual/en/function.mb-internal-encoding.php
Try to use utf8_general_ci. This will solve troubles.
When the3 column type was assigned, how was it assigned? I would ensure that it was along the lines of VARCHAR(n) CHARSET utf8 as that would make your column type correct for the UTF8 standard, more normally I see the UCS2 (VARCHAR(n) CHARSET ucs2) which can then cause problems later down the line.
you can set this using NVARCHAR ( see http://dev.mysql.com/doc/refman/5.0/en/charset-national.html) since sql 5.
To change the type of a column on its own you can use this type of command:
Alter table tablenamehere MODIFY columnidhere newdatatypehere;
for example
Alter table mytabelforphp MODIFY oldvarcharcolumnId nvarchar(1024);
Let me know if you need more info:)
also add your character encoding to database connection
if you have mysqli connection use :
mysqli::set_charset
if you use PDO follow this:
PDO_MYSQL DSN

Is MySQL's collation solely used for sorting?

According to the official MySQL manual the collation used defines the order of records when sorting alphabetically:
http://dev.mysql.com/doc/refman/5.0/en/charset-general.html
However: I have a PHP script (UTF-8) and I save some foreign characters in my MySQL database it's saved all weird (first row). This is when the collation I choose is latin1_swedish_ci. When I change the collation to utf8_unicode_ci all is good (second row).
When saving this data everything is exactly the same except for the collation.
So how about that "collation is used solely for sorting records"?
How someone can clarify this for me :-) Thanks in advance!
It appears that the charset of your connection is not set right, therefore the conversion from the programming language charset to the database is not correct.
You should set the charset in your connection, then both will workfine.
as pointed out in the comments a little explanation on how things work.
when you have not set the character set in your connections, the server assumes it to be the same as the collocation of the database. when data is recieved in a another encoding, the data is written nevertheless. just with wrong or other characters than they have been in the encoding of the data from the script.
as long as nothing changes, the script gets back the same data as it has written and everything appears to be fine.
however when either the connection encoding or the database encoding is changed at this point, the already stored data gets converted to the new encoding. the problem here is that the source data is not in the encoding that is assumend when converting.
all encodings share the ascii set with the same bits, thats why ascii charactes dont mess up. only special charaters do.
so you have to set your conneciton encoding in order to dont produce the mess that you are already in.
now what can you do about the data you already have?
you can make a dump of your database using mysqldump and use the --skip-set-charset option. then you get a plaintext file. in this plane text file replace all occurences of the actual database charset with the one the data is really in (the one you had in your script when you wrote the data).
then save the file and make sure your editor does not do any conversion (i recommend vim).
then import that file and you will get a database with data in the correct encoding. then you can change the encoding however you like and as long as your conneciton charset gets set also you will be fine from now on.
also make sure that the mysql server has the charsets installed, but it should have that already.
this is only my approach, i have cleaned up a lot of messed up installations like that. most of which at some point have garbled characters in their projects (after switching server, updating or restoring a backup...).
turns out not setting the connection charset is something that is very often forgotten.

Converting latin1_swedish_ci to utf8 with PHP

I have a database filled with values like ♥•â—♥ Dhaka ♥•â—♥ (Which should be ♥•●♥ Dhaka ♥•●♥) as I didnt specify the collation while creating the database.
Now I want to Fix it. I cannot fetch the data again from where I got it from at the first place. So I was thinking if it might be possible to fetch the data in a php script and convert it to the correct characters.
I've changed the collation of the database and the fields to utf8_general_ci..
The collation is NOT the same as the character set. The collation is only used for sorting and comparison of text (that's why there's a language term in there). The actual character set may be different.
The most common failure is not in the database but rather in the connection between PHP and MySQL. The default charset for the connection is usually ISO-8859-1. You need to change that the first thing you do after connecting, using either the SQL query SET NAMES 'utf-8'; or the mysql_set_charset function.
Also check the character set of your tables. This may be wrong as well if you have not specified UTF-8 to begin with (again: this is not the same as the collation). But make sure to take a backup before changing anything here. MySQL will try to convert the charset from the previous one, so you may need to reload the data from backup if you have actually saved UTF-8 data in ISO-8859-1 tables.
I would look into mb_detect_encoding() and mb_convert_encoding() and see if they can help you.

Help with multi-lingual text, php, and mysql

I have had no end of problems trying to do what I thought would be relatively simple:
I need to have a form which can accept user input text in a mix of English an other languages, some multi-byte (ie Japanese, Korean, etc), and this gets processed by php and is stored (safely, avoiding SQL injection) in a mysql database. It also needs to be accessed from the database, processed, and used on-screen.
I have it set up fine for Latin chars but when I add a mix of Latin andmulti-byte chars it turns garbled.
I have tried to do my homework but just am banging my head against a wall now.
Magic quotes is off, I have tried using utf8_encode/decode, htmlentities, addslashes/stripslashes, and (in mysql) both "utf8_general_ci" and "utf8_unicode_ci" for the field in the table.
Part of the problem is that there are so many places where I could be messing it up that I'm not sure where to begin solving the problem.
Thanks very much for any and all help with this. Ideally, if someone has working php code examples and/or knows the right mysql table format, that would be fantastic.
Here is a laundry list of things to check are in UTF8 mode:
MySQL table encoding. You seem to have already done this.
MySQL connection encoding. Do SHOW STATUS LIKE 'char%' and you will see what MySQL is using. You need character_set_client, character_set_connection and character_set_results set to utf8 which can easily set in your application by doing SET NAMES 'utf8' at the start of all connections. This is the one most people forget to check, IME.
If you use them, your CLI and terminal settings. In bash, this means LANG=(something).UTF-8.
Your source code (this is not usually a problem unless you have UTF8 constant text).
The page encoding. You seem to have this one right, too, but your browsers debug tools can help a lot.
Once you get all this right, all you will need in your app is mysql_real_escape_string().
Oh and it is (sadly) possible to successfully store correctly encoded UTf8 text in a column with the wrong encoding type or from a connection with the wrong encoding type. And it can come back "correctly", too. Until you fix all the bits that aren't UTF8, at which point it breaks.
I don't think you have any practical alternatives to UTF-8. You're going to have to track down where the encoding and/or decoding breaks. Start by checking whether you can round-trip multi-language text to the data base from the mysql command line, or perhaps through phpmyadmin. Track down and eliminate problems at that level. Then move out one more level by simulating input to your php and examining the output, again dealing with any problems. Finally add browsers into the mix.
First you need to check if you can add multi-language text to your database directly. If its possible you can do it in your application
Are you serializing any data by chance? PHPs serialize function has some issue when serializing non-english characters.
Everything you do should be utf-8 encoded.
One thing you could try is to json_encode() the data when putting it into the database and json_decoding() it when it's retrieved.
The problem was caused by my not having the default char set in the php.ini file, and (possibly) not having set the char set in the mysql table (in PhpMyAdmin, via the Operations tab).
Setting the default char set to "utf-8" fixed it. Thanks for the help!!
Check your database connection settings. It also needs to support UTF-8.

Categories