PHP MYSQL Collation Special Characters XML->PHP->MYSQL - php

I am trying to import data from an XML file into a MySQL DB using PHP. The code runs fine, but when I look at the data in the DB there are unexpected special characters. For example, when I look at the XML in my browser it shows up as "outdoors in good weather..." but in the DB it appears as "outdoors in good weather…".
I've cycled through all the different collations for that field in my DB, but it does not seem to help much. Sometimes it shows up with the characters mentioned above and other times as ???.
I have also tried to sync up the data with the following code in my PHP:
$mysqli->query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
But, again I have had no luck.
Thank you for reading this and for your help!
Akshay

You need to change the character set to UTF-8, along with your collation:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
What you are seeing is a Unicode ellipsis character (…) being converted into another character set, which is probably Latin1. That is why it looks garbled.
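To make the mechanism concrete, here is a minimal PHP sketch that is not part of the original answer. It assumes PHP 7 or later (for the \u{...} escape) and the mbstring extension, and it uses Windows-1252 as a stand-in for MySQL's latin1. It reproduces the garbling outside MySQL by reinterpreting the UTF-8 bytes of the ellipsis as single-byte text.

```php
<?php
// U+2026 HORIZONTAL ELLIPSIS, written as an escape so the file encoding does not matter.
$ellipsis = "\u{2026}";

// In UTF-8 the character occupies three bytes.
$hex = strtoupper(bin2hex($ellipsis));

// Treat those three bytes as Windows-1252 text and re-encode them as UTF-8,
// which is roughly what happens when UTF-8 data passes through a latin1 connection.
$garbled = mb_convert_encoding($ellipsis, 'UTF-8', 'Windows-1252');

echo $hex, "\n";       // E280A6
echo $garbled, "\n";   // three characters: U+00E2, U+20AC, U+00A6
```

One character going in and three coming out is the typical signature of this kind of mismatch.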

Related

Bulk insert string containing Russian

I am converting a spreadsheet using PHPExcel to a Database and the cell value happens to contain Russian. If I run mb_detect_encoding() I am told the text is UTF8 and if I set a header of UTF8 then I see the correct Russian characters.
However, if I compile it into a string (with only addslashes involved in the process) and insert it into the table, I see lots of ????. I have set the table character set to utf8mb4 and the collation to utf8mb4_general_ci. I have also run $this->db->query("SET NAMES 'utf8mb4'"); on my DB connection.
I run PDO query() with my multi part insert and get the ???s but if I output the query to screen I get ÐŸÐ¾Ñ which would be valid UTF8. Why would this not be stored correctly in the database?
I have kept this question rather than deleting it so someone may find the answer helpful.
The reason I was struggling is that SQLyog doesn't show the column charset by default. There is an option labelled "Hide language options" on the Alter Table view; toggling it reveals that when SQLyog creates a table, the columns use the server's default charset rather than the charset you defined for the table. I'm not sure that's correct behaviour, but the solution is simply to turn on the column charset settings and check that they match what you expect.
ÐŸÐ¾ is Mojibake for По. Probably...
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
The question marks imply...
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
One way to help diagnose the problem(s) is to run
SELECT col, HEX(col) FROM tbl WHERE ...
For По, the hex should be D09FD0BE. Each Russian (Cyrillic) letter, in utf8, is hex D0xx or D1xx.
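The hex check can also be done on the PHP side. The following is a rough sketch, not taken from the answer. The function name looks_like_utf8_russian is hypothetical, and it assumes the value contains Russian letters only (no spaces or digits), whose UTF-8 lead byte is D0 or D1.

```php
<?php
// Hypothetical helper: true when an uppercase hex string is made only of
// two-byte UTF-8 sequences with lead byte D0 or D1 (the range of Russian letters).
function looks_like_utf8_russian(string $hex): bool
{
    return preg_match('/^(?:D[01][89AB][0-9A-F])+$/', $hex) === 1;
}

// "По" written with escapes (U+041F, U+043E) so the file encoding does not matter.
$stored = strtoupper(bin2hex("\u{041F}\u{043E}"));

echo $stored, "\n";                                      // D09FD0BE
var_dump(looks_like_utf8_russian($stored));              // bool(true)
var_dump(looks_like_utf8_russian('C390C5B8C390C2BE'));   // bool(false): double-encoded form
var_dump(looks_like_utf8_russian('3F3F'));               // bool(false): literal question marks
```

A hex value of 3F3F means the original characters are already gone, so no re-encoding of that value will bring them back.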

German characters ü ö ä Ä Ü Ö ß not saved properly in database

When users try to save their name in German, it is saved like this:
Markus Müller ( Markus Müller)
Angela Eisenbl�tter ( Angela Eisenblätter )
Doris Vötter ( Doris Vötter )
I inspected the values with Firebug just before saving them, and they look normal. But once saved, they look like the above.
The structure of my table is this:
name varchar(250) utf8_unicode_ci
email varchar(250) utf8_unicode_ci
company varchar(250) utf8_unicode_ci
reading int(11)
rdate timestamp
Please help me
update
$con=mysqli_connect("localhost","englisch_root","b00t","englisch_efront");
mysql_set_charset('utf8', $con);
After I added this, it gives the following error:
Warning: mysql_set_charset() expects parameter 2 to be resource,
Replace mysql_set_charset('utf8', $con); with mysqli_set_charset($con, 'utf8'); (or $con->set_charset('utf8');). You can't mix functions from different PHP database extensions (mysql vs mysqli): they operate on different kinds of connections, so they are incompatible with each other.
Notes:
MySQL uses utf8, not utf-8
Never execute a SET NAMES statement directly; it is not safe:
If you must change the character set of the connection, use the mysql_set_character_set() function rather than executing a SET NAMES (or SET CHARACTER SET) statement. mysql_set_character_set() works like SET NAMES but also affects the character set used by mysql_real_escape_string(), which SET NAMES does not.
(from MySQL's documentation for mysql_real_escape_string; mysql_set_character_set() is the C API function behind PHP's mysql(i)_set_charset functions)
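As a sketch of the advice above, a connection helper might look like the following. The function name connect_with_charset and its parameters are hypothetical; it assumes the mysqli extension and defaults to 'utf8' only because that is the charset used in this thread.

```php
<?php
// Hypothetical helper: open a mysqli connection and set its character set
// through the API rather than with a SET NAMES query.
function connect_with_charset(
    string $host,
    string $user,
    string $pass,
    string $db,
    string $charset = 'utf8'
): mysqli {
    // Make mysqli throw exceptions instead of returning false on failure.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $con = new mysqli($host, $user, $pass, $db);

    // Unlike SET NAMES, this also tells the client library which charset
    // real_escape_string() should assume.
    $con->set_charset($charset);

    return $con;
}
```

Calling it requires a reachable server, for example connect_with_charset('localhost', 'user', 'password', 'dbname') with placeholder credentials.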
Using mysqli_query($link,"SET CHARACTER SET utf8"); BEFORE the query solved the issue in my case. P.S. I suggest using mysqli_set_charset.
User table column data is in ASCII instead of utf8_unicode_ci.
Two issues...
SELECT HEX(col), col FROM ... WHERE ... -- For "Müller", you should get 4DC3BC6C6C6572 if it is correctly stored as utf8. If you don't get that, then you have either latin1 or a "double encoding", and fixing the data will be more complex.
If you are displaying this on a web page, you need a suitable charset meta tag near the top.
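The first check can be sketched in PHP as a comparison of the HEX(col) value against the three byte patterns the expected text can take. The function name classify_stored_hex is hypothetical, the mbstring extension is assumed, and Windows-1252 is used as an approximation of MySQL's latin1.

```php
<?php
// Hypothetical helper: classify a HEX(col) value given the text it should contain.
function classify_stored_hex(string $expectedUtf8, string $storedHex): string
{
    $stored = strtoupper($storedHex);

    // Correct storage: the plain UTF-8 bytes.
    $utf8 = strtoupper(bin2hex($expectedUtf8));
    // latin1 storage: one byte per character.
    $latin1 = strtoupper(bin2hex(mb_convert_encoding($expectedUtf8, 'Windows-1252', 'UTF-8')));
    // Double encoding: UTF-8 bytes misread as Windows-1252 and encoded again.
    $double = strtoupper(bin2hex(mb_convert_encoding($expectedUtf8, 'UTF-8', 'Windows-1252')));

    if ($stored === $utf8) {
        return 'utf8';
    }
    if ($stored === $latin1) {
        return 'latin1';
    }
    if ($stored === $double) {
        return 'double-encoded';
    }
    return 'unknown';
}

$name = "M\u{00FC}ller"; // "Müller", with the umlaut written as an escape

echo classify_stored_hex($name, '4DC3BC6C6C6572'), "\n";      // utf8
echo classify_stored_hex($name, '4DFC6C6C6572'), "\n";        // latin1
echo classify_stored_hex($name, '4DC383C2BC6C6C6572'), "\n";  // double-encoded
```

For pure ASCII text all three patterns are identical, so the check is only informative for values that contain non-ASCII characters.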
First, read this for clarity on encodings: http://www.joelonsoftware.com/articles/Unicode.html
Then follow this article: http://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql
You need to ensure that you are using UTF-8 encoding everywhere, including the HTML page, the database schema, tables, column collations, the DB connection, etc.

Webpage outputting question marks in place of unicode characters, despite character sets and collation being correct?

When I fetch a record from my database, where the database, the table, and the row are all set to utf8_unicode_ci, I receive a question mark boxed in a diagonal square in place of the correct Unicode character. This is despite also setting the HTML encoding on the page with:
<meta charset="utf8">
I suspect, however, that it has to do with MySQL/PHP, because when I print_r the output the question marks still display, while a manually entered degree symbol (the symbol I should be seeing) works fine.
This SQL query also did nothing:
SET NAMES utf8;
Any ideas? I've checked every end of my setup.
utf8_unicode_ci is the collation; you also need the character set to be utf8, for example:
CREATE TABLE someTable DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;
As adrienne states in their answer here:
make sure that all of the following are true:
The DB connection is using UTF-8
The DB tables are using UTF-8
The individual columns in the DB tables are using UTF-8
The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you've imported from bad sources, or changed table or column collations)
The web page is requesting UTF-8
Apache is serving UTF-8

Advice on converting ISO-8859-1 data to UTF-8 in MySQL

We have a very large InnoDB MySQL 5.1 database with all tables using the latin1_swedish_ci collation. We want to convert all of the data which should be in ISO-8859-1 into UTF-8. How effective would changing the collation to utf8_general_ci be, if at all?
Would we be better off writing a script to convert the data and inserting into a new table? Obviously our goal is to minimise the risk of losing any data when re-encoding.
Edit: We do have accented characters, £ symbols etc.
If the data currently uses only Latin characters and you just want to change the character set and collation to UTF8 to enable future addition of UTF-8 data, then there should be no problem simply changing the character set and collation. I would do it on a copy of the table first, of course.
About a week ago I had to do the same task (issues with ö, ä, å):
Created a dump.sql.
Searched and replaced all CHARSET=latin1 to CHARSET=utf8 (in the dump.sql).
Searched and replaced all COLLATE=latin1_swedish_ci to COLLATE=utf8_unicode_ci (in the dump.sql).
Created a new database with the collation utf8_unicode_ci.
Imported the dump.sql.
Altered the database's charset with alter database MY_DB charset=utf8;
and it worked perfectly
Note: after Mike Brant's remark, I think it's better to do a manual search and replace for the fields you specifically want. Or you can simply use ALTER for each field without needing the dump.sql. It didn't make much difference in my case, as most of my fields needed to be UTF-8 encoded.
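The search-and-replace steps above can be sketched in PHP as follows. This is only an illustration: the function name retarget_dump is hypothetical, and it rewrites the declarations as plain text, so it would also alter any row data that happens to contain the same strings.

```php
<?php
// Hypothetical helper: rewrite charset and collation declarations in a dump's text.
function retarget_dump(string $sql): string
{
    return str_replace(
        ['CHARSET=latin1', 'COLLATE=latin1_swedish_ci'],
        ['CHARSET=utf8', 'COLLATE=utf8_unicode_ci'],
        $sql
    );
}

// A made-up table definition in the style a dump would contain.
$sample = "CREATE TABLE t (name VARCHAR(250)) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;";

echo retarget_dump($sample), "\n";
```

On a real dump this would be applied to the file contents (for example via file_get_contents and file_put_contents), ideally against a copy first, as suggested earlier in the thread.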

inserting latin1-encoded text into utf8 tables (forgot to use mysql_set_charset)

I have a PHP web app with MySQL tables taking utf8 text. I recently converted the data from latin1 to utf8 along with the tables and columns accordingly. I did, however, forget to use mysql_set_charset and the latest incoming data I would assume came through the MySQL connection as latin1. I don't know what happens when latin1 comes in to a utf8 column, but it's causing some strange display issues for items like comma, quotes, ampersand, etc.
Now that mysql_set_charset is in place, it is pulling the data out with funky characters. Is there any way to convert the latin1-utf8 soup to straight utf8 now that I have the database connection resource using the correct charset?
Found the fix with your comment. Here is the SQL line that seems to have solved my issue:
UPDATE tablename SET col = CONVERT(CONVERT(CONVERT(col USING latin1) USING binary) USING utf8);
Even though the column is UTF8, this forces MySQL to read the data out as latin1, convert it to binary, reinterpret those bytes as utf8, and write the result back.
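The same round trip can be illustrated in PHP without a database. This is a sketch under assumptions: it needs the mbstring extension, uses Windows-1252 in place of MySQL's latin1, and simulates the damage itself rather than reading real rows.

```php
<?php
// "Müller" in correct UTF-8 (U+00FC written as an escape).
$correct = "M\u{00FC}ller";

// Simulate the damage: the UTF-8 bytes were taken for Windows-1252 and encoded again.
$double = mb_convert_encoding($correct, 'UTF-8', 'Windows-1252');

// Undo it: map each character back to its single Windows-1252 byte. The result is
// the original byte sequence, which is valid UTF-8 again.
$repaired = mb_convert_encoding($double, 'Windows-1252', 'UTF-8');

echo strtoupper(bin2hex($double)), "\n";    // 4DC383C2BC6C6C6572
echo strtoupper(bin2hex($repaired)), "\n";  // 4DC3BC6C6C6572
```

Running such a repair on rows that were stored correctly would damage them, so it is worth restricting any UPDATE to the affected rows and trying it on a copy of the table first.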
