Inserting a Japanese XML feed into a MySQL table via PHP - php

I've been looking for a solution to this issue and have tried most of the answers I found, but I have had no luck inserting the Japanese XML feed into the tables that have been created.
I tried setting UTF-8 everywhere it should be used, as most answers suggest - but no luck!

Check that the connection character set is UTF-8 by having the connection issue SET NAMES utf8 as the first thing it does.
Check that the page you are using to view the data is marked as UTF-8: do this either by sending a content-type header that specifies UTF-8, or by adding a meta tag that says so (I'm sorry, I don't know much about XML so I don't know how it is done).
Check that the data you are receiving is encoded as UTF-8. If it is encoded in some other character set but you are presenting it to MySQL as UTF-8, then it may get stored mis-encoded in the database, or MySQL may make the best sense of it that it can and garble/truncate it as a result.
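A minimal sketch of those three checks in one place, assuming mysqli (the feed URL, table and column names, and the feed structure are placeholders, not details from the original question):

<?php
// Sketch only: feed URL, table and column names below are placeholders.
header('Content-Type: text/html; charset=utf-8');        // page is served as UTF-8

$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$mysqli->set_charset('utf8');                             // same effect as SET NAMES utf8

$feedUrl = 'https://example.com/feed.xml';                // placeholder feed URL
$xml = file_get_contents($feedUrl);
if (!mb_check_encoding($xml, 'UTF-8')) {
    // the feed is not UTF-8: convert it before it ever reaches MySQL
    $xml = mb_convert_encoding($xml, 'UTF-8', 'SJIS,EUC-JP,UTF-8');
}

$stmt = $mysqli->prepare('INSERT INTO feed_items (title) VALUES (?)');
foreach (simplexml_load_string($xml)->item as $item) {    // element names depend on the feed
    $title = (string) $item->title;
    $stmt->bind_param('s', $title);
    $stmt->execute();
}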

Related

PHP/MySQL Encoding

I have a website, with arabic content which has been migrated from a different server. On the old server, everything was displaying correctly, supposedly everything was encoded with UTF-8.
On the current server, the data started displaying incorrectly, showing garbled (mojibake) sequences where Arabic text such as نبذة عن was expected.
The application is built on the CakePHP framework.
After many trials, I changed the 'encoding' parameter in the MySQL connection array to 'latin1'. For people who don't know CakePHP, this sets MySQL's connection encoding. Setting this value to UTF8 did not change anything, even after the steps described below.
Some of the records started showing correctly in Arabic, while others remained gibberish.
I have already gone through all the database and server checks, confirming that:
The database created is UTF-8.
The table is UTF-8.
The columns are not explicitly set to any encoding, so they inherit the table's UTF-8 encoding.
Default Character set in PHP is UTF-8
mysql.cnf settings default to UTF-8
After that, I retrieved my data and looped through it, printing the encoding of each string (from each row) using mb_detect_encoding. The rows that display correctly return UTF-8, while mb_detect_encoding returns nothing for the corrupt rows.
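For reference, the kind of per-row check described above looks roughly like this (a sketch; the table and column names are placeholders):

<?php
// Sketch of the per-row encoding check; `comments`/`body` are placeholder names.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$result = $mysqli->query('SELECT id, body FROM comments');
while ($row = $result->fetch_assoc()) {
    // strict mode (3rd argument) reduces false positives
    $enc = mb_detect_encoding($row['body'], ['UTF-8', 'ISO-8859-1'], true);
    echo $row['id'] . ': ' . ($enc ?: 'unknown') . PHP_EOL;
}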
The data of the website has been edited multiple times, possibly with different encodings; this is something I cannot know for sure. What I can confirm, though, is that the only 2 encodings this data might have passed through are UTF-8 and latin1.
Is there any possible way to recover the data when mb_detect_encoding is not returning anything and the current dataset is unknown?
UPDATE: I have found out that while the database was active on the new server, the my.cnf was updated.
The below directive was changed:
character-set-server=utf8
To
default-character-set=utf8
I am not sure how much this makes a difference though.
Checking the modified dates, I can conclude with a reasonable degree of certainty that the data I could recover was not edited on the new server, while the data I couldn't recover has been edited there.
Try to fix the problem from the DB side, not from PHP or the DB connection.
I advise you to go to your old server and export your DB again with the character set UTF-8,
then import it into the new server. Make sure that you can see the Arabic characters inside the tables (with phpMyAdmin).
If your tables look fine,
then you can move on to check the rest:
DB connection
PHP file encoding
the header encoding in HTML
As far as I know, if the problem is in the DB, there is no way around exporting the data again from the old server.
Edit:
If you do not have access to your old DB, please check this answer; it can help you.
You were expecting نبذة عن? Mojibake. See duplicate for discussion and solution, including how to recover the data via a pair of ALTER TABLEs.
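A sketch of the two-step ALTER TABLE recovery usually meant here, assuming a VARCHAR column declared as latin1 that actually holds UTF-8 bytes (the table, column and length are placeholders):

<?php
// Sketch of the "pair of ALTER TABLEs" recovery. The column is declared latin1
// but actually holds UTF-8 bytes; names and the length below are placeholders.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');

// 1. Drop the wrong character-set label without touching the stored bytes.
$mysqli->query('ALTER TABLE articles MODIFY body VARBINARY(255)');

// 2. Re-attach the correct character set, again without converting the bytes.
$mysqli->query('ALTER TABLE articles MODIFY body VARCHAR(255) CHARACTER SET utf8mb4');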
I had a similar problem when migrating database tables encoded as utf8 from a public server to localhost. The resolution was setting the connection encoding on localhost using PHP:
$db->set_charset("utf8")
right after the mysqli connection.
Now it works properly.
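In context that looks roughly like this (a minimal sketch; host and credentials are placeholders):

<?php
// Placeholder credentials; the important line is set_charset() right after connecting.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->set_charset("utf8");   // must run before any query that reads or writes text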

Is MySQL's collation solely used for sorting?

According to the official MySQL manual the collation used defines the order of records when sorting alphabetically:
http://dev.mysql.com/doc/refman/5.0/en/charset-general.html
However: I have a PHP script (UTF-8), and when I save some foreign characters in my MySQL database they are saved all weird (first row). This is when the collation I chose is latin1_swedish_ci. When I change the collation to utf8_unicode_ci, all is good (second row).
When saving this data everything is exactly the same except for the collation.
So how about that "collation is used solely for sorting records"?
Can someone clarify this for me? :-) Thanks in advance!
It appears that the charset of your connection is not set right, therefore the conversion from the programming language's charset to the database's is not correct.
You should set the charset in your connection; then both will work fine.
As pointed out in the comments, a little explanation of how things work:
When you have not set the character set on your connection, the server assumes it to be the same as the collation of the database. When data is received in another encoding, the data is written nevertheless, just with wrong or different characters than it had in the encoding used by the script.
As long as nothing changes, the script gets back the same data it has written and everything appears to be fine.
However, when either the connection encoding or the database encoding is changed at this point, the already stored data gets converted to the new encoding. The problem here is that the source data is not in the encoding that is assumed when converting.
All encodings share the ASCII range with the same bits, which is why ASCII characters don't get messed up; only special characters do.
So you have to set your connection encoding in order not to add to the mess you are already in.
Now, what can you do about the data you already have?
You can make a dump of your database using mysqldump with the --skip-set-charset option. That gives you a plain-text file. In this plain-text file, replace all occurrences of the actual database charset with the one the data is really in (the one you had in your script when you wrote the data).
Then save the file and make sure your editor does not do any conversion (I recommend vim).
Then import that file and you will get a database with data in the correct encoding. Then you can change the encoding however you like, and as long as your connection charset gets set as well, you will be fine from now on.
Also make sure that the MySQL server has the charsets installed, but it should have them already.
This is only my approach; I have cleaned up a lot of messed-up installations like that, most of which at some point had garbled characters in their projects (after switching servers, updating, or restoring a backup...).
It turns out that not setting the connection charset is something that is very often forgotten.

PHP MySQL fixed connection to utf8, but now existing Greek data is useless

I have a MySQL database storing some fields in Greek characters. In my HTML I have charset=utf-8 and my database columns are defined with the utf8_general_ci collation. But I was not setting the connection encoding so far. As a result I have a database that doesn't display the Greek characters well, but when reading it back in PHP, it all shows well.
Now I'm trying to do this the right way, so I also added this in my database functions:
$mysqli->set_charset("utf8");
This works great for new entries.
But for existing entries, the problem is that when I read data in PHP, it comes garbled, since now the connection encoding has changed.
Is there a way to fix my data and make them useful again? I can continue working my old way, but I know it's wrong and can cause me more problems in the future.
I solved this issue as follows (a rough sketch follows the steps):
In a PHP script, retrieve the information as I do now, i.e. without setting the connection charset. This way the mistake is inverted and corrected, and in the PHP variable you have the characters in the correct UTF-8 form.
In the same PHP script, write the information back with the connection set to UTF-8.
At this point the correct characters are in the database.
Finally, I changed all the read/write functions of my site to use UTF-8 from now on.
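A minimal sketch of that one-off fix-up pass (table, column and credentials are placeholders; run it once only, ideally against a copy of the data first):

<?php
// One-off re-encoding pass as described above; all names are placeholders.
// Connection 1: read WITHOUT set_charset(), so the stored mis-conversion is reversed.
$read = new mysqli('localhost', 'user', 'pass', 'mydb');
$rows = $read->query('SELECT id, body FROM comments')->fetch_all(MYSQLI_ASSOC);

// Connection 2: write WITH the connection set to UTF-8.
$write = new mysqli('localhost', 'user', 'pass', 'mydb');
$write->set_charset('utf8');
$stmt = $write->prepare('UPDATE comments SET body = ? WHERE id = ?');
foreach ($rows as $row) {
    $stmt->bind_param('si', $row['body'], $row['id']);
    $stmt->execute();
}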

problems with character encoding pulled from mysql database [duplicate]

Possible Duplicate:
UTF-8 all the way through
Okay, it's stupid that I can't figure this out.
The MySQL database is set to utf8_general_ci collation. The field I'm having problems with is of type longtext.
Characters added to the database as &eacute; or other accented characters are coming back as �.
I run the output through stripslashes, and I've tried both with and without html_entity_decode, but I see no change in the output. What am I doing wrong?
Cheers
What character encoding does the string you are trying to insert have? If it is in ISO-8859-1, you can use the PHP function utf8_encode() to encode it as UTF-8 before inserting it into the database.
http://php.net/manual/en/function.utf8-encode.php
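For example (a sketch; note that utf8_encode() only handles ISO-8859-1 input, and mb_convert_encoding() is the more general and, on recent PHP versions, non-deprecated choice; table and column names are placeholders):

<?php
// Sketch: convert a Latin-1 string to UTF-8 before inserting it.
$latin1 = "caf\xE9";                      // "café" encoded as ISO-8859-1
$utf8   = utf8_encode($latin1);           // or mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1')

$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$mysqli->set_charset('utf8');
$stmt = $mysqli->prepare('INSERT INTO notes (body) VALUES (?)');
$stmt->bind_param('s', $utf8);
$stmt->execute();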
Getting encoding right is really tricky - there are too many layers:
Browser
Page
PHP
MySQL
The SQL command "SET CHARSET utf8" from PHP will ensure that the client side (PHP) will get the data in utf8, no matter how they are stored in the database. Of course, they need to be stored correctly first.
DDL definition vs. real data
The encoding defined for a table/column doesn't really mean that the data are in that encoding. If you happen to have a table defined as utf8 but the data stored in a different encoding, then MySQL will treat it as utf8 and you're in trouble. Which means you have to fix this first.
What to check
You need to check in what encoding the data flow at each layer.
Check the HTTP headers and any meta headers.
Check what's really sent in body of the request.
Don't forget that MySQL has encoding almost everywhere:
Database
Tables
Columns
Server as a whole
Client
Make sure that there's the right one everywhere.
Conversion
If you receive data in e.g. windows-1250, and want to store in utf-8, then use this SQL before storing:
SET NAMES 'cp1250';
If you have data in the DB as windows-1250 and want to retrieve it as utf8, use:
SET CHARSET 'utf8';
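From PHP, the two scenarios look roughly like this (a sketch; the table, column and input variable are placeholders):

<?php
// Sketch of the two conversion scenarios above; `notes`/`body` are placeholder names.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');

// Scenario 1: incoming data is windows-1250, the columns are utf8.
// Telling MySQL what the client sends makes it convert on the way in.
$cp1250Text = $_POST['body'] ?? '';       // e.g. a form field arriving as windows-1250
$mysqli->query("SET NAMES 'cp1250'");
$stmt = $mysqli->prepare('INSERT INTO notes (body) VALUES (?)');
$stmt->bind_param('s', $cp1250Text);
$stmt->execute();

// Scenario 2: the columns hold windows-1250, but we want utf8 back.
$mysqli->query("SET CHARSET 'utf8'");
$result = $mysqli->query('SELECT body FROM notes');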
Last note:
Don't rely on overly "smart" tools to show the data. E.g. phpMyAdmin handled encoding really badly (at least when I was using it). And it goes through all the layers, so it's hard to find out where things go wrong. Also, Internet Explorer had the really stupid behavior of "guessing" the encoding based on weird rules. Use simple editors where you can switch the encoding. Also, I recommend MySQL Workbench.

PHP charset accents issue

I have a form in my page for users to leave a comment.
I'm currently using this charset:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
but when retrieving the comment from the DB, accents are not displayed correctly (e.g. è comes back garbled).
Which parameters should I care about for correct handling of accents?
SOLVED
changed meta tag to charset='utf-8'
changed the MySQL character set (ALTER TABLE comments CONVERT TO CHARACTER SET utf8)
changed connection character-set both when inserting records and retrieving ($conn->query('SET NAMES utf8'))
Now accents are displaying correctly.
thanks
Luca
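Putting the three fixes together in one place (a sketch; the credentials are placeholders, comments is the table from the question, and a header() call is shown as an alternative to the meta tag):

<?php
// Serve the page as UTF-8 (same effect as the <meta ... charset=utf-8> change).
header('Content-Type: text/html; charset=utf-8');

// Set the connection character set before inserting or retrieving records.
$conn = new mysqli('localhost', 'user', 'pass', 'db');
$conn->query('SET NAMES utf8');

// One-time schema conversion - note it is spelled utf8 (no dash) in MySQL:
// $conn->query('ALTER TABLE comments CONVERT TO CHARACTER SET utf8');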
Character sets can be complicated and a pain to debug when it comes to LAMP web applications. At each of the stages where one piece of software talks to another, there's scope for incorrect charset translation or incorrect storage of data.
The places you need to look out for are:
- Between the browser and the web server (which you've listed already)
- Between PHP and the MySQL server
The character you've listed looks like a normal European character that is included in the ISO-8859-1 charset.
Things to check for:
Even though you're specifying the character set in a meta header, have a look in your browser to be sure which character set the browser is actually using. If you've specified it, the browser should use that charset to render/view the page, but in some cases I've seen it attempt to auto-detect the correct charset and fail. Most browsers have an "encoding" menu (perhaps under "View") that allows you to choose the charset. Ensure that it says ISO-8859-1 (Western European).
MySQL can happily support character set conversion if required to but in most cases you want to have your tables and client connection set to use the same encoding. When configured this way MySQL won't attempt to do any encoding conversion and will just write the data you input byte for byte into the table. When read it'll come out the same way byte for byte.
You've not said if you're reading data from the database back out with the same web-app or with some other client. I'd suggest you try to read it out with the same web application and using the same meta charset header (again, check the browser is really setting it) and see what is displayed in the browser.
To debug these issues requires you to be really sure about whether the client/console you're using is doing any conversion too, the safest way is sometimes to get the data into a hex editor where you can be sure that nothing else is messing around with any translation.
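A quick way to look at the raw bytes without a hex editor is to ask MySQL for them directly (a sketch; table and column names are placeholders):

<?php
// Show the stored bytes as hex so no client-side charset conversion can hide them.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$result = $mysqli->query('SELECT HEX(comment) AS raw FROM comments LIMIT 5');
while ($row = $result->fetch_assoc()) {
    // e.g. "è" stored as latin1 shows up as E8; stored as UTF-8 it shows up as C3A8
    echo $row['raw'] . PHP_EOL;
}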
If it doesn't look like it's a browser-side problem please can you include the output of the following commands against your database:
Run from a connection that your web-app makes (not from some other MySQL client):
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Run from any MySQL client:
SHOW CREATE TABLE myTable;
(where myTable is the table you're reading/writing data from/to)
The ISO-8859-1 character set is for Latin characters only. Try UTF-8, and make sure that the database columns these characters are coming from are also UTF-8.
