I have a few textfiles which are input for a MySQL database. These textfiles contain characters like é and ë. I have struggled getting the data properly into the database and now it seems I've finally got it right. However, I would like to know if there is a better way to do this than the way I describe here.
The textfiles are all UTF-8 encoded.
The PHP scripts are all UTF-8 encoded as well. I've read that this is very important.
All HTML output is done using a header like this: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The MySQL database is created using a collation of latin1_swedish_ci (the character set is left blank)
All the columns that contain characters (VARCHAR) are defined using a collation of latin1_swedish_ci
I assume the right way to store url encoded strings is when I see the character é stored as %C3%A9 in the database. I found a MySQL function for urlencoding here.
But when I open up phpMyAdmin I see the character é is presented as %C3%A3%C2%A9.
I can add another statement to replace characters in the database, but something tells me there is a more efficient way to achieve this.
Any help is greatly appreciated. Thanks in advance.
What is missing from your list of 5 things is a sixth:
6. I tell MySQL that the client bytes are UTF-8 encoded. I do this via $mysqli_obj->set_charset('utf8'); or new PDO('mysql:host=host;dbname=db;charset=utf8', $user, $pwd); or SET NAMES utf8 (or utf8mb4 in each case).
The client sees utf8, the table sees latin1; the conversion will occur when INSERTing and SELECTing, but it needs #6 to know to do so.
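To make the byte-level effect concrete, here is a minimal sketch of what happens to é when the bytes are declared correctly versus when UTF-8 bytes are treated as latin1 and then converted to UTF-8. Python is used only to show the arithmetic; cp1252 stands in for MySQL's latin1 (an assumption that holds for é but not for every byte), and no database is involved.

```python
from urllib.parse import quote

original = "é"
client_bytes = original.encode("utf-8")      # the two bytes C3 A9

# Case 1: the bytes are correctly declared as UTF-8, so they are kept as-is.
correct = quote(client_bytes)

# Case 2: something in the path believes the bytes are latin1 and converts
# them to UTF-8, so each byte becomes its own character ("Ã" and "©").
misread = client_bytes.decode("cp1252")
double_encoded = quote(misread.encode("utf-8"))

print(correct)          # single-encoded form
print(double_encoded)   # double-encoded form, four bytes instead of two
```

Four percent-escapes for a single accented letter is the usual sign that a latin1-to-utf8 conversion was applied to bytes that were already UTF-8.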
There are so many threads dedicated to this topic, that I feel silly having to ask this.
But, I'm at a total loss as to what the problem could be.
I am trying to insert special characters (Cyrillic, Scandinavian, etc.) into a MySQL database via a PHP (HTML) form.
Characters like Ä, Ö, Å, as well as Russian letters, etc.
Based on previous threads in this forum, I have tried all the following (inserted right after the MySQL database-connection string) :
$mysqli->set_charset("utf8");
This didn't work, so I tried the following :
mysqli_query($mysqli, "SET NAMES 'utf8'");
mysqli_query($mysqli, "SET CHARSET 'utf8'");
These are not recommended by PHP, but I tried them anyway. Still no luck.
(All my databases, tables, and columns are collated as utf8_general_ci.)
In addition, all my html forms have the following :
<meta charset="utf-8">
So, I'm at a complete loss as to what I'm doing wrong. Once the data is sent to the database, it shows up (in the database itself) as rubbish characters (question marks, and other hieroglyphics).
However, the funny thing is :
(a) When I view the data on my website, it displays correctly;
(b) When the data is sent within the body of an email, it also displays correctly
So..........why is it not displaying correctly within the database itself ??
When dealing with a specific charset (like UTF-8), it's important that every link in the chain, from the source file to the database, uses the same charset. Below are a few pointers on how to do this.
ALL attributes must be set to utf8 (collation is NOT the same as charset in the database)
You should save the document itself as UTF-8 (if you're using Notepad++, it's Format -> Convert to UTF-8 or Convert to UTF-8 w/o BOM; there's a difference, and either may work for you)
The header in both PHP and HTML should be set to UTF-8:
HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
PHP: header('Content-Type: text/html; charset=utf-8');
Upon connecting to the database, set the charset to UTF-8, like this:
$connection->set_charset("utf8"); (directly after connecting)
Also make sure your database and tables are set to UTF-8, you can do that by this query (in the database, need only be done once):
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Remember that EVERYTHING needs to be set to the UTF-8 charset. If something can be set to UTF-8 (or another charset, check the PHP docs at php.net), it should be set to the same charset as everything else.
(a) When I view the data on my website, it displays correctly;
(b) When the data is sent within the body of an email, it also displays correctly
This means the data is most likely stored correctly in the db: the output you get back is the same as the input, which is the logical check, correct?
The other question is: how are you looking into the database, and which kind of client are you using?
phpMyAdmin, some desktop client? The problem will be there, because the data seems to be stored right ;)
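One way to take the viewing client out of the picture is to look at the raw bytes, for example with MySQL's HEX() function (SELECT HEX(col) ...), and classify what comes back. The helper below is a rough sketch, not a definitive test: the function name classify is hypothetical, cp1252 is assumed as a stand-in for MySQL's latin1, and the double-encoding check is a heuristic that can misjudge text which legitimately contains sequences like "Ã©".

```python
def classify(hex_value: str) -> str:
    """Guess how a value returned by SELECT HEX(col) is encoded."""
    raw = bytes.fromhex(hex_value)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"            # e.g. a single latin1 byte such as E9
    if text.isascii():
        return "ascii"                # encoding problems cannot show up here
    try:
        # If the decoded characters, written back as single bytes, again
        # form valid UTF-8, the value was probably encoded twice.
        text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "double-encoded utf-8"


print(classify("C3A9"))       # é stored as plain UTF-8
print(classify("C383C2A9"))   # é stored after an extra latin1 -> utf8 step
print(classify("E9"))         # é stored as a single latin1 byte
```

If the hex shows two bytes (C3A9) for é, the stored bytes are UTF-8 and only the column label or the client's connection charset is off; four bytes point to double encoding.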
I've developed a PHP/MySQL application in which one table stores names. These names sometimes contain special characters (like é, à, ë, ...).
When creating the table I forgot to set the collation to UTF-8, so it is now set to LATIN1_SWEDISH_CI.
So some data isn't displayed correctly in phpMyAdmin. But when I show the names on a PHP page, those special characters are displayed correctly. Here's an extract from a PHP file where I use UTF-8:
<?php ... ?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
....
Like I said the special characters are displayed as it should. So far... no problem.
But now I would like to export that data into a CSV file, and guess what? The special characters aren't included in the CSV file.
My PHP-export-file contains the following lines of code:
<?php
mysql_query("SET NAMES utf8");
header('Content-Type: text/html; charset=UTF-8');
...
But no special characters are displayed?
Does anyone have a solution for this problem? Because I find it a little ridiculous to open the CSV in Excel and use 'Find & Replace'.
Using the HTML escape codes is out of the question. That's what UTF-8 is for, isn't it?
You have stored UTF-8 encoded data which MySQL regards as Latin-1 data. MySQL does not complain about this because any arbitrary sequence of bytes is valid Latin-1. Because the connection character set of the connection used to retrieve the data is the same as that used to insert it, the correct data is displayed on your web page. But if you view the data in a utility that takes pains to display the actually stored characters, you will see mis-encoded text, because that is what you actually have stored.
There are two things you need to do: firstly, you need to change your database connection code to make sure that all connections you make to your database are using the UTF-8 character set. This can be accomplished using a settings file or just by issuing a SET NAMES statement every time you connect.
Secondly, you need to correct the mis-encoded data already stored in the database. Do not alter table to change the character set to UTF-8 directly; if you do, you will end up with double-UTF-8-encoded data. Instead, use an alter table query to change the column to the binary character set, and after doing that, alter table again to UTF-8.
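The difference between the two routes can be sketched at the byte level. This is an illustration only, with cp1252 assumed as a stand-in for MySQL's latin1; the table and column names in the SQL comments (names, name) and the column length are hypothetical.

```python
# Bytes of "é" as they sit in a column that is labelled latin1
# but actually holds UTF-8.
stored = "é".encode("utf-8")

# Route A: convert the column straight to utf8. MySQL transcodes what it
# believes are latin1 characters, so every byte is encoded a second time.
#   ALTER TABLE names CONVERT TO CHARACTER SET utf8;
direct = stored.decode("cp1252").encode("utf-8")

# Route B: go through a binary type first. The bytes are left untouched in
# both steps; only the label on the column changes.
#   ALTER TABLE names MODIFY name VARBINARY(255);
#   ALTER TABLE names MODIFY name VARCHAR(255) CHARACTER SET utf8;
via_binary = stored

print(direct.hex())      # four bytes: double-encoded
print(via_binary.hex())  # two bytes: still valid UTF-8 for "é"
```

Route B only makes sense when the stored bytes really are UTF-8; checking a few rows with HEX() before altering anything is a cheap safeguard, as is a backup.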
I am pulling comments out of the database and this character, �, shows up... how do I get rid of it? Is it because of what's in the database or how I'm showing it? I've tried using htmlspecialchars, but that doesn't work.
Please help
The problem lies with character encoding. If the character shows up fine in the database but not on the page, your page needs to be set to the same character encoding as the database. And vice versa: if the character encoding of the page that posts to the database does not match, the data comes out weird.
I generally set my character encoding to UTF-8 for any type of posting fields, such as Comments / Posts. Most MySQL databases default to the latin charset. So you will need to modify that: http://yoonkit.blogspot.com/2006/03/mysql-charset-from-latin1-to-utf8.html
The HTML part can be done with a META tag: <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
or with PHP: header('Content-type: text/html; charset=utf-8'); (must be placed before any output.)
Hopefully that gets the ball rolling for you.
That happens when you have a character that your font doesn't know how to display. It shows up differently in every program: many Windows programs show it as a box, Firefox shows it as a question mark in a diamond, and other programs just use a plain question mark.
So you can use a newer display system, install a missing font (like if it's asian characters) or look to see if it's one or two characters that do this and just replace them with something visible.
It might be problem of the way you are storing the information in the database. If the encoding you were using didn't accept accents (à, ñ, î, ç...), then it stores them using weird symbols. Same happens to other language specific symbols. There is probably not a solution for what's already in the database, but you can still save the following inserts by changing the encoding type in mysql.
Cheers
Make sure your database is UTF-8 (if that doesn't solve the problem, make sure you specify your charset while connecting to the database).
You can also encode / decode before entering data to your database.
I would suggest to go with htmlspecialchars() for encoding and htmlspecialchars_decode() for decoding.
Are you passing your charset in mysql_set_charset() with mysql_connect() ???
As others have said, check what your database encoding is. You could try using utf8_encode() or iconv() to convert your character encoding.
Check your code for errors. That's all one can really say considering that you have given us absolutely no details as to what you're doing.
Encoding problems are usually what cause that (are you converting from integers to characters?), so, you fix it by checking if you're converting things properly.
I have turkish character problem in mysql database when adding content with tinymce from admin panel.
Charset is:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9" />
How can I solve this?
Thanks in advance
Make sure the table in MySQL is also defined as having charset ISO-8859-9.
There's not enough information to say what your problem is, but in general you need the same character set in your HTML page (text/html;charset), PHP's connection to the database (mysql_set_charset), and MySQL's CREATE TABLE ... DEFAULT CHARACTER SET (if you just CREATE TABLE it will end up in Latin-1, which you probably don't want). Plus you would need to make sure not to use htmlentities-without-charset-argument on output (use htmlspecialchars instead).
See eg. this answer for more detail. That's talking about using UTF-8 for the encoding, but the same applies if you substitute ISO-8859-9 all the way through. (Although unless there's a good reason not to, you should really be using UTF-8.)
well I had a similar problem with my turkish site.
My tables were in latin5_turkish_ci and the charset of the PHP page was latin5.
There was no problem when I submitted the content via PHP to the database; all characters were being saved correctly.
But when I tried to submit the content via the jQuery post method, no Turkish character was being saved correctly to the database,
and php iconv function solved my problem
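The likely reason is that jQuery's Ajax methods send form data as UTF-8 regardless of the page charset, so the bytes have to be converted before they reach a latin5 table. The conversion that iconv performs there can be sketched as follows (Python used only to show the bytes; the sample string is made up, and the codec name iso8859_9 is Python's name for latin5):

```python
# Three Turkish letters as they would arrive in a UTF-8 encoded POST body.
posted = "şğı".encode("utf-8")

# Equivalent of converting from UTF-8 to ISO-8859-9 (latin5):
# decode the incoming bytes, then re-encode them in the table's charset.
latin5_bytes = posted.decode("utf-8").encode("iso8859_9")

print(posted.hex())        # two bytes per letter in UTF-8
print(latin5_bytes.hex())  # one byte per letter in ISO-8859-9
```

Declaring the connection charset as utf8 and letting MySQL do the conversion into the latin5 column should have the same effect without a manual step, assuming every posted character exists in latin5.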
Recently I worked on a project where I need to display Japanese text that comes from a database. I already use
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
It helps to display the static text. But when the text comes from the database, it displays as "??????????".
How can I solve this kind of problem?
Is the database charset UTF8 too? Is the connection charset UTF8? Seems like the data gets converted to ISO-8859-1 somewhere along the way.
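Plain question marks (as opposed to garbled accented letters) are typical of a lossy conversion: characters that do not exist in the target charset get replaced by "?". A minimal sketch of that effect, using Python's latin-1 codec with replacement as an assumed stand-in for what the server does when the connection charset is latin1:

```python
text = "日本語"

# None of these characters exist in ISO-8859-1, so each one is replaced
# by a literal question mark during conversion.
converted = text.encode("latin-1", errors="replace")

print(converted)   # one "?" per original character
```

Once the question marks have been produced the original characters cannot be recovered from them, so it matters whether the replacement happens on the way out (fixable by setting the connection charset) or already happened when the rows were inserted.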
Without more information, it is hard to find exactly what the problem is. What DBMS are you using? MySQL? PostgreSQL? Either way, I'm pretty sure either your database and/or your connection isn't using UTF8.
You can change your connection charset by using one of the following functions:
mysql_set_charset('utf8');
pg_set_client_encoding('UTF-8');