There are so many threads dedicated to this topic, that I feel silly having to ask this.
But, I'm at a total loss as to what the problem could be.
I am trying to insert special characters (Cyrillic, Scandinavian, etc.) into a MySQL database via a PHP (HTML) form.
Characters like Ä, Ö, Å, as well as Russian letters, etc.
Based on previous threads in this forum, I have tried all the following (inserted right after the MySQL database-connection string) :
$mysqli->set_charset("utf8");
This didn't work, so I tried the following :
mysqli_query($mysqli, "set names 'utf8'");
mysqli_query($mysqli, "set charset 'utf8'");
These are not recommended by PHP, but I tried them anyway. Still no luck.
(All my databases, tables, and columns use the collation utf8_general_ci)
In addition, all my html forms have the following :
<meta charset="utf-8">
So, I'm at a complete loss as to what I'm doing wrong. Once the data is sent to the database, it shows up (in the database itself) as rubbish characters (question marks, and other hieroglyphics).
However, the funny thing is :
(a) When I view the data on my website, it displays correctly;
(b) When the data is sent within the body of an email, it also displays correctly
So why is it not displaying correctly within the database itself?
When dealing with a specific charset (like UTF-8), it's important that every link in the chain is set to the same charset. Below are a few pointers on how to do this.
ALL attributes must be set to UTF-8 (collation is NOT the same as charset in the database)
You should save the document itself as UTF-8 (if you're using Notepad++, it's Format -> Convert to UTF-8 (or UTF-8 w/o BOM); there's a difference, and either may work for you)
The header in both PHP and HTML should be set to UTF-8:
HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
PHP: header('Content-Type: text/html; charset=utf-8');
Upon connecting to the database, set the charset to UTF-8, like this:
$connection->set_charset("utf8"); (directly after connecting)
Also make sure your database and tables are set to UTF-8. You can do that with these queries (run in the database; this need only be done once):
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Remember that EVERYTHING needs to be set to UTF-8. If something can be set to UTF-8 (or another charset; check the PHP docs at php.net), it should be set to the same charset as everything else.
(a) When I view the data on my website, it displays correctly;
(b) When the data is sent within the body of an email, it also displays correctly
This suggests the data is stored correctly in the db: what you get out is the same as what you put in. Logical, right?
The other question is: how are you looking into the database? Which kind of client are you using?
phpMyAdmin, some desktop client? The problem is most likely there, because the data seems to be stored correctly ;)
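To illustrate the point about the client, here is a minimal sketch in Python (used only as a neutral way to look at bytes, not as part of the PHP setup). It assumes the stored bytes are valid UTF-8 and that the viewing tool decodes them as Windows-1252; the helper name `display_as` and the sample string are made up for the example.

```python
# Sketch: the stored bytes stay the same; only the viewer's charset assumption differs.

def display_as(text: str, stored_encoding: str, viewer_encoding: str) -> str:
    """Encode text the way it is stored, then decode it the way a viewer would."""
    raw = text.encode(stored_encoding)
    return raw.decode(viewer_encoding, errors="replace")

sample = "Öl и мир"

# Viewer agrees with the storage encoding: the text is intact.
same = display_as(sample, "utf-8", "utf-8")

# Viewer assumes Windows-1252: every multi-byte character turns into garbage.
garbled = display_as(sample, "utf-8", "cp1252")

# Converting Cyrillic into a charset that has no Cyrillic yields question marks.
question_marks = "мир".encode("latin-1", errors="replace").decode("latin-1")

print(same)
print(garbled)
print(question_marks)
```

Both symptoms from the question (odd symbols and question marks) can come out of this kind of mismatch, which is why it matters which client is used to inspect the table.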
I have a few textfiles which are input for a MySQL database. These textfiles contain characters like é and ë. I have struggled getting the data properly into the database and now it seems I've finally got it right. However, I would like to know if there is a better way to do this than the way I describe here.
The textfiles are all UTF-8 encoded.
The PHP scripts are all UTF-8 encoded as well. I've read that this is very important.
All HTML output is done using a header like this: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The MySQL database is created using a collation of latin1_swedish_ci (the character set is left blank)
All the columns that contain characters (VARCHAR) are defined using a collation of latin1_swedish_ci
I assume the right way to store url encoded strings is when I see the character é stored as %C3%A9 in the database. I found a MySQL function for urlencoding here.
But when I open up phpMyAdmin I see the character é is presented as %C3%A3%C2%A9.
I can add another statement to replace characters in the database, but something tells me there is a more efficient way to achieve this.
Any help is greatly appreciated. Thanks in advance.
What is missing from your list of 5 things is a sixth:
I tell MySQL that the client bytes are utf8-encoded. I do this via $mysqli_obj->set_charset('utf8'); or new PDO('mysql:host=host;dbname=db;charset=utf8', $user, $pwd); or SET NAMES utf8 (or utf8mb4).
The client sees utf8, the table sees latin1; the conversion will occur when INSERTing and SELECTing, but it needs #6 to know to do so.
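As a rough sketch of what goes wrong when that declaration is missing, the Python snippet below URL-encodes é once correctly and once after its UTF-8 bytes have been mistaken for latin1 characters and encoded again. It is only an illustration at byte level and makes no claim about the exact conversion path in the asker's setup.

```python
from urllib.parse import quote

original = "é"

# Correct case: the UTF-8 bytes C3 A9 are URL-encoded once.
correct = quote(original.encode("utf-8"))

# Mislabelled case: the two UTF-8 bytes are taken for two latin1 characters
# ("Ã" and "©"), and each of those is then encoded to UTF-8 again.
misread = original.encode("utf-8").decode("latin-1")
double_encoded = quote(misread.encode("utf-8"))

print(correct)         # %C3%A9
print(double_encoded)  # %C3%83%C2%A9
```

A four-byte sequence for a single accented letter is the usual sign that one conversion too many has happened somewhere between client and table.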
I've developed a PHP/MySQL application in which names are stored in one table. These names sometimes contain special characters (like é, à, ë, ...).
When creating the table I forgot to set the collation to UTF-8, so it is now set to LATIN1_SWEDISH_CI.
So some data isn't displayed correctly in phpMyAdmin. But when I show the names on a PHP page, those special characters are displayed correctly. Here's an extract from a PHP file where I use UTF-8:
<?php ... ?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
....
Like I said, the special characters are displayed as they should be. So far, no problem.
But now I would like to export that data into a CSV file, and guess what? The special characters aren't included in the CSV file.
My PHP-export-file contains the following lines of code:
<?php
mysql_query("SET NAMES utf8");
header('Content-Type: text/html; charset=UTF-8');
...
But no special characters are displayed.
Does anyone have a solution for this problem? I find it a little ridiculous to open the CSV in Excel and use 'Find & Replace'.
Using the HTML escape codes is out of the question. That's what UTF-8 is for, isn't it?
You have stored UTF-8 encoded data which MySQL regards as Latin-1 data. MySQL does not complain about this because any arbitrary sequence of bytes is valid Latin-1. Because the connection character set of the connection used to retrieve the data is the same as that used to insert it, the correct data is displayed on your web page. But if you view the data in a utility that takes pains to display the actually stored characters, you will see mis-encoded text, because that is what you actually have stored.
There are two things you need to do: firstly, you need to change your database connection code to make sure that all connections you make to your database are using the UTF-8 character set. This can be accomplished using a settings file or just by issuing a SET NAMES statement every time you connect.
Secondly, you need to correct the mis-encoded data already stored in the database. Do not alter table to change the character set to UTF-8 directly; if you do, you will end up with double-UTF-8-encoded data. Instead, use an alter table query to change the column to the binary character set, and after doing that, alter table again to UTF-8.
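The difference between the two ALTER TABLE routes can be sketched at byte level. The Python below is an illustration under simplifying assumptions (MySQL's latin1 is treated as plain ISO-8859-1, and the sample name is made up); it is not a replacement for running the actual ALTER TABLE statements.

```python
name = "Zoë"

# What is on disk today: UTF-8 bytes sitting in a column labelled latin1.
stored_bytes = name.encode("utf-8")

# What MySQL believes those bytes mean when it reads them as latin1.
as_mysql_sees_it = stored_bytes.decode("latin-1")

# Direct latin1 -> utf8 conversion: each believed character is re-encoded,
# which produces double-encoded data.
direct_conversion = as_mysql_sees_it.encode("utf-8")

# Via binary: the bytes are left untouched and only the label changes to utf8.
via_binary = stored_bytes.decode("utf-8")

print(direct_conversion)  # b'Zo\xc3\x83\xc2\xab'
print(via_binary)         # Zoë
```

Going through the binary character set works because it changes how the bytes are labelled without converting them.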
In districts table
I have a row as
district_id   district_name   country_id
15            Šahty           16
When selecting from PHP and displaying in the browser, it shows up like this: �ahty
I am using MSSQL 2005 with collation SQL_Latin1_General_CP1_CI_AS.
The problem is similar to the question "removing accent and special characters", but I need a solution in PHP.
UPDATE(?):
There is no support for UTF-8 in SQL Server (at least in the 2005/2008 versions the link below discusses).
https://dba.stackexchange.com/questions/7346/mssql-2005-2008-utf-8-collation-charset
Hi, you need to set the correct HTML content type header:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
The data may be selected correctly, but the browser may not be able to display it as you expect.
You can experiment with this in Firefox via the menu View -> Character encoding, until you find the correct one.
In order to make special characters work, a general rule is that all the components must be on the same encoding. This means that database, database connection (very often forgotten 'SET NAMES {charset}' call after connecting to database) and web page Content-type have to be all in the same character set.
If you request data from a latin1 database over a connection that is also latin1, make sure the page where you display the values is latin1 as well.
It's recommended, though, to use UTF-8 instead of latin1 everywhere, so if possible I recommend changing the charsets and the data in your database to UTF-8, as it's more compatible all around and easier to handle.
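A small Python sketch of why Šahty can turn into �ahty. It assumes that the SQL_Latin1_General_CP1 collation stores non-Unicode text in code page 1252 and that the driver hands those bytes to PHP unchanged; both are assumptions about the asker's setup, not something confirmed in the thread.

```python
district = "Šahty"

# Bytes as they would come out of a code page 1252 column: Š is the single byte 0x8A.
from_db = district.encode("cp1252")

# A page declared as UTF-8 cannot interpret a lone 0x8A byte, so the browser
# substitutes the replacement character U+FFFD.
shown_in_utf8_page = from_db.decode("utf-8", errors="replace")

# Converting from the source code page to UTF-8 before output keeps the letter.
converted = from_db.decode("cp1252").encode("utf-8")

print(from_db)             # b'\x8aahty'
print(shown_in_utf8_page)  # �ahty
print(converted.decode("utf-8"))
```

Either the page has to declare the encoding the bytes are really in, or the bytes have to be converted to the encoding the page declares.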
Just remove the UTF-8 charset and let the browser select the charset. It will pick ISO-8859-1, which will work with the accents coming from SQL Server.
I know there are hundreds of questions about UTF-8 woes but I tried all the approaches I could find, none of them helped.
The facts:
I'm trying to read a string that contains a é from my MySQL database and display it on a PHP page. Actually, it does display as é (but the font does not recognize it as such and thus another default font is used). The troubles arose when I wanted to convert this string to a filename using PHP functions for string replacement. PHP does not recognize this as the é character at all.
Here's a quick rundown of what I'm doing:
1) The String is stored in a MySQL database. The MySQL server settings are:
MySQL connection collation utf8_unicode_ci
MySQL charset: UTF-8 Unicode (utf8)
The database itself is set to collation utf8_unicode_ci (MyISAM storage engine, not changeable due to shared server)
The actual table is set to collation utf8_unicode_ci (InnoDB storage engine)
The é shows up correctly in phpMyAdmin. The data is inserted into the DB via a Java program but I have also tried this with manually entered data (entered in phpMyAdmin).
2) The PHP default_charset is not set (NO VALUE), I'm on a shared server and placing a manual override php.ini did not seem to work. Using ini_set("default_charset", 'utf-8'); works but has no effect on the problem I have.
3) Before I run the actual select query I query SET NAMES 'utf8'. The query itself is irrelevant but for testing I chose a simple SELECT title FROM items WHERE item_id = 1
4) The PHP file itself is encoded UTF-8. I have set the correct charset for the html with <meta http-equiv="content-type" content="text/html; charset=utf-8" />
5) To test the problem I used htmlentities on the returned string (Astérix). Checking the source code, it is converted to Ast&Atilde;&copy;rix, which is not correct of course. Accordingly, the string shows up as AstÃ©rix in the browser.
What possible reason could there be for this? To me it seems like I set everything that can be set to UTF-8.
http://php.net/manual/en/ref.mbstring.php - look at multibyte string functions.
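The reason byte-oriented string functions stumble over é can be sketched in Python. The helper below imitates an entity encoder that treats its input as ISO-8859-1 (the default of PHP's htmlentities before PHP 5.4); the function name `entities_assuming_latin1` is made up for this illustration and is not a PHP or Python API.

```python
from html.entities import codepoint2name

def entities_assuming_latin1(raw: bytes) -> str:
    """Turn each non-ASCII byte into the entity of the latin1 character it would be."""
    out = []
    for byte in raw:
        # In ISO-8859-1 every byte value is its own code point.
        if byte > 127 and byte in codepoint2name:
            out.append(f"&{codepoint2name[byte]};")
        else:
            out.append(chr(byte))
    return "".join(out)

title = "Astérix"
utf8_bytes = title.encode("utf-8")

print(len(title))       # 7 characters
print(len(utf8_bytes))  # 8 bytes, because é takes two bytes in UTF-8
print(entities_assuming_latin1(utf8_bytes))  # Ast&Atilde;&copy;rix
```

Functions that are told the real encoding (the multibyte functions linked above, or htmlentities with an explicit 'UTF-8' argument) see one character where byte-oriented ones see two.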
I have a form in my page for users to leave a comment.
I'm currently using this charset:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
but when retrieving the comment from the DB, accents are not displayed correctly ( Ex. è =>è ).
Which parameters should i care about for a correct handling of accents?
SOLVED
changed the meta tag to charset='utf-8'
changed the MySQL character set (ALTER TABLE comments CONVERT TO CHARACTER SET utf8)
changed the connection character set both when inserting and when retrieving records ($conn->query('SET NAMES utf8'))
Now accents are displayed correctly
thanks
Luca
Character sets can be complicated and a pain to debug when it comes to LAMP web applications. At each stage where one piece of software talks to another, there's scope for incorrect charset translation or incorrect storage of data.
The places you need to look out for are:
- Between the browser and the web server (which you've listed already)
- Between PHP and the MySQL server
The character you've listed looks like a normal European character that is included in the ISO-8859-1 charset.
Things to check for:
Even though you're specifying the character set in a meta header, have a look in your browser to be sure which character set the browser is actually using. If you've specified it, the browser should use that charset to render the page, but I've seen cases where it attempts to auto-detect the charset and fails. Most browsers have an "encoding" menu (perhaps under "view") that lets you choose the charset. Ensure that it says ISO-8859-1 (Western European).
MySQL can happily support character set conversion if required to but in most cases you want to have your tables and client connection set to use the same encoding. When configured this way MySQL won't attempt to do any encoding conversion and will just write the data you input byte for byte into the table. When read it'll come out the same way byte for byte.
You've not said if you're reading data from the database back out with the same web-app or with some other client. I'd suggest you try to read it out with the same web application and using the same meta charset header (again, check the browser is really setting it) and see what is displayed in the browser.
Debugging these issues requires you to be really sure about whether the client/console you're using is doing any conversion too. The safest way is sometimes to get the data into a hex editor, where you can be sure that nothing else is interfering with the translation.
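As a sketch of what to look for in a hex view, the Python below prints the byte values of è in the encodings discussed here, plus the double-encoded form. The helper name `hex_dump` is made up for this example; MySQL's own HEX() function can show the same kind of information for a stored column value.

```python
def hex_dump(text: str, encoding: str) -> str:
    """Return the bytes of text in the given encoding as space-separated hex."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))

char = "è"

latin1_hex = hex_dump(char, "iso-8859-1")  # one byte
utf8_hex = hex_dump(char, "utf-8")         # two bytes

# UTF-8 bytes misread as ISO-8859-1 and then encoded to UTF-8 a second time.
misread = char.encode("utf-8").decode("iso-8859-1")
double_hex = hex_dump(misread, "utf-8")    # four bytes

print(latin1_hex)  # e8
print(utf8_hex)    # c3 a8
print(double_hex)  # c3 83 c2 a8
```

Seeing e8, c3 a8, or c3 83 c2 a8 for the same letter tells you whether the value is stored as ISO-8859-1, as UTF-8, or as double-encoded UTF-8.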
If it doesn't look like it's a browser-side problem please can you include the output of the following commands against your database:
Run from a connection that your web-app makes (not from some other MySQL client):
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Run from any MySQL client:
SHOW CREATE TABLE myTable;
(where myTable is the table you're reading/writing data from/to)
The ISO-8859-1 character set covers Latin (Western European) characters only. Try UTF-8, and make sure that the database columns these characters come from are also UTF-8.