UTF-8 problems with characters from MySQL database (e.g. é as é) - php

I know there are hundreds of questions about UTF-8 woes but I tried all the approaches I could find, none of them helped.
The facts:
I'm trying to read a string that contains a é from my MySQL database and display it on a PHP page. Actually, it does display as é (but the font does not recognize it as such and thus another default font is used). The troubles arose when I wanted to convert this string to a filename using PHP functions for string replacement. PHP does not recognize this as the é character at all.
Here's a quick rundown of what I'm doing:
1) The String is stored in a MySQL database. The MySQL server settings are:
MySQL connection collation utf8_unicode_ci
MySQL charset: UTF-8 Unicode (utf8)
The database itself is set to collation utf8_unicode_ci (MyISAM storage engine, not changeable due to shared server)
The actual table is set to collcation utf8_unicode_ci (InnoDB storage engine)
The é shows up correctly in phpMyAdmin. The data is inserted into the DB via a Java program but I have also tried this with manually entered data (entered in phpMyAdmin).
2) The PHP default_charset is not set (NO VALUE), I'm on a shared server and placing a manual override php.ini did not seem to work. Using ini_set("default_charset", 'utf-8'); works but has no effect on the problem I have.
3) Before I run the actual select query I query SET NAMES 'utf8'. The query itself is irrelevant but for testing I chose a simple SELECT title FROM items WHERE item_id = 1
4) The PHP file itself is encoded UTF-8. I have set the correct charset for the html with <meta http-equiv="content-type" content="text/html; charset=utf-8" />
5) To test the problem I used htmlentities on the returned string (Astérix), checking the source code it is converted to Astérix which is not correct of course. Accordingly, the string shows up as Astérix in the browser.
What possible reason could there be for this? To me it seems like I set everything that can be set to UTF-8.

http://php.net/manual/en/ref.mbstring.php - look at multibyte string functions.

Related

PHP <==> MySQL; storing Cyrillic / Scandinavian characters in the database

There are so many threads dedicated to this topic, that I feel silly having to ask this.
But, I'm at a total loss as to what the problem could be.
I am trying to insert special characters (cyrillic, scandinavian, etc) into a MySQL database, via PHP (html) form.
Characters like : Ä,Ö,Å, as well as russian alphabets, etc.
Based on previous threads in this forum, I have tried all the following (inserted right after the MySQL database-connection string) :
mysqli->set_charset("utf8");
This didn't work, so I tried the following :
mysqli_query("set names 'utf8'");
mysqli_query("set charset 'utf8'");
These are not recommended by PHP. But, I tried them anyway, but still no luck.
(All my databases, tables, and columns are collated as : UTF8_general_ci)
In addition, all my html forms have the following :
<meta charset="utf-8">
So, I'm at a complete loss as to what I'm doing wrong. Once the data is sent to the database, it shows up (in the database itself) as rubbish characters (question marks, and other hieroglyphics).
However, the funny thing is :
(a) When I view the data on my website, it displays correctly;
(b) When the data is sent within the body of an email, it also displays correctly
So..........why is it not displaying correctly within the database itself ??
When dealing with specific charset (like UTF-8), it's important that the entire line of code is set to the same charset. Below are a few pointers how to follow this.
ALL attributes must be set to ut8 (collation is NOT the same as charset in the database)
You should save the document itself as UTF-8 (If you're using Notepad++, it's Format -> Convert to UFT-8 (or UTF-8 w/o BOM), there's a difference - both or either may work for you)
The header in both PHP and HTML should be set to UTF-8:
HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
PHP: header('Content-Type: text/html; charset=utf-8');
Upon connecting to the databse, set the charset ti UTF-8, like this:
$connection->set_charset("utf8"); (directly after connecting)
Also make sure your database and tables are set to UTF-8, you can do that by this query (in the database, need only be done once):
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Remember that EVERYTHING needs to be set to UFT-8 charcode. If something can be set to UFT-8 (or another charset, check the PHP-docs (php.net)), it should be set to the same charset as everything else.
(a) When I view the data on my website, it displays correctly;
(b) When the data is sent within the body of an email, it also displays correctly
This means data is correctly stored in the db, when you get the output is the same like the input, logically correct?
The other question is: How are you looking into the database, which kind of client are you using?
PHPMyAdmin, SomeDesktop Client.. The problem will be there.. because the data is stored right.. seems so ;)

MySQL outputs Western encoding in UTF-8 PHP file

I have the following problem: on a very simple php-mysqli query:
if ( $result = $mysqli->query( $sqlquery ) )
{
$res = $result->fetch_all();
$result->close();
}
I get strings wrongly encoded as Western encoded string, although the database, the table and the column is in utf8_general_ci collation. The php script itself is utf-8 encoded and the mysql-less parts of the script get the correct encodings. So say echo "ő" works perfectly, but echo $res[0] from the previous example outputs the EF BF BD character when the file viewed in the correct UTF-8 encoding. If I manually switch the browser's encoding to Western, the mysqli sourced strings get good decoding, except for the non-western characters being replaced with "?'.
What makes it even stranger is that on my development environment this isn't happening, while on my webserver it is. The developer environment is a LAMP stack (The Uniform Server), while the webserver uses nginx.
In this case, I entered the data in the database using phpMyAdmin, and inside phpmyadmin it displays perfectly. phpMyAdmin's collation is utf-8 too. I believe that the problem must be somewhere around here, as on the same webserver, for an other site where I enter data through php (using POST) the same problem doesn't happen. On that case, the data is visible correctly both while entering and while viewing it (I mean in the php generated webpages), but the special characters are not correct in phpMyAdmin.
Can you help me start where to debug? Is it connected to php or mysql or nginx or phpMyAdmin?
Use mysqli_set_charset to change the client encoding to UTF-8 just after you connect:
$mysqli->set_charset("utf8");
The client encoding is what MySql expects your input to be in (e.g. when you insert user-supplied text to a search query) and what it gives you the results in (so it has to match your output encoding in order for echo to display things correctly).
You need to have it match the encoding of your web page to account for the two scenarios above and the encoding of the PHP source file (so that the hardcoded parts of your queries are interpreted correctly).
Update: How to convert data inserted using latin-1 to utf-8
Regarding data that have already been inserted using the wrong connection encoding there is a convenient solution to fix the problem. For each column that contains this kind of data you need to do:
ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET latin1;
ALTER TABLE table_name MODIFY column_name BLOB;
ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET utf8;
The placeholders table_name, column_name and existing_column_type should be replaced with the correct values from your database each time.
What this does is
Tell MySql that it needs to store data in that column in latin1. This character set contains only a small subset of utf8 so in general this conversion involves data loss, but in this specific scenario the data was already interpreted as latin1 on input so there will be no side effects. However, MySql will internally convert the byte representation of your data to match what was originally sent from PHP.
Convert the column to a binary type (BLOB) that has no associated encoding information. At this point the column will contain raw bytes that are a proper utf8 character string.
Convert the column to its previous character type, telling MySql that the raw bytes should be considered to be in utf8 encoding.
WARNING: You can only use this indiscriminate approach if the column in question contains only incorrectly inserted data. Any data that has been correctly inserted will be truncated at the first occurrence of any non-ASCII character!
Therefore it's a good idea to do it right now, before the PHP side fix goes into effect.
Use mysqli::set_charset function.
$mysqli->set_charset('utf8'); //returns false if the encoding was not valid... won't happen
http://php.net/manual/en/mysqli.set-charset.php
I haven't used mysqli for some time, but if things are the same, connections by default use the latin swedish encoding (ISO 8859 1).
I will consider your page is already using utf8 encoding by having:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Inside the <head> tag.
If you have string already on latin swedish encoding, you can use mk_convert_encoding:
http://php.net/manual/en/function.mb-convert-encoding.php
$fixedStr = mb_convert_encoding($wrongStr, 'UTF-8', 'ISO-8859-1');
iconv does something very similar: Truth be told, I don't know the difference, but here's the link to the function reference:
http://php.net/manual/en/function.iconv.php
I just realized that you might have some strings in utf8 and others in latin swedish. You can use mb_detect_encoding for that: http://php.net/manual/en/function.mb-detect-encoding.php
You can also dump the database and use iconv (cmd line) if you have it installed:
iconv -f latain -t utf-8 < currentdb.sql > fixeddb.sql

accent characters not showing properly in browser

In districts table
I have a row as
district_id district_name country_id
15 Šahty 16
While selecting from php and displaying in browser,it shows like this :�ahty
I am using mssql 2005 with collation SQL_Latin1_General_CP1_CI_AS.
The problem is something like this
removing accent and special characters
but i need the solution in php.
UPDATE(?):
There is no support for UTF-8 in sqlserver.
https://dba.stackexchange.com/questions/7346/mssql-2005-2008-utf-8-collation-charset
Hi you need to consider correct HTML content type header
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Data may be selected correctly, but browser may be can not displayed them as you expected.
You can play with this in firefox by Menu View -> Character encoding -> until you find correct one
In order to make special characters work, a general rule is that all the components must be on the same encoding. This means that database, database connection (very often forgotten 'SET NAMES {charset}' call after connecting to database) and web page Content-type have to be all in the same character set.
If you ask data from latin1 database and have database connection also has latin1, make sure the page you display values at is also latin1.
It's recommended though to use UTF-8 instead of latin1 everywhere, so if possible I recommending changing charsets and data in your database all to UTF-8, as it's more compatible all-around and easier to handle.
Just remove the UTF8 charset and let the browser select the charset it will set to ISO-8859-1 that will work with accents in sql server

PHP charset accents issue

I have a form in my page for users to leave a comment.
I'm currently using this charset:
meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"
but retrieveving the comment from DB accents are not displaying correct ( Ex. è =>è ).
Which parameters should i care about for a correct handling of accents?
SOLVED
changed meta tag to charset='utf-8'
changed character-set Mysql (ALTER TABLE comments CONVERT TO CHARACTER SET utf-8)
changed connection character-set both when inserting records and retrieving ($conn->query('SET NAMES utf8'))
Now accents are displaying correct
thanks
Luca
Character sets can be complicated and pain to debug when it comes to LAMP web applications. At each of the stages that one piece of software talks to another there's scope for incorrect charset translation or incorrect storage of data.
The places you need to look out for are:
- Between the browser and the web server (which you've listed already)
- Between PHP and the MySQL server
The character you've listed look like normal a European character that will be included in the ISO-8859-1 charset.
Things to check for:
even though you're specifying the character set in a meta header have a look in your browser to be sure which character set the browser is actually using. If you've specified it the browser should use that charset to render/view the page but in cases I've seen it attempting to auto-detect the correct charset and failing. Most browsers will have an "encoding" menu (perhaps under "view") that allows you to choose the charset. Ensure that it says ISO-8859-1 (Western European).
MySQL can happily support character set conversion if required to but in most cases you want to have your tables and client connection set to use the same encoding. When configured this way MySQL won't attempt to do any encoding conversion and will just write the data you input byte for byte into the table. When read it'll come out the same way byte for byte.
You've not said if you're reading data from the database back out with the same web-app or with some other client. I'd suggest you try to read it out with the same web application and using the same meta charset header (again, check the browser is really setting it) and see what is displayed in the browser.
To debug these issues requires you to be really sure about whether the client/console you're using is doing any conversion too, the safest way is sometimes to get the data into a hex editor where you can be sure that nothing else is messing around with any translation.
If it doesn't look like it's a browser-side problem please can you include the output of the following commands against your database:
Run from a connection that your web-app makes (not from some other MySQL client):
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Run from any MySQL client:
SHOW CREATE TABLE myTable;
(where myTable is the table you're reading/writing data from/to)
The ISO-8859-1 character set is for Latin characters only. Try UTF-8, and make sure that the database these characters are coming from are also UTF-8 columns.

META value charset=UTF-8 prevents UTF-8 characters showing

I've made a test program that is basically just a textarea that I can enter characters into and when I click submit the characters are written to a MySQL test table (using PHP).
The test table is collation is UTF-8.
The script works fine if I want to write a é or ú to the database it writes fine. But then if I add the following meta statement to the <head> area of my page:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
...the characters start becoming scrambled.
My theory is that the server is imposing some encoding that works well, but when I add the UTF-8 directive it overrides this server encoding and that this UTF-* encoding doesn't include the characters such as é and ú.
But I thought that UTF-8 encoded all (bar Klingon etc) characters.
Basically my program works but I want to know why when I add the directive it doesn't.
I think I'm missing something.
Any help/teaching most appreciated.
Thanks in advance.
Firstly, PHP generally doesn't handle the Unicode character set or UTF-8 character encoding. With the exception of (careful use of) mb_... functions, it just treats strings as binary data.
Secondly, you need to tell the MySQL client library what character set / encoding you're working with. The 'SET NAMES' SQL command does the job, and different MySQL clients (mysql, mysqli etc..) provide access to it in different ways, e.g. http://www.php.net/manual/en/mysqli.set-charset.php
Your browser, and MySQL client, are probably both defaulting to latin1, and coincidentally matching. MySQL then knows to convert the latin1 binary data into UTF-8. When you set the browser charset/encoding to UTF-8, the MySQL client is interpreting that UTF-8 data as latin1, and incorrectly transcoding it.
So the solution is to set the MySQL client to a charset matching the input to PHP from the browser.
Note also that table collation isn't the same as table character set - collation refers to how strings are compared and sorted. Confusing stuff, hope this helps!

Categories