Background:
There is a table, events; this table is formatted latin1. Individual columns in this table are set to utf8. The column we will cherry pick to discuss is 'title' which is one of the utf8 columns. The website is set for utf8 both via apache and the meta tag.
As a test, if I save décor or © into the title field and perform
select title, LENGTH(title) as len, CHAR_LENGTH(title) as chlen
from events where length(title) != char_length(title)
I will get décor or ©, 12, 10 back as a result; which is expected showing that the data has indeed been properly saved into my utf8 column.
However, upon echoing the title out to a page, it's mangeld into d�cor or � which makes no sense to me since, as mentioned before, the character encoding is set to utf-8 on the page.
Not sure if this final detail makes a difference but if I edit the page and resubmit the mangled text it turns into d%uFFFDcor or %uFFFD both in the database and when displayed to the page. Further submits cause no change.
Actual Question:
Does anyone have an idea as to what I may be doing wrong? :-P
Well, there's likely one of three problems.
1. Mysql's connection is not using UTF-8
This means that it's converted to another charset (likely Latin-1) before it hits PHP. I've found the best solution is to run the following queries:
SET CHARACTER SET = "utf8";
SET character_set_database = "utf8";
SET character_set_connection = "utf8";
SET character_set_server = "utf8";
2. The page rendered is not really set to UTF-8
Set both the Content-type header and the <meta> tag content types to UTF-8. Some browsers don't respect one or the other...
header ('Content-Type: text/html; charset=UTF-8');
echo '<meta http-equiv="content-type" content="text/html; charset=utf-8" />';
As noted in the comments, that's not the problem...
3. You're doing something to the string before echoing it
Most of PHP's string functions will not do well with UTF-8. If you're calling a normal function that doesn't accept a $charset parameter, the chances are that it won't work with utf-8 strings (such as str_replace). If it does have a $charset parameter (like htmlspecialchars, make sure that you set it.
echo htmlspecialchars($content, ENT_COMPAT, 'UTF-8');
Related
I'm using mysql database and php to insert russian characters into a table.
I'm using:
$conn->set_charset('utf-8');
into my .php page to set charset to utf-8 but, when I try to print the DB charset with:
echo "set name:".$conn->character_set_name();
it shows
set name:latin1
I've set my Table to:
utf8mb4_unicode_ci
but nothing change.
Printing the passed text from the ajax request, I can see the text written correctly.
What should I do?
I guess you aren't checking the return value of mysqli::set_charset(). It must be returning false because utf-8 is not a valid encoding name in MySQL; the correct name is utf8 (no dash). Or, even better, utf8mb4.
You can get a list of supported encodings with:
SHOW COLLATION;
Why when I'm reading a cyrillic text from database it's ok , but when I put this text in select-option menu I get strange symbols
http://prikachi.com/images/813/6589813g.jpg
http://prikachi.com/images/811/6589811I.jpg
I think that I put everywhere to be utf-8 but I don't know ...
in my html I use :
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
most likely you're not using the same charset (utf-8) everywhere so your data gets messed up at some point. depending on what exactly you're doing, you'll have to change/add one or more of the following points (maybe it's the SET CHARSET/mysql_set_charset you forgot):
tell MySQL to use utf-8. to do this, add this to your my.cnf:
collation_server = utf8_unicode_ci
character_set_server = utf8
before interacting with mysql, send this two querys:
SET NAMES 'utf8';
CHARSET 'utf8';
or, alternatively, let php do this after opening the connection:
mysql_set_charset('utf8', $conn); // when using the mysql_-functions
mysqli::set_charset('utf8') // when using mysqli
set UTF-8 as the default charset for your database
CREATE DATABASE `my_db` DEFAULT CHARACTER SET 'utf8';
do the same for tables:
CREATE TABLE `my_table` (
-- ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
assuming the client is a browser, serve your content as utf-8 and the the correct header:
header('Content-type: text/html; charset=utf-8');
to be really sure the browser understands, add a meta-tag:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
and, last but not least, tell the browser to submit forms using utf-8
<form accept-charset="utf-8" ...>
Hello I have a character encoding problem in my application and thought to ask for some help, because I couldn't solve the problem even thought I was given some guidance so here goes:
My Ä and Ö characters are shown in the browser as: �
I will also post all what I have done so far trying to solve the problem:
1) Database: I have tried changing the collation of my tables, here are some info what SHOW TABLE STATUS gives for one of my tables:
Name = test_groups Engine = InnoDB Version = 10 Row_format = Compact
Collation = utf8_swedish_ci
Database character variables gives:
| character_set_client = utf8 | character_set_connection =
utf8 | character_set_database = latin1 (I
Wonder is this the cause?) | character_set_filesystem
= binary | character_set_results = utf8 | character_set_server = utf8 |
character_set_system = utf8
2) In apache httpd.conf I have:
AddDefaultCharset UTF-8
3) In my Zend-application application.ini:
resources.view.encoding = "UTF-8"
4) In my firefox 14.0.1 browser
edit->preferences->content->advanced->Default character encoding =
Unicode (UTF-8)
5) In my php code meta-tag:
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
Now here's also few other interesting things: When I look at my page and change from firefox
View->Character encoding->Western (ISO-8859-1)
, the �-characters which came from the MySQL database turn out ok to öä-characters, but the öä-characters that come from my php-code turn into ät-characters.
Another thing when I check the encoding of the data coming from my MySQL-database with
mb_detect_encoding($DATA_FROM_MYSQL_DATABASE)
it outputs UTF-8!! Then lastly if I do in the code:
utf8_encode($DATA_FROM_MYSQL_DATABASE)
and output the result the problem disappears that is �-characters -> öä-characters. So what's going on here x) All help appreciated
Are you sending SET NAMES utf8 in your PHP as the first query to MySQL ? That could be the cause if not.
SET NAMES indicates what character set the client will use to send SQL
statements to the server. Thus, SET NAMES 'cp1251' tells the server,
“future incoming messages from this client are in character set
cp1251.” It also specifies the character set that the server should
use for sending results back to the client. (For example, it indicates
what character set to use for column values if you use a SELECT
statement.)
SET NAMES utf8 in MySQL? has more detail about how and why.
Troubleshoot:
Check your database (with PHPMyAdmin, for instance). Are the characters correctly stored? Or does it seem gibberish?
If the characters in the database are ok, then the problem happens when retrieving. If they are stored incorrectly (as I would guess they are), then the problem is in the "storing".
Check your source code file and verify if they are encoded in UTF-8.
Force mysql connection to use UTF8 (mysqli::set_charset('utf8') or mysql_set_charset('utf8') or PDO: Add charset to the connection string (charset=utf8) )
I'm using Codeigniter not for so long but I've some charset problems.. I'm asking around at the CI Forum, but I want to go further, still no global solution: http://codeigniter.com/forums/viewthread/204409/
The problem was a database error 1064. I've got a solution, use iconv! Works fine, but I think it's not necessary. I'm searching a lot on the internet for charset's etc but I'm using CI now, how about charsets and CI...
So I've a lot of question about it, I hope someone can make it clear for me:
What’s the best way to set the charset global? And what to set?
In the head
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
In config/config.php
$config['charset'] = 'UTF-8';
In config/database.php
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
In .htaccess, my rewrite rules and
php_value magic_quotes_gpc Off
AddDefaultCharset UTF-8
Also need send a header? Where to place? Something like?
header('Content-Type: text/html; charset=UTF-8');
In my editor (Notepad++) save files as UTF-8? Or UTF-8 (without BOM)? Or is ANSI good (this is what I’m using now)?
Use utf8_unicode_ci or utf8_general_ci for the MySQL database? And why?
How about reading RSS feeds, how to handle multiple charsets? Where I’m working on I’ve two feeds, one with UTF-8 encoding and the other with ISO-8859-1. This will be stored in the database and will be compared sometimes to see if there are new items. It fails on special chars.
I'm working with:
- CI 2.0.3
- PHP 5.2.17
- MySQL 5.1.58
More information added:
Model:
function update_favorite($data)
{
$this->db->where('id', $data['id']);
$this->db->where('user_id', $data['user_id']);
$this->db->update('favorites', $data);
return;
}
Controller:
$this->favorites_model->update_favorite(array(
'id' => $id,
'rss_last' => $rss_last,
'user_id' => $this->session->userdata('user_id')
));
When $rss_last is a “normal” value like: “test” (without quotes) it works fine.
When it’s a value with more length like (in Dutch): F-Secure vindt malware met certificaat van Maleisische overheid
I get this error:
Error Number: 1064
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near ‘vindt malware met certificaat van Maleisische overheid,
user_id = ‘1’ WHERE `i’ at line 1
UPDATE favorites SET id = ‘15’, rss_last = F-Secure vindt
malware met certificaat van Maleisische overheid, user_id = ‘1’
WHERE id = ‘15’ AND user_id = ‘1’
Filename:
/home/.../domains/....nl/public_html/new/models/favorites_model.php
Line Number: 35
Someone at the CI forum told me to use this:
'rss_last' => iconv("UTF-8", "UTF-8//TRANSLIT", $rss_last)
This works fine, but I think this is not necessary..
The value $rss_last came out a RSS feed, as told before, sometimes a UTF-8 and other times a ISO-8859-1 encoding:
$rss = file_get_contents('http://www.website.com/rss.xml');
$feed = new SimpleXmlElement($rss);
$rss_last = $feed->channel->item[0]->title;
It looks like this last part is the problem, when $rss_last is set to the value it works fine:
$rss_last = 'F-Secure vindt malware met certificaat van Maleisische overheid';
When the value came out the RSS it give problems...
Some more questions..
Just found this: Detect encoding and make everything UTF-8
Best solution? But.. is iconv not more simple, do something like this:
$encoding = some_function_to_get_encoding_from_feed($feed);
$rss_last = iconv($encoding, "UTF-8//TRANSLIT", $feed->channel->item[0]->title);
But what to use for "some_function_to_get_encoding_from_feed"? mb_detect_encoding?
And mb_convert_encoding vs iconv?
1) There is no global solution.
2)
AddDefaultCharset UTF-8
It's needed for Apache response to client with right encoding. Make it.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
not necessarily, but recommended by W3C.
$config['charset'] = 'UTF-8';
it's desirable
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
Encoding for CI connection to database. If encoding of your database is UTF-8 - make it mandatory.
header('Content-Type: text/html; charset=UTF-8');
Do not do this unless necessary. Charset already indicated in HTML code and .htaccess.
Use utf8_unicode_ci or utf8_general_ci for the MySQL database? And why?
For their own language (Russian), I use utf8_general_ci.
In my editor (Notepad++) save files as UTF-8?
Absolutely! All code that Apache will give as UTF8 should be in UTF8.
How about reading RSS feeds, how to handle multiple charsets?
If you have each RSS in each table - you can specify charset for each table and set right encoding with each sql query.
Yes, cyrillic symbols, for example, will fails on non-UTF8.
UTF-8 (without BOM) should give you the best results based on your configuration and there's no need to send separate headers since the encoding is already selected in the head part. Utf8_general_ci should do fine for the MySQL database.
Perhaps the entries in the database are not valid?
Relevent code:
$status = $db->run(
"INSERT INTO user_wall (accountID, fromID, text, datetime) VALUES (:toID, :fromID, :text, '" . time() . "')",
array(":toID" => $toID, ":fromID" => %accountID, ":text" => $text)
);
I am taking input text from javascript, throwing it in an AJAX call to handle it, which calls a function which includes these lines of code.
The text string in question is: "Türkçe Türkçe Türkçe!"
Upon investigating the database, the following value is saved "Türkçe Türkçe Türkçe!", which is double utf8_encode'd.
When viewing the text by SELECTing it from the database, I get "Türkçe Türkçe Türkçe!", which is how they should be saved in the database in the first place.
As far as I know, I am not encoding the data as it is being prepared by PDO...
Encoding is a b*tch. You need to make sure it is as you expect it to be in several places:
The HTML page with the Javascript. Set it to utf-8 with a meta tag like:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The connection to your database. Execute the query set names 'utf8' after connecting to the database (and before any other queries).
Your database field. In MySQL it's called a collation set it to utf8_general_ci (ci stands for case-insensitive).
If you have these 3 your data should always be, and stay, utf-8 (unless you're doing encoding yourself).
For good measure, make sure your source code files are utf-8 as well. Windows typically defaults to iso.