Codeigniter and charsets - php

I haven't been using CodeIgniter for long, and I have some charset problems. I've asked around on the CI forum, but there is still no global solution and I want to dig further: http://codeigniter.com/forums/viewthread/204409/
The problem was a database error 1064. I was given a fix, use iconv, and it works, but I don't think it should be necessary. I've searched the internet a lot about charsets in general, but how should charsets be handled in CI?
So I have a lot of questions about it, and I hope someone can clear them up for me:
What’s the best way to set the charset globally? And what should it be set to?
In the head
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
In config/config.php
$config['charset'] = 'UTF-8';
In config/database.php
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
In .htaccess, my rewrite rules and
php_value magic_quotes_gpc Off
AddDefaultCharset UTF-8
Do I also need to send a header? Where should it go? Something like this?
header('Content-Type: text/html; charset=UTF-8');
In my editor (Notepad++), should I save files as UTF-8? Or UTF-8 (without BOM)? Or is ANSI fine (this is what I’m using now)?
Use utf8_unicode_ci or utf8_general_ci for the MySQL database? And why?
How about reading RSS feeds: how should multiple charsets be handled? In the project I’m working on I have two feeds, one encoded as UTF-8 and the other as ISO-8859-1. The items are stored in the database and compared from time to time to see whether there are new items. The comparison fails on special characters.
I'm working with:
- CI 2.0.3
- PHP 5.2.17
- MySQL 5.1.58
More information added:
Model:
function update_favorite($data)
{
$this->db->where('id', $data['id']);
$this->db->where('user_id', $data['user_id']);
$this->db->update('favorites', $data);
return;
}
Controller:
$this->favorites_model->update_favorite(array(
'id' => $id,
'rss_last' => $rss_last,
'user_id' => $this->session->userdata('user_id')
));
When $rss_last is a “normal” value like: “test” (without quotes) it works fine.
When it’s a longer value, such as (in Dutch): F-Secure vindt malware met certificaat van Maleisische overheid
I get this error:
Error Number: 1064
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near ‘vindt malware met certificaat van Maleisische overheid,
user_id = ‘1’ WHERE `i’ at line 1
UPDATE favorites SET id = ‘15’, rss_last = F-Secure vindt
malware met certificaat van Maleisische overheid, user_id = ‘1’
WHERE id = ‘15’ AND user_id = ‘1’
Filename:
/home/.../domains/....nl/public_html/new/models/favorites_model.php
Line Number: 35
Someone at the CI forum told me to use this:
'rss_last' => iconv("UTF-8", "UTF-8//TRANSLIT", $rss_last)
This works fine, but I don't think it should be necessary.
As mentioned before, the value of $rss_last comes from an RSS feed, which is sometimes encoded as UTF-8 and sometimes as ISO-8859-1:
$rss = file_get_contents('http://www.website.com/rss.xml');
$feed = new SimpleXmlElement($rss);
$rss_last = $feed->channel->item[0]->title;
It looks like this last part is the problem; when $rss_last is set to the value directly, it works fine:
$rss_last = 'F-Secure vindt malware met certificaat van Maleisische overheid';
When the value comes from the RSS feed, it causes problems...
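One possible explanation, offered as an assumption rather than something confirmed in this thread: SimpleXML returns a SimpleXMLElement object for the title, not a string, while iconv() returns a plain string. If the query builder only quotes values that pass is_string(), that would match the unquoted value in the error above. A minimal sketch, using a hypothetical inline feed in place of the real URL:

```php
<?php
// Hypothetical inline feed standing in for the real RSS URL.
$rss = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<rss><channel><item>'
     . '<title>F-Secure vindt malware met certificaat van Maleisische overheid</title>'
     . '</item></channel></rss>';

$feed = new SimpleXMLElement($rss);

$title_node = $feed->channel->item[0]->title; // a SimpleXMLElement object
$rss_last   = (string) $title_node;           // an explicit cast gives a plain string

var_dump(is_string($title_node)); // bool(false)
var_dump(is_string($rss_last));   // bool(true)
```

If this is the cause, casting to string before passing the value to the model would make the iconv() call unnecessary.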
Some more questions..
Just found this: Detect encoding and make everything UTF-8
Is that the best solution? Or isn't iconv simpler, with something like this:
$encoding = some_function_to_get_encoding_from_feed($feed);
$rss_last = iconv($encoding, "UTF-8//TRANSLIT", $feed->channel->item[0]->title);
But what to use for "some_function_to_get_encoding_from_feed"? mb_detect_encoding?
And mb_convert_encoding vs iconv?
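On the conversion question, a hedged sketch: as far as I know, SimpleXML hands text back as UTF-8 whatever encoding the XML declaration names, so values read through it should not need a second conversion. For raw strings that never went through an XML parser, iconv() with the declared encoding is enough. The helper name to_utf8 and the inline feed are assumptions made for illustration.

```php
<?php
// Hypothetical ISO-8859-1 feed: "\xE9" is the single Latin-1 byte for "é".
$latin1_feed = '<?xml version="1.0" encoding="ISO-8859-1"?>'
    . "<rss><channel><item><title>caf\xE9</title></item></channel></rss>";

$feed  = new SimpleXMLElement($latin1_feed);
$title = (string) $feed->channel->item[0]->title;

// The parser has already converted the text: "é" is now the two UTF-8 bytes C3 A9.
var_dump($title === "caf\xC3\xA9");

// Illustrative helper for raw strings whose encoding is known
// from an HTTP header or an XML declaration.
function to_utf8($text, $declared_encoding)
{
    if (strcasecmp($declared_encoding, 'UTF-8') === 0) {
        return $text; // already UTF-8; converting again would double-encode
    }
    return iconv($declared_encoding, 'UTF-8', $text);
}

var_dump(to_utf8("caf\xE9", 'ISO-8859-1') === $title);
```

mb_convert_encoding() does the same job with the arguments in a different order (mb_convert_encoding($text, 'UTF-8', $declared_encoding)). mb_detect_encoding() only guesses, so a declared encoding is the safer input when one is available.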

1) There is no global solution.
2)
AddDefaultCharset UTF-8
This is needed so that Apache responds to the client with the right encoding. Set it.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Not strictly necessary, but recommended by the W3C.
$config['charset'] = 'UTF-8';
This is desirable.
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
This is the encoding for CI's connection to the database. If your database is encoded in UTF-8, this setting is mandatory.
header('Content-Type: text/html; charset=UTF-8');
Do not do this unless necessary. The charset is already indicated in the HTML and in .htaccess.
Use utf8_unicode_ci or utf8_general_ci for the MySQL database? And why?
For my own language (Russian), I use utf8_general_ci.
In my editor (Notepad++) save files as UTF-8?
Absolutely! All code that Apache will serve as UTF-8 should be saved as UTF-8.
How about reading RSS feeds, how to handle multiple charsets?
If you store each RSS feed in its own table, you can specify the charset per table and set the right encoding with each SQL query.
And yes, Cyrillic characters, for example, will fail in anything that is not UTF-8.

UTF-8 (without BOM) should give you the best results with your configuration, and there's no need to send separate headers since the encoding is already declared in the head section. utf8_general_ci should do fine for the MySQL database.
Perhaps the entries in the database are not valid?

Related

cyrillized words read from database with php

Why is Cyrillic text fine when I read it from the database, but shows up as strange symbols when I put it in a select-option menu?
http://prikachi.com/images/813/6589813g.jpg
http://prikachi.com/images/811/6589811I.jpg
I think I've set utf-8 everywhere, but I'm not sure ...
in my html I use :
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
most likely you're not using the same charset (utf-8) everywhere so your data gets messed up at some point. depending on what exactly you're doing, you'll have to change/add one or more of the following points (maybe it's the SET CHARSET/mysql_set_charset you forgot):
tell MySQL to use utf-8. to do this, add this to your my.cnf:
collation_server = utf8_unicode_ci
character_set_server = utf8
before interacting with mysql, send these two queries:
SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
or, alternatively, let php do this after opening the connection:
mysql_set_charset('utf8', $conn); // when using the mysql_-functions
$mysqli->set_charset('utf8'); // when using mysqli (object style)
set UTF-8 as the default charset for your database
CREATE DATABASE `my_db` DEFAULT CHARACTER SET 'utf8';
do the same for tables:
CREATE TABLE `my_table` (
-- ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
assuming the client is a browser, serve your content as utf-8 with the correct header:
header('Content-type: text/html; charset=utf-8');
to be really sure the browser understands, add a meta-tag:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
and, last but not least, tell the browser to submit forms using utf-8
<form accept-charset="utf-8" ...>
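For the select menu in the question, a minimal sketch of the output side, assuming the rows are already valid UTF-8. The render_options name and the sample values are made up for illustration. Passing the charset to htmlspecialchars() keeps multi-byte characters intact:

```php
<?php
// Hypothetical helper: builds <option> tags from UTF-8 strings.
function render_options(array $labels)
{
    $html = '';
    foreach ($labels as $value => $label) {
        $html .= sprintf(
            "<option value=\"%s\">%s</option>\n",
            htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}

// Sample Cyrillic labels written as UTF-8 byte escapes,
// so the encoding of this source file does not matter.
$labels = array(
    1 => "\xD0\x94\xD0\xB0", // "Да"
    2 => "\xD0\x9D\xD0\xB5", // "Не"
);

if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo render_options($labels);
```

If the menu still shows strange symbols with this in place, the bytes were most likely already damaged on the way out of the database, which points back to the connection charset.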

MySQL Character encoding (öä) in PHP application

Hello, I have a character encoding problem in my application and thought I'd ask for some help, because I couldn't solve it even though I was given some guidance. So here goes:
My Ä and Ö characters are shown in the browser as: �
I will also post everything I have done so far trying to solve the problem:
1) Database: I have tried changing the collation of my tables. Here is some of what SHOW TABLE STATUS gives for one of my tables:
Name = test_groups Engine = InnoDB Version = 10 Row_format = Compact
Collation = utf8_swedish_ci
Database character variables gives:
character_set_client = utf8
character_set_connection = utf8
character_set_database = latin1 (I wonder, is this the cause?)
character_set_filesystem = binary
character_set_results = utf8
character_set_server = utf8
character_set_system = utf8
2) In apache httpd.conf I have:
AddDefaultCharset UTF-8
3) In my Zend-application application.ini:
resources.view.encoding = "UTF-8"
4) In my firefox 14.0.1 browser
edit->preferences->content->advanced->Default character encoding =
Unicode (UTF-8)
5) In my php code meta-tag:
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
Now here are a few other interesting things: when I look at my page and switch Firefox to
View->Character encoding->Western (ISO-8859-1)
, the �-characters that came from the MySQL database turn into correct öä-characters, but the öä-characters that come from my PHP code turn into ät-characters.
Another thing when I check the encoding of the data coming from my MySQL-database with
mb_detect_encoding($DATA_FROM_MYSQL_DATABASE)
it outputs UTF-8!! Then lastly if I do in the code:
utf8_encode($DATA_FROM_MYSQL_DATABASE)
and output the result, the problem disappears, that is, �-characters -> öä-characters. So what's going on here x) All help appreciated
Are you sending SET NAMES utf8 from PHP as the first query to MySQL? If not, that could be the cause.
SET NAMES indicates what character set the client will use to send SQL
statements to the server. Thus, SET NAMES 'cp1251' tells the server,
“future incoming messages from this client are in character set
cp1251.” It also specifies the character set that the server should
use for sending results back to the client. (For example, it indicates
what character set to use for column values if you use a SELECT
statement.)
SET NAMES utf8 in MySQL? has more detail about how and why.
Troubleshoot:
Check your database (with PHPMyAdmin, for instance). Are the characters correctly stored? Or does it seem gibberish?
If the characters in the database are ok, then the problem happens when retrieving. If they are stored incorrectly (as I would guess they are), then the problem is in the "storing".
Check your source code file and verify if they are encoded in UTF-8.
Force the MySQL connection to use UTF-8 (mysqli::set_charset('utf8'), mysql_set_charset('utf8'), or, with PDO, add charset=utf8 to the connection string).
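The utf8_encode() observation in the question suggests the bytes arriving from MySQL are Latin-1 rather than UTF-8. A small sketch of how that can be checked, assuming the mbstring and iconv extensions are loaded; the sample bytes are illustrative, not taken from the asker's database:

```php
<?php
// "äö" as two Latin-1 bytes, as a latin1 connection would deliver them.
$from_db = "\xE4\xF6";

// Strict validation: a lone 0xE4 byte is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($from_db, 'UTF-8')); // bool(false)

// Converting from Latin-1 yields real UTF-8 (C3 A4 C3 B6).
$fixed = iconv('ISO-8859-1', 'UTF-8', $from_db);
var_dump(mb_check_encoding($fixed, 'UTF-8'));   // bool(true)
var_dump($fixed === "\xC3\xA4\xC3\xB6");        // bool(true)
```

If the check fails on fresh query results, setting the connection charset is the proper fix; converting each value afterwards only hides the mismatch.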

Oracle connection to retrieve or insert Arabic values from database

I have this code in Drupal 6 to retrieve Arabic values from an Oracle database:
<?php
session_start();
$conn=oci_connect('localhost','pass','IP....');
$stid=oci_parse($conn,"select arabic_name from arabic_names_table");
oci_execute($stid);
if($row = oci_fetch_array($stid,OCI_ASSOC+OCI_RETURNS_NULLS))
{
$name_ar=$row['arabic_name'];
}
?>
When values are retrieved from the DB or inserted into the DB, they appear like this: ???
Please note:
My Oracle database reads normal Arabic characters. From PL/SQL I can insert Arabic values.
I have installed mbstring.
I have UTF-8 encoding enabled.
How can I solve this problem?
When you fetch data from an Oracle database, the character encoding you get is normally that of the Oracle client installed on the machine running PHP. On Windows this is set in the registry for the Oracle client (see HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_OraClient11g_home1), under the key NLS_LANG. Its value will be something like ARABIC_UNITED ARAB EMIRATES.AR8MSWIN1256. Note that the encoding here is AR8MSWIN1256, which in the character map array is mapped to windows-1256 ( windows-1256 => AR8MSWIN1256 ).
See this link http://websvn.projects.ez.no/wsvn/ezoracle/?op=comp&compare[]=%2Fstable#385&compare[]=%2Fstable#386.
That is, after you fetch the data from the database the char encoding will be windows-1256. Now if your web page is using utf-8 charset, you need to convert the string to utf-8. For this you can use iconv().
$utf8 = iconv('windows-1256', 'utf-8', $my_string); // $my_string is encoded as windows-1256
echo $utf8; // outputs the string converted to utf-8
If you are still facing the problem, check the charset of the page; it must be utf-8.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I think this will solve your problem.

Special characters to mysql from php

I know there have been a lot of almost the same questions, but I still didn't find the answer to my problem.
I want to place "les Îles Açores" into the db. But I get:
les Îles Açores
I tried using:
SET NAMES 'utf8'
$mysqli->set_charset("utf8");
mysql_real_escape_string()
htmlentities (Here I got htmlentities, but I want to know if there's another way)
Code:
$name_fr = $_POST["name_fr"]; $name_nl = $_POST["name_nl"];
$arr_kollommen = array("NAME_FR","NAME_NL");
$arr_waardes = array($naam_nl,$naam_fr);
$obj_db->insert("landen",$arr_kollommen,$arr_waardes);
Does anyone have an idea how to solve my little problem?
Thank you very much!
Make sure the table uses the correct CHARSET, for example:
CREATE TABLE myTable (
one VARCHAR(255),
two VARCHAR(255)
) DEFAULT CHARSET=utf8;
Make sure you actually write in UTF8 (meaning the IDE / editor you write your code in must have its encoding set to UTF8).
Is the record corrupted both in the DB and on your page after you fetch it or only in DB?
$name_fr = $_POST["name_fr"];
$name_nl = $_POST["name_nl"];
$arr_kollommen = array("NAME_FR","NAME_NL");
$arr_waardes = array($naam_nl,$naam_fr);
$obj_db->insert("landen",$arr_kollommen,$arr_waardes);
Try decoding with utf8_decode instead of encoding,
like this:
$name_fr = $_POST["name_fr"];
$name_nl = $_POST["name_nl"];
// Decode the posted values (the original snippet read undefined variables here)
$naam_fr = utf8_decode($name_fr);
$naam_nl = utf8_decode($name_nl);
$arr_kollommen = array("NAME_FR","NAME_NL");
// Values in the same order as the columns above
$arr_waardes = array($naam_fr,$naam_nl);
$obj_db->insert("landen",$arr_kollommen,$arr_waardes);
2 possible reasons I can see:
1) Your database doesn't use UTF-8 fields
2) When you read your data from the server, you are not setting the connection to utf-8. If you have to set it to utf-8 when writing, you also have to set it to utf-8 when reading.
Check using PHPMyAdmin if the data is wrecked... If it is, then it means that your SET NAMES 'utf8' is not working...
Do you pass the "UTF-8" parameter into your htmlentities and html_entity_decode calls, this way?
html_entity_decode($text,ENT_QUOTES , "UTF-8");
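The garbled output in the question looks like UTF-8 bytes being read as Windows-1252 and then encoded to UTF-8 a second time. A short sketch reproducing that, assuming iconv is available; the byte escapes stand in for the posted form value:

```php
<?php
$posted = "\xC3\x8Eles A\xC3\xA7ores"; // "Îles Açores" in UTF-8

// Simulate a layer that wrongly treats the bytes as Windows-1252
// and converts them to UTF-8.
$double = iconv('WINDOWS-1252', 'UTF-8', $posted);

// "Î" (C3 8E) has become "Ã" + "Ž", and "ç" (C3 A7) has become "Ã" + "§".
var_dump($double === "\xC3\x83\xC5\xBDles A\xC3\x83\xC2\xA7ores");

// Reversing the wrong step restores the original bytes.
var_dump(iconv('UTF-8', 'WINDOWS-1252', $double) === $posted);
```

Seeing this pattern usually means one layer (connection, table, or page) disagrees with the others about the charset, so aligning them is preferable to decoding values by hand.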

php 5.2 + mysql 5.1 character encoding issue

Background:
There is a table, events; this table is formatted latin1. Individual columns in this table are set to utf8. The column we will cherry pick to discuss is 'title' which is one of the utf8 columns. The website is set for utf8 both via apache and the meta tag.
As a test, if I save décor or © into the title field and perform
select title, LENGTH(title) as len, CHAR_LENGTH(title) as chlen
from events where length(title) != char_length(title)
I will get décor or ©, 12, 10 back as a result; which is expected showing that the data has indeed been properly saved into my utf8 column.
However, upon echoing the title out to a page, it's mangled into d�cor or �, which makes no sense to me since, as mentioned before, the character encoding is set to utf-8 on the page.
Not sure if this final detail makes a difference but if I edit the page and resubmit the mangled text it turns into d%uFFFDcor or %uFFFD both in the database and when displayed to the page. Further submits cause no change.
Actual Question:
Does anyone have an idea as to what I may be doing wrong? :-P
Well, there's likely one of three problems.
1. Mysql's connection is not using UTF-8
This means that it's converted to another charset (likely Latin-1) before it hits PHP. I've found the best solution is to run the following queries:
SET CHARACTER SET "utf8";
SET character_set_database = "utf8";
SET character_set_connection = "utf8";
SET character_set_server = "utf8";
2. The page rendered is not really set to UTF-8
Set both the Content-type header and the <meta> tag content types to UTF-8. Some browsers don't respect one or the other...
header ('Content-Type: text/html; charset=UTF-8');
echo '<meta http-equiv="content-type" content="text/html; charset=utf-8" />';
As noted in the comments, that's not the problem...
3. You're doing something to the string before echoing it
Many of PHP's string functions are byte-oriented and will not do well with UTF-8. If you're calling a function that counts or cuts by position and doesn't accept a $charset parameter (such as strlen or substr), the chances are that it won't work correctly with utf-8 strings. If it does have a $charset parameter (like htmlspecialchars), make sure that you set it.
echo htmlspecialchars($content, ENT_COMPAT, 'UTF-8');
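To illustrate the third point, a brief sketch comparing byte-oriented and multibyte-aware functions on the question's sample word, assuming the mbstring extension is loaded. It mirrors the LENGTH versus CHAR_LENGTH difference seen in the query above:

```php
<?php
$title = "d\xC3\xA9cor"; // "décor" in UTF-8; "é" takes two bytes

var_dump(strlen($title));             // int(6): bytes
var_dump(mb_strlen($title, 'UTF-8')); // int(5): characters

// substr() cuts in the middle of "é" and leaves an invalid sequence.
$cut = substr($title, 0, 2);
var_dump(mb_check_encoding($cut, 'UTF-8')); // bool(false)

// mb_substr() cuts on character boundaries.
var_dump(mb_substr($title, 0, 2, 'UTF-8') === "d\xC3\xA9"); // bool(true)
```

A truncated sequence like $cut is one way a replacement character can end up on the page even though the stored data is correct.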
