PDO and special characters [duplicate] - php

This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 9 years ago.
I have my page encoding set to UTF-8, including in the meta tag.
However, when I take a value from the database, the page shows a diamond with a question mark instead. I'm assuming it doesn't recognize the character.
The character is an é. If I do echo é; it displays normally on the page, as it does if I write it manually in HTML. However, when I fetch the same value from the database using PDO, I get a �
I'm assuming it's a PDO setting. I've tried:
$db->exec("SET NAMES 'utf8';");
but this doesn't resolve it.
Any suggestions?

Many things can go wrong along the way. Usually you need to save your source file as UTF-8, open the database connection with UTF-8, and define the database tables as UTF-8.
A great article from #deceze that helped me clarify things is http://kunststube.net/frontback/.
The most obvious things you can try in your case are:
Save your source file with UTF-8 encoding. This option exists in editors like Notepad++ or Crimson Editor.
Create the PDO connection with the utf8 charset option:
$connection = new PDO('mysql:host='.$this->host.';dbname='.$this->db_name.';charset=utf8', $this->user, $this->pass,array(PDO::ATTR_EMULATE_PREPARES => false, PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
Make sure your table is UTF-8 encoded and your form has the accept-charset option:
<form action="action.php" accept-charset="utf-8">
Update: maybe utf8_encode fixed your problem, but then there is a wrong conversion somewhere between PHP and the database, or a wrong file encoding. You should fix the root of the problem, and utf8_encode will not be needed anymore.
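As a minimal sketch of the connection step above, the DSN can be assembled in one place so the charset is never forgotten. The helper name buildMysqlDsn and the host, database and credential values are hypothetical and not part of the original answer. The default charset mirrors the answer's utf8; MySQL's utf8 stores at most three bytes per character, so utf8mb4 may be the better choice if you need characters outside the Basic Multilingual Plane.

```php
<?php
// Hypothetical helper (not from the original answer): builds a PDO MySQL DSN
// that always carries an explicit charset parameter.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Print the DSN so it can be inspected without a running database.
echo buildMysqlDsn('localhost', 'mydb'), "\n";

// Usage with a real server (placeholder credentials, same options as the answer):
// $connection = new PDO(
//     buildMysqlDsn('localhost', 'mydb'),
//     'user',
//     'secret',
//     [PDO::ATTR_EMULATE_PREPARES => false, PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
// );
```

Keeping the charset in the DSN means it is applied when the connection is opened, so no separate SET NAMES query is needed.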

Related

How to get charset right in PHP/MySQL? [duplicate]

This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 7 years ago.
I've done a few PHP/MySQL based web projects, and sooner or later I always run into charset issues.
I live in Spain, and we have some special characters over here: ç ñ á é í ó and ú
There are so many variables, that the charset always gets messed up:
MySQL database collation
PHP/HTML headers
Web browser encoding settings
PHP settings
Apache settings
What I would like to have is a basic guideline on how to setup everything, so that I don't have issues with these Spanish characters.
There are three types of ways I populate my HTML output:
I query the MySQL database with PHP, and echo the output.
I write some words directly with HTML, for example
<p>Qué rábanos pasaría mañana</p>
I read a labels.ini file with parse_ini_file($file); The label file looks something like:
SORTING_ENTITY = Línea de negocio
SORTING_PLURAL = Líneas de negocio
MAIN_ENTITY = Instalación
MAIN_PLURAL = Instalaciones
So when I view the website, sometimes the texts generated from MySQL are messed up, other times the direct HTML is messed up, and other times everything is okay, but the content coming from the .ini file is messed up.
Also, I sometimes use web forms so that users can input data that is saved in MySQL. A user writes, for example, "Pájaro" in the web form, and some incorrect characters are stored in the database, like "P}jaros" or something like that.
I would like to have some guidelines so that everything is set up in a way that whatever I write in direct HTML or in the .ini file is shown on the website, and whatever the user writes in the web form is stored correctly and displayed the same way when this data is later read and echoed with PHP.
I don't want to be using stuff like:
&aacute;
&ntilde;
echo utf8_encode($dat);
In HTML head element always include
<meta charset="UTF-8"/>
To output UTF-8 characters in PHP:
header('Content-Type: text/html; charset=UTF-8');
Also, remember to send headers at the top of every PHP script, before any output.
For PHP CRUD (Create, Read, Update, Delete) use this code:
$conObj = new mysqli("", "", "", "");
$conObj->query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'");
Try to create a class with this "query" function (call it encoder or anything you like), so whenever you make an object of this class, this function will be automatically executed and you will not have to hard code it and write under every connection instance/object.

Getting special characters out of a MySQL database with PHP [duplicate]

This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 9 years ago.
I have a table that includes special characters such as ™.
This character can be entered and viewed using phpMyAdmin and other software, but when I use a SELECT statement in PHP to output to a browser, I get the diamond with question mark in it.
The table type is MyISAM. The encoding is UTF-8 Unicode. The collation is utf8_unicode_ci.
The first line of the html head is
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I tried using the htmlentities() function on the string before outputting it. No luck.
I also tried adding this to php before any output (no difference):
header('Content-type: text/html; charset=utf-8');
Lastly I tried adding this right below the initial mysql connection (this resulted in additional odd characters being displayed):
$db_charset = mysql_set_charset('utf8',$db);
What have I missed?
Below code works for me.
$sql = "SELECT * FROM chartest";
mysql_set_charset("UTF8");
$rs = mysql_query($sql);
header('Content-type: text/html; charset=utf-8');
while ($row = mysql_fetch_array($rs)) {
echo $row['name'];
}
There are a couple things that might help. First, even though you're setting the charset to UTF-8 in the header, that might not be enough. I've seen the browser ignore that before. Try forcing it by adding this in the head of your html:
<meta charset='utf-8'>
Next, as mentioned here, try doing this:
mysql_query ("set character_set_client='utf8'");
mysql_query ("set character_set_results='utf8'");
mysql_query ("set collation_connection='utf8_general_ci'");
EDIT
So I've just done some reading up and playing around a bit. First let me tell you, despite what I mentioned in the comments, utf8_encode() and utf8_decode() will not help you here. It helps to actually understand UTF-8 encoding. I found the Wikipedia page on UTF-8 very helpful. Assuming the value you are getting back from the database is in fact already UTF-8 encoded and you simply dump it out right after getting it, then it should be fine.
If you are doing anything with the database result (manipulating the string in any way especially) and you don't use the unicode aware functions from the PHP mbstring library then it will probably mess it up since the standard PHP string functions are not unicode aware.
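To illustrate the point about byte-oriented string functions, here is a small sketch that compares strlen()/substr() with their mbstring counterparts. It assumes the mbstring extension is loaded; the sample string is written with byte escapes so the result does not depend on how this file is saved.

```php
<?php
// "STÖVER" as UTF-8 bytes: Ö (U+00D6) is the two-byte sequence C3 96.
$name = "ST\xC3\x96VER";

$byteLength = strlen($name);                 // counts bytes
$charLength = mb_strlen($name, 'UTF-8');     // counts characters

$byteCut = substr($name, 0, 3);              // stops in the middle of Ö
$charCut = mb_substr($name, 0, 3, 'UTF-8');  // keeps Ö whole

printf("bytes=%d chars=%d\n", $byteLength, $charLength);
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // is the byte-wise cut still valid UTF-8?
var_dump(mb_check_encoding($charCut, 'UTF-8')); // is the character-wise cut valid UTF-8?
```

A string cut in the middle of a multi-byte sequence is no longer valid UTF-8, which is one way the replacement character ends up on the page.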
Once you understand how UTF-8 encoding works you can do something cool like this:
$test = "™";
for($i = 0; $i < strlen($test); $i++) {
echo sprintf("%b ", ord($test[$i]));
}
Which dumps out something like this:
11100010 10000100 10100010
That's a properly encoded UTF-8 '™' character. If you don't have a character like that in your data retrieved from the database then something is messed up.
To check, try searching for a special character that you know is in the result using mb_strpos():
var_dump(mb_strpos($db_result, '™'));
If that returns anything other than false then the data from the database is fine, otherwise we can at least establish that it's a problem between PHP and the database.
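Another way to run the same check is to look at the raw bytes directly. The sketch below uses a hypothetical helper, describeBytes, and assumes the mbstring extension is available. It contrasts a correctly encoded ™ with an é that arrived as a single latin1 byte, which is the kind of value a browser renders as �.

```php
<?php
// Hypothetical helper: shows the raw bytes of a value and whether they form valid UTF-8.
function describeBytes(string $value): string
{
    $validity = mb_check_encoding($value, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return bin2hex($value) . ' (' . $validity . ')';
}

$utf8Trademark = "\xE2\x84\xA2"; // ™ encoded as UTF-8 (three bytes)
$latin1EAcute  = "\xE9";         // é as a single latin1 byte

echo describeBytes($utf8Trademark), "\n";
echo describeBytes($latin1EAcute), "\n";
```

If values fetched from the database look like the second case, the connection charset is the first thing to check.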
You need to execute the following query first:
mysql_query("SET NAMES utf8");

Character missing when saving a row?

$userTb = new My_Tb_User(); //Child of Zend_Db_Table_Abstract
$row = $userTb->find(9)->current();
$row->name = 'STÖVER';
$row->save();
In the user table, at row 9, the name column stores ST instead of STÖVER.
Ö is a German character supported in UTF-8. If I enter 'STÖVER' manually using phpMyAdmin, it gets stored correctly.
I also passed the charset parameter with the value utf8 when creating the db adapter, but still no luck!
If you read the manual entry for utf8_encode, it converts an ISO-8859-1 encoded string to UTF-8. The function name is a horrible misnomer, as it suggests some sort of automagic encoding that is necessary. That is not the case. If your source code is saved as UTF-8 and you assign "STÖVER" to $string, then $string holds the character "STÖVER" encoded in UTF-8. No further action is necessary. In fact, trying to convert the UTF-8 string (incorrectly) from ISO-8859-1 to UTF-8 will garble it.
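The garbling described above can be reproduced without a database. This sketch uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode() (itself deprecated as of PHP 8.2), and assumes the mbstring extension is loaded.

```php
<?php
// "STÖVER" already encoded as UTF-8 (Ö = C3 96).
$utf8 = "ST\xC3\x96VER";

// Wrongly treat the UTF-8 bytes as ISO-8859-1 and convert them to UTF-8 again.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Reversing the mistaken conversion restores the original bytes.
$restored = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";          // original bytes
echo bin2hex($doubleEncoded), "\n"; // each byte of Ö has become its own two-byte character
var_dump($restored === $utf8);
```

The two bytes of Ö are each re-encoded as a separate character, which is why the text shows up as "Ã" followed by another symbol.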
utf8_encode('STÖVER');
check this question in stackoverflow
Using utf8_encode is bad practice; it adds a lot of complexity to your app. Try to solve the problem at its source.
Some thoughts:
a database server charset problem (check the encoding of your server)
a database client charset problem (check the encoding of your connection)
a database table charset problem (check the encoding of your table)
a PHP default encoding problem (check the default_charset setting in php.ini)
a misconfigured multibyte extension (see the mbstring settings in php.ini)
a <form> charset problem (check that it is sent as utf-8)
an <html> charset problem (where no charset is declared in your HTML file)
a Content-Type: problem (where the wrong charset is sent by Apache).

PHP/MySQL encoding problems. � instead of certain characters

I have come across some problems when inputting certain characters into my MySQL database using PHP. What I am doing is submitting user-entered text to a database. I cannot figure out what I need to change to allow any kind of character to be put into the database and printed back out through PHP as it's supposed to.
My MySQL collation is: latin1_swedish_ci
Just before I send the text to the database from my form I use mysql_real_escape_string() on the data.
Example below
this text:
�People are just as happy as they make up their minds to be.�
� Abraham Lincoln
is supposed to look like this:
“People are just as happy as they make up their minds to be.”
― Abraham Lincoln
As mentioned by others, you need to convert to UTF8 from end to end if you want to support "special" characters. This means your web page, PHP, mysql connection and mysql table. The web page is fairly simple, just use the meta tag for UTF8. Ideally your headers would say UTF8 also.
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Set your PHP to use UTF8. Things would probably work anyway, but it's a good measure to do this:
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
For mysql, you want to convert your table to UTF8, no need to export/import.
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8
You can, and should, configure mysql to default utf8. But you can also run the query:
SET NAMES UTF8
as the first query after establishing a connection and that will "convert" your database connection to UTF8.
That should solve all your character display problems.
The likeliest cause of the problem is that the database connection is set to latin1 but you are feeding it text encoded in UTF-8. The simplest way to solve this is to convert your input into what the client expects:
$quote = iconv("UTF-8", "WINDOWS-1252//TRANSLIT", $quote);
(What MySQL calls latin1 is windows-1252 in the rest of the world.) Note that many characters, such as the quotation dash U+2015 that you use there, cannot be represented in this encoding and will be converted into something else. Ideally you should change the column encoding to utf8.
An alternative solution: set the database connection to utf8. It doesn't matter how the columns are encoded: MySQL internally converts text from the connection encoding into the storage encoding, you can keep the columns as latin1 if you want to. (If you do, the quotation dash U+2015 will be turned into a question mark ? because it's not in latin1)
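To make the lossy conversion concrete, here is a sketch that uses mbstring as a stand-in for the conversion MySQL performs. This is an assumption for illustration; MySQL's own handling is not being called here. It converts the left curly quote and the quotation dash from the question into windows-1252.

```php
<?php
// Assumes the mbstring extension is loaded; mbstring only stands in for MySQL's conversion.
mb_substitute_character(0x3F); // use '?' for characters the target encoding lacks

$leftQuote = "\xE2\x80\x9C"; // “ (U+201C) in UTF-8
$quoteDash = "\xE2\x80\x95"; // ― (U+2015) in UTF-8

// U+201C exists in windows-1252 as the single byte 0x93.
$quoteInCp1252 = mb_convert_encoding($leftQuote, 'Windows-1252', 'UTF-8');

// U+2015 has no windows-1252 equivalent, so the substitute character is emitted.
$dashInCp1252 = mb_convert_encoding($quoteDash, 'Windows-1252', 'UTF-8');

echo bin2hex($quoteInCp1252), "\n";
echo $dashInCp1252, "\n";
```

The quote survives as one byte, while the dash is lost for good, which is the argument for switching the columns themselves to utf8.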
How to set the connection encoding depends on what library you are using: if you use the deprecated MySQL library it's mysql_set_charset, if MySQLi it's mysqli_set_charset, if PDO add charset=utf8 to the DSN.
If you do this, you'll also have to set the page encoding to UTF-8 with the Content-Type header.
Otherwise you would be having the same problem with the browser: feeding it text encoded in UTF-8 when it's expecting something else:
header("Content-Type: text/html; charset=utf-8");
The solutions provided are helpful if starting from scratch. Putting all possible connections to UTF-8 is indeed the safest. UTF-8 is the most used charset on the net for a variety of reasons.
Some suggestions and a word of warning:
copy the tables you want to sanitize with a unique prefix (tmp_)
although your db-connection is forced to utf8, check your General Settings collation, and change it to utf8_bin if that was not done yet
you need to run this on the local server
the funny char error is mostly due to mixing LATIN1 with UTF-8 configurations. This solution is designed for this. It could work with char-sets other than LATIN1, but I haven't checked this
check these tmp_tables extensively before copying back to the original
Build the 2 arrays needed for the magic:
$chars = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, "UTF-8");
$LATIN1 = $UTF8 = array();
foreach ($chars as $key => $val) { // each() was removed in PHP 8, so iterate with foreach
$UTF8[] = $key;
$LATIN1[] = $val;
}
Now build up the routines you need: (tables->)rows->fields and at each field call
$row[$field] = mysql_real_escape_string(str_replace($LATIN1 , $UTF8 , $row[$field]));
$q[] = "$field = '{$row[$field]}'";
Finally build up and send the query:
mysql_query("UPDATE $table SET " . implode(" , " , $q) . " WHERE id = '{$row['id']}' LIMIT 1");
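The replacement step above swaps HTML entities for their UTF-8 characters using the translation table. As an alternative sketch, and not the original author's method, PHP's built-in html_entity_decode() performs the same substitution in one call. The sample value below is hypothetical.

```php
<?php
// Hypothetical sample value: text stored with HTML entities instead of real characters.
$stored = 'P&aacute;jaro &amp; ma&ntilde;ana';

// Decode named entities into UTF-8 characters.
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Show the resulting bytes so the outcome does not depend on the terminal encoding.
echo bin2hex($decoded), "\n";
```

As with the original routine, run this on a copy of the table first and review the result before writing it back.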
change the MySQL collation to utf8_unicode_ci or utf8_general_ci, including the table and the database.
Yes, you will need to set your database to UTF-8. There are many ways to do it: by changing the config file, via phpMyAdmin, or by calling a PHP function (sorry, memory blank) right before inserting into and updating MySQL.
Unfortunately, I think you will have to re-enter any data you entered before.
One thing you also need to know, from personal experience: make sure all tables with relations have the same collation, or you won't be able to JOIN them.
For reference: http://dev.mysql.com/doc/refman/5.6/en/charset-syntax.html
Also, it can be an Apache setting. We've experienced the same issue on a 'free-hosting' server as well as on my brother's server. Once we switched to another server, all the characters displayed correctly. Verify your Apache settings; sorry, but I can't shed more light on Apache's config.
Get rid of everything else; you just need to follow these three points, and every problem regarding special language characters will be resolved.
1- You need to define the collation of your table to be utf8_general_ci.
2- Define <meta http-equiv="content-type" content="text/html; charset=utf-8"> in the HTML, right after the opening head tag.
3- Call mysql_set_charset('utf8', $link_identifier); in the file where you connect to the database, right after selecting the database with mysql_select_db. This will allow you to add and retrieve data properly in whatever language it is.
If your text has been encoded and decoded with the wrong encoding and so the mojibake is actually "solidified" into unicode characters, then the solutions mentioned so far won't work. I ended up having success with the ftfy Python package to automatically detect/fix mojibake:
https://github.com/LuminosoInsight/python-ftfy
https://pypi.org/project/ftfy/
https://ftfy.readthedocs.io/en/latest/
>>> import ftfy
>>> print(ftfy.fix_encoding("(à¸‡'âŒ£')à¸‡"))
(ง'⌣')ง
Hopefully this helps people who are in a similar situation.

php using utf8 without utf_encode

I'm running a German website which gets its content from a MySQL database.
I've defined the charset as utf8 as follows:
<meta http-equiv='Content-Type' content='text/html;charset=utf-8' />
The problem is, when fetching and displaying contents from the database I always need to use utf8_encode in order to get the proper German umlauts.
I want to keep the utf8 charset for my site as I'll have to add more languages which have special characters.
Any ideas on how to echo database contents 1:1 without having to utf8_encode?
thanks
Hard to tell without seeing how you are connecting to your database, but a common problem is the database connection itself.
After opening / selecting the database you need to set:
$db->exec('SET CHARACTER SET utf8'); // PDO
mysql_set_charset('utf8'); // Deprecated mysql_* extension
Whenever I want to use utf-8 with PHP and MySQL, I found that usually these two functions are the ones you should use after mysql_connect():
mysql_set_charset('utf8', $link);
mysql_query('SET NAMES utf8', $link);
Setting the content type in the header may do the trick:
header('content-type: text/html; charset=utf-8');
I had a similar problem and I solved it by adding this at the beginning of my PHP file:
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
Additionally, it is very important to check that you are saving your PHP file in UTF-8 format without a BOM; I had a big headache with this. I recommend Notepad++: it shows the current file encoding and allows you to convert to UTF-8 without BOM if necessary.
If you would like to see my problem and solution, it is here.
Hope it can help you!
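For anyone who wants to check for a BOM without opening an editor, here is a minimal sketch. The helper names hasUtf8Bom and stripUtf8Bom are hypothetical; the three bytes EF BB BF are the UTF-8 byte order mark. A BOM at the start of a PHP file is sent as output before any of your code runs, which can also interfere with later header() calls.

```php
<?php
// The UTF-8 byte order mark.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the contents start with a UTF-8 BOM.
function hasUtf8Bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns the contents without a leading BOM.
function stripUtf8Bom(string $contents): string
{
    return hasUtf8Bom($contents) ? substr($contents, 3) : $contents;
}

$withBom = UTF8_BOM . 'hello';
var_dump(hasUtf8Bom($withBom));
echo stripUtf8Bom($withBom), "\n";

// Usage on a real file (the path is a placeholder):
// $source = file_get_contents('index.php');
// if (hasUtf8Bom($source)) { file_put_contents('index.php', stripUtf8Bom($source)); }
```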
