Storing Arabic text in MySQL using PDO in PHP

I'm working on an Arabic site, and I want to store Arabic input in the database. I've set the character set to utf8mb4_general_ci. When I print the data before the insert query, it shows the correct Arabic value. But after the insert, it is stored in the db as Ø§ÙØ±ÙØ§Ø¶â. I am using PDO in PHP, and I've also set the character set to utf8 when connecting.
$this->pdo = new PDO($dsn, $this->settings["user"],
$this->settings["password"], array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
But I am still not able to store Arabic characters in my table.

When setting the client charset, you have to make it match the actual data encoding.
So, if your input data is in UTF-8, everything should work. But in that case, why would you set the database charset to utf8mb4 rather than utf8?
If your input data is in an encoding other than UTF-8, then you have to SET NAMES to match that actual encoding.
Also, setting the charset in PDO::MYSQL_ATTR_INIT_COMMAND is little more than a superstition. Although it works in most cases, it is better to set it via the DSN, which works for all currently supported PHP versions. Note that MySQL's encoding names differ slightly from the commonly used ones (utf8 rather than UTF-8, for example).
Regarding the strange characters you are observing: most likely this is no more than a measurement error. The tool you use to browse the database has to both support that encoding and be set up to display it properly.
All of the above assumes that the statement "I've set the character set to utf8mb4_general_ci" is about the table charset.
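A minimal sketch of the DSN approach recommended in this answer. The helper names build_mysql_dsn and connect_utf8 are made up for illustration, and utf8mb4 is assumed only because the question's table uses a utf8mb4 collation. The charset DSN parameter itself is honoured by pdo_mysql from PHP 5.3.6 onwards.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that carries the connection charset.
// Note the MySQL spelling: "utf8mb4" or "utf8", never "UTF-8".
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: opens the connection with the charset set in the DSN,
// so no SET NAMES init command is needed.
function connect_utf8(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(build_mysql_dsn($host, $dbname), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // report SQL errors as exceptions
    ]);
}

echo build_mysql_dsn('localhost', 'app'), PHP_EOL;
```

With the values above, the DSN reads mysql:host=localhost;dbname=app;charset=utf8mb4. The host and database name are placeholders.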

Related

Characters getting encoded to �

I am using PHP + MySQL to build a dynamic page. My db has “Make, which shows up as �Make on the web page. I thought it was an encoding issue, so I tried using <html lang='en' dir='ltr'> and <meta charset="utf-8" />, but that didn't help either.

When dealing with any charset, it's important that you set everything to the same one. You mentioned having set both PHP and HTML headers to UTF-8, which often does the trick, but it's also important that the database connection, the database itself and its tables are encoded with UTF-8 as well.
Connection
You also need to specify the charset in the connection itself.
PDO (specified in the object itself):
$handler = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET UTF8"));
MySQLi: (placed directly after creating the connection)
For OOP: $mysqli->set_charset("utf8");
For procedural: mysqli_set_charset($mysqli, "utf8");
(where $mysqli is the MySQLi connection)
MySQL (deprecated, and removed in PHP 7; you should convert to PDO or MySQLi): (placed directly after creating the connection)
mysql_set_charset("utf8");
Database and tables
Your database and all its tables have to be set to UTF-8. Note that charset is not exactly the same as collation (see this post).
You can do that by running the queries below once for each database and tables (for example in phpMyAdmin)
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
File-encoding
It might also be necessary for the file itself to be UTF-8 encoded. If you're using Notepad++ to write your code, this can be done in the "Format" drop-down on the menu bar (you should use Convert to..., as this won't mess your current file up), but any decent IDE would have a similar option. You should use UTF-8 w/o BOM (see this StackOverflow question).
Other
It may be that you already have values in your database that are not encoded with UTF-8. Updating them manually could be a pain and could consume a lot of time. Should this be the case, you could use something like ForceUTF8 and loop through your databases, updating the fields with that function.
Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.

If the � is in your database column itself, change the original character to the following:
http://www.w3schools.com/charsets/ref_html_ansi.asp

PHP displaying Chinese characters: SET NAMES 'utf8' not working

I'm trying to work with a database that I have, but I can't display Chinese characters in it. The database was actually a MS Access file first, that I converted into mysql with a program. Anyway, many rows have Chinese characters in them and I can't get them to display properly in any browser.
I can display Chinese characters just fine otherwise, and I can also see them if I use phpmyadmin to look at the tables. I searched around for a solution to this problem and it seems to me that the usual fix is to do the "SET NAMES 'utf8'" query, but this only changed the displayed characters from question marks to other, weird, symbols.
If I look in phpmyadmin collation is utf8_general_ci for the database and all the tables.
Any ideas?

For MySQL DB, this solves the problem:
$dbh = mysql_connect($hostname, $username, $password);
mysql_select_db($db, $dbh);
mysql_set_charset('utf8', $dbh);
PDO solution:
// Double quotes so the variables are interpolated; MySQL's charset name is utf8, not UTF-8
$dbh = new PDO("mysql:host=$hostname;dbname=$db;charset=utf8", $username, $password);

You'd have to make sure of a few things:
Before import, the character set of the table you're going to use has to be set as utf8. You must also make sure the imported data actually contains proper utf8 encoded characters.
At the time of import you have to specify the character set of the established session (e.g. by running SET NAMES utf8;)
After import, you should write a small script that reads a row that you know has special characters in it; the script must:
use header('Content-Type: text/plain; charset=utf-8'); or whichever mime type you wish to set
set the correct character set for the established MySQL connection (utf8)
If all goes well, it should display your data correctly.

Converting latin1_swedish_ci to utf8 with PHP

I have a database filled with values like â™¥â€¢â—â™¥ Dhaka â™¥â€¢â—â™¥ (which should be ♥•●♥ Dhaka ♥•●♥) because I didn't specify the collation when creating the database.
Now I want to fix it. I cannot fetch the data again from where I got it in the first place, so I was wondering whether it is possible to fetch the data in a PHP script and convert it to the correct characters.
I've changed the collation of the database and the fields to utf8_general_ci.

The collation is NOT the same as the character set. The collation is only used for sorting and comparison of text (that's why there's a language term in there). The actual character set may be different.
The most common failure is not in the database but rather in the connection between PHP and MySQL. The default charset for the connection is usually ISO-8859-1. You need to change that as the first thing you do after connecting, using either the SQL query SET NAMES 'utf8'; (MySQL spells the name without a hyphen) or the mysql_set_charset function.
Also check the character set of your tables. This may be wrong as well if you have not specified UTF-8 to begin with (again: this is not the same as the collation). But make sure to take a backup before changing anything here. MySQL will try to convert the charset from the previous one, so you may need to reload the data from backup if you have actually saved UTF-8 data in ISO-8859-1 tables.

I would look into mb_detect_encoding() and mb_convert_encoding() and see if they can help you.
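As a rough illustration of the mb_convert_encoding() idea, the sketch below undoes one round of double encoding. It rests on an assumption that should be checked against the real data: the stored text is UTF-8 that was decoded once as Windows-1252 (what MySQL calls latin1) and then re-encoded as UTF-8. The function name repair_double_encoded is hypothetical.

```php
<?php
// Assumption: $text is UTF-8 that was misread once as Windows-1252 and
// re-encoded as UTF-8 (e.g. "â™¥" instead of "♥").
function repair_double_encoded(string $text): string
{
    // Map every character back to its single Windows-1252 byte.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // Accept the result only if no character was lost on the way back
    // and the recovered bytes form valid UTF-8; otherwise keep the input.
    $lossless = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') === $text;

    return ($lossless && mb_check_encoding($bytes, 'UTF-8')) ? $bytes : $text;
}

echo repair_double_encoded('â™¥ Dhaka â™¥'), PHP_EOL; // ♥ Dhaka ♥
```

Characters whose UTF-8 bytes have no Windows-1252 meaning (the 0x8F byte in ● is one) may already have been dropped from the stored value, in which case they cannot be recovered this way. Take a backup before rewriting any rows.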

MySQL collation and PHP charset conflict

I have a bunch of Danish text taken from a latin-1 MySQL database and it displays correctly when echoed in PHP. The problem starts when I need to echo some other Danish characters, which are not taken from the database.
What I do is actually output the header
Content-Type: text/html; charset=iso-8859-1
to also let the non-queried characters to display correctly as well.
Problems is, when I do that the queried characters display incorrectly.

Just because the data is stored in a latin-1 collated table doesn't mean that it's latin-1 encoded. MySQL does not do any character translation when the connection's SET NAMES setting is the same as the table's character set, so whatever bytes the client sends are stored as they are.
I suspect that you have some UTF8 characters stored in a latin1 database which is confusing the issue.
For more help please can you add details of the:
MySQL connection encoding that you have set
Details of where the "non-queried" characters are coming from
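One way to find out which encoding the stored bytes really use, without depending on how a browser or database tool renders them, is to look at the raw bytes. The sketch below is only an illustration; hex_bytes is a made-up helper, and the file is assumed to be saved as UTF-8.

```php
<?php
// Show the raw bytes of a string as space-separated hex pairs.
function hex_bytes(string $value): string
{
    return implode(' ', str_split(bin2hex($value), 2));
}

$utf8   = 'æ';                                               // two bytes in UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // same letter, one latin-1 byte

echo hex_bytes($utf8), PHP_EOL;   // c3 a6
echo hex_bytes($latin1), PHP_EOL; // e6
```

MySQL's HEX() function gives the same view of a stored column, so a Danish letter that comes back as c3 a6 from a latin1 column is UTF-8 data sitting in a latin1 table.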

Use Unicode. UTF-8 is the right way.
So set utf8_unicode_ci in the database, UTF-8 as the page charset, and run mysql_query("SET NAMES UTF8"); before your queries.

Does MySQL collation type need to match PHP page charset type?

I have started debugging my RSS feed because it has some strange characters in it (i.e. the missing-character glyph). I started with two excellent beginner resources:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets: http://www.joelonsoftware.com/articles/Unicode.html
Character Sets / Character Encoding Issues: http://www.phpwact.org/php/i18n/charsets
I believe our RSS feed is having problems because users are copying and pasting MS Word documents into a textarea on the site, and our PHP pages use the "iso-8859-1" charset, which is incompatible with the special "Windows-1252" encodings for things like bullet points and smart quotes used by MS Word.
So I'm hoping that, to fix the issue, all I'll need to do is start using "utf-8" in the pages that take or give user input, i.e. set the following in the HEAD section:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
The real reason I'm raising this question though, is because my DB fields that store my user input are in "latin1_swedish_ci" and I want to know whether I NEED to convert them to "utf8_general_ci"? MySQL doesn't really care about the charset, does it? It just sees a bunch of bytes, and if I put Unicode into a field collated as Latin it'll still come back out as Unicode, right? Changing the field will be tiresome because the field is part of a FULLTEXT index, and the other fields in it will also need their collation changed, which means dropping the index and rebuilding it (no small task when there are large amounts of TEXT involved).

The real reason I'm raising this question though, is because my DB fields that store my user input are in "latin1_swedish_ci" and I want to know whether I NEED to convert them to "utf8_general_ci"?
No. latin1_swedish_ci and utf8_general_ci are collations - not charsets. The collation won't affect the way that characters are stored or input/output. It only controls how sorting functions order their results. The collation - to work as expected - should match the storage charset. So if your tables are stored in utf8, you should use a utf8 collation.
The storage charset for MySQL is not directly tied to the charset in PHP. You can use utf8 as the storage character set for MySQL while using iso-8859-1 in PHP. In that case, you need to tell MySQL about it by setting the charset on the connection (set names XXX). MySQL will then convert as needed. If you don't use the same charset in MySQL and PHP, you'll end up with the charset capacity that is the lowest common denominator, so even though strings are stored in utf8, you'll not have the full Unicode range of characters available. Therefore you should use utf8 in both MySQL and PHP.
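The "lowest common denominator" effect can be imitated in plain PHP. The sketch below is not MySQL itself: mbstring stands in for the server's conversion, and through_latin1 is a hypothetical name. It pushes UTF-8 text through ISO-8859-1 and back, which is roughly what happens to data when the connection charset is latin1.

```php
<?php
// Simulate UTF-8 text passing through a latin-1 connection and back.
function through_latin1(string $utf8): string
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

// Every letter of "Ærø" exists in latin-1, so it survives the trip.
var_dump(through_latin1('Ærø') === 'Ærø');           // bool(true)

// "♥" and Arabic letters do not exist in latin-1, so they are lost.
var_dump(through_latin1('♥ الرياض') === '♥ الرياض'); // bool(false)
```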

No - definitely not. MySQL possesses the ability to transform strings from one character set into another on the fly, but it's important that your MySQL server knows what character set you're working with on the client side (client side = PHP script, NOT the client accessing your webpage). This can be done by issuing the query
SET NAMES 'utf8';
prior to any other query you send to the server. MySQL will then do the appropriate conversions from your client character set into the internal MySQL character set into the table and/or column character set and all the way back. So generally you only have to worry about setting the correct client character set. This character set must match the character set you use to output your data to the webserver.
Please have a look at the MySQL manual:
9.1.4. Connection Character Sets and Collations
or 9.1. Character Set Support in general.

To save someone some time searching for how to change the MySQL connection charset nicely with PDO, here's how I do it:
$dbc = new PDO('mysql:dbname=DBNAME;host=DBHOST', $user, $pw, array(PDO::MYSQL_ATTR_INIT_COMMAND => sprintf( "SET NAMES %s", $charset ) ) );

In HTTP, the character encoding is declared by the charset parameter in the Content-Type header field of the HTTP response. Other declarations are overridden by the declaration in the HTTP header:
[…] user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
An HTTP "charset" parameter in a "Content-Type" field.
A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
The charset attribute set on an element that designates an external resource.
Additionally, you should explicitly declare the accepted character encoding with the accept-charset attribute on the form element. Otherwise the user agent may (but need not) use the character encoding of the document containing the form to encode the input data:
The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element.
This should give you the best chance that the incoming data is encoded correctly. But it's not guaranteed. So better check whether the data is actually encoded with UTF-8 (there are functions/algorithms to do this).
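In PHP, one such function is mb_check_encoding(). The sketch below is a minimal example; the helper name to_utf8 is hypothetical, and treating every invalid input as Windows-1252 is an assumption (chosen here because of the MS Word smart quotes mentioned in the question), not something that can be detected reliably.

```php
<?php
// Return the input unchanged if it is valid UTF-8. Otherwise assume
// (and it is only an assumption) that it was sent as Windows-1252.
function to_utf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    return mb_convert_encoding($input, 'UTF-8', 'Windows-1252');
}

var_dump(to_utf8('“Make”') === '“Make”');       // bool(true): already valid UTF-8
var_dump(to_utf8("\x93Make\x94") === '“Make”'); // bool(true): Windows-1252 smart quotes converted
```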
