Characters getting encoded to � - php

I am using PHP + MySQL to make a dynamic page. My db has “Make, which is rendered as �Make in the web page. I thought it was an encoding issue, so I tried using <html lang='en' dir='ltr'> and <meta charset="utf-8" />, but that didn't help either.

When dealing with any charset, it's important that you set everything to the same one. You mentioned having set both PHP and HTML headers to UTF-8, which often does the trick, but it's also important that the database connection, the database itself and its tables are encoded with UTF-8 as well.
Connection
You also need to specify the charset in the connection itself.
PDO (specified in the object itself):
$handler = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET UTF8"));
MySQLi: (placed directly after creating the connection)
For OOP: $mysqli->set_charset("utf8");
For procedural: mysqli_set_charset($mysqli, "utf8");
(where $mysqli is the MySQLi connection)
MySQL (deprecated, and removed in PHP 7; you should convert to PDO or MySQLi): (placed directly after creating the connection)
mysql_set_charset("utf8");
Database and tables
Your database and all its tables have to be set to UTF-8. Note that charset is not exactly the same as collation (see this post).
You can do that by running the queries below once for each database and table (for example in phpMyAdmin):
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
File-encoding
It might also be needed for the file itself to be UTF-8 encoded. If you're using Notepad++ to write your code, this can be done in the "Format" drop-down on the taskbar (you should use Convert to..., as this won't mess your current file up) - but any decent IDE would have a similar option. You should use UTF-8 w/o BOM (see this StackOverflow question).
Other
It may be that you already have values in your database that are not encoded with UTF-8. Updating them manually could be a pain and could consume a lot of time. Should this be the case, you could use something like ForceUTF8 and loop through your databases, updating the fields with that function.
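A minimal sketch of what such a repair step might look like, assuming the mbstring extension is available and that the damage is the classic case of UTF-8 bytes that were misread as latin1 and encoded a second time. The function name repairDoubleEncoded is hypothetical and is not part of ForceUTF8. It only handles characters in the U+0000..U+00FF range, so values that went through Windows-1252 specific characters are left untouched. Try it on a backup first.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 read as latin1, then
// encoded as UTF-8 again". Anything that does not fit that pattern is
// returned unchanged.
function repairDoubleEncoded(string $value): string
{
    // Only valid UTF-8 made purely of code points U+0000..U+00FF can be
    // the product of a latin1 misread. preg_match() returns false on
    // invalid UTF-8 in /u mode, which also lands here.
    if (!preg_match('/^[\x{00}-\x{FF}]*$/u', $value)) {
        return $value;
    }

    // Map each character back to the single byte it came from.
    $candidate = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');

    // Keep the result only if those bytes form valid UTF-8 themselves.
    return mb_check_encoding($candidate, 'UTF-8') ? $candidate : $value;
}

// "Ã©" (double-encoded "é") is repaired; a correct "é" is left alone.
echo bin2hex(repairDoubleEncoded("\xC3\x83\xC2\xA9")), "\n"; // c3a9
echo bin2hex(repairDoubleEncoded("\xC3\xA9")), "\n";         // c3a9
```

In a real clean-up you would call this for each text field while looping over the rows, and write the value back only when it actually changed.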
Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.

If the � is in your database column itself, change the original character to the following:
http://www.w3schools.com/charsets/ref_html_ansi.asp

Related

Encoding troubles converting MySQL to Mongo with PHP

I've been having a lot of encoding troubles with PHP/Mongo in general.
Right now, I'm in the process of converting some data from MySQL to Mongo. I have a string that contains an é, but when I try to encode it to UTF-8 (via mb_convert_encoding, utf8_encode), it turns into é. I'm sure other strings also contain other accented characters.
I've tried mb_detect_encoding, which told me the string is UTF-8, but when I do mb_check_encoding($string, 'UTF-8'), it returns false.
Basically, I have no idea what's wrong. This is on a page that is just a PHP script, no HTML. Any advice to this problem, or in general maintaining character encoding when inserting into Mongo?
Here is the script in question: https://plnkr.co/edit/eAkLxfklzLNCsZTBPKsX
The MySQL table is using a MyISAM engine, charset utf8, collation utf8_unicode_ci
Do not use the mysql_* API; change to mysqli_*
Do not use any mb or utf8 encode/decode routines; they merely hide the 'proper' solution.
Right after connecting to mysql, do SET NAMES utf8.
SHOW CREATE TABLE -- verify that the table/columns are CHARACTER SET utf8 (or utf8mb4)
é is the Mojibake for é. It usually indicates a mismatch of latin1 settings and utf8 settings.
If using PDO: $db = new PDO('mysql:host=host;dbname=db;charset=utf8', $user, $pwd); or execute SET NAMES utf8.
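The latin1/utf8 mismatch mentioned above can be reproduced without a database. This is only an illustration of how the Mojibake arises, not a suggested fix, and it assumes the mbstring extension is loaded. ISO-8859-1 stands in for MySQL's latin1 here, which is close enough for these two bytes.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Misread those bytes as ISO-8859-1 and re-encode them as UTF-8:
// each byte becomes a character of its own, which displays as "Ã©".
$mojibake = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Reversing that single step gives back the original bytes.
$restored = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";     // c3a9
echo bin2hex($mojibake), "\n"; // c383c2a9
echo bin2hex($restored), "\n"; // c3a9
```

Two characters on screen where one was expected is the usual sign that one layer treated UTF-8 bytes as latin1.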

storing arabic text in mysql using pdo in php

I'm working on arabic site and for that I want to store the arabic input in database. I've set the character set to utf8mb4_general_ci. When I'm printing the data before the insert query, then it is showing me correct arabic value. But when I am inserting it into db it is storing as اÙرÙاضâ. I am using PDO in PHP and I've also set the character set to utf 8 in connection string.
$this->pdo = new PDO($dsn, $this->settings["user"],
$this->settings["password"], array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
But I am not able to store arabic character in my table.
When setting the client charset, one has to make it match the actual data encoding.
So, if your input data is in utf-8, everything should work, but in this case why would you set database charset to utf8mb4, not utf8?
If your input data encoding is different from utf-8, then you have to set names to match this actual encoding.
Also, setting the charset in PDO::MYSQL_ATTR_INIT_COMMAND is little more than a superstition. Although it works in most cases, it is better to set it via the DSN, which works for all currently supported PHP versions. Note that MySQL's encoding names differ slightly from the commonly used ones (for example utf8 rather than UTF-8).
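A rough sketch of the DSN approach, assuming the pdo_mysql driver. Both function names are hypothetical helpers, and the host, database name and credentials are placeholders. Only the DSN builder runs when the file is executed; connectWithCharset() needs a reachable MySQL server.

```php
<?php
// Hypothetical helper: build a pdo_mysql DSN that carries the charset.
function buildMysqlDsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    // MySQL spells the name without a hyphen: "utf8" or "utf8mb4".
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: open the connection. No init command is passed,
// because the charset is already part of the DSN.
function connectWithCharset(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(buildMysqlDsn($host, $dbname), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo buildMysqlDsn('localhost', 'database'), "\n";
// mysql:host=localhost;dbname=database;charset=utf8
```

Passing 'utf8mb4' as the third argument would be the variant to try when the table itself is utf8mb4.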
Regarding the strange characters you're observing: most likely this is no more than a measurement error. The tool you are using to browse the database has to both support that encoding and be set up to display it properly.
All of the above assumes that the statement "I've set the character set to utf8mb4_general_ci" is about setting the table charset.
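To tell a display problem from genuinely broken data, it can help to look at raw bytes instead of rendered text. The sketch below assumes the mbstring extension and uses a single Arabic letter as sample data. On the MySQL side, SELECT HEX(column) gives the same kind of view of what is actually stored.

```php
<?php
// Arabic letter alef (U+0627), correctly encoded as UTF-8: bytes D8 A7.
$correct = "\xD8\xA7";

// The same letter after its bytes were misread as latin1 and encoded again.
$doubleEncoded = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

// bin2hex() shows the bytes regardless of how a viewer renders them.
echo bin2hex($correct), "\n";       // d8a7
echo bin2hex($doubleEncoded), "\n"; // c398c2a7
```

If the stored value looks like the first line, the data is fine and the viewing tool is at fault. If it looks like the second, the connection charset did not match the data when it was written.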

Converting latin1_swedish_ci to utf8 with PHP

I have a database filled with values like ♥•â—♥ Dhaka ♥•â—♥ (Which should be ♥•●♥ Dhaka ♥•●♥) as I didnt specify the collation while creating the database.
Now I want to fix it. I cannot fetch the data again from where I got it in the first place. So I was wondering whether it might be possible to fetch the data in a PHP script and convert it to the correct characters.
I've changed the collation of the database and the fields to utf8_general_ci.
The collation is NOT the same as the character set. The collation is only used for sorting and comparison of text (that's why there's a language term in there). The actual character set may be different.
The most common failure is not in the database but in the connection between PHP and MySQL. The default charset for the connection is usually ISO-8859-1 (latin1). You need to change that as the first thing you do after connecting, using either the SQL query SET NAMES 'utf8'; or the mysql_set_charset function.
Also check the character set of your tables. This may be wrong as well if you have not specified UTF-8 to begin with (again: this is not the same as the collation). But make sure to take a backup before changing anything here. MySQL will try to convert the charset from the previous one, so you may need to reload the data from backup if you have actually saved UTF-8 data in ISO-8859-1 tables.
I would look into mb_detect_encoding() and mb_convert_encoding() and see if they can help you.

ext/mysql charset support vs ext/mysqli charset

I read some articles that promoted the use of the new ext/mysqli in PHP due to its support of character sets. I currently use ext/mysql and use SET NAMES UTF-8 to ensure all my data is stored as utf-8. Isn't that charset support in ext/mysql, or am I missing something larger?
Thanks :)
SET NAMES UTF-8 does not mean the data is stored in UTF-8. It means that data is RECEIVED in UTF-8 from the client and is SERVED in UTF-8 to the client.
Storage encoding is set when you create a db/table/column, for example
CREATE TABLE tablename (
...
) CHARSET=utf8;
or
CREATE DATABASE databasename DEFAULT CHARACTER SET utf8;
Read here: Mysql: latin1-> utf8. Convert characters to their multibyte equivalents
@Lyon: ext/mysql works just fine.
Please check once more the encodings of the tables and columns via, for example, phpMyAdmin. Remember that setting the encoding on the database doesn't automatically change the encoding of existing tables. It is only used as the default value when a table's encoding is not specified.

SET NAMES utf8 in MySQL?

I often see something similar to this below in PHP scripts using MySQL
query("SET NAMES utf8");
I have never had to do this for any project yet so I have a couple basic questions about it.
Is this something that is done with PDO only?
If it is not a PDO specific thing, then what is the purpose of doing it? I realize it is setting the encoding for mysql but I mean, I have never had to use it so why would I want to use it?
It is needed whenever you want to send data to the server having characters that cannot be represented in pure ASCII, like 'ñ' or 'ö'.
That is, if the MySQL instance is not already configured to expect UTF-8 encoding by default from client connections (many are, depending on your location and platform).
Read http://www.joelonsoftware.com/articles/Unicode.html in case you aren't aware how Unicode works.
Read Whether to use "SET NAMES" to see SET NAMES alternatives and what exactly it is about.
From the manual:
SET NAMES indicates what character set the client will use to send SQL statements to the server.
More elaborately, (and once again, gratuitously lifted from the manual):
SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'cp1251' tells the server, “future incoming messages from this client are in character set cp1251.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)
Getting encoding right is really tricky - there are too many layers:
Browser
Page
PHP
MySQL
The SQL command "SET CHARSET utf8" from PHP will ensure that the client side (PHP) will get the data in utf8, no matter how they are stored in the database. Of course, they need to be stored correctly first.
DDL definition vs. real data
Encoding defined for a table/column doesn't really mean that the data are in that encoding. If you happen to have a table defined as utf8 but the data stored in a different encoding, MySQL will still treat the data as utf8 and you're in trouble. That means you have to fix this first.
What to check
You need to check in what encoding the data flow at each layer.
Check the HTTP headers.
Check what's really sent in body of the request.
Don't forget that MySQL has encoding almost everywhere:
Database
Tables
Columns
Server as a whole
Client
Make sure that there's the right one everywhere.
Conversion
If you receive data in e.g. windows-1250, and want to store in utf-8, then use this SQL before storing:
SET NAMES 'cp1250';
If you have data in the DB as windows-1250 and want to retrieve utf8, use:
SET CHARSET 'utf8';
Few more notes:
Don't rely on overly "smart" tools to show the data. E.g. phpMyAdmin handles (or did, when I was using it) encoding really badly. And it goes through all the layers, so it's hard to find out where things go wrong.
Also, Internet Explorer had really stupid behavior of "guessing" the encoding based on weird rules.
Use simple editors where you can switch encoding. I recommend MySQL Workbench.
This query should be run before the query that creates or updates data in the database. It looks like this:
mysql_query("set names 'utf8'");
Note that you should declare the encoding you are using in the page header. For example, if you are using utf-8, add it like this, or it will cause a problem with Internet Explorer.
So your page looks like this:
<html>
<head>
<title>page title</title>
<meta charset="UTF-8" />
</head>
<body>
<?php
mysql_query("set names 'utf8'");
$sql = "INSERT INTO ..... ";
mysql_query($sql);
?>
</body>
</html>
The solution is
$conn->set_charset("utf8");
Instead of doing this via an SQL query use the php function: mysqli::set_charset
mysqli_set_charset
Note:
This is the preferred way to change the charset. Using mysqli_query() to set it (such as SET NAMES utf8) is not recommended.
See the MySQL character set concepts section for more information.
from http://www.php.net/manual/en/mysqli.set-charset.php
Thanks #all!
Don't use query("SET NAMES utf8"); this is connection setup, not a query. Put it right after the connection is opened, using setCharset() (or a similar method).
A little note from practice:
status:
mysql server by default talks latin1
your whole app is in utf8
connection is made without any extra (so: latin1) (no SET NAMES utf8 ..., no set_charset() method/function)
Storing and reading data is no problem as long as mysql can handle the characters.
If you look in the db you will already see there is crap in it (e.g. using phpmyadmin).
Until now this is not a problem! (wrong, but it often works (in Europe)) ..
..unless another client/program or a changed library, which works correctly, reads or saves data. Then you are in big trouble!
Not only with PDO. If SQL results come back as '????' symbols, presetting your charset (hopefully UTF-8) is really recommended:
if (!$mysqli->set_charset("utf8"))
{ printf("Can't set utf8: %s\n", $mysqli->error); }
or, in procedural style, mysqli_set_charset($db, "utf8")
