I've inserted arabic characters into my database and as they're being displayed as question marks
I use SafeMySQL class to connect to the database and I set my charset to utf8
All my tables are set to utf8_general_ci and my database is also set to utf8_general_ci
I force a utf8 encoding in htaccess and inside the html
Does anyone have any idea what may have been left out of this that is still displaying arabic characters as question marks?
I am the author of SafeMysql. By default this library is using utf8 encoding, so there should be no problem with database connection encoding, unless you are explicitly setting some other encoding.
From the error message you gave in comments,
Warning: #1366 Incorrect string value: '\xD9\x85\xD8\xAB\xD8\xA7...' for column 'report_message' at row 1
I would say that you have to double-check your table's (and fields) charset.
Run the following query and see, if the table's definition indeed contain utf8 in all respective places:
DESCRIBE TABLE table_name
Related
I have a database table with a column where I categorized Persian alphabetic letters to select with MySQL WHERE later. everything works fine for all letters, but I have a problem while selecting letter (چ) which is stored as (Ù†) in database and (ن) which is stored as (Ú†).
first I thought the problem could be from inserting same letters, but when I checked in database , letters where stored with different encoding I mean (Ù†) and (Ú†).
when I zoom in these letters the tick over U is different. both letters are echoed correctly when I echo them on webpage, but when I choose to select letters WHERE letter = 'چ' it shows letters with (ن) too!!!
all of the webpages that insert and read data from DB are in UTF-8 and database collation is utf_persian-ci.
I cant find where the problem is with this? any help is appreciated,
Mojibake. (or not; see below) Probably:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
For PHP:
⚈ mysqli interface: mysqli_set_charset('utf8') function.
⚈ PDO interface: set the charset attribute of the PDO dsn or via SET NAMES utf8.
The COLLATION (eg, utf8_persion_ci) is not relevant to Mojibake. It is relevant to how characters are ordered.
Edit
You say "is stored as (Ù†)" -- How do you know? Most attempts to see what is stored are subject to the client fiddling with the bytes. This is a sure way to see what is there:
SELECT col, HEX(col) FROM tbl ...
For چ, the HEX should be DA86 for proper utf8 (or utf8mb4) encoding. If you get C39AE280A0, then you have "double encoding". In general, Arabic/Persian/Farsi should be of the form Dxyy.
If you read چ while connected with latin1, you will get Ù†, which is DA86 in latin1 encoding (Ù = DA and † = 86).
ن encodes as D986.
Double Encoding
I used hex(col) to send query and got C399E280A0 for ن and C39AE280A0 for چ .
So, you have "double encoding", not "Mojibake".
C399 is utf8 for Ù; E280A0 is utf8 for †. Your character was changed from latin1 to utf8 twice. Usually the end result is invisible to the outside world, but messed up in the table. That is because the SELECT decodes twice. However, since you are seeing only one decode, things are not that simple.
Caveat: You have a situation where I have not experimented; the advice I give you could be wrong.
Here's what probably happened.
The client had characters encoded as utf8 (good) hex: D986;
When inserting, the application lied by claiming that the client had latin1 encoding. (This is the old default.); D9 converted to Ù and 86 converted to †;
The column in the table declared CHARACTER SET utf8 (good). But now the Ù is stored as C399 and the † is stored as E280A0, for a total of 5 bytes;
When reading the connection claimed utf8 (good) for the client, so those 5 bytes were turned back into Ù†;
The client dutifully said the utf8 data was Ù†.
Notice the imbalance between the INSERT and the SELECT. You tagged this PHP; did PHP both write and read the data? Did it have a different setting for the charset for writing and reading?
The problem seems to be only in setting the charset for writing. It needed to be explicitly utf8, not defaulting to latin1.
But what about the data? If everything I said (about double encoding) matches what you have, then an UPDATE can fix the data. See my blog for the details.
This is a typical result of using a 'locale specific unicode encoding', in your case utf8_persian_ci. I expect that if you switch your collation to utf8_unicode_ci, it will work as expected.
If by any change you want to get rid of the case-insensitivity, you could switch to utf8_bin.
For further reference see the MySQL documentation.
I am converting a spreadsheet using PHPExcel to a Database and the cell value happens to contain Russian. If I run mb_detect_encoding() I am told the text is UTF8 and if I set a header of UTF8 then I see the correct Russian characters.
However if I compile it into a string (with only addslashes involved in the process) and insert it into the table I see lots of ????. I have set the table characterset as utf8mb4 and also set the collation as utf8mb4_general_ci. I have also run $this->db->query("SET NAMES 'utf8mb4'"); on my DB connection.
I run PDO query() with my multi part insert and get the ???s but if I output the query to screen I get ÐŸÐ¾Ñ which would be valid UTF8. Why would this not be stored correctly in the database?
I have kept this question rather than deleting it so someone may find the answer helpful.
The reason I was struggling was because in SQLYog it doesn't show you the column Charset by default. There is an option which reads "Hide language options" on the Alter table view which will then reveal that when SQLyog creates a table it uses the default server Charset as opposed to what you define the table Charset to be. I'm not sure if thats correct - but the solution simply is to turn on the Column Charset settings and check they match what you are expecting.
По is Mojibake for По. Probably...
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
The question marks imply...
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
One way to help diagnose the problem(s) is to run
SELECT col, HEX(col) FROM tbl WHERE ...
For По, the hex should be D09FD0BE. Each Cyrillic character, in utf8, is hex D0xx.
I am a starter in php/mySQL, and I am currently facing a problem to display symbol such as ® onto my html. The symbol is stored in a table which can display properly when viewed from phpmyadmin, but when I use php to retrieve the table content, it does not display the symbol but instead displaying a symbol of a diamond with a ? inside it. I have set the html page to utf-8 and my table to utf8_general_ci but no luck from those.
The symbol is able to display correctly when I put straight to html or even store in php variable.
The query I used to get the content is
while ($row = mysql_fetch_array($result)){
echo ($row["symbol"]);
}
Many thanks in advance
You can use html character entities instead of direct symbol
Do not use
®
Try it
®
These types of encoding issues can get complex when dealing with different character sets. In these cases, just changing the collation will not fix the problem, you need to change the CHARSET. Only after changing the CHARSET should you worry about the collation (they are not the same thing).
Just to be safe, export your database/table before altering it.
I would begin, by converting the table to utf8 since it is now the standard.
ALTER TABLE tbl_name
CONVERT TO CHARACTER SET utf8
By doing this, it will also change the CHARSET of the table and columns to utf8, but you may still need to manually change the collation of the columns to utf8_general_ci (seems like you have already done that).
In the event you want to change the default character set (for new columns)...
ALTER TABLE tbl_name
DEFAULT CHARACTER SET utf8
EDIT :
If changing the CHARSET in the database doesn't work, you can try setting it on the PHP side. Just add this after your connection.
mysql
mysql_set_charset("utf8");
mysqli
$mysqli->set_charset("utf8");
PDO
PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET 'utf8'"
Here is some helpful documentation:
10.1 Character Set Support
10.1.12 Column Character Set Conversion
10.1.13.1 Unicode Character Sets
You can convert the trademark, copyright or other symbols into/out database via an HTMLEntity
The htmlentities() function converts characters to HTML entities.
Reference: http://www.php.net/manual/en/function.htmlentities.php
Reference: http://www.php.net/manual/en/function.htmlspecialchars.php
® Registered Trademark ®
™ Trademark Symbol:
Other useful information and symbols can be found here: http://www.w3schools.com/html/html_entities.asp
When I fetch a record from my database, where the database, the table, and the row are all set to utf8_unicode_ci, I recieve a question boxed in a diagonal square in place of the correct unicode character; this is despite me also setting the HTML encoding on the page with:
<meta charset="utf8">
I have a suspicion however it is to do with MySQL/PHP though because when I print_r the output the question marks are still displaying while a manually entered degree symbol (the symbol I should be seeing) works fine.
This SQL query also did nothing:
SET NAMES utf8;
Any ideas? I've checked every end of my setup.
utf8_unicode_ci is the collation, you need the character set as utf8 as example:
CREATE TABLE someTable DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;
As adrienne states in their answer here:
make sure that all of the following are true:
The DB connection is using UTF-8
The DB tables are using UTF-8
The individual columns in the DB tables are using UTF-8
The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you've imported from bad sources,
or changed table or column collations)
The web page is requesting UTF-8
Apache is serving UTF-8
Spent hours on this now and could use some help! Our website queries our db - table columns are set to Latin1 collation, website has set names to UTF8 for queries.
The data is French and when we do a search for a string including accented characters we get the "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'" error.
If you navigate to a page which loads the data completely it shows accented characters no problem, it is just when using the search function that it breaks.
We have tried a number of methods including ALTER TABLE t1 CHANGE c1 c1 BLOB; ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
but this damages the data: the text before the first accented character in each field is fine, but then the rest of the text in the field is completely dropped: 'Métal' becomes 'M'. I am using phpMyAdmin to try and fix this BTW. Not sure if that's a problem.
So is the data UTF8 encoded? If so, why does the ALTER TABLE not work, I've seen it mentioned as THE way to fix this problem on so many webpages! If the fact it doesn't work means that the data is not UTF8 encoded, how do I find out what it is?
Having a different encoding between your website and your database is not a very good idea.... to avoid this problem, it's better to have everything in utf8. Though, it should be possible to convert the encoding of your tables playing with collations.