I am a beginner in PHP/MySQL, and I am having trouble displaying symbols such as ® in my HTML. The symbol is stored in a table and displays properly when viewed in phpMyAdmin, but when I use PHP to retrieve the table content, it shows a diamond with a ? inside it instead. I have set the HTML page to utf-8 and my table to utf8_general_ci, but neither helped.
The symbol displays correctly when I put it straight into the HTML or even store it in a PHP variable.
The query I used to get the content is
while ($row = mysql_fetch_array($result)){
echo ($row["symbol"]);
}
Many thanks in advance
You can use HTML character entities instead of the literal symbol.
Do not use
®
Try this instead
&reg;
These types of encoding issues can get complex when dealing with different character sets. In these cases, just changing the collation will not fix the problem; you need to change the CHARSET. Only after changing the CHARSET should you worry about the collation (they are not the same thing).
Just to be safe, export your database/table before altering it.
I would begin by converting the table to utf8, since it is now the standard.
ALTER TABLE tbl_name
CONVERT TO CHARACTER SET utf8
By doing this, it will also change the CHARSET of the table and columns to utf8, but you may still need to manually change the collation of the columns to utf8_general_ci (seems like you have already done that).
In the event you want to change the default character set (for new columns)...
ALTER TABLE tbl_name
DEFAULT CHARACTER SET utf8
EDIT :
If changing the CHARSET in the database doesn't work, you can try setting it on the PHP side. Just add this after your connection.
mysql
mysql_set_charset("utf8");
mysqli
$mysqli->set_charset("utf8");
PDO
PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET 'utf8'"
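To see why the connection charset matters, here is a minimal sketch in Python (used here only to show the bytes; it is not the PHP from the question). It assumes the connection defaults to latin1, so the server sends ® as the single byte AE while the page is declared UTF-8.

```python
# Sketch: why a latin1 connection produces U+FFFD on a UTF-8 page.
# Assumption: the MySQL connection charset is latin1, so the server
# sends the one-byte latin1 form of the symbol.
symbol = "®"

latin1_bytes = symbol.encode("latin-1")  # what a latin1 connection sends
utf8_bytes = symbol.encode("utf-8")      # what a utf8 connection sends

# The browser decodes the page as UTF-8. A lone 0xAE byte is not valid
# UTF-8, so it is replaced with U+FFFD (the diamond with a question mark).
shown_with_latin1 = latin1_bytes.decode("utf-8", errors="replace")
shown_with_utf8 = utf8_bytes.decode("utf-8")

print(latin1_bytes.hex(), repr(shown_with_latin1))
print(utf8_bytes.hex(), repr(shown_with_utf8))
```

Setting the connection charset to utf8 corresponds to the second case, where the bytes and the page declaration agree.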
Here is some helpful documentation:
10.1 Character Set Support
10.1.12 Column Character Set Conversion
10.1.13.1 Unicode Character Sets
You can convert the trademark, copyright or other symbols to HTML entities on the way into or out of the database.
The htmlentities() function converts characters to HTML entities.
Reference: http://www.php.net/manual/en/function.htmlentities.php
Reference: http://www.php.net/manual/en/function.htmlspecialchars.php
&reg; Registered Trademark: ®
&trade; Trademark Symbol: ™
Other useful information and symbols can be found here: http://www.w3schools.com/html/html_entities.asp
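The entity approach works because the output is pure ASCII, so no charset setting can mangle it. A rough Python sketch of the idea follows; it is not PHP's htmlentities() (it does not escape <, > or &), and to_entities and the sample string are made up for illustration.

```python
from html import unescape
from html.entities import codepoint2name


def to_entities(text):
    """Replace each non-ASCII character with an HTML entity.

    Hypothetical helper: uses the named entity when one exists,
    otherwise a numeric character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII passes through
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


encoded = to_entities("Acme® and Widget™")   # sample text, not from the thread
decoded = unescape(encoded)                  # the browser does this step

print(encoded)
print(decoded)
```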
I have a database table with a column in which I categorize Persian alphabet letters so I can select them later with a MySQL WHERE clause. Everything works fine for all letters, but I have a problem selecting the letter (چ), which is stored as (Ù†) in the database, and (ن), which is stored as (Ú†).
At first I thought the problem came from inserting the same letter twice, but when I checked the database, the letters were stored with different encodings, namely (Ù†) and (Ú†).
When I zoom in on these letters, the tick over the U is different. Both letters are echoed correctly on the web page, but when I select WHERE letter = 'چ', the results include rows with (ن) too!
All of the web pages that insert and read data from the DB are in UTF-8, and the database collation is utf8_persian_ci.
I can't find where the problem is. Any help is appreciated.
Mojibake. (or not; see below) Probably:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
For PHP:
⚈ mysqli interface: $mysqli->set_charset('utf8') (or mysqli_set_charset($mysqli, 'utf8')).
⚈ PDO interface: set the charset attribute of the PDO dsn or via SET NAMES utf8.
The COLLATION (eg, utf8_persian_ci) is not relevant to Mojibake. It is relevant to how characters are ordered.
Edit
You say "is stored as (Ù†)" -- How do you know? Most attempts to see what is stored are subject to the client fiddling with the bytes. This is a sure way to see what is there:
SELECT col, HEX(col) FROM tbl ...
For چ, the HEX should be DA86 for proper utf8 (or utf8mb4) encoding. If you get C39AE280A0, then you have "double encoding". In general, Arabic/Persian/Farsi should be of the form Dxyy.
If you read چ while connected with latin1, you will get Ú†, which is DA86 in latin1 encoding (Ú = DA and † = 86).
ن encodes as D986.
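The hex values and their latin1 misreadings can be checked outside MySQL. This is a small Python sketch, assuming MySQL's latin1 behaves like Windows-1252 (Python's cp1252) for these bytes; misread_as_latin1 is a made-up helper name.

```python
# U+0686 is the Persian letter che, U+0646 is the letter noon.
CHE = "\u0686"
NOON = "\u0646"


def misread_as_latin1(text):
    """Show what correct utf8 bytes look like when read as latin1."""
    return text.encode("utf-8").decode("cp1252")


che_hex = CHE.encode("utf-8").hex().upper()
noon_hex = NOON.encode("utf-8").hex().upper()

print(che_hex, misread_as_latin1(CHE))    # bytes for che and their misreading
print(noon_hex, misread_as_latin1(NOON))  # bytes for noon and their misreading
```

The two misreadings differ only in the accent over the U, which matches the "tick over U is different" observation in the question.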
Double Encoding
I used hex(col) to send query and got C399E280A0 for ن and C39AE280A0 for چ .
So, you have "double encoding", not "Mojibake".
C399 is utf8 for Ù; E280A0 is utf8 for †. Your character was changed from latin1 to utf8 twice. Usually the end result is invisible to the outside world, but messed up in the table. That is because the SELECT decodes twice. However, since you are seeing only one decode, things are not that simple.
Caveat: You have a situation where I have not experimented; the advice I give you could be wrong.
Here's what probably happened.
The client had characters encoded as utf8 (good) hex: D986;
When inserting, the application lied by claiming that the client had latin1 encoding. (This is the old default.); D9 converted to Ù and 86 converted to †;
The column in the table declared CHARACTER SET utf8 (good). But now the Ù is stored as C399 and the † is stored as E280A0, for a total of 5 bytes;
When reading the connection claimed utf8 (good) for the client, so those 5 bytes were turned back into Ù†;
The client dutifully said the utf8 data was Ù†.
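The steps above can be replayed byte by byte. This is a Python sketch under the same assumption as the answer (the insert connection claimed latin1, treated here as cp1252); double_encode and repair are illustrative names, and repair only mirrors the idea behind the UPDATE fix, not its SQL.

```python
def double_encode(text):
    """Replay the INSERT path that produced the 5-byte values."""
    utf8_bytes = text.encode("utf-8")      # client really has utf8
    misread = utf8_bytes.decode("cp1252")  # connection claims latin1
    return misread.encode("utf-8")         # utf8 column stores that text


def repair(stored_bytes):
    """Undo one level of double encoding."""
    return stored_bytes.decode("utf-8").encode("cp1252").decode("utf-8")


noon_stored = double_encode("\u0646")  # noon
che_stored = double_encode("\u0686")   # che

print(noon_stored.hex().upper())
print(che_stored.hex().upper())
print(repair(noon_stored) == "\u0646", repair(che_stored) == "\u0686")
```

The stored hex values match what HEX(col) reported in the question, which supports the double-encoding diagnosis.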
Notice the imbalance between the INSERT and the SELECT. You tagged this PHP; did PHP both write and read the data? Did it have a different setting for the charset for writing and reading?
The problem seems to be only in setting the charset for writing. It needed to be explicitly utf8, not defaulting to latin1.
But what about the data? If everything I said (about double encoding) matches what you have, then an UPDATE can fix the data. See my blog for the details.
This is a typical result of using a 'locale specific unicode encoding', in your case utf8_persian_ci. I expect that if you switch your collation to utf8_unicode_ci, it will work as expected.
If by any chance you want to get rid of the case-insensitivity, you could switch to utf8_bin.
For further reference see the MySQL documentation.
I am converting a spreadsheet using PHPExcel to a Database and the cell value happens to contain Russian. If I run mb_detect_encoding() I am told the text is UTF8 and if I set a header of UTF8 then I see the correct Russian characters.
However if I compile it into a string (with only addslashes involved in the process) and insert it into the table I see lots of ????. I have set the table characterset as utf8mb4 and also set the collation as utf8mb4_general_ci. I have also run $this->db->query("SET NAMES 'utf8mb4'"); on my DB connection.
I run PDO query() with my multi part insert and get the ???s but if I output the query to screen I get ÐŸÐ¾Ñ which would be valid UTF8. Why would this not be stored correctly in the database?
I have kept this question rather than deleting it so someone may find the answer helpful.
The reason I was struggling was that SQLyog doesn't show the column charset by default. There is an option labelled "Hide language options" on the Alter Table view; turning it off reveals that when SQLyog creates a table, the columns use the server's default charset rather than the charset you defined for the table. I'm not sure that behaviour is correct, but the solution is simply to turn on the column charset settings and check that they match what you expect.
ÐŸÐ¾ is Mojibake for По. Probably...
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
The question marks imply...
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
One way to help diagnose the problem(s) is to run
SELECT col, HEX(col) FROM tbl WHERE ...
For По, the hex should be D09FD0BE. Each Cyrillic character, in utf8, is hex D0xx or D1xx.
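Both symptoms can be reproduced from the same two letters. This is a Python sketch only; the errors="replace" step is an approximation of what happens when text has to be squeezed into a latin1 column that has no Cyrillic characters.

```python
text = "По"

# Correct utf8 storage: two bytes per Cyrillic letter.
utf8_hex = text.encode("utf-8").hex().upper()

# Mojibake: correct utf8 bytes interpreted as latin1 (cp1252).
mojibake = text.encode("utf-8").decode("cp1252")

# Question marks: the characters cannot be represented in latin1 at all,
# so each one is replaced with '?' and the original is lost.
question_marks = text.encode("latin-1", errors="replace")

print(utf8_hex)
print(mojibake)
print(question_marks)
```

Mojibake is recoverable because the original bytes survive; question marks are not, which is why the data has to be re-inserted after fixing the column.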
I am working on a turkish website, which has stored many malformed turkish characters in a MySQL database, like:
- ş as þ
- ı as ý
- ğ as ð
- İ as Ý
I cannot change the data in the database, because the database is updated daily and the new data will contain the malformed characters again. So my idea was to change the data in PHP instead of changing the data in the database. I have tried some steps:
Turkish characters are not displayed correctly
Fix Turkish Charset Issue Html / PHP (iconv?)
PHP Turkish Language displaying issue
PHP MYSQL encoding issue ( Turkish Characters )
I am using the PHP-MySQLi-Database-Class available on GitHub with utf8 as charset.
I have even tried to replace the malformed characters with str_replace, like:
$newString = str_replace ( chr ( 253 ), "ı", $newString );
My question is, how can I solve the issue without changing the characters in the database? Are there any best practices? Is it a good option just to replace the characters?
EDIT:
solved it by using
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />
2022 update. I did a lot of research, found this solution, and it works.
Let's say your DB connection is $mysqli:
$mysqli = mysqli_connect($hostname, $username, $password, $database) OR DIE ("Baglanti saglanamadi!");
Just add this line after it. It works like magic with all languages, even Arabic:
mysqli_set_charset($mysqli, 'utf8');
Two solutions are good
PHP MYSQL encoding issue ( Turkish Characters )
PHP Turkish Language displaying issue
Also you can set configuration on phpMyAdmin
Operations > Table options > Collation > select utf8_general_ci
If you have already created the tables, edit the collation of the existing columns as well.
SELECT CONVERT(CONVERT(UNHEX('d0dddef0fdfe') USING ...) USING utf8);
latin5 / iso-8859-9 shows ĞİŞğış
latin1 / iso-8859-1 shows ÐÝÞðýþ
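The same six bytes can be decoded both ways outside MySQL. This is a Python sketch using Python's codec names (iso8859_9 for latin5, latin-1 for ISO-8859-1), not MySQL's.

```python
raw = bytes.fromhex("d0dddef0fdfe")

as_latin5 = raw.decode("iso8859_9")  # Turkish reading of the bytes
as_latin1 = raw.decode("latin-1")    # Western European reading

print(as_latin5)
print(as_latin1)
```

The bytes are identical in both cases; only the interpretation changes, which is why the data itself does not need to be rewritten.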
You are confusing two similar encodings; see the first paragraph in https://en.wikipedia.org/wiki/ISO/IEC_8859-9 .
"Collation" is only for sorting. But first you need to change the CHARACTER SET to latin5. Then change the collation to latin5_turkish_ci. (Since that is the default for latin5, no action need be taken.)
This may suffice to make the change in MySQL: EDIT 3
NO, this is probably wrong -- ALTER TABLE tbl CONVERT TO CHARACTER SET latin5;
After seeing more of the issue, this "2-step ALTER" is probably correct:
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET latin5 ...;
Do that for each table. Be sure to test this on a copy of your data first.
The 2-step ALTER is useful for when the bytes are correct, but the CHARACTER SET is not.
CONVERT TO should be used when the characters are correct, but you want a different encoding (and CHARACTER SET). See Case 5.
Edit 1
E7 and FD are cp1250, dec8, latin1 and latin2 for ç and ý. FD in latin5 is ı. I conclude that your encoding is latin1, not latin5.
You say you cannot change the "scripts". Let's look at your limitations. Are you restricted on the INSERT side? Or the SELECT side? Or both? What is rendering the text; html? MySQL is willing to convert between latin1 and latin5 as you insert/select (based on a few settings). And/or you could lie to HTML (via a meta tag) to get it to interpret the bytes differently. Please spell out the details of the data flow.
Edit 2
Given that the HEX in the table is E7FD6B6172FD6C6D6173FD6E61, and it should be rendered as çıkarılmasına, ... Note especially the second letter needs to show as ı (Turkish dotless small I), not ý (small Y with acute), correct?
Start by trying
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9"/>
That should give you the latin5 rendering, as you already found out. IANA Reference.
As for "Best practice", that would involve changing the way text is inserted. You have stated this as off-limits.
Apparently you have latin5 characters stored in a latin1 column. Since latin1 does not involve any checking, you can insert and retrieve latin5 characters without any trouble.
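If the page has to stay UTF-8 instead of switching the meta tag, the bytes can be reinterpreted in the application after reading them. This is a Python sketch of that conversion (the PHP equivalent is not shown); it assumes the connection delivers the stored bytes unchanged, and latin5_to_utf8 is a made-up helper name.

```python
# HEX value quoted above for the sample word.
stored = bytes.fromhex("E7FD6B6172FD6C6D6173FD6E61")


def latin5_to_utf8(raw):
    """Reinterpret the raw bytes as ISO-8859-9 and re-encode as UTF-8."""
    return raw.decode("iso8859_9").encode("utf-8")


wrong_reading = stored.decode("latin-1")        # what a latin1 reading shows
right_reading = stored.decode("iso8859_9")      # the intended Turkish text
utf8_for_page = latin5_to_utf8(stored)          # bytes for a UTF-8 page

print(wrong_reading)
print(right_reading)
print(utf8_for_page.decode("utf-8") == right_reading)
```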
This does not address the desire to have Turkish collation. If necessary, I can probably concoct a way to specify Turkish ordering on particular statements; please provide a sample statement.
I've inserted Arabic characters into my database and they're being displayed as question marks
I use SafeMySQL class to connect to the database and I set my charset to utf8
All my tables are set to utf8_general_ci and my database is also set to utf8_general_ci
I force a utf8 encoding in htaccess and inside the html
Does anyone have any idea what may have been left out of this that is still displaying arabic characters as question marks?
I am the author of SafeMysql. By default this library is using utf8 encoding, so there should be no problem with database connection encoding, unless you are explicitly setting some other encoding.
From the error message you gave in comments,
Warning: #1366 Incorrect string value: '\xD9\x85\xD8\xAB\xD8\xA7...' for column 'report_message' at row 1
I would say that you have to double-check your table's (and fields) charset.
Run the following query and see, if the table's definition indeed contain utf8 in all respective places:
SHOW CREATE TABLE table_name
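The bytes in the warning are themselves a clue. This is a Python sketch assuming the rejecting column is latin1 (approximated by cp1252); fits is a hypothetical helper. It shows that the bytes are valid utf8 Arabic text which a latin1 column has no way to store.

```python
# Bytes quoted in the #1366 warning (the message truncates the rest).
rejected = b"\xD9\x85\xD8\xAB\xD8\xA7"

# They decode cleanly as utf8, so the client side is sending good data.
text = rejected.decode("utf-8")


def fits(value, charset):
    """Return True if every character of value exists in charset."""
    try:
        value.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print([hex(ord(ch)) for ch in text])
print(fits(text, "cp1252"), fits(text, "utf-8"))
```

That points at the column definition rather than the connection, which is what the table definition query is meant to confirm.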
I am trying to import data from an XML file into a MySQL DB using PHP. I am able to get the code to work just fine, but when I look at the data in the DB there are special characters. For example, when I look at the XML in my browser it shows up as "outdoors in good weather…" but in the DB it appears as "outdoors in good weatherâ€¦".
I've cycled through all the different types of collation for that field in my DB but it does not seem to help much. Sometimes it shows up with the characters mentioned above and others as ???.
I have also tried to sync up the data with the following code in my PHP
$mysqli->query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
But, again I have had no luck.
Thank you for reading this and for your help!
Akshay
You need to change the character set to UTF-8, along with your collation:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
What you are seeing is a Unicode ellipsis character (…) being converted into another character set, which is probably Latin1. That is why it looks garbled.
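The garbling can be reproduced directly. This is a Python sketch assuming the "other character set" is MySQL's latin1, which behaves like cp1252 for these bytes.

```python
ellipsis = "\u2026"                # the single-character ellipsis

raw = ellipsis.encode("utf-8")     # three bytes in utf8
garbled = raw.decode("cp1252")     # each byte read as its own character

print(raw.hex().upper())
print(garbled)
```

One character becoming three is the usual sign that utf8 bytes were interpreted as a single-byte character set somewhere between the XML file and the table.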