I am working on a Turkish website whose MySQL database stores many malformed Turkish characters, for example:
- ş as þ
- ı as ý
- ğ as ð
- İ as Ý
I cannot change the data in the database, because it is updated daily and the new data will contain the malformed characters again. So my idea was to fix the data in PHP instead of in the database. I have tried the steps from these questions:
Turkish characters are not displayed correctly
Fix Turkish Charset Issue Html / PHP (iconv?)
PHP Turkish Language displaying issue
PHP MYSQL encoding issue ( Turkish Characters )
I am using the PHP-MySQLi-Database-Class available on GitHub with utf8 as charset.
I have even tried to replace the malformed characters with str_replace, like:
$newString = str_replace ( chr ( 253 ), "ı", $newString );
My question is: how can I solve the issue without changing the characters in the database? Are there any best practices? Is it a good option to simply replace the characters?
EDIT:
I solved it by using:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />
2022 update: after a lot of searching I found this solution, and it works.
Let's say your database connection is $mysqli:
$mysqli = mysqli_connect($hostname, $username, $password, $database) OR DIE ("Baglanti saglanamadi!");
Just add this line right after it. It works with all languages, even Arabic:
mysqli_set_charset($mysqli, 'utf8');
These two solutions are good:
PHP MYSQL encoding issue ( Turkish Characters )
PHP Turkish Language displaying issue
You can also set the collation in phpMyAdmin:
Operations > Table options > Collation > select utf8_general_ci
If you have already created the tables, edit the collation of the existing columns as well.
SELECT CONVERT(CONVERT(UNHEX('d0dddef0fdfe') USING ...) USING utf8);
latin5 / iso-8859-9 shows ĞİŞğış
latin1 / iso-8859-1 shows ÐÝÞðýþ
You are confusing two similar encodings; see the first paragraph in https://en.wikipedia.org/wiki/ISO/IEC_8859-9 .
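To see the difference byte by byte, here is a minimal sketch (in Python rather than PHP, only because its codec names keep the comparison short) that decodes the same six bytes under both encodings. The function name `decode_both` is made up for illustration; `latin_1` and `iso8859_9` are Python's codec names for ISO-8859-1 and ISO-8859-9.

```python
# The six bytes passed to UNHEX() in the query above.
RAW = bytes.fromhex("d0dddef0fdfe")


def decode_both(raw: bytes) -> dict:
    """Decode the same bytes as ISO-8859-1 (latin1) and ISO-8859-9 (latin5)."""
    return {
        "latin1": raw.decode("latin_1"),
        "latin5": raw.decode("iso8859_9"),
    }


if __name__ == "__main__":
    for name, text in decode_both(RAW).items():
        print(name, text)
```

The bytes never change; only the label used to interpret them does, which is why the stored data can look "malformed" without being damaged.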
"Collation" is only for sorting. But first you need to change the CHARACTER SET to latin5. Then change the collation to latin5_turkish_ci. (Since that is the default for latin5, no action need be taken.)
This may suffice to make the change in MySQL:
EDIT 3: No, this is probably wrong: ALTER TABLE tbl CONVERT TO CHARACTER SET latin5;
After seeing more of the issue, this "2-step ALTER" is probably correct:
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET latin5 ...;
Do that for each table. Be sure to test this on a copy of your data first.
The 2-step ALTER is useful for when the bytes are correct, but the CHARACTER SET is not.
CONVERT TO should be used when the characters are correct, but you want a different encoding (and CHARACTER SET). See Case 5.
Edit 1
E7 and FD are ç and ý in cp1250, dec8, latin1 and latin2. FD in latin5 is ı. I conclude that your encoding is latin1, not latin5.
You say you cannot change the "scripts". Let's look at your limitations. Are you restricted on the INSERT side? Or the SELECT side? Or both? What is rendering the text: HTML? MySQL is willing to convert between latin1 and latin5 as you insert/select (based on a few settings). And/or you could lie to HTML (via a meta tag) to get it to interpret the bytes differently. Please spell out the details of the data flow.
Edit 2
Given that the HEX in the table is E7FD6B6172FD6C6D6173FD6E61, and it should be rendered as çıkarılmasına, ... Note especially the second letter needs to show as ı (Turkish dotless small I), not ý (small Y with acute), correct?
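That reading of the hex can be checked outside MySQL. The following is a small sketch in Python (the helper name `render` is hypothetical) that interprets the stored bytes under each candidate encoding:

```python
# Hex value quoted from the table in the paragraph above.
STORED_HEX = "E7FD6B6172FD6C6D6173FD6E61"


def render(hex_string: str, codec: str) -> str:
    """Interpret the stored bytes using the given single-byte codec."""
    return bytes.fromhex(hex_string).decode(codec)


if __name__ == "__main__":
    print(render(STORED_HEX, "iso8859_9"))  # latin5 interpretation
    print(render(STORED_HEX, "latin_1"))    # latin1 interpretation
```

Only the latin5 interpretation yields the dotless ı in the second position.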
Start by trying
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9"/>
That should give you the latin5 rendering, as you already found out. IANA Reference.
As for "Best practice", that would involve changing the way text is inserted. You have stated this as off-limits.
Apparently you have latin5 characters stored in a latin1 column. Since latin1 does not involve any checking, you can insert and retrieve latin5 characters without any trouble.
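If the page has to stay UTF-8 and the meta tag is therefore not an option, the same reinterpretation can be done in the application instead of replacing characters one by one. Here is a minimal sketch in Python, assuming the application receives text that was decoded as latin1 (so it sees ý where ı was meant). The name `fix_turkish` is made up. In PHP the same idea would presumably go through a conversion function such as iconv() or mb_convert_encoding(), which I have not tested for this case.

```python
def fix_turkish(text: str) -> str:
    """Re-read text that was decoded as latin1 but whose bytes were latin5.

    Assumption: every character in `text` fits in ISO-8859-1. Anything else
    (for example cp1252-only characters such as the euro sign) raises
    UnicodeEncodeError rather than being silently altered.
    """
    original_bytes = text.encode("latin_1")     # recover the stored bytes
    return original_bytes.decode("iso8859_9")   # interpret them as latin5


if __name__ == "__main__":
    print(fix_turkish("çýkarýlmasýna"))
```

Compared with a list of str_replace calls, the round trip covers all six differing letters at once, but it will also rewrite a genuine ý, þ or ð if the data ever contains one.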
This does not address the desire to have Turkish collation. If necessary, I can probably concoct a way to specify Turkish ordering on particular statements; please provide a sample statement.
Related
I have a database table with a column in which I categorize Persian letters so that I can select them later with a MySQL WHERE clause. Everything works fine for all letters, but I have a problem when selecting the letter (چ), which is stored as (Ù†) in the database, and (ن), which is stored as (Ú†).
At first I thought the problem could come from inserting the same letter twice, but when I checked the database, the letters were stored with different encodings, I mean (Ù†) and (Ú†).
When I zoom in on these letters, the tick over the U is different. Both letters are displayed correctly when I echo them on a webpage, but when I select with WHERE letter = 'چ', the result includes rows with (ن) too!
All of the webpages that insert and read data from the DB are in UTF-8, and the database collation is utf8_persian_ci.
I can't find where the problem is. Any help is appreciated.
Mojibake. (or not; see below) Probably:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
For PHP:
⚈ mysqli interface: mysqli_set_charset('utf8') function.
⚈ PDO interface: set the charset attribute of the PDO dsn or via SET NAMES utf8.
The COLLATION (e.g., utf8_persian_ci) is not relevant to Mojibake. It is relevant to how characters are compared and ordered.
Edit
You say "is stored as (Ù†)" -- How do you know? Most attempts to see what is stored are subject to the client fiddling with the bytes. This is a sure way to see what is there:
SELECT col, HEX(col) FROM tbl ...
For چ, the HEX should be DA86 for proper utf8 (or utf8mb4) encoding. If you get C39AE280A0, then you have "double encoding". In general, Arabic/Persian/Farsi should be of the form Dxyy.
If you read چ while connected with latin1, you will get Ú†, which is DA86 in latin1 encoding (Ú = DA and † = 86).
ن encodes as D986.
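These hex values are easy to confirm outside the database. The sketch below (Python; the helper names are hypothetical) prints the UTF-8 bytes of each letter and what those bytes look like when misread as a single-byte encoding. It uses Python's `cp1252` codec on the assumption that it matches how MySQL's latin1 treats byte 86 as †.

```python
def utf8_hex(ch: str) -> str:
    """Hex of the UTF-8 encoding, comparable to SELECT HEX(col)."""
    return ch.encode("utf-8").hex().upper()


def seen_as_cp1252(ch: str) -> str:
    """What the UTF-8 bytes look like when misread as cp1252."""
    return ch.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    for letter in ("چ", "ن"):
        print(letter, utf8_hex(letter), seen_as_cp1252(letter))
```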
Double Encoding
I used hex(col) to send query and got C399E280A0 for ن and C39AE280A0 for چ .
So, you have "double encoding", not "Mojibake".
C399 is utf8 for Ù; E280A0 is utf8 for †. Your character was changed from latin1 to utf8 twice. Usually the end result is invisible to the outside world, but messed up in the table. That is because the SELECT decodes twice. However, since you are seeing only one decode, things are not that simple.
Caveat: You have a situation where I have not experimented; the advice I give you could be wrong.
Here's what probably happened.
The client had characters encoded as utf8 (good) hex: D986;
When inserting, the application lied by claiming that the client had latin1 encoding. (This is the old default.); D9 converted to Ù and 86 converted to †;
The column in the table declared CHARACTER SET utf8 (good). But now the Ù is stored as C399 and the † is stored as E280A0, for a total of 5 bytes;
When reading the connection claimed utf8 (good) for the client, so those 5 bytes were turned back into Ù†;
The client dutifully said the utf8 data was Ù†.
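The write path in the steps above can be replayed outside MySQL. This is only a sketch in Python, with `cp1252` standing in for MySQL's latin1 (an assumption) and a made-up function name, but it reproduces the five-byte hex values reported from the table:

```python
def double_encode(ch: str) -> str:
    """Replay the faulty INSERT path and return the stored bytes as hex."""
    client_bytes = ch.encode("utf-8")         # client really sends UTF-8
    misread = client_bytes.decode("cp1252")   # connection claims latin1
    stored = misread.encode("utf-8")          # utf8 column re-encodes each character
    return stored.hex().upper()


if __name__ == "__main__":
    for letter in ("ن", "چ"):
        print(letter, double_encode(letter))
```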
Notice the imbalance between the INSERT and the SELECT. You tagged this PHP; did PHP both write and read the data? Did it have a different setting for the charset for writing and reading?
The problem seems to be only in setting the charset for writing. It needed to be explicitly utf8, not defaulting to latin1.
But what about the data? If everything I said (about double encoding) matches what you have, then an UPDATE can fix the data. See my blog for the details.
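For illustration only, the reverse transformation looks like this in Python. It is not the UPDATE statement from the blog, just the same idea applied to bytes outside the database, again assuming `cp1252` matches MySQL's latin1 for these bytes. The function name is hypothetical.

```python
def undo_double_encoding(stored: bytes) -> str:
    """Reverse one extra latin1-to-utf8 conversion applied to UTF-8 text."""
    as_text = stored.decode("utf-8")          # yields the two-character mojibake
    original_bytes = as_text.encode("cp1252") # back to the bytes the client sent
    return original_bytes.decode("utf-8")     # the character that was intended


if __name__ == "__main__":
    print(undo_double_encoding(bytes.fromhex("C399E280A0")))
    print(undo_double_encoding(bytes.fromhex("C39AE280A0")))
```

Running a repair like this on rows that are already correct would raise an error or corrupt them, so test on a copy and check the HEX first.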
This is a typical result of using a locale-specific Unicode collation, in your case utf8_persian_ci. I expect that if you switch your collation to utf8_unicode_ci, it will work as expected.
If by any chance you want to get rid of the case-insensitivity, you could switch to utf8_bin.
For further reference see the MySQL documentation.
I am a beginner in PHP/MySQL, and I am currently having a problem displaying a symbol such as ® in my HTML. The symbol is stored in a table and displays properly when viewed in phpMyAdmin, but when I use PHP to retrieve the table content, it shows a diamond with a ? inside it instead of the symbol. I have set the HTML page to utf-8 and my table to utf8_general_ci, but no luck.
The symbol displays correctly when I put it straight into the HTML or store it in a PHP variable.
The query I used to get the content is
while ($row = mysql_fetch_array($result)){
echo ($row["symbol"]);
}
Many thanks in advance
You can use an HTML character entity instead of the literal symbol.
Do not use
®
Try this instead
&reg;
These types of encoding issues can get complex when dealing with different character sets. In these cases, just changing the collation will not fix the problem; you need to change the CHARSET. Only after changing the CHARSET should you worry about the collation (they are not the same thing).
Just to be safe, export your database/table before altering it.
I would begin by converting the table to utf8, since it is now the standard.
ALTER TABLE tbl_name
CONVERT TO CHARACTER SET utf8;
Doing this also changes the CHARSET of the table and its columns to utf8, but you may still need to manually change the collation of the columns to utf8_general_ci (it seems you have already done that).
In the event you want to change the default character set (for new columns)...
ALTER TABLE tbl_name
DEFAULT CHARACTER SET utf8;
EDIT :
If changing the CHARSET in the database doesn't work, you can try setting it on the PHP side. Just add this after your connection.
mysql
mysql_set_charset("utf8");
mysqli
$mysqli->set_charset("utf8");
PDO
PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
Here is some helpful documentation:
10.1 Character Set Support
10.1.12 Column Character Set Conversion
10.1.13.1 Unicode Character Sets
You can convert the trademark, copyright or other symbols on their way into or out of the database with HTML entities.
The htmlentities() function converts characters to HTML entities.
Reference: http://www.php.net/manual/en/function.htmlentities.php
Reference: http://www.php.net/manual/en/function.htmlspecialchars.php
&reg; Registered Trademark: ®
&trade; Trademark Symbol: ™
Other useful information and symbols can be found here: http://www.w3schools.com/html/html_entities.asp
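As a rough illustration of what such a conversion does, here is a Python sketch built on the standard library's entity table. It is only an analogue of PHP's htmlentities(), not a reimplementation: the function name `to_named_entities` is made up, and unlike htmlentities() it deliberately leaves ASCII characters such as < and & alone.

```python
from html import unescape
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters that have a named HTML entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append(f"&{name};")
        else:
            out.append(ch)  # ASCII, or no named entity: keep as-is
    return "".join(out)


if __name__ == "__main__":
    encoded = to_named_entities("Acme® and Widget™")
    print(encoded)
    print(unescape(encoded))  # back to the literal symbols
```

Because the result is pure ASCII, it survives a mismatched connection charset, which is why this works as a workaround even though it does not fix the underlying encoding problem.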
I have a predefined Microsoft SQL database. The database collation is specified as SQL_Latin1_General_CP1_CI_AS.
The database version is: SQL Server Express 10.0.1600.22
Some of the tables have values with special characters (I assume UTF-8). I am reading these tables with php mssql, and I end up with question marks in the output ????? ???? ?????
I have tried playing with ini_set('mssql.charset','utf8'), with different encoding values such as windows-cp1251, windows-cp1252 with no luck.
I am not sure how to proceed with this. I guess that I need the equivalent of MySQL's SET NAMES utf8, but I am not sure how to do that in MSSQL. Any ideas?
Converting the tables to utf8 unfortunately is not an option. The field type is nvarchar(250)
A lot of missing information here; for example, the type of the field in the database (varchar, nvarchar???).
Try converting your field to a Unicode field if it's stored as a varchar field; for example:
Select cast (field1 as nvarchar(200)) as field1, ...
Also, even if the database collation is SQL_Latin1_General_CP1_CI_AS, each field can have its own specific collation. The database collation is mainly the default collation to be used when you don't specify the collation for a field.
Well, the error could be in a variety of places. The first thing to check is that your web pages can display extended characters correctly, by writing some directly into a page; for example:
<html>...<p>é</p>...
Of course, it is unnecessary to test with an HTML entity such as &eacute;, because the problem is not there; you must use real encoded extended characters.
After that, you can check that your extended characters have been correctly stored in your database by checking their numerical values with the function Unicode(). You can also check that you can write an extended character using the NChar() function; for example:
Select Unicode (substring (field1, 1, 1)), Unicode (N'é'), NChar(233), ...
The Unicode() and NChar() functions are standard Microsoft SQL functions.
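For comparison, the same two checks can be done on the application side. This sketch uses Python's built-in ord() and chr() as stand-ins for Unicode() and NChar(), which is my analogy rather than anything SQL Server documents, and it also shows that one code point can have different byte forms:

```python
E_ACUTE = "\u00e9"  # the character é, written as an escape to avoid ambiguity


def code_point(ch: str) -> int:
    """Numeric code point of a single character, like Unicode() in T-SQL."""
    return ord(ch)


if __name__ == "__main__":
    print(code_point(E_ACUTE))              # code point of é
    print(chr(233))                         # character for code point 233
    print(E_ACUTE.encode("utf-8").hex())    # its UTF-8 bytes
    print(E_ACUTE.encode("latin_1").hex())  # its single-byte form
```

If the database reports the expected code point, the stored data is fine and the damage is happening somewhere between the driver and the page.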
It has been many years since I last coded in PHP; however, from what I recall, the default configuration values were sufficient for displaying any extended characters coming from a SQL Server database using an nvarchar or ntext field.
Use the connection option: "CharacterSet" => "UTF-8" as stated at https://msdn.microsoft.com/en-us/library/cc626307(v=sql.105).aspx
It worked for me. No further conversion needed.
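For what it's worth, a row of question marks is the usual symptom of text being converted to a code page that cannot represent it, which would explain why asking for UTF-8 on the connection helps. Here is a minimal sketch in Python, using a Cyrillic sample word as an assumption (the question does not say which language the data is in) and a hypothetical helper name:

```python
SAMPLE = "Привет"  # assumed sample text; not taken from the question


def squeeze_into(text: str, codec: str) -> bytes:
    """Encode text, substituting '?' for anything the codec cannot hold."""
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    print(squeeze_into(SAMPLE, "cp1252"))  # Latin code page: characters are lost
    print(squeeze_into(SAMPLE, "utf-8"))   # UTF-8: nothing is lost
```

Once a character has been replaced by ?, the original cannot be recovered downstream, so the fix has to be at the connection, not in the page.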
My website is using the charset iso-8859-1:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
When a user posts Chinese characters, they are saved into the database as & # 2 0 3 2 0 ; & # 2 2 9 0 9 ; which is output as the Chinese characters when retrieved.
I need to set my website to UTF-8.
When a user posts Chinese characters now, they are saved as some funky characters in MySQL, and when retrieved, some characters are correct but some are wrong.
My question is: after I set the site to UTF-8, how do I make MySQL save the text as & # 2 0 3 2 0 ; & # 2 2 9 0 9 ; instead of funky chars?
I tried to use htmlentities. It's not working correctly.
my script is using
$message = htmlentities(strip_tags(mysql_real_escape_string($_POST['message']),'<img><vid>'));
when it's saved to mysql it is like this ä½ å¥½
when it's retrieved to be display is like this ä½ å¥½ <---- is the funky character that is stored in mysql previously
I have been seeing lots of encoding-related issues;
the asker should search a bit before posting.
General checklist:
MySQL table schema to use utf8
MySQL client connections to use utf8: mysql --default-character-set=utf8
PHP: mysqli_set_charset to utf8
HTML encoding to UTF-8
PuTTY, Emacs clients... to be in UTF-8
You should follow ajreal's advice on setting your encodings to UTF-8.
However, from the sound of it you may already have data stored in the database which will have to be converted.
If your website is uniformly iso-8859-1, then most likely Chinese characters are stored as HTML character entities, which means that the data is not stored mis-encoded and converting the character sets should not cause problems. If you carry out the instructions and find that characters appear incorrectly afterwards, it might be because text is stored mis-encoded, in which case there are steps that can be taken to remedy the situation.
Character sets for an existing column may be converted using syntax like
ALTER TABLE TableName MODIFY ColumnName COLUMN_TYPE CHARACTER SET utf8 [NOT NULL]
where COLUMN_TYPE is one of CHAR(n), VARCHAR(n), TEXT and the square brackets indicate that NOT NULL is optional.
Edit
"My question is: after I set the site to UTF-8, how do I make MySQL save the text as & # 2 0 3 2 0 ; & # 2 2 9 0 9 ; instead of funky chars?"
This might be best tackled in your scripting language rather than in MySQL. If using PHP you might be able to use htmlentities() for this purpose.
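To make the target format concrete, here is what producing numeric character references looks like in Python, whose `xmlcharrefreplace` error handler does exactly this for characters outside ASCII. It is a sketch with a hypothetical helper name. In PHP I would look at mb_encode_numericentity() for the same job, since htmlentities() is built around named entities, but I have not tested either against this site's data.

```python
from html import unescape


def to_numeric_refs(text: str) -> str:
    """Turn every non-ASCII character into a decimal reference like &#NNNN;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    refs = to_numeric_refs("你好")
    print(refs)            # ASCII-only text, safe in any column charset
    print(unescape(refs))  # what a browser would display
```

Storing references keeps old and new rows consistent, at the cost of making the column harder to search and sort than real UTF-8 text.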
If you are trying to go to UTF8 after you are already using another encoding, try these steps:
Run ALTER TABLE tablename CONVERT TO CHARACTER SET UTF8 on your tables
Configure MySQL to default to UTF8 or just run the SET NAMES UTF8 query once you establish a database connection. You only need to run it once per connection since it sets the connection to UTF8.
Output a UTF8 header before delivering content to the browser. header('Content-Type: text/html; charset=utf-8');
Include the UTF8 content-type meta tag on your page.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
If you do all those steps, everything should work fine.
Use N before your values. Like:
mysql_query("insert into ".$mysql_table_prefix."links (url, title) values (N'$url', N'$title')");
Try telling MySQL to use utf8 before executing any query. You can put this query in a file that is included by all your scripts. Replace "utf8_swedish_ci" with your collation:
$sql='SET NAMES "utf8" COLLATE "utf8_swedish_ci"';
mysql_query($sql);
Spent hours on this now and could use some help! Our website queries our db - table columns are set to Latin1 collation, website has set names to UTF8 for queries.
The data is French and when we do a search for a string including accented characters we get the "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'" error.
If you navigate to a page which loads the data completely it shows accented characters no problem, it is just when using the search function that it breaks.
We have tried a number of methods including ALTER TABLE t1 CHANGE c1 c1 BLOB; ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
but this damages the data: the text before the first accented character in each field is fine, but then the rest of the text in the field is completely dropped: 'Métal' becomes 'M'. I am using phpMyAdmin to try and fix this BTW. Not sure if that's a problem.
So is the data UTF8 encoded? If so, why does the ALTER TABLE not work, I've seen it mentioned as THE way to fix this problem on so many webpages! If the fact it doesn't work means that the data is not UTF8 encoded, how do I find out what it is?
Having a different encoding for your website and your database is not a very good idea. To avoid this problem, it's better to have everything in utf8. That said, it should be possible to convert the encoding of your tables by playing with collations.
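On the question of how to find out what the data really is: the truncation described above ('Métal' becoming 'M') is what you would expect if the stored bytes are latin1 rather than UTF-8, because a lone E9 byte is not valid UTF-8. The sketch below (Python, with a made-up function name) keeps only the leading part of a byte string that is valid UTF-8. It assumes, without having checked the MySQL source, that the BLOB-to-utf8 ALTER stops at the first invalid byte in the same way.

```python
def utf8_prefix(raw: bytes) -> str:
    """Return the longest leading part of raw that is valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that failed to decode.
        return raw[:err.start].decode("utf-8")


if __name__ == "__main__":
    word = "M\u00e9tal"  # 'Métal'
    print(utf8_prefix(word.encode("latin_1")))  # bytes as a latin1 column holds them
    print(utf8_prefix(word.encode("utf-8")))    # bytes if the data were UTF-8
```

Comparing SELECT HEX(col) against the two byte forms (E9 versus C3A9 for é) tells you which case you are in before you run any ALTER.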