Stored non-English characters, got '?????' - MySQL Character Set issue

The site I am working on is in Farsi, and all the text is being displayed as ????? (question marks).
I changed the collation of my DB tables to utf8_general_ci, but it still shows ???.
I ran the following script to convert all the tables, but that did not work either.
I want to know what I am doing wrong.
<?php
// your connection
mysql_connect("mysql.ord1-1.websitesettings.com","user_name","pass");
mysql_select_db("895923_masihiat");
// convert code: mysql_fetch_row returns only numeric keys, so each table
// is converted once (mysql_fetch_array also returns name-keyed duplicates)
$res = mysql_query("SHOW TABLES");
while ($row = mysql_fetch_row($res))
{
    $table = $row[0];
    // backticks protect table names that clash with reserved words
    mysql_query("ALTER TABLE `" . $table . "` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");
    echo $table . " CONVERTED<br />";
}
?>

Bad news. But first, double check:
SELECT col, HEX(col)...
to see what is in the table. If the hex shows 3F, the data is gone. Correctly stored, the dal character (د) should be hex D8AF; hah (ح) is hex D8AD.
What happened:
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
As you INSERTed the data, it was converted to latin1, which does not have values for Farsi characters, so question marks replaced them.
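The following Python illustration shows why HEX(col) separates the two cases. Python is used only because it exposes the raw bytes without a database. Python's latin-1 codec with errors="replace" is assumed as a stand-in for MySQL's conversion, which likewise substitutes ? for characters it cannot map:

```python
def utf8_hex(text: str) -> str:
    """Hex of the UTF-8 bytes, as HEX(col) would show for a correctly stored utf8 column."""
    return text.encode("utf-8").hex().upper()

def latin1_hex(text: str) -> str:
    """Hex after forcing text into latin1; unmappable characters become '?' (3F)."""
    return text.encode("latin-1", errors="replace").hex().upper()

dal, hah = "\u062F", "\u062D"  # the Arabic-script letters dal and hah
print(utf8_hex(dal), utf8_hex(hah))  # D8AF D8AD
print(latin1_hex(dal + hah))         # 3F3F: the original bytes are gone
```

Once only 3F bytes are stored, no later ALTER or SET NAMES can bring the letters back.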
The cure (for future INSERTs):
Recode your application to use the mysqli_* interface instead of the deprecated mysql_* interface.
utf8-encoded data (good)
mysqli_set_charset($link, 'utf8')
check that the column(s) and/or table default are CHARACTER SET utf8
If you are displaying on a web page, <meta...utf8> should be near the top.
The discussion above is about CHARACTER SET, the encoding of characters. Now for a tip on COLLATION, which is used for comparing and sorting.
If you want these to be treated as equal: 'بِسْمِ' = 'بسم', then use utf8_unicode_ci (instead of utf8_general_ci) for the COLLATION.
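The sketch below is a rough Python approximation of that effect. It is not MySQL's implementation. It assumes, as the answer implies, that the relevant behavior is that the Unicode Collation Algorithm (which utf8_unicode_ci follows) gives nonspacing marks such as harakat no primary weight:

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove nonspacing combining marks (category Mn), such as Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

def roughly_equal(a: str, b: str) -> bool:
    """Compare while ignoring harakat; only an approximation of utf8_unicode_ci."""
    return strip_marks(a) == strip_marks(b)

with_harakat = "\u0628\u0650\u0633\u0652\u0645\u0650"  # bismi with kasra/sukun marks
without = "\u0628\u0633\u0645"                          # the bare letters
print(roughly_equal(with_harakat, without))  # True
```

A real collation also handles case, ordering, and expansions, so use it only to reason about why the two strings can compare equal.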

Related

Enable hebrew on mysql database [wamp]

I am trying to insert Hebrew values into my MySQL database, but the output is only strange characters, question marks (???? ???), or empty rectangles ▯▯▯▯.
I've tried changing the collation and the charset to utf8, but it didn't help much.
When I use the command SHOW VARIABLES LIKE 'char%':
everything is utf8 (except character_set_filesystem, which is binary of course).
By the way, I am using WAMP Server.
How can I fix it and use Hebrew in a MySQL database?
Thank you.

Set the mysqli charset before any INSERT or SELECT, e.g.:
mysqli_set_charset($con,"utf8");
and set your table fields to:
Character set: utf8
Collation: utf8_general_ci
Update based on your comment:
Your string is JSON-encoded; to decode it, use:
$string = '{"name":"\u05d1\u05d9\u05d2 \u05d1\u05d5\u05e8\u05d2\u05e8"}';
print_r( json_decode($string));
OUTPUT:
stdClass Object
(
[name] => ביג בורגר
)
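Python's json module shows the same thing as json_decode here. It is used only as an illustration. The \uXXXX sequences are JSON's ASCII-safe spelling of the Hebrew letters, not corruption:

```python
import json

# Raw string, so the JSON text keeps its literal \uXXXX escapes.
payload = r'{"name":"\u05d1\u05d9\u05d2 \u05d1\u05d5\u05e8\u05d2\u05e8"}'
data = json.loads(payload)
print(data["name"])  # ביג בורגר
```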

JSON creating from PHP giving wrong data?

I have a PHP form that I use to enter data into the database (phpMyAdmin), and I use a SELECT query to display all the values from the database in the same form.
I also have another PHP file that I use to create JSON from the same DB table.
When I enter text in other languages, like "Experiența personală:", the value saved in the DB is "ExperienÈ›a personalÄƒ: ", but when I use a SELECT query to display it in the same PHP form, it comes out correctly as "Experiența personală:". So the DB is correct, and now I am using the following PHP code to create the JSON:
<?php
$servername = "localhost";
$username = "root";
$password = "root";
$dbname = "aaps";
// Create connection
$con=mysqli_connect($servername,$username,$password,$dbname);
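// Check connection
if (!$con) {
    die("Connection failed: " . mysqli_connect_error());
}
mysqli_set_charset($con, 'utf8');
//echo "connected";
$taxi = array(); // stays an empty array (not null) if there are no rows
$rslt=mysqli_query($con,"SELECT * FROM offers");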
while($row=mysqli_fetch_assoc($rslt))
{
$taxi[] = array('code'=> $row["code"], 'name'=> $row["name"],'contact'=> $row["contact"], 'url'=> $row["url"], 'details'=> $row["details"]);
}
header("Content-type: application/json; charset=utf-8");
echo json_encode($taxi);
?>
and JSON looks like
[{"code":"CT1","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"4535623643","url":"images\/offers\/event-logo-8.jpg","details":"Experien\u00c8\u203aa personal\u00c4\u0192: jerhbehwgrh 234234 hjfhjerg#$%$#%#4"},{"code":"ewrw","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"ewfew","url":"","details":"eExperien\u00c8\u203aa personal\u00c4\u0192: Experien\u00c8\u203aa personal\u00c4\u0192: Experien\u00c8\u203aa personal\u00c4\u0192: "},{"code":"Experien\u00c8\u203aa personal\u00c4\u0192: ","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"","url":"","details":"Experien\u00c8\u203aa personal\u00c4\u0192: "}]
Here "\u00c8\u203aa" is wrong; it is supposed to be "\u021ba" (ța).
So the PHP code that creates the JSON seems to be causing this issue.
But I am unable to find out exactly why it comes out like this. Please help.

To avoid the \uXXXX Unicode escapes, note the extra argument:
json_encode($s, JSON_UNESCAPED_UNICODE)
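Python's json.dumps has a comparable switch, ensure_ascii=False. This is an analogy, not the PHP API. Both outputs decode to the same data, so the escaping alone does not produce \u00c8\u203a. That sequence comes from the stored bytes, as the Mojibake explanation in this answer describes:

```python
import json

row = {"name": "Experien\u021ba personal\u0103:"}  # "Experiența personală:"
escaped = json.dumps(row)                      # default: non-ASCII written as \uXXXX
readable = json.dumps(row, ensure_ascii=False)  # non-ASCII written as-is
print(escaped)   # {"name": "Experien\u021ba personal\u0103:"}
print(readable)  # {"name": "Experiența personală:"}
assert json.loads(escaped) == json.loads(readable)  # same data either way
```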
Don't use utf8_encode/decode.
ă turning into Äƒ is Mojibake. It probably means that:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
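That misreading can be reproduced in Python. This assumes MySQL's latin1 behaves like Windows-1252 (cp1252) for these bytes; it is essentially cp1252, though Python's codec rejects a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) that MySQL maps:

```python
def as_mysql_latin1(text: str) -> str:
    """Misread the UTF-8 bytes of text as cp1252, which MySQL calls latin1."""
    return text.encode("utf-8").decode("cp1252")

print(as_mysql_latin1("\u0103"))                           # Äƒ
print(as_mysql_latin1("Experien\u021ba personal\u0103:"))  # ExperienÈ›a personalÄƒ:
```

The ț (UTF-8 bytes C8 9B) becomes È plus ›, which is exactly the \u00c8\u203a seen in the JSON.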
If you need to fix the data, it takes a "2-step ALTER", something like:
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
Before making any changes, do
SELECT col, HEX(col) FROM tbl WHERE ...
With that, ă should show hex C483. If you see C384C692, you have "double-encoding", which is messier to fix.
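Here is a hypothetical Python helper for reading that HEX() output. It is not part of MySQL or the answer. It assumes the column is utf8 and that any damage is exactly one extra cp1252-to-utf8 round. It checks a single value; it does not replace the SQL fix:

```python
def undo_double_encoding(hex_str: str):
    """Take HEX(col) output; return (text as stored, repaired text or None).

    Assumes a utf8 column whose only possible damage is one extra
    cp1252 -> utf8 round (MySQL's latin1 is essentially cp1252).
    """
    stored = bytes.fromhex(hex_str).decode("utf-8")
    try:
        repaired = stored.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return stored, None  # cannot be one extra round: treat as correct
    return stored, (repaired if repaired != stored else None)

print(undo_double_encoding("C483"))      # ('ă', None)
print(undo_double_encoding("C384C692"))  # ('Äƒ', 'ă')
```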

Depending on the version of MySQL in the database, it may not be using the full UTF-8 set, as stated in the documentation:
The ucs2 and utf8 character sets do not support supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.
This, however, is not likely to be related to your problem. I would try a couple of different things and see if it solves your problem.
use SET NAMES utf8 (MySQL spells the character set name without a hyphen)
You can read more about that here
use utf8_encode() when inserting data into the database, and utf8_decode() when extracting it. That way, you don't have to worry about MySQL manipulating the Unicode characters. Documentation
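The Python sketch below checks whether a string is affected by the BMP limitation quoted above. Any code point above U+FFFF takes 4 bytes in UTF-8, so it cannot be stored in MySQL's 3-byte utf8 (utf8mb3):

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character is outside the BMP (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

print(needs_utf8mb4("Experien\u021ba"))    # False: BMP only, fits MySQL utf8
print(needs_utf8mb4("\U0001F600"))         # True: an emoji, needs utf8mb4
print(len("\U0001F600".encode("utf-8")))  # 4
```

Romanian diacritics are all inside the BMP, which supports the answer's point that this limitation is unlikely to be the cause here.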

How to support emojis with flourish?

I am using flourishlib for a website. My client requested that we be able to use emojis from mobile phones. In theory, we should change the character encoding of the MySQL database from utf8 to utf8mb4.
So far, so good; however, if we make this switch like this:
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
Then each character will use four bytes instead of three. This would increase the database's size by 33%, resulting in worse performance and more storage space used. So we have decided to switch to utf8mb4 only for specific columns of specific tables.
To make sure everything is all right, I have checked several things. Among them, I have checked flourishlib and found a few suspect parts:
There is an fUTF8 class, which does not seem to support utf8mb4
In fDatabase, I found the following:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8', $this->connection)) {
throw new fConnectivityException(
'There was an error setting the database connection to use UTF-8'
);
}
//...
// Make MySQL act more strict and use UTF-8
if ($this->type == 'mysql') {
$this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
$this->execute("SET NAMES 'utf8'");
$this->execute("SET CHARACTER SET utf8");
}
At fSQLSchemaTranslation I can see this:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8', $sql);
I suspect that flourishlib will not support our goal of giving a few columns of a few tables the utf8mb4 character encoding. I wonder whether we can upgrade something to add this support. As a worst case, we could replace every textual occurrence of utf8 with utf8mb4, but that would be a very ugly hack, and we wonder whether there is a better solution. Should we make this hack, or is there a more orthodox approach?

I have resolved the issue. I have altered the tables where I wanted to support emojis by changing the column character set and collation, like this:
ALTER TABLE table_name CHANGE column_name column_name text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
After that, I had to make a few ugly hacks to make flourishlib support emojis.
fDatabase.php:
line 685:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8mb4', $this->connection)) {
throw new fConnectivityException(
'There was an error setting the database connection to use UTF-8'
);
}
line 717 stays the same; everything crashes if this line is changed:
if ($this->connection && function_exists('mysqli_set_charset') && !mysqli_set_charset($this->connection, 'utf8')) {
line 800:
// Make MySQL act more strict and use UTF-8
if ($this->type == 'mysql') {
$this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
$this->execute("SET NAMES 'utf8mb4'");
$this->execute("SET CHARACTER SET utf8mb4");
}
fSQLSchemaTranslation.php:
line 1554:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8mb4', $sql);
fXML.php:
line 403:
if (preg_replace('#[^a-z0-9]#', '', strtolower($encoding)) == 'utf8mb4') {
// Remove the UTF-8 BOM if present
$xml = preg_replace("#^\xEF\xBB\xBF#", '', $xml);
fCore::startErrorCapture(E_NOTICE);
$cleaned = self::iconv('UTF-8', 'UTF-8', $xml);
if ($cleaned != $xml) {
$xml = self::iconv('Windows-1252', 'UTF-8', $xml);
}
fCore::stopErrorCapture();
}
and finally, whenever any of the affected columns are modified, I execute this:
App::db()->query("set names 'utf8mb4'");
which essentially triggers the ->query() method of an fDatabase object.

increase the database's size with 33%.
Not true. English letters still take 1 byte each. What you gain with utf8mb4 is the ability to store emoji and some Chinese characters.
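In VARCHAR/TEXT data, UTF-8 is variable-length. The byte count depends on the character itself; declaring utf8mb4 only raises the maximum from 3 to 4 bytes. A Python illustration:

```python
def utf8_len(ch: str) -> int:
    """Bytes ch occupies in UTF-8 (identical under utf8 and utf8mb4 when both can store it)."""
    return len(ch.encode("utf-8"))

samples = {
    "a": "English letter",
    "\u00e9": "e with acute",
    "\u0628": "Arabic beh",
    "\u20ac": "euro sign",
    "\U0001F600": "emoji (utf8mb4 only)",
}
for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_len(ch)} byte(s)")
```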
You shouldn't need to ALTER ... CHANGE the columns, except that you probably had a canned VARCHAR(255), which has issues. Don't simply switch to 191; switch to a 'reasonable' number for each column. Or do nothing. The 191 comes only from an INDEX limitation. You are not indexing every column, are you?
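The 191 figure comes from simple arithmetic. The sketch assumes the older InnoDB index key prefix limit of 767 bytes; newer configurations using the DYNAMIC row format allow more, so check your server. MySQL sizes the index for the worst case of 3 (utf8) or 4 (utf8mb4) bytes per character:

```python
# Assumption: the older InnoDB index key prefix limit of 767 bytes.
INDEX_PREFIX_LIMIT_BYTES = 767

def max_indexable_chars(limit_bytes: int, max_bytes_per_char: int) -> int:
    """Longest VARCHAR(n) whose worst-case byte length still fits the index limit."""
    return limit_bytes // max_bytes_per_char

print(max_indexable_chars(INDEX_PREFIX_LIMIT_BYTES, 3))  # 255 (utf8)
print(max_indexable_chars(INDEX_PREFIX_LIMIT_BYTES, 4))  # 191 (utf8mb4)
```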
fUTF8 class, which does not seem to support
Complain to flourishlib. Or abandon it. (Too many questions in these forums are complaints about inadequate 3rd party packages, not MySQL, itself.)
You might be able to change to utf8mb4 in MySQL and let flourishlib be oblivious to the change. Technically speaking, MySQL's utf8mb4 matches the rest of the world's concept of utf8; MySQL's utf8 is an incomplete implementation.
$this->execute("SET NAMES 'utf8'");
If you can see this code, you can change it.

SELECT from data_base utf-8_bin data purified

All I want is to SELECT a utf8_bin string stored in a table with two columns:
id, an int(20) NOT NULL AUTO_INCREMENT, and continut, a VARCHAR(2000) utf8_bin NOT NULL column used for Romanian-language inserts.
The data is stored correctly when I access it from phpMyAdmin, but on echo it returns strange characters instead of Romanian diacritics.
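$sqlis = "SELECT continut FROM cantari WHERE id = " . (int)$id; // cast guards against SQL injection
// one query only; no stray ';' after foreach, so the loop body actually runs
foreach ($dbh->query($sqlis) as $liniie) {
    $continut = $liniie['continut'];
}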
This is the result: Cau?i r�uri ?i izvoare
$continut is my data from SQL. I've set the page's charset to UTF-8 with <meta charset="utf-8"> in the header.
Can .htaccess or CSS help me? Or how can I replace those elements with the normal Romanian diacritics?

Have you set the character set to UTF-8 on the MySQL connection?
For example:
$dbh->set_charset("utf8");
Without it, the data returned from MySQL will be translated to Cp1252. Any characters that don't fit in Cp1252 will be replaced with "?".
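The symptoms in the question can be reproduced in Python. This assumes the stored text was "Cauți râuri și izvoare", a guess reconstructed from the visible letters and not given in the question. It also assumes the server converts to cp1252 while the page declares UTF-8. ț and ș are not in cp1252, so they become ?. â is in cp1252, but its single byte E2 is invalid UTF-8 for the browser, which shows the replacement character:

```python
def through_cp1252(text: str) -> str:
    """Server converts to cp1252 ('?' for unmappable chars); browser reads the bytes as UTF-8."""
    raw = text.encode("cp1252", errors="replace")
    return raw.decode("utf-8", errors="replace")

assumed_original = "Cau\u021bi r\u00e2uri \u0219i izvoare"  # hypothetical input
print(through_cp1252(assumed_original))  # Cau?i r�uri ?i izvoare
```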

$dbh->query("SET NAMES 'utf8'"); // run first, as its own statement, before the SELECT
$sqlis = "SELECT continut FROM cantari WHERE id = {$id}";
So all we need is to tell the MySQL server that we will use utf8 characters.

how to detect and fix character encoding in a mysql database via php?

I have received this database full of people's names and data in French, which means it uses characters such as é, è, ö, û, etc. Around 3000 entries.
Apparently, the data inside has sometimes been encoded using utf8_encode(), and sometimes not. This results in messed-up output: in some places the characters show up fine, in others they don't.
At first I tried to track down every place in the UI where those issues arise and use utf8_decode() where necessary, but it's really not a practical solution.
I did some testing and there is no reason to use utf8_encode in the first place, so I'd rather remove all that and just work in UTF-8 everywhere: at the browser, middleware, and database levels. So I need to clean the database, replacing all misencoded data with its cleaned-up version.
Question: would it be possible to create a PHP function that checks whether a UTF-8 string is correctly encoded (without utf8_encode) or not (with utf8_encode) and, if it was passed through utf8_encode, converts it back to its original state?
In other words: I would like to know how I could distinguish UTF-8 content that has been utf8_encode()d from UTF-8 content that has not.
UPDATE: EXAMPLE
Here is a good example: take a string full of special characters, make a copy of it, and utf8_encode() the copy. The function I'm dreaming of takes both strings, leaves the first one untouched, and turns the second one into the same string as the first.
I tried this:
$loc_fr = setlocale(LC_ALL, 'fr_BE.UTF8','fr_BE#euro', 'fr_BE', 'fr', 'fra', 'fr_FR');
$str1= "éèöûêïà ";
$str2 = utf8_encode($str1);
function convert_charset($str) {
$charset= mb_detect_encoding($str);
if( $charset=="UTF-8" ) {
return utf8_decode($str);
}
else {
return $str;
}
}
function correctString($str) {
echo "\nbefore: $str";
$str= convert_charset($str);
echo "\nafter: $str";
}
correctString($str1);
echo('<hr/>'."\n");
correctString($str2);
And that gives me:
before: éèöûêïà after: �������
before: Ã©Ã¨Ã¶Ã»ÃªÃ¯Ã after: éèöûêïà
Thanks,
Alex

It's not completely clear from the question what character-encoding lens you're currently looking through (this depends on the defaults of your text editor, browser headers, database configuration, etc), and what character-encoding transformations the data has gone through. It may be that, for example, by tweaking a database configuration everything will be corrected, and that's a lot better than making piecemeal changes to data.
It looks like it might be a problem of utf8 double-encoding, and if that's the case, both the original and the corrupted data will be in utf8, so encoding detection won't give you the information you need. The approach in that case requires making assumptions about which characters can reasonably turn up in your data: as far as PHP and MySQL are concerned, "Ã©" is perfectly legal utf8, so you have to judge, based on what you know about the data and its authors, that it must be corrupted. These are risky assumptions to make if you're just a technician. Luckily, if you know the data is in French and there are only 3000 records, it's probably OK to make those kinds of assumptions.
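Under exactly that assumption (sequences like "Ã´" never occur legitimately in the data), one extra utf8_encode() pass can be reversed. The following is a minimal Python sketch, not a PHP library function. utf8_encode() reads its input as ISO-8859-1, so undoing it means encoding back to ISO-8859-1 bytes and decoding them as UTF-8:

```python
def undo_utf8_encode(text: str) -> str:
    """Reverse one extra utf8_encode() pass if the text looks double-encoded.

    If either step fails, the text was not double-encoded this way and
    is returned unchanged (correct French text fails the UTF-8 decode).
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(undo_utf8_encode("C\u00c3\u00b4te d'Azur"))  # Côte d'Azur (repaired)
print(undo_utf8_encode("C\u00f4te d'Azur"))        # Côte d'Azur (unchanged)
```

Pure-ASCII text passes through unchanged, and correctly encoded accented text is left alone because its ISO-8859-1 bytes are not valid UTF-8.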
Below is a script that you can adapt first of all to check your data, then to correct it, and finally to check it again. All it's doing is processing a string as utf8, breaking it into characters, and comparing the characters against a whitelist of expected French characters. It signals a problem if the string is either not in utf8 or contains characters that aren't normally expected in French, for example:
PROBABLY OK Côte d'Azur
HAS NON-WHITELISTED CHAR CÃ´te d'Azur 195,180 Ã´
NON-UTF8 C�e d'Azur
Here's the script; you'll need to download the dependent Unicode functions from http://hsivonen.iki.fi/php-utf8/
<?php
// Download from http://hsivonen.iki.fi/php-utf8/
require "php-utf8/utf8.inc";
$my_french_whitelist = array_merge(
range(0,127), // throw in all the lower ASCII chars
array(
0xE8, // small e-grave
0xE9, // small e-acute
0xF4, // small o-circumflex
//... Will need to add other accented chars,
// Euro sign, and whatever other chars
// are normally expected in the data.
)
);
// NB, whether this string literal is in utf8
// depends on the encoding of the text editor
// used to write the code
$str1 = "Côte d'Azur";
$test_data = array(
$str1,
utf8_encode($str1),
utf8_decode($str1),
);
foreach($test_data as $str){
$questionable_chars = non_whitelisted(
$my_french_whitelist,
$str
);
if($questionable_chars===true){
p("NON-UTF8", $str);
}else if ($questionable_chars){
p(
"HAS NON-WHITELISTED CHAR",
$str,
implode(",", $questionable_chars),
unicodeToUtf8($questionable_chars)
);
}else{
p("PROBABLY OK", $str);
}
}
function non_whitelisted($whitelist, $utf8_str){
$codepoints = utf8ToUnicode($utf8_str);
if($codepoints===false){ // has non-utf8 char
return true;
}
return array_diff(
array_unique($codepoints),
$whitelist
);
}
function p(){
$args = func_get_args();
echo implode("\t", $args), "\n";
}

I think you might be taking a more complicated approach than necessary. I received a Bulgarian database a few weeks back that was dynamically encoded in the DB, but when moving it to another database I got the funky ???.
The way I solved that was by dumping the database, setting the database to utf8 collation, and then importing the data as binary. This auto-converted everything to utf8 and didn't give me any more ???.
This was in MySQL.

When you connect to the database, remember to always use mysql_set_charset('utf8', $db_connection);
It will fix everything; it solved all my problems.
See this: http://phpanswer.com/store-french-characters-into-mysql-db-and-display/

As you said that your data is sometimes converted using utf8_encode, your data is encoded in either UTF-8 or ISO 8859-1 (since utf8_encode converts from ISO 8859-1 to UTF-8). And since UTF-8 encodes the characters from 128 to 255 with two bytes, the first of which starts with 1100001x, you just have to test whether your data is valid UTF-8 and convert it if not.
So scan all your data to see whether it already is UTF-8 (see the various is_utf8 functions) and use utf8_encode if it's not UTF-8.
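The following Python sketch applies that test to raw bytes; it is an illustration, not a PHP is_utf8 implementation. A lone ISO-8859-1 byte in the 128 to 255 range followed by ASCII is never valid UTF-8, so a failed UTF-8 decode identifies the ISO-8859-1 values. It only covers bytes that are either UTF-8 or ISO-8859-1. Double-encoded text such as "Ã©" is valid UTF-8 and passes the first check unchanged:

```python
def to_text(raw: bytes) -> str:
    """Decode bytes that are either valid UTF-8 or ISO-8859-1 (Latin-1)."""
    try:
        return raw.decode("utf-8")    # already UTF-8: keep as is
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # the utf8_encode() equivalent

print(to_text("\u00e9t\u00e9".encode("utf-8")))    # été
print(to_text("\u00e9t\u00e9".encode("latin-1")))  # été
```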

My problem was that somehow I had characters like à, é, ê in my database, either in plain form or UTF-8 encoded. After investigating, I concluded that some browser (I do not know whether IE, FF, or another) was encoding the submitted input data, since no UTF-8 encoding had been intentionally added to the form handling. So if I read the data with utf8_encode, I would alter the other, plain characters, and vice versa.
My solution, after studying the solutions given above:
1. I created a new database with charset utf8.
2. I imported the database AFTER changing the charset definition in the CREATE TABLE statements in the SQL dump file from Latin.... to UTF8.
3. I imported the data from the original database.
(Up to here, it may be enough just to change the charset on the existing DB and tables, and only if the original DB is not utf8.)
4. I updated the content in the database directly, replacing the UTF-8-encoded characters with their plain form, something like
UPDATE `clients` SET `name` = REPLACE(`name`,"Ã©",'é' ) WHERE `name` LIKE CONVERT( _latin1 '%é%' USING utf8 );
I put this line in the DB class (PHP code) to make sure there is UTF-8 communication:
$this->query('SET CHARSET UTF8');
So, how to update? (step 4)
I built an array of possible characters that might be encoded:
$special_chars = array(
'ù','û','ü',
'ÿ',
'à','â','ä','å','æ',
'ç',
'é','è','ê','ë',
'ï','î',
'ô','','ö','ó','ø',
'ü');
I built an array of (table, field) pairs that should be updated:
$where_to_look = array(
array("table_name" , "field_name"),
..... );
Then:
foreach($special_chars as $char)
{
    foreach($where_to_look as $pair)
    {
        //$table = $pair[0]; $field = $pair[1]
        $sql = "SELECT id , `" . $pair[1] . "` FROM " .$pair[0] . " WHERE `" . $pair[1] . "` LIKE CONVERT( _latin1 '%" . $char . "%' USING utf8 );";
        $db->query($sql); // run the SELECT so num_rows() reflects it
        if($db->num_rows() > 0){
            $sql1 = "UPDATE " . $pair[0] . " SET `" . $pair[1] . "` = REPLACE(`" . $pair[1] . "`,CONVERT( _latin1 '" . $char . "' USING utf8 ),'" . $char . "' ) WHERE `" . $pair[1] . "` LIKE CONVERT( _latin1 '%" . $char . "%' USING utf8 )";
            $db1->query($sql1);
        }
    }
}
The basic idea is to use MySQL's encoding features to avoid the encoding done between MySQL, Apache, the browser, and back.
NOTE: I did not have PHP functions like mb_.... available.
Best
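The step-4 REPLACE pairs can also be generated rather than typed by hand. Below is a hypothetical Python helper, not part of the answer's PHP class. It assumes the connection already uses utf8, so the literals arrive intact, and that MySQL's latin1 behaves like cp1252 for these bytes. It prints statements for review instead of running them:

```python
# A subset of the answer's $special_chars list, written as escapes for clarity.
SPECIAL_CHARS = ["\u00e0", "\u00e2", "\u00e7", "\u00e9", "\u00e8", "\u00ea", "\u00f4"]

def double_encoded(ch: str) -> str:
    """How ch reads after its UTF-8 bytes are taken as cp1252 (MySQL latin1), e.g. 'Ã©'."""
    return ch.encode("utf-8").decode("cp1252")

def replace_sql(table: str, column: str, ch: str) -> str:
    """Build one UPDATE in the style of step 4 (review before running)."""
    bad = double_encoded(ch)
    return (
        f"UPDATE `{table}` SET `{column}` = REPLACE(`{column}`, '{bad}', '{ch}') "
        f"WHERE `{column}` LIKE '%{bad}%';"
    )

for ch in SPECIAL_CHARS:
    print(replace_sql("clients", "name", ch))
```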
