PHP / MySQL special characters issue

I'm facing an issue with special characters. I'm fetching data from an MSSQL database, which returns to PHP a value that may contain special characters like "à é ö ü" etc. In my example I will use the city name Zürich. When I try to insert this value into a MySQL database, I get the following error:
"Incorrect string value: '\xFCrich ...' for column..."
So I've done the following, but it still shows the same error message:
$arrSearch = array('\xE4','\xF6','\xFC','\xC4','\xD6','\xDC','\xDF');
$arrReplace = array('ä','ö','ü','Ä','Ö','Ü','ß',);
$City=str_replace($arrSearch, $arrReplace, $City);
If I echo $City, I get the following:
Z�rich (rectangular block)
I've also tried hex2bin(), but I just get a blank white page and nothing is inserted into the database. FYI, the DB collation is utf8mb4_general_ci and setlocale(LC_ALL, 'en_EN') is set in the PHP file. All PHP files are encoded in UTF-8, and the charset is set as follows: mysql_set_charset('utf8mb4',$link);
I must admit I'm a bit lost. Does anyone have a clue how to fix this?
Thanks.
EDIT: The server hosting this app runs Windows Server 2008 R2 / IIS 7.5, and I found this Microsoft KB article. I tried the hotfix and the registry modification, but it didn't work. http://support.microsoft.com/kb/2277918/
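A detail in the str_replace() attempt above may explain why it changed nothing: PHP only processes \x escapes in double-quoted strings, so '\xFC' in single quotes is the literal four-character text \xFC, which never occurs in the data. The \xFC byte in the MySQL error is how ISO-8859-1 (Latin-1) encodes ü. The sketch below assumes the value really arrives as Latin-1 (the sample string is hypothetical) and converts the whole string with iconv() instead of replacing single bytes.

```php
<?php
// Single quotes: no escape processing, so this is 4 characters: \ x F C
$single = '\xFC';
// Double quotes: \xFC becomes one byte with value 0xFC (ü in Latin-1)
$double = "\xFC";

var_dump(strlen($single)); // int(4)
var_dump(strlen($double)); // int(1)

// Hypothetical value as it might come back from MSSQL in Latin-1.
$cityLatin1 = "Z\xFCrich";

// Convert the whole string from Latin-1 to UTF-8 instead of swapping bytes.
$cityUtf8 = iconv('ISO-8859-1', 'UTF-8', $cityLatin1);

var_dump($cityUtf8 === 'Zürich'); // bool(true)
```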

Set the character set to utf8.

OK, got it! That was so stupid... I'm using FPDF with that insertion, and to show special characters properly in FPDF I had set an iconv('UTF-8', $charset, $_REQUEST['City']);
Sorry, and thanks again for the assistance! Now it works like a charm.
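The \xFC in the original error is ü in ISO-8859-1, which suggests the iconv()-converted value is what reached the INSERT. One way to avoid this is to keep two variables: the UTF-8 original for MySQL and a converted copy used only by FPDF. A minimal sketch; the variable names are hypothetical and $charset is assumed to be ISO-8859-1.

```php
<?php
// The value as submitted by the form, in UTF-8 (hypothetical sample).
$city = 'Zürich';

// Assumption: FPDF's core fonts expect a single-byte charset such as ISO-8859-1.
$charset = 'ISO-8859-1';

// Converted copy, used only for FPDF output.
$cityForPdf = iconv('UTF-8', $charset, $city);

// The untouched UTF-8 value is the one to send in the MySQL INSERT.
$cityForDb = $city;

// preg_match() with the /u modifier returns false for invalid UTF-8.
var_dump(preg_match('//u', $cityForDb) === 1);  // bool(true)
var_dump(preg_match('//u', $cityForPdf) === 1); // bool(false): lone byte 0xFC
var_dump(bin2hex($cityForPdf));                 // string(12) "5afc72696368"
```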

Related

Correct "Incorrect string value" in MySQL without amending the database's encoding

PHP: 7.2.5
Laravel: 7.25
We have a bug where a very small number of users are trying to insert copy with the '􏰀' character included. I'm assuming this comes from copying and pasting from a PDF; I have seen such characters before with line breaks. This produces the following error:
SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF4\x8F\xB0\x80</...' for column 'body' at row 1 (SQL: update `post` set `body` = <p>􏰀</p>, `body_raw` = 􏰀, `post`.`updated_at` = 2020-10-06 10:34:22 where `id` = 1)
Character '􏰀':
Decimal Character Codes: 56319, 56320 (a UTF-16 surrogate pair for the single code point U+10FC00)
Hexadecimal Character Codes: 0xdbff, 0xdc00
HTML with named character references: ? ?
Searching Google, one suggestion is to update the DB encoding from utf8 to utf8mb4. This is probably the optimal solution, but we have a large database and I'm uneasy about changing the encoding (though this may be very safe). I'm concerned about possible data loss/corruption.
As this issue only appears with this one character in our bug system, and it's 100% not required, I'm inclined to just remove it before saving to the database, to keep changes to a minimum.
I'm inclined to do the following:
str_replace("􏰀","", $post);
But if I paste the character '􏰀' into any of my code editors, it disappears (I'm assuming because of UTF-8 encoding). What would be the best way to accomplish this?
With great help from #04FS (thanks), I have found a solution. As mentioned, I think the utf8 to utf8mb4 database fix is probably the best route here. But so as not to amend the database, here is the solution I found.
The main source of confusion here is the character "􏰀". As I could not enter it into my text editors, it was hard to work with, so I relied on third-party sites to encode it. One suggestion was to use chr() to write and match the character. But on two different websites the character code came out as chr(111) and chr(244) respectively. With chr(244) I was able to use str_replace(), but it only produced a partial replacement (0xF4 is just the first of the character's four bytes) and broke the SQL query.
#04FS suggested trying urlencode(), which gave me '%F4%8F%B0%80' for that character. This matches the database error, so the following solution works correctly:
private function removeSpecialCharacters($str) {
    // '%F4%8F%B0%80' decodes to the four UTF-8 bytes of the unwanted character
    $str = str_replace(urldecode('%F4%8F%B0%80'), '', $str);
    return $str;
}
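The urldecode() trick works because '%F4%8F%B0%80' is simply the character's four UTF-8 bytes. A minimal sketch of two other ways to write the same character in source without pasting it, plus a hypothetical helper that goes one step further and drops every 4-byte character (all code points above U+FFFF, which the 3-byte utf8 charset cannot store), not just this one. Whether blanket removal is acceptable for your data is an assumption to check; converting to utf8mb4 remains the lossless fix.

```php
<?php
// Three equivalent spellings of U+10FC00 that avoid pasting the character:
$fromUrl     = urldecode('%F4%8F%B0%80'); // as in the answer above
$fromHex     = "\xF4\x8F\xB0\x80";         // raw UTF-8 bytes (double quotes required)
$fromUnicode = "\u{10FC00}";               // code point escape, PHP 7.0+

/**
 * Hypothetical helper: removes every character above U+FFFF. These are the
 * characters that take 4 bytes in UTF-8 and are rejected by MySQL's 3-byte
 * `utf8` charset. Returns the input unchanged if it is not valid UTF-8.
 */
function stripFourByteChars(string $str): string
{
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);
    return $result ?? $str; // preg_replace() returns null on invalid UTF-8
}

var_dump($fromUrl === $fromHex && $fromHex === $fromUnicode); // bool(true)
echo stripFourByteChars('<p>' . $fromUnicode . '</p>'), PHP_EOL; // <p></p>
```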

Data in MySQL database doesn't show correctly in website

I am trying to translate an English website into Persian. The problems I was facing were:
1. The website was loading in a Latin charset, so I had to change the charset to UTF-8 so the content shows correctly in Persian.
2. Data in the MySQL database is not shown correctly on the website, probably because of the Unicode problem.
What I have done:
<?php ini_set('default_charset','utf-8'); header('Content-type: text/html; charset=utf-8'); ?>
This fixed problem #1.
But I am still facing problem #2: although I have altered the tables to use UTF-8, the problem persists. I would gladly welcome any help with this.
function bbcode ($str) {
//$str = htmlentities($str);
$token = array(
"'\[b\](.*?)\[/b\]'is",
'/\[i\](.*?)\[\/i\]/is',
'/\[u\](.*?)\[\/u\]/is',
'/\[url\=(.*?)\](.*?)\[\/url\]/is',
'/\[url\](.*?)\[\/url\]/is',
'/\[img\](.*?)\[\/img\]/is',
'/\[mail\=(.*?)\](.*?)\[\/mail\]/is',
'/\[mail\](.*?)\[\/mail\]/is',
'/\[font\=(.*?)\](.*?)\[\/font\]/is',
'/\[size\=(.*?)\](.*?)\[\/size\]/is',
'/\[color\=(.*?)\](.*?)\[\/color\]/is',
"':big_smile:'is",
"':cool:'is",
"':hmm:'is",
"':lol:'is",
"':mad:'is",
"':neutral:'is",
"':roll:'is",
"':sad:'is",
"':smile:'is",
"':tongue:'is",
"':wink:'is",
"':yikes:'is",
"':bull:'is",
'/\[item\=(.*?)\](.*?)\[\/item\]/is',
'/\[spell\=(.*?)\](.*?)\[\/spell\]/is',
"':warrior:'is",
"':paladin:'is",
"':hunter:'is",
"':rogue:'is",
"':priest:'is",
"':dk:'is",
"':shaman:'is",
"':mage:'is",
"':warlock:'is",
"':druid:'is",
"'\[ul\](.*?)\[/ul\]'is",
"'\[ol\](.*?)\[/ol\]'is",
"'\[li\](.*?)\[/li\]'is",
);
// ... (the rest of the function was not included in the question)
}
Thanks a lot in advance.
Sorry, my reply wasn't clear enough; I was almost asleep. The databases are empty, so I don't have to convert anything, but when I insert data into them, the data doesn't appear correctly. BTW, I'm not good with PHP or MySQL; I have been reading these articles and suggestions for hours and I'm just getting more confused. Can you just tell me which code to add, and where? Here is my connection code:
$link = mysql_connect("localhost","UserName","Password") or die(mysql_error());
mysql_set_charset("utf8",$link);
mysql_select_db("DataBase Name") or die(mysql_error());
What I gathered from these articles is that I should add the mysql_set_charset("utf8",$link) call to the code above, right after connecting to the DB, but I have tried that and it's not working. My website uses includes, so it looks like this:
include("../../config/config.php");
$connect = mysql_connect("$db_host", "$db_user", "$db_pass")or die(mysql_error());
mysql_set_charset("utf8", $connect); // pass the link returned by mysql_connect() above
Assuming you've correctly converted the data in your tables to UTF-8 (just changing the character set is not enough), it sounds like you might be having problems with the connection not being set up as UTF-8. Have a look at SET NAMES, and more specifically this question.
If you're not sure you've converted your data to UTF-8, I'd have a look at this question as well as this WordPress article, and make sure you've followed the steps.

PHP/MySQL encoding problems. � instead of certain characters

I have come across some problems when inputting certain characters into my MySQL database using PHP. What I am doing is submitting user-entered text to a database. I cannot figure out what I need to change to allow any kind of character to be put into the database and printed back out through PHP as it's supposed to.
My MySQL collation is: latin1_swedish_ci
Just before I send the text to the database from my form I use mysql_real_escape_string() on the data.
For example, this text:
�People are just as happy as they make up their minds to be.�
� Abraham Lincoln
is supposed to look like this:
“People are just as happy as they make up their minds to be.”
― Abraham Lincoln
As mentioned by others, you need to use UTF-8 from end to end if you want to support "special" characters. This means your web page, PHP, MySQL connection and MySQL table. The web page is fairly simple: just use the meta tag for UTF-8. Ideally your HTTP headers would say UTF-8 too.
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Set your PHP to use UTF8. Things would probably work anyway, but it's a good measure to do this:
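mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
// mb_http_input() only reports the detected input encoding (its argument is an
// input type such as 'G' or 'P'), so declare UTF-8 through default_charset instead.
ini_set('default_charset', 'UTF-8');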
For MySQL, you want to convert your table to UTF-8; there is no need to export/import.
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8
You can, and should, configure MySQL to default to utf8. But you can also run the query:
SET NAMES UTF8
as the first query after establishing a connection and that will "convert" your database connection to UTF8.
That should solve all your character display problems.
The likeliest cause of the problem is that the database connection is set to latin1 but you are feeding it text encoded in UTF-8. The simplest way to solve this is to convert your input into what the client expects:
$quote = iconv("UTF-8", "WINDOWS-1252//TRANSLIT", $quote);
(What MySQL calls latin1 is windows-1252 in the rest of the world.) Note that many characters, such as the quotation dash U+2015 that you use there, cannot be represented in this encoding and will be converted into something else. Ideally you should change the column encoding to utf8.
An alternative solution: set the database connection to utf8. It doesn't matter how the columns are encoded: MySQL internally converts text from the connection encoding into the storage encoding, so you can keep the columns as latin1 if you want to. (If you do, the quotation dash U+2015 will be turned into a question mark ? because it's not in latin1.)
How to set the connection encoding depends on which library you are using: with the deprecated MySQL library it's mysql_set_charset(), with MySQLi it's mysqli_set_charset(), and with PDO you add charset=utf8 to the DSN.
If you do this, you'll have to set the page encoding to UTF-8 with the Content-Type header.
Otherwise you would be having the same problem with the browser: feeding it text encoded in UTF-8 when it's expecting something else:
header("Content-Type: text/html; charset=utf-8");
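The byte-level mismatch described in this answer can be sketched with iconv(): the same curly quotes take three bytes each in UTF-8 but one byte each in Windows-1252 (MySQL's latin1), and lone Windows-1252 bytes such as 0x93 are not valid UTF-8, which is one way a UTF-8 page ends up showing �. A minimal sketch using the opening words of the quote from the question.

```php
<?php
// UTF-8 text with curly quotes U+201C and U+201D.
$utf8 = "\u{201C}People\u{201D}";

// The same text in Windows-1252, which MySQL calls latin1.
$cp1252 = iconv('UTF-8', 'WINDOWS-1252', $utf8);

// In UTF-8 each curly quote takes 3 bytes; in Windows-1252 it takes 1.
echo bin2hex($utf8), PHP_EOL;   // e2809c50656f706c65e2809d
echo bin2hex($cp1252), PHP_EOL; // 9350656f706c6594

// 0x93 and 0x94 on their own are invalid UTF-8, so preg_match() with /u fails.
var_dump(preg_match('//u', $cp1252)); // bool(false)

// Converting back restores the original UTF-8 string.
var_dump(iconv('WINDOWS-1252', 'UTF-8', $cp1252) === $utf8); // bool(true)
```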
The solutions provided are helpful if starting from scratch. Putting all possible connections to UTF-8 is indeed the safest. UTF-8 is the most used charset on the net for a variety of reasons.
Some suggestions and a word of warning:
copy the tables you want to sanitize with a unique prefix (tmp_)
although your db-connection is forced to utf8, check your General Settings collation, and change it to utf8_bin if that was not done yet
you need to run this on the local server
the funny char error is mostly due to mixing LATIN1 with UTF-8 configurations. This solution is designed for this. It could work with character sets other than LATIN1, but I haven't checked this
check these tmp_tables extensively before copying back to the original
Build the two arrays needed for the magic:
$chars = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, "UTF-8");
$LATIN1 = $UTF8 = array();
foreach ($chars as $key => $val) { // $key: the character, $val: its HTML entity
    $UTF8[] = $key;
    $LATIN1[] = $val;
}
Now build up the routines you need, (tables->)rows->fields, and for each field call:
$row[$field] = mysql_real_escape_string(str_replace($LATIN1 , $UTF8 , $row[$field]));
$q[] = "$field = '{$row[$field]}'";
Finally build up and send the query:
mysql_query("UPDATE $table SET " . implode(" , " , $q) . " WHERE id = '{$row['id']}' LIMIT 1");
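To see what the two arrays do before running anything against the database, the same replacement can be tried on an in-memory row. Note that, despite the names, $LATIN1 holds HTML entities and $UTF8 the characters they stand for, so this step turns stored entities back into UTF-8 text. The row below is hypothetical, and array_keys()/array_values() build the same arrays as the loop above.

```php
<?php
// Character => entity table, as in the answer (UTF-8, HTML 4.01 named entities).
$chars  = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
$LATIN1 = array_values($chars); // entities such as "&eacute;"
$UTF8   = array_keys($chars);   // the corresponding UTF-8 characters

// Hypothetical row, as it might come back from an old table.
$row = array('id' => 1, 'title' => 'Cr&egrave;me br&ucirc;l&eacute;e');

foreach ($row as $field => $value) {
    if ($field === 'id') {
        continue; // leave the key column alone
    }
    $row[$field] = str_replace($LATIN1, $UTF8, $value);
}

echo $row['title'], PHP_EOL; // Crème brûlée
```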
Change the MySQL collation to utf8_unicode_ci or utf8_general_ci, for both the table and the database.
Yes, you will need to set your database to UTF-8. There are many ways to do it: by changing the config file, via phpMyAdmin, or by calling a PHP function (sorry, memory blank) right before inserting into and updating MySQL.
Unfortunately, I think you will have to re-enter any data you entered before.
One more thing you need to know, from personal experience: make sure all related tables have the same collation, or you won't be able to JOIN them.
As a reference: http://dev.mysql.com/doc/refman/5.6/en/charset-syntax.html
Also, it can be an Apache setting. We've experienced the same issue on a 'free-hosting' server as well as on my brother's server. Once we switched to another server, all the characters displayed correctly. Verify your Apache settings; sorry, but I can't shed more light on Apache's config.
Forget everything else; you just need to follow these three points, and every problem regarding special-language characters will be resolved.
1- You need to set the collation of your table to utf8_general_ci.
2- Add <meta http-equiv="content-type" content="text/html; charset=utf-8"> to the HTML, right after the <head> tag.
3- Call mysql_set_charset('utf8',$link_identifier); in the file where you connect to the database, right after selecting the database with mysql_select_db(). This will allow you to add and retrieve data properly in whatever language it is.
If your text has been encoded and decoded with the wrong encoding and so the mojibake is actually "solidified" into unicode characters, then the solutions mentioned so far won't work. I ended up having success with the ftfy Python package to automatically detect/fix mojibake:
https://github.com/LuminosoInsight/python-ftfy
https://pypi.org/project/ftfy/
https://ftfy.readthedocs.io/en/latest/
>>> import ftfy
>>> print(ftfy.fix_encoding("(à¸‡'âŒ£')à¸‡"))
(ง'⌣')ง
Hopefully this helps people who are in a similar situation.
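For the simple, single-layer case (UTF-8 bytes that were read once as Windows-1252 and saved again as UTF-8), the repair can also be sketched in PHP with iconv(). This is a hypothetical helper, not a port of ftfy: it handles only that one pattern and returns null otherwise, whereas ftfy detects many encodings and multiple layers.

```php
<?php
/**
 * Hypothetical helper that undoes ONE layer of the most common mojibake:
 * UTF-8 bytes that were decoded as Windows-1252 and re-encoded as UTF-8.
 * Returns null when the text does not fit that pattern.
 */
function undoMojibake(string $text): ?string
{
    // Map every character back to the single Windows-1252 byte it came from.
    $bytes = @iconv('UTF-8', 'WINDOWS-1252', $text);
    if ($bytes === false) {
        return null; // some character has no Windows-1252 byte
    }
    // The recovered bytes must themselves form valid UTF-8.
    return preg_match('//u', $bytes) === 1 ? $bytes : null;
}

echo undoMojibake('ZÃ¼rich'), PHP_EOL;         // Zürich
echo undoMojibake("(à¸‡'âŒ£')à¸‡"), PHP_EOL; // (ง'⌣')ง
```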

Problem with utf-8

I have a problem with UTF-8. I use the CodeIgniter framework. For a client I have to convert a CSV file to a database, but when I add the data through a query to the database there is a problem: some characters don't work, for example this word: Eén. When I add this word in phpMyAdmin, it's right.
When I try through a CodeIgniter query, it doesn't.
My database is set to UTF-8. The CodeIgniter config is UTF-8. The database config is UTF-8.
Here is the query:
$query = "INSERT INTO lds_leerdoel(id,leerdoel,kind_omschrijving,cito,groep_id,OCW,opbouw,
kerngebied_id,jaar_maand,KVH,craats,refnivo,toelichting,auteur)
VALUES
(
'".$this->db->escape_str($id)."',
'".$leerdoel."',
'".$this->db->escape_str($kind_omschrijving)."',
'".$this->db->escape_str($cito)."',
'".$this->db->escape_str($groep_id)."',
'".$this->db->escape_str($OCW)."',
'".$this->db->escape_str($opbouw)."',
'".$this->db->escape_str($kerngebied_id)."',
'".$this->db->escape_str($jaar_maand)."',
'".$this->db->escape_str($KVH)."',
'".$this->db->escape_str($craats)."',
'".$this->db->escape_str($refnivo)."',
'".$this->db->escape_str($toelichting)."',
'".$this->db->escape_str($auteur)."'
)";
$this->db->query($query);
The problem is the field leerdoel. Does anybody have a solution? Thank you very much!!
Greetings,
Jelle
You'll need to run this query before the insert query
"SET NAMES utf8"
Shouldn't you use a national character string literal? http://dev.mysql.com/doc/refman/5.0/en/charset-national.html
Meaning that you would write:
$query = "INSERT INTO lds_leerdoel(id,leerdoel,kind_omschrijving,cito,groep_id,OCW,opbouw,
kerngebied_id,jaar_maand,KVH,craats,refnivo,toelichting,auteur)
VALUES
(
'".$this->db->escape_str($id)."',
N'".$leerdoel."',
-- rest of query omitted
Try to convert the text to Unicode with iconv():
iconv( "ISO-8859-1", "UTF-8", $leerdoel );
You might need to experiment a little if you don't know what encoding the file uses. (I think ISO-8859-1 or ISO-8859-15 are the most common.)
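Building on the iconv() suggestion, a minimal sketch of a conversion step for the CSV import that only converts values which are not already valid UTF-8, so text that is already correct is not encoded twice. The helper name is hypothetical, ISO-8859-1 is an assumed source encoding, and the validity test is a heuristic (a few Latin-1 byte sequences happen to be valid UTF-8).

```php
<?php
/**
 * Hypothetical helper for a CSV import: returns the value as UTF-8,
 * converting from $fromEncoding only when the input is not valid UTF-8.
 */
function toUtf8(string $value, string $fromEncoding = 'ISO-8859-1'): string
{
    if (preg_match('//u', $value) === 1) {
        return $value; // already valid UTF-8: converting again would double-encode
    }
    $converted = iconv($fromEncoding, 'UTF-8', $value);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert value from $fromEncoding");
    }
    return $converted;
}

echo toUtf8("E\xE9n"), PHP_EOL; // Eén (ISO-8859-1 input converted)
echo toUtf8('Eén'), PHP_EOL;    // Eén (UTF-8 input left as-is)
```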
Try adding this to your header
header('Content-Type: text/html; Charset=UTF-8');
Also check the encoding settings of your editor.
I had a similar problem and solved it by adding this at the beginning of my PHP file:
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
Additionally, it is very important to check that you are saving your PHP file as UTF-8 without BOM; I had a big headache with this. I recommend Notepad++: it shows the current file encoding and lets you convert to UTF-8 without BOM if necessary.
If you would like to see my problem and solution, it is here.
Hope it can help you!
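A BOM at the start of a PHP file is sent to the browser as three bytes before any other output, which can make later header() calls fail with "headers already sent". A minimal sketch for detecting and stripping it; the helper names and the scan of the current directory are illustrative assumptions.

```php
<?php
// The UTF-8 byte order mark: EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

/** Hypothetical helper: true if the given contents start with a UTF-8 BOM. */
function hasUtf8Bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

/** Hypothetical helper: returns the contents without a leading UTF-8 BOM. */
function stripUtf8Bom(string $contents): string
{
    return hasUtf8Bom($contents) ? substr($contents, 3) : $contents;
}

// Example: report every .php file in the current directory that has a BOM.
foreach (glob('*.php') ?: array() as $file) {
    $contents = file_get_contents($file);
    if ($contents !== false && hasUtf8Bom($contents)) {
        echo "BOM found in $file", PHP_EOL;
    }
}
```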

Problem with storing German words in the MySQL DB

I am facing a small issue in my project when I try to store some German words in the MySQL database. When these words contain umlauts or ß (characters like ä, ö, ü, ß), they are not stored as they are.
I want to store them as they are in the database. To do so, I tried changing the COLLATION to utf8_general_ci, and others in the list, using phpMyAdmin, but none of them works for me.
Am I on the right track, or do I have to do something else?
Please suggest some help.
Thanks in advance.
You have to choose the right transfer encoding as well. Call
SET NAMES utf8
before inserting the data, and make sure that the German words are UTF-8 encoded before inserting.
Try to use utf8_encode($string) to encode your text into UTF8 first, before saving it into the database. In order for characters to display correctly in a certain language, you have to (1) set the text into the right charset and then also (2) set a database to the right charset (as you did).
Also, if for example display.php outputs the German text, you can open the file in any editor (EmEditor?), choose "Save As", and pick the right encoding scheme. After that, the display file will take care of the charset when outputting the text.
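A caveat on utf8_encode(): it always treats its input as ISO-8859-1, so it only helps when the German text really arrives in that encoding. If the text is already UTF-8 (for example from a UTF-8 form), it gets encoded twice. A minimal sketch, using iconv() from ISO-8859-1 as a stand-in for utf8_encode() (which is deprecated as of PHP 8.2); the sample bytes are hypothetical.

```php
<?php
// "ä" typed into a UTF-8 form arrives as the two bytes C3 A4.
$alreadyUtf8 = "\xC3\xA4";

// Same conversion utf8_encode() performs: ISO-8859-1 to UTF-8.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $alreadyUtf8);

echo $doubleEncoded, PHP_EOL;          // Ã¤
echo bin2hex($doubleEncoded), PHP_EOL; // c383c2a4

// Only genuine ISO-8859-1 input (here the single byte E4) converts cleanly.
echo iconv('ISO-8859-1', 'UTF-8', "\xE4"), PHP_EOL; // ä
```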
Years ago I faced the same problem. I solved it by implicitly setting the NAMES option for MySQL. In my code it looks like this:
//inside AbstractMapper class
public function __construct($modelClass, $dbTable) {
$this->setDbTable($dbTable);
$stmt = new Zend_Db_Statement_Pdo($this->getDbTable()->getAdapter(), 'set names utf8');
$stmt->execute();
$this->_model_class = $modelClass;
}
After connecting to the database, use the following codes:
SET NAMES XXX
replace XXX with your working charset.
