Convert latin1 characters on a UTF8 table into UTF8 - php

Only today I realized that I was missing this in my PHP scripts:
mysql_set_charset('utf8');
All my tables are InnoDB, collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I have mb_internal_encoding('UTF-8'); on my PHP scripts, and all my PHP files are encoded as UTF-8.
So, until now, every time I "INSERT" something with diacritics, for example:
mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');
The 'name' contents would then be, in this case: JÃ¡uÃ² IÃ±e.
Since I fixed the charset between PHP and MySQL, new INSERTs are stored correctly. However, I want to fix all the older rows that are currently "messed up". I have already tried many things, but the string is always truncated at the first "illegal" character. Here is my current code:
$m = mysql_real_escape_string('¿<?php echo "¬<b>\'PHP á (á)ţăriîş </b>"; ?> ă-ţi abcdd;//;ñç´พดแทฝใจคçăâξβψδπλξξςαยนñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');
$result = mysql_iquery('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
$message = $row['name'];
$message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
//$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
mysql_iquery('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.$row['a1'].'"');
}
The UPDATE writes the expected characters, except that the string is truncated at the character "ă": that character and everything after it are missing.
Testing with the iconv() call (commented out in the code) gives the same result, even with //IGNORE and //TRANSLIT.
I also tried several target charsets, including ISO-8859-1 and ISO-8859-15.

From what you describe, it seems you have UTF-8 data that was stored as if it were Latin-1 and then converted to UTF-8 a second time. The data is recoverable; you can use a MySQL expression like
convert(cast(convert(name using latin1) as binary) using utf8)
You may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
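To see what that expression undoes, here is a minimal PHP sketch of the same repair done on the bytes in PHP instead of SQL. It assumes a single round of double encoding through true ISO-8859-1 (every original UTF-8 byte became one character in U+0000..U+00FF) and PHP 7.1 or later. MySQL's latin1 is really cp1252, so characters whose bytes pass through 0x80..0x9F (for example "ă", bytes C4 83, which turn into "Äƒ") are not handled; the function returns null for them. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: simulate double encoding, i.e. UTF-8 bytes read as
// ISO-8859-1 and re-encoded to UTF-8 (what a latin1 connection does to them).
function double_encode(string $utf8): string
{
    $out = '';
    for ($i = 0, $n = strlen($utf8); $i < $n; $i++) {
        $b = ord($utf8[$i]);
        // ASCII bytes stay as-is; bytes 0x80..0xFF become U+0080..U+00FF.
        $out .= $b < 0x80 ? $utf8[$i] : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical helper: undo one round of double encoding, the PHP analogue of
// CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8).
// Returns null if a character is above U+00FF or the recovered bytes are not
// valid UTF-8 (the string was probably not double encoded).
function fix_double_encoded(string $s): ?string
{
    $bytes = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $bytes .= $s[$i];                  // ASCII maps to itself
        } elseif (($b === 0xC2 || $b === 0xC3) && $i + 1 < $n) {
            $next = ord($s[++$i]);
            if (($next & 0xC0) !== 0x80) {
                return null;                   // invalid continuation byte
            }
            // Two-byte form of U+0080..U+00FF -> the single original byte.
            $bytes .= chr((($b & 0x1F) << 6) | ($next & 0x3F));
        } else {
            return null;                       // character outside U+0000..U+00FF
        }
    }
    // The recovered bytes must be valid UTF-8 to count as a repair.
    return preg_match('//u', $bytes) === 1 ? $bytes : null;
}

$original = "J\u{E1}u\u{F2} I\u{F1}e";       // "Jáuò Iñe"
$broken = double_encode($original);          // "JÃ¡uÃ² IÃ±e"
echo fix_double_encoded($broken), "\n";      // prints "Jáuò Iñe"
```

Returning null instead of guessing makes it easy to skip rows that were never damaged, which the SQL expression alone does not do.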

After searching for an hour or two, I found this answer. I needed to migrate an old tt_news database from TYPO3 to a new TYPO3 version. I had already tried converting the charset in the export file and importing it back, but couldn't get it working.
Then I tried the answer above from ABS and ran an update on the table:
UPDATE tt_news SET
title=convert(cast(convert(title using latin1) as binary) using utf8),
short=convert(cast(convert(short using latin1) as binary) using utf8),
bodytext=convert(cast(convert(bodytext using latin1) as binary) using utf8)
WHERE 1
You can also convert imagecaption, imagealttext, imagetitletext and keywords if needed.
Hope this helps somebody migrating tt_news to a new TYPO3 version.
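When more columns need the same treatment, retyping the expression for each one is error-prone. A small PHP sketch that generates the statement; the helper names are hypothetical (not part of TYPO3 or MySQL), and since the conversion can damage a column that was not actually double encoded, run the output against a backup copy first.

```php
<?php
// Hypothetical helper: quote a MySQL identifier with backticks (doubling inner backticks).
function quote_identifier(string $name): string
{
    return '`' . str_replace('`', '``', $name) . '`';
}

// Hypothetical helper: build one UPDATE applying
// CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) to every listed column.
// Table and column names should come from your own code, never from user input.
function build_repair_update(string $table, array $columns): string
{
    $sets = [];
    foreach ($columns as $column) {
        $c = quote_identifier($column);
        $sets[] = "$c=CONVERT(CAST(CONVERT($c USING latin1) AS BINARY) USING utf8)";
    }
    return 'UPDATE ' . quote_identifier($table) . ' SET ' . implode(', ', $sets);
}

// Same three columns as above; append imagecaption, keywords, ... as needed.
echo build_repair_update('tt_news', ['title', 'short', 'bodytext']), "\n";
```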

A better way:
Connect to your database as usual, then use the code below to convert what you need.
You must also set your page encoding to UTF-8 with a meta tag in the HTML head (don't forget this).
Then use this code:
$result = mysql_query('SELECT * FROM shops');
while ($row = mysql_fetch_assoc($result)) {
// Convert the Arabic (Windows-1256) bytes to UTF-8
$name = iconv("windows-1256", "UTF-8", $row['name']);
mysql_query("SET NAMES 'utf8'");
// Escape values before putting them into the query
mysql_query("UPDATE `shops` SET `name`='" . mysql_real_escape_string($name) . "' WHERE ID='" . mysql_real_escape_string($row['ID']) . "'");
}

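I highly recommend using 'utf8mb4' instead of 'utf8': MySQL's 'utf8' (utf8mb3) stores at most 3 bytes per character, so it cannot store characters outside the Basic Multilingual Plane, such as emoji and some rarer Chinese characters.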
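A quick way to tell whether a given string actually needs utf8mb4 is to look for code points at U+10000 or above, which are the ones UTF-8 encodes in 4 bytes. A minimal sketch (PHP 7+ for the \u{...} escapes; the function name is hypothetical):

```php
<?php
// Hypothetical helper: true if a UTF-8 string contains characters at U+10000 or
// above. UTF-8 encodes these in 4 bytes, which MySQL's 3-byte "utf8" (utf8mb3)
// cannot store; utf8mb4 can. Invalid UTF-8 makes preg_match() fail -> false.
function needs_utf8mb4(string $utf8): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8) === 1;
}

var_dump(needs_utf8mb4("caf\u{E9}"));   // bool(false): é is 2 bytes
var_dump(needs_utf8mb4("\u{4E2D}"));    // bool(false): 中 is 3 bytes
var_dump(needs_utf8mb4("\u{1F600}"));   // bool(true): 😀 is 4 bytes
var_dump(needs_utf8mb4("\u{20000}"));   // bool(true): CJK Extension B, 4 bytes
```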

Related

Solving UTF8 & french accents incompatibility

I have a PHP script which saves user content into a mysql database (PHP 5.4, mysql 5.5.31)
All string-related fields in my database have utf8_unicode_ci as collation.
My (simplified) code looks like this:
$db_handle = mysql_connect('localhost', 'username', 'password');
mysql_select_db('my_db');
mysql_set_charset('utf8', $db_handle);
// ------ INSERT: First example -------
$s = "je viens de télécharger et installer le logiciel";
$sql = "INSERT INTO my_table (post_id, post_subject, post_text) VALUES (1, 'subject 1', '$s')";
mysql_query($sql, $db_handle);
// ------ INSERT: Second example -------
$s = "EPrints and العربية";
$sql = "INSERT INTO my_table (post_id, post_subject, post_text) VALUES (2, 'subject 2', '$s')";
mysql_query($sql, $db_handle);
// -------------
mysql_close($db_handle);
The problem is, the first insert (latin text with the é accents) fails unless I comment this line:
mysql_set_charset('utf8', $db_handle);
But the second query (mix of latin & arabic content) will fail unless I call mysql_set_charset('utf8', $db_handle);
I've been struggling with this for 2 days now. I thought UTF8 does support characters like the french accents, but obviously it doesn't!
How can I fix this?
mysql_set_charset('utf8', $db_handle) tells the database that the data you're going to send it will be encoded in UTF-8. If the result is messed up, that means you did not in fact send UTF-8 encoded text. Double check the encoding of what you're sending.
I thought UTF8 does support characters like the french accents, but obviously it doesn't!
It does just fine.
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text and Handling Unicode Front To Back In A Web App.
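One way to do that double check is to find where the string stops being valid UTF-8. If the PHP file were saved as ISO-8859-1, "é" would be the single byte 0xE9 and the valid prefix ends right before it; depending on the SQL mode, MySQL typically either rejects such a value on a utf8 connection or truncates it at that point. A sketch with a hypothetical function name, using a byte-oriented regular expression for well-formed UTF-8:

```php
<?php
// Hypothetical helper: return the longest prefix of $bytes that is valid UTF-8.
function valid_utf8_prefix(string $bytes): string
{
    // Byte-oriented pattern (no /u modifier) for well-formed UTF-8 sequences.
    $pattern = '/^(?:[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2})*+/';
    return preg_match($pattern, $bytes, $m) === 1 ? $m[0] : '';
}

$latin1 = "je viens de t\xE9l\xE9charger";     // é as the ISO-8859-1 byte 0xE9
echo valid_utf8_prefix($latin1), "\n";          // prints "je viens de t"

$utf8 = "je viens de t\u{E9}l\u{E9}charger";   // é as UTF-8 (C3 A9)
var_dump(valid_utf8_prefix($utf8) === $utf8);   // bool(true)
```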
Is the PHP text in UTF-8? This depends on the encoding your editor saves the file in. If it is, the bytes in the string literal should already be fine.
That seems to be the case, since the Arabic text is stored correctly.
Use prepared statements for the SQL. This has several advantages: security (SQL injection), escaping of quotes and other special characters, and ... maybe ... encoding of the SQL string.
Unlikely to help, but try:
$s = utf8_encode("je viens de télécharger et installer le logiciel");
I can foresee another problem, though: utf8_encode expects an ISO-8859-1 string, which is feasible for French but not for Arabic. If this works, the encoding of the PHP file is wrong somehow.
(I find Java to be more consistent w.r.t. Unicode, so I am not entirely sure for PHP.)
The problem of knowing the encoding and converting when necessary can be addressed with something like the following, which makes sure the text is in CP1252. Reverse it to make sure the text is in UTF-8.
function conv_text($value) {
$result = mb_detect_encoding($value." ","UTF-8,CP1252") == "UTF-8" ? iconv("UTF-8", "CP1252", $value ) : $value;
return $result;
}
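The "reverse" direction could look like the sketch below, which returns UTF-8. It assumes that any string that is not already valid UTF-8 is CP1252, which is a guess in the same spirit as mb_detect_encoding() in conv_text(): a short CP1252 string can happen to be valid UTF-8 and would then be left alone. The function name is hypothetical, it avoids mbstring and iconv, and it needs PHP 7+.

```php
<?php
// Hypothetical helper: return $value as UTF-8, assuming non-UTF-8 input is CP1252.
function to_utf8(string $value): string
{
    // Already valid UTF-8 (including plain ASCII): return unchanged.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    // CP1252 differs from ISO-8859-1 only in bytes 0x80..0x9F.
    $cp1252 = [
        0x80 => "\u{20AC}", 0x82 => "\u{201A}", 0x83 => "\u{0192}", 0x84 => "\u{201E}",
        0x85 => "\u{2026}", 0x86 => "\u{2020}", 0x87 => "\u{2021}", 0x88 => "\u{02C6}",
        0x89 => "\u{2030}", 0x8A => "\u{0160}", 0x8B => "\u{2039}", 0x8C => "\u{0152}",
        0x8E => "\u{017D}", 0x91 => "\u{2018}", 0x92 => "\u{2019}", 0x93 => "\u{201C}",
        0x94 => "\u{201D}", 0x95 => "\u{2022}", 0x96 => "\u{2013}", 0x97 => "\u{2014}",
        0x98 => "\u{02DC}", 0x99 => "\u{2122}", 0x9A => "\u{0161}", 0x9B => "\u{203A}",
        0x9C => "\u{0153}", 0x9E => "\u{017E}", 0x9F => "\u{0178}",
    ];
    $out = '';
    for ($i = 0, $n = strlen($value); $i < $n; $i++) {
        $b = ord($value[$i]);
        if ($b < 0x80) {
            $out .= $value[$i];                    // ASCII
        } elseif (isset($cp1252[$b])) {
            $out .= $cp1252[$b];                   // CP1252-specific characters
        } else {
            // Other bytes map to U+0080..U+00FF; the five undefined CP1252 bytes
            // are passed through as C1 control characters (an assumption).
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

echo to_utf8("t\xE9l\xE9charger \x80"), "\n";   // prints "télécharger €"
```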

Can not insert french string in database mysql php

I have a form with a text input. When I enter the text
Un sac à main de femme recèlerait une quantité importante de bactéries
only "Un sac" is added to the database.
I have tried addslashes, mysql_real_escape_string, htmlspecialchars etc., and also UTF-8 encoding, but it still cannot insert the whole string.
You should use utf8_unicode_ci as your column's collation in order for French strings to be stored in it.
In order to store non-US strings in the database, you must make sure that each of the following 3 steps is implemented correctly:
1. Your database table must be set to a character set compatible with French. To be future-proof, I recommend creating tables with UTF-8. For more information, see the MySQL documentation.
2. Your database connection must be set to the proper character set both when storing and when querying. To do this, use mysqli_set_charset() (or whatever your MySQL connector offers).
3. Your input form AND your view page must be served with the same character set as your data. To do that, set the following header: header('Content-Type: text/html; charset=UTF-8'); (If you are using a different charset, change it accordingly.)
You can of course use a different character set for storage and for display, but why would you want to?
Also, when working with databases and HTML, you should consider:
ALWAYS escape your data as it goes into the database. Use mysqli_real_escape_string() or whatever escape method your database connector offers. Also, do NOT set the connection charset by using SET NAMES UTF8, otherwise your connector library will not know what charset to use for escaping. For more information google "sql injection".
ALWAYS escape your data as it goes into HTML with htmlspecialchars(). Also pay attention to ALWAYS provide the correct character set. For more information google "xss".
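Why the character set argument to htmlspecialchars() matters can be seen in a short sketch: per the PHP manual, invalid input for the given encoding makes the function return an empty string unless ENT_SUBSTITUTE or ENT_IGNORE is passed. The sample strings are illustrative only.

```php
<?php
$title = "Un sac \u{E0} main <b>";          // UTF-8 text from the form

// Correct charset: only the HTML special characters are escaped.
echo htmlspecialchars($title, ENT_QUOTES, 'UTF-8'), "\n";
// prints: Un sac à main &lt;b&gt;

// Bytes that are not valid UTF-8 (a Latin-1 "à" = 0xE0) make htmlspecialchars()
// return an empty string, because ENT_SUBSTITUTE/ENT_IGNORE are not set here.
$latin1 = "Un sac \xE0 main";
var_dump(htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8'));   // string(0) ""

// With ENT_SUBSTITUTE the invalid byte is replaced by U+FFFD instead.
echo htmlspecialchars($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
```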
After breaking my head over this for 2 days straight and reading every answer I could find, here's what solved the problem. It also lets me insert other unusual characters such as the em dash, and retrieve the data without seeing garbled characters.
Here's the complete step-by-step setup.
The collation of the DB column needs to be: utf8_general_ci
The type is: varchar(250)
At the top of the PHP script, set the default client character set to UTF-8:
mysql_set_charset("UTF8", $link);
Set the result character set so French characters are displayed correctly:
$sql = "SET character_set_results=utf8";
$result = mysql_query($sql);
In the HTML head, specify the following so you can view the French characters:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
When inserting the data, do NOT use utf8_decode; the following works fine:
$query = 'insert into tbl (col) VALUES ("' . mysql_real_escape_string($variable) . '")';
Use normal queries to retrieve data, for example:
$query = "select * from table;";
Finally got this fixed, hope this is helpful to others.
In the php:
header ('Content-type: text/html; charset=utf-8');
After connection:
mysql_set_charset("utf8");
Just to follow up on this: I was using dbForge Studio, just pasting in French text, and I had all the collations/encodings set properly. The one thing I hadn't set was the actual encoding for the connection to the DB. I set it to UTF-8 and all was well again (#2 in @Janoszen's answer).
I had the same problem. The input text came from an ANSI file, so it wasn't really UTF-8 despite all my utf8 settings. utf8_encode(input_text) solved it.
I have tried
htmlentities()
and it saves the string as-is in the database.
You should try this to insert special characters in MySQL:
$con = mysql_connect($server, $uname, $pass);
$res = mysql_select_db($database, $con);
mysql_set_charset("latin1", $con);

Converting html entities to utf-8 and inserting them into a mysql database

I am trying to convert a string from HTML-ENTITIES to UTF-8 and then save the encoded string in my database. The html entities are greek letters and look for example like this: νω
I have tried many different ways, from plain utf8_encode or html_entity_decode to, most recently, the function mb_convert_encoding().
Now the really weird thing is that when converting my string and then outputting it, it is correctly encoded to utf-8, but when inserting this string into my database I end up getting something like: ξÏνω.
This is the code for the encoding:
header('Content-Type: text/html; charset=utf-8');
mb_internal_encoding('utf-8');
......
while($arr = $select->fetch_array(MYSQLI_ASSOC))
{
$text = $arr["greek"];
$result = mb_convert_encoding($text, 'UTF-8', 'HTML-ENTITIES');
$mysqli->query("UPDATE some SET greek = '".$result."'");
}
When I output my query and then run it manually in phpMyAdmin, it works fine, so it doesn't seem to be a problem with my DB. There must be some problem when transferring the encoded string to my database...
As you see in your script, you are instructing the browser to use UTF8. That is the first step.
However your database needs the same thing and also the encoding/collation on the tables need to be UTF8 too.
You can either recreate your tables using utf8_general_ci or utf8_unicode_ci as the collation, or convert the existing tables (see here)
You need to also make sure that your database connection i.e. php code to mysql is using UTF8. If you are using PDO there are plenty of articles that show how to do that. The simplest way is to do:
$mysqli->query('SET NAMES utf8');
NOTE The change you will make now is final. If you change the connection encoding to your database, you could affect existing data.
EDIT You can do the following to set the connection
$mysqli = new mysqli($host, $user, $pass, $db);
if (!$mysqli->set_charset("utf8")) {
die(sprintf("Error loading character set utf8: %s\n", $mysqli->error));
}
$mysqli->close();
Links of interest:
Whether to use "SET NAMES"
Execute the SET NAMES 'utf8' query prior to any others.
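As a side note on the decoding step in the question: the core html_entity_decode() turns named and numeric entities into UTF-8 without mbstring, and the mbstring "HTML-ENTITIES" encoding used by mb_convert_encoding() is deprecated as of PHP 8.2. The sketch also shows why a latin1 connection produces mojibake here, which is one likely cause rather than a confirmed diagnosis of the question.

```php
<?php
// Decode named and numeric entities into UTF-8 with the core function.
$text = '&xi;&nu;&omega; / &#958;&#957;&#969;';
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";   // prints: ξνω / ξνω

// Each of these Greek letters is 2 bytes in UTF-8. Sent over a latin1
// connection, each byte is stored as its own character, giving mojibake.
var_dump(strlen("\u{3BE}"));  // int(2)
```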

Encoding SQL_Latin1_General_CP1_CI_AS into UTF-8

I'm generating an XML file with PHP using DomDocument and I need to handle Asian characters. I'm pulling data from the MSSQL 2008 server using the pdo_mssql driver and I apply utf8_encode() to the XML attribute values. Everything works fine as long as there are no special characters.
The server is MS SQL Server 2008 SP3
The database, table and column collation are all SQL_Latin1_General_CP1_CI_AS
I'm using PHP 5.2.17
Here's my PDO object:
$pdo = new PDO("mssql:host=MyServer,1433;dbname=MyDatabase", "user123", "password123");
My query is a basic SELECT.
I know storing special characters in SQL_Latin1_General_CP1_CI_AS columns isn't great, but ideally I'd like to make it work without changing that, because other non-PHP programs already use the column and work fine. In SQL Server Management Studio I can see the Asian characters correctly.
Considering all the details above, how should I process the data?
I found how to solve it, so hopefully this will be helpful to someone.
First, with SQL_Latin1_General_CP1_CI_AS, varchar data uses code page 1252 (CP-1252), while nvarchar data is stored as UCS-2 (UTF-16LE).
The basic characters are CP-1252, which is why all I had to do for them was convert to UTF-8. The Asian characters are stored on 2 bytes each, and the PHP pdo_mssql driver seems to handle them badly: it appears to CAST to varchar (instead of nvarchar), so all the 2-byte characters become question marks ('?').
I fixed it by casting the column to binary and then rebuilding the text in PHP:
SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) FROM MY_TABLE;
In php:
//$bin is the VARBINARY column value fetched from the result row
//Binary to hexadecimal
$hex = bin2hex($bin);
//And then from hex to string
$str = "";
for ($i=0;$i<strlen($hex) -1;$i+=2)
{
$str .= chr(hexdec($hex[$i].$hex[$i+1]));
}
//And then from UCS-2LE/SQL_Latin1_General_CP1_CI_AS (that's the column format in the DB) to UTF-8
$str = iconv('UCS-2LE', 'UTF-8', $str);
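Two remarks on the code above, as a sketch: the bin2hex() loop rebuilds exactly the bytes it started from (it is equivalent to hex2bin(bin2hex($bin))), so $bin can be passed to iconv() directly. Where iconv is unavailable, the conversion can be done in plain PHP as below; the function names are hypothetical, and the decoder treats the data as UTF-16LE, combining surrogate pairs and replacing unpaired ones with U+FFFD.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: decode UTF-16LE bytes (an nvarchar value cast to
// VARBINARY) into UTF-8 without iconv.
function utf16le_to_utf8(string $bin): string
{
    $out = '';
    $n = strlen($bin) - (strlen($bin) % 2);   // ignore a trailing odd byte
    for ($i = 0; $i < $n; $i += 2) {
        $cp = ord($bin[$i]) | (ord($bin[$i + 1]) << 8);
        // High surrogate followed by a low surrogate -> one supplementary character.
        if ($cp >= 0xD800 && $cp <= 0xDBFF && $i + 3 < $n) {
            $lo = ord($bin[$i + 2]) | (ord($bin[$i + 3]) << 8);
            if ($lo >= 0xDC00 && $lo <= 0xDFFF) {
                $cp = 0x10000 + (($cp - 0xD800) << 10) + ($lo - 0xDC00);
                $i += 2;
            }
        }
        if ($cp >= 0xD800 && $cp <= 0xDFFF) {
            $cp = 0xFFFD;                      // unpaired surrogate
        }
        $out .= codepoint_to_utf8($cp);
    }
    return $out;
}

$bin = "\x53\x30\x93\x30";                    // "こん" in UTF-16LE
echo utf16le_to_utf8($bin), "\n";              // prints "こん"
var_dump(hex2bin(bin2hex($bin)) === $bin);     // bool(true)
```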
I know this post is old, but the only thing that worked for me was
iconv("CP850", "UTF-8//TRANSLIT", $var);
I had the same issue with SQL_Latin1_General_CP1_CI_AI; maybe it works for SQL_Latin1_General_CP1_CI_AS too.
You can try this:
header("Content-Type: text/html; charset=utf-8");
$dbhost = "hostname";
$db = "database";
$query = "SELECT *
FROM Estado
ORDER BY Nome";
$conn = new PDO("sqlsrv:Server=$dbhost;Database=$db", "", "");
$stmt = $conn->prepare($query, array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL, PDO::SQLSRV_ATTR_CURSOR_SCROLL_TYPE => PDO::SQLSRV_CURSOR_BUFFERED, PDO::SQLSRV_ATTR_ENCODING => PDO::SQLSRV_ENCODING_SYSTEM));
$stmt->execute();
while ( $row = $stmt->fetch( PDO::FETCH_ASSOC ) )
{
// CP1252 == code page Latin1; convert to UTF-8 to match the Content-Type header above
print iconv("CP1252", "UTF-8", "$row[Nome] <br>");
}
For me, none of the above was the direct solution, though I did use parts of those solutions. This worked for me with the Vietnamese alphabet. If you come across this post and none of the above works for you, try:
$req = "SELECT CAST(MY_COLUMN as VARBINARY(MAX)) as MY_COLUMN FROM MY_TABLE";
$stmt = $conn->prepare($req);
$stmt->execute();
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
$str = pack("H*",$row['MY_COLUMN']);
$str = mb_convert_encoding($str, 'HTML-ENTITIES', 'UCS-2LE');
print_r($str);
}
A little bonus: I had to json_encode this data and was (of course) getting HTML entities instead of the special characters. To fix it, just call html_entity_decode() on the strings before passing them to json_encode.
No need for crazy stuff. The character encoding of collation SQL_Latin1_General_CP1_CI_AS is Windows-1252.
This works perfectly for me: $str = mb_convert_encoding($str, 'UTF-8', 'Windows-1252');
By default, PDO uses PDO::SQLSRV_ENCODING_UTF8 for sending/receiving data.
If your current collation is Latin-1, have you tried specifying PDO::SQLSRV_ENCODING_SYSTEM to let PDO know that you want to use the current system encoding instead of UTF-8?
You could even use PDO::SQLSRV_ENCODING_BINARY, which returns data in binary form (no encoding or translation is done when transferring data). This way, you can handle the character encoding on your side.
More documentation here: http://ca3.php.net/manual/en/ref.pdo-sqlsrv.php
Thanks @SGr for the answer.
I found a better way to do that:
SELECT CAST(CAST(MY_COLUMN AS VARBINARY(MAX)) AS VARCHAR(MAX)) as MY_COLUMN FROM MY_TABLE;
and also try with:
SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) as MY_COLUMN FROM MY_TABLE;
And in PHP you just need to convert it to UTF-8:
$string = iconv('UCS-2LE', 'UTF-8', $row['MY_COLUMN']);

How to extract a UTF-8 string (In Arabic) from a MySQL DB and echo to screen using PHP

I have a MySQL DB, and I've set collation = utf8_unicode_ci.
I'm trying to fetch the value through PHP but I'm getting "???" instead of the actual string.
I have read about this subject and tried using mb_convert_encoding, but it didn't work. What am I missing?
Can someone please post a code snippet that actually pulls a value from a DB and echos the string to the screen?
Thanks,
I have a MySQL DB, and I've set collation = utf8_unicode_ci.
I'm trying to fetch the value through PHP but I'm getting "???" instead of the actual string.
Character sets are how characters are encoded.
Collations are how characters are sorted.
These are different things. Chances are that your tables or columns have the right collation, but the wrong character set. The Internationalization section of the MySQL manual has a great deal of information on how to set things up correctly.
Can someone please post a code snippet that actually pulls a value from a DB and echos the string to the screen?
Let's demonstrate how to use utf8 as a character set, and the utf8 "general case insensitive" collation. I'm using PDO in this example, but the same general idea should work with mysqli as well. I wouldn't advise using the old mysql extension.
// $db is assumed to be an existing PDO connection to the MySQL database.
// Let's tell MySQL we're going to be working with utf8 data.
// http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html
$db->query("SET NAMES 'utf8'");
// Create a table with our proper charset and collation.
// If we needed to, we could specify the charset and collation with
// each column.
// http://dev.mysql.com/doc/refman/5.1/en/charset-column.html
// We could also set the defaults at the database level.
// http://dev.mysql.com/doc/refman/5.1/en/charset-database.html
$db->query('
CREATE TABLE foo(
bar TEXT
)
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci
ENGINE=InnoDB
');
// I don't know Arabic, so I'll type this in English. It should
// work fine in Arabic, as long as the string is encoded as utf8.
$sth = $db->prepare("INSERT INTO foo(bar) VALUES(?)");
$sth->execute(array("Hello, world!"));
$sth = $db->query("SELECT bar FROM foo LIMIT 1");
$row = $sth->fetch(PDO::FETCH_NUM);
echo $row[0]; // Will echo "Hello, world!", or whatever you inserted.
@tomp's comment below is correct. Make sure to send the proper character set in your Content-Type header. For example:
header('Content-type: text/html; charset=utf-8'); // Note the dash!
