UTF-8 in SQL Server 2008 Database + PHP

I want to store data with PHP in an MS SQL 2008 database.
I've got problems with letters like ä ö ü ß: they are displayed incorrectly in the database and when I display them on the website.
It works when I utf8_encode the data on input and utf8_decode it on output with PHP.
Is there an easier way to solve this?

I've solved this once. The problem is that PHP's mssql driver is broken (I can't find the link to bugs.php.net, but it is there) and fails when it comes to nchar and nvarchar fields and UTF-8. You'll need to convert the data and queries a little bit. Take this query:
SELECT some_nvarchar_field FROM some_table
First, you need to change the output to binary - that way it won't get corrupted:
SELECT CONVERT(varbinary(MAX), some_nvarchar_field) AS some_nvarchar_field FROM some_table;
Then, in PHP, convert it back to UTF-8; for that you'll need the iconv extension:
iconv('UTF-16LE', 'UTF-8', $result['some_nvarchar_field']);
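A minimal sketch of that decoding step, wrapped in a hypothetical helper (`nvarchar_binary_to_utf8()` is not part of any library) that also keeps NULL columns as NULL. The sample bytes stand in for what the `CONVERT(varbinary(MAX), ...)` column would return; they spell "Žluť" in UTF-16LE.

```php
<?php
// Hypothetical helper: decode a varbinary(MAX) value that holds nvarchar text
// (UTF-16LE, as produced by CONVERT(varbinary(MAX), some_nvarchar_field))
// into a UTF-8 PHP string.
function nvarchar_binary_to_utf8(?string $raw): ?string
{
    if ($raw === null) {
        return null; // a NULL column stays NULL
    }
    $utf8 = iconv('UTF-16LE', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('Value is not valid UTF-16LE');
    }
    return $utf8;
}

// "Žluť" as UTF-16LE bytes: Ž = 7D 01, l = 6C 00, u = 75 00, ť = 65 01
echo nvarchar_binary_to_utf8("\x7d\x01\x6c\x00\x75\x00\x65\x01"), PHP_EOL;
```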
This fixes selecting data from the database. However, if you want to actually put some data INTO the database, or add a WHERE clause, you'll still get errors. The fix for WHERE, UPDATE, INSERT and so on is to convert the string to hexadecimal form:
Imagine you have this query:
INSERT INTO some_table (some_nvarchar_field) VALUES ('ŽČŘĚÝÁÖ');
Now, we'll have to run some PHP:
$value = 'ŽČŘĚÝÁÖ';
$value = iconv('UTF-8', 'UTF-16LE', $value); //convert into native encoding
$value = bin2hex($value); //convert into hexadecimal
$query = 'INSERT INTO some_table (some_nvarchar_field) VALUES(CONVERT(nvarchar(MAX), 0x'.$value.'))';
The query becomes this:
INSERT INTO some_table (some_nvarchar_field) VALUES (CONVERT(nvarchar(MAX), 0x7d010c0158011a01dd00c100d600));
And that will work!
I've tested this with MS SQL Server 2008 on Linux using FreeTDS and it works just fine; I've got some huge websites running on this with no issues whatsoever.
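The hex trick can be wrapped in a small helper. This is a sketch: the name `nvarchar_literal()` is hypothetical, and it assumes the iconv extension is available, which the approach already requires. Because the value travels as a binary constant, no quoting and no client-side charset conversion is involved.

```php
<?php
// Hypothetical helper: turn a UTF-8 PHP string into a T-SQL expression that
// yields the same nvarchar value, using the UTF-16LE hex approach.
function nvarchar_literal(string $utf8): string
{
    $utf16 = iconv('UTF-8', 'UTF-16LE', $utf8); // nvarchar's native encoding
    if ($utf16 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    if ($utf16 === '') {
        return "N''"; // empty input: emit a plain empty Unicode literal
    }
    return 'CONVERT(nvarchar(MAX), 0x' . bin2hex($utf16) . ')';
}

$query = 'INSERT INTO some_table (some_nvarchar_field) VALUES ('
       . nvarchar_literal('ŽČŘĚÝÁÖ') . ')';
echo $query, PHP_EOL;
```

For a lookup, the same expression goes on the right-hand side of the comparison, e.g. `'... WHERE some_nvarchar_field = ' . nvarchar_literal($search)`.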

I searched for two days for how to insert UTF-8 data (from web forms) into MSSQL 2008 through PHP. Everywhere I read that you can't; you need to convert to UCS-2 first (as cypher's solution recommends).
In a Windows environment, SQLSRV is said to be a good solution, which I couldn't try, since I am developing on Mac OS X.
However, the FreeTDS manual (FreeTDS is what PHP's mssql extension uses on OS X) says to add the letter "N" before the opening quote:
mssql_query("INSERT INTO table (nvarcharField) VALUES (N'űáúőűá球最大的采购批发平台')", $con);
According to this discussion, the N prefix tells the server to treat the literal as a Unicode string:
https://softwareengineering.stackexchange.com/questions/155859/why-do-we-need-to-put-n-before-strings-in-microsoft-sql-server
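A minimal sketch of building such an N'...' literal from a PHP string. The helper name `n_literal()` is hypothetical; it escapes single quotes by doubling them, as T-SQL requires, and it assumes the connection already sends the text in a charset the server understands (for example via mssql.charset). Parameterized queries remain preferable where the driver supports them.

```php
<?php
// Hypothetical helper: wrap a PHP string in an N'...' (Unicode) literal
// for SQL Server. Single quotes inside the value are doubled.
function n_literal(string $value): string
{
    return "N'" . str_replace("'", "''", $value) . "'";
}

// "table" is a reserved word in T-SQL, so it is bracket-quoted here.
$sql = 'INSERT INTO [table] (nvarcharField) VALUES (' . n_literal("O'Brien 球") . ')';
echo $sql, PHP_EOL;
```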

I can insert with this code:
$value = $_POST['first_name'];
$value = iconv("UTF-8","UCS-2LE",$value);
$value2 = $_POST['last_name'];
$value2 = iconv("UTF-8","UCS-2LE",$value2);
$query = "INSERT INTO tbl_sample(first_name, last_name) VALUES (CONVERT(VARBINARY(MAX), '".$value."') , CONVERT(VARBINARY(MAX), '".$value2."'))";
odbc_exec($connect, $query);
but then I cannot search the inserted data.

Related

Query with email headers from special Latin characters rejected by PHP mysqli_query and MariaDB command line, works in HeidiSQL

I have encountered a scenario where a query built from an email sent by someone in Europe keeps failing to execute. After minimizing the query, I've determined that once all special characters like å and é are removed, the query works fine in PHP / mysqli_query. The queries also fail in MariaDB's command line, though they do work in HeidiSQL; I imagine HeidiSQL internally adjusts the strings used in its Query tabs.
Let's get the following out of the way:
Database Character Set: utf8mb4.
Database Collation: utf8mb4_unicode_520_ci.
Database column collation: utf8mb4_unicode_520_ci.
The SET CHARACTER SET 'utf8mb4' query is correctly executed for each request.
Here is the query:
INSERT INTO example_table (example_column) VALUES ('Håko');
I should note that I tried the following (which also failed) even though I firmly believe that this issue occurs from and should be resolved via PHP:
INSERT INTO example_table (example_column) VALUES (CONVERT('Håko' USING utf8));
Here is the MariaDB error:
Incorrect string value: '\xE9rard ...'
As I said, this string originates from an email message, so I'm pretty sure the issue is with PHP, not MariaDB. So let's go back to the code that otherwise seems to work. Please keep in mind that it took at least two days to put this together in the correct order, just to get the strings to appear correctly in the MariaDB query log without being incorrectly converted to UTF-8 and corrupting the special Latin characters:
<?php
$s1 = '=?iso-8859-1?Q?=22G=E9rd_Tabt=22?= <berbs#example.com>';//"Gérd Tabt" <berbs#example.com>
if (strlen($s1) > 0)
{
    if (substr_count($s1, '=?') && substr_count($s1, '?= '))
    {
        $p = explode('?= ', $s1);
        $p[0] = $p[0].'?=';
        $s2 = imap_mime_header_decode($p[0])[0]->text.' '.$p[1];
    }
    else {$s2 = imap_mime_header_decode($s1)[0]->text;}
    if (strpos($s1, '=96') !== false) {$s2 = mb_convert_encoding($s2, 'UTF-8', 'CP1252');}
    else if (mb_convert_encoding($s2, 'UTF-8') == substr_count($s1, '?')) {$s2 = mb_convert_encoding($s2, 'UTF-8');}
}
else {$s2 = $s1;}
?>
There isn't any other relevant code handling this header string.
What is causing what I presume to be UTF-8 encoded strings to break PHP's mysqli_query and the MariaDB command line from working with this query?
Where did the hex E9 come from? That is é encoded in latin1. Yet your configuration seems to claim that your client is encoded in utf8mb4. The connection charset must match the encoding the client actually sends. The database, table and client can each have a different encoding; MariaDB is happy to convert on the fly when INSERTing or SELECTing.
For more analysis, see Trouble with UTF-8 characters; what I see is not what I stored
if (mb_convert_encoding($s2, 'UTF-8') == substr_count($s1, '?'))
This makes no sense: it compares a string (the text converted to UTF-8) with an integer (the number of '?' characters in $s1). With the type-unsafe == operator, such a comparison is in practice only true when the converted text is itself a number equal to that count (for example '0' == 0), which does not happen for an encoded header like this one.
So your text is never converted to UTF-8 and remains whatever it was (in this case ISO-8859-1).
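A small sketch of how that condition evaluates for the header in the question; the $s2 value below is what the question's code produces before any conversion (the decoded text, still in ISO-8859-1 bytes).

```php
<?php
$s1 = '=?iso-8859-1?Q?=22G=E9rd_Tabt=22?= <berbs#example.com>';
$s2 = "\"G\xE9rd Tabt\" <berbs#example.com>"; // decoded, still ISO-8859-1 bytes

$count = substr_count($s1, '?'); // 4: the encoded word contains four '?'

// A non-numeric string compared loosely with a non-zero int: false.
var_dump(mb_convert_encoding($s2, 'UTF-8') == $count); // bool(false)

// The kind of case where string == int does hold.
var_dump('0' == 0); // bool(true)
```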
mb_convert_encoding($s2, 'UTF-8')
Are you sure you want to convert to UTF-8 without specifying the source encoding? ISO-8859-1, as in this email header, isn't the only one to expect; why not extract that information from the header and pass it to the function?
MariaDB is right: you're handing over ISO-8859-1 encoded text in that case, while the DBMS expects the UTF-8 encoding.
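A sketch of that suggestion: convert each decoded part from the charset the header itself declares. The function name is hypothetical. The sample parts are built by hand, modeled on the documented return shape of imap_mime_header_decode() (objects with ->charset and ->text, with "default" for unencoded text), so the sketch runs without the imap extension.

```php
<?php
// Sketch: join MIME header parts into one UTF-8 string, converting each part
// from the charset named in the header instead of guessing.
function mime_parts_to_utf8(array $parts): string
{
    $out = '';
    foreach ($parts as $part) {
        $charset = strtoupper($part->charset);
        if ($charset === 'DEFAULT' || $charset === 'UTF-8') {
            $out .= $part->text; // unencoded (ASCII) or already UTF-8
        } else {
            $out .= mb_convert_encoding($part->text, 'UTF-8', $charset);
        }
    }
    return $out;
}

// Hand-built parts for the question's header.
$parts = [
    (object) ['charset' => 'iso-8859-1', 'text' => "\"G\xE9rd Tabt\""],
    (object) ['charset' => 'default', 'text' => ' <berbs#example.com>'],
];
echo mime_parts_to_utf8($parts), PHP_EOL;
```

With the imap extension loaded, the call would be `$s2 = mime_parts_to_utf8(imap_mime_header_decode($s1));`, which also makes the manual split on '?= ' unnecessary.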

sqlsrv encoding

My project is built on PHP and connects to MS SQL Server. I am using the sqlsrv library. The field type in MS SQL is nvarchar. When I define the connection parameters, I also set "UTF-8":
global $cnf;
$cnf = array();
$cnf["mssql_user"] = "xxx";
$cnf["mssql_host"] = "xxx";
$cnf["mssql_pw"] = "xxx";
$cnf["mssql_db"] = "xxx";
$cnf["CharacterSet"] = "UTF-8";
When inserting records with Vietnamese and Chinese content, I use:
$city = iconv('UTF-8', 'utf-16le', $post['city']);
$params = array(array($city, null, SQLSRV_PHPTYPE_STRING(SQLSRV_ENC_BINARY)));
$sql = "INSERT INTO tblCityGarden (city) VALUES(?)";
$stmt = sqlsrv_query( $this->dbhandle, $sql, $params);
It inserts the Vietnamese and Chinese data correctly (the data stored in the database is correct).
However, when I load the records back into the web page, strange characters (?, �) appear.
I tried PHP functions such as iconv, mb_detect_encoding and mb_convert_encoding, and searched through many results on the internet, but nothing worked. How can I display the data correctly?
Could someone with experience of this issue please help?
I had this same problem (�), but with single quotes, double quotes, and "Rights Reserved" characters. Here is what I found:
The CharacterSet specified seems to "rule all", so what you set it to determines the encoding for the connection (as it should). I did not have ANY CharacterSet configured on my connection(s). Simply setting it resolved my issue, along with making sure that the values inserted into my DB were not double-encoded via htmlspecialchars().
header() calls must be made before ANY output (this is really important)
Headers cannot be changed to something different later in the document
Sometimes whitespace before and/or after the closing ?> tag in a PHP file can cause issues (I don't use this closing tag, but I saw this mentioned a lot while searching)
I am not familiar with iconv(), and I am certainly not experienced with encoding in general, but I solved my issue just by taking the time to check my headers and make sure they met the above standards.
Your query parameters also look strange:
$params = array(array($city, null, SQLSRV_PHPTYPE_STRING(SQLSRV_ENC_BINARY)));
I have not seen a multidimensional array passed into that argument... (just a note)
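The question does not show how $cnf reaches sqlsrv_connect(); if the CharacterSet entry never ends up in the options array passed to it, it has no effect. A sketch of a hypothetical helper that maps the question's keys onto the documented sqlsrv connection options (Database, UID, PWD, CharacterSet):

```php
<?php
// Hypothetical helper: build the options array that sqlsrv_connect() reads
// from the question's $cnf array (the $cnf keys come from the question).
function sqlsrv_options_from_config(array $cnf): array
{
    return [
        'Database'     => $cnf['mssql_db'],
        'UID'          => $cnf['mssql_user'],
        'PWD'          => $cnf['mssql_pw'],
        // With UTF-8 here, sqlsrv converts strings to and from UTF-8 itself.
        'CharacterSet' => 'UTF-8',
    ];
}

$cnf = [
    'mssql_user' => 'xxx',
    'mssql_host' => 'xxx',
    'mssql_pw'   => 'xxx',
    'mssql_db'   => 'xxx',
];
$options = sqlsrv_options_from_config($cnf);
// With the extension loaded: $conn = sqlsrv_connect($cnf['mssql_host'], $options);
```

With CharacterSet set this way, the city should be bindable as the plain UTF-8 string `$post['city']` rather than as UTF-16LE binary, and fetched values should already come back as UTF-8.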

Convert latin1 characters on a UTF8 table into UTF8

Only today I realized that I was missing this in my PHP scripts:
mysql_set_charset('utf8');
All my tables are InnoDB, collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I have mb_internal_encoding('UTF-8'); on my PHP scripts, and all my PHP files are encoded as UTF-8.
So, until now, every time I "INSERT" something with diacritics, example:
mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');
The 'name' contents would be, in this case: JÃ¡uÃ² IÃ±e.
Since I fixed the charset between PHP and MySQL, new INSERTs are now storing correctly. However, I want to fix all the older rows that are "messed" at the moment. I tried many things already, but it always breaks the strings on the first "illegal" character. Here is my current code:
$m = mysql_real_escape_string('¿<?php echo "¬<b>\'PHP á (á)ţăriîş </b>"; ?> ă-ţi abcdd;//;ñç´พดแทฝใจคçăâξβψδπλξξςαยนñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');
$result = mysql_iquery('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
    $message = $row['name'];
    $message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
    //$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
    mysql_iquery('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.$row['a1'].'"');
}
It "UPDATE"s with the expected characters, except that the string gets truncated after the character "ă". I mean, that character and following chars are not included on the string.
Also, testing with iconv() (which is commented out in the code) does the same, even with //IGNORE and //TRANSLIT.
I also tested several charsets, between ISO-8859-1 and ISO-8859-15.
From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL function like
convert(cast(convert(name using latin1) as binary) using utf8)
It's possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
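A PHP sketch of what that SQL expression undoes, assuming the rows were double-encoded as described. MySQL's latin1 is documented as matching Windows-1252 (cp1252), so Windows-1252 is used here rather than ISO-8859-1/-15. The last conversion suggests a likely reason the question's UPDATE stops at "ă": its second UTF-8 byte (0x83) reads as "ƒ" in Windows-1252, which ISO-8859-15 cannot represent, so the result contains "?" and is no longer valid UTF-8.

```php
<?php
$original = 'Jáuò Iñe ăţ';

// Simulate the bug: UTF-8 bytes sent over a latin1 connection are decoded as
// Windows-1252 and stored as UTF-8 a second time ("double encoding").
$mojibake = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// The repair: map every character back to its single Windows-1252 byte,
// which restores the original UTF-8 byte sequence.
$repaired = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');

// Using ISO-8859-15 instead: "ƒ" has no mapping and becomes "?".
$wrong = mb_convert_encoding($mojibake, 'ISO-8859-15', 'UTF-8');

var_dump($repaired === $original);          // bool(true)
var_dump(mb_check_encoding($wrong, 'UTF-8')); // bool(false)
```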
I needed to migrate an old tt_news DB from TYPO3 into a new TYPO3 version and searched for this answer for an hour or two. I had already tried converting the charset in the export file and importing it back, but couldn't get it working.
Then I tried the answer above from ABS and started an update on the table:
UPDATE tt_news SET
title=convert(cast(convert(title using latin1) as binary) using utf8),
short=convert(cast(convert(short using latin1) as binary) using utf8),
bodytext=convert(cast(convert(bodytext using latin1) as binary) using utf8)
WHERE 1
You can also convert imagecaption, imagealttext, imagetitletext and keywords if needed.
Hope this helps somebody migrating tt_news to a new TYPO3 version.
A better way:
connect to your database as normal,
make sure your page encoding is UTF-8 with a meta tag in the HTML head (don't forget this),
then use this code:
$result = mysql_query('SELECT * FROM shops');
while ($row = mysql_fetch_assoc($result)) {
    // the stored bytes are assumed to be Windows-1256 (Arabic code page); re-encode as UTF-8
    $name = iconv("windows-1256", "UTF-8", $row['name']);
    mysql_query("SET NAMES 'utf8'");
    mysql_query("UPDATE `shops` SET `name`='" . mysql_real_escape_string($name) . "' WHERE ID='" . mysql_real_escape_string($row['ID']) . "'");
}
I highly recommend using 'utf8mb4' instead of 'utf8', since MySQL's utf8 stores at most 3 bytes per character and therefore cannot hold emoji or some Chinese characters.
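A sketch for spotting values that would not fit in MySQL's 3-byte utf8: characters at U+10000 and above take 4 bytes in UTF-8 and need utf8mb4. The function name is hypothetical.

```php
<?php
// Sketch: true if the UTF-8 string contains characters outside the Basic
// Multilingual Plane (U+10000 to U+10FFFF), which need 4 bytes in UTF-8.
function needs_utf8mb4(string $utf8): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8) === 1;
}

var_dump(needs_utf8mb4('中文'));       // bool(false): 3 bytes per character
var_dump(needs_utf8mb4("\u{1F600}"));  // bool(true): an emoji
var_dump(needs_utf8mb4("\u{20BB7}"));  // bool(true): a CJK Extension B character
```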

Why do I get invalid characters when converting MS SQL Data to MYSQL?

I'm writing a PHP script to import data into a MYSQL database from a Microsoft SQL Server 2008 database.
The MSSQL Server is set with a collation of "SQL_Latin1_General_CP1_CI_AS" and the data in question is being stored in a column of the type "nchar".
My PHP web pages use
<meta http-equiv="content-type" content="text/html; charset=utf-8">
to indicate that they should be displayed with UTF-8 Character encoding.
I'm pulling the data from the MSSQL database using the sqlsrv PHP extension.
$sql = 'SELECT * FROM [tArticle] WHERE [ID] = 6429';
$stmt = sqlsrv_query($dbHandler, $sql);
while ($row = sqlsrv_fetch_object($stmt)) {
    // examples of what I've tried simply to display the data
    echo $row->Text1;
    echo utf8_encode($row->Text1);
    echo iconv("ISO-8859-1", "UTF-8", $row->Text1);
    echo iconv("ISO-8859-1", "UTF-8//TRANSLIT", $row->Text1);
}
Forget about inserting the data into the MYSQL database for now. I can't get the string to display properly in my PHP page. From the examples in my listing:
echo $row->Text1
is rendered by my browser as an obviously invalid character: "Lucy�s"
all of the examples following that one are rendered as blanks: "Lucys"
It looks like a character set mismatch problem to me but how can I get this data to display properly from the MS SQL database (without changing my web-page encoding)? If I can figure that out I can probably work out the storing it in the MYSQL database part.
If the strings in the source database are encoded in UTF-8, you should use utf8_decode, not utf8_encode.
But they're probably encoded in some Latin or "Western" Windows code page. So I would try iconv("CP1252", "UTF-8", $row->Text1);, for example.
Another alternative is to run a SQL query that explicitly sets a known encoding. For example, according to the Windows Collation Name (Transact-SQL) documentation, this query would use code page 1252 to encode field Text1: SELECT Text1 COLLATE SQL_Latin1_General_CP1_CI_AS FROM ....
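A sketch of why CP1252 matters for the "Lucy�s" case, assuming the byte behind � is 0x92, the right single quotation mark (’) in Windows-1252. ISO-8859-1, which utf8_encode() assumes, maps that byte to the invisible control character U+0092, which would explain the "blank" output in the question.

```php
<?php
$raw = "Lucy\x92s"; // assumed raw bytes from the database

// CP1252 maps 0x92 to U+2019, a visible apostrophe.
$asCp1252 = iconv('CP1252', 'UTF-8', $raw);

// ISO-8859-1 maps 0x92 to U+0092, an invisible control character.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

var_dump($asCp1252 === "Lucy\u{2019}s"); // bool(true)
var_dump($asLatin1 === "Lucy\u{0092}s"); // bool(true)
```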
Try this; it works for me:
$connectionInfo = array( "Database"=>"DBName", "CharacterSet" =>"UTF-8");

PHP + SQL Server - How to set charset for connection?

I'm trying to store some data in a SQL Server database through php.
The problem is that special characters aren't converted properly. My app's charset is iso-8859-1 and the one used by the server is windows-1252.
Converting the data manually before inserting doesn't help; there seems to be some conversion going on.
Running the SQL query 'set char_convert off' doesn't help either.
Anyone have any idea how I can get this to work?
EDIT: I have tried ini_set('mssql.charset', 'windows-1252'); as well, but no result with that one either.
Client charset is necessary but not sufficient:
ini_set('mssql.charset', 'UTF-8');
I had the same problem and ini_set('mssql.charset', 'utf-8') did not work for me.
However, it worked in uppercase:
ini_set('mssql.charset', 'UTF-8');
I suggest looking at the following points:
Ensure that the columns you're storing the information in are nchar or nvarchar, as char and varchar don't support UCS-2 (SQL Server doesn't store data in UTF-8 format, by the way)
If you're connecting with the mssql library/extension for PHP, run ini_set('mssql.charset', 'utf-8');, as there's no function with a charset argument (connect, query, etc.)
Ensure that your browser's charset is also set to UTF-8
If ini_set('mssql.charset', 'UTF-8'); doesn't help AND you don't have root access to modify the system wide freetds.conf file, here's what you can do:
1. Set up /your/local/freetds.conf file:
[sqlservername]
host=192.168.0.56
port=1433
tds version=7.0
client charset=UTF-8
2. Make sure your connection DSN is using the servername, not the IP:
'dsn' => 'dblib:host=sqlservername;dbname=yourdb'
3. Make FreeTDS use your local freetds.conf file, as an unprivileged user, from the PHP script via an environment variable:
putenv('FREETDSCONF=/your/local/freetds.conf');
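The three steps above can be sketched as one hypothetical helper (the name and parameters are illustrative, not part of FreeTDS or PHP). In practice the file only needs writing once; the helper just shows the config section and the environment variable together.

```php
<?php
// Hypothetical helper: write a freetds.conf used only by this application
// and point FreeTDS at it through the FREETDSCONF environment variable.
function use_local_freetds_conf(string $path, string $server, string $host, int $port = 1433): void
{
    $conf = "[$server]\n"
          . "host=$host\n"
          . "port=$port\n"
          . "tds version=7.0\n"
          . "client charset=UTF-8\n";
    if (file_put_contents($path, $conf) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    // Set this before opening the connection so FreeTDS reads this file.
    putenv('FREETDSCONF=' . $path);
}
```

Usage would look like `use_local_freetds_conf('/your/local/freetds.conf', 'sqlservername', '192.168.0.56');`, followed by a connection whose DSN names `sqlservername` rather than the IP.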
If you are using TDS protocol version 7 or above, ALL communications over the wire are converted to UCS2. The server will convert from UCS2 into whatever the table or column collation is set to, unless the column is nvarchar or ntext. You can store UTF-8 into regular varchar or text, you just have to use a TDS protocol version lower than 7, like 6.0 or 4.2. The only drawback with this method is that you cannot query any nvarchar, ntext, or sys.* tables (I think you also can't do any CAST()ing) - as the server refuses to send anything that might possibly be converted to UTF-8 to any client using protocol version lower than 7.
It is not possible to avoid converting character sets when using TDS protocol version 7 or higher (roughly equivalent to MSSQL 2005 or newer).
In my case, it worked after I added the "CharacterSet" parameter to the sqlsrv_connect() connection options.
$connectionInfo = array(
"Database"=>$DBNAME,
"ConnectionPooling"=>0,
"CharacterSet"=>"UTF-8"
);
$LAST_CONNECTION = sqlsrv_connect($DBSERVER, $connectionInfo);
See the documentation here:
https://learn.microsoft.com/en-us/sql/connect/php/connection-options?view=sql-server-2017
I've had luck in a similar situation (using a PDO ODBC connection) using the following code to convert the encoding before printing output:
$data = mb_convert_encoding($data, 'ISO-8859-1', 'windows-1252');
I had to manually set the source encoding, because it was erroneously being reported as 'ISO-8859-1' by mb_detect_encoding().
My data was also being stored in the database by another application, so I might be in a unique situation, although I hope it helps!
For me editing this file:
/etc/freetds/freetds.conf
...and changing/setting the 'tds version' parameter to '7.0' helped. Edit your freetds.conf and try changing this parameter for your server's configuration (or globally).
It works even without restarting Apache.
I didn't notice anyone mentioning another way of converting results from an MSSQL database: the good old iconv() function:
iconv(string $from_encoding, string $to_encoding, string $string): string|false
In my case everything else failed to provide meaningful conversion, except this one when getting the results. Of course, this is done inside the loop of parsing the results of the query - from CP1251 to UTF-8:
foreach ($records as $row => $col) {
    $array[$row]['StatusName'] = iconv('CP1251', 'UTF-8', $records[$row]['StatusName']);
}
Ugly, but it works.
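The loop above converts one column. A sketch of a hypothetical helper that converts every string column of every fetched row, assuming (as in that answer) that the data arrives as CP1251:

```php
<?php
// Hypothetical helper: convert every string column of every fetched row
// from the database's code page to UTF-8.
function rows_to_utf8(array $rows, string $fromCharset = 'CP1251'): array
{
    foreach ($rows as $i => $row) {
        foreach ($row as $column => $value) {
            if (!is_string($value)) {
                continue; // leave ints, NULLs and DateTime objects untouched
            }
            $converted = iconv($fromCharset, 'UTF-8', $value);
            if ($converted === false) {
                throw new RuntimeException("Column $column is not valid $fromCharset");
            }
            $rows[$i][$column] = $converted;
        }
    }
    return $rows;
}

// "Привет" stored as CP1251 bytes
$records = [['StatusName' => "\xCF\xF0\xE8\xE2\xE5\xF2", 'Id' => 7]];
print_r(rows_to_utf8($records));
```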
Can't you just convert your tables to your application encoding? Or use utf-8 in both?
I don't know whether MSSQL supports table-level encodings, though.
Also, try the MB (multibyte) string functions, if the above fails.
You should set the charset with ini_set('mssql.charset', 'windows-1252') before connecting. If you call it after mssql_connect(), it has no effect.
Just adding ini_set('mssql.charset', 'UTF-8'); didn't help in my case. I had to specify the UTF-8 character set on the query parameter:
$age = 30;
$name = utf8_encode("Joe");
$select = sqlsrv_query($conn, "SELECT * FROM Users WHERE Age = ? AND Name = ?",
    array(array($age), array($name, SQLSRV_PARAM_IN, SQLSRV_PHPTYPE_STRING('UTF-8'))));
You can use the mysql_set_charset function:
http://it2.php.net/manual/en/function.mysql-set-charset.php
