Get UTF-8 from MSSQL with ODBC

I am having trouble selecting text from an MSSQL database. I connect through ODBC with the SQL Server Native Client 10.0 driver. The data is stored in an ntext column and contains non-ASCII characters, which I need to output correctly as UTF-8 with PHP (in case you are curious, it's a download script). I've tried several tips and tricks, such as converting the character encoding, but none of them really worked.
I should mention that I cannot change anything in the database.
<?php
odbc_connect();
odbc_execute();
odbc_fetch_row();
header("Content-type: application/octet-stream");
echo odbc_result(...., 'ntext-column');

I finally found the solution.
Cast the column to VARBINARY in the SELECT query. An ntext column has to be cast to NVARCHAR first.
SELECT CAST(CAST(column AS NVARCHAR(MAX)) AS VARBINARY(MAX)) AS column FROM ...
Then enable binary mode for ODBC and set the maximum length of the binary data to read. Finally, convert the binary data from the UCS-2LE charset to UTF-8, and you are done.
...
odbc_execute($query, ...);
odbc_longreadlen($query, 1000000); // Set the expected length to 1 Megabyte
odbc_binmode($query, ODBC_BINMODE_RETURN); // Enable binary mode
... // Fetch the row
$data = iconv('UCS-2LE', 'UTF-8', odbc_result($query, 'column'));
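The final conversion step can be tried without a database connection. The sketch below is only an illustration: utf16le_to_utf8() is a hypothetical helper name, and the byte string stands in for what odbc_result() would return in binary mode. It assumes the iconv extension is available and uses 'UTF-16LE' rather than 'UCS-2LE', on the assumption that the column might contain characters outside the Basic Multilingual Plane; for BMP-only data the two encodings are identical.

```php
<?php
// Hypothetical helper: convert raw UTF-16LE bytes (as read from a
// VARBINARY-cast NVARCHAR value) into a UTF-8 string.
function utf16le_to_utf8(string $raw): string
{
    $utf8 = iconv('UTF-16LE', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('Input is not valid UTF-16LE');
    }
    return $utf8;
}

// Simulated driver output: "A", "é" and "€" as UTF-16LE code units
// (0x0041, 0x00E9, 0x20AC, each stored low byte first).
$raw = "\x41\x00\xE9\x00\xAC\x20";

echo bin2hex(utf16le_to_utf8($raw)), "\n"; // 41c3a9e282ac
```

Checking the hex output rather than the rendered text avoids being misled by the terminal's or browser's own charset handling.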

Related

charset issue with accents retrieving DB2 values with php over odbc

I'm trying to do a select from a DB2 through PHP and odbc and then save those values on a file. The OS where the code is being executed is Debian. What I do is the following:
$query = "SELECT NAME FROM DATABASE_EXAMPLE.TABLE_EXAMPLE";
$result = odbc_prepare($server, $query);
$success = odbc_execute($result);
$linias = "";
if ($success) {
    while ($myRow = odbc_fetch_array($result)) {
        $linias .= format_word($myRow['NAME'], 30) . "\r\n";
    }
    generate_file($linias);
}
function format_word($paraula, $longitut) {
    return str_pad(utf8_encode($paraula), $longitut, " ", STR_PAD_LEFT);
}
function generate_file($linias) {
    $nom_fitxer = date('YmdGis');
    file_put_contents($nom_fitxer . ".tmp", $linias);
    rename($nom_fitxer . '.tmp', $nom_fitxer . '.itf');
}
The problem is that some of the retrieved values contain Spanish letters and accents. For example, one of the values is "ÁNGULO". If I var_dump the value in my browser the word looks fine, but when it is written to the file, weird characters are appended to it (that's why I think there is a charset problem). I have tried different workarounds, but they just make it worse. Opened in Notepad++ (with UTF-8 encoding enabled), the file shows the garbled characters.
Is there a function in PHP that translates between charsets?
Edit
Following erg's instructions, I did some further research:
The DB2 database uses the IBM284 charset, as I found by executing the following command:
select table_schema, table_name, column_name, character_set_name from SYSIBM.COLUMNS
Firefox says the page is encoded as Unicode.
If I run:
var_dump(mb_detect_encoding($paraula));
I get bool(false) as a result.
I have changed my word-formatting function, hoping that iconv would resolve the conflict:
function format_word($paraula, $longitut) {
    $paraula = mb_convert_encoding($paraula, 'UTF-8');
    $paraula = iconv("IBM284", "UTF-8", $paraula);
    return $paraula;
}
But it doesn't. It seems that the ODBC driver is encoding the data incorrectly, and that is what corrupts it. How can I configure ODBC to use the right charset? I have seen some people on Linux change the locale, but when I run the locale command on the machine I get:
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
...

I will try to summarize the comments into an answer:
First, note that PHP's utf8_encode converts from ISO-8859-1 to UTF-8. If your database / ODBC driver does not return ISO-8859-1 encoded strings, utf8_encode will fail or return garbage.
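To illustrate the point about utf8_encode: a conversion only gives a meaningful result when the source encoding is named correctly. The following is a minimal sketch, assuming the iconv extension is available. to_utf8() is a hypothetical helper, and the sample bytes are ISO-8859-1, not actual DB2 output.

```php
<?php
// Hypothetical helper: convert $raw from an explicitly named source
// encoding to UTF-8, instead of assuming ISO-8859-1 as utf8_encode() does.
function to_utf8(string $raw, string $sourceEncoding): string
{
    $converted = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceEncoding");
    }
    return $converted;
}

// "ÁNGULO" as ISO-8859-1 bytes: Á is the single byte 0xC1.
$latin1 = "\xC1NGULO";
$utf8 = to_utf8($latin1, 'ISO-8859-1');
echo bin2hex($utf8), "\n"; // c3814e47554c4f

// Converting again with the same (now wrong) source encoding
// double-encodes the string: Á becomes the four bytes c3 83 c2 81.
echo bin2hex(to_utf8($utf8, 'ISO-8859-1')), "\n"; // c383c2814e47554c4f
```

The second call shows why extra characters appear when a string that is already UTF-8 is run through a Latin-1 to UTF-8 conversion.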
The easiest solution should be to let the database / driver convert the values to the correct encoding, using its CAST function: https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_castspecification.html
Try altering your query so that DB2 converts everything to UTF-8 directly, and omit the utf8_encode call. The query would look something like:
SELECT CAST(NAME AS VARCHAR(255) CCSID 1208) FROM DATABASE_EXAMPLE.TABLE_EXAMPLE
Thanks to Sergei for the note about CCSID 1208 on IBM PUA. I changed CCSID UNICODE to CCSID 1208.
I do not have DB2 at hand, so the above query is untested. I'm not sure whether it will return UTF-8 or UTF-16.

JSON creating from PHP giving wrong data?

I have a PHP form that I use to enter data into a database (managed with phpMyAdmin), and I use a SELECT query to display all the values from the database in the same form.
I also have another PHP file that creates JSON from the same table.
When I enter text in other languages, such as "Experiența personală:", the value saved in the DB is "ExperienÈ›a personalÄƒ: ", but when I use a SELECT query to display it in the same PHP form it appears correctly as "Experiența personală:". So the DB seems to be correct, and I am now using the following PHP code to create the JSON:
<?php
$servername = "localhost";
$username = "root";
$password = "root";
$dbname = "aaps";
// Create connection
$con=mysqli_connect($servername,$username,$password,$dbname);
// Check connection
mysqli_set_charset($con, 'utf8');
//echo "connected";
$rslt=mysqli_query($con,"SELECT * FROM offers");
while ($row = mysqli_fetch_assoc($rslt))
{
    $taxi[] = array('code' => $row["code"], 'name' => $row["name"], 'contact' => $row["contact"], 'url' => $row["url"], 'details' => $row["details"]);
}
header("Content-type: application/json; charset=utf-8");
echo json_encode($taxi);
?>
and the JSON looks like this:
[{"code":"CT1","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"4535623643","url":"images\/offers\/event-logo-8.jpg","details":"Experien\u00c8\u203aa personal\u00c4\u0192: jerhbehwgrh 234234 hjfhjerg#$%$#%#4"},{"code":"ewrw","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"ewfew","url":"","details":"eExperien\u00c8\u203aa personal\u00c4\u0192: Experien\u00c8\u203aa personal\u00c4\u0192: Experien\u00c8\u203aa personal\u00c4\u0192: "},{"code":"Experien\u00c8\u203aa personal\u00c4\u0192: ","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"","url":"","details":"Experien\u00c8\u203aa personal\u00c4\u0192: "}]
Here "\u00c8\u203a" is wrong; it should be "\u021b" (ț).
So the PHP code that creates the JSON seems to be causing the issue,
but I am unable to find out exactly why it comes out like this. Please help.

To avoid \uXXXX escapes in the output, pass the extra argument:
json_encode($s, JSON_UNESCAPED_UNICODE)
Don't use utf8_encode/decode.
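Here is a small sketch of the difference the flag makes, using the sample text from the question. The array key is arbitrary, and the string is written with \u{...} escapes only so that the example does not depend on the encoding of the source file.

```php
<?php
// "Experiența personală" with ț (U+021B) and ă (U+0103).
$row = ['name' => "Experien\u{021B}a personal\u{0103}"];

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($row);

// With the flag: the characters are emitted as literal UTF-8.
$literal = json_encode($row, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n"; // {"name":"Experien\u021ba personal\u0103"}
echo $literal, "\n";

// Both forms decode to the same value.
var_dump(json_decode($escaped, true) === json_decode($literal, true)); // bool(true)
```

Either form is valid JSON, so the flag changes readability only. It does not repair data that is already mis-encoded in the database.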
ă turning into Äƒ is Mojibake. It probably means that:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the table may or may not have been CHARACTER SET utf8, but it should have been.
If you need to fix the data, it takes a "2-step ALTER", something like
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
Before making any changes, do
SELECT col, HEX(col) FROM tbl WHERE ...
With that, ă should show hex of C483. If you see C384C692, you have "double-encoding", which is messier to fix.
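The two hex values can be reproduced in PHP. This sketch assumes the iconv extension is available and that the mis-declared connection charset behaves like Windows-1252, which is what MySQL's latin1 corresponds to. The function names are hypothetical.

```php
<?php
// Simulate double-encoding: UTF-8 bytes are misread as Windows-1252
// and then converted to UTF-8 a second time.
function double_encode(string $utf8): string
{
    return iconv('CP1252', 'UTF-8', $utf8);
}

// Undo one round of double-encoding.
function repair(string $mojibake): string
{
    return iconv('UTF-8', 'CP1252', $mojibake);
}

$correct = "\u{0103}"; // ă
$broken = double_encode($correct); // displays as Äƒ

echo strtoupper(bin2hex($correct)), "\n";         // C483
echo strtoupper(bin2hex($broken)), "\n";          // C384C692
echo strtoupper(bin2hex(repair($broken))), "\n";  // C483
```

The same round trip turns ț (hex C89B) into the È› pair that shows up as \u00c8\u203a in the question's JSON.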

Depending on the version of MySQL in use, the database may not support the full UTF-8 range, as stated in the documentation:
The ucs2 and utf8 character sets do not support supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.
This, however, is unlikely to be related to your problem. I would try a couple of different things and see whether they solve it.
Use SET NAMES utf8 (MySQL spells the character set name without a hyphen).
Use utf8_encode() when inserting data into the database and utf8_decode() when extracting it. That way, you don't have to worry about MySQL manipulating the Unicode characters. Be aware that both functions only convert between ISO-8859-1 and UTF-8, and that they are deprecated as of PHP 8.2.

XOR encode a multibyte string and save to MySQL field without loss

I'm currently using this function to lightly obfuscate field values in MySQL and protect them from direct dumping. It all works well and values are stored correctly, but what happens when I try to store a multibyte string?
Here's an example; let's try to encode the string álex:
<?php
$v = xorencode('álex');
// step 1 - encode
echo $v."\n";
// step 2 - decode
echo xorencode($v);
?>
This works well: the first time I see an obfuscated string, and then I see álex again. But if I save it in a VARCHAR field in a MySQL table and then select it, I no longer get a UTF-8 string; instead it is returned as gllex.
Note that the MySQL table and field collations are utf8_general_ci, the files are UTF-8, and I run SET NAMES utf8 after connecting. Is there any workaround for this?
Thanks
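One possible workaround, which is not from the original thread: the output of a byte-wise XOR is arbitrary binary data and is usually not valid UTF-8, so a utf8 VARCHAR column may alter or truncate it. Encoding the bytes as base64 before storing them keeps the stored value plain ASCII. In the sketch below, xor_bytes() is a hypothetical stand-in for the xorencode() function from the question, and the key is made up.

```php
<?php
// Hypothetical stand-in for the question's xorencode(): XOR every byte
// of $data with the repeating key. The default key is an assumption.
function xor_bytes(string $data, string $key = 's3cret'): string
{
    $out = '';
    $keyLen = strlen($key);
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        $out .= $data[$i] ^ $key[$i % $keyLen];
    }
    return $out;
}

// Produce an ASCII-only value that is safe to store in a VARCHAR column.
function obfuscate(string $plain): string
{
    return base64_encode(xor_bytes($plain));
}

// Reverse obfuscate(): base64-decode, then XOR with the same key.
function deobfuscate(string $stored): string
{
    $raw = base64_decode($stored, true);
    if ($raw === false) {
        throw new InvalidArgumentException('Stored value is not valid base64');
    }
    return xor_bytes($raw);
}

$stored = obfuscate("\u{E1}lex"); // "álex"
var_dump(deobfuscate($stored) === "\u{E1}lex"); // bool(true)
```

Base64 makes the stored value about a third longer, so the column length may need adjusting. Storing the raw bytes in a VARBINARY or BLOB column would be another option.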

How to ensure all data going in and out of a database is utf-8 encoded?

I only learned about character sets today, so forgive me if this is confusing. Please ask for clarification if it's needed.
I wrote a program in PHP that recursively goes through the files in a folder and stores the file names in a database. The file names are then all exported from the database in JSON format using the json_encode($array) function.
However, this function only works with UTF-8 encoded data. Since a few of the key-value pairs in the JSON export have the value null, I'm led to believe that those file name strings taken from the database are in fact not UTF-8.
I've tried to ensure that all the data going in and out of the database is UTF-8 by setting the defaults to utf8 in my.cnf and restarting MySQL from the command line using service mysql restart
[client]
default-character-set=utf8
[mysqld]
default-character-set = utf8
I then created my database, the table and all the columns in the table and confirmed that the database, table and all the columns are in fact utf-8
Checks if database is utf-8
SELECT default_character_set_name FROM information_schema.SCHEMATA S
WHERE schema_name = "schemaname";
Checks if table is utf-8
SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
AND T.table_schema = "schemaname"
AND T.table_name = "tablename";
Checks if field is utf-8
SELECT character_set_name FROM information_schema.`COLUMNS` C
WHERE table_schema = "schemaname"
AND table_name = "tablename"
AND column_name = "columnname";
There's this file that has the characters –µ–ª–∫—É–Ω—á–∏–∫ in the file name. When it's stored in the database the values appear as â€“Â©â€“Âµâ€“Âªâ€“â'.
Given my database settings, are all the strings going in and out of my database UTF-8?
What can I do to ensure the data I SELECT from the database is UTF-8, so that I can run json_encode($array) on it? (NOTE: this function only works on UTF-8 encoded data.)

Unfortunately I don't know how you can ensure everything coming out is UTF-8 (now I'm curious too!), but a starting point would be trying this in your PHP:
$encodedNames = array();
$errors = array();
// Loop through all of the filenames
foreach ($filenames as $filename)
{
    // Check if it's UTF-8 encoded
    if ('UTF-8' === mb_detect_encoding($filename, 'UTF-8', true))
    {
        $encodedNames[] = $filename;
    }
    else
    {
        $errors[] = $filename;
    }
}
// json_encode the UTF-8 filenames
$jsonString = json_encode($encodedNames);
// Log the other filenames here so you can deal with them later...
http://php.net/manual/en/function.mb-detect-encoding.php
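As a complement to filtering, json_encode() can report the problem itself. In recent PHP versions it returns false for the whole array when any string is not valid UTF-8, and since PHP 7.2 the JSON_INVALID_UTF8_SUBSTITUTE flag replaces the offending bytes instead. The sketch below uses made-up file names, the second of which contains an ISO-8859-1 byte.

```php
<?php
// The second name contains the ISO-8859-1 byte 0xE9, which is invalid UTF-8.
$filenames = ['report.txt', "caf\xE9.txt"];

// Strict encoding fails as a whole and records the reason.
$json = json_encode($filenames);
if ($json === false) {
    echo 'json_encode failed: ', json_last_error_msg(), "\n";
}

// Lenient encoding: invalid bytes are replaced with U+FFFD (PHP 7.2+).
$lenient = json_encode($filenames, JSON_INVALID_UTF8_SUBSTITUTE);
echo $lenient, "\n";
```

The substitution is lossy, so it suits diagnostics better than a permanent fix. Making sure the connection charset matches the stored data remains the real solution.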

Oracle connection to retrieve or insert Arabic values from database

I have this code in Drupal 6 to retrieve Arabic values from an Oracle database:
<?php
session_start();
$conn = oci_connect('localhost', 'pass', 'IP....');
$stid = oci_parse($conn, "select arabic_name from arabic_names_table");
oci_execute($stid);
if ($row = oci_fetch_array($stid, OCI_ASSOC + OCI_RETURNS_NULLS))
{
    // OCI8 returns unquoted column names as uppercase array keys
    $name_ar = $row['ARABIC_NAME'];
}
?>
When values are retrieved from or inserted into the DB, they appear like this: ???
Please note:
My Oracle database handles Arabic characters correctly; from PL/SQL I can insert Arabic values.
I have installed mbstring.
I have UTF-8 encoding enabled.
How can I solve this problem?

When you fetch data from an Oracle database, the strings normally arrive in the character encoding of the Oracle client installed on the machine that runs PHP. On Windows this is set in the registry, under the Oracle client's key (see HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_OraClient11g_home1), in the value NLS_LANG. Its value looks something like ARABIC_UNITED ARAB EMIRATES.AR8MSWIN1256, where the part after the dot, AR8MSWIN1256, is the encoding. In the character map array this is mapped to windows-1256 (windows-1256 => AR8MSWIN1256).
See this link http://websvn.projects.ez.no/wsvn/ezoracle/?op=comp&compare[]=%2Fstable#385&compare[]=%2Fstable#386.
That is, after you fetch the data from the database, it is encoded as windows-1256. If your web page uses the utf-8 charset, you need to convert the string to utf-8. You can use iconv() for this.
$utf8 = iconv('windows-1256', 'utf-8', $my_string); // $my_string is windows-1256
echo $utf8; // Outputs the string as utf-8
If you still have the problem, check the charset declared in the page; it must be utf-8.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I think this will solve your problem.
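The conversion can be checked without an Oracle connection. This is a minimal sketch, assuming the iconv extension is available. The two bytes stand in for a fetched value and are the windows-1256 codes for the Arabic letters alef and beh.

```php
<?php
// Stand-in for a value fetched with a windows-1256 client charset:
// alef (0xC7) followed by beh (0xC8).
$my_string = "\xC7\xC8";

$utf8 = iconv('WINDOWS-1256', 'UTF-8', $my_string);
if ($utf8 === false) {
    throw new RuntimeException('Conversion from windows-1256 failed');
}

echo bin2hex($utf8), "\n"; // d8a7d8a8

// preg_match() with the /u modifier only succeeds on valid UTF-8.
var_dump(preg_match('//u', $utf8) === 1); // bool(true)
```

If the hex output differs for real data, the client charset is probably not windows-1256 and NLS_LANG should be checked again.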
