Problem with regional (Polish) characters in a PHP application and MS SQL Server - php

I have a problem with regional characters inserted into an MS SQL Server database.
A PHP application connects to the MSSQL server and inserts some data. But characters such as ą, Ą, ć, Ł, ź (and so on; these are Polish regional characters) appear in the MSSQL table as a, A, c, L, z.
Here is some background:
I use the FreeTDS driver for MSSQL connectivity.
The column type is nvarchar(MAX).
I looked at the data sent over the wire (with Wireshark) and the UTF-8 encoded data looks OK; for example, Ą is sent as U+0104.
When I insert the same string into a local database instance (Microsoft SQL Server Express Edition) it works fine, but on the remote host (at the customer's location, running Microsoft SQL Server Standard Edition 64-bit) this "de-regionalization" occurs.
It seems that the remote MSSQL server doesn't handle the input sent by the PHP application correctly. Can anyone see what might be wrong here?

Maybe a rookie mistake, but I found the source of this problem according to KB #239530:
"When SQL Server converts a Unicode string without the N prefix from Unicode to the SQL Server database's code page, any characters in the Unicode string that do not exist in the SQL Server code page will be lost."
I suppose the local SQL Server instance somehow treats all incoming string literals as Unicode strings and the remote server doesn't, thus requiring the N prefix.
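A minimal sketch of what the N prefix looks like from the PHP side, assuming the SQL text is assembled by hand. The helper name nQuote and the table and column names are hypothetical; where the driver supports parameterized queries, those are generally the safer route.

```php
<?php
// Hypothetical helper: wrap a UTF-8 PHP string in a T-SQL Unicode literal.
// Without the leading N, SQL Server converts the literal to the database
// code page, which is where "Ą" can turn into "A".
function nQuote(string $value): string
{
    // T-SQL escapes a single quote inside a literal by doubling it.
    return "N'" . str_replace("'", "''", $value) . "'";
}

// "customers" and "name" are made-up identifiers for illustration.
$sql = 'INSERT INTO customers (name) VALUES (' . nQuote('Łódź') . ')';
echo $sql, PHP_EOL;
```

The only difference from an ordinary quoted literal is the leading N, which tells SQL Server to treat the literal as Unicode instead of converting it to the database code page.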

As a workaround, you can call htmlentities() on the data before inserting it into the database and html_entity_decode() after retrieving it. You can use the ISO-8859-2 or Windows-1250 charset.
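A rough sketch of that round trip, with two deviations from the answer that are my assumptions: the input is taken to be UTF-8 (as far as I can tell from the PHP manual, ISO-8859-2 and Windows-1250 are not among the charsets htmlentities() accepts), and ENT_HTML5 is passed because the Polish letters only have named entities in the HTML5 table. The function names are hypothetical.

```php
<?php
// Sketch of the entity workaround, assuming the input string is UTF-8.
// ENT_HTML5 is used because the Polish letters have named entities
// (for example &aogon;) only in the HTML5 entity table.
const ENTITY_FLAGS = ENT_QUOTES | ENT_HTML5;

function toEntities(string $utf8): string
{
    return htmlentities($utf8, ENTITY_FLAGS, 'UTF-8');
}

function fromEntities(string $stored): string
{
    return html_entity_decode($stored, ENTITY_FLAGS, 'UTF-8');
}

$original = 'Zażółć gęślą jaźń';
$stored   = toEntities($original);   // ASCII-only text to write to the database
$restored = fromEntities($stored);   // back to the original UTF-8 string
```

The stored text is longer than the input, so column lengths need some headroom, and anything that searches or sorts on the column sees entities rather than letters.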

Related

How to properly pass form content with special characters to SQL Server nvarchar field from PHP/Linux app?

I have a web application on an Ubuntu/PHP7/nginx stack that, for data reporting purposes, passes data to a SQL Server 2008 database. I'm using the php7.0-sybase PDO package to connect to MSSQL. I use a prepared statement call to a stored procedure to insert the data. The fields on the SQL Server side are nvarchar fields of various lengths. The issue: form content from the web application with special characters (newlines, accented letters, etc.) shows up as garbled (Chinese-esque) characters once inserted into the database. I suspect it is an encoding issue, but I haven't been able to pinpoint what exactly the issue is, as I'm not knowledgeable in this area.
If I test doing an insert directly on SQL Server using T-SQL, the text with special characters shows correctly.
If I dump the encoding of the text before I send it from the web app, it tells me it is ASCII, which should be compatible with UTF-8, correct? I've also tried explicitly converting it to UTF-8 before sending.
Is there something I need to set in the configuration of my db driver? Or maybe the driver just doesn't know how to properly handle this data?
As a note, we've experimented with sending this type of data to a varchar field, and the text appears correctly, so this is an issue specifically with the nvarchar data type.
SQL Server 2008 does not support UTF-8 for stored data (UTF-8 collations were only added in SQL Server 2019). The nvarchar data type corresponds to UTF-16.
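To make the UTF-8 versus UTF-16 difference concrete, here is a small sketch that prints the raw bytes of one character in both encodings using iconv(). With a FreeTDS-based driver this conversion is normally the driver's job rather than the application's; that is an assumption about this particular setup, not something the question confirms.

```php
<?php
// Show the raw bytes of the same character in both encodings.
// "\xC4\x84" is the UTF-8 byte sequence for "Ą" (U+0104).
$utf8  = "\xC4\x84";
$utf16 = iconv('UTF-8', 'UTF-16LE', $utf8);

echo bin2hex($utf8), PHP_EOL;   // c484
echo bin2hex($utf16), PHP_EOL;  // 0401
```

If UTF-8 bytes reach an nvarchar column without being converted, each pair of bytes is read as one UTF-16 code unit, which is one plausible source of the "Chinese-esque" characters.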

PHP + MS SQL Server character coding

I have a CodeIgniter project on my Ubuntu Linux server.
I don't use MySQL because I am connecting to Microsoft SQL Server 2014 Express. (I am using FreeTDS on my Linux server.)
FreeTDS is working and I can connect to the MS SQL server, but I have a problem with character encoding.
I am using the Hungarian_CI_AS collation on my MS SQL server, and UTF-8 in FreeTDS (client charset = UTF-8) and CodeIgniter.
The problem: I have MS SQL field content: Igazgatósági előterjesztések
And this shows after SQL query: Igazgat�s�gi el?terjeszt�sek
(It doesn't show Hungarian characters such as ó, á, ő, é.)
I think this is a UTF-8 problem. I searched for it but didn't find any good tips. I tried ini_set('mssql.charset', 'UTF-8'); for the php.ini setting, but it doesn't work.
I tried converting the string after the query, for example UTF-8 to ISO8859-1, UCS2 to UTF-8, UCS2 to ISO8859-1, etc.
In the best result, the í, ó, ú, é characters work, but the ő character does not.
What is the solution? Which character encoding does MS SQL Server use?
How do I convert this string to make it work?
First, check whether you can type that character from some tool other than your application, for example SQL Server Management Studio, and save it in the table. Then create a stored procedure with a variable that stores that character. If you can see the character correctly in both cases, follow these steps:
In your application, replace each special character with an identifier. For example, if you need to save the "o" with double acute, replace it with "_1" (think of a better identifier :P).
Write a stored procedure that replaces "_1" with the "o" with double acute again, this time on the SQL Server side, where you write the character directly in the stored procedure's code.
It's a slightly dirty workaround, and it depends on being able to type the character directly into SQL Server. I have had this problem before, and I don't remember the exact final solution, but I remember this trick let me keep working until I found it.
Plus: did you check the type of the column that stores the character? In any case, start with the checks requested at the beginning.
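The application side of that placeholder trick could be sketched as follows. The token names and both function names are hypothetical, and longer tokens are used instead of "_1" so they are less likely to collide with real text. In the answer the reverse mapping lives in a stored procedure; fromPlaceholders() only shows the same mapping in PHP.

```php
<?php
// Hypothetical placeholder table for the workaround described above.
// The tokens must never occur in real data, so "_1" is too weak in practice.
const PLACEHOLDERS = [
    'ő' => '{{o_dacute}}',
    'Ő' => '{{O_dacute}}',
    'ű' => '{{u_dacute}}',
    'Ű' => '{{U_dacute}}',
];

// Replace the problematic letters before the text is sent to the database.
function toPlaceholders(string $text): string
{
    return strtr($text, PLACEHOLDERS);
}

// Reverse mapping, for text that still contains the tokens.
function fromPlaceholders(string $text): string
{
    return strtr($text, array_flip(PLACEHOLDERS));
}

echo toPlaceholders('előterjesztések'), PHP_EOL;
```

This assumes the PHP source file itself is saved as UTF-8, so that the array keys are the UTF-8 bytes of the letters.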

How to correctly store/retrieve UTF-8 Data into MS-SQL tables

I'm having big trouble storing extended UTF-8 characters in MS SQL Server tables.
Every extended character is stored in the extended-ASCII range (8-bit). I haven't found any way to store the data without distorting it or losing the extended character information.
My current server setup is:
IIS 7.5
PHP 5.6.11 (x86_64 VC11 Fast-CGI)
SQL Server Native Client for SQL Server 2008 R2 8 (Used via ODBC)
Why am I not using the SQL Server driver extension for PHP? Simple: Microsoft doesn't have any build of that driver available for 64-bit.
What I have tried so far:
My fields are of the nvarchar and nchar data types.
Used utf8_decode() to prepare the data for INSERT/UPDATE statements.
Used utf8_encode() when retrieving the data with SELECTs.
Used mb_convert_encoding($string,[new_encoding],'UTF-8'); where the new encoding was Windows-1252, UCS-2, or ASCII (as a substitute for utf8_decode).
Used iconv() with encoding CP437 (as a substitute for utf8_decode).
No success in any case.
What I'm experiencing: when data is stored and I run a SELECT directly against the database, the extended characters in strings are shown as garbage.
Let's say:
I write "ánimo" and that word is stored as "ánimo"; when I retrieve it from the table, the screen displays "ánimo".
input ----> utf8_decode() ---> MS-SQL ----> utf8_encode() ---> screen
Any ideas?
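The "ánimo" symptom is what one would expect if the UTF-8 bytes were treated as a single-byte Latin encoding somewhere along the way and then encoded to UTF-8 a second time. The sketch below reproduces that effect with iconv(); it is a guess at the mechanism, not a diagnosis of this particular setup.

```php
<?php
// "\xC3\xA1nimo" is "ánimo" encoded as UTF-8 (á = C3 A1).
$utf8 = "\xC3\xA1nimo";

// Reading those bytes as ISO-8859-1 and re-encoding them as UTF-8 turns
// each byte of the two-byte sequence into a separate character.
$mojibake = iconv('ISO-8859-1', 'UTF-8', $utf8);
echo $mojibake, PHP_EOL;  // Ã¡nimo

// Applying the inverse conversion restores the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $mojibake);
echo bin2hex($repaired), PHP_EOL;  // c3a16e696d6f
```

If the stored data looks like this, the extra encoding step is happening before or during the INSERT, so the utf8_decode()/utf8_encode() pair is only masking it on the way back out.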

About saving Unicode text into mssql database

I have a Flash application that communicates with PHP to save data to an nvarchar(1200) column. However, when I switch to a different language (i.e. locale) and type into the Flash app, the letters look fine, but in the DB they are saved as question marks instead of the real letters.
How can I solve this problem?
How to save the real letters in db?
Your database may not be configured to use UTF-8 encoding. SQL Server 7.0 and SQL Server 2000 use a different Unicode encoding (UCS-2) and do not recognize UTF-8 as valid character data.
Other versions of mssql may be similar.
See this for more information: http://support.microsoft.com/kb/232580
If that's not the issue, backtrack to PHP and test the encoding type on the data you are receiving. Make sure it matches what needs to be in your DB, or convert it first.
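One way to do that check in PHP is sketched below. It only tells you whether the bytes are well-formed UTF-8; it cannot identify which other encoding they might be in. The function names are hypothetical.

```php
<?php
// Returns true when $data is a well-formed UTF-8 byte sequence.
// preg_match() with the /u modifier returns false on invalid UTF-8.
function isValidUtf8(string $data): bool
{
    return preg_match('//u', $data) === 1;
}

// Hypothetical debugging helper: show the hex bytes next to the verdict
// so the encoding can be checked against a code chart.
function describeBytes(string $data): string
{
    $verdict = isValidUtf8($data) ? 'valid UTF-8' : 'not UTF-8';
    return sprintf('%s (%s)', bin2hex($data), $verdict);
}

echo describeBytes("\xC4\x84"), PHP_EOL;  // c484 (valid UTF-8)
```

Logging describeBytes() for the value received from the client, just before it goes into the query, shows whether the data is already damaged when it reaches PHP.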

PHP/ODBC encoding problem

I use ODBC to connect to SQL Server from PHP.
In PHP I read some string data (from an nvarchar column) from SQL Server and then want to insert it into a MySQL database. When I try to insert such a value into the MySQL table, I get this MySQL error:
Incorrect string value: '\xB3\xB9ow...' for column 'name' at row 1
For strings with only ASCII characters everything is fine; the problem occurs when non-ASCII characters (from some European languages) are present.
So, in more general terms: there is a Unicode string in the MS SQL Server database, which PHP retrieves through ODBC. It is then put into an SQL INSERT query (as the value for a UTF-8 varchar column), which is executed against the MySQL database.
Can someone explain what is happening here in terms of encoding? At which steps might character encoding conversions take place?
I use: PHP 5.2.5, MySQL 5.0.45-community-nt, MS SQL Server 2005.
PHP has to run on a Linux platform.
UPDATE: The error doesn't occur when I call utf8_encode($s) on the string and use that value in the MySQL insert query, but then the inserted string doesn't display correctly in the MySQL database (so utf8_encode only produced a valid UTF-8 string, but it lost the correct characters).
First you have the encoding of the DB. Then you have the encoding used by the ODBC client.
If the encoding of your ODBC client connection does not match the one of the DB, the ODBC layer will automatically transcode your data, in some cases.
The trick here is to force the encoding of the ODBC client connection.
For an "all UTF-8" setup:
// DB_DSN, DB_USR and DB_PWD are constants defined elsewhere
$conn = odbc_connect(DB_DSN, DB_USR, DB_PWD);
// force the encoding of the client connection to UTF-8
odbc_exec($conn, "SET NAMES 'UTF8'");
odbc_exec($conn, "SET client_encoding='UTF-8'");
// processing here
This works perfectly with PostgreSQL + PHP 5.x.
The exact syntax and options depend on the DB vendor.
You can find very useful and clear additional info for MySQL here: http://dev.mysql.com/doc/refman/5.0/fr/charset-connection.html
Hope this helps.
Maybe you can use the PDO extension, if it will make any difference?
There is a user-contributed comment here that suggests changing the data types in SQL Server to something else; if that is not possible, look at that user's class that casts fields.
I have no experience with ODBC via PHP, but with the mysql functions PHP seems to default to ASCII, and UTF-8 connections need to be made explicit if you want to avoid trouble.
Are you sure PHP and the MySQL server are communicating in UTF-8? Until PHP 6, the Unicode support tends to be annoyingly inconsistent like that.
I remember that the MySQL docs mention a connection string parameter to tweak the Unicode encoding.
From your description it sounds like PHP is treating the connection as ASCII-only.
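A sketch of an explicit conversion step between the ODBC read and the MySQL insert. It rests on an assumption the question does not confirm: that the ODBC layer returns Windows-1250 bytes. That would fit the \xB3\xB9 in the error message, which are "ł" and "ą" in that code page, and it would also explain why utf8_encode() produced valid but wrong text, since utf8_encode() always treats its input as ISO-8859-1. The function name odbcToUtf8 is hypothetical.

```php
<?php
// Assumption: the ODBC layer hands back Windows-1250 bytes, which would
// explain the \xB3\xB9 in the MySQL error ("ł" and "ą" in that code page).
function odbcToUtf8(string $raw, string $sourceCharset = 'CP1250'): string
{
    $converted = iconv($sourceCharset, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset");
    }
    return $converted;
}

$utf8 = odbcToUtf8("\xB3\xB9");
echo bin2hex($utf8), PHP_EOL;  // c582c485, the UTF-8 bytes of "łą"
```

If the source charset is something else, only the second argument changes; the point is to name the source encoding explicitly instead of letting utf8_encode() assume ISO-8859-1.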
