PHP + MS SQL Server character coding

I have a CodeIgniter project on my Ubuntu Linux server.
I don't use MySQL; instead I connect to Microsoft SQL Server 2014 Express through FreeTDS on the Linux server.
FreeTDS works and I can connect to the SQL Server, but I have a problem with character encoding.
I am using the Hungarian_CI_AS collation on the SQL Server, and UTF-8 in FreeTDS (client charset = UTF-8) and in CodeIgniter.
The problem: a field in SQL Server contains: Igazgatósági előterjesztések
After the SQL query this is shown instead: Igazgat�s�gi el?terjeszt�sek
(The Hungarian characters ó, á, ő, é, etc. are not displayed.)
I think this is a UTF-8 problem. I searched for it but did not find any good tips. I tried ini_set('mssql.charset', 'UTF-8'); (the php.ini setting), but it did not work.
I tried converting the string after the query, for example UTF-8 to ISO8859-1, UCS2 to UTF-8, UCS2 to ISO8859-1, etc.
In the best result the characters í, ó, ú, é work, but the character ő does not.
What is the solution? Which character encoding does MS SQL Server use?
How do I convert this string so that it works?

First, check whether you can type that character from some tool other than your application, for example SQL Server Management Studio: save it in the table, and also create a stored procedure with a variable that stores that character. If you can see the character correctly in both cases, follow these steps:
In your application, replace each special character with an identifier. For example, if you need to save the "o" with double acute, replace it with "_1" (think of a better identifier :P).
Write a stored procedure that replaces "_1" with the "o" with double acute again, this time on the SQL Server side, with the character typed directly in the code of the stored procedure.
It is a slightly dirty workaround, and it depends on whether you can type that character directly into SQL Server. I have had this problem before and I don't remember the final solution exactly, but I remember that this trick let me keep working until I found it.
Also: did you check the type of the column that stores the character? In any case, start with the checks described at the beginning.

Related

Store special characters (german) SqlServer via php

I have a Fedora machine acting as a server, with Apache running PHP 5.3.
A script acts as an entry page for various sources that send me "messages".
The PHP script is called like serverAddress/phpScript.php?message=MyMessage, and the message is then saved via PDO to a SQL Server 2008 database.
If the message contains special characters (e.g. German ones such as üäöß), the database ends up with gibberish instead of the correct string: Ã¼Ã¤Ã¶ÃŸ
The database is perfectly capable of handling UTF-8: I can connect and send/retrieve German characters without any issue using other tools (not via PHP).
Inside the PHP script:
if I echo the input string, I get the correct string üäöß
if I save it to a file (log the input), I see the gibberish: Ã¼Ã¤Ã¶ÃŸ
What is causing this behavior? How can I fix it?
Multibyte support is enabled (yum install php-mbstring followed by an Apache restart).
At the start of my PHP script I have:
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');
From what I understand, the default encoding when dealing with MSSQL via PDO is UTF-8.
New development:
A colleague pointed me to the PDO_DBLIB page (currently visible only from cache), where I saw $res->bindValue(':value', iconv('UTF-8', 'ISO8859-1', $value));
I replaced all my $res->bindParam(':text', $text); calls with $res->bindParam(':text', iconv('UTF-8', 'ISO8859-1', $text)); and everything worked :).
The mb_internal_encoding(...) call and all the other lines were no longer needed.
Why does it work when using the ISO8859-1 encoding?
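The gibberish quoted in the question has a recognizable shape: it is what UTF-8 bytes look like when some layer reads them as Windows-1252 and re-encodes them. The following is a minimal sketch of that effect in isolation, assuming the mbstring extension is loaded. It does not show where in the PDO/driver chain the misreading actually happens, which the question does not establish.

```php
<?php
// "üäöß" written as explicit UTF-8 bytes, so the example does not depend
// on the encoding this source file is saved in.
$utf8 = "\xC3\xBC\xC3\xA4\xC3\xB6\xC3\x9F";

// Treat those bytes as Windows-1252 and re-encode them as UTF-8:
// every original byte becomes its own character.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
echo $misread, "\n"; // prints Ã¼Ã¤Ã¶ÃŸ

// Undoing the mistaken step recovers the original bytes.
$repaired = mb_convert_encoding($misread, 'Windows-1252', 'UTF-8');
echo $repaired, "\n"; // prints üäöß
```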

A database may handle special characters without supporting Unicode at all (UTF-8 is one encoding of Unicode, specifically a variable-length one).
A character set is a mapping between numbers and characters. Unicode and ASCII are common examples of charsets. Unicode states that the sign € maps to the number 8364 (written as the code point U+20AC). UTF-8 is a way to encode Unicode code points, and represents U+20AC with three bytes: 0xE2 0x82 0xAC. UTF-16 is another encoding for Unicode code points; it uses two bytes for this character (0x20 0xAC in big-endian order) and four bytes for characters outside the Basic Multilingual Plane. Both of these encodings refer to the same 8364th entry in the Unicode catalogue.
ASCII is both a charset and an encoding scheme: the ASCII character set maps the numbers 0 to 127 to 128 characters, and the ASCII encoding requires a single byte per character.
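The distinction between a code point and its encodings can be checked directly. This is a small sketch, assuming the mbstring extension and PHP 7.2 or later (for mb_ord); the variable names are only illustrative.

```php
<?php
// The euro sign as UTF-8 bytes.
$euro = "\xE2\x82\xAC";

$codePoint = mb_ord($euro, 'UTF-8');                                   // 8364, i.e. U+20AC
$utf8Hex   = bin2hex($euro);                                           // three bytes
$utf16Hex  = bin2hex(mb_convert_encoding($euro, 'UTF-16BE', 'UTF-8')); // two bytes

// A character outside the Basic Multilingual Plane (U+1F600) needs
// four bytes in UTF-16 (a surrogate pair).
$emoji      = "\xF0\x9F\x98\x80";
$emojiUtf16 = bin2hex(mb_convert_encoding($emoji, 'UTF-16BE', 'UTF-8'));

printf("U+%04X utf8=%s utf16=%s\n", $codePoint, $utf8Hex, $utf16Hex);
printf("U+1F600 utf16=%s\n", $emojiUtf16);
```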
Always remember that a string is a human concept. A computer represents it as the tuple (byte_content, encoding). Let's say you want to store Unicode strings in your database. Please note: you don't need Unicode if you only have to support German users. It becomes useful when you want to store Arabic, Chinese, Hebrew and German at the same time in the same column. MS SQL Server uses UCS-2 to encode Unicode, and this holds for columns declared NCHAR or NVARCHAR (note the N prefix). So your first action should be to check whether the target columns are actually NVARCHAR (or NCHAR).
Then, let's assume that all input strings are UTF-8 encoded in your PHP script. You want to execute something like
$stmt->bindParam(':text', $utf8_encoded_text);
According to the documentation, UTF-8 is the default string encoding. I hope it's smart enough to work with NVARCHAR, otherwise you may need to use the extra options.
Your colleague's solution doesn't store Unicode strings: it converts the text into the ISO-8859-1 space and then saves the bytes in plain CHAR or VARCHAR columns. The difference is that you won't be able to store characters outside of the ISO-8859-1 space (e.g. Polish ones).
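The limit of the ISO-8859-1 space can be seen with a few single characters. The sketch below assumes the mbstring extension; it uses mb_convert_encoding rather than iconv only because mbstring substitutes a predictable "?" for characters the target charset lacks. ISO-8859-2 is included purely for comparison.

```php
<?php
// Make characters that the target charset cannot represent show up as "?".
mb_substitute_character(0x3F);

$samples = [
    'u-umlaut (German)'       => "\xC3\xBC", // U+00FC
    'a-ogonek (Polish)'       => "\xC4\x85", // U+0105
    'o-double-acute (Hung.)'  => "\xC5\x91", // U+0151
];

$latin1 = [];
$latin2 = [];
foreach ($samples as $name => $utf8) {
    $latin1[$name] = bin2hex(mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'));
    $latin2[$name] = bin2hex(mb_convert_encoding($utf8, 'ISO-8859-2', 'UTF-8'));
    // 3f is "?", meaning the character was lost in that charset.
    printf("%-24s latin1=%s latin2=%s\n", $name, $latin1[$name], $latin2[$name]);
}
```

Only the German character survives the trip through ISO-8859-1; the Polish and Hungarian ones need a charset that contains them, or a Unicode column.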

Take a look at this article on "Handling Unicode Front to Back in a Web App". By far one of the best articles I've seen on the subject. If you follow the guide and the issues are still present, then you know for sure that it's not your fault.

About saving Unicode text into mssql database

I have a Flash application that communicates with PHP to save data to an nvarchar(1200) column. However, when I switch to a different language (i.e. locale) and type into the Flash app, the letters look fine there but are saved in the database as question marks instead of the real letters.
How can I solve this problem?
How can I save the real letters in the database?

Your database may not be configured to use UTF-8 encoding. SQL Server 7.0 and SQL Server 2000 use a different Unicode encoding (UCS-2) and do not recognize UTF-8 as valid character data.
Other versions of SQL Server may behave similarly.
See this for more information: http://support.microsoft.com/kb/232580
If that's not the issue, backtrack to PHP and test the encoding type on the data you are receiving. Make sure it matches what needs to be in your DB, or convert it first.
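The "test the encoding, then convert" step could look roughly like this. It is a sketch assuming the mbstring extension; the helper name is hypothetical, and whether an explicit conversion to UCS-2 is needed at all depends on the driver in use, which the question does not specify.

```php
<?php
// Hypothetical helper: report whether the received bytes are valid UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

$received = "\xD0\xB6";          // Cyrillic "zhe" (U+0436) as UTF-8
$broken   = "\xE6";              // a lone single-byte-charset byte, not UTF-8

$okReceived = is_valid_utf8($received);
$okBroken   = is_valid_utf8($broken);

// If the data layer expects UCS-2 (little-endian), convert explicitly.
$ucs2Hex = bin2hex(mb_convert_encoding($received, 'UCS-2LE', 'UTF-8'));

var_dump($okReceived, $okBroken, $ucs2Hex);
```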

Error converting client characters into server's character set?

I have designed a web form, using PHP as the server-side script, that should insert data into a Sybase ASE database table using the ODBC functions.
When I fill the form fields with English words and ASCII characters it works fine and saves the data in the database, but when I use Arabic and extended ASCII, UTF, or Unicode characters, I get the following error message:
Warning: odbc_exec() [function.odbc-exec]: SQL error: [INTERSOLV][ODBC SQL Server driver][SQL Server]Error converting client characters into server's character set. Some character(s) could not be converted. , SQL state S1000 in SQLExecDirect in C:\wamp\www\website1\webpage1.php on line 111
Is this because the settings on my database (or server) are incorrect? If so, what should I change, and how do I change it?
Or do I need some function(s) to convert the extended ASCII characters? If so, have the necessary functions already been written, and where can I find them?
How can I solve this problem?
Thanks for your help.

I discovered the solution afterwards: I was not saving the PHP file in the correct encoding. Using Save As and choosing the right encoding fixed it.
I am sharing this in case it helps others who have the same problem.
Sometimes we face problems that seem very difficult, but whose solutions turn out to be very simple.

Problem with regional characters (polish) in php application and ms sql server

I have a problem with regional characters in strings inserted into an MS SQL Server database.
There is a PHP application that connects to the MSSQL server and inserts some data. But instead of characters such as ą, Ą, ć, Ł, ź (and so on; these are Polish regional characters), the MSSQL table ends up with a, A, c, L, z.
Here is some background:
I use FreeTDS drivers for MSSQL connectivity
column type is nvarchar(MAX)
I looked at the data sent "through the wire" (with Wireshark) and the UTF-8 encoded data looks OK, for example Ą is sent as U+0104.
When inserting the same string into a local database instance (Microsoft SQL Server Express Edition) it works fine, but on the remote host (at the customer's location, Microsoft SQL Server Standard Edition 64-bit) this "de-regionalization" occurs.
It seems that the remote MSSQL server doesn't handle the input data sent by the PHP application correctly. Can anyone see what might be wrong here?

Maybe a rookie mistake, but I found the source of this problem according to KB #239530:
"When SQL Server converts a Unicode string without the N prefix from Unicode to the SQL Server database's code page, any characters in the Unicode string that do not exist in the SQL Server code page will be lost."
I suppose the local SQL Server instance somehow treats all incoming string literals as Unicode strings and the remote server doesn't, so the remote server requires the N prefix.
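For illustration, here is a sketch of building a string literal with the N prefix from PHP. The helper name and the customers table are hypothetical, and the quoting only doubles single quotes; where the driver supports it, bound parameters are generally preferable to hand-built literals.

```php
<?php
// Hypothetical helper: wrap a value as a SQL Server Unicode literal, N'...'.
// Single quotes inside the value are doubled, as T-SQL requires.
function mssql_unicode_literal(string $value): string
{
    return "N'" . str_replace("'", "''", $value) . "'";
}

$name = "\xC5\x81\xC3\xB3d\xC5\xBA"; // "Łódź" as UTF-8 bytes

// Hypothetical table and column, only to show where the literal goes.
$sql = 'INSERT INTO customers (name) VALUES (' . mssql_unicode_literal($name) . ')';

echo $sql, "\n";
echo mssql_unicode_literal("O'Neil"), "\n"; // prints N'O''Neil'
```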

As a workaround, you can apply the htmlentities function before inserting into the database and html_entity_decode after retrieving the data. You can use the ISO-8859-2 or Windows-1250 charset.
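A variant of the same round-trip idea is sketched below. Note the substitution: it uses mb_encode_numericentity and mb_decode_numericentity (mbstring extension assumed) instead of htmlentities, because htmlentities emits only named entities and, with its default entity table, may leave letters such as ą unchanged. The stored value is pure ASCII, so it survives any code page, at the cost of unreadable data in the table.

```php
<?php
// Conversion map: encode every code point from U+0080 upwards.
// Format is [start, end, offset, mask].
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$original = "\xC5\x81\xC3\xB3d\xC5\xBA"; // "Łódź" as UTF-8 bytes

// Before INSERT: non-ASCII characters become numeric entities.
$stored = mb_encode_numericentity($original, $map, 'UTF-8');
echo $stored, "\n"; // prints &#321;&#243;d&#378;

// After SELECT: turn the entities back into UTF-8 text.
$restored = mb_decode_numericentity($stored, $map, 'UTF-8');
```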

Php/ODBC encoding problem

I use ODBC to connect to SQL Server from PHP.
In PHP I read some string data (nvarchar column) from SQL Server and then want to insert it into a MySQL database. When I try to insert such a value into the MySQL table I get this MySQL error:
Incorrect string value: '\xB3\xB9ow...' for column 'name' at row 1
For strings with only ASCII characters everything is fine; the problem occurs when non-ASCII characters (from some European languages) are present.
So, in more general terms: there is a Unicode string in an MS SQL Server database, which is retrieved by PHP through ODBC. It is then put into an SQL INSERT query (as the value for a utf-8 varchar column), which is executed against the MySQL database.
Can someone explain to me what is happening here in terms of encoding? At which steps may character encoding conversions take place?
I use: PHP 5.2.5, MySQL 5.0.45-community-nt, MS SQL Server 2005.
PHP has to run on a Linux platform.
UPDATE: The error doesn't occur when I call utf8_encode($s) on this string and use that value in the MySQL INSERT query, but then the inserted string doesn't display correctly in the MySQL database (so utf8_encode only produced a valid UTF-8 string; the correct characters were lost).

First you have the encoding of the DB. Then you have the encoding used by the ODBC client.
If the encoding of your ODBC client connection does not match that of the DB, the ODBC layer will, in some cases, automatically transcode your data.
The trick here is to force the encoding of the ODBC client connection.
For an "all UTF-8" setup:
$conn=odbc_connect(DB_DSN,DB_USR,DB_PWD);
odbc_exec($conn, "SET NAMES 'UTF8'");
odbc_exec($conn, "SET client_encoding='UTF-8'");
// processing here
This works perfectly with PostgreSQL + PHP 5.x.
The exact syntax and options depend on the DB vendor.
You can find very useful and clear additional info for MySQL here: http://dev.mysql.com/doc/refman/5.0/fr/charset-connection.html
Hope this helps.

Maybe you can use the PDO extension, in case it makes any difference?
There is a user-contributed comment here that suggests changing the data types in SQL Server to something else; if that is not possible, look at that user's class that casts fields.

I have no experience with ODBC via PHP, but with the mysql functions PHP seems to default to ASCII, and UTF-8 connections need to be made explicit if you want to avoid trouble.
Are you sure PHP and the MySQL server are communicating in UTF-8? Until PHP 6, Unicode support tends to be annoyingly inconsistent like that.
I remember that the MySQL docs mention a connection string parameter to tweak the Unicode encoding.
From your description it sounds like PHP is treating the connection as ASCII-only.
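The bytes quoted in the MySQL error (\xB3\xB9) and the effect of utf8_encode described in the question's update can be examined in isolation. The sketch below assumes the mbstring and iconv extensions. The guess that the source bytes are Windows-1250 is an assumption inferred from the byte values, not something the question states; mb_convert_encoding from ISO-8859-1 is used as the equivalent of utf8_encode.

```php
<?php
// The bytes MySQL rejected, as they appear in the error message.
$raw = "\xB3\xB9ow";

// They are not valid UTF-8, which is why MySQL complains.
$isUtf8 = mb_check_encoding($raw, 'UTF-8');

// utf8_encode() assumes ISO-8859-1 input. The result is valid UTF-8,
// but it spells "superscript 3, superscript 1", not letters.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// ASSUMPTION: the ODBC layer delivered Windows-1250 (CP1250) bytes.
// Under that assumption the same bytes are the Polish letters l-stroke
// and a-ogonek.
$asCp1250 = iconv('CP1250', 'UTF-8', $raw);

var_dump($isUtf8, bin2hex($asLatin1), bin2hex($asCp1250));
```

If the assumption holds, converting from the actual source code page instead of calling utf8_encode would give MySQL valid UTF-8 with the right characters.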
