Encoding problem when connecting to SQL Server database via odbc_connect() - php

Firstly, I don't have the option to use the latest SQLSRV drivers on my host, so I am stuck with an ODBC connection.
$connection_string = 'DRIVER={SQL Server};SERVER=111.111.111.111;DATABASE=MY_DATABASE';
$user = 'name';
$pass = 'pass';
$connection = odbc_connect( $connection_string, $user, $pass, SQL_CUR_USE_ODBC );
The collation of that database is Slovak_CI_AI. If I set my PHP header to utf-8, the output data looks garbled; the encoding is wrong.
If I put 'Slovak_CI_AI' as the charset in my PHP header, the data displays fine, but that is probably a no-go, because I need to work with the data in WordPress, which fails to process it when it contains special/non-English characters (those strings look broken to WP).
I've tried many conversions with mb_convert_encoding, iconv and utf8_decode, but no luck. WordPress uses utf-8.
I can't find any solution for this.
Update: I've tried adding CHARSET=UTF8 to my ODBC connection string, but no luck. I also found out that the character set of the text in the database is cp1250. Setting cp1250 as the charset in my PHP header makes the output fine, but WordPress still fails once it encounters a special character. Converting those strings from cp1250 to utf-8 with iconv didn't help either: the strings had the wrong encoding on output, and WordPress failed as well.

This whole encoding thing still feels chaotic to me, but I somehow managed to get it working (see the sketch after this list). It works when:
the ODBC connection string contains charset=cp1250
the PHP header character set is set to utf-8
I convert all problematic strings from cp1250 to utf-8 with iconv
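A minimal sketch of that combination; the server, credentials, query and column name are placeholders carried over from the question, not a verified setup:
header('Content-Type: text/html; charset=utf-8');
// charset=cp1250 appended to the DSN from the question.
$connection_string = 'DRIVER={SQL Server};SERVER=111.111.111.111;DATABASE=MY_DATABASE;charset=cp1250';
$connection = odbc_connect($connection_string, 'name', 'pass', SQL_CUR_USE_ODBC);
$result = odbc_exec($connection, 'SELECT title FROM some_table'); // hypothetical query
while (odbc_fetch_row($result)) {
    // Re-encode each cp1250 string to UTF-8 before handing it to WordPress.
    echo iconv('CP1250', 'UTF-8//TRANSLIT', odbc_result($result, 'title'));
}
odbc_close($connection);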

Related

How has this mysql string been encoded and how can I replicate it?

Here are the hex values of two strings stored in a MySQL database using two different methods.
20C3AFC2BBC2BFC3A0C2A4E280A2C3A0C2A4C2BEC3A0C2A4C5A1C3A0C2A4E2809A20C3A0C2A4C2B6C3A0C2A4E280A2C3A0C2A5C28DC3A0C2A4C2A8C3A0C2A5E280B9C3A0C2A4C2AEC3A0C2A5C28DC3A0C2A4C2AFC3A0C2A4C2A4C3A0C2A5C28DC3A0C2A4C2A4C3A0C2A5C281C3A0C2A4C2AEC3A0C2A5C28D20C3A0C2A5C2A420C3A0C2A4C2A8C3A0C2A5E280B9C3A0C2A4C2AAC3A0C2A4C2B9C3A0C2A4C2BFC3A0C2A4C2A8C3A0C2A4C2B8C3A0C2A5C28DC3A0C2A4C2A4C3A0C2A4C2BF20C3A0C2A4C2AEC3A0C2A4C2BEC3A0C2A4C2AEC3A0C2A5C28D20C3A0C2A5C2A5
and
E0A495E0A4BEE0A49AE0A48220E0A4B6E0A495E0A58DE0A4A8E0A58BE0A4AEE0A58DE0A4AFE0A4A4E0A58DE0A4A4E0A581E0A4AEE0A58D20E0A5A420E0A4A8E0A58BE0A4AAE0A4B9E0A4BFE0A4A8E0A4B8E0A58DE0A4A4E0A4BF20E0A4AEE0A4BEE0A4AEE0A58D20E0A5A5
They represent the string काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥. The former appears to be encoded badly but works in the application; the latter appears to be encoded correctly but does not. I need to be able to create the first hex string from the input.
Here comes the long version: I've got a legacy application built in PHP/MySQL. The database connection charset is latin1. The charset of the table is utf8 (don't ask). The input is coerced into being correct utf8 via the ForceUTF8 composer library. Looking directly in the database, the stored value of this string is काचं शकà¥à¤¨à¥‹à¤®à¥à¤¯à¤¤à¥à¤¤à¥à¤®à¥ । नोपहिनसà¥à¤¤à¤¿ मामॠ॥
I am aware that this looks horrendous and appears to me to be badly encoded, however it is out of scope to fix the legacy application. The rest of the application is able to cope with this data as it is and everything else works and displays perfectly well with it.
I have created an external Node application to replace the current insert routine running on Azure. I've set the connection charset to latin1; it connects to the same database and runs the same insert statement. The only part of the puzzle I've not been able to replicate is the ForceUTF8 library, as I could find no equivalent in the npm ecosystem. When the same string is inserted, it renders perfectly when looking at the raw field in PhpStorm, i.e. it looks exactly like the original text above, and the hex value of the string is the latter of the two presented at the top of the question. However, when viewed in the application, the values are corrupted by question marks and black diamonds.
If, within the PHP application, I run SET NAMES utf8 ahead of the rendering data query then the node-inserted values render correctly, and the legacy ones now display as corrupted. Adding set names utf8 to the application for this query is not an acceptable solution since it breaks the appearance of the legacy data, and fixing the legacy data is also not an acceptable solution.
I have tried all sorts of connection charsets and various Iconv functions to make the data exactly match how the legacy app makes it but have not been able to "break it" in exactly the same way.
How can I make "काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥" into a string, the hex value of which is "20C3AFC2BBC2BFC3A0C2A4E280A2C3A0C2A4C2BEC3A0C2A4C5A1C3A0C2A4E2809A20C3A0C2A4C2B6C3A0C2A4E280A2C3A0C2A5C28DC3A0C2A4C2A8C3A0C2A5E280B9C3A0C2A4C2AEC3A0C2A5C28DC3A0C2A4C2AFC3A0C2A4C2A4C3A0C2A5C28DC3A0C2A4C2A4C3A0C2A5C281C3A0C2A4C2AEC3A0C2A5C28D20C3A0C2A5C2A420C3A0C2A4C2A8C3A0C2A5E280B9C3A0C2A4C2AAC3A0C2A4C2B9C3A0C2A4C2BFC3A0C2A4C2A8C3A0C2A4C2B8C3A0C2A5C28DC3A0C2A4C2A4C3A0C2A4C2BF20C3A0C2A4C2AEC3A0C2A4C2BEC3A0C2A4C2AEC3A0C2A5C28D20C3A0C2A5C2A5" using some variation of database connection charset and string conversion?
I'm not familiar with PHP, but I was able to generate the "horrendous" encoding with Python (and it is horrendous... I'm not sure how someone generated this intentionally). Hopefully this guides you to a solution:
import re
expected = '20C3AFC2BBC2BFC3A0C2A4E280A2C3A0C2A4C2BEC3A0C2A4C5A1C3A0C2A4E2809A20C3A0C2A4C2B6C3A0C2A4E280A2C3A0C2A5C28DC3A0C2A4C2A8C3A0C2A5E280B9C3A0C2A4C2AEC3A0C2A5C28DC3A0C2A4C2AFC3A0C2A4C2A4C3A0C2A5C28DC3A0C2A4C2A4C3A0C2A5C281C3A0C2A4C2AEC3A0C2A5C28D20C3A0C2A5C2A420C3A0C2A4C2A8C3A0C2A5E280B9C3A0C2A4C2AAC3A0C2A4C2B9C3A0C2A4C2BFC3A0C2A4C2A8C3A0C2A4C2B8C3A0C2A5C28DC3A0C2A4C2A4C3A0C2A4C2BF20C3A0C2A4C2AEC3A0C2A4C2BEC3A0C2A4C2AEC3A0C2A5C28D20C3A0C2A5C2A5'
original = 'काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥'
# Encode in UTF-8 w/ BOM (U+FEFF encoded in UTF-8 as a signature)
step1 = original.encode('utf-8-sig')
# Windows-1252 doesn't define some byte -> codepoint mappings and Python normally
# raises an error on those bytes. Use an error handler to keep the bytes that
# fail, then replace the escape codes with the matching Unicode codepoint.
step2 = step1.decode('cp1252', errors='backslashreplace')
step3 = re.sub(r'\\x([0-9a-f]{2})', lambda m: chr(int(m.group(1), 16)), step2)
# There is an extra space before the UTF-8-encoded BOM for some reason
step4 = ' ' + step3
step5 = step4.encode('utf8')
# Format to match expected string
final = step5.hex().upper()
print(final == expected) # True
HEX('काचं') = 'E0A495E0A4BEE0A49AE0A482' -- correctly stored utf8mb4
HEX(CONVERT(CONVERT(BINARY('काचं') USING latin1) USING utf8mb4)) = 'C3A0C2A4E280A2C3A0C2A4C2BEC3A0C2A4C5A1C3A0C2A4E2809A' -- the double-encoded form
See "double-encoding" in Trouble with UTF-8 characters; what I see is not what I stored
"Double-encoding", as I understand it, is where utf8 bytes (up to 4 bytes per "character") are treated as latin1 (or cpnnnn) and converted to utf8, and then that happens a second time. In this case, each 3-byte Devanagari is converted twice, leading to between 6 and 9 bytes.
You explained the cause here:
The database connection charset is latin1. The charset of the table is utf8
BOM is, in my opinion, a red herring. It was intended to be a useful clue that a "text" file was encoded in UTF-8, but unfortunately, very few products generate it. Hence, BOM is more of a distraction than a help. (I don't think MySQL has any way to take care of BOM -- after all, most database activity is at the row level, not the file level.)
The solution (for the data flow) in MySQL context is to rip out all "conversion" functions and, instead, configure things so that MySQL will convert at the appropriate places. Your mention of "latin1" was the main "mis-configuration".
The long expression (HEX...) gives a clue of how to fix the data, but it must be coordinated with changes to configuration and changes to code.

I get an Ansi string instead of Utf-8 from Utf-8 mysql table [duplicate]

This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 4 years ago.
When I moved from PHP/MySQL shared hosting to my own VPS, I found that the code which outputs user names in UTF-8 from the MySQL database outputs ?�??????� instead of 鬼神❗. My page uses utf-8 encoding, I have default_charset = "UTF-8" in php.ini and header('Content-Type: text/html; charset=utf-8'); in my PHP file, as well as <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> in the HTML part of it.
My database has collation utf8_bin, and the table has the same. On both the previous and the current hosting, phpMyAdmin shows 鬼神❗ for this database record. When I create an ANSI text file in Notepad++, paste 鬼神❗ into it and select Encoding->Encode in UTF-8, I see 鬼神❗, so I suppose it is a correctly encoded UTF-8 string.
Ok, and then I added
init_connect='SET collation_connection = utf8_general_bin'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_bin
skip-character-set-client-handshake
to my.cnf, and now my page shows 鬼神❗ instead of ?�??????�. This is the same output I get in phpMyAdmin on both hostings, so I'm on the right track. And still, somehow, on my old hosting the same PHP script returns a utf-8 web page with the name 鬼神❗, while on the new hosting it shows 鬼神❗. It looks like the string is utf-8 encoded twice: I get a utf-8 string, give it to Notepad++ as an ANSI string, and it encodes it into the correct utf-8 string.
However, when I try utf8_encode() I get й¬ÑзÒÑвÑâ, and utf8_decode() returns ?�???????. mb_convert_encoding($name,"UTF-8","ISO-8859-1"); and iconv("ISO-8859-1","UTF-8", $name); return the same results.
So how could I reproduce the same conversion Notepad++ does?
See answer below.
The solution was simple, yet not obvious to me, as I never saw my.cnf on that shared hosting: it seems that the server had settings as follows
init_connect='SET collation_connection = cp1252'
init_connect='SET NAMES cp1252'
character-set-server=cp1252
So, to do no harm to other code on my new server, I have to place mysql_query("SET NAMES CP1252"); at the top of each PHP script that works with utf8 strings; a sketch follows below.
The trick here is that the script gets the string as is (ANSI) and outputs it, and because the browser is told the page is in utf-8 encoding, it simply renders my strings as utf-8.
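For completeness, a minimal sketch of that per-script fix, using the legacy mysql_* extension the answer itself uses (removed in PHP 7); the connection details and query are placeholders:
$link = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('mydb', $link);
// Match the old host's connection charset so the bytes pass through unchanged.
mysql_query("SET NAMES CP1252", $link);
$res = mysql_query("SELECT name FROM users", $link);
while ($row = mysql_fetch_assoc($res)) {
    echo $row['name']; // the utf-8 page header makes the browser decode the raw bytes
}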

Stuck writing UTF-8 file via PHP's fwrite

I can't figure out what I'm doing wrong. I'm getting file content from the database. When I echo the content, everything displays just fine; when I write it to a file (.html), it breaks. I've tried iconv and a few other solutions, but I just don't understand what I should put for the first parameter; I've tried blanks, and that didn't work very well either. I assume it's coming out of the DB as UTF-8, since it echoes properly. I've been stuck a little while now without much luck.
function file($fileName, $content) {
    if (!file_exists("out/".$fileName)) {
        $file_handle = fopen(DOCROOT . "out/".$fileName, "wb") or die("can't open file");
        fwrite($file_handle, iconv('UTF-8', 'UTF-8', $content));
        fclose($file_handle);
        return TRUE;
    } else {
        return FALSE;
    }
}
The source of the HTML file shows the problem. The content comes out of the DB like this:
<h5>Текущая стабильная версия CMS</h5>
and goes into the file like this:
<h5>Ð¢ÐµÐºÑƒÑ‰Ð°Ñ ÑÑ‚Ð°Ð±Ð¸Ð»ÑŒÐ½Ð°Ñ Ð²ÐµÑ€ÑÐ¸Ñ CMS</h5>
EDIT:
Turns out the root of the problem was Apache serving the files incorrectly. Adding
AddDefaultCharset utf-8
To my .htaccess file fixed it. Hours wasted... At least I learned something though.
Edit: The database encoding does not seem to be the issue here, so this part of the answer is retained for information only
I assume it's coming out of the DB as UTF-8
This is most likely your problem. What database type do you use? Have you set the character encoding and collation details for the database, the table, the connection and the transfer?
If I were to hazard a guess, I would say your table is MySQL, and that your MySQL collation for the database / table / column should all be utf8_general_ci?
However, for some reason MySQL's utf8 is not actually full UTF-8: it stores its data in up to 3 bytes rather than 4 bytes per character, so it cannot store the whole UTF-8 character set; see UTF-8 all the way through.
So you need to go through every table and column in your MySQL and change it from utf8_ to utf8mb4_ (note: available since MySQL 5.5.3), the 4-byte multibyte UTF-8, which covers the whole UTF-8 spectrum of characters.
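As a sketch (the table name and credentials are hypothetical), converting one table from PHP could look like this:
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
// Converts the table default and every text column to 4-byte UTF-8.
$mysqli->query('ALTER TABLE articles CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci');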
Also if you do any PHP work on the data strings be aware you should be using mb_ PHP functions for multibyte encodings.
And finally, you need to specify a connection character set for the database; don't run with the default one, as it will almost certainly not be utf8mb4, and hence you can have the correct data in the database, but that data gets repackaged as 3-byte utf8 before being treated as 4-byte UTF-8 by PHP at the other end.
Hope this helps, and if your DB is not MySQL, let us know what it is!
Edit:
function file($fileName, $content) {
    if (!file_exists("out/".$fileName)) {
        $file_handle = fopen(DOCROOT . "out/".$fileName, "wb") or die("can't open file");
        fwrite($file_handle, iconv('UTF-8', 'UTF-8', $content));
        fclose($file_handle);
        return TRUE;
    } else {
        return FALSE;
    }
}
Your file_exists check looks at "out/".$fileName while fopen opens DOCROOT . "out/".$fileName; those are two different paths, so the guard does not test the file you actually open, and the fopen only runs when the (wrong) path does not exist. A corrected sketch follows below.
Your iconv is worthless here, turning from "utf-8" to, er, "utf-8". Character detection is extremely haphazard and hard for programs to do correctly, so it's generally advised not to try to work out / guess what a character encoding is; you need to know what it is and tell the function what it is.
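A corrected sketch of the question's function; note it is renamed, since file() is a PHP built-in and cannot be redeclared, and DOCROOT is assumed to be defined elsewhere:
function write_html_file($fileName, $content) {
    $path = DOCROOT . "out/" . $fileName; // check and write the same path
    if (file_exists($path)) {
        return FALSE;
    }
    $file_handle = fopen($path, "wb") or die("can't open file");
    fwrite($file_handle, $content); // content is already UTF-8; no conversion needed
    fclose($file_handle);
    return TRUE;
}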
The comment by Dean is actually very important. The HTML should have a <meta charset="UTF-8"> inside <head>.
That iconv call is actually not useful and, if you are right that you are getting your content as UTF-8, it is not necessary.
You should check the character set of your database connection. Your database can be encoded in UTF-8 while the connection uses another character set; a quick way to check and fix it is sketched below.
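For example, with mysqli the connection charset can be inspected and set explicitly (a generic sketch, not code from the question):
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb'); // placeholders
echo $mysqli->character_set_name(); // what the connection currently uses
$mysqli->set_charset('utf8mb4');    // force the connection to full UTF-8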
Good luck!

Mysql insert text data truncated by weird character encodings

I'm importing data from a CSV file that comes from Excel, but I can't seem to insert my data correctly. The data contains French accented characters, and if I open the CSV with OpenOffice (I don't use Excel), I just select UTF-8 and the data gets converted and shown fine.
If I read it into PHP memory, mb_detect_encoding tells me they are UTF-8 encoded strings. I connect to a database and specify all UTF-8 charsets using:
mysql_query('SET character_set_results = "utf8", character_set_client = "utf8", character_set_connection = "utf8", character_set_database = "utf8", character_set_server = "utf8"');
And I can certify that my database contains UTF-8-only fields and tables.
What happens is that my content gets truncated at the first accented character, but that seems to happen only in my PHP script. I output all my data to the browser, and if I copy the INSERT statement and run it by hand, it inserts the whole data.
There might be something going on between PHP and the browser output, but I can certify that it's not in the programming of the script... Thus far, I was able to circumvent this issue by HTMLENTITY'ing all my data, but the problem is that my search engine is going coo-coo-crazy because of that...
Any reason or workaround you can spare would be really appreciated...
EDIT #1:
I searched for the default Excel encoding of CSV data and found out it was CP1252. I tried iconv('CP1252', 'UTF-8//TRANSLIT', $data) and now the accented characters seem to fit. I'm going to try it everywhere in my script to see if all my accented-character issues are fixed, and I'll post the solution if so...
After countless tries, I was able to fix all my encoding problems, though some of them I still can't explain. I hope this will help someone else later:
function fixEncoding($data){
    // Excel saves CSV data as CP1252, not UTF-8, so recode it on the way in.
    return iconv('CP1252', 'UTF-8//TRANSLIT', $data);
}
I now use this function to recode my strings correctly. It seems that Excel saves data as CP1252 and NOT utf-8.
Furthermore, it seems there is a bug with accented characters at the start of a string in a CSV if you use fgetcsv, so I had to forego fgetcsv and write an alternative, because I'm not on PHP 5.3. Maybe str_getcsv could have fixed my issue; I'm not sure, but in my case it couldn't, because I don't have the function. I even tried looking for ports, and nothing seems to exist and work correctly.
This is my solution; although very ugly, it works for me:
function fgetcsv2($filepointer, $maxlen, $sep, $enc){
    $data = fgets($filepointer, $maxlen);
    if($data === false){
        return false;
    }
    // Naive split: unlike fgetcsv, this ignores quoting and escaping; $enc is unused.
    $data = explode($sep, $data);
    return $data;
}
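On PHP 5.3+, the same loop can be written with str_getcsv, which does handle quoted fields; a sketch under those assumptions (the file name and separator are guesses):
$fp = fopen('import.csv', 'rb'); // placeholder file name
while (($line = fgets($fp)) !== false) {
    $row = str_getcsv($line, ';'); // ';' is a guess at the separator
    // Recode each field from CP1252, as with fixEncoding() above.
    $row = array_map(function ($v) {
        return iconv('CP1252', 'UTF-8//TRANSLIT', $v);
    }, $row);
    // ... build the INSERT from $row ...
}
fclose($fp);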
Good luck to all who run into similar problems.
I also had to work on such a project, and, seriously, PHPExcel was my savior; it spared me a lot of headaches.
P.S.: there is also this link to help you get started (in French).
I just had a similar problem: although I tested the $value using mb_detect_encoding and it said it was UTF-8, the data still got truncated.
Not knowing what to convert from, I couldn't use the iconv function mentioned above.
However, I forced it to UTF-8 using utf8_encode($value), and everything works fine now.
Which encoding are you using for your tables?
mb_detect_encoding is not 100% correct all the time, and no encoding detector can ever be.

Dealing with eacute and other special characters using Oracle, PHP and Oci8

Hi, I am trying to store names in an Oracle database and fetch them back using PHP and oci8.
However, if I insert the é directly into the Oracle database and use oci8 to fetch it back, I just receive an e.
Do I have to encode all special characters (including é) into HTML entities (i.e. &eacute;) before inserting them into the database, or am I missing something?
Thanks
UPDATE: Mar 1 at 18:40
found this function:
http://www.php.net/manual/en/function.utf8-decode.php#85034
function charset_decode_utf_8($string) {
    // Note: ereg and the preg /e modifier used here are long deprecated (removed in PHP 7).
    if (!ereg("[\200-\237]", $string) && !ereg("[\241-\377]", $string)) {
        return $string;
    }
    $string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e", "'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'", $string);
    $string = preg_replace("/([\300-\337])([\200-\277])/e", "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'", $string);
    return $string;
}
It seems to work, although I'm not sure if it's the optimal solution.
UPDATE: Mar 8 at 15:45
Oracle's character set is ISO-8859-1.
in PHP I added:
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1");
to force the oci8 connection to use that character set.
Retrieving the é using oci8 from PHP now works! (for VARCHARs; for CLOBs I had to utf8_encode the extracted value)
So then I tried saving data from PHP to Oracle... and it doesn't work: somewhere along the way from PHP to Oracle, the é becomes a ?
UPDATE: Mar 9 at 14:47
So getting closer.
After adding the NLS_LANG variable, doing direct oci8 inserts with é works.
The problem is actually on the PHP side.
The ExtJS framework encodes submitted forms using encodeURIComponent.
So é is sent as %C3%A9 and then decoded back into the two-byte UTF-8 é.
However, its length is now 2 (strlen($my_sent_value) == 2), not 1.
And if in PHP I try $my_sent_value == 'é', I get FALSE.
I think that if I can re-encode all these characters in PHP back into single-byte form and then insert them into Oracle, it should work; see the sketch below.
Still no luck though.
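What is being described is essentially what utf8_decode does: it maps two-byte UTF-8 sequences in the Latin-1 range back to single bytes. A tiny illustration (mine, not the poster's; byte literals keep it independent of the source file's encoding):
$my_sent_value = "\xC3\xA9";           // "é" as two UTF-8 bytes, as sent after encodeURIComponent
$single = utf8_decode($my_sent_value); // collapses to the single ISO-8859-1 byte 0xE9
var_dump(strlen($single));             // int(1)
var_dump($single === "\xE9");          // bool(true)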
UPDATE: Mar 10 at 11:05
I keep thinking I am so close (yet so far away).
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9"); works very sporadicly.
I created a small php script to test:
header('Content-Type: text/plain; charset=ISO-8859-1');
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9");
$conn= oci_connect("user", "pass", "DB");
$stmt = oci_parse($conn, "UPDATE temp_tb SET string_field = '|é|'");
oci_execute($stmt, OCI_COMMIT_ON_SUCCESS);
After running this once and logging into the Oracle database directly, I see that STRING_FIELD is set to |¿|. Obviously not what I had come to expect from my previous experience.
However, if I refresh that PHP page twice quickly.... it worked !!!
In Oracle I correctly saw |é|.
It seems like maybe the environment variable is not being correctly set or sent in time for the first execution of the script, but is available for the second execution.
My next experiment is to export the variable into PHP's environment; however, I need to restart Apache for that... so we'll see what happens, hopefully it works.
I presume you are aware of these facts:
There are many different character sets: you have to pick one and, of course, know which one you are using.
Oracle is perfectly capable of storing text without HTML entities (&eacute;). HTML entities are used in, well, HTML. Oracle is not a web browser ;-)
You must also know that HTML entities are not bound to a specific charset; on the contrary, they're used to represent characters in a charset-independent context.
You talk indistinctly about ISO-8859-1 and UTF-8. Which charset do you want to use? ISO-8859-1 is easy to use, but it can only store text in some Latin languages (such as Spanish), and it lacks some common chars like the € symbol. UTF-8 is trickier to use, but it can store all characters defined by the Unicode consortium (which include everything you'll ever need).
Once you've taken the decision, you must configure Oracle to hold data in such charset and choose an appropriate column type. E.g., VARCHAR2 is fine for plain ASCII, NVARCHAR2 is good for UTF-8.
This is what I finally ended up doing to solve this problem:
Modified the profile of the daemon running PHP to have:
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
So that the oci8 connection uses ISO-8859-1.
Then in my PHP configuration set the default content-type to ISO-8859-1:
default_charset = "iso-8859-1"
When I am inserting into an Oracle Table via oci8 from PHP, I do:
utf8_decode($my_sent_value)
And when receiving data from Oracle, printing the variable should just work, like so:
echo $my_received_value
However when sending that data over ajax I have had to use:
utf8_encode($my_received_value)
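Putting those steps together, a minimal sketch of the round trip (connection details, table and variables are placeholders; this mirrors the steps above rather than being a drop-in solution):
// Set outside this script, per the steps above:
//   NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1  (daemon environment)
//   default_charset = "iso-8859-1"          (php.ini)
$conn = oci_connect('user', 'pass', 'DB');

// Inserting: the browser sends UTF-8, so squeeze it down to ISO-8859-1 first.
$stmt = oci_parse($conn, 'INSERT INTO names (name) VALUES (:name)');
$name = utf8_decode($my_sent_value);
oci_bind_by_name($stmt, ':name', $name);
oci_execute($stmt);

// Fetching: the value comes back as ISO-8859-1 and can be echoed directly,
// but must be re-encoded when returned over Ajax as UTF-8.
echo utf8_encode($my_received_value);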
If you really cannot change the character set that Oracle will use, then how about Base64-encoding your data before storing it in the database? That way you can accept characters from any character set and store them as ISO-8859-1 (because Base64 outputs a subset of the ASCII character set, which maps exactly onto ISO-8859-1). Base64 encoding will increase the length of the string by, on average, 37%; a sketch follows below.
If your data is only ever going to be displayed as HTML, then you might as well store HTML entities as you suggested, but be aware that a single entity can be up to 10 characters per unencoded character, e.g. &thetasym; is ϑ.
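The Base64 round trip in PHP is a one-liner each way (illustrative only; the variable and column names are not from the question):
$stored = base64_encode($name); // pure ASCII, safe in any single-byte column
// ... INSERT $stored into a VARCHAR2 column, later SELECT it back ...
$name = base64_decode($stored); // original bytes restored exactly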
I had to face this problem: Latin-American special characters get stored as "?" or "¿" in my Oracle database... I can't change the NLS_CHARACTER_SET because we're not the database owners.
So I found a workaround:
1) ASP.NET code
Create a function that converts a string to hexadecimal characters:
public string ConvertirStringAHex(String input)
{
    Encoding encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
    Byte[] stringBytes = encoding.GetBytes(input);
    StringBuilder sbBytes = new StringBuilder(stringBytes.Length);
    foreach (byte b in stringBytes)
    {
        sbBytes.AppendFormat("{0:X2}", b);
    }
    return sbBytes.ToString();
}
2) Apply the function above to the variable you want to encode, like this:
myVariableHex = ConvertirStringAHex( myVariable );
In ORACLE, use the following:
PROCEDURE STORE_IN_TABLE( iTEXTO IN VARCHAR2 )
IS
BEGIN
    INSERT INTO myTable( SPECIAL_TEXT )
    VALUES ( UTL_RAW.CAST_TO_VARCHAR2( HEXTORAW( iTEXTO ) ) );
    COMMIT;
END;
Of course, iTEXTO is the Oracle parameter which receives the value of myVariableHex from the ASP.NET code.
Hope it helps... if there's something to improve, please don't hesitate to post your comments.
Sources:
http://www.nullskull.com/faq/834/convert-string-to-hex-and-hex-to-string-in-net.aspx
https://forums.oracle.com/thread/44799
If the server-side code (PHP in this case) and the Oracle database use different charsets, you should set the server-side charset in the Oracle connection; Oracle then performs the conversion.
Example: let's assume:
PHP charset utf-8 (the default).
Oracle charset AMERICAN_AMERICA.WE8ISO8859P1.
In the connection to Oracle made by PHP, you should pass UTF8 as the charset parameter:
oci_pconnect("USER", "PASS", "URL", "UTF8");
Doing this, you write code in utf-8 (without doing any conversion at all) and get utf-8 from the database through this connection.
So you could write something like SELECT * FROM SOME_TABLE WHERE TEXT = 'SOME TEXT LIKE áéíóú Ñ' and also get utf-8 text as a result.
According to the PHP documentation, by default the Oracle client (oci_pconnect) takes the NLS_LANG environment variable from the operating system. Some Debian-based systems have no NLS_LANG environment variable, so I think the Oracle client uses its own default charset (AMERICAN_AMERICA.WE8ISO8859P1) if we don't specify the charset parameter.
