MyRocks (MariaDB + RocksDB) PHP charset

There are already plenty of posts about choosing the right charset for MySQL, but it's a different (and very frustrating) story for the RocksDB engine.
First, I decided to use utf8_bin as the charset (latin1, utf8_bin and binary are supported by MyRocks) because my data may contain special characters and I want to be on the safe side.
Furthermore, I am using PHP and PDO for loading data into mysql and the connection looks like this:
$pdo = new PDO('mysql:host=localhost;dbname=dbname;charset=utf8', 'user', 'password');
So I set the charset to utf8 (I also tried utf8_bin, but PDO does not accept it). Although I am able to insert some rows, I sometimes get errors like this one:
Incorrect string value: '\xF0\x9F\x87\xA8\xF0\x9F...' for column 'column_name'
But what's the error now? This hex sequence encodes a Unicode emoji (regional indicator symbol letter C + regional indicator symbol letter N, which together form a flag). It looks like valid UTF-8 to me, and both MySQL and PHP are configured to use it.

You need utf8mb4, not MySQL's utf8, which is only a subset (at most 3 bytes per character).
🇨 (U+1F1E8) needs a 4-byte UTF-8 encoding, hex F09F87A8.
If RocksDB does not support utf8mb4, then abandon either such characters or RocksDB. Otherwise, change the charset to utf8mb4 in the PDO call and on the columns that need it.
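A minimal sketch, assuming PHP 7+ and the PCRE `u` modifier, for spotting the characters that MySQL's 3-byte `utf8` cannot store: every code point from U+10000 upward takes 4 bytes in UTF-8. The helper name `needsUtf8mb4()` is hypothetical, and the `$dsn` line only builds the string; it does not connect. Note that the DSN's `charset` parameter expects a character set name such as `utf8mb4`, not a collation such as `utf8_bin`.

```php
<?php
// Hypothetical helper: true if $s contains any code point >= U+10000,
// i.e. a character whose UTF-8 encoding is 4 bytes long and therefore
// cannot be stored in MySQL's 3-byte "utf8" charset.
function needsUtf8mb4(string $s): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

$flag = "\u{1F1E8}\u{1F1F3}"; // regional indicators C + N

echo bin2hex("\u{1F1E8}"), "\n";     // f09f87a8, the bytes from the error message
var_dump(needsUtf8mb4($flag));        // bool(true)
var_dump(needsUtf8mb4("matr\u{ED}culas")); // bool(false): í needs only 2 bytes

// Name a charset (utf8mb4) in the DSN, not a collation.
$dsn = 'mysql:host=localhost;dbname=dbname;charset=utf8mb4';
```

The columns that hold such data would need `utf8mb4` as well; whether your MyRocks build accepts it is something to check against its documentation.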

Related

MySQL table name with accent. Invalid utf8 character string when updating via PDO

I'm in a situation where I need to update some rows in a table named "matrículas". The query looks something like this:
UPDATE `matrículas` SET...
When I run this query in my SQL program (HeidiSQL) directly, it executes without problems. When I do it in PHP via a PDO object, I get the following error:
SQLSTATE[HY000]: General error: 1300 Invalid utf8 character string: 'matr\xEDculas'
My PDO object is set up like this:
$db= new PDO(
'mysql:host='.$credentials['host'].';dbname='.$credentials['dbname'].';charset=utf8',
$credentials['user'],
$credentials['password'],
array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
The actual update is done by taking the above query and doing this:
$query = $this->db->prepare($sql);
$query->execute($params);
Both the table and the database were created using the utf8_general_ci collation.
Any ideas what I'm doing wrong? btw, I'm currently testing in Windows in case that has anything to do with it...
ERROR 1300 (HY000): Invalid utf8 character string: 'matr\xEDculas'
The \xNN notation gives the hex encoded value for the invalid byte(s) in the character string.
Unicode code point 237 (í), when encoded in utf-8, is a 2-byte character that is encoded as 0xC3 0xAD... but the error shows 0xED, which happens to be the ISO/IEC-8859-1 (Latin1) encoding for the character í.
Since the error involves a table name passed from the script rather than external data, that pointed to the actual issue: the PHP script itself had the name encoded incorrectly, because the script file had been saved as ISO-8859-1 rather than UTF-8.
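A small sketch of the byte-level diagnosis above, assuming the mbstring extension is loaded: the same letter í is `c3 ad` in UTF-8 but the single byte `ed` in ISO-8859-1, and only the former is valid UTF-8.

```php
<?php
// The letter í (U+00ED) in two encodings.
$utf8   = "\u{ED}";                                          // UTF-8: c3 ad
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // Latin-1: ed

echo bin2hex($utf8), "\n";   // c3ad
echo bin2hex($latin1), "\n"; // ed

// A lone 0xED byte followed by ASCII is not valid UTF-8, which is the
// kind of string MySQL rejects with error 1300 on a utf8 connection.
var_dump(mb_check_encoding('matr' . $latin1 . 'culas', 'UTF-8')); // bool(false)
var_dump(mb_check_encoding('matr' . $utf8 . 'culas', 'UTF-8'));   // bool(true)
```

Running a check like this on identifiers coming from a source file is one quick way to find out whether the editor saved that file as UTF-8.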
`matrículas`
this name is in a non-UTF-8 encoding in your script;
please change it to UTF-8, or rename it without the accent, like matriculas.
I had the same problem with a different encoding.
If you must use accented letters in table names, then they must be encoded in UTF-8 in the client.
That is, it is not a PDO problem but an encoding problem in your source editor/language/whatever.

Convert from ISO-8859-2 to ORACLE char set AL16UTF16

Hello, this is a follow-up question from yesterday.
I have a PHP script that is parsing a website. I am getting strings in UTF-8, and now I want to insert those strings into my Oracle database, which uses:
NLS_NCHAR_CHARACTERSET = AL16UTF16
NLS_CHARACTERSET = EE8ISO8859P2
I have tried with :
$rep = iconv("UTF-8","AL-16UTF-16",$string);
// FAILS - produces ?? in database or scripts fails with "wrong charset"
I have also tried with
$rep = iconv("UTF-8","ISO-8859-2",$string);
$rep1 = iconv("UTF-8","AL-16UTF-16",$rep);
same as above ... fails with ?? in database.
Does anyone have any idea what I should try next?
The OCI driver implicitly handles charset conversion. When connecting, ensure you set your charset as UTF-8:
oci_connect($username, $password, $connection_string, 'UTF-8');
This tells OCI to expect you to provide strings in UTF8 format and to provide resultsets in UTF8, converted from the database charset. From the manual (emphasis mine):
Determines the character set used by the Oracle Client libraries. The character set does not need to match the character set used by the database. If it doesn't match, Oracle will do its best to convert data to and from the database character set. Depending on the character sets this may not give usable results. Conversion also adds some time overhead.
This means, assuming that the strings you want to input are in UTF8, that you shouldn't need to use iconv() at all. Just let OCI handle that for you.
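A sketch of why the two iconv() calls in the question failed, assuming a glibc- or musl-style iconv: `AL16UTF16` is an Oracle character-set name, not an iconv encoding name, so iconv() cannot open the conversion and returns false with a "Wrong encoding" warning. The ISO-8859-2 step on its own works. As the answer says, with oci_connect(..., 'UTF-8') none of this should be needed.

```php
<?php
$s = "\u{159}"; // ř (U+0159), a Central European letter covered by ISO-8859-2

// Unknown target encoding: iconv() warns and returns false
// (the @ only silences the warning for this demonstration).
$bad = @iconv('UTF-8', 'AL-16UTF-16', $s);
var_dump($bad); // bool(false)

// ISO-8859-2 (Oracle's EE8ISO8859P2) does contain ř, as the single byte 0xF8.
$latin2 = iconv('UTF-8', 'ISO-8859-2', $s);
echo bin2hex($latin2), "\n"; // f8
```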

Superscript character in PHP causing a MySQLi select query to find 0 rows

I am using PHP 5.3.3 and MySQL 5.1.61. The column in question is using UTF-8 encoding and the PHP file is encoded in UTF-8 without BOM.
When doing a MySQLi query with a ² character in SQLyog on Windows, the query executes properly and the correct search result displays.
If I do this same exact query in PHP, it will execute but will show 0 affected_rows.
Here's what I tried:
Using both LIKE and =
Changing the encoding of the PHP file to ANSI, UTF-8 without BOM, and UTF-8
Doing 'SET NAMES utf-8' and 'latin1' before running the query
Did header('Content-Type: text/html; charset=UTF-8'); in PHP
Escaping using MySQLi::real_escape_string
Doing a filter_var($String, FILTER_SANITIZE_STRING)
Tried a MySQLi stmt bind
The only way I could get it to work properly is if I swapped the ² for a % and changed = to LIKE in PHP.
How can I get it to query properly in PHP when using the ²?
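Looking at the bytes shows why the same query can match in one client and not in another: ² (U+00B2) is two bytes in UTF-8 but one byte in Latin-1, so if the connection charset does not match the encoding of the literal in the query, the server may end up comparing different byte sequences. A minimal sketch, assuming mbstring:

```php
<?php
$sup = "\u{B2}"; // ² SUPERSCRIPT TWO

echo bin2hex($sup), "\n";                                              // c2b2 (UTF-8)
echo bin2hex(mb_convert_encoding($sup, 'ISO-8859-1', 'UTF-8')), "\n"; // b2 (Latin-1)

// Plain ASCII is identical in both encodings, which is why queries
// without ² are unaffected by the mismatch.
var_dump(mb_convert_encoding('abc', 'ISO-8859-1', 'UTF-8') === 'abc'); // bool(true)
```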
You should be able to get the query to work by ensuring the following:
Prepping PHP for UTF-8
You first need to make sure the PHP pages that will be issuing these queries are served as UTF-8 encoded pages. This will ensure that any UTF-8 output coming from the database is displayed properly. In Firefox, you can check whether this is the case by visiting the page you're interested in and using the View Page Info menu item; you should see UTF-8 as the value for the page's Encoding. If the page isn't being served as UTF-8, you can fix that in one of two ways. Either you can set the encoding in a call to header(), like this:
header('Content-Type: text/html; charset=UTF-8');
Or, you can use a meta tag in your page's head block:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Prepping MySQL for UTF-8
Next up, you need to make sure the database is set up to use the UTF-8 encoding. This can be set at the server, database, table, or column levels. If you're on a shared host, you probably can only control the table and column levels of your hierarchy. If you have control of the server or database, you can check to see what character encoding they are using by issuing these two commands:
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'character_set_database';
Changing the database level encoding can be done using a command like this:
(CREATE | ALTER) DATABASE ... DEFAULT CHARACTER SET utf8;
To see what character encoding a table uses, simply do:
SHOW CREATE TABLE myTable;
Similarly, here's how to change a table-level encoding:
(CREATE | ALTER) TABLE ... DEFAULT CHARACTER SET utf8;
I recommend setting the encoding as high as you possibly can in the hierarchy. This way, you don't have to remember to manually set it for new tables. Now, if your character encoding for a table is not already set to UTF-8, you can attempt to convert it using an alter statement like this:
ALTER TABLE ... CONVERT TO CHARACTER SET utf8;
Be very careful about using this statement! If you already have UTF-8 values in your tables, they may become corrupted when you attempt to convert. There are some ways to get around this, however.
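One common form of that corruption is double encoding: bytes that are already UTF-8 get treated as Latin-1 and encoded a second time. This is a minimal PHP sketch of the effect (assuming mbstring), not a model of what ALTER TABLE does internally.

```php
<?php
$e = "\u{E9}"; // é, already UTF-8: c3 a9

// Misreading the UTF-8 bytes as Latin-1 and converting "again"
// turns one character into two: Ã (U+00C3) and © (U+00A9).
$double = mb_convert_encoding($e, 'UTF-8', 'ISO-8859-1');

echo $double, "\n";          // Ã©
echo bin2hex($double), "\n"; // c383c2a9
```

If you see text like "Ã©" where "é" should be, the data has most likely been through exactly this kind of extra conversion.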
Forcing MySQLi to Use UTF-8
Finally, right after you connect to your database, make sure you issue the appropriate call to say that you are using the UTF-8 encoding. Here's how:
$db = new mysqli(DB_HOST, DB_USERNAME, DB_PASSWORD, DB_NAME);
// Change the character set to UTF-8 (do it early, before running any queries)
if (! $db->set_charset("utf8"))
{
printf("Error loading character set utf8: %s\n", $db->error);
}
Once you do that, everything should hopefully work as expected. The only characters you need to worry about encoding are the big 5 for HTML: <, >, ', ", and &. You can handle that using the htmlspecialchars() function.
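A short sketch of the htmlspecialchars() call with the charset passed explicitly. ENT_QUOTES makes it escape single quotes too; non-ASCII UTF-8 text passes through unchanged, while invalid UTF-8 input yields an empty string (unless ENT_SUBSTITUTE or ENT_IGNORE is set), which is one more reason to get the encoding right first.

```php
<?php
$raw = "<a href='x'>Tom & \"Jerry\"</a>";

// Escape the five HTML-special characters; treat input as UTF-8.
$safe = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
// &lt;a href=&#039;x&#039;&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;

echo htmlspecialchars("caf\u{E9}", ENT_QUOTES, 'UTF-8'), "\n"; // café (unchanged)

// A lone Latin-1 byte is invalid UTF-8, so the result is an empty string.
var_dump(htmlspecialchars("\xE9", ENT_QUOTES, 'UTF-8')); // string(0) ""
```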
If you want to read more (and get links to additional resources), feel free to check out the articles I wrote about this process. There are two parts: Unicode and the Web: Part 1, and Unicode and the Web: Part 2. Good luck!

UTF-8 Database Problem

I have a MySQL table with a UTF-8 charset, and when I attempt to insert into it via a PHP form, the database gives the following error:
PDOStatement::execute():
SQLSTATE[HY000]: General error: 1366
Incorrect string value: '\xE8' for
column ...
The character in question is 'è', yet I don't see why this should be a problem considering the database and table are set to UTF-8.
Edit
I've tried directly from the mysql terminal and have the same problem.
Your database might be set to UTF-8, but the database connection also needs to be set to UTF-8. You should do that with a SET NAMES utf8 statement. You can use the driver_options in PDO to have it execute that as soon as you connect:
$handle = new PDO("mysql:host=localhost;dbname=dbname",
'username', 'password',
array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
Have a look at the following two links for more detailed information about making sure your entire site uses UTF-8 appropriately:
UTF-8 all the way through…
UTF8, PHP and MySQL
0xE8 is greater than 0x7F, the largest value a one-byte UTF-8 character can have, so on its own it is not valid UTF-8: http://en.wikipedia.org/wiki/UTF-8
It seems your connection is not set to UTF-8 but to some other 8-bit encoding like ISO Latin-1. Setting the database to UTF-8 only changes the character set the database uses internally; connections may use a different default (latin1 for older MySQL versions), so you should send an initial SET CHARACTER SET utf8 after connecting to the database. If you have access to my.cnf you can also set the correct default there, but keep in mind that changing the default may break other sites/apps running on the same host.
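A minimal sketch of the point about 0xE8, assuming mbstring: the Latin-1 byte for è is not valid UTF-8 on its own, while the UTF-8 form of the same letter is the two bytes `c3 a8`.

```php
<?php
$latin1 = "\xE8";   // è as a Latin-1 client would send it
$utf8   = "\u{E8}"; // è in UTF-8

// 0xE8 starts a 3-byte UTF-8 sequence that never completes here.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

echo bin2hex(mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1')), "\n"; // c3a8
```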
Before passing the value to Mysql you can use the following code:
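$val = mb_check_encoding($val, 'UTF-8') ? $val : mb_convert_encoding($val, 'UTF-8', 'ISO-8859-1');
This converts the string to UTF-8 (treating non-UTF-8 input as ISO-8859-1, like the now-deprecated utf8_encode()), if it's a matter of only one field.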

PHP + SQL Server - How to set charset for connection?

I'm trying to store some data in a SQL Server database through php.
Problem is that special chars aren't converted properly. My app's charset is iso-8859-1 and the one used by the server is windows-1252.
Converting the data manually before inserting doesn't help; there seems to be some conversion going on.
Running the SQL query 'set char_convert off' doesn't help either.
Anyone have any idea how I can get this to work?
EDIT: I have tried ini_set('mssql.charset', 'windows-1252'); as well, but no result with that one either.
Client charset is necessary but not sufficient:
ini_set('mssql.charset', 'UTF-8');
I searched for two days how to insert UTF-8 data (from web forms) into MSSQL 2008 through PHP. I read everywhere that you can't, you need to convert to UCS2 first (like cypher's solution recommends).
On Windows, SQLSRV is said to be a good solution, which I couldn't try, since I am developing on Mac OS X.
However, FreeTDS manual (what PHP mssql uses on OSX) says to add a letter "N" before the opening quote:
mssql_query("INSERT INTO table (nvarcharField) VALUES (N'űáúőűá球最大的采购批发平台')");
According to this discussion, N character tells the server to convert to Unicode.
https://softwareengineering.stackexchange.com/questions/155859/why-do-we-need-to-put-n-before-strings-in-microsoft-sql-server
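SQL Server's nchar/nvarchar columns hold 2-byte UCS-2/UTF-16 code units, so "converting to UCS-2 first" means re-encoding each character into that form. A sketch of what the conversion looks like in PHP with mbstring; little-endian byte order is an assumption here, and the code does not talk to SQL Server at all.

```php
<?php
$s = "\u{171}\u{E1}"; // űá, the first characters of the example above

// Re-encode from UTF-8 to UTF-16LE (2 bytes per BMP character).
$ucs2 = mb_convert_encoding($s, 'UTF-16LE', 'UTF-8');

echo bin2hex($s), "\n";    // c5b1c3a1 (UTF-8)
echo bin2hex($ucs2), "\n"; // 7101e100 (UTF-16LE)

// Converting back recovers the original UTF-8 string.
var_dump(mb_convert_encoding($ucs2, 'UTF-8', 'UTF-16LE') === $s); // bool(true)
```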
I had the same problem and ini_set('mssql.charset', 'utf-8') did not work for me.
However, it worked in uppercase:
ini_set('mssql.charset', 'UTF-8');
I suggest looking at the following points:
Ensure that the columns you're storing the information in are nchar or nvarchar, as char and varchar don't support UCS-2 (SQL Server doesn't store in UTF-8 format, by the way)
If you're connecting with the mssql library/extension for PHP, run: ini_set('mssql.charset', 'utf-8'); as there's no function with a charset argument (connect, query etc)
Ensure that your browser's charset is also set to UTF-8
If ini_set('mssql.charset', 'UTF-8'); doesn't help AND you don't have root access to modify the system wide freetds.conf file, here's what you can do:
1. Set up /your/local/freetds.conf file:
[sqlservername]
host=192.168.0.56
port=1433
tds version=7.0
client charset=UTF-8
2. Make sure your connection DSN is using the server name, not the IP:
'dsn' => 'dblib:host=sqlservername;dbname=yourdb'
3. Make FreeTDS use your local freetds.conf file, as an unprivileged user, from the PHP script via an environment variable:
putenv('FREETDSCONF=/your/local/freetds.conf');
If you are using TDS protocol version 7 or above, ALL communication over the wire is converted to UCS-2. The server converts from UCS-2 into whatever the table or column collation is set to, unless the column is nvarchar or ntext. You can store UTF-8 in regular varchar or text columns; you just have to use a TDS protocol version lower than 7, like 6.0 or 4.2. The only drawback of this method is that you cannot query any nvarchar, ntext, or sys.* tables (I think you also can't do any CAST()ing), as the server refuses to send anything that might possibly be converted to UTF-8 to any client using a protocol version lower than 7.
It is not possible to avoid converting character sets when using TDS protocol version 7 or higher (roughly equivalent to MSSQL 2005 or newer).
In my case, it worked after I added the "CharacterSet" parameter to the sqlsrv_connect() connection options.
$connectionInfo = array(
"Database"=>$DBNAME,
"ConnectionPooling"=>0,
"CharacterSet"=>"UTF-8"
);
$LAST_CONNECTION = sqlsrv_connect($DBSERVER, $connectionInfo);
See documentation here :
https://learn.microsoft.com/en-us/sql/connect/php/connection-options?view=sql-server-2017
I've had luck in a similar situation (using a PDO ODBC connection) using the following code to convert the encoding before printing output:
$data = mb_convert_encoding($data, 'ISO-8859-1', 'windows-1252');
I had to manually set the source encoding, because it was erroneously being reported as 'ISO-8859-1' by mb_detect_encoding().
My data was also being stored in the database by another application, so I might be in a unique situation, although I hope it helps!
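ISO-8859-1 and Windows-1252 agree on every byte except 0x80 to 0x9F, where Windows-1252 has printable characters (such as €) and ISO-8859-1 has invisible control codes. That overlap is why detection functions can easily report one when the data is really the other. A minimal sketch, assuming mbstring:

```php
<?php
$byte = "\x80"; // € in Windows-1252, a C1 control code in ISO-8859-1

// The same byte decoded under the two assumptions.
echo mb_convert_encoding($byte, 'UTF-8', 'Windows-1252'), "\n";        // €
echo bin2hex(mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1')), "\n"; // c280 (U+0080)
```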
For me editing this file:
/etc/freetds/freetds.conf
...and changing/setting 'tds version' parameter to '7.0' helped. Edit your freetds.conf and try to change this parameter for your server configuration (or global).
It takes effect even without an Apache restart.
I did not notice anyone mentioning another way of converting results from an MSSQL database: the good old iconv() function:
iconv(string $from_encoding, string $to_encoding, string $string): string|false
In my case everything else failed to provide meaningful conversion, except this one when getting the results. Of course, this is done inside the loop of parsing the results of the query - from CP1251 to UTF-8:
foreach ($records as $row => $record) {
$array[$row]['StatusName'] = iconv('CP1251', 'UTF-8', $record['StatusName']);
}
Ugly, but it works.
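The same per-row conversion can be wrapped in a small reusable function. This is a sketch under assumptions: `convertColumn()` is a hypothetical name, the rows are plain associative arrays, and the iconv build supports CP1251 (glibc's does).

```php
<?php
/**
 * Hypothetical helper: return $records with one column converted from
 * Windows-1251 (CP1251) to UTF-8, leaving the other columns untouched.
 */
function convertColumn(array $records, string $column): array
{
    foreach ($records as $i => $record) {
        $records[$i][$column] = iconv('CP1251', 'UTF-8', $record[$column]);
    }
    return $records;
}

// "Привет" as CP1251 bytes, as it might come back from the driver.
$rows = [['StatusName' => "\xCF\xF0\xE8\xE2\xE5\xF2", 'Id' => 1]];
$out  = convertColumn($rows, 'StatusName');

echo $out[0]['StatusName'], "\n"; // Привет
```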
Can't you just convert your tables to your application encoding? Or use utf-8 in both?
I don't know whether MSSQL supports table-level encodings, though.
Also, try the MB (multibyte) string functions, if the above fails.
You should set the charset with ini_set('mssql.charset', 'windows-1252') before the connection. If you use it after the mssql_connect it has no effect.
Just adding ini_set('mssql.charset', 'UTF-8'); didn't help in my case. I had to specify the UTF-8 encoding on the query parameter:
$age = 30;
$name = mb_convert_encoding("Joe", 'UTF-8', 'ISO-8859-1'); // equivalent to the deprecated utf8_encode()
$select = sqlsrv_query($conn, "SELECT * FROM Users WHERE Age = ? AND Name = ?",
array(array($age), array($name, SQLSRV_PARAM_IN, SQLSRV_PHPTYPE_STRING('UTF-8')));
You can use the mysql_set_charset function (part of the old mysql extension, which was removed in PHP 7.0):
http://it2.php.net/manual/en/function.mysql-set-charset.php
