How to get the character set from a resultset? - php

mysqli_result::fetch_field() returns a type property for each column, which is an integer value.
The integer value is the same for VARCHAR and VARBINARY (0xFD) columns and also for CHAR and BINARY (0xFE) columns. Those column types can be detected with MYSQLI_TYPE_STRING and MYSQLI_TYPE_VAR_STRING constants.
To know if a string column is BINARY (they have a collation called binary), or to know if columns need to be converted to another character set, the character set name is really needed.
But... mysqli_result::fetch_field() has a charsetnr property which again returns an integer for the character set. Only this time there seems to be no way of knowing the character set name, let alone the collation?
So how can one get the character set names from mysqli_result?

You do not need to know the character set to see whether the field is binary.
Consider this example:
if ($result = $mysqli->query($query)) {
    /* Get field information for all columns */
    $finfo = $result->fetch_fields();
    foreach ($finfo as $val) {
        printf("Name: %s\n", $val->name);
        printf("Table: %s\n", $val->table);
        printf("Max. Len: %d\n", $val->max_length);
        printf("Length: %d\n", $val->length);
        printf("charsetnr: %d\n", $val->charsetnr);
        printf("Flags: %d\n", $val->flags);
        printf("Type: %d\n\n", $val->type);
    }
    $result->free();
}
Each field has a $field->flags property. Check it for bit 128 (0x80, the MYSQLI_BINARY_FLAG constant). If the bit is set, the field is binary (BINARY, VARBINARY) and has the "binary" collation.
I am not sure that you can set "binary" collation on non-binary field.

TL;DR charsetnr is the ID of the collation as listed by SHOW COLLATION.
I couldn't help noticing that even for numeric columns charsetnr was set to 63. This pointed me towards the manual that says:
To distinguish between binary and nonbinary data for string data types, check whether the charsetnr value is 63. If so, the character set is binary, which indicates binary rather than nonbinary data. This enables you to distinguish BINARY from CHAR, VARBINARY from VARCHAR, and the BLOB types from the TEXT types.
Retrieve more information about the collation and character set via:
SELECT `COLLATION_NAME`
, `CHARACTER_SET_NAME`
, `IS_DEFAULT`
, `IS_COMPILED`
, `SORTLEN`
FROM `INFORMATION_SCHEMA`.`COLLATIONS`
WHERE `ID` = ?;
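The manual's rule quoted above can be sketched as a small check on the field metadata. This is only a minimal sketch, not part of mysqli: isBinaryStringField() and the local constants are hypothetical names, the type values 0xFD and 0xFE are the ones given in the question, and the stand-in objects mimic what fetch_field() returns so the example runs without a database. The BLOB and TEXT types are left out for brevity.

```php
<?php
// Values from the question: 0xFD for VARCHAR/VARBINARY, 0xFE for CHAR/BINARY.
const TYPE_VAR_STRING  = 0xFD; // same value as MYSQLI_TYPE_VAR_STRING
const TYPE_STRING      = 0xFE; // same value as MYSQLI_TYPE_STRING
const BINARY_CHARSETNR = 63;   // collation ID of "binary", per the manual quote

// Hypothetical helper: true only for string columns whose charsetnr is 63.
function isBinaryStringField(object $field): bool
{
    $isString = in_array($field->type, [TYPE_VAR_STRING, TYPE_STRING], true);
    return $isString && $field->charsetnr === BINARY_CHARSETNR;
}

// Stand-ins for the objects returned by mysqli_result::fetch_field().
$varbinary = (object) ['type' => 0xFD, 'charsetnr' => 63];
$varchar   = (object) ['type' => 0xFD, 'charsetnr' => 33]; // 33 = utf8_general_ci
$integer   = (object) ['type' => 3,    'charsetnr' => 63]; // numeric columns also report 63

var_dump(isBinaryStringField($varbinary)); // bool(true)
var_dump(isBinaryStringField($varchar));   // bool(false)
var_dump(isBinaryStringField($integer));   // bool(false)
```

The type check matters because, as noted above, numeric columns report charsetnr 63 as well.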

SELECT IFNULL(`COLLATION_NAME`, 'binary')
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE table_schema = 'biglim'
AND table_name = 'article'
will find the collation of each column of a particular table in a particular database (schema).

Related

In SQL, can't set a binary to 1

I have created a table in MySQL with a binary field name 'active' and set to NULL by default.
But when I want to update it with this command:
$update_users = $bdd -> query("UPDATE users SET `active` = 1 WHERE `id` = '$data1' LIMIT 1") or die(mysql_error());
The field is updated to 31 ! and not to 1.
I have also tried
SET `active` = true
but same result.
P.S.: I created 'active' through the phpMyAdmin interface (no SQL statement), but here are the properties of this field: Type => binary(1), Null => yes, Default => NULL
0x31 == 49 is the ASCII code of character '1'.
The value is stored in the column but, because of its type, it is returned in a special format: each byte from the value is displayed in its hexadecimal representation (2 uppercase hex digits). Depending on the MySQL client you use, the 0x hexadecimal prefix may or may not be present in the output.
As the documentation explains, the BINARY type contains strings (similar to the CHAR types). That is why the numeric value you want to insert (1) is handled as a string and becomes the character '1'.
In order to store the number 1 in a BINARY column, you have to store the byte 0x01 in it:
UPDATE users SET `active` = 0x01 WHERE `id` = '$data1' LIMIT 1
Or, better, use the TINYINT type instead.
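The difference between the character '1' and the byte 0x01 can be seen in plain PHP, without a database. This is just an illustrative sketch; the variable names are made up.

```php
<?php
// The number 1 cast to a string is the single character '1'.
$asString = (string) 1;
echo bin2hex($asString), "\n"; // 31
echo ord($asString), "\n";     // 49

// The byte whose numeric value is 1, which is what 0x01 stores in BINARY(1).
$asByte = "\x01";
echo bin2hex($asByte), "\n";   // 01
echo ord($asByte), "\n";       // 1
```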
I think you are misunderstanding what BINARY is. According to the documentation:
The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than nonbinary strings. That is, they contain byte strings rather than character strings. This means they have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in the values.
From looking at your question I assume what you really want is something like a boolean value (0 or 1) and not binary data.
In this case either use TINYINT(1) or BIT(1).

Not receiving correct data from firebird database when using PHP

I have some very odd situation. I am making a query to a firebird database and there is mismatch with the result in PHP. In DB the result is just fine, but when it comes to PHP there are different values.
The query:
SELECT LIST(t."ID", ',') ID,t."Date", LIST(n."Name",',') Name
FROM "Tests" t
LEFT JOIN "Names of tests" n ON t."Name ID" = n."ID"
WHERE t."Locked" = 0
GROUP BY t."Date"
ORDER BY t."Date" DESC
Result in DB:
ID = 546,552 Date = 23.10.2015 Name = Математика (тест),География(тест)
Result in PHP:
ID => 0x0000000200000000,
Date => 2015-10-23,
Name => 0x0000000500000000
I am using "UTF-8" encoding when connecting to the DB with ibase_connect(); the database encoding is WIN1251.
The result type of LIST() is a blob, not a CHAR or VARCHAR. I don't use PHP myself, but I believe that the Firebird/Interbase driver for PHP requires you to explicitly request the blob.
The values you see for ID and Name are the blob ids that can be used to request the blobs.
You have two options:
Request the blob value using these blob ids, see ibase_blob_open and ibase_blob_get (afaik, you will need to do the correct byte to character conversion yourself)
Cast the value to a VARCHAR (eg CAST(LIST(t."ID", ',') AS VARCHAR(2048)) AS ID)
The downside of the second option is that if you can have really long results, then you also need to cast to a long VARCHAR, otherwise you get truncation errors; and unfortunately varchars are restricted to 32K-2 bytes (8191 characters for UTF8), and a row as a whole to 64K bytes.
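If you go with the first option and the blob bytes arrive in the database's WIN1251 encoding, the byte-to-character conversion could look roughly like the sketch below. It assumes the mbstring extension is available and that the bytes really are Windows-1251; the three sample bytes are made up for illustration and stand in for data read with ibase_blob_get().

```php
<?php
// Three bytes in Windows-1251, standing in for data read from a blob.
$raw = "\xCC\xE0\xF2";

// Convert from the database encoding to UTF-8 for use in PHP.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1251');

echo $utf8, "\n";          // Мат
echo bin2hex($utf8), "\n"; // d09cd0b0d182
```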

How to support emojis with flourish?

I am using flourishlib for a website. My client requested that we should be able to use emojis with mobile phones. In theory we should change the character-encoding from utf8 to utf8mb4 for the MySQL database.
So far, so good, however, if we make this switch, like this:
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
Then each character will use four bytes instead of three bytes. This would increase the database's size with 33%. This would result in worse performance and more storage space used up. So, as a result, we have decided to switch to an encoding of utf8mb4 for only specific columns of specific tables.
To make sure everything is all right, I have checked several things. Among them, I have checked flourishlib and found a few suspect parts:
There is an fUTF8 class, which does not seem to support utf8mb4
At fDatabase I am quoting some findings:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8', $this->connection)) {
throw new fConnectivityException(
'There was an error setting the database connection to use UTF-8'
);
}
//...
// Make MySQL act more strict and use UTF-8
if ($this->type == 'mysql') {
$this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
$this->execute("SET NAMES 'utf8'");
$this->execute("SET CHARACTER SET utf8");
}
At fSQLSchemaTranslation I can see this:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8', $sql);
I suspect that flourishlib will not support our plan of giving a few columns of a few tables a character encoding of utf8mb4. I wonder whether we can upgrade something to add this support. As a worst-case scenario, we can replace every textual occurrence of utf8 with utf8mb4. However, that would be a very ugly hack and we wonder whether there is a better solution. Should we make this hack or is there a more orthodox approach?
I have resolved the issue. I have altered the tables where I wanted to support emojis by changing the column character set and collation, like this:
ALTER TABLE table_name CHANGE column_name column_name text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
After that, I had to make a few ugly hacks to make flourishlib able to support emojis.
fDatabase.php:
line 685:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8mb4', $this->connection)) {
throw new fConnectivityException(
'There was an error setting the database connection to use UTF-8'
);
}
line 717 stays the same, everything crashes if this line is changed:
if ($this->connection && function_exists('mysqli_set_charset') && !mysqli_set_charset($this->connection, 'utf8')) {
line 800:
// Make MySQL act more strict and use UTF-8
if ($this->type == 'mysql') {
$this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
$this->execute("SET NAMES 'utf8mb4'");
$this->execute("SET CHARACTER SET utf8mb4");
}
fSQLSchemaTranslation.php:
line 1554:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8mb4', $sql);
fXML.php:
line 403:
if (preg_replace('#[^a-z0-9]#', '', strtolower($encoding)) == 'utf8mb4') {
// Remove the UTF-8 BOM if present
$xml = preg_replace("#^\xEF\xBB\xBF#", '', $xml);
fCore::startErrorCapture(E_NOTICE);
$cleaned = self::iconv('UTF-8', 'UTF-8', $xml);
if ($cleaned != $xml) {
$xml = self::iconv('Windows-1252', 'UTF-8', $xml);
}
fCore::stopErrorCapture();
}
and finally, when there are modifications for any of the columns affected, I execute this:
App::db()->query("set names 'utf8mb4'");
which essentially triggers the ->query() execution of an fDatabase object.
increase the database's size with 33%.
Not true. English letters still take 1 byte each. What you gain with utf8mb4 is the ability to store emoji and some Chinese characters.
You shouldn't need to ALTER ... CHANGE the columns. Except that you probably had a canned VARCHAR(255) which has issues. Don't simply switch to 191, switch to a 'reasonable' number for each column. Or do nothing. The 191 comes only from an INDEX limitation. You are not indexing every column, are you?
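Both points can be checked with a little arithmetic. The sketch below measures how many bytes a few characters take in UTF-8 and shows where 255 and 191 come from. The 767-byte figure is InnoDB's index prefix limit for the older COMPACT and REDUNDANT row formats; that is background knowledge rather than something stated in this thread, so treat it as an assumption about your setup.

```php
<?php
// Bytes per character in UTF-8: only characters outside the BMP need 4.
$samples = [
    'a'         => 'a',
    'e-acute'   => "\u{00E9}",
    'euro sign' => "\u{20AC}",
    'emoji'     => "\u{1F600}",
];
foreach ($samples as $label => $char) {
    printf("%-9s %d byte(s)\n", $label, strlen($char));
}

// Assumed index prefix limit in bytes (older InnoDB row formats).
$indexPrefixLimit = 767;
echo intdiv($indexPrefixLimit, 3), "\n"; // 255 (utf8, up to 3 bytes per character)
echo intdiv($indexPrefixLimit, 4), "\n"; // 191 (utf8mb4, up to 4 bytes per character)
```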
fUTF8 class, which does not seem to support
Complain to flourishlib. Or abandon it. (Too many questions in these forums are complaints about inadequate 3rd party packages, not MySQL, itself.)
You might be able to change to utf8mb4 in MySQL and let flourishlib be oblivious to the change. Technically speaking, MySQL's utf8mb4 matches the rest of the world's concept of utf8; MySQL's utf8 is an incomplete implementation.
$this->execute("SET NAMES 'utf8'");
If you can see this code, you can change it.

Stored non-English characters, got '?????' - MySQL Character Set issue

The site I am working on is in Farsi, and all the text is being displayed as ????? (question marks).
I changed the collation of my DB tables to UTF8_general_ci but it still shows ???
I ran the following script to change all the tables, but this did not work either.
I want to know what I am doing wrong.
<?php
// your connection
mysql_connect("mysql.ord1-1.websitesettings.com","user_name","pass");
mysql_select_db("895923_masihiat");
// convert code
$res = mysql_query("SHOW TABLES");
while ($row = mysql_fetch_array($res))
{
foreach ($row as $key => $table)
{
mysql_query("ALTER TABLE " . $table . " CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");
echo $key . " => " . $table . " CONVERTED<br />";
}
}
?>
Bad news. But first, double check:
SELECT col, HEX(col)...
to see what is in the table. If the hex shows 3F, then the data is gone. Correctly stored, the dal character should be hex D8AF; hah is hex D8AD.
What happened:
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
As you INSERTed the data, it was converted to latin1, which does not have values for Farsi characters, so question marks replaced them.
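The loss can be reproduced in PHP alone. This is a sketch that assumes the mbstring extension with its default substitution character; ISO-8859-1 stands in here for MySQL's latin1, which differs in a few code points but has no Farsi letters either.

```php
<?php
// The letter dal, correctly encoded as UTF-8.
$dal = "\u{062F}";
echo bin2hex($dal), "\n";      // d8af

// Converting to a character set without that letter substitutes '?'.
$asLatin1 = mb_convert_encoding($dal, 'ISO-8859-1', 'UTF-8');
echo bin2hex($asLatin1), "\n"; // 3f
```

Once the stored byte is 3F, nothing in it says which letter it used to be, which is why the data cannot be recovered.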
The cure (for future INSERTs):
Recode your application using the mysqli_* interface instead of the deprecated mysql_* interface.
utf8-encoded data (good)
mysqli_set_charset($link, 'utf8'), where $link is your connection
check that the column(s) and/or table default are CHARACTER SET utf8
If you are displaying on a web page, <meta...utf8> should be near the top.
The discussion above is about CHARACTER SET, the encoding of characters. Now for a tip on COLLATION, which is used for comparing and sorting.
If you want these to be treated equal: 'بِسْمِ' = 'بسم', then use utf8_unicode_ci (instead of utf8_general_ci) for the COLLATION.

How can I insert Hebrew text into columns? [SQL 2000]

I tried to insert a Hebrew text value into a column,
but it changes the value to gibberish.
An example of that:
mssql_query ("UPDATE TABLE SET COLUMON = N'בדיקה'");
As you might expect, it changes the value of the column, but the value becomes ?????. If I run the same statement from Query Analyzer, it works fine.
My column's collation is HEBREW_CI_AS. How can I fix this?
You need to specify the collation property for the string in the INSERT statement you are using. Also, the string you are inserting should be of a Unicode data type; use the N prefix for that.
INSERT INTO MEMB_INFO (User, Pass, Name) VALUES ('Joni', '123456', N'גוני דף' COLLATE HEBREW_CI_AS)
Check that the PHP variable can handle Unicode characters. Otherwise it will be PHP that turns your string into question marks.
You may want to check out the SQL Server drivers for PHP,
and Unicode Character Properties in the PHP documentation.
Some resources on PHP and unicode:
http://www.sitepoint.com/bringing-unicode-to-php-with-portable-utf8/
http://php.net/manual/en/function.utf8-encode.php
http://allseeing-i.com/How-to-setup-your-PHP-site-to-use-UTF8
http://www.yiiframework.com/wiki/16/how-to-set-up-unicode/
http://pageconfig.com/post/portable-utf8
I solved this problem. If someone else runs into it, here is how I fixed it:
Create a new database for this specific table (or for the other tables your website uses).
Set Hebrew_CI_AS as the collation (or whichever collation suits what you created).
In your PHP code use mb_convert_encoding() function for SELECT and INSERT.
