I want to use Ruby to read and insert data in a MySQL database whose data was saved by PHP code. When I read Chinese data, it does not appear correctly. It appears like 刘佳. But in a PHP page, the Chinese data shows correctly as 刘佳.
I confirmed the database uses utf-8 charset (CHARSET=utf8 COLLATE=utf8_unicode_ci).
My Ruby code:
require 'active_record'
class Student < ActiveRecord::Base
end
ActiveRecord::Base.establish_connection(
adapter: 'mysql2',
host: 'xxxx',
username: 'xxxx',
password: 'xxxx',
database: 'xxx_db',
encoding: 'utf8'
)
puts Student.first.name
It outputs an unknown string "刘佳".
How can I read Chinese data correctly and save a new Chinese record to database?
puts Student.first.name
It outputs an unknown string "刘佳".
I believe that is because whatever device you are using to view the ruby program's output (a terminal window?) is not set to "UTF-8" (see below for how to check that).
As far as I can tell, you have done everything right:
mysql docs:(http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html)
Specify character settings per database. To create a database such
that its tables will use a given default character set and collation
for data storage, use a CREATE DATABASE statement like this:
CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
Tables created in the database will use utf8 and utf8_general_ci by
default for any character columns.
Applications that use the database should also configure their
connection to the server each time they connect. This can be done by
executing a SET NAMES 'utf8' statement after connecting. The statement
can be used regardless of connection method: The mysql client, PHP
scripts, and so forth.
rails docs: (http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/MysqlAdapter.html)
All the options for ActiveRecord::Base.establish_connection() are as follows (note the description for :encoding):
Options:
:host - Defaults to “localhost”.
:port - Defaults to 3306.
:socket - Defaults to “/tmp/mysql.sock”.
:username - Defaults to “root”
:password - Defaults to nothing.
:database - The name of the database. No default, must be provided.
:encoding - (Optional) Sets the client encoding by executing
“SET NAMES <encoding>” after connection.
:reconnect - Defaults to false (See MySQL documentation: dev.mysql.com/doc/refman/5.0/en/auto-reconnect.html).
:strict - Defaults to true. Enable STRICT_ALL_TABLES. (See MySQL documentation: dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html)
:variables - (Optional) A hash session variables to send as `SET @@SESSION.key = value` on each database connection. Use the value `:default` to set a variable to its DEFAULT value. (See MySQL documentation: dev.mysql.com/doc/refman/5.0/en/set-statement.html).
:sslca - Necessary to use MySQL with an SSL connection.
:sslkey - Necessary to use MySQL with an SSL connection.
:sslcert - Necessary to use MySQL with an SSL connection.
:sslcapath - Necessary to use MySQL with an SSL connection.
:sslcipher - Necessary to use MySQL with an SSL connection.
(I had a hard time locating those, so I am posting all of them for future google searchers.)
And, when I run the following program in a terminal window, e.g.:
$ r 1.rb
where my terminal window is set to UTF-8:
~/ruby_programs$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
...
# encoding: UTF-8
require 'active_record'
require 'mysql2'
class Student < ActiveRecord::Base
end
ActiveRecord::Base.establish_connection(
adapter: 'mysql2',
#host: 'localhost', #this is the default
#username: 'root', #this is the default
#password: '', #this is the default
database: 'mydb2',
encoding: 'utf8'
)
#Insert a record in the db (It shouldn't matter whether a php or a ruby program writes to the database.)
Student.create(
name: "\u732a", #Because of the comment at top of the program, this
#string will be encoded in UTF-8
info: "a pig" #..so will this one.
)
name = Student.first.name
puts name
name.each_byte{|b| printf "%x \n", b}
puts
...the output I see is a Chinese character in my terminal window, which when compared to the Chinese character for 'pig' matches exactly, followed by:
e7
8c
aa
And if you look here: http://www.fileformat.info/info/unicode/char/732a/index.html, those bytes make up the UTF-8 encoding of the unicode integer \u732a, which represents 'pig' in Chinese, which is what was in the string that was inserted into the db.
In any case, you should run my program and if you get the same kind of error, then it will prove that it is your terminal's encoding that is the problem.
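The byte check above can also be run without a database. What follows is only a minimal sketch, not part of the original program: `describe_string` and `misread_as_cp1252` are hypothetical helper names, and Windows-1252 is just an assumed example of a display that is not set to UTF-8.

```ruby
# Hypothetical helper: summarize what Ruby thinks a string is.
def describe_string(str)
  {
    encoding: str.encoding.name,   # the encoding label Ruby attached
    valid: str.valid_encoding?,    # do the bytes fit that label?
    chars: str.length,             # number of characters
    hex: str.bytes.map { |b| format("%02x", b) }.join(" ")
  }
end

# Hypothetical helper: what a display that assumes Windows-1252 would
# make of bytes that are really UTF-8.
def misread_as_cp1252(str)
  str.dup.force_encoding("Windows-1252").encode("UTF-8")
end

name = "\u5218\u4f73"                        # the name from the question
p describe_string(name)                      # two characters, six bytes
p describe_string(misread_as_cp1252(name))   # same data seen as six characters

# The encoding Ruby assumes for the terminal and other external I/O.
puts Encoding.default_external
```

If the hex column shows e5 88 98 e4 bd b3 for the name read from the database, the data and the connection are fine and only the display is misreading it.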
I'm using mysql database and php to insert russian characters into a table.
I'm using:
$conn->set_charset('utf-8');
in my .php page to set the charset to utf-8, but when I try to print the DB charset with:
echo "set name:".$conn->character_set_name();
it shows
set name:latin1
I've set my Table to:
utf8mb4_unicode_ci
but nothing changes.
Printing the passed text from the ajax request, I can see the text written correctly.
What should I do?
I guess you aren't checking the return value of mysqli::set_charset(). It must be returning false because utf-8 is not a valid encoding name in MySQL; the correct name is utf8 (no dash). Or, even better, utf8mb4.
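You can get a list of supported encodings with SHOW CHARACTER SET; the Charset column of the following statement lists them as well, one row per collation: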
SHOW COLLATION;
I'm working on a project using Doctrine 2.4.3 with a MySQL 5.7.21 database with utf8 as default charset.
Recently, I've been looking to implement emoji support. To overcome MySQL's limitation of 3 bytes for utf8, I need to change the columns that can receive emojis to the utf8mb4 charset (see https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html).
However, I have not found a way to reflect this in my entities (using annotations).
My database connection config is the following :
$data = array(
'driver' => 'pdo_mysql',
'host' => $dbhost,
'port' => $dbport,
'dbname' => $dbname,
'user' => $dbuser,
'password' => $dbpw,
'charset' => 'utf8mb4'
);
I tried adding annotations to the table :
/**
 * @Entity(repositoryClass="path\to\DAO")
 * @Table(name="post", indexes={@Index(name="uid", columns={"uid"})}, options={"charset":"utf8mb4", "collation":"utf8mb4_unicode_ci"})
 * @HasLifecycleCallbacks
 */
class Post extends BaseEntity
{
...
}
In the same fashion, tried adding annotations to the column (in the same table) itself :
/** @Column(type="text", options={"charset":"utf8mb4", "collation":"utf8mb4_unicode_ci"}) */
protected $text;
None of the above worked. I expected an ALTER TABLE query when executing doctrine orm:schema-tool:update --dump-sql but Doctrine sees no change, and I still can't insert 4 bytes emojis.
If I update the column's charset myself directly in MySQL, emojis do get supported, but when I do run orm:schema-tool:update, Doctrine sees a difference between my entity and the schema, but seems to not know what to make of it since the output I get is the following :
ALTER TABLE post CHANGE text text LONGTEXT NOT NULL ;
I also tried to add SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci as driverOptions in my database connection config array, alas to no result either.
Unfortunately, I could not find anything regarding this matter in Doctrine's documentation.
If any of you has any clue regarding this matter, feel free to hit me up! Thanks in advance.
To convert the whole table:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
Please provide
SHOW CREATE TABLE ...
For more troubleshooting: Trouble with UTF-8 characters; what I see is not what I stored
As I have legacy requirements and cannot update Doctrine's lib as of right now, I had to find a workaround.
What I did was manually convert my tables to utf8mb4 with SQL queries, which is not overwritten by Doctrine back to utf8 when executing orm:schema-tool:update --force after the charset conversion.
For the record, I generated the update statements with the following script :
SELECT CONCAT('ALTER TABLE ', t.table_schema, '.', t.table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;')
FROM information_schema.tables t
WHERE t.table_schema LIKE {your_schema};
^ Do not execute this blindly - check beforehand if existing data will fit while utf8mb4 encoded. For more details check the very good article from Mathias Bynens on the matter : https://mathiasbynens.be/notes/mysql-utf8mb4#column-index-length
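The length check mentioned above comes down to simple arithmetic. The sketch below assumes the 767-byte InnoDB index key limit discussed in the linked article (newer row formats allow longer keys), and `max_indexed_chars` is a hypothetical helper name.

```ruby
# Assumption: 767-byte index key limit (older InnoDB row formats).
INDEX_KEY_LIMIT_BYTES = 767

# Hypothetical helper: how many characters of a column fit in one index key
# when every character may need bytes_per_char bytes.
def max_indexed_chars(bytes_per_char, limit = INDEX_KEY_LIMIT_BYTES)
  limit / bytes_per_char   # integer division rounds down
end

puts max_indexed_chars(3)  # utf8: at most 3 bytes per character
puts max_indexed_chars(4)  # utf8mb4: at most 4 bytes per character
```

Under that assumption, an indexed VARCHAR(255) that fit in utf8 has to shrink to 191 characters (or be indexed by prefix) once the column is utf8mb4.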
I also changed the database's charset settings.
ALTER DATABASE {database_name} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
I did keep the 'charset' => 'utf8mb4' in the Doctrine's database connection settings array for correct transmission of the data.
For new entities (tables), annotating them with correct settings in table options does create them with the right charset and collation :
@Entity @Table(name="table", options={"charset":"utf8mb4", "collate":"utf8mb4_unicode_ci"})
Cheers.
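For anyone wondering which values are affected by the 3-byte limit of MySQL's utf8 mentioned in the question, here is a small sketch in Ruby. `needs_utf8mb4?` is a hypothetical helper name, not something from Doctrine or MySQL; it only checks for characters outside the Basic Multilingual Plane, which are the ones that take 4 bytes in UTF-8.

```ruby
# Hypothetical helper: true if the string contains a character outside the
# Basic Multilingual Plane, i.e. one that needs 4 bytes in UTF-8.
def needs_utf8mb4?(str)
  str.each_char.any? { |ch| ch.ord > 0xFFFF }
end

emoji = "\u{1F600}"                    # grinning face
puts emoji.bytesize                    # bytes needed for this one character
puts needs_utf8mb4?("caf\u00e9")       # accented Latin text fits in utf8
puts needs_utf8mb4?("hi #{emoji}")     # emoji does not
```

Running a check like this over data before it is written can show which columns actually need converting.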
I have a MySQL db in utf8_general_ci.
And my sphinx.conf is like this:
source jobs
{
type = mysql
sql_sock = /var/run/mysqld/mysqld.sock
sql_query_pre = SET NAMES utf8
...
}
When I query "système" I would like sphinx to search for "système" & "systeme" in the DB.
AND when I query "systeme" I would like sphinx to search for "système" & "systeme" too.
What it does now is removing all the characters before the accents (including the accents themselves). So "système" becomes "me" and "dév" becomes "v"...
PS : I'm using the sphinxapi.php - which shouldn't be preferred over SphinxQL, I know, but it should still work with the api. And I use EXTENDED match mode.
You need to set up your charset_table to be able to do this:
http://sphinxsearch.com/docs/current.html#charsets
Alas, there is no 'magic' config option that just works with text in all languages; you need to set up charset_table to handle the language(s) you deal with.
Although this is pretty close:
http://sphinxsearch.com/forum/view.html?id=9312
(i.e. it borrows the hard work MySQL has done with collations and mimics it in charset_table)
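The effect such a charset_table is meant to have is accent folding: "è" and "e" are mapped to the same character before indexing and before searching. The following is only an illustration of that idea in Ruby, not Sphinx configuration, and `fold_accents` is a hypothetical helper name.

```ruby
# Hypothetical helper: decompose accented letters (NFD), drop the combining
# marks, and lowercase, so "è" and "e" end up as the same character.
def fold_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

puts fold_accents("syst\u00e8me")   # "système"
puts fold_accents("systeme")
puts fold_accents("d\u00e9v")       # "dév"
```

Both spellings of "système" fold to "systeme", which is the behaviour asked for in the question when the same mapping is applied to indexed text and to queries.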
I have a PHP form that I use to enter data into a database (phpMyAdmin), and I use a SELECT query to display all the values from the database in the same form.
I also have another PHP file that I use to create JSON from the same DB table.
When I enter foreign-language text like "Experiența personală:", the value saved in the DB is "ExperienÈ›a personală: ", but when I use a SELECT query to display it in the same PHP form it comes out correctly as "Experiența personală:". So the DB is correct, and I am now using the following PHP code to create the JSON:
<?php
$servername = "localhost";
$username = "root";
$password = "root";
$dbname = "aaps";
// Create connection
$con=mysqli_connect($servername,$username,$password,$dbname);
// Check connection
mysqli_set_charset($con, 'utf8');
//echo "connected";
$rslt=mysqli_query($con,"SELECT * FROM offers");
while($row=mysqli_fetch_assoc($rslt))
{
$taxi[] = array('code'=> $row["code"], 'name'=> $row["name"],'contact'=> $row["contact"], 'url'=> $row["url"], 'details'=> $row["details"]);
}
header("Content-type: application/json; charset=utf-8");
echo json_encode($taxi);
?>
and JSON looks like
[{"code":"CT1","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"4535623643","url":"images\/offers\/event-logo-8.jpg","details":"Experien\u00c8\u203aa personal\u00c4\u0192: jerhbehwgrh 234234 hjfhjerg#$%$#%#4"},{"code":"ewrw","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"ewfew","url":"","details":"eExperien\u00c8\u203aa personal\u00c4\u0192: Experien\u00c8\u203aa personal\u00c4\u0192: Experien\u00c8\u203aa personal\u00c4\u0192: "},{"code":"Experien\u00c8\u203aa personal\u00c4\u0192: ","name":"Experien\u00c8\u203aa personal\u00c4\u0192: ","contact":"","url":"","details":"Experien\u00c8\u203aa personal\u00c4\u0192: "}]
Here "\u00c8\u203a" is wrong; it is supposed to be "\u021b" (ț).
So the PHP used to create the JSON seems to be causing this issue.
But I am unable to find out exactly why it comes out like this. Please help.
Avoid the \uXXXX Unicode escapes; note the extra argument:
json_encode($s, JSON_UNESCAPED_UNICODE)
Don't use utf8_encode/decode.
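To see what the escapes in the question's JSON actually contain, they can be decoded outside PHP. The sketch below uses Ruby's json library on a fragment of that output. `repair_mojibake` is a hypothetical helper, and it assumes the damage came from UTF-8 bytes being read as Windows-1252 (which is what MySQL calls latin1).

```ruby
require 'json'

# Hypothetical helper: undo one round of "UTF-8 bytes misread as
# Windows-1252" damage. Raises if the text contains characters that
# Windows-1252 cannot represent.
def repair_mojibake(text)
  text.encode("Windows-1252").force_encoding("UTF-8")
end

# A fragment of the JSON from the question (single quotes keep \u as-is).
fragment = '["Experien\u00c8\u203aa personal\u00c4\u0192"]'
damaged = JSON.parse(fragment).first
puts damaged                    # the escapes decode to the mojibake itself
puts repair_mojibake(damaged)   # the text that was originally typed

# Escaped and unescaped JSON output carry the same characters.
puts JSON.generate(["\u021b"])
puts JSON.generate(["\u021b"], ascii_only: true)
```

If that assumption holds, the escaping done by the JSON encoder is faithful and the wrong characters are already present in what the connection returns; unescaped output only makes that easier to see.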
ă turning into Äƒ is Mojibake. It probably means that
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
If you need to fix the data, it takes a "2-step ALTER", something like
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
Before making any changes, do
SELECT col, HEX(col) FROM tbl WHERE ...
With that, ă should show hex of C483. If you see C384C692, you have "double-encoding", which is messier to fix.
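The two hex values above can be reproduced outside MySQL. This is a minimal sketch in Ruby; `hex_of` and `double_encode` are hypothetical helper names, and the double-encoding is assumed to go through Windows-1252, which is what MySQL's latin1 is.

```ruby
# Hypothetical helper: uppercase hex of a string's bytes, like MySQL's HEX().
def hex_of(str)
  str.unpack1("H*").upcase
end

# Hypothetical helper: encode text as UTF-8 a second time, treating the
# first UTF-8 bytes as Windows-1252 characters ("double-encoding").
def double_encode(str)
  str.dup.force_encoding("Windows-1252").encode("UTF-8")
end

a_breve = "\u0103"                     # the letter ă
puts hex_of(a_breve)                   # correctly stored UTF-8
puts hex_of(double_encode(a_breve))    # double-encoded
```

The first line printed is C483 and the second is C384C692, matching the two cases described above.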
Depending on the version of MySQL in use, the database may not support the full UTF-8 range, as stated in the documentation:
The ucs2 and utf8 character sets do not support supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.
This, however, is not likely to be related to your problem. I would try a couple of different things and see if it solves your problem.
use SET NAMES utf8 (MySQL's name for the character set has no dash)
You can read more about that here
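use utf8_encode() when inserting data into the database, and utf8_decode() when extracting it. That way, you don't have to worry about MySQL manipulating the Unicode characters. Note that these functions only convert between ISO-8859-1 and UTF-8, so this applies only when your PHP strings are ISO-8859-1. Documentation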
Hello, I have a character encoding problem in my application and thought I would ask for some help, because I couldn't solve the problem even though I was given some guidance. So here goes:
My Ä and Ö characters are shown in the browser as: �
I will also post all what I have done so far trying to solve the problem:
1) Database: I have tried changing the collation of my tables, here are some info what SHOW TABLE STATUS gives for one of my tables:
Name = test_groups Engine = InnoDB Version = 10 Row_format = Compact
Collation = utf8_swedish_ci
The database character set variables are:
character_set_client = utf8
character_set_connection = utf8
character_set_database = latin1 (I wonder, is this the cause?)
character_set_filesystem = binary
character_set_results = utf8
character_set_server = utf8
character_set_system = utf8
2) In apache httpd.conf I have:
AddDefaultCharset UTF-8
3) In my Zend-application application.ini:
resources.view.encoding = "UTF-8"
4) In my firefox 14.0.1 browser
edit->preferences->content->advanced->Default character encoding =
Unicode (UTF-8)
5) In my php code meta-tag:
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
Now here's also few other interesting things: When I look at my page and change from firefox
View->Character encoding->Western (ISO-8859-1)
, the �-characters which came from the MySQL database turn out ok to öä-characters, but the öä-characters that come from my php-code turn into ät-characters.
Another thing when I check the encoding of the data coming from my MySQL-database with
mb_detect_encoding($DATA_FROM_MYSQL_DATABASE)
it outputs UTF-8!! Then lastly if I do in the code:
utf8_encode($DATA_FROM_MYSQL_DATABASE)
and output the result the problem disappears that is �-characters -> öä-characters. So what's going on here x) All help appreciated
Are you sending SET NAMES utf8 from your PHP as the first query to MySQL? If not, that could be the cause.
SET NAMES indicates what character set the client will use to send SQL
statements to the server. Thus, SET NAMES 'cp1251' tells the server,
“future incoming messages from this client are in character set
cp1251.” It also specifies the character set that the server should
use for sending results back to the client. (For example, it indicates
what character set to use for column values if you use a SELECT
statement.)
SET NAMES utf8 in MySQL? has more detail about how and why.
Troubleshoot:
Check your database (with PHPMyAdmin, for instance). Are the characters correctly stored? Or does it seem gibberish?
If the characters in the database are ok, then the problem happens when retrieving. If they are stored incorrectly (as I would guess they are), then the problem is in the "storing".
Check your source code file and verify if they are encoded in UTF-8.
Force mysql connection to use UTF8 (mysqli::set_charset('utf8') or mysql_set_charset('utf8') or PDO: Add charset to the connection string (charset=utf8) )
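One reading of the symptoms in the question (� in a UTF-8 page, correct letters when the browser is switched to ISO-8859-1, and utf8_encode() making the problem disappear) is that the connection is handing back ISO-8859-1 bytes. The sketch below reproduces that situation in Ruby under that assumption. `shown_as_utf8` and `latin1_to_utf8` are hypothetical helper names.

```ruby
# Hypothetical helper: what a UTF-8 page shows for raw bytes that are not
# valid UTF-8 (invalid bytes become the replacement character U+FFFD).
def shown_as_utf8(bytes)
  bytes.dup.force_encoding("UTF-8").scrub("\uFFFD")
end

# Hypothetical helper: the same conversion PHP's utf8_encode() performs,
# ISO-8859-1 bytes to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# Assumption: the connection returned "ä" and "ö" as ISO-8859-1 bytes.
latin1_bytes = "\xE4\xF6".b

puts shown_as_utf8(latin1_bytes)    # replacement characters
puts latin1_to_utf8(latin1_bytes)   # the intended letters
```

If this is what is happening, setting the connection charset as suggested above makes the server send UTF-8 in the first place, and the utf8_encode() step becomes unnecessary.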