database stores strange characters [duplicate] - php

I have a problem inserting foreign characters such as "ó,č,ĕ,ř" directly into the database. It doesn't work even through my PHP front end, so to be sure there is no transformation or re-encoding going on, I'm logged in with psql directly. Here is my setup:
server_encoding
-----------------
UTF8
(1 row)
and
client_encoding
-----------------
UTF8
(1 row)
The database is:
Name | Owner | Encoding | Collate | Ctype |
my_db | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
So I guess there should be no problem.
I created this:
CREATE TABLE test (a text);
and then tried to insert some text:
INSERT INTO TEST (a) VALUES ('ó');
And I got this message:
ERROR: invalid byte sequence for encoding "UTF8": 0xf327293b
Can anyone help me, please? It looks as if my input encoding is being ignored, but I really don't know.
EDIT:
My terminal configuration:
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
EDIT2:
my_db=# \encoding
UTF8
EDIT3: running psql from a file
Checking the file's encoding:
file -bi test
text/plain; charset=utf-8
Executing it gives:
ERROR: syntax error at or near "'Ăł'"
LINE 1: INSERT INTO tes (a) ('Ăł');
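The garbled 'Ăł' in this output is itself a clue. Read with a Central European single-byte code page (Windows-1250 or ISO-8859-2), the two UTF-8 bytes of ó (C3 B3) display as exactly 'Ăł', so the file and the server appear to handle UTF-8 correctly and it is the terminal that misrenders the reply. The syntax error is a separate problem: PostgreSQL needs the VALUES keyword, as in INSERT INTO tes (a) VALUES ('ó');. A minimal Python sketch of the misreading, assuming the terminal uses one of those two code pages:

```python
# Show how the UTF-8 bytes of "ó" look on a terminal that assumes a
# Central European single-byte code page (the assumption being tested).
utf8_bytes = "ó".encode("utf-8")  # two bytes: 0xC3 0xB3

for codepage in ("cp1250", "iso8859_2"):
    # Decoding with the wrong code page is what such a terminal effectively does.
    print(codepage, utf8_bytes.hex(), "->", utf8_bytes.decode(codepage))
```

Both lines print c3b3 -> Ăł.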
EDIT4:
set client_encoding='latin1';
This works in psql, but I need it to work with UTF-8. I know it's possible; I have used this setup every time with MySQL databases and it works like a charm.
My JDBC driver needs it to be UTF-8.
EDIT5:
Before the text is stored I can see it correctly, so PHP is working fine, but when I read it back from the database I can't see it. That's why I moved closer to the database and used psql to see what is going on. It looks like it might be a server issue. Is it possible that the server can't handle these characters?
EDIT6:
Tomcat config:
-Dfile.encoding=UTF8
The URI encoding is set to UTF-8 too. Where could the problem be? :(

If your shell is in latin1 encoding, as it appears from the comments, this will fix it:
set client_encoding = 'latin1';
If you don't want to change the client's system encoding, you can change the default in postgresql.conf:
client_encoding = latin1
Or change PHP's default character encoding in php.ini:
default_charset = "UTF-8"
Do the same in the configuration of Apache, or whatever HTTP server you are using:
AddDefaultCharset UTF-8
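The bytes quoted in the original error fit the latin1 explanation. 0xf3 0x27 0x29 0x3b decode under Latin-1 (and under Windows-1250) to ó');, the tail of the INSERT statement, so the terminal was sending a single-byte encoding while client_encoding told the server to expect UTF-8. A short Python check of that reading (nothing here talks to PostgreSQL):

```python
# The four bytes PostgreSQL rejected, taken from the error message.
rejected = bytes.fromhex("f327293b")

# Read as Latin-1 (or Windows-1250) they are the end of the typed statement.
print(rejected.decode("latin-1"))  # ó');

# 0xF3 starts a four-byte UTF-8 sequence, but 0x27 is not a continuation
# byte (10xxxxxx), so a strict UTF-8 decoder has to reject the input.
try:
    rejected.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# A UTF-8 terminal would have sent "ó" as two bytes instead of one.
print("ó".encode("latin-1").hex(), "vs", "ó".encode("utf-8").hex())  # f3 vs c3b3
```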

Just another debugging test (I still think it's a terminal thing): can you write the insert statement in a UTF-8 encoded file and run the command from that file? E.g.:
psql my_db -U postgres -f <utf8-encoded-file>
If this works, then the problem is back to the terminal somehow.
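A minimal Python sketch of such a test file, written with an explicit UTF-8 encoding so that neither the editor nor the terminal locale can change the bytes. The file name insert_test.sql is a placeholder; the table test comes from the question.

```python
from pathlib import Path

def write_utf8_sql(path: Path, value: str) -> bytes:
    """Write a one-line INSERT to path as UTF-8 and return the bytes on disk.

    Meant for a fixed test literal only: the value is pasted into the SQL
    text, so it must not contain a single quote.
    """
    sql = f"INSERT INTO test (a) VALUES ('{value}');\n"
    path.write_bytes(sql.encode("utf-8"))  # explicit UTF-8, independent of locale
    return path.read_bytes()

if __name__ == "__main__":
    raw = write_utf8_sql(Path("insert_test.sql"), "ó")
    print(b"\xc3\xb3" in raw)  # True: "ó" is stored as the two UTF-8 bytes C3 B3
```

Then run psql my_db -U postgres -f insert_test.sql. If the row goes in correctly this way but not when typed interactively, the terminal is the likely culprit.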

According to the comments, you're using PuTTY, which defaults to Latin-1. You need to configure PuTTY to use UTF-8. Just setting the server locale won't do any good unless your PuTTY encoding matches the encoding the environment claims to use.
Open PuTTY. Under the Window settings heading, choose the Translation sub-heading and set "Remote character set" to "UTF-8". In the font settings, make sure you are using a font with reasonable Unicode coverage. Then, in the Session category, type a name into the "Saved Sessions" text box and click "Save" to save your settings as a profile. You can override the "Default Settings" profile by selecting it and clicking Save, but this will affect all future connections and new profiles, so it may cause confusion if you use other servers that aren't UTF-8.
(These instructions are based on my PuTTY on Fedora 18; there may be some differences in UI details in the recent Windows versions. If in doubt, search for how to set PuTTY to use utf-8.)

Related

encoding in doctrine or twig have problems

I have a project written with the Symfony framework that uses Doctrine as the ORM and Twig as the template engine.
The project has no problems on my local system, but on the server it has an encoding problem.
I use utf8 encoding and utf8_general_ci as the collation.
As I said, I don't have any problem locally, and my data looks right on the server in phpMyAdmin, but not on the website's pages (my Symfony project). I know the pages have the correct encoding because static text in the Twig templates is right; only data read from MySQL has the problem.
Please see the site:
http://iaubir.cloudsite.ir/blog/zeinali
Thank you for your help.
Edit: This problem only affects rows imported through phpMyAdmin (import structure & data). If I log in to the admin panel and publish a new post, it displays correctly.
I updated a field whose value is "دروس", and now it displays correctly on my site, but phpMyAdmin shows it as garbled text.
I tried detecting the encoding of that garbled value with this site, and it reported:
source encoding: utf-8 displayed as: windows-1258
The problem is probably with your imported data; I wouldn't use phpMyAdmin for such a task.
Try this to export the database:
mysqldump -uroot -p database -r utf8.dump
and this to import it:
mysql -uroot -p --default-character-set=utf8 database
mysql> SOURCE utf8.dump
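The detector's verdict (UTF-8 displayed as Windows-1258) describes a reversible round trip. The Python sketch below uses the word from the question as sample text to reproduce the garbling and undo it; it only illustrates the diagnosis and assumes the stored bytes really are UTF-8. Re-importing with the right --default-character-set, as suggested above, remains the actual fix.

```python
# Simulate the garbling reported by the encoding-detection site:
# UTF-8 bytes of a Persian word displayed as Windows-1258.
word = "دروس"
garbled = word.encode("utf-8").decode("cp1258")
print(garbled)  # Ø¯Ø±ÙˆØ³

# Reversing the two steps recovers the original text.
restored = garbled.encode("cp1258").decode("utf-8")
print(restored == word)  # True
```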

Bad encoding from database in php file using IIS 8.5

I must migrate a large database and large PHP systems from PHP 4 to PHP 5.
The database tables are stored in UTF-8 format, but the data contains Windows-1257.
Every page's header contains:
But I get data from the database like this: AutomobiliĆø stovĆ«jimo
var_dump(mysql_client_encoding($connect)); returns utf8.
File encoding: Windows-1257.
On an Apache server (I tried WAMP on Windows 7 and Windows Server 2012) I get normal data.
But on IIS I don't. Maybe IIS doesn't understand the file encoding, or something else.
I give up, and I need your help...
SOLVED: I changed the MySQL configuration (my.ini) and set character_set_server from utf8 to latin1.
Now var_dump(mysql_client_encoding($connect)); returns latin1,
and all projects work fine.
Databases tables stored in UTF-8 format, but the data contain
windows-1257.
Try converting the data from Windows-1257 to UTF-8 with something like:
$encoded = iconv ( "CP1257", "UTF-8", $string );
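One chain of conversions that reproduces the exact garbling in the question, offered as a hypothesis rather than a confirmed diagnosis: the tables hold Windows-1257 bytes, MySQL treats them as latin1 and converts them to UTF-8, and the page is then rendered as Windows-1257. That would also be consistent with setting character_set_server to latin1 helping, since the conversion step then disappears. A Python sketch, with "Automobilių stovėjimo" assumed to be the intended text:

```python
# Assumed chain: Windows-1257 bytes -> read as Latin-1 -> sent as UTF-8
# -> displayed by a page declared as Windows-1257.
original = "Automobilių stovėjimo"

cp1257_bytes = original.encode("cp1257")    # what the tables actually hold
as_latin1 = cp1257_bytes.decode("latin-1")  # server assumes latin1 ...
utf8_out = as_latin1.encode("utf-8")        # ... and converts to utf8
print(utf8_out.decode("cp1257"))            # AutomobiliĆø stovĆ«jimo

# The answer's iconv("CP1257", "UTF-8", $string) corresponds to decoding the
# raw bytes once with the correct code page and encoding them as UTF-8.
utf8_fixed = cp1257_bytes.decode("cp1257").encode("utf-8")
print(utf8_fixed.decode("utf-8") == original)  # True
```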

Ubuntu encoding of new files

I've been searching for a long time, but without any helpful result.
I'm developing a PHP project using Eclipse on an Ubuntu 11.04 VM. Everything works fine; I've never needed to look at the file encoding. But after deploying the project to my server, all contents were shown with the wrong encoding. After a manual conversion to UTF-8 with Notepad++, my problems were solved.
Now I want to change it in my Ubuntu VM, too, and that's the problem. I've checked the preferences in Eclipse, but every property is set to UTF-8: general content types, workspace, project settings, everything.
If I check the encoding in the terminal, it says "test_new.dat: text/plain; charset=us-ascii". All files are saved in ASCII format. If I create a new file in the terminal ("touch"), it's the same.
Then I've tried to convert the files with iconv:
iconv -f US-ASCII -t UTF8 -o test.dat test_new.dat
But the encoding doesn't change. PHP files in particular seem to be resistant, yet for some *.ini files in my project the conversion works?!
Any idea what to do?
Here are my locale settings of Ubuntu:
LANG=de_DE.UTF-8
LANGUAGE=de_DE:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
I was also wondering about character encoding and found something that might be useful here.
When I create a new empty .txt file on my Ubuntu 12.04 and ask for its character encoding with "file -bi filename.txt", it shows me charset=binary. After opening it and writing something like "haha" inside, I saved it using "Save As" and explicitly chose UTF-8 as the character encoding. Strangely, it did not show charset=UTF-8 when I asked again, but returned charset=us-ascii. That already seemed strange, but it got even stranger when I did the whole thing again, this time including some German-specific characters (ä in this case) in the file, and saved again (this time without "Save As"; I just pressed Save). Now it said charset=UTF-8.
It therefore seems that at least gedit is checking the file and downgrading from UTF-8 to us-ascii if there is no need for UTF-8 since the file can be encoded using us-ascii.
Hope this helped a bit even though it is not php related.
Greetings
UTF-8 is compatible with ASCII. An ASCII text file is therefore also valid UTF-8, and a conversion from ASCII to UTF-8 is a no-op.
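The point about ASCII can be checked directly. The Python sketch below also includes rough_charset, a hypothetical helper that loosely mimics the labels file -bi printed in the observations above: binary for an empty file, us-ascii when every byte is below 0x80, and utf-8 only once a non-ASCII character such as ä appears. It is not file's real detection logic.

```python
def rough_charset(data: bytes) -> str:
    """Rough imitation of the charset `file -bi` reports (hypothetical helper)."""
    if not data:
        return "binary"        # file reports empty files as charset=binary
    if data.isascii():
        return "us-ascii"      # every byte is below 0x80
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"

# ASCII text has the same bytes whether it is saved as ASCII or as UTF-8,
# so converting it from US-ASCII to UTF-8 changes nothing.
print("haha".encode("ascii") == "haha".encode("utf-8"))  # True
print(rough_charset("haha".encode("utf-8")))             # us-ascii
print(rough_charset("haha ä".encode("utf-8")))           # utf-8
```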

UTF8 characters from database don't show up properly in the browser - MySQL & PHP CodeIgniter

My database and tables are set to the utf8_general_ci collation and the utf8 charset. CodeIgniter is set to utf8. I've added the meta tag charset=utf8, and I'm still getting garbled characters instead of Cyrillic letters (for example, квартиры does not display correctly)...
The same code running on my local machine (Mac OS X) works fine. It only breaks on the production machine, which is Ubuntu 11.10 64-bit on AWS EC2. Static content from the .php files shows up correctly; only the data coming from the database is messed up. Example page: http://dev.uzlist.com/browse/cat/nkv
Any ideas why?
Thanks.
FYI:
When I pass the data coming from the database to error_log(), I get the same values I'm seeing on the page, so it's not a browser/server issue. It's something between MySQL and PHP, since when I run SELECT * FROM categories, it shows the data in the right format. I'm using the PHP CodeIgniter framework for the database connection and queries, and as mentioned here, I have configured it to use a utf8 connection and utf8_general_ci collation.
Make sure your my.cnf (likely to be in /etc/ or /etc/mysql/) has the following entries:
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
[client]
default-character-set=utf8
You'll need to restart the MySQL service once you make your changes. (Older versions of this advice also put default-character-set and default-collation under [mysqld]; those are deprecated aliases of the two server options above, and MySQL 5.5 and later refuse to start with them.)
Adding my comments in here to make this a little clearer.
Make sure the following HTTP header is being set so the browser knows what charset to expect.
Content-type: text/html; charset=UTF-8
Also try adding this tag at the top of your HTML <head> element:
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
To make the browser display the text correctly, you should check three points:
the encoding of your script file;
the encoding of the connection;
the encoding of the database or table schema.
If all of these are compatible, you'll get the page you want.
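For the first point, a small script can flag source files that are not valid UTF-8. A minimal Python sketch; the function names and the application folder (CodeIgniter's default layout) are assumptions. Passing this check only means a file is consistent with UTF-8, since pure ASCII passes too, and it cannot tell whether the text inside was already double-encoded.

```python
from pathlib import Path

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def non_utf8_files(root: Path, pattern: str = "*.php") -> list:
    """Return the files under root matching pattern whose bytes are not valid UTF-8."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(pattern)
                  if p.is_file() and not is_valid_utf8(p.read_bytes()))

if __name__ == "__main__":
    for path in non_utf8_files(Path("application")):
        print("not UTF-8:", path)
```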
The original data has been encoded as UTF-8, the result interpreted in Windows-1252 and then UTF-8 encoded again. This is really bad; it isn't about a simple encoding mismatch that a header would fix. Your data is actually broken.
If the data is OK in the database (check with SELECT hex(column) FROM myTable to see whether it was already double-encoded in the database), then it must be your code that is converting it to UTF-8 again on output.
Search your project for uses of the functions utf8_encode/utf8_decode, convert_to_utf8, iconv, or mb_convert_encoding. Running this in your application's /application folder should be enough to find something:
$ grep -rn "\(utf8_\(en\|de\)code\|convert_to_utf8\|iconv\|mb_convert_encoding\)" .
Also see config values for these:
<?php
var_dump(
ini_get( "mbstring.http_output" ),
ini_get( "mbstring.encoding_translation" )
);
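The double encoding described here can be reproduced in a few lines of Python, which also shows what to look for in the SELECT hex(column) output: correctly stored Cyrillic is a run of D0/D1 pairs, while the double-encoded version is full of C3, C2 and E2 prefixes. The word квартиры is used as sample text, and Windows-1252 is assumed as the misreading, as in the diagnosis above.

```python
# UTF-8 bytes misread as Windows-1252 and then encoded to UTF-8 a second time.
word = "квартиры"

good = word.encode("utf-8")
double = good.decode("cp1252").encode("utf-8")

# MySQL's HEX() prints uppercase, so compare in the same form.
print(good.hex().upper())    # D0BAD0B2D0B0D180D182D0B8D180D18B
print(double.hex().upper())  # starts with C390C2BA: the double-encoded pattern

# If the hex shows double encoding, undoing one round recovers the text.
repaired = double.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired == word)      # True
```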
Well, if you are absolutely sure that your MySQL client encoding is set to utf8, there are two possible cases. One is double encoding, described by Esailija.
But there is another one: your data is actually encoded in Windows-1251, not in UTF-8.
In this case you have to either recode your data or set the proper encoding on the tables, though it is not a one-click task.
Here is a manual (in Russian) for exactly that case: http://phpfaq.ru/charset#repair
In short, you have to dump your table, using the same encoding set on the table (to avoid recoding), backup that dump in safe place, then change table definitions to reflect the actual encoding and then load it back.
Potentially this may also be caused by the mbstring extension not being installed (which would explain a difference between your dev and production environments).
Check out this post; it might give you a few more answers.
Try calling mysql_set_charset('utf8') after mysql_connect(). Then it should work.
After two days of fighting this bug, I finally figured out the issue. Thanks to #yourcommonsense, #robsquires, and a friend of mine from work for good resources that helped me debug it.
The issue was that at the time the SQL file dump was imported into the database, the charset for the server, database, client, and connection was set to latin1 (the status command helped me figure that out). The command line was set to latin1 as well, which is why it showed the right characters, but the connection from the PHP code was UTF-8 and encoded the data again. The result was double encoding.
Solution:
mysqldump the tables and the data (while in latin1)
drop the database
set the default charsets to UTF8 in /etc/my.cnf as Rob Squires mentioned
restart MySQL
create the database again with the right charset and collation
load the dump file back into it
And it works fine.
Thanks all for contribution!

view Persian/Arabic data in mysql with django

I have a table with Persian data and utf8_general_ci collation; the data was inserted into the database by a PHP program.
Now I have a new program in Python (Django) and want to view the data, but all of it displays garbled (for example, the word پست does not show correctly).
Why, and what can I do to solve this problem?
PS: When I insert new data with Python, everything is correct and displays correctly.
If you’re running into the problem where unicode items in your Django / MySQL project are displayed as question marks, here’s the likely problem and solution, found in this django-users thread:
The likely problem is that your MySQL encoding is set to latin1, as opposed to utf8. You can check this via:
mysqld --verbose --help | grep character-set
You’ll probably see:
character-set-server latin1
You want this to be utf8. To modify it, edit your my.cnf file (/etc/mysql/my.cnf on Ubuntu), adding the following lines to the appropriate sections:
[client]
...
default-character-set = utf8
[mysqld]
...
character-set-server=utf8
collation-server=utf8_unicode_ci
init_connect='set collation_connection = utf8_unicode_ci;'
Now restart mysql:
sudo /etc/init.d/mysql restart
And alter your existing tables to use the utf8 encoding:
mysql your_db_name
alter table your_table_name convert to character set utf8;
And that should do it.
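If the rows written by the old PHP program were double-encoded (UTF-8 bytes stored through a latin1 connection), they can be repaired on the Python side once the connection is utf8. A minimal sketch with a hypothetical helper: it assumes MySQL's latin1 behaves like Windows-1252, and because Python's cp1252 codec lacks the five code points MySQL's latin1 adds (0x81, 0x8D, 0x8F, 0x90, 0x9D), values containing those are left unchanged rather than guessed at. It is a heuristic, so try it on a copy of the data first.

```python
def undo_double_encoding(text: str) -> str:
    """Remove one layer of UTF-8-read-as-Windows-1252 garbling, if present.

    Hypothetical helper. It assumes the stored value is UTF-8 that was
    misread as Windows-1252 (MySQL's "latin1") and encoded to UTF-8 again.
    Values that do not fit that pattern are returned unchanged.
    """
    try:
        # Turn the garbled characters back into the original bytes,
        # then decode those bytes as the UTF-8 they really were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    garbled = "\u00d9\u00be\u00d8\u00b3\u00d8\u00aa"  # how "پست" looks when double-encoded
    print(undo_double_encoding(garbled))            # پست
    print(undo_double_encoding("پست"))              # already correct: unchanged
```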
Can you please check the charset of the HTML page? It should be something like <meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
