My DB needs to hold strings containing foreign-language characters. For example, a user enters a string into a form, the form is submitted, and the string is added to the DB. The string will then be displayed on a web page. I would like to use UTF-8, as it can handle all of the required languages.
Currently, I believe my DB is set to 'latin1', but the web pages display the correct characters anyway. Problems arise when trying to put foreign characters into textareas and when viewing the DB from the command line.
How can I implement this effectively? My plan was to blitz the whole site such that the DB charset is UTF-8, and the web page charset is UTF-8. Could someone give me the minimal commands on how to do this so I don't end up duplicating things (Having "UTF-8" commands everywhere when I really just need one) and making things too difficult to control?
edit: Using MySQL, PHP and JavaScript/HTML
That would be the way to go (UTF-8) in the DB. Here's what you want to look at:
Does your browser support UTF-8 characters (make sure the font you use has characters for all of the relevant code pages that you need to support) and is the meta charset tag set correctly?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> (in HTML4)
<?xml version="1.0" encoding="UTF-8"?> (In XHTML)
Are you sending a Content-Type header? If so, make sure it matches what you define in your meta or XML version tag.
Regarding your command line, make sure the terminal you're using and your shell's charset also match. Check the locale on your server (assuming it's *NIX, you can do this by typing "locale"). The following will change your locale setting for the current shell session:
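To make the "header and meta tag must match" check concrete, here is a minimal Python sketch (the thread itself uses PHP; Python is used only because it runs standalone). The helper names are made up for illustration, and the regex is a rough stand-in for a real HTML parser.

```python
import re
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, e.g. "utf-8"


def meta_charset(html):
    """Return the charset declared in a <meta> tag, or None (regex sketch only)."""
    match = re.search(r'<meta[^>]*charset=["\']?([\w-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charsets_agree(content_type, html):
    """True when both declarations are present and name the same charset."""
    h, m = header_charset(content_type), meta_charset(html)
    return h is not None and h == m


print(charsets_agree("text/html; charset=UTF-8", '<meta charset="utf-8">'))
```

If the two disagree, browsers generally trust the HTTP header, so a stale server default can silently override a correct meta tag.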
export LANG="en_US.UTF-8"
If you're using Windows, check the system's locale. http://windows.microsoft.com/en-us/windows-vista/Change-the-system-locale . If you're using PuTTY to log into your server, make sure you've set it to Unicode mode to support UTF-8.
The other thing to look at in MySQL is the table collation setting. Make sure it's a collation that makes sense for what you're trying to do, or indexes will behave unexpectedly. (see http://dev.mysql.com/doc/refman/5.0/en/charset-mysql.html )
More likely than not, you'd want to set up a header file for the website that contains the character set encoding information and include it in every view. As far as the DB is concerned, your text and varchar fields need to support the right encoding. There's no simple way to do this without altering each table to make sure its charset and collation are right (once a table has been created in one charset, it has to be converted).
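Since every table has to be converted individually, it can help to generate the statements rather than type them. A minimal Python sketch, assuming the table names below (hypothetical) and the utf8 / utf8_general_ci pair; it only prints SQL, it does not touch a database. Take a backup before running the output, because the conversion rewrites the stored data.

```python
import re


def conversion_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for name in tables:
        # Accept only plain identifiers so the name can be safely backtick-quoted.
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unexpected table name: {name!r}")
        statements.append(
            f"ALTER TABLE `{name}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# "users" and "comments" are placeholder table names.
for statement in conversion_statements(["users", "comments"]):
    print(statement)
```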
I am transferring a database from one server to another using phpMyAdmin. The transfer succeeded, but I'm having an issue with Swedish characters. The characters display properly within the tables, but on the PHP pages they come out wrong; it looks like double encoding or some similar problem. Can anyone help?
The problem could be lying in different parts. Welcome to the world of Unicode!
Make sure the collation for the columns in MySQL is utf8_* (I personally prefer utf8_bin).
Make sure the PHP page is telling the client that the contents are encoded with UTF8. That can/should be done in two ways:
Set the following header: header('Content-Type: text/html; charset=utf-8');
In your HTML <head> add the correct meta tag: <meta charset="utf-8">
(note: while in theory it's not strictly necessary to do both, as they're equivalent for the client, it's better to be redundant!)
Make sure the connection with MySQL uses UTF8. That can be done by executing a simple query right after connecting to the database: SET NAMES 'utf8' (e.g. mysqli_query($link, "SET NAMES 'utf8'"), where $link is your connection; adapt it accordingly if you're using PDO or the MySQLi OOP API).
Bonus: if you're using UTF8 in your PHP script, make sure you treat everything in a Unicode-safe way. So, prefer the mb_* functions for manipulating strings, use the u flag with the preg_* functions, etc. And remember that UTF8 characters vary in the number of bytes they use, from 1 to 4! (MySQL's utf8 charset stores at most 3 bytes per character; 4-byte characters such as emoji need utf8mb4.)
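The variable width is easy to see by encoding a few characters. This is a small Python sketch rather than PHP, but the distinction is the same one as between PHP's strlen() (bytes) and mb_strlen() (characters). The sample characters are arbitrary picks.

```python
def utf8_width(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


samples = ["a", "ä", "日", "😀"]  # ASCII, accented Latin, CJK, emoji
for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_width(ch)} byte(s)")

text = "Smörgåsbord"
# Character count and byte count differ as soon as non-ASCII appears.
print(len(text), len(text.encode("utf-8")))
```

Anything that truncates, pads, or measures a string by bytes can cut a character in half, which is why the byte-oriented string functions are risky on UTF-8 data.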
I have the same settings on both websites; the only problem is with the database after transferring it to the other server. The page encoding is the same on both sites.
You can check it here:
http://www.abswheels.se
http://www.dackis.se/abs/
You can see the difference. Any suggestions?
Also, everything looks fine inside the database. I don't know why the data comes out wrong when I fetch special characters from it. You can see it in the title bar of both websites. Everything is the same on the client side: same encoding, same settings.
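When data looks right in the database tool but wrong on the page, one common cause is UTF-8 bytes being read as Latin-1 somewhere on the connection. The Python sketch below simulates that damage and reverses it. It is a heuristic under assumptions: the function names are invented, the sample text is made up, and it uses strict Latin-1 for the misreading step, whereas MySQL's latin1 is closer to Windows-1252, so real data may need "cp1252" instead.

```python
def looks_double_encoded(text):
    """Heuristic: True if reading the text back as Latin-1 bytes yields different, valid UTF-8."""
    try:
        return text.encode("latin-1").decode("utf-8") != text
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False


def repair(text):
    """Undo one round of UTF-8-read-as-Latin-1 damage; leave healthy text unchanged."""
    return text.encode("latin-1").decode("utf-8") if looks_double_encoded(text) else text


good = "Däck och fälgar"
damaged = good.encode("utf-8").decode("latin-1")  # simulate the wrong connection charset
print(damaged)
print(repair(damaged))
```

Repairing output like this only hides the symptom. The lasting fix is to make the connection charset on the new server match the old one.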
I have a form in my page for users to leave a comment.
I'm currently using this charset:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
but when I retrieve a comment from the DB, accents are not displayed correctly ( Ex. è =>è ).
Which parameters should I care about to handle accents correctly?
SOLVED
changed the meta tag to charset='utf-8'
changed the MySQL character set (ALTER TABLE comments CONVERT TO CHARACTER SET utf8)
changed the connection character set both when inserting and when retrieving records ($conn->query('SET NAMES utf8'))
Now accents are displayed correctly
thanks
Luca
Character sets can be complicated and a pain to debug in LAMP web applications. At each stage where one piece of software talks to another, there's scope for incorrect charset translation or incorrect storage of data.
The places you need to look out for are:
- Between the browser and the web server (which you've listed already)
- Between PHP and the MySQL server
The character you've listed looks like a normal European character that is included in the ISO-8859-1 charset.
Things to check for:
Even though you're specifying the character set in a meta header, have a look in your browser to be sure which character set it is actually using. If you've specified one, the browser should use that charset to render the page, but I've seen cases where it attempts to auto-detect the charset and fails. Most browsers have an "encoding" menu (perhaps under "view") that lets you choose the charset. Ensure that it says ISO-8859-1 (Western European).
MySQL can happily do character set conversion if required, but in most cases you want your tables and your client connection set to the same encoding. Configured this way, MySQL won't attempt any encoding conversion and will just write the data you input byte for byte into the table. When read, it'll come out the same way, byte for byte.
You've not said if you're reading data from the database back out with the same web-app or with some other client. I'd suggest you try to read it out with the same web application and using the same meta charset header (again, check the browser is really setting it) and see what is displayed in the browser.
To debug these issues you need to be really sure whether the client/console you're using is doing any conversion too. The safest way is sometimes to get the data into a hex editor, where you can be sure that nothing else is translating it.
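As a rough illustration of what the hex view tells you, this Python sketch prints the bytes for è in the two encodings involved here. MySQL's HEX() function gives the same kind of view of a stored column. The helper name is made up.

```python
def hex_bytes(text, encoding):
    """Space-separated hex dump of the bytes `text` becomes in the given encoding."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


print(hex_bytes("è", "iso-8859-1"))  # one byte
print(hex_bytes("è", "utf-8"))       # two bytes

# The same two UTF-8 bytes as shown by a client that assumes ISO-8859-1:
print(b"\xc3\xa8".decode("iso-8859-1"))
```

One byte (e8) in the table means the data was stored as ISO-8859-1; two bytes (c3 a8) means it was stored as UTF-8, whatever the column claims to be.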
If it doesn't look like it's a browser-side problem please can you include the output of the following commands against your database:
Run from a connection that your web-app makes (not from some other MySQL client):
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Run from any MySQL client:
SHOW CREATE TABLE myTable;
(where myTable is the table you're reading/writing data from/to)
The ISO-8859-1 character set covers Western European (Latin) characters only. Try UTF-8, and make sure that the database columns these characters come from are also UTF-8.
I am creating a web-based application using PHP and MySQL. I want it to be able to save any kind of user input, both English and non-English characters such as Arabic or Japanese, at the same time.
What should I do to achieve that?
You need to use Unicode. Read the MySQL manual section on Unicode and Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
You'll likely want to set the character set (encoding) of the table/columns in question to utf8. You'll also need to set the encoding of your HTML/PHP files to UTF-8. You can do this with a meta tag in <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
You can also set the Content-Type: header that Apache/PHP sends out.
Even after setting this, you may still run into browser-specific issues. For example, Internet Explorer may not always use UTF-8, so Rails 3 had to put in a workaround.
For MySQL, you first need to define your data with the UTF8 character set:
CREATE DATABASE xx [...] DEFAULT CHARACTER SET 'utf8' DEFAULT COLLATE utf8_general_ci
And when creating database connections from PHP, you just need to run a quick command after opening it:
SET NAMES 'utf8'
Alternatively, if you have access to MySQL's my.ini, you can just add this to the config and forget the above:
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8
(note that's not php.ini, but MySQL's ini)
For PHP, if you need to manipulate multibyte strings: make sure you have the mbstring library active, and then change your string & regexp function calls to use the mb_* equivalent.
Also, make sure your editor is saving in UTF8 so everything's consistent. Eclipse/PDT makes it easy, at least (project -> properties -> text file encoding).
Finally, handling cultural differences: that's the hard part. Sometimes it's as easy as setting p { direction: rtl; } in CSS, and other times you'll be tearing your hair out trying to decipher what alphabet(s) a user just posted with. It depends on what you're doing with the different languages.
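For the "which alphabet did the user just post" problem, the Unicode character database gets you a rough answer. The Python sketch below is only an approximation under assumptions: the function names are invented, and taking the first word of a character's Unicode name is a crude stand-in for proper script detection.

```python
import unicodedata


def has_rtl(text):
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def scripts(text):
    """Rough set of script names, from the first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


for sample in ["hello", "مرحبا", "こんにちは"]:
    print(sample, has_rtl(sample), scripts(sample))
```

A check like has_rtl() could decide whether to apply direction: rtl to a user's post, though mixed-direction text needs more care than a single flag.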
For starters, make sure that you read up on SQL injection. You would need to take strong precautions so that you safely encode the input. Usually, you'd be filtering/discarding unsafe content. So if you really need to allow it, then you need to be careful that you don't make it easy to hack yourself.
Essentially, you need the same sort of protection, while allowing "dangerous" content such as source code examples, that sites like this one use, as do commonly targeted systems such as phpBB2, WordPress, wikis, etc.
I think your task is harder if the data needs to be searchable.
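If you are using PHP, the mysql_real_escape_string() function looks good (the mysql_* extension it belongs to was removed in PHP 7; mysqli_real_escape_string() and prepared statements are the current equivalents):
http://www.tizag.com/mysqlTutorial/mysql-php-sql-injection.php
Otherwise, use something similar.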
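An alternative to escaping by hand is a parameterised query, which PHP offers through mysqli and PDO prepared statements. The sketch below shows the idea in Python with the built-in sqlite3 module as a stand-in database; the table name and the input string are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database; the thread itself uses MySQL
conn.execute("CREATE TABLE comments (body TEXT)")  # hypothetical table

hostile = "Robert'); DROP TABLE comments; -- مرحبا 日本語"

# The ? placeholder sends the value separately from the SQL text,
# so quotes and non-ASCII characters are stored as data, not executed.
conn.execute("INSERT INTO comments (body) VALUES (?)", (hostile,))

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
print(stored == hostile)
```

The user's text goes in and comes out unchanged, so there is no need to filter or discard "unsafe" characters before storage. Escaping for HTML is a separate step that happens on output.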
I have a Turkish character problem in my MySQL database when adding content with TinyMCE from the admin panel.
Charset is:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9" />
How can I solve this?
Thanks in advance
Make sure the table in MySQL is also defined as having charset ISO-8859-9.
There's not enough information to say what your problem is, but in general you need the same character set in your HTML page (text/html;charset), in PHP's connection to the database (mysql_set_charset), and in MySQL's CREATE TABLE ... DEFAULT CHARACTER SET (if you just CREATE TABLE, it will end up in Latin-1, which you probably don't want). You would also need to make sure not to use htmlentities without the charset argument on output (use htmlspecialchars instead).
See e.g. this answer for more detail. That's talking about using UTF-8 for the encoding, but the same applies if you substitute ISO-8859-9 all the way through. (Although unless there's a good reason not to, you should really be using UTF-8.)
Well, I had a similar problem with my Turkish site.
My tables were in latin5_turkish_ci and the charset of the PHP page was latin5.
There was no problem when I submitted content to the database via PHP; all characters were saved correctly.
But when I tried to submit content via the jQuery post method, the Turkish characters were not saved correctly,
and PHP's iconv function solved my problem.
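The likely reason is that jQuery's AJAX helpers send form data as UTF-8 regardless of the page charset, so the bytes have to be re-encoded before they reach a latin5 table. In PHP that is iconv('UTF-8', 'ISO-8859-9', $str). Here is the same conversion as a Python sketch, with a made-up sample string and helper name:

```python
def utf8_to_latin5(raw):
    """Re-encode UTF-8 bytes as ISO-8859-9 (Latin-5); raises if a character has no Latin-5 form."""
    return raw.decode("utf-8").encode("iso-8859-9")


posted = "şğıİöü".encode("utf-8")  # what the AJAX request would carry
converted = utf8_to_latin5(posted)

# Each of these characters takes two bytes in UTF-8 and one in Latin-5.
print(len(posted), len(converted))
print(converted.decode("iso-8859-9"))
```

Moving the tables and pages to UTF-8 would remove the need for the conversion step altogether.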
I have content stored in a Postgres DB. Every time I fetch the content and display it using PHP, I get funny squares in IE and funny square-type question marks in Firefox.
Example below
* - March � May 2009
How do I remove this?
I do not have access to the server so can't adjust the encoding there, only have postgres DB details and FTP access to upload my files
I would also recommend: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky, I've read it only recently myself, it will definitely help you sort out your problems.
You need to make sure that Postgres, PHP, and your browser all agree on the content encoding, and that you have an appropriate font selected in your browser. The simplest way to do that is to choose UTF8 for everything.
I don't know about PHP, but I do know about databases and browsers. First you need to find out if the database is UTF8. (From psql, I would do a "\l" and look at the encoding.) Then you need to find out if PHP supports UTF8 (I have no idea how you do that). Then you need to see how those characters are being stored in the database by the PHP app. Then you need to figure out if the web server is correctly reporting the content encoding. (On Linux/Unix, I'd use the program "HEAD" (not "head") to see the headers it's returning.) And then you need to figure out if your browser is using a font that has glyphs for those characters.
Or, you could just make sure you only store ASCII and forget the rest of the world exists. Not recommended.
There's a wrong charset somewhere. The characters could already be stored wrongly in the database, or you have the wrong charset in the meta tags on the page (try manually changing the charset in the browser), or the wrong encoding is being used when the page communicates with the database.
Check this page http://www.postgresql.org/docs/8.2/static/multibyte.html for more information.
Try to have same encoding on all places, preferably UTF-8
You have encoding issues. Make sure the encoding is set right in the database, in the html markup and make sure the files themselves are saved in proper encoding.
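The "March � May 2009" symptom can be reproduced in a few lines. This Python sketch assumes, without any way to confirm it from the question, that the original character was an en dash stored as the single Windows-1252 byte 0x96; a lone byte like that is invalid UTF-8, so a browser told the page is UTF-8 shows the replacement character.

```python
raw = b"March \x96 May 2009"  # assumed: en dash stored as Windows-1252 byte 0x96

as_utf8 = raw.decode("utf-8", errors="replace")  # what a UTF-8 page would show
as_cp1252 = raw.decode("cp1252")                 # what the author probably typed

print(as_utf8)
print(as_cp1252)
print(as_cp1252.encode("utf-8"))  # the bytes to serve if the page is declared UTF-8
```

Without server access there are two ways out: declare the page in the encoding the data really uses, or convert the text to UTF-8 in PHP before printing it.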