I have built a CMS that allows HTML to be stored in a database. It all started off very simple. I displayed the HTML in a textarea using htmlspecialchars to prevent it from breaking the form. Then saved it back using html_specialchars_decode. It all seemed to work fine until someone pasted some HTML into the system instead of typing. At this point it stored fine but lost most of the whitespace which meant all the lovely indentation had to be done from scratch.
To fix it, I tried specifying everything in utf-8 encoding because any attempt to fiddle with it seemed to produce invalid characters.
I specify utf-8 in the PHP header
header('Content-Type: text/html; charset=utf-8');
I specify utf-8 in my HTML page
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I specify utf-8 in the HTML form
<form accept-charset="utf-8"
Then I read the posted value (basically) like this:
$Val = $_POST[$SafeFieldName];
My understanding was that PHP did everything in utf-8 so I am a bit surprised at this stage that I get gobbledegook - unless I now do this:
$Val = utf8_decode($Val);
So, at this stage - it works - sort of. I loose all my lovely indentation but not all of my white space. It's as if there are some non utf8 chars being stripped out. Weirdly I'm using Chrome but in Firefox, it seems fine
I think I'm just tying myself in knots now. Any elegant suggestions? I need to get to the bottom of this as opposed to just hack it to get it to work.
The connection to the DB and the DB tables itself should support UTF-8. Make sure that your table's collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.
The DB connection should be UTF-8 as well:
mysql_set_charset('utf8');
See http://akrabat.com/php/utf8-php-and-mysql/ for more info.
Update: some report that
mysql_query('SET NAMES utf8');
is required sometimes as well!
If making your tables and connection UTF-8 is not possible, you could of course save the HTML as BASE64 encoded data, and decode it back when you retrieve it from the DB again.
Check your DataBase connection encodin, and check DataBase table field encoding where you store HTML.
Maybe there encoding is different from UTF-8
If this is an issue in and out of MySQL (as you suggested in the title) then you need to make sure the columns and tables are UTF8-BIN and put mysql_set_charset('utf8'); after opening the connection to MySQL.
Sorted - and the answer is really embarrassing - but you never know, some day someone may need this :)
I noticed that it worked differently (but still fairly rubbish) in Firefox so I had a look at my style sheet and found this:
white-space: nowrap;
Someone (me) must have put that in there to try to get horizontal scrolling working in some browser. Without that, the HTML makes it all the way to the DB and back again.
My only other question was why did I need this since the whole thing should have been arriving in utf8
$Val = utf8_decode($Val);
Magically - now I don't need it.
Related
I am transfering the database from one server to another server using phpmyadmin. I successfully transfered it but having issue with swedish characters. I can see the swedish characters are displaying properly within the tables but in php pages it is wrong seems like double encoded or any other problem. Can anyone help?
The problem could be lying in different parts. Welcome to the world of Unicode!
Make sure the collation for the columns in MySQL is utf8_* (I personally prefer utf8_bin).
Make sure the PHP page is telling the client that the contents are encoded with UTF8. That can/should be done in two ways:
Set the following header: header('Content-Type: text/html; charset=utf-8');
In your HTML <head> add the correct meta tag: <meta charset="utf-8">
(note: while in theory it's not strictly necessary to do both, as they're equivalent for the client, it's better to be redundant!)
Make sure the connection with MySQL uses UTF8. That can be done by executing a simple query right after the connection to the database: SET NAMES 'utf8' (e.g. mysqli_query("SET NAMES 'utf8'"); alter it accordingly if you're using PDO or the MySQLi OOP APIs).
Bonus: if you're using UTF8 in your PHP script, make sure you treat everything in an Unicode-safe way. So, prefer using mb_* functions to manipulate strings, use the u flag with preg_* functions, etc. And remember than UTF8 characters are variable in the number of bytes they use, from 1 to 4!
I have same setting for my both website only problem is with database after transfering it to an other server. Encoding of pages are same on both sites.
you can check it here
http://www.abswheels.se
http://www.dackis.se/abs/
you can see the difference. any sugguestions??
also everything is fine inside the database. I dont know why when i fetch the data with special character from database it has a problem. you can see the title bar of both website. everything is same on client side. same encoding same setting
I have this following
$html = <div>ياں ان کي پرائيويٹ ليمٹڈ کمپنياں ہيں</div>
But it is being stored in the mysql database as following format
تو يہ اسمب
لي ميں غر
يب کو آنے
نہيں
Actually, When I retrieve the data from mysql database and shows it on the webpage it is shown correctly.
But I want to know that Is it the standard format of unicode to store in the database, or the unicode data should be stored as it is (ياں ان کي پرائيويٹ ليمٹڈ کمپنياں ہيں)
When you store unicode in your database...
First off, your database has to be set as 'utf-general', which is not the default. With MySQL, you have to set both the table to utf format, AND individual columns to utf. In addition to this, you have to be sure that your connection is a utf-8 connection, but doing that varies based on what method you use to store the unicode text into your database.
To set your connection's char-set, if you are using Mysqli, you would do this:
$c->set_charset('utf8'); where $c is a Mysqli connection.
Still, you have to change your database charsets like I said before.
EDIT: I honestly don't think it matters MUCH how you store it, though I store it as the actual unicode characters, because that way if some user were to input '& #1610;' into the database, it wouldn't be retrieved as a unicode character by mistake.
EDIT: Here is a good example, if you remove that space between & and #1610; in my answer, it will be mistakenly retrieved from the server as a unicode character, unless you want users to be able to create unicode characters by using a code like that.
Not a perfect example since stackoverflow does that on purpose, and it doesn't work like that really, but the concept is the same.
Something wrong with data charset. I don't know what exactly.
This is workaround. Do it before insert/update:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
it looks like to me that this is HTML encoding, the way PHP encode unicode to make sure it will display OK on the web page, no matter the page encoding.
Did you tried to fetch the same data using MySQL Workbench?
It seems that somewhere in your PHP code htmlentities is being used on the text -- instead of htmlspecialchars. The difference with htmlentities is that it escapes a lot of non-ASCII characters in the form you see there. Then the result of that is being stored in the database. It's not MySQL's doing.
In theory this shouldn't be necessary. It should be okay to output the plain characters if you set the character set of the page correctly. Aassuming UTF-8, for example, use header('Content-Type: text/html; charset=utf-8'); or <meta http-equiv="Content-Type" value="text/html; charset=utf-8">.
This might result in gibberish (mojibake) if you view the database directly (although it will display fine on the web page) unless you also make sure the character set of the database is set correctly. That means the table columns, table, database, and connection character set all to, probably, utf8mb4_general_bin or utf8_general_bin (or ..._general_ci). In practice getting it all working can be a bit of a nuisance. If you didn't write this code, then probably someone in your code base decided at some point to use htmlentities on it to convert the exotic characters to ASCII HTML entities, to make storage easier. Or sometimes people use htmlentities out of habit when the merer htmlspecialchars would be fine.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
UTF-8 all the way through
Searching high & low for a solution. I've tried many variations before posting the question.
What is required to have names appear the same in phpMyAdmin and html page? Can this even be accomplished?
EDIT 1: It would seem that this is a mysql issue. Why? Because the php generated html page will always show the correct characters. At this point it is only the database that shows incorrectly.
EDIT 2: Clarification. With the original settings shown in code snip and images below,
Enter João and submit
João displayed in database
João display after reload
Adding the mysqli_query ( $link, 'SET NAMES utf8' )
Enter João and submit
João displayed in database
Jo�o displayed after reload
end Edit 2
In a mysql database, viewed with phpMyAdmin:
The items appear in the database like this: (I've modified the first João to appear correct in database)
And in the html page with encoding set the names appear like (order is reversed & modified has black diamond),
Encoding: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I have tried changing the column collation to utf8_bin, utf8_general_ci, utf8_unicode_ci, all with no change to either side. Also changed the document (BBEdit) from UTF-8 to UTF-8 (with BOM), ISO Latin 1 and Windows Latin 1. Several of these created more black diamonds, making the issue worse. (Set to UTF-8 in images) I even tried to preg_replace ã, é etc with the encoded equivalents.
The short story is, João is entered on the page (content type above), João is in database, and João comes to the html page on refresh.
Looking for ideas. Thanks.
Character set issues are often really tricky to figure out. Basically, you need to make sure that all of the following are true:
The DB connection is using UTF-8
The DB tables are using UTF-8
The individual columns in the DB tables are using UTF-8
The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you've imported from bad sources, or changed table or column collations)
The web page is requesting UTF-8
Apache is serving UTF-8
Here's a good tutorial on dealing with that list, from start to finish: https://web.archive.org/web/20110303024445/http://www.bluebox.net/news/2009/07/mysql_encoding/
It sounds like your problem is specifically that you've got double-encoded (or triple-encoded) characters, probably from changing character sets or importing already-encoded data with the wrong charset. There's a whole section on fixing that in the above tutorial.
make sure your DB connection is using UTF-8 as well. Try putting the below line on top of your page,
mysql_query("SET NAMES utf8");
Default PHP hates UTF8. Make sure that you're using mbstring functions rather than the usual built-in string functions.
Make sure that your html page, along with the scripts participated in the AJAX data exchange are being served with a proper HTTP headers including
Content-Type: text/html; charset=UTF-8
As html-side encoding settings might be just ignored by browsers
I am pulling comments out of the database and have this, �, show up... how do I get rid of it? Is it because of whats in the database or how I'm showing it, I've tried using htmlspecialchars but doesn't work.
Please help
The problem lies with Character Encoding. If the character shows up fine in the database, but not on the page. Your page needs to be set to the same character encoding as the database. And vice a versa, if your page that posts to the database character encoding does not match, well it comes out weird.
I generally set my character encoding to UTF-8 for any type of posting fields, such as Comments / Posts. Most MySQL databases default to the latin charset. So you will need to modify that: http://yoonkit.blogspot.com/2006/03/mysql-charset-from-latin1-to-utf8.html
The HTML part can be done with a META tag: <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
or with PHP: header('Content-type: text/html; charset=utf-8'); (must be placed before any output.)
Hopefully that gets the ball rolling for you.
That happens when you have a character that your font doesn't know how to display. It shows up differently in every program, many Windows programs show it as a box, Firefox shows it as a questionmark in a diamond, other programs just use a plain question mark.
So you can use a newer display system, install a missing font (like if it's asian characters) or look to see if it's one or two characters that do this and just replace them with something visible.
It might be problem of the way you are storing the information in the database. If the encoding you were using didn't accept accents (à, ñ, î, ç...), then it stores them using weird symbols. Same happens to other language specific symbols. There is probably not a solution for what's already in the database, but you can still save the following inserts by changing the encoding type in mysql.
Cheers
Make sure your database UTF-8 (if it won't solve the problem make sure you specify your char-set while connecting to the database).
You can also encode / decode before entering data to your database.
I would suggest to go with htmlspecialchars() for encoding and htmlspecialchars_decode() for decoding.
Are you passing your charset in mysql_set_charset() with mysql_connect() ???
As others have said, check what your database encoding is. You could try using utf8_encode() or iconv() to convert your character encoding.
Check your code for errors. That's all one can really say considering that you have given us absolutely no details as to what you're doing.
Encoding problems are usually what cause that (are you converting from integers to characters?), so, you fix it by checking if you're converting things properly.
I'm still learning the ropes with PHP & MySQL and I know I'm doing something wrong here with how character sets are set up, but can't quite figure out from reading here and on the web what I should do.
I have a standard LAMP installation with PHP 5, MySQL 5. I set everything up with the defaults. When some of my users input comments to our database some characters show up incorrectly - mostly apostrophes and em dashes at the moment. In MySQL apostrostrophes show up as ’. They display on the page this way also (I'm using htmlentities to output user comments).
In phpMyAdmin it says my MySQL Charset is UTF8-Unicode.
In my database my tables are all set up with the default Latin1-Swedish-ci.
My web pages all have meta http-equiv="Content-Type" content="text/html; charset=utf-8"
When I look at the site's http headers I see: Content-Type: text/html
Like a newbie, I hadn't considered character sets at all until things started looking odd on some of my pages. So does it make most sense for me to convert everything to utf-8 and will this affect my PHP code? Or should I try to get it all into Latin? And do I have to go into the database and replace these odd codes, or will they magically display once I set up the charsets properly? All the fiddling I've done so far hasn't helped (I set the http headers to utf-8, and also tried latin).
If you really want to understand these issues, I would start by reading this article at mysql.com. Basically, you want every piece of the puzzle to expect UTF-8 unicode. On the PHP side, you want to do something like:
<?php header("Content-type: text/html; charset=utf-8");?>
<html>
<head>
<meta http-equiv="Content-type" value="text/html; charset=utf-8">
And when you run your insert queries you want to make sure both the table's character encoding and the encoding that you're running the queries in are UTF-8. You can accomplish the latter by running the query SET NAMES utf8 right before you run an insert query.
http://www.phpwact.org/php/i18n/charsets
That site gave me a lot of good advice on how to make everything play nice in UTF-8.
I also recomened switching from htmlentities to htmlspecialchars as it is more UTF friendly.
The main point is to make sure everything is talking the same language. Your database, your database connection, your PHP, your page is in utf8 (should have a meta tag and a header saying so).
Sorry for not understanding all of your question. But when part of the question is "UTF-8 or not?", the answer is: "UTF-8, of course!"
You definitely want to sort things out now rather than later. One of the most important programming rules is not to keep going with a bad idea - don't dig yourself in any deeper!
As latin1 and utf-8 are compatible, you can convert your tables to use utf-8 without manipulating the data contained by hand. MySQL will sort this part out for you.
It's then important to check that everything is speaking utf-8. Set the http headers in apache or use a meta tag - this says to a browser that the HTML output is utf-8.
With this in mind, you need to make sure all of the data you send really is utf-8! Configure your IDE to save php/html files as utf-8. Finally make sure that PHP is using a utf-8 connection to MySQL - issue this query after connecting:
SET NAMES 'utf-8';