How to get rid of � using php - php

I am pulling comments out of the database and have this, �, show up... how do I get rid of it? Is it because of whats in the database or how I'm showing it, I've tried using htmlspecialchars but doesn't work.
Please help

The problem lies with Character Encoding. If the character shows up fine in the database, but not on the page. Your page needs to be set to the same character encoding as the database. And vice a versa, if your page that posts to the database character encoding does not match, well it comes out weird.
I generally set my character encoding to UTF-8 for any type of posting fields, such as Comments / Posts. Most MySQL databases default to the latin charset. So you will need to modify that: http://yoonkit.blogspot.com/2006/03/mysql-charset-from-latin1-to-utf8.html
The HTML part can be done with a META tag: <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
or with PHP: header('Content-type: text/html; charset=utf-8'); (must be placed before any output.)
Hopefully that gets the ball rolling for you.

That happens when you have a character that your font doesn't know how to display. It shows up differently in every program, many Windows programs show it as a box, Firefox shows it as a questionmark in a diamond, other programs just use a plain question mark.
So you can use a newer display system, install a missing font (like if it's asian characters) or look to see if it's one or two characters that do this and just replace them with something visible.

It might be problem of the way you are storing the information in the database. If the encoding you were using didn't accept accents (à, ñ, î, ç...), then it stores them using weird symbols. Same happens to other language specific symbols. There is probably not a solution for what's already in the database, but you can still save the following inserts by changing the encoding type in mysql.
Cheers

Make sure your database UTF-8 (if it won't solve the problem make sure you specify your char-set while connecting to the database).
You can also encode / decode before entering data to your database.
I would suggest to go with htmlspecialchars() for encoding and htmlspecialchars_decode() for decoding.

Are you passing your charset in mysql_set_charset() with mysql_connect() ???

As others have said, check what your database encoding is. You could try using utf8_encode() or iconv() to convert your character encoding.

Check your code for errors. That's all one can really say considering that you have given us absolutely no details as to what you're doing.
Encoding problems are usually what cause that (are you converting from integers to characters?), so, you fix it by checking if you're converting things properly.

Related

How to correcly display data that is not English as a radio button title

I am having an issue where some title for radio button are showing blank due to non-english characters in the title.
Here is example of a string from the database that is showing blank.
WATERTOWN/171 Watertown St/Rte 16/Newtonÿ
The last character in the string is ÿ, that is what is causing the problem in this case.
How can I correct this problem?
On the very top of my page I have this code
<meta charset="utf-8">
I am not sure if the character ÿ is not a valid UTF-8 or not.
I tried using the method utf8_encode() on the data before storing it into the SQL Server database but that did not work.
That is the problem here and how to fix it?
PHP is a little bit relaxed on everything, also on encoding. There is no real "encoding" you probaly know from other languages.
Actually it treats the string byte by byte and passes them to the output.
What you can do (if not already done): set the database connection encoding to UTF-8.
And double check, if this character is not already in the database ;)
(And yes, ÿ is of course an UTF-8 character ;) but it looks a little bit, that you are using a different encoding for the database connection, then what is store actually in the database)

Why Unicode Data is stored in numeric form in mysql database

I have this following
$html = <div>ياں ان کي پرائيويٹ ليمٹڈ کمپنياں ہيں</div>
But it is being stored in the mysql database as following format
تو يہ اسمب
لي ميں غر
يب کو آنے
نہيں
Actually, When I retrieve the data from mysql database and shows it on the webpage it is shown correctly.
But I want to know that Is it the standard format of unicode to store in the database, or the unicode data should be stored as it is (ياں ان کي پرائيويٹ ليمٹڈ کمپنياں ہيں)
When you store unicode in your database...
First off, your database has to be set as 'utf-general', which is not the default. With MySQL, you have to set both the table to utf format, AND individual columns to utf. In addition to this, you have to be sure that your connection is a utf-8 connection, but doing that varies based on what method you use to store the unicode text into your database.
To set your connection's char-set, if you are using Mysqli, you would do this:
$c->set_charset('utf8'); where $c is a Mysqli connection.
Still, you have to change your database charsets like I said before.
EDIT: I honestly don't think it matters MUCH how you store it, though I store it as the actual unicode characters, because that way if some user were to input '& #1610;' into the database, it wouldn't be retrieved as a unicode character by mistake.
EDIT: Here is a good example, if you remove that space between & and #1610; in my answer, it will be mistakenly retrieved from the server as a unicode character, unless you want users to be able to create unicode characters by using a code like that.
Not a perfect example since stackoverflow does that on purpose, and it doesn't work like that really, but the concept is the same.
Something wrong with data charset. I don't know what exactly.
This is workaround. Do it before insert/update:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
it looks like to me that this is HTML encoding, the way PHP encode unicode to make sure it will display OK on the web page, no matter the page encoding.
Did you tried to fetch the same data using MySQL Workbench?
It seems that somewhere in your PHP code htmlentities is being used on the text -- instead of htmlspecialchars. The difference with htmlentities is that it escapes a lot of non-ASCII characters in the form you see there. Then the result of that is being stored in the database. It's not MySQL's doing.
In theory this shouldn't be necessary. It should be okay to output the plain characters if you set the character set of the page correctly. Aassuming UTF-8, for example, use header('Content-Type: text/html; charset=utf-8'); or <meta http-equiv="Content-Type" value="text/html; charset=utf-8">.
This might result in gibberish (mojibake) if you view the database directly (although it will display fine on the web page) unless you also make sure the character set of the database is set correctly. That means the table columns, table, database, and connection character set all to, probably, utf8mb4_general_bin or utf8_general_bin (or ..._general_ci). In practice getting it all working can be a bit of a nuisance. If you didn't write this code, then probably someone in your code base decided at some point to use htmlentities on it to convert the exotic characters to ASCII HTML entities, to make storage easier. Or sometimes people use htmlentities out of habit when the merer htmlspecialchars would be fine.

Encoding issue storing HTML in mySQL using PHP

I have built a CMS that allows HTML to be stored in a database. It all started off very simple. I displayed the HTML in a textarea using htmlspecialchars to prevent it from breaking the form. Then saved it back using html_specialchars_decode. It all seemed to work fine until someone pasted some HTML into the system instead of typing. At this point it stored fine but lost most of the whitespace which meant all the lovely indentation had to be done from scratch.
To fix it, I tried specifying everything in utf-8 encoding because any attempt to fiddle with it seemed to produce invalid characters.
I specify utf-8 in the PHP header
header('Content-Type: text/html; charset=utf-8');
I specify utf-8 in my HTML page
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I specify utf-8 in the HTML form
<form accept-charset="utf-8"
Then I read the posted value (basically) like this:
$Val = $_POST[$SafeFieldName];
My understanding was that PHP did everything in utf-8 so I am a bit surprised at this stage that I get gobbledegook - unless I now do this:
$Val = utf8_decode($Val);
So, at this stage - it works - sort of. I loose all my lovely indentation but not all of my white space. It's as if there are some non utf8 chars being stripped out. Weirdly I'm using Chrome but in Firefox, it seems fine
I think I'm just tying myself in knots now. Any elegant suggestions? I need to get to the bottom of this as opposed to just hack it to get it to work.
The connection to the DB and the DB tables itself should support UTF-8. Make sure that your table's collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.
The DB connection should be UTF-8 as well:
mysql_set_charset('utf8');
See http://akrabat.com/php/utf8-php-and-mysql/ for more info.
Update: some report that
mysql_query('SET NAMES utf8');
is required sometimes as well!
If making your tables and connection UTF-8 is not possible, you could of course save the HTML as BASE64 encoded data, and decode it back when you retrieve it from the DB again.
Check your DataBase connection encodin, and check DataBase table field encoding where you store HTML.
Maybe there encoding is different from UTF-8
If this is an issue in and out of MySQL (as you suggested in the title) then you need to make sure the columns and tables are UTF8-BIN and put mysql_set_charset('utf8'); after opening the connection to MySQL.
Sorted - and the answer is really embarrassing - but you never know, some day someone may need this :)
I noticed that it worked differently (but still fairly rubbish) in Firefox so I had a look at my style sheet and found this:
white-space: nowrap;
Someone (me) must have put that in there to try to get horizontal scrolling working in some browser. Without that, the HTML makes it all the way to the DB and back again.
My only other question was why did I need this since the whole thing should have been arriving in utf8
$Val = utf8_decode($Val);
Magically - now I don't need it.

Native language problem in mysql with tinyMCE

I have turkish character problem in mysql database when adding content with tinymce from admin panel.
Charset is:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9"" />
How can I solve this?
Thanks in advance
Make sure the table in MySQL is also defined as having charset ISO-8859-9.
There's not enough information to say what your problem is, but in general you need the same character set in your HTML page (text/html;charset), PHP's connection to the database (mysql_set_charset), and MySQL's CREATE TABLE ... DEFAULT CHARACTER SET (if you just CREATE TABLE it will end up in Latin-1 which you probably don't want. Plus you would need to make sure not to use htmlentities-without-charset-argument on output (use htmlspecialchars instead).
See eg. this answer for more detail. That's talking about using UTF-8 for the encoding, but the same applies if you substitute ISO-8859-9 all the way through. (Although unless there's a good reason not to, you should really be using UTF-8.)
well I had a similar problem with my turkish site.
My tables were in latin5_turkish_ci an the charset of the php page were latin5
there was no problem when I submitted the content via php to database, all characters were being saved correctly
but when I tried to submit the content via jquery post method then any turkish character was being saved correctly to database
and php iconv function solved my problem

Why is this the extended ascii character (â, é, etc) getting replaced with <?> characters?

Why is this the extended ascii character (â, é, etc) getting replaced with <?> characters?
I attached a pic... but I am using PHP to pull the data from MySQL, and some of these locations have extended characters... I am using the Font Arial.
You can see the screen shot here: http://img269.imageshack.us/i/funnychar.png/
Still happening after the suggestions, here is what I did:
My firefox (view->encoding) is set to UTF-8 after adding the line, however, the text inside the option tags is still showing the funny character instead of the actual accented one. What should I look for now?
UPDATE:
I have the following in the PHP program that is giving my those <?> characters...
ini_set( 'default_charset', 'UTF-8' );
And right after my zend db object creation, I am setting the following query:
$db->query("SET NAMES utf8;");
I changed all my tables over to UTF-8 and reinserted all the data (waste of time) as it never helped. It was latin1 prior.
Also STATUS is reporting:
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 4 days 20 hours 59 min 41 sec
Looking at the source of the page, I see
<option value="Br�l� Lake"> Br�l� Lake
OK- NEW UPDATE-
I Changed everything in my PHP and HTML to:
and
header('Content-Type: text/html; charset=latin1');
Now it works, what gives?? How do I convert it all to UTF-8?
That's what the browser does when it doesn't know the encoding to use for a character. Make sure you specify the encoding type of the text you send to the client either in headers or markup meta.
In HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
In PHP (before any other content is sent to the client):
header('Content-Type: text/html; charset=utf-8');
I'm assuming you'll want UTF-8 encoding. If your site uses another encoding for text, then you should replace UTF-8 with the encoding you're using.
One thing to note about using HTML to specify the encoding is that the browser will restart rendering a page once it sees the Content-Type meta tag, so you should include the <meta /> tag immediately after the <head /> tag in your page so the browser doesn't do any more extra processing than it needs.
Another common charset is "iso-8859-1" (Basic Latin), which you may want to use instead of UTF-8. You can find more detailed info from this awesome article on character encodings and the web. You can also get an exhaustive list of character encodings here if you need a specific type.
If nothing else works, another (rare) possibility is that you may not have a font installed on your computer with the characters needed to display the page. I've tried repeating your results on my own server and had no luck, possibly because I have a lot of fonts installed on my machine so the browser can always substitute unavailable characters from one font with another font.
What I did notice by investigating further is that if text is sent in an encoding different than the encoding the browser reports as, Unicode characters can render unexpectedly. To work around this, I used the HTML character entity representation of special characters, so â becomes â in my HTML and é becomes é. Once I did this, no matter what encoding I reported as, my characters rendered correctly.
Obviously you don't want to modify your database to HTML encode Unicode characters. Your best option if you must do this is to use a PHP function, htmlentities(). You should use this function on any data-driven text you expect to have Unicode characters in. This may be annoying to do, but if specifying the encoding doesn't help, this is a good last resort for forcing Unicode characters to work.
There is no such standard called "extended ASCII", just a bunch of proprietary extensions.
Anyway, there are a variety of possible causes, but it's not your font. You can start by checking the character set in MySQL, and then see what PHP is doing. As Dan said, you need to make sure PHP is specifying the character encoding it's actually using.
As others have mentioned, this is a character-encoding question. You should read Joel Spolsky's article about character encoding.
Setting
header('Content-Type: text/html; charset=utf-8');
will fix your problem if your php page is writing UTF-8 characters to the browser. If the text is still garbled, it's possible your text is not UTF-8; in that case you need to use the correct encoding name in the Content-Type header. If you have a choice, always use UTF-8 or some other Unicode encoding.
Simplest fix
ini_set( 'default_charset', 'UTF-8' );
this way you don't have to worry about manually sending the Content-Type header yourself.
EDIT
Make sure you are actually storing data as UTF-8 - sending non-UTF-8 data to the browser as UTF-8 is just as likely to cause problems as sending UTF-8 data as some other character set.
SELECT table_collation
FROM information_schema.`TABLES` T
WHERE table_name=[Table Name];
SELECT default_character_set_name
, default_collation_name
FROM information_schema.`SCHEMATA` S
WHERE schema_name=[Schema Name];
Check those values
There are two transmission encodings, PHP<->browser and Mysql<->PHP, and they need to be consistent with each other. Setting up the encoding for Mysql<->PHP is dealt with in the answers to the questions below:
Special characters in PHP / MySQL
How to make MySQL handle UTF-8 properly
php mysql character set: storing html of international content
The quick answer is "SET NAMES UTF8".
The slow answer is to read the articles recommended in the other answers - it's a lot better to understand what's going on and make one precise change than to apply trial and error until things seem to work. This isn't just a cosmetic UI issue, bad encoding configurations can mess up your data very badly. Think about the Simpsons episode where Lisa gets chewing gum in her hair, which Marge tries to get out by putting peanut butter on.
You should encode all special chars into HTML entities instead of depending on the charset.
htmlentities() will do the work for you.
I changed all my tables over to UTF-8 and reinserted all the data (waste of time) as it never helped. It was latin1 prior.
If your original data was latin1, then inserting it into a UTF-8 database won't convert it to UTF-8, AFAIK, it will insert the same data but now believe it's UTF-8, thus breaking.
If you've got a SQL dump, I'd suggest running it through a tool to convert to UTF-8. Notepad++ does this pretty well - simply open the file, check that the accented characters are displaying correctly, then find "convert to UTF-8" in the menu.
These special characters generally appear due to the the extensions. If we provide a meta tag with charset=utf-8 we can eliminate them by adding:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
to your meta tags

Categories