Using accented characters in PHP - php

I am trying to upload text into my database. I keep getting encoded text when user input certain characters. For example, if the user inputs "Jalapeño" I end up getting "Jalape%C3%B1o" in my database.
I know this is because the "ñ" is being read as "%C3%B1"...
I am not exactly sure how to go about converting all of the potential accented characters, I have looked at rewurldecode and others, but am wondering if there is a quick solution here. Any recommendations would be wonderful. Thank you!

Make sure your database uses the right character set and collation (usually you'll want UTF-8). Also make sure you you set the connection character set and sent the right headers.
e.g. mysql:
mysql_connect();
mysql_set_charset("utf8");
header('Content-Type: text/html; charset=utf-8');
You could also use htmlentities to convert the special characters into HTML entities, but you'll still need to make sure your database collation and connection settings are correct.

Your character is urlencoded. To display it properly, you have to decode it. But url encoding is not meant to be used when saving data in the database. As dtech points, you have to use utf8 character set.

Related

Why Unicode Data is stored in numeric form in mysql database

I have this following
$html = <div>ياں ان کي پرائيويٹ ليمٹڈ کمپنياں ہيں</div>
But it is being stored in the mysql database as following format
تو يہ اسمب
لي ميں غر
يب کو آنے
نہيں
Actually, When I retrieve the data from mysql database and shows it on the webpage it is shown correctly.
But I want to know that Is it the standard format of unicode to store in the database, or the unicode data should be stored as it is (ياں ان کي پرائيويٹ ليمٹڈ کمپنياں ہيں)
When you store unicode in your database...
First off, your database has to be set as 'utf-general', which is not the default. With MySQL, you have to set both the table to utf format, AND individual columns to utf. In addition to this, you have to be sure that your connection is a utf-8 connection, but doing that varies based on what method you use to store the unicode text into your database.
To set your connection's char-set, if you are using Mysqli, you would do this:
$c->set_charset('utf8'); where $c is a Mysqli connection.
Still, you have to change your database charsets like I said before.
EDIT: I honestly don't think it matters MUCH how you store it, though I store it as the actual unicode characters, because that way if some user were to input '& #1610;' into the database, it wouldn't be retrieved as a unicode character by mistake.
EDIT: Here is a good example, if you remove that space between & and #1610; in my answer, it will be mistakenly retrieved from the server as a unicode character, unless you want users to be able to create unicode characters by using a code like that.
Not a perfect example since stackoverflow does that on purpose, and it doesn't work like that really, but the concept is the same.
Something wrong with data charset. I don't know what exactly.
This is workaround. Do it before insert/update:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
it looks like to me that this is HTML encoding, the way PHP encode unicode to make sure it will display OK on the web page, no matter the page encoding.
Did you tried to fetch the same data using MySQL Workbench?
It seems that somewhere in your PHP code htmlentities is being used on the text -- instead of htmlspecialchars. The difference with htmlentities is that it escapes a lot of non-ASCII characters in the form you see there. Then the result of that is being stored in the database. It's not MySQL's doing.
In theory this shouldn't be necessary. It should be okay to output the plain characters if you set the character set of the page correctly. Aassuming UTF-8, for example, use header('Content-Type: text/html; charset=utf-8'); or <meta http-equiv="Content-Type" value="text/html; charset=utf-8">.
This might result in gibberish (mojibake) if you view the database directly (although it will display fine on the web page) unless you also make sure the character set of the database is set correctly. That means the table columns, table, database, and connection character set all to, probably, utf8mb4_general_bin or utf8_general_bin (or ..._general_ci). In practice getting it all working can be a bit of a nuisance. If you didn't write this code, then probably someone in your code base decided at some point to use htmlentities on it to convert the exotic characters to ASCII HTML entities, to make storage easier. Or sometimes people use htmlentities out of habit when the merer htmlspecialchars would be fine.

How to get rid of � using php

I am pulling comments out of the database and have this, �, show up... how do I get rid of it? Is it because of whats in the database or how I'm showing it, I've tried using htmlspecialchars but doesn't work.
Please help
The problem lies with Character Encoding. If the character shows up fine in the database, but not on the page. Your page needs to be set to the same character encoding as the database. And vice a versa, if your page that posts to the database character encoding does not match, well it comes out weird.
I generally set my character encoding to UTF-8 for any type of posting fields, such as Comments / Posts. Most MySQL databases default to the latin charset. So you will need to modify that: http://yoonkit.blogspot.com/2006/03/mysql-charset-from-latin1-to-utf8.html
The HTML part can be done with a META tag: <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
or with PHP: header('Content-type: text/html; charset=utf-8'); (must be placed before any output.)
Hopefully that gets the ball rolling for you.
That happens when you have a character that your font doesn't know how to display. It shows up differently in every program, many Windows programs show it as a box, Firefox shows it as a questionmark in a diamond, other programs just use a plain question mark.
So you can use a newer display system, install a missing font (like if it's asian characters) or look to see if it's one or two characters that do this and just replace them with something visible.
It might be problem of the way you are storing the information in the database. If the encoding you were using didn't accept accents (à, ñ, î, ç...), then it stores them using weird symbols. Same happens to other language specific symbols. There is probably not a solution for what's already in the database, but you can still save the following inserts by changing the encoding type in mysql.
Cheers
Make sure your database UTF-8 (if it won't solve the problem make sure you specify your char-set while connecting to the database).
You can also encode / decode before entering data to your database.
I would suggest to go with htmlspecialchars() for encoding and htmlspecialchars_decode() for decoding.
Are you passing your charset in mysql_set_charset() with mysql_connect() ???
As others have said, check what your database encoding is. You could try using utf8_encode() or iconv() to convert your character encoding.
Check your code for errors. That's all one can really say considering that you have given us absolutely no details as to what you're doing.
Encoding problems are usually what cause that (are you converting from integers to characters?), so, you fix it by checking if you're converting things properly.

mysql and encoding

I moved my php application to the new server. i use mysql5 db. When i'm Updating or Inserting something to db, every " and - sign changed to ?. I use SET NAMES UTF8 and SET CHARACTER SET but it don't work. Any ideas?
SET NAMES UTF8 should be used on every page, when selecting as well as when updating or inserting.
actually this query must be used every time you connect to the database. just add it to connect code.
You need UTF-8 all the way through to make smart quotes and dashes (“”—) and other non-ASCII characters work reliably:
(1) Ensure that the browser sends you characters encoded to UTF-8. Do this by declaring the page that includes the form to be UTF-8:
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
...
(Ignore <form accept-encoding>, which doesn't work right in IE.)
(2) PHP deals with raw bytes and doesn't care what encoding they're in, but the database does care, so you have to tell it what encoding the bytes from PHP are coming in. This is what SET NAMES is doing, though mysql_set_charset may be preferable.
(3) Once the proper characters have reached the database, it'll need to store them in a Unicode encoding to make sure all characters can fit. Each column can have a different encoding, but you can use DEFAULT CHARACTER SET utf8 when you CREATE table to make all the text columns in it use UTF-8. You can also set the default character set for a database or the whole server to utf8 if you prefer.
If you have already CREATE​d the tables and they a non-UTF-8 collation, you'll have to recreate or alter the tables. You can check the current collation using SHOW FULL COLUMNS FROM sometable;.
(4) Make sure you HTML-encode text you output from PHP using htmlspecialchars() and not htmlentities(), which by default will mess up non-ASCII characters.
[You can, as an alternative to (2) and (3), just use the default Latin-1 encoding for the connection and the table storage, but put UTF-8 bytes in it nonetheless. The disadvantage of this approach is that it'll look wrong to other tools looking at the database, and lower/upper case characters won't compare against each other in the expected case-insensitive way.]
My guess is you are pasting from some text editor which is transforming the " into an angled pretty quote, and transforming your - into an mdash, which is causing both to be represented as ?.
While you set your database to accept UTF8 characters, you probably did not set your webserver/PHP to accept those characters. Try playing with mbstring functions, but check to make sure you arent using the slanted quotes or dashes.

MySQL Collation or PHP side to display accented letters properly

What is the best Collation for the column that can allow to store accented letters and parse them out perfectly without any encoding error, because whenever I add an accented letter such as é, å, it shows out with an encoding problem on the PHP side, but in the MySQL side it's fine...
How do I get the accented letters display properly?
You get them correctly by matching the encoding on both ends, ie. both your PHP output and your DB should use the same encoding. For European languages I would suggest using UTF-8 for both your scripts and the DB. Just remember that you still have to initialize UTF-8 collation in MySQL using SET NAMES 'utf8' COLLATE 'utf8_general_ci' (so run this query just after you make a connection to the DB and you should be ok).
Perhaps your problem isn't within the database, but within however you're displaying things from PHP? What content encoding are you specifying in your output? You might need to manually send a header to specify that the content is UTF-8 if that's what you're trying to output.
For instance: header("Content-Type: text/html; charset=UTF-8");

php mysql character set: storing html of international content

i'm completely confused by what i've read about character sets. I'm developing an interface to store french text formatted in html inside a mysql database.
What i understood was that the safe way to have all french special characters displayed properly would be to store them as utf8. so i've created a mysql database with utf8 specified for the database and each table.
I can see through phpmyadmin that the characters are stored exactly the way it is supposed to. But outputting these characters via php gives me erratic results: accented characters are replaced by meaningless characters. Why is that ?
do i have to utf8_encode or utf8_decode them? note: the html page character encodign is set to utf8.
more generally, what is the safe way to store this data? Should i combine htmlentities, addslashes, and utf8_encode when saving, and stripslashes,html_entity_decode and utf8_decode when i output?
MySQL performs character set conversions on the fly to something called the connection charset. You can specify this charset using the sql statement
SET NAMES utf8
or use a specific API function such as mysql_set_charset():
mysql_set_charset("utf8", $conn);
If this is done correctly there's no need to use functions such as utf8_encode() and utf8_decode().
You also have to make sure that the browser uses the same encoding. This is usually done using a simple header:
header('Content-type: text/html;charset=utf-8');
(Note that the charset is called utf-8 in the browser but utf8 in MySQL.)
In most cases the connection charset and web charset are the only things that you need to keep track of, so if it still doesn't work there's probably something else your doing wrong. Try experimenting with it a bit, it usually takes a while to fully understand.
I strongly recomend to read this article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky, to understand what are you doing and why.
It is useful to consider the PHP-generated front end and the MySQL backend separate components. MySQL should not have to worry about display logic, nor should PHP assume that the backend does any sort of preprocessing on the data.
My advice would be to store the data in plain characters using utf8 encoding, and escape any dangerous characters with MySQLs methods.
PHP then reads the utf8 encoded data from database, processes them (with htmlentities(), most often), and displays it via whichever template you choose to use.
Emil H. correctly suggested using
SET NAMES utf8
which should be the first thing you call after making a MySQL connection. This makes the MySQL treat all input and output as utf8.
Note that if you have to use utf8_encode or utf8_decode functions, you are not setting the html character encoding correctly. It is easiest to require that every component of your system uses utf8, since that way you should never have to do manual encoding/decoding, which can cause hard to track issues later on.
In adition to what Emil H said, you also need this in your page head tag:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

Categories