This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 7 years ago.
Every time I start a new project, I end up in trouble because I forgot to create the database with a UTF-8 collation, or because some characters slipped through that I didn't notice, like é/à/.. The double .. or triple ... also seem to be very nasty. I usually use mysqli_real_escape_string to make sure the characters are written to the database, and when I print them I use htmlentities. But that doesn't work for all characters, and definitely not for the double .. or triple ... .
Is there a general rule / guideline that I should keep in mind when setting up a project, so I don't have troubles with these special characters?
Is there a general rule / guideline that I should keep in mind when setting up a project?
Sure.
Always set your database connection charset to match the actual charset of your HTML pages.
Say, your pages are in utf-8, then issue
mysqli_set_charset($conn,'utf8');
right after connecting.
If your pages are in Windows-1251, then make it
mysqli_set_charset($conn,'cp1251');
and so on.
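As a rough sketch of the rule above, the page charset can be mapped to the MySQL charset name in one place. The function name mysql_charset_for and the connection details are hypothetical, and the mapping lists only a few labels. It also returns utf8mb4 rather than utf8 for UTF-8 pages, on the assumption that the server supports it, because MySQL's utf8 stores at most three bytes per character.

```php
<?php
// Hypothetical helper: map a page charset label to the MySQL charset name
// to pass to mysqli_set_charset(). Only a few common labels are listed.
function mysql_charset_for(string $pageCharset): string
{
    $map = [
        'utf-8'        => 'utf8mb4', // full UTF-8, including 4-byte characters
        'windows-1251' => 'cp1251',
        'windows-1250' => 'cp1250',
    ];
    $key = strtolower(trim($pageCharset));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("No mapping for charset: $pageCharset");
    }
    return $map[$key];
}

// Usage sketch (connection details are placeholders, not from the answer):
// $conn = mysqli_connect('localhost', 'user', 'password', 'mydb');
// mysqli_set_charset($conn, mysql_charset_for('UTF-8'));
```

Failing loudly on an unknown label is a deliberate choice here: a silent fallback would reintroduce the mismatch the answer warns about.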
Also, always use mysqli_real_escape_string to format string literals you add to a query dynamically,
and use htmlspecialchars() when printing user input back to an HTML page.
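For the output side, a minimal sketch might look like this, assuming the pages are served as UTF-8. The helper name h is hypothetical. Only the HTML-special characters are escaped, so é, à and … pass through unchanged and do not need htmlentities.

```php
<?php
// Hypothetical helper: escape a value for HTML output, assuming UTF-8 pages.
function h(string $value): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces invalid
    // byte sequences with U+FFFD instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Prints: &lt;b&gt;café&lt;/b&gt; …
echo h("<b>caf\u{00E9}</b> \u{2026}"), "\n";
```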
Update:
Also, you need to set up your tables with a charset that supports all the required characters (UTF-8 is a preferred default).
CREATE TABLE `table` (
...
) DEFAULT CHARSET=utf8
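When you create your tables with such a definition, supported characters will not turn into ?s in your data. Note that MySQL's utf8 stores at most three bytes per character, so use utf8mb4 if you need emoji or other four-byte characters.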
Related
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 6 years ago.
I have a form that needs to accept special font characters and write them to the database table. I believe the encoding is set correctly at the page/form level, but when the field is written to the database the characters get changed to some other encoding. Other SO answers seem to indicate that setting the encoding to UTF-8 is the answer, which I've done.
Now, if I copy and paste the characters below directly into the database table, it holds them just fine as shown. It's only when I write them to the table from the form, or when I retrieve them for display in a web page, that they get changed.
Example characters: ⓄⒼקร
The web page is set as: <meta charset="utf-8">
The form tag includes attribute: accept-charset="UTF-8"
The PHP just before the INSERT has: $_POST['tag']=utf8_encode($_POST['tag']);
I have not had to write/encode those types of font/special characters before, so what am I doing wrong here?
Do not use the PHP utf8_encode() or utf8_decode() functions.
Despite their promising-sounding names, what these functions actually do is mangle UTF-8 text: utf8_encode() double-encodes text that is already UTF-8, and utf8_decode() converts text to the ISO-8859-1 encoding and replaces characters outside the Latin-1 range with question marks.
Remove the call to utf8_encode(), make sure your database table has the proper encoding (CHARACTER SET = utf8mb4), and you should be fine.
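The mangling can be reproduced in a few lines. This sketch uses mb_convert_encoding() with explicit encodings to perform the same conversions as utf8_encode() and utf8_decode(); it assumes the mbstring extension is available, and the variable names are illustrative only.

```php
<?php
// "é" as UTF-8 bytes, written with escapes so the file's own encoding is irrelevant.
$utf8 = "\xC3\xA9";

// Same conversion utf8_encode() performs: treat each byte as ISO-8859-1, emit UTF-8.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($doubleEncoded), "\n"; // c383c2a9, which displays as "Ã©"

// Same direction as utf8_decode(): UTF-8 to ISO-8859-1.
// U+24C4 (the circled O from the question) has no Latin-1 byte.
$lossy = mb_convert_encoding("\u{24C4}", 'ISO-8859-1', 'UTF-8');
echo $lossy, "\n"; // "?" with mbstring's default substitute character
```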
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 7 years ago.
I have a database which contains some blocks of text. These text blocks contain extended characters such as: ’ ‘ … “ and ”. When displayed directly to a web page they all show like this: �.
I've tried doing a str_replace to show normal characters, with no luck.
I've tried iconv, which will only work when set to ignore, which makes the punctuation look wrong.
I've tried html_encode, which also doesn't work. (I'm also using the parsedown script to format the text.)
The funny thing is, the website I'm replacing supports these characters fine, so I don't know what I'm doing wrong! (I don't have access to this website, or source code, or database, which is why I'm replacing it!)
Can anyone provide any help??
I just want to stop showing � and start showing proper characters!
Thanks to the above linked article, this issue is now resolved.
I first changed the collation of all of my tables as follows:
Specify the utf8mb4 character set on all tables and text columns in your database.
Then in my php code where it connects to the database, I added this line:
$CONNECTION->set_charset('utf8mb4');
All issues resolved! Thanks to all who contributed to my fix!
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 9 years ago.
Tried searching for this question but I think I don't know the jargon. I am entering my site content into a MySQL database using PHP, but all of my accented letters or apostrophes (Spanish) get transformed into some crazy encoding. Ex:
' becomes â€
á becomes Ã¡
and etc. First off I don't know what this means or is, but when displaying them on my site, they definitely do not revert back, nor would I expect them to, because if I manually enter UTF-8 letters they totally work on my site.
Is there a way to fix this without re-entering all of my text? I have a feeling I can extract them using php, decode them and then insert them back in but I do not know the functions that do this. The best solution, and if anyone knows how that would be amazing, would be to just do it within sql. By the way the columns say collation utf8_general_ci.
For some further information, I am not doing anything to the text that gets entered into the database (I know that is bad but I suck at this stuff!) Also I am not doing anything when it is being queried. My functions insert pure text and extract pure text to each page. In this way I can write html into my forms and it appears as html on the page and therefore the browser interprets it correctly. I also have a feeling this is really sloppy but like I said... Thanks for all the help!
-- Edit --
So thanks to the people who pointed me to the other questions. However, the way I fixed it was not in that answer; it just gave me the right keywords to start a new search. For anyone who has this problem, the way I fixed it was using the function utf8_decode(). I'm not sure this is a great fix, but at least it is working for now and speed was my biggest priority. I am certain the core problem is in how I am entering the data into the database.
You can make sure the character encoding for INSERT queries is correct at runtime using $mysqli->set_charset("utf8");
It's also prudent to make sure that PHP is sending the correct HTTP response headers, and that the browser is doing the right thing:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
ini_set('default_charset', 'utf-8');
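To alter your tables, do ALTER TABLE myTable CHARACTER SET utf8 COLLATE utf8_general_ci; Note that this form only changes the table's default for new columns. To convert existing text columns as well, use ALTER TABLE myTable CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;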
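For rows that are already stored double-encoded, a repair along the lines of the asker's utf8_decode() fix can be sketched as follows. The function name undo_double_encoding is hypothetical, and the sketch assumes exactly one round of UTF-8 text being misread as Windows-1252 and re-encoded. It needs the mbstring extension, and it should be tried on a copy of the data first.

```php
<?php
// Hypothetical repair helper: undo ONE round of "UTF-8 read as Windows-1252".
function undo_double_encoding(string $text): string
{
    // Map each character back to the single byte it had in Windows-1252.
    $candidate = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // Accept the result only if it changed and is itself valid UTF-8;
    // otherwise the input was probably not double-encoded.
    if ($candidate !== $text && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }
    return $text;
}

$mojibake = "\xC3\x83\xC2\xA1"; // displays as "Ã¡"
echo bin2hex(undo_double_encoding($mojibake)), "\n"; // c3a1, the UTF-8 bytes of "á"
```

The validity check is what keeps the helper from damaging text that was stored correctly: a properly encoded á converts to a lone byte that is not valid UTF-8, so the original is returned unchanged.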
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 9 years ago.
This might be a dummy question, but I'm a little lost in it.
How exactly are Arabic characters stored in a database?
Let's take ب, if I insert that directly in the DB it becomes ?. Not good.
If I use a form (and PHP script) and store it as UTF-8, it is stored like &#1576;. I can read it out and print it out, all good.
So my question is, are Arabic (and Japanese, ...) letters always stored like this in a MySQL database, as &#1576;? Or should I change a setting somewhere so that it looks like ب when I'm browsing the database?
It's just to define the length of my rows (varchars/chars) in the database...
DB set to utf8_general
Site fully UTF8
If you try to store a UTF-8 encoded character and it becomes ?, this means MySQL did not understand or support the encoding in which you sent the character. The column needs to be set to store utf8 data (better utf8mb4 if supported) and the connection encoding needs to be set to the correct encoding to inform MySQL in what encoding you're sending data to it.
If you get HTML entities from a form submission, this means the browser tried to send data in an encoding which did not support that particular character; therefore it had to fall back on HTML entities to encode the character. You need to set the encoding declarations correctly to tell the browser it should send UTF-8 encoded text to the server.
See Handling Unicode Front To Back In A Web App and/or UTF-8 all the way through for how to do all this.
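To make the difference concrete, here is a small sketch comparing the entity form with the real character. It assumes the stored value is the numeric entity &#1576; (the entity for ب) and that the mbstring extension is available.

```php
<?php
$entity = '&#1576;'; // what gets stored when the browser falls back to entities
$letter = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo bin2hex($letter), "\n";            // d8a8: the UTF-8 bytes of U+0628
echo strlen($entity), "\n";             // 7 characters for the entity form
echo strlen($letter), "\n";             // 2 bytes as UTF-8
echo mb_strlen($letter, 'UTF-8'), "\n"; // 1 character
```

This matters for sizing columns: MySQL counts VARCHAR lengths in characters, so the real letter takes one position while its entity form takes seven.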
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
UTF-8 all the way through
Okay, it's stupid that I can't figure this out.
The MySQL database is set to utf8_general_ci collation. The field I'm having problems with is of type longtext.
Characters added to the database as é or other accented characters are returned as �.
I run the output through stripslashes, and I've tried both with and without html_entity_decode, but can find no change in the output. What am I doing wrong?
Cheers
What character encoding does the string have that you try to insert? If it is in ISO-8859-1 you can use the PHP function utf8_encode() to encode it to UTF-8 before inserting it into the database.
http://php.net/manual/en/function.utf8-encode.php
Getting encoding right is really tricky - there are too many layers:
Browser
Page
PHP
MySQL
The SQL command "SET CHARSET utf8" from PHP will ensure that the client side (PHP) will get the data in utf8, no matter how they are stored in the database. Of course, they need to be stored correctly first.
DDL definition vs. real data
The encoding defined for a table/column doesn't really mean that the data are in that encoding. If you happen to have a table defined as utf8 but the data stored in a different encoding, MySQL will still treat the data as utf8 and you're in trouble. This means you have to fix the stored data first.
What to check
You need to check in what encoding the data flow at each layer.
Check the HTTP headers and the page's <meta> charset declaration.
Check what's really sent in body of the request.
Don't forget that MySQL has encoding almost everywhere:
Database
Tables
Columns
Server as a whole
Client
Make sure that there's the right one everywhere.
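One way to check what is really flowing through a layer is to look at the raw bytes instead of the rendered characters. The helper below is a hypothetical debugging aid, not part of any library, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical debugging helper: show the raw bytes of a value and whether
// they form valid UTF-8, independent of how any viewer renders them.
function describe_bytes(string $value): array
{
    return [
        'hex'        => bin2hex($value),
        'bytes'      => strlen($value),
        'valid_utf8' => mb_check_encoding($value, 'UTF-8'),
    ];
}

print_r(describe_bytes("\xC3\xA9")); // "é" encoded as UTF-8
print_r(describe_bytes("\xE9"));     // "é" as a single ISO-8859-1 byte
```

It can be pointed at a request value or a fetched column, and compared against MySQL's own view of the bytes from SELECT HEX(column).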
Conversion
If you receive data in e.g. windows-1250, and want to store in utf-8, then use this SQL before storing:
SET NAMES 'cp1250';
If you have data in the DB as windows-1250 and want to retrieve it as utf8, use:
SET CHARSET 'utf8';
Last note:
Don't rely on overly "smart" tools to show the data. E.g. phpMyAdmin handles (or handled, when I was using it) encoding really badly, and because it goes through all the layers, problems are hard to track down. Also, Internet Explorer had the really stupid behavior of "guessing" the encoding based on weird rules. Use simple editors where you can switch the encoding. Also, I recommend MySQL Workbench.