This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 7 years ago.
I have a database which contains some blocks of text. These text blocks contain extended characters such as: ’ ‘ … “ and ”. When displayed directly to a web page they all show like this: �.
I've tried using str_replace to swap in normal characters, with no luck.
I've tried iconv, which will only work when set to ignore, which makes the punctuation look wrong.
I've tried html_encode, which also doesn't work. (I'm also using the parsedown script to format the text.)
The funny thing is, the website I'm replacing supports these characters fine, so I don't know what I'm doing wrong! (I don't have access to this website, or source code, or database, which is why I'm replacing it!)
Can anyone provide any help??
I just want to stop showing � and start showing proper characters!
Thanks to the above linked article, this issue is now resolved.
First, I changed the character set of all of my tables, following this advice:
Specify the utf8mb4 character set on all tables and text columns in your database.
Then in my php code where it connects to the database, I added this line:
$CONNECTION->set_charset('utf8mb4');
All issues resolved! Thanks to all who contributed to my fix!
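For anyone repeating the table conversion above, here is a minimal sketch of how the statements could be generated. The helper name buildUtf8mb4Statements and the table names are made up for illustration, and the utf8mb4_unicode_ci collation is an assumption (the post does not say which collation was used). ALTER TABLE ... CONVERT TO CHARACTER SET is standard MySQL syntax and converts the text columns as well as the table default.

```php
<?php
// Hypothetical helper: builds one ALTER TABLE statement per table name.
function buildUtf8mb4Statements(array $tables): array
{
    $statements = [];
    foreach ($tables as $table) {
        // Double any backtick so the identifier can be quoted safely.
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE $quoted CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
    }
    return $statements;
}

// Placeholder table names, not taken from the question.
foreach (buildUtf8mb4Statements(['articles', 'comments']) as $sql) {
    echo $sql, ";\n";
    // With a live mysqli connection you would run: $CONNECTION->query($sql);
}
```

Take a backup before running the generated statements, since the conversion rewrites the stored data.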
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 5 years ago.
This issue is absolutely killing me, I cannot get a single solution online to work!
I am trying to import the following text from a CSV to a MySQL table - via Navicat's import wizard:
L154 – TRAINING WARRANTY
The hyphen is a wide hyphen and so far I've managed to import it as either a question mark, or a black diamond with question mark inside. Same for £ symbols and other special characters.
Everyone always talks about UTF-8. So far I have tried:
Saving the CSV in Excel, clicking Tools > Web Options > Encoding: UTF-8
Right clicking the database and clicking EDIT. Setting Char set to utf8 Unicode and Collation: utf8_general_ci
I have "designed" the table and set the 2 options above to exactly the same.
I have edited the varchar field in question and set the same 2 fields again to the 2 types above.
But my hyphen will not import correctly.
It would be nice to know exactly how to go about importing data that has £ symbols and other special characters once and for all.
You can use htmlentities() to store these symbols in encoded form, e.g.
£ will be &pound; and – will be &ndash;, by using:
echo htmlentities('£');
and when retrieving, just use html_entity_decode(), like:
echo html_entity_decode('&pound;'); // output £
Edit: As discussed in the comments, you are importing the data from a CSV file, so you have to change the encoding of the CSV file itself. This can be done in Notepad++ by going to Encoding->Encode in UTF-8
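If you would rather convert the file in code than in an editor, something along these lines might work. This is only a sketch: it assumes the CSV was saved by Excel as Windows-1252 (a guess, not something stated in the question), and the function names isValidUtf8 and csvLineToUtf8 are invented for the example.

```php
<?php
// Returns true when $bytes is a well-formed UTF-8 string.
function isValidUtf8(string $bytes): bool
{
    // With the /u modifier, preg_match() fails on malformed UTF-8 input.
    return preg_match('//u', $bytes) === 1;
}

// Hypothetical helper: leaves valid UTF-8 alone, otherwise treats the input
// as Windows-1252 (an assumption) and converts it to UTF-8.
function csvLineToUtf8(string $line): string
{
    if (isValidUtf8($line)) {
        return $line;
    }
    // @ hides the notice iconv raises for bytes that Windows-1252 does not define.
    $converted = @iconv('Windows-1252', 'UTF-8', $line);
    return $converted === false ? $line : $converted;
}

// 0x96 is the en dash in Windows-1252; it becomes the bytes E2 80 93 in UTF-8.
echo bin2hex(csvLineToUtf8("\x96")), "\n"; // e28093
```

The database connection still has to be UTF-8 as well, otherwise the converted text gets mangled again on the way in.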
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 6 years ago.
I have a form that needs to accept special font characters and write them to the database table. I believe the encoding is set correctly at the page/form level, but when the field is written to the database the characters get changed to some other encoding. Other SO answers seem to indicate that setting the encoding to UTF-8 is the answer, which I've done.
Now, if I copy and paste the characters below directly into the database table, it holds them just fine as shown. They only get mangled when I write them to the table from the form, or when I retrieve them for display in a web page.
Example characters: ⓄⒼקร
The web page is set as: <meta charset="utf-8">
The form tag includes attribute: accept-charset="UTF-8"
Php just before the INSERT has: $_POST['tag']=utf8_encode($_POST['tag']);
I have not had to write/encode those types of font/special characters before, so what am I doing wrong here?
Do not use the PHP utf8_encode() or utf8_decode() functions.
Despite their promising-sounding names, what these functions actually do is mangle UTF-8 text: utf8_encode() double-encodes text that is already UTF-8, and utf8_decode() converts text to the ISO-8859-1 encoding and replaces characters outside the Latin-1 range with question marks.
Remove the call to utf8_encode(), make sure your database table has the proper encoding (CHARACTER SET = utf8mb4), and you should be fine.
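To see the double-encoding in bytes, here is a small sketch. It uses iconv() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode() without calling the function itself (it is deprecated as of PHP 8.2). The name simulateUtf8Encode is made up for the example.

```php
<?php
// Hypothetical stand-in for utf8_encode(): treats every input byte as one
// ISO-8859-1 character and re-encodes it as UTF-8.
function simulateUtf8Encode(string $bytes): string
{
    return iconv('ISO-8859-1', 'UTF-8', $bytes);
}

$original = "\xC3\xA9"; // "é", already encoded as UTF-8 (2 bytes)
$doubleEncoded = simulateUtf8Encode($original);

echo bin2hex($original), "\n";      // c3a9
echo bin2hex($doubleEncoded), "\n"; // c383c2a9, which a browser shows as "Ã©"
```

Each of the two original bytes was turned into a two-byte sequence of its own, which is why one character comes back as two.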
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 9 years ago.
Tried searching for this question but I think I don't know the jargon. I am entering my site content into a mysql database using php, but all of my accented letters or apostrophes (spanish) get transformed into some crazy encoding. Ex:
' becomes â€
á becomes Ã¡
and so on. First off, I don't know what this means or is, but when displayed on my site they definitely do not revert back, nor would I expect them to, because if I manually enter UTF-8 letters they work fine on my site.
Is there a way to fix this without re-entering all of my text? I have a feeling I can extract them using php, decode them and then insert them back in but I do not know the functions that do this. The best solution, and if anyone knows how that would be amazing, would be to just do it within sql. By the way the columns say collation utf8_general_ci.
For some further information, I am not doing anything to the text that gets entered into the database (I know that is bad but I suck at this stuff!) Also I am not doing anything when it is being queried. My functions insert pure text and extract pure text to each page. In this way I can write html into my forms and it appears as html on the page and therefore the browser interprets it correctly. I also have a feeling this is really sloppy but like I said... Thanks for all the help!
-- Edit --
So thanks to the people who pointed me to the other questions. However, the way I fixed it was not in that answer; it just gave me the right keywords to start a new search. For anyone who has this problem, the way I fixed it was using the function utf8_decode(). I'm not sure this is a great fix, but at least it is working for now and speed was my biggest priority. I am certain the core problem is in how I am entering the data into the database.
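For the "extract, decode and insert back" idea, a rough sketch of the decoding step could look like this. It assumes the text was mangled exactly once by reading UTF-8 bytes as Windows-1252, which is what garbage starting with â€ usually points to. The name repairMojibake is invented, and the function only touches strings that turn back into valid UTF-8.

```php
<?php
// Hypothetical helper: reverses one round of "UTF-8 bytes read as Windows-1252".
function repairMojibake(string $text): string
{
    // Map each character back to the single byte it was misread from.
    // @ hides the notice iconv raises when a character has no Windows-1252 byte.
    $bytes = @iconv('UTF-8', 'Windows-1252', $text);

    // Keep the original if the conversion failed or the result is not valid UTF-8.
    if ($bytes === false || preg_match('//u', $bytes) !== 1) {
        return $text;
    }
    return $bytes;
}

// "Ã¡" stored as the UTF-8 bytes C3 83 C2 A1 comes back as C3 A1, i.e. "á".
echo bin2hex(repairMojibake("\xC3\x83\xC2\xA1")), "\n"; // c3a1
```

Text that was never mangled is returned unchanged, so it should be fairly safe to run over mixed rows, but try it on a copy of the table first. A few characters cannot be recovered this way, because Windows-1252 leaves some byte values undefined.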
You can make sure the character encoding for INSERT queries is correct at runtime using $mysqli->set_charset("utf8")
It's also prudent to make sure that PHP is sending the correct HTTP response headers, and the Browser is doing the right thing:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
ini_set('default_charset', 'utf-8');
To alter your tables, do ALTER TABLE myTable CHARACTER SET utf8 COLLATE utf8_general_ci;
This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 7 years ago.
Every time I start a new project, I run into trouble because I forgot to create the database with a UTF-8 collation, or because some characters slipped through that I didn't notice, like é or à. The double (..) and triple (...) dots also seem to be very nasty. I usually use mysqli_real_escape_string to make sure the characters are written to the database, and when I print them I use htmlentities. But that doesn't work for all characters, and definitely not for the double or triple dots.
Is there a general rule / guideline that I should keep in mind when setting up a project, so I don't have troubles with these special characters?
Is there a general rule / guideline that I should keep in mind when setting up a project?
Sure.
Always set your database connection charset to match your HTML page actual charset.
Say, your pages are in utf-8, then issue
mysqli_set_charset($conn,'utf8');
right after connecting.
If your pages are in Windows-1251, then make it
mysqli_set_charset($conn,'cp1251');
and so on.
Also always use mysqli_real_escape_string to format string literals you're adding into query dynamically,
and use htmlspecialchars() when printing user input back to HTML page
Update:
You also need to set up your tables with a charset that supports all the required characters (UTF-8 is the preferred default).
CREATE TABLE `table` (
...
) DEFAULT CHARSET=utf8
When you create your tables with such a definition, you will never have ?s in your data.
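As a sketch of the output side of the rule above, assuming the pages are served as UTF-8: the wrapper name escapeHtml is made up, and the ENT_SUBSTITUTE flag is an addition that the answer does not mention. Unlike htmlentities, htmlspecialchars only escapes the HTML metacharacters, so accented letters and the ellipsis character pass through untouched.

```php
<?php
// Hypothetical wrapper: escape text for HTML output, assuming UTF-8 pages.
function escapeHtml(string $text): string
{
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD
    // instead of making the whole call return an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// "é" (C3 A9) and the ellipsis character (E2 80 A6) are left as they are;
// only the angle brackets are escaped.
echo escapeHtml("<b>\xC3\xA9 \xE2\x80\xA6</b>"), "\n";

// A lone Latin-1 byte (0xE9) is not valid UTF-8 and becomes U+FFFD.
echo bin2hex(escapeHtml("caf\xE9")), "\n"; // 636166efbfbd
```

If a page suddenly shows the replacement character after this, the data itself is not UTF-8, which usually points back to the connection charset.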
I'd really appreciate some help with this. I've wasted days on this problem and none of the suggestions I have found online seem to give me a fix.
I have a CSV file from a supplier. It appears to have been exported from a Microsoft system.
I'm using PHP to import the data into MySQL (both latest versions).
I have one particular record which contains a strange character that I can't get rid of. Manual editing to remove the character is possible, but I would prefer an automated solution as this will happen multiple times a day.
The character appears to be an interpretation of a “smart quote”. A hex editor tells me that the character codes are C2 and 92. In the hex editor it looks like a weird A followed by a smart quote. In other editors and Calc, Writer etc it just appears as a box. メ
I'm using mb_detect_encoding to determine the encoding. All records in the CSV file are returned as ASCII, except the one with the strange character, which is returned as UTF-8.
I can insert the offending record into MySQL and it just appears in Workbench as a square.
MySQL tables are configured to utf-8 – utf8_unicode_ci and other unusual UTF characters (eg fractions) are ok.
I've tried lots of solutions to this...
How to detect malformed utf-8 string in PHP?
Remove non-utf8 characters from string
Removing invalid/incomplete multibyte characters
How to detect malformed utf-8 string in PHP?
How to replace Microsoft-encoded quotes in PHP
etc etc but none of them have worked for me.
All I really want to do is remove or replace the offending character, ideally with a search and replace for the hex values but none of the examples I have tried have worked.
Can anyone help me move forward with this one please?
EDIT:
Can't post answer as not enough reputation:
Thanks for your input. Much appreciated.
I'm just going to go with the hex search and replace:
$DodgyText = preg_replace("/\xEF\xBE\x92/", "" ,$DodgyText);
I know it's not the elegant solution, but I need a quick fix and this works for me.
Another solution is:
$contents = iconv('UTF-8', 'Windows-1251//IGNORE',$contents);
$contents = iconv('Windows-1251', 'UTF-8//IGNORE',$contents);
Here you can replace Windows-1251 with your local encoding.
At a quick glance, this looks like a UTF-8 file. (UTF-8 is identical to ASCII for the first 128 characters, hence everything is detected as ASCII except for the record with the special character.)
It should work if your database connection is also UTF-8 encoded (which it may not be by default).
How to do that depends on your database library, let us know which one you're using if you need help setting the connection encoding.
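The point about ASCII being a subset of UTF-8 can be checked with a couple of small functions. This is just an illustrative sketch with made-up names (isPlainAscii, looksLikeUtf8) and it uses preg_match rather than mb_detect_encoding, so it reports facts about the bytes instead of a guess.

```php
<?php
// True when the string contains no byte above 0x7F.
function isPlainAscii(string $s): bool
{
    return preg_match('/[\x80-\xFF]/', $s) === 0;
}

// True when the string is well-formed UTF-8 (preg_match fails otherwise).
function looksLikeUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$rows = [
    'plain row'         => "Widget, 10, blue",
    'row with EF BE 92' => "It\xEF\xBE\x92s here",
    'row with lone 92'  => "It\x92s here", // Windows-1252 smart quote byte
];

foreach ($rows as $label => $row) {
    printf("%s: ascii=%s utf8=%s\n", $label,
        var_export(isPlainAscii($row), true),
        var_export(looksLikeUtf8($row), true));
}
```

Every plain ASCII row also counts as valid UTF-8, so a file where only one record is reported as UTF-8 is consistent with the whole file being UTF-8.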
updated code based on established findings
You can do search & replace on strings using hexadecimal notation:
str_replace("\xEF\xBE\x92", '', $value);
This returns the value with the special byte sequence removed.
That said, if your database table is UTF-8, you shouldn't need that conversion; instead you could look at the connection (or session) character set (i.e. SET NAMES utf8;). Configuring this depends on what library you use to connect to your database.
To debug the value you could use bin2hex(); this usually helps in doing searches online.
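Putting the two suggestions together, a minimal sketch: dump the bytes with bin2hex() to see what is really in the string, then replace the exact sequence. The sample string and the function name replaceStrayBytes are made up. The bytes EF BE 92 are the ones from the asker's edit, and swapping in a plain apostrophe is an assumption about what the character was meant to be.

```php
<?php
// Hypothetical helper: swaps the stray 3-byte sequence for a replacement string.
function replaceStrayBytes(string $value, string $replacement = "'"): string
{
    return str_replace("\xEF\xBE\x92", $replacement, $value);
}

$value = "It\xEF\xBE\x92s"; // sample text containing the sequence

// Hex dump of the raw bytes: 49 74 ef be 92 73
echo bin2hex($value), "\n"; // 4974efbe9273

echo replaceStrayBytes($value), "\n";     // It's
echo replaceStrayBytes($value, ''), "\n"; // Its
```

Searching for the hex string from bin2hex() tends to turn up which character and encoding you are dealing with much faster than searching for the garbled glyph.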