Black Diamonds that are Fixing themselves in MySQL - php

I am running into a very strange issue with a site that I am working on. The site is a job board where the owner or users can create job listings, including a description that is stored in a MySQL text field. Whenever listings from certain sources are entered, they initially show the "black diamond" with a question mark inside in place of apostrophes and double spaces. I know this part is an encoding issue and I can correct it. The real question is this: the black diamonds show both when the record is displayed in a MySQL admin tool and when the job listing is viewed in a web browser (a simple select statement displays the listing in a PHP app), but after the first time the listing is viewed, the problem somehow fixes itself. It is as if running the select and displaying the record updates the job description field and fixes the encoding. How could this be? Has anyone ever heard of this or anything similar? I cannot understand how a database field would change without an update statement being run...

How are the job listings entered? Are they entered via a web page? If so, what character encoding does the web page use? (This should determine the character encoding of the submitted data AFAIK.) What character set does the connection to MySQL use? What is the character set of the column the data is stored in? Finally, what is the character encoding of the web page(s) on which the entered data is reviewed?
Here is what I do: I declare all of my pages as UTF-8 encoded, using the following tag at the start of the <head> section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I issue the following command immediately when I connect to MySQL, so as to make sure that MySQL understands the data I send to it will be UTF-8 encoded:
SET NAMES utf8
(Depending on the database abstraction method you use, a special function might be recommended in order to set the connection character set, like mysqli's mysqli_set_charset().)
I also make sure that those columns in which I intend to store UTF-8 data are declared to be UTF-8. You can find out what the character set of a column is by issuing SHOW CREATE TABLE table_name. The character set of the table (which by default is the character set for any column in the table) is displayed at the end. If the character set for the column is different to the default character set for the table then it is displayed as part of the column definition. If you wish to change the character set of a column then you can do so using ALTER TABLE.
If you have not previously taken steps to handle character sets in your app, then you may find that the tables all use the latin1 character set. If you naively store UTF-8-encoded data (for example) in these columns, you may run into character encoding issues. Changing the column character set using ALTER TABLE does not necessarily fix your old data, because MySQL reads the old data assuming it is valid latin1-encoded text and converts it to the equivalent UTF-8 (correctly converting what it has read, but not giving the result you want).
The above steps would hopefully mean that future data will be correctly encoded and correctly displayed, but you may have data already mis-encoded in your database, so be aware that if you follow the above steps and still see older data displaying incorrectly, this may be why. Good luck.
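For old rows where UTF-8 bytes were stored in a column labelled latin1, one commonly used repair is to pass the column through a binary type, so MySQL relabels the bytes instead of converting them. The sketch below only builds the two statements as strings. The helper name and the table and column names (`jobs`, `description`) are hypothetical, and the TEXT type is an assumption based on the question. Take a backup first, and use this only when you have confirmed that the stored bytes really are UTF-8.

```php
<?php
// Hypothetical helper: builds the two ALTER statements for the binary round trip.
// It only returns SQL strings; nothing is sent to a server here.
function build_reinterpret_statements(string $table, string $column, string $charset = 'utf8'): array
{
    // Accept only plain charset names such as utf8 or utf8mb4.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException('Unexpected character set name');
    }

    // Backtick-quote identifiers, doubling any embedded backtick.
    $t = '`' . str_replace('`', '``', $table) . '`';
    $c = '`' . str_replace('`', '``', $column) . '`';

    return [
        // Step 1: drop the character set label without touching the stored bytes.
        "ALTER TABLE $t MODIFY $c BLOB",
        // Step 2: attach the new label to the same bytes.
        "ALTER TABLE $t MODIFY $c TEXT CHARACTER SET $charset",
    ];
}

$statements = build_reinterpret_statements('jobs', 'description');
```

Note that MODIFY redefines the whole column, so attributes such as NOT NULL or a default would need to be restated in the second statement.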

I ran into this problem a few years ago... I remember finding those notorious characters and replacing them in PHP with a single quote or a double quote... Of course with escaping... A simple preg_replace for those characters will do the trick... It's just an encoding issue...
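As an alternative to pattern replacement, the bytes can be converted from the encoding they were actually written in. The following is a minimal sketch, assuming the mbstring extension is loaded and assuming the problem text came from a Windows-1252 source (a guess about where "certain sources" get their curly apostrophes). The sample string is made up.

```php
<?php
// Hypothetical input: "Bob's job" with a curly apostrophe from a Windows-1252 source.
// 0x92 is the Windows-1252 byte for the right single quotation mark.
$raw = "Bob\x92s job";

// A lone 0x92 is not valid UTF-8, which is why a UTF-8 page shows U+FFFD
// (the black diamond with a question mark) in its place.
$isValidUtf8 = mb_check_encoding($raw, 'UTF-8');

// Convert only when the bytes are not already valid UTF-8, so clean input is
// left alone. The source encoding here is an assumption.
$clean = $isValidUtf8 ? $raw : mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
```

Checking validity first avoids converting text that is already UTF-8, which would double-encode it.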

This page, though geared toward WordPress, might help:
http://codex.wordpress.org/Converting_Database_Character_Sets

I had the same issue (MySQL encoding and web page encoding set to UTF-8, but black diamonds showing up in my query results). I found this snippet while googling but cannot for the life of me find its source to give proper attribution:
if (function_exists('mysql_set_charset')) {
    mysql_set_charset('utf8', $db_connection);
} else {
    mysql_query("SET NAMES 'utf8'", $db_connection);
}
Anyway, it cleared up the issue for me.
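The mysql_* functions used above were removed in PHP 7, so on current PHP the same idea would go through mysqli. This is a rough sketch, not a drop-in replacement: the function name and its parameters are hypothetical, and the mysqli extension has to be available when the function is called.

```php
<?php
// Hypothetical helper: opens a mysqli connection and sets its character set.
function open_utf8_connection(string $host, string $user, string $pass, string $db, string $charset = 'utf8'): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $link = new mysqli($host, $user, $pass, $db);

    // Preferred over "SET NAMES": it also tells the client library which
    // character set to use when escaping strings.
    $link->set_charset($charset);

    return $link;
}
```

Passing 'utf8mb4' as the character set is an option on servers that support it, since it also covers characters outside the Basic Multilingual Plane.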

Related

How to avoid storing these kinds of characters â„¢ in a MySQL table from PHP?

Can anyone help me avoid storing these kinds of characters â„¢ in a MySQL table from PHP?
Thanks
To avoid these characters, run the query SET NAMES utf8 on the active MySQL connection, and change the character set of the column to one that handles the characters correctly.
More info: https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html
Also be aware that the text might be stored correctly and the problem might only appear when you display the page. If so, add the charset meta tag as described here: http://www.w3schools.com/TAgs/att_meta_charset.asp

Unicode Characters issue in Cakephp and Mysql

I implemented a site in CakePHP for a client two years ago. At the time I did not know the site would be used worldwide. Users from different countries have since entered special characters. CakePHP's utf8 option was not enabled back then, so these characters were saved in a garbled form, like ??, in the database.
Now whenever we download the data as CSV, these characters cause problems and do not appear correctly in the CSV. I have tried a lot to resolve this but have not succeeded.
Please help me resolve this.
You must ensure three things:
1. Enable `'encoding' => 'utf8',` in the database settings at `app/Config/database.php`
2. The table column collation must be set to `utf8_general_ci` or `utf8_unicode_ci`
3. The HTML page character set must be set to `utf-8`
Use the query below to get the current collation of your tables:
select TABLE_NAME,TABLE_TYPE,ENGINE,TABLE_COLLATION from information_schema.TABLES where TABLE_SCHEMA like 'YOURDATABASENAME';
The TABLE_COLLATION column gives you the collation. If it is a utf8 collation, then almost all characters can be saved in your DB and retrieved again, and the issue you are facing could be a PHP or browser encoding problem. But if your DB has a different collation that does not support all characters, then the data saved in that DB is probably lost. It is almost impossible to identify the encoding afterwards and recover the text.
For future cases, you have two choices.
You could use UTF-8 as your DB character set. If you have indexed your string data, though, MySQL reserves the maximum width for every character in the index (3 bytes with utf8, 4 with utf8mb4), even though UTF-8 is a variable-length encoding. This will possibly increase your memory usage.
Or
You could use latin1 as your DB character set and encode non-Latin characters (for example by URL-encoding them) before saving them in the DB. This decreases memory usage, but you have the overhead of encoding and decoding. If English is the major language in your DB, I would say you can go for this.
The choice depends on the languages you need to save in your DB. When showing them in a browser, the page must declare a supported encoding. In your case, since you are downloading them as CSV, it depends on the encoding used for the file.
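For the CSV download itself, one approach that often helps is to write the file as UTF-8 with a byte order mark, which spreadsheet programs such as Excel generally use to detect the encoding. This is a minimal sketch under that assumption. The helper name and the sample rows are hypothetical, and it presumes the data coming out of the DB is already valid UTF-8.

```php
<?php
// Hypothetical helper: renders rows as a CSV string prefixed with a UTF-8 BOM.
function rows_to_utf8_csv(array $rows): string
{
    $fh = fopen('php://memory', 'w+');

    // UTF-8 byte order mark, written once at the very start of the file.
    fwrite($fh, "\xEF\xBB\xBF");

    foreach ($rows as $row) {
        // Separator, enclosure and escape are passed explicitly.
        fputcsv($fh, $row, ',', '"', '\\');
    }

    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);

    return $csv;
}

// Sample rows, made up for illustration ("João" written as UTF-8 byte escapes).
$csv = rows_to_utf8_csv([['id', 'name'], [1, "Jo\xC3\xA3o"]]);
```

When sending the result to the browser, the response header would also need to declare the charset, for example Content-Type: text/csv; charset=UTF-8.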

How can character encoding be made correctly in both php and mysql database [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
UTF-8 all the way through
Searching high & low for a solution. I've tried many variations before posting the question.
What is required to have names appear the same in phpMyAdmin and html page? Can this even be accomplished?
EDIT 1: It would seem that this is a mysql issue. Why? Because the php generated html page will always show the correct characters. At this point it is only the database that shows incorrectly.
EDIT 2: Clarification. With the original settings shown in code snip and images below,
Enter João and submit
João displayed in database
João display after reload
Adding the mysqli_query ( $link, 'SET NAMES utf8' )
Enter João and submit
João displayed in database
Jo�o displayed after reload
end Edit 2
In a mysql database, viewed with phpMyAdmin:
The items appear in the database like this: (I've modified the first João to appear correct in database)
And in the html page with encoding set the names appear like (order is reversed & modified has black diamond),
Encoding: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I have tried changing the column collation to utf8_bin, utf8_general_ci, utf8_unicode_ci, all with no change to either side. Also changed the document (BBEdit) from UTF-8 to UTF-8 (with BOM), ISO Latin 1 and Windows Latin 1. Several of these created more black diamonds, making the issue worse. (Set to UTF-8 in images) I even tried to preg_replace ã, é etc with the encoded equivalents.
The short story is, João is entered on the page (content type above), João is in database, and João comes to the html page on refresh.
Looking for ideas. Thanks.
Character set issues are often really tricky to figure out. Basically, you need to make sure that all of the following are true:
- The DB connection is using UTF-8
- The DB tables are using UTF-8
- The individual columns in the DB tables are using UTF-8
- The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you've imported from bad sources, or changed table or column collations)
- The web page is requesting UTF-8
- Apache is serving UTF-8
Here's a good tutorial on dealing with that list, from start to finish: https://web.archive.org/web/20110303024445/http://www.bluebox.net/news/2009/07/mysql_encoding/
It sounds like your problem is specifically that you've got double-encoded (or triple-encoded) characters, probably from changing character sets or importing already-encoded data with the wrong charset. There's a whole section on fixing that in the above tutorial.
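To make the double-encoding idea concrete, here is a small sketch that reproduces it and then reverses it. It assumes the mbstring extension and uses ISO-8859-1 as the mistaken intermediate encoding; MySQL's latin1 is closer to Windows-1252, but the two agree for the bytes in this example.

```php
<?php
// "João" as UTF-8 bytes, written with escapes so the example does not depend
// on the encoding of this source file.
$original = "Jo\xC3\xA3o";

// Double encoding: the UTF-8 bytes are mistaken for ISO-8859-1 text and
// converted to UTF-8 a second time. Each of the two bytes of "ã" becomes
// its own two-byte character.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Reversal: convert back exactly once.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');

$doubleHex = bin2hex($double);
```

Triple-encoded data would need the reversal applied twice, so it is worth inspecting the bytes of a few rows before running any bulk repair.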
Make sure your DB connection is using UTF-8 as well. Try putting the line below at the top of your page:
mysql_query("SET NAMES utf8");
Default PHP hates UTF8: the built-in string functions operate on bytes, not characters. Make sure that you're using the mbstring functions rather than the usual built-in string functions.
Make sure that your HTML page, along with any scripts that take part in the AJAX data exchange, is served with proper HTTP headers, including
Content-Type: text/html; charset=UTF-8
since encoding settings on the HTML side might simply be ignored by browsers.
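In PHP, that header can be sent from the script itself. A minimal sketch follows; the sample value is made up, and the htmlspecialchars call is an addition to show the charset being stated explicitly when escaping output.

```php
<?php
// Send the charset in the HTTP header before any output is produced.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Hypothetical value read from the database, as UTF-8 bytes.
$name = "Jo\xC3\xA3o <admin>";

// Name the charset explicitly when escaping, so multi-byte characters pass
// through untouched while markup characters are neutralised.
$safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
```

The same header applies to AJAX endpoints, with the media type adjusted to whatever they return.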

PHP charset accents issue

I have a form in my page for users to leave a comment.
I'm currently using this charset:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
but when I retrieve a comment from the DB, accents are not displayed correctly ( Ex. è =>è ).
Which parameters should I care about to handle accents correctly?
SOLVED
- changed the meta tag to charset='utf-8'
- changed the MySQL character set (ALTER TABLE comments CONVERT TO CHARACTER SET utf8)
- changed the connection character set both when inserting and when retrieving records ($conn->query('SET NAMES utf8'))
Now accents are displayed correctly.
Thanks,
Luca
Character sets can be complicated and a pain to debug when it comes to LAMP web applications. At each stage where one piece of software talks to another, there's scope for incorrect charset translation or incorrect storage of data.
The places you need to look out for are:
- Between the browser and the web server (which you've listed already)
- Between PHP and the MySQL server
The character you've listed looks like a normal European character that is included in the ISO-8859-1 charset.
Things to check for:
- Even though you're specifying the character set in a meta header, have a look in your browser to be sure which character set the browser is actually using. If you've specified it, the browser should use that charset to render the page, but I've seen cases where it attempts to auto-detect the charset and fails. Most browsers have an "encoding" menu (perhaps under "view") that allows you to choose the charset. Ensure that it says ISO-8859-1 (Western European).
- MySQL can happily convert between character sets if required, but in most cases you want your tables and client connection set to use the same encoding. When configured this way, MySQL won't attempt any encoding conversion and will just write the data you input byte for byte into the table. When read, it comes out the same way, byte for byte.
- You've not said whether you're reading data back out of the database with the same web app or with some other client. I'd suggest you try to read it out with the same web application, using the same meta charset header (again, check that the browser is really applying it), and see what is displayed in the browser.
- Debugging these issues requires you to be really sure whether the client or console you're using is doing any conversion too. The safest way is sometimes to get the data into a hex editor, where you can be sure that nothing else is interfering with any translation.
If it doesn't look like it's a browser-side problem please can you include the output of the following commands against your database:
Run from a connection that your web-app makes (not from some other MySQL client):
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Run from any MySQL client:
SHOW CREATE TABLE myTable;
(where myTable is the table you're reading/writing data from/to)
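When a hex editor is not at hand, the hex check suggested above can be approximated from PHP with bin2hex, or on the MySQL side with the HEX() function applied to the column. The sketch below assumes the mbstring extension and simply shows what the same character looks like as bytes in each encoding.

```php
<?php
// The character "è" as raw bytes in two encodings.
$utf8   = "\xC3\xA8";                                        // UTF-8: two bytes
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // ISO-8859-1: one byte

// Hex dumps are immune to whatever the terminal or browser does to the text.
$utf8Hex   = bin2hex($utf8);
$latin1Hex = bin2hex($latin1);
```

Seeing c3a8 where e8 was expected (or the other way round) tells you which side of the pipeline is using which encoding.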
The ISO-8859-1 character set is for Latin characters only. Try UTF-8, and make sure that the columns these characters come from are also UTF-8.

Help with multi-lingual text, php, and mysql

I have had no end of problems trying to do what I thought would be relatively simple:
I need to have a form which can accept user input text in a mix of English and other languages, some multi-byte (i.e. Japanese, Korean, etc.), and this gets processed by PHP and stored (safely, avoiding SQL injection) in a MySQL database. It also needs to be read back from the database, processed, and used on-screen.
I have it set up fine for Latin chars, but when I add a mix of Latin and multi-byte chars it turns garbled.
I have tried to do my homework but am just banging my head against a wall now.
Magic quotes is off, I have tried using utf8_encode/decode, htmlentities, addslashes/stripslashes, and (in mysql) both "utf8_general_ci" and "utf8_unicode_ci" for the field in the table.
Part of the problem is that there are so many places where I could be messing it up that I'm not sure where to begin solving the problem.
Thanks very much for any and all help with this. Ideally, if someone has working php code examples and/or knows the right mysql table format, that would be fantastic.
Here is a laundry list of things to check are in UTF8 mode:
- MySQL table encoding. You seem to have already done this.
- MySQL connection encoding. Do SHOW VARIABLES LIKE 'char%' and you will see what MySQL is using. You need character_set_client, character_set_connection and character_set_results set to utf8, which can easily be set in your application by doing SET NAMES 'utf8' at the start of all connections. This is the one most people forget to check, IME.
- If you use them, your CLI and terminal settings. In bash, this means LANG=(something).UTF-8.
- Your source code (this is not usually a problem unless you have UTF8 constant text).
- The page encoding. You seem to have this one right, too, but your browser's debug tools can help a lot.
Once you get all this right, all you will need in your app is mysql_real_escape_string().
Oh, and it is (sadly) possible to successfully store correctly encoded UTF8 text in a column with the wrong encoding type or from a connection with the wrong encoding type. And it can come back "correctly", too. Until you fix all the bits that aren't UTF8, at which point it breaks.
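The connection check from the list can be automated once the variables have been fetched. This is a sketch with a hypothetical helper that takes the rows of SHOW VARIABLES LIKE 'character_set%' as a name => value map; fetching them is left out so the example runs without a database, and the sample values are invented.

```php
<?php
// Hypothetical helper: returns the connection settings that are not UTF-8.
function find_non_utf8_settings(array $variables): array
{
    $required = ['character_set_client', 'character_set_connection', 'character_set_results'];
    $problems = [];

    foreach ($required as $name) {
        $value = $variables[$name] ?? '(missing)';
        // MySQL reports utf8, utf8mb3 or utf8mb4 for UTF-8 connections.
        if (strpos($value, 'utf8') !== 0) {
            $problems[$name] = $value;
        }
    }

    return $problems;
}

// Sample values, made up for illustration.
$problems = find_non_utf8_settings([
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'latin1',
    'character_set_results'    => 'utf8mb4',
]);
```

An empty result means all three connection settings are UTF-8; it says nothing about the table or column character sets, which still need SHOW CREATE TABLE.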
I don't think you have any practical alternatives to UTF-8. You're going to have to track down where the encoding and/or decoding breaks. Start by checking whether you can round-trip multi-language text to the data base from the mysql command line, or perhaps through phpmyadmin. Track down and eliminate problems at that level. Then move out one more level by simulating input to your php and examining the output, again dealing with any problems. Finally add browsers into the mix.
First you need to check whether you can add multi-language text to your database directly. If that works, you can do it in your application.
Are you serializing any data by chance? PHP's serialize function has some issues when serializing non-English characters.
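One likely way this bites is that serialize records string lengths in bytes, so any charset conversion between writing and reading invalidates the stored length. The sketch below simulates such a conversion with mbstring; whether this is what happened in the asker's setup is an assumption.

```php
<?php
$name = "Jo\xC3\xA3o";        // "João" as UTF-8: 5 bytes, 4 characters
$stored = serialize($name);   // the length prefix is the byte count, 5

// Simulate a charset conversion somewhere between write and read.
$converted = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

// The payload is now 4 bytes long but the prefix still says 5, so
// unserialize gives up. The @ hides the notice it would otherwise raise.
$result = @unserialize($converted);
```

If serialized values must be stored, keeping them in a binary column (or making sure no layer converts them) avoids the length mismatch.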
Everything you do should be utf-8 encoded.
One thing you could try is to json_encode() the data when putting it into the database and json_decode() it when it's retrieved.
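The reason this can work is that json_encode, by default, escapes non-ASCII characters as \uXXXX sequences, so what reaches the database is plain ASCII. A short sketch, with the caveat that this sidesteps the encoding problem rather than fixing it, and makes the stored text awkward to search or sort in SQL:

```php
<?php
$name = "Jo\xC3\xA3o";          // "João" as UTF-8 bytes

// Default flags escape non-ASCII characters, so the result is pure ASCII.
$json = json_encode($name);

// Decoding restores the original UTF-8 bytes.
$roundTrip = json_decode($json);
```

The input still has to be valid UTF-8, otherwise json_encode fails.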
The problem was caused by my not having the default char set in the php.ini file, and (possibly) not having set the char set in the mysql table (in PhpMyAdmin, via the Operations tab).
Setting the default char set to "utf-8" fixed it. Thanks for the help!!
Check your database connection settings. It also needs to support UTF-8.
