Em Dash and En Dash from MySQL Db with PHP

I'm trying to output a MySQL database field with PHP which contains em dashes and en dashes, but although the rows output, the values with these dashes do not.
As far as I am aware, these are characters which should be used in proper English, therefore I don't think I should be stripping them out or replacing them with an alternative (like a hyphen).
By adding this code before the INSERT, I am able to get the em dash and en dash into the database properly (whereas without this line I saw unwanted characters instead):
mysql_query('SET NAMES utf8');
But the value won't output. The database table and its fields are using the utf8_general_ci collation and I've got these lines in my PHP page:
header('Content-type: text/html; charset=utf-8');
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I'm outputting the value like so:
echo nl2br(htmlentities(preg_replace("/[\r\n]+/", "\n\n", $row['someText'])));
If I output the value without formatting, I see this question mark character:
�
Does anyone know how to get around this? Am I forced to replace them with hyphens even though that's grammatically incorrect, or is there a way to output them as they appear in the database?

I added the same MySQL command before retrieving data from the Db and this solved the issue.
mysql_query('SET NAMES utf8'); // Use utf-8
I don't know why this is needed but I'll investigate and see if I can make this a default.
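A minimal sketch of why the unformatted value showed up as �. It assumes that, until SET NAMES utf8 was issued, the connection returned the dash in a single-byte charset (Windows-1252, which MySQL calls latin1); the thread does not confirm which charset was in use. The helper name isValidUtf8 is made up for illustration.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
function isValidUtf8(string $s): bool
{
    // With the /u modifier PCRE rejects malformed UTF-8, so preg_match returns false.
    return preg_match('//u', $s) === 1;
}

$dashSingleByte = "\x96";         // en dash as one Windows-1252 byte (assumed connection charset)
$dashUtf8       = "\xE2\x80\x93"; // the same en dash as three UTF-8 bytes

var_dump(isValidUtf8($dashSingleByte)); // bool(false): a UTF-8 page cannot decode this byte
var_dump(isValidUtf8($dashUtf8));       // bool(true)
```

A lone 0x96 byte is not a legal UTF-8 sequence, so a browser told the page is UTF-8 has nothing to draw except the replacement character.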

Related

Getting mysql by php turns non-latin characters into question mark "?"

When I echo values with non-latin characters from MySQL they turn into question marks. And I mean question marks "?" not "�". I got these things:
header('Content-Type: text/html; charset=ISO-8859-2'); //php
<meta name="charset" content="ISO-8859-2" />//html
And they're not working!
Requesting help.
EDIT: More information: in PHPMyAdmin I changed collation to utf8_polish_ci.
You might want to try issuing this SQL statement right after you connect:
SET character_set_results = latin2
It looks like your text is getting translated, by MySQL, from Unicode to latin-1 (iso-8859-1); the question marks you're seeing are replacement characters. MySQL translates text from its internal representation to the character set of the connection when it sends result sets.
You can read more about this here. http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html

Why do I have to utf8_decode() my MySQL column value to get it to display properly?

I'm using CakePHP with App.encoding set to UTF-8, <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> present in my <head> and my MySQL database set to UTF-8 Unicode Encoding and utf8_general_ci collation. I also have "encoding"=>"UTF8" in my database.php connection details.
When I store a '£' symbol in the database table and view it using command line MySQL, the character displays correctly.
If I use CakePHP to fetch the rows from the database table and output them in my website, I see Â£ instead of my intended £ symbol.
However if I then use utf8_decode() to output my data, it displays correctly.
Is this correct? I have tried using htmlentities() to convert the £ symbol into &pound; but it outputs &Acirc;&pound; instead! Even when I use the additional parameters for charset.
Perhaps someone can help - I must have missed something here, but I thought that the characters should display correctly (in things like textarea HTML tags) if all your headers, meta tags etc were consistently UTF-8?
It sounds like the data in your database is wrong: the character £ is actually stored as the two characters Â£. You can confirm this by going to the database and using the hex and charset functions:
select charset(MyColumn), hex(MyColumn) from MyTable;
If the column is encoded in UTF-8, for the value '£' you should see output identical to this:
+-------------------+---------------+
| charset(MyColumn) | hex(MyColumn) |
+-------------------+---------------+
| utf8              | C2A3          |
+-------------------+---------------+
If you see anything else, for example if the charset column reports latin1 or the hex column reports C382C2A3, the data in the table is wrong. It can be fixed, but the fix depends on the kind of error the data has. What do you get from charset and hex?
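To see where a value like C382C2A3 comes from, here is a small sketch that reproduces the fault on purpose. It assumes the damage happened because UTF-8 bytes were mistaken for ISO-8859-1 and converted to UTF-8 a second time, which is one common cause but not the only possible one.

```php
<?php
$pound = "\xC2\xA3"; // "£" encoded as UTF-8

// Simulate the fault: the UTF-8 bytes are mistaken for ISO-8859-1
// and converted to UTF-8 a second time.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $pound);

echo strtoupper(bin2hex($pound)), "\n";         // C2A3
echo strtoupper(bin2hex($doubleEncoded)), "\n"; // C382C2A3
```

The second value decodes as the two characters Â and £, which is exactly what the page shows.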
You can use htmlentities with its third parameter to safely encode UTF-8:
htmlentities("£", ENT_COMPAT, "UTF-8")
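As a rough illustration of what the third parameter changes, the sketch below runs the same two bytes through htmlentities() twice, once with the matching charset and once with a deliberately wrong one. The variable names are only for the example.

```php
<?php
$pound = "\xC2\xA3"; // "£" as UTF-8 bytes

// Charset named explicitly: the two bytes are read as one character.
$right = htmlentities($pound, ENT_COMPAT, 'UTF-8');

// Charset mismatched on purpose: each byte is treated as its own character.
$wrong = htmlentities($pound, ENT_COMPAT, 'ISO-8859-1');

echo $right, "\n"; // &pound;
echo $wrong, "\n"; // &Acirc;&pound;
```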
If all is in UTF8 remove the "encoding"=>"UTF8" in your database.php connection details:
$conn = mysql_connect($server, $username, $password);
//mysql_set_charset("UTF8", $conn); // REMOVED. ;)
mysql_select_db($database, $conn);

There are symbols like  and so on in database, what to do?

I have a few symbols in my description like  ⠀ and so on. Can I do anything about it? Or if it's in the database, is there nothing I can do now?
It sort of depends what the problem actually is...
If it's that those characters are supposed to be there (such as "Mañana" in Spanish) then you'll need to ensure everything is in UTF-8... the best way is to:
1: check the database tables are in "utf-8" encoding (if not convert them to utf-8)
2: as Martin noted, ensure the database connector is utf-8 using something like:
mysql_set_charset('utf8'); //note that MySQL uses no hyphen here
3: ensure the document is utf-8 (you can add a header at the top)
<?php header('Content-type:text/html;charset=utf-8'); ?>
4: just to be on the safe side, add it in as a meta tag as well
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
HOWEVER
It's quite possible you've got some duff characters in the database where something like ISO-8859-1 has been juggled to UTF-8, badly. In this case you'll notice things like Â£ where what you actually want is £ (because UTF-8 characters contain more data than ISO-8859-1 characters, that extra data can become an additional character if you're not careful).
In which case your best bet is to clean the database (you could probably do something like UPDATE table SET field = REPLACE(field, 'Â£', '£') for common "errors") and then convert the whole kaboodle to UTF-8 (as outlined above) to avoid the problem recurring.
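If the clean-up is done in PHP rather than with one REPLACE per character, a heuristic along these lines might work. It is only a sketch: repairDoubleEncodedText is a made-up name, and it assumes the damage is exactly one round of UTF-8 bytes having been read as ISO-8859-1. Text that went through Windows-1252 or was mangled twice would need different handling.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 read as ISO-8859-1" damage.
function repairDoubleEncodedText(string $s): string
{
    // Map each character back to a single byte; iconv returns false
    // (with a notice, silenced here) if a character is outside ISO-8859-1.
    $bytes = @iconv('UTF-8', 'ISO-8859-1', $s);

    // Accept the result only if it changed something and is valid UTF-8.
    if ($bytes !== false && $bytes !== $s && preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    return $s; // leave clean or unrecognised text alone
}

echo bin2hex(repairDoubleEncodedText("\xC3\x82\xC2\xA3")), "\n"; // c2a3 (damaged value repaired)
echo bin2hex(repairDoubleEncodedText("\xC2\xA3")), "\n";         // c2a3 (clean value untouched)
```

Checking that the result is still valid UTF-8 is what keeps the function from corrupting values that were stored correctly in the first place.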
To avoid having such characters,
Set the charset for your form. HTML forms have charset attribute and value. You can use UTF-8
Set Charset for the Document, via PHP or using META tags ( but this only works on the output )
set Charset for the db table
get a class/function to do ascii character conversion as part of your data filtering and escaping

Problems with utf-8 encoding in php

Another utf-8 related problem I believe...
I am using php to update data in a mysql db then display that data elsewhere in the site. I have run into utf-8 problems before where special characters are displayed as question marks when viewed in a browser, but this one seems slightly different.
I have a number of records to enter that contain the è character. If I enter this directly in the db then it appears correctly on the page so I take this to mean that utf-8 content is being output correctly.
However when I try to update the values in the db through php, the è character is replaced. What appears instead is & Atilde ; & uml ; (without the spaces) which appears in the browser as Ã¨
I have the tables in the database set to use UTF-8. I believe this is correct cos, as mentioned, if I update the db through phpMyAdmin, it's all ok. Similarly I have set the character encoding for the page which seems to be correct. I am also running the sql statement "SET NAMES 'utf8';" before trying to update the db.
Anyone have any other ideas as to where the problem may lie?
Many thanks
Yup.
The character you have is LATIN SMALL LETTER E WITH GRAVE. As you can see, in UTF-8 that character is encoded into two bytes 0xC3 and 0xA8.
But in many default, western encodings (such as ISO-8859-1) which are single-byte only, this multi-byte character is decoded as two separate characters, LATIN CAPITAL LETTER A WITH TILDE and DIAERESIS. Notice how they are both encoded as C3 and A8 in ISO-8859-1?
Furthermore, it looks like PHP is processing these characters through htmlentities(), which results in &Atilde; and &uml; respectively.
So, where exactly is the problem in your code? Well, htmlentities() could be doing it all by itself since its 3rd argument is an encoding name - which you may not have properly set to 'UTF-8'. But it could be some other string processing function as well. (Note: As a general rule, it's a bad idea to store HTML entities in the database - this step should be reserved for time of display)
There are a bunch of other ways to trip yourself up with UTF-8 in php - I suggest hitting up the cheatsheet and making sure you're in good shape.
Well, it is your own code that converts the characters into entities.
To make it right:
Ban htmlentities function from your scripts forever.
Use htmlspecialchars, not on insert, but when displaying data.
Repair existing data in the database using html_entity_decode.
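The last two points can be sketched as follows. The sample strings are invented, and the extra iconv step assumes the stored entities came from UTF-8 bytes that were read as ISO-8859-1, as described in the answer above; data damaged some other way would need a different repair.

```php
<?php
// Display time: escape only the HTML-significant characters.
$fromDb  = "tr\xC3\xA8s <b>"; // "très <b>" stored as raw UTF-8 text
$escaped = htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8');
echo bin2hex($escaped) === bin2hex("tr\xC3\xA8s &lt;b&gt;") ? "accent kept\n" : "accent changed\n";

// Repair: turn stored entities back into characters...
$stored  = '&Atilde;&uml;';
$decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
echo bin2hex($decoded), "\n"; // c383c2a8, i.e. the two characters "Ã¨"

// ...then undo the byte mix-up to recover the single character "è".
$repaired = iconv('UTF-8', 'ISO-8859-1', $decoded);
echo bin2hex($repaired), "\n"; // c3a8
```

html_entity_decode alone only gets back to the mis-decoded pair, which is why the second step is needed for this particular kind of damage.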
I suppose you're taking the results of some form submission and inserting the results in the database. If so, you must ensure that you instruct the browser to send UTF-8 data and you should validate the user input for a valid UTF-8 stream.
Change your form element to include accept-charset:
<form accept-charset="utf-8" method="post" ... >
<input type="text" name="field" />
...
</form>
Validate the data with:
$valid = array_key_exists("field", $_POST) && !is_array($_POST['field']) &&
preg_match('//u', $_POST['field']) && ...; //check length with mb_strlen etc.
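A fuller version of that check could look like the sketch below. The function name, the array passed in and the length limit are all made up for the example, and characters are counted with preg_match_all so the block does not depend on the mbstring extension; mb_strlen would do the same job.

```php
<?php
// Hypothetical validator for one text field of a submitted form.
function isValidTextField(array $post, string $key, int $maxChars): bool
{
    if (!array_key_exists($key, $post) || !is_string($post[$key])) {
        return false; // missing, or an array such as field[]=...
    }
    if (preg_match('//u', $post[$key]) !== 1) {
        return false; // not well-formed UTF-8
    }
    // Count characters rather than bytes.
    $chars = preg_match_all('/./su', $post[$key]);
    return $chars !== false && $chars <= $maxChars;
}

var_dump(isValidTextField(['field' => "caf\xC3\xA9"], 'field', 4)); // bool(true): 4 characters, 5 bytes
var_dump(isValidTextField(['field' => "\xE9"], 'field', 4));        // bool(false): lone ISO-8859-1 byte
```

In a real script the first argument would be $_POST.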
I think you miss Content-Type declaration on the html page:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If you don't have it, the browser will guess the encoding, and convert any characters outside of that encoding to entities when posting a form.

Inserting into mysql database with asian symbols such as ’ —

I can't seem to get these Chinese punctuation marks to work with my database (utf-8)
when i do an echo of the query the marks look like this
���
in php i have already done
$text=mysql_real_escape_string(htmlentities($text));
so as a result they are not saved into the database correctly what can i do to fix this?
Thanks
Executing mysql_query('SET NAMES utf8'); before any operations with unicode will do the trick (note that MySQL spells the charset name without a hyphen)
Try using the utf8_encode() function while inserting into the db and utf8_decode() while printing the same.
Add the character 'N' before your string value.
E.g. select * from test_table where temp=N'unicode string'
besides if you want to use htmlentities, you have to set it to utf-8 encoding like that:
htmlentities($string,ENT_COMPAT,"UTF-8");
Don't put HTML-encoded data in the database. It should be raw text until the time you spit it onto the page (at which point you should use htmlspecialchars()).
You need to make sure that both your database and your page are using UTF-8:
ensure your tables are CREATEd with a UTF-8 collation;
use mysql_set_charset after connecting to ensure the connection between MySQL and PHP is UTF-8;
set the Content-Type of the page to text/html;charset=utf-8 by header or meta tag.
You can get away with using a different encoding such as the default latin-1 on the database end and the connection if you treat it as bytes, but case-insensitive comparisons won't work if you do, so it's best to stick to UTF-8.
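Put together, the connection and output steps might look roughly like this. It is a sketch under assumptions: it uses the mysqli extension rather than the mysql_* functions seen in the thread, both function names are invented, and the host, credentials and database name are whatever the caller supplies.

```php
<?php
// Hypothetical helper; host, credentials and database name come from the caller.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    if ($conn->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    // Same purpose as mysql_set_charset('utf8') in the older extension.
    if (!$conn->set_charset('utf8')) {
        throw new RuntimeException('Could not switch the connection to utf8');
    }
    return $conn;
}

// Hypothetical helper: send the header, then escape raw text only at output time.
function renderText(string $raw): string
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}
```

The text stays raw in the database, and renderText is the only place where it is turned into HTML.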
