Fetching from MySQL with PHP turns non-Latin characters into question marks "?"

When I echo values containing non-Latin characters from MySQL, they turn into question marks. I mean plain question marks "?", not "�". I have set these:
header('Content-Type: text/html; charset=ISO-8859-2'); //php
<meta name="charset" content="ISO-8859-2" />//html
And they're not working!
Requesting help.
EDIT: More information: in phpMyAdmin I changed the collation to utf8_polish_ci.

You might want to try issuing this SQL statement right after you connect:
SET character_set_results = latin2
It looks like your text is getting translated, by MySQL, from Unicode to latin-1 (iso-8859-1); the question marks you're seeing are replacement characters. MySQL translates text from its internal representation to the character set of the connection when it sends result sets.
You can read more about this here: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
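To see why a "?" appears, here is a minimal sketch. It is written in Python rather than PHP only because the byte-level behaviour is easy to check there. The sample word "żółw" is an assumption, not something from the question. Converting text to a character set that lacks a letter replaces that letter with "?", which is roughly what a latin1 result set does to Polish text.

```python
# Sample Polish word (hypothetical; not from the original question).
word = "\u017c\u00f3\u0142w"  # "żółw"

# ISO-8859-2 (latin2) contains every letter, so the conversion is lossless.
as_latin2 = word.encode("iso8859_2")

# ISO-8859-1 (latin1) has "ó" but neither "ż" nor "ł". With errors="replace"
# the missing letters become "?", similar to MySQL converting a result set
# to a connection character set that cannot represent them.
as_latin1 = word.encode("latin-1", errors="replace")

print(as_latin2.hex())  # bytes for latin2
print(as_latin1)        # question marks where latin1 has no letter
```

Once the "?" is in the byte stream, the original letter is gone, so no header or meta tag on the page can bring it back; the fix has to happen at the connection.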

Proper Charset to work with Vietnamese Characters (that isn't Unicode) in PHP [duplicate]

I've searched around for a while and haven't yet found something that'll work for me. I am using a PHP form to submit data into SAP using the SAP DI API. I need to figure out which character set will actually allow me to store and work with Vietnamese characters.
UTF-8 seems to work for a lot of the characters, but ô becomes Ã´. More importantly, there are character limits, and UTF-8 breaks character limits. If I have a string of 30 characters, it tells the API that it's more than 50. The same is true for storing in MySQL: if there's a varchar character limit, UTF-8 causes the string to go above it.
Unfortunately, when I search, UTF-8 seems to be the only thing people suggest for Vietnamese characters. If I don't encode the characters at all, they get stored as their html character codes. I've also tried ISO-8859-1, converting into UCS-2 or UCS-4... I'm really at a loss. If anyone has experience working with vietnamese characters, your help would be greatly appreciated.
UPDATE
It appears the issue may be with my WampServer on Windows. Here's a bit of code that is confusing me:
$str = 'VậTCôNG';
$str1 = utf8_encode($str);
if (mb_detect_encoding($str,"UTF-8",true) == true) {
print_r('yes');
if ($str1 == $str) {
print_r('yes2');
}
}
echo $str . $str1;
This prints "yes" but not "yes2", and $str . $str1 shows as "VậTCôNGVáº­TCÃ´NG" in the browser.
I have my php.ini file with:
default_charset = "utf-8"
and my httpd.conf file with:
AddDefaultCharset UTF-8
and my php file I'm running has:
header("Content-type: text/html; charset=utf-8");
So I'm now wondering: if the original string was UTF-8, why wouldn't it equal a UTF-8 encoding of itself? And why is the UTF-8 encoding returning wrong characters? Is something wrong in the WampServer configuration?
Ã´ is the "Mojibake" for ô. That is, you do have UTF-8, but something in the code mangled it.
See "Trouble with utf8 characters; what I see is not what I stored" and search for Mojibake. It says to check these:
The bytes to be stored need to be UTF-8-encoded.
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4.
The column needs to be declared CHARACTER SET utf8 (or utf8mb4).
HTML should start with <meta charset=UTF-8>.
It is possible to recover the data in the database, but it depends on details not yet provided.
http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
Each Vietnamese character takes 2-3 bytes when encoded in UTF-8. It is unclear whether the "hard 50" is really a character limit or a byte limit.
If you happen to have Mojibake's sibling "double encoding", then a Vietnamese character will take 4-6 bytes and feel like 2-3 characters. See "Test the data" in the first link.
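The character-versus-byte distinction can be checked with a short sketch (Python is used only for illustration; the thread itself is about PHP). It counts the sample string as characters, as UTF-8 bytes, and as "double encoded" bytes, on the assumption that double encoding means UTF-8 bytes misread as latin1 and then encoded as UTF-8 again.

```python
# The sample string from the question, written with escapes so the
# precomposed code points are unambiguous: "VậTCôNG".
text = "V\u1eadTC\u00f4NG"

chars = len(text)                        # characters (code points)
utf8_bytes = len(text.encode("utf-8"))   # bytes when stored as UTF-8

# Double encoding: every non-ASCII UTF-8 byte is treated as a latin1
# character and encoded again, so each such byte turns into two bytes.
double = text.encode("utf-8").decode("latin-1").encode("utf-8")

print(chars, utf8_bytes, len(double))
```

A limit that counts bytes will therefore trip well before one that counts characters, and sooner still if the data is double encoded.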
An example of how to 'undo' Mojibake in MySQL:
CONVERT(BINARY(CONVERT('Váº­TCÃ´NG' USING latin1)) USING utf8mb4) --> 'VậTCôNG'
"Double encoding" is sort of like Mojibake twice. That is, one side treats the bytes as latin1 and the other as UTF-8, but the conversion happens twice.
VậTCôNG, as UTF-8, is hex 56e1baad5443c3b44e47. If that hex is treated as character set cp850 or keybcs2, the string is Vß║¡TC├┤NG.
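The same round trip can be sketched outside MySQL. This is an illustration in Python, not the PHP or SQL from the thread. It produces the Mojibake by decoding the UTF-8 bytes as latin1, undoes it by reversing that step, and shows the cp850 reading mentioned above.

```python
original = "V\u1eadTC\u00f4NG"   # "VậTCôNG"
raw = original.encode("utf-8")   # the bytes that are actually stored or sent

# Mojibake: UTF-8 bytes displayed as if they were latin1.
mojibake = raw.decode("latin-1")

# Undo it: turn the latin1 characters back into bytes, then read them as UTF-8.
repaired = mojibake.encode("latin-1").decode("utf-8")

# The same bytes read as cp850 give the box-drawing variant.
as_cp850 = raw.decode("cp850")

print(raw.hex())
print(repaired == original)
print(as_cp850)
```

The repair only works while the bytes are intact. If any step replaced a character with "?", that information is lost.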
Change it to VISCII.
Input: Ã´
Output: ô
You can test it at Charset converter.

Question mark showed instead of special character

I am fetching data from a MySQL server, but the data contains some special characters which show up as question marks when I print them from my PHP code. I know this has something to do with character encoding, and I have set the charset to utf-8 in my HTML, but I am still not getting the special characters.
Add mysql_query("SET NAMES utf8") to the start of the php code after the db connection.

Em Dash and En Dash from MySQL Db with PHP

I'm trying to output a MySQL database field with PHP which contains em dashes and en dashes, but although the rows are output, the values with these dashes are not.
As far as I am aware, these are characters which should be used in proper English, so I don't think I should be stripping them out or replacing them with an alternative (like a hyphen).
By adding this code before the INSERT, I am able to get the em dash and en dash into the database properly (whereas without this line I saw unwanted characters instead):
mysql_query('SET NAMES utf8');
But the value won't output. The database table and its fields use the utf8_general_ci collation, and I've got these lines in my PHP page:
header('Content-type: text/html; charset=utf-8');
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I'm outputting the value like so:
echo nl2br(htmlentities(preg_replace("/[\r\n]+/", "\n\n", $row['someText'])));
If I output the value without formatting, I see this question mark character:
�
Does anyone know how to get around this? Am I forced to replace them with hyphens even though that's grammatically incorrect, or is there a way to output them as they appear in the database?
I added the same MySQL command before retrieving data from the Db and this solved the issue.
mysql_query('SET NAMES utf8'); // Use utf-8
I don't know why this is needed but I'll investigate and see if I can make this a default.
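A possible explanation, sketched in Python for illustration and assuming the connection defaulted to latin1 (MySQL's latin1 is essentially Windows-1252): without SET NAMES utf8, the server converts the dash to a single latin1 byte, and a page declared as UTF-8 cannot decode that byte, so the browser shows the replacement character.

```python
en_dash = "\u2013"

# What a latin1 (cp1252) connection would send for an en dash: one byte.
sent_latin1 = en_dash.encode("cp1252")

# A UTF-8 page cannot decode that lone byte, so it renders U+FFFD.
shown_on_utf8_page = sent_latin1.decode("utf-8", errors="replace")

# With a utf8 connection the bytes match the page encoding and survive.
sent_utf8 = en_dash.encode("utf-8")
shown_correctly = sent_utf8.decode("utf-8")

print(sent_latin1, repr(shown_on_utf8_page), shown_correctly == en_dash)
```

This is why the command is needed on the reading side as well as before the INSERT: the connection character set applies to both directions.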

Why do I have to utf8_decode() my MySQL column value to get it to display properly?

I'm using CakePHP with App.encoding set to UTF-8, <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> present in my <head> and my MySQL database set to UTF-8 Unicode Encoding and utf8_general_ci collation. I also have "encoding"=>"UTF8" in my database.php connection details.
When I store a '£' symbol in the database table and view it using command line MySQL, the character displays correctly.
If I use CakePHP to fetch the rows from the database table and output them in my website, I see Â£ instead of my intended £ symbol.
However if I then use utf8_decode() to output my data, it displays correctly.
Is this correct? I have tried using htmlentities() to convert the £ symbol into &pound;, but it outputs &Acirc;&pound; instead, even when I use the additional charset parameter.
Perhaps someone can help - I must have missed something here, but I thought that the characters should display correctly (in things like textarea HTML tags) if all your headers, meta tags etc were consistently UTF-8?
It sounds like the data in your database is wrong: the character £ is actually stored as the two characters Â£. You can confirm this by going to the database and using the hex and charset functions:
select charset(MyColumn), hex(MyColumn) from MyTable;
If the column is encoded in UTF-8, for the value '£' you should see output identical to this:
+-------------------+---------------+
| charset(MyColumn) | hex(MyColumn) |
+-------------------+---------------+
| utf8              | C2A3          |
+-------------------+---------------+
If you see anything else, for example if the charset column reports latin1 or the hex column reports C382C2A3, the data in the table is wrong. It can be fixed, but the fix depends on the kind of error the data has. What do you get from charset and hex?
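The hex values mentioned above can be reproduced with a short sketch. It is in Python for illustration only, and the helper name diagnose is hypothetical. It derives the expected hex for "£" stored as UTF-8, as latin1, and as double-encoded UTF-8, then maps an observed HEX() value to one of those cases.

```python
pound = "\u00a3"  # "£"

# Expected HEX() output for each way the value might have been stored.
expected = {
    "utf8": pound.encode("utf-8").hex().upper(),
    "latin1": pound.encode("latin-1").hex().upper(),
    "double-encoded utf8": pound.encode("utf-8")
                                .decode("latin-1")
                                .encode("utf-8")
                                .hex()
                                .upper(),
}

def diagnose(hex_value):
    """Return the storage case matching an observed HEX() value (hypothetical helper)."""
    for case, hex_expected in expected.items():
        if hex_value.upper() == hex_expected:
            return case
    return "unknown"

print(expected)
print(diagnose("C382C2A3"))
```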
You can use htmlentities with its third parameter to safely encode UTF-8:
htmlentities("£", ENT_COMPAT, "UTF-8")
If everything is in UTF-8, remove the "encoding"=>"UTF8" from your database.php connection details:
$conn = mysql_connect($server, $username, $password);
//mysql_set_charset("UTF8", $conn); // REMOVED. ;)
mysql_select_db($database, $conn);

PHP/MySQL Special Characters aren't displayed properly [duplicate]

I have a problem. I have a piece of text in my database (MySQL 5.5.20) with characters like 'é' and " ' " that aren't displayed properly after executing the MySQL query and displaying the result with echo($...). Every special character I've put in the database displays as a small question mark inside a diamond. If I look at the text in the database itself, it is an 'é' and " ' ", so I figured the problem isn't MySQL.
One thing I could do is str_replace everything like " ' " --> "'" on input, but then I have to do this for every character there is.
Oh and I already have included
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
and this didn't work.
Hopefully you have all the information you need to help me; if not, just say :) Thanks in advance!
Milaan
You need to have everything in utf-8:
The database field
The database connection (mysql_set_charset('utf8'); in classic mysql, something like $db->exec('SET CHARACTER SET utf8'); in PDO)
The content type (like you have already)
I had been issuing the SQL query SET NAMES utf8 right after a successful connection to the DB for over a year.
But this is not necessary when you have everything in the same encoding:
source file encoding
table column collations
web page encoding (both in PHP, header('Content-Type: text/html; charset=utf-8');, and in <head>, <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />)
I usually run all input text through str_replace and replace uncommon symbols with their &#xxx; equivalents. This is also useful for preventing HTML injection and bad HTML rendering,
i.e. if someone inputs HTML tags, they would otherwise be active in your page.
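As a rough sketch of the escaping idea (in Python for illustration; PHP's htmlspecialchars() plays a similar role), only the characters that are special in HTML need replacing when everything is UTF-8, so accented letters can pass through unchanged. The sample input is an assumption.

```python
import html

# Hypothetical user input containing markup, an accented letter and quotes.
user_input = '<b>caf\u00e9</b> & "quotes"'

# html.escape replaces &, <, > and quotes; "é" is left as it is.
escaped = html.escape(user_input)

print(escaped)
```

Escaping at output time keeps the stored text untouched, which avoids having to list every uncommon symbol by hand.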
