I have database in german language. I am fetching data from database and create array. problem is some german characters are converted special character. thats why not able to encode array in json. i have also tried by putting header for german language but it is for display purpose so it is not works.
please check below array created
if i copy and paste that same string manually and create array than it create perfect array. can you please solve issue.
after use of
htmlentities($str, ENT_QUOTES, 'UTF-8', false);
you can see have display both string one is fetched from database and one is encoded using htmlentities. you can see last index of array is empty so htmlentities could not encode it.
You need to escape your string. Try this:
// $str, encode quotes, UTF-8 encoding, false (do not double encode).
echo htmlentities($str, ENT_QUOTES, 'UTF-8', false);
Replace $str with your string. This should fix your issue.
The question mark in a black diamond is often caused by
latin1 encoding of text in the client, and
SET NAMES latin1 or otherwise picking (or defaulting to) latin1 when connecting, and
(It does not matter what CHARACTER SET the table/columns is.), and
Saying <meta charset=UTF-8> in the html.
A simple fix is to change the last step to be <meta charset=ISO-8859-1>, thereby working only with latin1, not utf8. That's "ok" for Western Europe, but won't work for Asia.
The "right" fix is to go with utf8mb4 at all for steps.
Related
I am getting stuck with this, previously i am using php 5 and now i came up with php 7,
the problem is when i am trying to echo value from database it returns weird special character and where in my previous web it returns normal. Is it utf-8 problem? i tried meta tag utf-8, and change collation sql into utf8_unicode_ci, but somehow it doesn't help at all...
it returns like this
 — 
what i want to return
—
What you get from the database is a UTF8 encoded string.
The characters you see is a UTF8 string interpreted with encoding Western (Windows Latin 1).
If you include that string in a web page whose character set is Latin 1 then you'll see the string you posted; if the character set is UTF-8 then you should see the correct characters (without need to convert them into HTML entities).
As the latter is not your case you can proceed as follows:
Let the characters you see are stored in the variable $string: you can get html entities with mb_convert_encoding:
$html = mb_convert_encoding( $string, 'HTML-ENTITIES', 'UTF-8' );
This will result in:
—
As after conversion you get characters in the ASCII range then the resulting string is suitable for any destination character encoding.
Note that, according to the above, even the dash — is converted (into —)
This is just a quick solution to the problem you faced.
I think the comment from Machavity:
"Take a minute and read stackoverflow.com/questions/279170/utf-8-all-the-way-through"
is a good advice.
First of all, I have to say that; I am a stranger of multilingual conversions.
I have strings that i want to mb_lowercase in UTF-8 form if possible (sth like clean url), and I use
$str = iconv("UTF-8", "ASCII//TRANSLIT", utf8_encode($str));
$str = preg_replace("/[^a-zA-Z0-9_]/", "", $str);
$str = mb_strtolower($str);
to achive my requirements (an UTF8, lowercase string)
However, when I stress that function with "çokGüŞelLl" using CocoaRestClient; I get à as $str (thanks to my client?) and iconv triggers an error complaining about an illegal character in input string (Ã).
What is the problem with iconv? the str is encoded as utf8 by utf8_encode($str) already. How can it be an illegal character?
Notes:
I read about #iconv questions here, but I think it is not a good solution to have empty database entries.
Thanks to all answers, I will read and try to understand each of them.
The PHP function utf8_encode() expects your string to be ISO-8859-1 encoded. If it isn’t, well, you get funny results.
Ensure that your data is proper UTF-8 before saving it to your database:
// Validate that the input string is valid UTF-8
if (preg_match("//u", $string) === false) {
throw new \InvalidArgumentException("String contains invalid UTF-8 characters.");
}
// Normalize to Unicode NFC form (recommended by W3C)
$string = \Normalizer::normalize($string);
Now everything is stored the same way in our database and we don't have to care about this problem anymore when receiving data from our database.
$string = $database->getSomeRecordWithUnicode();
echo mb_strtolower($string);
Done!
PS: If you want to ensure that your database is using the exact same encoding as PHP either use utf8mb4 as character set (and utf8mb4_unicode_ci as default collation for perfect sorting) or a BLOB (binary) data type.
PPS: Use your database configuration file to force proper encoding of all strings instead of using e.g. $mysqli->set_charset("utf8") or similar.
About HTML forms
Because you asked in the comments of your question. How data is sent to your server has nothing to do with the locale the user has set in his operating system. It has to do with the client's browser. All modern browsers default to utf-8 when sending form data. If you are afraid that some of your clients might be using totally broken browsers, simply tell them that you only accept utf-8. Drupal is doing that on all their forms.
<!doctype html>
<html>
<body>
<form accept-charset="UTF-8">
Now all browsers should encode the data they submit in utf-8.
If you encode çokGüŞelLl as UTF-8 you should get the following bytes:
var_dump( bin2hex('çokGüŞelLl') );
string(26) "c3a76f6b47c3bcc59e656c4c6c"
That's a check you must do. You also have this:
utf8_encode($str)
Your string contains Ş, which cannot be represented in ISO-8859-1 to begin with.
So, whatever reason you have to convert your original UTF-8 (as stored in DB) to ISO-8859-1, I'm afraid that it's corrupting your data.
You're double encoding. First you set your database to UTF-8. That means your data is now UTF-8 encoded. Then you use utf8_encode on the iconv-function. But your input is already UTF-8. Try removing your utf8_encode statement from iconv.
I have a database full of strings containing strange characters such as:
Design Tattoo Ãœbungshaut
Mehrflächiges Biozid Reinigungs- & Desinfektionsmittel
Where the Ãœ and ä should be, as I understand, an Ü and à when in proper UTF-8.
Is there a standard function to revert these multiple characters back to there proper UTF-8 form?
In PHP I have come across $url = iconv('utf-8', 'iso-8859-1', $url); which seems to get close but falls short. Perhaps I have the wrong parameters, but in any case was just wondering how well this issue is know and if there is an established fix?
The original data was taken from the eCommerce system CubeCart which seems to have no problem converting it back to normal text FYI.
The data shown as example is UTF-8 encoded data mistakenly interpreted as ISO-8859-1 (or windows-1252). The problem combinations are in fact “Ü” and “ä” (“Ā” does not appear in German). So apparently what you need to do is to read the data as UTF-8 and display it that way, instead of converting it.
If the database and output is utf-8 it could be because your not using utf-8 as the client character set.
If your using mysqli you can use set_charset or run SET NAMES utf8 as a query before fetching data.
I'm trying to compare some text to the text in a database. In the database any text with an accent is encoded like in HTML (i.e. é) when I compare the database text to my string it doesn't match because my string just shows é. When I use the PHP function htmlentities to encode the string first the é turns into é weird? Using htmlspecialchars doesn't encode the é at all.
How would you suggest I compare é to é as well as all the other accented characters?
You need to send in the correct charset to htmlentities. It looks like you're using UTF-8, but the default is ISO-8859-1. Change it like this:
$encoded = htmlentities($text, ENT_COMPAT, 'UTF-8');
Another solution is to convert the text to ISO-8859-1 before encoding, but that may destroy information (ISO-8859-1 does not contain nearly as many characters as UTF-8). If you want to try that instead, do like this:
$encoded = htmlentities(utf8_decode($text));
I'm working on french site, and I also had same problem. This is the function that I use.
function convert_accent($string)
{
return htmlspecialchars_decode(htmlentities(utf8_decode($string)));
}
What it does it decodes your string to utf8, than converts everything HTML entities. even tags. But we want to convert tags back to normal, than htmlspecialchars_decode will convert them back. So in the end you will get a string with converted accents without touching tags.
You can use pass through this function your email content before sending it to recipent.
Another issue you might face is that, sometimes with this function the content from database converts to ? . In this case you should do this before running your query:
mysql_query("SET NAMES `utf8`");
But you might need to do it, it depends on encoding in your table. I hope it helps.
The comparing task is related to the charset and the collation you selected when you create the database or the tables. If you are saving strings with a lot of accents like spanish I sugget you to use charset uft8 and the collation could be the more accurate to the language(english, french or whatever) you're using.
The best thing of using the correct charset in the database is that you can save the string in natural way e.g: my name I can store it as is "Mario Juárez" and I have no need of doing some weird conversions.
Ran into similar issues recently. Followed Emil's answer and it worked fine locally but not on our dev/stage environments. I ended up using this and it worked all around:
$title = html_entity_decode(utf8_decode($item));
Thanks for leading me in the right direction!
I did the following things:
I have a spreadsheet with data. One of the rows has a ü character in it.
I save this as a CSV file in OpenOffice.org. When it asks me for a character encoding, I choose UTF-8.
I use Navicat to create a MySQL database table, InnoDB with UTF-8 utf8_general encoding and import the CSV.
I try to use PHP function htmlspecialchars($string, ENT_COMPAT, 'UTF-8') where $string is the string containing the special ü character.
It gives me an error: Invalid multibyte sequence in argument. When I change 'UTF-8' with 'ISO8859-1', no error is thrown, but the incorrect character is shown. (The 'unknown character' character, looks like <?>)
If I use an HTML form to update the string in the database, the error disappears and the character is displayed correctly, however, when I then look at the record in Navicat, it looks two characters:
[1/4][A with some thing on top of it]
Some multibyte that isn't seen as one character.`
What is going on, where are things going wrong, and what can I do about it?
Although I don't understand where the "invalid multibyte" error comes from, I'm pretty sure htmlspecialchars() is not your culprit:
For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets.
In my understanding, htmlspecialchars() should work fine for a UTF-8 string without specifying a character set. My bet would be that either the HTML page containing the form, or the database connection you use is not UTF-8 encoded. For the latter, try sending a
SET NAMES utf8;
to mySQL before doing the insert.