I have some special characters in values coming from the database, like æ and Ø. When I inspect the database using phpMyAdmin on both XAMPP and online, this is how it appears:
In the rendered PHP page, they appear fine locally but not on the server. So basically, something on the online version is preventing these values from being displayed properly.
I already have this in my head portion:
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
And in my .htaccess file:
AddDefaultCharset UTF-8
DefaultLanguage en-US
Collation is latin1_swedish_ci for the database. I tried switching it to utf-8 and that didn't help.
That is "Mojibake", wherein, for example, a 2-byte UTF-8 character is misinterpreted as two 1-byte latin1 characters.
If I am not mistaken, your examples disagree with the text. Each of the following utf8 characters maps to the matching 2-character pair when mismapped through latin1.
ü ü
ö ö
Ø Ã˜
æ æ
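The mapping in the table above can be reproduced with a short sketch. The function names `mojibake` and `unmojibake` are made up for illustration, and the sketch assumes the mbstring extension is loaded and that "latin1" means Windows-1252, as it does in MySQL.

```php
<?php
// Reproduce mojibake: take the UTF-8 bytes of a character, pretend they are
// Windows-1252 (what MySQL calls latin1), and re-encode the result as UTF-8.
function mojibake(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Undo it: encode the garbled text back to Windows-1252, which yields the
// original UTF-8 bytes again.
function unmojibake(string $garbled): string
{
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

// ü, ö, Ø, æ written as explicit UTF-8 byte sequences.
foreach (["\xC3\xBC", "\xC3\xB6", "\xC3\x98", "\xC3\xA6"] as $char) {
    echo $char, ' ', mojibake($char), PHP_EOL;
}
```

Seeing the right-hand form in a page usually means one step in the chain (connection, table, or file) treated UTF-8 bytes as latin1.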
For more about the causes and cures: Trouble with UTF-8 characters; what I see is not what I stored
Related
I have a .tsv file using Danish letters like Æ Ø Å.
The file is read in PHP with file_get_contents(),
then processed and turned into a mysqli query.
I tried putting <?php header('Content-Type: text/html; charset=utf-8'); ?> at the very top of the code.
also using the meta tag <meta charset="UTF-8">
and in my SQL I have the rows created like:
text COLLATE utf8_danish_ci NOT NULL
and:
PRIMARY KEY (`id`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci AUTO_INCREMENT
and:
$conn->set_charset("utf8");
.... But still no luck.
If I open my .tsv file in Excel, then it shows the Æ Ø Å correctly. But when opened with "TextEdit" on Mac, the "Æ Ø Å" shows like "¯ ¯ ¯"
UPDATE - SOLUTION: as the accepted answer points out, I should be treating the file as CP1252:
mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "CP1252");
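The line above turns the characters into HTML entities. If the goal is instead to store real UTF-8 text in the utf8 table, a minimal sketch of the alternative is to convert the bytes themselves. The helper name `cp1252_to_utf8` and the sample bytes are illustrative, and the sketch assumes the file really is CP1252, as the solution found. (As far as I know, the 'HTML-ENTITIES' pseudo-encoding is deprecated from PHP 8.2 onwards, which is another reason to consider this route.)

```php
<?php
// Assumption: the uploaded file is CP1252-encoded, as in the solution above.
function cp1252_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'CP1252');
}

// "Æ Ø Å" as CP1252 bytes (0xC6, 0xD8, 0xC5), standing in for the
// string returned by file_get_contents().
$fileContents = "\xC6 \xD8 \xC5";

echo cp1252_to_utf8($fileContents), PHP_EOL;
```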
There are many things to consider with UTF-8. But I see this one particular comment of yours...
If I open my .tsv file in Excel, then it shows the Æ Ø Å correctly. But when opened with "TextEdit" on Mac, the "Æ Ø Å" shows like "¯ ¯ ¯"
The problem...
If you are talking about Microsoft Excel, then you should know that the characters above exist both in Unicode (in the Latin-1 Supplement block, which UTF-8 can encode) and in the single-byte Windows code page CP1252. Take a look: LATIN_1_SUPPLEMENT Block
If you save this document without explicitly setting its encoding to UTF-8, then Windows has no reason to convert the text out of CP1252 and into UTF-8. But that is what you will need to do.
Possible solutions...
On your server: You can try to convert any Windows or "unknown" charset from CP1252 to UTF-8. (Since Windows saves documents "according to the system default", the information about which encoding was used may be gone by the time the file hits your Linux servers.)
On the submitter's computer: You can solve this by having the user adjust the encoding settings in whatever editor is generating the document, so that documents are saved as UTF-8. Many Windows editors then mark the file with a BOM, or "byte-order mark", which your server can read. This second approach may seem user-unfriendly (and it is, sure), but it can help you identify where the data is being corrupted.
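A rough sketch combining both suggestions on the server side: accept a UTF-8 BOM when one is present, and otherwise fall back to CP1252. The helper name `normalize_to_utf8` is hypothetical, and the fallback rests on the assumption that anything which is not valid UTF-8 was saved by Windows as CP1252.

```php
<?php
// Hypothetical helper: normalise an uploaded text file to UTF-8 without a BOM.
// Assumption: input that is not valid UTF-8 is CP1252.
function normalize_to_utf8(string $bytes): string
{
    $bom = "\xEF\xBB\xBF";

    // A UTF-8 BOM says the editor saved the file as UTF-8; strip the marker.
    if (strncmp($bytes, $bom, 3) === 0) {
        return substr($bytes, 3);
    }

    // No BOM, but the bytes already form valid UTF-8: leave them alone.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise treat the bytes as CP1252 and convert.
    return mb_convert_encoding($bytes, 'UTF-8', 'CP1252');
}
```

Pure ASCII input passes through unchanged, since it is valid in both encodings.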
My issue is that I have a database which was imported as UTF-8 but whose columns default to latin1. As a result, when I set the charset to UTF-8 in PHP, it gives me � instead of the expected æ character.
Originally, when I had my encoding set to windows-1252, it worked perfectly, but when I validate my file it says that windows-1252 is legacy and shouldn't be used.
Obviously I'm only trying to get rid of the error message but the only problem is I'm not allowed to change anything in the database at all. Is there any way the data can be output as utf-8 whilst still being stored as latin1 in the DB?
Some time ago, I used this function to print text on a hellish page with several out-of-control charsets lurking in it xD:
function to_entities($string)
{
    // Guess the input encoding from a candidate list, then escape using that encoding.
    $encoding = mb_detect_encoding($string, array('UTF-8', 'ISO-8859-1')); // and a few encodings more... sigh...
    return htmlentities($string, ENT_QUOTES, $encoding, false);
}
print to_entities('á é í ó ú ñ');
CP1252 (latin1) can handle æ. There it is hex E6; in utf8 it is hex C3A6.
� usually comes from storing text in a latin1 encoding and then displaying it as utf8. So, let's go back to what was stored.
Please provide SHOW CREATE TABLE. I suspect it will say CHARACTER SET latin1, not utf8.
Then, let's see
SELECT col, HEX(col) FROM tbl WHERE ...
to see the hex. (See hex notes above.)
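The same hex check can be done on the PHP side with `bin2hex()`. This is only a sketch using a literal æ; in practice `$utf8` or `$latin1` would be the value fetched from the column.

```php
<?php
$utf8   = "\xC3\xA6";                                        // æ encoded as UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');  // the same character as latin1

echo strtoupper(bin2hex($latin1)), PHP_EOL;  // one byte per character
echo strtoupper(bin2hex($utf8)), PHP_EOL;    // two bytes per character

// A lone latin1 byte for æ is not valid UTF-8, which is why a page declared
// as charset=utf-8 shows the replacement character for it.
var_dump(mb_check_encoding($latin1, 'UTF-8'));
```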
Assuming everything is latin1 so far, then the simple (and perhaps expedient) answer is to check the html source. I suspect it says
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Changing to charset=ISO-8859-1 may solve the black diamond problem.
But... latin1 only handles Western European characters. If you need Slavic, Greek, Chinese, etc, then you do need utf8. I'll provide a different answer in that case.
I have figured out how to do this after looking through the link that Fred provided, thanks!
In case anyone needs to know what to do:
if you have a database connection file, add the following inside it, underneath the mysqli_connect command:
mysqli_set_charset($connectvar, "utf8");
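For context, a connection file along those lines might look like the sketch below. The function name `connect_utf8` and all of its parameters are placeholders. With the connection charset set to utf8, MySQL transcodes latin1 columns to UTF-8 when it sends them to PHP, so nothing in the database has to change; this assumes the stored bytes really are latin1.

```php
<?php
// Hypothetical connection helper; host, credentials and database name are
// placeholders supplied by the caller.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = mysqli_connect($host, $user, $pass, $db);
    if ($conn === false) {
        throw new RuntimeException('Connection failed: ' . mysqli_connect_error());
    }

    // Ask MySQL to send and receive UTF-8 on this connection, regardless of
    // the character set the columns are stored in.
    mysqli_set_charset($conn, 'utf8');

    return $conn;
}
```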
I have a file URL (file:///...) with an umlaut in an intranet solution.
The url is encoded which turned the ä into an %C3%A4.
When I click the link in Firefox/Chrome the character is an ä and the file is displayed.
In IE the character is changed to à which results in a 404-error.
I tried with and without the following charset definition but it does not seem to work. (The file is UTF-8 encoded)
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
I tried without encoding, which does work, but since PHP-DOMDoc is what applies the encoding, I would rather not parse the content again.
I cannot avoid the umlauts in the urls since the customer enters these.
Is there any solution for this problem?
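To make the symptom concrete, here is a small sketch of what the escape sequence means and what happens when the decoded bytes are read with a single-byte code page. That IE does exactly this is an assumption based on the described symptom, not something confirmed here.

```php
<?php
$utf8 = "\xC3\xA4";                // ä encoded as UTF-8

// Percent-encoding works on bytes, so the two UTF-8 bytes become two escapes.
$escaped = rawurlencode($utf8);
echo $escaped, PHP_EOL;

// Decoding gives the two bytes back. Read as UTF-8 they are ä; read as
// Windows-1252 they become two separate characters, the first of which is Ã.
$decoded = rawurldecode($escaped);
$misread = mb_convert_encoding($decoded, 'UTF-8', 'Windows-1252');
echo $misread, PHP_EOL;
```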
Quick Background: I inherited a large SQL dump file containing a combination of English and Arabic text and (I think) it was originally exported using 'latin1'. I changed all occurrences of 'latin1' to 'utf8' prior to importing the file. The Arabic text didn't appear correctly in phpMyAdmin (which I guess is normal), but when I loaded the text to a web page with the following...
<meta http-equiv='Content-Type' content='text/html; charset=windows-1256'/>
...everything looked good and the arabic text displayed perfectly.
Problem: My client is really really really picky and doesn't want to change his...
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
...to the 'Windows-1256' equivalent. I didn't think this would be a problem, but when I changed the charset value to 'UTF-8', all of the Arabic characters appeared as diamonds with question marks. Shouldn't UTF-8 display Arabic text correctly?
Here are a few notes about my database configuration:
Database charset is 'utf8'
Database connection collation is 'utf8_general_ci'
All databases, tables, and applicable fields have been collated as 'utf8_general_ci'
I've been scouring Stack Overflow and other forums for anything that relates to my issue. I've found similar problems, but none of the solutions seem to work for my specific situation. Hope someone can help!
If the document looks right when declared as windows-1256 encoded, then it most probably is windows-1256 encoded. So it was apparently not exported using latin1—which would have been impossible, since latin1 has no Arabic letters.
If this is just about a single file, then the simplest way is to convert it from windows-1256 encoding to utf-8 encoding, using e.g. Notepad++. (Open the file in it, change the encoding, via File format menu, to Arabic, windows-1256. Then select Convert to UTF-8 in the File format menu and do File → Save.)
Windows-1256 and UTF-8 are completely different encodings, so data gets all messed up if you declare windows-1256 data as UTF-8 or vice versa. Only ASCII characters, such as English letters, have the same representation in both encodings.
We can't find the error in your code if you don't show us your code, so we're very limited in how we can help you.
You told the browser to interpret the document as being UTF-8 rather than Windows-1256, but did you actually change the encoding used from Windows-1256 to UTF-8?
For example,
$ cat a.pl
use strict;
use warnings;
use feature qw( say );
use charnames ':full';
my $enc = $ARGV[0] or die;
binmode STDOUT, ":encoding($enc)";
print <<"__EOI__";
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=$enc">
<title>Foo!</title>
</head>
<body dir="rtl">
\N{ARABIC LETTER ALEF}\N{ARABIC LETTER LAM}\N{ARABIC LETTER AIN}\N{ARABIC LETTER REH}\N{ARABIC LETTER BEH}\N{ARABIC LETTER YEH}\N{ARABIC LETTER TEH MARBUTA}
</body>
</html>
__EOI__
$ perl a.pl UTF-8 > utf8.html
$ perl a.pl Windows-1256 > cp1256.html
I think you need to go back to square one. It sounds like you have a database dump in Win-1256 encoding and you want to work with it in UTF-8 from now on. It also sounds like you are using PHP but you have lots of irrelevant tags on your question and are missing the most important one, PHP.
First, you need to convert the text dump into UTF-8 and you should be able to do that with PHP. Chances are that your conversion script will have two steps, first read the Win-1256 bytes and decode them into internal Unicode text strings, then encode the Unicode text strings into UTF-8 bytes for output to a new text file.
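A minimal sketch of that conversion step, using `iconv()`, which does the decode and re-encode in one call. It assumes the dump really is Windows-1256 and that the local iconv library recognises the name "WINDOWS-1256"; the helper name and the file names in the comment are hypothetical.

```php
<?php
// Convert Windows-1256 bytes to UTF-8 bytes.
// Assumption: the iconv library on this system supports "WINDOWS-1256".
function win1256_to_utf8(string $bytes): string
{
    $utf8 = iconv('WINDOWS-1256', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Input could not be converted from Windows-1256');
    }
    return $utf8;
}

// Hypothetical usage on a dump file:
// file_put_contents('dump.utf8.sql', win1256_to_utf8(file_get_contents('dump.cp1256.sql')));
```

ASCII parts of the dump, such as the SQL keywords, come out unchanged, since they have the same bytes in both encodings.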
Once you have done that, redo the database import as you did before, but now you have correctly encoded the input data as UTF-8.
After that it should be as simple as reading the database and rendering a web page with the correct UTF-8 encoding.
P.S. It is actually possible to reencode the data every time you display it, but that does not solve the problem of having a database full of incorrectly encoded data.
In order to display Arabic characters correctly, you need to convert your PHP file to UTF-8 without BOM.
This happened to me: Arabic characters were displayed as diamonds, but converting the file to UTF-8 without BOM solved the problem.
It seems that the DB is configured as UTF8, but the data itself is extended ASCII. If the data is converted to UTF8, it will display correctly with the content type set to UTF8.
I have an osCommerce 2.2 MST installation which has some custom additions to it. osCommerce itself is in ISO-8859-1. The addition has a table in a MySQL database which is now in utf8_general_ci (the others are all in latin1_swedish_ci). The PHP file I'm calling outputs
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
As I mentioned before, the data from the database is in UTF-8. But letters like ö, ä, ü are displayed correctly. How can this be? There should be no utf8_decode. But the letter č is displayed as ?. I get this directly in the result array. If I run the query in phpMyAdmin, it is displayed correctly.
I managed to get all letters displayed correctly (only in one section of the script). This is what I did:
mysql_query("SET NAMES 'utf8'");
In the php-script I also added
header('content-type: text/html; charset=utf-8');
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
But then other problems occurred.
What I want to know is why data in UTF-8 is "correctly" displayed when it should not be. And how do I get the letter č displayed correctly?
I find the system rather complex. Where and how can I look for what is wrong here?
I don't know the sequence of encodings/decodings that your data go through, but the reason that letters like ö, ä, and ü are correct, while č is not, is that ö, ä, and ü can be encoded in ISO-8859-1, but č cannot. You will need to use UTF-8 instead of ISO-8859-1 in your HTML to get č to display.
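The difference between the two groups of letters can be checked with a round trip through ISO-8859-1. This is a sketch with a made-up helper name, `fits_in_latin1`, and it assumes the mbstring extension is available.

```php
<?php
// Returns true when every character of a UTF-8 string survives a round trip
// through ISO-8859-1, i.e. when the string can be represented in that charset.
function fits_in_latin1(string $utf8): bool
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') === $utf8;
}

var_dump(fits_in_latin1("\xC3\xB6"));  // ö
var_dump(fits_in_latin1("\xC4\x8D"));  // č
```

Characters that do not fit are replaced with a substitute during conversion, which matches the ? seen in place of č.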