Hi have a look at this picture: http://ctrlv.in/175196 and as you can see the ÅÄÖ are replaced with �.
I have this at the very top of my php: <meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
and when I look at source it is indeed utf-8 - so why dont they display properly?
When you see the UNICODE REPLACEMENT CHARACTER �, it means your text is being interpreted as UTF-8 (or another Unicode encoding), but one of the byte sequences in the file was not valid in this encoding.
In other words, the file is not UTF-8 encoded.
try this.
iconv('windows-1250', 'utf-8', $your_variable);
if that is coming from an sql query set_charset('utf8') first before the query.
using notepad++ try encoding utf-8 without bom
Related
I need help with decoding url encoded scandinavian ASCII values with PHP.
I have tried decode å character like this:
$string = "%e5";
echo rawurldecode($string);
But this gives black diamond �. Same result with urldecode() function.
I am using <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta charset="utf-8"> in head.
When using rawurldecode() function on English letters like %61 it works great.
See http://www.backbone.se/urlencodingUTF8.htm for all url encoded ASCII codes.
E5 is the ISO-8859-1 encoded representation of the character å.
Your problem is that you're outputting an ISO-8859-1 encoded string, yet are telling the browser to interpret it as UTF-8. Either change the encoding in your HTTP headers/meta tag, or convert the string from 8859 to UTF-8:
echo utf8_encode(rawurldecode('%e5'));
(There's almost never a good time for utf8_encode, but in this case it actually succinctly performs the needed charset conversion. Usually you should prefer explicit charset conversions using iconv or mb_convert_encoding.)
utf-8 character set does not contain e5 code.
Please check a table with utf-8 charset .
Try with a valid utf-8 string.
"scandinavian ascii" character set is not supported by rawurldecode .
Try one of the functions iconv,that support CP865 (I guess this is the character set for which you want support):
http://php.net/manual/ro/function.iconv-mime-decode.php
http://php.net/manual/ro/function.iconv-mime-decode-headers.php
Quick Background: I inherited a large sql dump file containing a combination of english and arabic text and (I think) it was originally exported using 'latin1'. I changed all occurrences of 'latin1' to 'utf8' prior to importing the file. The the arabic text didn't appear correctly in phpmyadmin (which I guess is normal), but when I loaded the text to a web page with the following...
<meta http-equiv='Content-Type' content='text/html; charset=windows-1256'/>
...everything looked good and the arabic text displayed perfectly.
Problem: My client is really really really picky and doesn't want to change his...
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
...to the 'Windows-1256' equivalent. I didn't think this would be a problem, but when I changed the charset value to 'UTF-8', all of the arabic characters appeared as diamonds with question marks. Shouldn't UTF-8 display arabic text correctly?
Here are a few notes about my database configuration:
Database charset is 'utf8'
Database connection collation is 'utf8_general_ci'
All databases, tables, and applicable fields have been collated as 'utf8_general_ci'
I've been scouring stack overflow and other forums for anything the relates to my issue. I've found similar problems, but not of the solutions seem to work for my specific situation. Hope someone can help!
If the document looks right when declared as windows-1256 encoded, then it most probably is windows-1256 encoded. So it was apparently not exported using latin1—which would have been impossible, since latin1 has no Arabic letters.
If this is just about a single file, then the simplest way is to convert it from windows-1256 encoding to utf-8 encoding, using e.g. Notepad++. (Open the file in it, change the encoding, via File format menu, to Arabic, windows-1256. Then select Convert to UTF-8 in the File format menu and do File → Save.)
Windows-1256 and UTF-8 are completely different encodings, so data gets all messed up if you declare windows-1256 data as UTF-8 or vice versa. Only ASCII characters, such as English letters, have the same representation in both encodings.
We can't find the error in your code if you don't show us your code, so we're very limited in how we can help you.
You told the browser to interpret the document as being UTF-8 rather than Windows-1256, but did you actually change the encoding used from Windows-1256 to UTF-8?
For example,
$ cat a.pl
use strict;
use warnings;
use feature qw( say );
use charnames ':full';
my $enc = $ARGV[0] or die;
binmode STDOUT, ":encoding($enc)";
print <<"__EOI__";
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=$enc">
<title>Foo!</title>
</head>
<body dir="rtl">
\N{ARABIC LETTER ALEF}\N{ARABIC LETTER LAM}\N{ARABIC LETTER AIN}\N{ARABIC LETTER REH}\N{ARABIC LETTER BEH}\N{ARABIC LETTER YEH}\N{ARABIC LETTER TEH MARBUTA}
</body>
</html>
__EOI__
$ perl a.pl UTF-8 > utf8.html
$ perl a.pl Windows-1256 > cp1256.html
I think you need to go back to square one. It sounds like you have a database dump in Win-1256 encoding and you want to work with it in UTF-8 from now on. It also sounds like you are using PHP but you have lots of irrelevant tags on your question and are missing the most important one, PHP.
First, you need to convert the text dump into UTF-8 and you should be able to do that with PHP. Chances are that your conversion script will have two steps, first read the Win-1256 bytes and decode them into internal Unicode text strings, then encode the Unicode text strings into UTF-8 bytes for output to a new text file.
Once you have done that, redo the database import as you did before, but now you have correctly encoded the input data as UTF-8.
After that it should be as simple as reading the database and rendering a web page with the correct UTF-8 encoding.
P.S. It is actually possible to reencode the data every time you display it, but that does not solve the problem of having a database full of incorrectly encoded data.
inorder to display arabic characters correctly , you need to convert your php file to utf-8 without Bom
this happened with me, arabic characters was displayed diamonds, but conversion to utf-8 without bom will solve this problem
I seems that the db is configured as UTF8, but the data itself is extended ascii. If the data is converted to UTF8, it will display correctly in content type set to UTF8
I actually have a fairly simple question but I'm unable to find an answer anywhere. The PHP function html_entity_decode is supposed to "converts all HTML entities to their applicable characters from string."
So, since Ω is the HTML encoding for the Greek captical letter Omega, I'd expect that echo html_entity_decode('Ω', ENT_COMPAT, 'UTF-8'); would output Ω. But instaid, it outputs some strange characters which my browser can't recongize. Why is this?
Thanks,
Martijn
When you convert entities into UTF-8 characters like your last parameter specifies, your output encoding must be UTF-8 as well. Otherwise, in a single-byte encoding like ISO-8859-1, you will see double-byte characters as two broken single ones.
It's works fine:
http://codepad.viper-7.com/tb2LaW
Make sure your webpage encoding is UTF-8
If you have different encoding on webpage change this:
html_entity_decode('Ω', ENT_COMPAT, 'UTF-8');
^^^^^
header('Content-type: text/html;charset=utf-8');
mysql_set_charset("utf8", $conn);
Refer this URL:-
http://www.phpwact.org/php/i18n/charsets
php mysql character set: storing html of international content
I never had this problem before, it was usually my database or the html page. But now i think its my php. I import text from a csv or from a text area and in both ways it goes wrong.
for example é changes to é. I used htmlentities to fix this but it didn't work. The htmlentities function didn't return é in html but é in html entities, so it already loses the real characters before htmlentities comes in to place... So does that mean my php file has the wrong encoding or something?
I hope someone can help me out..
Thanks!
Chris
A file is usually ISO-8859-1 (Latin) or UTF-8 ... ISO-8859-1 is 1 byte per char, UTF-8 is 1-4 bytes per char. So if you get 2 chars when you expect one, then you are reading UTF-8 and showing it as ISO-8859-1 ... if you get strange chars, then you are reading ISO-8859-1 and showing it as UTF-8.
If you provide more details, it would be easier to pinpoint, but in short, you have inconsistent charsets and need to convert one or the other so they're all the same. But from what it seems, you're using ISO-8859-1 in your project, but you are reading some UTF-8 from somewhere... use utf8_decode($text) if that data should be indeed be stored as UTF-8, or find the data and convert it manually.
EDIT: If you are using AJAX somewhere, then you will ALWAYS get UTF-8 from it, and you'll have to decode it yourself with utf8_decode() if you want to keep using ISO-8859-1.
Try opening your php file and change the encoding to UTF-8
if that doesn't help, add this to your php:
header('Content-Type: text/html; charset=utf-8');
Or this to your html:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Take a look at PHP's iconv().
I have looked around and can't seem to find a solution so here it is.
I have the following code:
$file = "adhddrugs.xml";
$xmlstr = simplexml_load_file($file);
echo $xmlstr->report_description;
This is the simple version, but even trying this any hyphens r apostrophes are turned into: ^a (euro sign) trademark sign.
Things I have tried are:
echo = (string)$xmlstr->report_description; /* did not work */
echo = addslashes($xmlstr->report_description); /* yes I know this doesnt work with hyphens, was mainly trying to see if I could escape the apostrophes */
echo = addslashes((string)$xmlstr->report_description); /* did not work */
also htmlspecial(again i know does not work with hyphens), htmlentities, and a few other tricks.
Now the situation is I am getting the XML files from a feed so I cannot change them, but they are pretty standard. The text with the hyphens etc are encapsulated in a cdata tag and encoding is UTF-8. If I check the source I am shown the hyphens and apostrophes in the source.
Now just to see if the encoding was off or mislabeled or something else weird, I tried to view the raw XML file and sure enough it is displayed correctly.
I am sure that in my rush to find the answer I have overlooked something simple and the fact that this is really the first time I have ever used SimpleXML I am missing a very simple solution. Just don't dock me for it I really did try and find the answer on my own.
Thanks again.
This is the simple version, but even
trying this any hyphens apostrophes
are turned into: ^a (euro sign)
trademark sign.
This is caused by incorrect charset guessing (and possibly recoding).
If a text contains a "curly apostrophe" = "Right single quotation mark" = U+2019 character, saving it in UTF-8 encoding results in bytes 0xE2 0x80 0x99. If the same file is then read again assuming its charset is windows-1252, the byte stream of the apostrophe character (0xE2 0x80 0x99) is interpreted as characters ’ (=small "a" with circumflex, euro sign, trademark sign). Again if this incorrectly interpreted text is saved as UTF-8 the original character results in byte stream 0xC3 0xA2 0xE2 0x82 0xAC 0xE2 0x84 0xA2
Summary: Your original data is UTF-8 and some part of your code that reads the data assumes it is windows-1252 (or ISO-8859-1, which is usually actually treated as windows-1252). A probable reason for this charset assumption is that default charset for HTTP is ISO-8859-1. 'When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.' Source: RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1
PS. this is a very common problem. Just do a Google or Bing search with query doesn’t -doesn't and you'll see many pages with this same encoding error.
Do you know the document's character set?
You could do header('Content-Type: text/html; charset=utf-8'); before any content is printed, if you havent already.
Make sure you have set up SimpleXML to use UTF-8 too.
Be sure that all the entities are encoded using hex notation, not HTML entities.
Also maybe:
$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
will help.
This is a symptom of declaring an incorrect character set in the <head> section of your page (or not declaring and using default character set without accents and special characters).
This does the trick for latin languages.
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
For TOTAL NEWBIES, html pages for browsers have a basic layout, with a HEAD or HEADER which serves to tell the browser some basic stuff about the page, as well as preload some scripts that the page will use to achieve its functionality(ies).
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Hello world
</body>
</html>
if the <head> section is omitted, html will use defaults (take some things for granted - like using the northamerican character set, which does NOT include many accented letters, whch show up as "weird characters".