Can anyone tell me which encoding was applied to these Chinese characters, so that they were converted into the following text and stored in a MySQL database:
ä¸Â`国液化天然æ°â€Ã¨Â¿Â输(控股)有é™Âå…¬å¸控股`
The original Chinese characters, as displayed on the web page:
中国液化天然气运输(控股)有限公司控股
The web page uses the header function to declare UTF-8 for the Chinese characters, as follows:
header('Content-type: text/html; charset=utf-8');
Thanks...
When you encode
中国液化天然气运输(控股)有限公司控股
as UTF-8 and then decode the resulting bytes as CP-1252, you get
ä¸å›½æ¶²åŒ–天然气è¿è¾“(控股)有é™å…¬å¸æŽ§è‚¡
When you encode the above as UTF-8 and decode it as CP-1252 once again, you get
ä¸Â国液化天然æ°â€Ã¨Â¿ï¿½Ã¨Â¾â€œÃ¯Â¼Ë†Ã¦Å½Â§Ã¨â€šÂ¡Ã¯Â¼â€°Ã¦Å“‰é™�å…¬å�¸æŽ§è‚¡
That's what is happening here.
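The round trip described above can be reproduced in PHP. This is a minimal sketch, assuming PHP 7 or later with the mbstring extension; the function names `misread_as_cp1252` and `undo_misread` are made up for illustration, and a single "é" stands in for the Chinese text.

```php
<?php
// Misread UTF-8 bytes as Windows-1252 and re-encode the result as UTF-8.
// This is the mistake that produces the garbled text.
function misread_as_cp1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Undo one round of that mistake.
function undo_misread(string $mojibake): string
{
    return mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');
}

$original = "\u{00E9}";                   // "é", bytes C3 A9 in UTF-8
$once     = misread_as_cp1252($original); // two characters: Ã ©
$twice    = misread_as_cp1252($once);     // four characters: Ã ƒ Â ©
$restored = undo_misread(undo_misread($twice)); // back to "é"
```

The reversal only works while every byte has a CP-1252 mapping. The bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined there, which is likely why the doubly garbled sample contains replacement characters that cannot be recovered.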
The text is in the Unicode character set (code points), encoded as UTF-8.
Related
I am trying to show an emoji using its Unicode value (😀), but I am getting the escaped string \u00f0\u0178\u02dc\u20ac, which decodes to ðŸ˜€.
I am using a MySQL server and PHP 5.4 in my project. In MySQL, it is stored as ðŸ˜€. Is there any way to unescape this and return the actual Unicode character from the PHP server?
I tried iconv('ASCII//TRANSLIT', 'UTF-8', '😀');, mb_convert_encoding($var, "US-ASCII", "UTF-8") and utf8_encode(). None of them worked.
Thanks
Without knowing the structure of your database (make sure your table uses the utf8mb4 character set; MySQL's utf8 stores at most three bytes per character and cannot hold emoji), I think the problem may just be on the display side. Try starting your PHP script by sending a header that tells the browser you are going to send UTF-8 characters rather than a Western encoding (ISO-8859-1).
header('Content-Type: text/html; charset=UTF-8');
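If the header alone does not help because the stored value is already garbled, the damage can sometimes be reversed in PHP. The following is only a sketch, assuming PHP 7 or later with mbstring and assuming the text was garbled exactly once by reading UTF-8 bytes as Windows-1252; `repair_mojibake` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: reverse one round of "UTF-8 bytes read as Windows-1252".
function repair_mojibake(string $text): string
{
    return mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
}

// The escaped string from the question, decoded the way a JSON client would.
$garbled = json_decode('"\u00f0\u0178\u02dc\u20ac"');

// The four characters map back to the bytes F0 9F 98 80, which is U+1F600.
$emoji = repair_mojibake($garbled);
```

This treats the symptom only. Storing the emoji correctly in the first place would still need a four-byte-capable character set on both the table and the connection.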
I have a problem where I convert a special character so it can be put in the URL as a parameter of a JavaScript Ajax request, and then read it in PHP. The character is "Ñ".
In my JavaScript I build the parameter with escape('PiÑa'), which is converted to "Pi%D1a".
When I read it in my PHP, a diamond shape with a question mark appears. Here is how I read it:
escape(message) // Message being the "Pi%D1a"
As I said, a weird character comes out, and when I save it to my database (PostgreSQL) I get an error. How do I fix this?
D1 is the ISO-8859-1 ("Latin-1") encoded form of the "Ñ" character.
A "diamond shape with a question mark" (�) is the Unicode Replacement Character. Whenever you see one, it indicates that the browser/editor/whatever-is-interpreting-the-text is trying to interpret the text as Unicode and has encountered a byte sequence that is not valid in the assumed Unicode encoding.
In other words, the character is actually Latin-1 encoded but you're telling the browser it's (likely) UTF-8 encoded. You have an encoding mismatch. Either tell the browser the right encoding via a Content-Type: text/html; charset=XXX header, or convert the character from Latin-1 to UTF-8 before working with it.
Have you tried using urldecode($message)?
%D1 is the URL encoded representation of Ñ.
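Putting the two answers together, the parameter can be percent-decoded and then converted from Latin-1 to UTF-8. This is a minimal sketch, assuming PHP 7 or later with mbstring and assuming the client really sent ISO-8859-1 bytes; `decode_latin1_param` is a hypothetical name.

```php
<?php
// Decode a percent-encoded value produced by JavaScript's escape(),
// assuming its bytes are ISO-8859-1, and return it as UTF-8.
function decode_latin1_param(string $raw): string
{
    $bytes = rawurldecode($raw); // "Pi%D1a" becomes the bytes "Pi\xD1a"
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$name = decode_latin1_param('Pi%D1a'); // "PiÑa" as valid UTF-8
```

An alternative on the JavaScript side is encodeURIComponent('PiÑa'), which sends the UTF-8 form "Pi%C3%91a", so no conversion is needed in PHP.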
I created a file on my localhost with the following code:
<?php
header('Content-Type: text/plain; charset=UTF-8');
echo "yoá";
The output in my Firefox is:
yo�
Why the Unicode replacement character?
Because your PHP script file is not saved as UTF-8 from inside your editor. All decent editors allow you to convert between and save as several different encodings (even Notepad does this now). Save in UTF-8 and you will see the character appear normally.
Technical explanation:
The character in question is code point U+00E1 ("latin small letter a with acute"). Supposing that you have saved your script in a single-byte encoding (which is most likely), this character would be represented by the byte with hex value 0xE1, which in binary is
11100001
From the UTF-8 encoding rules, we see that this byte falls in the category
1110zzzz
which is the first of exactly three bytes that encode a single character in the code point range U+0800 to U+FFFF. However, in your case there are either no more bytes following this one or if there are they do not satisfy the UTF-8 encoding restrictions.
Hence, the browser determines that there is a malformed byte sequence and displays the replacement character instead.
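The lead-byte rule from the explanation above can be checked with a few lines of PHP. This is a simplified sketch, assuming PHP 7 or later with mbstring; `utf8_sequence_length` is a made-up helper that looks only at the bit pattern of the first byte and ignores the finer restrictions of the UTF-8 specification.

```php
<?php
// Number of bytes a UTF-8 sequence should have, judged by its first byte.
// Returns 0 for a byte that cannot start a sequence.
function utf8_sequence_length(int $byte): int
{
    if (($byte & 0x80) === 0x00) return 1; // 0xxxxxxx
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx
    return 0;                              // continuation or invalid byte
}

$latin1 = "yo\xE1"; // "yoá" as saved in a single-byte encoding

// 0xE1 announces a three-byte sequence, but it is the last byte of the string.
$expected = utf8_sequence_length(ord($latin1[2]));

// PHP's own validator reaches the same verdict: not valid UTF-8.
$valid = mb_check_encoding($latin1, 'UTF-8');
```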
You're sending a UTF-8 header, but your source file isn't set to (and saved as) UTF-8. Check the settings of your code editor/IDE to correct that.
Try this :
header('Content-Type: text/plain; charset=UTF-8');
$txt= "yoá";
echo iconv("ISO-8859-1","UTF-8",$txt);
If you are not sure whether the string is valid UTF-8, use a library to test it:
http://hsivonen.iki.fi/php-utf8/
if(is_valid_utf8($txt)==true) ....
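The same check is also possible without an external library, using mbstring's mb_check_encoding(). Here is a small sketch, assuming PHP 7 or later with mbstring; `ensure_utf8` is a hypothetical helper, and the ISO-8859-1 fallback is an assumption about where the data comes from.

```php
<?php
// Return $text unchanged if it is already valid UTF-8; otherwise treat it
// as $fallback (assumed to be ISO-8859-1 here) and convert it.
function ensure_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$fromLatin1 = ensure_utf8("yo\xE1");    // converted to UTF-8
$fromUtf8   = ensure_utf8("yo\u{E1}");  // left as it is
```

Checking first avoids converting a string that is already UTF-8, which would garble it.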
I never had this problem before; it was usually my database or the HTML page. But now I think it's my PHP. I import text from a CSV file or from a textarea, and either way it goes wrong.
For example, é changes to Ã©. I used htmlentities to fix this but it didn't work. The htmlentities function didn't return &eacute; but &Atilde;&copy;, so the real characters are already lost before htmlentities comes into play... So does that mean my PHP file has the wrong encoding or something?
I hope someone can help me out..
Thanks!
Chris
A file is usually ISO-8859-1 (Latin) or UTF-8 ... ISO-8859-1 is 1 byte per char, UTF-8 is 1-4 bytes per char. So if you get 2 chars when you expect one, then you are reading UTF-8 and showing it as ISO-8859-1 ... if you get strange chars, then you are reading ISO-8859-1 and showing it as UTF-8.
If you provide more details, it would be easier to pinpoint, but in short, you have inconsistent charsets and need to convert one or the other so they're all the same. From what it seems, you're using ISO-8859-1 in your project but reading some UTF-8 from somewhere. Use utf8_decode($text) to convert that incoming UTF-8 data to ISO-8859-1, or find the data and convert it manually.
EDIT: If you are using AJAX somewhere, then you will ALWAYS get UTF-8 from it, and you'll have to decode it yourself with utf8_decode() if you want to keep using ISO-8859-1.
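Note that utf8_decode() and utf8_encode() are deprecated as of PHP 8.2. The same conversion can be written with mbstring. This is a minimal sketch, assuming PHP 7 or later with mbstring; `utf8_to_latin1` is a made-up name.

```php
<?php
// Convert UTF-8 input (for example from an AJAX request) to ISO-8859-1,
// for a project that keeps everything else in ISO-8859-1.
function utf8_to_latin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$latin1 = utf8_to_latin1("caf\u{E9}"); // "café" as four single bytes
```

Characters that do not exist in ISO-8859-1 cannot survive this conversion, which is one reason to move the whole project to UTF-8 instead.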
Try opening your PHP file and changing its encoding to UTF-8.
If that doesn't help, add this to your PHP:
header('Content-Type: text/html; charset=utf-8');
Or this to your html:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Take a look at PHP's iconv().
INT. PALO TORCIDO HIGH SCHOOL, CAFETER�A - DAY
Hi, I uploaded a .txt file to my server and read its contents with fopen/fread, and also tried file_get_contents just in case.
I can't seem to figure out how to encode the special characters...
In my HTML I have the charset set to UTF-8. I also tried sending a PHP header that declares UTF-8 encoding.
What is the proper way to handle files with letters that are not part of the English alphabet?
Try utf8_encode()
echo utf8_encode(file_get_contents('file.txt'));
This works if the *.txt is encoded in Latin-1. If other encodings may be used too, detect the encoding using mb_detect_encoding() and convert it to UTF-8 with mb_convert_encoding().
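The detect-then-convert approach might look like the following. It is a sketch under assumptions: PHP 7 or later with mbstring, a candidate list limited to UTF-8 and ISO-8859-1, and a hypothetical helper name `to_utf8`. Detection is a guess by nature, so the candidate list should match the encodings you actually expect.

```php
<?php
// Guess the encoding from a short candidate list (strict mode), then
// convert to UTF-8. Falls back to ISO-8859-1 if nothing matches.
function to_utf8(string $bytes, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    $from = mb_detect_encoding($bytes, $candidates, true);
    if ($from === false) {
        $from = 'ISO-8859-1'; // assumption: Latin-1 is the most likely source
    }
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}

// The scene heading from the question, as Latin-1 bytes (0xCD is "Í").
$line = to_utf8("CAFETER\xCDA - DAY");
```

For the uploaded file this would be called as to_utf8(file_get_contents('file.txt')) before echoing the result.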