I am trying to convert a string such as يونيکد (I don't know what kind of text this is, so I used "Unicode character" in the title). It is supposed to be Arabic text, but when I use utf8_decode() I get �?�?�?�?کد. The same string is converted perfectly by online tools such as http://www.forgani.com/top/service/.
I have tried many things:
converted string to hex and back to string
used mb_convert_encoding
used htmlentities
used forceutf8 from https://github.com/neitanod/forceutf8
set the header, e.g. header('Content-type: text/plain; charset=utf-8');
set the PDO charset to utf8mb4 and to utf8
But I didn't get the desired result, which is يونيکد. How can I decode the given string to UTF-8, or to anything else that users can read, in PHP?
Set your document charset to UTF-8 as follows:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
or within your PHP script as follows:
header('Content-type: text/plain; charset=utf-8');
You can map each garbled UTF-8 sequence back to its Arabic or Persian character, using a table such as:
https://www.utf8-chartable.de/unicode-utf8-table.pl?start=1664&names=-&utf8=char
This JavaScript code decodes the garbled UTF-8 text:
function decode(content)
{
var decoded = content.replaceAll("ت", "ت").replaceAll("Ù‡", "ه").replaceAll("Ù‡", "ه").replaceAll("ا", "ا")
.replaceAll("س", "س").replaceAll("Ùˆ", "و").replaceAll("ÛŒ", "ی").replaceAll("Ù†", "ن")
.replaceAll("د", "د").replaceAll("ز", "ز").replaceAll("ب", "ب").replaceAll("Ù‚", "ق")
.replaceAll("ص", "ص").replaceAll("Ù¾", "پ").replaceAll("Ù¾", "پ").replaceAll("Ø´", "ش")
.replaceAll("ر", "ر").replaceAll("Ú˜", "ژ").replaceAll("Ù", "ف").replaceAll("Ø", "ح")
.replaceAll("Ø·", "ط").replaceAll("Ø«", "ث").replaceAll("ج", "ج").replaceAll("Ú†", "چ")
.replaceAll("Ø®", "خ").replaceAll("Ø°", "ذ").replaceAll("ض", "ض").replaceAll("ظ", "ظ")
.replaceAll("ع", "ع").replaceAll("غ", "غ").replaceAll("Ú©", "ک").replaceAll("Ú¯", "گ")
.replaceAll("Ù„", "ل").replaceAll("Ù…","م").replaceAll("Û°", "۰").replaceAll("Û±", "۱")
.replaceAll("Û²", "۲").replaceAll("Û³", "۳").replaceAll("Û´", "۴").replaceAll("Ûµ", "۵")
.replaceAll("Û¶", "۶").replaceAll("Û·", "۷").replaceAll("Û¸", "۸").replaceAll("Û¹", "۹")
.replaceAll("‌", "").replaceAll("Ø¡", "ء").replaceAll("Ø¢","آ").replaceAll("‎","")
.replaceAll("–", "–").replaceAll("ØŒ", "،").replaceAll("ØŒ", "،").replaceAll("Ëš", "˚");
console.log(decoded);
return decoded;
}
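The replacement table above has to list every letter by hand. A shorter route in PHP, sketched here under the assumption that the text is UTF-8 that was read once as Windows-1252, is to convert the garbled characters back to the bytes they came from. The helper name fix_mojibake is hypothetical, and the input is the garbled string from the question written with escapes.

```php
<?php
// Hypothetical helper (not from the thread): undo one round of
// "UTF-8 bytes read as Windows-1252" mojibake.
function fix_mojibake(string $garbled): string
{
    // Re-encode each character to the single CP-1252 byte it came from;
    // the resulting byte sequence is the original UTF-8 text.
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

// The garbled string from the question, written with escapes so the
// encoding of this source file cannot interfere.
$garbled = "\u{00D9}\u{0160}\u{00D9}\u{02C6}\u{00D9}\u{2020}"
         . "\u{00D9}\u{0160}\u{00DA}\u{00A9}\u{00D8}\u{00AF}";

$fixed = fix_mojibake($garbled);

echo $fixed, PHP_EOL;          // readable text when the output is shown as UTF-8
echo bin2hex($fixed), PHP_EOL; // raw bytes, for checking without a browser
```

utf8_decode() converts to ISO-8859-1, which has no Š, ˆ or †, and that would explain the ? marks in the question. Sequences containing bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) may not survive this round trip.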
I'm trying to convert some encoded text to display on a website; the specific example is converting the string "d83edd2a" to the 🤪 emoji.
Apparently the encoding is UTF-16, but PHP detects it as ASCII.
I've tried using hex2bin, but this returns "Ø>Ý*" and PHP detects this as UTF-8, which makes sense to me.
I've tried a couple of different approaches:
$newCode = mb_convert_encoding($code, "ASCII", "UTF-16");
But this returns "????"
$newCode = iconv(mb_detect_encoding($code), 'ASCII', $hex);
But this also returns "????"
I'm sure I'm missing something simple but I've ended up tying myself up in knots!
If I understand correctly, you want to convert the string d83edd2a to the corresponding emoji.
The most straightforward way is simply:
echo hex2bin('d83edd2a');
However, this assumes the client uses the UTF-16 charset.
If the client uses a different charset, you need to convert the string first; otherwise you will just see garbage.
But you cannot use just any encoding (such as ASCII), because emojis are specific to Unicode.
(ASCII simply doesn't "know" the concept of emojis.)
You need to use UTF-8, UTF-16 or UTF-32.
Since you mentioned a website, you want UTF-8: it is the de facto standard charset for modern websites.
You can convert from UTF-16 to UTF-8 like this:
// First convert the string to binary data
// We know this is encoded in UTF-16
$UTF16Str = hex2bin('d83edd2a');
// Then we convert from UTF-16 to something more common like UTF-8
$UTF8Str = mb_convert_encoding($UTF16Str, 'UTF-8', 'UTF-16');
echo $UTF8Str;
As a last step, make sure you communicate the charset to the client (you can do this in HTML or PHP):
<meta charset="UTF-8"> <!-- inside <head> -->
Or in PHP:
header('Content-Type: text/html; charset=UTF-8');
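The UTF-16 to UTF-8 conversion described in this answer can be checked without a browser by looking at the resulting bytes. This is a minimal sketch; it names the byte order explicitly as UTF-16BE, which is an assumption about the input based on the order of the hex digits, and it needs PHP 7.2 or later for mb_ord.

```php
<?php
// The hex string holds two UTF-16 code units in big-endian order:
// D83E (high surrogate) and DD2A (low surrogate).
$utf16 = hex2bin('d83edd2a');

// Naming the byte order avoids depending on how a BOM-less
// "UTF-16" input is interpreted.
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');

echo bin2hex($utf8), PHP_EOL;             // UTF-8 bytes of the emoji
echo mb_strlen($utf8, 'UTF-8'), PHP_EOL;  // number of characters
printf("U+%X\n", mb_ord($utf8, 'UTF-8')); // code point of the emoji
```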
I have these Chinese characters:
汉字/漢字''test
If I do
echo utf8_encode($chinesevar);
it displays
??/??''test
Or even if I just do a simple
echo $chinesevar
it still displays some weird characters...
So how am I going to display these Chinese characters without using the <meta> tag with the UTF-8 thingy .. or the ini_set UTF-8 thing or even the header() thing with UTF-8?
Simple:
save your source code in UTF-8
output an HTTP header to specify to your browser that it should interpret the page using UTF-8:
header('Content-Type: text/html; charset=utf-8');
Done.
utf8_encode is for converting Latin-1 encoded strings to UTF-8. You don't need it.
For more details, see Handling Unicode Front To Back In A Web App.
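To illustrate why utf8_encode is not needed here, the sketch below applies the same Latin-1 to UTF-8 conversion to a string that is already UTF-8. It uses mb_convert_encoding with ISO-8859-1 as the source encoding as a stand-in for utf8_encode, and it assumes the input really is valid UTF-8; the variable names are illustrative.

```php
<?php
// Already valid UTF-8: U+6C49 U+5B57, the first two characters of the
// sample in the question, written with escapes.
$chinese = "\u{6C49}\u{5B57}";

// The conversion utf8_encode() performs: treat every byte as Latin-1
// and encode each one as UTF-8 again.
$doubleEncoded = mb_convert_encoding($chinese, 'UTF-8', 'ISO-8859-1');

echo strlen($chinese), PHP_EOL;        // byte length before
echo strlen($doubleEncoded), PHP_EOL;  // byte length after
echo bin2hex($doubleEncoded), PHP_EOL; // every original byte became two

// Converting back recovers the original bytes.
$restored = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $chinese);
```

The text gets longer and unreadable rather than fixed, which is why the conversion only helps when the input is actually Latin-1.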
Check that your file is saved as UTF-8 without a BOM and that your web server delivers your site as UTF-8.
HTML:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
in PHP:
header('Content-Type: text/html; charset=utf-8');
And if you read the text from a database, check that the database is in UTF-8 as well.
$chinesevarOK = mb_convert_encoding($chinesevar, 'HTML-ENTITIES', 'UTF-8');
Perhaps take a look at the following solutions:
Your database, table and field collation should be utf8_unicode_ci
Check whether your records show the correct characters within the database...
Set your HTML charset to UTF-8
Add the following line to your PHP after connecting to the database:
mysqli_set_charset($con,"utf8");
http://www.w3schools.com/php/func_mysqli_set_charset.asp
Save your source code as UTF-8 without a BOM
I need help decoding URL-encoded Scandinavian characters with PHP.
I have tried to decode the å character like this:
$string = "%e5";
echo rawurldecode($string);
But this gives a black diamond (�). The urldecode() function gives the same result.
I am using <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta charset="utf-8"> in head.
With English letters such as %61, rawurldecode() works fine.
See http://www.backbone.se/urlencodingUTF8.htm for all URL-encoded ASCII codes.
E5 is the ISO-8859-1 encoded representation of the character å.
Your problem is that you're outputting an ISO-8859-1 encoded string, yet are telling the browser to interpret it as UTF-8. Either change the encoding in your HTTP headers/meta tag, or convert the string from 8859 to UTF-8:
echo utf8_encode(rawurldecode('%e5'));
(There's almost never a good time for utf8_encode, but in this case it actually succinctly performs the needed charset conversion. Usually you should prefer explicit charset conversions using iconv or mb_convert_encoding.)
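The explicit conversion mentioned above could look like the following sketch, which assumes the decoded byte is ISO-8859-1 as this answer states; the variable names are illustrative.

```php
<?php
// rawurldecode() only turns %XX escapes into raw bytes; it does not
// know or change the character encoding.
$raw = rawurldecode('%e5');

// Explicit charset conversion, assuming the byte is ISO-8859-1.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

echo bin2hex($raw), PHP_EOL;       // the single Latin-1 byte
echo bin2hex($utf8), PHP_EOL;      // the two-byte UTF-8 sequence
echo rawurlencode($utf8), PHP_EOL; // how the same letter looks URL-encoded as UTF-8
```

Naming both encodings makes the assumption about the input visible in the code, which utf8_encode hides.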
The byte E5 on its own is not a valid UTF-8 sequence.
Please check a UTF-8 character table.
Try with a valid UTF-8 string, such as %C3%A5 for å.
rawurldecode does no character set conversion for "Scandinavian ASCII"; it only turns the escapes into raw bytes.
Try one of the iconv functions, which support CP865 (I guess this is the character set you want):
http://php.net/manual/ro/function.iconv-mime-decode.php
http://php.net/manual/ro/function.iconv-mime-decode-headers.php
Can anyone tell me what encoding was applied to these Chinese characters, so that they were converted into the following text and stored in the MySQL database:
ä¸Â`国液化天然æ°â€Ã¨Â¿Â输(控股)有é™Âå…¬å¸控股`
The original Chinese characters, as displayed on the web page:
中国液化天然气运输(控股)有限公司控股
The web page uses a header() call to set the charset for the Chinese characters, as follows:
header('Content-type: text/html; charset=utf-8');
Thanks...
When you encode
中国液化天然气运输(控股)有限公司控股
as UTF-8 and then decode the resulting bytes as CP-1252, you get
ä¸å›½æ¶²åŒ–天然气è¿è¾“(控股)有é™å…¬å¸æŽ§è‚¡
When you encode the above as UTF-8 and decode the bytes as CP-1252 once again, you get
ä¸Â国液化天然æ°â€Ã¨Â¿ï¿½Ã¨Â¾â€œÃ¯Â¼Ë†Ã¦Å½Â§Ã¨â€šÂ¡Ã¯Â¼â€°Ã¦Å“‰é™�å…¬å�¸æŽ§è‚¡
That is what is happening here.
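The two rounds of mis-decoding can be reproduced, and reversed, with a short PHP sketch. It uses only the first character of the sample, U+4E2D, because all of its bytes are defined in Windows-1252; bytes that CP-1252 leaves undefined show up as � in the output above and may not be recoverable this way. The variable names are illustrative.

```php
<?php
// U+4E2D, the first character of the sample, as UTF-8 bytes E4 B8 AD.
$original = "\u{4E2D}";

// Round 1: the UTF-8 bytes are read as CP-1252 and stored as UTF-8 again.
$once = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Round 2: the same mistake is applied to the already garbled text.
$twice = mb_convert_encoding($once, 'UTF-8', 'Windows-1252');

echo bin2hex($once), PHP_EOL;
echo bin2hex($twice), PHP_EOL;

// Reversing the damage means converting back once per round.
$restored = mb_convert_encoding(
    mb_convert_encoding($twice, 'Windows-1252', 'UTF-8'),
    'Windows-1252',
    'UTF-8'
);
var_dump($restored === $original);
```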
It is the Unicode character set (code points) encoded as UTF-8.