Strange behavior of iconv() and utf8_decode() php methods


I have a test text which I post with an AJAX call (jQuery):
čéáűőúöüó é$ß¤÷×¸¨¸˝¨´~˘˝°´˛>*čéáűőúöüó$>*ß$÷×÷;$¨˝´>$đ;ä
I just write the very same text back in the response:
<?php
$text=$_POST["text"];
echo "\n\nUTF8_DECODE:\n";
echo utf8_decode($text);
echo "\nISO8859-2 -> UTF-8:\n";
echo iconv("ISO-8859-2","UTF-8",$text);
echo "\nUTF-8 -> ISO-8859-2 \n";
echo iconv("UTF-8","ISO-8859-2",$text);
?>
The result should be:
UTF8_DECODE:
?éá??úöüó é$ß¤÷×¸¨¸?¨´~??°´?>*?éá??úöüó$>*ß$÷×÷;$¨?´>$?;ä
ISO8859-2 -> UTF-8:
čéáűőúöüó
é$ß¤÷×¸¨¸˝¨´~˘˝°´˛>*čéáűőúöüó$>*ß$÷×÷;$¨˝´>$đ;ä
UTF-8 -> ISO-8859-2:
ĂÂÄĹ ÄÄÄšÄÄšÂÄĹÄĹÄĹşÄĹ ÄĹ
$ÄÂĂÂ¤ÄËÄÂĂÂ¸ĂÂ¨ĂÂ¸ĂÂĂÂ¨ĂÂ´~ĂÂĂÂĂÂ°ĂÂ´ĂÂ>*ĂÂÄĹ
ÄÄÄšÄÄšÂÄĹÄĹÄĹşÄĹ$>*ÄÂ$ÄËÄÂÄË;$ĂÂ¨ĂÂĂÂ´>$ĂÂ;ÄÂ¤
But it is:
UTF8_DECODE:
?éá??úöüó é$ß¤÷×¸¨¸?¨´~??°´?>*?éá??úöüó$>*ß$÷×÷;$¨?´>$?;ä
ISO8859-2 -> UTF-8:
ĂÂÄĹ ÄÄÄšÄÄšÂÄĹÄĹÄĹşÄĹ ÄĹ
$ÄÂĂÂ¤ÄËÄÂĂÂ¸ĂÂ¨ĂÂ¸ĂÂĂÂ¨ĂÂ´~ĂÂĂÂĂÂ°ĂÂ´ĂÂ>*ĂÂÄĹ
ÄÄÄšÄÄšÂÄĹÄĹÄĹşÄĹ$>*ÄÂ$ÄËÄÂÄË;$ĂÂ¨ĂÂĂÂ´>$ĂÂ;ÄÂ¤
UTF-8 -> ISO-8859-2:
čéáűőúöüó
é$ß¤÷×¸¨¸˝¨´~˘˝°´˛>*čéáűőúöüó$>*ß$÷×÷;$¨˝´>$đ;ä
My question is: why is that? What am I missing?
My text is in ISO-8859-2 and I want to convert it to UTF-8, so why do I need to use the opposite conversion, when the manual says:
string iconv ( string $in_charset , string $out_charset , string $str )
Performs a character set conversion on the string str from in_charset to out_charset.
Maybe the AJAX request encoded the ISO-8859-2 characters as UTF-8?
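One way to test that last guess: jQuery documents that it always transmits AJAX data as UTF-8, so the posted text may already be UTF-8 by the time PHP sees it. The sketch below is a minimal check under that assumption. The hard-coded $text is a hypothetical stand-in for $_POST["text"] holding the UTF-8 bytes of "čé", and the mbstring and iconv extensions are assumed to be loaded.

```php
<?php
// Hypothetical stand-in for $_POST["text"]: "čé" as UTF-8 bytes.
$text = "\xC4\x8D\xC3\xA9";

// True when the bytes form valid UTF-8, which suggests the request
// body was already UTF-8 when it reached PHP.
var_dump(mb_check_encoding($text, 'UTF-8')); // bool(true)

// Converting UTF-8 -> ISO-8859-2 then yields one byte per character.
$latin2 = iconv('UTF-8', 'ISO-8859-2', $text);
echo bin2hex($latin2), "\n"; // e8e9
```

If the check returns true for the real $_POST value, then "UTF-8" is the correct in_charset and the observed output matches the manual after all.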

Related

How to convert ASCII to ISO-8859-1 in PHP?

I'm currently trying to figure out how to convert an ASCII encoded string to ISO-8859-1 encoding to be used for utf8_encode() to display special characters like "ñ" but I can't seem to make it work. In need of help.
I've already tried this iconv(mb_detect_encoding($text, mb_detect_order(), true), "ISO-8859-1", $text); and this mb_convert_encoding($text, "ISO-8859-1"); and also this mb_convert_encoding($text, "ASCII", "ISO-8859-1"); but it doesn't work, the string is still ASCII encoded.
I've created a temporary solution for this by creating a lookup table using the string provided by reading each character of the string. But I want to use the php built-in functions, is this possible?
Here is my code:
<?php
function convertString($text) {
    $text = iconv(mb_detect_encoding($text, mb_detect_order(), true), "ISO-8859-1", $text);
    echo mb_detect_encoding($text) .'<br/>'; // to check what encoding the string is in, displays ASCII
    return utf8_encode($text);
}
echo convertString('\xc3\xb1');
?>
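
A possible cause, offered as a sketch rather than a confirmed diagnosis: PHP interprets \x escapes only inside double-quoted strings, so '\xc3\xb1' in single quotes is eight plain ASCII characters, and any conversion leaves it unchanged. In double quotes the same escapes give the two UTF-8 bytes of "ñ". The variable names below are hypothetical, and mbstring is assumed to be available.

```php
<?php
$single = '\xc3\xb1'; // eight literal ASCII characters: \ x c 3 \ x b 1
$double = "\xc3\xb1"; // two bytes: "ñ" encoded as UTF-8

echo strlen($single), "\n"; // 8
echo strlen($double), "\n"; // 2

// The double-quoted value is already valid UTF-8.
var_dump(mb_check_encoding($double, 'UTF-8')); // bool(true)

// Converting it to ISO-8859-1 gives the single byte F1 ("ñ" in Latin-1).
echo bin2hex(mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8')), "\n"; // f1
```

Under this reading, the input is UTF-8 to begin with, so passing it through utf8_encode() again would double-encode it.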

utf8 not converting string in PHP

Hello, I have a German client and I am getting strings with German characters that I am trying to display properly in the output. I tried utf8_encode to convert the strings, but it is not working for me.
Code:
echo "Desc Short=>". utf8_encode($obj->Desc_Short) . "<br>\r\n";
echo "Desc Long=>". utf8_encode($obj->Desc_Long) . "<br>\r\n";
Output:
Desc Short=>Ablagefach mittig in GepÃ¤ckraumtrennwand;ESACO_UG(122)
Desc Long=>Ablagefach mittig in GepÃ¤ckraumtrennwand inkl. verschiebbarem Haltenetz

It seems you need to simply use utf8_decode and use php header to set encoding (or set encoding in HTML document).
For the following code:
<?php
header( 'Content-type: text/html; charset=utf-8' );
$x = 'Ablagefach mittig in GepÃ¤ckraumtrennwand;ESACO_UG(122)';
echo utf8_decode($x);
Output for this is:
Ablagefach mittig in Gepäckraumtrennwand;ESACO_UG(122)

Your output indicates that the string is already utf-8 encoded.
Either you would have to use utf8_decode() to get the umlaut or - better - change any component in your application to properly handle utf-8. :)
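
To illustrate why already-encoded input ends up as "Ã¤", here is a minimal sketch. It uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode(). The helper toUtf8Once() is hypothetical, and its "valid UTF-8 means leave it alone" rule is a heuristic assumption, not something the original code contains.

```php
<?php
// "Gepäck" as UTF-8 bytes (ä = C3 A4).
$utf8 = "Gep\xC3\xA4ck";

// Same conversion as utf8_encode(): read each byte as ISO-8859-1, re-encode.
// Applied to UTF-8 input, ä (C3 A4) becomes "Ã¤" (C3 83 C2 A4).
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n"; // 476570c383c2a4636b

// Hypothetical helper: convert only when the input is not valid UTF-8 already.
function toUtf8Once(string $s): string
{
    return mb_check_encoding($s, 'UTF-8')
        ? $s
        : mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

var_dump(toUtf8Once($utf8) === $utf8);       // bool(true): left untouched
var_dump(toUtf8Once("Gep\xE4ck") === $utf8); // bool(true): Latin-1 converted
```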

Pass the string through the utf8_decode function.
Try:
utf8_decode($obj->Desc_Short)
utf8_decode($obj->Desc_Long)

Decode base64 string - php

Is there any way to decode this string?
Actual string: 其他語言測試 - testing
It is base64-encoded when sent as a mail subject:
"=?iso-2022-jp?B?GyRCQjZCPjhsOEBCLDtuGyhCIC0gdGVzdGluZw==?="
<?php
echo base64_decode('GyRCQjZCPjhsOEBCLDtuGyhCIC0gdGVzdGluZw==');
?>
This is base64-encoded, and I couldn't decode it to the actual Chinese string. Since it has been encoded using "iso-2022-jp", I also tried the online base64decode.org site, but I couldn't recover the original string. How can I do that?

Use iconv():
<?php
$input = base64_decode('GyRCQjZCPjhsOEBCLDtuGyhCIC0gdGVzdGluZw=='); // $BB6B>8l8@B,;n(B - testing (ESC bytes not shown)
$input_encoding = 'iso-2022-jp';
echo iconv($input_encoding, 'UTF-8', $input); //其他語言測試 - testing
?>

What you are looking at is MIME header encoding. It can be decoded by mb_decode_mimeheader(), and generated by mb_encode_mimeheader(). For example:
<?php
mb_internal_encoding("utf-8");
$subj = "=?iso-2022-jp?B?GyRCQjZCPjhsOEBCLDtuGyhCIC0gdGVzdGluZw==?=";
print mb_decode_mimeheader($subj);
?>
其他語言測試 - testing
(The call to mb_internal_encoding() is necessary here because the contents of the subject line can't be represented in the default internal encoding of ISO8859-1.)
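
For readers curious what that decoding involves, the following is a rough sketch of taking a single encoded-word apart by hand. It assumes the subject consists of exactly one encoded-word and that iconv supports the named charset. It is not a replacement for mb_decode_mimeheader(), which also handles multiple words and folding.

```php
<?php
$subj = "=?iso-2022-jp?B?GyRCQjZCPjhsOEBCLDtuGyhCIC0gdGVzdGluZw==?=";

// An encoded-word has the shape =?charset?encoding?payload?=
if (preg_match('/^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/', $subj, $m)) {
    list(, $charset, $encoding, $payload) = $m;

    // "B" is base64; "Q" is quoted-printable with "_" standing for a space.
    $bytes = strtoupper($encoding) === 'B'
        ? base64_decode($payload)
        : quoted_printable_decode(str_replace('_', ' ', $payload));

    echo $charset, "\n"; // iso-2022-jp

    // Convert the raw bytes from the declared charset to UTF-8.
    echo iconv($charset, 'UTF-8', $bytes), "\n";
}
```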

Try encoding the string to UTF-8 first and then encoding it to base64.
Likewise when decoding: decode the string from base64 and then from UTF-8.
This is working for me:
php > $base = "其他語言測試 - testing";
php > $encoded = base64_encode(utf8_encode($base));
php > $decoded = utf8_decode(base64_decode($encoded));
php > echo ($decoded === $base) . "\n";
1

Detecting the right character encoding in PHP?

I'm trying to detect the character encoding of a string but I can't get the right result.
For example:
$str = "€ ‚ ƒ „ …" ;
$str = mb_convert_encoding($str, 'Windows-1252' ,'HTML-ENTITIES') ;
// Now $str should be a Windows-1252-encoded string.
// Let's detect its encoding:
echo mb_detect_encoding($str,'Windows-1252, ISO-8859-1, UTF-8') ;
That code outputs ISO-8859-1 but it should be Windows-1252.
What's wrong with this?
EDIT:
Updated example, in response to @raina77ow.
$str = "€‚ƒ„…" ; // no white-spaces
$str = mb_convert_encoding($str, 'Windows-1252' ,'HTML-ENTITIES') ;
$str = "Hello $str" ; // let's add some ascii characters
echo mb_detect_encoding($str,'Windows-1252, ISO-8859-1, UTF-8') ;
I get the wrong result again.

The problem with Windows-1252 in PHP is that it will almost never be detected, because as soon as your text contains any characters outside of 0x80 to 0x9f, it will not be detected as Windows-1252.
This means that if your string contains a normal ASCII letter like "A", or even a space character, PHP will say that this is not valid Windows-1252 and, in your case, fall back to the next possible encoding, which is ISO 8859-1. This is a PHP bug, see https://bugs.php.net/bug.php?id=64667.
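
Since mb_detect_encoding() cannot make this distinction, one possible workaround is a byte-range heuristic. The sketch below is an assumption, not an established PHP facility: the function name looksLikeWindows1252() is hypothetical, and it treats any byte in 0x80 to 0x9f as a sign of Windows-1252, because ISO-8859-1 assigns only rarely used control codes to that range.

```php
<?php
// Hypothetical heuristic: guess Windows-1252 when the string is not valid
// UTF-8 and contains at least one byte in the 0x80-0x9F range.
function looksLikeWindows1252(string $bytes): bool
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return false; // plain ASCII or valid UTF-8
    }
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/[\x80-\x9F]/', $bytes) === 1;
}

var_dump(looksLikeWindows1252("Hello \x80\x82\x83\x84\x85")); // bool(true)
var_dump(looksLikeWindows1252("caf\xE9"));                    // bool(false)
var_dump(looksLikeWindows1252("Hello"));                      // bool(false)
```

This can still misclassify text, so it is best used only when the candidates are known to be limited to these two encodings.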

Although strings encoded with ISO-8859-1 and CP-1252 have different byte code representation:
<?php
$str = "€ ‚ ƒ „ …" ;
foreach (array('Windows-1252', 'ISO-8859-1') as $encoding)
{
    $new = mb_convert_encoding($str, $encoding, 'HTML-ENTITIES');
    printf('%15s: %s detected: %10s explicitly: %10s',
        $encoding,
        implode('', array_map(function($x) { return dechex(ord($x)); }, str_split($new))),
        mb_detect_encoding($new),
        mb_detect_encoding($new, array('ISO-8859-1', 'Windows-1252'))
    );
    echo PHP_EOL;
}
Results:
Windows-1252: 802082208320842085 detected: explicitly: ISO-8859-1
ISO-8859-1: 3f203f203f203f203f detected: ASCII explicitly: ISO-8859-1
...from what we can see here, it looks like there is a problem with the second parameter of mb_detect_encoding. Using mb_detect_order instead of the parameter yields very similar results.

Trouble with decode JSON + PHP

My php script gives out this string (for example) for JSON:
{"time":"0:38:01","kto":"\u00d3\u00e1\u00e8\u00e2\u00f6\u00e0 \u00c3\u00e5\u00ed\u00e5\u00f0\u00e0\u00eb\u00ee\u00e2","mess":"\u00c5\u00e4\u00e8\u00ed\u00fb\u00e9: *mm"}
jQuery code gets this string through JSON:
$.getJSON('chat_ajax.php?q=1',
    function(result) {
        alert('Time ' + result.time + ' Kto' + result.kto + ' Mess' + result.mess);
    });
The browser shows:
0:38:01 Óáèâöà Ãåíåðàëîâ
Åäèíûé: *mm
How can I decode this string to cyrillic?
I tried using:
<META http-equiv="content-type" content="text/html; charset=windows-1251">
but nothing changed.
PHP Code:
$res1 = mysqli_query($dbc, "SELECT * FROM chat ORDER BY id DESC LIMIT 1");
while ($row1 = mysqli_fetch_array($res1)) {
    $rawArray = array('time' => @date("G:i:s", ($row1['time'] + $plus)), 'kto' => $row1['kto'], 'mess' => $row1['mess']);
    $encodedArray = array_map('utf8_encode', $rawArray);
    echo json_encode($encodedArray);
}
PHP ver 5.3.19

\uXXXX stands for a Unicode code point, and in Unicode 00d3 is Ó and so on. Unicode escapes are unambiguous, so the character encoding of the page is ignored for them. You could use the correct escape (i.e. \u0423 for У) or write your script so that it outputs the real characters in Windows-1251 instead of Unicode escape sequences.
Update
I see from your comment that you fetch this data from MySQL and use json_encode() to output it. json_encode only works for UTF-8 encoded data. Your Windows-1251 byte d3 was read as ISO-8859-1, where it is Ó (U+00D3), and that is why you get the wrong Unicode sequences.
So, you will have to convert all data from Windows-1251 to UTF-8 before passing it to json_encode, then everything else will work fine.
Converting:
$utf8Array = array_map(function($in) {
    return iconv('Windows-1251', 'UTF-8', $in);
}, $rawArray);
utf8_encode will not work because it is only useful for input in ISO-8859-1 encoding.
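
As a self-contained illustration of the conversion above, the sketch below feeds json_encode() a hard-coded row. The $rawArray value is a hypothetical stand-in for the MySQL result: the six Windows-1251 bytes correspond to the first word of "kto" in the question, and iconv is assumed to know the "Windows-1251" charset name.

```php
<?php
// Hypothetical row: "Убивца" as Windows-1251 bytes (D3 E1 E8 E2 F6 E0).
$rawArray = array('kto' => "\xD3\xE1\xE8\xE2\xF6\xE0");

// Convert every field from Windows-1251 to UTF-8 before encoding as JSON.
$utf8Array = array_map(function ($in) {
    return iconv('Windows-1251', 'UTF-8', $in);
}, $rawArray);

echo json_encode($utf8Array), "\n";
// {"kto":"\u0423\u0431\u0438\u0432\u0446\u0430"}
```

The escapes now fall in the Cyrillic block (U+0400 to U+04FF), so the browser renders Cyrillic letters regardless of the page charset.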

I had a similar problem when storing JSON data in a MySQL database. This solved it:
json_encode($json_data, JSON_UNESCAPED_UNICODE) ;
