I have a string like this:
$str = "\xC4";
According to Wikipedia, C4 is the ISO-8859-1 hex code for Ä. Now I want to lowercase this string to get ä (also in ISO-8859-1).
I tried various solutions using strtolower and mb_strtolower. None of them worked. The output was garbled every time.
mb_strtolower() takes the encoding as its second argument, so just specify it and it works fine:
echo mb_strtolower($str, "ISO-8859-1");
//^^^^^^^^^^
output:
ä
strtolower("\xC4") can also work (before PHP 8.2 its result depended on the locale; since PHP 8.2 it only converts ASCII letters A-Z). The thing is that you need to interpret the resulting byte (0xE4) using the ISO-8859-1 encoding; otherwise you'll see garbage. If you're doing this in a browser, set the appropriate header to tell the browser the expected encoding:
header('Content-Type: text/html; charset=iso-8859-1');
echo strtolower("\xC4");
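A minimal sketch of both points above, assuming the mbstring extension is installed: the lowercased result is the single byte 0xE4, which only shows up as ä if it is read as ISO-8859-1, or if it is converted to UTF-8 before it is printed on a UTF-8 page. It uses mb_strtolower() rather than strtolower() so the result does not depend on the PHP version or locale.

```php
<?php
// "\xC4" is Ä in ISO-8859-1 (a single byte).
$str = "\xC4";

// Lowercase it, telling mbstring that the string is ISO-8859-1.
$lower = mb_strtolower($str, 'ISO-8859-1');

// The result is still one ISO-8859-1 byte: 0xE4 (ä).
echo bin2hex($lower), PHP_EOL;

// If the page is served as UTF-8 instead, convert before printing.
$forUtf8Page = mb_convert_encoding($lower, 'UTF-8', 'ISO-8859-1');
echo $forUtf8Page, PHP_EOL;
```

Keeping the data in ISO-8859-1 and sending the matching header, or converting once to UTF-8, are both fine; mixing the two is what produces garbled output.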
Related
I have a file that contains Chinese characters like this:
合作伙伴
The problem is that the result looks like this:
ºÏ×÷»ï°é£º
Even if I try to print the content in the browser, I get the same encoding problem.
I'm sure it's an encoding problem, but I can't fix it.
Chinese text is usually encoded as GB2312.
Try converting from GB2312 to UTF-8:
$str = iconv('gb2312', 'utf-8', $str);
Make sure your file is UTF-8 encoded and send this header:
Content-type: text/html; charset=utf-8
Convert the text to UTF-8 and use only UTF-8 from then on:
$string = iconv('gb2312', 'utf-8', $string);
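A minimal sketch of the conversion, assuming the file really is GB2312 and that the system's iconv supports GB2312 (glibc's does). The bytes below are the GB2312 encoding of 合作伙伴, recovered from the garbled ºÏ×÷»ï°é in the question, which is exactly what those bytes look like when read as Latin-1. The file name in the comment is hypothetical.

```php
<?php
// Raw GB2312 bytes for 合作伙伴, as e.g. file_get_contents('chinese.txt') might return them
// ('chinese.txt' is a hypothetical file name).
$gb = "\xBA\xCF\xD7\xF7\xBB\xEF\xB0\xE9";

// Convert to UTF-8 once, right after reading, and keep everything UTF-8 afterwards.
$utf8 = iconv('GB2312', 'UTF-8', $gb);

// On a web page, also send: header('Content-Type: text/html; charset=utf-8');
echo $utf8, PHP_EOL;
```

Each of the four characters takes two bytes in GB2312 and three bytes in UTF-8, so the converted string is 12 bytes long.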
I'm trying to save a Hebrew string to a file, with the file ANSI-encoded.
All my attempts have failed, I'm afraid.
The PHP file itself is UTF-8.
So here's the code I'm trying:
$to_file = "בדיקה אם נרשם";
$to_file = mb_convert_encoding($to_file, "WINDOWS-1255", "UTF-8");
file_put_contents(dirname(__FILE__) ."/txt/TESTING.txt",$to_file);
This returns false for some reason.
Another attempt was:
$to_file = iconv("UTF-8", "windows-1252", $to_file);
This returns an empty string. While this did not work, changing the output charset to windows-1255 DID work, so the function itself works, but for some reason it does not convert to 1252.
I ran this function before and after the iconv call and printed the results:
mb_detect_encoding($to_file);
Before the iconv, the encoding is UTF-8.
After the iconv, the encoding is ASCII (??).
I'd really appreciate any help you can give.
Windows-1252 is a Latin encoding; you cannot encode Hebrew characters in Windows-1252. That's why it doesn't work.
Windows-1255 is an encoding for Hebrew, that's why it works.
The reason it doesn't work with mb_convert_encoding is that mbstring doesn't support Windows-1255.
Reliably detecting an encoding is impossible in general. Windows-1255 is a single-byte encoding, and it's virtually impossible to distinguish any one single-byte encoding from another: the result is just as valid in ASCII as it is in Windows-1255, Windows-1252, ISO-8859-* or any other single-byte encoding.
See "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text" for more information.
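A minimal sketch of the route that works, assuming iconv is available and the PHP source file is UTF-8. It converts to Windows-1255 (which has the Hebrew letters at 0xE0-0xFA) rather than Windows-1252 (which has none), uses iconv because mbstring lacks Windows-1255, and checks the byte values instead of relying on mb_detect_encoding(). The output path is a hypothetical temp file, not the question's txt/ directory.

```php
<?php
$text = 'בדיקה';  // UTF-8 in the source file

// Convert to the Hebrew single-byte code page; Windows-1252 cannot represent these letters.
$ansi = iconv('UTF-8', 'WINDOWS-1255', $text);
if ($ansi === false) {
    throw new RuntimeException('Text contains characters Windows-1255 cannot represent');
}

// Hypothetical output path; file_put_contents() returns the number of bytes written, or false.
$path = sys_get_temp_dir() . '/hebrew-1255.txt';
$written = file_put_contents($path, $ansi);

// One byte per Hebrew letter in Windows-1255.
echo bin2hex($ansi), PHP_EOL;
```

If file_put_contents() still returns false, the usual suspects are a missing target directory or missing write permissions rather than the encoding.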
To convert in the other direction, from Windows-1255 (ANSI) to UTF-8, you can use this:
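<?php
// Assumes this file is saved as Windows-1255, so $heb holds single-byte Hebrew.
$heb = 'טקסט בעברית .. # ';
// Hebrew letters 0xE0-0xFA in Windows-1255 are U+05D0-U+05EA,
// whose UTF-8 form is 0xD7 followed by (byte - 0x50).
// The /e modifier was removed in PHP 7, so use preg_replace_callback().
$utf = preg_replace_callback('/[\xE0-\xFA]/', function ($m) {
    return chr(0xD7) . chr(ord($m[0]) - 0x50);
}, $heb);
echo '<pre>';
print_r($heb);
echo '</pre>';
echo '------';
echo '<pre>';
print_r($utf);
echo '</pre>';
?>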
Output will be like this:
���� ������ .. # <- $heb: what you get when you print Hebrew ANSI (Windows-1255) text as-is
טקסט בעברית .. # <- $utf: the Windows-1255 text converted to UTF-8
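If iconv is available, it can do the same Windows-1255 to UTF-8 conversion directly, and unlike the regex it also maps Windows-1255's non-letter bytes. A small sketch comparing the two on letters and ASCII, with the input written as escaped bytes so the source file's own encoding does not matter; the helper name is hypothetical.

```php
<?php
// Hypothetical helper: the regex approach, Hebrew letters only (0xE0-0xFA).
function hebrewLettersToUtf8(string $s): string
{
    return preg_replace_callback('/[\xE0-\xFA]/', function (array $m): string {
        // U+05D0..U+05EA encode as 0xD7 0x90..0xAA in UTF-8.
        return "\xD7" . chr(ord($m[0]) - 0x50);
    }, $s);
}

// "בדיקה .. #" in Windows-1255, written as bytes.
$ansi = "\xE1\xE3\xE9\xF7\xE4 .. #";

$viaRegex = hebrewLettersToUtf8($ansi);
$viaIconv = iconv('WINDOWS-1255', 'UTF-8', $ansi);

var_dump($viaRegex === $viaIconv);
```

For text that only contains letters and ASCII the two agree; for anything else (punctuation in 0x80-0xBF, vowel points) prefer iconv.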
Somehow there are two different € characters in my UTF-8 text: the correct one, U+20AC, and U+0080 from the Latin-1 Supplement block.
Using bin2hex I got hex c280 instead of the correct e282ac. Since the first one is not displayed correctly, I would like to convert it.
Obviously I can't use utf8_decode() or utf8_encode(). I tried iconv('Windows-1252', 'UTF-8', $x), but that gives me "Â€" because in Windows-1252 € is 0x80.
What is the correct converter for this?
It looks like it does work if I use utf8_decode() to get back to the Windows-1252 bytes and then convert to UTF-8 again using iconv:
iconv('Windows-1252', 'UTF-8', utf8_decode($x));
I guess the string was originally Windows-1252 and was converted with utf8_encode(), which assumes ISO-8859-1. That works for most characters, but not for Windows-1252's 0x80-0x9F range (such as € at 0x80), which ends up as U+0080-U+009F.
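The whole-string utf8_decode() round trip also pushes any already-correct € (U+20AC) through ISO-8859-1, where it cannot survive. A minimal sketch of a narrower repair, assuming the only damage is C1 code points U+0080-U+009F that were really Windows-1252 bytes; the helper name is hypothetical, and it leaves every other character untouched.

```php
<?php
// Hypothetical helper: repair Windows-1252 text that went through utf8_encode().
// utf8_encode() assumes ISO-8859-1, so Windows-1252 bytes 0x80-0x9F (€, ‘, ’, ...)
// come out as U+0080-U+009F, encoded in UTF-8 as 0xC2 0x80..0x9F.
function fixCp1252Controls(string $s): string
{
    return preg_replace_callback('/[\x{80}-\x{9F}]/u', function (array $m): string {
        // For U+0080-U+00BF the second UTF-8 byte equals the original single byte.
        $byte = $m[0][1];
        // Read that byte as Windows-1252 this time.
        $fixed = @iconv('Windows-1252', 'UTF-8', $byte);
        // Bytes unassigned in Windows-1252 (e.g. 0x81) are left as they were.
        return ($fixed === false || $fixed === '') ? $m[0] : $fixed;
    }, $s) ?? $s;  // invalid UTF-8 input: return it unchanged
}

$x = "\xC2\x80 and \xE2\x82\xAC";  // a broken € followed by a correct €
echo bin2hex(fixCp1252Controls($x)), PHP_EOL;
```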
I am getting this output:
FBI believed he had a â€˜doomsday deviceâ€™
instead of
FBI believed he had a ‘doomsday device’
When I use
iconv("UTF-8", "ISO-8859-1//IGNORE", $topic);
the output is
FBI believed he had a âdoomsday deviceâ
I am not using any header or charset in my file.
Update
I figured out why this is happening.
When the UTF-8 byte sequence is interpreted as if it were ISO-8859-1, the output is
â€™
Explanation
0xE28099 breaks down as 0xE2 (â), 0x80 (€) and 0x99 (™). What was one character in UTF-8 (’) gets mistakenly displayed as three (â€™) when misinterpreted as ISO-8859-1.
I still have no solution for converting it.
Well, the output page is being interpreted as Windows-1252, not ISO-8859-1.
I recommend setting your header charset to UTF-8.
In the Apache config:
AddDefaultCharset utf-8
In php.ini:
default_charset = "UTF-8"
Manually in PHP:
header("Content-Type: text/html; charset=utf-8");
If you can't do any of the above for some reason, convert to Windows-1252 instead:
iconv("UTF-8", "Windows-1252//IGNORE", $topic);
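A small sketch reproducing the diagnosis and the fallback, assuming iconv is available and the source file is UTF-8: reading the three UTF-8 bytes of ’ as Windows-1252 yields exactly the â€™ from the question, while converting the text itself to Windows-1252 turns each curly quote into one byte (0x91 and 0x92). It omits //IGNORE, which silently drops characters Windows-1252 cannot represent.

```php
<?php
$topic = 'FBI believed he had a ‘doomsday device’';  // UTF-8

// The bug: UTF-8 bytes E2 80 99 (’) read as Windows-1252 become three characters.
$mojibake = iconv('Windows-1252', 'UTF-8', "\xE2\x80\x99");
echo $mojibake, PHP_EOL;

// Preferred fix: keep the text as UTF-8 and declare it, e.g.
// header('Content-Type: text/html; charset=utf-8');

// Fallback: if the page must stay Windows-1252, convert the text instead.
$cp1252 = iconv('UTF-8', 'Windows-1252', $topic);
echo bin2hex($cp1252), PHP_EOL;
```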
I actually have a fairly simple question, but I'm unable to find an answer anywhere. The PHP function html_entity_decode is documented as one that "converts all HTML entities to their applicable characters from string."
So, since &Omega; is the HTML entity for the Greek capital letter Omega, I'd expect echo html_entity_decode('&Omega;', ENT_COMPAT, 'UTF-8'); to output Ω. But instead, it outputs some strange characters which my browser can't recognize. Why is this?
Thanks,
Martijn
When you convert entities into UTF-8 characters like your last parameter specifies, your output encoding must be UTF-8 as well. Otherwise, in a single-byte encoding like ISO-8859-1, you will see double-byte characters as two broken single ones.
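A minimal sketch of the point above: with 'UTF-8' as the charset argument, both the named entity and the numeric one decode to the two bytes CE A9, so the page must be declared as UTF-8 for them to show as a single Ω.

```php
<?php
// Named and numeric entities for Ω both decode to the UTF-8 bytes CE A9.
$omega = html_entity_decode('&Omega;', ENT_COMPAT, 'UTF-8');
$omegaNumeric = html_entity_decode('&#937;', ENT_COMPAT, 'UTF-8');

// Declare the page as UTF-8 too, or the two bytes show up as two Latin-1 characters.
header('Content-Type: text/html; charset=utf-8');
echo $omega, PHP_EOL;
```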
It works fine:
http://codepad.viper-7.com/tb2LaW
Make sure your webpage encoding is UTF-8.
If your webpage uses a different encoding, change this argument:
html_entity_decode('&Omega;', ENT_COMPAT, 'UTF-8');
^^^^^
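Also send a UTF-8 header and set the database connection's charset:
header('Content-type: text/html;charset=utf-8');
mysqli_set_charset($conn, "utf8"); // mysql_set_charset() belonged to the old mysql extension, removed in PHP 7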
See this page:
http://www.phpwact.org/php/i18n/charsets