Error with Chinese encoding in PHP

I have a file that contains Chinese characters like this:
合作伙伴
The problem is that the result looks like this:
ºÏ×÷»ï°é£º
Even if I print the content in the browser, I get the same encoding problem.
I'm sure it's an encoding problem, but I can't fix it.

Legacy Chinese text is often encoded as GB2312.
Try converting it from GB2312 to UTF-8:
$str = iconv('gb2312', 'utf-8', $str);
Then make sure your page is UTF-8 encoded and says so in its header:
Content-type: text/html; charset=utf-8

Convert the text to UTF-8 and use only UTF-8 from then on:
$string = iconv('gb2312', 'utf-8', $string);
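
As a rough illustration of both answers, the sketch below starts from the raw GB2312 bytes behind the garbled output in the question and decodes them two ways. The variable names are made up for this example, and it assumes the iconv extension knows the GB2312 charset (glibc and GNU libiconv do).

```php
<?php
// The four characters 合作伙伴 as raw GB2312 bytes (two bytes per character).
$gb = "\xBA\xCF\xD7\xF7\xBB\xEF\xB0\xE9";

// Wrong: treating every byte as one ISO-8859-1 character reproduces the garbage.
$garbled = iconv('ISO-8859-1', 'UTF-8', $gb);

// Right: decode the bytes as GB2312 and re-encode them as UTF-8.
$fixed = iconv('GB2312', 'UTF-8', $gb);

// Declare the charset so the browser decodes the response as UTF-8.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo $garbled, "\n"; // ºÏ×÷»ï°é
echo $fixed, "\n";   // 合作伙伴
```

The conversion only helps if the input really is GB2312; running it on text that is already UTF-8 would garble it again.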

Related

Convert iso-8859-1 hex escape sequence to lowercase

I have a string like this:
$str = "\xC4";
According to Wikipedia, C4 is the ISO-8859-1 hex code for Ä. Now I want to lowercase this string to get ä (also in ISO-8859-1).
I tried various solutions using strtolower and mb_strtolower. None of them worked; the output was garbled every time.
You can specify the encoding in mb_strtolower(), so just specify it and it all works fine:
echo mb_strtolower($str, "ISO-8859-1");
//                        ^^^^^^^^^^
output:
ä
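strtolower("\xC4") can work too, but it depends on your setup: before PHP 8.2 it follows the current locale, so it only lowercases this byte when an ISO-8859-1 locale is set, and since PHP 8.2 it converts ASCII letters only. Whichever function you use, you need to interpret the resulting byte (xE4) using the ISO-8859-1 encoding, otherwise you'll see garbage. If you're doing this in a browser, set the appropriate header to tell the browser which encoding to expect: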
header('Content-Type: text/html; charset=iso-8859-1');
echo strtolower("\xC4");
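
A small sketch of the point made in these answers: the lowercase result is a single ISO-8859-1 byte, and it only looks right once it is interpreted (or converted) accordingly. The variable names are illustrative, and the mbstring extension is assumed to be loaded.

```php
<?php
$upper = "\xC4";                               // Ä as a single ISO-8859-1 byte
$lower = mb_strtolower($upper, 'ISO-8859-1');  // ä, still one ISO-8859-1 byte

echo bin2hex($lower), "\n";                    // e4

// For a UTF-8 page or terminal, convert the byte before printing it.
echo mb_convert_encoding($lower, 'UTF-8', 'ISO-8859-1'), "\n"; // ä
```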

Convert UTF-8 to WINDOWS-1252 using PHP

I need to convert UTF-8 text to Windows-1252 using PHP, and I'm not having much luck so far. My aim is to transfer text to a third-party system and exclude any characters not in the Windows-1252 character set.
I've tried both iconv and mb_convert_encoding, but both give unexpected results.
$text = 'KØBENHAVN Ø ô& üü þþ';
echo iconv("UTF-8", "WINDOWS-1252", $text);
echo mb_convert_encoding($text, "WINDOWS-1252");
The output for both is 'K?BENHAVN ? ?& ?? ??'.
I would not have expected the ?s, as these characters are in the Windows-1252 character set.
Can anyone help shed some light on this, please?
I ended up converting the text from UTF-8 to Windows-1252 and then back from Windows-1252 to UTF-8. This gave the desired output.
$text = "Ѭjanky";
$converted = iconv("UTF-8//IGNORE", "WINDOWS-1252//IGNORE", $text);
$converted = iconv("WINDOWS-1252//IGNORE", "UTF-8//IGNORE", $converted);
echo $converted; // outputs "janky"
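
An alternative sketch using mbstring, which names the source encoding explicitly and drops anything Windows-1252 cannot represent. The helper name toWindows1252 is hypothetical, not part of PHP. If the input fails the UTF-8 check, the string was probably not UTF-8 to begin with, which is one possible cause of the ? characters in the question.

```php
<?php
// Hypothetical helper: convert UTF-8 text to Windows-1252 and silently drop
// every character that has no Windows-1252 equivalent.
function toWindows1252(string $utf8): string
{
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $previous = mb_substitute_character();  // remember the global setting
    mb_substitute_character('none');        // drop unmappable characters
    $result = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
    mb_substitute_character($previous);     // restore the global setting
    return $result;
}

echo toWindows1252("Ѭjanky"), "\n";              // janky
echo bin2hex(toWindows1252("\xC3\x98")), "\n";   // d8 (Ø as one byte)
```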

convert UTF-8 to ANSI (windows-1252)

I'm trying to save a Hebrew string to a file, with the file ANSI encoded.
All attempts have failed, I'm afraid.
The PHP file itself is UTF-8.
Here's the code I'm trying:
$to_file = "בדיקה אם נרשם";
$to_file = mb_convert_encoding($to_file, "WINDOWS-1255", "UTF-8");
file_put_contents(dirname(__FILE__) ."/txt/TESTING.txt",$to_file);
This returns false for some reason.
Another attempt was:
$to_file = iconv("UTF-8", "windows-1252", $to_file);
This returns an empty string. Changing the output charset to windows-1255, however, DID work. So the function itself works, but for some reason it does not convert to 1252.
I ran this function before and after the iconv call and printed the results:
mb_detect_encoding($to_file);
Before the iconv call, the encoding is UTF-8.
After the iconv call, the encoding is ASCII(??)
I'd really appreciate any help you can give.
Windows-1252 is a Latin encoding; you cannot encode Hebrew characters in Windows-1252. That's why it doesn't work.
Windows-1255 is an encoding for Hebrew, that's why it works.
The reason it doesn't work with mb_convert_encoding is that mb_ doesn't support Windows-1255.
Detecting encodings is by definition impossible. Windows-1255 is a single-byte encoding; it's virtually impossible to distinguish any one single byte encoding from another. The result is just as valid in ASCII as it is in Windows-1255 or Windows-1252 or ISO-8859 or any other single byte encoding.
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for more information.
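
The difference between the two target encodings can be sketched as follows. The helper convertOrNull and the temp-file path are made up for this example, and it assumes an iconv implementation that knows WINDOWS-1255 (glibc and GNU libiconv do).

```php
<?php
// Hypothetical helper: returns the converted bytes, or null when $text contains
// characters that the target encoding cannot represent.
function convertOrNull(string $text, string $to): ?string
{
    // @ suppresses the notice iconv raises on an unconvertible character.
    $out = @iconv('UTF-8', $to, $text);
    return $out === false ? null : $out;
}

$hebrew = "בדיקה אם נרשם";

// Windows-1252 is a Latin code page with no Hebrew letters, so this fails.
$latin = convertOrNull($hebrew, 'WINDOWS-1252');

// Windows-1255 is the Hebrew code page: one byte per character.
$bytes = convertOrNull($hebrew, 'WINDOWS-1255');

if ($bytes !== null) {
    file_put_contents(sys_get_temp_dir() . '/TESTING.txt', $bytes);
}
```

Checking for a failed conversion before writing avoids silently saving an empty file.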
You can use this:
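<?php
// $heb must hold Windows-1255 bytes (for example, text read from an ANSI file).
$heb = 'טקסט בעברית .. # ';
// Map each Windows-1255 Hebrew letter (0xE0-0xFA) to its two-byte UTF-8 form.
// preg_replace_callback() replaces the /e modifier, which was removed in PHP 7.
$utf = preg_replace_callback("/[\xE0-\xFA]/", function ($m) {
    return chr(215) . chr(ord($m[0]) - 80);
}, $heb);
echo '<pre>';
print_r($heb);
echo '</pre>';
echo '------';
echo '<pre>';
print_r($utf);
echo '</pre>';
?>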
The output will look like this:
���� ������ .. # <- $heb: what we get when we print Hebrew as ANSI Windows-1255
טקסט בעברית .. # <- $utf: the same text converted from Windows-1255 to UTF-8

How can I remove special characters in a PHP string?

I am getting this output:
FBI believed he had a â€˜doomsday deviceâ€™
instead of
FBI believed he had a ‘doomsday device’
When I use
iconv("UTF-8", "ISO-8859-1//IGNORE", $topic);
the output is
FBI believed he had a âdoomsday deviceâ
I am not setting any header or charset in my file.
Update
I found out why this is happening:
when the UTF-8 byte sequence is interpreted as if it were ISO-8859-1, the output is
â€™
Explanation
0xE28099 breaks down as 0xE2 (â), 0x80 (€) and 0x99 (™). What was one character in UTF-8 (’) gets mistakenly displayed as three (â€™) when misinterpreted as ISO-8859-1.
I still have no solution for converting it.
Well, the output page is being interpreted as Windows-1252, not ISO-8859-1.
I recommend setting your header charset to utf-8:
In the Apache config:
AddDefaultCharset utf-8
In php.ini:
default_charset = "UTF-8"
Manually in PHP:
header("Content-Type: text/html; charset=utf-8");
If for some reason you cannot do any of the above, convert to Windows-1252 instead:
iconv("UTF-8", "Windows-1252//IGNORE", $topic);
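
The byte breakdown described in the question can be reproduced with a short sketch. The variable names are illustrative and mbstring is assumed to be loaded. It decodes the three UTF-8 bytes of the right single quote as Windows-1252, then reverses the mistake.

```php
<?php
$quote = "\xE2\x80\x99"; // ’ (U+2019) as three UTF-8 bytes

// Misreading those bytes as Windows-1252 turns each byte into its own character.
$mojibake = mb_convert_encoding($quote, 'UTF-8', 'Windows-1252');
echo $mojibake, "\n"; // â€™

// Reversing that step recovers the original character.
$repaired = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');
echo $repaired, "\n"; // ’
```

The repair step is only a rescue for text that has already been garbled this way; declaring UTF-8 in the header avoids the problem in the first place.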

file_put_contents encoding used on web servers?

I am trying to use file_put_contents (and file_get_contents, for that matter) with a UTF-8 ¥, following this Stack Overflow post: How to write file in UTF-8 format? which uses:
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
This wasn't explained well, since it produces this error:
mb_convert_encoding(): Illegal character encoding specified
So 'OLD-ENCODING' was just a placeholder they were using.
My question is: what encoding should I change this to? ASCII or ISO-8859-1? What encoding do most web hosts use? Does it matter?
When I open the file, I see the symbol correctly only if my Notepad is set to UTF-8 encoding. If I open it with another character set, it shows up as a "?".
Try it without the third parameter (the source then defaults to the internal encoding):
$str = mb_convert_encoding($str, "UTF-8");
Or auto:
$str = mb_convert_encoding($str, "UTF-8", "auto");
More info and examples on:
http://php.net/manual/function.mb-convert-encoding.php
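You can also pass the result of mb_detect_encoding as the source encoding. Detection is a heuristic, so name the real source encoding whenever you know it:
mb_convert_encoding($data, 'UTF-8', mb_detect_encoding($data));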
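
As a minimal sketch of the underlying behaviour: file_put_contents writes exactly the bytes it is given, so the file's encoding is whatever encoding the string had. The example assumes the source data is ISO-8859-1, which is an assumption for illustration only, and writes to a temporary file.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'enc');

$latin1 = "\xA5";  // ¥ as one ISO-8859-1 byte (assumed source encoding)
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// The bytes are written unchanged; no encoding is applied by the file functions.
file_put_contents($path, $utf8);
$readBack = file_get_contents($path);

echo bin2hex($readBack), "\n"; // c2a5 (¥ as two UTF-8 bytes)
unlink($path);
```

An editor then has to open the file as UTF-8 to show the symbol, which matches what the question observed in Notepad.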
