I need to convert UTF-8 text to Windows-1252 using PHP, and I'm not having much luck so far. My aim is to send text to a third-party system and exclude any characters that are not in the Windows-1252 character set.
I've tried both iconv and mb_convert_encoding, but both give unexpected results.
$text = 'KØBENHAVN Ø ô& üü þþ';
echo iconv("UTF-8", "WINDOWS-1252", $text);
echo mb_convert_encoding($text, "WINDOWS-1252");
Output for both is 'K?BENHAVN ? ?& ?? ??'
I did not expect the ?s, since these characters are in the Windows-1252 character set.
Can anyone help shed some light on this for me, please?
I ended up converting the text from UTF-8 to Windows-1252 and then back from Windows-1252 to UTF-8. This gave the desired output.
$text = "Ѭjanky";
$converted = iconv("UTF-8//IGNORE", "WINDOWS-1252//IGNORE", $text);
$converted = iconv("WINDOWS-1252//IGNORE", "UTF-8//IGNORE", $converted);
echo $converted; // outputs "janky" (the unmappable Ѭ is dropped)
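The same round trip can be sketched with mbstring instead of iconv. This is only a sketch: `strip_non_cp1252` is a hypothetical helper name, and it assumes the input really is UTF-8 and that the mbstring extension is loaded. Setting the substitute character to 'none' makes mbstring drop unmappable characters instead of emitting "?".

```php
<?php
// Hypothetical helper: drop every character that has no Windows-1252 mapping
// and return the cleaned text as UTF-8.
function strip_non_cp1252(string $utf8): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // drop unmappable characters instead of writing "?"
    try {
        // Explicit source encoding, so the result does not depend on mb_internal_encoding()
        $cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
        return mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
    } finally {
        mb_substitute_character($previous);  // restore the setting for other code
    }
}

// U+046C (Ѭ) has no Windows-1252 equivalent, so only "janky" survives
echo strip_non_cp1252("\u{046C}janky"), "\n";
```

If the third-party system expects Windows-1252 bytes rather than UTF-8, the intermediate `$cp1252` value is the one to send.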
I have a file that contains Chinese characters like this:
合作伙伴
The problem is that the result looks like this:
ºÏ×÷»ï°é£º
Even when I print the content in the browser, I get the same encoding problem.
I'm sure it's an encoding problem, but I can't fix it.
Simplified Chinese text is often encoded as GB2312.
Try converting it from GB2312 to UTF-8:
$str = iconv('gb2312', 'utf-8', $str);
Make sure your file is UTF-8 encoded and the page is served with a UTF-8 charset:
Content-type: text/html; charset=utf-8
Convert the text to UTF-8 and use only that encoding from then on:
$string = iconv('gb2312', 'utf-8', $string);
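To check that the garbled sample really is GB2312 data, here is a small sketch using mbstring. It assumes the ten garbled characters in the question each stand for one raw byte (that is, the bytes were displayed as Latin-1), and it uses 'EUC-CN', the mbstring name for the GB2312 encoding.

```php
<?php
// Raw bytes behind "ºÏ×÷»ï°é£º" when every character is read as one Latin-1 byte
$bytes = hex2bin('bacfd7f7bbefb0e9a3ba');

// Decode those bytes as GB2312 (called EUC-CN in mbstring) and re-encode as UTF-8
$utf8 = mb_convert_encoding($bytes, 'UTF-8', 'EUC-CN');

echo $utf8, "\n";                      // 合作伙伴：
echo mb_strlen($utf8, 'UTF-8'), "\n";  // 5 characters from 10 bytes
```

The output only displays correctly if the terminal or the page is itself using UTF-8.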
Hi, I'm having a problem converting special characters to upper case.
With regular strtoupper I get something like DANIëL, and when applying mb_strtoupper I get DANI?L.
Here's the code:
mb_strtoupper(rtrim($pieces[1], ","), 'UTF-8')
Mind you, I already have this running on the input:
iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $tr->TD[0])
Could this be the reason? Or is there something else?
This is the typical issue of uppercasing a Latin-1 string when the function expects UTF-8, so yes, the iconv call that converts your input to ISO-8859-1 is the likely cause.
Be sure to check your string source. This sample works if your text editor saves files in Latin-1 rather than UTF-8:
$str = "daniël"; //or your rtrim($pieces[1],",")
$str = mb_convert_encoding($str,'UTF-8','Latin1');
echo mb_strtoupper($str, 'UTF-8');
//will echo DANIËL
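When the source encoding is not known up front, one option is to test for valid UTF-8 before converting. A minimal sketch, where `to_utf8` is a hypothetical helper that assumes anything that is not valid UTF-8 is ISO-8859-1 (an assumption that will not hold for every data source):

```php
<?php
// Hypothetical helper: return $s as UTF-8, assuming that any string which is
// not already valid UTF-8 is ISO-8859-1 (Latin-1).
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;  // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "dani\xEBl";  // "daniël" as Latin-1 bytes (ë is the single byte 0xEB)
echo mb_strtoupper(to_utf8($latin1), 'UTF-8'), "\n";  // DANIËL
```

Calling `to_utf8` on a string that is already UTF-8 returns it unchanged, so it avoids double conversion.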
My boss is forcing me to use an access mdb database (yes, I'm serious) in a php server.
I can connect it and retrieve data from it, but as you could imagine, I have problems with encodings because I want to work using utf8.
The thing is that now I have two "solutions" to translate Windows-1252 to UTF-8
This is the first way:
mb_convert_encoding($string, "UTF-8", "Windows-1252").
It works, but the problem is that special chars are not properly converted: for example, º is converted to \u00ba and Ó is converted to \u00d3.
My second way is doing this:
mb_convert_encoding(mb_convert_encoding($string, "UTF-8", "Windows-1252"), "HTML-ENTITIES", "UTF-8")
It works too, but the same thing happens: special chars are not correctly converted. º is converted to &ordm;
Does anybody know how to properly change the encoding, including special chars?
Or does anybody know how to convert &ordm; and \u00ba to something readable?
I did a simple test to convert code points to letters:
<?php
function codepoint_decode($str) {
return json_decode(sprintf('"%s"', $str));
}
$string_with_codepoint = "Ahed \u00d3\u00ba\u00d3";
// $string_with_codepoint = mb_convert_encoding($string, "UTF-8", "Windows-1252");
$output = codepoint_decode($string_with_codepoint);
echo $output; // Ahed ÓºÓ
Credit goes to this answer.
I finally found the solution.
I had the solution from the beginning but I was doing my tests wrong.
My bad.
The right way to do it, for me, is mb_convert_encoding($string, "UTF-8", "Windows-1252").
But I was checking the result like this:
$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo json_encode($stringUTF8);
That's why it was returning Unicode escapes like \u20ac. If I had done this instead:
$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo $stringUTF8;
I would have seen the solution from the beginning. It was json_encode() that was turning special chars into Unicode escape sequences.
Thanks everybody for your help!!
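To make the difference visible, here is a small sketch with made-up sample bytes (not data from the Access database). By default json_encode() escapes non-ASCII characters as \uXXXX; the JSON_UNESCAPED_UNICODE flag keeps them readable. Both forms decode back to the same UTF-8 string.

```php
<?php
// Sample Windows-1252 bytes: º (0xBA), Ó (0xD3) and € (0x80)
$bytes = "\xBA \xD3 \x80";
$utf8  = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

$escaped  = json_encode($utf8);                          // "\u00ba \u00d3 \u20ac"
$readable = json_encode($utf8, JSON_UNESCAPED_UNICODE);  // "º Ó €"

echo $escaped, "\n", $readable, "\n";

// The escaped form is still the same data: decoding gives back the UTF-8 string
var_dump(json_decode($escaped) === $utf8);  // bool(true)
```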
I have a source string (received from a mail body):
=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5
An online decoder says it is Windows-1251 encoded and successfully converts it to UTF-8. mb_detect_encoding says it is ASCII.
I need to convert it with PHP. I tried mb_convert_encoding and iconv, solutions from Stack Overflow (for example and one more), and many others, but with no result: the source string is not changed.
Maybe you know a working solution? Thank you.
Yes, you could apply iconv() in this case, after first turning the =XX sequences back into raw bytes:
header('Content-Type: text/html; charset=utf-8');
$string = '=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5';
$string = str_replace('=', '%', $string);
$string = rawurldecode($string);
$string = iconv('Windows-1251', 'UTF-8', $string);
echo $string; // Здравствуйте
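The =XX notation is quoted-printable transfer encoding, which is why mb_detect_encoding reports ASCII: the string as received contains only ASCII characters. As an alternative sketch to the str_replace/rawurldecode step, PHP's built-in quoted_printable_decode() can produce the raw bytes, which are then converted from Windows-1251 (mbstring is used here; iconv would work the same way):

```php
<?php
$encoded = '=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5';

// Step 1: undo the quoted-printable transfer encoding to get the raw Windows-1251 bytes
$bytes = quoted_printable_decode($encoded);

// Step 2: convert those bytes from Windows-1251 to UTF-8
$utf8 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1251');

echo $utf8, "\n"; // Здравствуйте
```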
I am using $encoding = 'utf-8'; in gettext, and in my HTML code I have set <meta charset="utf-8">. I have also set utf-8 in my .po files, but I still get � when I write æøå! What could be wrong?
Let's see what the values you mention look like at the byte level.
I copied the æøå from your question and � from your title. The reason for � is that I had to use a Windows console application to fetch the title of your question and its codepage was Windows 1252 (copying from the browser gave me Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD)).
In a script encoded in UTF-8, this gives:
<?php
$s = 'æøå';
$s2 = '�';
// bin2hex() shows the raw bytes of each converted string as hex
echo "s iso-8859-1 ", bin2hex(mb_convert_encoding($s, "ISO-8859-1", "UTF-8")), "\n";
echo "s2 win-1252 ", bin2hex(mb_convert_encoding($s, "WINDOWS-1252", "UTF-8")), "\n";
s iso-8859-1 e6f8e5
s2 win-1252 e6f8e5
So the byte representations match. The problem here is that when you write æøå, either:
You're writing it in ISO-8859-1, instead of UTF-8. Check your text editor.
The value is being converted from UTF-8 to ISO-8859-1 (unlikely)
You need to set this:
bind_textdomain_codeset($domain, "UTF-8");
Otherwise you will get the � character