PHP decode GB2312 - php

I'm working on an IMAP email script and I have some lines coded in GB2312 (which I assume is Chinese encoding), looks like this =?GB2312?B?foobarbazetc
How can I start working with this string? I checked mb_list_encodings() and this one is not listed.

If you have the base64-decoded data, then use mbstring or iconv. If you have the raw header, then mbstring.
<?php
$t = "\xc4\xe3\xba\xc3\n";
echo iconv('GB2312', 'UTF-8', $t);
echo mb_convert_encoding($t, 'UTF-8', 'GB2312');
mb_internal_encoding('UTF-8');
echo mb_decode_mimeheader("=?gb2312?b?xOO6ww==?=");
?>

Ignacio solved the meat of the problem with mb_decode_mimeheader() but for future reference these links are also helpful:
http://developer.loftdigital.com/blog/php-utf-8-cheatsheet
http://www.herongyang.com/PHP-Chinese/PHP-UTF-8-Chinese-String-Literals.html
The specific header string I was working with:
$subject = "=?GB2312?B?tPC4tDogUXVvdGF0aW9uIFBJSSBwcm9kdWN0cyA=?= =?GB2312?B?Rk9CIFNoYW5naGFpIG9yIE5pbmdibyBwb3J0?="
This required a page header of
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
and PHP
mb_internal_encoding('utf-8');
echo mb_decode_mimeheader($subject)."<br />";
to output
主题: Quotation PII products FOB Shanghai or Ningbo port

Related

Converting t’to apostrophe in php [duplicate]

I've tried converting the text to or from utf8, which didn't seem to help.
I'm getting:
"It’s Getting the Best of Me"
It should be:
"It’s Getting the Best of Me"
I'm getting this data from this url.
To convert to HTML entities:
<?php
echo mb_convert_encoding(
file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
"HTML-ENTITIES",
"UTF-8"
);
?>
See docs for mb_convert_encoding for more encoding options.
Make sure your html header specifies utf8
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
That usually does the trick for me (obviously if the content IS utf8).
You don't need to convert to html entities if you set the content-type.
Your content is fine; the problem is with the headers the server is sending:
Connection:Keep-Alive
Content-Length:502
Content-Type:text/html
Date:Thu, 18 Feb 2010 20:45:32 GMT
Keep-Alive:timeout=1, max=25
Server:Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.7 with Suhosin-Patch
X-Powered-By:PHP/5.2.4-2ubuntu5.7
Content-Type should be set to Content-type: text/plain; charset=utf-8, because this page is not HTML and uses the utf-8 encoding. Chromium on Mac guesses ISO-8859-1 and displays the characters you're describing.
If you are not in control of the site, specify the encoding as UTF-8 to whatever function you use to retrieve the content. I'm not familiar enough with PHP to know how exactly.
I know the question was answered but setting meta tag didn't help in my case and selected answer was not clear enough, so I wanted to provide simpler answer.
So to keep it simple, store string into a variable and process that like this
$TVrageGiberish = "It’s Getting the Best of Me";
$notGiberish = mb_convert_encoding($TVrageGiberish, "HTML-ENTITIES", 'UTF-8');
echo $notGiberish;
Which should return what you wanted It’s Getting the Best of Me
If you are parsing something, you can perform conversion while assigning values to a variable like this, where $TVrage is array with all the values, XML in this example from a feed that has tag "Title" which may contain special characters such as ‘ or ’.
$cleanedTitle = mb_convert_encoding($TVrage->title, "HTML-ENTITIES", 'UTF-8');
If you're here because you're experiencing issues with junk characters in your WordPress site, try this:
Open wp-config.php
Comment out define('DB_CHARSET', 'utf8') and define('DB_COLLATE', '')
/** MySQL hostname */
define('DB_HOST', 'localhost');
/** Database Charset to use in creating database tables. */
//define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
//define('DB_COLLATE', '');
It sounds like you're using standard string functions on a UTF8 characters (’) that doesn't exist in ISO 8859-1. Check that you are using Unicode compatible PHP settings and functions. See also the multibyte string functions.
We had success going the other direction using this:
mb_convert_encoding($text, "HTML-ENTITIES", "ISO-8859-1");
Just try this
if $text contains strange charaters do this:
$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');
and you are done..
if all seems not to work, this could be your best solution.
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
==or==
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
try this :
html_entity_decode(mb_convert_encoding(stripslashes($text), "HTML-ENTITIES", 'UTF-8'))
For fopen and file_put_contents, this will work:
str_replace("’", "'", htmlspecialchars_decode(mb_convert_encoding($string_to_be_fixed, "HTML-ENTITIES", "UTF-8")));
You Should check encode encoding origin then try to convert to correct encode type.
In my case, I read csv files then import to db. Some files displays well some not. I check encoding and see that file with encoding ASCII displays well, other file with UTF-8 is broken. So I use following code to convert encoding:
if(mb_detect_encoding($content) == 'UTF-8') {
$content = iconv("UTF-8", "ASCII//TRANSLIT", $content);
file_put_contents($file_path, $content);
} else {
$content = mb_convert_encoding($content, 'UTF-8', 'UTF-8');
file_put_contents($file_path, $content);
}
After convert I push the content to file then process import to DB, now it displays well in front-end
If none of the above solutions work:
In my case I noticed that the single quote was a different style of single quote. Instead of ' my data had a ’. Notice the difference in the single quote? So I simply wrote a str_replace to replace it and it fixed the problem. Probably not the most elegant solution but it got the job done.
$string= str_replace("’","'",$string);
I looked at the link, and it looks like UTF-8 to me. i.e., in Firefox, if you pick View, Character Encoding, UTF-8, it will appear correctly.
So, you just need to figure out how to get your PHP code to process that as UTF-8. Good luck!
use this
<meta http-equiv="Content-Type" content="text/html; charset=utf8_unicode_ci" />
instead of this
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If nothing works try this mb_convert_encoding($elem->textContent, 'UTF-8', 'utf8mb4');

get file name with diacritic PHP

there are a lot of topics about diacritics/accents in PHP but none of them solved my problem.
I have this code:
<!DOCTYPE html>
<html lang="sk">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
</head>
<body>
<?php
$items = scandir("test/");
echo $items[3];
?>
</body>
</html>
$items[3] is ľšá.png but it displays: ğšá.png
I tried:
foreach(mb_list_encodings() as $chr){
echo mb_convert_encoding($items[3], 'UTF-8', $chr) ." : ".$chr."<br>";
}
But none of them is right for me.
I also tried to put this before scandir():
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
ini_set('default_charset', 'utf-8');
But no change.
It is very strange because my website have always been working before I saw the issue (today) and I did not affect any code.
You tried to convert from 1-byte encodings to UTF-8 (double-byte), but that wrong file name that you see has double characters in it, so its already UTF-8!
You need to convert it from UTF-8, and for me it worked like this:
mb_convert_encoding($items[3], "ISO-8859-15", 'UTF-8'); // its to ISO from UTF-8
Personally I use iconv
echo iconv("UTF-8","ISO-8859-15",$items[3]); // its from UTF-8 to ISO
but i think its no big difference if either of them actually works.
Also I suggest you to check file names on your webserver if they accidentally has been converted when uploaded.

Special Characters In Php and MySQL

I've got a problem with some specials characters in PHP. I have a table in mysql (utf8_hungarian_ci) that contains some text with special characters like á, á, Ó, Ö, ö, ü, and I would like to show this text on my page. I've tested:
$text = htmlentities($text); //to convert the simple spec chars
$search = array("& otilde;","&O tilde;","& ucirc;","&U circ;");
$replace = array("& #337;","& #336;","& #369;","& #368;");
$text = str_replace($search, $replace, $text);
echo $text;
But this code works only if $text isn't set from database. If I use this code and my $text is selected from database, it doesn't shows me any text, and if I only use:
echo $text; without htmlentities and replacements
I get characters like this one: �
I know there were some questions about this and I have tried accepted answers, but it still doesn't work, so please help me if you want and if you have time. Thank you anyway. A good day to you all!
Also try setting in your header to use UTF-8 encoding.
In your PHP file, add
header('Content-type: text/html; charset=utf-8');
as well as specifying the encoding to be UTF-8 in your <meta> tag, to ensure that you told the browser. And see if it fixes the issue.
As well as including UTF-8 encoding in your meta tag.
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
...
</head>
Edit:
If you have access to Apache configuration, see if AddDefaultCharset is set to another encoding.
Try using mysql_set_charset() (mysqli_set_charset() if you're using MySQLi).
Try to put this in you html header:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
(Also, you may need to save your file in "utf-8" file encoding)
.
Secondly, you could use this to try to tranlate-or-remove the disturbing char that always prints out in your case:
$str_out = #iconv("ISO-8859-1", "UTF-8//TRANSLIT//IGNORE", $str_in);
This is a slightly generic answer but please read up this article I wrote on common character-encoding pitfalls in the PHP/MySQL stack and if you still have problems let's try to work through them.
http://webmonkeyuk.wordpress.com/2011/04/23/how-to-avoid-character-encoding-problems-in-php/

mbfpdf does not work

I try to make a pdf file which has some japanese character. However, the output file is some strange character. I use mbfpdf instead of fpdf.
<?php
define('FPDF_FONTPATH','fpdf/font/');
require('fpdf/mbfpdf.php');
$pdf=& new MBFPDF('P','mm','A4');
$pdf->AddMBFont(GOTHIC ,'EUC-JP');
$pdf->AddPage();
$pdf->SetFont(GOTHIC,'',20);
$pdf->Write(20,'日本語');
$pdf->Output('test.pdf');
?>
You can convert to ISO-8859-1 with utf8_decode() (some inaccuracy):
$str = utf8_decode($str);
or if iconv extension is available (preferred):
$str = iconv('UTF-8', 'windows-1252', $str);
Add the below line inside the head tags
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If you are getting garbage texts after executing mysql queries, execute both the below queries first.
SET NAMES utf8
SET CHARACTER SET utf8

Getting ’ instead of an apostrophe(') in PHP

I've tried converting the text to or from utf8, which didn't seem to help.
I'm getting:
"It’s Getting the Best of Me"
It should be:
"It’s Getting the Best of Me"
I'm getting this data from this url.
To convert to HTML entities:
<?php
echo mb_convert_encoding(
file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
"HTML-ENTITIES",
"UTF-8"
);
?>
See docs for mb_convert_encoding for more encoding options.
Make sure your html header specifies utf8
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
That usually does the trick for me (obviously if the content IS utf8).
You don't need to convert to html entities if you set the content-type.
Your content is fine; the problem is with the headers the server is sending:
Connection:Keep-Alive
Content-Length:502
Content-Type:text/html
Date:Thu, 18 Feb 2010 20:45:32 GMT
Keep-Alive:timeout=1, max=25
Server:Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.7 with Suhosin-Patch
X-Powered-By:PHP/5.2.4-2ubuntu5.7
Content-Type should be set to Content-type: text/plain; charset=utf-8, because this page is not HTML and uses the utf-8 encoding. Chromium on Mac guesses ISO-8859-1 and displays the characters you're describing.
If you are not in control of the site, specify the encoding as UTF-8 to whatever function you use to retrieve the content. I'm not familiar enough with PHP to know how exactly.
I know the question was answered but setting meta tag didn't help in my case and selected answer was not clear enough, so I wanted to provide simpler answer.
So to keep it simple, store string into a variable and process that like this
$TVrageGiberish = "It’s Getting the Best of Me";
$notGiberish = mb_convert_encoding($TVrageGiberish, "HTML-ENTITIES", 'UTF-8');
echo $notGiberish;
Which should return what you wanted It’s Getting the Best of Me
If you are parsing something, you can perform conversion while assigning values to a variable like this, where $TVrage is array with all the values, XML in this example from a feed that has tag "Title" which may contain special characters such as ‘ or ’.
$cleanedTitle = mb_convert_encoding($TVrage->title, "HTML-ENTITIES", 'UTF-8');
If you're here because you're experiencing issues with junk characters in your WordPress site, try this:
Open wp-config.php
Comment out define('DB_CHARSET', 'utf8') and define('DB_COLLATE', '')
/** MySQL hostname */
define('DB_HOST', 'localhost');
/** Database Charset to use in creating database tables. */
//define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
//define('DB_COLLATE', '');
It sounds like you're using standard string functions on a UTF8 characters (’) that doesn't exist in ISO 8859-1. Check that you are using Unicode compatible PHP settings and functions. See also the multibyte string functions.
We had success going the other direction using this:
mb_convert_encoding($text, "HTML-ENTITIES", "ISO-8859-1");
Just try this
if $text contains strange charaters do this:
$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');
and you are done..
if all seems not to work, this could be your best solution.
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
==or==
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
try this :
html_entity_decode(mb_convert_encoding(stripslashes($text), "HTML-ENTITIES", 'UTF-8'))
For fopen and file_put_contents, this will work:
str_replace("’", "'", htmlspecialchars_decode(mb_convert_encoding($string_to_be_fixed, "HTML-ENTITIES", "UTF-8")));
You Should check encode encoding origin then try to convert to correct encode type.
In my case, I read csv files then import to db. Some files displays well some not. I check encoding and see that file with encoding ASCII displays well, other file with UTF-8 is broken. So I use following code to convert encoding:
if(mb_detect_encoding($content) == 'UTF-8') {
$content = iconv("UTF-8", "ASCII//TRANSLIT", $content);
file_put_contents($file_path, $content);
} else {
$content = mb_convert_encoding($content, 'UTF-8', 'UTF-8');
file_put_contents($file_path, $content);
}
After convert I push the content to file then process import to DB, now it displays well in front-end
If none of the above solutions work:
In my case I noticed that the single quote was a different style of single quote. Instead of ' my data had a ’. Notice the difference in the single quote? So I simply wrote a str_replace to replace it and it fixed the problem. Probably not the most elegant solution but it got the job done.
$string= str_replace("’","'",$string);
I looked at the link, and it looks like UTF-8 to me. i.e., in Firefox, if you pick View, Character Encoding, UTF-8, it will appear correctly.
So, you just need to figure out how to get your PHP code to process that as UTF-8. Good luck!
use this
<meta http-equiv="Content-Type" content="text/html; charset=utf8_unicode_ci" />
instead of this
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If nothing works try this mb_convert_encoding($elem->textContent, 'UTF-8', 'utf8mb4');

Categories