I'm using PHP imap to read emails out of an inbox. It extracts some information from headers. One of the headers looks like this:
X-My-Custom-Header: =?UTF-8?B?RXVnZW4gQmFiacSH?=
The original value of that encoded string is Eugen Babić.
When I try to decode that string using PHP, I can't get it quite right: the ć always comes back messed up.
I've tried imap_utf8, imap_mime_header_decode and a bunch of others I can't quite recall. They either don't return anything at all, or they mess up the ć as I mentioned before.
What is the correct way to decode this?
imap_utf8 and imap_mime_header_decode work just fine; there's also iconv_mime_decode:
php > echo imap_utf8('X-My-Custom-Header: =?UTF-8?B?RXVnZW4gQmFiacSH?='), "\n";
X-My-Custom-Header: Eugen Babić
php > list($k,$v) = imap_mime_header_decode('X-My-Custom-Header: =?UTF-8?B?RXVnZW4gQmFiacSH?=');
php > echo $v->text, "\n";
Eugen Babić
php > echo iconv_mime_decode('X-My-Custom-Header: =?UTF-8?B?RXVnZW4gQmFiacSH?=', 0, "UTF-8"), "\n";
X-My-Custom-Header: Eugen Babić
It seems that imap_utf8 returns its output in NFD, so that the accent over the c may appear out of place in some settings.
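If the decoded string really is in decomposed form (NFD), it can be recomposed to NFC before display or comparison. A minimal sketch, assuming the intl extension is available; the helper name to_nfc is hypothetical, and the sample string is built by hand rather than taken from imap_utf8 output:

```php
<?php
// Hypothetical helper: recompose a UTF-8 string to NFC when intl is loaded.
function to_nfc(string $s): string
{
    if (!class_exists('Normalizer')) {
        return $s; // intl not installed: return the input unchanged
    }
    $normalized = Normalizer::normalize($s, Normalizer::FORM_C);
    return $normalized === false ? $s : $normalized;
}

// "c" followed by U+0301 COMBINING ACUTE ACCENT is the NFD spelling of "ć".
$nfd = "Eugen Babic\u{0301}";
$nfc = to_nfc($nfd);

// Compare the raw bytes before and after normalization.
echo bin2hex($nfd), "\n";
echo bin2hex($nfc), "\n";
```

With intl loaded, the combining sequence should collapse into the single precomposed code point U+0107.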
Here's what you're doing wrong: your HTML (as generated by the PHP) is not UTF-8 encoded. So even though PHP is returning the accented c, the page isn't displaying it correctly.
To fix it, add this in your <head> tag:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
The function mb_decode_mimeheader() solved the problem:
"fromName" => (isset($fromInfo->personal))
? mb_decode_mimeheader( $fromInfo->personal) : "",
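For reference, a self-contained sketch of the same call applied to the header value from the question. It assumes the mbstring extension is loaded; setting the internal encoding explicitly is a precaution, since mb_decode_mimeheader() converts its result to that encoding:

```php
<?php
// Make sure decoded text comes back as UTF-8.
mb_internal_encoding('UTF-8');

// Encoded value taken from the X-My-Custom-Header example in the question.
$encoded = '=?UTF-8?B?RXVnZW4gQmFiacSH?=';
$decoded = mb_decode_mimeheader($encoded);

echo $decoded, "\n";
```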
I'm parsing HTML from a website using simplehtmldom_1_5. When I echo the parsed text to the screen it's printed correctly, but when I try to save it to a file using file_put_contents, my string is encoded as HTML decimal character references:
&#40b&#46 andersson&#44
I've already tried every possible combination of utf8_encode, utf8_decode, htmlentities and so on, but nothing worked. I have the same problem when I try to insert into a MySQL table.
mb_detect_encoding for the parsed text returns ASCII.
Any suggestions?
header('Content-Type: text/html; charset=utf-8');
ini_set('max_execution_time', 0);
include 'simplehtmldom_1_5/simple_html_dom.php';
$html = file_get_html($curr_url);
$texts = $html->find('div[id=content_h]');
foreach($texts as $text) {
file_put_contents('queries.txt', $text->innertext . "\n", FILE_APPEND);
}
Did you also try html_entity_decode (http://de1.php.net/html_entity_decode)?
That's the function that converts entities back to plain text.
*edit
I just tested this to verify it's working.
Yes, it works, BUT:
your data is incorrect!
Every single entity is missing the semicolon at its end!
That's why decoding only works in lenient browser rendering engines...
Your data should look like this:
&#40;b&#46;
and not like this:
&#40b&#46
See the difference?
Finally, this worked for me:
preg_replace('/&#(\d+)/me',"chr(\\1)", $text)
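Note that the /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7, and chr() only yields single bytes. A hedged alternative sketch: first append the missing semicolons, then let html_entity_decode() do the decoding. The function name decode_loose_entities is hypothetical, and the pattern assumes an entity is never directly followed by an unrelated digit:

```php
<?php
// Hypothetical helper: repair numeric entities that lack ";" and decode them.
function decode_loose_entities(string $text): string
{
    // "&#40" and "&#40;" both become "&#40;".
    $repaired = preg_replace('/&#(\d+);?/', '&#$1;', $text);

    // Decode the now well-formed entities into UTF-8 text.
    return html_entity_decode($repaired, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Sample input in the semicolon-less shape described in the answer.
$sample  = '&#40b&#46 andersson&#44 ';
$decoded = decode_loose_entities($sample);

echo $decoded, "\n";
```

Repairing the entities first keeps the decoding itself in a standard function, which also handles code points above 255.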
So I'm trying to find a fast way to show all my results from my database, but I can't seem to figure out why I need to add the utf8_encode() function to all of my text in order to show all my characters properly.
For the record, my database information is both French and English, so I will need special characters including à, ç, è, é, ê, î, ö, ô, ù (and more).
My form's page has the following tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
My database, all my tables and all my fields are set to utf8_general_ci.
When I want to echo the database information onto the page, I use this query:
public function read_information()
{
global $db;
$query = "SELECT * FROM table WHERE id='1' LIMIT 1";
return $db->select($query);
}
and return the information like so:
$info = $query->read_information();
<?php foreach ( $info as $dbinfo ) { ?>
<pre><?php echo $dbinfo->column; ?></pre>
<?php } ?>
However, if I have French characters in my string, I need to write <pre><?php echo utf8_encode($dbinfo->column); ?></pre>, and this is something I really want to avoid.
I have read up the documentation on PHP.net regarding utf8_encode/utf8_decode, htmlentities/html_entity_decode and quite a few more. However, I can't seem to figure out why I need to add a special function for every database result.
I have also tried using mysqli_query("SET NAMES 'utf8'", $mysqli); but this doesn't solve my problem. I guess what I'm looking for is some kind of shortcut where I don't have to create a function like make_this_french_friendly() type of thing.
Ensure the whole stack you are working with is set to UTF-8: the database, the web server, the page meta tags, and so on.
Check settings such as:
ini_set('default_charset', 'utf-8')
In my experience, plain output should then display correctly.
As @deceze pointed out, this thread provided proper insight using $mysqli->set_charset('utf8');.
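A minimal sketch of that approach, assuming the mysqli extension; the function name open_utf8_connection and the credentials in the usage comment are placeholders:

```php
<?php
// Hypothetical helper: open a mysqli connection that talks UTF-8 to the server.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $pass, $db);

    // Set the connection character set so results arrive as UTF-8
    // and no per-column utf8_encode() call is needed.
    $mysqli->set_charset('utf8');

    return $mysqli;
}

// Usage (placeholder credentials):
// $mysqli = open_utf8_connection('localhost', 'user', 'secret', 'mydb');
```

Setting the charset once on the connection is what removes the need for a per-field conversion function. In MySQL, 'utf8' covers only characters of up to three bytes; 'utf8mb4' is the full UTF-8 character set.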
Maybe use UTF-8 without BOM encoding for your file?
header('Content-type: text/html; charset=utf-8');
... in PHP (you can also do it with "ini_set()" function) and:
<meta charset="utf-8">
... in HTML.
You also have to set the right encoding for your database tables.
Possible duplicate of "GET" method encoding French characters incorrectly in PHP
Maybe your text encoding is not UTF-8.
Please look at: What's different between UTF-8 and UTF-8 without BOM?
Maybe it can help you.
I am trying to get foreign characters to display correctly on my website.
When I try to write: "Português" it will output this:
Português
The code I use is:
$name = htmlspecialchars(stripslashes($f['forum_name']));
I also tried this:
$name = html_entity_decode(stripslashes(stripslashes($f['forum_desc'])));
But that gave me:
Português
What am I doing wrong?
Edit: $f is coming from this:
$sf=mysql_query("SELECT * FROM forum_cats WHERE forum_type='0' AND forum_type_id='".$h['forum_id']."'");
First, make sure your PHP program file is saved with UTF-8 encoding. (a decent editor should allow you to set the encoding)
Second, make sure that your HTML code specifies UTF-8 encoding by including the following meta tag in your HTML head:
<meta charset="UTF-8">
Thirdly, throw away all that entity decoding and especially throw away the stripslashes().
You may also need to do further work to make sure that everything in your system is using UTF-8 encoding (eg the database, other input files).
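If the value still needs escaping for HTML output, a sketch that escapes without touching the encoding might look like this. It assumes the string is already valid UTF-8; the sample value stands in for $f['forum_name']:

```php
<?php
// Stand-in for a value read from the database over a UTF-8 connection.
$forumName = 'Português <b>';

// Escape HTML metacharacters only; accented letters are left as they are.
$name = htmlspecialchars($forumName, ENT_QUOTES, 'UTF-8');

echo $name, "\n";
```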
Make use of utf8_decode:
<?php
echo utf8_decode("Português");//Português
EDIT : (From your latest question update)
Add this on top of your PHP code.
<?php
ini_set('default_charset','utf-8');
mysql_set_charset('utf8');
header('Content-type: text/html; charset=utf-8');
Try this:
<?php echo iconv(mb_detect_encoding($f['forum_name'], "UTF-8,ISO-8859-1"), "UTF-8", $f['forum_name']); ?>
Use mb_detect_encoding() to detect the character set of your strings and iconv() to convert the string to the requested character encoding.
You can look up mb_detect_encoding and iconv in the official documentation.
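Detection by guessing can misfire, so a stricter variant is to check whether the string is already valid UTF-8 and convert only if it is not. A sketch, assuming mbstring is available and that any non-UTF-8 input is ISO-8859-1; the helper name to_utf8 is hypothetical:

```php
<?php
// Hypothetical helper: return $s as UTF-8, converting from ISO-8859-1 if needed.
function to_utf8(string $s): string
{
    // Already valid UTF-8: leave it alone to avoid double encoding.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Assumption: anything else is ISO-8859-1.
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$fromLatin1 = to_utf8("Portugu\xEAs");      // "ê" as the single ISO-8859-1 byte 0xEA
$fromUtf8   = to_utf8("Portugu\u{00EA}s");  // already UTF-8

echo $fromLatin1, "\n", $fromUtf8, "\n";
```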
I'm having some trouble with my $_POST/$_REQUEST data: it still appears to be UTF-8 encoded.
I am sending conventional ajax post requests, in these conditions:
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
js file saved under utf8-nobom format
meta tags set up in the HTML <head> tag
php files saved under utf-8-nobom format as well
encodeURIComponent is used but I tried without and it gives the same result
Ok, so everything is fine: the database is also in utf8, and receives it this way, pages show well.
But when I receive the character "º", for example (through $_REQUEST or $_POST), its binary representation is 11000010 10111010, while the binary representation of "º" hardcoded in PHP (utf8...) is only 10111010.
wtf? I just don't know whether this is a good thing or not. For instance, if I use "#º#" as the delimiter for PHP's explode function, it isn't detected, and this is actually the problem that led me here.
Any help will be as usual greatly appreciated, thank you so much for your time.
Best rgds.
EDIT1: checking against mb_check_encoding
if (mb_check_encoding($_REQUEST[$i], 'UTF-8')) {
raise('$_REQUEST is encoded properly in utf8 at index ' . $i);
} else {
raise(false);
}
The encoding was confirmed: the message was raised as expected.
Single-byte UTF-8 characters do not have bit 7 (the eighth bit) set, so 10111010 on its own is not UTF-8; your PHP file is probably saved as ISO-8859-1.
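The byte difference is easy to check. A small sketch, assuming mbstring is available, that prints the bytes of "º" in both encodings and shows why a delimiter in one encoding does not match data in the other:

```php
<?php
$utf8   = "\u{00BA}";                                        // "º" as UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // "º" as ISO-8859-1

echo bin2hex($utf8), "\n";   // two bytes
echo bin2hex($latin1), "\n"; // one byte

// Data as it arrives in $_POST when the request is sent as UTF-8.
$data = 'a#' . $utf8 . '#b';

// Delimiter typed in a PHP file saved as ISO-8859-1: not found in the data.
$mismatch = explode('#' . $latin1 . '#', $data);

// Delimiter in the same encoding as the data: splits as intended.
$match = explode('#' . $utf8 . '#', $data);

var_dump(count($mismatch), $match);
```

Saving the PHP file as UTF-8 puts the hardcoded delimiter in the same encoding as the incoming data.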
I have the following code. When I run it in Firefox it works fine and gives the output I want, but when I run it in Google Chrome the output is not displayed correctly. Any idea why?
<?php
$encode=utf8_encode("වවවවවවවවවවව");
$decode=utf8_decode($encode);
print_r($decode);
die;
?>
thanks in advance
roshan
This code makes no sense. utf8_encode() is a function to convert ISO-8859-1 data into UTF-8.
Your data is Sinhalese text, which isn't part of ISO-8859-1. It is extremely likely to be mangled by the first utf8_encode() call, which treats every input byte as a separate ISO-8859-1 character.
I guess the answer in this specific situation is, don't use utf8_encode(). If that doesn't work for you, please provide some more context about what you are doing. Maybe you are looking for iconv()?
Might as well move this to answer section:
Define UTF-8 as your charset in your <head>
<meta charset="utf-8">