PHP imap how to decode email body correctly? - php

I've developed my own email system using PHP's IMAP extension, and everything works fine except for emails written in Arabic. I've tried all the decoding functions and nothing seems to work. I got the subject to decode perfectly using imap_utf8(), but not the email body.
This is what the email body looks like (the readable Arabic fragments are a truncated "hello" and "this message"):
�
رحبا
هاي الرسالة �
This is my code:
$text = imap_fetchbody($imap, $uid, $partNumber, FT_UID);
switch ($structure->encoding) {
    case 3: // ENCBASE64
        return imap_base64($text);
    case 4: // ENCQUOTEDPRINTABLE
        return imap_qprint($text);
    default:
        return $text;
}
Can anyone help with this issue? Thanks.

I suggest you have a look at this: https://github.com/mantisbt-plugins/EmailReporting/blob/master/core/Mail/Parser.php
It uses the same decoding steps you already use, but adds character set conversion on top of them.
In the subject, the character set can be declared inline for each encoded word. For an email body, one character set (declared in the part's Content-Type header) applies to the entire body.
The linked script uses a PEAR package rather than the IMAP extension, but based on your input it should work in much the same way.
Hope this helps
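The two steps that answer describes can be sketched in PHP as follows. This is a minimal illustration, not the linked parser: the function name decode_part_body is hypothetical, and it assumes $encoding and $parameters come from the matching part returned by imap_fetchstructure(). It uses base64_decode()/quoted_printable_decode() (equivalent to imap_base64()/imap_qprint()) and then converts from the part's declared charset to UTF-8, which is the step missing from the question's code.

```php
<?php
// Sketch: decode a MIME body part in two steps. $encoding and $parameters are
// assumed to come from the matching part of imap_fetchstructure()
// (its ->encoding and ->parameters properties).
function decode_part_body(string $raw, int $encoding, array $parameters): string
{
    // Step 1: undo the Content-Transfer-Encoding (3 = ENCBASE64, 4 = ENCQUOTEDPRINTABLE).
    switch ($encoding) {
        case 3:
            $text = base64_decode($raw);
            break;
        case 4:
            $text = quoted_printable_decode($raw);
            break;
        default:
            $text = $raw;
    }

    // Step 2: find the part's charset parameter (RFC 2045 default: US-ASCII).
    $charset = 'US-ASCII';
    foreach ($parameters as $param) {
        if (strtolower($param->attribute) === 'charset') {
            $charset = $param->value;
        }
    }

    // Step 3: convert the bytes to UTF-8 unless they already are.
    $upper = strtoupper($charset);
    if ($upper === 'UTF-8' || $upper === 'US-ASCII') {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $charset);
}
```

A possible call site is decode_part_body(imap_fetchbody($imap, $uid, $partNumber, FT_UID), $part->encoding, $part->ifparameters ? $part->parameters : []). If mbstring does not recognise the part's charset in your PHP version, iconv() is an alternative for step 3.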

Try using the quoted_printable_decode() function, as stated here.

Related

Convert and separate emoji from strings

I have been searching for some time for a way to accomplish this. I am not an expert in this emoji "stuff", and I need some help.
I have an application with multiple SMS service providers attached (two of them at the moment), which deliver SMS messages to us (inbound) and through which we send SMS messages (outbound), all via API.
When an SMS message is received (from either provider), our platform sees the incoming text (with emojis) as:
"hello \ud83e\udd2a"
I have already changed my database to store the emojis, and I changed the charset in the PHP application to display them correctly in HTML (including the email forward), so that part is fine.
The issue I am running into is that one of the providers (when sending) accepts this as a valid emoji:
"hello to you \ud83e\udd2a"
But the other does not. The second provider needs (I think) the decimal HTML entity format, so when I send to them, it needs to look like this:
"hello to you &#128512 ;"
I have 2 separate php functions to send to each provider, so I can do any conversion code from the front-app I need to before it sends it to the provider.
My front-end is using a jQuery emoji picker, so the PHP form post sends "hello to you \ud83e\udd2a" to the php function that calls the API.
Any insight you can give will be greatly appreciated!
Thanks in advance.
This is what I tried:
$utf32 = mb_convert_encoding($message['text'], 'UTF-32', 'UTF-8' );
$hex4 = bin2hex($utf32);
$dec = hexdec($hex4);
$emoji_replaced = "&#$dec;";
echo "&#$dec;";
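This gave me a decimal value for the whole string instead of for each emoji.
As a start, \ud83e\udd2a is a UTF-16 surrogate pair written as JSON escapes (these escapes are part of the JSON spec). When the JSON document is decoded, the pair becomes the single standard 🤪 emoji, code point U+1F92A.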
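The arithmetic behind that decoding step is short enough to show. This sketch (the function name surrogates_to_codepoint is hypothetical) combines a high and a low surrogate into one code point and checks the result against what json_decode() produces.

```php
<?php
// Combine a UTF-16 surrogate pair (high, low) into one Unicode code point:
// 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
function surrogates_to_codepoint(int $high, int $low): int
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

// \ud83e\udd2a from the question
$cp = surrogates_to_codepoint(0xD83E, 0xDD2A);
echo $cp, PHP_EOL; // 129322, i.e. U+1F92A
echo json_decode('"\ud83e\udd2a"') === mb_chr($cp, 'UTF-8') ? 'match' : 'no match', PHP_EOL; // match
```

The decimal value 129322 is exactly what a decimal HTML entity for this emoji contains.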
As for the provider that "doesn't support" emojis, they are likely using a text encoding other than UTF-8, which will fail to display any "special" character or non-English text, not just emojis. This is further supported by the fact that they can still render the emoji when you send it as an HTML entity.
While you can use the code below to convert emojis to HTML entity sequences, you should also check the other provider's docs for the encoding they use and convert the messages appropriately. That said, non-UTF-8 encodings simply do not contain emojis at all, so you may still want to convert them to entities first.
As their name implies, HTML entities will only work when viewed as part of an HTML document. YMMV.
<?php
function emoji_to_entity($emoji) {
return sprintf(
'&#%s;',
unpack(
'Ntgt',
mb_convert_encoding($emoji, 'UTF-32', 'UTF-8')
)['tgt']
);
}
// PHP>=8.2 supports the Extended_Pictographic property
//$emoji_regex = '/\p{Extended_Pictographic}/u';
// PHP<8.2 needs to use a monstrous regex like this
// culled from: https://raw.githubusercontent.com/PCRE2Project/pcre2/master/maint/Unicode.tables/emoji-data.txt
$emoji_regex = '/[\x{23}-\x{23}\x{2a}-\x{2a}\x{30}-\x{39}\x{a9}-\x{a9}\x{ae}-\x{ae}\x{200d}-\x{200d}\x{203c}-\x{203c}\x{2049}-\x{2049}\x{20e3}-\x{20e3}\x{2122}-\x{2122}\x{2139}-\x{2139}\x{2194}-\x{2199}\x{21a9}-\x{21aa}\x{231a}-\x{231b}\x{2328}-\x{2328}\x{2388}-\x{2388}\x{23cf}-\x{23cf}\x{23e9}-\x{23f3}\x{23f8}-\x{23fa}\x{24c2}-\x{24c2}\x{25aa}-\x{25ab}\x{25b6}-\x{25b6}\x{25c0}-\x{25c0}\x{25fb}-\x{25fe}\x{2600}-\x{2605}\x{2607}-\x{2612}\x{2614}-\x{2685}\x{2690}-\x{2705}\x{2708}-\x{2712}\x{2714}-\x{2714}\x{2716}-\x{2716}\x{271d}-\x{271d}\x{2721}-\x{2721}\x{2728}-\x{2728}\x{2733}-\x{2734}\x{2744}-\x{2744}\x{2747}-\x{2747}\x{274c}-\x{274c}\x{274e}-\x{274e}\x{2753}-\x{2755}\x{2757}-\x{2757}\x{2763}-\x{2767}\x{2795}-\x{2797}\x{27a1}-\x{27a1}\x{27b0}-\x{27b0}\x{27bf}-\x{27bf}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{2b50}-\x{2b50}\x{2b55}-\x{2b55}\x{3030}-\x{3030}\x{303d}-\x{303d}\x{3297}-\x{3297}\x{3299}-\x{3299}\x{fe0f}-\x{fe0f}\x{1f000}-\x{1f0ff}\x{1f10d}-\x{1f10f}\x{1f12f}-\x{1f12f}\x{1f16c}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}-\x{1f18e}\x{1f191}-\x{1f19a}\x{1f1ad}-\x{1f1ff}\x{1f201}-\x{1f20f}\x{1f21a}-\x{1f21a}\x{1f22f}-\x{1f22f}\x{1f232}-\x{1f23a}\x{1f23c}-\x{1f23f}\x{1f249}-\x{1f53d}\x{1f546}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{1f774}-\x{1f77f}\x{1f7d5}-\x{1f7ff}\x{1f80c}-\x{1f80f}\x{1f848}-\x{1f84f}\x{1f85a}-\x{1f85f}\x{1f888}-\x{1f88f}\x{1f8ae}-\x{1f8ff}\x{1f90c}-\x{1f93a}\x{1f93c}-\x{1f945}\x{1f947}-\x{1faff}\x{1fc00}-\x{1fffd}\x{e0020}-\x{e007f}]/u';
$input = json_decode('"hello to you \ud83e\udd2a"');
var_dump(
$input,
preg_replace_callback(
$emoji_regex,
function($a) { return emoji_to_entity($a[0]); },
$input
)
);
Output:
string(17) "hello to you 🤪"
string(22) "hello to you &#129322;"
And outside of a code block: string(22) "hello to you 🤪"
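If the second provider's encoding cannot carry any non-ASCII character (not only emojis), a simpler alternative to the emoji regex is to turn every code point above U+007F into a decimal entity with mbstring's mb_encode_numericentity(). This is a sketch under that assumption, not part of the answer above; the function name non_ascii_to_entities is hypothetical.

```php
<?php
// Sketch: turn every non-ASCII character into a decimal HTML entity.
// The map is [start, end, offset, mask] for each code point range.
function non_ascii_to_entities(string $text): string
{
    return mb_encode_numericentity($text, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

$input = json_decode('"hello to you \ud83e\udd2a"');
echo non_ascii_to_entities($input), PHP_EOL; // hello to you &#129322;
```

Unlike the regex, this also converts accented letters and non-Latin text, which is what you want when the provider cannot represent them either.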

phpmailer subject from variable creates encoding issue

I am using phpmailer to send emails using php.
When I try to send special (Turkish) characters in the subject, the sent email displays HTML entities. If I include the same variable in the body, it works fine. Please see below:
$mail->Subject = $stuname."PhD Qualifying Exam Application";
I have tried the html_entity_decode() function, but it didn't work.
Also, if I type the Turkish character instead of getting it from a variable, it works fine.
Finally, if I print the variable before sending the email, it prints fine without any encoding problem, but the number of characters is larger than it should be.
So, any idea why I am having an encoding problem in the subject when getting the value from a variable?
Thank you!
PS:
I am also setting these options:
$mail->SetLanguage("tr", "phpmailer/language");
$mail->CharSet ="utf-8";
$mail->Encoding="base64";
I found a solution: none of the HTML decoding functions were working, so I wrote my own function for Turkish characters.
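function replacehtml($inputText) {
    // Turkish characters to put back into the text
    $replace = array('İ','ı','Ö','ö','Ü','ü','Ç','ç','Ğ','ğ','Ş','ş');
    // Entities to look for (assumed numeric; adjust to the entities actually present)
    $search = array('&#304;','&#305;','&#214;','&#246;','&#220;','&#252;','&#199;','&#231;','&#286;','&#287;','&#350;','&#351;');
    $outputText = str_replace($search, $replace, $inputText);
    return $outputText;
}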
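The observation that the variable prints fine but has more characters than expected suggests it holds HTML entities. If so, html_entity_decode() with an explicit 'UTF-8' charset should cover all Turkish characters without a hand-written table; one plausible reason it seemed not to work is that older PHP versions defaulted to ISO-8859-1, which cannot represent İ, ı, Ğ, ğ, Ş or ş. A sketch, with the hypothetical helper name subject_text:

```php
<?php
// Sketch: decode HTML entities into UTF-8 text before using it in the subject.
// Assumption: the input holds entities such as &#304; or &ouml;.
function subject_text(string $value): string
{
    return html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo subject_text('&#304;pek &Ccedil;&ouml;l'), PHP_EOL; // İpek Çöl
```

With $mail->CharSet = "utf-8" already set, the subject could then be built as subject_text($stuname) . " PhD Qualifying Exam Application".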

IMAP Encoding and decoding: UTF-8 issue

I have a function which is meant to move a mail from one folder to another on a gmail account.
The function moves the mail correctly. My problem appears when working with mailbox names that contain non-ASCII characters: I decode the mailbox names from the IMAP folder list response, but dumping the decoded name and the expected string gives different results.
// Getting the folders
$folders = imap_list(CONNECTION, MAILBOX, PATTERN);
// After a foreach, stripping slash, prefix and such
// $folder is the raw mailbox name from the IMAP list
$mailbox = utf8_encode(imap_utf7_decode($folder)); // = string(12) "Tæstbåks"
// The entered search from the client
$search_for = "Tæstbåks"; // = string(10) "Tæstbåks"
if($search_for == $mailbox)
print "Yeah!";
else
print "Noo!";
I do not understand why these two strings do not match.
PHP's function imap_utf7_decode($folder) is documented to return a string in ISO-8859-1 encoding. Given that IMAP's modified UTF-7 scheme can encode the whole range of Unicode (which means "a lot") and that ISO-8859-1 can only represent 256 individual characters, you cannot possibly use that function in this context. I would go as far as to suggest that the PHP developer who decided to offer such a useless function was not in his best shape the day he designed it.
It looks like the mbstring extension can do what you really want here: use something like $mailbox = mb_convert_encoding($folder, "UTF-8", "UTF7-IMAP"), as suggested in the user comments on the PHP documentation page.
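A sketch of that suggestion, converting in both directions so that the comparison happens in UTF-8 and IMAP calls such as imap_mail_move() still receive the modified UTF-7 name. The helper names are hypothetical, and the raw folder value is an assumed example of what imap_list() returns once the server prefix is stripped.

```php
<?php
// Convert IMAP modified UTF-7 mailbox names (RFC 3501) to UTF-8 and back.
function mailbox_to_utf8(string $raw): string
{
    return mb_convert_encoding($raw, 'UTF-8', 'UTF7-IMAP');
}

function mailbox_from_utf8(string $name): string
{
    return mb_convert_encoding($name, 'UTF7-IMAP', 'UTF-8');
}

$folder = 'T&AOY-stb&AOU-ks'; // hypothetical raw name, server prefix already stripped
$search_for = 'Tæstbåks';
print mailbox_to_utf8($folder) === $search_for ? "Yeah!\n" : "Noo!\n"; // Yeah!
```

The IMAP extension also offers imap_mutf7_to_utf8() and imap_utf8_to_mutf7() for the same conversions.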

Is there a way to change the encoding of the headers in SwiftMailer?

I'm using SwiftMailer to send emails, but I have some encoding problems with UTF-8 subjects. SwiftMailer uses QPHeaderEncoder by default to encode email headers, and its safeMap seems to have problems with some UTF-8 French characters. One subject I use contains the word trouvé (French for "found"), and when the subject reaches the user it shows trouv.
I'd like to use something similar to NativeQPContentEncoder, which is available as a content encoder, but for headers there are only the Base64 and Quoted-Printable encoders.
Is there a way to fix this? Maybe I'm doing something wrong, so here is the code I'm using:
$message = Swift_Message::newInstance()
// set encoding in 8 bit
->setEncoder(Swift_Encoding::get8BitEncoding())
// Give the message a subject
->setSubject($subject)
// Set the From address with an associative array
->setFrom(array($from => $niceFrom))
// Set the To addresses with an associative array
->setTo(array($to)) ;
Check whether the mbstring.func_overload option in your PHP configuration has any value other than 0. If it does, change it to 0, reload your web server and try sending the message again.
mbstring.func_overload replaces some PHP string functions with their mbstring counterparts and may lead to tricky bugs with UTF-8 (the option was deprecated in PHP 7.2 and removed in PHP 8.0).
Personally, I solved exactly this problem by disabling mbstring.func_overload.
First, make sure you know how your subject string is encoded. If it is not UTF-8, convert it (utf8_encode() does this, but only for ISO-8859-1 input).
Also, make sure you call setCharset('utf-8') on your message.
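A sketch of the "make sure the subject is UTF-8" step. It assumes that any subject which is not valid UTF-8 is ISO-8859-1 (the only input utf8_encode() handles); the helper name subject_to_utf8 is hypothetical, and mb_convert_encoding() is used because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Sketch: return the subject as valid UTF-8.
// Assumption: input that is not valid UTF-8 is ISO-8859-1.
function subject_to_utf8(string $subject): string
{
    if (mb_check_encoding($subject, 'UTF-8')) {
        return $subject; // already UTF-8, leave untouched
    }
    return mb_convert_encoding($subject, 'UTF-8', 'ISO-8859-1');
}

echo subject_to_utf8("trouv\xE9"), PHP_EOL; // trouvé
```

The result can then be passed to the message as ->setCharset('utf-8')->setSubject(subject_to_utf8($subject)).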

ASCII-characters instead of Swedish chars?

I have tested PHP's IMAP library to fetch emails from a Gmail account, but I just can't figure out how to make the characters display correctly.
At first, I was close to pulling my hair out when I realized that I had accidentally fetched the attachments instead of the message body. That is solved now, but I still have problems viewing the actual messages with the appropriate Swedish characters, like åÅ äÄ öÖ, which instead appear as their "ASCII cousins": =E4, =E5 and so on.
What is the appropriate way to solve this? I've tried every encoding function I can think of, and none of them work.
Thanks!
Not 100% sure, but it seems to me that the content of the message is quoted-printable encoded. Try quoted_printable_decode - http://www.php.net/manual/en/function.quoted-printable-decode.php
If you are already using the IMAP extension, you can also try imap_qprint - http://www.php.net/manual/en/function.imap-qprint.php
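Sequences like =E4 are quoted-printable escapes for single bytes, and in ISO-8859-1 the bytes 0xE4, 0xE5 and 0xF6 are ä, å and ö. A sketch that decodes the quoted-printable layer and then converts to UTF-8, assuming the part's charset is ISO-8859-1 (in practice it should be read from the part's Content-Type); the helper name is hypothetical.

```php
<?php
// Sketch: decode quoted-printable, then convert the bytes from the
// assumed charset (ISO-8859-1) to UTF-8 for display.
function decode_qp_latin1(string $qp): string
{
    return mb_convert_encoding(quoted_printable_decode($qp), 'UTF-8', 'ISO-8859-1');
}

echo decode_qp_latin1('R=E4ksm=F6rg=E5s'), PHP_EOL; // Räksmörgås
```

If the part's charset is already UTF-8, the quoted-printable decode alone is enough.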
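Try this:
function fixEncoding($in_str) {
    $cur_encoding = mb_detect_encoding($in_str);
    // Valid UTF-8 is returned unchanged
    if ($cur_encoding == "UTF-8" && mb_check_encoding($in_str, "UTF-8"))
        return $in_str;
    else
        // utf8_encode() assumes ISO-8859-1 input (deprecated as of PHP 8.2)
        return utf8_encode($in_str);
}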
