PHP sent emails have =0A=0A instead of new lines

For some time now I've had the problem of some of my users getting =0A=0A instead of new lines in emails I send to them via PHP. Correspondence via email client works well, but PHP generated emails always look like this with some users (a minority). Googling revealed no decent results, all search results seem to be connected with outlook somehow - and it is unacceptable to think that all outlook users would suffer from this problem. Does anyone know a correct way of handling this and avoiding these new line encoding issues?
Edit: FYI I'm using Zend's Mailer class.
Thanks
Edit 2:
Changing the encoding type did not work. I encoded both the headers and the body to base64 and got garbled output. Then I tried base64 headers with base64_decode(base64_decode($body)) on the body, and that was fine on the user's "CNR Server but not in the inbox", whatever that means. When I tried mb_convert_encoding to base64, I again got the encoded string instead of the body, so that was no use.
What else can I try? Zend Mailer only supports Quoted Printable and Base64 header encoding. Not sure what to do to the body for it to match the quoted printable encoding...

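The email body has been encoded using quoted-printable (=0A is the quoted-printable escape for a line feed), but the headers of the email do not say so: the part is declared as text/html (or text/plain, or nothing at all) without a matching Content-Transfer-Encoding: quoted-printable header, so some clients show the encoded text literally.
How you make the encoding of the body match the MIME headers is up to you.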
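As a minimal sketch of making the two agree, in plain PHP rather than through Zend's mailer (the helper name build_qp_part is made up for illustration): PHP's quoted_printable_encode() leaves CRLF pairs alone but escapes a bare line feed as =0A, so normalising line endings first and declaring the transfer encoding in the part headers should avoid the symptom.

```php
<?php
// Hypothetical helper: returns matching headers and an encoded body for one text part.
function build_qp_part(string $text, string $charset = 'UTF-8'): array
{
    // Normalise every line ending (CRLF, lone CR, lone LF) to CRLF before encoding,
    // otherwise a bare LF is escaped as "=0A" instead of being kept as a line break.
    $text = preg_replace('/\r\n|\r|\n/', "\r\n", $text);

    return [
        'headers' => [
            'MIME-Version: 1.0',
            "Content-Type: text/plain; charset={$charset}",
            // This header is what tells the client to decode the body.
            'Content-Transfer-Encoding: quoted-printable',
        ],
        'body' => quoted_printable_encode($text),
    ];
}

$part = build_qp_part("Hello,\n\nSee you soon.");
```

The resulting headers and body can then be handed to whatever actually sends the mail.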

Related

’ in PHP is converting to â€™ when using mb_convert_encoding in Outlook subject

I have a mail() function set up in PHP. When emailing myself to test, I noticed the subject was converting my ’ into â€™.
$subject="Please provide an updated copy of your company’s certification";
result: Please provide an updated copy of your companyâ€™s certification.
I followed Getting â€™ instead of an apostrophe(') in PHP and added mb_convert_encoding, but now I am getting &rsquo; instead of ’.
$subjectBad="Please provide an updated copy of your company’s certification";
$subject= mb_convert_encoding($subjectBad, "HTML-ENTITIES", 'UTF-8');
result: Please provide an updated copy of your company&rsquo;s certification.
It comes through fine to my personal email, so is there a way to properly display a ’ in Outlook's subject, or am I at the whim of whatever their system settings are?
Whatever you used to type the subject did not use a simple apostrophe ' which has a common representation across virtually all single-byte encodings and UTF8, instead it used a "fancy" right single quote ’, which is represented differently between single-byte encodings and UTF-8.
mb_convert_encoding() is converting to an HTML entity because you are literally telling it to, and email headers are not HTML, so it's going to display as the literal string &rsquo;. Among the common single-byte character sets, "smart quotes" exist in Microsoft's cp1252 and its sibling Windows code pages but not in ISO-8859-1, and cp1252 is still the wrong answer for email headers.
The simplest answer is: Don't do that. Use a normal apostrophe. Everyone hates dealing with "smart" quotes.
The more complex answer is that email headers MUST be 7-bit safe "ASCII" text, and anything else requires additional handwaving. Ideally you should be using a proper email library that handles this, and the dozens of other annoyances that will malform your emails and impact deliverability.
If you're dead-set on eroding your sanity and using mail() directly, then you're going to want to properly encode your subject line and use an explicitly-defined character set, which you should be doing anyways. Eg:
$subject = 'Please provide an updated copy of your company’s certification';
var_dump(
sprintf('=?UTF-8?Q?%s?=', quoted_printable_encode($subject))
);
Output:
string(82) "=?UTF-8?Q?Please provide an updated copy of your company=E2=80=99s certification?="
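One refinement worth considering: RFC 2047 does not allow literal spaces inside an encoded-word, and in "Q" encoding a space is written as an underscore while "?" and "_" themselves must be escaped. The following is only a sketch for short subjects (encode_subject_q is a hypothetical name, and it does not split subjects that exceed the 75-character limit for a single encoded-word):

```php
<?php
// Hypothetical helper: Q-encode a short UTF-8 subject as one RFC 2047 encoded-word.
function encode_subject_q(string $utf8Subject): string
{
    $q = quoted_printable_encode($utf8Subject);
    // Drop any soft line breaks quoted_printable_encode() may have added.
    $q = str_replace("=\r\n", '', $q);
    // Escape "?" and "_" first, then turn spaces into underscores.
    $q = str_replace(['?', '_', ' '], ['=3F', '=5F', '_'], $q);

    return '=?UTF-8?Q?' . $q . '?=';
}

// "\u{2019}" is the right single quotation mark discussed above.
$encoded = encode_subject_q("company\u{2019}s cert?");
```

A mail library would normally do this, including the splitting of long subjects, for you.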

non-English characters display issue in gmail, yahoo via phpMailer class

I use PHPMailer (the current version on GitHub) to send automatic SMTP activation mails from my noreply#host.com address.
I tried it in gmail and yahoo. Both interpreted the characters as shown below.
nice -> unwanted (realized)
Ç -> Ã‡
ı -> Ä±
ş -> Åž
The mailing process order is:
"sign page" (has the form, UTF-8 encoded), then
"assess_from_sign.php" (pure PHP without any encoding command; the actual sending or failure handling happens here), then
"inform page" (informs the user of the result)
My message in the mail body starts with Dear $_POST['username']
What can I apply to the $_POST['username'] variable in assess_from_sign.php so that non-English characters appear exactly as they do in their own language?
Note: all requests are redirected to the index.php page of my site, which has the mb_internal_encoding("UTF-8"); command, so it applies to all pages.
thanks, regards
It's important that all aspects of the code are set to the same, specific charset. I recommend UTF-8, which you have already started using and which covers most characters you'll ever need.
Below you'll find a "checklist" of what should be set to UTF-8.
PHP header - this has to be sent prior to any output to the browser, and should be put at the top of all your .php pages: header('Content-Type: text/html; charset=utf-8');
HTML header - this should also be in all your pages containing HTML, and is to be put inside the <head> tags: <meta charset="utf-8" />
PHPMailer object - specify the charset of your PHPMailer object by adding $mail->CharSet = 'UTF-8';, where $mail is the object itself.
File encoding - the file itself should be saved as UTF-8 (specifically UTF-8 without BOM). How to do this varies with the text editor you are using, but in Notepad++ it's Format -> Convert to UTF8 (w/o Byte Order Mark).
There might be other aspects of your code that need to be set to a UTF-8 charset (databases and such), but this should cover the mail properties.
You can also reference UTF-8 all the way through.
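To see where the garbled characters in the question come from, here is a small sketch (it assumes the mbstring extension; $name is just a sample value): the text is valid UTF-8, and the garbage appears when those same bytes are read as Windows-1252, which is roughly what a mail client does when the message does not declare its charset.

```php
<?php
// Sample input: "Çı" as valid UTF-8 (bytes C3 87 C4 B1).
$name = "\u{00C7}\u{0131}";

// The data itself is fine.
$isValidUtf8 = mb_check_encoding($name, 'UTF-8');

// Simulate a reader that wrongly treats the UTF-8 bytes as Windows-1252.
$garbled = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');

// Undoing the misreading gives the original text back.
$restored = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
```

Here $garbled comes out as "Ã‡Ä±", which is why the fix is to declare UTF-8 everywhere rather than to transform $_POST['username'].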

Reliably Clean Email Message Body Encoding

I am writing a small piece of software in PHP which connects to an IMAP mailbox and stores the messages contained therein in a MySQL DB for later processing and other goodness.
I have noticed that during testing I get some strange characters appearing in the message body when I attempt to save the message body raw. I am using imap_fetchbody() to extract the message body.
I noticed that when I use quoted_printable_decode() to clean up the message body this helps! However in doing lots of research I have also learned that this will not always help and that other methods such as utf8_encode() and base64_decode() should be used instead to clean up the message body.
So, my question is: what is the best method for reliably cleaning an email message body with php to cover all encoding scenarios?
An "email body" is nowadays actually a tree of individual MIME parts. Sometimes there's just one of them, e.g. a text/plain mail. Sometimes there's a multipart/alternative which wraps inside it two "equivalent" copies of the message, one as text/plain and other as text/html. Sometimes the structure is much more complicated, with many levels of nesting. It is quite common that some of these parts are actually binary content, like images, attached ZIP files and what not.
Each of these individual MIME parts can be encoded for transport; these are specified in the Content-Transfer-Encoding header of the corresponding MIME part. The two encoding schemes which you absolutely must support to interoperate are quoted-printable and base64. An important observation is that this encoding happens separately for each part, i.e. it's perfectly legal to have a multipart/alternative with a text/plain encoded with quoted-printable and another part, text/html encoded in base64.
When you have decoded this transfer encoding, you still have to decode the text from its character encoding to Unicode, i.e. to turn the stream of bytes into Unicode text. You need to consult the charset parameter of the Content-Type MIME header (again, the part header, not the whole-message header, unless the message itself has only one part).
All details you need to know are in RFC 2045, RFC 2046, RFC 2047 and RFC 2048 (and their corresponding updates).
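The two decoding steps for a single text part can be sketched as follows. This assumes you have already extracted the part's raw body, its Content-Transfer-Encoding value and its charset parameter; decode_part is a hypothetical helper, and it relies on mbstring, which raises an error for charset names it does not know.

```php
<?php
// Hypothetical helper: undo the transfer encoding, then the character encoding.
function decode_part(string $raw, string $transferEncoding, string $charset): string
{
    // Step 1: transfer encoding -> raw bytes of the part.
    switch (strtolower(trim($transferEncoding))) {
        case 'base64':
            $bytes = (string) base64_decode($raw);
            break;
        case 'quoted-printable':
            $bytes = quoted_printable_decode($raw);
            break;
        default: // 7bit, 8bit, binary: the bytes are already as sent
            $bytes = $raw;
    }

    // Step 2: bytes in the declared charset -> UTF-8 text.
    return mb_convert_encoding($bytes, 'UTF-8', $charset);
}

$fromBase64 = decode_part(base64_encode("caf\xE9"), 'base64', 'ISO-8859-1');
$fromQp     = decode_part("caf=E9", 'Quoted-Printable', 'ISO-8859-1');
```

Binary parts such as images only need step 1; running them through a charset conversion would corrupt them.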
Finally, there's also the interesting question of what the "main part" of an e-mail is. Suppose you have something like this:
1 multipart/mixed
  + 1.1 text/plain: "Hi, I'm forwarding Jeff's message..."
  + 1.2 message/rfc822
    + 1.2.1 multipart/alternative
      + 1.2.1.1 text/plain "Hi colleagues, I'm sending the meeting notes from..."
      + 1.2.1.2 text/html "<p>Hi colleagues,..."
i.e. this happens when Fred forwards Jeff's message to you. What is the "main part" here?
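There is no single right answer, but one possible heuristic is to take the first text/plain part in document order and to treat an embedded message/rfc822 as an attachment rather than descending into it. A sketch over a hand-built tree (the array shape with 'id', 'type' and 'parts' keys is an assumption made for this example, not something an IMAP library returns):

```php
<?php
// Hypothetical tree node: ['id' => string, 'type' => string, 'parts' => array].
function first_text_part(array $part, string $wanted = 'text/plain'): ?string
{
    if ($part['type'] === $wanted) {
        return $part['id'];
    }
    if ($part['type'] === 'message/rfc822') {
        return null; // a forwarded message counts as an attachment here
    }
    foreach ($part['parts'] ?? [] as $child) {
        $found = first_text_part($child, $wanted);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

$message = ['id' => '1', 'type' => 'multipart/mixed', 'parts' => [
    ['id' => '1.1', 'type' => 'text/plain'],
    ['id' => '1.2', 'type' => 'message/rfc822', 'parts' => [
        ['id' => '1.2.1', 'type' => 'multipart/alternative', 'parts' => [
            ['id' => '1.2.1.1', 'type' => 'text/plain'],
            ['id' => '1.2.1.2', 'type' => 'text/html'],
        ]],
    ]],
]];

$main = first_text_part($message);
```

With this rule Fred's note (1.1) is the main part; whether that is what your application wants is a design decision.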

Russian Language encoded when using imap_fetch from gmail

I'm reading a log file pasted into the body of an email. The logs are in various languages, and all characters seem to display correctly except for Russian.
Here is an example of what the Russian says in the log file:
Ссылка на объект не указывает на экземпляр объекта.
в
From what I have read, I need to specify a decoding or encoding, something along the lines of mb_encoding (UTF-8), but I am a bit lost on how to actually structure it without affecting text that isn't Russian. When echoed out, it gets converted to this:
СÑылка на объект не указывает на ÑкземплÑÑ€ объекта.
в
Here is the code I'm using already. I am a PHP beginner and some of this isn't my code; I have edited it to suit, but I'm not 100% sure what everything is doing:
$mailbox = "xxx#gmail.com";
$mailboxPassword = "xxx";
$mailbox = imap_open("{imap.gmail.com:993/imap/ssl}INBOX",
$mailbox, $mailboxPassword);
mb_internal_encoding("UTF-8");
$subject = mb_decode_mimeheader(str_replace('_', ' ', $subject));
$body = imap_fetchbody($mailbox, $val, 1);
$body = base64_decode($body);
echo $body;
Once I echo out the body, it converts from Russian into that encoding. Any pointers to similar code I can dissect to learn how to fix this?
Please bear in mind that numerous languages are being read from the email. For the most part it's just a few snippets and the rest is basic logging, but what I am worried about is that if I set a new decoding it will mess up the characters of other languages.
Despite its large adoption, email is still tricky to work with. If your IMAP client has a limited set of requirements, your job will be easy. Otherwise, for a truly general-purpose Gmail client, there's no silver bullet and you have to understand how email works: SMTP, MIME and finally IMAP.
Basic MIME knowledge is absolutely needed, and I won't paste the whole wikipedia article, but you should really read it and understand how it works. IMAP is somewhat easier to understand.
Usually, email messages contain either a single text/plain body, or a multipart/alternative body with both a text/plain and a text/html part. But, you know, there are attachments, so you are also likely to find a multipart/mixed, which can really contain anything, and if it's binary content you should treat it differently from text. There are two headers (which you can find in the global message or in a part inside a multipart envelope) involved in charset issues: Content-Type and Content-Transfer-Encoding.
From your code, we must assume that you are only interested in base64-encoded textual parts. Once you have decoded them, they are a sequence of bytes representing text in the charset specified by the sender in the Content-Type header, which for non-ASCII text looks something like this:
Content-Type: text/plain; charset=ISO-8859-1
Note that the charset may be utf-8 or really any other you can think of; you have to check this in your program. Your job is transcoding this piece of input into the output charset of your HTML page. If your page does not use a Unicode encoding (like UTF-8), chances are that you won't be able to show the message correctly, and '?' will be printed in place of the missing characters. Since you require your application to be used worldwide (not just in Russia), and since it's good practice anyway, you should use UTF-8 in your HTML responses, and thus when you want to echo the message body:
echo mb_convert_encoding(imap_base64($body), "UTF-8", $input_charset);
where $input_charset is the one found in the Content-Type header for the processed part. For the subject line, you should use imap_mime_header_decode(), which returns an array of objects, each holding a binary string (text) and its charset, which you have to output in the same manner as above.
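As a sketch of that last step (header_elements_to_utf8 is a hypothetical helper, and the elements below are built by hand in the shape imap_mime_header_decode() returns, so the example runs without the IMAP extension): each element is converted from its own charset and the pieces are concatenated. Treating the special charset value "default" as ASCII is an assumption.

```php
<?php
// Hypothetical helper: join decoded header elements into one UTF-8 string.
// Each element is an object with ->text and ->charset properties.
function header_elements_to_utf8(array $elements): string
{
    $out = '';
    foreach ($elements as $el) {
        // "default" marks text that was not MIME-encoded; assumed to be ASCII here.
        $charset = strtolower($el->charset) === 'default' ? 'ASCII' : $el->charset;
        $out .= mb_convert_encoding($el->text, 'UTF-8', $charset);
    }
    return $out;
}

// Hand-built stand-in for the result of imap_mime_header_decode($rawSubject).
$elements = [
    (object) ['charset' => 'ISO-8859-1', 'text' => "Caf\xE9"],
    (object) ['charset' => 'default',    'text' => ' menu'],
];

$subject = header_elements_to_utf8($elements);
```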
TL;DR
The bytes in the UTF-8 encoded input text map quite nicely to the output if we assume it's CP-1252 encoded (maybe you didn't copy some non-printable ones). This means that the input is UTF-8, but the browser thinks the page is Windows-1252. Likely this is the default browser behavior for your locale, and you can easily correct it by sending the appropriate header before any other output:
header("Content-Type: text/html; charset=utf-8");
This should be enough to solve this issue, but it will also likely cause problems with non-ASCII characters in string literals and in the database (if any). If you want a multilingual application, Unicode is the way to go, but you have to transcode your database and your PHP files from CP-1252 to UTF-8.

PHP chinese character IMAP

I retrieve data from an email through IMAP and I want to detect programmatically (via PHP) whether the body has characters in Chinese, Japanese, or Korean. I know how to convert the encoding, but not how to detect it.
$mbox = imap_open ("{localhost:995/pop3/ssl/novalidate-cert}", "info#***.com", "********");
$email=$_REQUEST['email'];
$num_mensaje = imap_search($mbox,"FROM $email");
// grab the body for the same message
$body = imap_fetchbody($mbox,$num_mensaje[0],"1");
//chinese for example
$str = mb_convert_encoding($body,"UTF-8","EUC-CN");
imap_close($mbox);
Any ideas?
Do you mean that you don't know which CJK encoding the incoming message is in?
The canonical place to find that information is the charset= parameter in the Content-Type: header.
Unfortunately extracting that is not as straightforward as you would hope. Really you'd think that the object returned by imap_header would contain the type information, but it doesn't. Instead, you have to use imap_fetchheader to grab the raw headers from the message, and parse them yourself.
Parsing RFC822 headers isn't completely straightforward. For simple cases you might be able to get away with matching each line against ^content-type:.*; *charset=([^;]+) (case-insensitively). But to do it really properly you'd have to run the whole message headers and body through a proper RFC822-family parser like MailParse.
And then you've still got the problem of messages that neglect to include charset information. For that case you would need to use mb_detect_encoding.
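A sketch of the simple-case approach, under the assumption that $rawHeaders holds what imap_fetchheader returned (both function names are hypothetical, the candidate list passed to mb_detect_encoding is only an example, and falling back to UTF-8 when detection fails is an arbitrary choice):

```php
<?php
// Hypothetical helper: pull the charset parameter out of raw message headers.
function charset_from_headers(string $rawHeaders): ?string
{
    // Unfold continuation lines (a line break followed by a space or tab).
    $unfolded = preg_replace('/\r?\n[ \t]+/', ' ', $rawHeaders);

    if (preg_match('/^content-type:.*?;\s*charset="?([^";\s]+)/mi', $unfolded, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: declared charset if present, otherwise a guess.
function guess_charset(string $rawHeaders, string $body): string
{
    $candidates = ['UTF-8', 'EUC-CN', 'BIG-5', 'SJIS', 'EUC-KR'];

    return charset_from_headers($rawHeaders)
        ?? (mb_detect_encoding($body, $candidates, true) ?: 'UTF-8');
}

$headers = "Subject: hi\r\nContent-Type: text/plain;\r\n charset=\"GB2312\"\r\nMIME-Version: 1.0\r\n";
$charset = charset_from_headers($headers);
```

Keep in mind that mb_detect_encoding only guesses, and the legacy CJK encodings overlap enough that the guess can be wrong.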
Or are you just worried about which language the correctly-decoded characters represent?
In this case the header you want to read, using the same method as above, is Content-Language. However, it is very often not present, in which case you have to fall back to guessing again. CJK unification means that all three languages may use many of the same characters, but there are a few heuristics you can use to guess:
The encoding that the message was in, from the above. E.g. if it was EUC-CN, chances are your language is going to be simplified Chinese.
The presence of any kana (U+3040–U+30FF -> Japanese) or Hangul (U+AC00–U+D7FF -> Korean) in the text.
The presence of simplified vs traditional Chinese characters. Although some characters can represent either, others (where there is a significant change to the strokes between the two variants) only fit one. The simple way to detect their presence is to attempt to encode the string to GBK and Big5 encodings and see if it fails. ie if you can't encode to GBK but you can to Big5, it'll be traditional Chinese.
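The kana and Hangul check from the list above can be sketched like this. It assumes the text has already been converted to UTF-8; guess_cjk_language is a hypothetical name, the Han range used for the Chinese fallback covers only the basic CJK Unified Ideographs block, and the simplified versus traditional test is left out.

```php
<?php
// Hypothetical helper: rough language guess for UTF-8 text based on script ranges.
function guess_cjk_language(string $utf8Text): ?string
{
    // Hiragana and katakana only occur in Japanese.
    if (preg_match('/[\x{3040}-\x{30FF}]/u', $utf8Text)) {
        return 'ja';
    }
    // Hangul syllables only occur in Korean.
    if (preg_match('/[\x{AC00}-\x{D7FF}]/u', $utf8Text)) {
        return 'ko';
    }
    // Han characters without kana or Hangul: most likely Chinese.
    if (preg_match('/[\x{4E00}-\x{9FFF}]/u', $utf8Text)) {
        return 'zh';
    }
    return null;
}

$japanese = guess_cjk_language("\u{65E5}\u{672C}\u{8A9E}\u{3067}\u{3059}"); // kanji plus kana
$korean   = guess_cjk_language("\u{D55C}\u{AE00}");
$chinese  = guess_cjk_language("\u{4F60}\u{597D}");
```

Short Japanese snippets written entirely in kanji would be reported as Chinese, so treat the result as a hint only.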
