My PHP script sends email to users, and when the email arrives in their mailboxes, the subject line ($subject) has characters like a^£ added to the end of my subject text. This is obviously an encoding problem. The email message content itself is fine; only the subject line is broken.
I have searched all over but can’t find how to encode my subject properly.
This is my header. Notice that I’m using Content-Type with charset=utf-8 and Content-Transfer-Encoding: 8bit.
//set all necessary headers
$headers = "From: $sender_name<$from>\n";
$headers .= "Reply-To: $sender_name<$from>\n";
$headers .= "X-Sender: $sender_name<$from>\n";
$headers .= "X-Mailer: PHP4\n"; //mailer
$headers .= "X-Priority: 3\n"; //1 UrgentMessage, 3 Normal
$headers .= "MIME-Version: 1.0\n";
$headers .= "X-MSMail-Priority: High\n";
$headers .= "Importance: 3\n";
$headers .= "Date: $date\n";
$headers .= "Delivered-to: $to\n";
$headers .= "Return-Path: $sender_name<$from>\n";
$headers .= "Envelope-from: $sender_name<$from>\n";
$headers .= "Content-Transfer-Encoding: 8bit\n";
$headers .= "Content-Type: text/plain; charset=UTF-8\n";
Update: For a more practical and up-to-date answer, have a look at Palec’s answer.
The character encoding specified in Content-Type describes only the character encoding of the message body, not that of the headers. You need to use the encoded-word syntax with either the quoted-printable encoding or the Base64 encoding:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
You can use imap_8bit for the quoted-printable encoding and base64_encode for the Base64 encoding:
"Subject: =?UTF-8?B?".base64_encode($subject)."?="
"Subject: =?UTF-8?Q?".imap_8bit($subject)."?="
TL;DR
$preferences = ['input-charset' => 'UTF-8', 'output-charset' => 'UTF-8'];
$encoded_subject = iconv_mime_encode('Subject', $subject, $preferences);
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);
or
mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader($subject, 'UTF-8', 'B', "\r\n", strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);
Problem and solution
The Content-Type and Content-Transfer-Encoding headers apply only to the body of your message. For headers, the mechanism for declaring their encoding is defined in RFC 2047.
You should encode your Subject via iconv_mime_encode(), which exists as of PHP 5:
$preferences = ["input-charset" => "UTF-8", "output-charset" => "UTF-8"];
$encoded_subject = iconv_mime_encode("Subject", $subject, $preferences);
Change input-charset to match the encoding of your string $subject. You should leave output-charset as UTF-8. Before PHP 5.4, use array() instead of [].
Now $encoded_subject is (without trailing newline)
Subject: =?UTF-8?B?VmVyeSBsb25nIHRleHQgY29udGFpbmluZyBzcGVjaWFsIGM=?=
 =?UTF-8?B?aGFyYWN0ZXJzIGxpa2UgxJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHA=?=
 =?UTF-8?B?cm9kdWNlcyBzZXZlcmFsIGVuY29kZWQtd29yZHMsIHNwYW5uaW5nIG0=?=
 =?UTF-8?B?dWx0aXBsZSBsaW5lcw==?=
for $subject containing:
Very long text containing special characters like ěščřžýáíé<>?=+* produces several encoded-words, spanning multiple lines
How does it work?
The iconv_mime_encode() function splits the text, encodes each piece separately into an <encoded-word> token and folds the whitespace between them. An encoded word has the form =?<charset>?<encoding>?<encoded-text>?= where:
<encoding> is either B (for Base 64 – see base64_encode()) or Q (for Quoted-printable – see quoted_printable_encode()),
<encoded-text> is string encoded with <encoding>, which has charset <charset> after being decoded.
You can decode =?CP1250?B?QWhvaiwgc3bsdGU=?= into UTF-8 string Ahoj, světe (Hello, world in Czech) via iconv("CP1250", "UTF-8", base64_decode("QWhvaiwgc3bsdGU=")) or directly via iconv_mime_decode("=?CP1250?B?QWhvaiwgc3bsdGU=?=", 0, "UTF-8").
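The two decoding routes mentioned above can be tried side by side. This is a small sketch, assuming the iconv extension is loaded and the script file itself is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Encoded word taken from the text above: charset CP1250, Base64 ("B") encoding.
$encodedWord = '=?CP1250?B?QWhvaiwgc3bsdGU=?=';

// Manual route: Base64-decode the payload, then convert the CP1250 bytes to UTF-8.
$manual = iconv('CP1250', 'UTF-8', base64_decode('QWhvaiwgc3bsdGU='));

// Direct route: let iconv parse the encoded word and convert it in one call.
$direct = iconv_mime_decode($encodedWord, 0, 'UTF-8');

var_dump($manual === $direct); // both routes should agree
echo $manual, "\n";
```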
Encoding into encoded words is more complicated, because the spec requires each encoded-word token to be at most 75 bytes long and each line containing any encoded-word token must be at most 76 bytes long (including blank at the start of a continuation line). Don’t implement the encoding yourself. All you really need to know is that iconv_mime_encode() respects the spec.
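To see the limits in practice, you can measure the lines that iconv_mime_encode() produces. This is a rough sketch, assuming the iconv extension and a UTF-8 source file; the scheme, line-length and line-break-chars preferences are spelled out only to make the documented defaults visible.

```php
<?php
$subject = 'Very long text containing special characters like ěščřžýáíé<>?=+* '
         . 'produces several encoded-words, spanning multiple lines';

$preferences = [
    'input-charset'    => 'UTF-8',
    'output-charset'   => 'UTF-8',
    'scheme'           => 'B',      // Base64; 'Q' selects quoted-printable
    'line-length'      => 76,
    'line-break-chars' => "\r\n",
];
$header = iconv_mime_encode('Subject', $subject, $preferences);

// Print the byte length of every physical line of the folded header.
$lines = explode("\r\n", $header);
foreach ($lines as $line) {
    printf("%2d  %s\n", strlen($line), $line);
}

// Decoding the folded header should give back the field name and original text.
$decoded = iconv_mime_decode($header, 0, 'UTF-8');
var_dump($decoded === 'Subject: ' . $subject);
```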
Interesting related reading is the Wikipedia article Unicode and email.
Alternatives
A rudimentary option is to use only a restricted set of characters. ASCII is guaranteed to work. ISO Latin 1 (ISO-8859-1), as user2250504 suggested, will probably work too, because it is often used as fallback when no encoding is specified. But those character sets are very small and you’ll probably be unable to encode all the characters you’ll want. Moreover, the RFCs say nothing about whether Latin 1 should work or not.
You can also use mb_encode_mimeheader(), as Paul Norman answered, but it’s easy to use it incorrectly.
You must use mb_internal_encoding() to set the mbstring functions’ internally used encoding. The mb_* functions expect input strings to be in this encoding. Beware: The second parameter of mb_encode_mimeheader() has nothing to do with the input string (despite what the manual says). It corresponds to the <charset> in the encoded word (see How does it work? above). The input string is recoded from the internal encoding to this one before being passed to the B or Q encoding.
Setting the internal encoding might not be needed since PHP 5.6, because the underlying mbstring.internal_encoding configuration option was deprecated in favor of the default_charset option, which has defaulted to UTF-8 since then. Note that this is just a default and it may be inappropriate to rely on defaults in your code.
You must include the header name and colon in the input string. The RFC imposes a strong limit on line length and it must hold for the first line, too! An alternative is to fiddle with the fifth parameter ($indent; last one as of September 2015), but this is even less convenient.
The implementation might have bugs. Even if used correctly, you might get broken output. At least this is what many comments on the manual page say. I have not managed to find any problem, but I know implementation of encoded words is tricky. If you find potential or actual bugs in mb_encode_mimeheader() or iconv_mime_encode(), please, let me know in the comments.
There is also at least one upside to using mb_encode_mimeheader(): it does not always encode all the header contents, which saves space and leaves the text human-readable. The encoding is required only for the non-ASCII parts. The output analogous to the iconv_mime_encode() example above is:
Subject: Very long text containing special characters like
 =?UTF-8?B?xJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHByb2R1Y2VzIHNldmVyYWwgZW5j?=
 =?UTF-8?B?b2RlZC13b3Jkcywgc3Bhbm5pbmcgbXVsdGlwbGUgbGluZXM=?=
Usage example of mb_encode_mimeheader():
mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader("Subject: $subject", 'UTF-8');
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);
This is an alternative to the snippet in TL;DR on top of this post. Instead of just reserving the space for Subject: , it actually puts it there and then removes it in order to be able to use it with the mail()’s stupid interface.
If you like mbstring functions better than the iconv ones, you might want to use mb_send_mail(). It uses mail() internally, but encodes subject and body of the message automatically. Again, use with care.
Headers other than Subject need different treatment
Note that you must not assume that encoding the whole contents of a header is OK for all headers that may contain non-ASCII characters. E.g. From, To, Cc, Bcc and Reply-To may contain names for the addresses they contain, but only the names may be encoded, not the addresses. The reason is that <encoded-word> token may replace just <text>, <ctext> and <word> tokens, and only under certain circumstances (see §5 of RFC 2047).
Encoding of non-ASCII text in other headers is a related but different question. If you wish to know more about this topic, search. If you find no answer, ask another question and point me to it in the comments.
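As a rough illustration of encoding only the display name and never the address, the sketch below builds a From header. The helper name format_address() and the example addresses are made up for this illustration, and the quoting of ASCII names is deliberately simplistic; a mail library handles many more edge cases.

```php
<?php
mb_internal_encoding('UTF-8');

/**
 * Hypothetical helper: encode only the display name, never the address.
 */
function format_address(string $name, string $email, string $field = 'From'): string
{
    // Drop CR/LF so user input cannot inject extra header lines.
    $name  = str_replace(["\r", "\n"], '', $name);
    $email = str_replace(["\r", "\n"], '', $email);

    if (preg_match('/[^\x20-\x7E]/', $name)) {
        // Non-ASCII bytes present: turn the name into RFC 2047 encoded-words.
        // The indent reserves room for "From: " on the first line.
        $name = mb_encode_mimeheader($name, 'UTF-8', 'B', "\r\n", strlen("$field: "));
    } else {
        // Plain ASCII: a quoted-string is enough.
        $name = '"' . addcslashes($name, '"\\') . '"';
    }

    return "$field: $name <$email>";
}

echo format_address('Jiří Novák', 'jiri@example.com'), "\n";
echo format_address('John Doe', 'john@example.com'), "\n";
```

The address stays outside the encoded word, so a mail server still sees a plain ASCII mailbox.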
mb_encode_mimeheader() for UTF-8 strings can be useful here, e.g.
$subject = mb_encode_mimeheader($subjectText,"UTF-8");
Related
Currently, when sending an email from PHP which includes a Spanish accent, the email is being rendered as follows:
Ω₯ζλZΫiz«’Ό*'΅ινO*^rνz{
I'm setting the following headers:
$headers = 'MIME-Version: 1.0' . "\r\n";
$headers .= 'Content-type: text/html; charset=UTF-8' . "\r\n";
$headers .= "X-Priority: 3\r\n";
$headers .= "X-Mailer: PHP". phpversion() ."\r\n";
A sample body message is:
Estudio bíblico en Web Church Connect
I'm also setting the charset of the html:
$message = '
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>WebChurchConnect</title>
</head>
<body>
';
Any thoughts are appreciated.
Thanks.
E-Mail only accepts ASCII characters as many MTAs are not equipped to correctly relay other messages. You should think of MIME-E-Mails like onions: (they smell, they make you cry) and they have layers. The HTML message (inner layer) only gets decoded after the plain text of your message is handled (outer layer).
You need to explicitly encode any non-ASCII characters in the “outer” layer. You do this using the Content-Transfer-Encoding header, which can be set to either base64 or quoted-printable (some modern MTAs also support 8bit or binary but these must be set explicitly and support still isn’t as universal as one would hope for in 2014). Of course the MIME part that follows this header also needs to be actually encoded using the method specified. Fortunately Base64-Encoding is only a base64_encode call away.
Alternatively, since your message is in HTML (and you don’t seem to care about providing a plaintext alternative – which you should), you could also use HTML’s escaping mechanisms (e.g. &iacute; instead of í), but Base64 is generally safer since it’s immune to the MTAs that take it upon themselves to break up long lines after 78 characters.
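A minimal sketch of the Base64 approach described above follows. The helper name build_base64_html_mail() is an invention for this example; it only prepares the headers and body, and the mail() call is left commented out so the snippet can be run without a mail server.

```php
<?php
// Hypothetical helper: returns [headers, body] for a Base64-encoded HTML mail.
function build_base64_html_mail(string $html): array
{
    $headers = implode("\r\n", [
        'MIME-Version: 1.0',
        'Content-Type: text/html; charset=UTF-8',
        'Content-Transfer-Encoding: base64',
    ]);

    // chunk_split() wraps the Base64 text at 76 characters with CRLF,
    // so the body is pure ASCII and every line stays short.
    $body = chunk_split(base64_encode($html), 76, "\r\n");

    return [$headers, $body];
}

// Assumes this source file is saved as UTF-8.
$html = '<html><body><p>Estudio bíblico en Web Church Connect</p></body></html>';
[$headers, $body] = build_base64_html_mail($html);

// mail($to, $subject, $body, $headers); // $to and $subject are up to you
echo $body;
```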
Try using PHPMailer or SwiftMailer. Problems like these are already solved there, everything is tested, and it’s much easier for you to work with.
Character sets are driving me round the bend!
My database is utf8_general_ci and the tables within it are utf8_unicode_ci.
All my PHP pages have
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
in the head.
When I type a euro symbol (€) in the PHP page and input it to the database it appears in phpMyAdmin as €
When I use a PHP page to copy it from the database, it reappears as €
So far so good, but when I write an HTML email using PHP, no matter what I do it comes out in the email as €
I've tried specifying the character set in the html email but it doesn't work. Probably because email clients take their character set from the mail server and not the headers.
I've also got issues with accented letters and the like, but they are being written into the text so I could overcome them by using &eacute; for é, and so on. Messy, but not impossible.
Surely there's a better way!
MY SOLUTION (SORT OF)
Thanks to all who contributed. I have tried all ways to specify the character set to utf-8 (in the mail headers, in the html head, and both) but the message still arrives in iso. So it seems the information I got from elsewhere was right: character set is defined by the server.
I have had to settle for typing things like &eacute; each time I want é into the fixed text, which is cumbersome but at least it works.
For the euro symbol, I have wrapped the variable in the htmlentities function. It works, but I will have to remember to do it with any other variables taken from the database if I encounter similar problems in similar files. It would have been easier to wrap the function around the entire html body but that doesn't work, presumably because it does funny things with the tags.
Check the encoding in your text editor. Crazy things can happen if this is wrong!
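For MySQL, see mysql_set_charset (or mysqli_set_charset() if you use the mysqli extension; the old mysql_* functions were removed in PHP 7).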
Emails actually get their character set from the Content-Type header, not from the email server. Make sure you set this header to an appropriate value, such as Content-Type: text/html; charset="UTF-8". See also this question.
And keep in mind that for HTML, you can use $encoded = htmlentities( $string, ENT_QUOTES, 'utf-8' ) so that all characters which have HTML character entity equivalents are translated into these entities.
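For instance (a small sketch, assuming the strings are UTF-8), htmlentities() turns the euro sign into an ASCII-only entity. Applied to a finished HTML body it also escapes the angle brackets of the tags, which is why it is usually applied to the dynamic values only:

```php
<?php
// Assumes this source file is saved as UTF-8.
$price = '5 €';

// Escape only the dynamic value, then place it into the markup.
$safe = htmlentities($price, ENT_QUOTES, 'UTF-8');
$body = '<p>Total: ' . $safe . '</p>';
echo $body, "\n";

// Escaping the finished markup converts the angle brackets too.
$overEscaped = htmlentities('<p>Total: 5 €</p>', ENT_QUOTES, 'UTF-8');
echo $overEscaped, "\n";
```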
I guess that the correct answer for you is just setting utf-8 charset for the e-mail:
$headers = "MIME-Version: 1.0\r\n";
$headers.= "From: =?utf-8?b?".base64_encode($from_name)."?= <".$from_a.">\r\n";
$headers.= "Content-Type: text/plain;charset=utf-8\r\n";
$headers.= "Reply-To: $reply\r\n";
$headers.= "X-Mailer: PHP/" . phpversion();
mail($to, $s, $body, $headers);
If you open the email source in your client (Ctrl+U in Thunderbird), you will see a Content-Type header. This should be something like:
Content-Type: text/html; charset=utf-8
If your email contains multiple parts, you need to add that header to the HTML part.
Header values need to be encoded separately (each line).
Subject: =?utf-8?B?...?=
For the html content you can just use htmlentities() but this will not work for the headers or a text email.
I need to send emails using PHP's mail() function. The code I am using is this:
$email_message = chunk_split(base64_encode($email_message));
$headers = "Content-Transfer-Encoding: base64\r\n\r\n";
mail($to, $subject, $email_message, $headers);
There is a pound sterling symbol in the email which is not handled properly, i.e. the recipient receives an incorrect symbol. It's to do with character encoding, but I am not sure how to tell the email client which encoding is being used so that it deals with the pound symbol correctly. Can this information be put in the headers?
Are you correctly setting the encoding on the email? This is done by setting
'Content-type: text/html; charset=utf-8'
in the message's headers.
Plenty of documentation here if you scroll down: http://php.net/manual/en/function.mail.php
If you have pound signs coming out as  then look below
$costsum = "£".$costsum; (Does not work)
$costsum = "&pound;".$costsum; (Does not work)
$costsum = "&#163;".$costsum; (Does not work)
One answer I found was this!
$costsum = "\243".$costsum;
The \243 is an octal escape for the byte 0xA3, which is the pound sign in ISO-8859-1 (Latin-1) and Windows-1252.
I tried all the pages with UTF-8 and that didn't work either.
It was used to email a spreadsheet.
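To see why "\243" behaves differently from a literal £ typed into a UTF-8 file, it may help to look at the bytes. This is only a sketch, assuming the iconv extension is available and the script is saved as UTF-8:

```php
<?php
// "\243" is an octal escape: a single byte with the value 0xA3.
$latin1Pound = "\243";
var_dump($latin1Pound === "\xA3");

// 0xA3 is the pound sign in ISO-8859-1; in UTF-8 it becomes two bytes.
$utf8Pound = iconv('ISO-8859-1', 'UTF-8', $latin1Pound);
echo bin2hex($utf8Pound), "\n";

// Reading those UTF-8 bytes as ISO-8859-1 yields a stray "Â" before the sign.
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8Pound);
echo $misread, "\n";
```

So the single byte works when the recipient reads the message as Latin-1, while the UTF-8 form works only if the message is actually declared as UTF-8.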
I'm sending a mail using PHP, and the subject of the mail has accents.
I'm receiving weird characters in Outlook instead of the accented chars. If I view the email in a web email client, the subject looks perfect.
This is the code that im using to send the email:
$to = 'whateveremail@whatever.com';
$subject = '[WEBSITE] áccent – tésts';
$message = '<p>Test message</p>';
$headers = 'MIME-Version: 1.0' . "\r\n";
$headers.= 'From:equipothermomix@thermomix.com' . "\r\n";
$headers.= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
mail($to, $subject, $message , $headers);
How could I solve the issue? Do I have to use another charset?
The only reliable way to send e-mail is using plain 7 bit ASCII so you need to encode strings properly when you want to use anything else. In your case, [WEBSITE] áccent – tésts should be encoded as (e.g.) =?windows-1252?B?W1dFQlNJVEVdIOFjY2VudCCWIHTpc3Rz?=, where the string splits as:
=?windows-1252?B?W1dFQlNJVEVdIOFjY2VudCCWIHTpc3Rz?=
  ^            ^ ^
  |            | |
  Charset      | Data
               |
             Base64
Check:
echo base64_decode('W1dFQlNJVEVdIOFjY2VudCCWIHTpc3Rz'); // [WEBSITE] áccent – tésts
This is just an example of one of the possible ways to encode non-ASCII strings.
When sending e-mail, there are many little details like this that need attention. That's why sooner or later you end up using a proper e-mail package like Swift Mailer or PHPMailer.
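Going the other way, the encoded word shown above can be produced from a UTF-8 string roughly like this. It is a sketch that assumes the iconv extension, a UTF-8 source file, and a subject short enough to fit into a single encoded word; longer subjects need the splitting that the dedicated functions and libraries take care of.

```php
<?php
// Assumes this source file is saved as UTF-8.
$subject = '[WEBSITE] áccent – tésts';

// Convert to the charset named in the encoded word, then Base64-encode.
$bytes   = iconv('UTF-8', 'CP1252', $subject);
$encoded = '=?windows-1252?B?' . base64_encode($bytes) . '?=';
echo $encoded, "\n";

// Keeping the text in UTF-8 works the same way, with a different label.
$encodedUtf8 = '=?UTF-8?B?' . base64_encode($subject) . '?=';
echo $encodedUtf8, "\n";
```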
If there is a mix of encoding types, special characters are destroyed.
PHP messing with HTML Charset Encoding
PHP also has an option for encoding messages.
mb_language('uni'); // mail() encoding
PHP - Multibyte String
http://php.net/manual/en/mbstring.overload.php
http://php.net/manual/en/ref.mbstring.php
Hi folks,
I have a contact form on my webpage, and it has worked fine so far.
Only problem is, that in my mailprogram, the name in the from field doesn't show correctly, although the sourcecode of the email seems correct:
From: Metaldemos <hello@metaldemos.com>
Reply-To: Metaldemos <hello@metaldemos.com>
Anyway, in the mailprogram, the name is 'hello'.
In php I use this headers:
$headers="Mime-Version: 1.0\nContent-Type: text/plain; charset=UTF-8\nContent-Transfer-Encoding: quoted-printable\nFrom: Metaldemos <hello@metaldemos.com>\nReply-To: Metaldemos <hello@metaldemos.com>\nReturn-Path: Metaldemos <hello@metaldemos.com>\n";
and the code for sending the mail:
mail($email, $subject, $mailbody, $headers,"-t -i -f Metaldemos <hello@metaldemos.com>");
Any idea on how I can fix this?
Greetz & thanks
Maenny
The above answer is correct. You need the \r\n at the end of the "From" and "Reply-To" lines, AS WELL as at the end of ALL the other header lines.
According to the SMTP RFC (section "2.3.8. Lines")
Lines consist of zero or more data characters terminated by the
sequence ASCII character "CR" (hex value 0D) followed immediately by
ASCII character "LF" (hex value 0A). This termination sequence is
denoted as <CRLF> in this document. Conforming implementations MUST
NOT recognize or generate any other character or character sequence
as a line terminator. Limits MAY be imposed on line lengths by
servers (see Section 4).
In addition, the appearance of "bare" "CR" or "LF" characters in text
(i.e., either without the other) has a long history of causing
problems in mail implementations and applications that use the mail
system as a tool. SMTP client implementations MUST NOT transmit
these characters except when they are intended as line terminators
and then MUST, as indicated above, transmit them only as a <CRLF>
sequence.
So your header line of:
$headers="Mime-Version: 1.0\nContent-Type: text/plain; charset=UTF-8\nContent-Transfer-Encoding: quoted-printable\nFrom: Metaldemos <hello@metaldemos.com>\nReply-To: Metaldemos <hello@metaldemos.com>\nReturn-Path: Metaldemos <hello@metaldemos.com>\n";
is invalid; HTTP or SMTP headers MUST always end with \r\n, not just a \n or \r.
The correct lines would be
$headers="Mime-Version: 1.0\r\n";
$headers.="Content-Type: text/plain; charset=UTF-8\r\n";
$headers.="Content-Transfer-Encoding: quoted-printable\r\n";
$headers.="From: Metaldemos <hello@metaldemos.com>\r\n";
$headers.="Reply-To: Metaldemos <hello@metaldemos.com>\r\n";
$headers.="Return-Path: Metaldemos <hello@metaldemos.com>\r\n";
You CAN put it all in one long line that's fine, I just split it up to make it clearer.
The reason it didn't work before is because you only changed FROM and REPLY-TO you have to change all of them.
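One way to avoid mixing terminators is to keep the headers in an array and join them with CRLF in a single place. This is only a sketch with placeholder values modelled on the question; the variable names and the sanity check are illustrative, not part of any required API.

```php
<?php
$headerLines = [
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset=UTF-8',
    'Content-Transfer-Encoding: quoted-printable',
    'From: Metaldemos <hello@metaldemos.com>',
    'Reply-To: Metaldemos <hello@metaldemos.com>',
];

// Join with CRLF once, so no line can end in a bare "\n" by accident.
$headers = implode("\r\n", $headerLines);

// Sanity check: count LF characters that are not preceded by CR.
$bareLf = preg_match_all('/(?<!\r)\n/', $headers);
var_dump($bareLf);

// mail($to, $subject, $message, $headers); // left out so the sketch runs offline
```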
Try adding both a carriage return and new line character. I know when I'm writing PHP scripts to send mail, I do something similar to the following:
...
$headers.= "From: John Doe <john.doe@example.com>\r\n";
$headers.= "Reply-To: Jane Doe <jane.doe@example.com>\r\n";
...
if (mail($to, $subject, $message, $headers)) {
// email sent
}
else {
// email failed
}