Sending mail from contact form with PHP and special characters [duplicate] - php

I'm afraid I'm not a programmer but still I'm trying to help a client fix an annoying issue with his site.
This is part of his mailing.php:
$headers = "MIME-Version: 1.0\r\n";
$headers = "Content-Type: text/html; charset=UTF-8";
But still, when viewed in his webmail, he gets Ã¡ Ã±, etc. (even when choosing to view as HTML).
I've done some searching, but either I'm applying the solutions poorly or something.

Diego Martinez, the encoding of your email depends on three things:
the encoding of your script file (e.g. sendmail.php)
the encoding of the text in the variable that you send in the email
the headers of the email
To change the encoding of the script file, use Notepad++, which can convert it (Encoding / Convert to ...). Your file should be UTF-8 encoded.
To change the encoding of the text, use iconv():
$text = iconv('utf-8', 'iso-8859-2', $text);
This converts $text from UTF-8 to ISO-8859-2.
As for the headers, I see you already know how to change them : )
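One detail in the snippet from the question is worth a look: its second line uses = rather than .=, so the MIME-Version header is overwritten and the header string carries no trailing line break. A minimal sketch of building the headers by appending, assuming a hypothetical helper name and sender address that are not part of the original mailing.php:

```php
<?php
// Hypothetical helper (not from the original mailing.php): builds the
// header string so that no line overwrites the previous one.
function build_html_mail_headers(string $from): string
{
    $headers  = "MIME-Version: 1.0\r\n";
    // Note the ".=": a plain "=" here would discard MIME-Version above.
    $headers .= "Content-Type: text/html; charset=UTF-8\r\n";
    $headers .= "From: " . $from;
    return $headers;
}

// The address is a placeholder chosen for illustration.
$headers = build_html_mail_headers('webmaster@example.com');

// mail($to, $subject, $htmlBody, $headers);
// $htmlBody itself must really be UTF-8 for the charset label to be true.
```

The charset label only describes the body; the body text still has to be UTF-8 for the label to be accurate.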

Related

Decoding Windows-1252 characters in imap subject line to UTF-8

I have a website that will allow people to post things to it using the subject line of an email in Outlook. Using PHP and imap, I get the subject line of the text and store it in a mysql db. But every once in a while, someone will copy text from a website into the subject line of that email and I will get garbled text. Similar to this:
=?Windows-1252?Q?_Every_day_in_our_offices_we_recycle_cardboard,aluminum?=
=?Windows-1252?Q?=96_won=92t_you_join_us=3F?=
What I've done is try to decode this text so it will appear normal on the page using the following code:
$subject = strip_tags($mailHeader->subject);
$header = imap_mime_header_decode($subject);
$subject = "";
for($i=0;$i<count($header);$i++)
{
$subject .= $header[$i]->text;
}
When finished I get rid of most of the garbled text, but am left behind with replacement characters for an em dash and a curly quote that was in the original subject line text. See the result below:
Every day in our offices we recycle cardboard, aluminum, � won�t you join us?
The charset for the website is set to UTF-8. When I set the website charset to ISO-8859-1, the replacement characters are replaced with the curly quote and em dash, which is great but I want to leave the website's charset at UTF-8.
Any help on how to get rid of the replacement characters without changing the charset to ISO-8859-1 would be great. Thanks.
Code above works except for one small change to the very end:
$subject .= mb_convert_encoding($header[$i]->text, "UTF-8", $header[$i]->charset);
Each of the objects returned by imap_mime_header_decode includes a charset property, which you are ignoring. You would need to convert each one to UTF-8 in your loop, using something like:
$subject .= mb_convert_encoding($header[$i]->text, "UTF-8", $header[$i]->charset);
As an alternative, consider using the mb_decode_mimeheader or iconv_mime_decode_headers functions. Both of these functions do the entire job of decoding a MIME header for you, returning a string in PHP's internal encoding (which is usually UTF-8).
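As a rough sketch of the alternative route, iconv_mime_decode() can decode a single header value and convert it to a target charset in one call. The input below is the second encoded-word quoted in the question; the fallback branch is an assumption about how you might want to handle a failed decode, not part of the original answer:

```php
<?php
// The second encoded-word from the subject line quoted in the question.
$raw = '=?Windows-1252?Q?=96_won=92t_you_join_us=3F?=';

// Third argument: the charset the decoded result should be converted to.
$subject = iconv_mime_decode($raw, 0, 'UTF-8');

if ($subject === false) {
    // Decoding can fail, for example when the local iconv build does not
    // know the charset named in the header; fall back to the raw value.
    $subject = $raw;
}

echo $subject, "\n";
```

Because the charset named inside the encoded-word is honoured, the dash and the curly quote arrive as proper UTF-8 and the page can stay on UTF-8.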

Character set in PHP, mySQL and emails

Character sets are driving me round the bend!
My database is utf8_general_ci and the tables within it are utf8_unicode_ci.
All my PHP pages have
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
in the head.
When I type a euro symbol (€) in the PHP page and input it to the database it appears in phpMyAdmin as â‚¬
When I use a PHP page to copy it from the database, it reappears as €
So far so good, but when I write an HTML email using PHP, no matter what I do it comes out in the email as â‚¬
I've tried specifying the character set in the html email but it doesn't work. Probably because email clients take their character set from the mail server and not the headers.
I've also got issues with accented letters and the like, but they are being written into the text so I could overcome them by using &eacute; for é, and so on. Messy, but not impossible.
Surely there's a better way!
MY SOLUTION (SORT OF)
Thanks to all who contributed. I have tried all ways to specify the character set to utf-8 (in the mail headers, in the html head, and both) but the message still arrives in iso. So it seems the information I got from elsewhere was right: character set is defined by the server.
I have had to settle for typing things like &eacute; each time I want é into the fixed text, which is cumbersome but at least it works.
For the euro symbol, I have wrapped the variable in the htmlentities function. It works, but I will have to remember to do it with any other variables taken from the database if I encounter similar problems in similar files. It would have been easier to wrap the function around the entire HTML body, but that doesn't work, presumably because it does funny things with the HTML tags.
Check the encoding in your text editor. Crazy things can happen if this is wrong!
For MySQL, see mysql_set_charset (or mysqli_set_charset if you use the mysqli extension; the old mysql extension was removed in PHP 7).
Emails actually get their character set from the Content-Type header, not from the email server. Make sure you set this header to an appropriate value, such as Content-Type: text/html; charset="UTF-8".
And keep in mind that for HTML, you can use $encoded = htmlentities( $string, ENT_QUOTES, 'utf-8' ) so that all characters which have HTML character entity equivalents are translated into these entities.
I guess that the correct answer for you is just setting utf-8 charset for the e-mail:
$headers = "MIME-Version: 1.0\r\n";
$headers.= "From: =?utf-8?b?".base64_encode($from_name)."?= <".$from_a.">\r\n";
$headers.= "Content-Type: text/plain;charset=utf-8\r\n";
$headers.= "Reply-To: $reply\r\n";
$headers.= "X-Mailer: PHP/" . phpversion();
mail($to, $s, $body, $headers);
If you open the email source in your client (Ctrl+U in Thunderbird), you will see a Content-Type header. It should be something like:
Content-Type: text/html; charset=utf-8
If your email contains multiple parts, you need to add that header to the HTML part.
Header values need to be encoded separately (each line).
Subject: =?utf-8?B?...?=
For the html content you can just use htmlentities() but this will not work for the headers or a text email.

Russian Language encoded when using imap_fetch from gmail

I'm reading a log file pasted into the body of an email. The logs are in various languages, and all characters seem to display correctly except for Russian.
Here is an example of what the Russian says in the log file:
Ссылка на объект не указывает на экземпляр объекта.
в
From what I have read, I need to specify a decoding or encoding step, something along the lines of mb_encoding (UTF-8), but I am a bit lost on how to actually structure it without affecting text that isn't Russian. When echoed out, it gets converted to this:
СÑылка на объект не указывает на ÑкземплÑÑ€ объекта.
в
Here is the code I'm using already. I am a PHP beginner and some of this isn't my code; I have edited it to suit, but I'm not 100% sure what everything is doing:
$mailbox = "xxx#gmail.com";
$mailboxPassword = "xxx";
$mailbox = imap_open("{imap.gmail.com:993/imap/ssl}INBOX",
$mailbox, $mailboxPassword);
mb_internal_encoding("UTF-8");
$subject = mb_decode_mimeheader(str_replace('_', ' ', $subject));
$body = imap_fetchbody($mailbox, $val, 1);
$body = base64_decode($body);
echo $body;
Once I echo out the body, the Russian is converted into that garbled text. Any pointers to similar code I can dissect to learn how to fix this?
Please bear in mind that numerous languages are being read from the email. For the most part it's just a few snippets and the rest is basic logging, but I am worried that if I set a new decoding step it will mess up other languages' characters.
Despite its wide adoption, email is still tricky to work with. If your IMAP client has a limited set of requirements, your job will be easy. Otherwise, for a truly general-purpose Gmail client, there's no silver bullet and you have to understand how email works: SMTP, MIME and finally IMAP.
Basic MIME knowledge is absolutely needed, and I won't paste the whole wikipedia article, but you should really read it and understand how it works. IMAP is somewhat easier to understand.
Usually, email messages contain either a single text/plain body, or a multipart/alternative body with both a text/plain and a text/html part. But, you know, there are attachments, so you are also likely to find a multipart/mixed body, which can contain really anything, and binary content should be treated differently from text. There are two headers (which you can find on the message as a whole or on an individual part inside a multipart envelope) involved in charset issues: Content-Type and Content-Transfer-Encoding.
From your code, we must assume that you are only interested in base64-encoded textual parts. Once you have decoded them, they are a sequence of bytes representing text in the charset specified by the sender in the Content-Type header, which is non-ASCII here and thus looks like this:
Content-Type: text/plain; charset=ISO-8859-1
Note that the charset may be UTF-8 or really any other you can think of; you have to check this in your program. Your job is transcoding this piece of input into the output charset of your HTML page. If your page does not use a Unicode encoding (like UTF-8), chances are that you won't be able to show the message correctly, and '?' will be printed in place of the missing characters. Since you require your application to be used worldwide (not just in Russia), and since it's good practice anyway, you should use UTF-8 in your HTML responses, and thus when you want to echo the message body:
echo mb_convert_encoding(imap_base64($body), "UTF-8", $input_charset);
where $input_charset is the one found in the Content-Type header for the processed part. For the subject line, you should use imap_mime_header_decode(), which returns an array of objects, each with a text property (the decoded bytes) and a charset property, which you have to output in the same manner as above.
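To make the "check the charset, then transcode" step concrete, here is a minimal sketch. Both function names are hypothetical, and how you obtain the Content-Type value for a given part (for instance from the message structure) is outside this sketch; the input body is assumed to be already transfer-decoded:

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type
// header value, falling back to a default when none is given.
function charset_from_content_type(string $contentType, string $default = 'US-ASCII'): string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return $default;
}

// Hypothetical helper: transcode an already transfer-decoded body to UTF-8.
// An unknown charset name makes mb_convert_encoding() fail (a ValueError
// in PHP 8), so real code may want to validate the name first.
function body_to_utf8(string $decodedBody, string $contentType): string
{
    $charset = charset_from_content_type($contentType);
    return mb_convert_encoding($decodedBody, 'UTF-8', $charset);
}

// "\xE9" is the single byte for "é" in ISO-8859-1.
$utf8 = body_to_utf8("caf\xE9", 'text/plain; charset=ISO-8859-1');
echo $utf8, "\n";
```

Converting per part, based on that part's own declared charset, is what keeps the other languages in the mailbox from being damaged by a fix aimed at Russian.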
TL;DR
The bytes in the UTF-8 encoded input text map quite nicely to the output if we assume it's CP-1252 encoded (maybe you didn't copy some non printable ones). This means that the input is UTF-8, but the browser thinks the page is Windows-1252. Likely this is the default browser behavior for your locale, and you can easily correct it by sending the appropriate header before any other input:
header("Content-Type: text/html; charset=utf-8");
This should be enough to solve this issue, but will also likely cause problems with non-ASCII characters in string literals and the database (if any). If you want a multilingual application, Unicode is the way, but you have to transcode your database and your PHP files from CP-1252 to UTF-8.

Character encoding for French Accents

I'm developing my first website for a French client and I'm having massive issues with accents being displayed as "?". After googling it for days, I thought I understood, but the issues persist.
To simplify, I'll show just the email headers (the message contains French accents):
$headers = 'MIME-Version: 1.0' . "\r\n";
$headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
I've tried using charset UTF-8 and the iso-8859-1, but I still get this type of emails:
Merci pour votre intÃ©rÃªt pour les tee shirts.
Can any one help? I'm having these issues with mySql, HTML, PHP everywhere basically.
Thanks.
If intérêt shows up as intÃ©rÃªt you likely (i.e. short of corruption due to double encoding) have UTF-8 encoded text being displayed as if it were ISO-8859-1.
Make sure the headers are correctly formed and present the content as being UTF-8 encoded.
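The mismatch described above can be reproduced in a few lines, which may help when checking which side is mislabelled. This is only an illustration, assuming the mbstring extension is available and the script file is saved as UTF-8:

```php
<?php
// "intérêt" as UTF-8 bytes (this file is assumed to be saved as UTF-8).
$utf8 = "intérêt";

// Pretend the bytes are ISO-8859-1 and convert them "again" to UTF-8.
// This is what a mail client effectively does when the charset label
// says ISO-8859-1 but the body is really UTF-8.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n";
```

Each accented letter occupies two bytes in UTF-8, and each of those bytes is shown as its own ISO-8859-1 character, hence two odd characters per accent.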
First of all, make the charset in the header UTF-8 again.
In your page, use utf8_encode() where appropriate (it converts ISO-8859-1 text to UTF-8, so apply it only to values that are not already UTF-8) to make sure values coming from a database or external files are properly encoded. Try to set the encoding of the fields in your database to UTF-8 as well.
Also, take a look at the htmlentities() function, which converts special characters to HTML entities and may solve encoding issues as well.
All other languages except French work fine for me by default
In my /fr/messages.php file I was able to resolve this with
'myKey' => utf8_encode('en français'),

Email from PHP has broken Subject header encoding

My PHP script sends email to users and when the email arrives in their mailboxes, the subject line ($subject) has characters like a^£ added to the end of my subject text. This is obviously an encoding problem. The email message content itself is fine, just the subject line is broken.
I have searched all over but can’t find how to encode my subject properly.
This is my header. Notice that I’m using Content-Type with charset=utf-8 and Content-Transfer-Encoding: 8bit.
//set all necessary headers
$headers = "From: $sender_name<$from>\n";
$headers .= "Reply-To: $sender_name<$from>\n";
$headers .= "X-Sender: $sender_name<$from>\n";
$headers .= "X-Mailer: PHP4\n"; //mailer
$headers .= "X-Priority: 3\n"; //1 UrgentMessage, 3 Normal
$headers .= "MIME-Version: 1.0\n";
$headers .= "X-MSMail-Priority: High\n";
$headers .= "Importance: 3\n";
$headers .= "Date: $date\n";
$headers .= "Delivered-to: $to\n";
$headers .= "Return-Path: $sender_name<$from>\n";
$headers .= "Envelope-from: $sender_name<$from>\n";
$headers .= "Content-Transfer-Encoding: 8bit\n";
$headers .= "Content-Type: text/plain; charset=UTF-8\n";
Update   For a more practical and up-to-date answer, have a look at Palec’s answer.
The character encoding specified in Content-Type only describes the character encoding of the message body, not the headers. You need to use the encoded-word syntax with either the quoted-printable encoding or the Base64 encoding:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
You can use imap_8bit for the quoted-printable encoding and base64_encode for the Base64 encoding:
"Subject: =?UTF-8?B?".base64_encode($subject)."?="
"Subject: =?UTF-8?Q?".imap_8bit($subject)."?="
TL;DR
$preferences = ['input-charset' => 'UTF-8', 'output-charset' => 'UTF-8'];
$encoded_subject = iconv_mime_encode('Subject', $subject, $preferences);
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);
or
mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader($subject, 'UTF-8', 'B', "\r\n", strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);
Problem and solution
The Content-Type and Content-Transfer-Encoding headers apply only to the body of your message. For headers, there is a mechanism for specifying their encoding specified in RFC 2047.
You should encode your Subject via iconv_mime_encode(), which exists as of PHP 5:
$preferences = ["input-charset" => "UTF-8", "output-charset" => "UTF-8"];
$encoded_subject = iconv_mime_encode("Subject", $subject, $preferences);
Change input-charset to match the encoding of your string $subject. You should leave output-charset as UTF-8. Before PHP 5.4, use array() instead of [].
Now $encoded_subject is (without trailing newline)
Subject: =?UTF-8?B?VmVyeSBsb25nIHRleHQgY29udGFpbmluZyBzcGVjaWFsIGM=?=
=?UTF-8?B?aGFyYWN0ZXJzIGxpa2UgxJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHA=?=
=?UTF-8?B?cm9kdWNlcyBzZXZlcmFsIGVuY29kZWQtd29yZHMsIHNwYW5uaW5nIG0=?=
=?UTF-8?B?dWx0aXBsZSBsaW5lcw==?=
for $subject containing:
Very long text containing special characters like ěščřžýáíé<>?=+* produces several encoded-words, spanning multiple lines
How does it work?
The iconv_mime_encode() function splits the text, encodes each piece separately into an <encoded-word> token and folds the whitespace between them. Encoded word is =?<charset>?<encoding>?<encoded-text>?= where:
<encoding> is either B (for Base 64 – see base64_encode()) or Q (for Quoted-printable – see quoted_printable_encode()),
<encoded-text> is string encoded with <encoding>, which has charset <charset> after being decoded.
You can decode =?CP1250?B?QWhvaiwgc3bsdGU=?= into UTF-8 string Ahoj, světe (Hello, world in Czech) via iconv("CP1250", "UTF-8", base64_decode("QWhvaiwgc3bsdGU=")) or directly via iconv_mime_decode("=?CP1250?B?QWhvaiwgc3bsdGU=?=", 0, "UTF-8").
Encoding into encoded words is more complicated, because the spec requires each encoded-word token to be at most 75 bytes long and each line containing any encoded-word token must be at most 76 bytes long (including blank at the start of a continuation line). Don’t implement the encoding yourself. All you really need to know is that iconv_mime_encode() respects the spec.
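A small sketch that may help when experimenting with this: it encodes the sample subject from above, measures the longest physical line of the folded result, and decodes it again to compare with the input. The variable names are my own, and the script file is assumed to be saved as UTF-8:

```php
<?php
// Sample subject taken from the example above.
$subject = "Very long text containing special characters like "
    . "ěščřžýáíé<>?=+* produces several encoded-words, spanning multiple lines";

$preferences = ['input-charset' => 'UTF-8', 'output-charset' => 'UTF-8'];
$encoded = iconv_mime_encode('Subject', $subject, $preferences);

// Length in bytes of the longest physical line of the folded header.
$longest = max(array_map('strlen', explode("\r\n", $encoded)));

// Decoding the folded header should give back the original text.
$decoded = iconv_mime_decode($encoded, 0, 'UTF-8');

echo $encoded, "\n";
echo "longest line: ", $longest, " bytes\n";
echo $decoded === "Subject: $subject" ? "round trip OK\n" : "round trip differs\n";
```

Checking the round trip this way is a cheap sanity test for your own subjects before handing the encoded value to mail().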
Interesting related reading is the Wikipedia article Unicode and email.
Alternatives
A rudimentary option is to use only a restricted set of characters. ASCII is guaranteed to work. ISO Latin 1 (ISO-8859-1), as user2250504 suggested, will probably work too, because it is often used as fallback when no encoding is specified. But those character sets are very small and you’ll probably be unable to encode all the characters you’ll want. Moreover, the RFCs say nothing about whether Latin 1 should work or not.
You can also use mb_encode_mimeheader(), as Paul Norman answered, but it’s easy to use it incorrectly.
You must use mb_internal_encoding() to set the mbstring functions’ internally used encoding. The mb_* functions expect input strings to be in this encoding. Beware: The second parameter of mb_encode_mimeheader() has nothing to do with the input string (despite what the manual says). It corresponds to the <charset> in the encoded word (see How does it work? above). The input string is recoded from the internal encoding to this one before being passed to the B or Q encoding.
Setting the internal encoding might not be needed since PHP 5.6, because the underlying mbstring.internal_encoding configuration option was deprecated in favor of the default_charset option, which has defaulted to UTF-8 ever since. Note that this is just a default, and it may be inappropriate to rely on defaults in your code.
You must include the header name and colon in the input string. The RFC imposes a strong limit on line length and it must hold for the first line, too! An alternative is to fiddle with the fifth parameter ($indent; last one as of September 2015), but this is even less convenient.
The implementation might have bugs. Even if used correctly, you might get broken output. At least this is what many comments on the manual page say. I have not managed to find any problem, but I know implementation of encoded words is tricky. If you find potential or actual bugs in mb_encode_mimeheader() or iconv_mime_encode(), please, let me know in the comments.
There is also at least one upside to using mb_encode_mimeheader(): it does not always encode all the header contents, which saves space and leaves the text human-readable. The encoding is required only for the non-ASCII parts. The output analogous to the iconv_mime_encode() example above is:
Subject: Very long text containing special characters like
=?UTF-8?B?xJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHByb2R1Y2VzIHNldmVyYWwgZW5j?=
=?UTF-8?B?b2RlZC13b3Jkcywgc3Bhbm5pbmcgbXVsdGlwbGUgbGluZXM=?=
Usage example of mb_encode_mimeheader():
mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader("Subject: $subject", 'UTF-8');
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);
This is an alternative to the snippet in TL;DR on top of this post. Instead of just reserving the space for Subject: , it actually puts it there and then removes it in order to be able to use it with the mail()’s stupid interface.
If you like mbstring functions better than the iconv ones, you might want to use mb_send_mail(). It uses mail() internally, but encodes subject and body of the message automatically. Again, use with care.
Headers other than Subject need different treatment
Note that you must not assume that encoding the whole contents of a header is OK for all headers that may contain non-ASCII characters. E.g. From, To, Cc, Bcc and Reply-To may contain names for the addresses they contain, but only the names may be encoded, not the addresses. The reason is that <encoded-word> token may replace just <text>, <ctext> and <word> tokens, and only under certain circumstances (see §5 of RFC 2047).
Encoding of non-ASCII text in other headers is a related but different question. If you wish to know more about this topic, search. If you find no answer, ask another question and point me to it in the comments.
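As an illustration of encoding only the name part, here is a minimal sketch for an address header. The helper name and the sample name and address are hypothetical, and it assumes a short display name that fits into a single encoded-word:

```php
<?php
// Hypothetical helper: encode only the display name, never the address.
function format_address(string $name, string $email): string
{
    // One RFC 2047 "B" encoded-word. This is fine for short names only:
    // a long name would have to be split into several encoded-words,
    // because each one may be at most 75 bytes long.
    $encodedName = '=?UTF-8?B?' . base64_encode($name) . '?=';
    return $encodedName . ' <' . $email . '>';
}

// Placeholder name and address, chosen for illustration.
$from = 'From: ' . format_address('Jiří Novák', 'jiri@example.com');
echo $from, "\n";
```

The address stays plain ASCII inside the angle brackets, so mail servers can still parse it, while clients decode the name for display.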
mb_encode_mimeheader() for UTF-8 strings can be useful here, e.g.
$subject = mb_encode_mimeheader($subjectText,"UTF-8");
