Swift_Mailer + symfony UTF-8 - php

I have an issue with Swift_Mailer in Symfony. I am sending e-mail messages in French which contain a lot of "à é è" characters. At first, when I sent them, these characters came out fine in my email client, but not in my colleague's.
So I put the text for the mail through a utf8_encode function and tried again. Now it is vice versa. It shows fine in my email-client, but screwed up in my colleague's.
What is the best way to solve these e-mail UTF-8 issues with Swift_Mailer in Symfony?

Use $message->toString(); to see if your e-mail is well formatted, meaning everything is consistently UTF-8 or consistently another charset, such as the Western European ISO charset iso-8859-15. You can use setCharset to tell it what you're actually using.
The character set of the message (and its MIME parts) is set with the
setCharset() method. You can also change the global default of UTF-8
by working with the Swift_Preferences class.
Swift Mailer will default to the UTF-8 character set unless otherwise
overridden. UTF-8 will work in most instances since it includes all of
the standard US keyboard characters in addition to most international
characters.
It is absolutely vital however that you know what character set your
message (or its MIME parts) are written in otherwise your message may
be received completely garbled.
http://swiftmailer.org/docs/messages.html#setting-the-character-set
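Knowing what character set the text is written in matters because utf8_encode assumes ISO-8859-1 input, so running it over text that is already UTF-8 encodes it twice. Below is a minimal sketch of a guard against that, assuming the mbstring extension is available and that any input that is not valid UTF-8 is ISO-8859-15. The function name ensureUtf8 is hypothetical and is not part of Swift Mailer.

```php
<?php
// Hypothetical helper (not part of Swift Mailer): return $text as UTF-8.
// Assumption: anything that is not valid UTF-8 is written in $legacyCharset.
function ensureUtf8(string $text, string $legacyCharset = 'ISO-8859-15'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8; converting again would double-encode it
    }
    return mb_convert_encoding($text, 'UTF-8', $legacyCharset);
}

$fromLatin = ensureUtf8("D\xE9j\xE0 vu");      // ISO-8859-15 bytes in
$fromUtf8  = ensureUtf8("D\u{E9}j\u{E0} vu");  // UTF-8 bytes in

var_dump($fromLatin === $fromUtf8); // bool(true): both end up as the same UTF-8 string
```

The result can then be handed to a message whose charset is declared as UTF-8 with setCharset(), so that the declared and the actual encoding agree.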

Related

’ in PHP is converting to â€™ when using mb_convert_encoding in Outlook subject

I have a mail() function set up in PHP. When emailing myself to test it, I noticed the subject was converting my ' into â€™.
$subject="Please provide an updated copy of your company's certification";
result: Please provide an updated copy of your companyâ€™s certification.
I followed Getting â€™ instead of an apostrophe(') in PHP and added mb_convert_encoding, but now I am getting &rsquo; instead of '.
$subjectBad="Please provide an updated copy of your company's certification";
$subject= mb_convert_encoding($subjectBad, "HTML-ENTITIES", 'UTF-8');
result: Please provide an updated copy of your company&rsquo;s certification.
It comes through fine to my personal email, so is there a way to properly display a ' in Outlook's subject line, or am I at the whim of whatever their system settings are?
Whatever you used to type the subject did not use a simple apostrophe ', which has a common representation across virtually all single-byte encodings and UTF-8. Instead it used a "fancy" right single quote ’, which is represented differently in single-byte encodings and in UTF-8.
mb_convert_encoding() is converting to an HTML entity because you are literally telling it to, and email headers are not HTML, so it's going to display as the literal string &rsquo;. The usual single-byte encoding that has "smart quotes" is Microsoft's cp1252 (ISO-8859-1 has none), and that is still the wrong answer for email headers.
The simplest answer is: Don't do that. Use a normal apostrophe. Everyone hates dealing with "smart" quotes.
The more complex answer is that email headers MUST be 7-bit safe "ASCII" text, and anything else requires additional handwaving. Ideally you should be using a proper email library that handles this, and the dozens of other annoyances that will malform your emails and impact deliverability.
If you're dead-set on eroding your sanity and using mail() directly, then you're going to want to properly encode your subject line and use an explicitly defined character set, which you should be doing anyway. E.g.:
$subject = 'Please provide an updated copy of your company’s certification';
// A literal space is not allowed inside an RFC 2047 encoded-word, so it is written as "_".
// Caveat: quoted_printable_encode() does not escape "?" or "_", so this only suits subjects without them.
var_dump(
sprintf('=?UTF-8?Q?%s?=', str_replace(' ', '_', quoted_printable_encode($subject)))
);
Output:
string(82) "=?UTF-8?Q?Please_provide_an_updated_copy_of_your_company=E2=80=99s_certification?="
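As an alternative to assembling the encoded-word by hand, the mbstring extension can build it. This is a minimal sketch, assuming mbstring is installed. It uses Base64 ('B') rather than Q encoding, and the variable names are only illustrative. The exact folding of the result can differ between PHP versions, so no output is shown.

```php
<?php
// Sketch: let mbstring build the RFC 2047 encoded-word(s) for a Subject header.
mb_internal_encoding('UTF-8'); // mb_encode_mimeheader() reads its input in this encoding

$subject = "Please provide an updated copy of your company\u{2019}s certification";

// 'B' selects Base64; long values are folded across lines using the given newline.
$encodedSubject = mb_encode_mimeheader($subject, 'UTF-8', 'B', "\r\n");

// The header value now consists of 7-bit ASCII bytes only.
var_dump(preg_match('/^[\x00-\x7F]+$/', $encodedSubject) === 1);
```

$encodedSubject would then be passed as the subject argument of mail(), with the charset of the body declared separately in a Content-Type header.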

php urlencode utf-8 string makes it ascii in mb_detect_encoding

While updating some old projects, I'm working through some old ANSI/ASCII files and encodings.
I want to have everything running UTF-8 to make sure that I can support all kinds of languages.
I have a service where I send out SMSes using a microservice. I have an endpoint, /sms.php, where I accept some parameters from $_GET, and these are then used in the application.
I have some test files where I make some requests to test if everything is OK.
My problem is that even though all files are UTF-8 encoded (I've checked multiple times), the URL I build is detected as ASCII.
My code looks like this:
$text = "message with æøå to make it utf8";
$params = urlencode($text);
$url = "http://localhost/sms.php?text=".$params;
echo mb_detect_encoding($text, "auto"); // this prints utf8
echo mb_detect_encoding($url, "auto"); // this prints ascii
$res = file_get_contents($url);
And this is also what I see in my receiving endpoint.
At first I thought it had something to do with file_get_contents, but since the string is converted after the urlencode, I thought that might be the cause. But I'm not sure how to get around this problem.
The other problem I have is that a lot of my clients are using this old 2012 code (from before I started using UTF-8 as standard), so I can't change the endpoint without forcing them to change their current setups.
In a comment it was suggested that I check whether the string is UTF-8 using bin2hex:
bin2hex($_GET['text']); // 6d657373616765207769746820c3a6c3b8c3a520746f206d616b652069742075746638 which is inserted into the database: message with æøå to make it utf8
bin2hex(utf8_decode($_GET['text'])); // 6d657373616765207769746820e6f8e520746f206d616b652069742075746638 which is inserted into the database: message with æøå to make it utf8
Hope someone out there can point me in the right direction.
I've looked into multiple Stack Overflow entries, for example
get utf8 urlencoded characters in another page using php
What's the correct encoding of HTTP get request strings?
but I'm not sure if what I'm looking for is even possible.
I was just hoping to be able to rewrite the entire project to be UTF-8 ready.
Thanks
/Wel
mb_detect_encoding gives you the first encoding in which the tested string is valid. If left to its own devices, it tests for ASCII before UTF-8. Since a URL-encoded string consists solely of a subset of ASCII characters, it is valid ASCII and mb_detect_encoding will tell you so. Whereas a string containing non-ASCII characters is not valid ASCII, so it will continue testing other encodings and eventually arrive at UTF-8.
UTF-8 is a superset of ASCII, so any string that is valid ASCII is also valid UTF-8. A string can be valid in multiple encodings at once; mb_detect_encoding telling you it's valid ASCII does not mean that it's not also valid UTF-8, or Latin-1, or numerous other encodings for that matter. That's how Mojibake is born.
Detecting encodings is largely vague nonsense anyway and you should never do that. If you expect a string to be in UTF-8, simply test whether it is valid UTF-8 or not:
mb_check_encoding($url, 'UTF-8')
If it's not valid in the expected encoding, discard it, since you have no clue what it really is then.
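The points above can be checked with a short script. This is only a sketch, assuming the mbstring extension. It reuses the test string from the question and decodes it manually with urldecode(), which is what PHP does on its own when it fills $_GET.

```php
<?php
$text  = "message with \u{E6}\u{F8}\u{E5} to make it utf8"; // "æøå" as UTF-8 bytes
$query = urlencode($text);

echo $query, PHP_EOL; // message+with+%C3%A6%C3%B8%C3%A5+to+make+it+utf8

// The percent-encoded form uses ASCII characters only, so it is valid in both encodings.
var_dump(mb_check_encoding($query, 'ASCII')); // bool(true)
var_dump(mb_check_encoding($query, 'UTF-8')); // bool(true)

// Decoding restores the original UTF-8 bytes unchanged.
$received = urldecode($query);
var_dump($received === $text);                    // bool(true)
var_dump(mb_check_encoding($received, 'UTF-8'));  // bool(true)

// Latin-1 bytes for "æøå" are not valid UTF-8 and would be rejected.
var_dump(mb_check_encoding("\xE6\xF8\xE5", 'UTF-8')); // bool(false)
```

So the "ascii" result for the URL says nothing about the payload: the UTF-8 bytes travel intact inside the percent-encoding.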

non-English characters display issue in gmail, yahoo via phpMailer class

I use PHPMailer (the current version on GitHub as of today) to send automatic SMTP activation mails from my noreply#host.com.
I tried it in gmail and yahoo. Both interpreted the characters as shown below.
nice unwanted (realized)
Ç -> Ã‡
ı -> Ä±
ş -> Åž
The mailing process order is:
"sign page" (has the form, UTF-8 encoded), then
"assess_from_sign.php" (pure PHP without any encoding command; the actual sending or failure handling happens here), then
"inform page" (informs the user of the result)
My message in the mail body starts with Dear $_POST['username']
What can I apply to the $_POST['username'] variable in assess_from_sign.php so that non-English characters appear exactly as they do in their own language?
Note: all requests are redirected to index.php on my site, which calls mb_internal_encoding("UTF-8"); and this applies to all pages.
thanks, regards
It's important that all aspects of the code are set to the same, specific charset. I recommend UTF-8, which you have already started using and which covers most characters you'll ever need.
Below you'll find a "checklist" of what should be set to UTF-8.
PHP header - this has to be sent prior to any output to the browser, and should be put at the top of all your .php pages: header('Content-Type: text/html; charset=utf-8');
HTML header - this should also be in all your pages containing HTML, and it has to be put inside the <head> tags: <meta charset="utf-8" />
PHPMailer object - specify the charset of your PHPMailer object by adding $mail->CharSet = 'UTF-8';, where $mail is the object itself.
File encoding - the file itself should be converted to a UTF-8 charset (specifically UTF-8 without BOM). How to do this depends on the text editor you are using, but in Notepad++ it's Format -> Convert to UTF8 (w/o Byte Order Mark).
There might be other aspects of your code that need to be set to a UTF-8 charset (databases and such), but this should cover the mail properties.
You can also reference UTF-8 all the way through.
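To see why one missing UTF-8 declaration is enough to produce this kind of output, the garbling can be reproduced on purpose. This is a sketch, assuming the mbstring extension and assuming the receiving side falls back to Windows-1252. The function name misreadAsCp1252 is made up for the illustration.

```php
<?php
// Reproduce the garbling: take UTF-8 bytes and (wrongly) read them as Windows-1252.
function misreadAsCp1252(string $utf8): string
{
    // Treat each byte of the UTF-8 sequence as one Windows-1252 character.
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

foreach (["\u{C7}", "\u{131}", "\u{15E}"] as $char) { // Ç, ı, Ş
    echo $char, ' -> ', misreadAsCp1252($char), PHP_EOL;
}
// Ç -> Ã‡
// ı -> Ä±
// Ş -> Åž
```

Each two-byte UTF-8 character shows up as two unrelated characters, which is the pattern in the question. Declaring UTF-8 at every step of the checklist removes the wrong guess.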

Russian Language encoded when using imap_fetch from gmail

I'm reading a log file pasted into the body of an email. The logs are in various languages, and all characters seem to display correctly except for Russian.
Here is an example of what the Russian says in the log file:
Ссылка на объект не указывает на экземпляр объекта.
в
From what I have read, I need to specify a decoding or encoding, something along the lines of mb_encoding (UTF-8), but I am a bit lost on how to actually structure it without affecting text that isn't Russian. When echoed out, it gets converted to this:
СÑылка на объект не указывает на ÑкземплÑÑ€ объекта.
в
Here is the code I'm using already. I am a PHP beginner and some of this isn't my code; I have edited it to suit, but I'm not 100% sure what everything is doing:
$mailbox = "xxx#gmail.com";
$mailboxPassword = "xxx";
$mailbox = imap_open("{imap.gmail.com:993/imap/ssl}INBOX",
$mailbox, $mailboxPassword);
mb_internal_encoding("UTF-8");
$subject = mb_decode_mimeheader(str_replace('_', ' ', $subject));
$body = imap_fetchbody($mailbox, $val, 1);
$body = base64_decode($body);
echo $body;
Once I echo out the body, it converts from Russian into that encoding. Any pointers to similar code I can dissect to learn how to fix this?
Please bear in mind that numerous languages are being read from the email. For the most part it's just a few snippets and the rest is basic logging, but what I am worried about is that setting a new decode will mess up characters in other languages.
Despite its large adoption, email is still tricky to work with. If your IMAP client has a limited set of requirements, your job will be easy. Otherwise, for a truly general-purpose GMail client, there's no silver bullet and you have to understand how email works: SMTP, MIME and finally IMAP.
Basic MIME knowledge is absolutely needed, and I won't paste the whole Wikipedia article, but you should really read it and understand how it works. IMAP is somewhat easier to understand.
Usually, email messages contain either a single text/plain body or a multipart/alternative body with both a text/plain and a text/html part. But there are also attachments, so you are likely to find a multipart/mixed body as well; it can contain anything, and binary content must be treated differently from text. Two headers (which you can find on the message as a whole or on a part inside a multipart envelope) are involved in charset issues: Content-Type and Content-Transfer-Encoding.
From your code, we must assume that you are only interested in base64-encoded textual parts. Once you have decoded them, they are a sequence of bytes representing text in the charset specified by the sender in the Content-Type header. That charset is not ASCII here, so the header looks something like this:
Content-Type: text/plain; charset=ISO-8859-1
Note that the charset may be utf8 or really any other you can think of; you have to check this in your program. Your job is to transcode this piece of input into the output charset of your HTML page. If your page does not use a Unicode encoding (like UTF-8), chances are that you won't even be able to show the message correctly, and '?' will be printed in place of missing characters. Since you require your application to be used worldwide (not just in Russia), and since it's good practice anyway, you should use UTF-8 in your HTML responses, and thus when you want to echo the message body:
echo mb_convert_encoding(imap_base64($body), "UTF-8", $input_charset);
where $input_charset is the one found in the Content-Type header for the processed part. For the subject line, you should use imap_mime_header_decode(), which returns an array of tuples (binary string, charset) which you have to output in the same manner as above.
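The transcoding step can be sketched without an IMAP connection. The two helpers below are hypothetical (they are not part of the imap extension) and assume mbstring is available. They take the header values as plain strings, however your code obtains them, and they use base64_decode() instead of imap_base64() so that the example runs on its own. The sample text is assumed to be saved in a UTF-8 source file.

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type header value.
function charsetFromContentType(string $contentType, string $default = 'US-ASCII'): string
{
    if (preg_match('/charset\s*=\s*"?([^\s";]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return $default; // assumption: fall back to US-ASCII when no charset is declared
}

// Hypothetical helper: undo the transfer encoding, then transcode the bytes to UTF-8.
function partToUtf8(string $rawBody, string $transferEncoding, string $charset): string
{
    switch (strtolower(trim($transferEncoding))) {
        case 'base64':
            $bytes = (string) base64_decode($rawBody);
            break;
        case 'quoted-printable':
            $bytes = quoted_printable_decode($rawBody);
            break;
        default: // 7bit, 8bit, binary: the body is already raw bytes
            $bytes = $rawBody;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $charset);
}

// Simulate a Russian text part that was sent as KOI8-R, base64-encoded.
$original = 'Ссылка на объект';
$wire     = base64_encode(mb_convert_encoding($original, 'KOI8-R', 'UTF-8'));

$charset = charsetFromContentType('Content-Type: text/plain; charset="KOI8-R"');
var_dump(partToUtf8($wire, 'base64', $charset) === $original); // bool(true)
```

Because the charset is read per part, text in other languages is unaffected: each part is converted from whatever its own header declares. A charset name that mbstring does not know makes mb_convert_encoding() fail, so real code would need to handle that case.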
TL;DR
The bytes of the UTF-8 encoded input text map quite nicely to the output you see if we assume they are read as CP-1252 (maybe some non-printable ones were lost when you copied it). This means that the input is UTF-8, but the browser thinks the page is Windows-1252. This is likely the default browser behavior for your locale, and you can easily correct it by sending the appropriate header before any other output:
header("Content-Type: text/html; charset=utf-8");
This should be enough to solve this issue, but it will also likely cause problems with non-ASCII characters in string literals and in the database (if any). If you want a multilingual application, Unicode is the way to go, but you have to transcode your database and your PHP files from CP-1252 to UTF-8.

UTF-8 & IsAlpha() in PHP

I'm working on an application which supports several languages. It tries to use the language requested by the browser and also allows a manual override. This part works fine and picks the correct templates, labels, etc.
Users sometimes have to enter text of their own, and that's where I run into issues, because the application has to accept even "complicated" languages like Chinese and Russian. So far I've taken care of the things mentioned in other postings, i.e.:
calling mb_internal_encoding( 'UTF-8' )
setting the right encoding when rendering the webpages with meta http-equiv=Content-Type content=text/html;charset=UTF-8 (format adapted due to stackoverflow limitations)
even the content arrives correctly, because mb_detect_encoding() == UTF-8
tried to set setLocale(LC_CTYPE, "UTF-8"), which doesn't seem to work because it requires the selection of one language, which I can't specify because I have to support several. And it still fails if I force it manually for testing purposes, i.e. with; setLocale(LC_CTYPE,"zh__CN.utf8") - ctype_alpha() would still fail for Chinese text
It seems that even explicit language selection doesn't make ctype_alpha() useful.
Hence the question is: how should I check for alphabetic characters in all languages?
The only idea I had at the moment is to check manually with arrays of "valid" characters - but this seems ugly especially for Chinese.
How would you solve this issue?
If you'd like to check only for valid unicode letters regardless of the used language I'd propose to use a regular expression (if your pcre-regex extension is built with unicode support):
// adjust pattern to your needs
// $input needs to be UTF-8 encoded
if (preg_match('/^\p{L}+$/u', $input)) {
// OK
} else {
// not OK
}
\p{L} checks for Unicode characters with the L(etter) property, which includes the properties Ll (lower case letter), Lm (modifier letter), Lo (other letter), Lt (title case letter) and Lu (upper case letter) (from: Regular Expression Details).
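A quick way to try the pattern against several scripts is to wrap it in a function. This is a sketch, assuming PCRE was built with Unicode support and that the source file is saved as UTF-8. The name isAllLetters is made up, and the pattern ends in \z instead of $ so that a trailing newline is not accepted.

```php
<?php
// Hypothetical wrapper around the \p{L} check; $input must be UTF-8.
function isAllLetters(string $input): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example when $input is not valid UTF-8 and the /u modifier is used).
    return preg_match('/^\p{L}+\z/u', $input) === 1;
}

var_dump(isAllLetters('Привет'));      // bool(true)  Cyrillic letters
var_dump(isAllLetters('中文'));         // bool(true)  Chinese characters are "other letters" (Lo)
var_dump(isAllLetters('abc123'));      // bool(false) digits are not letters
var_dump(isAllLetters("\xE9t\xE9"));   // bool(false) Latin-1 bytes, not valid UTF-8
```

No locale has to be selected for this check, which is what makes it usable when several languages must be accepted at once.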
I wouldn't use an array of characters. That would get impossible to manage.
What I'd suggest is working out a 'default' language from the IP address and using that as the locale for a request. You could also get it from the browser's user-agent string in some cases. You could provide the user with a way to override it, so that if your default isn't correct they aren't stuck with a strange site (e.g. show on the form 'Language set to English. If this isn't correct, please change: '). This isn't the nicest thing to provide, but you won't get any working validation otherwise, as you NEED a language/locale set in order to have a sensible alpha validation (an A isn't a letter in Chinese).
You can use the languages from
$_SERVER['HTTP_ACCEPT_LANGUAGE']
It contains something like
de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
so you need to parse this string. Then you can use the preferred language in the setLocale function.
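Parsing that header value could look roughly like the sketch below. The function name parseAcceptLanguage is hypothetical, and the parsing is deliberately simple: it splits on commas, reads an optional q parameter, and sorts by it.

```php
<?php
// Hypothetical helper: return the language tags ordered by descending q-value.
function parseAcceptLanguage(string $header): array
{
    $weights = [];
    foreach (explode(',', $header) as $item) {
        $parts = explode(';', trim($item));
        $tag   = strtolower(trim($parts[0]));
        if ($tag === '') {
            continue; // skip empty entries such as a trailing comma
        }
        $q = 1.0; // a tag without an explicit q parameter has weight 1
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $weights[$tag] = $q;
    }
    arsort($weights); // highest weight first; ties keep header order on PHP 8.0+
    return array_keys($weights);
}

print_r(parseAcceptLanguage('de-de,de;q=0.8,en-us;q=0.5,en;q=0.3'));
// Array ( [0] => de-de [1] => de [2] => en-us [3] => en )
```

In real code the argument would come from $_SERVER['HTTP_ACCEPT_LANGUAGE']. Mapping a tag such as de-de to a locale name accepted by setLocale depends on the locales installed on the server, so that step is left out here.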
This is an encoding issue rather than a language detection issue, because UTF-8 can encode any Unicode character.
The best approach is to use UTF-8 throughout your project: in your database, in your output, and as the expected encoding for the input.
Output: Make sure you encode your data as UTF-8 and declare that in the HTTP header, in the Content-Type field, and not just in the document itself.
Input: If you're using forms, declare the expected encoding in the accept-charset attribute.
