"Smart Quotes" not displaying properly in email from phpmailer - php

I'm dealing with a LAMP web server. I have forms that users use to submit text that is stored in a text field in mysql. Often this text is copied and pasted from Microsoft Office products, so I'm getting a lot of smart quotes and emdashes. These characters display properly if I retrieve them from the database and display them on the webpage, but where I'm running into trouble is sending the text in an email using the phpmailer class. I get stuff that looks like this: â€“ (where it should be an emdash).
One thing that may be important: If I pull up a console in mysql and select a field that has an emdash or smart quote in it, it will display on my console incorrectly: â€“. However, as stated above, if my php page (using PDO) selects the field and displays it, it will display correctly in a browser (as an emdash in this case).
I'm not sure if there's a way to select a character set in phpmailer, (maybe it's a simple setting somewhere?) or if there is a better way around this problem. I think I should be clear, though, that "search and replace smart quotes and emdashes with their regular equivalents" is NOT the answer I'm looking for (hopefully that's not the only solution).
I found this information:
My php webpage: utf-8
mysql client encoding: latin1
mysql server encoding: latin1
phpmailer character set: iso-8859-1

Character set can be switched in phpmailer with the following code:
$myMail->CharSet = "UTF-8";
This solved my issue. Typographic quotes and double dashes show up in my emails from phpmailer as expected now. This may have been a sorta noobish question (blush). Thanks, Col. Shrapnel for prompting me to look into what encoding all the pieces of the puzzle were using. I'd vote you up but don't have the reputation.
For anyone interested in homework, this link really helped me understand the basics of encoding:
http://www.joelonsoftware.com/articles/Unicode.html
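To see why the mismatch in the list above produces garbage, here is a minimal sketch (plain PHP, not PHPMailer code). It assumes the mbstring extension is loaded and that the receiving client reads iso-8859-1 as Windows-1252, which most do. The function name is made up for illustration.

```php
<?php
// Hypothetical helper: show how UTF-8 text looks when a reader
// wrongly assumes every byte is one Windows-1252 character.
function viewAsWindows1252(string $utf8Text): string
{
    // Reinterpret the raw bytes as Windows-1252, then re-encode
    // the result as UTF-8 so it can be printed or compared.
    return mb_convert_encoding($utf8Text, 'UTF-8', 'Windows-1252');
}

$emDash  = "\u{2014}";                    // one character, three bytes in UTF-8
$garbled = viewAsWindows1252($emDash);

echo bin2hex($emDash), "\n";              // e28094
echo mb_strlen($garbled, 'UTF-8'), "\n";  // 3 (three junk characters instead of one dash)
```

Setting the CharSet to "UTF-8" as shown above tells the client to read those three bytes as a single character again.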

The PEAR Mail_MIME package lets you do this via http://pear.php.net/manual/en/package.mail.mail-mime.get.php I am pretty certain I have used this feature before, but not positive.
You may also need to run things through iconv to normalize the character sets to a single one, if there are multiple data sources.
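A rough sketch of that iconv normalization step, assuming you already know which charset each data source uses. The helper name is hypothetical, and the latin1 sample is only an example.

```php
<?php
// Hypothetical helper: bring text from a known source charset to UTF-8.
function normalizeToUtf8(string $text, string $sourceCharset): string
{
    if (strcasecmp($sourceCharset, 'UTF-8') === 0) {
        return $text; // already in the target charset
    }
    // The @ suppresses the notice iconv raises on bad input;
    // the false return value is checked instead.
    $converted = @iconv($sourceCharset, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset");
    }
    return $converted;
}

// "cafe" with an accented e as a latin1 source would store it: the accent is the single byte E9.
$fromLatin1 = normalizeToUtf8("caf\xE9", 'ISO-8859-1');
echo bin2hex($fromLatin1), "\n"; // 636166c3a9
```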

Related

Using forceUTF8 with email body containing an emoji from a Gmail email converts emoji to "?" and What type of Emoji is a Gmail Emoji?

I am sorry for the title but I had a hard time describing my thoughts. So here we go
I currently have a script that parses emails (not necessarily just from Gmail) in which I get the body of the email and do stuff with it. Works great using this mail parser
I started noticing that sometimes the script would break when certain emails came in. I finally realized that it was when there was an emoji in the body, but it didn't break every time. I would get other emails just fine that contained emojis as well.
With more testing (so far) I now know that it breaks when I get emails with emojis in the body from Gmail emails.
What's the difference?
So what I have learned is that Gmail based emojis come in very strange (well for me it is). They are not an image, they are not HEX...nothing that I can tell. They are just a picture (like text) that's not an image. Let me show you a few examples.
Yahoo (comes in as an image)
Gmail (comes in as ???)
Even viewing it right from my database I see the picture. I can highlight it like text, I can delete it like text, but it's not an image.
Now MySQL
I learned the reason my script was breaking was because I had to have my database set to utf8mb4 from the connection all the way to the table. I made the changes and it now does not break the script and it stores the emoji fine.
Now forceUTF8
I have been using forceUTF8 to clean up the body because I am dealing with different characters like other languages.
The problem I am having now is that when I try to clean up with Encoding::fixUTF8($body); it turns the Gmail based emoji into ?
Final Questions:
My main question is:
Why is the emoji being converted to a ? when I try to clean it up?
My more important question is:
What am I dealing with when it comes to that type of emoji that's not an image or Hex and acts more like text? I feel if I better understood what I was dealing with I might be able to figure out my issue.
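For what it's worth, here is a small sketch of what such an emoji is at the byte level: an ordinary Unicode character that sits above U+FFFF and therefore takes four bytes in UTF-8, which is why MySQL needs utf8mb4 to store it. The emoji chosen and the helper name are only for illustration, and mbstring is assumed to be loaded.

```php
<?php
$emoji = "\u{1F600}"; // grinning face, code point U+1F600

echo strlen($emoji), "\n";              // 4 (bytes in UTF-8)
echo mb_strlen($emoji, 'UTF-8'), "\n";  // 1 (character)
echo bin2hex($emoji), "\n";             // f09f9880

// Hypothetical helper: true if the text contains a code point above U+FFFF,
// i.e. a 4-byte UTF-8 sequence that MySQL's 3-byte utf8 cannot hold.
function hasFourByteChars(string $utf8Text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8Text) === 1;
}

var_dump(hasFourByteChars("Hello " . $emoji)); // bool(true)
```

A check like this can at least tell you which message bodies contain such characters before any cleanup step runs on them.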

Odd encoding issue after UTF-8 straightens "most" things out

Ok, so we have a script that takes emails sent to Thunderbird, converts part of the message to HTML and saves it to MySQL. Every file, every part written is set to UTF-8. Finally, on my end of the work, the CRM (written in PHP 5.3, expected output Chrome and Firefox), I pull the message, along with other info, and display something resembling GMail, but as a "task list" for our employees.
The problem I'm having, if you haven't guessed already, is that some customer emails are obviously using different encodings. Thus, some (not all, and certainly not the majority) of the e-mails don't show all characters correctly.
At first I made use of utf8_encode to get the email messages to look right, and this helps with most email messages coming from the database, however, a few slip by with bad characters.
In the DB these "bad apostrophes" appear as ’, but after utf8_encode they come through as �??. I've tried various encoding things to guess and change as needed, however, this tends to hurt the vast majority of the other emails.
Any suggestions, on one end of the pipe or the other, how I might get these few emails to match everything else, or how I might at least create a possible preg_replace filter at the end or something?
update
It seems even the emails with bad characters are passed to the end PHP as utf-8 according to mb_detect_encoding. This is before any extra encoding. iconv does detect the ones that have problems, but this really gives me no way to solve them and just puts a php error box up on the screen instead of the simple FALSE return that it says it's supposed to give, so this too seems to be no solution.
The problem is that you don't know the encoding of the mail. utf8_encode encodes only from ISO-8859-1 to UTF-8. So you could try to get the encoding with mb_detect_encoding and then convert to UTF-8 with iconv.
EDIT: You could also try to read the Content-Type's charset of the mail.
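A minimal sketch of the Content-Type idea, assuming you have the header value of the relevant MIME part at hand. Both helper names are hypothetical, the Windows-1252 fallback is a guess rather than a rule, and mb_check_encoding is used because it validates a string instead of guessing like mb_detect_encoding. Note that mb_convert_encoding rejects charset names it does not know (a warning on PHP 7, a ValueError on PHP 8), so real mail would need that case handled too.

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type header value.
function charsetFromContentType(string $headerValue): ?string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $headerValue, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: convert a mail body to UTF-8 using its declared charset.
function bodyToUtf8(string $body, string $contentType): string
{
    $charset = charsetFromContentType($contentType);
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        // No declaration, or one that claims UTF-8: trust it only if the bytes agree.
        if (mb_check_encoding($body, 'UTF-8')) {
            return $body;
        }
        $charset = 'Windows-1252'; // assumption: typical for mail written in Outlook
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// "It's" with a Windows-1252 curly apostrophe (byte 0x92).
echo bin2hex(bodyToUtf8("It\x92s", 'text/plain; charset="windows-1252"')), "\n"; // 4974e2809973
```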
Found My Answer!
Let me start by saying thanks Sebastián Grignoli for creating this VERY handy class(raw). I ended up working it into my final solution.
Second, I added the class to Codeigniter. For any of you using CI, this is an easy implementation. Simply create a file in application/libraries named Encoding.php (yes, with the capital E). Then copy the code into that file, but comment out (or remove) namespace ForceUTF8 on line 40.
My end result looks something like:
echo(Encoding::fixUTF8(utf8_decode($msgHTML)));
I'm still double checking, but thus far, I've yet to find one single error!
If I do find another encoding issue after this, I'll make sure to update.
SO Question I found that helped.
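A small sketch of why utf8_encode made the already-UTF-8 mails worse, and why a utf8_decode step can undo it. It uses mb_convert_encoding as a stand-in for utf8_encode and utf8_decode (both deprecated as of PHP 8.2) and assumes mbstring is loaded. The sample character is only an example.

```php
<?php
$apostrophe = "\u{2019}"; // curly apostrophe, already valid UTF-8 (bytes E2 80 99)

// Encoding it again as if it were ISO-8859-1 (what utf8_encode does)
// turns each of its three bytes into a separate character.
$doubleEncoded = mb_convert_encoding($apostrophe, 'UTF-8', 'ISO-8859-1');

// Going back from UTF-8 to ISO-8859-1 (what utf8_decode does) removes one layer.
$repaired = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

echo bin2hex($doubleEncoded), "\n"; // c3a2c280c299
echo bin2hex($repaired), "\n";      // e28099
```

This only helps for text that really was encoded twice; running the decode step on text that was encoded once would damage it, which matches the "hurts the other emails" experience described in the question.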

HTML Entities in email from php mail()

I'm having a strange problem trying to get an HTML email campaign to render the proper text I need.
In the legalese at the bottom of my email, there are instances where I need to add a trademark symbol. I've converted all those instances to ™, &#0153; or ™, and when I run the mail script locally, everything looks as it should. However, when I run the script on the intended server, all those trademark instances show an empty box character instead.
I should note that elsewhere in the email, I'm using other HTML entities that render fine... –, ’, “ - No problems, only this damn ™ that's driving me crazy.
The offending code:
....DisplayPort™ connectors, and/or DisplayPort™ compliant....
renders as
Problem Solved!
Apparently you can also use the entity code ™ for a trademark, which I was unaware of. It still doesn't explain why any of the other entity codes didn't work, but it has provided me with a working solution, so I'm calling this question answered.
Thanks to all for your help, much appreciated.
Check what character sets your "home" server and the "intended" server are using. If they're mismatched (like utf-8 at home, but latin-1 at work), that'd trash any of the non-standard ascii characters like the copyright symbol.
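One possible angle on the mismatch, offered as a guess rather than a diagnosis: 153 (hex 99) is the trademark sign's position in Windows-1252, not its Unicode code point, which is U+2122 (decimal 8482). A numeric entity built from the Unicode code point does not depend on either server's charset. The helper name below is hypothetical, and mbstring with PHP 7.2 or later is assumed for mb_ord.

```php
<?php
// Hypothetical helper: build the decimal numeric entity for one UTF-8 character.
function numericEntity(string $utf8Char): string
{
    return '&#' . mb_ord($utf8Char, 'UTF-8') . ';';
}

$trademark = "\u{2122}";
echo numericEntity($trademark), "\n"; // &#8482;

// The same sign occupies byte 0x99 (decimal 153) in Windows-1252.
echo bin2hex(mb_convert_encoding($trademark, 'Windows-1252', 'UTF-8')), "\n"; // 99
```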

Error on PHP Site when Copying/Paste from Outlook into Internet Explorer

Some of our users are experiencing a problem after copying and pasting text from MS Outlook into a text area box on our PHP site (running in IE, seems to work fine in other browsers). Specifically, the contents are apparently pasted properly, but when the data is passed back to the server and stored in the PostgreSQL database, no data is actually stored in the database (I'm about to check to see if the PHP is even receiving it in the $_POST variable, I'll post an update when I've done that).
It sounds like a problem with rich-text formatting or perhaps the encoding of what is pasted.
Does anyone know of a solution that we can apply to the PHP site to enforce that the text area only accept plain text (or automatically convert it) for IE?
Thanks!
Update: Sadly, I cannot reproduce the bug on IE 6, 7 or 8 using Outlook Express. Perhaps this is user error...I'll update with more info when I figure out what the actual problem is.
This might happen when some symbols copied are high-ASCII characters, and there is a mismatch with encodings you are working with. Make sure the page, your program, and your database use the same encoding (e.g. all use UTF-8, or whatever you use). I've encountered weird problems (empty strings, cut-off at the instance, etc) with inserting data that has characters like these.
But of course, check that you're actually getting the data to your program in the first place :)
Try calling strip_tags when pulling out from $_POST.
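A rough sketch combining the two suggestions above (consistent encoding plus strip_tags). It assumes the site stores UTF-8 and that input which is not valid UTF-8 arrived as Windows-1252; the helper name and the 'comments' field mentioned in the comment are made up for illustration.

```php
<?php
// Hypothetical helper: make pasted form input safe to store as UTF-8 plain text.
function cleanPastedText(string $raw): string
{
    // Assumption: input that is not valid UTF-8 came in as Windows-1252,
    // the usual encoding behind Office "smart" punctuation.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
    }
    return strip_tags($raw); // drop any markup that came along with the paste
}

// In a form handler this might be called as cleanPastedText($_POST['comments'] ?? '').
$cleaned = cleanPastedText("\x93Hi\x94 <b>there</b>");
echo bin2hex($cleaned), "\n"; // e2809c4869e2809d207468657265
```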

E-mails sent through php5+htmlMimeMail are being received with random characters replaced with =

Currently using PHP5 with htmlMimeMail 5 (http://www.phpguru.org/static/mime.mail.html) to send HTML e-mail communications. Have been having issues with a number of recipients seeing random characters replaced with equals signs, e.g.:
"Good mor=ing. Our school is sending our newsletter= and information through a company called..."
Have set e-mail text, HTML, and header encoding to UTF-8. The template files loaded by PHP for the e-mail (just include()'d text/HTML with a few php tags in them) are both encoded in UTF-8.
The interesting thing is that I can't duplicate the problem on any of my e-mail clients, and can't find any information by searching yahoo/googlies that would point me at the problem!!
Try sending with 8-bit encoding:
$message->setTextEncoding(new EightBitEncoding());
$message->setHTMLEncoding(new EightBitEncoding());
I had a similar issue, but mine was a little different. Since I stumbled upon this thread looking for the answer and it helped me find it, I thought I may as well post this related answer here.
In my case special characters were getting messed up in emails even though the actual mb_detect_encoding of the text strings being sent was "UTF-8" and if I echoed them they looked fine.
So I had to use the function
$message->setTextCharset('UTF-8');
and
$message->setHTMLCharset('UTF-8');
I suspect your problem is related to older versions of Exchange. Equal signs at end of line:
It may not be the quoted printable thing with high/low order characters or the encoding. Also, elsewhere on that page it says:
NOTE: A bug ("feature"?) in Exchange may cause line feeds to be replaced with equal signs when rich text mail is disabled.
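For background on where equals signs at line ends come from in the first place, here is a minimal sketch using PHP's built-in quoted-printable functions (not htmlMimeMail itself). In that encoding a long line is wrapped with a "soft line break", an equals sign followed by CRLF, which a compliant client removes again. If something along the way mangles the line endings, the equals signs can be left behind. The sample text is only an example.

```php
<?php
// A long, plain-ASCII line (about 190 characters, no trailing space).
$line = rtrim(str_repeat('Our school is sending our newsletter. ', 5));

$encoded = quoted_printable_encode($line);

// Long lines are wrapped with soft line breaks: "=" followed by CRLF.
var_dump(strpos($encoded, "=\r\n") !== false);          // bool(true)

// A compliant client removes the soft breaks again when decoding.
var_dump(quoted_printable_decode($encoded) === $line);  // bool(true)

// Non-ASCII bytes become =XX escapes (here the UTF-8 bytes of an accented e).
echo quoted_printable_encode("\u{E9}"), "\n";           // =C3=A9
```

Switching to 8-bit encoding, as suggested above, avoids the soft line breaks altogether because the text is then sent without this wrapping.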
