HTML Entities in email from php mail() - php

I'm having a strange problem trying to get an HTML email campaign to render the proper text I need.
I'm the legalese at the bottom of my email, there are instances where I need to add a trademark symbol. I've converted all those instances to ™, $#0153; or ™, and when I run the mail script locally, everything looks as it should, however when I run the script on the intended server, all those trademark instances show an empty box character instead.
I should note that elsewhere on the email, I'm using other HTML entities that render fine... –, ’, “ - No problems, only this damn ™ thats driving me crazy.
The offending code:
....DisplayPort™ connectors, and/or DisplayPort™ compliant....
renders as

Problem Solved!
Apparently you can also use the entity code ™ for a trademark, which I was unaware of. It still doesnt explain why any of the other entity codes didn't work, but it has provided me with a working solution, so I'm calling this question answered.
Thanks to all for your help, much appreciated.

Check what character sets your "home" server and the "intended" server are using. If they're mismatched (like utf-8 at home, but latin-1 at work), that'd trash any of the non-standard ascii characters like the copyright symbol.

Related

Odd encoding issue after UTF-8 straightens "most" things out

Ok, So we have a script that takes emails sent to thunderbird, convertes part of the message to html and saves it to a MySQL. Every file, every part written is set to UTF-8. Finally, on my end of the work, the CRM (written in PHP5.3 expected output Chrome and Firefox), I pull the message, along with other info and display something resembling GMail, but as a "task list" for our employees.
The problem I'm having, if you havn't guessed already, some customer emails are obviously using different encodings. Thus, some (not all, and certainly not majority) of the e-mails don't show all characters correctly.
At first I made use of utf8_encode to get the email messages to look right, and this helps with most email messages coming from the database, however, a few slip by with bad characters.
In the DB these "bad apostrophes" appear as ’, but after utf8_encode they come through as �??. I've tried various encoding things to guess and change as needed, however, this tends to hurt the vast majority of the other emails.
Any suggestions, on one end of the pipe or the other, how I might get these few emails to match everything else, or how i might at least create a possible preg_replace filter at the end or something?
update
it seems even the emails with bad characters are passed to end php as utf-8 according to mb_detect_encoding. This is before any extra encoding. iconv does detect the ones that ahve problems, but this really gives me no way to solve them and just puts a php error box up on the screen instead of a simple FALSE return that it says it's supposed to give, so this too seems to be no solution.
The problem is that you don't know the encoding of the mail. utf8_encode encodes only from ISO-8859-1 to UTF-8. So you could try to get the encoding with mb_detect_encoding and then convert to UTF-8 with iconv.
EDIT: You could also try to read the Content-Type's charset of the mail.
Found My Answer!
Let me start by saying thanks Sebastián Grignoli for creating this VERY handy class(raw). I ended up working it into my final solution.
Second, I added the class to Codeigniter. For any of you using CI, this is an easy implementation. Simply create a file in application/libraries named Encoding.php (yes with the capital e). Then copy in the code to that file, but comment out(or remove) namespace ForceUTF8 on line 40.
My end result looks something like:
echo(Encoding::fixUTF8(utf8_decode($msgHTML)));
I'm still double checking, but thus far, I've yet to find one single error!
If I do find another encoding issue after this, I'll make sure to update.
SO Question I found that helped.

What would cause an to turn into a unicode character?

I've got some documents on my website which users can edit via a rich text editor and then save them (to the DB) and print them. Some users are experiencing an issue (only happening on the live site) where some of the characters are getting screwed up. I've checked the DB, and the funny characters are in the DB, so it's not a display issue. It either happens when they save the document (submit the form on the site) or they've put something weird in there or their browser changed some of the characters.
The character that keeps appearing everywhere is  . It's an accented A followed by a space. Looking at the source HTML, it appears that the affected documents had all their 's converted. But whenever I try it, they come out fine.
What would cause an to turn into a unicode character, but only in limited cases?
Misinterpreting the UTF-8 encoding as Latin-1 will cause this.
>>> u'\xa0'.encode('utf-8').decode('latin-1')
u'\xc2\xa0'
>>> print u'\xa0*'.encode('utf-8').decode('latin-1')
 *

"Smart Quotes" not displaying properly in email from phpmailer

I'm dealing with a LAMP web server. I have forms that users use to submit text that is stored in a text field in mysql. Often this text is copied and pasted from Microsoft Office products, so I'm getting a lot of smart quotes and emdashes. These characters display properly if I retrieve them from the database and display them on the webpage, but where I'm running into trouble is sending the text in an email using the phpmailer class. I get stuff that looks like this: – (where it should be an emdash).
One thing that may be important: If I pull up a console in mysql and select a field that has an emdash or smart quote in it, it will display on my console incorrectly: –, however, as stated above, if my php page (using PDO) selects the field and displays it, it will display correctly in a browser (as an emdash in this case).
I'm not sure if there's a way to select a character set in phpmailer, (maybe it's a simple setting somewhere?) or if there is a better way around this problem. I think I should be clear, though, that "search and replace smart quotes and emdashes with their regular equivalents" is NOT the answer I'm looking for (hopefully that's not the only solution).
I found this information:
My php webpage: utf-8
mysql client encoding: latin1
mysql server encoding: latin1
phpmailer character set: iso-8859-1
Character set can be switched in phpmailer with the following code:
$myMail->CharSet = "UTF-8";
This solved my issue. Typographic quotes and double dashes show up in my emails from phpmailer as expected now. This may have been a sorta noobish question (blush). Thanks, Col. Shrapnel for prompting me to look into what encoding all the pieces of the puzzle were using. I'd vote you up but don't have the reputation.
For anyone interested in homework, this link really helped me understand the basics of encoding:
http://www.joelonsoftware.com/articles/Unicode.html
The PEAR Mail_MIME package lets you do this via http://pear.php.net/manual/en/package.mail.mail-mime.get.php I am pretty certain I have used this feature before, but not positive.
You may also need to run things through iconv to normalize the character sets to a single one, if there are multiple data sources.

E-mails sent through php5+htmlMimeMail are being received with random characters replaced with =

currently using PHP5 with htmlMimeMail 5 (http://www.phpguru.org/static/mime.mail.html) to send HTML e-mail communications. Have been having issues with a number of recipients seeing random characters replaced with equals signs e.g.:
"Good mor=ing. Our school is sending our newsletter= and information through a company called..."
Have set e-mail text, HTML, and header encoding to UTF-8. The template files loaded by PHP for the e-mail (just include()'d text/HTML with a few php tags in them) are both encoded in UTF-8.
The interesting thing is that I can't duplicate the problem on any of my e-mail clients, and can't find any information by searching yahoo/googlies that would point me at the problem!!
Try sending with 8-bit encoding:
$message->setTextEncoding(new EightBitEncoding());
$message->setHTMLEncoding(new EightBitEncoding());
I had a similar issue, but mine was a little different. Since I stumbled upon this thread looking for the answer and it helped me find it, I thought I may as well post this related answer here.
In my case special characters were getting messed up in emails even through the actual mb_detect_encoding of the text strings being sent was "UTF-8" and if I echoed them they looked fine.
So I had to us the function
$message->setTextCharset('UTF-8')
and
$message->setHTMLCharset('UTF-8')
I suspect your problem is related to older versions of Exchange. Equal signs at end of line:
It may not be the quoted printable thing with high/low order characters or the encoding. Also, elsewhere on that page it says:
NOTE: A bug ("feature"?) in Exchange
may cause line feeds to be replaced
with equal signs when rich text mail
is disabled.

Questions about iPhone emoji and web pages


Okay, so emoji basically shows the above on a computer. Is that another programming language? So how do I put those little boxes into a php file? When I put it into a php file, it turns into question marks and what not. Also, how can I store these in a MySQL without it turning into question marks and other weird things?
how do I put those little boxes into a php file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put it into a php file, it turns into question marks and what not
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
Your PHP file should be saved as UTF-8; the page it produces should be served as Content-Type: text/html;charset:UTF-8 (or with similar meta tag); the MySQL database should be using a UTF-8 collation to store data and PHP should be talking to MySQL using UTF-8.
However. Even handling everything correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. See the character overview from the standardisation work to see how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.
This has nothing to do with programming languages, just with encoding and fonts. As a very brief overview: Every character is stored by its character code (e.g.: 0x41 = A, 0x42 = B, etc), which is rendered as a meaningful character on your screen using a font (which says "the character with the code 0x41 should look like this ...").
These emoji occupy the "private use area" of the Unicode table, which is a range of codes that are undefined and free for anyone to use. That makes them perfectly valid character codes, it's just that no standard font has an appropriate character to display for them, since they are undefined. Only the iPhone and other handhelds, mostly in Japan, have appropriate icons for these codes. This is done to save bandwidth; instead of transmitting relatively large image files back and forth, emoji can be transmitted using a single character code.
As for how to store them: They should be storable as is, as long as you don't try to convert them to another encoding, in which case they may get lost. Just be aware that they only make sense on the iPhone and other SoftBank phones in Japan.
Character Viewer http://img.skitch.com/20091110-e7nkuqbjrisabrdipk96p4yt59.png
If you're on OSX you can copy and paste the character into the Character Viewer to find out what it is. I think there's a similar Character Map on Windows (albeit inferior ;-P). You could put it through PHP's ord(), but that only works on ASCII characters. See the discussion on the ord page for UTF8 functions.
BTW, just for the fun of it, these characters display fine on the iPhone as is, because the iPhone has a font which has icons for them:
iPhone http://img.skitch.com/20091110-bjt3tutjxad1kw4p9uhem5jhnk.png
I'm using FF3.5 and WinXP. I see little boxes in my browser, too.
This tells me the string requires a character set not installed on my computer.
When you put the string into a PHP file, the question marks tell you the same thing: your computer doesn't know how to display the characters.
You could store these emoji characters in MySQL if you encoded them differently, probably using UTF-8.
Do a web search for character encoding, as it relates to MySQL.

Categories