I'm working to clean up emails before they get stored in a database. A Fandango email arrived with its encoding reported as 4 (quoted-printable). Here is part of the message without decoding...
=0A=0A=A0=0AJohn=0A(800) 123-4567=0A=0A----- Forwarded Message =
=20=0ASent:=20Thursday,=20July=204,=202013=204:14=20PM=0ASubject:=20Your=20Despicab=
le=20Me=202=20iTunes=20Download=0A=20=0A=0A=0ADespicable=20Me=202=20=0A=20=20=0A=20Your=20purchase=20=
of=20tickets=20for=20Despicable=20Me=202=20has=20earned=20you=20a=20complimentary=20download=20of=20t= he=20song=20'Just=20a=20Cloud=20Away'=20by=20Pharrell=20from=20the=20Original=20Motion=20Picture=20So=
undtrack=20on=20iTunes.=20=0AWe=20hope=20you=20enjoy=20the=20song=20and=20the=20film!=0AIf=20you=20ha=
ve=20iTunes=20installed,=20click=20here=20to=20start=20your=20complimentary=20download.=0AIF=20=
YOU=20DO=20NOT=20HAVE=20iTunes=20INSTALLED:=0A=0A1.=20Download=20iTunes=20for=20Mac=20or=20Window=
s,=20free=20of=20charge=20at=20www.iTunes.com.=20=0A2.=20Open=20iTunes=20and=20click=20iTunes=20Sto=
re.=20=0A3.=20Click=20Redeem=20under=20Quick=20Links.=20=0A4.=20Enter=20the=20code=20below.=20Your=20= download=20will=20start=20immediately.=20Enjoy.=20=0ADownload=20Code:=20FML6H34XXTMJ=20=0AC=
But when I use quoted_printable_decode() on the variable it produces no text.
This url has a decoder that works, albeit in ASP/VB...
http://www.motobit.com/util/quoted-printable-decoder.asp
I'm guessing the code here is relevant...
http://www.motobit.com/tips/detpg_quoted-printable-decode/
It decodes the quoted-printable HTML above correctly. Hopefully this will help someone trying to help me. I'm sure I'm not the only one to encounter broken quoted-printable email messages.
It looks like there are some spaces in the quoted-printable encoded string that you posted. That's probably what's causing the problem: if it's truly quoted-printable, then the encoded string should not contain any spaces. Spaces are =20 in quoted-printable. If you use a replace function (e.g. PHP's str_replace) to replace the spaces in the encoded string with =20, then you get the following quoted-printable encoded string:
John=0D=0A(800)=20123-4567=0D=0A=0D=0A-----=20Forwarded=20Message=20
Then, this string can be decoded using PHP's quoted_printable_decode() function.
If you copy the quoted-printable encoded text above to a file and then run the following PHP script, you should see the correct decoded output. The script reads the quoted-printable text from the file, replaces the spaces with =20 using str_replace, then decodes the result with quoted_printable_decode:
<?php
// Read the quoted-printable text from the file.
$filename = "./qp.txt";
$qp = file_get_contents($filename);
// Replace each literal space with its quoted-printable form, =20.
$qp = str_replace(" ", "=20", $qp);
// <plaintext> tells the browser to show everything after it verbatim.
print "<plaintext>";
print quoted_printable_decode($qp);
?>
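A narrower variant is also possible. In the sample, sequences such as `t= he` and `Your=20= download` look like soft line breaks (`=` followed by a newline) whose newline was flattened into a space. The sketch below removes only those sequences and leaves every other space alone. The helper name `repair_and_decode_qp` and the repair rule itself are assumptions for illustration, not something confirmed by the original message source.

```php
<?php
// Hypothetical helper, not part of PHP: the name and the repair rule are assumptions.
function repair_and_decode_qp(string $qp): string
{
    // Assumption: "=" followed by spaces or tabs and then a visible character
    // is a soft line break ("=" + newline) whose newline was flattened into a space.
    // A real soft line break ("=" + optional whitespace + newline) is left alone.
    $repaired = preg_replace('/=[ \t]+(?=\S)/', '', $qp);

    return quoted_printable_decode((string) $repaired);
}

// Fragment taken from the message in the question.
echo repair_and_decode_qp("download=20of=20t= he=20song=20'Just=20a=20Cloud=20Away'");
// prints: download of the song 'Just a Cloud Away'
```

Because a literal = must be written as =3D in quoted-printable, a bare = followed by a space and more text should not occur in undamaged input, which is why this rule is fairly safe to apply.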
I have a website that allows people to post things to it using the subject line of an email in Outlook. Using PHP and IMAP, I get the subject line of the email and store it in a MySQL database. But every once in a while, someone will copy text from a website into the subject line of that email and I will get garbled text, similar to this:
=?Windows-1252?Q?_Every_day_in_our_offices_we_recycle_cardboard,aluminum?=
=?Windows-1252?Q?=96_won=92t_you_join_us=3F?=
What I've done is try to decode this text so it will appear normal on the page using the following code:
$subject = strip_tags($mailHeader->subject);
$header = imap_mime_header_decode($subject);
$subject = "";
for ($i = 0; $i < count($header); $i++)
{
    $subject .= $header[$i]->text;
}
When finished, most of the garbled text is gone, but I am left with replacement characters where the original subject line had an en dash and a curly quote. See the result below:
Every day in our offices we recycle cardboard, aluminum, � won�t you join us?
The charset for the website is set to UTF-8. When I set the website charset to ISO-8859-1, the replacement characters are replaced with the curly quote and the dash, which is great, but I want to leave the website's charset at UTF-8.
Any help on how to get rid of the replacement characters without changing the charset to ISO-8859-1 would be great. Thanks.
The code above works with one small change to the last line inside the loop:
$subject .= mb_convert_encoding($header[$i]->text, "UTF-8", $header[$i]->charset);
Each of the objects returned by imap_mime_header_decode includes a charset property, which you are ignoring. You would need to convert each one to UTF-8 in your loop, using something like:
$subject .= mb_convert_encoding($header[$i]->text, "UTF-8", $header[$i]->charset);
As an alternative, consider using the mb_decode_mimeheader or iconv_mime_decode_headers functions. Both of these functions do the entire job of decoding a MIME header for you, returning a string in PHP's internal encoding (which is usually UTF-8).
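As a rough illustration of the single-call alternative, the sketch below uses `iconv_mime_decode`, the single-header counterpart of `iconv_mime_decode_headers`, on the second encoded-word from the question. It assumes the iconv extension is loaded and that the local iconv library knows the Windows-1252 charset.

```php
<?php
// Second encoded-word from the subject line in the question.
$raw = '=?Windows-1252?Q?=96_won=92t_you_join_us=3F?=';

// Mode 0 is strict decoding; the third argument is the charset of the result.
$subject = iconv_mime_decode($raw, 0, 'UTF-8');

if ($subject === false) {
    // Decoding can fail on malformed headers; fall back to the raw value.
    $subject = $raw;
}

echo $subject, PHP_EOL;
```

In Windows-1252 the bytes =96 and =92 are an en dash (U+2013) and a right single quotation mark (U+2019), so the decoded string carries both as proper UTF-8 and displays correctly on a UTF-8 page.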
I need to attach a confirmation code that can hold several parts and almost random characters, to an email.
The idea is to print the URL with that code in the message body (in HTML).
Is the base64_encode() function enough to make it safe to be parsed by the browser ?
Base64 encoding takes binary data and makes it suitable for being treated as text.
That isn't your problem (you said you have random characters, which are already text), so you shouldn't use Base64.
You have text and want to insert it into a URL. You need to URL encode it. That is what the urlencode() function is for.
You then want to insert that URL in an HTML document. That is what the htmlspecialchars() function is for.
$data = function_to_get_random_data();
$url_safe_data = urlencode($data);
$url = "http://example.com/$url_safe_data";
$html_safe_url = htmlspecialchars($url);
$html = "<a href=\"$html_safe_url\">$html_safe_url</a>";
I have a problem with base64 decode
For example, I have this code:
else if($_POST['submit']==2){
$send = base64_encode($text_success);
wp_redirect( home_url('/sales-funnel/add-product?msg='.$send) ); exit;
}
I send the encoded string to the page so that the user can't simply read it from the URL.
$text_success contains HTML code, which is generated if $wpdb->query does not return errors:
`Darījums veiksmīgi pievienots!</br>Komentārs veiksmīgi pievienots!</br>Klients veiksmīgi pievienots!</br>Fāze ir pievienota! </br>`
Every online base64 decoder handles it fine, but my WordPress site returns an empty string when I try to do:
if (isset($_GET['msg']) && !empty($_GET['msg'])){
$text = base64_decode($_GET['msg']);
echo $text;
}
But, $_GET['msg'] = RGFyxKtqdW1zIHZlaWtzbcSrZ2kgcGlldmllbm90cyE8L2JyPktvbWVudMSBcnMgdmVpa3NtxKtnaSBwaWV2aWVub3RzITwvYnI+S2xpZW50cyB2ZWlrc23Eq2dpIHBpZXZpZW5vdHMhPC9icj5GxIF6ZSBpciBwaWV2aWVub3RhISA8L2JyPg==
P.S. I tried it without the HTML tags and everything works fine.
The problem is related to the fact that the base64 alphabet is not URL-safe. In this particular case, your base64-encoded string contains a +, which is interpreted as a space.
To solve this, you can either:
Use a URL-safe version of the base64 alphabet, i.e. replace + with -, / with _, and trim the trailing = padding, as described in RFC 4648. This answer includes a code sample for this approach.
URL-encode your content after base64 encoding, turning + into %2B, / into %2F, and = into %3D.
This should solve your problem, but it goes without saying that in an untrusted environment, giving users the ability to inject raw HTML into your site constitutes a serious security risk.
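The first option can be sketched as a pair of small helpers. The names `base64url_encode` and `base64url_decode` are hypothetical (PHP has no built-in functions with these names); they only wrap the standard `base64_encode` and `base64_decode`.

```php
<?php
// Hypothetical helper names; PHP has no built-in base64url functions.
function base64url_encode(string $data): string
{
    // Swap the two URL-unsafe characters and drop the "=" padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64url_decode(string $data): string
{
    // Restore the standard alphabet and the padding before decoding.
    $b64 = strtr($data, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);

    $decoded = base64_decode($b64, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Not valid base64url data');
    }
    return $decoded;
}

$message = 'Klients veiksmīgi pievienots!</br>';
$token   = base64url_encode($message);

// $token contains only letters, digits, "-" and "_", so it can go into a query string as-is.
echo base64url_decode($token) === $message ? "round trip ok\n" : "mismatch\n";
```

The second option needs no helper: wrapping the value as `urlencode(base64_encode($text_success))` when building the redirect URL keeps the standard alphabet, and `$_GET` decodes the percent-escapes automatically.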
After obtaining info from an email body, I have a lot of symbols such as =0D, =A20, etc... How can I remove them? I do not want to use
$body = str_replace('=A20', '', $body);
because if the email body actually contains that it will be replaced.
Any ideas? Thanks!
Don't replace them with nothing: those characters aren't nothing, they are part of the text.
E-mail messages aren't plain text, they are encoded. Those examples are part of the quoted-printable encoding, which you can identify by the
Content-Transfer-Encoding: quoted-printable
line at the beginning of the e-mail message.
And PHP has a function to decode it: quoted_printable_decode().
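A minimal sketch of how that might look in practice: pick the decoder from the Content-Transfer-Encoding value instead of stripping sequences by hand. The helper name `decode_body` is hypothetical, and the sketch assumes the header value has already been extracted from the message.

```php
<?php
// Hypothetical helper: decodes a body according to its Content-Transfer-Encoding value.
function decode_body(string $body, string $transferEncoding): string
{
    switch (strtolower(trim($transferEncoding))) {
        case 'quoted-printable':
            return quoted_printable_decode($body);
        case 'base64':
            return (string) base64_decode($body);
        default:
            // 7bit, 8bit and binary bodies are not transformed.
            return $body;
    }
}

// =C3=A9 is the UTF-8 encoding of "é"; =0D=0A is a CRLF line ending.
echo decode_body("Caf=C3=A9 at 5=0D=0A", 'quoted-printable');
```

The decoded bytes are still in whatever charset the message declares, so a charset conversion may be needed afterwards.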
I have a PHP script that reads email and Usenet messages. I found a case where the text is a mix of Arabic and Latin words, i.e.
PHP and ARABIC_WORD
for example:
PHP and الساعة
The problem is that the text is encoded, e.g.
Some Text =?utf-8?b?RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=?=
My question is: how can I decode this =?utf-8?... part when it's mixed with Latin text?
I'm using PHP 5.4.15
What you've got is the MIME Encoded-Word syntax used in email messages for non US-ASCII encoded texts:
The form is: "=?charset?encoding?encoded text?=".
charset may be any character set registered with IANA. Typically it would be the same charset as the message body.
encoding can be either "Q" denoting Q-encoding that is similar to the quoted-printable encoding, or "B" denoting base64 encoding.
encoded text is the Q-encoded or base64-encoded text.
An encoded-word may not be more than 75 characters long, including charset, encoding, encoded text, and delimiters. If it is desirable to encode more text than will fit in an encoded-word of 75 characters, multiple encoded-words (separated by CRLF SP) may be used.
So this little excerpt from Wikipedia also tells you how to decode the string. You're surely not the first one who needs to do this, so libraries exist. See also:
Best way to handle email parsing/decoding in PHP?
proper way to decode incoming email subject (utf 8)
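To make the encoded-word form described above concrete, here is a hand-rolled sketch that finds each `=?charset?encoding?encoded text?=` token and decodes it in place, leaving the surrounding Latin text untouched. The function name `decode_encoded_words` is hypothetical, the sketch assumes the mbstring extension, and it does not implement the rule that whitespace between two adjacent encoded-words is dropped, so a library or a built-in decoder is still the safer choice for real mail.

```php
<?php
// Hypothetical helper illustrating the "=?charset?encoding?encoded text?=" form.
function decode_encoded_words(string $header): string
{
    // Capture groups: 1 = charset, 2 = encoding (B or Q), 3 = encoded text.
    $pattern = '/=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/';

    $result = preg_replace_callback($pattern, function (array $m): string {
        [, $charset, $encoding, $text] = $m;

        if (strtoupper($encoding) === 'B') {
            $bytes = (string) base64_decode($text);
        } else {
            // Q-encoding is quoted-printable with "_" standing for a space.
            $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
        }

        // Assumes mbstring recognises the declared charset.
        return mb_convert_encoding($bytes, 'UTF-8', $charset);
    }, $header);

    return (string) $result;
}

echo decode_encoded_words('Some Text =?utf-8?b?RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=?=');
// prints: Some Text EPrints and العربية
```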
It seems to be encoded text: try PHP's base64_decode function.
$my_string = 'test string';
$res = base64_encode($my_string);
echo $res; //dGVzdCBzdHJpbmc=
echo base64_decode($res); // test string
In fact, decoding your string:
base64_decode("RVByaW50cyBhbmQg2KfZhNi52LHYqNmK2Kk=")
returns something like this:
EPrints and العربية