I came across the following error in PHP generated by an email forwarded from a Yahoo account:
Notice: Unknown: Invalid quoted-printable sequence: =?UTF-8?Q?ck-off with Weekly Sale up to 90% off (errflg=3) in Unknown on line 0
After hours of research, I sent myself the exact same subject in an email without Yahoo involved. The original Q-encoded text, which decodes correctly:
=?UTF-8?Q?GOG_Forward=3A_Fw=3A_=F0=9F=98=89_A_great_Monday_kick-?= =?UTF-8?Q?off_with_Weekly_Sale_up_to_90=25_off?=
The malformed q-encoded text from Yahoo:
=?UTF-8?Q?GOG_Forward =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=
The correct string when decoded:
GOG Forward: Fw: 😉 A great Monday kick-off with Weekly Sale up to 90% off
Roundcube manages to decode both the normal and the malformed text, though I'm not sure how. Its source is about 25 megabytes, which is a lot to dig through, and I haven't even been able to find where it decodes subject headers.
How do I fix Yahoo's malformed version of q-encoding?
<?php
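//Both of these fail on the malformed subject.
//imap_mime_header_decode() returns an array of objects, so print_r() is used instead of echo.
print_r(imap_mime_header_decode($mail_message_headers['Subject']));
echo quoted_printable_decode($mail_message_headers['Subject']);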
?>
For clarification, the imap_fetchstructure page lists the value 4 for encoding as Quoted-Printable / ENCQUOTEDPRINTABLE.
New Development
It turns out that for some reason Yahoo sends the subject twice in the same header: one copy is malformed and the other is not. Here is the Subject header from the raw email:
Subject: =?UTF-8?Q?GOG_Forward:_Fw:_=F0=9F=98=89_A_great_Monday_ki?=
=?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=
MIME-Version: 1.0
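Assuming the raw header really is a single folded Subject line made of two encoded-words, as the excerpt above suggests, it can be decoded without the IMAP extension. The following is a minimal sketch, not Roundcube's implementation: decode_q_subject() is a hypothetical helper name, and it assumes every encoded-word declares UTF-8, so no charset conversion is attempted.

```php
<?php
// Hypothetical helper: decodes RFC 2047 encoded-words in a header value.
// Assumes every encoded-word declares UTF-8 (no charset conversion is attempted).
function decode_q_subject(string $header): string
{
    // Whitespace (including the folding CRLF) between two adjacent
    // encoded-words is not part of the text, so drop it first.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=/',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                return base64_decode($m[3]);
            }
            // Q-encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
}

$raw = "=?UTF-8?Q?GOG_Forward:_Fw:_=F0=9F=98=89_A_great_Monday_ki?=\r\n"
     . " =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=";

echo decode_q_subject($raw), "\n";
```

For the header above this prints the complete subject, including the emoji and the "ki" + "ck-off" join across the fold.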
I created a solution that uses Roundcube's source code to decode the message.
I posted the code and a demo:
You can see it here.
Click the big play button to preview the extraction.
Go to the code tab to see the extracted Roundcube code, which you could use in your project.
Since you mentioned not to use classes in the example, I extracted Roundcube's decode_mime_string() function from rcube_mime, and a couple of things from rcube_charset such as $aliases, parse_charset(), and convert().
As far as decoding the malformed text from Yahoo:
=?UTF-8?Q?GOG_Forward =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=
Into this:
GOG Forward: Fw: 😉 A great Monday kick-off with Weekly Sale up to 90% off
It's impossible. There's not enough data in there. For example, it's missing the "😉 A great Monday ki". Do you have the full source of the email?
Related
I am trying to extract the message body of an encoded email. Everything worked fine for many years, but now some extra headers are being included, which has put a spanner in the works.
Below is the end of the email headers and the start of the message body itself.
When I view the email source it shows a blank line between the two sections, but when I try to split on that blank line, it is not found.
If I split on the first 5 characters of the message body, the split actually ignores the blank line and the line prior to that (YamCpMIyU+au/dWzSGjp0w9hpHu/m/vs8HM=).
I am utterly confused by this and am reaching out for any advice you can give.
Content-Transfer-Encoding: base64
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=groups.io;
q=dns/txt; s=20140610; t=1551842112;
bh=QhnqlnG4IESh6eMyHbR+KrM4N9LZk0XPpXuFqpHah2U=;
h=Content-Type:Date:From:Reply-To:Subject:To;
b=DRGclGLkyYq+uYoipKgl7d7CTB3Z8MQ/SVEiJe5KwiW91BuPTXRwnTaAb9AjTa+xbxC
1QYGocd8r8ZD2v9JRdlqLWTb9m9M91nRhO8tsbBbVK7VofmOmzYzHpVEfdQMJBo/jbth8
YamCpMIyU+au/dWzSGjp0w9hpHu/m/vs8HM=
UGxhbmVQbG90dGVyIExvZyBmcm9tIE1hY2Fww6EsIEFtYXDDoSAtIEJSIGZvciAwNS8wMy8yMDE5
IFRpbWVzIGFyZSBVVEMNCkxvY2F0aW9uOiBNYWNhcMOhLCBBbWFww6EgLSBCUiwgQXV0aG9yOiBG
YWJpYW5vIEZlcnJlaXJhLCBSZWNlaXZlcjogIFNCTVENCg0KQ3JlYXRlZCB3aXRoIE5pYyBTdG9y
ZXlzIFBsYW5lUGxvdHRlciBSZXBvcnQgVmlld2VyIFZlciAyLjMgDQpEb3dubG9hZCBGcm9tOiBo
Many thanks
Alexis
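One possible cause, assuming the raw message uses CRLF line endings as mail normally does in transit: the blank line is then "\r\n\r\n", so splitting on "\n\n" never matches. A minimal sketch that tolerates both line-ending styles follows. split_message() is a hypothetical helper, the sample is shortened to the last DKIM line and the first body line from the excerpt above, and the body is assumed to be base64 as the Content-Transfer-Encoding header states.

```php
<?php
// Hypothetical helper: splits a raw message at the first blank line.
// Returns [headers, body]; body is '' if no blank line is found.
function split_message(string $raw): array
{
    $parts = preg_split('/\r?\n\r?\n/', $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

$raw = "Content-Transfer-Encoding: base64\r\n"
     . "DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=groups.io;\r\n"
     . " YamCpMIyU+au/dWzSGjp0w9hpHu/m/vs8HM=\r\n"
     . "\r\n"
     . "UGxhbmVQbG90dGVyIExvZyBmcm9tIE1hY2Fww6EsIEFtYXDDoSAtIEJSIGZvciAwNS8wMy8yMDE5\r\n";

[$headers, $body] = split_message($raw);

// base64_decode() skips the embedded line breaks in its default (non-strict) mode.
echo base64_decode($body), "\n";
```

Inspecting the raw string with bin2hex() or var_dump() is a quick way to confirm which line endings the message actually contains.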
I have been trying to convert many IMAP message bodies to something more readable (UTF-8 or equivalent). I cannot seem to find an out-of-the-box function that works.
Here is an example of what I am trying to decode:
President Trump signed an executive order Thursday tar= geting North Korea=E2=80=99s trading partners, calling it a =E2=80=9Cpowerful=E2= =80=9D new tool aimed at isolating and de-nuclearizing the regime.
More on thi= s: http://www.foxnews.com= /politics/2017/09/21/trump-signs-executive-order-targeting-north-koreas-tra= ding-partners.html
(In the sample above, wherever "= " appears, the space should be a newline.)
A few things that I have tried:
iconv("UTF-8", "Windows-1252//TRANSLIT//IGNORE", $data);
//this resulted in a server error 500
imap_mime_header_decode($data);
//this outputs an array (just something that I tried; yes, I know that it is only good for headers)
iconv_mime_decode($test, 0, "ISO-8859-1");
//This works for a few messages (plaintext ones) but does not output anything for the example above; for others, it only outputs part of the message body
mb_convert_encoding($test, "UTF8");
//this results in another internal server error!
$data = str_replace("=92", "'", $data);
//I have also tried to manually find and replace an occurrence of a utf-7 (I guess) encoded string
Anyway, there is something that I am doing totally wrong, but I'm not sure what. How do you all read the body of an email retrieved with IMAP?
What are some other things that I can try? People must do this all the time, but I can't seem to find a solution...
Thank you,
Rog
You're not actually dealing with UTF-7 here. What you're seeing is quoted-printable.
PHP has a built-in function to decode this: quoted_printable_decode().
I haven't written PHP in quite some time, so forgive my style failures. Here's an example which decodes your text:
<?php
$s = 'President Trump signed an executive order Thursday tar= geting North Korea=E2=80=99s trading partners, calling it a =E2=80=9Cpowerful=E2= =80=9D new tool aimed at isolating and de-nuclearizing the regime.';
// It's unclear why I have to replace out `= `, I have a feeling these
// are actually newlines and copy paste error?
echo quoted_printable_decode(str_replace('= ', '', $s));
?>
When run it produces:
President Trump signed an executive order Thursday targeting North Korea’s trading partners, calling it a “powerful” new tool aimed at isolating and de-nuclearizing the regime.
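If the "= " sequences in the sample were originally "=" followed by a line break (quoted-printable soft line breaks), the str_replace() should not be needed, because quoted_printable_decode() removes soft line breaks itself. A minimal sketch on a shortened, hypothetical excerpt of the message, assuming CRLF line endings in the raw body:

```php
<?php
// Shortened excerpt with real soft line breaks ("=" at the end of a line).
$body = "tar=\r\ngeting North Korea=E2=80=99s =E2=80=9Cpowerful=E2=\r\n=80=9D tool";

// Soft line breaks are dropped and =XX sequences become raw bytes (here UTF-8).
$text = quoted_printable_decode($body);

echo $text, "\n";
```

The result is already UTF-8 here because the =XX bytes were UTF-8 to begin with. For other charsets a conversion step (for example with mb_convert_encoding() or iconv()) would follow, using the charset named in the part's Content-Type.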
I need to send a very long link using PHP. The known problem: the link gets broken by e-mail clients. I've tried plain-text and HTML mails, and I've put the URL in brackets () as proposed in other threads; nothing helps. I know about URL shorteners and the possibility of solving this problem with a database, but it is possible to send links with hundreds of characters: eBay does, Amazon does, and the registration confirmation link from Stack Overflow contains more than 250 characters. Looking at the source of these mails, all lines break after 76 characters by default. I've tried to do the same with PHP's wordwrap(). The result: the source looks identical, but my links are broken and theirs are not. Any ideas? I'd be very glad for help, because this is bothering me! :)
I managed to solve the problem on my own. First, the special characters of the link must be encoded (e.g. Thunderbird will then accept the encoded link as is). Second, insert a line break after 76 characters. So that the link doesn't get broken or stop being recognized as a link by the client program, each line needs to end with "=" so that the lines are recombined...
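<?php
$url = 'http://domainxy.com/index.php';
$ending = '?var1=gsgsdgsfgdhfjfgj&var2=sdferewerwrr&var3=jghjghjkloozzzz&var4=ghajsldahskhdhriehfsjndfnjnjjfnjsnjdfhksö&var5=öäüöü';
// Convert ISO-8859-1 to UTF-8. Skip this step if the string is already UTF-8;
// note that utf8_encode() is deprecated as of PHP 8.2.
$ending = utf8_encode($ending);
// Percent-encode the special characters, including "?", "=" and "&".
$ending = rawurlencode($ending);
// Break after at most 75 characters and end each broken line with "=".
$link = wordwrap( $url . $ending, 75, "=<br />\n", true );
echo $link;
?>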
/*
Encodes and divides the link like this:
http://domainxy.com/index.php%3Fvar1%3Dgsgsdgsfgdhfjfgj%26var2%3Dsdferewerw=
rr%26var3%3Djghjghjkloozzzz%26var4%3Dghajsldahskhdhriehfsjndfnjnjjfnjsnjdfh=
ks%C3%B6%26var5%3D%C3%B6%C3%A4%C3%BC%C3%B6%C3%BC
*/
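The trailing "=" on each line is what quoted-printable calls a soft line break. As an alternative to wrapping by hand, PHP's quoted_printable_encode() produces such lines itself. This is a minimal sketch, assuming the mail is sent with a "Content-Transfer-Encoding: quoted-printable" header. The "Please confirm: " text is a placeholder, the query string is built with http_build_query() instead of rawurlencode() on the whole string (a swapped-in technique), and the mail() call itself is left out.

```php
<?php
// Build the query string with http_build_query() so only the values are
// percent-encoded and "?", "=" and "&" keep their meaning in the URL.
$url = 'http://domainxy.com/index.php?' . http_build_query([
    'var1' => 'gsgsdgsfgdhfjfgj',
    'var2' => 'sdferewerwrr',
    'var3' => 'jghjghjkloozzzz',
    'var4' => 'ghajsldahskhdhriehfsjndfnjnjjfnjsnjdfhksö',
    'var5' => 'öäüöü',
]);

// Headers that tell the client to undo the encoding (to be passed to mail()).
$headers = "MIME-Version: 1.0\r\n"
         . "Content-Type: text/plain; charset=UTF-8\r\n"
         . "Content-Transfer-Encoding: quoted-printable";

// A literal "=" becomes "=3D"; long lines are split with "=" followed by CRLF.
$body = quoted_printable_encode("Please confirm: " . $url);

echo $body, "\n";
```

A client that honors the header joins the lines again and turns "=3D" back into "=", so the link arrives in one piece.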
I am encrypting form submissions in Drupal with PEAR. The encryption/decryption is working fine, but line breaks are not working.
What I mean by that is, whether I use Enigmail in Thunderbird or gpg4win, the message gets decrypted, but it looks like this:
Online Form = 1\r\nFirst Name: John\r\nLast Name: Smith\r\n
I have tried \n\n, \r, and \r\n; none of these seem to work.
So the question is: how do I get line breaks to output properly after being decrypted?
(I'm using the drupal mail function sending plain text email with UTF-8 encoding, although I don't think this matters since the message is being decrypted)
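One possible cause, assuming the message is built with single-quoted PHP strings: escape sequences such as \r\n are only interpreted inside double quotes, so a single-quoted string sends a literal backslash followed by a letter. A minimal sketch follows; unescape_line_breaks() is a hypothetical helper for repairing text that already contains the literal sequences.

```php
<?php
// Single quotes keep the backslashes; double quotes produce real CR and LF.
$literal = 'First Name: John\r\nLast Name: Smith\r\n';
$real    = "First Name: John\r\nLast Name: Smith\r\n";

// Hypothetical helper: turns literal "\r\n", "\n" or "\r" text into a real CRLF.
function unescape_line_breaks(string $text): string
{
    // The four-character sequence is listed first so it is not split in two.
    return str_replace(['\r\n', '\n', '\r'], "\r\n", $text);
}

var_dump(unescape_line_breaks($literal) === $real);
```

If the strings are assembled in your own code, switching the quotes (or using PHP_EOL) before encryption is simpler than repairing the text afterwards.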
Hello, I have two PHP files. One of them builds a report, the second contains the language text. When it prints, it keeps giving me the � special character everywhere, even though I am not using any special characters in my code. Why is that, and how can I get rid of them?
I am running Apache 2.2, PHP 5, Ubuntu 8.04.
FILE 1
<?php
function glossary() {
return <<<HTML
<h1>Arteries</h1>
<p><strong>Arteries</strong> are blood vessels that carry blood <strong>away from
the heart</strong>. All arteries, with the exception of the pulmonary and umbilical
arteries, carry oxygenated blood. The circulatory system is extremely important for
sustaining life. Its proper functioning is responsible for the delivery of oxygen
and nutrients to all cells, as well as the removal of carbon dioxide and waste products,
maintenance of optimum pH, and the mobility of the elements, proteins and cells of
the immune system. In developed countries, the two leading causes of death, myocardial
infarction and stroke each may directly result from an arterial system that has been
slowly and progressively compromised by years of deterioration.</p>
HTML;
}
?>
FILE 2:
<?php
require_once("language.php");
echo glossary();
?>
This is the printout when I execute file 2.
Glossary
Arteries
Arteries�are�blood vessels�that carry blood�away from the�heart. All arteries, with the exception of the�pulmonary�and�umbilical arteries, carry oxygenated blood. The�circulatory system�is extremely important for sustaining�life. Its proper functioning is responsible for the delivery of�oxygen and�nutrients�to all cells, as well as the removal of�carbon dioxide�and waste products, maintenance of optimum�pH, and the mobility of the elements, proteins and cells of the�immune system. In�developed countries, the two leading causes of�death, myocardial infarction�and�stroke�each may directly result from an arterial system that has been slowly and progressively compromised by years of deterioration.
Autoimmunity
Autoimmunity�is the failure of an organism to recognize its own constituent parts as�self, which allows an immune response against its own cells and tissues. Any disease that results from such an aberrant immune response is termed an�autoimmune disease.�
Basal cell carcinoma
Basal cell carcinoma�is the most common type of�skin cancer. It rarely�metastasizes�or kills, but it is still considered�malignant because it can cause significant destruction and disfigurement�by invading surrounding tissues. Statistically, approximately 3 out of 10 Caucasians develop a basal cell cancer within their lifetime. In 80 percent of all cases, basal cell cancers are found on the head and neck.�There appears to be an increase in the incidence of basal cell cancer of the trunk in recent years.
Try deleting and re-entering the spaces which show up as "�".
I suspect those you re-enter will be fine. The document likely contains alternate Unicode space characters which appear normally in your editor, but are unrecognized by the PHP code running in the default character set for your server.
Did this document originally come from MS Word or some other word processor?
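If the stray characters are non-breaking spaces (U+00A0), which is an assumption based on the printout, they could also be replaced programmatically instead of by hand. A minimal sketch follows; normalize_spaces() is a hypothetical helper that handles both a UTF-8 source (bytes C2 A0) and a single-byte source such as ISO-8859-1 (byte A0).

```php
<?php
// Hypothetical helper: replaces non-breaking spaces with ordinary spaces.
function normalize_spaces(string $text): string
{
    // preg_match() with the /u modifier returns 1 only for valid UTF-8 input.
    if (preg_match('//u', $text) === 1) {
        return preg_replace('/\x{00A0}/u', ' ', $text);
    }
    // Not valid UTF-8: assume a single-byte charset such as ISO-8859-1.
    return str_replace("\xA0", ' ', $text);
}

echo normalize_spaces("Arteries\xA0are\xA0blood vessels"), "\n";
```

The UTF-8 check matters because the byte A0 also occurs inside legitimate multi-byte characters (for example "à" is C3 A0), so a blind byte replacement would corrupt them.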
You need to make sure that your editor encoding is set to something sensible such as UTF-8. You should also make sure that your output is set to UTF-8 (or whatever encoding is relevant to you). This can be done using a meta tag <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/> and by sending the PHP header header('Content-type: text/html; charset=UTF-8'); before your output begins.
Have you checked your encoding? Make sure that you use the same encoding in the editor, Apache, PHP and the browser you are using.
Hope this helps.
In small cases like this I use Notepad as a lazy man's charset sanitizer. Paste the text into Notepad, then copy and paste it back into your document. Spaces will now be spaces.