I'm trying to programmatically parse my Gmail for various indexing functions, and am having trouble finding certain headers that I thought were standard email headers. I'm using the Zend IMAP library, and have no problems with authentication and otherwise viewing/manipulating my Gmail. However, I'm having trouble with some headers missing. For instance:
about 1 out of 10 of the messages are missing the "message-id" header, including many sent from other gmail addresses
occasionally, though rarely, the 'content-type','content-disposition', and 'filename' headers are missing from attachment headers. These always seem to be messages that are part of a longer thread of messages.
Can anybody explain why these headers might be missing? If the "message-id" header is missing, what is used as the unique identifier? Perhaps some sort of combination of other headers?
According to RFC 5322:
The only required header fields are the origination date field and the originator address field(s). All other header fields are syntactically optional.
The same RFC says:
Though listed as optional in the table in section 3.6, every message SHOULD have a "Message-ID:" field. Furthermore, reply messages SHOULD have "In-Reply-To:" and "References:" fields as appropriate and as described below.
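So Message-ID isn't, strictly speaking, mandatory. If it's missing, try looking for either the In-Reply-To or References fields. Note that those two fields hold the IDs of earlier messages in the thread, so they place a message within a conversation rather than identify it uniquely.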
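On the question of combining other headers into an identifier, here is a minimal Python sketch (not Zend code). The choice of Date, From, To and Subject and the "generated-" prefix are assumptions made for illustration; nothing in the RFC prescribes them, and two messages with identical values for those fields would still collide.

```python
import hashlib
from email import message_from_string


def message_key(raw_message: str) -> str:
    """Return the Message-ID if present, otherwise a hash of other headers."""
    msg = message_from_string(raw_message)
    message_id = msg.get("Message-ID")
    if message_id:
        return message_id.strip()
    # Assumption: these four fields together are distinctive enough for indexing.
    parts = [str(msg.get(name, "")) for name in ("Date", "From", "To", "Subject")]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return "generated-" + digest
```

The same raw message always produces the same key, so the index stays stable across repeated fetches.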
This PHP Mime Mail Parser library is very useful:
https://github.com/php-mime-mail-parser/php-mime-mail-parser
Example request:
$arrayHeaderTo = $parser->getAddresses('to');
It lets you analyze the intended recipient, sender, subject, etc., of an email message in RFC 822 format.
For those like me that are even more curious about an email message, such as which mailserver originated the message, I'd like to extend the library to have a dependable way to print this value, regardless of how it's found in the headers.
https://toolbox.googleapps.com/apps/messageheader/analyzeheader
Google provides a tool for analyzing message headers. I pasted an example here:
In this example, it would extract that complete server address (ec2-54-245-11-255.us-west-2.compute.amazonaws.com), and if possible, that server's IP, too. Even assuming a good deal of these two values (mailserver, mailserverIP) are going to be spoofed, I would still like to see the values that are included in the header, dependably, regardless of how that header is structured or where they appear.
So I guess this is a question for those who are very familiar with email headers, and a solid way to read them to extract this information.
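One possible starting point, sketched in Python rather than as an extension of the PHP library: each relay prepends its own Received header, so the last one in the message is the earliest hop. The sketch assumes the common "from HOST (RDNS [IP])" layout and IPv4 only; real headers vary, and anything below the first trusted relay can be forged. The function name and regular expressions are illustrative, not part of any library.

```python
import re
from email import message_from_string

# Assumption: the header starts with "from HOST" and carries the IP in brackets.
FROM_HOST = re.compile(r"from\s+(\S+)", re.IGNORECASE)
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def originating_hop(raw_message: str):
    """Return (host, ip) from the earliest Received header, or None for each."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return None, None
    # Unfold the header and collapse runs of whitespace into single spaces.
    earliest = " ".join(str(received[-1]).split())
    host = FROM_HOST.match(earliest)
    ip = BRACKETED_IPV4.search(earliest)
    return (host.group(1) if host else None, ip.group(1) if ip else None)
```

Either value comes back as None when the header does not follow the assumed layout, which makes the unparsed cases easy to log and inspect.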
The company I work for provides bulk-mailing functionality to our clients [double opt-in, not spam, I promise] and we get a figurative ton of reports back via Feedback Loops from AOL, Comcast, Yahoo, etc. These are generally from people that signed up, don't want it anymore, have been conditioned to not click 'Unsubscribe' links, [because "that's how the spammers get you"] and simply mark all the messages as spam.
Now, these FBL emails follow a specific format where the message is multipart, there are one or two text parts, and then the original message is attached, usually with all recipient information stripped out. This attached email is also multipart and contains the unsubscribe link, but the section of the attached email that holds the link is quoted-printable encoded, and the link is longer than quoted-printable allows on a single line, so it gets munged by a soft line break. Occasionally the section seems to get base64-encoded instead; I think that happens when the client is using a non-Latin script such as Chinese or Japanese.
What I need is a mime/multipart data parser that can give me these parts. PHP has oh so helpfully not implemented any form of multipart parser that I can find outside of what's internal to either their horrid IMAP functions, or internal to PHP itself which processes multipart form data.
Does anyone know of something I can use for this short of having to write my own? I had found one script, but it relies on old PECL functionality that relies on a custom-compilation of PHP which is not an option for this server.
TL;DR: PHP's imap_* functions will parse the parts of the message received from the server, but I need to parse the parts of an email attached to the email downloaded from the server.
This guy's script is ugly as sin, but it gets the job done:
http://www.phpclasses.org/package/3169-PHP-Decode-MIME-e-mail-messages.html
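For comparison, here is how the same task looks in a language whose standard library ships a MIME parser. This Python sketch is not a PHP solution; it only illustrates the steps a parser has to perform: find the message/rfc822 attachment, walk its parts, and undo the quoted-printable or base64 transfer encoding so that soft line breaks inside the link disappear. The function name is made up for the example.

```python
from email import message_from_bytes, policy


def attached_text_parts(raw_report: bytes):
    """Return (content_type, decoded_text) for text parts of attached messages."""
    report = message_from_bytes(raw_report, policy=policy.default)
    results = []
    for part in report.walk():
        if part.get_content_type() != "message/rfc822":
            continue
        # A message/rfc822 part holds exactly one nested message as its payload.
        attached = part.get_payload(0)
        for sub in attached.walk():
            if sub.get_content_maintype() == "text":
                # get_content() undoes quoted-printable/base64 and the charset.
                results.append((sub.get_content_type(), sub.get_content()))
    return results
```

Because decoding happens before the text is returned, a link split by a quoted-printable soft line break ("=" at the end of a line) comes back in one piece.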
I was wondering if anyone here with experience of PEAR mail or PEAR mail queue could help me out with this.
I am working on creating a bulk mailing service using PEAR and am adding X-headers to give information on where and when people signed up.
So I am trying to create an X-header similar to this:
X-Subscription: Subscribed on 2010/09/01, via web form, by 92.8.196.121 from http://mydomain.com/signup.htm
However after I pass the headers to PEAR mail mime and queue they are formatted with a line break at certain points so they end up looking like this:
X-Subscription: Subscribed on 2010/09/01, via web form, by 92.8.196.121 from
http://mydomain.com/signup.htm
I have tested this by creating a few different headers and the line break always comes after a certain amount of characters but I cannot seem to find any code in PEAR which would cause this.
Does anyone here have any experience of this? Or know of a way I could fix this?
Thanks for looking
The "issue" of headers being split onto multiple lines is correct behavior according to RFC 822, section "3.1.1. LONG HEADER FIELDS":
For convenience, the field-body portion of this conceptual
entity can be split into a multiple-line representation; this
is called "folding". The general rule is that wherever there
may be linear-white-space (NOT simply LWSP-chars), a CRLF
immediately followed by AT LEAST one LWSP-char may instead be
inserted.
As described in "What is the email subject length limit?", RFC 2822 recommends keeping each line to 78 characters or fewer, excluding the CRLF.
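Since folding is legal, the receiving side is expected to undo it before using the value. A minimal sketch of unfolding in Python (the function name is illustrative): remove every line break that is immediately followed by a space or tab, and leave that whitespace in place.

```python
import re


def unfold(header_value: str) -> str:
    """Undo header folding: drop each CRLF (or LF) that precedes whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", header_value)
```

Applied to the folded X-Subscription value, this gives back the original single-line text, so any compliant reader sees the same header that was passed in.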
Is it possible to send a MIME message as it is, without adding any headers? For example, if I have a correct MIME message with all headers and content saved to a text file, is it possible to use the contents of this file without modification and send it via SMTP?
Apparently both python's SMTP.sendmail and PHP smtp::mail require at least "To:" and "From:", and passing the complete message to these functions doesn't seem to work.
It appears from the documentation that python's SMTP.sendmail should take a sender, a set of recipients, and a verbatim MIME message like the one you have. (The split here between the sender/recipients and the message itself is because you're talking SMTP. The SMTP envelope determines the actual recipients and is actually independent of the message payload.) So you should be good to go with SMTP.sendmail.
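A minimal sketch of that split, assuming the saved file is a complete message and that the envelope should simply mirror its From, To, Cc and Bcc headers. The helper names and the localhost:25 default are assumptions for the example, and a message with no From header would raise an IndexError here.

```python
import smtplib
from email import message_from_bytes
from email.utils import getaddresses


def envelope_for(raw_message: bytes):
    """Derive the SMTP envelope (sender, recipients) from the message headers."""
    msg = message_from_bytes(raw_message)
    sender = getaddresses(msg.get_all("From", []))[0][1]
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    recipients = [addr for _, addr in getaddresses(fields) if addr]
    return sender, recipients


def send_verbatim(raw_message: bytes, host: str = "localhost", port: int = 25) -> None:
    """Send the stored message bytes without rebuilding or adding headers."""
    sender, recipients = envelope_for(raw_message)
    with smtplib.SMTP(host, port) as smtp:
        # Only the envelope is supplied separately; the payload is passed as read.
        smtp.sendmail(sender, recipients, raw_message)
```

Usage would be something like send_verbatim(open("message.eml", "rb").read()). Note that a Bcc header left in the file is transmitted along with everything else, so strip it beforehand if that matters.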
You could read up to the first blank line, use those as additional headers, then send the rest in the body.
Folks,
I have a PHP-based site (using the QCubed framework); as a part of the site, I have a daemon that's sending out several thousand emails a day (no, I'm not a spammer, everything is opt-in :)). Emails are sent through a custom framework component; that component serves as an SMTP client. I'm using a paid SMTP gateway from DNSExit.com to get the emails actually delivered.
Those emails are simple HTML-based emails; they really have just simple links inside.
My issue is that these links sometimes (not consistently!) get scrambled in transit. Tags somehow get mixed up, and some links are non-functional in the email. The issue happens on a small percentage of all sent emails; it is not consistent (i.e. the same exact source message HTML may or may not be scrambled in transit).
Have any of you seen this? Any thoughts on how to troubleshoot?
Is it possible that you are using temp files to create the emails (or at minimum to create the variable content)? I did something vaguely similar once upon a time. The email text was generated and written to a temp file based on the exact time in seconds. Unfortunately, when sending thousands per day, we were hitting the same second more than once (since there are only 86k seconds available). That might explain a) the small error rate and b) the apparent randomness. For troubleshooting, I'd just see if the error rate increases with the number of emails and go from there.
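To make the collision concrete, here is a small Python sketch; the file-name pattern is invented for illustration and is not taken from the asker's code. A name derived from whole seconds repeats whenever two messages are rendered within the same second, while a file created by the operating system on request gets a name no other call will receive.

```python
import os
import tempfile


def timestamp_name(now: float) -> str:
    # Collides whenever two messages are rendered within the same second.
    return f"mail-{int(now)}.tmp"


def unique_temp_file(directory=None) -> str:
    # mkstemp creates the file itself, so two callers never share a path.
    fd, path = tempfile.mkstemp(prefix="mail-", suffix=".tmp", dir=directory)
    os.close(fd)
    return path
```

If the daemon does write to temp files, switching to the second approach (PHP's tempnam() plays a similar role) is a cheap way to rule this cause in or out.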
I ran into a similar problem on a server running sendmail.
I was creating and testing an html email that would one day be mass mailed (opt-in, of course). I had myself a template for the email that was easy for any html programmer to read, but as such was heavy on the whitespace to line everything up correctly. I thought to myself, if this is going to be mass emailed, I will minimize the whitespace after the template is rendered to save on space! So I created a brilliant regular expression to strip any unnecessary whitespace from the rendered email.
Upon sending the email to myself, I opened the email and was baffled when I saw that some of the css and html were not showing up correctly, when my previous emails prior to my regexp were. By looking at the original message I noticed that every once in a while, an exclamation mark (!) was appearing seemingly randomly throughout the message, thus breaking any css and html that came in its random path.
Turns out that sendmail doesn't like it if a line in your email gets too long without a line break. When the line does get too long, sendmail will insert an exclamation mark followed by a line break right then and there, just to confuse and confound you.
Why did it not just choose a space between words to line break? Why insert the exclamation mark? Questions I'm afraid, without answers.
My solution?
sudo apt-get remove sendmail
sudo apt-get install exim4
I was having other problems with sendmail like it taking a full 60 seconds to send an email and exim4 just worked and I have never had to think about it again.
If your mail server is using sendmail, this very well could be the problem, if not, thank you for letting me share my story with you. I needed to vent.
When you're sending email you should encode it so that no line in the message body is longer than 76 characters. You could use base64 for this, but most systems use the quoted-printable encoding for text because it generates smaller messages. Base64 is usually only used for binary data.
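A short illustration of that rule using Python's standard quopri module (the wrapper function is just for the example). The encoder inserts soft line breaks, an "=" at the end of a line, so that no encoded line exceeds 76 characters; the receiving client removes them again when it decodes.

```python
import quopri


def encode_body(body: bytes) -> bytes:
    """Quoted-printable encode a body; long lines get soft line breaks."""
    return quopri.encodestring(body)
```

A line containing a very long link is therefore split safely on the wire and reassembled on decoding, as long as the message also declares Content-Transfer-Encoding: quoted-printable.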
The problem is that HTML is not compatible with email. That is why I created Mail Markup Language.
HTML was created to operate with the HTTP protocol as those two technologies were invented by the same person at about the same time. The difference is that HTTP is a single session one way transfer from a server to a client. That never changes as the HTML document always originates on a server, is sent to a requesting client, and once the transfer completes the connection between the client and server is dropped.
Email does not behave in such a way. In email a communication originates at a client, is sent to one or more email servers, and then terminates at a distant client. The biggest difference, however, is that the document does not die with the finality of a single transmission, as a document transferred over HTTP does. A document sent over SMTP can be replied to, forwarded, or copied to multiple users who never requested it. This one difference is profound once email threads are taken into account.
The problem is that SMTP and HTTP are different, as the prior two paragraphs show. This difference is compounded by the radically different ways SMTP and HTTP format header data. HTML has header data that is intended to be compatible with the headers of HTTP transmissions and offers no compliance with SMTP transmissions. The HTML headers also do not account for the complexity of an email thread.
The problem is exemplified when email software corrupts an HTML document to add formatting changes necessary to fit the conforming demands of that software and to also write header data directly into the document. This becomes extremely pronounced when an HTML email becomes an email thread. Since the HTML header data has no method to account for the complexities of an email thread, there is no way to supply relevant presentation definitions from a stylesheet that survive the transfer of the document. Each time an HTML document, or a document with HTML formatting, is sent from one email program to another, the document is corrupted, and each program corrupts the prior corruption. Email processing software here means either an email client, which certainly will corrupt a document, or an email server, which is only likely to corrupt it.
The solution to the problem is to create a markup language convention that recognizes the requirements of email header data directly. Those requirements are defined in RFC 5321 for the SMTP protocol and RFC 5322 for the client processing. The only way to properly extend this solution to account for the complexities of an email thread is to provide a convention for a multi-agent DOM.
Paragraphs deleted due to technical inaccuracy and difference between the term multi-agent DOM and the nature of an invented feature not mentioned here even prior to the edit.
EDIT: a multi-agent DOM applies some degree of hierarchy, which may not be necessary to represent an email thread.
I had two problems with email data: usually a "?" symbol somehow got inside some words, and the other was related to UTF and the title. The first got "fixed" by changing hosting provider (so it was mail-server related); the second got fixed by changing the PHPMailer library.
Try to specify how exactly data is scrambled.
Do you have any special attributes in your links? Maybe a title attribute with unescaped quotes inside?
Something like this: Link