how to get email body text while getting duplicate entry? - php

After MIME parsing, I get the email body twice (once as plain text and once as HTML), and I am wondering how to get the actual message body. I am using PHP/MySQL. Is there anything in PHP's string functions or in MySQL that solves this?
Sample email message body:
testing body from hotmail. testing word can be repeated.
testing body from hotmail. testing word can be repeated.

OK, so as I said, you receive the email twice because it arrives in both text/plain and text/html format.
The best way I have found so far to read email over POP3 is Manuel Lemos's POP3 Access class.
Email is usually received in parts, one for each content type or image.
text/plain:
------=_Part_38964_33016848.1312149074828
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
email content in text format
text/html:
------=_Part_38964_33016848.1312149074828
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
email content in html format
In the headers you will find the boundary, the unique identifier that separates the parts:
Content-Type: multipart/alternative;
boundary="----=_Part_38964_33016848.1312149074828"
There isn't a simple way to receive only the text/plain or only the text/html version; messages sent from a public email service will most likely contain both together. If you send email from your own scripts, you probably won't bother to send both formats.
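If you already have the raw message as a string, picking one of the two alternatives could look roughly like the sketch below. It assumes a single, non-nested multipart/alternative level, and the function names extractAlternative() and decodePartBody() are made up for illustration; a real parser has to handle nesting, folded headers and more encodings.

```php
<?php
// Sketch: return only one alternative (text/plain by default) from a raw
// multipart/alternative message. Function names are hypothetical.

function decodePartBody(string $body, string $encoding): string
{
    switch (strtolower(trim($encoding))) {
        case 'quoted-printable':
            return quoted_printable_decode($body);
        case 'base64':
            return (string) base64_decode($body);
        default: // 7bit, 8bit, binary: nothing to decode
            return $body;
    }
}

function extractAlternative(string $raw, string $preferred = 'text/plain'): ?string
{
    // Normalise line endings so the header/body split is predictable.
    $raw = str_replace(["\r\n", "\r"], "\n", $raw);

    // The boundary is declared in the top-level Content-Type header.
    if (!preg_match('/boundary="?([^";\n]+)"?/i', $raw, $m)) {
        return null;
    }

    // Every part starts on a line made of "--" followed by the boundary.
    $chunks = explode("\n--" . $m[1], "\n" . $raw);
    array_shift($chunks); // drop the message headers and preamble

    foreach ($chunks as $chunk) {
        if (strncmp($chunk, '--', 2) === 0) {
            break; // closing delimiter "--boundary--"
        }
        $pieces = explode("\n\n", ltrim($chunk, "\n"), 2);
        if (count($pieces) < 2) {
            continue; // no blank line, so no body
        }
        [$headers, $body] = $pieces;
        if (stripos($headers, 'Content-Type: ' . $preferred) === false) {
            continue; // not the alternative we want
        }
        $encoding = preg_match('/^Content-Transfer-Encoding:\s*(\S+)/mi', $headers, $e)
            ? $e[1]
            : '7bit';
        return rtrim(decodePartBody($body, $encoding), "\n");
    }

    return null;
}
```

Calling extractAlternative($raw) gives you the plain-text body once; passing 'text/html' as the second argument gives you the HTML version instead, so you store only one of them.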

Related

How can I send a non-html-email in bold?

I want to send a simple email in php.
Is it possible to make the Second Line of Text in bold without using html mail?
<?php
$msg = "First line of text\nSecond line of text";
mail("someone@example.com","My subject",$msg);
?>
You can't bold or otherwise format text inside a plain-text email (one with a MIME type of text/plain). That's why HTML formatting (MIME type text/html) exists for email.
FYI, the MIME type is set to HTML with the header Content-Type: text/html; charset=UTF-8
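A minimal sketch of what that could look like in PHP follows. buildHtmlMail() is a hypothetical helper, and the mail() call is left commented out so nothing is sent by accident.

```php
<?php
// Sketch: build the headers and body of an HTML message whose second
// line is bold. buildHtmlMail() is a made-up helper name.

function buildHtmlMail(string $firstLine, string $boldLine): array
{
    $headers = [
        'MIME-Version: 1.0',
        'Content-Type: text/html; charset=UTF-8',
    ];

    // Escape the text so characters like < and & are not read as markup.
    $body = '<html><body>'
          . htmlspecialchars($firstLine, ENT_QUOTES, 'UTF-8') . '<br>'
          . '<strong>' . htmlspecialchars($boldLine, ENT_QUOTES, 'UTF-8') . '</strong>'
          . '</body></html>';

    return [implode("\r\n", $headers), $body];
}

[$headers, $body] = buildHtmlMail('First line of text', 'Second line of text');
// mail('someone@example.com', 'My subject', $body, $headers);
```

Clients that only show plain text will display the raw markup, which is why many senders add a text/plain alternative as well.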

Receiving mail server inserts space before each new line, breaking multipart/alternative

I am using PHP to send emails on demand to clients. I have a script which seemed fairly robust in testing, generating MIME 1.0-compatible multipart/alternative emails with a text and an HTML version. Emails are sent as base64-encoded strings to preserve international characters (message text is usually in German).
However, it seems that certain servers, upon receiving the mail, insert a space (0x20) just before each CR-LF sequence. This doesn't break the base64, of course, but since it breaks up the CR-LF-CR-LF sequence that separates headers from messages, the messages are not parsed properly (or, at all, actually, since the secondary headers are never seen to stop).
Here is an example message as generated:
From: example@example.com
To: example@example.org
Subject: Test Message
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="{$boundary}"
This is a multipart Message in MIME Format
--{$boundary}
MIME-Version: 1.0
Content-ID: <{$content_id}>
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: base64
Content-Length: {$objlen}
UkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVE
QUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNU
RUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQg
UkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVE
QUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNU
RUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQg
UkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVE
QUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNU
RUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQg
UkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVE
QUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNU
RUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQg
UkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVE
QUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNU
RUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQg
UkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQgUkVEQUNURUQ=
--{$boundary}
MIME-Version: 1.0
Content-ID: <{$content_id}>
Content-Type: text/html; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: base64
Content-Length: {$objlen}
REVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVU
Q0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FE
RVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIg
REVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVU
Q0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FE
RVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIg
REVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVU
Q0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FE
RVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIg
REVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVU
Q0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FE
RVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIg
REVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVU
Q0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FE
RVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIg
REVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVIgREVUQ0FERVI=
--{$boundary}--
Is there some way to prevent the mail server from adding these spaces?
The problem is that you are not sending your email in quoted-printable encoding. I'd strongly consider using a library to send the email for you to avoid all of these issues:
Email Quoted Printable Encoding
The problem has to do with certain email servers (e.g. t-online.de) treating CRLF newline sequences as less valid than LF only newlines. When newlines were changed from CRLF to LF, everything worked fine.
On the one hand, I would think this was a flagrant disregard for the standards set out in the RFCs, but on the other hand, I've had no issues with these messages since making the changes, so either (a) it doesn't matter or (b) there have been changes about which I do not know, which is always possible.
In any case, I guess you should always end lines with LF only if you intend to send multipart/* messages.
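If you want to compare both line endings against a problematic server, a builder with a configurable line ending may help. This is only a sketch: buildAlternative() and its parameters are hypothetical, and CRLF stays the default because that is what the RFCs specify. Note the empty line that separates each part's headers from its base64 body, which is exactly the sequence that was being broken.

```php
<?php
// Sketch: assemble the body of a multipart/alternative message with a
// configurable line ending. buildAlternative() is a made-up name.

function buildAlternative(string $text, string $html, string $boundary, string $eol = "\r\n"): string
{
    $parts = ['text/plain' => $text, 'text/html' => $html];
    $out = '';

    foreach ($parts as $type => $content) {
        $out .= '--' . $boundary . $eol
              . 'Content-Type: ' . $type . '; charset="utf-8"' . $eol
              . 'Content-Transfer-Encoding: base64' . $eol
              . $eol // blank line: end of this part's headers
              // chunk_split() wraps at 76 characters and also appends $eol after the last chunk
              . chunk_split(base64_encode($content), 76, $eol);
    }

    // The closing delimiter carries two extra dashes.
    return $out . '--' . $boundary . '--' . $eol;
}
```

Passing "\n" as the last argument produces the LF-only variant described above; leaving it out produces the standard CRLF variant.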

Spanish characters in email appearing as question mark in mail clients

I am using an existing Mail class in PHP. Emails are mostly sent in Spanish. The following headers are passed to the PHP mail function:
MIME-Version: 1.0
Content-Type: multipart/mixed;
Additional headers are also appended to the message (I don't know what they do):
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Emails appear properly in browsers, but in mail clients accented characters are replaced by question marks.
For example:
Una nueva contraseña se solicito
appears as
Una nueva contrase�a se solicito
I have checked this in Thunderbird and Outlook.
How do I fix this so that these characters display correctly in mail clients as well?
I guess you have to change the character set to UTF-8 in Thunderbird and Outlook as well.
The email is probably being sent out as something other than UTF-8. Make sure to convert the text to UTF-8 before passing it to the class (or convert it to UTF-8 in the class).
As Raffael says, the client has to be set to UTF-8 too. A better solution is to pass the text through htmlentities before sending the mail and display the mail as HTML.
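A rough sketch of the "convert it to UTF-8 before passing it to the class" idea follows. It assumes that text which is not valid UTF-8 is ISO-8859-1, which is only a guess about your data, and toUtf8() is a hypothetical helper name.

```php
<?php
// Sketch: make sure a string is UTF-8 before it goes into a message that
// declares charset=utf-8. The ISO-8859-1 fallback is an assumption.

function toUtf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() fails on invalid UTF-8 input,
    // so an empty pattern works as a cheap validity check.
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8, leave it alone
    }

    $converted = iconv($fallback, 'UTF-8', $text);

    // If the conversion fails, return the input rather than an empty body.
    return $converted === false ? $text : $converted;
}
```

Run the message text through toUtf8() once, just before handing it to the Mail class, so the bytes actually match the declared charset.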

Proper PHP way to parse email attachments from EML format

I have a file containing an email in "plain text MIME message format". I am not sure whether this is the EML format. The email contains attachments, and I want to extract them and recreate the files. This is what the attachment part looks like:
...
...
Receive, deliver details
...
...
From: sac ascsac <sacsac@sacascsac.ascsac>
Date: Thu, 20 Jan 2011 18:05:16 +0530
Message-ID: <AANLkTimmSL0iGW4rA3tvSJ9M3eT5yZLTGsqvCvf2fFC3@mail.gmail.com>
Subject: Test attachments
To: ascsacsa@ascsac.com
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12
--20cf3054ac85d97721049a465e12
Content-Type: multipart/alternative; boundary=20cf3054ac85d97717049a465e10
--20cf3054ac85d97717049a465e10
Content-Type: text/plain; charset=ISO-8859-1
hello this is a test mail. It contains two attachments
--20cf3054ac85d97717049a465e10
Content-Type: text/html; charset=ISO-8859-1
hello this is a test mail. It contains two attachments<br>
--20cf3054ac85d97717049a465e10--
--20cf3054ac85d97721049a465e12
Content-Type: text/plain; charset=US-ASCII; name="simple_test.txt"
Content-Disposition: attachment; filename="simple_test.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n2yx60
aGVsbG8gd29ybGQKYWMgYXNj
...
encoded things here
...
ZyBmZyAKCjIKNDIzCnQ2Mwo=
--20cf3054ac85d97721049a465e12
Content-Type: application/x-httpd-php; name="oscomm_backup_code.php"
Content-Disposition: attachment; filename="oscomm_backup_code.php"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n5gxn1
PD9waHAKCg ...
...
encoded things here
...
X2xpbmsoRklMRU5BTUVfQkFDS1VQKSk7Cgo/Pgo=
--20cf3054ac85d97721049a465e12--
I can see that the part from X-Attachment-Id: f_gj5n2yx60 to ZyBmZyAKCjIKNDIzCnQ2Mwo= (both inclusive) is the content of the first attachment. I want to parse those attachments (file names and contents) and create those files.
I got this file after parsing a dbx format file using a DBX Parser class available on PHP Classes.
I searched in many places and did not find much discussion of this here on SO other than "Script to parse emails for attachments". Maybe I missed some terms while searching. That answer mentions:
you can use the boundries to extract
the base64 encoded information
But I am not sure which strings are the boundaries or how exactly to use them. There must already be some library or well-defined method for doing this. I guess I will make many mistakes if I try to reinvent the wheel here.
There's a PHP Mailparse extension; have you tried it?
The manual way would be to process the mail line by line. When you hit your first Content-Type header (this one in your example):
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12
You have the boundary. This string is used as the separator between your multiple parts (that's why it is called multipart).
Every time a line starts with two dashes followed by this string, a new part begins. In your example:
--20cf3054ac85d97721049a465e12
Every part consists of headers, a blank line, and content. By looking at the Content-Type and Content-Disposition headers you can determine which parts are attachments, what their type is, and their filename.
Read the whole content, strip the whitespace, base64_decode it (when the part's Content-Transfer-Encoding is base64), and you've got the binary contents of the file. Does this help?
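The manual steps above can be sketched roughly as follows. This assumes the attachments sit directly under the outer boundary and are base64-encoded, as in your example; extractAttachments() is a made-up name, and folded headers or other transfer encodings are not handled.

```php
<?php
// Sketch: pull base64 attachments out of a raw multipart/mixed message.
// Returns an array of file name => binary contents.

function extractAttachments(string $raw, string $boundary): array
{
    $raw = str_replace(["\r\n", "\r"], "\n", $raw);
    $files = [];

    // A new part begins on every line made of "--" plus the boundary.
    $chunks = explode("\n--" . $boundary, "\n" . $raw);
    array_shift($chunks); // message headers and preamble

    foreach ($chunks as $chunk) {
        if (strncmp($chunk, '--', 2) === 0) {
            break; // "--boundary--" closes the message
        }

        // Headers and content are separated by the first blank line.
        $pieces = explode("\n\n", ltrim($chunk, "\n"), 2);
        if (count($pieces) < 2) {
            continue;
        }
        [$headers, $body] = $pieces;

        if (!preg_match('/^Content-Disposition:\s*attachment;.*filename="?([^"\n;]+)"?/mi', $headers, $m)) {
            continue; // not an attachment (e.g. the nested text/html part)
        }
        if (!preg_match('/^Content-Transfer-Encoding:\s*base64/mi', $headers)) {
            continue; // this sketch only decodes base64
        }

        // Strip line breaks and spaces, then decode strictly.
        $data = base64_decode(preg_replace('/\s+/', '', $body), true);
        if ($data !== false) {
            // basename() keeps a hostile file name from escaping the target directory.
            $files[basename($m[1])] = $data;
        }
    }

    return $files;
}
```

With the example message you would call extractAttachments($raw, '20cf3054ac85d97721049a465e12') and then write each entry to disk with file_put_contents(). The nested multipart/alternative block is skipped because it has no attachment disposition.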

How to pull html encoding from email data using PHP

I'm working with emails and want to display the HTML in the browser, but I'm not sure how to deal with the encoding. I want to extract the HTML part to display it in the browser. My plan is to run an HTML parser over the entire email and pull out the data in between the tags in the HTML section. Is there an easier or more efficient way to do this?
Here's the text encoding:
------=_Part_29856965_540743623.1285814590176
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Here's the HTML encoding:
------=_Part_29856965_540743623.1285814590176
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
You can have a look at the ezComponents - Mail component. It has a lot of operations for building and using a MIME
http://ezcomponents.org/docs/tutorials/Mail
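If you end up handling a part by hand, the charset declared in its Content-Type header tells you how to convert the body before showing it on a UTF-8 page. Here is a small sketch, assuming the part has already been split into headers and body; partToUtf8() is a hypothetical helper, not part of ezComponents.

```php
<?php
// Sketch: convert a MIME part body to UTF-8 based on the charset named
// in the part's own headers. Falls back to US-ASCII when none is given.

function partToUtf8(string $headers, string $body): string
{
    $charset = preg_match('/charset="?([^";\s]+)"?/i', $headers, $m)
        ? $m[1]
        : 'US-ASCII';

    // iconv() returns false for unknown charsets or invalid input;
    // in that case the body is returned unchanged.
    $converted = @iconv($charset, 'UTF-8', $body);

    return $converted === false ? $body : $converted;
}
```

For the HTML part shown above, the headers declare ISO-8859-1, so the body would be converted from that charset before being echoed into the page.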
