Email:
Hello World 123 123
From: uSER001 [user001#yahoo.com]
Sent: Thursday, July 19, 2013 11:21 PM
To: ts#yahoo.com
Subject: Re: Ticketing System
Can be anything sd asd asd asd asda dasdasd asda asd
From: Ticketing System [ts#yahoo.com]
Sent: Thursday, July 19, 2013 11:21 PM
To: uSER001, uSER002
Subject: Ticketing System
Content From Ticketing System
Ticketing System http://www.yahoo.com
Output Should be:
Hello World 123 123
From: uSER001 [user001#yahoo.com]
Sent: Thursday, July 19, 2013 11:21 PM
To: ts#yahoo.com
Subject: Re: Ticketing System
Can be anything sd asd asd asd asda dasdasd asda asd
--
Is this possible using preg_match($pattern,$data, $matches, PREG_OFFSET_CAPTURE) and substr?
You cannot parse a mail message reliably with preg_match alone. It is better to use a PHP MIME mail parser for this task. With Mime Mail Parser, the code is as simple as:
require_once('MimeMailParser.class.php');
$path = 'path/to/mail.txt';
$Parser = new MimeMailParser();
$Parser->setPath($path);
$to = $Parser->getHeader('to');
$from = $Parser->getHeader('from');
$subject = $Parser->getHeader('subject');
$textBody = $Parser->getMessageBody('text');
$htmlBody = $Parser->getMessageBody('html');
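For the narrow case in the question, where the quoted reply always begins with a second line starting with From:, a regex-plus-substr approach can be sketched as below. The helper name keep_latest_reply is hypothetical, and the sketch assumes that "From:" never starts a line inside the newest message text, which real mail does not guarantee.

```php
<?php
// Hypothetical helper: keep everything before the second line that starts with "From:".
function keep_latest_reply(string $data): string
{
    // m flag: ^ matches at the start of every line, not only at the start of the string.
    $count = preg_match_all('/^From:/m', $data, $matches, PREG_OFFSET_CAPTURE);
    if ($count < 2) {
        // No quoted reply found (or the regex failed): return the input unchanged.
        return $data;
    }
    // $matches[0][1] is the second match; its element [1] is the byte offset.
    return rtrim(substr($data, 0, $matches[0][1][1]));
}

$data = implode("\n", [
    'Hello World 123 123',
    'From: uSER001 [user001#yahoo.com]',
    'Subject: Re: Ticketing System',
    'Can be anything',
    'From: Ticketing System [ts#yahoo.com]',
    'Subject: Ticketing System',
    'Content From Ticketing System',
]);

echo keep_latest_reply($data), "\n";
```

Anything beyond this simple shape (MIME parts, encoded headers, differently formatted quotes) is where a real parser becomes the safer choice.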
Related
I have a file called mail.txt with the following contents :
From: elvis#tabloid.org (The King)
Subject: be seein' ya around
Date: Mon, 23 Oct 2006 11:04:13
From: The Prez <president#whitehouse.gov>
Date: Wed, 25 Oct 2006 8:36:24
Subject: now, about your vote
I'm using Sublime Text in which the Regex ^\w+: works properly.
I'm using file_get_contents() to read the content from mail.txt and then use the same Regex for preg_replace() to highlight the output.
The issue is, when I use file_get_contents(), it doesn't consider \n and for that I tried nl2br(), but that didn't work either.
Below are the outputs in Sublime and PHP:
[Screenshot: Sublime output]
[Screenshot: PHP output]
Below is the PHP code :
<?php
$path = "./mail.txt";
if(!file_exists($path))
    die("File does not exist");
else {
    if(!($handle = fopen($path, "r")))
        die("File could not be opened");
    else {
        $file_data = file_get_contents($path);
    }
}
$mod_file = preg_replace("/^\w+:/", "<span class='replaced'>$0</span>", $file_data);
echo "<pre>".$mod_file."</pre>";
?>
How to solve this issue?
You need to use the m (multiline) flag. See the demo:
https://regex101.com/r/cT0hV4/12
$re = "/^\\w+:/m";
$str = "From: elvis#tabloid.org (The King)\nSubject: be seein' ya around\nDate: Mon, 23 Oct 2006 11:04:13\nFrom: The Prez <president#whitehouse.gov>\nDate: Wed, 25 Oct 2006 8:36:24\nSubject: now, about your vote";
preg_match_all($re, $str, $matches);
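Applied to the highlighting code from the question, the same flag goes on the preg_replace pattern. This is a minimal sketch that uses an inline string in place of mail.txt; the htmlspecialchars call is an addition, assumed to be wanted so that < and > in the mail text are not interpreted as HTML.

```php
<?php
// Inline stand-in for the contents of mail.txt.
$file_data = "From: elvis#tabloid.org (The King)\n"
           . "Subject: be seein' ya around\n"
           . "Date: Mon, 23 Oct 2006 11:04:13";

// Escape the text first so angle brackets in the mail are shown literally.
$escaped = htmlspecialchars($file_data);

// m flag: ^ now matches after every newline, so each header name is wrapped.
$mod_file = preg_replace('/^\w+:/m', "<span class='replaced'>$0</span>", $escaped);

echo "<pre>" . $mod_file . "</pre>";
```

Without the m flag, ^ only matches at the very start of the string, which is why only the first From: was highlighted.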
I use the code below to extract header details from the mail, but I could not get the mail addresses in From, To and Cc, as shown below.
$header = explode("\n", imap_fetchheader($mbox,$msgno));
echo "<br>";
for ($i=1; $i<count($header); $i++)
{
echo $header[$i] . "<br>";
}
output:
Delivered-To: user1#examplecom
X-WM-Delivered: user1#example.com
Received: from ElcotPC ([127.0.0.1])
(envelope-sender )
by 127.0.0.1 with ESMTP
for ; Wed, 31 Jul 2013 09:14:19 +0530
From: "user1"
To:
Cc:
Subject: testing with attachment
Date: Wed, 31 Jul 2013 09:14:18 +0530
The "From", "To" and "Cc" fields are empty, without the mail addresses.
I want the output to look like this:
Delivered-To: user1#examplecom
X-WM-Delivered: user1#example.com
Received: from ElcotPC ([127.0.0.1])
(envelope-sender )
by 127.0.0.1 with ESMTP
for ; Wed, 31 Jul 2013 09:14:19 +0530
From: "user1" <user1#example.com>
To: <user2#example.com>
Cc: <user1#example.com>
How can I get the email addresses into the "From", "To" and "Cc" fields?
Update:
It's always best to use code that is readily available, so I checked whether an IMAP-parsing function already exists. It does: imap_rfc822_parse_headers. Read the docs for details and links to all sorts of imap_* functions. Perhaps imap_rfc822_parse_adrlist is exactly what you need?
A basic preg_match_all call could do the job, I think:
if (preg_match_all('/^\s*(From|To|Cc):[^<]*<([^>]+)\>/m', $string, $addresses))
{
    // Pair each header name (group 1) with its address (group 2)
    $addresses = array_combine($addresses[1], $addresses[2]);
    print_r($addresses);
}
Should output:
array (
'From' => 'user1#example.com',
'To' => 'user2#example.com',
'Cc' => 'user1#example.com',
)
I think that's what you were looking for.
The regex explained:
^\s* matches the start of the line, and zero or more whitespace chars
(From|To|Cc) matches (and groups) From, To or Cc
:[^<]*<: Matches (but doesn't group) the colon, any characters that are not <, and then the address-delimiting < itself
([^>]+): Matches (and groups) everything after the <, up to but not including the >
\>: Can be left out, but matches address-delimiting >
m: multi-line. If left out the leading ^ means start of string, now it means start of line
Notes: This expression doesn't deal with comma-separated addresses or multiple addresses, and it might be useful to call:
filter_var($addresses['From'], FILTER_VALIDATE_EMAIL)
or use array_map to validate $addresses[2] before building the final array.
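The validation step mentioned in the notes could look roughly like this. It is a sketch under assumptions: the sample headers are made up and written with a literal @, because FILTER_VALIDATE_EMAIL requires one, and array_filter is used here as one possible way to drop invalid entries.

```php
<?php
// Made-up sample headers; the Cc value is deliberately not a valid address.
$string = "From: \"user1\" <user1@example.com>\n"
        . "To: <user2@example.com>\n"
        . "Cc: <not-an-address>";

$valid = [];
if (preg_match_all('/^\s*(From|To|Cc):[^<]*<([^>]+)>/m', $string, $m)) {
    // Pair each header name with the address captured for it.
    $addresses = array_combine($m[1], $m[2]);

    // Keep only the entries that pass PHP's built-in e-mail validation.
    $valid = array_filter($addresses, function ($address) {
        return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    });
}

print_r($valid);
```

Because array_combine uses the header names as keys, a header that occurs twice would keep only its last address; a message with repeated headers needs a different structure.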
The Content-type header is handled as part of the message. How can I fix that? Thanks.
From - Wed Jun 05 12:29:59 2013
X-Account-Key: account1
X-UIDL: 50933ddb0000053d
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Return-Path: <flippingST#DPS_FDBD.localdomain>
Received: from xxxxxxxxx (SMTP1 [xxxxxxx])
by xxxxxx (8.13.8/8.13.8) with ESMTP id r554TvMv018216
for <leochan#pop.singtao.com>; Wed, 5 Jun 2013 12:29:57 +0800
Received: from cip.singtaonewscorp.com (cip.singtaonewscorp.com [202.66.86.162] (may be forged))
by xxxxxxx with Microsoft SMTPSVC(xxxxxxxxx);
for <leo.chan#singtaonewscorp.com>; Wed, 5 Jun 2013 12:29:56 +0800
Date: Wed, 5 Jun 2013 12:29:56 +0800
From: flippingST#DPS_FDBD.localdomain
Message-Id: <201306050429.r554TudQ021830#smtp.singtao.com>
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AogdALq9rlEuic+b/2dsb2JhbABagzk0gkEBhw+jHgsBkhIdTBd0giMBFQE7AQo8FQEBVgcNEySIEQiPSYxDjgYBAYVBAZxtgj6BBwMEC50ji02DSw
X-IronPort-AV: E=Sophos;i="4.87,804,1363104000";
d="scan'208,217";a="25969537"
Received: from ec2-46-137-207-155.ap-southeast-1.compute.amazonaws.com (HELO DPS_FDBD.localdomain) ([46.137.207.155])
by cip.singtaonewscorp.com with ESMTP; 05 Jun 2013 12:30:48 +0800
Received: by DPS_FDBD.localdomain (Postfix, from userid 500)
id D8FEE4329E; Wed, 5 Jun 2013 00:29:28 -0400 (EDT)
To: leo.chan#singtaonewscorp.com
Subject: =?UTF-8?B?5oql56ug5YaF5a655YiG5Lqr?=
X-PHP-Originating-Script: 500:mail.php
MIME-Version: 1.0
X-EsetId: 40C9373366470C695FCF37616E1C4038
Content-type: text/html; charset=UTF-8
From: leo.chan#singtaonewscorp.com <leo.chan#singtaonewscorp.com>
Message-Id: <20130605042928.D8FEE4329E#DPS_FDBD.localdomain>
Date: Wed, 5 Jun 2013 00:29:28 -0400 (EDT)
<html><head></head><body><b>讯息 :</b>leo.chan#singtaonewscorp.com</br></br>分享连结 :</b>按此观看</br></br><img src = "https://s3-ap-southeast-1.amazonaws.com/demosource/ChangSha/2013/05/24/0/0/A/Content/1/Pg001.png" /></body></html>
__________ Information from ESET NOD32 Antivirus, version of virus signature database 8412 (20130604) __________
The message was checked by ESET NOD32 Antivirus.
http://www.eset.com
The above is the mail source. Notice that the lines
Content-type: text/html; charset=UTF-8
From: leo.chan#singtaonewscorp.com <leo.chan#singtaonewscorp.com>
Message-Id: <20130605042928.D8FEE4329E#DPS_FDBD.localdomain>
Date: Wed, 5 Jun 2013 00:29:28 -0400 (EDT)
are part of the mail message.
Here is the PHP code. I have set the Content-type as a header, but the headers seem to be cut off after MIME-Version: 1.0. How can I fix this? Thanks.
<?php
function encodeMIMEString ($enc, $string)
{
return "=?$enc?B?".base64_encode($string)."?=";
}
if (isset($_POST["data"])){
// Get posted data
$mailStr = $_POST["data"];
$mailStr = str_replace ('&page','#page',$mailStr);
$mailStr = str_replace ('&issue','#issue',$mailStr);
$info = explode("&", $mailStr);
// To send HTML mail, the Content-type header must be set
$headers = "MIME-Version: 1.0\r\nContent-type: text/html; charset=UTF-8\r\n";
$headers .= "From: $info[0] <$info[0]>";
// Read XML to place the headings in mail content
$xml = simplexml_load_file('lang'.DIRECTORY_SEPARATOR.$info[4].'.xml')
or die("Error: Cannot create object");
$subject = (string)$xml->mailTitle;
$mailMsgDes = (string)$xml->mailMsgDes;
$mailLink = (string)$xml->mailLink;
$mailLinkView = (string)$xml->mailLinkView;
$message = nl2br(htmlentities(trim($info[2]), ENT_QUOTES, "UTF-8"));
$url = str_replace ('#page','&page',$info[3]);
$url = str_replace ('#issue','&issue',$url);
$mailContent = '<html><head></head><body><b>'.$mailMsgDes.' :</b>'.$message.'</br></br>'.$mailLink.' :</b>'.$mailLinkView.'</br>';
if (isset($info[5])){
$mailContent = $mailContent."</br><img src = \"$info[5]\" />";
}
if (isset($info[6])){
$mailContent = $mailContent."</br><img src = \"$info[6]\" />";
}
$mailContent .= '</body></html>';
if (mail($info[1], encodeMIMEString("UTF-8", $subject), $mailContent, $headers))
echo 'Mail sent';
}
?>
Using \r instead of \r\n fixed the problem.
Possibly due to server settings.
Put your From header above the MIME-Version and Content-type definitions. Just swap them:
// To send HTML mail, the Content-type header must be set
$headers = "From: $info[0] <$info[0]>\r\n";
$headers .= "MIME-Version: 1.0\r\nContent-type: text/html; charset=UTF-8\r\n\r\n";
If you use the PHP_EOL constant, then you don't need to worry about \r or \n or \r\n, and your code would be more portable across servers.
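Since the answers above disagree on the line ending, it may help to keep it in one place so it can be switched while testing. The helper below is a hypothetical sketch (build_mail_headers is not a PHP function); it defaults to "\r\n" and lets the caller pass a different separator if the server turns out to need one.

```php
<?php
// Hypothetical helper: join "Name: value" pairs into a single header string.
function build_mail_headers(array $headers, string $eol = "\r\n"): string
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    // The separator goes between headers only; nothing is appended after the last one.
    return implode($eol, $lines);
}

$headers = build_mail_headers([
    'From'         => 'sender@example.com <sender@example.com>',
    'MIME-Version' => '1.0',
    'Content-type' => 'text/html; charset=UTF-8',
]);

echo $headers, "\n";
```

The resulting string can be passed as the fourth argument of mail(). The addresses shown are placeholders.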
I'm trying to get the From and Cc email addresses from a forwarded email, when the body looks like this:
$body = '-------
Begin forwarded message:
From: Sarah Johnson <blabla#gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe#yayyy.com>
Cc: Ralph Johnson <johnson#gmail.com>
Hi,
hello, thank you and goodbye!
blabla#gmail.com'
Now, when I do the following:
$body = strtolower($body);
$pattern = '~from: \D*\S([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\S~';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
I correctly get:
from: sarah johnson <blabla#gmail.com>
Now, why doesn't the Cc work? I do something very similar, only changing from to cc:
$body = strtolower($body);
$pattern = '~cc: \D*\S([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\S~';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
and I get:
cc: ralph johnson <johnson#gmail.com> hi, hello, thank you and goodbye! blabla#gmail.com
If I remove the email from the original body footer (removing blabla#gmail.com) then I correctly get:
cc: ralph johnson <johnson#gmail.com>
It looks like that email is affecting the regular expression. But how, and why doesn't it affect it in the from? How can I fix this?
The problem is that \D* matches too much, i.e. it also matches newline characters. I would be more restrictive here. Why do you use \D (not a digit) at all?
With e.g. [^#]* instead, it works:
cc: [^#]*\S([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\S
See it here on Regexr.
This way, you can be sure that the first part does not match beyond the email address.
The \D is also the reason it works in the first case, the "From" case: there are digits in the "Date" row, so \D* cannot match across that row.
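The difference can be checked side by side. In this sketch the body is the sample from the question (addresses written with # as shown there), the patterns use ~ as the delimiter so that the literal # inside them is not mistaken for one, and the hyphen in the character class is moved to the end so it is unambiguously literal.

```php
<?php
$body = strtolower(
    "Begin forwarded message:\n"
    . "From: Sarah Johnson <blabla#gmail.com>\n"
    . "Subject: email subject\n"
    . "Date: February 22, 2013 3:48:12 AM\n"
    . "To: Email Recipient <thatwouldbe#yayyy.com>\n"
    . "Cc: Ralph Johnson <johnson#gmail.com>\n"
    . "Hi,\n"
    . "hello, thank you and goodbye!\n"
    . "blabla#gmail.com"
);

// Shared tail of both patterns: local part, "#", domain labels, top-level domain.
$tail = '\S([\w.-]+)#((?:\w+\.)+)([a-zA-Z]{2,4})\S';

// \D* also consumes newlines, so it runs on to the last address in the body.
preg_match('~cc: \D*' . $tail . '~', $body, $greedy);

// [^#]* cannot cross the first "#", so the match stays on the Cc line.
preg_match('~cc: [^#]*' . $tail . '~', $body, $restricted);

echo $greedy[0], "\n---\n", $restricted[0], "\n";
```

The first match runs through to the trailing blabla#gmail.com, while the second stops at the closing > of the Cc address.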
Try like this
$body = '-------
Begin forwarded message:
From: Sarah Johnson <blabla#gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe#yayyy.com>
Cc: Ralph Johnson <johnson#gmail.com>
Hi,
hello, thank you and goodbye!
blabla#gmail.com';
$pattern = '~(?:from|Cc):\s+[^<>]+<([^#]+#[^>\s]+)>~is';
preg_match_all($pattern, $body, $arr_matches);
echo '<pre>' . htmlspecialchars(print_r($arr_matches, 1)) . '</pre>';
Output
Array
(
[0] => Array
(
[0] => From: Sarah Johnson <blabla#gmail.com>
[1] => Cc: Ralph Johnson <johnson#gmail.com>
)
[1] => Array
(
[0] => blabla#gmail.com
[1] => johnson#gmail.com
)
)
$arr_matches[1][0] - "From" email
$arr_matches[1][1] - "Cc" email
I have to get any text between:
Final-Recipient: RFC822; !HERE! Action
I need !HERE! from this example. There could be any string.
I tried something like:
$Pattern = '/Final-Recipient: RFC822; (.*) Action/';
But it doesn't work.
Update:
Here is the string I'm trying to parse: http://dpaste.com/187638/
Since you said "any string", which may contain spaces, the closest approximation would be
$Pattern = '/Final-Recipient: RFC822; (.*?) Action/s';
#                                        ^         ^
# lazy match instead of greedy match ----'         |
#                    allow . to match newline -----'
Of course it won't match "Final-Recipient: RFC822; Action Action".
Your pattern works fine for me:
$i = 'This is a MIME-encapsulated message --o3ONXoEH01blah3:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Lblahru> From: *
#*.ru';
$pattern = '/Final-Recipient: RFC822; (.*) Action/';
$matches = Array();
preg_match($pattern, $i, $matches);
print_r($matches);
Output:
Array
(
[0] => Final-Recipient: RFC822; !HERE! Action
[1] => !HERE!
)
Note also that your pattern will fail if the "any text" contains new lines. Use the DOTALL modifier /.../s to allow the dot to also match new lines. Also note that if the text " Action" appears elsewhere in the message it will cause your regular expression to fail. Matching dot is dangerous. Try to find a more specific pattern if possible.
$Pattern = '/Final-Recipient:[^;]+[;|<|\s]+([^\s|^<|^>]+)/i';
This expression turned out to be the best for my problem, because sometimes there are lines of the following kind:
Final-Recipient: LOCAL;<example#rambler.ru>
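Inside a character class, | is a literal pipe and a ^ that is not the first character is a literal caret, so the classes in the expression above also accept those two characters. A slightly tidier variant, offered as a sketch rather than a drop-in replacement, removes them. The function name extract_final_recipient and the first sample line are made up for illustration.

```php
<?php
// Hypothetical helper: return the address from a Final-Recipient line, or null.
function extract_final_recipient(string $line): ?string
{
    // [^;]+      the address type (RFC822, LOCAL, ...)
    // [;<\s]+    the separator: semicolon, optional spaces, optional "<"
    // ([^\s<>]+) the address itself, up to whitespace or an angle bracket
    if (preg_match('/Final-Recipient:[^;]+[;<\s]+([^\s<>]+)/i', $line, $m)) {
        return $m[1];
    }
    return null;
}

echo extract_final_recipient('Final-Recipient: RFC822; someone#example.org'), "\n";
echo extract_final_recipient('Final-Recipient: LOCAL;<example#rambler.ru>'), "\n";
```

Both line shapes give the bare address, with or without the angle brackets.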
I am going to suggest a method that does not use regular expressions, at the cost of some extra busywork.
<?php
$message = 'This is a MIME-encapsulated message --o3ONXoEH016763.1272152184/zvm19.host.ru The original message was received at Fri, 23 Apr 2010 03:35:33 +0400 (MSD) from roller#localhost ----- The following addresses had permanent fatal errors ----- "Flucker" ----- Transcript of session follows ----- 451 grl.unibel.by: Name server timeout Message could not be delivered for 2 days Message will be deleted from queue --o3ONXoEH016763.1272152184/*.host.ru Content-Type: message/delivery-status Reporting-MTA: dns; zvm19.host.ru Arrival-Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Last-Attempt-Date: Sun, 25 Apr 2010 03:36:24 +0400 (MSD) --o3ONXoEH016763.1272152184/zvm19.host.ru Content-Type: message/rfc822 Content-Transfer-Encoding: 8bit Return-Path: Received: (from *#localhost) by *.host.ru (8.13.8/Zenon/Postman) id o3MNZX5h059932; Fri, 23 Apr 2010 03:35:33 +0400 (MSD) (envelope-from *#roller.ru) Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Message-Id: <201004222335.o3MNZX5h059932#*.host.ru> From: *
#*.ru';
$left_delimiter = 'Final-Recipient: RFC822; ';
$right_delimiter = ' Action';
$left_delimiter_pos = strrpos($message, $left_delimiter);
$desired_message_fragment = '';
if ($left_delimiter_pos !== false) {
    $fragment_start = $left_delimiter_pos + strlen($left_delimiter);
    // Look for the right delimiter only after the left one, so an earlier " Action" cannot yield a negative length
    $right_delimiter_pos = strpos($message, $right_delimiter, $fragment_start);
    if ($right_delimiter_pos !== false) {
        $desired_message_fragment = substr(
            $message, $fragment_start, $right_delimiter_pos - $fragment_start
        );
    }
}
var_dump($desired_message_fragment);
A bit late, but the question seems to describe a problem that is not quite the OP's actual requirement; perhaps the OP joined multiple lines into one (IMHO).
This might help others.
I'm assuming that the OP is trying to parse the Final-Recipient header field of a delivery status notification.
The spec for the Final-Recipient field can be seen here: https://www.rfc-editor.org/rfc/rfc3464#page-15
If the problem is broken down, the OP can pull out the Final-Recipient field as a single field: it ends where the next line starts with a non-space character or is blank.
e.g.
Original-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Action: failed
Status: 5.1.1 (Remote SMTP server has rejected address)
Here the Final-recipient field is followed by the start of the next field, Action, whose line begins with "A", i.e. it is neither a continuation line starting with a space nor a blank line.
Then all that remains is to split the line on ; and take the second part,
i.e.
String[] twoparts = "Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com".split(";", 2); // limit 2: split at most once, giving at most two parts
String email = twoparts[1];
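Since the question is about PHP, the same split can be sketched with explode, whose third argument likewise limits the number of parts. The trim call is an addition, assumed useful because some notifications put a space after the semicolon.

```php
<?php
$line = 'Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com';

// Limit 2: split at the first ";" only, so there are at most two parts.
$parts = explode(';', $line, 2);

// The address is the second part, if the line contained a ";" at all.
$email = isset($parts[1]) ? trim($parts[1]) : null;

echo $email, "\n";
```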