Regular expression not working - php

I'm trying to get the From and Cc email addresses from a forwarded email whose body looks like this:
$body = '-------
Begin forwarded message:
From: Sarah Johnson <blabla#gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe#yayyy.com>
Cc: Ralph Johnson <johnson#gmail.com>
Hi,
hello, thank you and goodbye!
blabla#gmail.com'
Now, when I do the following:
$body = strtolower($body);
$pattern = '#from: \D*\S([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
I correctly get:
from: sarah johnson <blabla#gmail.com>
Now, why doesn't the Cc version work? I do something very similar, only changing from to cc:
$body = strtolower($body);
$pattern = '#cc: \D*\S([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
and I get:
cc: ralph johnson <johnson#gmail.com> hi, hello, thank you and goodbye! blabla#gmail.com
If I remove the email from the original body footer (removing blabla#gmail.com) then I correctly get:
cc: ralph johnson <johnson#gmail.com>
It looks like that email is affecting the regular expression. But how, and why doesn't it affect it in the from? How can I fix this?

The problem is that \D* matches too much: it also matches newline characters, so it can run past the end of the Cc line. I would be more restrictive here. Why use \D (not a digit) at all?
With [^#]*, for example, it works:
cc: [^#]*\S([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\S
See it here on Regexr.
This way you can be sure that the first part does not match beyond the email address.
The \D is also the reason it works in the first case, "From": the "Date" row contains digits, so the match cannot extend past that row.
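The difference between the two patterns can be reproduced with a short script. This is only a sketch: it keeps the "#" that the listing shows inside the addresses, and it uses "~" as the pattern delimiter (an assumption, not from the question) so that the "#" inside the pattern is not read as the closing delimiter.

```php
<?php
// Lowercased sample body, with "#" kept exactly where the listing shows it.
$body = strtolower('-------
Begin forwarded message:
From: Sarah Johnson <blabla#gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe#yayyy.com>
Cc: Ralph Johnson <johnson#gmail.com>
Hi,
hello, thank you and goodbye!
blabla#gmail.com');

// Same structure as the patterns discussed above; only the delimiter differs.
$greedy     = '~cc: \D*\S([\w.-]+)#((?:\w+\.)+)([a-z]{2,4})\S~';
$restricted = '~cc: [^#]*\S([\w.-]+)#((?:\w+\.)+)([a-z]{2,4})\S~';

preg_match($greedy, $body, $greedy_match);
preg_match($restricted, $body, $restricted_match);

// \D* crosses the line breaks and only stops at the last address in the body.
var_dump(strpos($greedy_match[0], 'goodbye') !== false); // bool(true)

// [^#]* cannot pass the first "#", so the match stays on the Cc line.
echo $restricted_match[0], "\n"; // cc: ralph johnson <johnson#gmail.com>
```

After "cc: " the body contains no digits, so nothing stops \D* before the footer address, which is why removing that footer line changes the result.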

Try like this
$body = '-------
Begin forwarded message:
From: Sarah Johnson <blabla#gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe#yayyy.com>
Cc: Ralph Johnson <johnson#gmail.com>
Hi,
hello, thank you and goodbye!
blabla#gmail.com';
$pattern = '~(?:from|Cc):\s+[^<>]+<([^#]+#[^>\s]+)>~is'; // "~" as delimiter, because the pattern itself contains "#"
preg_match_all($pattern, $body, $arr_matches);
echo '<pre>' . htmlspecialchars(print_r($arr_matches, 1)) . '</pre>';
Output
Array
(
[0] => Array
(
[0] => From: Sarah Johnson <blabla#gmail.com>
[1] => Cc: Ralph Johnson <johnson#gmail.com>
)
[1] => Array
(
[0] => blabla#gmail.com
[1] => johnson#gmail.com
)
)
$arr_matches[1][0] - "From" email
$arr_matches[1][1] - "Cc" email

Related

Extracting part of a string up to a specific character in PHP

I have a string containing a name and address lines, with a <br /> tag separating the name and each address line. For instance:
John Smith<br />999 Somewhere Lane<br />City, FL 66600
I want to separate the name from the rest of the address using PHP. Is this something that can be done?
explode or substr with strpos
$str = 'John Smith<br />999 Somewhere Lane<br />City, FL 66600';
echo substr($str,0,strpos($str,'<br />')); //John Smith
In this particular case the simplest would be to use explode:
$str = 'John Smith<br />999 Somewhere Lane<br />City, FL 66600';
$tmp = explode('<br />', $str);
$name = $tmp[0];
You may, of course use regex but this is simpler.
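If the goal is only to separate the name from the rest of the address, explode also accepts a limit as its third argument. A minimal sketch, assuming the separator is always written exactly as `<br />` (the variable names are illustrative):

```php
<?php
$str = 'John Smith<br />999 Somewhere Lane<br />City, FL 66600';

// Limit 2: split at the first <br /> only, so the address stays in one piece.
list($name, $address) = explode('<br />', $str, 2);

echo $name, "\n";    // John Smith
echo $address, "\n"; // 999 Somewhere Lane<br />City, FL 66600
```

If the string contains no `<br />` at all, explode returns a single element and `$address` stays unset, so real input would need a check for that case.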
This should give you the data even if the separator is written as <br>, <br/>, <br />, etc.
$text = "John Smith<br />999 Somewhere Lane<br />City, FL 66600";
$data = preg_split("/\<br(\s+)?(\/)?\>/", $text);
print_r($data);
Array
(
[0] => John Smith
[1] => 999 Somewhere Lane
[2] => City, FL 66600
)

Regex Pattern to fetch the text between the tags

my string is:
$p['message'] = '[name]Fozia Faizan[/name]\n[cell]03334567897[/cell]\n[city]Karachi, Pakistan[/city]';
What I want to do is to use REGEX pattern so as to get the result like this:
Name: Fozia Faizan
Cell #: 03334567897
City: Karachi, Pakistan
I've tried this regex:
$regex = "/\\[(.*?)\\](.*?)\\[\\/\\1\\]/";
$message = preg_match_all($regex, $p['message'], $matches);
but it didn't work at all. Please help
Well, using the great reply from @jh314, you could write:
$p['message'] = '[name]Fozia Faizan[/name]\n[cell]03334567897[/cell]\n[city]Karachi, Pakistan[/city]';
$m = array();
preg_match_all('|\[(.*?)](.*?)\[/\1]|', $p['message'], $m);
$result = @array_combine($m[1], $m[2]);
$out = "Name: {$result['name']}\nCell #: {$result['cell']}\nCity: {$result['city']}";
echo $out;
//$outHTML = nl2br("Name: {$result['name']}\nCell #: {$result['cell']}\nCity: {$result['city']}");
//echo $outHTML;
That will give you:
Name: Fozia Faizan
Cell #: 03334567897
City: Karachi, Pakistan
EDIT: You could also add @ just before the name of the function, like so: @array_combine, to suppress errors at the top of your page, but only if this works and you get the expected results.
Your regex already works, just combine the result in $matches:
$p['message'] = '[name]Fozia Faizan[/name]\n[cell]03334567897[/cell]\n[city]Karachi, Pakistan[/city]';
$regex = "/\\[(.*?)\\](.*?)\\[\\/\\1\\]/";
preg_match_all($regex, $p['message'], $matches);
$result = array_combine ($matches[1], $matches[2]);
print_r($result);
will give you:
Array
(
[name] => Fozia Faizan
[cell] => 03334567897
[city] => Karachi, Pakistan
)

how to extract full mail address in imap php

I use the code below to extract header details from the mail, but I cannot get the mail addresses in From, To and Cc, as shown below.
$header = explode("\n", imap_fetchheader($mbox,$msgno));
echo "<br>";
for ($i=1; $i<count($header); $i++)
{
echo $header[$i] . "<br>";
}
output:
Delivered-To: user1#examplecom
X-WM-Delivered: user1#example.com
Received: from ElcotPC ([127.0.0.1])
(envelope-sender )
by 127.0.0.1 with ESMTP
for ; Wed, 31 Jul 2013 09:14:19 +0530
From: "user1"
To:
Cc:
Subject: testing with attachment
Date: Wed, 31 Jul 2013 09:14:18 +0530
The "From", "To" and "Cc" fields are empty, without the mail addresses.
I want the output to look like this:
Delivered-To: user1#examplecom
X-WM-Delivered: user1#example.com
Received: from ElcotPC ([127.0.0.1])
(envelope-sender )
by 127.0.0.1 with ESMTP
for ; Wed, 31 Jul 2013 09:14:19 +0530
From: "user1" <user1#example.com>
To: <user2#example.com>
Cc: <user1#example.com>
How can I get the email addresses for the "From", "To" and "Cc" fields?
Update:
It's always best to use code that is readily available, so I checked if a imap-parsing function exists already. It does: imap_rfc822_parse_headers. Read the docs for details, and links to all sorts of imap_* functions. Perhaps imap_rfc822_parse_adrlist is exactly what you need?
A basic preg_match_all call could do the job, I think:
if (preg_match_all('/^\s*(From|To|Cc):[^<]*<([^>]+)\>/m', $string, $addresses))
{
// Use the header names as keys and the captured addresses as values.
$addresses = array_combine($addresses[1], $addresses[2]);
print_r($addresses);
}
Should output:
array (
'From' => 'user1#example.com',
'To' => 'user2#example.com',
'Cc' => 'user1#example.com',
)
I think that's what you were looking for.
The regex explained:
^\s* matches the start of the line, and zero or more whitespace chars
(From|To|Cc) matches (and groups) From, To or Cc
:[^<]*<: matches (but doesn't group) the colon, then any characters up to the address-delimiting <
([^>]+): matches (and groups) everything after the < that isn't >
\>: can be left out, but matches the address-delimiting >
m: multi-line mode. If left out, the leading ^ means start of string; with it, ^ means start of line
Notes: This expression doesn't deal with comma-separated or multiple addresses, and it might be useful to call:
filter_var($addresses['From'], FILTER_VALIDATE_EMAIL)
or use array_map to filter $addresses[2] prior to merging...
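The validation step mentioned in the notes could look roughly like this. It is a sketch under assumptions: the header block is made up, the addresses are written with "@" (the listings above show "#" in that position), and array_filter is applied after combining instead of array_map before it.

```php
<?php
// Hypothetical header block; the third address is deliberately malformed.
$string = "From: \"user1\" <user1@example.com>\n"
        . "To: <user2@example.com>\n"
        . "Cc: <not-an-address>\n";

$addresses = [];
if (preg_match_all('/^\s*(From|To|Cc):[^<]*<([^>]+)>/m', $string, $matches)) {
    // Header names become keys, captured addresses become values.
    $addresses = array_combine($matches[1], $matches[2]);
    // Drop anything that filter_var() does not accept as an email address.
    $addresses = array_filter($addresses, function ($address) {
        return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    });
}

print_r($addresses); // only the From and To entries remain
```

Because the header names are used as array keys, a header that occurs twice keeps only its last address.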

Removing Mail Original Message using preg_match

Email:
Hello World 123 123
From: uSER001 [user001#yahoo.com]
Sent: Thursday, July 19, 2013 11:21 PM
To: ts#yahoo.com
Subject: Re: Ticketing System
Can be anything sd asd asd asd asda dasdasd asda asd
From: Ticketing System [ts#yahoo.com]
Sent: Thursday, July 19, 2013 11:21 PM
To: uSER001, uSER002
Subject: Ticketing System
Content From Ticketing System
Ticketing System http://www.yahoo.com
Output Should be:
Hello World 123 123
From: uSER001 [user001#yahoo.com]
Sent: Thursday, July 19, 2013 11:21 PM
To: ts#yahoo.com
Subject: Re: Ticketing System
Can be anything sd asd asd asd asda dasdasd asda asd
--
Is this possible using preg_match($pattern,$data, $matches, PREG_OFFSET_CAPTURE) and substr?
You cannot reliably parse a mail message with preg_match alone. It is better to use a PHP MIME mail parser for this task. With Mime Mail Parser, the code is as simple as:
require_once('MimeMailParser.class.php');
$path = 'path/to/mail.txt';
$Parser = new MimeMailParser();
$Parser->setPath($path);
$to = $Parser->getHeader('to');
$from = $Parser->getHeader('from');
$subject = $Parser->getHeader('subject');
$textBody = $Parser->getMessageBody('text');
$htmlBody = $Parser->getMessageBody('html');

Regular expression to parse Final-Recipient email header

I have to get any text between:
Final-Recipient: RFC822; !HERE! Action
I need !HERE! from this example. There could be any string.
I tried something like:
$Pattern = '/Final-Recipient: RFC822; (.*) Action/';
But it doesn't work.
upd
Here is the string I'm trying to parse: http://dpaste.com/187638/
Since you said "any string", which may contain spaces, the closest approximation would be
$Pattern = '/Final-Recipient: RFC822; (.*?) Action/s';
# (.*?) is a lazy match instead of a greedy match
# the trailing s modifier allows . to match newlines
Of course it won't match "Final-Recipient: RFC822; Action Action".
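The effect of the lazy quantifier shows up as soon as " Action" occurs more than once. A small sketch with a made-up one-line report (the sample text is an assumption, not taken from the linked paste):

```php
<?php
// Hypothetical report in which " Action" occurs twice.
$report = 'Final-Recipient: RFC822; someone Action: failed Status: 4.4.7 Diagnostic: no Action taken';

preg_match('/Final-Recipient: RFC822; (.*) Action/s', $report, $greedy);
preg_match('/Final-Recipient: RFC822; (.*?) Action/s', $report, $lazy);

// Greedy: runs up to the last " Action" in the text.
echo $greedy[1], "\n"; // someone Action: failed Status: 4.4.7 Diagnostic: no

// Lazy: stops at the first " Action" after the prefix.
echo $lazy[1], "\n";   // someone
```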
Your pattern works fine for me:
$i = 'This is a MIME-encapsulated message --o3ONXoEH01blah3:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Lblahru> From: *
#*.ru';
$pattern = '/Final-Recipient: RFC822; (.*) Action/';
$matches = Array();
preg_match($pattern, $i, $matches);
print_r($matches);
Output:
Array
(
[0] => Final-Recipient: RFC822; !HERE! Action
[1] => !HERE!
)
Note also that your pattern will fail if the "any text" contains new lines. Use the DOTALL modifier /.../s to allow the dot to also match new lines. Also note that if the text " Action" appears elsewhere in the message it will cause your regular expression to fail. Matching dot is dangerous. Try to find a more specific pattern if possible.
$Pattern = '/Final-Recipient:[^;]+[;|<|\s]+([^\s|^<|^>]+)/i';
This expression turned out to work best for my problem, because some lines look like this:
Final-Recipient: LOCAL;<example#rambler.ru>
I am going to suggest a method that does not use regular expressions, at the cost of some extra busywork.
<?php
$message = 'This is a MIME-encapsulated message --o3ONXoEH016763.1272152184/zvm19.host.ru The original message was received at Fri, 23 Apr 2010 03:35:33 +0400 (MSD) from roller#localhost ----- The following addresses had permanent fatal errors ----- "Flucker" ----- Transcript of session follows ----- 451 grl.unibel.by: Name server timeout Message could not be delivered for 2 days Message will be deleted from queue --o3ONXoEH016763.1272152184/*.host.ru Content-Type: message/delivery-status Reporting-MTA: dns; zvm19.host.ru Arrival-Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Last-Attempt-Date: Sun, 25 Apr 2010 03:36:24 +0400 (MSD) --o3ONXoEH016763.1272152184/zvm19.host.ru Content-Type: message/rfc822 Content-Transfer-Encoding: 8bit Return-Path: Received: (from *#localhost) by *.host.ru (8.13.8/Zenon/Postman) id o3MNZX5h059932; Fri, 23 Apr 2010 03:35:33 +0400 (MSD) (envelope-from *#roller.ru) Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Message-Id: <201004222335.o3MNZX5h059932#*.host.ru> From: *
#*.ru';
$left_delimiter = 'Final-Recipient: RFC822; ';
$right_delimiter = ' Action';
$left_delimiter_pos = strrpos($message, $left_delimiter);
$desired_message_fragment = '';
if ($left_delimiter_pos !== false) {
// The fragment starts right after the left delimiter.
$fragment_start = $left_delimiter_pos + strlen($left_delimiter);
// Look for the right delimiter only after the fragment start,
// so an earlier " Action" cannot produce a negative length.
$right_delimiter_pos = strpos($message, $right_delimiter, $fragment_start);
if ($right_delimiter_pos !== false) {
$desired_message_fragment = substr(
$message, $fragment_start, $right_delimiter_pos - $fragment_start
);
}
}
var_dump($desired_message_fragment);
A bit late, but the question seems to be phrased around a solution that doesn't quite match the actual requirement; perhaps the OP has joined multiple lines into one (IMHO).
This might help others.
I'm assuming that the OP is trying to parse the Final-Recipient header field of a delivery status notification.
The spec for the Final-Recipient field can be seen here: https://www.rfc-editor.org/rfc/rfc3464#page-15
If the problem is broken down, the OP can pull out the Final-Recipient field as a single field (the field ends where the next line begins with a non-space character or is blank).
e.g.
Original-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Action: failed
Status: 5.1.1 (Remote SMTP server has rejected address)
Final-recipient is followed by the start of the next field, Action, which begins with A at the start of the next line, i.e. the next line does not begin with a space and is not blank.
Then all the OP has to do is split the line on ; and take the second part,
i.e.
String[] twoparts = "Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com".split(";", 2); // limit 2: split at most once, giving two parts
String email = twoparts[1];
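Since the question is about PHP, the same idea can be sketched with explode. The sample fields are copied from the lines above (including the "#" shown there), and the sketch assumes one header field per line with no folded continuation lines.

```php
<?php
// Sample fields copied from the answer above, one header field per line.
$report = "Original-recipient: rfc822;some-email-that-does-not-exist#gmail.com\n"
        . "Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com\n"
        . "Action: failed\n"
        . "Status: 5.1.1 (Remote SMTP server has rejected address)\n";

$email = null;
foreach (preg_split('/\r?\n/', $report) as $line) {
    // Compare the field name without regard to case.
    if (stripos($line, 'Final-Recipient:') === 0) {
        // Limit 2: split on the first ";" only, giving address type and address.
        $parts = explode(';', $line, 2);
        $email = isset($parts[1]) ? trim($parts[1]) : null;
        break;
    }
}

echo $email, "\n"; // some-email-that-does-not-exist#gmail.com
```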
