file_get_contents() with newLine for Regex - php

I have a file called mail.txt with the following contents:
From: elvis#tabloid.org (The King)
Subject: be seein' ya around
Date: Mon, 23 Oct 2006 11:04:13
From: The Prez <president#whitehouse.gov>
Date: Wed, 25 Oct 2006 8:36:24
Subject: now, about your vote
I'm using Sublime Text in which the Regex ^\w+: works properly.
I'm using file_get_contents() to read the content from mail.txt and then using the same Regex with preg_replace() to highlight the output.
The issue is that when I use file_get_contents(), the regex doesn't seem to account for the \n line breaks. I tried nl2br(), but that didn't work either.
Below are the outputs in Sublime and PHP: [image: Sublime] [image: PHP]
Below is the PHP code:
<?php
$path = "./mail.txt";
if (!file_exists($path))
    die("File does not exist");
else {
    if (!($handle = fopen($path, "r")))
        die("File could not be opened");
    else {
        $file_data = file_get_contents($path);
    }
}
$mod_file = preg_replace("/^\w+:/", "<span class='replaced'>$0</span>", $file_data);
echo "<pre>".$mod_file."</pre>";
?>
How to solve this issue?

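You need to use the m (multiline) flag. Without it, ^ matches only at the very start of the string; with it, ^ also matches at the start of every line. See demo: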
https://regex101.com/r/cT0hV4/12
$re = "/^\\w+:/m";
$str = "From: elvis#tabloid.org (The King)\nSubject: be seein' ya around\nDate: Mon, 23 Oct 2006 11:04:13\nFrom: The Prez <president#whitehouse.gov>\nDate: Wed, 25 Oct 2006 8:36:24\nSubject: now, about your vote";
preg_match_all($re, $str, $matches);
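Applied to the preg_replace() call from the question, the fix might look like the sketch below. The file contents are inlined in $file_data so the example runs on its own. The htmlspecialchars() call and the double-quoted class attribute are additions that are not in the original code; the escaping is there because the sample text contains < and >.

```php
<?php
// Sample contents of mail.txt, inlined so the sketch is self-contained.
$file_data = "From: elvis#tabloid.org (The King)\n"
    . "Subject: be seein' ya around\n"
    . "Date: Mon, 23 Oct 2006 11:04:13\n"
    . "From: The Prez <president#whitehouse.gov>\n"
    . "Date: Wed, 25 Oct 2006 8:36:24\n"
    . "Subject: now, about your vote";

// Assumption: escape the text first so "<president#...>" is shown as text
// instead of being parsed as an HTML tag.
$escaped = htmlspecialchars($file_data);

// With the m flag, ^ matches at the start of every line,
// not only at the start of the whole string.
$mod_file = preg_replace('/^\w+:/m', '<span class="replaced">$0</span>', $escaped);

echo "<pre>" . $mod_file . "</pre>";
```

Without the m flag the same call would wrap only the first From:, because ^ would match only at the beginning of the string.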

Related

Return multiple lines from a long string

I have a large string with multiple instances of header information. For example:
HTTP/1.1 302 Found
Cache-Control: no-cache, no-store, must-revalidate
Content-Type: text/html; charset=iso-8859-1
Date: Tue, 01 Mar 2016 01:43:13 GMT
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Location: http://www.google.com
Pragma: no-cache
Server: nginx/1.7.9
Content-Length: 294
Connection: keep-alive
After "Location:", I want to save all the data from that line to an array. There might be 3 or 4 lines to save from a big block of text.
How could I do this?
Thanks!
There are plenty of ways you could do this.
Here's one way:
Split the text up at the point where Location: occurs
Split the result by new lines into an array
Example:
$text = substr($text, strpos($text, 'Location:'));
$array = explode(PHP_EOL, $text);
Here's another way:
Using regex, match Location: and everything after it
As above - split the result by new lines
Example:
preg_match_all('~(Location:.+)~s', $text, $output);
$output = explode(PHP_EOL, $output[0][0]);
Note: the s modifier makes . match newline characters as well. Without it, . stops at the end of the line, so the capture would end at the first line break.
I found another way that works too. I figured I would add it in case it helps anyone:
foreach (preg_split("/((\r?\n)|(\r\n?))/", $bigString) as $line) {
    if (strpos($line, 'Location') !== false) {
        // Do stuff with the line
    }
}
Source: Iterate over each line in a string in PHP
There are a lot of other helpful approaches in there too.
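If the goal is to collect every Location: line from a block that contains several header sets, one more option is a single preg_match_all() call with the m flag. This is only a sketch: the input string and the second URL are made up for illustration, and $locations is a hypothetical variable name.

```php
<?php
// Hypothetical input: two responses concatenated, each with its own Location header.
$bigString = "HTTP/1.1 302 Found\r\n"
    . "Location: http://www.google.com\r\n"
    . "Pragma: no-cache\r\n"
    . "\r\n"
    . "HTTP/1.1 302 Found\r\n"
    . "Location: http://www.example.com/next\r\n"
    . "Connection: keep-alive\r\n";

// m: ^ anchors at the start of each line; i: header names are case-insensitive.
// [^\r\n]+ stops at the end of the line for both \n and \r\n line endings.
preg_match_all('/^Location:[ \t]*([^\r\n]+)/mi', $bigString, $matches);

// $matches[1] holds the captured value of every Location header found.
$locations = $matches[1];
print_r($locations);
```

Matching on [^\r\n] rather than splitting on PHP_EOL avoids a stray \r at the end of each value when the headers use \r\n but the script runs on a system where PHP_EOL is "\n".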

How to find the position of the first occurrence of a pattern using PHP

I am trying to figure out a way to parse an email.
I am stuck trying to figure out how to search for the first occurrence of a text that is in this format
> On Mar 12, 2015, at 7:47 AM, Mike G <email#yourdomain.com> wrote:
The text will start with > On and end with wrote:
On Mar 12, 2015, at 7:47 AM, Mike G wrote:
How can I find that in PHP?
I could do
$msg = strpos($msg, '> On'); // to get the first position
$msg = strstr($msg, '> On', true); // with PHP 5.3+ to get the text prior the first '> On '
But I need to look for a similar pattern line to be more accurate.
I tried this code:
$matches = '';
$pattern = "/ On*<[a-zA-Z0-9._-]#[a-zA-Z0-9._-]> wrote:/";
preg_match($pattern, $msg, $matches);
$msg = strstr($msg, $matches, true);
But I am not finding any results in the text.
I think this should do it. If the whitespace is optional, change the \s+ to \s*.
preg_match('~>\s+.*?<([^>]*)>\s+wrote:~', '> On Mar 12, 2015, at 7:47 AM, Mike G <email#yourdomain.com> wrote:', $email);
echo $email[1];
If you want to be safer and require the 'On' as well...
preg_match('~>\s+On.*?<([^>]*)>\s+wrote:~', '> On Mar 12, 2015, at 7:47 AM, Mike G <email#yourdomain.com> wrote:', $email);
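To get the position of the match, which is what the question asks for, preg_match() accepts the PREG_OFFSET_CAPTURE flag. The sketch below assumes the quote marker starts at the beginning of a line; the sample message and the names $quoteStart and $reply are made up for illustration.

```php
<?php
// Hypothetical message: a short reply followed by the quoted original.
$msg = "Thanks, that works.\n\n"
    . "> On Mar 12, 2015, at 7:47 AM, Mike G <email#yourdomain.com> wrote:\n"
    . "> Does it work now?";

$quoteStart = null;

// m: ^ matches at the start of any line. The dot does not match newlines here,
// so the "> On ... wrote:" header has to sit on a single line.
if (preg_match('~^>\s+On\b.*?\bwrote:~m', $msg, $m, PREG_OFFSET_CAPTURE)) {
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    $quoteStart = $m[0][1];
}

// Keep only the text before the quoted part (or everything if nothing matched).
$reply = $quoteStart === null ? $msg : rtrim(substr($msg, 0, $quoteStart));

echo $reply;
```

The offset is counted in bytes, not characters, which is what substr() expects.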

regex failing with no errors

I have the following text in a string called $test:
Content-Type: text/plain
Server: testapp (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"123","email_address":"","name":"j.doe","username":"jd123"}
I am trying to write a regular expression in php that will return everything after content-length: 125.
Here's what I have so far:
if (preg_match('/^Content\-Length\:[0-9\\n]+([a-zA-Z0-9\{\}\"\:])*/',$test,$result))
{
var_dump($result[1]);
}
I don't get any error messages, but it doesn't find the pattern I've defined in my string.
I've also tried this pattern:
'/^Content\-Length\:[0-9\\n]+([a-zA-Z0-9{}\"\:])*/'
where I tried to remove the escape char in front of the curly braces. But it's still a no go.
Can you tell me what I'm missing?
Thanks.
EDIT 1
my code now looks like this:
<?php
$test = "Content-Type: text/plain
Server: kamailio (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"test123","email_address":"","name":"j.doe","username":"jd123"}";
//if (preg_match('/Content-Length\:[0-9\\n]*([a-zA-Z0-9{}\"\:])*/',$test,$result))
//{
// var_dump($result);
//}
preg_match('/({.*})/', $str, $matches);
echo $matches[0];
?>
That gives me the following error:
Undefined offset: 0 in /var/www/html/test/test.php on line 31
Line 31 is where I'm trying to echo the matches.
$str = <<<HEREDOC
Content-Type: text/plain
Server: testapp (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"123","email_address":"","name":"j.doe","username":"jd123"}
HEREDOC;
preg_match('/(\{.*\})/', $str, $matches);
echo $matches[0];
The regex here simply matches everything from the first { to the last } on a single line. It's a quick and loose regex, however.
Instead of using a big pattern to match everything (which is time-consuming), why not use preg_split to cut your string into two pieces at the desired location?
$string = 'Content-Type: text/plain
Server: testapp (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"123","email_address":"","name":"j.doe","username":"jd123"}';
$parts = preg_split ("/Content-Length:\s*\d+\s*/", $string);
echo "The string i want is '" . $parts[1] . "'";
Output:
The string i want is '{"password":"123","email_address":"","name":"j.doe","username":"jd123"}'
You can avoid the regex altogether, because the HTTP headers are always separated from the response body by two consecutive line breaks (a blank line). The limit of 2 keeps the body in one piece even if it contains blank lines itself.
list($headers, $body) = explode("\n\n", $string, 2);
Or, for Windows-style line breaks (which, by the way, are the standard for HTTP headers):
list($headers, $body) = explode("\r\n\r\n", $string, 2);
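When it is not known in advance which line-break style the response uses, the two variants can be combined. The following is a rough sketch that assumes the raw response really contains a blank line between headers and body (the sample string is built that way) and that the body is JSON; $raw and $data are hypothetical names.

```php
<?php
// Sample response with \r\n line endings and a blank line before the body.
$raw = "Content-Type: text/plain\r\n"
    . "Server: testapp (4.2.1 (x86_64/linux))\r\n"
    . "Content-Length: 125\r\n"
    . "\r\n"
    . '{"password":"123","email_address":"","name":"j.doe","username":"jd123"}';

// Split once at the first blank line, accepting both \n\n and \r\n\r\n.
$parts = preg_split('/\r?\n\r?\n/', $raw, 2);

$headers = $parts[0];
$body = $parts[1] ?? ''; // empty if there was no blank line at all

// Decode the JSON body into an associative array.
$data = json_decode($body, true);

echo $data['username'];
```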

Split PHP string from a specified word

I have this php string
$mystring ="Yes YEs I am answering! On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >"
I want to split the string starting from "On Fri, Mar 21, 2014". How do I achieve this?
Note: the split condition can be general, i.e. it can also be 'On Sat, Mar 22' or 'On Wed, Mar 29', etc.
Also, which PHP function should I use?
Splitting at the word "On" alone would not work well, because it could also occur earlier in the text (which I assume may vary), so I suggest the following:
$str = "Yes YEs I am answering! On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >";
if (preg_match('/^(.*)(On (Mon|Tue|Wed|Thu|Fri|Sat|Sun).*)$/', $str, $matches)) {
    print_r($matches);
}
This gives you output like the following, which should include all the necessary values. Feel free to add an "i" after the second slash in the preg_match regex for case-insensitive matching.
Array
(
[0] => Yes YEs I am answering! On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >
[1] => Yes YEs I am answering!
[2] => On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >
[3] => Fri
)
I would suggest a regex for this. dieBeiden basically has it, though I would modify his regex a bit:
^(.*)(On (Mon|Tue|Wed|Thu|Fri|Sat|Sun), \w{3} \d{2}.*)$
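As a sketch of how a stricter pattern along these lines could be used, the example below makes two changes that are my own assumptions rather than part of the answer: the first group is lazy so the split happens at the first matching date, and the day is \d{1,2} so single-digit days also match. $before and $quoted are hypothetical names.

```php
<?php
$mystring = "Yes YEs I am answering! On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >";

// Group 1: everything before the date line (lazy, so the first date wins).
// Group 2: "On <weekday>, <month> <day>, <year>" and everything after it.
$pattern = '/^(.*?)\s*(On (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), \w{3} \d{1,2}, \d{4}.*)$/s';

if (preg_match($pattern, $mystring, $matches)) {
    $before = $matches[1];
    $quoted = $matches[2];
} else {
    // No date marker found: treat the whole string as the reply.
    $before = $mystring;
    $quoted = '';
}

echo $before, "\n", $quoted, "\n";
```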
$arr = explode('Yes YEs I am answering! ', $string);
This finds that phrase, removes it, and splits the string at that position into an array.
The second element, $arr[1], is then
On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >
Next, explode on ',':
$arr = explode(',', $arr[1]);
$strings = $arr[0].','.$arr[1];
Check this example that I found:
<?php
$whois = "Record last updated on 10-Apr-2011.Record expires on 08-Oct-2012.Record Expires on 08-Oct-2008.";
$expires = preg_split('/Expires|expires/', $whois);
array_shift($expires);
echo "<pre>";
print_r($expires);
?>
gives
Array
(
[0] => on 08-Oct-2012.Record
[1] => on 08-Oct-2008.
)
You can also see this:
http://board.phpbuilder.com/showthread.php?10384775-RESOLVED-Split-String-at-first-word-match
You can use the explode function:
http://in2.php.net/explode
Try explode(), like this:
$tempArr1 = explode('!', $mystring);
$tempArr2 = explode(',', $tempArr1[1]);
echo trim($tempArr2[0]).', '.trim($tempArr2[1]).', '.trim($tempArr2[2]);
If the initial part of the string is ALWAYS "Yes YEs I am answering! ", you can delete the first 24 characters from the string with the function
$mystring2 = substr($mystring, 24);
Also take a look at the explode() function ;)
You could use the explode function for this (http://nl1.php.net/explode) and split on the word 'On', but when 'On' also occurs elsewhere in the string, you're in trouble.
A better idea would be to use a regular expression with preg_split (http://www.php.net/manual/en/function.preg-split.php), with something like this:
<?php
$mystring ="Yes YEs I am answering! On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >";
$splitted = preg_split('/On ..., ... [0-9]{1,2}, [0-9]{4} at [0-9]{1,2}:[0-9]{2} (AM|PM)/', $mystring);
var_dump($splitted);
?>
Feel free to make the regular expression more sophisticated :)
If you want to retain the datetime substring, then you can consume the leading space, then lookahead for as much of the datetime substring as you like.
Code: (Demo)
$mystring ="Yes YEs I am answering! On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >";
var_export(
preg_split('/ (?=On (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun))/', $mystring)
);
Output:
array (
0 => 'Yes YEs I am answering!',
1 => 'On Fri, Mar 21, 2014 at 2:49 PM, Ajey Charantimath wrote: > answer to this question > > -- >',
)

Regular expression to parse Final-Recipient email header

I have to get any text between:
Final-Recipient: RFC822; !HERE! Action
I need !HERE! from this example. There could be any string.
I tried something like:
$Pattern = '/Final-Recipient: RFC822; (.*) Action/';
But it doesn't work.
Update:
Here is the string I'm trying to parse: http://dpaste.com/187638/
Since you said "any string", which may contain spaces, the closest approximation would be
$Pattern = '/Final-Recipient: RFC822; (.*?) Action/s';
#                                        ^         ^
# lazy match instead of greedy match ----'         |
#                    allow . to match newline -----'
Of course it won't match "Final-Recipient: RFC822; Action Action".
Your pattern works fine for me:
$i = 'This is a MIME-encapsulated message --o3ONXoEH01blah3:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Lblahru> From: *
#*.ru';
$pattern = '/Final-Recipient: RFC822; (.*) Action/';
$matches = Array();
preg_match($pattern, $i, $matches);
print_r($matches);
Output:
Array
(
[0] => Final-Recipient: RFC822; !HERE! Action
[1] => !HERE!
)
Note also that your pattern will fail if the "any text" contains new lines. Use the DOTALL modifier /.../s to allow the dot to also match new lines. Also note that if the text " Action" appears elsewhere in the message it will cause your regular expression to fail. Matching dot is dangerous. Try to find a more specific pattern if possible.
$Pattern = '/Final-Recipient:[^;]+[;|<|\s]+([^\s|^<|^>]+)/i';
This expression turned out to work best for my problem, because sometimes there are lines of the following kind:
Final-Recipient: LOCAL;<example#rambler.ru>
I am going to suggest a method that does not use regular expressions, at the cost of some extra busywork.
<?php
$message = 'This is a MIME-encapsulated message --o3ONXoEH016763.1272152184/zvm19.host.ru The original message was received at Fri, 23 Apr 2010 03:35:33 +0400 (MSD) from roller#localhost ----- The following addresses had permanent fatal errors ----- "Flucker" ----- Transcript of session follows ----- 451 grl.unibel.by: Name server timeout Message could not be delivered for 2 days Message will be deleted from queue --o3ONXoEH016763.1272152184/*.host.ru Content-Type: message/delivery-status Reporting-MTA: dns; zvm19.host.ru Arrival-Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Last-Attempt-Date: Sun, 25 Apr 2010 03:36:24 +0400 (MSD) --o3ONXoEH016763.1272152184/zvm19.host.ru Content-Type: message/rfc822 Content-Transfer-Encoding: 8bit Return-Path: Received: (from *#localhost) by *.host.ru (8.13.8/Zenon/Postman) id o3MNZX5h059932; Fri, 23 Apr 2010 03:35:33 +0400 (MSD) (envelope-from *#roller.ru) Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Message-Id: <201004222335.o3MNZX5h059932#*.host.ru> From: *
#*.ru';
$left_delimiter = 'Final-Recipient: RFC822; ';
$right_delimiter = ' Action';
$desired_message_fragment = '';
$left_delimiter_pos = strrpos($message, $left_delimiter);
if ($left_delimiter_pos !== false) {
    $fragment_start = $left_delimiter_pos + strlen($left_delimiter);
    // Look for the right delimiter only after the left one, so an earlier
    // " Action" elsewhere in the message cannot produce a negative length.
    $right_delimiter_pos = strpos($message, $right_delimiter, $fragment_start);
    if ($right_delimiter_pos !== false) {
        $fragment_length = $right_delimiter_pos - $fragment_start;
        $desired_message_fragment = substr(
            $message, $fragment_start, $fragment_length
        );
    }
}
var_dump($desired_message_fragment);
A bit late, but this might help others.
The question seems to be phrased around a problem that is not quite the real requirement; perhaps the OP has joined multiple lines into one line (IMHO).
I'm assuming that op is trying to parse the Final-Recipient header field of a delivery status notification.
The spec for the Final-Recipient field can be seen here: https://www.rfc-editor.org/rfc/rfc3464#page-15
If the problem is broken down, the OP can pull out the Final-Recipient field as a single field: it ends where the next line starts with a non-space character or is blank.
e.g.
Original-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Action: failed
Status: 5.1.1 (Remote SMTP server has rejected address)
Here Final-Recipient is followed by the start of the next field, Action, whose A is the first character of the next line, i.e. the next line does not start with a space and is not blank.
Then all he has to do is split the line on ; and take the second part, i.e.
String[] twoparts = "Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com".split(";", 2); // limit 2: split at the first ";" only, giving at most two parts
String email = twoparts[1];
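Since the question is about PHP, the same idea might be sketched there as follows. It assumes the report still has its original line breaks and that the field is not folded across several lines; $report, $addressType and $address are hypothetical names, and the optional < > handling is added to cover lines like the LOCAL;<...> example mentioned in an earlier answer.

```php
<?php
// Sample per-recipient fields of a delivery status notification.
$report = "Original-recipient: rfc822;some-email-that-does-not-exist#gmail.com\r\n"
    . "Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com\r\n"
    . "Action: failed\r\n"
    . "Status: 5.1.1 (Remote SMTP server has rejected address)\r\n";

$addressType = null;
$address = null;

// m: ^ matches at each line start; i: field names are case-insensitive.
// Group 1 is the address type before ";", group 2 the address itself,
// with an optional "<" in front that is not captured.
if (preg_match('/^Final-Recipient:[ \t]*([^;\r\n]+);[ \t]*<?([^\s<>]+)/mi', $report, $m)) {
    $addressType = trim($m[1]);
    $address = $m[2];
}

echo $addressType, ' ', $address, "\n";
```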
