Return multiple lines from a long string - php

I have a large string with multiple instances of header information. For example:
HTTP/1.1 302 Found
Cache-Control: no-cache, no-store, must-revalidate
Content-Type: text/html; charset=iso-8859-1
Date: Tue, 01 Mar 2016 01:43:13 GMT
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Location: http://www.google.com
Pragma: no-cache
Server: nginx/1.7.9
Content-Length: 294
Connection: keep-alive
After "Location:", I want to save all the data from that line to an array. There might be 3 or 4 lines to save from a big block of text.
How could I do this?
Thanks!

There are plenty of ways you could do this.
Here's one way:
Split the text up at the point where Location: occurs
Split the result by new lines into an array
Example:
$text = substr($text, strpos($text, 'Location:'));
$array = explode(PHP_EOL, $text);
Here's another way:
Using regex, match Location: and everything after it
As above - split the result by new lines
Example:
preg_match_all('~(Location:.+)~s', $text, $output);
$output = explode(PHP_EOL, $output[0][0]);
Note: the s modifier makes . match newlines too. Without it, . does not match a newline, so the capture would end at the first line break.
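If the text contains several Location: lines and only those lines are wanted, a multiline regex can collect them in one pass. This is a minimal sketch rather than the method shown above; the helper name extract_locations and the sample URLs are made up for illustration.

```php
<?php
// Hypothetical helper: collect the value of every "Location:" header line.
function extract_locations(string $text): array
{
    // m: ^ and $ match at line boundaries; i: header names are case-insensitive.
    // [ \t]* skips the padding after the colon; \r? tolerates CRLF line endings.
    preg_match_all('/^Location:[ \t]*(.*?)\r?$/mi', $text, $matches);
    return $matches[1];
}

// Two header blocks in one string, as in the question (sample data).
$text = "HTTP/1.1 302 Found\r\nLocation: http://www.google.com\r\nPragma: no-cache\r\n"
      . "HTTP/1.1 302 Found\r\nLocation: http://www.example.com/next\r\nServer: nginx/1.7.9\r\n";

$locations = extract_locations($text);
print_r($locations);
```

Because the pattern is anchored to the start of a line, a header such as Content-Location: would not be picked up by accident.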

I found another way that works too, so I figured I would add it in case it helps anyone:
foreach(preg_split("/((\r?\n)|(\r\n?))/", $bigString) as $line){
if (strpos($line, 'Location') !== false) {
// Do stuff with the line
}
}
Source: Iterate over each line in a string in PHP
There are a lot of other helpful approaches in there too.

Related

file_get_contents() with newLine for Regex

I have a file called mail.txt with the following contents :
From: elvis#tabloid.org (The King)
Subject: be seein' ya around
Date: Mon, 23 Oct 2006 11:04:13
From: The Prez <president#whitehouse.gov>
Date: Wed, 25 Oct 2006 8:36:24
Subject: now, about your vote
I'm using Sublime Text in which the Regex ^\w+: works properly.
I'm using file_get_contents() to read the content from mail.txt and then use the same Regex for preg_replace() to highlight the output.
The issue is, when I use file_get_contents(), it doesn't consider \n and for that I tried nl2br(), but that didn't work either.
Below are the outputs in Sublime and PHP (the two screenshots are not preserved here).
Below is the PHP code :
<?php
$path = "./mail.txt";
if(!file_exists($path))
die("File does not exist");
else {
if(!($handle = fopen($path, "r")))
die("File could not be opened");
else {
$file_data = file_get_contents($path);
}
}
$mod_file = preg_replace("/^\w+:/", "<span class='replaced'>$0</span>", $file_data);
echo "<pre>".$mod_file."</pre>";
?>
How to solve this issue?
You need to use the m (multiline) flag. See the demo:
https://regex101.com/r/cT0hV4/12
$re = "/^\\w+:/m";
$str = "From: elvis#tabloid.org (The King)\nSubject: be seein' ya around\nDate: Mon, 23 Oct 2006 11:04:13\nFrom: The Prez <president#whitehouse.gov>\nDate: Wed, 25 Oct 2006 8:36:24\nSubject: now, about your vote";
preg_match_all($re, $str, $matches);
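Applied to the preg_replace() call from the question, the same flag might look like the sketch below. The sample string stands in for the contents of mail.txt, and the htmlspecialchars() call is an addition not present in the question, included so that characters such as < and > in the mail text are displayed rather than interpreted as HTML.

```php
<?php
// Sample data standing in for the contents of mail.txt.
$file_data = "From: The King\nSubject: be seein' ya around\nDate: Mon, 23 Oct 2006 11:04:13\n";

// Escape first so that special characters in the mail text display literally.
$escaped = htmlspecialchars($file_data);

// With the m flag, ^ matches at the start of every line,
// not only at the start of the whole string.
$mod_file = preg_replace('/^\w+:/m', '<span class="replaced">$0</span>', $escaped);

echo "<pre>" . $mod_file . "</pre>";
```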

regex failing with no errors

I have the following text in a string called $test:
Content-Type: text/plain
Server: testapp (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"123","email_address":"","name":"j.doe","username":"jd123"}
I am trying to write a regular expression in php that will return everything after content-length: 125.
Here's what I have so far:
if (preg_match('/^Content\-Length\:[0-9\\n]+([a-zA-Z0-9\{\}\"\:])*/',$test,$result))
{
var_dump($result[1]);
}
I don't get any error messages, but it doesn't find the pattern I've defined in my string.
I've also tried this pattern:
'/^Content\-Length\:[0-9\\n]+([a-zA-Z0-9{}\"\:])*/'
where I tried to remove the escape char in front of the curly braces. But it's still a no go.
Can you tell me what I'm missing?
Thanks.
EDIT 1
my code now looks like this:
<?php
$test = "Content-Type: text/plain
Server: kamailio (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"test123","email_address":"","name":"j.doe","username":"jd123"}";
//if (preg_match('/Content-Length\:[0-9\\n]*([a-zA-Z0-9{}\"\:])*/',$test,$result))
//{
// var_dump($result);
//}
preg_match('/({.*})/', $str, $matches);
echo $matches[0];
?>
That gives me the following error:
Undefined offset: 0 in /var/www/html/test/test.php on line 31
Line 31 is where I'm trying to echo the matches.
$str = <<<HEREDOC
Content-Type: text/plain
Server: testapp (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"123","email_address":"","name":"j.doe","username":"jd123"}
HEREDOC;
preg_match('/(\{.*\})/', $str, $matches);
echo $matches[0];
The regex here simply matches text that begins with { and ends with } on a single line. It's a quick and loose regex, however.
Instead of using a big pattern to match everything (which is time-consuming), why not use preg_split to cut your string into two pieces at the desired location?
$string = 'Content-Type: text/plain
Server: testapp (4.2.1 (x86_64/linux))
Content-Length: 125
{"password":"123","email_address":"","name":"j.doe","username":"jd123"}';
$parts = preg_split ("/Content-Length:\s*\d+\s*/", $string);
echo "The string i want is '" . $parts[1] . "'";
Output:
The string i want is '{"password":"123","email_address":"","name":"j.doe","username":"jd123"}'
You can avoid the regex altogether, because the HTTP headers are always separated from the response body by two consecutive line breaks.
list($headers, $body) = explode("\n\n", $string, 2);
Or for Windows-style breaks (which, by the way, are the standard for HTTP headers):
list($headers, $body) = explode("\r\n\r\n", $string, 2);
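Building on the blank-line split, the headers can be turned into a lookup table and the body decoded as JSON. The following is only a sketch: the helper name split_http_message is hypothetical, and it assumes the message really contains a blank line between headers and body (the sample string below adds one).

```php
<?php
// Hypothetical helper: split a raw HTTP message into a header map and a body.
function split_http_message(string $raw): array
{
    // Split once at the first blank line; \R matches \r\n, \n or \r.
    $parts = preg_split('/\R\R/', $raw, 2);
    $headerBlock = $parts[0];
    $body = $parts[1] ?? '';

    $headers = [];
    foreach (preg_split('/\R/', $headerBlock) as $line) {
        // Lines without a colon (e.g. a status line) are skipped.
        if (strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so store them in lower case.
        $headers[strtolower(trim($name))] = trim($value);
    }
    return [$headers, $body];
}

$string = "Content-Type: text/plain\r\n"
        . "Server: testapp (4.2.1 (x86_64/linux))\r\n"
        . "Content-Length: 125\r\n"
        . "\r\n"
        . '{"password":"123","email_address":"","name":"j.doe","username":"jd123"}';

[$headers, $body] = split_http_message($string);
$data = json_decode($body, true);
echo $data['username'], "\n";
```

Passing a limit of 2 keeps any blank lines inside the body from splitting it further.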

An underscore is added to link when checking it

I am making a simple link checker to check thousands of direct links for files in a site I am managing now. All files are from archive_org. I made a textarea
<table width="100%"> <tr><td>URLs to check:</td><td><textarea name="myurl" id="myurl" cols="100" rows="20"></textarea></td></tr>
<tr><td align="center" colspan="2"><br/><input class="text" type="submit" name="submitBtn" value="Check links"></td></tr> </table>
and all links on it will be stored in an array called $url (each url is put in a new line)
$url = explode("\n", $_POST['myurl']);
I printed it using print_r and links inside the array are the same as entered without any character added.
I checked the URLs using two methods, fopen() and the curl functions, and no matter how many links I enter, the program reports every link as broken except the last one. The last link in the array is the only one that is checked correctly.
I used get_headers function, and I noticed that all links (except for the last one) have underscore (_) added to their end. The get_headers code is:
for ($i=0;$i<count($url);$i++) {
$headers = @get_headers($url[$i]);
$headers = (is_array($headers)) ? implode( "\n ", $headers) : $headers;
print_r($headers);
echo "<br /><br />";
}
In the headers I noticed the links are as such:
HTTP/1.0 302 Moved Temporarily Server: nginx/1.1.19 Date: Mon, 02 Sep 2013 10:46:40 GMT Content-Type: text/html; charset=UTF-8 X-Powered-By: PHP/5.3.10-1ubuntu3.2 Accept-Ranges: bytes Location: http://ia600308.us.archive[dot]org/23/items/historyofthedecl00731gut/1dfre012103.mp3_ X-Cache: MISS from Dataprolinks X-Cache: MISS from AIMAN-DPL X-Cache-Lookup: MISS from AIMAN-DPL:3128 Connection: close HTTP/1.0 404 Not Found Server: nginx/1.1.19 Date: Mon, 02 Sep 2013 10:46:41 GMT Content-Type: text/html; charset=UTF-8 X-Powered-By: PHP/5.3.10-1ubuntu3.2 Set-Cookie: PHPSESSID=s2j3ct95vdji0ua89f32grd984; path=/; domain=.archive.org Expires: Thu, 19 Nov 1981 08:52:00 GMT Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0 Pragma: no-cache X-Cache: MISS from Dataprolinks X-Cache: MISS from AIMAN-DPL X-Cache-Lookup: MISS from AIMAN-DPL:3128 Connection: close
There is an underscore added to the link, except for the header of the last url, no underscore is added. I guess this underscore is responsible for the checking error.
Where am I making mistakes?
In your case, I guess the URLs are posted from Windows, where pressing the Enter key to separate the links inserts "\r\n". A URL must not contain "\r", so somewhere (PHP? curl? I am not sure) it gets converted into "_".
<?php
$urls = array();
$urls[] = 'http://archive.org/download/historyofthedecl00731gut/1dfre011103.mp3';
$urls[] = 'http://archive.org/download/historyofthedecl00731gut/1dfre000103.txt';
$urls[] = 'http://archive.org/download/historyofthedecl00731gut/1dfre082103.mp3';
$urls[] = 'http://archive.org/download/historyofthedecl00731gut/1dfre001103.txt';
$urls[] = 'http://archive.org/download/historyofthedecl00731gut/1dfre141103.mp3';
print("<pre>" .print_r($urls, 1). "</pre><br /><br />");
foreach($urls as $url){
//ensure each url only start with ONE _ and end with ONE _
print("<pre>_" . $url . "_</pre>");
$header = array();
$headers = @get_headers($url);
print("<pre>" .print_r($headers, 1). "</pre><br /><br />");
}
?>
You can use my code for a simple test: each link is printed with "_" at both the start and the end, which confirms my explanation. To fix it, remove the "\r" and "\n" from each URL, for example with trim($url).
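The stray "\r" left behind by explode("\n", ...) can also be avoided by splitting on any kind of line ending and trimming each entry. A minimal sketch, assuming the textarea contents arrive with CRLF line endings; the helper name parse_url_list is hypothetical.

```php
<?php
// Hypothetical helper: turn the textarea contents into a clean list of URLs.
function parse_url_list(string $raw): array
{
    // \R matches \r\n, \n and \r, so CRLF line endings leave no stray "\r".
    $lines = preg_split('/\R/', $raw);
    // trim() also removes accidental spaces or tabs around each URL.
    $urls = array_map('trim', $lines);
    // Drop empty lines and reindex the result.
    return array_values(array_filter($urls, function ($u) {
        return $u !== '';
    }));
}

// Simulated $_POST['myurl'] with CRLF line endings.
$posted = "http://archive.org/download/historyofthedecl00731gut/1dfre011103.mp3\r\n"
        . "http://archive.org/download/historyofthedecl00731gut/1dfre000103.txt\r\n";

$urls = parse_url_list($posted);
var_dump($urls);
```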

How to isolate the HTTP headers/body from a PHP Sockets request

I'm using a socket connection in PHP to post data to an Apache webserver. I'm a bit new to this technique, and I am not sure how to isolate the headers from the body of the response.
Sending Code:
<?php
// collect data to post
$postdata = array(
'hello' => 'world'
);
$postdata = http_build_query($postdata);
// open socket, send request
$fp = fsockopen('127.0.0.1', 80);
fwrite($fp, "POST /server.php HTTP/1.1\r\n");
fwrite($fp, "Host: fm1\r\n");
fwrite($fp, "Content-Type: application/x-www-form-urlencoded\r\n");
fwrite($fp, "Content-Length: ".strlen($postdata)."\r\n");
fwrite($fp, "Connection: close\r\n");
fwrite($fp, "\r\n");
fwrite($fp, $postdata);
// go through result
$result = "";
while(!feof($fp)){
$result .= fgets($fp);
}
// close
fclose($fp);
// display result
echo $result;
?>
Server Code:
Hello this is server. You posted:
<pre>
<?php print_r($_POST); ?>
</pre>
When posting to one server, I get:
HTTP/1.1 200 OK
Date: Fri, 06 Jan 2012 09:55:27 GMT
Server: Apache/2.2.15 (Win32) mod_ssl/2.2.15 OpenSSL/0.9.8m PHP/5.3.2
X-Powered-By: PHP/5.3.2
Content-Length: 79
Connection: close
Content-Type: text/html
Hello this is server. You posted:
<pre>
Array
(
[hello] => world
)
</pre>
As expected. I want to strip off the headers though, and just read the body from "Hello this is server....." onwards.
How can I reliably detect the end of the headers and read the body into a variable?
Also, another server I've tested on replies with this:
HTTP/1.1 200 OK
Date: Fri, 06 Jan 2012 10:02:04 GMT
Server: Apache/2
X-Powered-By: PHP/5.2.17
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
4d
Hello this is server. You posted:
<pre>
Array
(
[hello] => world
)
</pre>
0
What are the "4d" and "0" around the body text??
Thanks!
PS before someone says use CURL, I can't unfortunately :-(
You can separate the header from the body by splitting on a double line break. It should be <CRLF><CRLF>, so this would normally work:
list($header, $body) = explode("\r\n\r\n", $response, 2);
More reliably you should use a regex to catch linebreak variations (super unlikely to ever happen):
list($header, $body) = preg_split("/\R\R/", $response, 2);
The 4d and 0 come from chunked transfer encoding. (They are hexadecimal numbers, each followed by a line break, and indicate the length of the raw content block that follows.)
To clear that up you have to look at the headers first and see whether there is a corresponding Transfer-Encoding: entry. This is where it gets complicated, and it is advisable to use one of the myriad existing userland HTTP processing classes. PEAR has one.
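For simple cases the chunks can be reassembled by hand. The sketch below is an illustration only, not a full HTTP client: the helper name decode_chunked is hypothetical, trailer headers after the last chunk are ignored, and malformed input just stops the loop.

```php
<?php
// Hypothetical helper: reassemble a body sent with "Transfer-Encoding: chunked".
function decode_chunked(string $body): string
{
    $decoded = '';
    $offset = 0;
    $length = strlen($body);

    while ($offset < $length && ($eol = strpos($body, "\r\n", $offset)) !== false) {
        // The size line is a hex number, optionally followed by ";extension".
        $sizeHex = trim(explode(';', substr($body, $offset, $eol - $offset), 2)[0]);
        if ($sizeHex === '' || !ctype_xdigit($sizeHex)) {
            break; // Malformed input: stop rather than guess.
        }
        $size = (int) hexdec($sizeHex);
        if ($size === 0) {
            break; // A zero-length chunk marks the end of the body.
        }
        $decoded .= substr($body, $eol + 2, $size);
        // Skip the chunk data and the CRLF that follows it.
        $offset = $eol + 2 + $size + 2;
    }
    return $decoded;
}

// Sample response with two chunks (5 and 7 bytes) and the terminating 0 chunk.
$response = "HTTP/1.1 200 OK\r\n"
          . "Transfer-Encoding: chunked\r\n"
          . "Content-Type: text/html\r\n"
          . "\r\n"
          . "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";

list($header, $body) = explode("\r\n\r\n", $response, 2);
if (preg_match('/^Transfer-Encoding:\s*chunked/mi', $header)) {
    $body = decode_chunked($body);
}
echo $body, "\n";
```

In the question's second response, 4d is hexadecimal for 77, the byte length of the chunk that follows.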
Headers should end with "\r\n\r\n" (CRLF twice). The 4d and 0 are not part of the headers; they arrive as part of the response body.
In most cases Mario's answer should work, but I have just tried to apply this method to a response from CouchDB and there are some cases where it does not work.
If the response does not contain any documents, CouchDB puts "\r\n\r\n" inside the response body to keep the result well-formed. In this case it is not enough to just split the response on "\r\n\r\n", because you can accidentally cut off the end of the body.
HTTP/1.0 200 OK Server: CouchDB/1.6.1 (Erlang OTP/R16B02) ETag: "DJNMQO5WQIBZHFMDU40F1O94T" Date: Mon, 06 Jul 2015 09:37:33 GMT Content-Type: text/plain; charset=utf-8 Cache-Control: must-revalidate
{"total_rows":0,"offset":0,"rows":[
// CouchDB adds some extra line breaks on this line
]}
The following parsing seems to be more reliable for CouchDB:
$parts = explode("\r\n\r\n", $response);
if ($parts)
{
$headers = array_shift($parts);
$body = json_decode(implode("\r\n\r\n", $parts));
}

Regular expression to parse Final-Recipient email header

I have to get any text between:
Final-Recipient: RFC822; !HERE! Action
I need !HERE! from this example. There could be any string.
I tried something like:
$Pattern = '/Final-Recipient: RFC822; (.*) Action/';
But it doesn't work.
upd
Here is the string I'm trying to parse: http://dpaste.com/187638/
Since you said "any string", which may contain spaces, the closest approximation would be
$Pattern = '/Final-Recipient: RFC822; (.*?) Action/s';
#                                        ^         ^
# lazy match instead of greedy match ----'         |
#                     allow . to match newline ----'
Of course, the lazy match stops at the first " Action", so a value that itself contains " Action" will be cut short.
Your pattern works fine for me:
$i = 'This is a MIME-encapsulated message --o3ONXoEH01blah3:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Lblahru> From: *
#*.ru';
$pattern = '/Final-Recipient: RFC822; (.*) Action/';
$matches = Array();
preg_match($pattern, $i, $matches);
print_r($matches);
Output:
Array
(
[0] => Final-Recipient: RFC822; !HERE! Action
[1] => !HERE!
)
Note also that your pattern will fail if the "any text" contains new lines. Use the DOTALL modifier /.../s to allow the dot to also match new lines. Also note that if the text " Action" appears elsewhere in the message it will cause your regular expression to fail. Matching dot is dangerous. Try to find a more specific pattern if possible.
$Pattern = '/Final-Recipient:[^;]+[;|<|\s]+([^\s|^<|^>]+)/i';
The following expression turned out to be the best for my problems, because sometimes there are lines of the following kind:
Final-Recipient: LOCAL;<example#rambler.ru>
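A somewhat tidier variant of the same idea is sketched below. It is not the pattern given above: the helper name final_recipient and the sample addresses are made up, and it assumes the address itself contains no whitespace or angle brackets.

```php
<?php
// Hypothetical helper: pull the address out of a Final-Recipient field.
function final_recipient(string $message): ?string
{
    // address type, ";", optional "<", address without whitespace or <>, optional ">"
    $pattern = '/Final-Recipient:\s*[^;\s]+\s*;\s*<?([^\s<>]+)>?/i';
    return preg_match($pattern, $message, $m) ? $m[1] : null;
}

var_dump(final_recipient("Final-Recipient: RFC822; user@example.com\r\nAction: failed"));
var_dump(final_recipient("Final-Recipient: LOCAL;<user2@example.com>"));
```

Since the capture stops at the first whitespace character, it does not depend on the word Action following the address.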
I am going to suggest a method that does not use regular expressions, although it requires some extra busywork.
<?php
$message = 'This is a MIME-encapsulated message --o3ONXoEH016763.1272152184/zvm19.host.ru The original message was received at Fri, 23 Apr 2010 03:35:33 +0400 (MSD) from roller#localhost ----- The following addresses had permanent fatal errors ----- "Flucker" ----- Transcript of session follows ----- 451 grl.unibel.by: Name server timeout Message could not be delivered for 2 days Message will be deleted from queue --o3ONXoEH016763.1272152184/*.host.ru Content-Type: message/delivery-status Reporting-MTA: dns; zvm19.host.ru Arrival-Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Final-Recipient: RFC822; !HERE! Action: failed Status: 4.4.7 Last-Attempt-Date: Sun, 25 Apr 2010 03:36:24 +0400 (MSD) --o3ONXoEH016763.1272152184/zvm19.host.ru Content-Type: message/rfc822 Content-Transfer-Encoding: 8bit Return-Path: Received: (from *#localhost) by *.host.ru (8.13.8/Zenon/Postman) id o3MNZX5h059932; Fri, 23 Apr 2010 03:35:33 +0400 (MSD) (envelope-from *#roller.ru) Date: Fri, 23 Apr 2010 03:35:33 +0400 (MSD) Message-Id: <201004222335.o3MNZX5h059932#*.host.ru> From: *
#*.ru';
$left_delimiter = 'Final-Recipient: RFC822; ';
$right_delimiter = ' Action';
$left_delimiter_pos = strrpos($message, $left_delimiter);
$desired_message_fragment = '';
if ($left_delimiter_pos !== false) {
$fragment_start = $left_delimiter_pos + strlen($left_delimiter);
// Look for the right delimiter only after the left one
$right_delimiter_pos = strpos($message, $right_delimiter, $fragment_start);
if ($right_delimiter_pos !== false) {
$desired_message_fragment = substr(
$message, $fragment_start, $right_delimiter_pos - $fragment_start
);
}
}
var_dump($desired_message_fragment);
A bit late, but the question as asked does not quite match the actual requirement; the OP has perhaps joined multiple lines into one (IMHO).
This might help others.
I'm assuming that the OP is trying to parse the Final-Recipient header field of a delivery status notification.
The spec for the Final-Recipient field can be seen here: https://www.rfc-editor.org/rfc/rfc3464#page-15
If the problem is broken down, the OP can pull out the Final-Recipient field as a single field: it ends where the next line starts with a non-space character or is blank.
e.g.
Original-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com
Action: failed
Status: 5.1.1 (Remote SMTP server has rejected address)
Here the Final-Recipient field is followed by the start of the next field, Action, whose A is the first character of the next line, i.e. the next line does not start with a space and is not blank.
Then all that is left is to split the line on ; and take the second part, i.e.
String[] twoparts = "Final-recipient: rfc822;some-email-that-does-not-exist#gmail.com".split(";", 2); // limit 2: split at the first ";" only, giving at most two parts
String email = twoparts[1];
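Since the question is about PHP, the same split can be written with explode(). This is a minimal sketch; the sample address is made up.

```php
<?php
$line = 'Final-recipient: rfc822;someone@example.com';

// A limit of 2 splits at the first ";" only, giving at most two parts.
$parts = explode(';', $line, 2);
// trim() removes any padding after the ";"; null means the line had no ";".
$email = isset($parts[1]) ? trim($parts[1]) : null;

echo $email, "\n";
```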
