Improve HTTP GET PHP scripts

This code gets the headers and content from $url and prints them to the browser. It is really slow, and it's not because of the server. How can I improve this?
$headers = get_headers($url);
foreach ($headers as $value) {
    header($value);
}
$fh = fopen($url, "r");
fpassthru($fh);
Thanks

Why make two requests when one will do?
$fh = fopen($url, 'r');
foreach ($http_response_header as $value) {
header($value);
}
fpassthru($fh);
Or:
$content = file_get_contents($url);
foreach ($http_response_header as $value) {
header($value);
}
echo $content;

I'm not sure why you're opening another connection with fopen() if you've already fetched and printed the headers. Is this doing more than printing out headers?

If you are really looking to just proxy a page, the cURL functions are much more efficient:
<?php
$curl = curl_init("http://www.google.com");
curl_setopt($curl, CURLOPT_HEADER, true);
curl_exec($curl);
curl_close($curl);
?>
Of course, cURL has to be enabled on your server, but it's not uncommon.
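If you want to re-emit the upstream headers with header() rather than printing them inline, cURL can hand each header line to a callback as it arrives via CURLOPT_HEADERFUNCTION. A minimal sketch (the URL is a placeholder):
<?php
// Sketch: proxy a page with cURL, forwarding upstream response headers
// via header(). CURLOPT_RETURNTRANSFER is left off so the body streams
// straight to the client.
$curl = curl_init("http://www.example.com/");
curl_setopt($curl, CURLOPT_HEADERFUNCTION, function ($curl, $line) {
    $header = rtrim($line, "\r\n");
    // Skip the status line and the blank line that terminates the headers.
    if ($header !== '' && stripos($header, 'HTTP/') !== 0) {
        header($header);
    }
    return strlen($line); // cURL expects the number of bytes handled
});
curl_exec($curl);
curl_close($curl);
?>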

Are you trying to make a proxy? If so, here is a recipe, in proxy.php:
<?php
$host = 'example.com';
$port = 80;
$page = $_SERVER['REQUEST_URI'];

// Base headers sent with every forwarded request.
$hbase = array();
$hbase[] = 'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
if (!empty($_SERVER['HTTP_REFERER'])) {
    $hbase[] = 'Referer: ' . str_ireplace($_SERVER['HTTP_HOST'], $host, $_SERVER['HTTP_REFERER']);
}
if (!empty($_SERVER['HTTP_COOKIE'])) {
    $hbase[] = 'Cookie: ' . $_SERVER['HTTP_COOKIE'];
}
$hbase = implode("\r\n", $hbase);

// HTTP requires CRLF ("\r\n") line endings.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $post = file_get_contents("php://input");
    $length = strlen($post);
    $request = "POST $page HTTP/1.0\r\nHost: $host\r\n$hbase\r\nContent-Type: application/x-www-form-urlencoded\r\nContent-Length: $length\r\n\r\n$post";
} else {
    $request = "GET $page HTTP/1.0\r\nHost: $host\r\n$hbase\r\n\r\n";
}

// Retry the whole request if the connection times out.
do {
    $conn = fsockopen($host, $port, $errno, $errstr, 180);
    if (!$conn) {
        throw new Exception("$errstr ($errno)");
    }
    fputs($conn, $request);
    $header = '';
    $body = false; // false until the blank line ending the headers is seen
    stream_set_blocking($conn, false);
    $info = stream_get_meta_data($conn);
    while (!feof($conn) && !$info['timed_out']) {
        $str = fgets($conn);
        if (!$str) {
            usleep(50000);
            continue;
        }
        if ($body !== false) {
            $body .= $str;
        } else {
            $header .= $str;
        }
        if ($body === false && $str == "\r\n") {
            $body = '';
        }
        $info = stream_get_meta_data($conn);
    }
    fclose($conn);
} while ($info['timed_out']);

// Rewrite the upstream host name in the headers and body.
$header = str_ireplace($host, $_SERVER['HTTP_HOST'], $header);
if (stripos($body, $host) !== false) {
    $body = str_ireplace($host, $_SERVER['HTTP_HOST'], $body);
}
$header = str_replace('domain=.example.com; ', '', $header);

$header_array = explode("\r\n", $header);
foreach ($header_array as $line) {
    header($line);
}
if (strpos($header, 'Content-Type: text') !== false) {
    // Place to rewrite text responses as needed.
    $body = str_replace('something', '', $body);
}
echo $body;
In .htaccess:
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
RewriteRule ^(.*)$ proxy.php [QSA,L]

You may be able to pinpoint the slowness by changing $url to a known fast site, or even a local web server. Otherwise, the only plausible explanation is a slow response from the server.
Of course, as suggested by GZipp, if you're going to output the file contents as well, just do it with a single request. That would make the server you're requesting from happier.

Related

How to find the file size by URL in a PHP script [duplicate]

How can I make sure that a file exists on a remote server, and find out its size from its URL, without downloading the file first?
$url = 'http://site.zz/file.jpg';
file_exists($url); // always false
filesize($url); // not working
Can anyone help with a working example, please?
The function file_exists() only works on files that exist locally on the server.
Similarly, the filesize() function returns the size of a file that exists locally on the server.
If you are trying to load the size of a file for a given url, you can try this approach:
function get_remote_file_info($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_NOBODY, TRUE); // send a HEAD request
    $data = curl_exec($ch);
    $fileSize = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    $httpResponseCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'fileExists' => (int) $httpResponseCode == 200,
        'fileSize' => (int) $fileSize
    ];
}
Usage:
$url = 'http://site.zz/file.jpg';
$result = get_remote_file_info($url);
var_dump($result);
Example output:
array(2) {
  ["fileExists"]=>
  bool(true)
  ["fileSize"]=>
  int(12345)
}
Without any libraries or opening the file:
$data = get_headers($url, true);
$size = isset($data['Content-Length']) ? (int) $data['Content-Length'] : 0;
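Note that get_headers() sends a full GET request by default, so the server may still transfer the body; with PHP 7.1+ you can pass a stream context to force a HEAD request instead. A minimal sketch (the URL is a placeholder):
<?php
// Sketch: force get_headers() to issue a HEAD request so the body is
// never downloaded. The context parameter requires PHP 7.1+.
$url = 'http://site.zz/file.jpg';
$context = stream_context_create(['http' => ['method' => 'HEAD']]);
$data = get_headers($url, true, $context);
$size = isset($data['Content-Length']) ? (int) $data['Content-Length'] : 0;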
Open remote files:
function fsize($path) {
    $fp = fopen($path, "r");
    $inf = stream_get_meta_data($fp);
    fclose($fp);

    foreach ($inf["wrapper_data"] as $v) {
        if (stristr($v, "content-length")) {
            $v = explode(":", $v);
            return trim($v[1]);
        }
    }
    return 0;
}
Usage:
$file = "https://zzz.org/file.jpg";
$inbytes = fsize($file);
Use sockets:
function getRemoteFileSize($url) {
    $parse = parse_url($url);
    $host = $parse['host'];
    $fp = @fsockopen($host, 80, $errno, $errstr, 20);
    if (!$fp) {
        return 0;
    }

    fputs($fp, "HEAD " . $url . " HTTP/1.1\r\n");
    fputs($fp, "Host: " . $host . "\r\n");
    fputs($fp, "Connection: close\r\n\r\n");

    $headers = "";
    while (!feof($fp)) {
        $headers .= fgets($fp, 128);
    }
    fclose($fp);

    $headers = strtolower($headers);
    $array = preg_split("|[\s,]+|", $headers);
    $key = array_search('content-length:', $array);
    $ret = $array[$key + 1];

    // Return the size on 200 OK, otherwise the negated status code.
    if ($array[1] == 200) return $ret;
    else return -1 * $array[1];
}
You can't directly access the file size of a remote file.
You have to check it with a local file path.

If the connection is keep-alive, how to read until end of stream in PHP

$f = fsockopen("www....",80,$x,$y);
fwrite("GET request HTTP/1.1\r\nConnection: keep-alive\r\n\r\n");
while($s = fread($f,1024)){
...
}
The above stalls because of the Connection: keep-alive, and works with Connection: close.
How do you do it without stalling?
It depends on the response: if the transfer-encoding of the response is chunked, then you read until you encounter the "last chunk", a zero-length chunk (0\r\n\r\n).
If the content-encoding is gzip, then you look at the content-length response header, read that much data, and then inflate it. If the transfer-encoding is also set to chunked, then you must dechunk the response before decoding it.
The easiest thing is to build a simple state machine to read the response from the socket while there is still data left for the response.
When reading chunked data, you should read the first chunk length (and any chunked extension) and then read as much data as the chunk size, and do so until the last chunk.
Put another way:
Read the HTTP response headers (read small chunks of data until you encounter \r\n\r\n)
Parse the response headers into an array
If the transfer-encoding is chunked, read and dechunk the data piece by piece.
If the content-length header is set, you can read that much data from the socket
If the content-encoding is gzip, decompress the read data
Once you have performed the above steps, you should have read the entire response and you can now send another HTTP request on the same socket and repeat the process.
On the other hand, unless you have the absolute need for a keep-alive connection, just set Connection: close in the request and you can safely read while (!feof($f)).
I don't have any PHP code for reading and parsing HTTP responses at the moment (I just use cURL) but if you'd like to see actual code, let me know and I can work something up. I could also refer you to some C# code I've made that does all of the above.
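For the simple Connection: close case described above, a minimal sketch (host and path are placeholders):
<?php
// Sketch: with "Connection: close" the server ends the stream after the
// response, so reading until EOF is safe. Host and path are placeholders.
$f = fsockopen('www.example.com', 80, $errno, $errstr, 30);
fwrite($f, "GET / HTTP/1.0\r\nHost: www.example.com\r\nConnection: close\r\n\r\n");
$response = '';
while (!feof($f)) {
    $response .= fread($f, 8192);
}
fclose($f);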
EDIT: Here is working code that uses fsockopen to issue an HTTP request and demonstrate reading keep-alive connections with the possibility of chunked encoding and gzip compression. Tested, but not tortured - use at your own risk!!!
<?php
/**
* PHP HTTP request demo
* Makes HTTP requests using PHP and fsockopen
* Supports chunked transfer encoding, gzip compression, and keep-alive
*
* @author drew010 <http://stackoverflow.com/questions/11125463/if-connection-is-keep-alive-how-to-read-until-end-of-stream-php/11812536#11812536>
* @date 2012-08-05
* Public domain
*
*/
error_reporting(E_ALL);
ini_set('display_errors', 1);

$host = 'www.kernel.org';
$sock = fsockopen($host, 80, $errno, $errstr, 30);

if (!$sock) {
    die("Connection failed. $errno: $errstr\n");
}

request($sock, $host, 'GET', '/');
$headers = readResponseHeaders($sock, $resp, $msg);
$body = readResponseBody($sock, $headers);

echo "Response status: $resp - $msg\n\n";
echo '<pre>' . var_export($headers, true) . '</pre>';
echo "\n\n";
echo $body;

// if the connection is keep-alive, you can make another request here
// as demonstrated below
request($sock, $host, 'GET', '/kernel.css');
$headers = readResponseHeaders($sock, $resp, $msg);
$body = readResponseBody($sock, $headers);

echo "Response status: $resp - $msg\n\n";
echo '<pre>' . var_export($headers, true) . '</pre>';
echo "\n\n";
echo $body;
exit;

function request($sock, $host, $method = 'GET', $uri = '/', $params = null)
{
    $method = strtoupper($method);
    if ($method != 'GET' && $method != 'POST') $method = 'GET';

    $request = "$method $uri HTTP/1.1\r\n"
             . "Host: $host\r\n"
             . "Connection: keep-alive\r\n"
             . "Accept-encoding: gzip, deflate\r\n"
             . "\r\n";

    fwrite($sock, $request);
}

function readResponseHeaders($sock, &$response_code, &$response_status)
{
    $headers = '';
    $read = 0;

    // read one byte at a time until the blank line that ends the headers
    while (true) {
        $headers .= fread($sock, 1);
        $read += 1;
        if ($read >= 4 && $headers[$read - 1] == "\n" && substr($headers, -4) == "\r\n\r\n") {
            break;
        }
    }

    $headers = parseHeaders($headers, $resp, $msg);
    $response_code = $resp;
    $response_status = $msg;

    return $headers;
}

function readResponseBody($sock, array $headers)
{
    $responseIsChunked = (isset($headers['transfer-encoding']) && stripos($headers['transfer-encoding'], 'chunked') !== false);
    $contentLength = (isset($headers['content-length'])) ? $headers['content-length'] : -1;
    $isGzip = (isset($headers['content-encoding']) && $headers['content-encoding'] == 'gzip') ? true : false;
    $close = (isset($headers['connection']) && stripos($headers['connection'], 'close') !== false) ? true : false;

    $body = '';

    if ($contentLength >= 0) {
        $read = 0;
        do {
            $buf = fread($sock, $contentLength - $read);
            $read += strlen($buf);
            $body .= $buf;
        } while ($read < $contentLength);
    } else if ($responseIsChunked) {
        $body = readChunked($sock);
    } else if ($close) {
        while (!feof($sock)) {
            $body .= fgets($sock, 1024);
        }
    }

    if ($isGzip) {
        // skip the 10-byte gzip header and inflate the raw deflate stream
        // (assumes the header carries no optional fields)
        $body = gzinflate(substr($body, 10));
    }

    return $body;
}

function readChunked($sock)
{
    $body = '';

    while (true) {
        // read the chunk-size line
        $data = '';
        do {
            $data .= fread($sock, 1);
        } while (strpos($data, "\r\n") === false);

        if (strpos($data, ' ') !== false) {
            list($chunksize, $chunkext) = explode(' ', $data, 2);
        } else {
            $chunksize = $data;
            $chunkext = '';
        }

        $chunksize = (int)base_convert($chunksize, 16, 10);

        if ($chunksize === 0) {
            fread($sock, 2); // read trailing "\r\n"
            return $body;
        } else {
            $data = '';
            $datalen = 0;
            while ($datalen < $chunksize + 2) {
                $data .= fread($sock, $chunksize - $datalen + 2);
                $datalen = strlen($data);
            }
            $body .= substr($data, 0, -2); // -2 to remove the "\r\n" before the next chunk
        }
    } // while (true)
}

function parseHeaders($headers, &$response_code = null, &$response_message = null)
{
    $lines = explode("\r\n", $headers);
    $return = array();

    $response = array_shift($lines);

    if (func_num_args() > 1) {
        list($proto, $code, $message) = explode(' ', $response, 3);
        $response_code = $code;
        if (func_num_args() > 2) {
            $response_message = $message;
        }
    }

    foreach ($lines as $header) {
        if (trim($header) == '') continue;
        list($name, $value) = explode(':', $header, 2);
        $return[strtolower(trim($name))] = trim($value);
    }

    return $return;
}
The following code works without any problem for me:
<?php
$f = fsockopen("www.google.de", 80);
fwrite($f, "GET / HTTP/1.1\r\n Connection: keep-alive\r\n\r\n");
while ($s = fread($f, 1024)) {
    echo "got: $s";
}
echo "finished;";
?>
The funny thing is that without keep-alive this example stalls for me.
Can you add an example that can just be copied and pasted and that shows your error?

Why can I not download files from some sites like this?

This is my PHP source code:
<?php
$path = '/images/one.jpg';
$imm = 'http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg';
if ($content = file_get_contents($imm)) {
    file_put_contents($path, $content);
    echo "Yes";
} else {
    echo "No";
}
?>
and I get this error:
Warning: file_get_contents(http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /opt/lampp/htdocs/test/down.php on line 4
No
Why ?
There are some headers expected by the server (especially Accept and User-Agent). Use the stream context argument of file_get_contents() to provide them:
<?php
$path = '/images/one.jpg';
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
                    "Accept: image/png,image/*;q=0.8,*/*;q=0.5\r\n" .
                    "Host: www.allamoda.eu\r\n" .
                    "User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0\r\n"
    )
);
$context = stream_context_create($opts);
$imm = 'http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg';
if ($content = file_get_contents($imm, false, $context)) {
    file_put_contents($path, $content);
    echo "Yes";
} else {
    echo "No";
}
?>
You are not allowed to download this file, the server allamoda.eu says (HTTP 403).
Nothing is wrong with the code. The server simply is not letting you in (either you have made too many requests to it, or it just blocks all scripts scraping it).
You're not allowed to open the file directly. But you can try to fetch its content by using sockets:
function getRemoteFile($url)
{
    // get the host name and url path
    $parsedUrl = parse_url($url);
    $host = $parsedUrl['host'];

    if (isset($parsedUrl['path'])) {
        $path = $parsedUrl['path'];
    } else {
        // the url is pointing to the host like http://www.mysite.com
        $path = '/';
    }

    if (isset($parsedUrl['query'])) {
        $path .= '?' . $parsedUrl['query'];
    }

    if (isset($parsedUrl['port'])) {
        $port = $parsedUrl['port'];
    } else {
        // most sites use port 80
        $port = '80';
    }

    $timeout = 10;
    $response = '';

    // connect to the remote server
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);

    if (!$fp) {
        echo "Cannot retrieve $url";
    } else {
        // send the necessary headers to get the file
        fputs($fp, "GET $path HTTP/1.0\r\n" .
                   "Host: $host\r\n" .
                   "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3\r\n" .
                   "Accept: */*\r\n" .
                   "Accept-Language: en-us,en;q=0.5\r\n" .
                   "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n" .
                   "Keep-Alive: 300\r\n" .
                   "Connection: keep-alive\r\n" .
                   "Referer: http://$host\r\n\r\n");

        // retrieve the response from the remote server
        while ($line = fread($fp, 4096)) {
            $response .= $line;
        }

        fclose($fp);

        // strip the headers
        $pos = strpos($response, "\r\n\r\n");
        $response = substr($response, $pos + 4);
    }

    // return the file content
    return $response;
}
Example:
$content = getRemoteFile('http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg');
Source

Link extracting script is causing an issue in PHP

I have created a loop to extract specific links from Google, but when I run this script it creates an empty file without any links.
for ($n = 0; $n < 500; $n += 10)
{
    $country = $_GET['country'];

    // Country changing array to scan through different countries.
    $google = array(
        "default" => array("google.com", "countryUS"),
        "NZ" => array("google.co.nz", "countryNZ"),
        "UK" => array("google.co.uk", "countryUK|countryGB"),
        "AU" => array("google.com.au", "countryAU")
    );

    // Variables for fsockopen
    $URL = 'www.' . $google[$country][0];
    $PORT = "80";
    $TIMEOUT = 30;
    $Q = $_GET['query'];

    // Google query string
    $GET = '/search?q=' . urlencode($Q) . '&hl=en&cr=' . $google[$country][1] . '&tbs=ctr:' . $google[$country][1] . '&start=' . $n . '&sa=N';

    $fp = fsockopen($URL, $PORT, $errno, $errstr, $TIMEOUT);
    if (!$fp)
    {
        echo "Error connecting to {$URL}<br>\n";
    } else
    {
        $out = "GET $GET HTTP/1.1\r\n";
        $out .= "Host: $URL\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);

        $buffer = '';
        while (!feof($fp))
        {
            $buffer .= fgets($fp, 128);
        }
        fclose($fp);

        write($SITEs, filter($buffer));
    }
}
Please tell me if I am doing anything wrong.
Note: fopen is ON in my PHP.
Adding:
function write($A, $B) {
    @touch($A);
    $C = fopen($A, 'a');
    fwrite($C, $B);
    fclose($C);
}
You haven't opened the file you're going to write to!
Also, there is no built-in PHP function write().
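For reference, here is a minimal sketch of the missing pieces; filter() is assumed to extract anchor targets and $SITEs to be the output filename, since neither is defined in the question:
<?php
// Hypothetical definitions for the question's undefined $SITEs and
// filter(); the filename and regex are assumptions for illustration.
$SITEs = 'links.txt';

function filter($buffer) {
    // Collect href targets from anchor tags, one per line.
    preg_match_all('/<a\s[^>]*href="([^"]+)"/i', $buffer, $matches);
    return implode("\n", $matches[1]) . "\n";
}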

Retrieving page with fsockopen adds numbers to returned string

This is very strange: on some pages it will return the HTML fine; on others it will add numbers to the beginning and end of the returned string ($out).
function lookupPage($page, $return = true) {
    $fp = fsockopen("127.0.0.1", 48580, $errno, $errstr, 5);
    if (!$fp) {
        return false;
    }
    else {
        $out = "";
        $headers = "GET /" . $page . " HTTP/1.1\r\n";
        $headers .= "Host: www.site.com\r\n";
        $headers .= "Connection: Close\r\n\r\n";
        fwrite($fp, $headers);
        stream_set_timeout($fp, 300);

        $info = stream_get_meta_data($fp);
        while (!feof($fp) && !$info['timed_out'] && ($line = stream_get_line($fp, 1024)) !== false) {
            $info = stream_get_meta_data($fp);
            if ($return) $out .= $line;
        }
        fclose($fp);

        if (!$info['timed_out']) {
            if ($return) {
                $out = substr($out, strpos($out, "\r\n\r\n") + 4);
                return $out;
            }
            else {
                return true;
            }
        }
        else {
            return false;
        }
    }
}
e.g...
3565
<html>
<head>
...
</html>
0
It is called Chunked Transfer Encoding.
It is part of the HTTP/1.1 protocol, and you're decoding it in an HTTP/1.0 way. You can just check for the values and trim them if you want; they only give the length of each piece of the response so the browser knows when it has the complete response.
Also, maybe look at file_get_contents().
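If you'd rather keep HTTP/1.1 and decode the chunk sizes yourself, here is a minimal sketch that dechunks a fully buffered body (it assumes the headers have already been stripped off, as in the code above):
<?php
// Sketch: decode a complete chunked-encoded body held in $body.
// Assumes the headers were stripped and the whole response was read.
function dechunk($body) {
    $decoded = '';
    $offset = 0;
    while (true) {
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break; // malformed or incomplete data
        }
        // The chunk-size line is hex, optionally followed by ";extension".
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $size = hexdec(trim(strtok($sizeLine, ';')));
        if ($size === 0) {
            break; // last chunk reached
        }
        $decoded .= substr($body, $lineEnd + 2, $size);
        $offset = $lineEnd + 2 + $size + 2; // skip chunk data and trailing CRLF
    }
    return $decoded;
}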
My guess would be that the server responds with chunked data.
Have a look at RFC2616 Transfer codings and its introduction.
