How to speed up file_get_contents? - php

Here's my code:
$language = $_GET['soundtype'];
$word = $_GET['sound'];
$word = urlencode($word);

if ($language == 'english') {
    $url = "<the first url>";
} else if ($language == 'chinese') {
    $url = "<the second url>";
}

$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "User-Agent: <my user agent>"
    )
);

$context = stream_context_create($opts);

$page = file_get_contents($url, false, $context);

header('Content-Type: audio/mpeg');
echo $page;
But I've found that this runs terribly slow.
Are there any possible methods of optimization?
Note: $url is a remote url.

It's slow because file_get_contents() reads the entire file into $page; PHP has to wait until the whole file has been received before it can output anything. So what you're doing is downloading the entire file on the server side, then outputting it as a single huge string.
file_get_contents() does not support streaming or grabbing byte ranges of the remote file. One option is to create a raw socket with fsockopen(), perform the HTTP request yourself, and read the response in a loop, outputting each chunk to the browser as you read it. This will feel faster to the client because the file is streamed rather than buffered in full.
Example from the Manual:
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
echo "$errstr ($errno)<br />\n";
} else {
header('Content-Type: audio/mpeg');
$out = "GET / HTTP/1.1\r\n";
$out .= "Host: www.example.com\r\n";
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);
while (!feof($fp)) {
echo fgets($fp, 128);
}
fclose($fp);
}
The loop above runs while there is still content available; on each iteration it reads up to 128 bytes and immediately outputs them to the browser. The same principle will work for what you're doing. One caveat: because you are making a raw request, the raw response includes the HTTP response headers as its first few lines, terminated by a blank line. If you echo those headers along with the body, you will end up with a corrupt file, so you must skip past them first.
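As a hedged sketch of that header-skipping step (adapted from the manual example above; the host, path, and buffer size are placeholders):

$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if ($fp) {
    fwrite($fp, "GET /sound.mp3 HTTP/1.0\r\nHost: www.example.com\r\n\r\n");

    // The response headers end at the first blank line; discard them.
    while (!feof($fp) && trim(fgets($fp, 1024)) !== '') {
        // header line, ignore
    }

    // Everything from here on is the body; stream it straight through.
    header('Content-Type: audio/mpeg');
    while (!feof($fp)) {
        echo fread($fp, 4096);
    }
    fclose($fp);
}

HTTP/1.0 is used deliberately here: an HTTP/1.1 server may reply with chunked transfer encoding, which a naive loop like this would not decode.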

Instead of downloading the whole file before outputting it, consider streaming it out like this:
header('Content-Type: audio/mpeg');

$in  = fopen($url, 'rb', false, $context);
$out = fopen('php://output', 'wb');

stream_copy_to_stream($in, $out);
If you're daring, you could even try (but that's definitely experimental):
header('Content-Type: audio/mpeg');
copy($url, 'php://output');
Another option is using internal redirects and making your web server proxy the request for you. That would free up PHP to do something else. See also my post regarding X-Sendfile and friends.
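For instance, with nginx the internal redirect could look roughly like this (a sketch only; it assumes an internal proxying location has been configured, and the location name is made up):

// nginx side (assumed configuration, not from the original post):
//   location /remote-audio/ {
//       internal;
//       proxy_pass http://the-remote-host/;
//   }

// PHP side: hand the transfer off to the web server and exit.
header('Content-Type: audio/mpeg');
header('X-Accel-Redirect: /remote-audio/' . rawurlencode($word));
exit;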

As explained by @MrCode, downloading the file to your server first and then passing it on to the client will of course incur a doubled download time. If you want to pass the file on to the client directly, use readfile.
Alternatively, consider whether you can simply redirect the client to the file URL with header("Location: $url"), so the client fetches the file directly from the source.
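A minimal sketch of both variants, reusing $url and $context from the question:

// Option 1: stream the remote file through this server as it arrives.
header('Content-Type: audio/mpeg');
readfile($url, false, $context);

// Option 2: skip the middleman and let the client fetch it directly.
// header('Location: ' . $url);
// exit;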

Related

server not sending custom header values

I'm using PHP 5.2.17 to fetch a remote page. The HTTP request contains some cookie values, but the cookies are not delivered to the destination page.
$url = 'http://somesite.com/';
$opts = array(
    'http' => array(
        'header' => array("Cookie: field1=value1; field2=value2\r\n")
    )
);
$context = stream_context_create($opts);
echo file_get_contents($url, false, $context);
Can you help me find the problem?
Note: I can't use curl.
Thanks.
Are you sure the receiving code is working properly?
I can get your example to work with the receiving page simply echoing:
<?php
echo $_COOKIE['field1'] . '::' . $_COOKIE['field2']; // returns value1::value2
?>
Alternatively, the site could be requiring a user agent (any value will do). I ran into this earlier when fetching pages from Wikipedia (it isn't enforced by all MediaWiki software, apparently just the Wikimedia sites). Set the user_agent property inside the 'http' array to whatever value you want, as in the sketch below. (If you do happen to be targeting Wikipedia, you might instead try their api.php: http://en.wikipedia.org/w/api.php ; for another MediaWiki site, use the same relative path.)
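For example, a sketch of the question's context options with a user agent added (the agent string itself is arbitrary):

$opts = array(
    'http' => array(
        'header'     => "Cookie: field1=value1; field2=value2\r\n",
        'user_agent' => 'MyCookieFetcher/1.0' // any non-empty value may do
    )
);
$context = stream_context_create($opts);
echo file_get_contents('http://somesite.com/', false, $context);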
Maybe allow_url_fopen is not enabled; check the value with ini_get('allow_url_fopen');
You might also do a sanity check by calling file_get_contents() on any old page you know is publicly accessible.
Other than that your example code looks just fine.
Second Option, Socket Connection:
<?php
$fp = fsockopen("somesite.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: somesite.com\r\n";
    // Note: only name=value pairs belong in a Cookie request header;
    // path and domain are Set-Cookie attributes and should not be sent back.
    $out .= "Cookie: field1=value1; field2=value2\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}

echo PHP_EOL;

// The same request again via a stream context, for comparison:
$url = 'http://somesite.com/';
$opts = array(
    'http' => array(
        'header' => "Cookie: field1=value1; field2=value2\r\n"
    )
);
$context = stream_context_create($opts);
echo file_get_contents($url, false, $context);

HTTPS Post Request via PHP and Cookies

I'm fairly new to PHP, but I used JSP a lot before, where everything was easier with the Java classes.
Now I want to perform a POST request to an HTTPS page (not HTTP), take the cookies it returns, pass them to a follow-up GET request, and return the final result. The aim is to make a heavy page more usable in a mobile browser by bypassing the login page and taking the user directly to pages that are otherwise served through an Ajax user interface.
I am stuck: my code does not work, and the server replies with Bad Request:
Bad Request
Your browser sent a request that this server could not understand.
Reason: You're speaking plain HTTP to an SSL-enabled server port.
Instead use the HTTPS scheme to access this URL, please.
<?php
$content = '';
$headers = '';
$flag = false;

$post_query = 'SOME QUERY'; // name-value pairs
$post_query = urlencode($post_query) . "\r\n";

$host = 'HOST';
$path = 'PATH';

// Note: this opens a plain TCP connection to port 443 -- no TLS,
// which is what triggers the "Bad Request" above.
$fp = fsockopen($host, 443);

if ($fp) {
    fputs($fp, "POST $path HTTP/1.0\r\n");
    fputs($fp, "Host: $host\r\n");
    fputs($fp, "Content-length: " . strlen($post_query) . "\r\n\r\n");
    fputs($fp, $post_query);

    while (!feof($fp)) {
        $line = fgets($fp, 10240);
        if ($flag) {
            $content .= $line;
        } else {
            $headers .= $line;
            if (strlen(trim($line)) == 0) {
                // blank line marks the end of the response headers
                $flag = true;
            }
        }
    }
    fclose($fp);
}

echo $headers;
echo $content;
?>
From past experience, I've never used PHP's low-level functions like fsockopen() for posting data to external sites. The best way to do this is with cURL, which is much easier to use and massively more powerful.
For example, look at the options documented at
http://php.net/curl_setopt
In particular CURLOPT_URL, CURLOPT_POST, CURLOPT_POSTFIELDS, and CURLOPT_COOKIEJAR / CURLOPT_COOKIEFILE: the jar file collects the cookies from the POST, and the follow-up GET can then replay them.
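A rough sketch of that flow (the host, path, and form fields are placeholders; CURLOPT_COOKIEJAR writes the cookies out when the handle is closed, and CURLOPT_COOKIEFILE reads them back in):

$cookieJar = tempnam(sys_get_temp_dir(), 'cookies');

// Step 1: POST the login form over HTTPS, storing any cookies set.
$ch = curl_init('https://HOST/PATH');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('user' => 'u', 'pass' => 'p'));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);

// Step 2: GET the target page, replaying the stored cookies.
$ch = curl_init('https://HOST/TARGET');
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);

echo $result;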

PHP - Downloading very large files with fsockopen(), fgets() and feof()

I have a simple download function in a class that might be dealing with files of many hundreds of megabytes at a time from an Amazon Web Services bucket. The whole file cannot be loaded into memory at once, so it must be streamed directly to a file pointer. This is my understanding as this is the first time I've dealt with this issue and I'm picking things up as I go along.
I've ended up with this, based on a 4 KB file buffer which simple testing showed was a good size:
$fs = fsockopen($host, 80, $errno, $errstr, 30);

if (!$fs) {
    $this->writeDebugInfo("FAILED ", $errstr . '(' . $errno . ')');
} else {
    $out = "GET $file HTTP/1.1\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fs, $out);

    $fm = fopen($temp_file_name, "w");
    stream_set_timeout($fs, 30);

    while (!feof($fs) && ($debug = fgets($fs)) != "\r\n"); // ignore headers

    while (!feof($fs)) {
        $contents = fgets($fs, 4096);
        fwrite($fm, $contents);
        $info = stream_get_meta_data($fs);
        if ($info['timed_out']) {
            break;
        }
    }

    fclose($fm);
    fclose($fs);

    if ($info['timed_out']) {
        // Delete temp file if fails
        unlink($temp_file_name);
        $this->writeDebugInfo("FAILED - Connection timed out: ", $temp_file_name);
    } else {
        // Move temp file if succeeds
        $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
        rename($temp_file_name, $media_file_name);
        $this->writeDebugInfo("SUCCESS: ", $media_file_name);
    }
}
In testing it works fine. However, I've gotten into a conversation with someone who says I don't understand how fgets() and feof() work together, and who mentions chunked encoding as a more efficient method.
Is the code generally OK, or am I missing something vital here? What is the benefit that chunked encoding will give me?
Your solution seems fine to me; however, I have a few comments.
1) Don't build the HTTP request yourself, i.e. don't send the raw request by hand. Instead use something like cURL. This is more foolproof and will cope with a wider range of responses the server might reply with. Additionally, cURL can be set up to write directly to a file, saving you from doing it yourself.
2) Using fgets() may be a problem if you are reading binary data. fgets() reads to the end of a line, and with binary data this may corrupt your download. Instead I suggest fread($fs, 4096), which handles both text and binary data correctly.
3) Chunked encoding is a way for a webserver to send you the response in multiple chunks, and I don't think it would be very useful to you. However, a better encoding the webserver might support is gzip, which lets it compress the response on the fly. If you use a library like cURL, it will tell the server it supports gzip and then automatically decompress the response for you; see the sketch below.
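As a small sketch of the gzip point (an empty CURLOPT_ENCODING string tells cURL to advertise every encoding it supports and to decompress the response transparently):

$ch = curl_init('http://' . $host . $file);
curl_setopt($ch, CURLOPT_ENCODING, '');         // send Accept-Encoding, auto-decompress
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);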
I hope this helps
Don't deal with sockets; simplify your code and use the cURL library (PHP cURL), like this:
$url = 'http://' . $host . '/' . $file;

// open the local file the download will be written to
$fh = fopen($temp_file_name, "w");

// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FILE, $fh); // stream the body straight into $fh
//curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// fetch the URL, writing directly to the file
curl_exec($ch);

// close the cURL resource and free up system resources
curl_close($ch);
fclose($fh);
And here is the final result, in case it helps anyone else. I also wrapped the whole thing in a retry loop to decrease the risk of a completely failed download, though it does increase the use of resources:
$download_attempt = 0;

do {
    $fs = fopen('http://' . $host . $file, "rb");

    if (!$fs) {
        // note: fopen() does not set $errno/$errstr the way fsockopen() does
        $this->writeDebugInfo("FAILED ", 'could not open http://' . $host . $file);
        break;
    } else {
        $fm = fopen($temp_file_name, "w");
        stream_set_timeout($fs, 30);

        while (!feof($fs)) {
            $contents = fread($fs, 4096); // Buffered download
            fwrite($fm, $contents);
            $info = stream_get_meta_data($fs);
            if ($info['timed_out']) {
                break;
            }
        }

        fclose($fm);
        fclose($fs);

        if ($info['timed_out']) {
            // Delete temp file if the download timed out
            unlink($temp_file_name);
            $this->writeDebugInfo("FAILED on attempt " . $download_attempt . " - Connection timed out: ", $temp_file_name);
            $download_attempt++;
            if ($download_attempt < 5) {
                $this->writeDebugInfo("RETRYING: ", $temp_file_name);
            }
        } else {
            // Move temp file into place if the download succeeded
            $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
            rename($temp_file_name, $media_file_name);
            $this->newDownload = true;
            $this->writeDebugInfo("SUCCESS: ", $media_file_name);
        }
    }
} while ($download_attempt < 5 && $info['timed_out']);

Seeking in remote flv files using PHP

I'm trying to seek in a remotely hosted FLV file and have it stream locally. Streaming from the start works, but when I try to 'seek', the player stops.
I'm using this script to seek into the remote file:
$fp = fsockopen($host, 80, $errno, $errstr, 30);

$out = "GET $path_to_flv HTTP/1.1\r\n";
$out .= "Host: $host\r\n";
$out .= "Range: bytes=$pos-$end\r\n";
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);

$content = false;
while (!feof($fp)) {
    $data = fgets($fp, 1024);
    if ($content) echo $data;
    if ($data == "\r\n") {
        $content = true;
        header("Content-Type: video/x-flv");
        // urlfilesize() is presumably a custom helper returning the remote file's size
        header("Content-Length: " . (urlfilesize($file) - $pos));
        if ($pos > 0) {
            // emit a minimal FLV header so the player accepts mid-file data
            print("FLV");
            print(pack('C', 1));
            print(pack('C', 1));
            print(pack('N', 9));
            print(pack('N', 9));
        }
    }
}
fclose($fp);
Any ideas ?
UPDATE
So apparently, even though the server signals that it accepts range requests (with Accept-Ranges: bytes), it does not actually honor them. To see if there is another way to make the FLV seekable, let's have a look at the communication between the Flash player and the server (I used Wireshark for this):
The request when starting the player is:
GET /files/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ HTTP/1.1
Host: xxxxxx.megavideo.com
<some more headers>
<no range header>
This is answered with a response like this:
HTTP/1.0 200 OK
Server: Apache/1.3.37 (Debian GNU/Linux) PHP/4.4.7
Content-Type: video/flv
ETag: "<video-id>"
Content-Length: <length of complete video>
<some more headers>
<the flv content>
Now when I seek in the Flash player, another request is sent. It is almost the same as the initial one, with the following difference:
GET /files/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/8800968 HTTP/1.1
<same headers as first request>
This gets answered with a response almost identical to the initial one, differing only in the Content-Length header.
That lets me assume the 8800968 at the end of the request URL is the "seek offset" (the byte offset in the file after seeking) we are looking for, and that the second response's Content-Length is the initial Content-Length (the length of the whole file) minus this offset. Which is indeed the case.
With this information, it should be possible to get what you want. Good luck!
UPDATE END
This will only work if the server supports HTTP Range requests. If it does, it will return a 206 Partial Content status code together with a Content-Range header and your requested range of bytes; check for these in the response to your request.
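A quick sketch of that check, in the same socket style as the question ($host and $path_to_flv are reused from the script above):

$fp = fsockopen($host, 80, $errno, $errstr, 30);
fwrite($fp, "GET $path_to_flv HTTP/1.1\r\n"
          . "Host: $host\r\n"
          . "Range: bytes=0-99\r\n"
          . "Connection: Close\r\n\r\n");

// The status line answers the question immediately:
// "HTTP/1.1 206 Partial Content" means the range was honored;
// "HTTP/1.1 200 OK" means the server ignored the Range header.
echo fgets($fp, 1024);
fclose($fp);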

How do you get the HTTP status code for a remote domain in php?

I would like to create a batch script to go through 20,000 links in a DB and weed out all the 404s and such. How would I get the HTTP status code for a remote URL?
Preferably not using cURL, since I don't have it installed.
cURL would be perfect, but since you don't have it, you'll have to get down and dirty with sockets. The technique is:
Open a socket to the server.
Send an HTTP HEAD request.
Parse the response.
Here is a quick example:
<?php
$url = parse_url('http://www.example.com/index.html');

$host = $url['host'];
$port = isset($url['port']) ? $url['port'] : 80;
$path = $url['path'];
$query = isset($url['query']) ? '?' . $url['query'] : '';

$request = "HEAD $path$query HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n"
         . "\r\n";

$address = gethostbyname($host);
$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_connect($socket, $address, $port);
socket_write($socket, $request, strlen($request));

// The status code is the second token of the status line,
// e.g. "HTTP/1.1 200 OK". (explode() replaces the deprecated split().)
$response = explode(' ', socket_read($socket, 1024));
print "<p>Response: " . $response[1] . "</p>\r\n";

socket_close($socket);
?>
UPDATE: I've added a few lines to parse the URL
If I'm not mistaken, no PHP built-in function returns just the HTTP status of a remote URL (get_headers() comes closest; see the note below), so a good option is to use sockets to open a connection to the server, send a request, and parse the response status:
Pseudo code:
parse url => $host, $port, $path
$http_request = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
$fp = fsockopen($host, $port, $errno, $errstr, $timeout); // check for any errors
fwrite($fp, $http_request);
while (!feof($fp)) {
    $headers .= fgets($fp, 4096);
    $status = <parse the status line out of $headers>;
    if (<status read>)
        break;
}
fclose($fp);
Another option is to use an existing HTTP client class for PHP that can return the headers without fetching the full page content; there should be a few open-source classes available on the net.
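Incidentally, PHP's built-in get_headers() is also worth a look here; note it issues a full GET request by default, which may be wasteful across 20,000 links (a hedged sketch):

$headers = @get_headers('http://www.example.com/index.html');
if ($headers === false) {
    // connection failed entirely
} else {
    // $headers[0] is the status line, e.g. "HTTP/1.1 200 OK"
    $status = (int) substr($headers[0], 9, 3);
    if ($status == 404) {
        // flag the link as dead
    }
}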
This page looks like it has a pretty good setup to download a page using either curl or fsockopen, and can get the HTTP headers using either method (which is what you want, really).
After using that method, you'd want to check $output['info']['http_code'] to get the data you want.
Hope that helps.
You can use PEAR's HTTP::head function.
http://pear.php.net/manual/en/package.http.http.head.php
